# MCP Server Fetch

MCP Server Fetch is a Model Context Protocol (MCP) server that provides advanced web content fetching capabilities for Large Language Models. It enables LLMs to retrieve and process content from web pages using browser automation, OCR, and multiple extraction methods, even for pages that require JavaScript rendering or employ anti-scraping techniques.

The server implements a sophisticated multi-method extraction pipeline that includes browser automation with undetected-chromedriver, OCR using pytesseract with layout detection, HTML extraction via requests/BeautifulSoup, and document parsing for PDF, DOCX, and PPTX files. A scoring system automatically selects the best extraction result based on content length, structure quality, and error detection.

## MCP Tool: fetch

The `fetch` tool retrieves content from a URL using browser automation and multi-method extraction. It automatically handles cookie consent banners, captures full-page screenshots for OCR, and supports various document formats including HTML, PDF, DOCX, and PPTX.

```python
# Tool schema (as exposed via MCP)
{
  "name": "fetch",
  "description": "Fetches a URL from the internet using browser automation and multi-method extraction (including OCR).",
  "inputSchema": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "format": "uri",
        "description": "URL to fetch"
      },
      "raw": {
        "type": "boolean",
        "default": false,
        "description": "Get the actual HTML content of the requested page, without simplification."
      }
    },
    "required": ["url"]
  }
}

# Example MCP tool call (JSON-RPC format)
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "fetch",
    "arguments": {
      "url": "https://example.com/article",
      "raw": false
    }
  },
  "id": 1
}

# Response format
{
  "jsonrpc": "2.0",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Content extracted using Browser (detected type: html):\n\nContents of https://example.com/article:\n\n[Extracted markdown content...]"
      }
    ]
  },
  "id": 1
}
```

## MCP Prompt: fetch

The `fetch` prompt allows users to request URL content extraction through the MCP prompt interface. It uses the same multi-method extraction pipeline as the tool but is initiated via user-specified prompts rather than autonomous tool calls.

```python
# Prompt schema (as exposed via MCP)
{
  "name": "fetch",
  "description": "Fetch a URL and extract its contents as markdown using browser automation",
  "arguments": [
    {
      "name": "url",
      "description": "URL to fetch",
      "required": true
    }
  ]
}

# Example MCP prompt request (JSON-RPC format)
{
  "jsonrpc": "2.0",
  "method": "prompts/get",
  "params": {
    "name": "fetch",
    "arguments": {
      "url": "https://docs.python.org/3/library/asyncio.html"
    }
  },
  "id": 2
}

# Response format
{
  "jsonrpc": "2.0",
  "result": {
    "description": "Contents of https://docs.python.org/3/library/asyncio.html",
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Content extracted using HTML_Original (detected type: html):\n\n[Extracted content...]"
        }
      }
    ]
  },
  "id": 2
}
```

## Docker Installation and Configuration

The server is designed to run in a Docker container that includes Chrome, Tesseract OCR, and all required dependencies. Build and run the container, then configure your MCP client to use it.

```bash
# Build the Docker image
docker build -t mcp-server-fetch .

# Run the server (interactive mode for MCP stdio communication)
docker run --rm -i mcp-server-fetch

# Run with custom logging level
docker run --rm -i mcp-server-fetch mcp-server-fetch --log-level DEBUG

# Run with custom user agent
docker run --rm -i mcp-server-fetch mcp-server-fetch --user-agent "MyCustomAgent/1.0"

# Run with mounted volumes for logs and output
# (paths are quoted so a working directory containing spaces still works)
docker run --rm -i \
  -v "$(pwd)/logs:/app/logs" \
  -v "$(pwd)/output:/app/output" \
  mcp-server-fetch
```

## Claude/Roo Code MCP Configuration

Configure your MCP client (Claude Desktop, Roo Code, etc.) to use the fetch server by adding the appropriate configuration to your settings file.

```json
{
  "mcpServers": {
    "fetch": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "mcp-server-fetch"
      ],
      "disabled": false,
      "alwaysAllow": []
    }
  }
}
```

## Content Extraction Functions

### extract_content_from_html

Converts raw HTML content to simplified Markdown format using readability algorithms for clean text extraction.

```python
from mcp_server_fetch.server import extract_content_from_html

html_content = """
This is the main content of the article that will be extracted.
Additional paragraph with important information.