# ScrapeGraph MCP Server ScrapeGraph MCP Server is a production-ready Model Context Protocol (MCP) implementation that provides seamless integration with the ScrapeGraph AI API. It enables language models like Claude and other AI assistants to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability. The server exposes 8 powerful tools for converting webpages to markdown, extracting structured data using natural language prompts, performing AI-powered web searches, crawling multiple pages asynchronously, and executing complex multi-step scraping workflows. Built on FastMCP and Python 3.10+, the server implements the MCP protocol over stdio transport, making it compatible with Claude Desktop, Cursor, and any MCP-compatible client. It supports flexible deployment options including Smithery automated installation, local development setup, Docker containerization, and remote HTTP deployment via Render. The server features comprehensive parameter validation, robust error handling with detailed messages, timeout management, and automatic retry logic. All tools return consistent error responses with actionable guidance for common issues like missing API keys, malformed URLs, rate limiting, and quota management. ## Tools and APIs ### markdownify - Convert Webpage to Clean Markdown Transforms any webpage into clean, structured markdown format for content extraction and processing. ```python # Using with MCP client (e.g., Claude Desktop) # The AI assistant will automatically invoke this tool when prompted # Example prompt to AI: "Convert https://docs.python.org/3/tutorial/ to markdown" # Expected response structure: { "markdown": "# Python Tutorial\n\n## Introduction\n\n...", "metadata": { "title": "Python Tutorial", "url": "https://docs.python.org/3/tutorial/", "description": "Official Python tutorial" }, "status": "success", "credits_used": 2 } # Error handling example: { "error": "website_url must include protocol (http:// or https://)", "error_type": "ValidationError", "parameter": "website_url" } ``` ### smartscraper - AI-Powered Data Extraction Extracts structured data from webpages using natural language prompts with support for infinite scrolling and pagination. ```python # Extract product information from e-commerce page # Example prompt to AI: "Extract product name, price, description, and availability from https://example-shop.com/products/laptop" # With schema for structured output: { "user_prompt": "Extract product information", "website_url": "https://example-shop.com/products/laptop", "output_schema": { "type": "object", "properties": { "name": {"type": "string"}, "price": {"type": "number"}, "description": {"type": "string"}, "available": {"type": "boolean"} }, "required": [] } } # Response: { "extracted_data": { "name": "Professional Laptop", "price": 999.99, "description": "High-performance laptop for professionals", "available": true }, "credits_used": 10, "processing_time": 15.3, "status": "success" } # With infinite scroll support: { "user_prompt": "Extract all product listings", "website_url": "https://shop.example.com/category/electronics", "number_of_scrolls": 5, # Scroll 5 times to load more content "total_pages": 3 # Process 3 pages of pagination } # Process local HTML content: { "user_prompt": "Extract article metadata", "website_html": "

Title

...", "render_heavy_js": false, "stealth": true } ``` ### searchscraper - AI Web Search with Extraction Performs AI-powered web searches and extracts structured information from multiple sources. ```python # Research latest AI developments # Example prompt to AI: "Search for the latest AI research papers published in 2024 with author names and abstracts" # API call: { "user_prompt": "Find latest AI research papers published in 2024 with authors and abstracts", "num_results": 5, # Search 5 websites (50 credits total) "number_of_scrolls": 2 # Scroll each result page twice } # Response: { "search_results": [ { "title": "Advances in Large Language Models", "authors": ["John Doe", "Jane Smith"], "abstract": "This paper explores...", "url": "https://arxiv.org/paper1", "publication_date": "2024-03-15" }, { "title": "Neural Architecture Search", "authors": ["Alice Johnson"], "abstract": "We present a novel approach...", "url": "https://arxiv.org/paper2", "publication_date": "2024-02-20" } ], "sources": ["https://arxiv.org", "https://papers.nips.cc", ...], "total_websites_processed": 5, "credits_used": 50, "processing_time": 45.7 } # Market research example: { "user_prompt": "Find current cryptocurrency prices and market caps for top 10 coins", "num_results": 3 } ``` ### smartcrawler_initiate - Start Multi-Page Crawling Initiates asynchronous multi-page web crawling with AI extraction or markdown conversion. ```python # Crawl documentation site with AI extraction # Example prompt to AI: "Crawl the API documentation at https://docs.example.com and extract all endpoint names, methods, and parameters" # API call: { "url": "https://docs.example.com/api", "prompt": "Extract API endpoint name, HTTP method, parameters, and description", "extraction_mode": "ai", # 10 credits per page "depth": 2, # Follow links 2 levels deep "max_pages": 50, # Stop at 50 pages max "same_domain_only": true # Stay on same domain } # Response: { "request_id": "req_abc123xyz", "status": "initiated", "estimated_cost": 500, # 50 pages × 10 credits "crawl_parameters": { "mode": "ai", "starting_url": "https://docs.example.com/api", "max_pages": 50, "depth": 2 }, "estimated_time": "2-5 minutes", "next_steps": "Use smartcrawler_fetch_results with request_id to get results" } # Markdown conversion mode (cheaper): { "url": "https://blog.example.com", "extraction_mode": "markdown", # 2 credits per page "max_pages": 100, "depth": 1 } # Response: { "request_id": "req_def456uvw", "status": "processing", "estimated_cost": 200 # 100 pages × 2 credits } ``` ### smartcrawler_fetch_results - Retrieve Crawling Results Polls and retrieves results from asynchronous crawling operations. ```python # Check crawl status and get results # Example prompt to AI: "Check the status of crawl request req_abc123xyz" # API call: { "request_id": "req_abc123xyz" } # While processing: { "status": "processing", "progress": { "pages_crawled": 23, "pages_remaining": 27, "estimated_completion": "2 minutes" } } # When completed (AI extraction mode): { "status": "completed", "results": [ { "url": "https://docs.example.com/api/users", "extracted_data": { "endpoint": "/api/users", "method": "GET", "parameters": ["id", "email"], "description": "Retrieve user information" } }, { "url": "https://docs.example.com/api/posts", "extracted_data": { "endpoint": "/api/posts", "method": "POST", "parameters": ["title", "content", "author_id"], "description": "Create a new post" } } ], "metadata": { "total_pages_processed": 50, "urls_visited": ["https://docs.example.com/api", ...], "credits_used": 500, "processing_time": 183.4 } } # When completed (markdown mode): { "status": "completed", "results": [ { "url": "https://blog.example.com/post1", "markdown": "# Blog Post Title\n\n## Introduction\n\n..." }, { "url": "https://blog.example.com/post2", "markdown": "# Another Post\n\n..." } ], "metadata": { "total_pages_processed": 100, "credits_used": 200 } } ``` ### scrape - Basic Page Content Fetching Fetches raw HTML content with optional JavaScript rendering for dynamic sites. ```python # Fetch page with JavaScript rendering # Example prompt to AI: "Get the HTML content of https://app.example.com/dashboard with JavaScript rendering" # API call: { "website_url": "https://app.example.com/dashboard", "render_heavy_js": true # Enable for React/Vue/Angular apps } # Response: { "html_content": "\n\n...\n...\n", "page_title": "User Dashboard", "status_code": 200, "final_url": "https://app.example.com/dashboard", "content_length": 45678, "processing_time": 18.2, "javascript_rendered": true, "credits_used": 1 } # Static site (faster): { "website_url": "https://example.com/page", "render_heavy_js": false # Default, faster processing } # Response for static site: { "html_content": "...", "processing_time": 3.1, "javascript_rendered": false, "credits_used": 1 } ``` ### sitemap - Extract Website Structure Discovers and maps all accessible URLs and pages within a website. ```python # Discover all pages on a website # Example prompt to AI: "Extract the sitemap structure from https://docs.example.com" # API call: { "website_url": "https://docs.example.com" } # Response: { "discovered_urls": [ "https://docs.example.com/", "https://docs.example.com/getting-started", "https://docs.example.com/api/reference", "https://docs.example.com/guides/tutorial", "https://docs.example.com/guides/advanced" ], "site_structure": { "/": ["getting-started", "api", "guides"], "/api": ["reference", "authentication", "errors"], "/guides": ["tutorial", "advanced", "best-practices"] }, "url_categories": { "documentation": 45, "api_reference": 23, "guides": 12, "blog": 8 }, "total_pages": 88, "sitemap_sources": ["sitemap.xml", "robots.txt", "internal_links"], "processing_time": 5.8, "credits_used": 1 } # Use for planning large crawls: { "website_url": "https://blog.company.com" } # Response helps plan crawling: { "discovered_urls": [...], # 500+ URLs found "total_pages": 523, "depth_analysis": { "level_0": 1, "level_1": 12, "level_2": 156, "level_3": 354 } } ``` ### agentic_scrapper - AI-Powered Workflow Automation Executes complex multi-step scraping workflows with form interaction and navigation. ```python # Automated search and data extraction # Example prompt to AI: "Navigate to the search page, search for 'web scraping tools', and extract the top 5 results with ratings" # API call: { "url": "https://example.com/search", "user_prompt": "Search for web scraping tools and extract top 5 results with names, descriptions, and ratings", "steps": [ "Find and click on the search input field", "Type 'web scraping tools' into search box", "Press Enter or click search button", "Wait for search results to load", "Extract product information from top 5 results" ], "output_schema": { "type": "object", "properties": { "results": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "description": {"type": "string"}, "rating": {"type": "number"} }, "required": [] } } }, "required": [] }, "ai_extraction": true, "persistent_session": false, "timeout_seconds": 180 } # Response: { "extracted_data": { "results": [ { "name": "Beautiful Soup", "description": "Python library for web scraping", "rating": 4.8 }, { "name": "Scrapy", "description": "Fast web crawling framework", "rating": 4.7 } ] }, "workflow_log": [ "Navigated to https://example.com/search", "Clicked search input field", "Entered search term: web scraping tools", "Submitted search form", "Waited 3.2s for results to load", "Extracted 5 product entries" ], "pages_visited": ["https://example.com/search", "https://example.com/results"], "actions_performed": ["navigate", "click", "type", "submit", "extract"], "execution_time": 45.3, "steps_completed": 5, "credits_used": 35, "status": "success" } # Login flow with persistent session: { "url": "https://app.example.com/login", "user_prompt": "Login and extract dashboard notifications", "steps": [ "Enter email in login form: test@example.com", "Enter password in password field", "Click login button", "Wait for dashboard to load", "Navigate to notifications section", "Extract all notification messages" ], "persistent_session": true, # Maintain login state "timeout_seconds": 300 } # Complex form workflow: { "url": "https://forms.example.com/multi-step", "steps": ["Fill step 1 fields", "Click Next", "Fill step 2 fields", "Click Next", "Submit final form"], "persistent_session": true, "ai_extraction": true, "timeout_seconds": 240 } ``` ## Configuration and Integration The ScrapeGraph MCP Server supports multiple deployment modes and integration patterns. For local development, install with `pip install -e .` and set the `SGAI_API_KEY` environment variable, then run `scrapegraph-mcp` or `python -m scrapegraph_mcp.server`. The server communicates via stdio transport using JSON-RPC 2.0 protocol, making it compatible with any MCP client. For Claude Desktop integration, add the server configuration to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows) with the server command and API key. Cursor integration is configured through the MCP settings panel with direct HTTP support for remote deployments. Remote deployment is supported via HTTP transport on platforms like Render and Koyeb by setting `MCP_TRANSPORT=http` and configuring host/port. The Smithery automated installation method (`npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude`) handles all configuration automatically. Docker deployment is available with the included Dockerfile, supporting both stdio and HTTP modes. All tools implement comprehensive error handling with structured error responses including error type, affected parameter, valid ranges, and actionable resolution steps. The server supports API key authentication via environment variables, HTTP headers (`X-API-Key`), or MCP session configuration, with automatic detection based on deployment mode. Health check endpoints are available for container orchestration, and the server includes timeout management with configurable limits per tool (default 120 seconds, up to 600 seconds for complex operations).