### Install Dependencies and Setup

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/CONTRIBUTING.md

Clone the repository, install project dependencies using uv, and set up Playwright browsers. This is the initial setup for development.

```bash
git clone https://github.com/potterdigital/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install dependencies (uv manages the virtualenv)
uv sync

# Install Playwright browser (Chromium — required by crawl4ai)
uv run crawl4ai-setup

# Verify everything works
uv run pytest
uv run ruff check src/
```

--------------------------------

### Install Dependencies and Setup

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Installs project dependencies using uv and sets up Playwright browsers. Run `uv run crawl4ai-doctor` to verify.

```bash
git clone https://github.com/potterdigital/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install dependencies (uv manages the virtualenv automatically)
uv sync

# Install Playwright browser (required by crawl4ai — downloads Chromium)
uv run crawl4ai-setup

# Verify the installation
uv run crawl4ai-doctor
```

--------------------------------

### Check Playwright/Chromium Setup

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

Run `crawl4ai-setup` to check the Playwright/Chromium installation. The same command downloads or updates the Chromium binary if it is missing or stale.

```bash
# Check Playwright/Chromium
uv run crawl4ai-setup
```

--------------------------------

### API Reference File Example

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/00-START-HERE.md

Example of a specific API reference file for the 'crawl_url' tool, illustrating the type of documentation provided for each MCP tool.

```markdown
/api-reference/crawl_url.md
```

--------------------------------

### Install Playwright and Chromium

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Execute these commands to install Playwright and its necessary browser binaries, typically after a Playwright version upgrade or if they are missing.

```bash
uv sync
uv run crawl4ai-setup
```

--------------------------------

### Create Session with Initial URL

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/create_session.md

Create a session and immediately navigate to a specified URL. This is useful for starting workflows on login pages or specific entry points.

```python
response = await client.call_tool("create_session", {
    "session_id": "github-auth",
    "url": "https://github.com/login"
})
# Response includes login page HTML
```

--------------------------------

### Manifest JSON Format Example

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/crawl_many.md

Shows the structure of the manifest.json file generated when `output_dir` is specified. It lists each crawled URL, its corresponding file path, and the success status.

```json
[
  {
    "url": "https://example.com/page1",
    "file": "example_com_page1.md",
    "success": true
  },
  {
    "url": "https://example.com/page2",
    "success": false,
    "error": "HTTP 404 Not Found"
  }
]
```
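A manifest in this shape can be post-processed with a few lines of Python. The sketch below assumes the manifest is a JSON array exactly as shown above. `split_manifest` is a hypothetical helper for illustration, not part of crawl4ai-mcp.

```python
import json


def split_manifest(manifest_text: str):
    """Split manifest entries into (succeeded, failed) lists."""
    entries = json.loads(manifest_text)
    succeeded = [e for e in entries if e.get("success")]
    failed = [e for e in entries if not e.get("success")]
    return succeeded, failed


# Sample data copied from the manifest example above.
sample = """[
  {"url": "https://example.com/page1", "file": "example_com_page1.md", "success": true},
  {"url": "https://example.com/page2", "success": false, "error": "HTTP 404 Not Found"}
]"""

succeeded, failed = split_manifest(sample)
for entry in failed:
    # Failed entries carry an "error" field instead of a "file" field.
    print(f"{entry['url']}: {entry['error']}")
```

In practice the text would come from reading `manifest.json` inside the chosen `output_dir`.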

--------------------------------

### Check Session Existence Example

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Before performing operations like destroying a session, verify its existence by calling `list_sessions` and checking the response.

```python
response = await client.call_tool("list_sessions")
# Check if session_id appears in response
```
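The check itself might look like the sketch below. The exact format of the `list_sessions` response is not shown here, so this assumes only that it is text containing the session IDs, and that it equals `"No active sessions."` when nothing is open. `session_listed` and the sample listing are hypothetical.

```python
import re

NO_SESSIONS = "No active sessions."


def session_listed(response_text: str, session_id: str) -> bool:
    """Return True if session_id appears as a whole token in the response text."""
    if response_text.strip() == NO_SESSIONS:
        return False
    # Reject matches that are part of a longer ID such as "my-session-2".
    pattern = r"(?<![\w-])" + re.escape(session_id) + r"(?![\w-])"
    return re.search(pattern, response_text) is not None


# Hypothetical listing text; the real response format may differ.
listing = "Active sessions:\n- my-session\n- github-auth"
print(session_listed(listing, "my-session"))
```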

--------------------------------

### Destroy and Create Session Example

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Use this snippet when a session already exists and you need to re-initialize it. It first destroys the existing session and then creates a new one with the same ID.

```python
await client.call_tool("destroy_session", {"session_id": "my-session"})
await client.call_tool("create_session", {"session_id": "my-session", "url": "..."})
```

--------------------------------

### Example Custom Profile Definition

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_profiles.md

Defines a custom crawl profile in YAML format. This example sets specific configurations for handling complex pages, including wait conditions, page timeout, and scan settings.

```yaml
wait_until: networkidle
page_timeout: 120000  # 120 seconds for complex pages
scan_full_page: true
word_count_threshold: 5
```

--------------------------------

### Custom Profile Example for SPAs

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

An example custom profile for heavy JavaScript SPA pages, with extended timeouts and full page scanning.

```yaml
# profiles/slow_spa.yaml
wait_until: networkidle
page_timeout: 120000         # 120 seconds for complex SPAs
scan_full_page: true
scroll_delay: 1.0
word_count_threshold: 5
```

--------------------------------

### Example Error Response Format

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

This is a general example of the structured string format returned by crawl4ai-mcp tools on error.

```text
Crawl failed
URL: https://example.com
HTTP status: 404
Error: Not Found
```
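Because the error is a plain string rather than an exception, a caller may want to turn it into a dictionary. The following is a minimal sketch that assumes the layout above: a summary line followed by `Key: value` lines. `parse_tool_error` is a hypothetical helper.

```python
def parse_tool_error(text: str) -> dict:
    """Parse a 'summary line + Key: value lines' error string into a dict."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return {}
    parsed = {"summary": lines[0]}
    for line in lines[1:]:
        # Split on the first ": " only, so URLs and nested colons stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            parsed[key] = value
    return parsed


error_text = """Crawl failed
URL: https://example.com
HTTP status: 404
Error: Not Found"""

info = parse_tool_error(error_text)
print(info["summary"], info["HTTP status"])
```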

--------------------------------

### Setting LLM API Keys via Environment Variables

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Configures API keys for various LLM providers using bash export commands before starting the MCP server.

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."

# Then start the server
uv run python -m crawl4ai_mcp.server
```

--------------------------------

### Create Session with Pre-injected Cookies

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/create_session.md

Initialize a session with a set of cookies already provided. This is useful if you have obtained session cookies through other means and want to start an authenticated session.

```python
response = await client.call_tool("create_session", {
    "session_id": "authenticated",
    "cookies": [
        {
            "name": "sessionid",
            "value": "abc123def456",
            "domain": "example.com"
        },
        {
            "name": "user_pref",
            "value": "dark_mode",
            "domain": "example.com",
            "path": "/"
        }
    ]
})
```

--------------------------------

### Example Sitemap Fetch Error

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

A sample error message indicating a failure to fetch or parse a sitemap XML file.

```text
Sitemap fetch failed
URL: https://example.com/sitemap.xml
Error: 404 Client Error: Not Found
```

--------------------------------

### Run Crawl4AI MCP Server Locally

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

Run the crawl4ai-mcp server locally during development: change into the project directory, sync dependencies, and start the server.

```bash
cd crawl4ai-mcp
uv sync
uv run python -m crawl4ai_mcp.server
```

--------------------------------

### Attempt to Destroy Non-Existent Session

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/destroy_session.md

This example demonstrates the expected output when attempting to destroy a session that does not exist. The API will return an informative error message.

```python
response = await client.call_tool("destroy_session", {
    "session_id": "unknown-session"
})
# Response: "Session not found: unknown-session"
```

--------------------------------

### Limited Depth Crawl

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a website up to a specified depth and page limit. This example limits the crawl to depth 2 (start page, first level links, second level links) and a maximum of 50 pages.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://example.com",
    "max_depth": 2,
    "max_pages": 50
})
```

--------------------------------

### Calling crawl_url with js_heavy profile

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Example of how to call the crawl_url function with a specific profile to handle slow-loading pages.

```python
crawl_url(url="https://slow-spa.example.com", profile="js_heavy")
```

--------------------------------

### Handle No Active Sessions

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_sessions.md

When no active sessions are found, the list_sessions tool returns a specific string indicating this state. This example shows how to handle that response.

```python
# When no sessions exist
response = await client.call_tool("list_sessions")
# Response: "No active sessions."
```

--------------------------------

### Fix Missing or Stale Playwright Chromium Binary

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Run crawl4ai-setup to download or update the Playwright Chromium binary, resolving issues with Chromium failing to start.

```bash
uv run crawl4ai-setup
```

--------------------------------

### Example Crawl with Profile and Override

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_profiles.md

Demonstrates how per-call parameters override profile settings. Here, the `page_timeout` passed directly to `crawl_url` takes precedence over the 15000 ms timeout defined in the `fast` profile.

```python
crawl_url(
    url="...",
    profile="fast",
    page_timeout=30
)
```
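The precedence can be pictured as a layered merge: the default profile first, then the named profile, then per-call arguments. The sketch below is an illustration only, not the server's implementation. The profile values are copied from the `list_profiles` output, and it assumes that the per-call `page_timeout` is given in seconds (as in the `deep_crawl` reference) while profiles store milliseconds. `resolve_settings` is a hypothetical name.

```python
PROFILES = {
    "default": {"wait_until": "domcontentloaded", "page_timeout": 60000, "word_count_threshold": 10},
    "fast": {"wait_until": "domcontentloaded", "page_timeout": 15000, "word_count_threshold": 5},
}


def resolve_settings(profile=None, page_timeout=None, **overrides):
    """Merge default -> named profile -> per-call overrides (later layers win)."""
    settings = dict(PROFILES["default"])
    if profile is not None:
        settings.update(PROFILES[profile])
    # Only arguments the caller actually supplied override the profile.
    settings.update({k: v for k, v in overrides.items() if v is not None})
    if page_timeout is not None:
        # Assumption: per-call seconds are converted to profile milliseconds.
        settings["page_timeout"] = int(page_timeout * 1000)
    return settings


print(resolve_settings(profile="fast", page_timeout=30))
```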

--------------------------------

### Deep Crawl with Path Filtering

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a website while applying include and exclude patterns to filter which URLs are followed. This example only follows links under '/docs/' and excludes pages under '/docs/admin/'.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://docs.example.com",
    "include_pattern": "/docs/*",
    "exclude_pattern": "/docs/admin/*"
})
```
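To preview which links such patterns would admit, the glob logic can be approximated locally. This sketch assumes the patterns are matched against the URL path with shell-style globbing. The server's actual matching rules may differ, and `should_follow` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_follow(url, include_pattern=None, exclude_pattern=None):
    """Return True if the URL path passes the include filter and is not excluded."""
    path = urlparse(url).path or "/"
    if include_pattern is not None and not fnmatchcase(path, include_pattern):
        return False
    if exclude_pattern is not None and fnmatchcase(path, exclude_pattern):
        return False
    return True


for candidate in (
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/docs/admin/users",
    "https://docs.example.com/blog/post",
):
    print(candidate, should_follow(candidate, "/docs/*", "/docs/admin/*"))
```

Note that in `fnmatch`-style globbing `*` also matches `/`, so `/docs/*` covers nested paths such as `/docs/guide/setup`.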

--------------------------------

### Perform a Deep Crawl (BFS)

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

Initiates a Breadth-First Search crawl starting from a given URL, respecting maximum depth and page count limits. Results are saved to a specified directory.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://docs.example.com",
    "max_depth": 2,
    "max_pages": 50,
    "output_dir": "/tmp/crawl_results"
})
# Crawls start page, follows links to depth 2, saves to disk
```
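For readers unfamiliar with breadth-first crawling, the sketch below runs the same traversal over an in-memory link graph instead of the network. It is a simplified illustration of how `max_depth` and `max_pages` bound a BFS and how `depth` and `parent_url` arise, not the tool's implementation. `bfs_crawl_order` and the toy graph are hypothetical.

```python
from collections import deque


def bfs_crawl_order(links, start, max_depth=3, max_pages=100):
    """Visit pages breadth-first, recording depth and the page that linked to each."""
    seen = {start}
    queue = deque([(start, 0, None)])
    results = []
    while queue and len(results) < max_pages:
        url, depth, parent = queue.popleft()
        results.append({"url": url, "depth": depth, "parent_url": parent})
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1, url))
    return results


# Toy link graph: page -> pages it links to.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
pages = bfs_crawl_order(graph, "a", max_depth=2, max_pages=50)
print([p["url"] for p in pages])
```

Each page is recorded once, under the first parent that reached it, which is why `d` is attributed to `b` rather than `c`.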

--------------------------------

### Deep Crawl Saving to Disk

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a website and saves the crawled content as Markdown files and a manifest.json to a specified output directory. This example crawls up to depth 2.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://blog.example.com",
    "max_depth": 2,
    "output_dir": "/tmp/blog_crawl"
})
```

--------------------------------

### Extract Structured Data with Local Ollama

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_structured.md

This example demonstrates how to use the `extract_structured` tool with a local Ollama provider to extract an executive summary and key metrics from a given URL, based on a provided JSON schema.

```APIDOC
## call_tool extract_structured (Local Ollama)

### Description
Extracts structured data from a web page using a specified schema and instruction, leveraging a local Ollama provider.

### Method
`client.call_tool("extract_structured", { ... })`

### Parameters
#### Tool Arguments
- **url** (string) - Required - The URL of the web page to extract data from.
- **schema** (object) - Required - A JSON schema defining the structure of the data to be extracted.
- **instruction** (string) - Required - A natural language instruction detailing what data to extract.
- **provider** (string) - Optional - The LLM provider to use; when omitted, the default provider (OpenAI GPT-4o mini) is used. Example: "ollama/llama2".

### Request Example
```json
{
  "url": "https://internal.example.com/report",
  "schema": {
    "type": "object",
    "properties": {
      "summary": {"type": "string"},
      "metrics": {
        "type": "object",
        "properties": {
          "revenue": {"type": "number"},
          "growth": {"type": "number"}
        }
      }
    }
  },
  "instruction": "Extract executive summary and key metrics",
  "provider": "ollama/llama2"
}
```

### Response
#### Success Response
- The structured data extracted from the page, conforming to the provided schema.
```

--------------------------------

### Extract API Endpoints with Anthropic Claude Sonnet

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_structured.md

This example demonstrates extracting API endpoint details from a documentation page using Anthropic's Claude Sonnet model. It specifies a schema for API paths, methods, and descriptions.

```python
response = await client.call_tool("extract_structured", {
    "url": "https://docs.example.com",
    "schema": {
        "type": "object",
        "properties": {
            "api_endpoints": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "method": {"type": "string"},
                        "description": {"type": "string"}
                    }
                }
            }
        }
    },
    "instruction": "Extract all API endpoints with their HTTP methods and descriptions",
    "provider": "anthropic/claude-sonnet-4-20250514"
})
```

--------------------------------

### Deep Crawl with Politeness Delay

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Performs a deep crawl with a specified delay between page requests to be polite to the target server. This example sets a 1-second delay and limits the crawl to 30 pages.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://target.com",
    "delay": 1.0,  # 1 second between requests
    "max_pages": 30
})
```

--------------------------------

### Extract Structured Data with CSS Scoping and Wait Condition

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_structured.md

This example shows how to extract structured data from a dynamic web page using CSS selectors for scoping and a wait condition to ensure content is loaded, utilizing an OpenAI provider.

```APIDOC
## call_tool extract_structured (with CSS Scoping)

### Description
Extracts structured data from a web page, allowing for specific element targeting using CSS selectors and defining a wait condition for dynamic content loading. Uses a specified LLM provider.

### Method
`client.call_tool("extract_structured", { ... })`

### Parameters
#### Tool Arguments
- **url** (string) - Required - The URL of the web page to extract data from.
- **css_selector** (string) - Optional - A CSS selector to scope the extraction to a specific part of the page.
- **wait_for** (string) - Optional - A condition to wait for before extraction (e.g., "css:table.data-table").
- **schema** (object) - Required - A JSON schema defining the structure of the data to be extracted.
- **instruction** (string) - Required - A natural language instruction detailing what data to extract.
- **provider** (string) - Optional - The LLM provider to use; when omitted, the default provider (OpenAI GPT-4o mini) is used. Example: "openai/gpt-4o-mini".

### Request Example
```json
{
  "url": "https://spa.example.com/table",
  "css_selector": "div.main-table",
  "wait_for": "css:table.data-table",
  "schema": {
    "type": "object",
    "properties": {
      "rows": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "id": {"type": "string"},
            "status": {"type": "string"}
          }
        }
      }
    }
  },
  "instruction": "Extract all rows with ID and status",
  "provider": "openai/gpt-4o-mini"
}
```

### Response
#### Success Response
- The structured data extracted from the specified page section, conforming to the provided schema.
```

--------------------------------

### Manifest JSON Output Example

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

This JSON structure represents the output of a deep crawl operation when an output directory is specified. It details crawled URLs, their corresponding files, success status, depth, and parent URL.

```json
[
  {
    "url": "https://example.com",
    "file": "example_com.md",
    "success": true,
    "depth": 0
  },
  {
    "url": "https://example.com/page1",
    "file": "example_com_page1.md",
    "success": true,
    "depth": 1,
    "parent_url": "https://example.com"
  },
  {
    "url": "https://example.com/broken",
    "success": false,
    "error": "HTTP 404 Not Found"
  }
]
```
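The `parent_url` field makes it possible to rebuild the link tree from the manifest. A minimal sketch, assuming entries shaped like the example above (failed entries may lack `depth` and `parent_url`). `children_by_parent` is a hypothetical helper.

```python
from collections import defaultdict


def children_by_parent(entries):
    """Map each parent URL to the successfully crawled pages it linked to."""
    tree = defaultdict(list)
    for entry in entries:
        # The root page has no parent_url; failed entries are skipped.
        if entry.get("success") and "parent_url" in entry:
            tree[entry["parent_url"]].append(entry["url"])
    return dict(tree)


manifest = [
    {"url": "https://example.com", "file": "example_com.md", "success": True, "depth": 0},
    {"url": "https://example.com/page1", "file": "example_com_page1.md", "success": True,
     "depth": 1, "parent_url": "https://example.com"},
    {"url": "https://example.com/broken", "success": False, "error": "HTTP 404 Not Found"},
]

tree = children_by_parent(manifest)
print(tree)
```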

--------------------------------

### Multi-step Authentication Flow - Step 1: Create Session and Load Login Page

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/create_session.md

Initiates a session and loads the initial login page. This is the first step in a multi-step authentication process, setting up the environment for subsequent interactions.

```python
# Step 1: Create session and load login page
response = await client.call_tool("create_session", {
    "session_id": "authenticated-user",
    "url": "https://app.example.com/login"
})
```

--------------------------------

### Crawl and Save to Disk with Profile

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/crawl_many.md

Illustrates crawling multiple URLs and saving the results to disk using a specific profile ('js_heavy'). The response will be a metadata summary, and the actual content will be in markdown files within the specified output directory.

```python
response = await client.call_tool("crawl_many", {
    "urls": [
        "https://docs.example.com/intro",
        "https://docs.example.com/api",
        "https://docs.example.com/examples"
    ],
    "profile": "js_heavy",
    "output_dir": "/tmp/crawl_results"
})
```

--------------------------------

### Diagnose Crawl4ai/Playwright Health

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Run the crawl4ai-doctor command to diagnose the health of crawl4ai and Playwright installations.

```bash
uv run crawl4ai-doctor
```

--------------------------------

### List All Profiles

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_profiles.md

This tool lists all available crawl profiles and their configuration settings. Profiles provide named starting-point configurations that can be referenced in crawl tools via the `profile` parameter.

```APIDOC
## list_profiles

### Description
Lists all available crawl profiles and their configuration settings.

### Method
`list_profiles` (Tool Call)

### Parameters
None

### Response Example
```
## default (base layer — applied to every crawl)
  wait_until: domcontentloaded
  page_timeout: 60000
  word_count_threshold: 10

## fast
  wait_until: domcontentloaded
  page_timeout: 15000
  word_count_threshold: 5

## js_heavy
  delay_before_return_html: 1.0
  page_timeout: 90000
  remove_overlay_elements: true
  scan_full_page: true
  scroll_delay: 0.5
  wait_until: networkidle

## stealth
  delay_before_return_html: 2.0
  magic: true
  max_range: 2.0
  mean_delay: 1.5
  override_navigator: true
  page_timeout: 90000
  simulate_user: true
  wait_until: networkidle
```
```

--------------------------------

### Initialize BrowserConfig for MCP Server

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Configure the Chromium browser settings for the MCP server. Key settings include running in headless mode and ensuring verbose logging is disabled to maintain the integrity of the MCP transport.

```python
from crawl4ai import BrowserConfig

browser_cfg = BrowserConfig(
    headless=True,
    verbose=False,  # CRITICAL: must be False to protect MCP transport
    extra_args=[
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--no-sandbox",
    ],
)
```

--------------------------------

### Simple Product Extraction with CSS

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_css.md

Extracts basic product information like title, price, and URL from a product listing page. Ensure the schema accurately reflects the page structure.

```python
# Simple product extraction
response = await client.call_tool("extract_css", {
    "url": "https://shop.example.com/products",
    "schema": {
        "name": "Products",
        "baseSelector": "div.product-card",
        "fields": [
            {
                "name": "title",
                "selector": "h2.product-title",
                "type": "text"
            },
            {
                "name": "price",
                "selector": "span.price",
                "type": "text"
            },
            {
                "name": "url",
                "selector": "a.product-link",
                "type": "attribute",
                "attribute": "href"
            }
        ]
    }
})
# Response: JSON string
# [
#   {"title": "Product 1", "price": "$29.99", "url": "/products/1"},
#   {"title": "Product 2", "price": "$39.99", "url": "/products/2"}
# ]
```
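Since the tool returns a JSON string, the caller still has to decode it. The sketch below decodes a response shaped like the one in the comments above and converts the price text to a number. It assumes prices look like `"$29.99"`, and `parse_products` is a hypothetical helper.

```python
import json
from decimal import Decimal


def parse_products(response_text: str):
    """Decode the JSON string and add a numeric price_value to each product."""
    products = json.loads(response_text)
    for product in products:
        # Assumption: prices are formatted like "$29.99" or "$1,299.00".
        cleaned = product["price"].lstrip("$").replace(",", "")
        product["price_value"] = Decimal(cleaned)
    return products


sample_response = """[
  {"title": "Product 1", "price": "$29.99", "url": "/products/1"},
  {"title": "Product 2", "price": "$39.99", "url": "/products/2"}
]"""

products = parse_products(sample_response)
print(sum(p["price_value"] for p in products))
```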

--------------------------------

### Run Tests and Linting

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/CONTRIBUTING.md

Execute the test suite and run the linter to ensure code quality and correctness. All tests must pass and the code must be clean.

```bash
uv run pytest
```

```bash
uv run ruff check src/
```

--------------------------------

### Extract Product Data with Default Provider

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_structured.md

Use this snippet to extract product information from a URL using the default LLM provider (OpenAI GPT-4o mini). The schema defines the expected structure for product name, price, and description.

```python
response = await client.call_tool("extract_structured", {
    "url": "https://shop.example.com/products",
    "schema": {
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "number"},
                        "description": {"type": "string"}
                    }
                }
            }
        }
    },
    "instruction": "Extract all product names, prices, and descriptions from this page"
})
```

--------------------------------

### Basic Deep Crawl

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Performs a basic deep crawl starting from the given URL. It follows links on the site up to a default depth of 3 and a maximum of 100 pages.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://docs.example.com"
})
```

--------------------------------

### Using a Custom Profile

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_profiles.md

Shows how to utilize a custom-defined crawl profile by specifying its name in the crawl_url function call.

```python
crawl_url(url="...", profile="heavy_js")
```

--------------------------------

### Fast Profile Configuration

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Optimized for static pages and quick fetches, with a shorter page timeout and lower word count threshold.

```yaml
wait_until: domcontentloaded
page_timeout: 15000          # 15 seconds — fail fast
word_count_threshold: 5      # Retain short blocks
```

--------------------------------

### check_update

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/check_update.md

Checks if a newer version of crawl4ai is available on PyPI. It compares the installed version against the latest release and reports version information, including changelog highlights if an update is found.

```APIDOC
## check_update

### Description
Checks if a newer version of crawl4ai is available on PyPI. Compares the installed version against the latest release and reports version information with changelog highlights.

### Method
`check_update` (Tool Call)

### Parameters
None

### Request Example
```python
response = await client.call_tool("check_update")
```

### Response
#### Success Response
**Type:** `str`

Returns version comparison result:
- If up to date: "crawl4ai is up to date\nInstalled: X.Y.Z\nLatest: X.Y.Z"
- If update available: version info + release link + changelog highlights
- If check fails: error description with installed version and failure reason

#### Response Example
```
# Response (if up to date):
crawl4ai is up to date
Installed: 0.8.2
Latest: 0.8.2

# Response (if update available):
Update available
Installed: 0.8.1
Latest: 0.8.2
Release: https://github.com/unclecode/crawl4ai/releases/tag/v0.8.2
To upgrade: stop the server and run: scripts/update.sh

Changelog highlights:
### Bug Fixes
- **Fixed** headless browser detection bypass
- **Fixed** cookie handling in sessions
### Features
- **Added** support for custom extraction strategies

# Response (if check fails):
Version check failed
Installed: 0.8.1
Error: Could not reach PyPI (Connection timeout)
```

### Throws
Does not raise exceptions. Returns error as string if PyPI check fails.
```

--------------------------------

### Configuring API Key in MCP Client

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Illustrates how to configure an API key within the MCP client configuration for crawl4ai.

```json
{
  "mcpServers": {
    "crawl4ai": {
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

--------------------------------

### Run Server and Redirect Output for Logging

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

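Run the crawl4ai-mcp server with stderr redirected to the terminal and the server's own stdout discarded, so that only what the server writes to stderr (its logs) is displayed. The order of the redirections matters: `2>&1` first points stderr at wherever stdout currently goes, and `1>/dev/null` then discards stdout.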

```bash
uv run python -m crawl4ai_mcp.server 2>&1 1>/dev/null
```
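The same effect can be reproduced from Python when launching a process programmatically. The sketch below uses a stand-in child process rather than the real server: stdout is discarded and stderr is captured. The child command is purely illustrative.

```python
import subprocess
import sys

# Stand-in child: writes one line to stdout and one line to stderr.
child_code = "import sys; print('protocol'); print('log line', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.DEVNULL,  # like 1>/dev/null
    stderr=subprocess.PIPE,     # keep stderr so it can be inspected
    text=True,
)
print(result.stderr, end="")
```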

--------------------------------

### Authenticated Multi-Step Workflow

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

This snippet demonstrates an authenticated multi-step workflow. It includes creating a session, interacting via JavaScript to log in, and then crawling authenticated pages.

```python
# 1. Create session and load the login page
await client.call_tool("create_session", {
    "session_id": "user-auth",
    "url": "https://app.example.com/login"
})

# 2. Interact via JavaScript
await client.call_tool("crawl_url", {
    "session_id": "user-auth",
    "url": "https://app.example.com/login",
    "js_code": "document.querySelector('button[type=submit]').click();"
})

# 3. Crawl authenticated pages
response = await client.call_tool("crawl_url", {
    "session_id": "user-auth",
    "url": "https://app.example.com/dashboard"
})
```

--------------------------------

### BFS Crawl Metadata Structure

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/types.md

Metadata included with results from deep_crawl, indicating the depth from the start URL and the parent URL. This metadata is preserved in manifest.json if an output directory is specified.

```python
{
    "depth": int,           # Distance from start URL (0 for root)
    "parent_url": str       # URL that linked to this page
}
```

--------------------------------

### deep_crawl

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Initiates a deep crawl of a website starting from a given URL. It follows links recursively up to a specified depth or page limit, with options for scope, filtering, delays, and output.

```APIDOC
## deep_crawl

### Description
Initiates a deep crawl of a website starting from a given URL. It follows links recursively up to a specified depth or page limit, with options for scope, filtering, delays, and output.

### Method
`deep_crawl` (Tool Call)

### Parameters
#### Tool Arguments
- **url** (str) - Required - Starting URL for the crawl
- **max_depth** (int) - Optional - Maximum link levels to follow (depth 0 is start page, 1 is linked pages, etc.). Default: 3
- **max_pages** (int) - Optional - Hard cap on total pages crawled. Stops when reached. Default: 100
- **scope** (str) - Optional - Domain scope: "same-domain" (include subdomains), "same-origin", or "any" (follow external links). Default: "same-domain"
- **include_pattern** (str) - Optional - Glob pattern to filter which URLs to follow (e.g., "/docs/*")
- **exclude_pattern** (str) - Optional - Glob pattern to exclude URLs (e.g., "/internal/*")
- **delay** (float) - Optional - Politeness delay in seconds between page fetches. Default: 0
- **output_dir** (str) - Optional - Directory for .md files and manifest.json. Returns metadata summary instead of content.
- **profile** (str) - Optional - Named profile for per-page configuration
- **cache_mode** (str) - Optional - Cache behavior: "enabled", "bypass", "disabled", "read_only", "write_only". Default: "enabled"
- **css_selector** (str) - Optional - CSS selector to restrict extraction on all pages
- **excluded_selector** (str) - Optional - CSS selector to exclude elements on all pages
- **wait_for** (str) - Optional - Wait condition before extracting each page
- **js_code** (str) - Optional - JavaScript to execute on each page
- **user_agent** (str) - Optional - Custom User-Agent for all requests
- **page_timeout** (int) - Optional - Page load timeout in seconds for each page. Default: 60
- **word_count_threshold** (int) - Optional - Minimum word count for content blocks. Default: 10

### Request Example
```json
{
  "url": "https://docs.example.com",
  "max_depth": 3,
  "max_pages": 100,
  "scope": "same-domain",
  "include_pattern": null,
  "exclude_pattern": null,
  "delay": 0,
  "output_dir": null,
  "profile": null,
  "cache_mode": "enabled",
  "css_selector": null,
  "excluded_selector": null,
  "wait_for": null,
  "js_code": null,
  "user_agent": null,
  "page_timeout": 60,
  "word_count_threshold": 10
}
```

### Response
#### Success Response
**Type:** `str`

When `output_dir` is None: returns full markdown content for all crawled pages organized by depth, with parent URL info.

When `output_dir` is set: returns metadata-only summary listing file paths and manifest location.

Results include depth metadata (how far from start) and parent_url for each page.

#### Response Example
```json
"# Crawl Results\n\n## Depth 0\n### https://docs.example.com\n... markdown content ..."
```

### Error Handling
Does not raise exceptions. Partial failures are reported in the result string.
```
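With this many optional parameters, it can be convenient to build the argument dictionary programmatically and send only the values that were set. This is a sketch under the assumption that omitting an optional argument is equivalent to passing `null`. `build_tool_args` is a hypothetical helper.

```python
def build_tool_args(url, **options):
    """Build a tool-argument dict, dropping options left as None."""
    args = {"url": url}
    args.update({key: value for key, value in options.items() if value is not None})
    return args


args = build_tool_args(
    "https://docs.example.com",
    max_depth=2,
    include_pattern="/docs/*",
    output_dir=None,  # dropped: not set
    delay=0,          # kept: 0 is a real value, not "unset"
)
print(args)
```

The resulting dictionary can then be passed as the second argument to `client.call_tool("deep_crawl", args)`.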

--------------------------------

### Set Environment Variables for API Keys

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

Configures API keys for various services (OpenAI, Anthropic, Groq) using environment variables. These are necessary for LLM-based extraction tools.

```bash
export OPENAI_API_KEY="sk-..."          # For OpenAI extraction
export ANTHROPIC_API_KEY="sk-ant-..."   # For Anthropic extraction
export GROQ_API_KEY="gsk_..."           # For Groq extraction
```
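Before calling an LLM-based extraction tool, a client can confirm that the matching key is actually set. This is a minimal sketch, not part of the project: `missing_api_key` is a hypothetical helper, and the mapping from provider prefix to variable name is assumed from the three exports above.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the export lines above; the provider
# prefixes ("openai", "anthropic", "groq") are an assumption.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the unset variable name for `provider`, or None if it is set.

    `provider` may be a bare name ("openai") or a "name/model" string.
    """
    env = os.environ if env is None else env
    name = provider.split("/", 1)[0].lower()
    var = PROVIDER_KEYS.get(name)
    if var is None:
        raise ValueError(f"unknown provider: {provider!r}")
    return None if env.get(var) else var


missing = missing_api_key("openai/gpt-4o-mini")
if missing:
    print(f"Set {missing} before running extraction")
```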

--------------------------------

### Call check_update Tool

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/check_update.md

Call the `check_update` tool from the client. The response depends on whether the crawl4ai installation is up to date, an update is available, or the version check fails.

```python
# Check for updates
response = await client.call_tool("check_update")
# Response (if up to date):
# crawl4ai is up to date
# Installed: 0.8.2
# Latest: 0.8.2

# Response (if update available):
# Update available
# Installed: 0.8.1
# Latest: 0.8.2
# Release: https://github.com/unclecode/crawl4ai/releases/tag/v0.8.2
# To upgrade: stop the server and run: scripts/update.sh
#
# Changelog highlights:
# ### Bug Fixes
# - **Fixed** headless browser detection bypass
# - **Fixed** cookie handling in sessions
# ### Features
# - **Added** support for custom extraction strategies

# Response (if check fails):
# Version check failed
# Installed: 0.8.1
# Error: Could not reach PyPI (Connection timeout)
```
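If a client needs to act on the result rather than display it, the three response shapes above can be reduced to a small dict. The sketch assumes the response arrives as plain text in exactly the layout shown (a status line followed by `Key: value` lines); `parse_check_update` is a hypothetical helper, and a real client may first need to unwrap the text from its result object.

```python
def parse_check_update(text: str) -> dict:
    """Extract status and version fields from a check_update response string."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    info = {"status": lines[0] if lines else ""}
    for ln in lines[1:]:
        key, sep, value = ln.partition(": ")
        # Only keep the fields that appear in the documented responses.
        if sep and key in ("Installed", "Latest", "Release", "Error"):
            info[key.lower()] = value
    return info


sample = "Version check failed\nInstalled: 0.8.1\nError: Could not reach PyPI (Connection timeout)"
info = parse_check_update(sample)
print(info["status"], info.get("error"))
```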

--------------------------------

### Setting API Key Environment Variable

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Shows how to set the required environment variable for an LLM provider before calling an extraction tool.

```bash
export OPENAI_API_KEY="sk-..."
```

--------------------------------

### Deep Crawl with JavaScript Execution

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a JavaScript-heavy website, executing custom JavaScript code and waiting for specific content to load. This example scrolls to the bottom of the page and waits for '#app-content' to be present.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://spa.example.com",
    "profile": "js_heavy",
    "js_code": "window.scrollTo(0, document.body.scrollHeight);",
    "wait_for": "css:#app-content"
})
```

--------------------------------

### Handling HTTP 404 Errors in Python

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/errors.md

Demonstrates how to check for specific HTTP status code errors in the crawl response and take action.

```python
result = await client.call_tool("crawl_url", {"url": "..."})
if result.startswith("Crawl failed"):
    # Parse error and decide next action
    if "404" in result:
        print("Page not found")
    elif "403" in result:
        print("Access denied — may need authentication")
```
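A plain substring test such as `"404" in result` also matches when those digits appear inside a longer number. A slightly stricter variant, sketched below, looks for a standalone 4xx or 5xx code. Only the `Crawl failed` prefix comes from the snippet above; `failed_status` and the sample messages are hypothetical, and the real error text may differ.

```python
import re
from typing import Optional

# A 4xx/5xx code that is not embedded in a longer run of digits.
_STATUS_RE = re.compile(r"(?<!\d)([45]\d{2})(?!\d)")


def failed_status(result: str) -> Optional[int]:
    """Return the first standalone 4xx/5xx code in a failure string, else None."""
    if not result.startswith("Crawl failed"):
        return None
    match = _STATUS_RE.search(result)
    return int(match.group(1)) if match else None


status = failed_status("Crawl failed: HTTP 403 (sample message)")
if status == 404:
    print("Page not found")
elif status == 403:
    print("Access denied, may need authentication")
```

The check can still match a code that appears in the URL itself, so treat the result as a hint rather than a guarantee.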

--------------------------------

### Register crawl4ai-mcp with Other MCP Clients

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Configuration for adding crawl4ai-mcp as an MCP server in other clients. Ensure the `--directory` flag points to your local clone.

```json
{
  "crawl4ai": {
    "type": "stdio",
    "command": "uv",
    "args": [
      "run",
      "--directory",
      "/path/to/crawl4ai-mcp",
      "python",
      "-m",
      "crawl4ai_mcp.server"
    ]
  }
}
```

--------------------------------

### Deploy MCP Server using Claude CLI

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/README.md

Registers the crawl4ai MCP server at user scope with the Claude CLI, passing the command and arguments used to launch the server over stdio.

```bash
claude mcp add-json --scope user crawl4ai '{
  "type": "stdio",
  "command": "uv",
  "args": ["run", "--directory", "/path/to/crawl4ai-mcp", "python", "-m", "crawl4ai_mcp.server"]
}'
```
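When the registration is scripted, building the JSON with `json.dumps` avoids hand-quoting it inside a shell string. The sketch below only reassembles the command shown above; `claude_add_json_argv` is a hypothetical helper name, and the clone path is the same placeholder.

```python
import json
import shlex


def claude_add_json_argv(clone_dir: str, scope: str = "user") -> list:
    """Return the argv list for the `claude mcp add-json` command shown above."""
    config = {
        "type": "stdio",
        "command": "uv",
        "args": ["run", "--directory", clone_dir, "python", "-m", "crawl4ai_mcp.server"],
    }
    return ["claude", "mcp", "add-json", "--scope", scope, "crawl4ai", json.dumps(config)]


argv = claude_add_json_argv("/path/to/crawl4ai-mcp")
print(shlex.join(argv))  # shell-quoted form, suitable for copy and paste
```

The list can also be handed to `subprocess.run(argv, check=True)` directly, which bypasses the shell entirely.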

--------------------------------

### Configure PruningContentFilter with word_count_threshold

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Demonstrates how the 'word_count_threshold' parameter is used to configure the PruningContentFilter. The threshold is popped from the merged configuration and used to initialize the filter, allowing per-call and profile-level control over content pruning.

```python
wct = merged.pop("word_count_threshold", 10)
merged["markdown_generator"] = DefaultMarkdownGenerator(
    content_filter=PruningContentFilter(
        threshold=0.48,
        threshold_type="fixed",
        min_word_threshold=wct,
    ),
)
```
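The layering that produces `merged` is not shown in this snippet. As a rough illustration using plain dicts, the sketch below assumes the order default profile, then named profile, then per-call overrides, with `None` overrides ignored. `resolve_config` is a hypothetical name, and the real merge logic may differ.

```python
from typing import Tuple


def resolve_config(default: dict, profile: dict, overrides: dict) -> Tuple[dict, int]:
    """Merge config layers and pop word_count_threshold (assumed merge order)."""
    merged = {**default, **profile}
    # Per-call values win, but only when the caller actually supplied one.
    merged.update({k: v for k, v in overrides.items() if v is not None})
    # Same pop as in the snippet above: the key is removed so it is not
    # forwarded along with the remaining crawl settings.
    wct = merged.pop("word_count_threshold", 10)
    return merged, wct


default = {"wait_until": "domcontentloaded", "page_timeout": 60000, "word_count_threshold": 10}
fast = {"wait_until": "domcontentloaded", "page_timeout": 15000, "word_count_threshold": 5}

merged, wct = resolve_config(default, fast, {"word_count_threshold": None})
print(wct, merged)
```

Here the `fast` profile's threshold of 5 is used because the per-call override was left unset.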

--------------------------------

### Deep Crawl Following External Links

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a website and follows external links, expanding the crawl scope beyond the initial domain. This example follows external links up to depth 1 and limits the crawl to 50 pages.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://example.com",
    "scope": "any",  # Follow external links
    "max_depth": 1,
    "max_pages": 50
})
```

--------------------------------

### Run Server Directly for Debugging

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Execute the crawl4ai-mcp server directly for debugging purposes.

```bash
uv run python -m crawl4ai_mcp.server
```

--------------------------------

### Extract Structured Data with CSS Scoping and Wait Condition

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/extract_structured.md

Extract data from dynamic web pages by specifying a CSS selector for the target element and a wait condition for content to load. This example uses OpenAI's GPT-4o mini.

```python
response = await client.call_tool("extract_structured", {
    "url": "https://spa.example.com/table",
    "css_selector": "div.main-table",
    "wait_for": "css:table.data-table",
    "schema": {
        "type": "object",
        "properties": {
            "rows": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "string"},
                        "status": {"type": "string"}
                    }
                }
            }
        }
    },
    "instruction": "Extract all rows with ID and status",
    "provider": "openai/gpt-4o-mini"
})
```

--------------------------------

### Basic Sitemap Crawl

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/crawl_sitemap.md

Crawls the URLs listed in the given sitemap. With only `sitemap_url` supplied, the crawl covers up to 500 URLs using 10 concurrent requests.

```python
response = await client.call_tool("crawl_sitemap", {
    "sitemap_url": "https://example.com/sitemap.xml"
})
```

--------------------------------

### Deep Crawl with CSS Scoping

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/deep_crawl.md

Crawls a website and extracts content only from elements matching a specified CSS selector, while excluding elements matching another selector. This example targets content within 'div.main-content' and excludes 'aside' and 'nav' elements.

```python
response = await client.call_tool("deep_crawl", {
    "url": "https://catalog.example.com",
    "css_selector": "div.main-content",
    "excluded_selector": "aside, nav",
    "max_pages": 100
})
```

--------------------------------

### Main Documents Overview

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/00-START-HERE.md

Table outlining the main documentation files in the crawl4ai-mcp project, their purpose, and when to read them.

```markdown
| File | Purpose | Read When |
|------|---------|-----------|
| [INDEX.md](INDEX.md) | Navigation guide for all docs | You want an overview of what's available |
| [README.md](README.md) | Quick reference, architecture, patterns | You want examples and system overview |
| [configuration.md](configuration.md) | Profiles, env vars, per-call overrides | You need to customize behavior |
| [types.md](types.md) | Type definitions, data structures | You're building against the API |
| [errors.md](errors.md) | Error response formats and handling | Something failed, you need to debug |
```

--------------------------------

### Upgrade crawl4ai using uv

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/check_update.md

Manual upgrade of crawl4ai with the `uv` package manager: update the locked crawl4ai version, sync the environment, then re-run setup to refresh Playwright/Chromium.

```bash
uv lock --upgrade-package crawl4ai
uv sync
uv run crawl4ai-setup
```

--------------------------------

### Deep Crawl with Delay and Output Directory

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/README.md

Perform a deep crawl of a site, respecting a politeness delay and saving results to disk.

```python
deep_crawl(url="...", delay=0.5, output_dir="/tmp/crawl")
```

--------------------------------

### AppContext Usage in a Tool

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/types.md

Demonstrates how to access the AppContext from the request context within an MCP tool to utilize the shared crawler instance and active sessions.

```python
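from mcp.server.fastmcp import Context
from mcp.server.session import ServerSession

# `mcp` (the FastMCP instance) and `AppContext` come from the server module.
@mcp.tool()
async def my_tool(ctx: Context[ServerSession, AppContext]) -> str:
    app: AppContext = ctx.request_context.lifespan_context
    crawler = app.crawler  # Use the shared crawler instance
    sessions = app.sessions  # Check active sessions
    # Assumes `sessions` is a sized collection such as a dict.
    return f"Active sessions: {len(sessions)}"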

--------------------------------

### Python Tool Call with Custom Profile

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Demonstrates how to use a custom profile named 'slow_spa' when calling the crawl_url tool.

```python
crawl_url(url="https://spa.example.com", profile="slow_spa")
```

--------------------------------

### Configure MCP Server in Claude Desktop Config

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/configuration.md

Add a 'crawl4ai' MCP server configuration to the Claude desktop client's config.json file. This includes specifying the server type, command, arguments, and environment variables for API keys.

```json
{
  "mcpServers": {
    "crawl4ai": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/crawl4ai-mcp",
        "python",
        "-m",
        "crawl4ai_mcp.server"
      ],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```

--------------------------------

### List All Profiles

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/list_profiles.md

Call the list_profiles tool to retrieve a formatted list of all loaded crawl profiles and their configurations. The 'default' profile is marked as the base layer.

```python
# List all profiles
response = await client.call_tool("list_profiles")
# Response:
# ## default (base layer — applied to every crawl)
#   wait_until: domcontentloaded
#   page_timeout: 60000
#   word_count_threshold: 10
#
# ## fast
#   wait_until: domcontentloaded
#   page_timeout: 15000
#   word_count_threshold: 5
#
# ## js_heavy
#   delay_before_return_html: 1.0
#   page_timeout: 90000
#   remove_overlay_elements: true
#   scan_full_page: true
#   scroll_delay: 0.5
#   wait_until: networkidle
#
# ## stealth
#   delay_before_return_html: 2.0
#   magic: true
#   max_range: 2.0
#   mean_delay: 1.5
#   override_navigator: true
#   page_timeout: 90000
#   simulate_user: true
#   wait_until: networkidle
```
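The formatted listing is easy to turn back into a dict when a client wants to inspect profile settings programmatically. This sketch assumes the layout shown above (a `## name` heading, optionally followed by a parenthesised note, then indented `key: value` lines). `parse_profiles` is a hypothetical helper, and values are kept as strings.

```python
def parse_profiles(text: str) -> dict:
    """Parse list_profiles output into {profile_name: {key: value}}."""
    profiles = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Drop any trailing note such as "(base layer ...)".
            name = line[3:].split(" (", 1)[0].strip()
            current = profiles.setdefault(name, {})
        elif current is not None and ": " in line:
            key, _, value = line.partition(": ")
            current[key] = value
    return profiles


sample = """## default (base layer)
  wait_until: domcontentloaded
  page_timeout: 60000

## fast
  page_timeout: 15000
"""
profiles = parse_profiles(sample)
print(sorted(profiles))
```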

--------------------------------

### Crawl with Politeness Delay

Source: https://github.com/potterdigital/crawl4ai-mcp/blob/main/_autodocs/api-reference/crawl_many.md

Shows how to implement a politeness delay between requests to avoid overwhelming the target server. This is useful for respecting rate limits.

```python
response = await client.call_tool("crawl_many", {
    "urls": [
        "https://target.com/api/page1",
        "https://target.com/api/page2"
    ],
    "delay": 1.5,  # 1.5-second delay between requests
    "max_concurrent": 2
})
```