Firecrawl
https://github.com/firecrawl/firecrawl
Firecrawl is an API service that crawls websites and extracts clean markdown or structured data from them.
Tokens: 64,064 · Snippets: 496 · Trust Score: 9.4 · Updated: 2 weeks ago
Context Summary (auto-generated)
# Firecrawl

Firecrawl is an open-source web scraping and data extraction API that transforms any website into clean, LLM-ready data. It powers AI agents with reliable web access, handling the complexity of JavaScript rendering, proxy rotation, rate limiting, and content extraction. The platform supports multiple output formats including markdown, HTML, JSON (via AI extraction), and screenshots, covering 96% of the web including dynamic JavaScript-heavy pages.

The core architecture provides a unified API (v2) with SDKs for Python, Node.js, Java, Elixir, Go, and Rust. Key capabilities include single-page scraping, full-site crawling, web search with content extraction, AI-powered data extraction (Agent), batch processing, and interactive browser sessions. The system is designed for both cloud deployment at api.firecrawl.dev and self-hosted installations using Docker.

## Scrape API

Scrape a single URL and convert it to markdown, HTML, screenshots, or structured JSON data. Supports advanced options like custom headers, viewport settings, wait conditions, and browser actions (click, scroll, type).

```bash
# Basic scrape - returns markdown by default
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com"
  }'

# Advanced scrape with multiple formats and options
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html", "screenshot", "links"],
    "onlyMainContent": true,
    "waitFor": 2000,
    "timeout": 30000,
    "headers": {"User-Agent": "CustomBot/1.0"},
    "actions": [
      {"type": "wait", "milliseconds": 1000},
      {"type": "click", "selector": "#load-more"},
      {"type": "screenshot", "fullPage": true}
    ]
  }'

# Response
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<h1>Page Title</h1>...",
    "screenshot": "data:image/png;base64,...",
    "links": ["https://example.com/page1", "https://example.com/page2"],
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "sourceURL": "https://example.com",
      "statusCode": 200
    }
  }
}
```

## Search API

Search the web and get full page content from results. Combines web search with automatic scraping of each result page.

```bash
# Basic web search
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "firecrawl web scraping",
    "limit": 5
  }'

# Advanced search with sources and scrape options
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "machine learning tutorials",
    "limit": 10,
    "sources": ["web", "news"],
    "lang": "en",
    "country": "us",
    "scrapeOptions": {
      "formats": ["markdown", "links"],
      "onlyMainContent": true
    }
  }'

# Response
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://example.com/article",
        "title": "Article Title",
        "description": "Article description",
        "markdown": "# Full article content..."
      }
    ],
    "news": [
      {
        "url": "https://news.example.com/story",
        "title": "News Story",
        "markdown": "# News content..."
      }
    ]
  },
  "creditsUsed": 5
}
```
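For callers who hit the REST endpoints directly rather than through an SDK, the following minimal Python sketch mirrors the search request above and prints the scraped markdown of each web result. The endpoint path, request fields, and response shape are taken from the curl example; the `requests` plumbing and error handling are illustrative assumptions, not part of the official SDKs.

```python
import requests

API_KEY = "fc-YOUR_API_KEY"  # placeholder key, as in the curl examples above

# Search the web and scrape each result as markdown (mirrors the Search API curl example)
resp = requests.post(
    "https://api.firecrawl.dev/v2/search",
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "query": "machine learning tutorials",
        "limit": 5,
        "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()

# Response envelope: {"success": true, "data": {"web": [...]}, "creditsUsed": ...}
for result in body.get("data", {}).get("web", []):
    print(result["title"], result["url"])
    print(result.get("markdown", "")[:200])  # first 200 characters of the scraped page
```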
## Interact API

Scrape a page and then interact with it using AI prompts or code. Enables multi-step browser automation.

```bash
# Step 1: Initial scrape to get session
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://amazon.com"}'

# Returns: {"success": true, "data": {..., "metadata": {"scrapeId": "abc123"}}}

# Step 2: Interact with the page using AI prompt
curl -X POST 'https://api.firecrawl.dev/v2/scrape/abc123/interact' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Search for mechanical keyboard and click the first result"
  }'

# Step 3: Continue interaction
curl -X POST 'https://api.firecrawl.dev/v2/scrape/abc123/interact' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Add the item to cart"
  }'

# Response
{
  "success": true,
  "output": "Added mechanical keyboard to cart",
  "liveViewUrl": "https://liveview.firecrawl.dev/session/abc123"
}
```

## Agent API

AI-powered autonomous data gathering. Describe what you need and the agent searches, navigates, and extracts data without requiring specific URLs.

```bash
# Basic agent request
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Find the pricing plans for Notion"
  }'

# Agent with structured output schema
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Find the founders of Firecrawl and their roles",
    "schema": {
      "type": "object",
      "properties": {
        "founders": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "role": {"type": "string"}
            }
          }
        }
      }
    },
    "model": "spark-1-pro"
  }'

# Agent with specific URLs to focus on
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Compare features and pricing",
    "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"]
  }'

# Response (async - poll for status)
{
  "success": true,
  "id": "agent-job-123"
}

# Get agent status
curl -X GET 'https://api.firecrawl.dev/v2/agent/agent-job-123' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'

# Completed response
{
  "success": true,
  "status": "completed",
  "data": {
    "founders": [
      {"name": "Eric Ciarla", "role": "Co-founder"},
      {"name": "Nicolas Camara", "role": "Co-founder"}
    ]
  },
  "creditsUsed": 15
}
```
## Crawl API

Crawl an entire website and extract content from all pages. Supports depth limits, path filtering, sitemap handling, and webhooks.

```bash
# Start a crawl job
curl -X POST 'https://api.firecrawl.dev/v2/crawl' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "limit": 100,
    "maxDiscoveryDepth": 3,
    "includePaths": ["/docs/*", "/guides/*"],
    "excludePaths": ["/blog/*"],
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }'

# Response - returns job ID
{
  "success": true,
  "id": "crawl-job-456",
  "url": "https://api.firecrawl.dev/v2/crawl/crawl-job-456"
}

# Check crawl status
curl -X GET 'https://api.firecrawl.dev/v2/crawl/crawl-job-456' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'

# Status response
{
  "success": true,
  "status": "scraping",
  "completed": 25,
  "total": 100,
  "creditsUsed": 25,
  "expiresAt": "2024-01-15T12:00:00Z",
  "data": [
    {
      "markdown": "# Page content...",
      "metadata": {"sourceURL": "https://docs.firecrawl.dev/intro"}
    }
  ]
}

# Cancel a crawl
curl -X DELETE 'https://api.firecrawl.dev/v2/crawl/crawl-job-456' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'
```

## Map API

Discover all URLs on a website instantly without scraping content. Uses sitemaps and intelligent crawling.

```bash
# Basic URL mapping
curl -X POST 'https://api.firecrawl.dev/v2/map' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://firecrawl.dev"
  }'

# Map with search filter
curl -X POST 'https://api.firecrawl.dev/v2/map' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://firecrawl.dev",
    "search": "pricing",
    "limit": 100,
    "includeSubdomains": true
  }'

# Response
{
  "success": true,
  "links": [
    {"url": "https://firecrawl.dev", "title": "Firecrawl", "description": "Turn websites into LLM-ready data"},
    {"url": "https://firecrawl.dev/pricing", "title": "Pricing", "description": "Firecrawl pricing plans"},
    {"url": "https://docs.firecrawl.dev", "title": "Documentation", "description": "API documentation"}
  ]
}
```

## Batch Scrape API

Scrape multiple URLs asynchronously with a single request. Ideal for processing large lists of pages.

```bash
# Start batch scrape
curl -X POST 'https://api.firecrawl.dev/v2/batch/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown", "links"],
    "onlyMainContent": true
  }'

# Response
{
  "success": true,
  "id": "batch-job-789",
  "url": "https://api.firecrawl.dev/v2/batch/scrape/batch-job-789"
}

# Check batch status
curl -X GET 'https://api.firecrawl.dev/v2/batch/scrape/batch-job-789' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'

# Completed response
{
  "success": true,
  "status": "completed",
  "completed": 3,
  "total": 3,
  "data": [
    {"markdown": "# Page 1...", "metadata": {"sourceURL": "https://example.com/page1"}},
    {"markdown": "# Page 2...", "metadata": {"sourceURL": "https://example.com/page2"}},
    {"markdown": "# Page 3...", "metadata": {"sourceURL": "https://example.com/page3"}}
  ]
}
```
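A common pattern is to combine the two endpoints above: map a site to discover its URLs, filter the list, then submit one batch scrape job. The sketch below wires together the /v2/map and /v2/batch/scrape requests exactly as documented; the URL cap, filtering choice, and `requests` plumbing are illustrative assumptions.

```python
import requests

API_KEY = "fc-YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.firecrawl.dev/v2"

# 1. Discover URLs on the site (Map API)
map_resp = requests.post(
    f"{BASE}/map", headers=HEADERS,
    json={"url": "https://firecrawl.dev", "search": "pricing"}, timeout=60,
)
map_resp.raise_for_status()
links = [link["url"] for link in map_resp.json().get("links", [])]

# 2. Submit the discovered URLs as a single asynchronous batch scrape job
batch_resp = requests.post(
    f"{BASE}/batch/scrape", headers=HEADERS,
    json={"urls": links[:50],  # arbitrary cap for illustration
          "formats": ["markdown"], "onlyMainContent": True},
    timeout=60,
)
batch_resp.raise_for_status()
job_id = batch_resp.json()["id"]
print("Batch job started:", job_id)  # poll GET /v2/batch/scrape/{job_id} for results
```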
## Browser Sessions API

Create persistent browser sessions for complex multi-step automation with code execution.

```bash
# Create browser session
curl -X POST 'https://api.firecrawl.dev/v2/browser' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "ttl": 300,
    "streamWebView": true
  }'

# Response
{
  "success": true,
  "sessionId": "browser-session-abc",
  "cdpUrl": "wss://browser.firecrawl.dev/session/abc",
  "liveViewUrl": "https://liveview.firecrawl.dev/session/abc",
  "expiresAt": "2024-01-15T12:05:00Z"
}

# Execute code in browser
curl -X POST 'https://api.firecrawl.dev/v2/browser/browser-session-abc/execute' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "code": "await page.goto(\"https://example.com\"); return await page.title();",
    "language": "node"
  }'

# Response
{
  "success": true,
  "output": "Example Domain",
  "stdout": "",
  "stderr": "",
  "exitCode": 0
}

# Delete browser session
curl -X DELETE 'https://api.firecrawl.dev/v2/browser/browser-session-abc' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'
```

## Python SDK

The Python SDK provides a convenient wrapper around all Firecrawl APIs with automatic polling for async operations.

```python
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

# Initialize client
app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a single URL
doc = app.scrape("https://example.com", formats=["markdown", "links"])
print(doc.markdown)
print(doc.links)

# Search the web
results = app.search("best web scraping tools 2024", limit=10)
for result in results.data.web:
    print(f"{result.title}: {result.url}")

# Crawl a website (automatically polls until complete)
crawl_result = app.crawl(
    "https://docs.firecrawl.dev",
    limit=50,
    scrape_options={"formats": ["markdown"]}
)
for doc in crawl_result.data:
    print(doc.metadata.source_url, doc.markdown[:100])

# Map a website
map_result = app.map("https://firecrawl.dev", search="pricing")
for link in map_result.links:
    print(link.url, link.title)

# Batch scrape multiple URLs
batch_result = app.batch_scrape([
    "https://example.com/page1",
    "https://example.com/page2"
], formats=["markdown"])

# Agent with structured output
class Founder(BaseModel):
    name: str = Field(description="Full name")
    role: Optional[str] = Field(None, description="Role or position")

class FoundersSchema(BaseModel):
    founders: List[Founder]

agent_result = app.agent(
    prompt="Find the founders of Stripe",
    schema=FoundersSchema
)
print(agent_result.data)

# Interactive scraping
scrape_result = app.scrape("https://amazon.com")
scrape_id = scrape_result.metadata.scrape_id
app.interact(scrape_id, prompt="Search for laptops")
app.interact(scrape_id, prompt="Click the first result")
```
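Stepping back to the Browser Sessions API above: the create-session response includes a `cdpUrl`, which suggests the session can also be driven by an external Chrome DevTools Protocol client in addition to the /execute endpoint. Below is a hedged Playwright sketch using `connect_over_cdp`; whether Firecrawl's endpoint accepts outside CDP connections is an assumption not confirmed by this summary, and the URL is the placeholder value from the documented response.

```python
from playwright.sync_api import sync_playwright

# cdpUrl as returned by POST /v2/browser in the Browser Sessions example above
CDP_URL = "wss://browser.firecrawl.dev/session/abc"  # placeholder value

with sync_playwright() as p:
    # Attach to the remote browser over the Chrome DevTools Protocol
    browser = p.chromium.connect_over_cdp(CDP_URL)
    context = browser.contexts[0]  # the session's default context
    page = context.pages[0] if context.pages else context.new_page()

    page.goto("https://example.com")
    print(page.title())  # drive the session with ordinary Playwright calls

    browser.close()  # disconnects the client; the session itself expires per its ttl
```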
## Node.js SDK

The Node.js SDK provides TypeScript support and async/await patterns for all Firecrawl operations.

```javascript
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';

// Initialize client
const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Scrape with typed JSON extraction
const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  description: z.string()
});

const doc = await app.scrape('https://example.com/product', {
  formats: [{ type: 'json', schema: ProductSchema }]
});
console.log(doc.json); // Typed as { name: string, price: number, description: string }

// Search the web
const searchResults = await app.search('machine learning tutorials', {
  limit: 5,
  sources: ['web', 'news'],
  scrapeOptions: { formats: ['markdown'] }
});

// Crawl with real-time updates using watcher
const crawlResponse = await app.startCrawl('https://docs.example.com', {
  limit: 100,
  scrapeOptions: { formats: ['markdown'] }
});

const watcher = app.watcher(crawlResponse.id, { kind: 'crawl' });
watcher.addEventListener('document', (doc) => {
  console.log('New document:', doc.metadata.sourceURL);
});
watcher.addEventListener('done', (status) => {
  console.log('Crawl complete:', status.completed, 'pages');
});

// Agent for autonomous data gathering
const agentResult = await app.agent({
  prompt: 'Find and compare pricing for Notion, Coda, and Airtable',
  model: 'spark-1-pro'
});
console.log(agentResult.data);

// Interactive browser session
const result = await app.scrape('https://example.com');
await app.interact(result.metadata.scrapeId, {
  prompt: 'Click the login button and fill in test@example.com'
});
```

## Java SDK

The Java SDK provides type-safe access to all Firecrawl APIs with builder patterns.

```java
import dev.firecrawl.client.FirecrawlClient;
import dev.firecrawl.model.*;

// Initialize client
FirecrawlClient client = new FirecrawlClient(
    System.getenv("FIRECRAWL_API_KEY"), null, null
);

// Scrape a URL
ScrapeParams scrapeParams = new ScrapeParams();
scrapeParams.setFormats(new String[]{"markdown", "links"});
FirecrawlDocument doc = client.scrapeURL("https://example.com", scrapeParams);
System.out.println(doc.getMarkdown());

// Crawl a website
CrawlParams crawlParams = new CrawlParams();
crawlParams.setLimit(50);
crawlParams.setIncludePaths(new String[]{"/docs/*"});
CrawlStatusResponse crawl = client.crawlURL(
    "https://docs.example.com", crawlParams, null, 10
);
for (FirecrawlDocument page : crawl.getData()) {
    System.out.println(page.getMetadata().get("sourceURL"));
}

// Search the web
SearchParams searchParams = new SearchParams("web scraping tools");
searchParams.setLimit(10);
SearchResponse results = client.search(searchParams);
for (SearchResult r : results.getResults()) {
    System.out.println(r.getTitle() + ": " + r.getUrl());
}

// Map a website
MapData mapData = client.map("https://example.com");
for (MapLink link : mapData.getLinks()) {
    System.out.println(link.getUrl());
}

// Agent request
AgentParams agentParams = new AgentParams("Find pricing for Slack");
AgentResponse start = client.createAgent(agentParams);
AgentStatusResponse result = client.getAgentStatus(start.getId());
System.out.println(result.getData());
```
## Self-Hosting with Docker

Firecrawl can be self-hosted using Docker Compose for complete control over your scraping infrastructure.

```bash
# Clone the repository
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl

# Copy environment template
cp apps/api/.env.example apps/api/.env

# Configure environment (edit apps/api/.env)
# Required variables:
# - REDIS_URL=redis://redis:6379
# - POSTGRES_USER=firecrawl
# - POSTGRES_PASSWORD=your_secure_password
# - POSTGRES_DB=firecrawl

# Start services
docker-compose up -d

# Test the API (no API key required for self-hosted)
curl -X POST 'http://localhost:3002/v2/scrape' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://firecrawl.dev"}'

# View logs
docker-compose logs -f api
```

## MCP Integration

Connect Firecrawl to any MCP-compatible AI client (Claude Desktop, Cursor, etc.) for seamless web access.

```json
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}
```

```bash
# Or install the CLI skill for agent integration
npx -y firecrawl-cli@latest init --all --browser

# CLI commands
firecrawl scrape https://example.com
firecrawl search "web scraping tools" --limit 5
firecrawl crawl https://docs.example.com --limit 50
firecrawl map https://example.com
```

## Summary

Firecrawl serves three primary use cases: (1) **AI/LLM Data Pipelines** - converting web pages to clean markdown or structured JSON for RAG systems, chatbots, and AI agents; (2) **Web Scraping at Scale** - batch processing thousands of URLs with automatic handling of JavaScript rendering, proxies, and rate limits; (3) **Browser Automation** - interactive sessions for complex workflows requiring multi-step navigation, form filling, and dynamic content extraction.

Integration patterns follow a consistent model across all SDKs: synchronous methods for single operations (scrape, search, map), async job patterns with polling for bulk operations (crawl, batch_scrape, agent), and event-driven watchers for real-time progress updates. The v2 API is the current stable version exposed directly on SDK clients, while v1 remains available under a `.v1` namespace for backward compatibility. Self-hosted deployments use the same API surface without requiring API keys, making it seamless to migrate between cloud and on-premise installations.
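To make the async job pattern concrete, here is a minimal sketch of the poll loop that the SDKs automate, written against the crawl endpoints and response fields shown earlier (`id`, `status`, `completed`, `total`, `data`). The sleep interval and the terminal-status check are assumptions for illustration.

```python
import time
import requests

API_KEY = "fc-YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE = "https://api.firecrawl.dev/v2"

# Start an asynchronous crawl job (same request as the Crawl API example above)
start = requests.post(
    f"{BASE}/crawl", headers=HEADERS,
    json={"url": "https://docs.firecrawl.dev", "limit": 25,
          "scrapeOptions": {"formats": ["markdown"]}},
    timeout=60,
)
start.raise_for_status()
job_id = start.json()["id"]

# Poll the job until it leaves the "scraping" state
while True:
    status = requests.get(f"{BASE}/crawl/{job_id}", headers=HEADERS, timeout=60).json()
    print(f"{status.get('completed', 0)}/{status.get('total', '?')} pages scraped")
    if status.get("status") != "scraping":  # assumed terminal check; "completed" appears in the examples above
        break
    time.sleep(5)  # assumed polling interval

for page in status.get("data", []):
    print(page["metadata"]["sourceURL"])
```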