Repository: https://github.com/mendableai/firecrawl
# Firecrawl

Firecrawl is an API service that takes a URL, crawls it, and converts web content into clean markdown or structured data ready for use with language models (LLMs). It handles the complex challenges of web scraping, including JavaScript rendering, anti-bot mechanisms, proxies, and dynamic content extraction. The service supports scraping single pages, crawling entire websites, mapping site structures, searching the web, and extracting structured data using AI.

The platform provides both a cloud-hosted solution at api.firecrawl.dev and an open-source self-hosted option. Firecrawl offers official SDKs for Python and Node.js, with community SDKs available for Go and Rust. The API supports multiple output formats, including markdown, HTML, screenshots, and JSON extraction via LLM-powered schemas.

## Scrape API

Scrapes a single URL and returns content in the specified formats (markdown, HTML, links, screenshots). Supports LLM extraction with schemas, custom headers, location settings, and browser actions.

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": ["markdown", "html"],
    "onlyMainContent": true,
    "timeout": 30000
  }'

# Response:
# {
#   "success": true,
#   "data": {
#     "markdown": "# Firecrawl Documentation\n\nWelcome to Firecrawl...",
#     "html": "<!DOCTYPE html><html>...</html>",
#     "metadata": {
#       "title": "Firecrawl Docs",
#       "description": "Documentation for Firecrawl API",
#       "sourceURL": "https://docs.firecrawl.dev",
#       "statusCode": 200
#     }
#   }
# }
```

## Crawl API

Crawls a website starting from a URL, following links to scrape all accessible subpages. Returns a job ID for async status polling. Supports depth limits, path filters, and per-page scrape options.
```bash
# Start a crawl job
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "limit": 100,
    "maxDepth": 3,
    "includePaths": ["docs/*"],
    "excludePaths": ["blog/*"],
    "scrapeOptions": {
      "formats": ["markdown", "html"]
    }
  }'

# Response:
# {
#   "success": true,
#   "id": "550e8400-e29b-41d4-a716-446655440000",
#   "url": "https://api.firecrawl.dev/v2/crawl/550e8400-e29b-41d4-a716-446655440000"
# }

# Check crawl status
curl -X GET https://api.firecrawl.dev/v2/crawl/550e8400-e29b-41d4-a716-446655440000 \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'

# Response:
# {
#   "status": "completed",
#   "total": 36,
#   "creditsUsed": 36,
#   "data": [
#     {
#       "markdown": "# Page Content...",
#       "html": "<!DOCTYPE html>...",
#       "metadata": {
#         "title": "Page Title",
#         "sourceURL": "https://docs.firecrawl.dev/page"
#       }
#     }
#   ]
# }
```

## Map API

Generates a list of all URLs from a website without scraping content. Extremely fast for site discovery. Supports sitemap integration and search filtering to find specific pages.

```bash
# Basic map
curl -X POST https://api.firecrawl.dev/v2/map \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://firecrawl.dev",
    "limit": 100
  }'

# Response:
# {
#   "success": true,
#   "links": [
#     { "url": "https://firecrawl.dev", "title": "Firecrawl Home" },
#     { "url": "https://firecrawl.dev/pricing", "title": "Pricing" },
#     { "url": "https://firecrawl.dev/blog", "title": "Blog" }
#   ]
# }

# Map with search filter
curl -X POST https://api.firecrawl.dev/v2/map \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://firecrawl.dev",
    "search": "documentation"
  }'
```

## Search API

Performs web searches and optionally scrapes the search results. Combines SERP functionality with Firecrawl's scraping capabilities to return full page content for any query.
```bash
# Basic search
curl -X POST https://api.firecrawl.dev/v2/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "query": "web scraping best practices",
    "limit": 5
  }'

# Response:
# {
#   "success": true,
#   "data": [
#     {
#       "url": "https://example.com/article",
#       "title": "Web Scraping Best Practices",
#       "description": "Learn how to scrape websites effectively..."
#     }
#   ]
# }

# Search with content scraping
curl -X POST https://api.firecrawl.dev/v2/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "query": "firecrawl tutorials",
    "limit": 3,
    "scrapeOptions": {
      "formats": ["markdown", "links"]
    }
  }'
```

## Extract API

Extracts structured data from one or multiple URLs using AI. Supports JSON schemas, prompts, and wildcard URLs for extracting data from entire websites.

```bash
curl -X POST https://api.firecrawl.dev/v2/extract \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "urls": ["https://firecrawl.dev/*"],
    "prompt": "Extract company information from the pages",
    "schema": {
      "type": "object",
      "properties": {
        "company_name": { "type": "string" },
        "company_mission": { "type": "string" },
        "is_open_source": { "type": "boolean" },
        "pricing_plans": { "type": "array", "items": { "type": "string" } }
      },
      "required": ["company_name", "company_mission"]
    }
  }'

# Response:
# {
#   "success": true,
#   "id": "44aa536d-f1cb-4706-ab87-ed0386685740",
#   "urlTrace": []
# }

# Poll for results (SDKs handle this automatically)
# Final response:
# {
#   "success": true,
#   "data": {
#     "company_name": "Firecrawl",
#     "company_mission": "Turn any website into LLM-ready data",
#     "is_open_source": true,
#     "pricing_plans": ["Free", "Hobby", "Standard", "Growth"]
#   }
# }
```

## LLM Extraction via Scrape

Extracts structured JSON data from a single page using the scrape endpoint with a JSON format specification. Supports both schema-based and prompt-based extraction.
```bash
# Schema-based extraction
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://news.ycombinator.com",
    "formats": [
      {
        "type": "json",
        "schema": {
          "type": "object",
          "properties": {
            "top_stories": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "title": { "type": "string" },
                  "points": { "type": "number" },
                  "author": { "type": "string" }
                }
              }
            }
          }
        }
      }
    ]
  }'

# Prompt-based extraction (no schema)
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://example.com/about",
    "formats": [
      {
        "type": "json",
        "prompt": "Extract the company mission and founding year from this page"
      }
    ]
  }'
```

## Batch Scrape API

Scrapes multiple URLs concurrently in a single request. Returns a job ID for async status polling, similar to the crawl endpoint.

```bash
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "urls": [
      "https://docs.firecrawl.dev",
      "https://docs.firecrawl.dev/sdks/python",
      "https://docs.firecrawl.dev/sdks/node"
    ],
    "formats": ["markdown", "html"],
    "webhook": {
      "url": "https://your-server.com/webhook",
      "events": ["completed", "page"]
    }
  }'

# Response:
# {
#   "success": true,
#   "id": "batch-job-id-123",
#   "url": "https://api.firecrawl.dev/v2/batch/scrape/batch-job-id-123"
# }

# Check batch status
curl -X GET https://api.firecrawl.dev/v2/batch/scrape/batch-job-id-123 \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'
```

## Actions API (Browser Automation)

Performs browser actions (click, scroll, type, wait, screenshot) before scraping. Useful for interacting with dynamic content, forms, and JavaScript-heavy pages.
```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://google.com",
    "formats": ["markdown", "screenshot"],
    "actions": [
      { "type": "wait", "milliseconds": 2000 },
      { "type": "click", "selector": "textarea[title=\"Search\"]" },
      { "type": "write", "text": "firecrawl web scraping" },
      { "type": "press", "key": "ENTER" },
      { "type": "wait", "milliseconds": 3000 },
      { "type": "screenshot" }
    ]
  }'

# Response includes the final page state after all actions
# {
#   "success": true,
#   "data": {
#     "markdown": "# Search Results\n...",
#     "screenshot": "data:image/png;base64,..."
#   }
# }
```

## Python SDK

Python SDK providing a simple interface for all Firecrawl operations. Supports both sync and async operations, Pydantic schemas for extraction, and WebSocket-based real-time updates.

```python
from firecrawl import Firecrawl, AsyncFirecrawl
from firecrawl.types import ScrapeOptions
from pydantic import BaseModel, Field

# Initialize client
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Basic scrape
doc = firecrawl.scrape(
    "https://firecrawl.dev",
    formats=["markdown", "html"]
)
print(doc.markdown)

# Crawl with options
crawl_result = firecrawl.crawl(
    "https://docs.firecrawl.dev",
    limit=50,
    scrape_options=ScrapeOptions(formats=["markdown"]),
    poll_interval=5
)
for page in crawl_result.data:
    print(page.metadata.title)

# Map a website
map_result = firecrawl.map("https://firecrawl.dev", limit=100)
for link in map_result.links:
    print(link.url)

# Search the web
search_result = firecrawl.search("firecrawl tutorials", limit=5)
for result in search_result.data:
    print(result.title, result.url)

# Extract with a Pydantic schema
class CompanyInfo(BaseModel):
    name: str
    mission: str = Field(description="Company mission statement")
    is_open_source: bool

doc = firecrawl.scrape(
    "https://firecrawl.dev",
    formats=[{"type": "json", "schema": CompanyInfo}]
)
print(doc.json)

# Async operations
async def async_example():
    async_firecrawl = AsyncFirecrawl(api_key="fc-YOUR_API_KEY")
    result = await async_firecrawl.scrape("https://example.com")
    return result

# WebSocket crawl with real-time updates
import nest_asyncio
nest_asyncio.apply()

async def crawl_with_updates():
    watcher = firecrawl.crawl_url_and_watch(
        "https://firecrawl.dev",
        limit=10
    )
    watcher.add_event_listener("document", lambda d: print(f"Scraped: {d['url']}"))
    watcher.add_event_listener("done", lambda d: print("Crawl complete!"))
    await watcher.connect()
```

## Node.js SDK

Node.js/TypeScript SDK for Firecrawl with full async/await support, Zod schema integration, and real-time event watchers.

```javascript
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Basic scrape
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
  formats: ['markdown', 'html'],
});
console.log(scrapeResult.markdown);

// Crawl with options
const crawlResult = await firecrawl.crawl('https://docs.firecrawl.dev', {
  limit: 50,
  scrapeOptions: { formats: ['markdown', 'html'] },
  pollInterval: 5,
});
crawlResult.data.forEach(page => console.log(page.metadata.title));

// Map a website
const mapResult = await firecrawl.map('https://firecrawl.dev');
mapResult.links.forEach(link => console.log(link.url));

// Extract with a Zod schema
const schema = z.object({
  company_name: z.string(),
  mission: z.string(),
  features: z.array(z.string()).describe('Key product features'),
});

const extractResult = await firecrawl.extract({
  urls: ['https://firecrawl.dev'],
  prompt: 'Extract company information',
  schema,
  showSources: true,
});
console.log(extractResult.data);

// Batch scrape multiple URLs
const batchResult = await firecrawl.batchScrape(
  ['https://firecrawl.dev', 'https://docs.firecrawl.dev'],
  { formats: ['markdown'] }
);

// Real-time crawl with watcher
const start = await firecrawl.startCrawl('https://firecrawl.dev', {
  limit: 10
});
const watch = firecrawl.watcher(start.id, { kind: 'crawl', pollInterval: 2 });
watch.on('document', (doc) => console.log('Scraped:', doc.metadata.sourceURL));
watch.on('error', (err) => console.error('Error:', err));
watch.on('done', (state) => console.log('Complete:', state.status));
await watch.start();
```

## Self-Hosting Configuration

Docker-based self-hosting setup for running Firecrawl locally. Includes Redis for job queues, Playwright for JavaScript rendering, and optional PostgreSQL for persistence.

```bash
# 1. Clone the repository
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl

# 2. Create .env file
cat > .env << 'EOF'
PORT=3002
HOST=0.0.0.0
USE_DB_AUTHENTICATION=false

# Optional: Enable AI features
OPENAI_API_KEY=sk-your-openai-key

# Optional: Configure proxy
# PROXY_SERVER=http://proxy.example.com:8080
# PROXY_USERNAME=user
# PROXY_PASSWORD=pass

# Queue admin panel auth
BULL_AUTH_KEY=your-secure-key

# Optional: Use Ollama instead of OpenAI
# OLLAMA_BASE_URL=http://localhost:11434/api
# MODEL_NAME=llama3

# Optional: SearXNG for search API
# SEARXNG_ENDPOINT=http://your.searxng.server
EOF

# 3. Build and run
docker compose build
docker compose up -d

# 4. Test the API (no auth needed for self-hosted)
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{ "url": "https://firecrawl.dev" }'

# View queue admin panel
# http://localhost:3002/admin/your-secure-key/queues

# Using SDKs with a self-hosted instance
# Python:
#   firecrawl = Firecrawl(api_url="http://localhost:3002")
#
# Node.js:
#   const firecrawl = new Firecrawl({ apiUrl: "http://localhost:3002" });
```

## Summary

Firecrawl is designed for AI/LLM applications that need clean, structured data from websites.
Primary use cases include building RAG (Retrieval-Augmented Generation) pipelines from crawled documentation, assembling training datasets from web content, automated data extraction for business intelligence, and content aggregation for knowledge bases. The service handles the complexity of modern web scraping, including JavaScript rendering, bot-detection bypass, and content cleaning.

Integration patterns include direct REST API calls for serverless functions, SDK usage for application backends, webhooks for event-driven architectures, and WebSocket connections for real-time processing. Firecrawl integrates with popular LLM frameworks such as LangChain, LlamaIndex, and CrewAI, as well as low-code platforms like Dify and Flowise.

For high-volume needs, the batch scrape and async crawl endpoints allow processing thousands of URLs efficiently with status polling or webhook notifications.
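The RAG use case above starts with crawl output broken into chunks that keep their provenance. A minimal sketch, assuming input shaped like the `data` array of the `/v2/crawl` status response; `chunk_pages` and `CHUNK_SIZE` are illustrative names, not part of the Firecrawl API:

```python
# Sketch: turn crawled pages into citeable RAG chunks.
# Assumes pages shaped like the /v2/crawl "data" array above;
# chunk_pages / CHUNK_SIZE are illustrative, not Firecrawl APIs.

CHUNK_SIZE = 500  # characters per chunk; tune for your embedding model

def chunk_pages(crawl_data):
    """Split each page's markdown into fixed-size chunks, keeping the
    source URL so downstream answers can cite where text came from."""
    chunks = []
    for page in crawl_data:
        text = page["markdown"]
        source = page["metadata"]["sourceURL"]
        for start in range(0, len(text), CHUNK_SIZE):
            chunks.append({
                "text": text[start:start + CHUNK_SIZE],
                "source": source,
            })
    return chunks

# Example: one 615-character page yields two chunks
pages = [{
    "markdown": "# Page Content\n" + "x" * 600,
    "metadata": {"sourceURL": "https://docs.firecrawl.dev/page"},
}]
chunks = chunk_pages(pages)
print(len(chunks))          # 2
print(chunks[0]["source"])  # https://docs.firecrawl.dev/page
```

From here, each chunk's `text` can be embedded and stored alongside its `source` URL; a real pipeline would usually split on markdown headings or sentence boundaries rather than a fixed character count.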