### Run the example

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/learn/scrape-analyze-airbnb-data.mdx

Execute the main script to start the scraping and analysis process.

```bash
npm run start
```

--------------------------------

### Install Firecrawl Go SDK

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/search-v0.mdx

Install the Firecrawl Go SDK using go get.

```bash
go get github.com/mendableai/firecrawl-go
```

--------------------------------

### Quick Start with FirecrawlTools

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk.mdx

Use the bundled FirecrawlTools() for a quick start. This example demonstrates a multi-step prompt involving interaction, search, scraping, and summarization.

```typescript
import { generateText, stepCountIs } from 'ai';
import { FirecrawlTools } from 'firecrawl-aisdk';

const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4-5',
  tools: FirecrawlTools(),
  stopWhen: stepCountIs(30),
  prompt: `
    1. Use interact on Hacker News to identify the top story
    2. Search for other perspectives on the same topic
    3. Scrape the most relevant pages you found
    4. Summarize everything you found
  `,
});
```

--------------------------------

### Start Crawl with cURL

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/start-crawl/base/curl.mdx

This example demonstrates how to start a crawl using cURL, specifying the target URL and a limit on the number of pages to crawl.

```APIDOC
## POST /v2/crawl

### Description
Initiates a web crawl from a specified URL.

### Method
POST

### Endpoint
/v2/crawl

### Parameters
#### Request Body
- **url** (string) - Required - The starting URL for the crawl.
- **limit** (integer) - Optional - The maximum number of pages to crawl.
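
The request body described above can be sketched in Python. This is a minimal illustration, not SDK code: the helper name `build_crawl_request` is hypothetical, and the full endpoint `https://api.firecrawl.dev/v2/crawl` and the Bearer authorization header are assumptions taken from other cURL snippets in this collection. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint, copied from other snippets in this collection.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v2/crawl"


def build_crawl_request(url, api_key, limit=None):
    """Build (but do not send) a POST request for the crawl endpoint.

    Hypothetical helper for illustration; it is not part of any Firecrawl SDK.
    """
    payload = {"url": url}  # required field
    if limit is not None:
        payload["limit"] = limit  # optional field, left out when not set
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_crawl_request("https://docs.firecrawl.dev", "YOUR_API_KEY", limit=10)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request; that step is left out so the sketch runs without network access or a real key.
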
### Request Example
```json
{
  "url": "https://docs.firecrawl.dev",
  "limit": 10
}
```

### Response
#### Success Response (200)
(Response structure not provided in source)

#### Response Example
(Response example not provided in source)
```

--------------------------------

### Install Firecrawl and Groq dependencies

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/learn/data-extraction-using-llms.mdx

Install the necessary Python libraries for Firecrawl and Groq. This is a prerequisite for running the subsequent code examples.

```bash
pip install groq firecrawl-py
```

--------------------------------

### Python Browser Quickstart

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/browser/quickstart/python.mdx

Installs the Firecrawl Python SDK, initializes the client, launches a browser session, executes code within the browser, and closes the session.

```python
# pip install firecrawl
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

# 1. Launch a session
session = app.browser()
print(session.cdp_url)  # wss://cdp-proxy.firecrawl.dev/cdp/...

# 2. Execute code
result = app.browser_execute(
    session.id,
    code='await page.goto("https://news.ycombinator.com")\ntitle = await page.title()\nprint(title)',
    language="python",
)
print(result.result)  # "Hacker News"

# 3. Close
app.delete_browser(session.id)
```

--------------------------------

### Start Crawl with cURL

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/es/v2/start-crawl/base/curl.mdx

This example demonstrates how to start a crawl using cURL. It sends a POST request to the /v2/crawl endpoint with the target URL and a limit for the crawl.

```APIDOC
## POST /v2/crawl

### Description
Initiates a web crawl starting from the specified URL. You can set a limit for the number of pages to crawl.
### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/crawl

### Parameters
#### Request Body
- **url** (string) - Required - The starting URL for the crawl.
- **limit** (integer) - Optional - The maximum number of pages to crawl.

### Request Example
```json
{
  "url": "https://docs.firecrawl.dev",
  "limit": 10
}
```

### Response
#### Success Response (200)
(Response structure not provided in source)

#### Response Example
(Response example not provided in source)
```

--------------------------------

### Start a Crawl with cURL

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/fr/v2/start-crawl/base/curl.mdx

This example demonstrates how to start a web crawl using cURL. You need to provide the URL to crawl and optionally set a limit for the number of pages to crawl.

```APIDOC
## POST /v2/crawl

### Description
Initiates a web crawl starting from the provided URL. You can specify a limit for the number of pages to crawl.

### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/crawl

### Request Body
- **url** (string) - Required - The starting URL for the crawl.
- **limit** (integer) - Optional - The maximum number of pages to crawl.

### Request Example
```json
{
  "url": "https://docs.firecrawl.dev",
  "limit": 10
}
```

### Response
#### Success Response (200)
- **data** (array) - An array of crawled page data.
- **message** (string) - A success message.

#### Response Example
```json
{
  "data": [
    {
      "url": "https://docs.firecrawl.dev/",
      "title": "Firecrawl Documentation",
      "content": "..."
    }
  ],
  "message": "Crawl completed successfully"
}
```
```

--------------------------------

### Map Endpoint Example

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/api-reference/v2-introduction.mdx

This example demonstrates how to use the map endpoint to get a complete list of URLs from any website.

```APIDOC
## POST /v2/map

### Description
Get a complete list of URLs from any website quickly and reliably.
### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/map

### Parameters
#### Request Body
- **url** (string) - Required - The starting URL for the map.
- **options** (object) - Optional - Additional options for mapping.
  - **max_depth** (number) - Optional - Maximum crawl depth.
  - **max_pages** (number) - Optional - Maximum number of pages to crawl.
  - **only_from_start** (boolean) - Optional - Whether to only map from the start URL.

### Response
#### Success Response (200)
- **urls** (array) - An array of URLs found on the website.

#### Response Example
```json
{
  "urls": [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/contact"
  ]
}
```
```

--------------------------------

### Start Crawl Example (Python)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/crawl.mdx

Initiate a crawl asynchronously using 'start_crawl' in Python. This method returns immediately with a crawl ID for later status polling.

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# Start a crawl and get the job ID
response = app.start_crawl("https://example.com")
print(response)
```

--------------------------------

### Install All Firecrawl CLI and Skills

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/ai-onboarding.mdx

Installs all Firecrawl skill segments (CLI, build, workflows) and automatically opens the browser for authentication. Use this for a complete setup.

```bash
npx -y firecrawl-cli@latest init --all --browser
```

--------------------------------

### Start Crawl Example (Node.js)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/crawl.mdx

Initiate a crawl asynchronously using 'startCrawl' in Node.js. This method returns immediately with a crawl ID for later status polling.

```javascript
import FirecrawlApp from "@fire-crawl/sdk";

const app = new FirecrawlApp("YOUR_API_KEY");

// Start a crawl and get the job ID
app.startCrawl("https://example.com").then(response => {
  console.log(response);
});
```

--------------------------------

### Start Crawl Example (cURL)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/crawl.mdx

Initiate a crawl asynchronously using cURL. This method returns immediately with a crawl ID for later status polling. Replace 'YOUR_API_KEY' with your actual API key.

```bash
# Versioned endpoint and JSON content type, matching the other cURL snippets in this collection
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

--------------------------------

### Setup Cloudflare Worker Project

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/quickstarts/cloudflare-workers.mdx

Initialize a new Cloudflare Worker project and install the Firecrawl SDK. This sets up the basic project structure and dependencies.

```bash
npm create cloudflare@latest my-scraper
cd my-scraper
npm install firecrawl
```

--------------------------------

### Start Crawl Example (CLI)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/crawl.mdx

Initiate a crawl asynchronously using the Firecrawl CLI. This command returns immediately with a crawl ID for later status polling. Ensure you are logged in with 'firecrawl login'.

```bash
firecrawl crawl start --url https://example.com
```

--------------------------------

### Run the Brand Style Guide Generator

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/developer-guides/cookbooks/brand-style-guide-generator-cookbook.mdx

Execute the Node.js script using the npm start command. This will trigger the Firecrawl API to scrape brand data from the specified URL and then generate the `brand-style-guide.pdf` file in the project directory.

```bash
npm start
```

--------------------------------

### Launch a Cloud Browser Session

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/browser/cli/basic.mdx

Use this command to start a new cloud-based browser session. No specific setup is required beyond having the Firecrawl CLI installed.

```bash
firecrawl browser launch-session
```

--------------------------------

### Quickstart: Scrape, Interact, and Stop Session (CLI)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/interact.mdx

Demonstrates the basic workflow of scraping a page, interacting with it via a prompt, and then stopping the session to save changes for writable profiles.

```bash
# 1. Scrape a URL
RESPONSE=$(firecrawl scrape "https://example.com" --profile default)
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')

# 2. Interact with the page using a prompt
firecrawl interact $SCRAPE_ID --prompt "What is the main content of the page?"

# 3. Stop the session
firecrawl stop-interaction $SCRAPE_ID
```

--------------------------------

### Start a Crawl

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/sdks/elixir.mdx

Start a crawl job and get back the job ID without blocking.

```APIDOC
## start_crawl

### Description
Starts a crawl job asynchronously.

### Method
`Firecrawl.start_crawl/2`

### Parameters
#### Keyword Arguments
- **url** (String.t) - Required - The starting URL for the crawl.
- **options** (Keyword.t) - Optional - Additional options like `limit`, `depth`, `include_links`, `allowed_domains`, `output_format`, `api_key`, `base_url`.

### Request Example
```elixir
Firecrawl.start_crawl([url: "https://example.com"], limit: 10)
```

### Response
#### Success Response
- **body** (Map) - Contains the `job_id` for the started crawl.
```

--------------------------------

### Quickstart: Scrape, Interact, and Stop Session (Python)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/interact.mdx

Demonstrates the basic workflow of scraping a page, interacting with it via a prompt, and then stopping the session to save changes for writable profiles.

```python
from firecrawl import Firecrawl

client = Firecrawl("YOUR_API_KEY")

# 1. Scrape a URL
scrape_result = client.scrape(
    "https://example.com",
    { "profile": "default" }  # Optional: for persistent browser state
)
scrape_id = scrape_result["data"]["metadata"]["scrapeId"]

# 2. Interact with the page using a prompt
interact_result = client.interact(
    scrape_id,
    { "prompt": "What is the main content of the page?" }
)
print(interact_result["data"]["output"])

# 3. Stop the session
client.stop_interaction(scrape_id)
```

--------------------------------

### Quickstart: Scrape, Interact, and Stop Session (JavaScript)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/interact.mdx

Demonstrates the basic workflow of scraping a page, interacting with it via a prompt, and then stopping the session to save changes for writable profiles.

```javascript
import Firecrawl from 'firecrawl.js'

const firecrawl = new Firecrawl("YOUR_API_KEY")

async function main() {
  // 1. Scrape a URL
  const scrapeResult = await firecrawl.scrape(
    "https://example.com",
    { profile: "default" } // Optional: for persistent browser state
  )
  const scrapeId = scrapeResult.data.metadata.scrapeId

  // 2. Interact with the page using a prompt
  const interactResult = await firecrawl.interact(
    scrapeId,
    { prompt: "What is the main content of the page?" }
  )
  console.log(interactResult.data.output)

  // 3. Stop the session
  await firecrawl.stopInteraction(scrapeId)
}

main()
```

--------------------------------

### Search Endpoint Example

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/api-reference/v2-introduction.mdx

This example demonstrates how to use the search endpoint to search the web and get full page content.

```APIDOC
## POST /v2/search

### Description
Search the web and get full page content in any format.

### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/search

### Parameters
#### Request Body
- **query** (string) - Required - The search query.
- **options** (object) - Optional - Additional options for searching.
  - **include_elements** (array) - Optional - List of HTML elements to include.
  - **exclude_elements** (array) - Optional - List of HTML elements to exclude.
  - **include_tags** (array) - Optional - List of HTML tags to include.
  - **exclude_tags** (array) - Optional - List of HTML tags to exclude.
  - **mode** (string) - Optional - The output format (e.g., 'markdown', 'json').
  - **max_depth** (number) - Optional - Maximum crawl depth.
  - **max_pages** (number) - Optional - Maximum number of pages to crawl.
  - **only_from_start** (boolean) - Optional - Whether to only scrape from the start URL.
  - **with_images** (boolean) - Optional - Whether to include images.
  - **with_links** (boolean) - Optional - Whether to include links.
  - **with_headers** (boolean) - Optional - Whether to include headers.
  - **with_tables** (boolean) - Optional - Whether to include tables.
  - **with_code** (boolean) - Optional - Whether to include code blocks.
  - **with_subtitles** (boolean) - Optional - Whether to include subtitles.
  - **with_bold** (boolean) - Optional - Whether to include bold text.
  - **with_italic** (boolean) - Optional - Whether to include italic text.
  - **with_underline** (boolean) - Optional - Whether to include underlined text.
  - **with_strike** (boolean) - Optional - Whether to include strikethrough text.
  - **with_lists** (boolean) - Optional - Whether to include lists.
  - **with_blockquotes** (boolean) - Optional - Whether to include blockquotes.
  - **with_links_only** (boolean) - Optional - Whether to include only links.
  - **with_images_only** (boolean) - Optional - Whether to include only images.
  - **with_tables_only** (boolean) - Optional - Whether to include only tables.
  - **with_code_only** (boolean) - Optional - Whether to include only code blocks.
  - **with_headers_only** (boolean) - Optional - Whether to include only headers.
  - **with_subtitles_only** (boolean) - Optional - Whether to include only subtitles.
  - **with_bold_only** (boolean) - Optional - Whether to include only bold text.
  - **with_italic_only** (boolean) - Optional - Whether to include only italic text.
  - **with_underline_only** (boolean) - Optional - Whether to include only underlined text.
  - **with_strike_only** (boolean) - Optional - Whether to include only strikethrough text.
  - **with_lists_only** (boolean) - Optional - Whether to include only lists.
  - **with_blockquotes_only** (boolean) - Optional - Whether to include only blockquotes.

### Response
#### Success Response (200)
- **results** (array) - An array of search results.
  - **url** (string) - The URL of the search result.
  - **title** (string) - The title of the search result.
  - **content** (string) - The extracted content from the search result.
  - **markdown** (string) - The extracted content in markdown format.
  - **json** (object) - The extracted content in JSON format.

#### Response Example
```json
{
  "results": [
    {
      "url": "https://example.com",
      "title": "Example Domain",
      "content": "This is the content of the example domain.",
      "markdown": "# Example Domain\nThis is the content of the example domain.",
      "json": {
        "title": "Example Domain",
        "html": "This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission."
      }
    }
  ]
}
```
```

--------------------------------

### Example: .env File with Default Output

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/alpha/llmstxt-npx.mdx

Shows how to run the NPX package using the API key from a `.env` file, specifying the URL and maximum number of URLs, with the default output directory.

```bash
# Using .env file with default output directory
npx generate-llmstxt -u https://your-website.com -m 20
```

--------------------------------

### Crawl Endpoint Example

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/api-reference/v2-introduction.mdx

This example demonstrates how to use the crawl endpoint to crawl entire websites and get content from all pages.

```APIDOC
## POST /v2/crawl

### Description
Crawl entire websites and get content from all pages.

### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/crawl

### Parameters
#### Request Body
- **url** (string) - Required - The starting URL for the crawl.
- **options** (object) - Optional - Additional options for crawling.
  - **include_elements** (array) - Optional - List of HTML elements to include.
  - **exclude_elements** (array) - Optional - List of HTML elements to exclude.
  - **include_tags** (array) - Optional - List of HTML tags to include.
  - **exclude_tags** (array) - Optional - List of HTML tags to exclude.
  - **mode** (string) - Optional - The output format (e.g., 'markdown', 'json').
  - **max_depth** (number) - Optional - Maximum crawl depth.
  - **max_pages** (number) - Optional - Maximum number of pages to crawl.
  - **only_from_start** (boolean) - Optional - Whether to only scrape from the start URL.
  - **with_images** (boolean) - Optional - Whether to include images.
  - **with_links** (boolean) - Optional - Whether to include links.
  - **with_headers** (boolean) - Optional - Whether to include headers.
  - **with_tables** (boolean) - Optional - Whether to include tables.
  - **with_code** (boolean) - Optional - Whether to include code blocks.
  - **with_subtitles** (boolean) - Optional - Whether to include subtitles.
  - **with_bold** (boolean) - Optional - Whether to include bold text.
  - **with_italic** (boolean) - Optional - Whether to include italic text.
  - **with_underline** (boolean) - Optional - Whether to include underlined text.
  - **with_strike** (boolean) - Optional - Whether to include strikethrough text.
  - **with_lists** (boolean) - Optional - Whether to include lists.
  - **with_blockquotes** (boolean) - Optional - Whether to include blockquotes.
  - **with_links_only** (boolean) - Optional - Whether to include only links.
  - **with_images_only** (boolean) - Optional - Whether to include only images.
  - **with_tables_only** (boolean) - Optional - Whether to include only tables.
  - **with_code_only** (boolean) - Optional - Whether to include only code blocks.
  - **with_headers_only** (boolean) - Optional - Whether to include only headers.
  - **with_subtitles_only** (boolean) - Optional - Whether to include only subtitles.
  - **with_bold_only** (boolean) - Optional - Whether to include only bold text.
  - **with_italic_only** (boolean) - Optional - Whether to include only italic text.
  - **with_underline_only** (boolean) - Optional - Whether to include only underlined text.
  - **with_strike_only** (boolean) - Optional - Whether to include only strikethrough text.
  - **with_lists_only** (boolean) - Optional - Whether to include only lists.
  - **with_blockquotes_only** (boolean) - Optional - Whether to include only blockquotes.

### Response
#### Success Response (200)
- **pages** (array) - An array of objects, where each object represents a crawled page.
  - **url** (string) - The URL of the crawled page.
  - **content** (string) - The extracted content from the page.
  - **markdown** (string) - The extracted content in markdown format.
  - **json** (object) - The extracted content in JSON format.

#### Response Example
```json
{
  "pages": [
    {
      "url": "https://example.com",
      "content": "Extracted content...",
      "markdown": "# Extracted content...",
      "json": {
        "title": "Example Domain",
        "html": "This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission."
      }
    }
  ]
}
```
```

--------------------------------

### Quickstart: Scrape, Interact, and Stop Session (cURL)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/interact.mdx

Demonstrates the basic workflow of scraping a page, interacting with it via a prompt, and then stopping the session to save changes for writable profiles.

```curl
# 1. Scrape a URL
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "authorization: Bearer YOUR_API_KEY" \
  -d '{ "url": "https://example.com", "profile": "default" }')
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')

# 2. Interact with the page using a prompt
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "authorization: Bearer YOUR_API_KEY" \
  -d '{ "prompt": "What is the main content of the page?" }' \
  | jq -r '.data.output'

# 3. Stop the session
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
  -H "authorization: Bearer YOUR_API_KEY"
```

--------------------------------

### Start Mintlify Development Server

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/development.mdx

Navigate to your documentation directory (where `mint.json` is located) and run this command to start the local development server. The website will be available at http://localhost:3000.

```bash
mintlify dev
```

--------------------------------

### Example: Command Line with Default Output

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/alpha/llmstxt-npx.mdx

Demonstrates using a command-line argument for the API key, specifying the URL and maximum number of URLs to analyze, while using the default output directory.

```bash
# Using command line argument with default output directory
npx generate-llmstxt -k your_api_key -u https://your-website.com -m 20
```

--------------------------------

### Quick Start: Execute Code in a Session

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/browser.mdx

Demonstrates the basic workflow of creating a session, executing code, and closing it using different interfaces.

```javascript
import { firecrawl } from "@firecrawl/javascript";

const client = new firecrawl.FirecrawlApiClient("YOUR_API_KEY");

async function main() {
  const session = await client.createSession();
  const result = await client.execute(session.id, "console.log('Hello World!');");
  console.log(result);
  await client.destroySession(session.id);
}

main();
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

session = app.create_session()
result = app.execute(session.id, "print('Hello World!')")
print(result)
app.destroy_session(session.id)
```

```bash
# Create a session and keep its ID so every later call can reuse it
SESSION_ID=$(curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.firecrawl.dev/v0/create-session" \
  | jq -r '.id')

# Execute code in the session
curl -s -X POST -H "Authorization: Bearer YOUR_API_KEY" \
  -d "code=console.log(\"Hello World!\");" \
  "https://api.firecrawl.dev/v0/execute/$SESSION_ID?use_sandbox=true" \
  | jq '.'

# Destroy the session
curl -s -X DELETE -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.firecrawl.dev/v0/destroy-session/$SESSION_ID"
```

```curl
# Create a session with a JSON body and keep its ID
SESSION_ID=$(curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data-raw '{ "code": "console.log(\"Hello World!\");" }' \
  "https://api.firecrawl.dev/v0/create-session" \
  | jq -r '.id')

# Execute code in the session
curl -s -X POST -H "Authorization: Bearer YOUR_API_KEY" \
  -d "code=console.log(\"Hello World!\");" \
  "https://api.firecrawl.dev/v0/execute/$SESSION_ID?use_sandbox=true" \
  | jq '.'

# Destroy the session
curl -s -X DELETE -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.firecrawl.dev/v0/destroy-session/$SESSION_ID"
```

--------------------------------

### Example: Custom Output Directory with .env File

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/alpha/llmstxt-npx.mdx

Demonstrates using a custom output directory when the API key is configured via a `.env` file.

```bash
# Using .env file and custom output directory
npx generate-llmstxt -u https://your-website.com -o content/llms
```

--------------------------------

### Raw Markdown Output

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/sdks/cli.mdx

Example of getting raw markdown content from a URL.

```bash
# Raw markdown output
firecrawl https://example.com --format markdown
```

--------------------------------

### Start Mintlify Development Server

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/README.md

Starts the Mintlify development server for local preview of the documentation. The command `mint dev` is used, as `mintlify dev` is deprecated.

```bash
mint dev
```

--------------------------------

### Example: Custom Output Directory with API Key

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/alpha/llmstxt-npx.mdx

Illustrates how to specify a custom output directory for the generated files when providing the API key via a command-line argument.

```bash
# Specifying a custom output directory
npx generate-llmstxt -k your_api_key -u https://your-website.com -o custom/path/to/output
```

--------------------------------

### Test API Endpoints with Curl

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/quickstarts/fastapi.mdx

Examples of how to test the search, scrape, and interactive start endpoints using curl.

```bash
# Search the web
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "firecrawl web scraping", "limit": 5}'

# Scrape a page
curl -X POST http://localhost:8000/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Start an interactive session, then send prompts
curl -X POST http://localhost:8000/interact/start \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.amazon.com"}'
```

--------------------------------

### Basic SDK Usage

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/sdks/php.mdx

Initialize the client and perform scrape and crawl operations. Ensure the API key is set as an environment variable or passed during client creation.

```php
use Firecrawl\Client\FirecrawlClient;
use Firecrawl\Models\CrawlOptions;
use Firecrawl\Models\ScrapeOptions;

$client = FirecrawlClient::fromEnv();

$doc = $client->scrape(
    'https://firecrawl.dev',
    ScrapeOptions::with(formats: ['markdown'])
);

$crawl = $client->crawl(
    'https://firecrawl.dev',
    CrawlOptions::with(limit: 5)
);

echo $doc->getMarkdown();
echo 'Crawled pages: ' . count($crawl->getData());
```

--------------------------------

### Start an Interactive Session with interact()

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk.mdx

Use `interact()` to create a scrape-backed interactive session. Call `start(url)` to get a live view URL and allow the model to reuse the session. Remember to close the session with `interactTool.close()` when done.

```typescript
import { generateText, stepCountIs } from 'ai';
import { interact, search } from 'firecrawl-aisdk';

const interactTool = interact();
console.log('Live view:', await interactTool.start('https://news.ycombinator.com'));

const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4-5',
  tools: { interact: interactTool, search },
  stopWhen: stepCountIs(25),
  prompt: 'Use interact on the current Hacker News session, find the top story, then search for more context.',
});

await interactTool.close();
```

--------------------------------

### Get Crawl Status

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/crawl-status/base/curl.mdx

Poll the status of a crawl job by providing the jobId. This is useful for tracking the progress of a crawl after it has been started.

```APIDOC
## GET /v2/crawl/{jobId}

### Description
Retrieve the status of a specific crawl job using its unique identifier.

### Method
GET

### Endpoint
/v2/crawl/{jobId}

### Parameters
#### Path Parameters
- **jobId** (string) - Required - The unique identifier of the crawl job.

### Request Example
```bash
curl -s -X GET "https://api.firecrawl.dev/v2/crawl/{jobId}" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY"
```

### Response
#### Success Response (200)
- **status** (string) - The current status of the crawl job (e.g., "pending", "running", "completed", "failed").
- **progress** (number) - The completion percentage of the crawl job.
- **resultsUrl** (string) - A URL to access the crawl results if the job is completed.

#### Response Example
```json
{
  "status": "running",
  "progress": 50,
  "resultsUrl": null
}
```
```

--------------------------------

### Install Shadcn UI Components

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/learn/guide/firecrawl-ui-template.mdx

Install the necessary shadcn components for the UI template. Ensure you have shadcn/ui set up in your project.

```bash
npx shadcn-ui@latest add button card checkbox collapsible input label
```

--------------------------------

### Get Check (Python)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/monitoring.mdx

Retrieve a specific check using its ID with the Python SDK. Ensure you have the SDK installed and authenticated.

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

# Get a specific check
check = app.get_check(monitor_id="monitor-123", check_id="check-abc")
print(check)
```

--------------------------------

### Python Quickstart: Scrape and Interact

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/interact/quickstart/python.mdx

Initialize the Firecrawl client, scrape a webpage, interact with the scraped content to search and extract data, and finally stop the interaction session. An API key is optional for basic usage but recommended for higher rate limits.

```python
from firecrawl import Firecrawl

app = Firecrawl(
    # No API key needed to get started; add one for higher rate limits:
    # api_key="fc-YOUR-API-KEY",
)

# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id

# 2. Interact: search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)

# 3. Stop the session
app.stop_interaction(scrape_id)
```

--------------------------------

### Scrape a URL with Node.js

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/scrape.mdx

Use the Node.js SDK to scrape a URL and get clean markdown. Ensure you have the SDK installed.

```javascript
import { FirecrawlApp } from "firecrawl";

const app = new FirecrawlApp("YOUR_API_KEY");

const response = await app.scrapeUrl("https://example.com");
console.log(response.data.markdown);
```

--------------------------------

### Basic Webhook Configuration

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/fr/v2/webhook-config/basic/json.mdx

This example demonstrates a basic webhook configuration. It includes the webhook URL, optional metadata, and a list of events to subscribe to.

```APIDOC
## POST /v2/webhook-config

### Description
Configure your webhook to receive real-time event notifications.

### Method
POST

### Endpoint
/v2/webhook-config

### Request Body
- **webhook** (object) - Required - The webhook configuration object.
  - **url** (string) - Required - The URL where the webhook events will be sent.
  - **metadata** (object) - Optional - Custom key-value pairs to be included in the webhook payload.
  - **events** (array) - Required - A list of events to subscribe to. Possible values: "start", "page", "completed", "failed".

### Request Example
```json
{
  "webhook": {
    "url": "https://your-domain.com/webhook",
    "metadata": {
      "any_key": "any_value"
    },
    "events": ["start", "page", "completed", "failed"]
  }
}
```

### Response
#### Success Response (200)
- **message** (string) - Confirmation message indicating the webhook has been configured successfully.

#### Response Example
```json
{
  "message": "Webhook configured successfully."
}
```
```

--------------------------------

### Scrape a URL with Python

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/scrape.mdx

Use the Python SDK to scrape a URL and get clean markdown. Ensure you have the SDK installed.

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="YOUR_API_KEY")

response = app.scrape_url("https://example.com")
print(response.data.markdown)
```

--------------------------------

### Initialize Project and Install Firecrawl

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/quickstarts/aws-lambda.mdx

Sets up a new project directory for AWS Lambda integration and installs the Firecrawl Node.js package using npm.

```bash
mkdir firecrawl-lambda && cd firecrawl-lambda
npm init -y
npm install firecrawl
```

--------------------------------

### Setup Local MCP Server with Firecrawl

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/developer-guides/llm-sdks-and-frameworks/google-adk.mdx

Configure an ADK agent to use a local Firecrawl MCP server. This requires Node.js and npm to be installed for running the firecrawl-mcp command. Ensure you have a Firecrawl API key and have installed the Google ADK.

```python
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from mcp import StdioServerParameters

root_agent = Agent(
    model='gemini-2.5-pro',
    name='firecrawl_agent',
    description='A helpful assistant for scraping websites with Firecrawl',
    instruction='Help the user search for website content',
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params = StdioServerParameters(
                    command='npx',
                    args=[
                        "-y",
                        "firecrawl-mcp",
                    ],
                    env={
                        "FIRECRAWL_API_KEY": "YOUR-API-KEY",
                    }
                ),
                timeout=30,
            ),
        )
    ],
)
```

--------------------------------

### Clone the repository

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/learn/scrape-analyze-airbnb-data.mdx

Clone the GitHub repository for the example. Navigate into the cloned directory.
```bash
git clone https://github.com/e2b-dev/e2b-cookbook.git
cd e2b-cookbook/examples/scrape-and-analyze-airbnb-data-with-firecrawl
```

--------------------------------

### Scrape a URL and get JSON data

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/scrape/json/events-example/curl.mdx

This example demonstrates how to scrape a URL and specify a JSON schema to extract particular fields.

```APIDOC
## POST /v2/scrape

### Description
Scrapes a given URL and returns the content in a specified format. This example focuses on extracting data into a JSON format based on a provided schema.

### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/scrape

### Parameters
#### Request Body
- **url** (string) - Required - The URL of the webpage to scrape.
- **formats** (array) - Required - An array of format objects specifying how the data should be returned.
  - **type** (string) - Required - The desired output format (e.g., "json").
  - **schema** (object) - Optional - Defines the structure for JSON output.
    - **type** (string) - Required - The schema type, typically "object".
    - **properties** (object) - Required - An object defining the properties to extract.
      - **company_mission** (string) - Type definition for company_mission.
      - **supports_sso** (boolean) - Type definition for supports_sso.
      - **is_open_source** (boolean) - Type definition for is_open_source.
      - **is_in_yc** (boolean) - Type definition for is_in_yc.
    - **required** (array) - Required - An array of strings listing the required properties.
### Request Example
```json
{
  "url": "https://firecrawl.dev/",
  "formats": [
    {
      "type": "json",
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "supports_sso": { "type": "boolean" },
          "is_open_source": { "type": "boolean" },
          "is_in_yc": { "type": "boolean" }
        },
        "required": [
          "company_mission",
          "supports_sso",
          "is_open_source",
          "is_in_yc"
        ]
      }
    }
  ]
}
```

### Response
#### Success Response (200)
- **data** (object) - The scraped data conforming to the specified schema.

#### Response Example
```json
{
  "data": {
    "company_mission": "To make web data extraction accessible and efficient for everyone.",
    "supports_sso": true,
    "is_open_source": true,
    "is_in_yc": false
  }
}
```
```

--------------------------------

### Get Crawl Status in Elixir

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/es/v2/crawl-status/short/elixir.mdx

Use this to retrieve the status of a specific crawl by its ID. Ensure you have the Firecrawl library installed and configured.

```elixir
{:ok, status} = Firecrawl.get_crawl_status("")
IO.inspect(status.body)
```

--------------------------------

### OpenAPI Configuration Examples

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/_essentials/settings.mdx

Configure the OpenAPI file path(s) for API documentation generation. Supports absolute, relative, and multiple URLs.

```json
"openapi": "https://example.com/openapi.json"
```

```json
"openapi": "/openapi.json"
```

```json
"openapi": ["https://example.com/openapi1.json", "/openapi2.json", "/openapi3.json"]
```

--------------------------------

### Scrape a URL with CLI

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/scrape.mdx

Use the Firecrawl CLI to scrape a URL and get clean markdown. Ensure the CLI is installed and configured.
```bash
firecrawl scrape "https://example.com" --api-key YOUR_API_KEY
```

--------------------------------

### Copy Environment File

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/contributing/guide.mdx

Copies the example environment file to create a new .env file in the apps/api directory. This file will store your local configuration settings.

```bash
cp apps/api/.env.example apps/api/.env
```

--------------------------------

### POST Docs Search cURL Example

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/agent-source-of-truth/curl.mdx

Use this cURL command to search Firecrawl documentation. Provide a question to get a docs-grounded answer.

```bash
curl -X POST "https://api.firecrawl.dev/v2/support/docs-search" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "how do I verify webhook signatures?"
  }'
```

--------------------------------

### Create Page Monitor (Node.js)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/monitoring-page.mdx

This Node.js example demonstrates how to create a page monitor using the Firecrawl SDK. It requires the 'firecrawl' package to be installed.

```javascript
import { Firecrawl } from "firecrawl";

const client = new Firecrawl("YOUR_API_KEY");

async function createMonitor() {
  const response = await client.monitor.create({
    name: "My Page Monitor",
    schedule: "every 30 minutes",
    target: {
      type: "scrape",
      urls: ["https://example.com/page1", "https://example.com/page2"]
    }
  });
  console.log(response);
}

createMonitor();
```

--------------------------------

### Initialize Node.js Project and Configure ES Modules

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/developer-guides/cookbooks/brand-style-guide-generator-cookbook.mdx

Set up a new Node.js project directory and initialize it with npm. Update package.json to enable ES module support and define a start script for running TypeScript files.
```bash
mkdir brand-style-guide-generator && cd brand-style-guide-generator
npm init -y
```

```json
{
  "name": "brand-style-guide-generator",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "npx tsx index.ts"
  }
}
```

--------------------------------

### Crawl Started Webhook

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/api-reference/endpoint/webhook-crawl-started.mdx

This event is sent when a crawl job begins processing. Refer to the linked documentation for payload examples, configuration, and retry behavior.

```APIDOC
## Webhook: Crawl Started

### Description
This webhook event is triggered when a crawl job starts processing.

### Event Name
crawlStarted

### Further Information
For payload examples, configuration details, and retry mechanisms, consult the [Webhook Event Types](/webhooks/events#crawlstarted) and [Webhook Overview](/webhooks/overview) documentation.
```

--------------------------------

### Create Monitor (Crawl)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/pt-BR/v2/monitor/create/crawl/curl.mdx

This example demonstrates how to create a monitor that crawls a specified URL. It includes setting a name, schedule, webhook for notifications, a goal description, and target crawl options.

```APIDOC
## POST /v2/monitor

### Description
Creates a new monitor to track changes on a web page or crawl a website. This specific example focuses on setting up a crawl monitor.

### Method
POST

### Endpoint
https://api.firecrawl.dev/v2/monitor

### Parameters
#### Request Body
- **name** (string) - Required - The name of the monitor.
- **schedule** (object) - Required - The schedule for the monitor.
  - **cron** (string) - Required - The cron expression for the schedule.
  - **timezone** (string) - Required - The timezone for the schedule.
- **webhook** (object) - Optional - Configuration for webhook notifications.
  - **url** (string) - Required - The URL to send webhook notifications to.
  - **events** (array of strings) - Required - A list of events to trigger webhooks for.
- **goal** (string) - Required - A description of the monitor's goal.
- **targets** (array of objects) - Required - A list of targets for the monitor.
  - **type** (string) - Required - The type of target. Must be "crawl" for this example.
  - **url** (string) - Required - The URL to crawl.
  - **crawlOptions** (object) - Optional - Options for the crawl.
    - **limit** (integer) - Optional - The maximum number of pages to crawl.
    - **maxDiscoveryDepth** (integer) - Optional - The maximum depth to discover links.

### Request Example
```json
{
  "name": "Docs monitor",
  "schedule": {
    "cron": "7-59/15 * * * *",
    "timezone": "UTC"
  },
  "webhook": {
    "url": "https://example.com/webhooks/firecrawl",
    "events": ["monitor.page", "monitor.check.completed"]
  },
  "goal": "Notify me when docs pages add, remove, or materially change API behavior",
  "targets": [
    {
      "type": "crawl",
      "url": "https://example.com/docs",
      "crawlOptions": {
        "limit": 100,
        "maxDiscoveryDepth": 3
      }
    }
  ]
}
```

### Response
#### Success Response (200)
(Response structure not provided in the source)

#### Response Example
(Response example not provided in the source)
```

--------------------------------

### Start and Get Crawl Status

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/crawl-status/base/js.mdx

Initiates a web crawl and then retrieves its status using the provided crawl ID. Ensure you have your API key configured.
```javascript
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

// Start a crawl
const { id } = await firecrawl.startCrawl('https://docs.firecrawl.dev', { limit: 5 });

// Get the status of the crawl
const status = await firecrawl.getCrawlStatus(id);
console.log(status);
```

--------------------------------

### Crawl and Wait Example (Node.js)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/crawl.mdx

Initiate a crawl and wait for completion, automatically handling pagination. Recommended for most use cases. Requires the '@fire-crawl/sdk' package.

```javascript
import FirecrawlApp from "@fire-crawl/sdk";

const app = new FirecrawlApp("YOUR_API_KEY");

// Crawl a URL and wait for the results
app.crawl("https://example.com").then(response => {
  console.log(response);
});
```

--------------------------------

### Get Check (Node.js)

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/features/monitoring.mdx

Fetch a specific check using its ID with the Node.js SDK. Make sure the SDK is installed and your API key is configured.

```javascript
import { FirecrawlApp } from "firecrawl";

const app = new FirecrawlApp("YOUR_API_KEY");

// Get a specific check
const check = await app.getCheck({ monitorId: "monitor-123", checkId: "check-abc" });
console.log(check);
```

--------------------------------

### Install Firecrawl Node.js SDK

Source: https://github.com/firecrawl/firecrawl-docs/blob/main/snippets/v2/installation/js.mdx

Install the SDK using npm. An API key is optional for initial use but recommended for higher rate limits.

```javascript
// npm install firecrawl
import { Firecrawl } from 'firecrawl';

const firecrawl = new Firecrawl({
  // No API key needed to get started; add one for higher rate limits:
  // apiKey: "fc-YOUR-API-KEY",
});
```
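
--------------------------------

### Poll Crawl Status Until Completion (Sketch)

The "Start and Get Crawl Status" snippet above reads the status once, right after the crawl starts, so the job is likely still running at that point. The "Crawl and Wait" method already covers the common case; the sketch below is only for code that starts a crawl and checks back later. `pollUntilDone` is a hypothetical helper, not part of the Firecrawl SDK. It assumes the status object carries a `status` string and that `"completed"`, `"failed"`, and `"cancelled"` are the terminal values; neither assumption is confirmed by the snippets here, so check the API reference before relying on them.

```javascript
// Hypothetical helper (not an SDK function).
// `getStatus` is any async function that resolves to an object with a `status` string.
async function pollUntilDone(getStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  // Assumed terminal values; adjust to whatever the API actually returns.
  const terminal = new Set(["completed", "failed", "cancelled"]);

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snapshot = await getStatus();
    if (terminal.has(snapshot.status)) {
      return snapshot;
    }
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }

  throw new Error(`Crawl did not finish after ${maxAttempts} status checks`);
}
```

With the client from the snippet above, a call might look like `const final = await pollUntilDone(() => firecrawl.getCrawlStatus(id));`. Passing the status fetcher in as a function keeps the helper independent of any particular SDK version and makes it easy to exercise with a stub.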