### Install Crawlbase Node.js Client

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

Install the Crawlbase Node.js client using npm.

```bash
npm install crawlbase
```

--------------------------------

### CrawlingAPI GET and POST Requests

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/INDEX.md

Demonstrates basic GET, POST, and PUT requests using the CrawlingAPI. Includes examples with and without optional parameters.

```javascript
const { CrawlingAPI } = require('crawlbase');
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

// GET request
api.get('https://example.com');
api.get('https://example.com', { userAgent: '...', pageWait: 5000 });

// POST request
api.post('https://example.com', { key: 'value' });
api.post('https://example.com', { key: 'value' }, { postType: 'json' });

// PUT request
api.put('https://example.com', { key: 'value' });
```

--------------------------------

### Example POST Request with CrawlingAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

An example of a POST request to search on Product Hunt, sending a text query and logging the response body.

```javascript
api.post('https://producthunt.com/search', { text: 'example search' }).then(response => {
  if (response.statusCode === 200) {
    console.log(response.body);
  }
}).catch(error => console.error(error)); // call console.error so the error is actually logged
```

--------------------------------

### Example PUT Request with CrawlingAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

An example of making a PUT request to send data for a search on Product Hunt.
```javascript
api.put('https://producthunt.com/search', { text: 'example search' }).then(response => {
  if (response.statusCode === 200) {
    console.log(response.body);
  }
}).catch(error => console.error(error));
```

--------------------------------

### Example GET Request with CrawlingAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

An example of making a GET request to scrape a Facebook profile and logging the response body if successful. Includes basic error handling.

```javascript
api.get('https://www.facebook.com/britneyspears').then(response => {
  if (response.statusCode === 200) {
    console.log(response.body);
  }
}).catch(error => console.error(error));
```

--------------------------------

### Basic CrawlingAPI Configuration

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

Demonstrates the basic setup for initializing the CrawlingAPI with a token and a custom timeout value.

```javascript
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({
  token: process.env.CRAWLBASE_TOKEN,
  timeout: 120000 // 2 minutes
});
```

--------------------------------

### GET Request with JavaScript Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Example of making a GET request for a JavaScript-rendered page, including additional options like 'page_wait' to control rendering time.

```javascript
api.get('https://www.freelancer.com', { page_wait: 5000 }).then(response => {
  if (response.statusCode === 200) {
    console.log(response.body);
  }
}).catch(error => console.error(error));
```

--------------------------------

### Install Crawlbase Node.js Module

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Install the crawlbase package using npm. This is the first step before requiring the API classes.
```bash
npm i crawlbase
```

--------------------------------

### Backward Compatibility Example

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md

Shows how to use the `CrawlbaseAPI` alias for `CrawlingAPI` in older codebases, while also demonstrating the preferred modern import.

```javascript
// Old code (still works)
const { CrawlbaseAPI } = require('crawlbase');
const legacyApi = new CrawlbaseAPI({ token: 'YOUR_TOKEN' });

// New code (preferred)
const { CrawlingAPI } = require('crawlbase');
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

// Both refer to the same class
```

--------------------------------

### Example Usage of LeadsAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Demonstrates how to call the getFromDomain method of the LeadsAPI and shows the expected shape of the response, including the 'leads' array.

```javascript
const response = await leadsApi.getFromDomain('example.com');

// Response shape:
// {
//   statusCode: 200,
//   body: '...',
//   json: {
//     leads: [
//       { email: 'contact@example.com', name: 'John Doe' },
//       { email: 'info@example.com', phone: '+1-555-0100' }
//     ]
//   },
//   leads: [
//     { email: 'contact@example.com', name: 'John Doe' },
//     { email: 'info@example.com', phone: '+1-555-0100' }
//   ],
//   headers: { ... },
//   url: 'https://example.com'
// }
```

--------------------------------

### Example APIResponse Structure

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Provides an example of the shape of the APIResponse object returned by API methods, showing typical fields and values.
```javascript
const response = await api.get('https://example.com');

// Response shape:
// {
//   statusCode: 200,
//   body: '...',
//   headers: {
//     'content-type': 'text/html; charset=UTF-8',
//     'content-length': '1024'
//   },
//   url: 'https://example.com',
//   originalStatus: 200,
//   cbStatus: 200
// }
```

--------------------------------

### Example Usage of LeadsAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Demonstrates calling the 'getFromDomain' method of the LeadsAPI to retrieve leads from a specified domain and logging the 'leads' property of the response.

```javascript
api.getFromDomain('somesite.com').then(response => {
  console.log(response.leads);
});
```

--------------------------------

### Configuration via .env File

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md

Example of storing API token and timeout in a `.env` file for environment-specific configuration.

```dotenv
CRAWLBASE_TOKEN=your_token_here
CRAWLBASE_TIMEOUT=120000
```

--------------------------------

### Initialize and Use Screenshots API

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Initialize the API with your token and call the get method to fetch screenshot binary content. Save the response body to a file.

```javascript
const fs = require('fs');
const { ScreenshotsAPI } = require('crawlbase');

const api = new ScreenshotsAPI({ token: 'YOUR_TOKEN' });

api.get('https://www.amazon.com').then(response => {
  // Write the binary screenshot content to disk
  fs.writeFileSync('amazon.jpg', response.body, { encoding: 'binary' });
});
```

--------------------------------

### Promise Return Types Example

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Illustrates how to import and use CrawlingAPI and LeadsAPI with TypeScript, demonstrating type annotations for responses.
```typescript
import { CrawlingAPI, LeadsAPI } from 'crawlbase';

const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });
const response: APIResponse = await api.get('https://example.com');

const leadsApi = new LeadsAPI({ token: 'YOUR_TOKEN' });
const leadsResponse: LeadsAPIResponse = await leadsApi.getFromDomain('example.com');
```

--------------------------------

### TypeScript Usage Example

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Demonstrates how to import and use the CrawlingAPI in TypeScript. Ensure you have your API token ready.

```typescript
import { CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI } from 'crawlbase';

const crawlingApi = new CrawlingAPI({ token: 'YOUR_TOKEN' });
const response = await crawlingApi.get('https://example.com', { format: 'json' });

if (response.statusCode === 200) {
  console.log(response.json);
}
```

--------------------------------

### Complete Configuration Setup for Multiple APIs

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md

Load and validate configuration for multiple Crawlbase services (Crawling, Scraper, Leads, Screenshots) from environment variables. Initializes all specified APIs with a common timeout.
```javascript
const fs = require('fs');
const path = require('path');
const { CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI } = require('crawlbase');

// Load configuration from environment
const config = {
  crawlingToken: process.env.CRAWLBASE_CRAWLING_TOKEN,
  scrapingToken: process.env.CRAWLBASE_SCRAPER_TOKEN,
  leadsToken: process.env.CRAWLBASE_LEADS_TOKEN,
  screenshotsToken: process.env.CRAWLBASE_SCREENSHOTS_TOKEN,
  timeout: parseInt(process.env.CRAWLBASE_TIMEOUT || '90000', 10)
};

// Validate configuration: every token must be set.
// Unset environment variables are undefined, so check for any falsy value.
Object.entries(config).forEach(([key, value]) => {
  if (key !== 'timeout' && !value) {
    throw new Error(`Configuration error: ${key} is required`);
  }
});

// Initialize APIs
const apis = {
  crawling: new CrawlingAPI({ token: config.crawlingToken, timeout: config.timeout }),
  scraper: new ScraperAPI({ token: config.scrapingToken, timeout: config.timeout }),
  leads: new LeadsAPI({ token: config.leadsToken, timeout: config.timeout }),
  screenshots: new ScreenshotsAPI({ token: config.screenshotsToken, timeout: config.timeout })
};

module.exports = apis;
```

--------------------------------

### Example Usage of lowerHeaders

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Shows how to import and use the lowerHeaders utility function to convert header keys to lowercase.

```javascript
const { lowerHeaders } = require('crawlbase/src/utils.js');

const headers = {
  'Content-Type': 'application/json',
  'X-Custom-Header': 'value'
};

lowerHeaders(headers);
// {
//   'content-type': 'application/json',
//   'x-custom-header': 'value'
// }
```

--------------------------------

### TypeScript Usage Example

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md

Demonstrates how to import and instantiate the `CrawlingAPI` in a TypeScript environment, including making a basic request.
```typescript
import { CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI } from 'crawlbase';

const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });
const response = await api.get('https://example.com');
```

--------------------------------

### Example Usage of CrawlingAPI.get() with Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Demonstrates calling the CrawlingAPI.get() method with specific options for user agent, format, and page wait time.

```javascript
api.get('https://example.com', {
  userAgent: 'Mozilla/5.0 (Custom)',
  format: 'json',
  pageWait: 5000
});
```

--------------------------------

### Configuration via JavaScript Object

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md

Example of defining API token and timeout within a JavaScript configuration object.

```javascript
const config = {
  crawlbase: {
    token: 'your_token_here',
    timeout: 120000
  }
};
```

--------------------------------

### Example Usage of ScreenshotsAPI.get() with Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Illustrates calling ScreenshotsAPI.get() with options to specify a mobile device, custom dimensions, image format, and wait time.

```javascript
api.get('https://example.com', {
  device: 'mobile',
  browserWidth: 375,
  browserHeight: 667,
  format: 'jpg',
  waitTime: 3000
});
```

--------------------------------

### Example Usage of snakeCase

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Demonstrates how to import and use the snakeCase utility function to convert camelCase strings to snake_case.
```javascript
const { snakeCase } = require('crawlbase/src/utils.js');

snakeCase('userAgent'); // 'user_agent'
snakeCase('pageWait'); // 'page_wait'
```

--------------------------------

### Example of a LeadObject

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Illustrates the structure of a single lead object, showing common fields like email, name, title, and social media links.

```javascript
const lead = {
  email: 'contact@example.com',
  name: 'Jane Smith',
  title: 'Marketing Manager',
  company: 'Example Corp',
  linkedin: 'https://linkedin.com/in/janesmith',
  twitter: 'https://twitter.com/janesmith'
};
```

--------------------------------

### LeadsAPI Get From Domain

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/INDEX.md

Illustrates how to use the LeadsAPI to extract email and contact information from a given domain.

```javascript
const { LeadsAPI } = require('crawlbase');
const api = new LeadsAPI({ token: 'YOUR_TOKEN' });

api.getFromDomain('example.com');
// Returns: { statusCode, body, leads: [ /* email/contact objects */ ] }
```

--------------------------------

### GET Request with Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-crawling-api.md

Perform a GET request with custom options, such as specifying a User-Agent or requesting the response in JSON format. This allows for more targeted scraping and data retrieval.

```javascript
api.get('https://www.facebook.com/britneyspears', {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
  format: 'json'
})
  .then(response => {
    if (response.statusCode === 200) {
      console.log(response.json);
    }
  })
  .catch(error => console.error(error));
```

--------------------------------

### get(url, options)

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-crawling-api.md

Performs a GET request to scrape a URL.
This method can fetch both static HTML and JavaScript-rendered content, depending on the type of token used.

```APIDOC
## `get(url, options)`

Performs a GET request to scrape a URL. Supports both static HTML and JavaScript-rendered content (when using a JavaScript token).

**Parameters:**

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | string | yes | - | URL to scrape |
| options | object | no | {} | Crawlbase API parameters (see Crawlbase API documentation) |
| options.userAgent | string | no | - | Custom User-Agent header |
| options.format | string | no | - | Response format ('json' or 'html') |
| options.pageWait | number | no | - | Time to wait for JavaScript rendering (milliseconds) |

**Returns:** `Promise` that resolves with a response object containing:

- `statusCode` (number): HTTP status code from Crawlbase
- `body` (string): HTML response body
- `json` (object): Parsed JSON if format was set to 'json'
- `originalStatus` (number): Original website HTTP status code
- `cbStatus` or `pcStatus` (number): Crawlbase processing status code
- `headers` (object): Response headers
- `url` (string): Requested URL

**Rejects:**

| Error Type | Condition |
|-----------|-----------|
| Error | Request timeout, network error, or invalid URL |

**Example:**

```javascript
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

// Simple GET request
api.get('https://www.example.com')
  .then(response => {
    if (response.statusCode === 200) {
      console.log(response.body);
      console.log('Original status:', response.originalStatus);
      console.log('Crawlbase status:', response.cbStatus);
    }
  })
  .catch(error => console.error('Request failed:', error));

// With options
api.get('https://www.facebook.com/britneyspears', {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
  format: 'json'
})
  .then(response => {
    if (response.statusCode === 200) {
      console.log(response.json);
    }
  })
  .catch(error => console.error(error));

// JavaScript rendering
const jsApi = new CrawlingAPI({ token: 'YOUR_JAVASCRIPT_TOKEN' });
jsApi.get('https://www.nfl.com', { pageWait: 5000 })
  .then(response => {
    console.log('Rendered content:', response.body);
  });
```
```

--------------------------------

### Example Usage of CrawlingAPI.post() with Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md

Shows how to use CrawlingAPI.post() with options to send data as JSON and specify a custom content type.

```javascript
api.post('https://example.com', { key: 'value' }, {
  postType: 'json',
  postContentType: 'application/json'
});
```

--------------------------------

### Perform a Simple GET Request

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-crawling-api.md

Execute a basic GET request to scrape a URL. Handles static HTML content and provides response details like status code and body.

```javascript
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

// Simple GET request
api.get('https://www.example.com')
  .then(response => {
    if (response.statusCode === 200) {
      console.log(response.body);
      console.log('Original status:', response.originalStatus);
      console.log('Crawlbase status:', response.cbStatus);
    }
  })
  .catch(error => console.error('Request failed:', error));
```

--------------------------------

### Capture Desktop and Mobile Screenshots

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-screenshots-api.md

Capture screenshots for both desktop and mobile viewports of a given URL and save them to files. This example demonstrates handling successful responses and basic error logging.
```javascript
const fs = require('fs');
const { ScreenshotsAPI } = require('crawlbase');

const api = new ScreenshotsAPI({ token: 'YOUR_TOKEN' });

async function captureComparison(url, filename) {
  try {
    // Desktop
    const desktopResponse = await api.get(url, { device: 'desktop' });
    if (desktopResponse.statusCode === 200) {
      fs.writeFileSync(`${filename}-desktop.jpg`, desktopResponse.body, { encoding: 'binary' });
      console.log(`Desktop: ${filename}-desktop.jpg`);
    }

    // Mobile
    const mobileResponse = await api.get(url, { device: 'mobile' });
    if (mobileResponse.statusCode === 200) {
      fs.writeFileSync(`${filename}-mobile.jpg`, mobileResponse.body, { encoding: 'binary' });
      console.log(`Mobile: ${filename}-mobile.jpg`);
    }
  } catch (error) {
    console.error('Screenshot capture failed:', error);
  }
}

captureComparison('https://www.example.com', 'example');
```

--------------------------------

### CrawlingAPI for Static HTML

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

Example of using CrawlingAPI to fetch static HTML content. This is the default behavior when no special options are provided.

```javascript
// Static HTML content
const api = new CrawlingAPI({ token: 'NORMAL_TOKEN' });
api.get('https://example.com').then(response => {
  console.log(response.body); // HTML string
});
```

--------------------------------

### Leads API Get From Domain

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Initializes the Leads API and retrieves lead information from a specified domain. This method is designed to find contact information or business leads associated with a website.

```APIDOC
## GET /leads

### Description
Retrieves lead information from a given domain using the Leads API. This is useful for business development and sales prospecting.

### Method
GET

### Endpoint
/

### Parameters
#### Path Parameters
None

#### Query Parameters
- **domain** (string) - Required - The domain name to search for leads (e.g., 'somesite.com').
- **options** (object) - Optional - Additional options for the API request.

### Request Example
```javascript
const api = new LeadsAPI({ token: 'YOUR_TOKEN' });
api.getFromDomain('example.com').then(response => {
  console.log(response.leads);
});
```

### Response
#### Success Response (200)
- **leads** (array) - A list of lead objects found for the domain.

#### Response Example
```json
{
  "leads": [
    {
      "name": "John Doe",
      "email": "john.doe@example.com"
    }
  ]
}
```
```

--------------------------------

### Example GET Request with ScraperAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

An example of using the ScraperAPI to fetch data from Amazon. It logs the 'json' property of the response if the status code is 200.

```javascript
api.get('https://www.amazon.com/Halo-SleepSack-Swaddle-Triangle-Neutral/dp/B01LAG1TOS').then(response => {
  if (response.statusCode === 200) {
    console.log(response.json);
  }
}).catch(error => console.error(error));
```

--------------------------------

### Configuring Request Timeout

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/errors.md

Provides examples of setting custom timeout values for CrawlingAPI requests. Use longer timeouts for slow connections and shorter ones for quick requests.

```javascript
// Increase timeout for slow connections
const api = new CrawlingAPI({
  token: 'YOUR_TOKEN',
  timeout: 180000 // 3 minutes
});

// Use shorter timeouts for quick requests
const quickApi = new CrawlingAPI({
  token: 'YOUR_TOKEN',
  timeout: 30000 // 30 seconds
});
```

--------------------------------

### Override User-Agent Header

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md

Customize request headers by providing an options object to the API's get method. This example shows how to override the User-Agent string for a specific request.
```javascript
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });

api.get('https://example.com', {
  userAgent: 'Custom Bot/1.0' // Override User-Agent
});
```

--------------------------------

### GET Request for JavaScript-Rendered Pages

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Perform a GET request to scrape a URL that requires JavaScript rendering. Note that only GET requests are supported for JavaScript tokens.

```javascript
api.get('https://www.nfl.com').then(response => {
  if (response.statusCode === 200) {
    console.log(response.body);
  }
}).catch(error => console.error(error));
```

--------------------------------

### Instantiate CrawlingAPI with Options

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-base-api.md

Shows how to initialize a specialized API class like CrawlingAPI, including setting an authentication token and an optional custom timeout.

```javascript
const { CrawlingAPI } = require('crawlbase');

const api = new CrawlingAPI({
  token: 'YOUR_CRAWLBASE_TOKEN',
  timeout: 120000 // Optional: custom timeout
});
```

--------------------------------

### Initialize Multiple API Clients

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

Shows how to initialize instances of all major Crawlbase API clients. Each client requires an API token for authentication.

```javascript
const { CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI } = require('crawlbase');

const crawlingApi = new CrawlingAPI({ token: 'YOUR_TOKEN' });
const scraperApi = new ScraperAPI({ token: 'YOUR_TOKEN' });
const leadsApi = new LeadsAPI({ token: 'YOUR_TOKEN' });
const screenshotsApi = new ScreenshotsAPI({ token: 'YOUR_TOKEN' });
```

--------------------------------

### Perform a GET Request with CrawlingAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Make a GET request to scrape a URL. You can pass additional options as specified in the Crawlbase API documentation.
```javascript
api.get(url, options);
```

--------------------------------

### ScraperAPI GET Request

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/INDEX.md

Shows how to perform a GET request using the ScraperAPI, which is designed for extracting structured data. Note that POST/PUT requests are not supported.

```javascript
const { ScraperAPI } = require('crawlbase');
const api = new ScraperAPI({ token: 'YOUR_TOKEN' });

// GET only (POST/PUT throw errors)
api.get('https://www.amazon.com/product/123');
// Returns: { statusCode, body, json: { /* structured data */ } }
```

--------------------------------

### CrawlingAPI GET Request (JavaScript-rendered)

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

The CrawlingAPI supports GET requests for JavaScript-rendered content. You can specify a `pageWait` option to control the time the client waits for the page to render.

```APIDOC
## CrawlingAPI GET Request (JavaScript-rendered)

JavaScript-rendered content (React, Vue, Angular, etc.):

```javascript
// JavaScript-rendered content (React, Vue, Angular, etc.)
const jsApi = new CrawlingAPI({ token: 'JAVASCRIPT_TOKEN' });
jsApi.get('https://example.com', { pageWait: 5000 }).then(response => {
  console.log(response.body); // Rendered HTML
});
```
```

--------------------------------

### Initialize LeadsAPI with Token

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-leads-api.md

Instantiate the LeadsAPI client by providing your Crawlbase Leads API token during initialization. This is a required step before making any API calls.
```javascript
const { LeadsAPI } = require('crawlbase');

const api = new LeadsAPI({ token: 'YOUR_LEADS_API_TOKEN' });
```

--------------------------------

### ScraperAPI GET Method

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-scraper-api.md

Performs a GET request to scrape structured data from a given URL using predefined scraper templates. Supports optional parameters for customization.

```APIDOC
## `get(url, options)`

### Description
Performs a GET request using the Scraper API to extract structured data from a website using predefined scraper templates.

### Method
GET

### Endpoint
`/v1/scraper` (Implicit, as this is an SDK method that calls the API)

### Parameters
#### Path Parameters
None

#### Query Parameters
- **url** (string) - Required - The URL of the website to scrape.
- **options** (object) - Optional - Additional parameters for the Scraper API.
  - **deviceType** (string) - Optional - Specifies the device type ('mobile' or 'desktop').
  - **timeoutMs** (number) - Optional - Custom timeout in milliseconds for the request.

### Request Example
```javascript
const { ScraperAPI } = require('crawlbase');
const api = new ScraperAPI({ token: 'YOUR_TOKEN' });

// Scrape Amazon product page
api.get('https://www.amazon.com/Halo-SleepSack-Swaddle-Triangle-Neutral/dp/B01LAG1TOS')
  .then(response => {
    if (response.statusCode === 200) {
      console.log('Product data:', response.json);
    }
  })
  .catch(error => console.error('Scraping failed:', error));

// Scrape with additional options
api.get('https://example.com/product', {
  deviceType: 'mobile',
  timeoutMs: 60000
})
  .then(response => {
    if (response.statusCode === 200) {
      console.log(response.json);
    }
  })
  .catch(error => console.error(error));
```

### Response
#### Success Response (200)
- **statusCode** (number) - HTTP status code from Crawlbase.
- **body** (string) - Scraped data content.
- **json** (object) - Parsed JSON response containing structured data extracted by the scraper template.
- **originalStatus** (number) - Original website HTTP status code.
- **cbStatus** or **pcStatus** (number) - Crawlbase processing status code.
- **headers** (object) - Response headers.
- **url** (string) - The requested URL.

#### Response Example
```json
{
  "statusCode": 200,
  "body": "...",
  "json": {
    "title": "Example Product",
    "price": "$19.99",
    "availability": "In Stock"
  },
  "originalStatus": 200,
  "cbStatus": 1,
  "headers": { ... },
  "url": "https://example.com/product"
}
```

### Rejects
- **Error**: Request timeout or network error.
```

--------------------------------

### Scraper API GET Request

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-scraper-api.md

This snippet demonstrates how to use the Scraper API to make a GET request to a specified URL and handle the response, including checking for errors and extracting structured data.

```APIDOC
## GET /scraper

### Description
Sends a GET request to the specified URL to scrape website content and extract structured data using available scraper templates.

### Method
GET

### Endpoint
/scraper

### Parameters
#### Query Parameters
- **url** (string) - Required - The URL of the website to scrape.

### Request Example
```javascript
const { ScraperAPI } = require('crawlbase');
const api = new ScraperAPI({ token: 'YOUR_TOKEN' });

api.get('https://www.example.com');
```

### Response
#### Success Response (200)
- **statusCode** (number) - HTTP status code from Crawlbase API.
- **body** (string) - Response body content.
- **json** (object) - Structured data extracted by scraper template.
- **headers** (object) - Response headers (lowercase keys).
- **url** (string) - Requested URL.
- **originalStatus** (number) - Original website HTTP status code.
- **cbStatus** (number) - Crawlbase processing status code.
- **pcStatus** (number) - Crawlbase processing status code.

#### Response Example
```json
{
  "statusCode": 200,
  "body": "...",
  "json": {
    "title": "Example Product",
    "price": "$19.99"
  },
  "headers": {
    "content-type": "text/html"
  },
  "url": "https://www.example.com",
  "originalStatus": 200,
  "cbStatus": 1,
  "pcStatus": 1
}
```

### Error Handling
- **Non-200 Status Codes**: Indicate request failures. Check `cbStatus` for details.
- **Missing `json` field**: Indicates no structured data was extracted.
- **Network Errors**: Caught by the `.catch()` block in the example.
```

--------------------------------

### Crawling API GET Request

Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md

Performs a GET request to the Crawling API. This method is used to scrape a given URL with optional parameters. It's also used for JavaScript rendering when initialized with a JavaScript token.

```APIDOC
## GET /crawl

### Description
Performs a GET request to scrape a URL. Can be used for standard scraping or for websites requiring JavaScript rendering when initialized with a JavaScript token.

### Method
GET

### Endpoint
/

### Parameters
#### Path Parameters
None

#### Query Parameters
- **url** (string) - Required - The URL to scrape.
- **options** (object) - Optional - Additional options for the API request, such as `userAgent`, `format`, `page_wait` (for JS rendering).

### Request Example
```javascript
const api = new CrawlingAPI({ token: 'YOUR_TOKEN' });
api.get('https://www.example.com', { format: 'json' }).then(response => {
  console.log(response.body);
}).catch(error => console.error(error));
```

### Response
#### Success Response (200)
- **body** (string/object) - The scraped content of the page.
- **statusCode** (number) - The HTTP status code of the response.
- **originalStatus** (number) - The original status code from the target website.
- **cbStatus** (number) - The status code from Crawlbase.

#### Response Example
```json
{
  "body": "...",
  "statusCode": 200,
  "originalStatus": 200,
  "cbStatus": 200
}
```
```

--------------------------------

### ScraperAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

The ScraperAPI is designed to extract structured data from websites. It exclusively supports the GET method.

```APIDOC
## ScraperAPI

### Description
This API endpoint is used to extract structured data from websites. It only supports the GET method.

### Method
GET

### Endpoint
`https://api.crawlbase.com//scraper`
```

--------------------------------

### ScraperAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md

HTTP client for Crawlbase Scraper API, extending CrawlingAPI. It only allows GET requests.

```APIDOC
## ScraperAPI

### Description
HTTP client for Crawlbase Scraper API, extending CrawlingAPI. It only allows GET requests.

### Constructor
`new ScraperAPI({ token: string, timeout?: number })`

### Public Methods
#### get
- **Signature**: `get(url: string, options?: object)`
- **Returns**: `Promise`

### Overridden Methods
#### post
- **Signature**: `post()`
- **Behavior**: Throws `Error: Only GET is allowed for the ScraperAPI`

#### put
- **Signature**: `put()`
- **Behavior**: Throws `Error: Only GET is allowed for the ScraperAPI`
```

--------------------------------

### Instantiate BaseAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-base-api.md

Demonstrates how to create an instance of the BaseAPI class with a token. This class is intended for extension, not direct use.

```javascript
const api = new BaseAPI({ token: 'your_token_here' });
```

--------------------------------

### ScreenshotsAPI

Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md

The ScreenshotsAPI allows users to capture screenshots of websites. This API endpoint uses the GET method.
```APIDOC ## ScreenshotsAPI ### Description This API endpoint is used to capture screenshots of websites. It uses the GET method. ### Method GET ### Endpoint `https://api.crawlbase.com/screenshots` ``` -------------------------------- ### LeadsAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md The LeadsAPI focuses on extracting email addresses and contact information from websites. It uses the GET method. ```APIDOC ## LeadsAPI ### Description This API endpoint is used to extract email addresses and contact information from websites. It uses the GET method. ### Method GET ### Endpoint `https://api.crawlbase.com/leads` ``` -------------------------------- ### CrawlingAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md HTTP client for the Crawlbase Crawling API. It allows making GET, POST, and PUT requests. ```APIDOC ## CrawlingAPI ### Description HTTP client for the Crawlbase Crawling API. It allows making GET, POST, and PUT requests. ### Constructor `new CrawlingAPI({ token: string, timeout?: number })` ### Public Methods #### get - **Signature**: `get(url: string, options?: object)` - **Returns**: `Promise` #### post - **Signature**: `post(url: string, data: object | string, options?: object)` - **Returns**: `Promise` #### put - **Signature**: `put(url: string, data: object | string, options?: object)` - **Returns**: `Promise` ``` -------------------------------- ### LeadsAPI Constructor Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-leads-api.md Initializes the LeadsAPI client with your Crawlbase Leads API token and optional configuration. ```APIDOC ## Constructor Inherits from `BaseAPI`. Initialize with a Crawlbase Leads API token.
```javascript const api = new LeadsAPI({ token: 'YOUR_TOKEN' }); ``` **Parameters:** | Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | options | object | yes | — | Configuration object | | options.token | string | yes | — | Crawlbase Leads API token | | options.timeout | number | no | 90000 | Request timeout in milliseconds | **Throws:** | Error Type | Condition | |-----------|-----------| | Error | Token is undefined, null, or empty string | ``` -------------------------------- ### Initialize LeadsAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md Instantiate the LeadsAPI with your Leads API token. This API is designed for extracting business lead information. ```javascript const api = new LeadsAPI({ token: 'YOUR_TOKEN' }); ``` -------------------------------- ### Scraper API Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/SUMMARY.txt The Scraper API is designed for extracting structured data from web pages and only supports GET requests. ```APIDOC ## Scraper API ### Description This API is used to scrape structured data from web pages. It exclusively supports GET requests. ### Method GET ### Endpoint `/v1/scraper` ### Parameters #### Query Parameters - **token** (string) - Required - Your Crawlbase API token. - **url** (string) - Required - The URL of the page to scrape. ### Request Example ```json { "token": "YOUR_API_TOKEN", "url": "https://example.com/data" } ``` ### Response #### Success Response (200) - **json** (object) - The structured data extracted from the page. #### Response Example ```json { "json": { "data_field": "value" } } ``` ``` -------------------------------- ### ScraperAPI Method Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/INDEX.md The ScraperAPI is designed for extracting structured data from web pages, primarily supporting GET requests. 
```APIDOC ## ScraperAPI ### Description Specialized for extracting structured data from web pages. ### Method #### `get(url)` Performs an HTTP GET request to scrape structured data. - **url** (string) - The URL of the page to scrape. ### Returns An object containing: - **statusCode** (number) - The HTTP status code of the response. - **body** (string) - The raw response body. - **json** (object) - The parsed JSON data if the response is JSON. ### Request Example ```javascript const { ScraperAPI } = require('crawlbase'); const api = new ScraperAPI({ token: 'YOUR_TOKEN' }); api.get('https://www.amazon.com/product/123'); ``` ``` -------------------------------- ### Environment-Based Configuration Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md Shows how to configure the CrawlingAPI using environment variables for the token and timeout, falling back to the default timeout of 90000 ms when `CRAWLBASE_TIMEOUT` is not set. ```javascript require('dotenv').config(); const { CrawlingAPI } = require('crawlbase'); const config = { token: process.env.CRAWLBASE_TOKEN, timeout: parseInt(process.env.CRAWLBASE_TIMEOUT || '90000', 10) }; const api = new CrawlingAPI(config); ``` -------------------------------- ### Constructor Options Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md All API classes accept the same constructor options, including a required API token and an optional timeout in milliseconds. ```APIDOC ## Constructor Options All API classes accept the same constructor options: ```typescript interface BaseAPIOptions { token: string; // Required: API authentication token timeout?: number; // Optional: request timeout in milliseconds (default: 90000) } ``` ``` -------------------------------- ### LeadsAPI Constructor Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md Instantiate the LeadsAPI client. Requires an API token and an optional timeout value.
```javascript new LeadsAPI({ token: string, timeout?: number }) ``` -------------------------------- ### API Classes Initialization Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md All API clients inherit from BaseAPI and follow the same initialization pattern. You can instantiate CrawlingAPI, ScraperAPI, LeadsAPI, and ScreenshotsAPI with an API token. ```APIDOC ## API Classes Initialization All API clients inherit from `BaseAPI` and follow the same initialization pattern: ```javascript const { CrawlingAPI, ScraperAPI, LeadsAPI, ScreenshotsAPI } = require('crawlbase'); const crawlingApi = new CrawlingAPI({ token: 'YOUR_TOKEN' }); const scraperApi = new ScraperAPI({ token: 'YOUR_TOKEN' }); const leadsApi = new LeadsAPI({ token: 'YOUR_TOKEN' }); const screenshotsApi = new ScreenshotsAPI({ token: 'YOUR_TOKEN' }); ``` ``` -------------------------------- ### ScreenshotsAPI Constructor Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md Instantiate the ScreenshotsAPI client. Requires an API token and accepts an optional timeout. ```javascript new ScreenshotsAPI({ token: string, timeout?: number }) ``` -------------------------------- ### ScraperAPI Public Methods Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md The ScraperAPI primarily supports GET requests. Other methods like POST and PUT are overridden to disallow them. ```javascript get(url: string, options?: object) ``` -------------------------------- ### ScreenshotsAPI GET Request Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/INDEX.md Demonstrates capturing a screenshot of a webpage using the ScreenshotsAPI. You can specify device types like 'mobile'. 
```javascript const { ScreenshotsAPI } = require('crawlbase'); const api = new ScreenshotsAPI({ token: 'YOUR_TOKEN' }); api.get('https://example.com', { device: 'mobile' }); // Resolves to: { statusCode, body: /* binary image data */ } ``` -------------------------------- ### Initialize ScraperAPI with Token Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-scraper-api.md Initialize the ScraperAPI client with your Crawlbase API token. Ensure the token is provided to avoid errors. ```javascript const { ScraperAPI } = require('crawlbase'); const api = new ScraperAPI({ token: 'YOUR_SCRAPER_API_TOKEN' }); ``` -------------------------------- ### Synchronous Error Handling for ScraperAPI Method Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md Demonstrates how the ScraperAPI throws an error if a POST or PUT request is attempted, as only GET is allowed. ```javascript try { scraperApi.post('https://example.com', { data: 'test' }); // Throws } catch (error) { console.error(error.message); // "Only GET is allowed for the ScraperAPI" } ``` -------------------------------- ### Initialize CrawlingAPI with Environment Variables Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md Load API token and timeout from environment variables for secure and flexible configuration. Ensure your environment variables are set before running the script. ```javascript require('dotenv').config(); const { CrawlingAPI } = require('crawlbase'); const api = new CrawlingAPI({ token: process.env.CRAWLBASE_TOKEN || process.env.CB_TOKEN, timeout: parseInt(process.env.CRAWLBASE_TIMEOUT || '90000', 10) }); ``` -------------------------------- ### CrawlingAPI Public Methods Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md Methods available on the CrawlingAPI class for making HTTP requests. These include GET, POST, and PUT operations.
```javascript get(url: string, options?: object) ``` ```javascript post(url: string, data: object | string, options?: object) ``` ```javascript put(url: string, data: object | string, options?: object) ``` -------------------------------- ### Initialize ScreenshotsAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-screenshots-api.md Initialize the ScreenshotsAPI client with your Crawlbase API token. Ensure the token is valid to avoid errors. ```javascript const { ScreenshotsAPI } = require('crawlbase'); const api = new ScreenshotsAPI({ token: 'YOUR_SCREENSHOTS_API_TOKEN' }); ``` -------------------------------- ### Initialize CrawlingAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md Instantiate the CrawlingAPI with your Crawlbase account token. This token can be a normal or a javascript token, depending on the scraping needs. ```javascript const api = new CrawlingAPI({ token: 'YOUR_TOKEN' }); ``` -------------------------------- ### Setting a Custom User-Agent Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md Demonstrates how to specify a custom user-agent string for a specific API request. ```javascript api.get('https://example.com', { userAgent: 'Custom Bot/1.0' }); ``` -------------------------------- ### ScraperAPI Initialization Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-scraper-api.md Initialize the ScraperAPI client with your Crawlbase API token and optional configuration like timeout. ```APIDOC ## ScraperAPI Constructor ### Description Initializes the ScraperAPI client. Requires a Crawlbase API token. ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Parameters - **options** (object) - Required - Configuration object. - **options.token** (string) - Required - Your Crawlbase Scraper API token. - **options.timeout** (number) - Optional - Request timeout in milliseconds. 
Defaults to 90000. ### Throws - **Error**: If the token is undefined, null, or an empty string. ### Request Example ```javascript const { ScraperAPI } = require('crawlbase'); const api = new ScraperAPI({ token: 'YOUR_SCRAPER_API_TOKEN' }); ``` ``` -------------------------------- ### CrawlingAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md The main CrawlingAPI allows for general website scraping and crawling. It supports GET, POST, and PUT methods for various crawling needs. ```APIDOC ## CrawlingAPI ### Description This API endpoint is used for general website scraping and crawling. It supports multiple HTTP methods to facilitate various crawling tasks. ### Method GET, POST, PUT ### Endpoint `https://api.crawlbase.com/` ``` -------------------------------- ### Handle Unsupported POST Request in ScraperAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-scraper-api.md Attempting to use the post method with ScraperAPI will always result in an error, as only GET requests are supported. ```javascript const api = new ScraperAPI({ token: 'YOUR_TOKEN' }); try { api.post('https://example.com', { data: 'test' }); } catch (error) { console.error(error.message); // "Only GET is allowed for the ScraperAPI" } ``` -------------------------------- ### Usage of ScreenshotResponse Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/types.md Shows how to use the response from ScreenshotsAPI.get() to save the binary image data to a file. The snippet assumes an async context and an existing `screenshotsApi` instance. ```javascript const fs = require('fs'); const response = await screenshotsApi.get('https://example.com'); // Write binary image data to file fs.writeFileSync('screenshot.jpg', response.body, { encoding: 'binary' }); ``` -------------------------------- ### Basic Crawling API Usage with TypeScript Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/README.md Demonstrates basic usage of the Crawling API with TypeScript, including type definitions for responses.
Ensure you have the necessary types imported. ```typescript import { CrawlingAPI, APIResponse } from 'crawlbase'; const api = new CrawlingAPI({ token: 'YOUR_TOKEN' }); const response: APIResponse = await api.get('https://example.com'); if (response.statusCode === 200) { console.log(response.body); } ``` -------------------------------- ### Extract Leads with Options Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-leads-api.md Call getFromDomain with an optional second argument to pass additional parameters to the API request. This allows for further customization of the lead extraction process. ```javascript const { LeadsAPI } = require('crawlbase'); const api = new LeadsAPI({ token: 'YOUR_TOKEN' }); // Extract leads with options api.getFromDomain('example.com', { // Additional API parameters if supported }) .then(response => { if (response.statusCode === 200 && response.leads && response.leads.length > 0) { response.leads.forEach(lead => { console.log(`Email: ${lead.email}, Name: ${lead.name}`); }); } }) .catch(error => console.error(error)); ``` -------------------------------- ### Default Configuration Values in Code Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md Illustrates the default configuration object structure, including request timeout and POST content type, as defined in `src/config.js`. ```javascript const config = { defaults: { timeout: 90000, postContentType: 'application/x-www-form-urlencoded', }, test: { normalToken: '', javascriptToken: '', }, }; ``` -------------------------------- ### Catching ScraperAPI PUT Not Allowed Error Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/errors.md Demonstrates how to catch an error when attempting to use the PUT method on a ScraperAPI instance, which only supports GET requests. 
```javascript const { ScraperAPI } = require('crawlbase'); const api = new ScraperAPI({ token: 'YOUR_TOKEN' }); try { api.put('https://example.com', { data: 'test' }); } catch (error) { if (error.message.includes('Only GET is allowed')) { console.error('ScraperAPI does not support PUT requests'); } } ``` -------------------------------- ### Catching Missing Token Error Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/errors.md Demonstrates how to catch an initialization error when the API token is missing or invalid. Ensure the token option is provided in the constructor. ```javascript const { CrawlingAPI } = require('crawlbase'); try { const api = new CrawlingAPI({ token: '' }); } catch (error) { if (error.message.includes('Token is required')) { console.error('Invalid configuration: token missing'); } } ``` -------------------------------- ### Catching ScraperAPI POST Not Allowed Error Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/errors.md Illustrates how to catch an error when attempting to use the POST method on a ScraperAPI instance, which only supports GET requests. ```javascript const { ScraperAPI } = require('crawlbase'); const api = new ScraperAPI({ token: 'YOUR_TOKEN' }); try { api.post('https://example.com', { data: 'test' }); } catch (error) { if (error.message.includes('Only GET is allowed')) { console.error('ScraperAPI does not support POST requests'); } } ``` -------------------------------- ### Constructor with Default Timeout Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/configuration.md Instantiate the CrawlingAPI with a token and the default 90-second timeout. 
```javascript const { CrawlingAPI } = require('crawlbase'); // Default timeout (90 seconds) const api = new CrawlingAPI({ token: 'YOUR_TOKEN' }); ``` -------------------------------- ### Initialize CrawlingAPI Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/api-reference-crawling-api.md Initialize the CrawlingAPI with your Crawlbase API token. Ensure the token is provided to enable authentication and access to the API. ```javascript const api = new CrawlingAPI({ token: 'YOUR_TOKEN' }); ``` ```javascript const { CrawlingAPI } = require('crawlbase'); const api = new CrawlingAPI({ token: 'YOUR_CRAWLBASE_TOKEN' }); ``` -------------------------------- ### Import Main API Classes Source: https://github.com/crawlbase/crawlbase-node/blob/main/_autodocs/module-exports.md Import the primary API classes from the Crawlbase Node client library. This includes aliases for backward compatibility. ```javascript const { CrawlingAPI, CrawlbaseAPI, // Backward-compatible alias for CrawlingAPI ScraperAPI, LeadsAPI, ScreenshotsAPI } = require('crawlbase'); ``` -------------------------------- ### Use Screenshots API with Parameters Source: https://github.com/crawlbase/crawlbase-node/blob/main/README.md Capture a screenshot with specific parameters, such as device type. The response body is then saved to a file. ```javascript const fs = require('fs'); // Example with parameters api.get('https://www.amazon.com', { device: 'mobile' }).then(response => { fs.writeFileSync('amazon-mobile.jpg', response.body, { encoding: 'binary' }); }).catch(error => console.error(error)); ```
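The screenshot examples above write `response.body` to disk without checking whether the request succeeded. A minimal sketch of a wrapper that checks `statusCode` first is shown below. `saveScreenshot` is a hypothetical helper name, not part of the crawlbase package, and the sketch assumes only what the examples above show: a client whose `get(url, options)` returns a Promise resolving to `{ statusCode, body }`.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of the crawlbase package).
// `client` is assumed to expose get(url, options) returning a Promise that
// resolves to { statusCode, body }, as in the ScreenshotsAPI examples above.
async function saveScreenshot(client, url, filePath, options = {}) {
  const response = await client.get(url, options);

  // Refuse to write a file when the request did not succeed.
  if (response.statusCode !== 200) {
    throw new Error(`Screenshot request failed with status ${response.statusCode}`);
  }

  // Same write call as the README example: body holds binary image data.
  fs.writeFileSync(filePath, response.body, { encoding: 'binary' });
  return filePath;
}

// Usage (assumes the crawlbase package is installed and the token is valid):
// const { ScreenshotsAPI } = require('crawlbase');
// const api = new ScreenshotsAPI({ token: 'YOUR_TOKEN' });
// saveScreenshot(api, 'https://www.amazon.com', 'amazon-mobile.jpg', { device: 'mobile' })
//   .then(file => console.log(`Saved ${file}`))
//   .catch(error => console.error(error));
```

Because the helper takes the client as an argument, it can be exercised with a stub object in tests without making network calls.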