### Start a Crawlee project with CLI

Source: https://crawlee.dev/js/docs/quick-start

Navigate to your project directory and start the crawler using the npm start script.

```bash
cd my-crawler && npm start
```

--------------------------------

### Start the crawler project

Source: https://crawlee.dev/js/docs/3.16/quick-start

Navigate to the created project directory and start the crawler using npm.

```bash
cd my-crawler && npm start
```

--------------------------------

### Install @apify/tsconfig

Source: https://crawlee.dev/js/docs/3.16/guides/typescript-project

Install the recommended TypeScript configuration preset from Apify.

```bash
npm install --save-dev @apify/tsconfig
```

--------------------------------

### Install Impit HTTP Client Package

Source: https://crawlee.dev/js/docs/3.16/guides/http-clients

Install the necessary package for using the `ImpitHttpClient` with Crawlee.

```bash
npm i @crawlee/impit-client
```

--------------------------------

### Example Dockerfile for Node.js/JavaScript Actor

Source: https://crawlee.dev/js/docs/3.16/guides/docker-images

A standard Dockerfile for Node.js actors. It optimizes build times by copying package files first and installs only necessary dependencies.

```dockerfile
# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:20

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" && node --version \
    && echo "NPM version:" && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Run the image.
CMD npm start --silent
```

--------------------------------

### Basic HTTP Crawler Setup and Run

Source: https://crawlee.dev/js/docs/examples/http-crawler

Sets up and runs an HttpCrawler with basic configurations for concurrency, retries, timeouts, and request limits. It defines handlers for processing successful requests and handling failed ones, then initiates the crawl with a list of starting URLs.

```javascript
import { HttpCrawler, log, LogLevel } from 'crawlee';

log.setLevel(LogLevel.DEBUG);

const crawler = new HttpCrawler({
    minConcurrency: 10,
    maxConcurrency: 50,
    maxRequestRetries: 1,
    requestHandlerTimeoutSecs: 30,
    maxRequestsPerCrawl: 10,
    async requestHandler({ pushData, request, body }) {
        log.debug(`Processing ${request.url}...`);
        await pushData({
            url: request.url,
            body,
        });
    },
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed twice.`);
    },
});

await crawler.run(['https://crawlee.dev']);

log.debug('Crawler finished.');
```

--------------------------------

### Quick Start: Using Proxy URLs

Source: https://crawlee.dev/js/docs/guides/proxy-management

Initialize ProxyConfiguration with a list of proxy URLs to start using them immediately. Crawlee will rotate through these proxies.
```javascript
import { ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.com',
        'http://proxy-2.com',
    ]
});
const proxyUrl = await proxyConfiguration.newUrl();
```

--------------------------------

### Install Apify CLI and Log In

Source: https://crawlee.dev/js/docs/deployment/apify-platform

Install the Apify CLI globally and log in to your Apify account using your API token. This is a prerequisite for using the CLI to manage your Apify resources.

```bash
npm install -g apify-cli
apify login -t YOUR_API_TOKEN
```

--------------------------------

### Quick Start with Proxy URLs

Source: https://crawlee.dev/js/docs/3.16/guides/proxy-management

Initialize ProxyConfiguration with a list of proxy URLs to enable automatic rotation. This is a quick way to start using your own proxy servers.

```javascript
import { ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.com',
        'http://proxy-2.com',
    ]
});
const proxyUrl = await proxyConfiguration.newUrl();
```

--------------------------------

### Basic Dockerfile for Crawlee with Playwright

Source: https://crawlee.dev/js/docs/guides/docker-images

This Dockerfile installs NPM dependencies, copies project files, and sets the start command. It's suitable for projects using Playwright.

```docker
# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node-playwright-chrome:20

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" && node --version \
    && echo "NPM version:" && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY --chown=myuser . .

# Run the image.
CMD npm run start:prod --silent
```

--------------------------------

### Install SDK v1 with Puppeteer

Source: https://crawlee.dev/js/docs/upgrading/upgrading-to-v1

Install Apify SDK v1 along with the Puppeteer package for browser automation.

```bash
npm install apify puppeteer
```

--------------------------------

### Basic CheerioCrawler Setup

Source: https://crawlee.dev/js/docs/3.16/introduction/first-crawler

Demonstrates the fundamental setup of a CheerioCrawler, including importing necessary classes, opening a RequestQueue, adding an initial request, and defining a request handler to extract page titles.

```javascript
import { RequestQueue, CheerioCrawler } from 'crawlee';

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://crawlee.dev' });

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ $, request }) {
        const title = $('title').text();
        console.log(`The title of "${request.url}" is: ${title}.`);
    }
});

await crawler.run();
```

--------------------------------

### Install @sparticuz/chromium and Zip Dependencies

Source: https://crawlee.dev/js/docs/deployment/aws-browsers

Install the @sparticuz/chromium package and zip the node_modules folder for use as a Lambda Layer.
```bash
# Install the package
npm i -S @sparticuz/chromium

# Zip the dependencies
zip -r dependencies.zip ./node_modules
```

--------------------------------

### Install Apify CLI

Source: https://crawlee.dev/js/docs/3.16/deployment/apify-platform

Install the Apify CLI globally to manage your Apify account and projects from your local machine.

```bash
npm install -g apify-cli
```

--------------------------------

### Install SDK v1 with Playwright

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v1

Install the SDK v1 and the Playwright package to leverage Playwright's browser automation capabilities.

```bash
npm install apify playwright
```

--------------------------------

### Install Crawlee Meta-Package

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v3

Install the main `crawlee` package which re-exports most of the `@crawlee/*` packages, including all crawler classes.

```bash
npm install crawlee
```

--------------------------------

### Install SDK v1 with Puppeteer

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v1

Install the SDK v1 along with the Puppeteer package to maintain compatibility with previous versions.

```bash
npm install apify puppeteer
```

--------------------------------

### Install SDK v1 with Playwright

Source: https://crawlee.dev/js/docs/upgrading/upgrading-to-v1

Install Apify SDK v1 along with the Playwright package for browser automation.

```bash
npm install apify playwright
```

--------------------------------

### Install Crawlee with Playwright

Source: https://crawlee.dev/js/docs/quick-start

Install Crawlee along with Playwright for headless browser automation. Playwright is installed separately to reduce the core library size.

```bash
npm install crawlee playwright
```

--------------------------------

### Install TypeScript Compiler

Source: https://crawlee.dev/js/docs/3.16/guides/typescript-project

Install the TypeScript compiler as a development dependency using npm.
```bash
npm install --save-dev typescript
```

--------------------------------

### Install Specific Crawlee Package (Cheerio)

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v3

Install only the `@crawlee/cheerio` package if you only need Cheerio support, reducing the number of dependencies.

```bash
npm install @crawlee/cheerio
```

--------------------------------

### Install Crawlee and Playwright

Source: https://crawlee.dev/js/docs/3.16/quick-start

Install Crawlee along with Playwright for browser automation. Playwright is not bundled with Crawlee.

```bash
npm install crawlee playwright
```

--------------------------------

### Install Apify CLI Globally

Source: https://crawlee.dev/js/docs/3.16/introduction/deployment

Install the Apify CLI globally to manage authentication and deployment for all your Crawlee/Apify projects.

```bash
npm install -g apify-cli
```

--------------------------------

### Install Crawlee and Puppeteer

Source: https://crawlee.dev/js/docs/3.16/quick-start

Install Crawlee along with Puppeteer for browser automation. Puppeteer is not bundled with Crawlee.

```bash
npm install crawlee puppeteer
```

--------------------------------

### Install Node.js Type Declarations

Source: https://crawlee.dev/js/docs/3.16/guides/typescript-project

Install type declarations for Node.js to enable type-checking for Node.js features.

```bash
npm install --save-dev @types/node
```

--------------------------------

### Handle Start Page with Crawlee.js

Source: https://crawlee.dev/js/docs/introduction/scraping

Enqueues category links from the initial start page. This is the entry point for navigating the website structure.

```javascript
} else {
    // This means we're on the start page, with no label.
    // On this page, we just want to enqueue all the category pages.

    await page.waitForSelector('.collection-block-item');
    await enqueueLinks({
        selector: '.collection-block-item',
        label: 'CATEGORY',
    });
}
```

--------------------------------

### Install Apify SDK

Source: https://crawlee.dev/js/docs/3.16/introduction/deployment

Install the Apify SDK as a dependency for your Node.js project to interact with Apify Platform's cloud products like RequestQueue and Dataset.

```bash
npm install apify
```

--------------------------------

### Development Start Script with ts-node-esm

Source: https://crawlee.dev/js/docs/3.16/guides/typescript-project

Configure an NPM script to start the development server using ts-node-esm, with the --transpileOnly flag for faster compilation.

```json
{
    "scripts": {
        "start:dev": "ts-node-esm -T src/main.ts"
    }
}
```

--------------------------------

### Basic PlaywrightCrawler Setup

Source: https://crawlee.dev/js/docs/3.16/deployment/gcp-browsers

Initial setup for a PlaywrightCrawler with a router and configuration. This is a foundational step before integrating with an HTTP server for Cloud Run deployment.

```javascript
import { Configuration, PlaywrightCrawler } from 'crawlee';
import { router } from './routes.js';

const startUrls = ['https://crawlee.dev'];

const crawler = new PlaywrightCrawler({
    requestHandler: router,
}, new Configuration({
    persistStorage: false,
}));

await crawler.run(startUrls);
```

--------------------------------

### Basic StagehandCrawler Example

Source: https://crawlee.dev/js/docs/3.16/guides/stagehand-crawler-guide

This example demonstrates how to initialize and run a StagehandCrawler to extract data from a website. It shows how to configure the crawler with AI model options and implement a request handler for extracting page titles, interacting with navigation, and gathering structured data.
```typescript
import { StagehandCrawler } from '@crawlee/stagehand';
import { z } from 'zod';

const crawler = new StagehandCrawler({
    stagehandOptions: {
        env: 'LOCAL',
        model: 'openai/gpt-4.1-mini',
        verbose: 1,
    },
    async requestHandler({ page, request, log, pushData }) {
        log.info(`Processing ${request.url}`);

        // Use AI to extract the page title
        const title = await page.extract('Get the main heading of the page', z.string());

        // Use AI to click on a navigation element
        await page.act('Click on the Documentation link');

        // Extract structured data after navigation
        const navItems = await page.extract('Get all sidebar navigation items', z.array(z.string()));

        log.info(`Found ${navItems.length} navigation items`);

        await pushData({
            url: request.url,
            title,
            navItems,
        });
    },
});

await crawler.run(['https://crawlee.dev']);
```

--------------------------------

### Puppeteer Recursive Crawl Example

Source: https://crawlee.dev/js/docs/3.16/examples/puppeteer-recursive-crawl

Use this snippet to perform a recursive crawl of a website with PuppeteerCrawler. It starts by adding initial requests and then recursively crawls links matching a glob pattern. Ensure you have Puppeteer installed.

```javascript
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.url}: ${title}`);

        await enqueueLinks({
            globs: ['http?(s)://www.iana.org/**'],
        });
    },
    maxRequestsPerCrawl: 10,
});

await crawler.addRequests(['https://www.iana.org/']);

await crawler.run();
```

--------------------------------

### Complete Crawlee Example with Data Saving

Source: https://crawlee.dev/js/docs/3.16/introduction/saving-data

A full PlaywrightCrawler example that scrapes product details from a website and saves them using `Dataset.pushData()`. It handles pagination and different page types (start, category, detail).
```javascript
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ page, request, enqueueLinks }) => {
        console.log(`Processing: ${request.url}`);
        if (request.label === 'DETAIL') {
            const urlPart = request.url.split('/').slice(-1); // ['sennheiser-mke-440-professional-stereo-shotgun-microphone-mke-440']
            const manufacturer = urlPart[0].split('-')[0]; // 'sennheiser'

            const title = await page.locator('.product-meta h1').textContent();
            const sku = await page.locator('span.product-meta__sku-number').textContent();

            const priceElement = page
                .locator('span.price')
                .filter({
                    hasText: '$',
                })
                .first();

            const currentPriceString = await priceElement.textContent();
            const rawPrice = currentPriceString?.split('$')[1];
            const price = Number(rawPrice?.replaceAll(',', ''));

            const inStockElement = page
                .locator('span.product-form__inventory')
                .filter({
                    hasText: 'In stock',
                })
                .first();

            const inStock = (await inStockElement.count()) > 0;

            const results = {
                url: request.url,
                manufacturer,
                title,
                sku,
                currentPrice: price,
                availableInStock: inStock,
            };

            await Dataset.pushData(results);
        } else if (request.label === 'CATEGORY') {
            // We are now on a category page. We can use this to paginate through and enqueue all products,
            // as well as any subsequent pages we find

            await page.waitForSelector('.product-item > a');
            await enqueueLinks({
                selector: '.product-item > a',
                label: 'DETAIL', // <= note the different label
            });

            // Now we need to find the "Next" button and enqueue the next page of results (if it exists)
            const nextButton = await page.$('a.pagination__next');
            if (nextButton) {
                await enqueueLinks({
                    selector: 'a.pagination__next',
                    label: 'CATEGORY', // <= note the same label
                });
            }
        } else {
            // This means we're on the start page, with no label.
            // On this page, we just want to enqueue all the category pages.

            await page.waitForSelector('.collection-block-item');
            await enqueueLinks({
                selector: '.collection-block-item',
                label: 'CATEGORY',
            });
        }
    },

    // Let's limit our crawls to make our tests shorter and safer.
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections']);
```

--------------------------------

### Faster Request Addition with CheerioCrawler

Source: https://crawlee.dev/js/docs/3.16/introduction/first-crawler

Shows a more concise way to start a CheerioCrawler by passing URLs directly to the crawler.run() method, simplifying the setup by implicitly managing the RequestQueue.

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        const title = $('title').text();
        console.log(`The title of "${request.url}" is: ${title}.`);
    }
});

await crawler.run(['https://crawlee.dev']);
```

--------------------------------

### Crawl Single URL with got-scraping

Source: https://crawlee.dev/js/docs/3.16/examples/crawl-single-url

Use this snippet to fetch the HTML of a web page using the got-scraping package. Ensure the 'got-scraping' package is installed. The URL is hard-coded in this example.

```javascript
import { gotScraping } from 'got-scraping';

// Get the HTML of a web page
const { body } = await gotScraping({ url: 'https://www.example.com' });
console.log(body);
```

--------------------------------

### Initialize Project with Apify CLI

Source: https://crawlee.dev/js/docs/introduction/deployment

Use this command to initialize your project for Apify. It creates a .actor folder and an actor.json file for platform configuration.

```bash
apify init
```

--------------------------------

### Crawler Setup with Router

Source: https://crawlee.dev/js/docs/3.16/introduction/refactoring

Sets up a PlaywrightCrawler using a router instance for request handling. This replaces traditional if-clause logic for better organization. Ensure Crawlee is installed and imported.
```javascript
import { PlaywrightCrawler, log } from 'crawlee';
import { router } from './routes.mjs';

log.setLevel(log.LEVELS.DEBUG);

log.debug('Setting up crawler.');
const crawler = new PlaywrightCrawler({
    requestHandler: router,
});

await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections']);
```

--------------------------------

### Basic HTTP Server Setup

Source: https://crawlee.dev/js/docs/guides/running-in-web-server

Sets up a basic Node.js HTTP server that listens on port 3000 and logs incoming requests. This serves as the foundation for handling web requests.

```typescript
import { createServer } from 'http';
import { log } from 'crawlee';

const server = createServer(async (req, res) => {
    log.info(`Request received: ${req.method} ${req.url}`);

    res.writeHead(200, { 'Content-Type': 'text/plain' });
    // We will return the page title here later instead
    res.end('Hello World\n');
});

server.listen(3000, () => {
    log.info('Server is listening for user requests');
});
```

--------------------------------

### PuppeteerCrawler Recursive Crawl Example

Source: https://crawlee.dev/js/docs/examples/puppeteer-recursive-crawl

Sets up and runs a PuppeteerCrawler to recursively crawl a website starting from a given URL. It enqueues links matching a specific glob pattern and limits the number of requests.

```javascript
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.url}: ${title}`);

        await enqueueLinks({
            globs: ['http?(s)://www.iana.org/**'],
        });
    },
    maxRequestsPerCrawl: 10,
});

await crawler.addRequests(['https://www.iana.org/']);

await crawler.run();
```

--------------------------------

### Puppeteer Crawler Setup and Execution

Source: https://crawlee.dev/js/docs/3.16/examples/puppeteer-crawler

Sets up and runs a PuppeteerCrawler to scrape Hacker News. It configures crawler options, defines request and failure handlers, and starts the crawl from a given URL. Results are stored in the default dataset. The snippet uses a TypeScript type annotation, so it is fenced as TypeScript.

```typescript
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    launchContext: {
        launchOptions: {
            headless: true,
        },
    },
    maxRequestsPerCrawl: 50,
    async requestHandler({ pushData, request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        // Runs in the browser context and collects one record per post row.
        const data = await page.$$eval('.athing', ($posts) => {
            const scrapedData: { title?: string; rank?: string; href?: string }[] = [];

            $posts.forEach(($post) => {
                scrapedData.push({
                    title: $post.querySelector('.title a')?.innerText,
                    rank: $post.querySelector('.rank')?.innerText,
                    href: $post.querySelector('.title a')?.href,
                });
            });

            return scrapedData;
        });

        await pushData(data);

        // Follow the "More" link to the next page, if there is one.
        const infos = await enqueueLinks({
            selector: '.morelink',
        });

        if (infos.processedRequests.length === 0) log.info(`${request.url} is the last page!`);
    },
    failedRequestHandler({ request, log }) {
        log.error(`Request ${request.url} failed too many times.`);
    },
});

await crawler.addRequests(['https://news.ycombinator.com/']);

await crawler.run();

console.log('Crawler finished.');
```

--------------------------------

### Migrating to Actor.init() and Actor.exit()

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v3

Shows the equivalent of using `Actor.main()` by directly calling `Actor.init()` and `Actor.exit()`. This pattern is useful when you need more control over the initialization and exit process.

```javascript
import { Actor } from 'apify';

await Actor.init();
// your code
await Actor.exit('Crawling finished!');
```

--------------------------------

### Publish Project to Apify Platform

Source: https://crawlee.dev/js/docs/introduction/deployment

Run this command to package your project, upload it to Apify, and start a Docker build. You will receive a link to your new Actor upon completion.
```bash
apify push
```

--------------------------------

### Basic HTTP Server Setup

Source: https://crawlee.dev/js/docs/3.16/guides/running-in-web-server

Sets up a basic Node.js HTTP server using the built-in 'http' module. This server listens for incoming requests and logs them. It's the foundation for handling web requests that will be passed to the crawler.

```javascript
import { createServer } from 'http';
import { log } from 'crawlee';

const server = createServer(async (req, res) => {
    log.info(`Request received: ${req.method} ${req.url}`);

    res.writeHead(200, { 'Content-Type': 'text/plain' });
    // We will return the page title here later instead
    res.end('Hello World\n');
});

server.listen(3000, () => {
    log.info('Server is listening for user requests');
});
```

--------------------------------

### Using Actor.main() for Initialization and Exit

Source: https://crawlee.dev/js/docs/3.16/upgrading/upgrading-to-v3

Illustrates the simplified approach to managing crawler lifecycle using `Actor.main()`, which handles initialization and exit automatically. This is the recommended way for most use cases.

```javascript
import { Actor } from 'apify';

await Actor.main(async () => {
    // your code
}, { statusMessage: 'Crawling finished!' });
```

--------------------------------

### Creating a Custom HTTP Client

Source: https://crawlee.dev/js/docs/guides/custom-http-client

Demonstrates how to instantiate a custom HTTP client using the `BasicCrawler` and providing a custom `request` function.
```javascript
// ES module import: top-level `await` below is not available in CommonJS.
import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    request: async ({ url, method, headers, body, response, options }) => {
        // Custom logic here
        console.log(`Requesting ${url}`);

        // You can use a library like 'axios' or 'node-fetch' here
        // For example:
        // import axios from 'axios';
        // const response = await axios.request({ url, method, headers, data: body, ...options });
        // return response.data; // Or the full response object

        // For simplicity, we'll just return a placeholder response
        return {
            body: `Hello from custom client for ${url}`,
            statusCode: 200,
            headers: { 'content-type': 'text/html' },
        };
    },
});

await crawler.run(['http://example.com']);
```

--------------------------------

### Initialize Project for Apify

Source: https://crawlee.dev/js/docs/3.16/introduction/deployment

Initialize your Crawlee project for deployment on the Apify Platform using the Apify CLI. This creates an `.actor` folder with `actor.json` for platform-specific configuration.

```bash
apify init
```

--------------------------------

### HTML Link Example

Source: https://crawlee.dev/js/docs/3.16/introduction/adding-urls

Example of an anchor `<a>` element with an `href` attribute, which is the default for finding links.

```html
<a href="https://crawlee.dev/js/docs/introduction">This is a link to Crawlee introduction</a>
```

--------------------------------

### Crawl Output Example

Source: https://crawlee.dev/js/docs/3.16/introduction/setting-up

Example log messages observed in the terminal during a crawl of the Crawlee website.

```log
INFO PlaywrightCrawler: Starting the crawl
INFO PlaywrightCrawler: Title of https://crawlee.dev/ is 'Crawlee · Build reliable crawlers. Fast. | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/js/docs/examples is 'Examples | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/js/api/core is '@crawlee/core | API | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/js/api/core/changelog is 'Changelog | API | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/js/docs/quick-start is 'Quick Start | Crawlee'
```

--------------------------------

### Use Pre-release Node.js Version

Source: https://crawlee.dev/js/docs/3.16/guides/docker-images

Demonstrates how to use a pre-release version of a Node.js image, typically denoted by a 'beta' suffix. This is useful for testing upcoming changes.

```dockerfile
# Without library version.
FROM apify/actor-node:24-beta
```

--------------------------------

### Configure Crawlee with crawlee.json

Source: https://crawlee.dev/js/docs/3.16/guides/configuration

Specify `ConfigurationOptions` in a `crawlee.json` file at the project root to set global configuration. This example sets the state persistence interval and log level.

```json
{
    "persistStateIntervalMillis": 10000,
    "logLevel": "DEBUG"
}
```

--------------------------------

### Push Project to Apify Platform

Source: https://crawlee.dev/js/docs/3.16/introduction/deployment

Use this command to archive your project, upload it to the Apify Platform, and start a Docker build. After completion, you will receive a link to your new Actor.

```bash
apify push
```

--------------------------------

### Run Crawler

Source: https://crawlee.dev/js/docs/3.16/guides/request-storage

This is the basic command to start the crawler. Ensure that your crawler is properly configured before running.

```javascript
await crawler.run();
```

--------------------------------

### Install Playwright Extra and Stealth Plugin

Source: https://crawlee.dev/js/docs/3.16/examples/crawler-plugins

Install the necessary packages for using Playwright Extra and its stealth plugin.
```bash
npm install playwright-extra puppeteer-extra-plugin-stealth
```

--------------------------------

### Install Puppeteer Extra and Stealth Plugin

Source: https://crawlee.dev/js/docs/3.16/examples/crawler-plugins

Install the necessary packages for using Puppeteer Extra and its stealth plugin.

```bash
npm install puppeteer-extra puppeteer-extra-plugin-stealth
```

--------------------------------

### Configure Proxy with Impit HTTP Client

Source: https://crawlee.dev/js/docs/3.16/guides/impit-http-client

Demonstrates how to set up proxy configurations for requests made using the Impit HTTP Client within a CheerioCrawler. Ensure the `ProxyConfiguration` is passed to the crawler.

```javascript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';
import { ImpitHttpClient, Browser } from '@crawlee/impit-client';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy1.example.com:8080', 'http://proxy2.example.com:8080'],
});

const crawler = new CheerioCrawler({
    httpClient: new ImpitHttpClient({ browser: Browser.Chrome }),
    proxyConfiguration,
    async requestHandler({ $, request }) {
        console.log(`Scraped ${request.url}`);
    },
});
```
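The proxy snippets above rely on Crawlee rotating through the `proxyUrls` list. As a rough mental model only, the sketch below shows plain round-robin rotation in dependency-free JavaScript. `createProxyRotator` is a hypothetical helper written for this illustration and is not part of Crawlee; the real `ProxyConfiguration` may select URLs differently, for example when sessions are involved.

```javascript
// Hypothetical helper (not a Crawlee API): cycles through a fixed list of proxy URLs.
function createProxyRotator(proxyUrls) {
    if (!Array.isArray(proxyUrls) || proxyUrls.length === 0) {
        throw new Error('proxyUrls must be a non-empty array');
    }

    let nextIndex = 0;

    return {
        // Returns the next proxy URL and advances the cursor, wrapping at the end of the list.
        newUrl() {
            const url = proxyUrls[nextIndex];
            nextIndex = (nextIndex + 1) % proxyUrls.length;
            return url;
        },
    };
}

// Same placeholder proxy URLs as in the snippet above.
const rotator = createProxyRotator([
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]);

// Three consecutive picks from a two-item list wrap back to the first proxy.
const picks = [rotator.newUrl(), rotator.newUrl(), rotator.newUrl()];
console.log(picks);
```

With two proxies, three consecutive calls yield the first proxy, the second, and then the first again. In a real project, prefer `proxyConfiguration.newUrl()` from Crawlee over a hand-rolled rotator like this one.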