### Basic SQLStorageClient Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates the basic usage of the SqlStorageClient. Ensure you have installed the necessary dependencies, e.g., 'pip install "crawlee[sql_sqlite]"'.

```python
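import asyncio

from crawlee.storage_clients import SqlStorageClient
from crawlee.storages import Dataset, KeyValueStore, RequestQueue


async def main() -> None:
    # Use the default SQLite database. To target another database, pass
    # `connection_string`, e.g. 'postgresql+asyncpg://user:password@host:port/database'.
    # The context manager ensures database connections are cleaned up.
    async with SqlStorageClient() as storage_client:
        # Example: saving data to a dataset
        dataset = await Dataset.open(storage_client=storage_client)
        await dataset.push_data({'key': 'value'})

        # Example: getting data from a key-value store
        kvs = await KeyValueStore.open(storage_client=storage_client)
        value = await kvs.get_value('my_key')

        # Example: enqueuing a request
        request_queue = await RequestQueue.open(storage_client=storage_client)
        await request_queue.add_request('http://example.com')


if __name__ == '__main__':
    asyncio.run(main())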

```

--------------------------------

### Launch Crawler with uv

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Execute this command to start the crawler after installing dependencies with uv.

```sh
uv run python -m {{cookiecutter.__package_name}}
```

--------------------------------

### Initialize Apify Project

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/09_running_in_cloud.mdx

Use this command to initialize your project for Apify. It checks the project structure and guides you through the setup process.

```bash
apify init
```

--------------------------------

### Launch Crawler with pip

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Start the crawler after installing dependencies with pip.

```sh
python -m {{cookiecutter.__package_name}}
```

--------------------------------

### Launch Crawler with Poetry

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Execute this command to start the crawler after installing dependencies with Poetry.

```sh
poetry run python -m {{cookiecutter.__package_name}}
```

--------------------------------

### FastAPI Web Server Setup

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/running_in_web_server.mdx

Sets up a FastAPI application with endpoints for serving scraped data. Requires installation of fastapi[standard]. Run with 'fastapi dev server.py'.

```python
# Requires: pip install 'fastapi[standard]' 'crawlee[parsel]'
import asyncio
from contextlib import asynccontextmanager
from uuid import uuid4

from fastapi import FastAPI, Request

import crawlee
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Maps each request's unique key to a future that will hold its result.
    requests_to_results: dict[str, asyncio.Future] = {}
    # keep_alive keeps the crawler running while its request queue is empty.
    crawler = ParselCrawler(keep_alive=True)

    @crawler.router.default_handler
    async def request_handler(context: ParselCrawlingContext) -> None:
        title = context.selector.xpath('//title/text()').get() or ''
        future = requests_to_results.get(context.request.unique_key)
        if future is not None and not future.done():
            future.set_result({'title': title})

    # Run the crawler in the background for the lifetime of the app.
    run_task = asyncio.create_task(crawler.run([]))
    # Save the crawler and the result mapping to app state.
    app.state.crawler = crawler
    app.state.requests_to_results = requests_to_results
    yield
    # Stop the crawler on shutdown and wait for it to finish.
    crawler.stop()
    await run_task


app = FastAPI(lifespan=lifespan)


@app.get("/")
def index():
    return {
        "message": "Welcome to the Crawlee web server! Use the /scrape endpoint to get a page title.",
        "example": {
            "url": "/scrape?url=https://example.com"
        },
    }


@app.get("/scrape")
async def scrape(request: Request, url: str):
    # Register a future for this request, keyed by a unique key.
    unique_key = str(uuid4())
    results = request.app.state.requests_to_results
    results[unique_key] = asyncio.get_running_loop().create_future()
    # Hand the URL to the running crawler.
    await request.app.state.crawler.add_requests(
        [crawlee.Request.from_url(url, unique_key=unique_key)]
    )
    try:
        # Wait until the request handler resolves the future.
        result = await results[unique_key]
    finally:
        results.pop(unique_key, None)
    # Return the page title
    return {"url": url, "title": result["title"]}
```
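The endpoint above has to hand a URL to background work and then wait for the matching result. One way to do this without polling is to keep a dictionary of `asyncio.Future` objects keyed by a unique ID. The sketch below shows only that pattern with the standard library; `fake_worker`, `scrape`, and `demo` are hypothetical names, and the worker merely stands in for a crawler's request handler.

```python
import asyncio
from uuid import uuid4


async def fake_worker(queue: asyncio.Queue, pending: dict) -> None:
    """Hypothetical stand-in for a crawler: resolves each pending future."""
    while True:
        key, url = await queue.get()
        # A real handler would fetch and parse the page here.
        pending[key].set_result({'title': f'Title of {url}'})
        queue.task_done()


async def scrape(url: str, queue: asyncio.Queue, pending: dict) -> dict:
    """Submit a URL and wait for its result without polling."""
    key = str(uuid4())
    pending[key] = asyncio.get_running_loop().create_future()
    await queue.put((key, url))
    try:
        # Only this coroutine is suspended; other callers keep running.
        return await pending[key]
    finally:
        # Always drop the entry so the mapping does not grow.
        pending.pop(key, None)


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    pending: dict = {}
    worker = asyncio.create_task(fake_worker(queue, pending))
    try:
        return await scrape('https://example.com', queue, pending)
    finally:
        worker.cancel()
```

Keying by a generated ID rather than by URL lets two callers request the same page at the same time without overwriting each other's entry.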

--------------------------------

### Quick Start with Custom Proxies

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/proxy_management.mdx

Demonstrates how to quickly start using your own proxy URLs with Crawlee.

```python
from crawlee.proxy_configuration import ProxyConfiguration

# Use your own proxy URLs
proxy_configuration = ProxyConfiguration(proxy_urls=["http://user:password@your-proxy.com:8080"])

# You can then use this proxy_configuration object when initializing your crawler.
```
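A malformed proxy URL tends to surface only once requests start failing, so it can help to check each URL before building the configuration. The following is a minimal standard-library sketch; `check_proxy_url` is a hypothetical helper, not part of Crawlee, and the default `allowed_schemes` is an assumption you should adjust to whatever your proxies and HTTP client support.

```python
from urllib.parse import urlsplit


def check_proxy_url(proxy_url: str, allowed_schemes: tuple = ('http', 'https')) -> str:
    """Return the URL unchanged if it looks like a usable proxy URL, else raise ValueError."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in allowed_schemes:
        raise ValueError(f'Unsupported proxy scheme: {parts.scheme!r}')
    if not parts.hostname:
        raise ValueError('Proxy URL has no host')
    if parts.port is None:
        raise ValueError('Proxy URL has no port')
    return proxy_url


# Validate every URL up front, then pass the list on as `proxy_urls`.
proxy_urls = [check_proxy_url('http://user:password@your-proxy.com:8080')]
```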

--------------------------------

### FileSystemStorageClient Configuration Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Shows how to configure the FileSystemStorageClient with a custom storage directory.

```python
from crawlee.configuration import Configuration
from crawlee.storage_clients import FileSystemStorageClient

# The storage directory is set through Configuration rather than on the client itself.
configuration = Configuration(storage_dir='./my_custom_storage')
storage_client = FileSystemStorageClient()

# Pass both when creating a crawler, for example:
# crawler = ParselCrawler(storage_client=storage_client, configuration=configuration)

print("FileSystemStorageClient prepared with a custom storage directory.")
```

--------------------------------

### Registering Storage Clients Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Shows how to register a storage client globally so that it can be retrieved and used by storages opened afterwards.

```python
from crawlee import service_locator
from crawlee.storage_clients import MemoryStorageClient

# Any StorageClient implementation works here, including a custom subclass
# such as the MyCustomStorageClient from the custom storage client example.
storage_client = MemoryStorageClient()

# Register the client globally; storages opened afterwards use it by default.
service_locator.set_storage_client(storage_client)

# Retrieve the registered client
retrieved_client = service_locator.get_storage_client()

print(f"Registered and retrieved client: {retrieved_client}")
```

--------------------------------

### SQLStorageClient Configuration Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Shows how to configure the SQLStorageClient with a specific database connection URL.

```python
from crawlee.storage_clients import SqlStorageClient

# Initialize the SqlStorageClient with a PostgreSQL connection URL.
# Replace the placeholders with real values. The async `asyncpg` driver is
# assumed here; install the matching extra (e.g. 'crawlee[sql_postgres]').
postgres_url = "postgresql+asyncpg://user:password@host:port/database"
postgres_storage = SqlStorageClient(connection_string=postgres_url)

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=postgres_storage)`.

print("SqlStorageClient initialized with PostgreSQL connection.")
```
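Credentials embedded in a connection URL must be percent-encoded, otherwise characters such as `@`, `:` or `/` in a password change how the URL is parsed. A minimal sketch of building such a URL safely is shown below; `build_postgres_url` is a hypothetical helper, and the `asyncpg` driver name is an assumption rather than something this section confirms.

```python
from urllib.parse import quote


def build_postgres_url(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    driver: str = 'asyncpg',  # assumed async driver; adjust to your setup
) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL with percent-encoded credentials."""
    safe_user = quote(user, safe='')
    safe_password = quote(password, safe='')
    return f'postgresql+{driver}://{safe_user}:{safe_password}@{host}:{port}/{database}'


url = build_postgres_url('user', 'p@ss:w/rd', 'localhost', 5432, 'crawlee')
```

The resulting string can then be passed as the connection URL when constructing the storage client.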

--------------------------------

### Install Crawlee with httpx extra

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/http_clients.mdx

Install Crawlee with the `httpx` extra to enable the HttpxHttpClient.

```sh
python -m pip install 'crawlee[httpx]'
```

--------------------------------

### Install Crawlee with Extras

Source: https://context7.com/apify/crawlee-python/llms.txt

Install Crawlee with all extras or selectively install packages for specific functionalities like BeautifulSoup or Playwright. Playwright also requires a separate installation.

```bash
pip install 'crawlee[all]'
playwright install
```

```bash
pip install 'crawlee[beautifulsoup]'
```

```bash
pip install 'crawlee[parsel]'
```

```bash
pip install 'crawlee[playwright]'
```

```bash
pip install 'crawlee[cli]'
```

--------------------------------

### Basic SQLStorageClient Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates the basic usage of the SQLStorageClient for persistent storage using a SQL database.

```python
from crawlee.storage_clients import SqlStorageClient

# Initialize the SqlStorageClient (defaults to SQLite)
# For other databases, pass `connection_string`, e.g. 'postgresql+asyncpg://user:password@host:port/database'
sql_storage = SqlStorageClient()

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=sql_storage)`.

print("SqlStorageClient initialized.")
```

--------------------------------

### Efficient Request Addition with BeautifulSoupCrawler

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/02_first_crawler.mdx

This example shows a more concise way to start a BeautifulSoupCrawler by passing requests directly to the `crawler.run()` method. This approach internally uses batched request additions for better performance, allowing crawling to start almost instantly.

```python
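import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        title = context.soup.title.string if context.soup.title else ''
        context.log.info(f'The title of "{context.request.url}" is "{title}".')

    # Passing start URLs to run() adds them to the queue in batches.
    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())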

```

--------------------------------

### Basic FileSystemStorageClient Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates the basic usage of the FileSystemStorageClient for persistent file system storage with in-memory caching.

```python
from crawlee.storage_clients import FileSystemStorageClient

# Initialize the FileSystemStorageClient (defaults to './storage')
file_system_storage = FileSystemStorageClient()

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=file_system_storage)`.

print("FileSystemStorageClient initialized.")
```

--------------------------------

### Basic MemoryStorageClient Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates the basic usage of the MemoryStorageClient for in-memory data storage. No persistence is provided.

```python
from crawlee.storage_clients import MemoryStorageClient

# Initialize the MemoryStorageClient; data is lost when the process exits
memory_storage = MemoryStorageClient()

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=memory_storage)`.

print("MemoryStorageClient initialized.")
```

--------------------------------

### Install Crawlee with All Features

Source: https://github.com/apify/crawlee-python/blob/master/README.md

Installs the crawlee package with all optional features. Ensure Playwright dependencies are installed separately.

```sh
python -m pip install 'crawlee[all]'
```

--------------------------------

### Basic RedisStorageClient Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates the basic usage of the RedisStorageClient for persistent storage using a Redis database.

```python
from crawlee.storage_clients import RedisStorageClient

# Initialize the RedisStorageClient for a local Redis server
redis_storage = RedisStorageClient(connection_string="redis://localhost:6379")

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=redis_storage)`.

print("RedisStorageClient initialized.")
```

--------------------------------

### Install Crawlee CLI with uv

Source: https://github.com/apify/crawlee-python/blob/master/README.md

Runs the Crawlee CLI through uvx, uv's runner for Python tools, to create a new project named `my-crawler` without installing the CLI permanently. Ensure uv is installed first.

```sh
uvx 'crawlee[cli]' create my-crawler
```

--------------------------------

### Custom Storage Client Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Illustrates how to create and use a custom storage client by extending the base StorageClient class.

```python
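from crawlee.storage_clients import StorageClient


class MyCustomStorageClient(StorageClient):
    """Skeleton of a custom storage client.

    The factory methods mirror the abstract interface as of Crawlee 1.x; check
    the StorageClient base class in your installed version for exact signatures.
    """

    async def create_dataset_client(self, *, id=None, name=None, alias=None, configuration=None):
        raise NotImplementedError('Return your dataset client here.')

    async def create_kvs_client(self, *, id=None, name=None, alias=None, configuration=None):
        raise NotImplementedError('Return your key-value store client here.')

    async def create_rq_client(self, *, id=None, name=None, alias=None, configuration=None):
        raise NotImplementedError('Return your request queue client here.')


# The abstract methods are overridden, so the class can be instantiated
custom_client = MyCustomStorageClient()

print("Custom storage client created.")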
```

--------------------------------

### Install Dependencies with uv and poe

Source: https://github.com/apify/crawlee-python/blob/master/AGENTS.md

Installs all project dependencies, including development, extras, pre-commit hooks, and Playwright. Use this command to set up your development environment.

```bash
uv run poe install-dev
```

--------------------------------

### RedisStorageClient Configuration Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Shows how to configure the RedisStorageClient with a specific Redis connection URL and port.

```python
from crawlee.storage_clients import RedisStorageClient

# Initialize the RedisStorageClient with a custom host and port
custom_redis_storage = RedisStorageClient(connection_string="redis://my-redis-host:16379")

# To use it, pass the client as `storage_client=` when creating a crawler
# or when opening a storage, e.g. `await Dataset.open(storage_client=custom_redis_storage)`.

print("RedisStorageClient initialized with custom host and port.")
```

--------------------------------

### Full Scraping Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/06_scraping.mdx

A complete example demonstrating how to integrate the scraping logic into a request handler.

```python
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def request_handler(context: PlaywrightCrawlingContext) -> None:
    # Find the SKU element using the selector and get its text content.
    sku = await context.page.locator('span.product-meta__sku-number').text_content()

    # Locate the price element and filter out the visually hidden elements.
    price_element = context.page.locator('span.price', has_text='$').first

    # Extract the text content of the price element.
    current_price_string = await price_element.text_content() or ''
    # current_price_string: 'Sale price$1,398.00'

    # Split the string by the '$' sign to get the numeric part.
    raw_price = current_price_string.split('$')[1]
    # raw_price: '1,398.00'

    # Convert the raw price string to a float after removing commas.
    price = float(raw_price.replace(',', ''))
    # price: 1398.00

    # Locate the element that contains the text 'In stock' and filter out other elements.
    in_stock_element = context.page.locator(
        selector='span.product-form__inventory',
        has_text='In stock',
    ).first

    # Check if the element exists by counting the matching elements.
    in_stock = await in_stock_element.count() > 0

    # Print the scraped data.
    print(
        {
            "url": context.page.url,
            "manufacturer": "sony",
            "title": "Sony STR-ZA810ES 7.2-Ch Hi-Res Wi-Fi Network A/V Receiver",
            "sku": sku,
            "price": price,
            "in_stock": in_stock,
        }
    )


async def main():
    crawler = PlaywrightCrawler(request_handler=request_handler)
    await crawler.run([
        "https://warehouse-theme-metal.myshopify.com/products/sony-str-za810es-7-2-channel-hi-res-wi-fi-network-av-receiver",
    ])


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

```
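The price handling in the handler above (split on `$`, strip commas, convert to `float`) raises an exception when the text contains no `$`. A small pure function makes the same steps easy to test in isolation. This is a sketch with a hypothetical helper name, `parse_price`, and it assumes US-style formatting as in the example string.

```python
from typing import Optional


def parse_price(price_text: str) -> Optional[float]:
    """Extract the number after the first '$', or return None if there is none."""
    # partition() never raises, unlike indexing the result of split().
    _, separator, raw_price = price_text.partition('$')
    if not separator:
        return None
    try:
        # Remove thousands separators before converting.
        return float(raw_price.replace(',', '').strip())
    except ValueError:
        return None


price = parse_price('Sale price$1,398.00')
```

Returning `None` instead of raising lets the handler decide whether to skip the item or store it without a price.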

--------------------------------

### Install Dependencies with pip

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Run this command to install project dependencies using pip.

```sh
python -m pip install .
```

--------------------------------

### Basic BeautifulSoupCrawler Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/02_first_crawler.mdx

This snippet demonstrates the fundamental setup of a BeautifulSoupCrawler. It initializes the crawler, defines a request handler to process the HTML content, and runs the crawler on a target URL. Use this for simple crawling tasks where JavaScript rendering is not required.

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        # The parsed page is available as a BeautifulSoup object via context.soup
        title = context.soup.title.string if context.soup.title else ''
        context.log.info(f'The title of "{context.request.url}" is "{title}".')

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())

```

--------------------------------

### Initialize RedisStorageClient

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Instantiate the RedisStorageClient using a connection string. Ensure 'crawlee[redis]' is installed and Redis is running.

```python
from crawlee.storage_clients import RedisStorageClient

# Use a connection string
client = RedisStorageClient(connection_string="redis://localhost:6379/0")
```

--------------------------------

### Registering a Storage Client

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/upgrading/upgrading_to_v1.md

This example demonstrates how to register a custom storage client globally, for a single crawler, or for a single storage instance.

```python
from crawlee import service_locator
from crawlee.crawlers import ParselCrawler
from crawlee.storage_clients import MemoryStorageClient
from crawlee.storages import Dataset

# Create custom storage client
storage_client = MemoryStorageClient()

# Then register it globally
service_locator.set_storage_client(storage_client)

# Or use it for a single crawler only
crawler = ParselCrawler(storage_client=storage_client)

# Or use it for a single storage only
dataset = await Dataset.open(
    name='my-dataset',
    storage_client=storage_client,
)
```

--------------------------------

### Install Apify SDK

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/deployment/apify_platform.mdx

Install the Apify SDK for Python using pip. This is a prerequisite for running Crawlee code on the Apify platform.

```bash
pip install apify
```

--------------------------------

### Install Crawlee with Multiple Extras

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Install Crawlee with multiple optional features simultaneously by separating them with commas.

```sh
python -m pip install 'crawlee[beautifulsoup,curl-impersonate]'
```

--------------------------------

### Install Crawlee with curl-impersonate extra

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/http_clients.mdx

Install Crawlee with the `curl-impersonate` extra to enable the CurlImpersonateHttpClient.

```sh
python -m pip install 'crawlee[curl-impersonate]'
```

--------------------------------

### Install Dependencies with uv

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Use this command to install project dependencies when using uv as your package manager.

```sh
uv sync
```

--------------------------------

### Install Crawlee Core Package

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Install the essential Crawlee package using pip. This command installs the core functionality.

```sh
python -m pip install crawlee
```

--------------------------------

### Install Apify CLI with npm

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/09_running_in_cloud.mdx

Installs the Apify CLI globally, a command-line tool for authentication and deployment to the Apify platform. Requires Node.js.

```sh
npm install -g apify-cli
```

--------------------------------

### Basic RequestList Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/request_loaders.mdx

Demonstrates the fundamental usage of `RequestList` with an asynchronous generator to stream requests, reducing memory consumption.

```python
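import asyncio
from collections.abc import AsyncGenerator

from crawlee.request_loaders import RequestList


async def generate_urls() -> AsyncGenerator[str, None]:
    # Yield URLs lazily instead of building the whole list in memory.
    for url in ("http://example.com", "http://example.org"):
        yield url


async def main() -> None:
    request_list = RequestList(generate_urls())

    # Fetch requests one at a time until the list is exhausted.
    while request := await request_list.fetch_next_request():
        print(f"Processing: {request.url}")
        # Process the request here, then mark it as handled.
        await request_list.mark_request_as_handled(request)

    print("Finished processing requests.")


if __name__ == "__main__":
    asyncio.run(main())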
```
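The memory benefit comes from the async generator itself: items are produced only when the consumer asks for the next one. The standard-library sketch below illustrates this independently of Crawlee; `generate_urls` and `consume` are hypothetical names, and the page-numbered URL pattern is made up for the illustration.

```python
import asyncio
from collections.abc import AsyncIterator


async def generate_urls(base: str, count: int) -> AsyncIterator[str]:
    """Lazily yield page URLs; nothing is materialized up front."""
    for page in range(1, count + 1):
        yield f'{base}?page={page}'


async def consume(limit: int) -> list:
    """Take only the first `limit` URLs from a very large stream."""
    seen = []
    async for url in generate_urls('https://example.com/list', 1_000_000):
        seen.append(url)
        if len(seen) == limit:
            break
    return seen


first_three = asyncio.run(consume(3))
```

Even though the generator could produce a million URLs, only three strings are ever created here.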

--------------------------------

### Install Dependencies with Poetry

Source: https://github.com/apify/crawlee-python/blob/master/src/crawlee/project_template/{{cookiecutter.project_name}}/README.md

Use this command to install project dependencies when using Poetry as your package manager.

```sh
poetry install
```

--------------------------------

### Router Handler Examples

Source: https://github.com/apify/crawlee-python/blob/master/GEMINI.md

Shows how to define default and labeled request handlers for a crawler using decorators.

```python
@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext): ...
```

```python
@crawler.router.handler(label='detail')
async def detail(context: BeautifulSoupCrawlingContext): ...
```
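To make the dispatch idea concrete, here is a minimal standard-library sketch of label-based routing with decorators. It is not Crawlee's implementation: `MiniRouter` and its `dispatch` method are hypothetical, and the fallback to the default handler for unknown labels is a design choice of this sketch.

```python
import asyncio


class MiniRouter:
    """Toy router: maps labels to async handlers, with one default handler."""

    def __init__(self) -> None:
        self._default = None
        self._by_label = {}

    def default_handler(self, func):
        # Used as a bare decorator: @router.default_handler
        self._default = func
        return func

    def handler(self, label: str):
        # Used with an argument: @router.handler(label='detail')
        def decorator(func):
            self._by_label[label] = func
            return func
        return decorator

    async def dispatch(self, label, payload):
        func = self._by_label.get(label, self._default)
        if func is None:
            raise RuntimeError('No handler registered for this request')
        return await func(payload)


router = MiniRouter()


@router.default_handler
async def handle_default(url: str) -> str:
    return f'default:{url}'


@router.handler(label='detail')
async def handle_detail(url: str) -> str:
    return f'detail:{url}'


results = [
    asyncio.run(router.dispatch(None, 'https://example.com')),
    asyncio.run(router.dispatch('detail', 'https://example.com/item/1')),
]
```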

--------------------------------

### Setup OpenTelemetry Tracing for Crawlers

Source: https://context7.com/apify/crawlee-python/llms.txt

Integrates OpenTelemetry for tracing storage operations. Ensure the OTLP exporter endpoint is correctly configured.

```python
import asyncio

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.trace import set_tracer_provider

from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
from crawlee.otel import CrawlerInstrumentor
from crawlee.storages import Dataset, KeyValueStore, RequestQueue


def setup_tracing() -> None:
    resource = Resource.create({'service.name': 'MyCrawler', 'service.version': '1.0.0'})
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(
        SimpleSpanProcessor(OTLPSpanExporter(endpoint='localhost:4317', insecure=True))
    )
    set_tracer_provider(provider)
    CrawlerInstrumentor(
        instrument_classes=[RequestQueue, KeyValueStore, Dataset]
    ).instrument()


async def main() -> None:
    setup_tracing()

    crawler = ParselCrawler(max_requests_per_crawl=100)
    kvs = await KeyValueStore.open()

    @crawler.router.default_handler
    async def handler(context: ParselCrawlingContext) -> None:
        await context.push_data({'url': context.request.url})
        await kvs.set_value(key='last-url', value=context.request.url)
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())

```

--------------------------------

### Register Storage Clients

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Demonstrates how to register custom storage clients. This example shows registering clients per storage instance when opening them.

```python
from crawlee.storages import Dataset, KeyValueStore, RequestQueue

# Assuming 'custom_dataset_client', 'custom_kv_store_client', 'custom_rq_client' are instances of custom storage clients

dataset = await Dataset.open(name='my-dataset', storage_client=custom_dataset_client)
key_value_store = await KeyValueStore.open(name='my-kv-store', storage_client=custom_kv_store_client)
request_queue = await RequestQueue.open(name='my-request-queue', storage_client=custom_rq_client)
```

--------------------------------

### Start Jaeger Docker Container

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/trace_and_monitor_crawlers.mdx

Use this command to run a preconfigured Jaeger Docker container locally. Ensure Docker is installed and running.

```bash
docker run -d --name jaeger -e COLLECTOR_OTLP_ENABLED=true -p 16686:16686 -p 4317:4317 -p 4318:4318 jaegertracing/all-in-one:latest
```

--------------------------------

### Python PlaywrightCrawler Sanity Check

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/04_real_world_project.mdx

Creates a PlaywrightCrawler to visit a start URL and print the text content of category elements. Useful for verifying initial setup and selectors.

```python
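import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(headless=True)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Wait for the category cards to render before reading them.
        # The selector is an assumption about the page's markup.
        await context.page.wait_for_selector('.collection-block-item')
        category_texts = await context.page.locator('.collection-block-item').all_text_contents()
        for index, text in enumerate(category_texts, start=1):
            context.log.info(f'CATEGORY_{index}: {text}')

    await crawler.run(["https://warehouse-theme-metal.myshopify.com/collections"])


if __name__ == "__main__":
    asyncio.run(main())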

```

--------------------------------

### PlaywrightCrawler Example

Source: https://context7.com/apify/crawlee-python/llms.txt

Demonstrates using PlaywrightCrawler for JavaScript-rendered content. It uses a headless browser and provides the full Playwright `Page` API. Requires `crawlee[playwright]` and `playwright install`.

```python
import asyncio

from crawlee.crawlers import (
    PlaywrightCrawler,
    PlaywrightCrawlingContext,
    PlaywrightPreNavCrawlingContext,
)


async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,
        headless=True,
        browser_type='chromium',  # 'firefox' or 'webkit' also supported
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        posts = await context.page.query_selector_all('.athing')
        data = []
        for post in posts:
            title_el = await post.query_selector('.title a')
            rank_el = await post.query_selector('.rank')
            data.append({
                'title': await title_el.inner_text() if title_el else None,
                'rank': await rank_el.inner_text() if rank_el else None,
                'href': await title_el.get_attribute('href') if title_el else None,
            })

        await context.push_data(data)
        await context.enqueue_links(selector='.morelink')  # paginate

    @crawler.pre_navigation_hook
    async def log_nav(context: PlaywrightPreNavCrawlingContext) -> None:
        context.log.info(f'Navigating to {context.request.url} ...')

    await crawler.run(['https://news.ycombinator.com/'])


if __name__ == '__main__':
    asyncio.run(main())

```

--------------------------------

### Adaptive Crawler Handlers

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/request_router.mdx

Registers labeled handlers and a default handler on a standalone router. Requests are dispatched by their label (here `product` and `category`), and requests without a label go to the default handler.

```python
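from crawlee.crawlers import ParselCrawlingContext
from crawlee.router import Router

router = Router[ParselCrawlingContext]()


@router.handler("product")
async def handle_product(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling product: {context.request.url}")


@router.handler("category")
async def handle_category(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling category: {context.request.url}")


@router.default_handler
async def handle_other(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling other types: {context.request.url}")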

```

--------------------------------

### Verify Python and Pip Installation

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Check if Python and pip are installed on your system. These are required for Crawlee installation.

```sh
python --version
```

```sh
python -m pip --version
```
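The same check can be done from inside Python, which is handy in setup scripts. This is a small sketch using only the standard library; `python_is_supported` and `pip_is_available` are hypothetical helpers, and the default minimum of 3.10 is an assumption that should be checked against the requirements of the Crawlee version you install.

```python
import importlib.util
import sys


def python_is_supported(minimum: tuple = (3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def pip_is_available() -> bool:
    """Return True if the pip module can be imported by this interpreter."""
    return importlib.util.find_spec('pip') is not None


print(f'Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_is_supported()}')
print(f'pip available: {pip_is_available()}')
```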

--------------------------------

### Install Playwright Dependencies

Source: https://github.com/apify/crawlee-python/blob/master/README.md

Installs the browser binaries that Playwright needs. This step is required after installing Crawlee with the `playwright` (or `all`) extra if you plan to use the PlaywrightCrawler.

```sh
playwright install
```

--------------------------------

### Configuring FileSystemStorageClient

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Shows how to configure the FileSystemStorageClient using environment variables or the Configuration class. Key options include 'storage_dir' and 'purge_on_start'.

```python
from crawlee.configuration import Configuration
from crawlee.storage_clients import FileSystemStorageClient

# Option 1: Using environment variables (e.g., CRAWLEE_STORAGE_DIR='my_custom_storage')
# Option 2: Using the Configuration class
configuration = Configuration(
    storage_dir="./my_custom_storage",
    purge_on_start=False,
)

storage_client = FileSystemStorageClient()

# Pass both to a crawler, e.g.
# ParselCrawler(storage_client=storage_client, configuration=configuration)

```

--------------------------------

### Install uv with pip

Source: https://context7.com/apify/crawlee-python/llms.txt

Installs the uv package manager using pip. uv is a fast Python package installer.

```bash
# Install uv first (https://docs.astral.sh/uv/)
pip install uv

```

--------------------------------

### Install Crawlee with Playwright Extra

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Install Crawlee with the 'playwright' extra for PlaywrightCrawler support. This also requires installing Playwright dependencies separately.

```sh
python -m pip install 'crawlee[playwright]'
```

```sh
playwright install
```

--------------------------------

### Configuring Storage Clients

Source: https://context7.com/apify/crawlee-python/llms.txt

Demonstrates how to replace the default filesystem storage with alternative backends like in-memory or Redis by passing a storage client to the crawler. Useful for testing or persistent storage needs.

```python
import asyncio

from crawlee.crawlers import ParselCrawler
from crawlee.storage_clients import MemoryStorageClient, RedisStorageClient

# In-memory (no disk I/O — ideal for tests)
memory_crawler = ParselCrawler(
    storage_client=MemoryStorageClient(),
    max_requests_per_crawl=5,
)

# Redis-backed (persistent, shareable across processes)
redis_crawler = ParselCrawler(
    storage_client=RedisStorageClient(connection_string='redis://localhost:6379'),
    max_requests_per_crawl=5,
)
```

--------------------------------

### Install Crawlee with Parsel Extra

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Install Crawlee with the 'parsel' extra, required for using the ParselCrawler.

```sh
python -m pip install 'crawlee[parsel]'
```

--------------------------------

### Implement Custom Storage Client

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storage_clients.mdx

Example of a custom storage client implementing the `StorageClient` interface. This serves as a template for integrating custom storage logic.

```python
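from crawlee.configuration import Configuration
from crawlee.storage_clients import StorageClient


class CustomStorageClient(StorageClient):
    """Template for a custom storage backend."""

    async def create_dataset_client(
        self,
        *,
        id: str | None = None,
        name: str | None = None,
        alias: str | None = None,
        configuration: Configuration | None = None,
    ):
        # Create and return your custom dataset client here.
        raise NotImplementedError

    async def create_kvs_client(
        self,
        *,
        id: str | None = None,
        name: str | None = None,
        alias: str | None = None,
        configuration: Configuration | None = None,
    ):
        # Create and return your custom key-value store client here.
        raise NotImplementedError

    async def create_rq_client(
        self,
        *,
        id: str | None = None,
        name: str | None = None,
        alias: str | None = None,
        configuration: Configuration | None = None,
    ):
        # Create and return your custom request queue client here.
        raise NotImplementedError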
```

--------------------------------

### Install Crawlee with BeautifulSoup Extra

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Install Crawlee with the 'beautifulsoup' extra, required for using the BeautifulSoupCrawler.

```sh
python -m pip install 'crawlee[beautifulsoup]'
```

--------------------------------

### Basic Request Handlers with Router

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/request_router.mdx

Demonstrates how to set up basic request handlers using the Router class. This example shows how to define handlers for different labels and a default handler for unmatched requests.

```python
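from crawlee.crawlers import ParselCrawlingContext
from crawlee.router import Router

router = Router[ParselCrawlingContext]()


# Handles requests labeled "product".
@router.handler("product")
async def handle_product(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling product page: {context.request.url}")


# Handles requests labeled "category".
@router.handler("category")
async def handle_category(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling category page: {context.request.url}")


# Handles requests whose label matches no other handler.
@router.default_handler
async def handle_default(context: ParselCrawlingContext) -> None:
    context.log.info(f"Handling default page: {context.request.url}")
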
```
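
The dispatch idea behind label-based routing can be sketched in plain Python. The `MiniRouter` class below is a hypothetical stand-in, not Crawlee's `Router`; it only illustrates how a label selects a handler and how unmatched or missing labels fall back to the default handler.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional

Handler = Callable[[str], Awaitable[str]]


class MiniRouter:
    """Hypothetical stand-in that maps request labels to async handlers."""

    def __init__(self) -> None:
        self._handlers: dict = {}
        self._default: Optional[Handler] = None

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Register a handler for one specific label.
        def decorator(func: Handler) -> Handler:
            self._handlers[label] = func
            return func

        return decorator

    def default_handler(self, func: Handler) -> Handler:
        # Register the fallback used when no label matches.
        self._default = func
        return func

    async def dispatch(self, label: Optional[str], url: str) -> str:
        # Look up the label first, then fall back to the default handler.
        func = self._handlers.get(label) if label is not None else None
        func = func or self._default
        if func is None:
            raise RuntimeError('No handler registered for this request')
        return await func(url)


mini_router = MiniRouter()


@mini_router.handler('product')
async def handle_product_demo(url: str) -> str:
    return f'product: {url}'


@mini_router.default_handler
async def handle_default_demo(url: str) -> str:
    return f'default: {url}'


print(asyncio.run(mini_router.dispatch('product', 'https://example.com/p/1')))
print(asyncio.run(mini_router.dispatch(None, 'https://example.com/')))
```

Keeping the default handler separate from the label map means a request without a label never needs a special-case key.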

--------------------------------

### Run Documentation Locally

Source: https://github.com/apify/crawlee-python/blob/master/CONTRIBUTING.md

Builds and runs the documentation website locally. Requires Node.js 20+.

```sh
uv run poe run-docs
```

--------------------------------

### Verify Crawlee Installation

Source: https://github.com/apify/crawlee-python/blob/master/README.md

Checks if the crawlee library is installed correctly by printing its version number.

```python
import crawlee; print(crawlee.__version__)
```
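
If importing the package is not desirable, the installed version can also be read from the distribution metadata. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


crawlee_version = installed_version('crawlee')
if crawlee_version is None:
    print('crawlee is not installed in this environment')
else:
    print(f'crawlee {crawlee_version} is installed')
```

Returning `None` instead of raising lets a setup script report a missing package without a traceback.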

--------------------------------

### Log in to Apify CLI

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/deployment/apify_platform.mdx

Install the Apify CLI and log in using your API token. This allows the CLI to authenticate with the Apify platform for subsequent commands.

```bash
npm install -g apify-cli
apify login -t YOUR_API_TOKEN
```

--------------------------------

### Example Crawl Result Data

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/quick-start/index.mdx

This is an example of the JSON data structure that Crawlee saves for each crawled page.

```json
{
    "url": "https://crawlee.dev/",
    "title": "Crawlee · Build reliable crawlers. Fast. | Crawlee"
}
```
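
A record of this shape can be loaded with the standard `json` module. The sketch below parses the example record from a string rather than from a file, so no particular storage path is assumed.

```python
import json

# The example record from above, embedded as a string for illustration.
raw_record = '''
{
    "url": "https://crawlee.dev/",
    "title": "Crawlee · Build reliable crawlers. Fast. | Crawlee"
}
'''

# Parse the JSON text into a regular Python dictionary.
record = json.loads(raw_record)

print(record['url'])
print(record['title'])
```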

--------------------------------

### Check uv Installation

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Verify if the uv package manager is installed on your system. uv is recommended for managing Python environments and dependencies.

```sh
uv --version
```

--------------------------------

### Create New Crawlee Project Directly

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/01_setting_up.mdx

Initialize a new Crawlee project named 'my_crawler' using the Crawlee CLI. This method is suitable if Crawlee is already installed.

```sh
crawlee create my_crawler
```

--------------------------------

### Initialize Dataset and Push Data

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/introduction/07_saving_data.mdx

Import the Dataset class and open an instance within your crawler's setup. Then, push extracted data to this dataset instance.

```python
from crawlee.crawlers import PlaywrightCrawlingContext
from crawlee.storages import Dataset

# ... crawler setup ...


async def main() -> None:
    # ...
    # Open the default dataset.
    dataset = await Dataset.open()
    # ...

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # ...

        data = {
            'manufacturer': manufacturer,
            'title': title,
            'sku': sku,
            'price': price,
            'in_stock': in_stock,
        }

        # Push the data to the dataset.
        await dataset.push_data(data)

        # ...

```

--------------------------------

### Adaptive Playwright Crawler Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/examples/playwright_crawler_adaptive.mdx

This example demonstrates how to use AdaptivePlaywrightCrawler. It combines Playwright and HTTP-based crawling, switching between them for performance. Pre-navigation hooks can be used to perform actions before navigating to a URL.

```python
import asyncio

from crawlee.crawlers import (
    AdaptivePlaywrightCrawler,
    AdaptivePlaywrightCrawlingContext,
    AdaptivePlaywrightPreNavCrawlingContext,
)


async def main() -> None:
    # This factory method creates a crawler that parses static pages with
    # BeautifulSoup and uses Playwright for pages that need a browser.
    crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
        max_requests_per_crawl=10,  # Limit the crawl size.
        playwright_crawler_specific_kwargs={'headless': False},
    )

    @crawler.router.default_handler
    async def request_handler(context: AdaptivePlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        # Find more links and enqueue them.
        await context.enqueue_links()
        # Save some data.
        await context.push_data({'url': context.request.url})

    @crawler.pre_navigation_hook
    async def hook(context: AdaptivePlaywrightPreNavCrawlingContext) -> None:
        # Runs in both the static and the Playwright sub-crawler.
        context.log.info(f'Pre-navigation hook for: {context.request.url}')

    @crawler.pre_navigation_hook(playwright_only=True)
    async def hook_playwright(context: AdaptivePlaywrightPreNavCrawlingContext) -> None:
        # Runs only in the Playwright sub-crawler.
        context.log.info(f'Playwright-only hook for: {context.request.url}')

    # Run the crawler with the initial list of URLs.
    await crawler.run(['https://www.example.com'])


if __name__ == '__main__':
    asyncio.run(main())

```

--------------------------------

### Basic Key-Value Store Operations in Python

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/storages.mdx

Demonstrates fundamental operations for the Key-Value Store, including saving and retrieving data using keys. Ensure the Key-Value Store is accessible.

```python
import asyncio

from crawlee.storages import KeyValueStore


async def main() -> None:
    # Open the default key-value store.
    kvs = await KeyValueStore.open()

    # Save a value, then read it back by its key.
    await kvs.set_value('my-key', 'my-value')
    retrieved_value = await kvs.get_value('my-key')
    print(f'Retrieved value: {retrieved_value}')


asyncio.run(main())
```

--------------------------------

### Playwright Crawler with Fingerprint Generator Example

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/examples/playwright_crawler_with_fingerprint_generator.mdx

Use this example to configure PlaywrightCrawler with a FingerprintGenerator. Initialize the generator with desired fingerprint options to mimic real browser fingerprints. Unspecified options are automatically selected.

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.fingerprint_suite import (
    DefaultFingerprintGenerator,
    HeaderGeneratorOptions,
    ScreenOptions,
)


async def main() -> None:
    # Use the default fingerprint generator with the desired options.
    # Unspecified options are selected automatically by the generator.
    # If an option matters to you, define it explicitly.
    fingerprint_generator = DefaultFingerprintGenerator(
        header_options=HeaderGeneratorOptions(browsers=['chrome']),
        screen_options=ScreenOptions(min_width=400),
    )

    crawler = PlaywrightCrawler(
        # Limit the crawl size. Remove or increase it to crawl all links.
        max_requests_per_crawl=10,
        # Set to True to run the browser without a visible window.
        headless=False,
        # Browser type supported by Playwright.
        browser_type='chromium',
        # By default, no fingerprint generation is done.
        fingerprint_generator=fingerprint_generator,
    )

    # Define the default request handler, called for every request.
    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    # Run the crawler with the initial list of URLs.
    await crawler.run(['https://apify.com'])


if __name__ == '__main__':
    asyncio.run(main())

```

--------------------------------

### Register StorageClient via Service Locator

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/service_locator.mdx

Demonstrates how to register a custom StorageClient implementation with the global ServiceLocator. This ensures that all components using the StorageClient will utilize the registered instance.

```python
from crawlee import service_locator
from crawlee.storage_clients import MemoryStorageClient

# Register the storage client globally. Any `StorageClient` implementation,
# including a custom subclass, can be registered the same way.
service_locator.set_storage_client(MemoryStorageClient())

```
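
The effect of a global registration can be illustrated with a small, library-free sketch of the service locator pattern. `MiniServiceLocator` and `InMemoryStore` are hypothetical names invented for this illustration and are not part of Crawlee.

```python
from typing import Optional


class InMemoryStore:
    """Hypothetical storage backend used only for this illustration."""

    def __init__(self) -> None:
        self.items: list = []


class MiniServiceLocator:
    """Hypothetical registry that hands out one shared storage backend."""

    def __init__(self) -> None:
        self._storage: Optional[InMemoryStore] = None

    def set_storage_client(self, client: InMemoryStore) -> None:
        # Remember the instance that every component should share.
        self._storage = client

    def get_storage_client(self) -> InMemoryStore:
        # Lazily fall back to a default backend if nothing was registered.
        if self._storage is None:
            self._storage = InMemoryStore()
        return self._storage


locator = MiniServiceLocator()
shared = InMemoryStore()
locator.set_storage_client(shared)

# Two independent components resolve the same registered instance.
component_a = locator.get_storage_client()
component_b = locator.get_storage_client()
component_a.items.append({'url': 'https://example.com'})

print(component_b.items)
```

Because both lookups return the same object, data written through one component is visible through the other.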

--------------------------------

### Provide Storage Client to Storage

Source: https://github.com/apify/crawlee-python/blob/master/website/versioned_docs/version-1.6/guides/service_locator.mdx

Instantiate a storage client and pass it to the storage's `open` method to use it for that specific storage instance only.

```python
from crawlee.storage_clients import MemoryStorageClient
from crawlee.storages import Dataset


async def main() -> None:
    # Pass the storage client to this specific storage instance only.
    dataset = await Dataset.open(storage_client=MemoryStorageClient())

```