### Install Code Formatting Tools

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/CONTRIBUTING.md

Install Black and isort for code formatting. These tools ensure code consistency across the project.

```bash
pip install black isort
```

--------------------------------

### Install Oxylabs Python SDK

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Install the Oxylabs Python SDK using pip. Ensure you have Python 3.5 or above.

```bash
pip install oxylabs
```

--------------------------------

### Quick Start: RealtimeClient Google Search

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Initialize the Realtime client with your Oxylabs API credentials and perform a Google search. Replace 'username' and 'password' with your actual credentials.

```python
from oxylabs import RealtimeClient

# Set your Oxylabs API Credentials.
username = "username"
password = "password"

# Initialize the Realtime client with your credentials.
client = RealtimeClient(username, password)

# Use `google_search` as a source to scrape Google with nike as a query.
result = client.google.scrape_search("nike")

print(result.raw)
```

--------------------------------

### Scrape Google Search with Custom Parameters

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

This example passes the query parameters `start_page`, `pages`, and `limit` to the `scrape_search` function for Google. Default parameters are used if none are provided.

```python
client = RealtimeClient(username, password)

result = client.google.scrape_search(
    "football",
    start_page=1,
    pages=3,
    limit=4,
)
```

--------------------------------

### Google Search with Context Options

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Example of sending context options for Google Search scraping, including language, filtering, and pagination limits.
```python
client = RealtimeClient(username, password)

result = client.google.scrape_search(
    "adidas",
    parse=True,
    context=[
        {"key": "results_language", "value": "en"},
        {"key": "filter", "value": 0},
        {"key": "tbm", "value": "isch"},
        {
            "key": "limit_per_page",
            "value": [
                {"page": 1, "limit": 10},
                {"page": 2, "limit": 10},
            ],
        },
    ],
)
```

--------------------------------

### Scraping Google Search

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Example of how to scrape Google search results using the SDK. You can specify search queries and optionally provide scraping parameters.

```APIDOC
## google.scrape_search

### Description
Scrapes Google search results for a given query.

### Method
`client.google.scrape_search(query, **kwargs)`

### Parameters
#### Path Parameters
None

#### Query Parameters
- **query** (string) - Required - The search term to use.
- **start_page** (integer) - Optional - The page number to start scraping from.
- **pages** (integer) - Optional - The number of pages to scrape.
- **limit** (integer) - Optional - The maximum number of results to retrieve per page.

### Request Example
```python
from oxylabs import RealtimeClient

username = "your_username"
password = "your_password"

client = RealtimeClient(username, password)
result = client.google.scrape_search("football")
```

### Request Example with Parameters
```python
from oxylabs import RealtimeClient

username = "your_username"
password = "your_password"

client = RealtimeClient(username, password)
result = client.google.scrape_search(
    "football",
    start_page=1,
    pages=3,
    limit=4,
)
```

### Response
#### Success Response (200)
- **results** (list) - A list of search result objects.

#### Response Example
```json
{
  "results": [
    {
      "title": "Example Search Result Title",
      "url": "http://example.com",
      "snippet": "This is a short description of the search result."
    }
  ]
}
```
```

--------------------------------

### Check Python Version

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Use this command to check your Python version. If you have multiple versions installed, use `python3 --version`.

```sh
python --version
```

```sh
python3 --version
```

--------------------------------

### Using Dedicated Parsers for Structured Data

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Example of using a dedicated parser for Amazon search results to retrieve structured data like ASIN and title.

```python
# Scrape Amazon search results for keyword "headset"
# Then print a list of products including their ASIN and title
client = RealtimeClient(username, password)

response = client.amazon.scrape_search("headset", parse=True)

for result in response.results:
    for item in result.content["results"]["organic"]:
        print(f'{item["asin"]}: {item["title"]}')
```

--------------------------------

### Proxy Endpoint Integration with ProxyClient

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Utilize this snippet to integrate with Oxylabs as a proxy service. Customize request headers for user agent, geo-location, and rendering preferences before making a GET request.

```python
from oxylabs import ProxyClient

# Set your Oxylabs API Credentials.
username = "username"
password = "password"

# Initialize the ProxyClient with your credentials.
proxy = ProxyClient(username, password)

# Customize headers for specific requirements (optional).
proxy.add_user_agent_header("desktop_chrome")
proxy.add_geo_location_header("Germany")
proxy.add_render_header("html")

# Use the proxy to make a request.
result = proxy.get("https://www.example.com")

print(result.text)
```

--------------------------------

### Scrape YouTube Video Metadata

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves parsed metadata for a specific YouTube video using its video ID.
Set `parse=True` to get structured data including title, description, views, likes, and channel information. Initialize RealtimeClient with your username and password.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_metadata(
    "dQw4w9WgXcQ",  # YouTube video ID
    parse=True,
)
print(result.raw)
```

--------------------------------

### Using Type Constants for User Agent and Render

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Shows how to use pre-defined constants for `user_agent_type` and `render` parameters to specify browser and output format. Import these from `oxylabs.utils.types`.

```python
from oxylabs import RealtimeClient
from oxylabs.utils.types import user_agent_type, render

client = RealtimeClient("your_username", "your_password")

result = client.google.scrape_search(
    "laptops under $1000",
    user_agent_type=user_agent_type.DESKTOP_CHROME,  # "desktop_chrome"
    render=render.HTML,  # "html"
    parse=True,
)

# user_agent_type constants:
#   DESKTOP, DESKTOP_CHROME, DESKTOP_EDGE, DESKTOP_FIREFOX,
#   DESKTOP_OPERA, DESKTOP_SAFARI
#   MOBILE, MOBILE_ANDROID, MOBILE_IOS
#   TABLET, TABLET_ANDROID, TABLET_IOS
# render constants: HTML, PNG
```

--------------------------------

### Scraping Various E-commerce and Real Estate Sources

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Illustrates how to use the SDK to scrape data from a wide range of sources including North American and European retail, global marketplaces, real estate platforms, and AI chat services. All sources follow consistent method naming conventions.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

# North American Retail
walmart_result = client.walmart.scrape_search("gaming chair")
bestbuy_result = client.bestbuy.scrape_product("6525177")
target_result = client.target_store.scrape_category("5xtg6")
costco_result = client.costco.scrape_url("https://www.costco.com/televisions.html")

# European Retail
allegro_result = client.allegro.scrape_search("słuchawki bezprzewodowe")
idealo_result = client.idealo.scrape_search("laptop")
mediamarkt_result = client.mediamarkt.scrape_product("2892316")

# Asian & Global Marketplaces
alibaba_result = client.alibaba.scrape_search("solar panels")
flipkart_result = client.flipkart.scrape_product("itme-id-123")
lazada_result = client.lazada.scrape_url("https://www.lazada.sg/products/...")

# Latin American
mercadolibre_result = client.mercadolibre.scrape_search("zapatillas")
magazineluiza_result = client.magazineluiza.scrape_product("229391700")

# Real Estate
airbnb_result = client.airbnb.scrape_url("https://www.airbnb.com/rooms/12345")
zillow_result = client.zillow.scrape_url("https://www.zillow.com/homedetails/...")

# AI Chat Platforms
chatgpt_result = client.chatgpt.scrape("https://chatgpt.com/")
perplexity_result = client.perplexity.scrape("https://www.perplexity.ai/")

# TikTok Shop
tiktok_result = client.tiktok.scrape_shop_search("sneakers")

# Bing
bing_result = client.bing.scrape_search("best python IDE", parse=True)

# Google Shopping
gs_result = client.google_shopping.scrape_shopping_search("4K monitor", parse=True)

# Universal (any URL)
any_result = client.universal.scrape_url("https://example.com", render="html")

for r in walmart_result.results:
    print(r.status_code, r.content)
```

--------------------------------

### Run Unit Tests with tests.sh

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/CONTRIBUTING.md

Execute the tests.sh script to run all unit tests for the project.
This ensures code quality and stability before submitting contributions.

```bash
scripts/tests.sh
```

--------------------------------

### Format Code with fmt.sh

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/CONTRIBUTING.md

Run the provided fmt.sh script to automatically format your code using isort and black. This script applies the project's code style to the src directory.

```bash
scripts/fmt.sh
```

--------------------------------

### Basic Scrape with Configurable Options

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Demonstrates using predefined constants for user agent type and render format when scraping Google search results.

```python
from oxylabs import RealtimeClient
from oxylabs.utils.types import user_agent_type, render

client = RealtimeClient(username, password)

result = client.google.scrape_search(
    "adidas",
    user_agent_type=user_agent_type.DESKTOP,
    render=render.HTML,
)
```

--------------------------------

### Download YouTube Video/Audio with AsyncClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Use `AsyncClient.youtube.scrape_download` to download YouTube content to cloud storage. Specify storage type, URL, download type, quality, and timeouts.

```python
import asyncio
from oxylabs import AsyncClient

async def main():
    client = AsyncClient("your_username", "your_password")

    result = await client.youtube.scrape_download(
        query="dQw4w9WgXcQ",
        storage_type="s3",
        storage_url="my-scraping-bucket",
        context=[
            {"key": "download_type", "value": "video"},
            {"key": "video_quality", "value": "720p"},
        ],
        job_completion_timeout=300,
        poll_interval=10,
    )
    print(result.raw)

asyncio.run(main())
```

--------------------------------

### Importing Utility Types

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Import predefined parameter values as constants from the oxylabs.utils.types module for consistency and ease of use.
```python
from oxylabs.utils.types import user_agent_type, render
```

--------------------------------

### amazon.scrape_product

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves full product details (title, price, stock, rating, specs, delivery, variants) for a given ASIN.

```APIDOC
## amazon.scrape_product — Amazon Product Detail Page

Retrieves full product details (title, price, stock, rating, specs, delivery, variants) for a given ASIN.

### Method Signature
```python
client.amazon.scrape_product(asin: str, domain: str = 'com', geo_location: str = None, parse: bool = True)
```

### Parameters
- **asin** (str) - Required - The 10-character ASIN of the product.
- **domain** (str) - Optional - The Amazon domain to scrape (e.g., 'com', 'de', 'co.uk'). Defaults to 'com'.
- **geo_location** (str) - Optional - The ZIP code for delivery location to influence results. Example: '10001'.
- **parse** (bool) - Optional - Whether to parse the results into a structured format. Defaults to True.

### Request Example
```python
result = client.amazon.scrape_product(
    "B08N5WRWNW",  # 10-character ASIN
    domain="com",
    geo_location="10001",
    parse=True,
)
```

### Response Example
```python
for r in result.results:
    c = r.content_parsed
    print(c.product_name, c.price, c.currency, c.rating, c.reviews_count)
    print(c.is_prime_eligible, c.stock)
    for spec in c.specifications.items:
        print(spec.title, spec.value)
```
```

--------------------------------

### Synchronous Scraping with RealtimeClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Initialize a synchronous client for blocking HTTP requests. Use for simple, sequential scraping tasks. All scrape methods block until the response is received.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

# Simple Google search
result = client.google.scrape_search("adidas sneakers")

print(result.raw)      # Full raw JSON dict
print(result.job.id)   # Job ID
for r in result.results:
    print(r.status_code, r.url)  # Per-page HTTP status and URL
```

--------------------------------

### Scrape Google Search with Python SDK

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Use this snippet to initiate a Google search scrape. Ensure you have initialized the RealtimeClient with your username and password.

```python
client = RealtimeClient(username, password)
result = client.google.scrape_search("football")
```

--------------------------------

### General Scraping Method

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Demonstrates the general approach to calling scraping methods for various targets. The SDK provides specific methods for each supported target.

```APIDOC
## General SDK Usage

### Description
Instantiate the `RealtimeClient` with your credentials and then call the appropriate method for the desired target and scraping task.

### Method
`client.<source>.<method>(<query>, **kwargs)`

### Example
```python
from oxylabs import RealtimeClient

username = "your_username"
password = "your_password"

client = RealtimeClient(username, password)

# Example for Amazon product scraping
# result = client.amazon.scrape_product("some_product_id")

# Example for YouTube search
# result = client.youtube.scrape_search("trending videos")

# Example for ChatGPT scraping
# result = client.chatgpt.scrape("What is the weather today?")
```

### Note on Parameters
Each source and method may accept specific query parameters. Refer to the [Web Scraper API Documentation](https://developers.oxylabs.io/scraping-solutions/web-scraper-api) for a detailed list of accepted parameters for each source.
```

--------------------------------

### Enable JavaScript Rendering with ProxyClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Use `ProxyClient.add_render_header` to enable JavaScript rendering for proxy requests. Accepted values are 'html' for rendered HTML or 'png' for a screenshot.

```python
from oxylabs import ProxyClient

proxy = ProxyClient("your_username", "your_password")
proxy.add_render_header("html")

result = proxy.get("https://www.bestbuy.com/site/searchpage.jsp?st=laptop")
print(result.text)
```

--------------------------------

### RealtimeClient - Synchronous Scraping

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Initializes a synchronous client for blocking scrape operations. All scrape methods wait for the response.

```APIDOC
## RealtimeClient

`RealtimeClient(username, password)` initializes a synchronous client. All scrape methods block until the response is received (connection TTL: 150 s). Every source is available as an attribute on the client instance.

### Usage
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

# Simple Google search
result = client.google.scrape_search("adidas sneakers")

print(result.raw)      # Full raw JSON dict
print(result.job.id)   # Job ID
for r in result.results:
    print(r.status_code, r.url)  # Per-page HTTP status and URL
```
```

--------------------------------

### Browser Instructions for JavaScript Rendering

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/README.md

Configures browser instructions for executing JavaScript, including inputting text, clicking elements, and setting wait times.
```python
client = RealtimeClient(username, password)

result = client.universal.scrape_url(
    "https://www.ebay.com/",
    render="html",
    browser_instructions=[
        {
            "type": "input",
            "value": "pizza boxes",
            "selector": {
                "type": "xpath",
                "value": "//input[@class='gh-tb ui-autocomplete-input']"
            }
        },
        {
            "type": "click",
            "selector": {
                "type": "xpath",
                "value": "//input[@type='submit']"
            }
        },
        {
            "type": "wait",
            "wait_time_s": 10
        }
    ]
)
```

--------------------------------

### Enable Parsing via Proxy with ProxyClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Activate Oxylabs' dedicated parser for proxy requests by setting `parse=True` in `ProxyClient.add_parse_header`. `parsing_instructions` can be `None` for auto-detection.

```python
from oxylabs import ProxyClient

proxy = ProxyClient("your_username", "your_password")
proxy.add_parse_header(
    parse=True,
    parsing_instructions=None  # Use dedicated parser (auto-detected by domain)
)

result = proxy.get("https://www.amazon.com/dp/B08N5WRWNW")
print(result.text)
```

--------------------------------

### Context Options

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Advanced Source Parameters. Context options pass additional source-specific parameters (e.g., language filter, tbm type, check-in dates) as a list of `{"key": ..., "value": ...}` dicts.

```APIDOC
## Context Options — Advanced Source Parameters

Context options pass additional source-specific parameters (e.g., language filter, tbm type, check-in dates) as a list of `{"key": ..., "value": ...}` dicts.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

# Google search with language filter and per-page limits
result = client.google.scrape_search(
    "python frameworks",
    parse=True,
    context=[
        {"key": "results_language", "value": "en"},
        {"key": "filter", "value": 1},
        {
            "key": "limit_per_page",
            "value": [
                {"page": 1, "limit": 10},
                {"page": 2, "limit": 5},
            ],
        },
    ],
)

# Amazon search with an additional context option
result2 = client.amazon.scrape_search(
    "coffee maker",
    parse=True,
    context=[
        {"key": "autocompletion", "value": True},
    ],
)
```
```

--------------------------------

### amazon.scrape_pricing

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves all third-party seller pricing offers for an ASIN.

```APIDOC
## amazon.scrape_pricing — Amazon Offer Listing

Retrieves all third-party seller pricing offers for an ASIN.

### Method Signature
```python
client.amazon.scrape_pricing(asin: str, domain: str = 'com', start_page: int = 1, pages: int = 1, parse: bool = True)
```

### Parameters
- **asin** (str) - Required - The 10-character ASIN of the product.
- **domain** (str) - Optional - The Amazon domain to scrape (e.g., 'com', 'de', 'co.uk'). Defaults to 'com'.
- **start_page** (int) - Optional - The starting page number for offer listings. Defaults to 1.
- **pages** (int) - Optional - The number of pages to scrape. Defaults to 1.
- **parse** (bool) - Optional - Whether to parse the results into a structured format. Defaults to True.
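
The `asin` requirement above can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `ASIN_PATTERN` and `is_valid_asin` are hypothetical names, and the pattern assumes ASINs consist of exactly 10 uppercase letters or digits.

```python
import re

# Assumption: an ASIN is exactly 10 characters, each an uppercase letter or digit.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def is_valid_asin(asin: str) -> bool:
    """Return True if `asin` matches the assumed 10-character ASIN shape."""
    return ASIN_PATTERN.fullmatch(asin) is not None


print(is_valid_asin("B08N5WRWNW"))  # the ASIN used in the examples here
print(is_valid_asin("B08N5"))       # too short
```

A check like this only catches malformed input early; whether the ASIN exists on the chosen `domain` is still decided by the API response.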

### Request Example
```python
result = client.amazon.scrape_pricing(
    "B08N5WRWNW",
    domain="com",
    start_page=1,
    pages=3,
    parse=True,
)
```

### Response Example
```python
for r in result.results:
    for offer in r.content_parsed.pricing:
        print(offer.seller, offer.price, offer.currency,
              offer.condition, offer.price_shipping)
```
```

--------------------------------

### Advanced Context Options for Google and Amazon Search

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Pass additional source-specific parameters as a list of `{"key": ..., "value": ...}` dictionaries to `RealtimeClient.google.scrape_search` and `RealtimeClient.amazon.scrape_search` for advanced filtering.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

# Google search with language filter and per-page limits
result = client.google.scrape_search(
    "python frameworks",
    parse=True,
    context=[
        {"key": "results_language", "value": "en"},
        {"key": "filter", "value": 1},
        {
            "key": "limit_per_page",
            "value": [
                {"page": 1, "limit": 10},
                {"page": 2, "limit": 5},
            ],
        },
    ],
)

# Amazon search with an additional context option
result2 = client.amazon.scrape_search(
    "coffee maker",
    parse=True,
    context=[
        {"key": "autocompletion", "value": True},
    ],
)
```

--------------------------------

### Scrape YouTube Channel Info

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes metadata and recent video listings for a YouTube channel using its handle. Set `parse=True` for structured data and adjust `limit` to control the number of videos returned (default is 20). Initialize RealtimeClient with your username and password.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_channel(
    "@Oxylabs",
    parse=True,
    limit=50,  # Number of videos to return (default: 20)
)
print(result.raw)
```

--------------------------------

### Check YouTube Video AI Training Eligibility

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Checks if a YouTube video is eligible for use in AI/ML training datasets by providing its video ID. Initialize RealtimeClient with your username and password.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_video_trainability(
    video_id="dQw4w9WgXcQ",
)
print(result.raw)
```

--------------------------------

### Oxylabs Python SDK Repository Structure

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/CLAUDE.md

Overview of the directory structure for the Oxylabs Python SDK, highlighting key modules for API interaction, source-specific implementations, and utilities.

```tree
src/oxylabs/
├── internal/
│   ├── api.py                 # Base API classes (RealtimeAPI, AsyncAPI)
│   └── client.py              # RealtimeClient, AsyncClient — registers all sources
├── sources/
│   ├── <source>/<source>.py   # Each source has sync + async class
│   ├── north_american/        # bestbuy, costco, lowes, walmart, target_store, etc.
│   ├── asian/                 # alibaba, aliexpress, flipkart, lazada, etc.
│   ├── european/              # allegro, cdiscount, idealo, mediamarkt
│   ├── latin_american/        # falabella, mercadolibre, mercadolivre, etc.
│   ├── real_estate/           # airbnb, zillow
│   └── response.py            # Shared Response class
├── utils/
│   ├── types/source.py        # Source name constants (e.g. AMAZON_SEARCH = "amazon_search")
│   └── utils.py               # Helpers (prepare_config, check_parsing_instructions_validity)
└── proxy/proxy.py
tests/sources/                 # Mirrors source structure
```

--------------------------------

### AsyncClient - Push-Pull Asynchronous Scraping

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Implements the Push-Pull integration using aiohttp for asynchronous scraping. Jobs are submitted and then polled until completion.

```APIDOC
## AsyncClient — Push-Pull Asynchronous Scraping

`AsyncClient(username, password)` implements the Push-Pull integration using `aiohttp`. Each method is a coroutine; jobs are submitted then polled at `poll_interval` until `job_completion_timeout` is reached. Multiple concurrent jobs can be submitted and awaited with `asyncio.as_completed`.

### Usage
```python
import asyncio
from oxylabs import AsyncClient

async def main():
    client = AsyncClient("your_username", "your_password")

    tasks = [
        client.google.scrape_url(
            "https://www.google.com/search?q=nike",
            parse=True,
            job_completion_timeout=60,
            poll_interval=5,
        ),
        client.amazon.scrape_search(
            "headphones",
            parse=True,
            job_completion_timeout=60,
            poll_interval=5,
        ),
    ]

    for future in asyncio.as_completed(tasks):
        result = await future
        print(result.raw)

asyncio.run(main())
```
```

--------------------------------

### youtube.scrape_download

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Downloads YouTube video or audio content to cloud storage (GCS, S3, S3-compatible). Available only via `AsyncClient`.

```APIDOC
## youtube.scrape_download — YouTube Download (Async Only)

Downloads YouTube video or audio content to cloud storage (GCS, S3, S3-compatible). Available only via `AsyncClient`.
```python
import asyncio
from oxylabs import AsyncClient

async def main():
    client = AsyncClient("your_username", "your_password")

    result = await client.youtube.scrape_download(
        query="dQw4w9WgXcQ",
        storage_type="s3",
        storage_url="my-scraping-bucket",
        context=[
            {"key": "download_type", "value": "video"},
            {"key": "video_quality", "value": "720p"},
        ],
        job_completion_timeout=300,
        poll_interval=10,
    )
    print(result.raw)

asyncio.run(main())
```
```

--------------------------------

### Custom Server-Side Parsing with RealtimeClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Define custom parsing logic using XPath or CSS selectors executed server-side via `RealtimeClient.universal.scrape_url`. Use `_fns` for function chains and `_fn` for specific operations.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.universal.scrape_url(
    "https://news.ycombinator.com/",
    parse=True,
    parsing_instructions={
        "headlines": {
            "_fns": [
                {
                    "_fn": "css",
                    "_args": [".titleline > a"]
                }
            ]
        },
        "top_score": {
            "_fns": [
                {
                    "_fn": "css_one",
                    "_args": [".score"]
                },
                {
                    "_fn": "amount_from_string",
                    "_args": None
                }
            ]
        }
    },
)

data = result.results[0].custom_content_parsed
print(data["headlines"])   # list of headline strings
print(data["top_score"])   # numeric score
```

--------------------------------

### amazon.scrape_reviews

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes customer reviews for a given ASIN.

```APIDOC
## amazon.scrape_reviews — Amazon Customer Reviews

Scrapes customer reviews for a given ASIN.

### Method Signature
```python
client.amazon.scrape_reviews(asin: str, domain: str = 'com', start_page: int = 1, pages: int = 1, parse: bool = True)
```

### Parameters
- **asin** (str) - Required - The 10-character ASIN of the product.
- **domain** (str) - Optional - The Amazon domain to scrape (e.g., 'com', 'de', 'co.uk'). Defaults to 'com'.
- **start_page** (int) - Optional - The starting page number for reviews. Defaults to 1.
- **pages** (int) - Optional - The number of pages to scrape. Defaults to 1.
- **parse** (bool) - Optional - Whether to parse the results into a structured format. Defaults to True.

### Request Example
```python
result = client.amazon.scrape_reviews(
    "B08N5WRWNW",
    domain="com",
    start_page=1,
    pages=2,
    parse=True,
)
```

### Response Example
```python
for r in result.results:
    for review in r.content_parsed.reviews:
        print(review.author, review.rating, review.title)
        print(review.content[:100])
        print(review.timestamp, review.is_verified)
```
```

--------------------------------

### Asynchronous Scraping with AsyncClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Implement Push-Pull asynchronous scraping using aiohttp. Jobs are submitted and then polled until completion. Suitable for concurrent scraping tasks.

```python
import asyncio
from oxylabs import AsyncClient

async def main():
    client = AsyncClient("your_username", "your_password")

    tasks = [
        client.google.scrape_url(
            "https://www.google.com/search?q=nike",
            parse=True,
            job_completion_timeout=60,
            poll_interval=5,
        ),
        client.amazon.scrape_search(
            "headphones",
            parse=True,
            job_completion_timeout=60,
            poll_interval=5,
        ),
    ]

    for future in asyncio.as_completed(tasks):
        result = await future
        print(result.raw)

asyncio.run(main())
```

--------------------------------

### Scrape Amazon Offer Listings

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Fetches all third-party seller pricing offers for a given ASIN. Supports pagination, domain selection, and structured data parsing. Requires a 10-character ASIN.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.amazon.scrape_pricing(
    "B08N5WRWNW",
    domain="com",
    start_page=1,
    pages=3,
    parse=True,
)

for r in result.results:
    for offer in r.content_parsed.pricing:
        print(offer.seller, offer.price, offer.currency,
              offer.condition, offer.price_shipping)
```

--------------------------------

### Synchronous Scrape Method Signature

Source: https://github.com/oxylabs/oxylabs-sdk-python/blob/main/CLAUDE.md

Illustrates the signature for a synchronous scrape method, including required and optional parameters, and how to prepare configuration and payload for an API request. Note that `request_timeout` is always the last named parameter before `**kwargs`.

```python
def scrape_search(
    self,
    query: str,                              # required param
    domain: Optional[str] = None,            # optional params
    locale: Optional[str] = None,
    geo_location: Optional[str] = None,
    user_agent_type: Optional[str] = None,
    render: Optional[str] = None,
    callback_url: Optional[str] = None,
    context: Optional[list] = None,
    parse: Optional[bool] = None,
    parsing_instructions: Optional[dict] = None,
    request_timeout: Optional[int] = 165,    # always last named param
    **kwargs                                 # always at the end
) -> Response:
    config = prepare_config(request_timeout=request_timeout)
    payload = {
        "source": source.AMAZON_SEARCH,
        "query": query,
        "domain": domain,
        # ... all params ...
        **kwargs,
    }
    check_parsing_instructions_validity(parsing_instructions)
    api_response = self._api_instance.get_response(payload, config)
    return Response(api_response)
```

--------------------------------

### Scrape eBay with JS Rendering and Browser Instructions

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes a URL with JavaScript rendering enabled and provides specific browser instructions for interacting with the page, such as inputting text and clicking elements. Use this for dynamic websites requiring browser automation.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.universal.scrape_url(
    "https://www.ebay.com/",
    render="html",
    geo_location="United States",
    # Custom parsing instructions only take effect when parsing is enabled.
    parse=True,
    browser_instructions=[
        {
            "type": "input",
            "value": "vintage camera",
            "selector": {
                "type": "xpath",
                "value": "//input[@id='gh-ac']"
            }
        },
        {
            "type": "click",
            "selector": {
                "type": "xpath",
                "value": "//input[@id='gh-btn']"
            }
        },
        {
            "type": "wait",
            "wait_time_s": 5
        }
    ],
    parsing_instructions={
        "listing_titles": {
            "_fns": [
                {
                    "_fn": "xpath",
                    "_args": ["//h3[@class='s-item__title']/text()"]
                }
            ]
        }
    },
)
print(result.results[0].custom_content_parsed)
```

--------------------------------

### ProxyClient.add_render_header

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Enables JavaScript rendering for proxy requests. Accepted values: `"html"` (rendered HTML) or `"png"` (screenshot).

```APIDOC
## ProxyClient.add_render_header — JavaScript Rendering

Enables JavaScript rendering for proxy requests. Accepted values: `"html"` (rendered HTML) or `"png"` (screenshot).

```python
from oxylabs import ProxyClient

proxy = ProxyClient("your_username", "your_password")
proxy.add_render_header("html")

result = proxy.get("https://www.bestbuy.com/site/searchpage.jsp?st=laptop")
print(result.text)
```
```

--------------------------------

### Scrape Amazon Product Details

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves comprehensive product details for a specific ASIN. Supports domain selection, geo-location, and structured data parsing. Requires a 10-character ASIN.
```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.amazon.scrape_product(
    "B08N5WRWNW",  # 10-character ASIN
    domain="com",
    geo_location="10001",
    parse=True,
)

for r in result.results:
    c = r.content_parsed
    print(c.product_name, c.price, c.currency, c.rating, c.reviews_count)
    print(c.is_prime_eligible, c.stock)
    for spec in c.specifications.items:
        print(spec.title, spec.value)
```

--------------------------------

### Scrape YouTube Autocomplete Suggestions

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves YouTube autocomplete keyword suggestions for a given search term. Specify `location` and `language` for targeted suggestions. Initialize RealtimeClient with your username and password.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_autocomplete(
    "python web scra",
    location="US",
    language="en",
)
print(result.raw)
```

--------------------------------

### google.scrape_ai_mode

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes Google AI Mode responses for a prompt/question. Requires JavaScript rendering (defaults to `render="html"`). Query must be fewer than 400 characters.

```APIDOC
## google.scrape_ai_mode — Google AI Mode

Scrapes Google AI Mode responses for a prompt/question. Requires JavaScript rendering (defaults to `render="html"`). Query must be fewer than 400 characters.

### Method Signature
```python
client.google.scrape_ai_mode(query: str, geo_location: str = None, parse: bool = False)
```

### Parameters
- **query** (str): The prompt or question for Google AI Mode (fewer than 400 characters).
- **geo_location** (str, optional): The geographical location for the search.
- **parse** (bool, optional): Whether to parse the results. Defaults to False.
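
The length limit on `query` can be enforced on the client side before a job is submitted. This is a rough sketch under stated assumptions: `MAX_AI_MODE_QUERY_CHARS` and `check_ai_mode_query` are hypothetical names that do not come from the SDK, and the check takes the strict reading of the limit (fewer than 400 characters).

```python
# Hypothetical client-side guard; the SDK or API may enforce the limit differently.
MAX_AI_MODE_QUERY_CHARS = 400


def check_ai_mode_query(query: str) -> str:
    """Return the query unchanged, or raise ValueError if it is empty or too long."""
    if not query:
        raise ValueError("query must not be empty")
    if len(query) >= MAX_AI_MODE_QUERY_CHARS:
        raise ValueError(
            f"query has {len(query)} characters; it must have fewer than "
            f"{MAX_AI_MODE_QUERY_CHARS}"
        )
    return query


prompt = check_ai_mode_query("What are the best practices for Python packaging?")
print(prompt)
```

The returned string can then be passed as the first argument to `client.google.scrape_ai_mode`.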

### Request Example

```python
result = client.google.scrape_ai_mode(
    "What are the best practices for Python packaging?",
    geo_location="United States",
    parse=True,
)
```

### Response Example

```python
print(result.raw)
```
```

--------------------------------

### Scrape Amazon Q&A Section

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Retrieves customer questions and answers for a given ASIN. Supports domain selection and structured data parsing. Requires a 10-character ASIN.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.amazon.scrape_questions(
    "B08N5WRWNW",
    domain="com",
    parse=True,
)

for r in result.results:
    q = r.content_parsed.questions
    print(q.title, q.votes)
    for answer in q.answers:
        print(answer.author, answer.content)
```

--------------------------------

### Scrape YouTube Search Results (Extended)

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes up to 700 YouTube search results for a given query. Supports filtering for 4K content and sorting by relevance. Initialize RealtimeClient with your username and password.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_search_max(
    "machine learning",
    sort_by="relevance",
    filter_4k=True,
)
print(len(result.raw.get("results", [])))
```

--------------------------------

### universal.scrape_url

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes any URL on the web. Supports JavaScript rendering, browser instructions (input, click, wait, scroll), custom parsing instructions, and content encoding for binary downloads.

```APIDOC
## universal.scrape_url: Universal Scraper

Scrapes any URL on the web. Supports JavaScript rendering, browser instructions (input, click, wait, scroll), custom parsing instructions, and content encoding for binary downloads.

### Method Signature

```python
client.universal.scrape_url(url: str, render: str = None, geo_location: str = None, parse: bool = True, parsing_instructions: dict = None, browser_instructions: list = None, encoding: str = None)
```

### Parameters

- **url** (str) - Required - The URL to scrape.
- **render** (str) - Optional - Enables JavaScript rendering when set (e.g., `"html"`).
- **geo_location** (str) - Optional - The geographical location to use for the request.
- **parse** (bool) - Optional - Whether to parse the results into a structured format. Defaults to True.
- **parsing_instructions** (dict) - Optional - Custom parsing instructions: a dictionary mapping each output field to a list of `_fns` steps (XPath or CSS).
- **browser_instructions** (list) - Optional - A list of instruction dictionaries for browser automation (e.g., `[{"type": "click", "selector": {"type": "xpath", "value": "//input[@id='gh-btn']"}}, {"type": "wait", "wait_time_s": 5}]`).
- **encoding** (str) - Optional - Content encoding for binary downloads (e.g., 'base64').

### Request Example

```python
result = client.universal.scrape_url(
    "https://example.com",
    render="html",
    browser_instructions=[
        {"type": "wait", "wait_time_s": 2},
    ],
    parse=True,
)
```

### Response Example

```python
print(result.results[0].content)
```
```

--------------------------------

### Universal URL Scraper

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes any URL on the web, supporting JavaScript rendering, browser instructions, custom parsing, and binary downloads. Requires initialization of the RealtimeClient.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")
```

--------------------------------

### Custom Parsing Instructions

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Server-side XPath / CSS extraction. Custom parsing instructions let you define XPath or CSS extraction logic that is executed server-side on the raw HTML. The instructions are validated locally via `check_parsing_instructions_validity` before the request is sent.
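To make the shape of a parsing-instructions dictionary concrete, here is a minimal sketch of a structural check. It is not the SDK's `check_parsing_instructions_validity`, whose exact behaviour is not described here. `SUPPORTED_FNS` and `find_unsupported_fns` are hypothetical names, the set mirrors the supported `_fn` values documented for this feature, and only top-level fields are inspected. The `to_number` step is deliberately invalid.

```python
# Hypothetical structural check, not the SDK's own validator.
SUPPORTED_FNS = frozenset({
    "xpath", "xpath_one", "css", "css_one", "element_text", "length",
    "convert_to_float", "convert_to_int", "convert_to_str", "max", "min",
    "product", "amount_from_string", "amount_range_from_string",
    "regex_find_all", "regex_search", "regex_substring", "join",
    "select_nth", "average",
})


def find_unsupported_fns(instructions: dict) -> list:
    """Return (field, fn_name) pairs whose `_fn` is not in SUPPORTED_FNS."""
    problems = []
    for field, spec in instructions.items():
        # Each field maps to a pipeline of steps under the "_fns" key.
        for step in spec.get("_fns", []):
            name = step.get("_fn")
            if name not in SUPPORTED_FNS:
                problems.append((field, name))
    return problems


instructions = {
    "headlines": {"_fns": [{"_fn": "css", "_args": [".titleline > a"]}]},
    "top_score": {
        "_fns": [
            {"_fn": "css_one", "_args": [".score"]},
            {"_fn": "to_number", "_args": None},  # not a supported name
        ]
    },
}
print(find_unsupported_fns(instructions))  # [('top_score', 'to_number')]
```

A check of this kind catches misspelled function names early; argument validation and nested fields are left to the SDK and the server.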

```APIDOC
## Custom Parsing Instructions: Server-Side XPath / CSS Extraction

Custom parsing instructions let you define XPath or CSS extraction logic executed server-side on the raw HTML. Validated locally before sending via `check_parsing_instructions_validity`.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.universal.scrape_url(
    "https://news.ycombinator.com/",
    parse=True,
    parsing_instructions={
        "headlines": {
            "_fns": [
                {
                    "_fn": "css",
                    "_args": [".titleline > a"]
                }
            ]
        },
        "top_score": {
            "_fns": [
                {
                    "_fn": "css_one",
                    "_args": [".score"]
                },
                {
                    "_fn": "amount_from_string",
                    "_args": None
                }
            ]
        }
    },
)

data = result.results[0].custom_content_parsed
print(data["headlines"])
print(data["top_score"])
```

**Supported `_fn` values:** `xpath`, `xpath_one`, `css`, `css_one`, `element_text`, `length`, `convert_to_float`, `convert_to_int`, `convert_to_str`, `max`, `min`, `product`, `amount_from_string`, `amount_range_from_string`, `regex_find_all`, `regex_search`, `regex_substring`, `join`, `select_nth`, `average`
```

--------------------------------

### amazon.scrape_questions

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes the Q&A (customer questions and answers) section for an ASIN.

```APIDOC
## amazon.scrape_questions: Amazon Q&A

Scrapes the Q&A (customer questions and answers) section for an ASIN.

### Method Signature

```python
client.amazon.scrape_questions(asin: str, domain: str = 'com', parse: bool = True)
```

### Parameters

- **asin** (str) - Required - The 10-character ASIN of the product.
- **domain** (str) - Optional - The Amazon domain to scrape (e.g., 'com', 'de', 'co.uk'). Defaults to 'com'.
- **parse** (bool) - Optional - Whether to parse the results into a structured format. Defaults to True.

### Request Example

```python
result = client.amazon.scrape_questions(
    "B08N5WRWNW",
    domain="com",
    parse=True,
)
```

### Response Example

```python
for r in result.results:
    q = r.content_parsed.questions
    print(q.title, q.votes)
    for answer in q.answers:
        print(answer.author, answer.content)
```
```

--------------------------------

### ProxyClient - Proxy Endpoint Integration

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Wraps a requests.Session to route traffic through the Oxylabs proxy endpoint. Allows configuration of headers for user-agent, geo-location, rendering, and parsing.

```APIDOC
## ProxyClient: Proxy Endpoint Integration

`ProxyClient(username, password)` wraps a `requests.Session` configured to route all traffic through the Oxylabs proxy endpoint. Optional headers control user-agent type, geo-location, rendering, and parsing. Use `.get(url)` to scrape any URL.

### Usage

```python
from oxylabs import ProxyClient

proxy = ProxyClient("your_username", "your_password")

# Optional: configure scraping behaviour via headers
proxy.add_user_agent_header("desktop_chrome")  # UA type
proxy.add_geo_location_header("Germany")       # Geo-target
proxy.add_render_header("html")                # JS rendering
proxy.add_parse_header(parse=True)             # Enable parser

result = proxy.get("https://www.example.com", request_timeout=60)
if result:
    print(result.text)
    print(result.status_code)
```
```

--------------------------------

### Scrape YouTube Search Results (Standard)

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes up to 20 YouTube search results for a given query. Supports filtering by upload date, content type, duration, HD, subtitles, and sort order. Initialize RealtimeClient with your username and password.

```python
from oxylabs import RealtimeClient

client = RealtimeClient("your_username", "your_password")

result = client.youtube.scrape_search(
    "python tutorials 2025",
    upload_date="this_year",
    type="video",
    duration="4-20",
    sort_by="view_count",
    hd=True,
    subtitles=True,
)
print(result.raw)
```

--------------------------------

### google.scrape_images

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Scrapes Google Images for a given query by setting the `tbm=isch` context automatically.

```APIDOC
## google.scrape_images: Google Image Search

Scrapes Google Images for a given query by setting the `tbm=isch` context automatically.

### Method Signature

```python
client.google.scrape_images(query: str, pages: int = 1, geo_location: str = None, parse: bool = False)
```

### Parameters

- **query** (str): The search query for images.
- **pages** (int, optional): The number of pages to scrape. Defaults to 1.
- **geo_location** (str, optional): The geographical location for the search.
- **parse** (bool, optional): Whether to parse the results. Defaults to False.

### Request Example

```python
result = client.google.scrape_images(
    "mountain landscape photography",
    pages=2,
    geo_location="United States",
    parse=True,
)
```

### Response Example

```python
for r in result.results:
    for img in r.content_parsed.results.images.items:
        print(img.url, img.alt)
```
```

--------------------------------

### Set User Agent Header with ProxyClient

Source: https://context7.com/oxylabs/oxylabs-sdk-python/llms.txt

Control the browser user-agent for proxy requests using `ProxyClient.add_user_agent_header`. Ensure the provided value is one of the supported types.

```python
from oxylabs import ProxyClient

proxy = ProxyClient("your_username", "your_password")

# Valid values: "desktop", "desktop_chrome", "desktop_edge", "desktop_firefox",
#               "desktop_opera", "desktop_safari", "mobile", "mobile_android",
#               "mobile_ios", "tablet", "tablet_android", "tablet_ios"
proxy.add_user_agent_header("mobile_android")

result = proxy.get("https://www.amazon.com/s?k=laptop")
print(result.text)
```
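
Since `add_user_agent_header` expects one of the supported types, a small local check can catch a misspelled value before any request is sent. The sketch below is not part of the SDK: `SUPPORTED_USER_AGENT_TYPES` and `validate_user_agent_type` are hypothetical names, and the set simply mirrors the values listed in the comment above.

```python
# Hypothetical helper, not part of the oxylabs SDK. The set mirrors the
# user-agent types listed in the snippet above.
SUPPORTED_USER_AGENT_TYPES = frozenset({
    "desktop", "desktop_chrome", "desktop_edge", "desktop_firefox",
    "desktop_opera", "desktop_safari", "mobile", "mobile_android",
    "mobile_ios", "tablet", "tablet_android", "tablet_ios",
})


def validate_user_agent_type(value: str) -> str:
    """Return `value` unchanged if it is a listed type, else raise ValueError."""
    if value not in SUPPORTED_USER_AGENT_TYPES:
        allowed = ", ".join(sorted(SUPPORTED_USER_AGENT_TYPES))
        raise ValueError(
            f"Unsupported user-agent type {value!r}; expected one of: {allowed}"
        )
    return value


print(validate_user_agent_type("mobile_android"))
```

With this in place, a call such as `proxy.add_user_agent_header(validate_user_agent_type("mobile_android"))` would fail fast on a typo instead of sending a request with an unsupported header value.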