### Install zyte-api using pip

Source: https://github.com/zytedata/python-zyte-api/blob/main/README.rst

Installs the zyte-api Python package. To pay for requests through the x402 protocol, install the optional `x402` extra as well. Check the package's Python version requirements before installing.

```shell
pip install zyte-api
```

```shell
pip install "zyte-api[x402]"
```

--------------------------------

### Input File: List of URLs for Zyte API Requests

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

This example shows a plain-text input file with one URL per line. The zyte-api command-line client sends one request per URL, with the `browserHtml` request parameter set to `true`.

```none
https://books.toscrape.com
https://quotes.toscrape.com
```

--------------------------------

### Run Tests with Tox

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/contributing.rst

This command uses tox to run the test suite across different Python versions and to type-check the code with mypy. Install tox in your environment before running it.

```shell
tox
```

--------------------------------

### Synchronous single API request with ZyteAPI

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Demonstrates a single synchronous request with the `ZyteAPI` client: create the client, then call its `get` method with a dictionary of request parameters (here, the target URL and `httpResponseBody`). This approach suits simple scripts and debugging.

```python
from zyte_api import ZyteAPI

# Without api_key, the client reads the key from the ZYTE_API_KEY environment variable.
client = ZyteAPI()
result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
```

--------------------------------

### Input File: JSON Lines Format for Zyte API Requests

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

This example shows an input file in JSON Lines format. Each line is a JSON object of Zyte API request parameters, which allows per-request customization, such as setting `geolocation` or requesting `httpResponseBody` instead of `browserHtml`.

```json
{"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
{"url": "https://b.example", "httpResponseBody": true}
{"url": "https://books.toscrape.com", "productNavigation": true}
```

--------------------------------

### Asynchronous single API request with AsyncZyteAPI

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Illustrates a single asynchronous request with the `AsyncZyteAPI` client inside an asyncio event loop: create the client, then `await` its `get` method. This approach suits coroutine-based applications.

```python
import asyncio

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    result = await client.get({"url": "https://toscrape.com", "httpResponseBody": True})


asyncio.run(main())
```

--------------------------------

### Retry Network Errors with Brotli Compression - Python

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry notes that the zyte-api Python client retries network errors for up to 15 minutes and supports Brotli-compressed responses, which requires the `Brotli` package to be installed. Both behaviors are built into the client, so requests need no extra configuration.

```python
from zyte_api import ZyteAPI

# Network errors are retried automatically for up to 15 minutes.
# Brotli-compressed responses are supported once the Brotli package is installed.
client = ZyteAPI()
try:
    result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
except Exception as e:
    # Reached only if the request still fails after the retries are exhausted.
    print(f"Request failed after retries: {e}")
```

--------------------------------

### Command-line Client Usage

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

Basic usage of the zyte-api command-line client, specifying an input file and an output file.

````APIDOC
## zyte-api command-line client

### Description
The `zyte-api` command-line client reads a plain-text list of URLs or a JSON Lines file of Zyte API request parameters, sends the requests to Zyte API, and saves the results to an output file.

### Arguments
- **input_file** (string) - Required - Path to the input file (either a plain-text URL list or a JSON Lines file).

### Options
- **--output** / **-o** (string) - Optional - Path to the output file. If not specified, output is printed to standard output.
- **--n-conn** (integer) - Optional - Number of concurrent connections to use for requests. Defaults to 20.
- **--shuffle** (flag) - Optional - Randomizes the request order, which helps distribute load when targeting multiple websites.
- **--dont-retry-errors** (flag) - Optional - Disables retries of error responses; rate-limiting responses are still retried.
- **--store-errors** (flag) - Optional - Includes error responses in the output file.

### Example
```shell
zyte-api urls.txt --output result.jsonl
```

### Output
- **output_data** (JSON Lines) - Each line contains a JSON object with a response from Zyte API.

#### Output Example
```json
{"url": "https://example.com", "browserHtml": "..."}
{"url": "https://another.com", "httpResponseBody": "..."}
```
````

--------------------------------

### Basic zyte-api Command with Input and Output Files

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

This command shows basic usage of the zyte-api command-line client: it reads target URLs from an input file and writes the results to an output file in JSON Lines format.

```shell
zyte-api urls.txt --output result.jsonl
```

--------------------------------

### Optimization and Concurrency

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

Explains how to optimize performance with concurrent connections and request shuffling.

````APIDOC
## Optimization and Concurrency

### Description
This section covers how to optimize the performance of the `zyte-api` command-line client by adjusting the number of concurrent connections and randomizing the request order.

### Concurrent Connections (`--n-conn`)
- **Purpose**: Controls the number of simultaneous requests sent to Zyte API.
- **Default**: 20 connections.
- **Usage**: Use the `--n-conn` switch followed by the desired number of connections.

**Example:**
```shell
zyte-api --n-conn 40 urls.txt --output result.jsonl
```

### Request Shuffling (`--shuffle`)
- **Purpose**: Randomizes the order of requests. This is particularly useful when you target multiple websites and your input file is sorted by domain, because it distributes the load more evenly.
- **Usage**: Add the `--shuffle` option to your command.

**Example:**
```shell
zyte-api urls.txt --shuffle --output result.jsonl
```
````

--------------------------------

### Use Zyte API synchronous Python client

Source: https://github.com/zytedata/python-zyte-api/blob/main/README.rst

Demonstrates basic usage of the synchronous Zyte API client in Python. It initializes the client with an API key and sends a request that fetches a web page's response body.

```python
from zyte_api import ZyteAPI

client = ZyteAPI(api_key="YOUR_API_KEY")
response = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
```

--------------------------------

### Input File Formats

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

Describes the two accepted input file formats: plain-text URL lists and JSON Lines files.

````APIDOC
## Input File Formats

### Description
The zyte-api command-line tool accepts two types of input files: a plain-text file with URLs, or a JSON Lines file with Zyte API request parameters.

### Plain-text URL List
- Each line in the file contains a single URL.
- For each URL, a Zyte API request is sent with `request:browserHtml` set to `True` by default.

**Example:**
```text
https://books.toscrape.com
https://quotes.toscrape.com
```

### JSON Lines File
- Each line in the file is a JSON object of Zyte API request parameters.
- This allows finer control over each request, with parameters such as `url`, `browserHtml`, `geolocation`, `httpResponseBody`, and `productNavigation`.

**Example:**
```json
{"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
{"url": "https://b.example", "httpResponseBody": true}
{"url": "https://books.toscrape.com", "productNavigation": true}
```
````

--------------------------------

### Set Ethereum Private Key (Environment Variable)

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/x402.rst

This snippet shows how to set the Ethereum private key as an environment variable in the Windows command prompt and in macOS or Linux shells. This is the recommended method for authorizing payments when using the x402 protocol.
On Windows (Command Prompt):

```shell
set ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
```

On macOS and Linux:

```shell
export ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY
```

--------------------------------

### Use Zyte API command-line client

Source: https://github.com/zytedata/python-zyte-api/blob/main/README.rst

Runs Zyte API requests from the command line. The client needs an input file with a list of URLs, one per line, and your API key. It saves the output to a JSON Lines file.

Contents of `url-list.txt`:

```none
https://books.toscrape.com
https://quotes.toscrape.com
```

```shell
zyte-api url-list.txt --api-key YOUR_API_KEY --output results.jsonl
```

--------------------------------

### Pass Ethereum Private Key Directly (Command-Line)

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/x402.rst

This snippet shows how to pass your Ethereum private key directly to the zyte-api command-line client with the `--eth-key` switch. It is an alternative to authorizing payments through an environment variable.

```shell
zyte-api --eth-key YOUR_ETH_PRIVATE_KEY …
```

--------------------------------

### Initialize ZyteAPI Client with Aggressive Retrying

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Demonstrates how to create a `ZyteAPI` client with the predefined `aggressive_retrying` policy. This policy doubles the retry attempts for every retry scenario.

```python
from zyte_api import ZyteAPI, aggressive_retrying

client = ZyteAPI(retrying=aggressive_retrying)
```

--------------------------------

### Output File Format

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

Details the format of the output file generated by the zyte-api command-line client.

````APIDOC
## Output File Format

### Description
The output file generated by the `zyte-api` command-line tool is in JSON Lines format. Each line contains a JSON object representing a response from Zyte API.

### Format
- **JSON Lines**: Each line is a valid JSON object.
- **Content**: Each JSON object corresponds to the result of a single request sent to Zyte API.

**Example:**
```json
{"url": "https://example.com", "browserHtml": "...", "echoData": "https://example.com"}
{"url": "https://another.com", "httpResponseBody": "...", "echoData": "https://another.com"}
```

### Ordering
Because requests are processed concurrently, responses in the output file may appear in a different order from the requests in the input file. To map each response back to its request, set the `request:echoData` parameter in your requests; Zyte API includes that value in the corresponding response.
````

--------------------------------

### Synchronous multiple API requests with ZyteAPI session

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Shows how to send multiple synchronous requests efficiently through a `ZyteAPI` session. The code creates the client, opens a session context, defines a list of queries, and iterates over the results or exceptions yielded by `session.iter`. Sessions are recommended for better performance when sending many requests.

```python
from zyte_api import ZyteAPI, RequestError

client = ZyteAPI()
with client.session() as session:
    queries = [
        {"url": "https://toscrape.com", "httpResponseBody": True},
        {"url": "https://books.toscrape.com", "httpResponseBody": True},
    ]
    for result_or_exception in session.iter(queries):
        if isinstance(result_or_exception, dict):
            ...  # Successful response: a dict of Zyte API response data.
        elif isinstance(result_or_exception, RequestError):
            ...  # Unsuccessful Zyte API response.
        else:
            assert isinstance(result_or_exception, Exception)
            ...  # Any other error, such as a network error.
```

--------------------------------

### Python Zyte API: New Synchronous and Asynchronous Interfaces

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This snippet demonstrates the new `ZyteAPI` and `AsyncZyteAPI` classes, which provide cleaner synchronous and asynchronous interfaces. It also shows how to replace deprecated functions such as `zyte_api.aio.client.create_session` with the new `AsyncZyteAPI.session` method.
```python
from zyte_api import AsyncZyteAPI, ZyteAPI

# Synchronous client and session
sync_client = ZyteAPI()
with sync_client.session() as session:
    ...  # Send requests with session.get() or session.iter().


# Asynchronous client and session (used inside a coroutine)
async def main():
    async_client = AsyncZyteAPI()
    async with async_client.session() as async_session:
        ...  # Send requests with await async_session.get() or async_session.iter().


# Deprecated way (replace with AsyncZyteAPI.session())
# from zyte_api.aio.client import create_session
# aiohttp_session = create_session()
```

--------------------------------

### Pass Ethereum Private Key Directly (Python Client)

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/x402.rst

This Python code shows how to pass your Ethereum private key directly to the `ZyteAPI` and `AsyncZyteAPI` client classes when you instantiate them. The key authorizes payments when you use the x402 protocol.

```python
from zyte_api import ZyteAPI

client = ZyteAPI(eth_key="YOUR_ETH_PRIVATE_KEY")
```

```python
from zyte_api import AsyncZyteAPI

client = AsyncZyteAPI(eth_key="YOUR_ETH_PRIVATE_KEY")
```

--------------------------------

### Error Handling and Retries

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

Details how the zyte-api client handles errors and retries, and how to configure that behavior.

````APIDOC
## Errors and Retries

### Description
This section explains how the `zyte-api` command-line client automatically retries requests after various errors, and how to customize this behavior.

### Automatic Retries
- The client automatically retries rate-limiting responses, unsuccessful responses, and network errors.
- These retries follow the client's default retry policy.

### Disabling Error Retries (`--dont-retry-errors`)
- **Purpose**: Disables retries of error responses, so that only rate-limiting responses are retried.
- **Usage**: Use the `--dont-retry-errors` flag.

**Example:**
```shell
zyte-api --dont-retry-errors urls.txt --output result.jsonl
```

### Storing Errors (`--store-errors`)
- **Purpose**: Includes error responses in the output file, in addition to logging them to standard error.
- **Usage**: Use the `--store-errors` flag.

**Example:**
```shell
zyte-api --store-errors urls.txt --output result.jsonl
```
````

--------------------------------

### Pass API Key to Zyte API Python Client

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/key.rst

Instantiate Zyte API client objects with the API key passed directly through the `api_key` parameter. This is an alternative to environment variables and requires you to provide the key explicitly when you initialize either the synchronous or the asynchronous client.

```python
from zyte_api import ZyteAPI

client = ZyteAPI(api_key="YOUR_API_KEY")
```

```python
from zyte_api import AsyncZyteAPI

client = AsyncZyteAPI(api_key="YOUR_API_KEY")
```

--------------------------------

### zyte-api with Shuffle Option for Randomized Requests

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

This command uses the `--shuffle` option to randomize the order in which the zyte-api client sends requests. This is particularly useful when targeting multiple websites, because it distributes the load more evenly.

```shell
zyte-api urls.txt --shuffle …
```

--------------------------------

### Asynchronous multiple API requests with AsyncZyteAPI session

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Demonstrates sending multiple asynchronous requests concurrently through an `AsyncZyteAPI` session. The code creates the client, opens an async session, defines the queries, and iterates over the futures returned by `session.iter`, awaiting each one and handling `RequestError` or any other exception.
```python
import asyncio

from zyte_api import AsyncZyteAPI, RequestError


async def main():
    client = AsyncZyteAPI()
    async with client.session() as session:
        queries = [
            {"url": "https://toscrape.com", "httpResponseBody": True},
            {"url": "https://books.toscrape.com", "httpResponseBody": True},
        ]
        for future in session.iter(queries):
            try:
                result = await future
            except RequestError as e:
                ...  # Unsuccessful Zyte API response.
            except Exception as e:
                ...  # Any other error, such as a network error.


asyncio.run(main())
```

--------------------------------

### Use Zyte API asynchronous Python client

Source: https://github.com/zytedata/python-zyte-api/blob/main/README.rst

Shows how to use the asynchronous Zyte API client inside an asyncio event loop. The code initializes the client and sends a non-blocking request for a web page's response body.

```python
import asyncio

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI(api_key="YOUR_API_KEY")
    response = await client.get(
        {"url": "https://toscrape.com", "httpResponseBody": True}
    )


asyncio.run(main())
```

--------------------------------

### Set API Key Environment Variable (Shell)

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/key.rst

Set the `ZYTE_API_KEY` environment variable for use by Zyte API clients. This is the recommended method, because both the command-line client and the Python clients read the key from it. Replace `YOUR_API_KEY` with your actual API key.

On Windows (Command Prompt):

```shell
set ZYTE_API_KEY=YOUR_API_KEY
```

On macOS and Linux:

```shell
export ZYTE_API_KEY=YOUR_API_KEY
```

--------------------------------

### Configuring concurrent connections for ZyteAPI client

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/api.rst

Shows how to set the number of concurrent connections for the `ZyteAPI` and `AsyncZyteAPI` clients by passing the `n_conn` parameter at initialization, so that you can tune concurrency for performance. The setting applies to every method call made through that client instance.
```python
from zyte_api import AsyncZyteAPI, ZyteAPI

# Allow up to 30 concurrent connections for all calls made through this client.
client = ZyteAPI(n_conn=30)

# The asynchronous client accepts the same parameter.
async_client = AsyncZyteAPI(n_conn=30)
```

--------------------------------

### Pass API Key to Zyte API Command-Line Client

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/key.rst

Provide your API key directly to the zyte-api command-line client with the `--api-key` switch. This allows ad-hoc usage without configuring an environment variable first.

```shell
zyte-api --api-key YOUR_API_KEY …
```

--------------------------------

### Python Zyte API: Custom User Agent for AsyncClient

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This snippet shows how to set a custom user agent string when initializing the `AsyncClient` of the Python Zyte API client. The user agent identifies your client in the requests it sends to Zyte API.

```python
from zyte_api.aio.client import AsyncClient

custom_user_agent = "MyZyteCrawler/1.0"
client = AsyncClient(user_agent=custom_user_agent)

# Or using the new AsyncZyteAPI
# from zyte_api import AsyncZyteAPI
# async_client = AsyncZyteAPI(user_agent=custom_user_agent)
```

--------------------------------

### Improve Command-Line Script Usability - Python

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry describes usability improvements to the command-line interface of the zyte-api Python client. The script can now be run as `zyte-api` instead of `python -m zyte_api`, and it guesses the input file type from the file extension (`.jl`, `.jsonl`, or `.txt`) and content.
```bash
# Previous usage:
# python -m zyte_api --intype jsonl input.jsonl

# New usage:
# zyte-api --intype jsonl input.jsonl

# Type guessing based on extension/content:
# zyte-api input.jl
# zyte-api input.jsonl
# zyte-api input.txt
```

--------------------------------

### zyte-api with Custom Number of Connections

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

This command shows how to change the number of concurrent connections used by the zyte-api client. Increasing this value can improve throughput when sending many requests.

```shell
zyte-api --n-conn 40 …
```

--------------------------------

### Retry Temporary Download Errors - Python

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry notes that the zyte-api Python client now retries temporary download errors 3 times by default. Previous releases did not retry these errors, so such downloads could fail outright.

```python
from zyte_api import ZyteAPI

# Temporary download errors are retried 3 times by default,
# which makes downloads more reliable.
client = ZyteAPI()
try:
    result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
except Exception as e:
    # Reached only if the download still fails after the retries.
    print(f"Download failed after retries: {e}")
```

--------------------------------

### Python Zyte API: Custom Retrying Policy for AsyncClient

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This snippet shows how to pass a custom retrying policy object to the `AsyncClient` constructor. A custom policy gives you finer control over which errors or conditions the client retries, and how.
```python
from zyte_api.aio.client import AsyncClient
from zyte_api.retry import RetryFactory


# Subclass RetryFactory and override only the retry settings you want to
# change; everything else keeps the library defaults.
class CustomRetryFactory(RetryFactory):
    pass


# build() turns the factory into a retrying policy to pass as `retrying`.
custom_policy = CustomRetryFactory().build()
client = AsyncClient(retrying=custom_policy)

# Or using the new AsyncZyteAPI
# from zyte_api import AsyncZyteAPI
# async_client = AsyncZyteAPI(retrying=custom_policy)
```

--------------------------------

### zyte-api to Disable Error Retries and Store Errors

Source: https://github.com/zytedata/python-zyte-api/blob/main/docs/use/cli.rst

The first command disables automatic retries of error responses with the `--dont-retry-errors` flag, while rate-limiting responses are still retried. The second command uses the `--store-errors` flag to include error responses in the output file; otherwise they are only logged to standard error.

```shell
zyte-api --dont-retry-errors …
```

```shell
zyte-api --store-errors …
```

--------------------------------

### Deprecating Temporary Download Error Methods in Python Zyte API

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry covers the `temporary_download_error_stop()` and `temporary_download_error_wait()` methods of the `RetryFactory` class, which were restored for backward compatibility and deprecated at the same time. Treat them as legacy methods.

```python
from zyte_api.retry import RetryFactory

retry_factory = RetryFactory()
# Deprecated methods, kept only for backward compatibility:
# retry_factory.temporary_download_error_stop()
# retry_factory.temporary_download_error_wait()
```

--------------------------------

### Update AggStats Class - Python

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry describes internal refactoring of the `AggStats` class in the zyte-api Python client: `n_extracted_queries` and `n_input_queries` were removed, `n_results` was renamed to `n_success`, and `n_processed` was added. The change is backward incompatible for code that accesses these attributes directly.

```python
class AggStats:
    def __init__(self):
        # self.n_extracted_queries is removed.
        # self.n_results is renamed to self.n_success.
        self.n_success = 0
        # self.n_input_queries is removed.
        self.n_processed = 0
        # Other methods and attributes would follow.
```

--------------------------------

### Treat ClientConnectorError as Network Error - Python

Source: https://github.com/zytedata/python-zyte-api/blob/main/CHANGES.rst

This changelog entry notes that the zyte-api Python client now treats `aiohttp.client_exceptions.ClientConnectorError` as a network error, so connection failures of this kind are retried automatically. This makes requests more robust.

```python
from zyte_api import ZyteAPI

# aiohttp's ClientConnectorError is treated as a network error, so the client
# retries it automatically, like any other network error.
client = ZyteAPI()
try:
    result = client.get({"url": "https://toscrape.com", "httpResponseBody": True})
except Exception as e:
    # Reached only if the error persists after all retries are exhausted.
    print(f"Request failed after retries: {e}")
```
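
--------------------------------

### Sketch: Matching CLI Output to Input with echoData

The Output File Format section notes that the command-line client may write responses in a different order from the input requests, and that `echoData` can be used to map them back. Below is a minimal sketch of that workflow. It relies only on the Python standard library and is not part of the zyte-api package. The helper names `build_requests` and `index_responses` are hypothetical. The sketch assumes that each output line is a JSON object that carries its request's `echoData` value and, for `httpResponseBody` requests, a Base64-encoded body. It also assumes that lines missing either field, such as entries written with `--store-errors`, can be skipped.

```python
import base64
import json


def build_requests(urls):
    """Return JSON Lines input for zyte-api, one request per URL (hypothetical helper).

    Each request echoes its own URL through echoData, so its response can be
    matched back to it even if the output order differs from the input order.
    """
    lines = [
        json.dumps({"url": url, "httpResponseBody": True, "echoData": url})
        for url in urls
    ]
    return "\n".join(lines) + "\n"


def index_responses(jsonl_text):
    """Map each echoData value to its decoded response body, as bytes (hypothetical helper)."""
    bodies = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        response = json.loads(line)
        key = response.get("echoData")
        body = response.get("httpResponseBody")
        if key is None or body is None:
            # Assumption: entries without both fields (e.g. stored errors) are skipped.
            continue
        bodies[key] = base64.b64decode(body)
    return bodies


if __name__ == "__main__":
    # Save this text as input.jsonl and pass that file to zyte-api.
    print(build_requests(["https://books.toscrape.com", "https://quotes.toscrape.com"]), end="")

    # Simulated output lines, deliberately out of order.
    sample_output = "\n".join([
        json.dumps({
            "url": "https://quotes.toscrape.com",
            "echoData": "https://quotes.toscrape.com",
            "httpResponseBody": base64.b64encode(b"<html>quotes</html>").decode(),
        }),
        json.dumps({
            "url": "https://books.toscrape.com",
            "echoData": "https://books.toscrape.com",
            "httpResponseBody": base64.b64encode(b"<html>books</html>").decode(),
        }),
    ])
    for url, body in index_responses(sample_output).items():
        print(url, body)
```

Using the URL itself as `echoData` keeps the sketch simple. Any unique, JSON-serializable value, such as a row ID from your own database, would work the same way.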