### Install ScrapingBee Python SDK

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/README.md

Installs the ScrapingBee Python SDK with pip. This is the first step in integrating the SDK into a Python project.

```bash
pip install scrapingbee
```

--------------------------------

### ScrapingBee API GET Request with Parameters

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/README.md

Demonstrates a GET request with the ScrapingBee Python SDK. It covers:

- initializing the client;
- passing the target URL;
- setting API parameters for JavaScript rendering, resource blocking, device selection, data extraction rules, and more.

It also shows how to forward custom headers and cookies to the target website.

```python
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Block ads on the page you want to scrape
        'block_ads': False,
        # Block images and CSS on the page you want to scrape
        'block_resources': True,
        # Premium proxy geolocation
        'country_code': '',
        # Control the device the request will be sent from
        'device': 'desktop',
        # Use some data extraction rules
        'extract_rules': {'title': 'h1'},
        # Wrap the response in JSON
        'json_response': False,
        # Interact with the web page you want to scrape
        'js_scenario': {
            "instructions": [
                {"wait_for": "#slow_button"},
                {"click": "#slow_button"},
                {"scroll_x": 1000},
                {"wait": 1000},
                {"scroll_x": 1000},
                {"wait": 1000},
            ]
        },
        # Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request)
        'premium_proxy': False,
        # Execute JavaScript code with a headless browser (5 credits/request)
        'render_js': True,
        # Return the original HTML before JavaScript rendering
        'return_page_source': False,
        # Return a page screenshot as a PNG image
        'screenshot': False,
        # Take a full-page screenshot without the window limitation
        'screenshot_full_page': False,
        # Transparently return the same HTTP status code as the requested page
        'transparent_status_code': False,
        # Wait, in milliseconds, before returning the response
        'wait': 0,
        # Wait for a CSS selector before returning the response, e.g. ".title"
        'wait_for': '',
        # Set the browser window width in pixels
        'window_width': 1920,
        # Set the browser window height in pixels
        'window_height': 1080,
    },
    headers={
        # Forward custom headers to the target website
        "key": "value"
    },
    cookies={
        # Forward custom cookies to the target website
        "name": "value"
    },
)

# Access the response body
print(response.text)
```

--------------------------------

### ScrapingBee API GET Request with Retries

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/README.md

Illustrates how to retry requests automatically when they fail with a 5XX server error. The retry count is set with the `retries` argument of `client.get()`. This makes scraping more robust against transient failures.

```python
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        'render_js': True,
    },
    # Retry up to 5 times on 5XX server errors
    retries=5,
)
```

--------------------------------

### ScrapingBee API Screenshot Retrieval

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/README.md

Shows how to retrieve a screenshot of a web page with the ScrapingBee Python SDK and save it to disk. This example captures a full-page screenshot at a mobile viewport width.
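The screenshot parameters depend on each other: `screenshot_full_page` only matters when `screenshot` is `True`, and `window_width` sets the viewport width. The sketch below wraps those parameters in a small builder. As a defensive extra step, it also checks the body for the 8-byte PNG file signature before writing it to disk. `screenshot_params` and `save_png` are hypothetical helpers, not part of the SDK; only the parameter names come from this section.

```python
from pathlib import Path

# Every PNG file starts with this 8-byte signature (defined by the PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_params(width: int = 1920, full_page: bool = False) -> dict:
    """Build a `params` dict for a screenshot request (hypothetical helper, not in the SDK)."""
    return {
        "screenshot": True,                 # return an image instead of HTML
        "screenshot_full_page": full_page,  # only meaningful because screenshot is True
        "window_width": width,              # viewport width in pixels
    }


def save_png(content: bytes, path: str) -> bool:
    """Write `content` to `path` only if it starts with the PNG signature (hypothetical helper)."""
    if not content.startswith(PNG_SIGNATURE):
        return False  # not PNG data, so nothing is written
    Path(path).write_bytes(content)
    return True
```

With these helpers, the SDK example that follows would pass `params=screenshot_params(375, full_page=True)` to `client.get()`. It would then call `save_png(response.content, "./scrapingbee_mobile.png")` when `response.ok` is true.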
```python
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='REPLACE-WITH-YOUR-API-KEY')

response = client.get(
    'https://www.scrapingbee.com/blog/',
    params={
        # Take a screenshot
        'screenshot': True,
        # Capture the full page height
        'screenshot_full_page': True,
        # Use a mobile viewport width, in pixels
        'window_width': 375,
    },
)

# The screenshot is returned as binary PNG data
if response.ok:
    with open("./scrapingbee_mobile.png", "wb") as f:
        f.write(response.content)
```

--------------------------------

### ScrapingBee API Parameters Reference

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/README.md

A reference for the main parameters the ScrapingBee API accepts through the Python SDK. It covers JavaScript rendering, headless browser control, proxy settings, screenshots, and data extraction.

```APIDOC
ScrapingBee API Parameters:

This section describes the parameters you can pass to the ScrapingBee API through the `params` argument of the Python SDK's `get` and `post` methods to control scraping behavior.

**Core Parameters:**
- `url` (string, required): The URL of the web page to scrape. In the SDK it is passed as the first positional argument of `get`/`post`, not inside `params`.

**Rendering & Execution Parameters:**
- `render_js` (boolean, optional): If `True`, executes JavaScript on the page using a headless browser. Costs 5 credits per request. Defaults to `True`; set it to `False` to skip rendering.
- `js_scenario` (object, optional): Defines a sequence of actions to perform on the page, such as clicking elements, scrolling, or waiting. Example: `{"instructions": [{"wait_for": "#my_button"}, {"click": "#my_button"}]}`.
- `wait` (integer, optional): Waits the specified number of milliseconds before returning the response. Defaults to `0`.
- `wait_for` (string, optional): Waits for a CSS selector to appear on the page before returning the response. Example: `'.my-element'`.
- `return_page_source` (boolean, optional): If `True`, returns the original HTML source before JavaScript rendering. Defaults to `False`.

**Blocking & Resource Control:**
- `block_ads` (boolean, optional): If `True`, blocks ads on the page. Defaults to `False`.
- `block_resources` (boolean, optional): If `True`, blocks images and CSS to speed up scraping. Defaults to `True`.

**Proxy & Geolocation:**
- `premium_proxy` (boolean, optional): If `True`, routes the request through premium proxies, which helps with difficult-to-scrape websites. Costs 10-25 credits per request. Defaults to `False`.
- `country_code` (string, optional): Country code for premium proxy geolocation (e.g., 'US', 'GB'). Requires `premium_proxy` to be `True`.

**Output & Formatting:**
- `screenshot` (boolean, optional): If `True`, returns a screenshot of the rendered page as a PNG image. Defaults to `False`.
- `screenshot_full_page` (boolean, optional): If `True` and `screenshot` is `True`, captures the full page height rather than only the browser window. Defaults to `False`.
- `window_width` (integer, optional): Sets the browser window width in pixels, used for rendering and screenshots. Defaults to `1920`.
- `window_height` (integer, optional): Sets the browser window height in pixels, used for rendering and screenshots. Defaults to `1080`.
- `json_response` (boolean, optional): If `True`, wraps the response content in a JSON object. Defaults to `False`.

**Other Parameters:**
- `device` (string, optional): Controls the device the request is sent from, e.g. `'desktop'`.
- `extract_rules` (object, optional): Defines custom data extraction rules using CSS selectors. Example: `{"title": "h1", "description": "p.description"}`.
- `transparent_status_code` (boolean, optional): If `True`, returns the same HTTP status code as the target page. Defaults to `False`.

**Usage Example (within SDK):**

    response = client.get(
        'https://example.com',
        params={
            'render_js': True,
            'premium_proxy': True,  # required for country_code
            'country_code': 'US',
            'extract_rules': {'product_name': 'h1.product-title'},
        },
    )

**Related Methods:**
- `client.post()`: Makes POST requests and accepts the same parameters.
- `ScrapingBeeClient(api_key)`: Creates the client. Retries are set per request with the `retries` argument, e.g. `client.get(..., retries=5)`.
```

--------------------------------

### Create Git Tag for Version Release

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/RELEASE.md

Creates a Git tag with the specified version number. The tag marks a specific point in the project's history for release and is a prerequisite for triggering the automated PyPI deployment.

```bash
git tag X.X.X
```

--------------------------------

### Push Git Tag to Origin

Source: https://github.com/scrapingbee/scrapingbee-python/blob/main/RELEASE.md

Pushes the newly created Git tag to the remote repository (`origin`). This push triggers the automated deployment pipeline, which uploads the new version to PyPI.

```bash
git push origin X.X.X
```
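Pushing the tag is what starts the PyPI upload, so a mistyped tag may trigger a release you did not intend. Below is a minimal pre-flight check you could run before `git tag` and `git push origin`. It assumes release tags are plain numeric `MAJOR.MINOR.PATCH` strings, as the `X.X.X` placeholder suggests. `is_release_tag` is a hypothetical helper, not part of the repository's release tooling.

```python
import re

# Assumed tag format: three dot-separated ASCII integers, mirroring the X.X.X placeholder.
# This does not enforce full SemVer rules (for example, leading zeros are accepted).
RELEASE_TAG = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def is_release_tag(tag: str) -> bool:
    """Return True if `tag` is a plain MAJOR.MINOR.PATCH version string."""
    # fullmatch rejects prefixes and suffixes such as "v1.2.3" or "1.2.3-rc1".
    return RELEASE_TAG.fullmatch(tag) is not None


# Example: check candidate tags before creating and pushing one.
for candidate in ["1.2.3", "v1.2.3", "1.2"]:
    print(candidate, is_release_tag(candidate))
```

If your project uses a different convention, such as a `v` prefix, adjust the pattern accordingly.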