### Install the SDK

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb

Installs the Bright Data SDK and python-dotenv for environment variable support.

```python
# Install the SDK and python-dotenv for .env file support
!pip install brightdata-sdk python-dotenv -q
```

--------------------------------

### Quick Start

Source: https://github.com/brightdata/sdk-python/blob/main/README.md

An asynchronous example demonstrating how to initialize the BrightDataClient and scrape a URL.

```python
import asyncio
from brightdata import BrightDataClient

async def main():
    async with BrightDataClient() as client:
        result = await client.scrape_url("https://example.com")
        print(result.data)

asyncio.run(main())
```

--------------------------------

### Error Handling Example

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb

Demonstrates how to handle potential validation, API, or unexpected errors during a scraping operation.

```python
from brightdata.exceptions import ValidationError, APIError

try:
    # This will fail - invalid URL
    result = client.scrape.amazon.products(url="invalid-url")
except ValidationError as e:
    print(f"❌ Validation Error: {e}")
except APIError as e:
    print(f"❌ API Error: {e}")
    print(f" Status Code: {e.status_code}")
except Exception as e:
    print(f"❌ Unexpected Error: {e}")
```

--------------------------------

### Your First Scrape

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb

Performs a scrape of an Amazon product page using the Bright Data SDK. Note the use of async/await for Jupyter notebooks.

```python
from brightdata import BrightDataClient

# Initialize client
client = BrightDataClient(token=API_TOKEN)

# Scrape an Amazon product
#
# NOTE: In Jupyter notebooks, you MUST use async/await because Jupyter
# already has a running event loop. The _sync methods won't work here.
#
# In regular Python scripts, you can use:
# result = client.scrape.amazon.products_sync(url="...")

async with client.scrape.amazon.engine:
    result = await client.scrape.amazon.products(
        url="https://www.amazon.com/dp/B0CRMZHDG8",
        timeout=660
    )

print(f"✅ Success: {result.success}")
print(f"💰 Cost: ${result.cost:.4f}" if result.cost else "💰 Cost: N/A")
print(f"📊 Status: {result.status}")

if result.data:
    print(f"\n📦 Data keys: {list(result.data.keys())[:10]}...")  # Show first 10 keys
```

--------------------------------

### Authentication

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb

Sets up authentication for the Bright Data SDK using an API token, with options for direct assignment, environment variables, or a .env file.

```python
import os

# Set your API token here - choose one option:

# Option 1: Direct assignment (for quick testing)
# API_TOKEN = "your_api_token_here"

# Option 2: Use environment variable
# os.environ['BRIGHTDATA_API_TOKEN'] = 'your_token_here'
# API_TOKEN = os.getenv('BRIGHTDATA_API_TOKEN')

# Option 3: Load from .env file (recommended for projects)
# Create a .env file in your project root with: BRIGHTDATA_API_TOKEN=your_token_here
from dotenv import load_dotenv
load_dotenv()  # Loads .env from current directory or parents
API_TOKEN = os.getenv('BRIGHTDATA_API_TOKEN')

if API_TOKEN:
    print(f"✅ Token loaded: {API_TOKEN[:10]}...{API_TOKEN[-4:]}")
else:
    print("❌ No token found. Set BRIGHTDATA_API_TOKEN in .env file or use Option 1/2")
```

--------------------------------

### Setup - Use Local Development Version

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/pinterest.ipynb

This code snippet sets up the environment for local development by adding the source path to sys.path, loading environment variables, and retrieving the API token. It also prints the API token (masked) and confirms setup completion.
```python import os import sys from pathlib import Path # Add local src to path (use development version, not installed) project_root = Path.cwd().parent.parent src_path = project_root / "src" if str(src_path) not in sys.path: sys.path.insert(0, str(src_path)) print(f"Using source from: {src_path}") # Load environment variables from dotenv import load_dotenv load_dotenv(project_root / ".env") # Get API token API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") if not API_TOKEN: raise ValueError("BRIGHTDATA_API_TOKEN not found in environment") print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") print("Setup complete!") ``` -------------------------------- ### Save Your Data Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb Saves the scraped results to a file in JSON format. ```python # Save to JSON result.save_to_file("amazon_product.json", format="json") print("āœ… Saved to amazon_product.json") ``` -------------------------------- ### Install brightdata-sdk Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/03_serp.ipynb Checks the installed version of the brightdata-sdk. ```shell !pip show brightdata-sdk ``` -------------------------------- ### Search Engines (SERP) Example Source: https://github.com/brightdata/sdk-python/blob/main/README.md Example of performing a Google search query using the SDK and iterating through the results. ```python async with BrightDataClient() as client: result = await client.search.google(query="python scraping", num_results=10) for item in result.data: print(item) ``` -------------------------------- ### Install SDK Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/05_scraper_studio.ipynb Installs the Bright Data SDK. 
```python #pip install brightdata-sdk ``` -------------------------------- ### Setup and Authentication Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/05_scraper_studio.ipynb Loads API token from environment variables and initializes the BrightDataClient. ```python import os from dotenv import load_dotenv load_dotenv() API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") if not API_TOKEN: raise ValueError("Set BRIGHTDATA_API_TOKEN in .env file") print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") print("Setup complete!") ``` -------------------------------- ### Verify SDK Installation Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/test_v2.1.0_release.ipynb Imports the brightdata library and prints its installed version to confirm successful installation. ```python import brightdata print(f"Installed version: {brightdata.__version__}") ``` -------------------------------- ### Inspect the Data Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb Displays information about the scraped result, including URL, platform, status, and the keys available in the returned data. ```python # Display result info print(f"URL: {result.url}") print(f"Platform: {result.platform}") print(f"Status: {result.status}") print(f"\nData keys: {list(result.data.keys()) if result.data else 'No data'}") # Show first few fields if result.data: for key, value in list(result.data.items())[:5]: print(f" {key}: {str(value)[:80]}..." if len(str(value)) > 80 else f" {key}: {value}") ``` -------------------------------- ### Installation Source: https://github.com/brightdata/sdk-python/blob/main/README.md Install the Bright Data Python SDK using pip. 
```bash pip install brightdata-sdk ``` -------------------------------- ### Setup - Use Local Development Version Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/youtube.ipynb This code snippet sets up the environment by adding the local src directory to the system path, loading environment variables, and initializing the BrightDataClient. It also verifies that the local development version of the brightdata module is being used and checks for the availability of YouTube scraper methods. ```python import os import sys from pathlib import Path # Add local src to path (use development version, not installed) project_root = Path.cwd().parent src_path = project_root / "src" if str(src_path) not in sys.path: sys.path.insert(0, str(src_path)) print(f"Using source from: {src_path}") # Load environment variables from dotenv import load_dotenv load_dotenv(project_root / ".env") # Get API token API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") if not API_TOKEN: raise ValueError("BRIGHTDATA_API_TOKEN not found in environment") print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") print("Setup complete!") ``` ```python from brightdata import BrightDataClient # Verify we're using local version import brightdata print(f"brightdata module location: {brightdata.__file__}") # Initialize client client = BrightDataClient(token=API_TOKEN) # Verify YouTube scraper is accessible print(f"\nYouTubeScraper: {type(client.scrape.youtube).__name__}") print(f"YouTubeSearchScraper: {type(client.search.youtube).__name__}") # Check for scraper methods print("\nScraper methods (URL-based):") print([m for m in dir(client.scrape.youtube) if not m.startswith('_') and callable(getattr(client.scrape.youtube, m))]) print("\nSearch scraper methods (Discovery):") print([m for m in dir(client.search.youtube) if not m.startswith('_') and callable(getattr(client.search.youtube, m))]) ``` -------------------------------- ### Environment Setup and SDK Check Source: 
https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/amazon.ipynb This code snippet sets up the environment by loading the API token from a .env file and checks the BrightData SDK version and location. ```python import os from dotenv import load_dotenv load_dotenv() API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") if not API_TOKEN: raise ValueError("Set BRIGHTDATA_API_TOKEN in .env file") print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") # Check SDK version and location import brightdata print(f"SDK Version: {brightdata.__version__}") print(f"SDK Location: {brightdata.__file__}") print("Setup complete!") ``` -------------------------------- ### Pagination Example Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/03_serp.ipynb Demonstrates fetching more than 10 results, triggering automatic pagination. ```python QUERY = "machine learning frameworks" NUM_RESULTS = 30 # Request 30 results (3 pages) print(f"Query: '{QUERY}'") print(f"Requesting {NUM_RESULTS} results (pagination will fetch ~3 pages)...") async with client: result = await client.search.google( query=QUERY, location="United States", num_results=NUM_RESULTS ) print(f"Success: {result.success}") print(f"Results returned: {len(result.data) if result.data else 0}") print(f"Total found (Google estimate): {result.total_found}") print(f"Results per page: {result.results_per_page}") if result.error: print(f"Note: {result.error}") if result.success and result.data: print(f"\n--- All {len(result.data)} Results ---") for i, item in enumerate(result.data): title = item.get('title', 'N/A')[:55] print(f"{i+1:2}. {title}") else: print(f"\nError: {result.error}") ``` -------------------------------- ### Setup - Use Local Development Version Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/perplexity.ipynb This code snippet sets up the environment for using a local development version of the BrightData SDK. 
It adds the local 'src' directory to the system path, loads environment variables to retrieve the BrightData API token, and prints confirmation messages. ```python import os import sys from pathlib import Path # Add local src to path (use development version, not installed) project_root = Path.cwd().parent src_path = project_root / "src" if str(src_path) not in sys.path: sys.path.insert(0, str(src_path)) print(f"Using source from: {src_path}") # Load environment variables from dotenv import load_dotenv load_dotenv(project_root / ".env") # Get API token API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") if not API_TOKEN: raise ValueError("BRIGHTDATA_API_TOKEN not found in environment") print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") print("Setup complete!") ``` -------------------------------- ### Web Scraper API - Amazon Example Source: https://github.com/brightdata/sdk-python/blob/main/README.md Examples of using the SDK to scrape Amazon product details, reviews, and sellers. ```python async with BrightDataClient() as client: # Product details result = await client.scrape.amazon.products(url="https://amazon.com/dp/B0CRMZHDG8") # Reviews result = await client.scrape.amazon.reviews(url="https://amazon.com/dp/B0CRMZHDG8") # Sellers result = await client.scrape.amazon.sellers(url="https://amazon.com/dp/B0CRMZHDG8") ``` -------------------------------- ### Browser API Example Source: https://github.com/brightdata/sdk-python/blob/main/README.md Connect to a cloud-hosted Chrome instance and automate browser actions using Playwright. 
```python from brightdata import BrightDataClient from playwright.async_api import async_playwright client = BrightDataClient( browser_username="brd-customer--zone-", browser_password="", ) url = client.browser.get_connect_url(country="us") # country is optional async with async_playwright() as pw: browser = await pw.chromium.connect_over_cdp(url) page = await browser.new_page() await page.goto("https://example.com") html = await page.content() await browser.close() ``` -------------------------------- ### Web Scraping Source: https://github.com/brightdata/sdk-python/blob/main/README.md Basic example of scraping a URL using the async BrightDataClient. ```python async with BrightDataClient() as client: result = await client.scrape_url("https://example.com") print(result.data) ``` -------------------------------- ### Install BrightData SDK Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/test_v2.1.0_release.ipynb Installs version 2.1.0 of the brightdata-sdk package using pip. ```python !pip install brightdata-sdk==2.1.0 ``` -------------------------------- ### Basic Amazon Product Scrape Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/02_pandas_integration.ipynb A simple example of triggering an Amazon product scrape using the SDK. ```python async with client.scrape.amazon.engine: result = await client.scrape.amazon.products(url=url) ``` -------------------------------- ### Environment Variable Setup Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/06_browser_api.ipynb Loads API credentials from environment variables and performs basic validation. 
```python import os from dotenv import load_dotenv load_dotenv() API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN") BROWSER_USER = os.getenv("BRIGHTDATA_BROWSERAPI_USERNAME") BROWSER_PASS = os.getenv("BRIGHTDATA_BROWSERAPI_PASSWORD") if not API_TOKEN: raise ValueError("Set BRIGHTDATA_API_TOKEN in .env file") if not BROWSER_USER or not BROWSER_PASS: raise ValueError( "Set BRIGHTDATA_BROWSERAPI_USERNAME and BRIGHTDATA_BROWSERAPI_PASSWORD in .env file.\n" "Find credentials at: https://brightdata.com/cp/zones (Browser API zone > Overview tab)" ) print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}") print(f"Browser Username: {BROWSER_USER[:20]}...") print(f"Browser Password: {'*' * len(BROWSER_PASS)}") print("\nSetup complete!") ``` -------------------------------- ### SERP Async Mode Source: https://github.com/brightdata/sdk-python/blob/main/README.md Example of non-blocking SERP requests using mode="async", with polling interval and timeout. ```python async with BrightDataClient() as client: # Non-blocking - polls for results result = await client.search.google( query="python programming", mode="async", poll_interval=2, # Check every 2 seconds poll_timeout=30 # Give up after 30 seconds ) for item in result.data: print(item['title'], item['link']) ``` -------------------------------- ### API Token Setup Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/linkedin.ipynb Loads the BrightData API token from environment variables and prints a masked version of it. It also includes commented-out lines to check the SDK version and location. 
```python
import os
from dotenv import load_dotenv

load_dotenv()

API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN")
if not API_TOKEN:
    raise ValueError("Set BRIGHTDATA_API_TOKEN in .env file")

print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}")

# Check SDK version and location
# print(f"SDK Version: {brightdata.__version__}")
# print(f"SDK Location: {brightdata.__file__}")
# print("Setup complete!")
```

--------------------------------

### Test technical question

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/perplexity.ipynb

This example shows how to use the Perplexity engine to answer a technical question about Python's async/await implementation. It prints the prompt, executes the search, and then displays the technical answer and any associated web search queries used.

```python
# Test technical question
PROMPT = "How do I implement async/await in Python? Give me a simple example."

print("Technical question:")
print(f" '{PROMPT}'")
print("\nThis may take up to 11 minutes...\n")

async with client.scrape.perplexity.engine:
    result = await client.scrape.perplexity.search(
        prompt=PROMPT,
        country="US",
        poll_timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")

if result.success and result.data:
    data = result.data
    if isinstance(data, list) and len(data) > 0:
        data = data[0]

    if isinstance(data, dict) and 'error' not in data:
        print("\n--- Technical Answer ---")
        answer = data.get('answer_html', data.get('answer', 'N/A'))
        print(f"{str(answer)[:1000]}..." if len(str(answer)) > 1000 else answer)

        # Web search queries used
        queries = data.get('web_search_query', [])
        if queries:
            print(f"\nSearch queries used: {queries}")
    elif isinstance(data, dict) and 'error' in data:
        print(f"\nAPI Error: {data.get('error')}")
else:
    print(f"\nError: {result.error}")
```

--------------------------------

### Export to Excel

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/02_pandas_integration.ipynb

Example of exporting a Pandas DataFrame to an Excel file, including error handling for missing dependencies.

```python
try:
    df_batch.to_excel('amazon_products.xlsx', index=False, sheet_name='Products')
    print("✅ Exported to amazon_products.xlsx")
except ImportError:
    print("⚠️ Install openpyxl for Excel export: pip install openpyxl")
```

--------------------------------

### Filtering and Localization Example

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/07_discover_api.ipynb

This code snippet shows how to use the `filter_keywords` parameter to refine search results and `country` for localization. It prints the query, intent, filter keywords, and country, then makes a discover API call and prints the success status, total results, and a list of filtered results with their relevance scores, titles, and links.
```python
QUERY = "sustainable fashion brands"
INTENT = "eco-friendly clothing companies"
FILTER_KEYWORDS = ["sustainability", "eco-friendly", "organic"]

print(f"Query: '{QUERY}'")
print(f"Intent: '{INTENT}'")
print(f"Filter Keywords: {FILTER_KEYWORDS}")
print(f"Country: us\n")

async with client:
    result = await client.discover(
        query=QUERY,
        intent=INTENT,
        filter_keywords=FILTER_KEYWORDS,
        country="us",
        num_results=10
    )

print(f"Success: {result.success}")
print(f"Total Results: {result.total_results}")

if result.success and result.data:
    print("\n--- Filtered Results ---")
    for i, item in enumerate(result.data[:10]):
        score = item.get('relevance_score', 0)
        title = item.get('title', 'N/A')
        link = item.get('link', 'N/A')
        print(f"\n{i+1}. [{score:.2f}] {title}")
        print(f" {link}")
else:
    print(f"\nError: {result.error}")
```

--------------------------------

### Web Scraping Async Mode

Source: https://github.com/brightdata/sdk-python/blob/main/README.md

Example of non-blocking web scraping using mode="async", including polling interval and timeout settings. Also shows batch scraping of multiple URLs.

```python
async with BrightDataClient() as client:
    # Triggers request → gets response_id → polls until ready
    result = await client.scrape_url(
        url="https://example.com",
        mode="async",
        poll_interval=5,   # Check every 5 seconds
        poll_timeout=180   # Web Unlocker async can take ~2 minutes
    )
    print(result.data)

    # Batch scraping multiple URLs concurrently
    urls = ["https://example.com", "https://example.org", "https://example.net"]
    results = await client.scrape_url(url=urls, mode="async", poll_timeout=180)
```

--------------------------------

### Initialize Client

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/04_web_unlocker.ipynb

Initializes the BrightData client and prints SDK version and default zone.
```python
from brightdata import BrightDataClient
from brightdata import __version__

# Initialize client
client = BrightDataClient(token=API_TOKEN)

print("Client initialized")
print(f"SDK version: {__version__}")
print(f"Default Web Unlocker zone: {client.web_unlocker_zone}")
```

--------------------------------

### IMDB Movie Data Example

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

An example of a Python dictionary containing movie data, including a list of actors, their characters, and links, along with the movie's URL and video information.

```python
 'character': 'Kitchen Guard',
 'link': 'https://www.imdb.com/name/nm0855957'},
{'actor': 'Layla Bias Galloway',
 'character': 'Shower Guard',
 'link': 'https://www.imdb.com/name/nm0303190'},
{'actor': 'Ann Stockdale',
 'character': 'Bonnie',
 'link': 'https://www.imdb.com/name/nm0830768'},
{'actor': 'Essie Hayes',
 'character': 'Essie',
 'link': 'https://www.imdb.com/name/nm0371009'},
{'actor': 'John Aprea',
 'character': 'Dream Man',
 'link': 'https://www.imdb.com/name/nm0032501'}
],
'url': 'https://www.imdb.com/title/tt0071266/',
'videos': None}
```

--------------------------------

### Initialize Client

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/05_scraper_studio.ipynb

Initializes the BrightDataClient with the API token and enters the client context.

```python
from brightdata import BrightDataClient

client = BrightDataClient(token=API_TOKEN)
await client.__aenter__()

# Your collector ID from Scraper Studio dashboard
COLLECTOR_ID = "c_mly0sa6x10hshxi8jb"  # Replace with your collector ID

print("Client initialized")
```

--------------------------------

### Install Required Packages

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/02_pandas_integration.ipynb

Installs the necessary Python packages for Bright Data SDK, pandas, matplotlib, seaborn, and python-dotenv. It also imports these libraries and sets up plotting styles.
```python
# Install required packages
%pip install brightdata-sdk pandas matplotlib seaborn python-dotenv -q

import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from brightdata import BrightDataClient

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ All packages loaded")
```

--------------------------------

### Client Initialization

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/03_serp.ipynb

Initializes the BrightDataClient with the API token and prints the default SERP zone.

```python
from brightdata import BrightDataClient

# Initialize client
client = BrightDataClient(token=API_TOKEN)

print("Client initialized")
print(f"Default SERP zone: {client.serp_zone}")
```

--------------------------------

### Product Scraping (Single URL)

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/amazon.ipynb

Example input for scraping product details from a single Amazon URL.

```python
input: {'url': 'https://www.amazon.com/dp/B0CRMZHDG8', 'asin': '', 'zipcode': '', 'lang...'}
```

--------------------------------

### Troubleshooting: Uninitialized BrightDataClient

Source: https://github.com/brightdata/sdk-python/blob/main/README.md

Shows the incorrect way of initializing BrightDataClient without a context manager and the correct way using 'async with'.

```python
# Wrong - forgot context manager
client = BrightDataClient()
result = await client.scrape_url("...")  # Error!

# Correct - use context manager
async with BrightDataClient() as client:
    result = await client.scrape_url("...")
```

--------------------------------

### Client Initialization and Scraper Verification

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/pinterest.ipynb

This snippet initializes the BrightDataClient using the API token and verifies that the Pinterest scraper and search scraper are accessible and lists their available methods.
```python from brightdata import BrightDataClient # Verify we're using local version import brightdata print(f"brightdata module location: {brightdata.__file__}") # Initialize client client = BrightDataClient(token=API_TOKEN) # Verify Pinterest scraper is accessible print(f"\nPinterestScraper: {type(client.scrape.pinterest).__name__}") print(f"PinterestSearchScraper: {type(client.search.pinterest).__name__}") # Check for scraper methods print("\nScraper methods (URL-based):") print([m for m in dir(client.scrape.pinterest) if not m.startswith('_') and callable(getattr(client.scrape.pinterest, m))]) print("\nSearch scraper methods (Discovery):") print([m for m in dir(client.search.pinterest) if not m.startswith('_') and callable(getattr(client.search.pinterest, m))]) ``` -------------------------------- ### NBA Players Stats - Trigger Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb This code snippet shows how to get a sample of NBA player statistics. ```python # NBA Players Stats - Trigger nba_players_snapshot = await client.datasets.nba_players_stats.sample(records_limit=2) print(f"NBA Players Stats snapshot: {nba_players_snapshot}") ``` -------------------------------- ### Datasets API - Get Metadata Source: https://github.com/brightdata/sdk-python/blob/main/README.md Retrieve metadata for a dataset to discover available fields and their types. ```python metadata = await client.datasets.imdb_movies.get_metadata() for name, field in metadata.fields.items(): print(f"{name}: {field.type}") ``` -------------------------------- ### Get Dataset Metadata from API Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/crunchbase/crunchbase.ipynb Fetches and displays metadata for the Crunchbase Companies dataset from the API. 
```python print("Fetching Crunchbase metadata from API...\n") async with client: metadata = await client.datasets.crunchbase_companies.get_metadata() print(f"Dataset ID: {metadata.id}") print(f"Total fields from API: {len(metadata.fields)}") print("\n=== Sample Fields ===") for i, (name, field) in enumerate(list(metadata.fields.items())[:10]): print(f" {name}: {field.type} - {field.description or 'N/A'}") ``` -------------------------------- ### Client Initialization and Scraper Verification Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/perplexity.ipynb This code snippet initializes the BrightDataClient using the obtained API token and verifies that the Perplexity scraper is accessible and lists its available methods. ```python from brightdata import BrightDataClient # Verify we're using local version import brightdata print(f"brightdata module location: {brightdata.__file__}") # Initialize client client = BrightDataClient(token=API_TOKEN) # Verify Perplexity scraper is accessible print(f"\nPerplexityScraper: {type(client.scrape.perplexity).__name__}") # Check for scraper methods print("\nAvailable methods:") print([m for m in dir(client.scrape.perplexity) if not m.startswith('_') and callable(getattr(client.scrape.perplexity, m))]) ``` -------------------------------- ### Airbnb Properties - Trigger and Download Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb Demonstrates triggering a sample and downloading Airbnb Properties data. 
```python # Airbnb Properties - Trigger airbnb_snapshot = await client.datasets.airbnb_properties.sample(records_limit=2) print(f"Airbnb Properties snapshot: {airbnb_snapshot}") ``` ```python # Airbnb Properties - Download airbnb_data = await client.datasets.airbnb_properties.download(airbnb_snapshot) print(f"Airbnb Properties: {len(airbnb_data)} records") airbnb_data ``` -------------------------------- ### Version Check Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/test_v2.1.0_release.ipynb Asserts that the installed BrightData SDK version is 2.1.0 and prints a success message. ```python assert brightdata.__version__ == "2.1.0", f"Expected 2.1.0, got {brightdata.__version__}" print("Version check passed!") ``` -------------------------------- ### Initialize the SERP API client Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/03_serp.ipynb Initialize the SERP API client with your account credentials. ```python account_id = "YOUR_ACCOUNT_ID" api_key = "YOUR_API_KEY" api = BrightData(account_id=account_id, api_key=api_key) serp_api = SERP(api=api) ``` -------------------------------- ### Initialize BrightDataClient Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb Initializes the BrightDataClient and enters its asynchronous context. ```python from brightdata import BrightDataClient client = BrightDataClient() await client.__aenter__() ``` -------------------------------- ### Pinterest Posts - Trigger and Download Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb Demonstrates triggering a sample and downloading Pinterest Posts data. 
```python # Pinterest Posts - Trigger pinterest_posts_snapshot = await client.datasets.pinterest_posts.sample(records_limit=2) print(f"Pinterest Posts snapshot: {pinterest_posts_snapshot}") ``` ```python # Pinterest Posts - Download pinterest_posts_data = await client.datasets.pinterest_posts.download(pinterest_posts_snapshot) print(f"Pinterest Posts: {len(pinterest_posts_data)} records") pinterest_posts_data ``` -------------------------------- ### Instagram Post URLs Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/instagram.ipynb Provides example URLs for different types of Instagram posts: a post with a carousel, a single image post, and a reel. ```python post_img_link_with_carausel ="https://www.instagram.com/p/DTLo9uhDPCn/" post_img_link="https://www.instagram.com/p/DTGAZJQkg5k/" reel_link="https://www.instagram.com/reel/DTQygzxD6QC/" ``` -------------------------------- ### Trustpilot Reviews - Trigger and Download Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb Demonstrates triggering a sample and downloading Trustpilot Reviews data. ```python # Trustpilot Reviews - Trigger trustpilot_snapshot = await client.datasets.trustpilot_reviews.sample(records_limit=2) print(f"Trustpilot Reviews snapshot: {trustpilot_snapshot}") ``` ```python # Trustpilot Reviews - Download trustpilot_data = await client.datasets.trustpilot_reviews.download(trustpilot_snapshot) print(f"Trustpilot Reviews: {len(trustpilot_data)} records") trustpilot_data ``` -------------------------------- ### Pinterest Profiles - Trigger and Download Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb Demonstrates triggering a sample and downloading Pinterest Profiles data. 
```python
# Pinterest Profiles - Trigger
pinterest_profiles_snapshot = await client.datasets.pinterest_profiles.sample(records_limit=2)
print(f"Pinterest Profiles snapshot: {pinterest_profiles_snapshot}")
```

```python
# Pinterest Profiles - Download
pinterest_profiles_data = await client.datasets.pinterest_profiles.download(pinterest_profiles_snapshot)
print(f"Pinterest Profiles: {len(pinterest_profiles_data)} records")
pinterest_profiles_data
```

--------------------------------

### Get Geo-Targeted Connection URLs

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/06_browser_api.ipynb

Generates CDP WebSocket URLs for connecting to cloud Chrome browsers routed through specific countries.

```python
countries = ["us", "gb", "de", "jp"]

print("Geo-targeted connection URLs:\n")
for country in countries:
    url = client.browser.get_connect_url(country=country)
    display_url = url.replace(BROWSER_PASS, "****")
    print(f" {country.upper()}: {display_url}")
```

--------------------------------

### Get Basic Connection URL

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/06_browser_api.ipynb

Generates the basic CDP WebSocket URL for connecting to a cloud Chrome browser without geo-targeting.

```python
# Basic URL (no geo-targeting)
url = client.browser.get_connect_url()

# Mask password in output for notebook demo only
display_url = url.replace(BROWSER_PASS, "****")
print(f"Connection URL: {display_url}")
print(f"\nProtocol: wss://")
print(f"Host: brd.superproxy.io")
print(f"Port: 9222")
```

--------------------------------

### Troubleshooting: Sync Client in Async Context

Source: https://github.com/brightdata/sdk-python/blob/main/README.md

Illustrates the incorrect usage of SyncBrightDataClient within an async function and the correct approach using the async client.

```python
# Wrong - using sync client in async function
async def main():
    with SyncBrightDataClient() as client:  # Error!
        ...

# Correct - use async client
async def main():
    async with BrightDataClient() as client:
        result = await client.scrape_url("https://example.com")
```

--------------------------------

### Manual Job Control Example

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/amazon.ipynb

This Python code snippet shows how to manually trigger an Amazon product scrape, poll its status, and fetch the results using the BrightData SDK.

```python
import asyncio

PRODUCT_URL = "https://www.amazon.com/dp/B0CRMZHDG8"

print(f"Manual workflow for: {PRODUCT_URL}\n")

async with client:
    # Step 1: Trigger the scrape
    print("Step 1: Triggering scrape...")
    job = await client.scrape.amazon.products_trigger(url=PRODUCT_URL)
    print(f" Job ID: {job.snapshot_id}")

    # Step 2: Poll for status
    print("\nStep 2: Polling for status...")
    while True:
        status = await job.status()
        print(f" Status: {status}")
        if status == "ready":
            break
        elif status in ("error", "failed"):
            print(" Job failed!")
            break
        await asyncio.sleep(5)

    # Step 3: Fetch results
    if status == "ready":
        print("\nStep 3: Fetching results...")
        data = await job.fetch()
        if data:
            item = data[0] if isinstance(data, list) else data
            print(f" Title: {item.get('title', 'N/A')}")
            print(f" Price: {item.get('final_price', 'N/A')}")
            print(f" Rating: {item.get('rating', 'N/A')}")
```

--------------------------------

### Trigger-Then-Poll Pattern for Batch Scraping

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/02_pandas_integration.ipynb

An example of the trigger-then-poll pattern for efficient batch scraping, where jobs are triggered first and then polled in parallel.
```python
# Trigger all jobs first (fast)
jobs = {}
for url in urls:
    job = await client.scrape.amazon.products_trigger(url=url)
    jobs[url] = job

# Then poll in parallel (efficient)
results = await asyncio.gather(*[poll_job(url, job) for url, job in jobs.items()])
```

--------------------------------

### Import necessary libraries

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/03_serp.ipynb

Import the BrightData API client and the SERP API.

```python
from brightdata import BrightData
from brightdata.serp import SERP
```

--------------------------------

### Get Dataset Metadata from API

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/amazon/amazon.ipynb

Fetches and displays dataset metadata, including ID, total fields, and a sample of fields with their types and descriptions, from the API.

```python
print("Fetching Amazon Products metadata from API...\n")

async with client:
    metadata = await client.datasets.amazon_products.get_metadata()

print(f"Dataset ID: {metadata.id}")
print(f"Total fields from API: {len(metadata.fields)}")
print("\n=== Sample Fields ===")
for i, (name, field) in enumerate(list(metadata.fields.items())[:10]):
    print(f" {name}: {field.type} - {field.description or 'N/A'}")
```

--------------------------------

### Combined Filter (AND/OR) - Create Filter

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/linkedin/linkedin.ipynb

Creates a filter for LinkedIn profiles using combined AND/OR operators. This example filters for US-based profiles with more than 5000 followers.
```python
# Step 1: Create filter
COMBINED_FILTER = {
    "operator": "and",
    "filters": [
        {"name": "country_code", "operator": "=", "value": "US"},
        {"name": "followers", "operator": ">", "value": 5000}
    ]
}

print("Filter: US-based profiles with more than 5000 followers")
print("Records limit: 5\n")

async with client:
    snapshot_id = await client.datasets.linkedin_profiles(
        filter=COMBINED_FILTER,
        records_limit=5
    )

print(f"Snapshot created: {snapshot_id}")
```

--------------------------------

### Test 5: Geo-Targeted Scrape

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/06_browser_api.ipynb

This example connects through a specific country (the US in this case) and scrapes a page whose content may vary by location.

```python
from playwright.async_api import async_playwright

TARGET_URL = "https://www.whatismyip.com/"

url = client.browser.get_connect_url(country="us")
print("Connecting through US proxy...")
print(f"Target: {TARGET_URL}\n")

async with async_playwright() as pw:
    browser = await pw.chromium.connect_over_cdp(url)
    # Reuse the existing context/page if present; new contexts are created
    # on the browser object (the Playwright object has no new_context method)
    context = browser.contexts[0] if browser.contexts else await browser.new_context()
    page = context.pages[0] if context.pages else await context.new_page()

    await page.goto(TARGET_URL, wait_until="domcontentloaded")
    await page.wait_for_timeout(3000)  # Wait for JS to render

    title = await page.title()
    content = await page.text_content("body")

    await browser.close()

print(f"Page title: {title}")
print(f"\nBody text (first 500 chars):\n{content[:500] if content else 'N/A'}")
```

--------------------------------

### Zillow Properties - Trigger and Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Demonstrates triggering a sample and downloading Zillow Properties data.
```python
# Zillow Properties - Trigger
zillow_snapshot = await client.datasets.zillow_properties.sample(records_limit=2)
print(f"Zillow Properties snapshot: {zillow_snapshot}")
```

```python
# Zillow Properties - Download
zillow_data = await client.datasets.zillow_properties.download(zillow_snapshot)
print(f"Zillow Properties: {len(zillow_data)} records")
zillow_data
```

--------------------------------

### Asos Products - Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Downloads Asos product data and prints the number of records.

```python
# Asos Products - Download
asos_data = await client.datasets.asos_products.download(asos_snapshot)
print(f"Asos Products: {len(asos_data)} records")
asos_data
```

--------------------------------

### BrightData Client Initialization

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/06_browser_api.ipynb

Initializes the BrightData client, which automatically loads credentials from environment variables.

```python
from brightdata import BrightDataClient

# Credentials load automatically from env vars
client = BrightDataClient()

print("Client initialized")
print(f"Browser API ready: {client.browser is not None}")
```

--------------------------------

### Mediamarkt Products - Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Downloads Mediamarkt product data and prints the number of records.

```python
mediamarkt_data = await client.datasets.mediamarkt_products.download(mediamarkt_snapshot)
print(f"Mediamarkt Products: {len(mediamarkt_data)} records")
mediamarkt_data
```

--------------------------------

### Get Dataset Metadata from API

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/linkedin/linkedin.ipynb

Fetches live metadata for the LinkedIn Profiles dataset from the API to display current field schema, including dataset ID, total fields, and details of sample fields.
```python
print("Fetching LinkedIn Profiles metadata from API...\n")

async with client:
    metadata = await client.datasets.linkedin_profiles.get_metadata()

print(f"Dataset ID: {metadata.id}")
print(f"Total fields from API: {len(metadata.fields)}")
print("\n=== Sample Fields ===")
for i, (name, field) in enumerate(list(metadata.fields.items())[:10]):
    print(f" {name}:")
    print(f" type: {field.type}")
    print(f" active: {field.active}")
    print(f" description: {field.description or 'N/A'}")
```

--------------------------------

### Sync Client - Basic Usage

Source: https://github.com/brightdata/sdk-python/blob/main/README.md

Use the SyncBrightDataClient for simpler use cases, including scraping a URL.

```python
from brightdata import SyncBrightDataClient

with SyncBrightDataClient() as client:
    result = client.scrape_url("https://example.com")
    print(result.data)

    # All methods work the same
    result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")
    result = client.search.google(query="python")
```

--------------------------------

### Example Instagram Post Data Structure

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/instagram.ipynb

This snippet shows the actual data structure returned for an Instagram post, including details like user information, post content (image URL, alt text), and timestamps.
```python
{
    'url': 'https://www.instagram.com/p/DTGAZJQkg5k/',
    'is_verified': True,
    'is_paid_partnership': False,
    'partnership_details': None,
    'user_posted_id': '1315934698',
    'post_content': [
        {
            'index': 0,
            'type': 'Photo',
            'url': 'https://scontent-bos5-1.cdninstagram.com/v/t51.2885-15/610176878_18558585163030699_8176807390950763652_n.jpg?stp=dst-jpg_e15_fr_p1080x1080_tt6&_nc_ht=scontent-bos5-1.cdninstagram.com&_nc_cat=107&_nc_oc=Q6cZ2QEkh4dVLXfb9wB7SW8tba5SXhPxPoYVePY5t47wEhrB5SC7fSnvYIs9cFBvJETQLtA&_nc_ohc=m4clcQGltTAQ7kNvwGtd33G&_nc_gid=d0unG-mnNeH_KJhc3ChoZQ&edm=ANTKIIoBAAAA&ccb=7-5&oh=00_Afqd-Dygr6WTcuJGtH7X5axfpWO_O_XJAXRk6HBg7Kb-VA&oe=696E87CF&_nc_sid=d885a2',
            'id': '3802728663289564772',
            'alt_text': "Photo by Harry Potter on January 04, 2026. May be a meme of text that says 'failing my new year's resolutions already 2025 2025me me 2026 2026me me'."
        }
    ],
    'audio': None,
    'profile_url': 'https://www.instagram.com/harrypotter',
    'videos_duration': None,
    'images': [],
    'alt_text': "Photo by Harry Potter on January 04, 2026. May be a meme of text that says 'failing my new year's resolutions already 2025 2025me me 2026 2026me me'.",
    'photos_number': 0,
    'timestamp': '2026-01-15T09:57:02.671Z',
    'input': {
        'url': 'https://www.instagram.com/p/DTGAZJQkg5k/'
    }
}
```

--------------------------------

### Search with Intent Example

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/07_discover_api.ipynb

This code snippet shows how to use the `intent` parameter to search for 'Tesla battery technology' with the specific intent of finding 'recent breakthroughs in EV battery chemistry'. It then prints the results, ranked by relevance to the intent.
```python
QUERY = "Tesla battery technology"
INTENT = "recent breakthroughs in EV battery chemistry"

print(f"Query: '{QUERY}'")
print(f"Intent: '{INTENT}'\n")

async with client:
    result = await client.discover(
        query=QUERY,
        intent=INTENT
    )

print(f"Success: {result.success}")
print(f"Total Results: {result.total_results}")

if result.success and result.data:
    print("\n--- Results ranked by relevance to intent ---")
    for i, item in enumerate(result.data[:10]):
        score = item.get('relevance_score', 0)
        title = item.get('title', 'N/A')
        link = item.get('link', 'N/A')
        print(f"\n{i+1}. [{score:.4f}] {title}")
        print(f" {link}")
else:
    print(f"\nError: {result.error}")
```

--------------------------------

### Release Test Summary

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/test_v2.1.0_release.ipynb

Prints a summary of the Bright Data SDK v2.1.0 release verification tests.

```python
print("="*50)
print("Bright Data SDK v2.1.0 Release Test Summary")
print("="*50)
print(f"Version: {brightdata.__version__}")
print("Imports: OK")
print("Client structure: OK")
print("Async mode parameter: OK")
print("AsyncUnblockerClient: OK")
if TOKEN:
    print("Live tests: Completed (check results above)")
else:
    print("Live tests: Skipped (no token)")
print("="*50)
print("Release verification complete!")
```

--------------------------------

### Client Initialization and Scraper Verification

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/instagram.ipynb

Initializes the BrightData client and verifies the accessibility and methods of the Instagram scraper modules.
```python
from brightdata import BrightDataClient

# Verify we're using local version
import brightdata
print(f"brightdata module location: {brightdata.__file__}")

# Initialize client
client = BrightDataClient(token=API_TOKEN)

# Verify Instagram scraper is accessible
print(f"\nInstagramScraper: {type(client.scrape.instagram).__name__}")
print(f"InstagramSearchScraper: {type(client.search.instagram).__name__}")

# Check for new methods (profiles discovery, reels_all)
print("\nSearch scraper methods:")
print([m for m in dir(client.search.instagram) if not m.startswith('_') and callable(getattr(client.search.instagram, m))])
```

--------------------------------

### Mango Products - Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Downloads Mango product data and prints the number of records.

```python
# Mango Products - Download
mango_data = await client.datasets.mango_products.download(mango_snapshot)
print(f"Mango Products: {len(mango_data)} records")
mango_data
```

--------------------------------

### G2 Reviews - Trigger and Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Demonstrates triggering a sample and downloading G2 Reviews data.

```python
# G2 Reviews - Trigger
g2_reviews_snapshot = await client.datasets.g2_reviews.sample(records_limit=2)
print(f"G2 Reviews snapshot: {g2_reviews_snapshot}")
```

```python
# G2 Reviews - Download
g2_reviews_data = await client.datasets.g2_reviews.download(g2_reviews_snapshot)
print(f"G2 Reviews: {len(g2_reviews_data)} records")
g2_reviews_data
```

--------------------------------

### Client Initialization and Scraper Verification

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/web_scrapers/tiktok.ipynb

This snippet initializes the BrightData client, verifies that the local SDK version is being used, and checks the accessibility and methods of the TikTok scraper and search scraper.
```python
from brightdata import BrightDataClient

# Verify we're using local version
import brightdata
print(f"brightdata module location: {brightdata.__file__}")

# Initialize client
client = BrightDataClient(token=API_TOKEN)

# Verify TikTok scraper is accessible
print(f"\nTikTokScraper: {type(client.scrape.tiktok).__name__}")
print(f"TikTokSearchScraper: {type(client.search.tiktok).__name__}")

# Check for scraper methods
print("\nScraper methods (URL-based):")
print([m for m in dir(client.scrape.tiktok) if not m.startswith('_') and callable(getattr(client.scrape.tiktok, m))])

print("\nSearch scraper methods (Discovery):")
print([m for m in dir(client.search.tiktok) if not m.startswith('_') and callable(getattr(client.search.tiktok, m))])
```

--------------------------------

### BrightData Client Initialization

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/amazon/amazon.ipynb

Initializes the BrightDataClient with the API token.

```python
from brightdata import BrightDataClient

client = BrightDataClient(token=API_TOKEN)

print("Client initialized")
```

--------------------------------

### Montblanc Products - Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Downloads Montblanc products data and prints the number of records.

```python
# Montblanc Products - Download
montblanc_data = await client.datasets.montblanc_products.download(montblanc_snapshot)
print(f"Montblanc Products: {len(montblanc_data)} records")
montblanc_data
```

--------------------------------

### Ikea Products - Download

Source: https://github.com/brightdata/sdk-python/blob/main/notebooks/datasets/mass_test.ipynb

Downloads Ikea product data and prints the number of records.

```python
# Ikea Products - Download
ikea_data = await client.datasets.ikea_products.download(ikea_snapshot)
print(f"Ikea Products: {len(ikea_data)} records")
ikea_data
```
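--------------------------------

### Sketch: A poll_job Helper for the Trigger-Then-Poll Pattern

The trigger-then-poll snippet above calls `poll_job(url, job)` without defining it. The following is a minimal sketch of what such a helper might look like, not the SDK's or the notebook's own implementation. It assumes the job object exposes awaitable `status()` and `fetch()` methods and the `"ready"`, `"error"`, and `"failed"` status values seen in the manual job control example. The `interval` and `max_attempts` parameters, the returned dictionary shape, the `"timeout"` status, and the `FakeJob` stand-in are all hypothetical and exist only so the sketch runs without network access.

```python
import asyncio


async def poll_job(url, job, interval=5.0, max_attempts=60):
    """Poll a triggered job until it finishes, then fetch its data.

    Hypothetical helper: assumes `job.status()` and `job.fetch()` are
    awaitable, as in the manual job control example.
    """
    for _ in range(max_attempts):
        status = await job.status()
        if status == "ready":
            return {"url": url, "status": status, "data": await job.fetch()}
        if status in ("error", "failed"):
            return {"url": url, "status": status, "data": None}
        await asyncio.sleep(interval)
    # Give up after max_attempts polls (assumed behaviour for this sketch)
    return {"url": url, "status": "timeout", "data": None}


class FakeJob:
    """Stand-in for an SDK job object (illustration only)."""

    def __init__(self, statuses, data):
        self._statuses = iter(statuses)
        self._data = data

    async def status(self):
        return next(self._statuses)

    async def fetch(self):
        return self._data


async def demo():
    jobs = {
        "https://example.com/a": FakeJob(["running", "ready"], [{"title": "A"}]),
        "https://example.com/b": FakeJob(["failed"], None),
    }
    # Poll all jobs concurrently, mirroring the asyncio.gather call above
    return await asyncio.gather(
        *[poll_job(url, job, interval=0.01) for url, job in jobs.items()]
    )


results = asyncio.run(demo())
for r in results:
    print(r["url"], r["status"])
```

In a Jupyter notebook, where an event loop is already running, replace `asyncio.run(demo())` with `await demo()`. Bounding the loop with `max_attempts` is a design choice of this sketch so that a job that never completes cannot block the whole batch.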