### Install project dependencies

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/README.md

Run this command to install all required dependencies for the project.

```bash
yarn
```

--------------------------------

### Install Botasaurus Environment on VM

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Initializes the VM environment by running the Botasaurus installation script.

```bash
curl -sL https://raw.githubusercontent.com/omkarcloud/botasaurus/master/vm-scripts/install-bota.sh | bash
```

--------------------------------

### Define S3 Debian Installer URL

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-aws.md

Example URL format for a Debian installer hosted on an S3 bucket.

```text
https://your-bucket.s3.amazonaws.com/Your-App-amd64.deb
```
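
The URL pattern above can also be assembled programmatically. This is a minimal sketch, assuming the installer object is publicly readable and that spaces in the file name are encoded as `+` (as in the `Yahoo+Finance+Extractor-amd64.deb` URL used by the install commands in this document). The helper name `build_installer_url` and its parameters are hypothetical and not part of Botasaurus.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_installer_url(bucket: str, file_name: str, region: Optional[str] = None) -> str:
    """Build an S3 URL for a Debian installer (hypothetical helper)."""
    # Include the region in the host only when one is given.
    host = f"{bucket}.s3.{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    # quote_plus encodes spaces as "+" and leaves "-" and "." untouched.
    return f"https://{host}/{quote_plus(file_name)}"


url = build_installer_url("your-bucket", "Your-App-amd64.deb")
print(url)
```

Passing `region="us-east-1"` yields the region-qualified host form seen in the install commands.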

--------------------------------

### Comprehensive API Configuration Example

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

A detailed example demonstrating multiple API configurations including enabling the API, setting port and base path, adding scraper aliases, and defining custom routes with middleware.

```typescript
import ApiConfig from 'botasaurus-server/api-config';
import { hotelsSearchScraper } from "../src/scrapers";

// Enable API functionality
ApiConfig.enableApi();

// Production configuration
ApiConfig.setApiPort(3000);
ApiConfig.setApiBasePath("/v1");

// Add scraper aliases for direct access
ApiConfig.addScraperAlias(hotelsSearchScraper, '/hotels/search');

// Add custom routes
ApiConfig.addCustomRoutes((server) => {
  // Health check for monitoring
  server.get('/health', (request, reply) => {
    return reply.send({ status: 'OK'});
  });

  // Authentication middleware  
  server.addHook('onRequest', (request, reply, done) => {
    // Check for secret
    const secret = request.headers['x-secret'] as string;

    if (secret === '49cb1de3-419b-4647-bf06-22c9e1110313') {
      // Valid secret, proceed
      return done(); 
    } else {
      return reply.status(401).send({
        message: 'Unauthorized: Invalid secret.',
      });
    }
  });
});
```
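
A client calling this API has to send the `x-secret` header that the `onRequest` hook checks. The sketch below only builds such a request with the Python standard library. It assumes the server runs locally on port `3000` and that custom routes are served under the `/v1` base path. The helper `build_api_request` and the placeholder secret are hypothetical.

```python
from urllib.request import Request

BASE_URL = "http://localhost:3000/v1"  # assumed: port and base path from the configuration above


def build_api_request(path: str, secret: str) -> Request:
    """Create a GET request carrying the x-secret header (hypothetical helper)."""
    request = Request(f"{BASE_URL}{path}", method="GET")
    request.add_header("x-secret", secret)
    return request


health_request = build_api_request("/health", "your-secret-here")
print(health_request.full_url)

# To send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(health_request) as response:
#       print(response.status, response.read())
```

In a real deployment the secret would normally come from an environment variable rather than being hard-coded in source.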

--------------------------------

### Install Node.js Packages

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/quick-start.md

Install all necessary npm packages for the project.

```bash
npm install
```

--------------------------------

### Start local development server

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/README.md

Launches a local server with live reloading for development purposes.

```bash
yarn start
```

--------------------------------

### Install Scraper Dependencies

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Commands to install project requirements and initialize the environment.

```bash
python -m pip install -r requirements.txt
python run.py install
```

--------------------------------

### Install Desktop Application

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-aws.md

Installs a desktop application on the EC2 instance using a Debian installer URL.

```bash
python3 -m bota install-desktop-app --debian-installer-url https://yahoo-finance-extractor.s3.us-east-1.amazonaws.com/Yahoo+Finance+Extractor-amd64.deb
```

--------------------------------

### Install pg-cache-storage

Source: https://github.com/omkarcloud/botasaurus/blob/master/pg-cache-storage/README.md

Install the library using pip. This is the first step before using the cache storage.

```bash
pip install pg-cache-storage
```

--------------------------------

### Install UI Scraper

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Deploy the scraper from a repository URL to the VM.

```bash
python3 -m bota install-ui-scraper --repo-url https://github.com/omkarcloud/botasaurus-starter
```

--------------------------------

### Install Desktop Application

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-google-cloud.md

Installs the desktop application using a Debian installer URL. Supports optional configuration flags for port and base path.

```bash
python3 -m bota install-desktop-app --debian-installer-url https://yahoo-finance-extractor.s3.us-east-1.amazonaws.com/Yahoo+Finance+Extractor-amd64.deb
```

```sh
python3 -m bota install-desktop-app \
  --debian-installer-url https://amazon-invoice-extractor.s3.us-east-1.amazonaws.com/Amazon+Invoice+Extractor-amd64.deb \
  --port 8001 \
  --api-base-path /amazon-invoices
```

--------------------------------

### Initialize Botasaurus and Create Static IP

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Install the bota package and generate a static IP address for the virtual machine.

```bash
python -m pip install bota
python -m bota create-ip
```

--------------------------------

### Example Usage of Botasaurus API Client

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

A basic example demonstrating how to import the Api class and create an instance of the Botasaurus API client.

```python
from botasaurus_api import Api

# Create an instance of the API client
api = Api()
```

--------------------------------

### Install Botasaurus Humancursor

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_humancursor/README.md

Use pip to install the library in your Python environment.

```bash
pip install botasaurus-humancursor
```

--------------------------------

### Launch Botasaurus Desktop App

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/quick-start.md

Start the development server to launch the desktop application.

```bash
npm run dev
```

--------------------------------

### Install Botasaurus API Client

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

Install the Botasaurus API client using pip. This command upgrades the package if it's already installed.

```bash
python -m pip install --upgrade botasaurus-api
```

--------------------------------

### Scrape Product Heading with Botasaurus

Source: https://github.com/omkarcloud/botasaurus/blob/master/advanced.md

This example demonstrates how to use Botasaurus to navigate to a specific URL, extract text from an element, and save a screenshot. It utilizes the `@browser` decorator for easy setup.

```python
from botasaurus.browser import browser, Driver

@browser
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://www.g2.com/products/jenkins/reviews?page=5", bypass_cloudflare=True)
    heading = driver.get_text('.product-head__title [itemprop="name"]')
    driver.save_screenshot()
    return heading

scrape_heading_task()
```

--------------------------------

### Install Scraper Repository

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Clones and installs a specific scraper repository onto the VM.

```bash
python3 -m bota install-scraper --repo-url https://github.com/omkarcloud/botasaurus-starter
```

--------------------------------

### Initialize and Run Botasaurus Project

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/getting-started.md

Commands to navigate to the project directory, install requirements, and execute the main script.

```bash
cd my-botasaurus-project
python -m pip install -r requirements.txt
code . # Optionally, open the project in VSCode
python main.py
```

--------------------------------

### Install Botasaurus CLI and Create Static IP

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-google-cloud.md

Installs the Botasaurus CLI and creates a static IP address for your VM. You will be prompted for a name and region.

```bash
python -m pip install bota --upgrade 
python -m bota create-ip # Create a static IP address for your VM
```

--------------------------------

### Install Multiple Desktop APIs

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-aws.md

Configures an additional application on the same instance using unique ports and API base paths to avoid conflicts.

```sh
python3 -m bota install-desktop-app \
  --debian-installer-url https://amazon-invoice-extractor.s3.us-east-1.amazonaws.com/Amazon+Invoice+Extractor-amd64.deb \
  --port 8001 \
  --api-base-path /amazon-invoices
```
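
When several apps share one instance, it can help to check the planned ports and base paths for collisions before running the installs. This is a rough sketch that uses only the flags shown above (`--debian-installer-url`, `--port`, `--api-base-path`). The helper names, the example URLs, and the port values are hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def build_install_command(installer_url: str, port: Optional[int] = None,
                          api_base_path: Optional[str] = None) -> List[str]:
    """Assemble the install-desktop-app command as an argument list."""
    command = ["python3", "-m", "bota", "install-desktop-app",
               "--debian-installer-url", installer_url]
    if port is not None:
        command += ["--port", str(port)]
    if api_base_path is not None:
        command += ["--api-base-path", api_base_path]
    return command


def find_conflicts(apps: List[Dict]) -> List[str]:
    """Report ports or base paths that are used by more than one app."""
    conflicts, seen_ports, seen_paths = [], set(), set()
    for app in apps:
        if app["port"] in seen_ports:
            conflicts.append(f"duplicate port {app['port']}")
        if app["api_base_path"] in seen_paths:
            conflicts.append(f"duplicate base path {app['api_base_path']}")
        seen_ports.add(app["port"])
        seen_paths.add(app["api_base_path"])
    return conflicts


# Hypothetical deployment plan for two apps on one instance.
apps = [
    {"url": "https://example.com/first-app-amd64.deb", "port": 8000, "api_base_path": "/first"},
    {"url": "https://example.com/second-app-amd64.deb", "port": 8001, "api_base_path": "/second"},
]

if not find_conflicts(apps):
    for app in apps:
        print(shlex.join(build_install_command(app["url"], app["port"], app["api_base_path"])))
```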

--------------------------------

### Run the Scraper

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Command to start the scraper application.

```bash
python run.py
```

--------------------------------

### Create Kubernetes Cluster with Bota

Source: https://github.com/omkarcloud/botasaurus/blob/master/run-scraper-in-kubernetes.md

Installs the bota package and initializes a new Kubernetes cluster in Google Cloud.

```bash
python -m pip install bota
python -m bota create-cluster
```

--------------------------------

### Install Botasaurus CLI and Apache

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-aws.md

Executes a script to install the Botasaurus CLI and Apache web server on an EC2 instance.

```bash
curl -sL https://raw.githubusercontent.com/omkarcloud/botasaurus/master/vm-scripts/install-bota-desktop.sh | bash
```

--------------------------------

### Package Application for Current OS

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/packaging-publishing.md

Executes the build process to generate an installer for the host operating system.

```bash
npm run package
```

--------------------------------

### Install SQLite Cache Storage

Source: https://github.com/omkarcloud/botasaurus/blob/master/sqlite-cache-storage/README.md

Install the package via pip to enable SQLite caching.

```bash
pip install sqlite-cache-storage
```

--------------------------------

### Install Botasaurus

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/what-is-botasaurus.md

Install the Botasaurus package using pip. Ensure you have the latest version for all features.

```shell
python -m pip install --upgrade botasaurus
```

--------------------------------

### Install Certbot for Apache

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-domain-and-ssl.md

Installs the Certbot package and the Apache plugin on Debian-based systems.

```bash
sudo apt install certbot python3-certbot-apache -y
```

--------------------------------

### Install Chrome and Botasaurus in Google Colab

Source: https://github.com/omkarcloud/botasaurus/blob/master/advanced.md

Run these commands in a Google Colab notebook to install Google Chrome and the Botasaurus library. Ensure all dependencies are met before proceeding.

```python
! apt-get update
! wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
! apt-get install -y lsof wget gnupg2 apt-transport-https ca-certificates software-properties-common adwaita-icon-theme alsa-topology-conf alsa-ucm-conf at-spi2-core dbus-user-session dconf-gsettings-backend dconf-service fontconfig fonts-liberation glib-networking glib-networking-common glib-networking-services gsettings-desktop-schemas gtk-update-icon-cache hicolor-icon-theme libasound2 libasound2-data libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data libatspi2.0-0 libauthen-sasl-perl libavahi-client3 libavahi-common-data libavahi-common3 libcairo-gobject2 libcairo2 libclone-perl libcolord2 libcups2 libdata-dump-perl libdatrie1 libdconf1 libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libencode-locale-perl libepoxy0 libfile-basedir-perl libfile-desktopentry-perl libfile-listing-perl libfile-mimeinfo-perl libfont-afm-perl libfontenc1 libgbm1 libgdk-pixbuf-2.0-0 libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libgraphite2-3 libgtk-3-0 libgtk-3-bin libgtk-3-common libharfbuzz0b libhtml-form-perl libhtml-format-perl libhtml-parser-perl libhtml-tagset-perl libhtml-tree-perl libhttp-cookies-perl libhttp-daemon-perl libhttp-date-perl libhttp-message-perl libhttp-negotiate-perl libice6 libio-html-perl libio-socket-ssl-perl libio-stringy-perl libipc-system-simple-perl libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libllvm11 liblwp-mediatypes-perl liblwp-protocol-https-perl libmailtools-perl libnet-dbus-perl libnet-http-perl libnet-smtp-ssl-perl libnet-ssleay-perl libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpciaccess0 libpixman-1-0 libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsensors-config libsensors5 libsm6 libsoup-gnome2.4-1 libsoup2.4-1 libtext-iconv-perl libthai-data libthai0 libtie-ixhash-perl libtimedate-perl libtry-tiny-perl libu2f-udev liburi-perl libvte-2.91-0 libvte-2.91-common libvulkan1 libwayland-client0 
! apt-get install -y libwayland-cursor0 libwayland-egl1 libwayland-server0 libwww-perl libwww-robotrules-perl libx11-protocol-perl libx11-xcb1 libxaw7 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-randr0 libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-sync1 libxcb-xfixes0 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxft2 libxi6 libxinerama1 libxkbcommon0 libxkbfile1 libxml-parser-perl libxml-twig-perl libxml-xpathengine-perl libxmu6 libxmuu1 libxrandr2 libxrender1 libxshmfence1 libxt6 libxtst6 libxv1 libxxf86dga1 libxxf86vm1 libz3-4 mesa-vulkan-drivers perl-openssl-defaults shared-mime-info termit x11-common x11-utils xdg-utils xvfb
! dpkg -i google-chrome-stable_current_amd64.deb
! python -m pip install botasaurus
```

--------------------------------

### Install PDF Parsing Package

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/your-first-extractors/amazon-pdf-invoice-extractor.md

Install the electron-pdf-parse package via npm to enable PDF reading capabilities in an Electron environment.

```bash
npm install electron-pdf-parse
```

--------------------------------

### Restart the Scraper VM

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Command to reboot the virtual machine after starting it from the Google Cloud Console.

```bash
shutdown -r now
```

--------------------------------

### Complex API Configuration Example

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

A comprehensive configuration demonstrating multiple Botasaurus API settings including enabling the API, setting port and base path, adding scraper aliases, and defining custom routes with middleware.

````APIDOC
## Complex API Configuration

### Description
This example shows how to configure the Botasaurus API with various options, including enabling the API, setting the port and base path, registering custom scraper aliases, and adding custom routes with authentication middleware.

### Configuration Steps
1. **Enable API**: `ApiConfig.enableApi();`
2. **Set API Port**: `ApiConfig.setApiPort(3000);`
3. **Set API Base Path**: `ApiConfig.setApiBasePath("/v1");`
4. **Add Scraper Alias**: `ApiConfig.addScraperAlias(hotelsSearchScraper, '/hotels/search');`
5. **Add Custom Routes**: Use `ApiConfig.addCustomRoutes()` to add endpoints like health checks and middleware.

### Example Code
```ts title="src/scraper/backend/server.ts"
import ApiConfig from 'botasaurus-server/api-config';
import { hotelsSearchScraper } from "../src/scrapers";

// Enable API functionality
ApiConfig.enableApi();

// Production configuration
ApiConfig.setApiPort(3000);
ApiConfig.setApiBasePath("/v1");

// Add scraper aliases for direct access
ApiConfig.addScraperAlias(hotelsSearchScraper, '/hotels/search');

// Add custom routes
ApiConfig.addCustomRoutes((server) => {
  // Health check for monitoring
  server.get('/health', (request, reply) => {
    return reply.send({ status: 'OK'});
  });

  // Authentication middleware
  server.addHook('onRequest', (request, reply, done) => {
    // Check for secret
    const secret = request.headers['x-secret'] as string;

    if (secret === '49cb1de3-419b-4647-bf06-22c9e1110313') {
      // Valid secret, proceed
      return done();
    } else {
      return reply.status(401).send({
        message: 'Unauthorized: Invalid secret.',
      });
    }
  });
});
```

### Resulting Endpoints and Behavior:
- The API runs on port `3000`.
- All routes are prefixed with `/v1`.
- Hotel search is available at `GET /v1/hotels/search`.
- Health check is available at `GET /v1/health`.
- All requests require authentication via the `x-secret` header.
````

--------------------------------

### Dynamically Configure Browser Profile and Proxy with Functions

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Use functions to extract configuration values from data parameters and pass them to the `@browser` decorator for dynamic browser setup. This is useful when each data item requires a different profile or proxy.

```python
from botasaurus.browser import browser, Driver

def get_profile(data):
    return data["profile"]

def get_proxy(data):
    return data["proxy"]

@browser(profile=get_profile, proxy=get_proxy)
def scrape_heading_task(driver: Driver, data):
    profile, proxy = driver.config.profile, driver.config.proxy
    print(profile, proxy)
    return profile, proxy

data = [
    {"profile": "pikachu", "proxy": "http://142.250.77.228:8000"},
    {"profile": "greyninja", "proxy": "http://142.250.77.229:8000"},
]

scrape_heading_task(data)
```
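
To make the mechanism concrete, the sketch below shows one way a decorator can resolve function-valued options against each data item. It is a simplified stand-in written with the standard library only, not Botasaurus's actual implementation. The names `with_options` and `describe`, and the extra `headless` option, are hypothetical.

```python
from functools import wraps
from typing import Any, Callable, Dict, List


def with_options(**options: Any) -> Callable:
    """Decorator that resolves callable options against each data item."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def run(items: List[Dict]) -> List[Any]:
            results = []
            for item in items:
                # Call option functions with the item; pass plain values through.
                config = {name: value(item) if callable(value) else value
                          for name, value in options.items()}
                results.append(func(config, item))
            return results
        return run
    return decorator


@with_options(profile=lambda data: data["profile"],
              proxy=lambda data: data["proxy"],
              headless=True)
def describe(config: Dict, data: Dict) -> str:
    return f"{config['profile']} via {config['proxy']} (headless={config['headless']})"


print(describe([
    {"profile": "pikachu", "proxy": "http://142.250.77.228:8000"},
    {"profile": "greyninja", "proxy": "http://142.250.77.229:8000"},
]))
```

Resolving options per item keeps the task function itself free of any per-item configuration logic.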

--------------------------------

### Accessing Botasaurus Storage

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/understanding-app-features.md

Demonstrates how to import and use the Botasaurus storage utility to set and get item values. This is useful for persisting user settings.

```javascript
import { getBotasaurusStorage } from 'botasaurus/botasaurus-storage';

const storage = getBotasaurusStorage();

storage.setItem('userId', 10);
const userId = storage.getItem('userId');
```

--------------------------------

### Configure Request Decorator with Proxy

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

This example shows how to configure a proxy for the Request Decorator. The decorator enhances requests with browser-like headers and connections.

```python
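from botasaurus.request import request, Request

@request(
    proxy="http://username:password@proxy-provider-domain:port"
)
def scrape_heading_task(request: Request, data):
    # Requests made through `request` go out via the configured proxy
    ...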
```

--------------------------------

### Configure Authenticated Proxies with Selenium Wire

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Set up authenticated proxies using selenium-wire. Note that this method may be susceptible to bot detection.

```bash
python -m pip install selenium_wire
```

```python
from seleniumwire import webdriver  # Import from seleniumwire

# Define the proxy
proxy_options = {
    'proxy': {
        'http': 'http://username:password@proxy-provider-domain:port', # TODO: Replace with your own proxy
        'https': 'http://username:password@proxy-provider-domain:port', # TODO: Replace with your own proxy
    }
}

# Install and set up the driver
driver = webdriver.Chrome(seleniumwire_options=proxy_options)

# Visit the desired URL
link = 'https://fingerprint.com/products/bot-detection/'
driver.get("https://www.google.com/")
driver.execute_script(f'window.location.href = "{link}"')

# Prompt for user input
input("Press Enter to exit...")

# Clean up
driver.quit()
```
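
Proxy credentials that contain reserved characters such as `@` or `:` can break the `username:password@host:port` form. A possible way to handle this, sketched here with the standard library, is to percent-encode the credentials before building the URL. Whether a given proxy client decodes them correctly should be verified. The helper `build_proxy_url` and the sample values are hypothetical.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" also encodes "/" so no reserved character leaks into the URL structure.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


proxy_url = build_proxy_url("username", "p@ss:word", "proxy-provider-domain", 8080)

# Same shape as the selenium-wire options shown above.
proxy_options = {"proxy": {"http": proxy_url, "https": proxy_url}}
print(proxy_url)
```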

--------------------------------

### Field Definition Example

Source: https://github.com/omkarcloud/botasaurus/blob/master/advanced.md

Illustrates the basic usage of the `Field` class for displaying a single data field. It shows how to alias the output key.

```python
# value is the reviews_per_rating dictionary

```

--------------------------------

### Enable API in Botasaurus Server

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Enable the API functionality by calling `ApiConfig.enableApi()` in your `src/scraper/backend/api-config.ts` file. This starts an API server at `http://localhost:8000` by default.

```typescript
import ApiConfig from "botasaurus-server/api-config";

// Enable the API
ApiConfig.enableApi();
```

--------------------------------

### GET /hotels/search (Example Scraper Alias)

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

This endpoint allows direct GET requests to execute the `hotelsSearchScraper` immediately, bypassing task management overhead. It validates input data and respects rate limits.

````APIDOC
## GET /hotels/search

### Description
This endpoint allows direct GET requests to execute the `hotelsSearchScraper` immediately, bypassing task management overhead. It validates input data and respects rate limits.

### Method
GET

### Endpoint
`/hotels/search`

### Parameters

#### Query Parameters
- **(type)** - Required/Optional - Description of parameters the `hotelsSearchScraper` expects.

### Request Example
```
GET /hotels/search?param1=value1&param2=value2
```

### Response
#### Success Response (200)
- **(type)** - Description of the data returned by `hotelsSearchScraper`.

#### Response Example
```json
{
  "example": "response data from hotelsSearchScraper"
}
```
````

--------------------------------

### Manage All Profiles with Profiles Utility

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Utilize the `Profiles` utility to manage all browser profiles persistently. This includes setting, getting, and deleting profiles, with data stored in `profiles.json`.

```python
from botasaurus.profiles import Profiles

# Set profiles
Profiles.set_profile('amit', {'name': 'Amit Sharma', 'age': 30})
Profiles.set_profile('rahul', {'name': 'Rahul Verma', 'age': 30})

# Get a profile
profile = Profiles.get_profile('amit')
print(profile)  # Output: {'name': 'Amit Sharma', 'age': 30}

# Get all profiles
all_profiles = Profiles.get_profiles()
print(all_profiles)  # Output: [{'name': 'Amit Sharma', 'age': 30}, {'name': 'Rahul Verma', 'age': 30}]

# Get all profiles in random order
random_profiles = Profiles.get_profiles(random=True)
print(random_profiles)  # Output: [{'name': 'Rahul Verma', 'age': 30}, {'name': 'Amit Sharma', 'age': 30}] in random order

# Delete a profile
Profiles.delete_profile('amit')
```
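
For intuition, the sketch below mimics the same set, get, and delete operations with a small JSON-file-backed store. It is an illustrative stand-in using only the standard library. The class `ProfileStore` and its on-disk layout are assumptions and do not reflect how Botasaurus stores `profiles.json` internally.

```python
import json
import random as random_module
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


class ProfileStore:
    """Tiny JSON-file-backed profile store (stand-in, not the Botasaurus implementation)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> Dict[str, Any]:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def _save(self, profiles: Dict[str, Any]) -> None:
        self.path.write_text(json.dumps(profiles, indent=2))

    def set_profile(self, name: str, data: Dict[str, Any]) -> None:
        profiles = self._load()
        profiles[name] = data
        self._save(profiles)

    def get_profile(self, name: str) -> Optional[Dict[str, Any]]:
        return self._load().get(name)

    def get_profiles(self, random: bool = False) -> List[Dict[str, Any]]:
        values = list(self._load().values())
        if random:
            random_module.shuffle(values)
        return values

    def delete_profile(self, name: str) -> None:
        profiles = self._load()
        profiles.pop(name, None)  # deleting a missing profile is a no-op
        self._save(profiles)


# Use a temporary directory so the example leaves no file in the project.
store = ProfileStore(Path(tempfile.mkdtemp()) / "profiles.json")
store.set_profile("amit", {"name": "Amit Sharma", "age": 30})
store.set_profile("rahul", {"name": "Rahul Verma", "age": 30})
store.delete_profile("amit")
print(store.get_profiles())
```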

--------------------------------

### Manage Cache with Botasaurus Cache Module

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/what-is-botasaurus.md

Use Botasaurus's Cache Module to manage cached data efficiently. This example demonstrates putting, checking, getting, removing, and clearing cached data for a scraping function.

```python
from botasaurus import *
from botasaurus.cache import Cache

# Example scraping function
@request
def scrape_data(data):
    # Your scraping logic here
    return {"processed": data}

# Sample data for scraping
input_data = {"key": "value"}

# Adding data to the cache
Cache.put(scrape_data, input_data, scrape_data(input_data))
# Checking if data is in the cache
if Cache.has(scrape_data, input_data):
    # Retrieving data from the cache
    cached_data = Cache.get(scrape_data, input_data)
# Removing specific data from the cache
Cache.remove(scrape_data, input_data)
# Clearing the complete cache for the scrape_data function
Cache.clear(scrape_data)
```
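
The five operations above amount to a cache keyed by the function and its input data. The sketch below reproduces that idea in memory with the standard library only. It is an illustration of the concept, not the Botasaurus `Cache` implementation (for instance, it does not persist anything to disk), and the class name `FunctionCache` is hypothetical.

```python
import json
from typing import Any, Callable, Dict, Tuple


class FunctionCache:
    """In-memory cache keyed by function name and input data (illustrative stand-in)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def _key(func: Callable, data: Any) -> Tuple[str, str]:
        # Sorted keys make equal dicts serialize to the same string.
        return func.__name__, json.dumps(data, sort_keys=True)

    def put(self, func: Callable, data: Any, value: Any) -> None:
        self._entries[self._key(func, data)] = value

    def has(self, func: Callable, data: Any) -> bool:
        return self._key(func, data) in self._entries

    def get(self, func: Callable, data: Any) -> Any:
        return self._entries.get(self._key(func, data))

    def remove(self, func: Callable, data: Any) -> None:
        self._entries.pop(self._key(func, data), None)

    def clear(self, func: Callable) -> None:
        # Drop every entry that belongs to this function.
        self._entries = {key: value for key, value in self._entries.items()
                         if key[0] != func.__name__}


def scrape_data(data):
    return {"processed": data}


cache = FunctionCache()
input_data = {"key": "value"}
cache.put(scrape_data, input_data, scrape_data(input_data))
print(cache.has(scrape_data, input_data), cache.get(scrape_data, input_data))
cache.remove(scrape_data, input_data)
print(cache.has(scrape_data, input_data))
```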

--------------------------------

### Clone Botasaurus Desktop Starter Project

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/quick-start.md

Clone the starter project repository and navigate into the new directory.

```bash
git clone https://github.com/omkarcloud/botasaurus-desktop-starter my-botasaurus-app
cd my-botasaurus-app
```

--------------------------------

### Configure PostgreSQL Instance Settings

Source: https://github.com/omkarcloud/botasaurus/blob/master/run-postgres-cloud-sql-instance.md

Settings for creating a cost-effective PostgreSQL instance on Google Cloud for testing purposes.

```text
Instance ID: pikachu # Choose any name for your instance.
Password: pikachu # For testing purposes, we're using a simple password "pikachu". In a production environment, use a strong password.
Choose a Cloud SQL edition: Enterprise # Opt for the Enterprise edition as it is cheaper.
Preset for this edition: Sandbox # Select the Sandbox preset as it is also cheaper.
Region: us-central1 # For testing, we're using the default region. In production, select the region that is closest to your server for best performance.
Machine shapes: Shared Core/1 vCPU, 0.614 GB # Choose the cheapest instance as we don't need a high-end machine for storing web scraping data.
Storage Capacity: 10 GB
Enable automatic storage increases: Yes # Enable this feature so you don't have to worry about running out of storage. (Awesome feature!)
Enable deletion protection: No # Disable this feature, otherwise you'll need to change this setting later to delete the instance.
```

--------------------------------

### Initialize Botasaurus Project

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Create a new directory for your project and open it in your code editor.

```shell
mkdir my-botasaurus-project
cd my-botasaurus-project
code .  # This will open the project in VSCode if you have it installed
```

--------------------------------

### Clone Botasaurus Starter Template

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/getting-started.md

Use this command to download the official starter template repository.

```bash
git clone https://github.com/omkarcloud/botasaurus-starter my-botasaurus-project
```

--------------------------------

### Configure VM Deployment Settings

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Recommended configuration parameters for the Google Cloud Click to Deploy interface.

```text
Zone: us-central1-a # Use the zone from the region you selected in the previous step.
Series: N1
Machine Type: n1-standard-2 (2 vCPU, 1 core, 7.5 GB memory)
Boot Disk Type: Standard persistent disk # This is the most cost-effective disk option.
Boot disk size in GB: 20 GB # Adjust based on storage needs
Network Interface [External IP]: pikachu-ip # Use the IP name you created in the previous step.
```

--------------------------------

### Disable API Autostart

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Prevent the API server from starting automatically on application launch by calling `ApiConfig.disableApiAutostart()`. Users will need to manually start the API from the desktop GUI.

```typescript
// API will not run until manually started from the desktop GUI
ApiConfig.disableApiAutostart();
```

--------------------------------

### Create a Simple 'Overview' View

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/enhancing-scrapers/custom-views.md

Defines an 'Overview' view with 'name' and 'price' fields and registers it with a scraper. Ensure 'yourScraper' is imported correctly.

```typescript
import { Server } from 'botasaurus-server/server';
import { View, Field } from 'botasaurus-server/ui';
import { yourScraper } from '../src/yourScraper';   // your scraper function

/* 1. Define the view */
// highlight-start
const overviewView = new View('Overview', [
  new Field('name'),
  new Field('price'),
]);
// highlight-end
/* 2. Register scraper + view */
Server.addScraper(yourScraper, { views: [overviewView] });
```
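
Conceptually, a view with plain fields can be thought of as a projection of each result record onto the listed keys. The sketch below illustrates that reading in plain Python. It is an assumption about the observable effect, not the `View` implementation, and the function `apply_view` and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def apply_view(records: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Keep only the listed fields of each record, in the given order."""
    return [{field: record.get(field) for field in fields} for record in records]


records = [
    {"id": 1, "name": "T-Shirt", "price": 16, "reviews": 1000},
    {"id": 2, "name": "Laptop", "price": 700, "reviews": 500},
]

overview = apply_view(records, ["name", "price"])
print(overview)
```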

--------------------------------

### Botasaurus API Client Initialization

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

Demonstrates how to import and initialize the Botasaurus API client, with options for specifying the API URL and controlling response file creation.

````APIDOC
## Botasaurus API Client Initialization

### Description
Initialize the Botasaurus API client. You can optionally specify the `api_url` and `create_response_files`.

### Method
`Api(api_url='http://localhost:8000', create_response_files=True)`

### Parameters
#### Optional Parameters
- **api_url** (string) - The base URL for the API server. Defaults to `http://localhost:8000`.
- **create_response_files** (boolean) - Whether to create response JSON files for debugging. Defaults to `True`.

### Request Example
```python
from botasaurus_api import Api

# Default initialization
api = Api()

# With custom API URL
api_custom_url = Api('https://example.com/')

# Disable response file creation
api_no_files = Api(create_response_files=False)
```
````

--------------------------------

### Fetch Task Results

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

Get the results associated with a specific task ID.

```python
results = api.get_task_results(task['id'])
```

--------------------------------

### Solve CAPTCHAs with Capsolver

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Install and integrate the capsolver extension to handle CAPTCHAs automatically.

```bash
python -m pip install capsolver_extension_python
```

```python
from botasaurus.browser import browser, Driver
from capsolver_extension_python import Capsolver

# Replace "CAP-MY_KEY" with your actual CapSolver API key
@browser(extensions=[Capsolver(api_key="CAP-MY_KEY")])  
def solve_captcha(driver: Driver, data):
    driver.get("https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php")
    driver.prompt()

solve_captcha()
```

--------------------------------

### Run Botasaurus in Docker

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/what-is-botasaurus.md

Commands to clone the starter template and launch the project using Docker Compose.

```bash
git clone https://github.com/omkarcloud/botasaurus-starter my-botasaurus-project
cd my-botasaurus-project
docker-compose build && docker-compose up
```

--------------------------------

### Retrieving Element Properties

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Methods for getting text content, attributes, and other properties of elements.

````APIDOC
## Retrieving Element Properties

### Description
Methods for extracting information from web elements, such as their text content or attribute values.

### Methods
- `driver.get_text(selector)`: Gets the text content of the element matching the selector.
- `driver.get_element_containing_text(text)`: Finds an element that contains the specified text.
- `element.get_attribute(attribute_name)`: Gets the value of a specified attribute for an element.

### Request Example
```python
# Example usage:
header_text = driver.get_text("h1")
error_message = driver.get_element_containing_text("Error: Invalid input")
image_url = driver.select("img.logo").get_attribute("src")
```
````

--------------------------------

### Define Scraper Task Data

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/enhancing-scrapers/custom-views.md

Example structure of product data returned by a scraper task.

```ts
const taskScraper = task({
  name: "taskScraper",
  run: () => {
    // highlight-start
    return [
      {
        id: 1,
        name: "T-Shirt",
        price: 16, // in US Dollar
        reviews: 1000,
        reviews_per_rating: {
          1: 0,
          2: 0,
          3: 0,
          4: 100,
          5: 900,
        },
        featured_reviews: [
          {
            id: 1,
            rating: 5,
            content: "Awesome t-shirt!",
          },
          {
            id: 2,
            rating: 5,
            content: "Amazing t-shirt!",
          },
        ],
      },
      {
        id: 2,
        name: "Laptop",
        price: 700,
        reviews: 500,
        reviews_per_rating: {
          1: 0,
          2: 0,
          3: 0,
          4: 100,
          5: 400,
        },
        featured_reviews: [
          {
            id: 1,
            rating: 5,
            content: "Best laptop ever!",
          },
          {
            id: 2,
            rating: 5,
            content: "Great laptop!",
          },
        ],
      },
    ];
    // highlight-end
  },
})
```

--------------------------------

### Configure Supabase project settings

Source: https://github.com/omkarcloud/botasaurus/blob/master/run-supabase-postgres-instance.md

Settings to enter when creating a new Supabase project, listed as key-value pairs.

```yaml
Name: Pikachu # Choose any name
Database Password: greyninja1234_A # For testing, use "greyninja1234_A". In production, use a strong password.
Region: West US (North California) # Select the region closest to your server for best performance.
```
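
For production, the database password can be generated rather than typed by hand. The following is a minimal sketch using Python's standard `secrets` module; the helper name `generate_db_password` and the 24-byte length are illustrative choices, not Supabase requirements.

```python
import secrets


def generate_db_password(num_bytes: int = 24) -> str:
    """Return a random URL-safe password built from `num_bytes` random bytes."""
    # token_urlsafe draws from the OS CSPRNG and Base64-encodes the bytes,
    # so 24 bytes produce a 32-character string.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_db_password())
```

Paste the printed value into the Database Password field and store it in a password manager.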

--------------------------------

### Build and Run Botasaurus Scraper in Docker

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Commands to clone the Botasaurus Starter Template, build the Docker image, and run the scraper within a Docker environment.

```bash
git clone https://github.com/omkarcloud/botasaurus-starter my-botasaurus-project
cd my-botasaurus-project
docker-compose build
docker-compose up
```

--------------------------------

### Build static site

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/README.md

Compiles the project into static files located in the build directory for deployment.

```bash
$ yarn build
```

--------------------------------

### Configure Google Cloud VM Deployment

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Recommended hardware and disk configuration for cost-effective Botasaurus VM deployments.

```text
Zone: us-central1-a # Use us-central1 (Iowa) for the lowest-cost VMs
Series: N1
Machine Type: n1-standard-2 (2 vCPU, 1 core, 7.5 GB memory)
Boot Disk Type: Standard persistent disk # This is the most cost-effective disk option.
Boot disk size in GB: 20 GB # Adjust based on storage needs
```

--------------------------------

### Upgrade Botasaurus Packages

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Run this command to update all Botasaurus packages to their latest versions. Ensure you have pip installed.

```bash
python -m pip install --upgrade bota botasaurus botasaurus-api botasaurus-requests botasaurus-driver botasaurus-proxy-authentication botasaurus-server botasaurus-humancursor
```

--------------------------------

### Load pages organically

Source: https://github.com/omkarcloud/botasaurus/blob/master/anti-detect-driver.md

Simulate a search engine referral by visiting Google before the target URL.

```python
driver.google_get("https://www.omkar.cloud/auth/sign-up/")
```

--------------------------------

### Uninstall Desktop Application

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-google-cloud.md

Removes the application using either the Debian installer URL or the package name defined in package.json.

```bash
python3 -m bota uninstall-desktop-app --debian-installer-url https://yahoo-finance-extractor.s3.us-east-1.amazonaws.com/Yahoo+Finance+Extractor-amd64.deb
```

```bash
python3 -m bota uninstall-desktop-app --package-name yahoo-finance-extractor
```

--------------------------------

### Create Botasaurus Project Directory

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/what-is-botasaurus.md

Set up a new directory for your Botasaurus project and navigate into it. The `code .` command opens the directory in VS Code.

```shell
mkdir my-botasaurus-project
cd my-botasaurus-project
code .
```

--------------------------------

### Uninstall Desktop App via URL

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/deploying-on-aws.md

Removes a desktop application from the EC2 instance using its Debian installer URL.

```bash
python3 -m bota uninstall-desktop-app --debian-installer-url https://yahoo-finance-extractor.s3.us-east-1.amazonaws.com/Yahoo+Finance+Extractor-amd64.deb
```

--------------------------------

### Define a Simple Link Input Control

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

A basic example of defining a required link input control with a default value.

```javascript
/**
 * @typedef {import('../../frontend/node_modules/botasaurus-controls/dist/index').Controls} Controls
 */

/**
 * @param {Controls} controls
 */
function getInput(controls) {
    controls
        // Render a Link Input, which is required, defaults to "https://stackoverflow.blog/open-source". 
        .link('link', { isRequired: true, defaultValue: "https://stackoverflow.blog/open-source" })
}
```

--------------------------------

### Import and Initialize Api Class

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

Import the Api class from the botasaurus-api library and create an instance. The default API URL is http://localhost:8000.

```python
from botasaurus_api import Api

api = Api()
```

--------------------------------

### Initialize Api with Custom URL

Source: https://github.com/omkarcloud/botasaurus/blob/master/botasaurus_api/README.md

Create an instance of the Api class, specifying a custom base URL for the API server using the api_url parameter.

```python
from botasaurus_api import Api

api = Api('https://example.com/')
```

--------------------------------

### Manage Scraper Cache

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Perform basic cache operations like put, get, has, remove, and clear for scraping tasks.

```python
from botasaurus.task import task
from botasaurus.cache import Cache

# Example scraping function
@task
def scrape_data(data):
    # Your scraping logic here
    return {"processed": data}

# Sample data for scraping
input_data = {"key": "value"}

# Adding data to the cache
Cache.put('scrape_data', input_data, scrape_data(input_data))

# Checking if data is in the cache
if Cache.has('scrape_data', input_data):
    # Retrieving data from the cache
    cached_data = Cache.get('scrape_data', input_data)
    print(f"Cached data: {cached_data}")

# Removing specific data from the cache
Cache.remove('scrape_data', input_data)

# Clearing the complete cache for the scrape_data function
Cache.clear('scrape_data')
```
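
The semantics of these calls can be pictured with a toy in-memory model. This is only a sketch for intuition: `ToyCache` is a hypothetical class that keys entries by function name plus JSON-serialized input, and it is not how Botasaurus actually stores cache entries.

```python
import json


class ToyCache:
    """In-memory stand-in that mimics put/has/get/remove/clear semantics."""

    def __init__(self):
        # Maps function name -> {serialized input -> cached result}
        self._store = {}

    @staticmethod
    def _key(data):
        # Sort keys so equal dictionaries always serialize to the same string.
        return json.dumps(data, sort_keys=True)

    def put(self, func_name, data, result):
        self._store.setdefault(func_name, {})[self._key(data)] = result

    def has(self, func_name, data):
        return self._key(data) in self._store.get(func_name, {})

    def get(self, func_name, data):
        return self._store.get(func_name, {}).get(self._key(data))

    def remove(self, func_name, data):
        # Removes one entry; other inputs cached for the function are kept.
        self._store.get(func_name, {}).pop(self._key(data), None)

    def clear(self, func_name):
        # Drops every entry cached for the function.
        self._store.pop(func_name, None)


cache = ToyCache()
cache.put("scrape_data", {"key": "value"}, {"processed": {"key": "value"}})
print(cache.has("scrape_data", {"key": "value"}))  # True
cache.remove("scrape_data", {"key": "value"})
print(cache.has("scrape_data", {"key": "value"}))  # False
```

The point of the model is that each entry is addressed by the pair (function name, input), which is why `remove` takes both arguments while `clear` takes only the function name.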

--------------------------------

### Define Complex Input Controls

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

A comprehensive example demonstrating multi-text, sections, switches, selects, and conditional logic for input controls.

```javascript
/**
 * @typedef {import('../../frontend/node_modules/botasaurus-controls/dist/index').Controls} Controls
 */


/**
 * @param {Controls} controls
 */
function getInput(controls) {
    controls
        .listOfTexts('queries', {
            defaultValue: ["Web Developers in Bangalore"],
            placeholder: "Web Developers in Bangalore",
            label: 'Search Queries',
            isRequired: true
        })
        .section("Email and Social Links Extraction", (section) => {
            section.text('api_key', {
                placeholder: "2e5d346ap4db8mce4fj7fc112s9h26s61e1192b6a526af51n9",
                label: 'Email and Social Links Extraction API Key',
                helpText: 'Enter your API key to extract email addresses and social media links.',
            })
        })
        .section("Reviews Extraction", (section) => {
            section
                .switch('enable_reviews_extraction', {
                    label: "Enable Reviews Extraction"
                })
                .numberGreaterThanOrEqualToZero('max_reviews', {
                    label: 'Max Reviews per Place (Leave empty to extract all reviews)',
                    placeholder: 20,
                    isShown: (data) => data['enable_reviews_extraction'],
                    defaultValue: 20,
                })
                .choose('reviews_sort', {
                    label: "Sort Reviews By",
                    isRequired: true,
                    isShown: (data) => data['enable_reviews_extraction'],
                    defaultValue: 'newest',
                    options: [
                        { value: 'newest', label: 'Newest' },
                        { value: 'most_relevant', label: 'Most Relevant' },
                        { value: 'highest_rating', label: 'Highest Rating' },
                        { value: 'lowest_rating', label: 'Lowest Rating' },
                    ]
                })
        })
        .section("Language and Max Results", (section) => {
            section
                .addLangSelect()
                .numberGreaterThanOrEqualToOne('max_results', {
                    placeholder: 100,
                    label: 'Max Results per Search Query (Leave empty to extract all places)'
                })
        })
        .section("Geo Location", (section) => {
            section
                .text('coordinates', {
                    placeholder: '12.900490, 77.571466'
                })
                .numberGreaterThanOrEqualToOne('zoom_level', {
                    label: 'Zoom Level (1-21)',
                    defaultValue: 14,
                    placeholder: 14
                })
        })
}
```
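
The conditional logic above relies on `isShown` predicates that receive the current form data. A rough Python sketch of that idea follows; the `visible_control_ids` helper and the dictionary layout are hypothetical and only model the behavior, they are not part of `botasaurus-controls`.

```python
def visible_control_ids(controls, data):
    """Return ids of controls whose `is_shown` predicate is absent or truthy."""
    return [
        control["id"]
        for control in controls
        if control.get("is_shown") is None or control["is_shown"](data)
    ]


def reviews_enabled(data):
    # Mirrors `(data) => data['enable_reviews_extraction']` from the example.
    return bool(data.get("enable_reviews_extraction"))


controls = [
    {"id": "enable_reviews_extraction"},
    {"id": "max_reviews", "is_shown": reviews_enabled},
    {"id": "reviews_sort", "is_shown": reviews_enabled},
]

print(visible_control_ids(controls, {"enable_reviews_extraction": False}))
# ['enable_reviews_extraction']
print(visible_control_ids(controls, {"enable_reviews_extraction": True}))
# ['enable_reviews_extraction', 'max_reviews', 'reviews_sort']
```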

--------------------------------

### Extract Product Links from G2 Sitemap

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

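Fetches product links from a gzipped sitemap index by keeping URLs whose first path segment equals 'products', trimming each link to its first two path segments, and writing the results to `g2-products`.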

```python
from botasaurus import bt
from botasaurus.sitemap import Sitemap, Filters, Extractors

links = (
    Sitemap("https://www.g2.com/sitemaps/sitemap_index.xml.gz")
    .filter(Filters.first_segment_equals("products"))
    .extract(Extractors.extract_link_upto_second_segment())
    .write_links('g2-products')
)
```
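
To make the filter and extractor concrete, here is a standard-library sketch of what "first segment equals" and "link up to the second segment" could mean for a single URL. The helper functions and the sample URLs are illustrative assumptions based on the method names, not the Botasaurus implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def path_segments(url):
    """Split the URL path into its non-empty segments."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


def first_segment_equals(url, expected):
    """True when the first path segment is exactly `expected`."""
    segments = path_segments(url)
    return bool(segments) and segments[0] == expected


def link_upto_second_segment(url):
    """Keep scheme, host, and at most the first two path segments."""
    parts = urlsplit(url)
    trimmed_path = "/" + "/".join(path_segments(url)[:2])
    return urlunsplit((parts.scheme, parts.netloc, trimmed_path, "", ""))


# Made-up sample URLs used only for illustration.
urls = [
    "https://www.g2.com/products/example-app/reviews",
    "https://www.g2.com/categories/crm",
]
product_links = [
    link_upto_second_segment(url)
    for url in urls
    if first_segment_equals(url, "products")
]
print(product_links)  # ['https://www.g2.com/products/example-app']
```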

--------------------------------

### Choose Control (Button Options)

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/enhancing-scrapers/input-controls.md

Displays options as clickable buttons, suitable for a small number of choices (fewer than 3). It requires an `options` array.

```APIDOC
## choose
Displays options as clickable buttons (an alternative to `select`). It requires an `options` array.

Use `choose` instead of `select` when you have fewer than 3 options for better user experience.

### Parameters
- **options** (array) - Required. An array of objects, each with `value` and `label` properties, defining the selectable options.
- **defaultValue** (string) - Optional. The default selected option's value.

### Request Example
```ts
.choose("theme", {
  options: [
    { value: "light", label: "Light" },
    { value: "dark",  label: "Dark"  },
  ],
  defaultValue: "light",
})
```
```

--------------------------------

### Generate Deployment Manifests

Source: https://github.com/omkarcloud/botasaurus/blob/master/run-scraper-in-kubernetes.md

Creates the necessary GitHub Actions workflow and Kubernetes deployment YAML files.

```bash
python -m bota create-manifests
```

--------------------------------

### Visit URLs with Botasaurus Driver

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Use `driver.get` for standard URL navigation. `driver.google_get` is recommended for using Google as a referer. `driver.get_via` allows specifying a custom referer, and `driver.get_via_this_page` uses the current page as the referer.

```python
driver.get("https://www.example.com")
```

```python
driver.google_get("https://www.example.com")  # Use Google as the referer [Recommended]
```

```python
driver.get_via("https://www.example.com", referer="https://duckduckgo.com/")  # Use custom referer
```

```python
driver.get_via_this_page("https://www.example.com")  # Use current page as referer
```

--------------------------------

### Push Code to GitHub Repository

Source: https://github.com/omkarcloud/botasaurus/blob/master/run-scraper-in-kubernetes.md

Initializes a new git repository and pushes the local project files to a remote GitHub repository.

```bash
rm -rf .git # remove the existing git repository
git init
git add .
git commit -m "Initial Commit"
git remote add origin https://github.com/USERNAME/kubernetes-scraper # TODO: replace USERNAME with your GitHub username
git branch -M main
git push -u origin main
```

--------------------------------

### Add Custom Health Check Endpoint

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Use this to add a custom GET endpoint for health checks to your API. It requires the Fastify instance provided by `addCustomRoutes`.

```typescript
ApiConfig.addCustomRoutes((server) => {
  server.get('/health', (request, reply) => {
    return reply.send({ status: 'OK'});
  });
});
```

--------------------------------

### Field with Options: outputKey, map, and showIf

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/enhancing-scrapers/custom-views.md

Demonstrates a Field configuration with options to rename the output key, map values using a function, and conditionally show the column based on input data.

```typescript
new Field("reviews_per_rating", { 
    outputKey: "average_rating", 
    map: (value, record) => {
        // ... Logic to calculate average rating from the 'value' object
    }, 
    // Only show this column if the user checked "scrape_prices"
    showIf: (inputData) => inputData.scrape_prices === true 
})
```

--------------------------------

### SqliteCacheStorage Constructor

Source: https://github.com/omkarcloud/botasaurus/blob/master/sqlite-cache-storage/README.md

Initialize the storage backend with a custom database path and table name.

```python
SqliteCacheStorage(
    db_path: str = 'cache.db',
    table_name: str = 'botasaurus_cache'
)
```

--------------------------------

### Data Structure for Product Records

Source: https://github.com/omkarcloud/botasaurus/blob/master/advanced.md

Example data structure representing product information, including nested dictionaries and lists, used for demonstrating Botasaurus field types.

```python
products = [
    {
        "id": 1,
        "name": "T-Shirt",
        "price": 16,  # in US Dollar
        "reviews": 1000,
        "reviews_per_rating": {
            "1": 0,
            "2": 0, 
            "3": 0,
            "4": 100,
            "5": 900,
        },
        "featured_reviews": [
            {
                "id": 1,
                "rating": 5,
                "content": "Awesome t-shirt!",
            },
            {
                "id": 2,
                "rating": 5,
                "content": "Amazing t-shirt!",
            },
        ],
    },
    {
        "id": 2,
        "name": "Laptop",
        "price": 700,
        "reviews": 500,
        "reviews_per_rating": {
            "1": 0,
            "2": 0,
            "3": 0,
            "4": 100,
            "5": 400,
        },
        "featured_reviews": [
            {
                "id": 1,
                "rating": 5,
                "content": "Best laptop ever!",
            },
            {
                "id": 2,
                "rating": 5,
                "content": "Great laptop!",
            },
        ],
    },
]
```
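
A nested field such as `reviews_per_rating` is often reduced to a single value before display. The sketch below computes a weighted average rating from that dictionary; `average_rating` is a hypothetical helper written for illustration, not a Botasaurus function.

```python
def average_rating(reviews_per_rating):
    """Weighted mean of star ratings; returns None when there are no reviews."""
    total_reviews = sum(reviews_per_rating.values())
    if total_reviews == 0:
        return None
    # Keys are star values stored as strings ("1" to "5"), so convert them.
    weighted_sum = sum(
        int(stars) * count for stars, count in reviews_per_rating.items()
    )
    return weighted_sum / total_reviews


tshirt_ratings = {"1": 0, "2": 0, "3": 0, "4": 100, "5": 900}
laptop_ratings = {"1": 0, "2": 0, "3": 0, "4": 100, "5": 400}

print(average_rating(tshirt_ratings))  # 4.9
print(average_rating(laptop_ratings))  # 4.8
```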

--------------------------------

### Enable Headless Mode

Source: https://github.com/omkarcloud/botasaurus/blob/master/README.md

Run the browser in headless mode. Note that this may trigger anti-bot detection services.

```python
@browser(
    headless=True
)
```

--------------------------------

### Add Custom Authentication Middleware

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Implement custom authentication logic using Fastify's `onRequest` hook. This example checks for a specific secret in the request headers.

```typescript
ApiConfig.addCustomRoutes((server) => {
  server.addHook('onRequest', (request, reply, done) => {
    // Check for secret
    const secret = request.headers['x-secret'] as string;

    if (secret === '49cb1de3-419b-4647-bf06-22c9e1110313') {
      // Valid secret, proceed
      return done(); 
    } else {
      return reply.status(401).send({
        message: 'Unauthorized: Invalid secret.',
      });
    }
  });
});
```
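
Once this hook is registered, clients have to send the secret in the `x-secret` header. A minimal client-side sketch using Python's standard library is shown below; the URL is a placeholder for wherever your API is served, and `build_authenticated_request` is a hypothetical helper.

```python
import urllib.request


def build_authenticated_request(url: str, secret: str) -> urllib.request.Request:
    """Build a GET request that carries the secret in the `x-secret` header."""
    return urllib.request.Request(url, headers={"x-secret": secret}, method="GET")


# Placeholder URL: replace with your server's address and your own secret.
request = build_authenticated_request(
    "http://localhost:8000/health",
    "49cb1de3-419b-4647-bf06-22c9e1110313",
)
print(request.get_method(), request.full_url)

# Sending it would look like this (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

A request without the header, or with a different value, should receive the 401 response defined in the hook.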

--------------------------------

### Add Direct Scraper Endpoint

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Create a direct GET endpoint for a scraper using `ApiConfig.addScraperAlias()`. This bypasses task creation and scheduling overhead, allowing immediate execution.

```typescript
import ApiConfig from 'botasaurus-server/api-config';
import { hotelsSearchScraper } from "../src/scrapers";

// Creates direct GET endpoint at /hotels/search
ApiConfig.addScraperAlias(hotelsSearchScraper, "/hotels/search");
```

--------------------------------

### addCustomRoutes((server: FastifyInstance) => void)

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/botasaurus-desktop-api/adding-api.md

Extends the API with custom endpoints and middleware using Fastify's routing system. This method receives a Fastify instance, allowing you to define any route you need.

```APIDOC
## `addCustomRoutes((server: FastifyInstance) => void)`

### Description
Extends the API with custom endpoints and middleware using Fastify's routing system. This method receives a Fastify instance, allowing you to define any route you need.

### Usage
```ts
ApiConfig.addCustomRoutes((server) => {
  // Define custom routes or middleware here
});
```

### Examples
#### Adding a custom health check endpoint
```ts
ApiConfig.addCustomRoutes((server) => {
  server.get('/health', (request, reply) => {
    return reply.send({ status: 'OK'});
  });
});
```

#### Adding validation middleware
```ts
ApiConfig.addCustomRoutes((server) => {
  server.addHook('onRequest', (request, reply, done) => {
    const secret = request.headers['x-secret'] as string;

    if (secret === '49cb1de3-419b-4647-bf06-22c9e1110313') {
      return done();
    } else {
      return reply.status(401).send({
        message: 'Unauthorized: Invalid secret.',
      });
    }
  });
});
```

### When to use:
- Adding authentication middleware
- Creating custom endpoints
- Implementing webhook receivers
```

--------------------------------

### Create PDF File Picker Input with Botasaurus

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/your-first-extractors/amazon-pdf-invoice-extractor.md

Use this JavaScript code to create a drag-and-drop file picker that accepts only PDF files. Ensure the 'botasaurus-controls' library is installed.

```javascript
/**
 * @typedef {import('botasaurus-controls').Controls} Controls
 * @typedef {import('botasaurus-controls').FileTypes} FileTypes
 */
const { FileTypes } = require('botasaurus-controls');

/**
 * Renders the form users see on the Home page.
 * @param {Controls} controls
 */
function getInput(controls) {
  // Render a File Input for uploading PDFs
  controls.filePicker('files', {
    label: 'Invoice PDFs',
    accept: FileTypes.PDF,
    isRequired: true,
    helpText: 'Drag one or more Amazon invoice PDFs here',
  });
}
```

--------------------------------

### Add Help Text

Source: https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/botasaurus-desktop/enhancing-scrapers/input-controls.md

Displays a help icon that reveals descriptive text on hover.

```ts
.text("api_key", {
   label: "API Key",
   // highlight-next-line
   helpText: "Find API key in Dashboard → Settings → API"
})
```