### Run Example Integration

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/_integration-example/README.md

Activate the virtual environment and run the example integration script to start the agent.

```bash
source .venv/bin/activate
python example_integration.py
```

--------------------------------

### Install Zerobus SDK for Go

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-zerobus-ingest/1-setup-and-authentication.md

Install the Zerobus SDK for Go with `go get`.

```bash
go get github.com/databricks/zerobus-sdk-go
```

--------------------------------

### Run Setup Script

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/_integration-example/README.md

Run the setup script to create a virtual environment, install dependencies, and configure skills for the ai-dev-kit.

```bash
./setup.sh
```

--------------------------------

### Quick Start Example

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/TEMPLATE/SKILL.md

Demonstrates the most common use case for the skill.

```python
# Example code or command
example_function(
    parameter1="value1",
    parameter2="value2"
)
```

--------------------------------

### Start M2M Communication Example App Locally

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/m2m-communication-example/README.md

Configure and run the example app locally. The `BUILDER_APP_URL` and `DATABRICKS_TOKEN` environment variables must be set before the app starts.

```bash
cd databricks-builder-app/scripts/m2m-communication-example

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export BUILDER_APP_URL=http://localhost:8000
export DATABRICKS_TOKEN=dapi...  # Your PAT

# Run
uvicorn app:app --host 0.0.0.0 --port 8001
```

--------------------------------

### Local Development Setup for Visual Builder App

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/README.md

Installs dependencies and starts the Visual Builder App locally. Edit `.env.local` with your credentials before starting.

```bash
./scripts/setup.sh      # Install dependencies
# Edit .env.local with your credentials
./scripts/start_dev.sh  # Start locally at http://localhost:3000
```

--------------------------------

### Start Local Development Server

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/README.md

Run this script to provision Lakebase, install dependencies, and start the local development environment. Pass your Databricks CLI profile name as the value of `--profile`.

```bash
cd databricks-builder-app
# Supply your Databricks CLI profile name after --profile
./scripts/start_local.sh --profile
```

--------------------------------

### Describe SQL Procedure Example

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-dbsql/sql-scripting.md

An example of using `DESCRIBE PROCEDURE EXTENDED` to get detailed information about a stored procedure.

```sql
DESCRIBE PROCEDURE EXTENDED run_daily_etl;
```

--------------------------------

### Setup MCP Server

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/CONTRIBUTING.md

Run the setup script for the MCP server, which includes databricks-tools-core.

```bash
./databricks-mcp-server/setup.sh
```

--------------------------------

### Show Procedures Example

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-dbsql/sql-scripting.md

An example of listing the procedures in a specific catalog and schema.
```sql
SHOW PROCEDURES IN my_catalog.my_schema;
```

--------------------------------

### Copy Integration Example Directory

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/_integration-example/README.md

Copy the integration example directory into your project. This sets up the files needed to embed the ai-dev-kit.

```bash
cp -r _integration-example /path/to/your/project/
cd /path/to/your/project/_integration-example
```

--------------------------------

### Volume Path Examples

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-unity-catalog/6-volumes.md

Examples of the standard path format, `/Volumes/<catalog>/<schema>/<volume>/<path>`, for accessing files in Unity Catalog volumes.

```text
/Volumes/main/default/my_volume/data.csv
```

```text
/Volumes/analytics/raw/landing_zone/2024/01/orders.parquet
```

```text
/Volumes/ml/training/images/cats/cat_001.jpg
```

--------------------------------

### Install databricks-tools-core with uv

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-tools-core/README.md

Install the package with uv. Include the dev dependencies for development work.

```bash
# Install the package
uv pip install -e .

# Install with dev dependencies
uv pip install -e ".[dev]"
```

--------------------------------

### Deploy M2M Communication Example App

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/m2m-communication-example/README.md

Steps to create, configure, and deploy the example app to your Databricks workspace.

```bash
# Create the app
databricks apps create m2m-communication-example

# Copy and configure app.yaml
cp app.yaml.example app.yaml
# Edit app.yaml: set BUILDER_APP_URL to the builder app's URL

# Upload and deploy
databricks workspace import-dir . /Workspace/Users//apps/m2m-communication-example --overwrite
databricks apps deploy m2m-communication-example --source-code-path /Workspace/Users//apps/m2m-communication-example
```

--------------------------------

### Install Specific Tools Only (Mac/Linux)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/README.md

Installs the AI Dev Kit for the specified tools only. Useful for customizing your installation.

```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh) --tools cursor,gemini,antigravity,windsurf,opencode
```

--------------------------------

### Install AI Dev Kit Globally (Mac/Linux)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/README.md

Installs the AI Dev Kit globally (user level), with `--force` to reinstall over an existing installation.

```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh) --global --force
```

--------------------------------

### Download Install Script (Windows PowerShell)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/README.md

Downloads the AI Dev Kit installation script to the current directory for manual execution.

```powershell
irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 -OutFile install.ps1
```

--------------------------------

### manage_dashboard Example Usage

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-aibi-dashboards/SKILL.md

Example Python code showing how to use the `manage_dashboard` tool for several actions.
```APIDOC
**Example usage:**

```python
# Create/update dashboard
manage_dashboard(
    action="create_or_update",
    display_name="Sales Dashboard",
    parent_path="/Workspace/Users/me/dashboards",
    serialized_dashboard=dashboard_json,  # dashboard definition as a JSON string, built beforehand
    warehouse_id="abc123",
    publish=True  # auto-publish after create
)

# Get dashboard details
manage_dashboard(action="get", dashboard_id="dashboard_123")

# List all dashboards
manage_dashboard(action="list")
```
```

--------------------------------

### Example Protobuf Schema

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-zerobus-ingest/4-protobuf-schema.md

An example of a generated `.proto` file defining a message that corresponds to a Delta table schema.

```protobuf
syntax = "proto3";

message AirQuality {
  string device_name = 1;
  int32 temp = 2;
  int64 humidity = 3;
}
```

--------------------------------

### Install Databricks Skills Locally

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/README.md

Use these commands to install Databricks skills directly from a local repository checkout. Options include installing all skills, only the Databricks skills, or specific skills; pinning MLflow/APX versions; listing available skills; and uploading to Genie Code.
```bash
# Install all skills (Databricks + MLflow + APX) — downloads from GitHub by default
./databricks-skills/install_skills.sh

# Install Databricks skills only from this checkout (no network for those skills)
./databricks-skills/install_skills.sh --local

# Install specific skills
./databricks-skills/install_skills.sh databricks-bundles agent-evaluation

# Pin MLflow / APX versions
./databricks-skills/install_skills.sh --mlflow-version v1.0.0

# List available skills
./databricks-skills/install_skills.sh --list

# Install + upload to workspace for Genie Code (/Workspace/Users//.assistant/skills)
./databricks-skills/install_skills.sh --install-to-genie
./databricks-skills/install_skills.sh --install-to-genie --profile prod

# Local Databricks skills + Genie upload
./databricks-skills/install_skills.sh --local --install-to-genie
```

--------------------------------

### Add Example Questions for Evaluation

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-agent-bricks/2-supervisor-agents.md

Use this JSON structure to provide example questions and their expected routing guidelines for evaluating agent performance.

```json
{
  "examples": [
    {
      "question": "I haven't received my invoice for this month",
      "guideline": "Should be routed to billing_agent"
    },
    {
      "question": "The API is returning a 500 error",
      "guideline": "Should be routed to technical_agent"
    },
    {
      "question": "How many vacation days do I have?",
      "guideline": "Should be routed to hr_agent"
    }
  ]
}
```

--------------------------------

### Start Builder App for Local Development

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/scripts/m2m-communication-example/README.md

Run these commands to start the Builder App locally from the `databricks-builder-app` directory. It will be accessible at `http://localhost:8000`.
```bash
cd databricks-builder-app
./scripts/start_dev.sh
```

--------------------------------

### Interact with Serving Endpoints

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Get a serving endpoint, query it with a text prompt or with chat messages, or get an OpenAI-compatible client for it.

```python
endpoint = w.serving_endpoints.get(name="my-endpoint")
```

```python
response = w.serving_endpoints.query(
    name="my-endpoint",
    inputs={"prompt": "Hello, world!"}
)
```

```python
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

# messages expects ChatMessage objects, not plain dicts
response = w.serving_endpoints.query(
    name="my-chat-endpoint",
    messages=[ChatMessage(role=ChatMessageRole.USER, content="Hello!")]
)
```

```python
openai_client = w.serving_endpoints.get_open_ai_client()  # requires the openai package
```

--------------------------------

### Install Skills for Genie Code

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/README.md

Run this script from the repository root to install the skills to your workspace, making them available to Genie Code.

```bash
# From the repo root — installs skills to your workspace for Genie Code
./databricks-skills/install_skills_to_genie_code.sh
```

--------------------------------

### Local Development Script Options

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/README.md

Options for the `start_local.sh` script that control the setup process, such as skipping Lakebase provisioning, forcing a dependency reinstall, or regenerating the environment file.
```bash
# First time — everything from scratch
./scripts/start_local.sh --profile dbx_shared_demo
```

```bash
# Subsequent runs — fast (deps cached, Lakebase exists)
./scripts/start_local.sh --profile dbx_shared_demo
```

```bash
# Skip Lakebase provisioning
./scripts/start_local.sh --profile dbx_shared_demo --skip-lakebase
```

```bash
# Force reinstall all dependencies
./scripts/start_local.sh --profile dbx_shared_demo --force-install
```

```bash
# Regenerate .env.local
./scripts/start_local.sh --profile dbx_shared_demo --force-env
```

```bash
# Custom Lakebase project name
./scripts/start_local.sh --profile dbx_shared_demo --lakebase-id my-custom-db
```

--------------------------------

### Install Packages via MCP

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-model-serving/9-package-requirements.md

Installs packages by running a `%pip install` command through the `execute_code` MCP (Model Context Protocol) tool. This example installs MLflow, `databricks-langchain`, LangGraph, `databricks-agents`, and Pydantic.

```python
execute_code(
    code="%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic"
)
```

--------------------------------

### Example Usage

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-zerobus-ingest/2-python-client.md

Demonstrates how to use the `ZerobusClient` for both JSON and Protobuf data ingestion, including use as a context manager.
```APIDOC
## Using the Client Class

### JSON Ingestion with Context Manager

```python
import os

from zerobus.sdk.shared import RecordType

# Assumes the ZerobusClient class is already defined or imported, and that the
# environment variables below are set for authentication and configuration
with ZerobusClient(
    server_endpoint=os.environ["ZEROBUS_SERVER_ENDPOINT"],
    workspace_url=os.environ["DATABRICKS_WORKSPACE_URL"],
    table_name=os.environ["ZEROBUS_TABLE_NAME"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
    record_type=RecordType.JSON,
) as client:
    for i in range(100):
        client.ingest({"device_name": f"sensor-{i}", "temp": 22, "humidity": 55})
```

### Protobuf Ingestion with Context Manager

```python
import os

import record_pb2  # Assuming record_pb2 is generated from your .proto file
from zerobus.sdk.shared import RecordType

# Assumes the ZerobusClient class is already defined or imported, and that the
# environment variables below are set for authentication and configuration
with ZerobusClient(
    server_endpoint=os.environ["ZEROBUS_SERVER_ENDPOINT"],
    workspace_url=os.environ["DATABRICKS_WORKSPACE_URL"],
    table_name=os.environ["ZEROBUS_TABLE_NAME"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
    record_type=RecordType.PROTO,
    proto_descriptor=record_pb2.AirQuality.DESCRIPTOR,
) as client:
    for i in range(100):
        record = record_pb2.AirQuality(device_name=f"sensor-{i}", temp=22, humidity=55)
        client.ingest(record)
```
```

--------------------------------

### SQL Warehouses

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Operations for listing, getting, creating, starting, and stopping SQL warehouses.

```APIDOC
## SQL Warehouses

### Description
Operations for listing, getting, creating, starting, and stopping SQL warehouses.
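
### Ensure a Warehouse Is Running (sketch)

A minimal sketch, not part of the SDK: the hypothetical helper `ensure_warehouse_running` combines the `get` and `start` calls listed in this section and only starts the warehouse when it is not already running. It assumes `w` is a `WorkspaceClient` and that `state` is the SDK's state enum, compared here through its string value (with a fallback for plain strings).

```python
def ensure_warehouse_running(w, warehouse_id: str) -> str:
    """Start a SQL warehouse unless it is already RUNNING; return its final state.

    Hypothetical helper. `w` is assumed to be a databricks.sdk.WorkspaceClient
    (or any object exposing the same warehouses.get / warehouses.start methods).
    """
    warehouse = w.warehouses.get(id=warehouse_id)
    # SDK states are enums; read .value, or use the raw value if it is a plain string
    state = getattr(warehouse.state, "value", warehouse.state)
    if state == "RUNNING":
        return state
    # start() returns a Wait object; result() blocks until the warehouse is running
    started = w.warehouses.start(id=warehouse_id).result()
    return getattr(started.state, "value", started.state)
```

Usage: `ensure_warehouse_running(w, "abc123")` skips the start call entirely when the warehouse is already up, so it is safe to call before every query batch.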
### List Warehouses

```python
for wh in w.warehouses.list():
    print(f"{wh.name}: {wh.state}")
```

### Get Warehouse

```python
warehouse = w.warehouses.get(id="abc123")
```

### Create Warehouse and Wait

```python
created = w.warehouses.create_and_wait(
    name="my-warehouse",
    cluster_size="Small",
    max_num_clusters=1,
    auto_stop_mins=15
)
```

### Start/Stop Warehouse

```python
w.warehouses.start(id="abc123").result()
w.warehouses.stop(id="abc123").result()
```
```

--------------------------------

### Project Setup with uv

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/spark-python-data-source/SKILL.md

Initialize a new Python project for a Spark data source with uv (which creates the project directory structure), then add the needed dependencies: pyspark, pytest, and pytest-spark.

```bash
uv init your-datasource
cd your-datasource
uv add pyspark pytest pytest-spark
```

--------------------------------

### Production Build and Run

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-builder-app/README.md

Build the React frontend, then run the FastAPI backend with uvicorn for production deployment.

```bash
# Build frontend
cd client && npm run build && cd ..

# Run with uvicorn
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

--------------------------------

### Cluster Management

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Examples for creating, starting, stopping, and deleting Databricks clusters using the SDK.

```APIDOC
## Cluster Management

### Description
Examples for creating, starting, stopping, and deleting Databricks clusters using the SDK.
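
### Create a Small Auto-Terminating Cluster (sketch)

A hedged sketch: the hypothetical helper `create_dev_cluster` combines `create_and_wait` with the SDK's `select_spark_version` and `select_node_type` helpers instead of hard-coding a runtime and node type, and sets `autotermination_minutes` so an idle cluster terminates on its own. The 30-minute values and the `min_cores=4` filter are arbitrary choices for illustration, and `w` is assumed to be a `WorkspaceClient`.

```python
from datetime import timedelta


def create_dev_cluster(w, name: str, workers: int = 1):
    """Create a small cluster and block until it is running (hypothetical helper).

    `w` is assumed to be a databricks.sdk.WorkspaceClient.
    """
    return w.clusters.create_and_wait(
        cluster_name=name,
        # Latest long-term-support runtime instead of a fixed version string
        spark_version=w.clusters.select_spark_version(latest=True, long_term_support=True),
        # Smallest matching node type with local disk and at least 4 cores
        node_type_id=w.clusters.select_node_type(local_disk=True, min_cores=4),
        num_workers=workers,
        autotermination_minutes=30,  # terminate after 30 idle minutes
        timeout=timedelta(minutes=30),  # stop waiting for RUNNING after 30 minutes
    )
```

Resolving the runtime and node type at call time keeps the helper portable across clouds, at the cost of the cluster spec changing when a newer LTS runtime is released.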
### Create and Wait for Cluster

```python
from datetime import timedelta

cluster = w.clusters.create_and_wait(
    cluster_name="my-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
    timeout=timedelta(minutes=30)
)
```

### Start/Stop/Delete Cluster

```python
w.clusters.start(cluster_id="...").result()
# There is no clusters.stop(); delete() terminates the cluster, which can be started again
w.clusters.delete(cluster_id="...").result()
# permanent_delete() removes the cluster and its configuration
w.clusters.permanent_delete(cluster_id="...")
```
```

--------------------------------

### Databricks SDK Clusters API Examples

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Examples of using the Databricks SDK's Clusters API to list, get, and create clusters, including selecting the latest Spark version and a suitable node type.

```python
# List all clusters
for cluster in w.clusters.list():
    print(f"{cluster.cluster_name}: {cluster.state}")

# Get cluster details
cluster = w.clusters.get(cluster_id="0123-456789-abcdef")

# Create a cluster (returns Wait object)
wait = w.clusters.create(
    cluster_name="my-cluster",
    spark_version=w.clusters.select_spark_version(latest=True),
    node_type_id=w.clusters.select_node_type(local_disk=True),
    num_workers=2
)
cluster = wait.result()  # Wait for cluster to be running
```

--------------------------------

### manage_pipeline_run

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-spark-declarative-pipelines/references/2-mcp-approach.md

Manages Databricks pipeline runs: start a pipeline update, get run status, stop a run, or retrieve events.

```APIDOC
## manage_pipeline_run

### Description
Manages Databricks pipeline runs. Supports starting, getting status, stopping, and retrieving events.

### Actions
- **start**: Starts a pipeline update. Requires `pipeline_id`.
- **get**: Retrieves the status of a pipeline run. Requires `pipeline_id` and `update_id`.
- **stop**: Stops a currently running pipeline. Requires `pipeline_id`.
- **get_events**: Retrieves events and logs for debugging a pipeline run. Requires `pipeline_id`.

### Options for `start`
- `wait` (boolean): Block until the pipeline update is complete. Defaults to `True`.
- `full_refresh` (boolean): Reprocess all data for the pipeline. Defaults to `False`.
- `validate_only` (boolean): Perform a dry run without writing data. Defaults to `False`.
- `refresh_selection` (list of strings): Refresh only the listed tables. Example: `["table1", "table2"]`.

### Options for `get_events`
- `event_log_level` (string): Filter events by log level. Options: "ERROR", "WARN" (default), "INFO".
- `max_results` (integer): Maximum number of events to retrieve. Defaults to 5.
- `update_id` (string): Return only events for a specific update (run) ID.
```

--------------------------------

### Get Current Package Versions for Pip Requirements

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-model-serving/6-logging-registration.md

Dynamically fetch the installed versions of MLflow and databricks-langchain to pin in your `pip_requirements`.

```python
from importlib.metadata import version  # replaces the deprecated pkg_resources API

# Pin the versions installed in the current environment
pip_requirements = [
    f"mlflow=={version('mlflow')}",
    f"databricks-langchain=={version('databricks-langchain')}",
]
```

--------------------------------

### Python Wheel Entry Point Configuration

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-jobs/task-types.md

Example of configuring entry points in a Python package's `setup.py` for use with Databricks Python wheel tasks.
```python
# setup.py
from setuptools import find_packages, setup

setup(
    name="my_package",  # placeholder name and version
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "main=my_package.main:run",  # entry point "main" calls my_package.main.run()
        ],
    },
)
```

--------------------------------

### Setup PostgreSQL Container for Integration Tests

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/spark-python-data-source/references/testing-patterns.md

A pytest fixture that starts a PostgreSQL container with Testcontainers for integration testing.

```python
import pytest
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def postgres_container():
    """Start PostgreSQL container for integration tests."""
    with PostgresContainer("postgres:15") as container:
        yield container
```

--------------------------------

### SQL Warehouses API

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/doc-index.md

Methods for managing Databricks SQL warehouses, including listing, getting, creating, editing, deleting, starting, and stopping warehouses.

```APIDOC
## SQL Warehouses API

### List SQL Warehouses
**Description**: Retrieves a list of SQL Warehouses.

### Method
`w.warehouses.list()`

### Parameters
- `page_size` (int, optional)
- `page_token` (str, optional)
- `run_as_user_id` (str, optional)

### Returns
`Iterator[EndpointInfo]`

---

### Get SQL Warehouse
**Description**: Retrieves details for a specific SQL Warehouse.

### Method
`w.warehouses.get()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse.

### Returns
`GetWarehouseResponse`

---

### Create SQL Warehouse
**Description**: Creates a new SQL Warehouse.

### Method
`w.warehouses.create()`

### Parameters
- `name` (str) - Required - The name of the SQL Warehouse.
- `cluster_size` (str, optional)
- `max_num_clusters` (int, optional)
- `auto_stop_mins` (int, optional)
- `...` (other optional parameters)

### Returns
`Wait[...]` - An object that can be used to wait for the warehouse creation to complete.
---

### Create and Wait for SQL Warehouse
**Description**: Creates a new SQL Warehouse and waits for it to be ready.

### Method
`w.warehouses.create_and_wait()`

### Parameters
- `timeout` (datetime.timedelta, optional) - The maximum time to wait for warehouse creation.
- `...` (other required and optional parameters for warehouse creation)

### Returns
`GetWarehouseResponse` - Details of the created SQL Warehouse.

---

### Edit SQL Warehouse
**Description**: Edits an existing SQL Warehouse.

### Method
`w.warehouses.edit()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to edit.
- `[...]` (other optional parameters for editing)

### Returns
`Wait[GetWarehouseResponse]` - An object that can be used to wait for the warehouse edit to complete.

---

### Delete SQL Warehouse
**Description**: Deletes a SQL Warehouse.

### Method
`w.warehouses.delete()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to delete.

### Returns
`None`

---

### Start SQL Warehouse
**Description**: Starts a stopped SQL Warehouse.

### Method
`w.warehouses.start()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to start.

### Returns
`Wait[GetWarehouseResponse]` - An object that can be used to wait for the warehouse to start.

---

### Start and Wait for SQL Warehouse
**Description**: Starts a stopped SQL Warehouse and waits for it to be ready.

### Method
`w.warehouses.start_and_wait()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to start.
- `timeout` (datetime.timedelta, optional) - The maximum time to wait for the warehouse to start.

### Returns
`GetWarehouseResponse` - Details of the started SQL Warehouse.

---

### Stop SQL Warehouse
**Description**: Stops a running SQL Warehouse.

### Method
`w.warehouses.stop()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to stop.

### Returns
`Wait[GetWarehouseResponse]` - An object that can be used to wait for the warehouse to stop.
---

### Stop and Wait for SQL Warehouse
**Description**: Stops a running SQL Warehouse and waits for it to be stopped.

### Method
`w.warehouses.stop_and_wait()`

### Parameters
- **id** (str) - Required - The ID of the SQL Warehouse to stop.
- `timeout` (datetime.timedelta, optional) - The maximum time to wait for the warehouse to stop.

### Returns
`GetWarehouseResponse` - Details of the stopped SQL Warehouse.

---

### Get Workspace Warehouse Config
**Description**: Retrieves the workspace-level SQL Warehouse configuration.

### Method
`w.warehouses.get_workspace_warehouse_config()`

### Parameters
None

### Returns
`GetWorkspaceWarehouseConfigResponse`

---

### Set Workspace Warehouse Config
**Description**: Sets the workspace-level SQL Warehouse configuration.

### Method
`w.warehouses.set_workspace_warehouse_config()`

### Parameters
- `[...]` (parameters for configuring the workspace warehouse)

### Returns
`None`

---

### Get SQL Warehouse Permissions
**Description**: Retrieves the permissions for a SQL Warehouse.

### Method
`w.warehouses.get_permissions()`

### Parameters
- **warehouse_id** (str) - Required - The ID of the SQL Warehouse.

### Returns
`WarehousePermissions`

---

### Set SQL Warehouse Permissions
**Description**: Sets the permissions for a SQL Warehouse.

### Method
`w.warehouses.set_permissions()`

### Parameters
- **warehouse_id** (str) - Required - The ID of the SQL Warehouse.
- `access_control_list` (list, optional) - A list of access control entries.

### Returns
`WarehousePermissions`
```

--------------------------------

### Complete DLT to SDP Python Migration Example

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-spark-declarative-pipelines/references/4-dlt-migration.md

A comprehensive example that migrates imports, decorators, table reads, expectations, CDC/SCD, and clustering from DLT to SDP.
```python
# Before (DLT):
import dlt
from pyspark.sql import functions as F


@dlt.table(name="bronze_orders", partition_cols=["order_date"])
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # source file format assumed; Auto Loader requires it
        .load("/data/orders")
    )


@dlt.table(name="silver_orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def silver_orders():
    return dlt.read_stream("bronze_orders").filter(F.col("status") == "completed")


dlt.create_streaming_table("dim_customers")
dlt.apply_changes(
    target="dim_customers",
    source="customers_cdc",
    keys=["customer_id"],
    sequence_by="updated_at",
    stored_as_scd_type="2"
)
```

```python
# After (SDP):
from pyspark import pipelines as dp
from pyspark.sql import functions as F


@dp.table(name="bronze_orders", cluster_by=["order_date"])
def bronze_orders():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # source file format assumed; Auto Loader requires it
        .load("/data/orders")
    )


@dp.table(name="silver_orders")
@dp.expect_or_drop("valid_amount", "amount > 0")
def silver_orders():
    return spark.readStream.table("bronze_orders").filter(F.col("status") == "completed")


dp.create_streaming_table("dim_customers")
dp.create_auto_cdc_flow(
    target="dim_customers",
    source="customers_cdc",
    keys=["customer_id"],
    sequence_by=F.col("updated_at"),
    stored_as_scd_type=2
)
```

--------------------------------

### Clusters API

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/doc-index.md

Methods for managing Databricks compute clusters, including listing, getting, creating, editing, deleting, starting, stopping, and configuring clusters.

```APIDOC
## Clusters API

### List Clusters
**Description**: Retrieves a list of clusters.

### Method
`w.clusters.list()`

### Parameters
- `filter_by` (`ListClustersFilterBy`, optional)
- `page_size` (int, optional)
- `page_token` (str, optional)
- `sort_by` (`ListClustersSortBy`, optional)

### Returns
`Iterator[ClusterDetails]`

---

### Get Cluster
**Description**: Retrieves details for a specific cluster.
### Method
`w.clusters.get()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to retrieve.

### Returns
`ClusterDetails`

---

### Create Cluster
**Description**: Creates a new Databricks cluster.

### Method
`w.clusters.create()`

### Parameters
- **spark_version** (str) - Required - The Spark version for the cluster.
- `cluster_name` (str, optional)
- `num_workers` (int, optional)
- `...` (other optional parameters)

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster creation to complete.

---

### Create and Wait for Cluster
**Description**: Creates a new Databricks cluster and waits for it to be ready.

### Method
`w.clusters.create_and_wait()`

### Parameters
- `timeout` (datetime.timedelta, optional) - The maximum time to wait for cluster creation.
- `...` (other required and optional parameters for cluster creation)

### Returns
`ClusterDetails` - Details of the created cluster.

---

### Edit Cluster
**Description**: Edits an existing Databricks cluster.

### Method
`w.clusters.edit()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to edit.
- **spark_version** (str) - Required - The new Spark version for the cluster.
- `[...]` (other optional parameters for editing)

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster edit to complete.

---

### Delete Cluster
**Description**: Terminates a Databricks cluster. The cluster configuration is kept, so the cluster can be started again later.

### Method
`w.clusters.delete()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to terminate.

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster to reach the terminated state.

---

### Permanent Delete Cluster
**Description**: Permanently deletes a Databricks cluster.

### Method
`w.clusters.permanent_delete()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to permanently delete.

### Returns
`None`

---

### Start Cluster
**Description**: Starts a terminated Databricks cluster.
### Method
`w.clusters.start()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to start.

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster to start.

---

### Start and Wait for Cluster
**Description**: Starts a terminated Databricks cluster and waits for it to be ready.

### Method
`w.clusters.start_and_wait()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to start.
- `timeout` (datetime.timedelta, optional) - The maximum time to wait for the cluster to start.

### Returns
`ClusterDetails` - Details of the started cluster.

---

### Restart Cluster
**Description**: Restarts a running Databricks cluster.

### Method
`w.clusters.restart()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to restart.
- `restart_user` (str, optional) - The user to restart the cluster as.

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster restart to complete.

---

### Resize Cluster
**Description**: Resizes a Databricks cluster.

### Method
`w.clusters.resize()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to resize.
- `autoscale` (`AutoScale`, optional) - Autoscale configuration.
- `num_workers` (int, optional) - The new number of workers.

### Returns
`Wait[ClusterDetails]` - An object that can be used to wait for the cluster resize to complete.

---

### List Cluster Events
**Description**: Retrieves a list of events for a specific cluster.

### Method
`w.clusters.events()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster.
- `[...]` (other optional parameters for filtering events)

### Returns
`Iterator[ClusterEvent]`

---

### Pin Cluster
**Description**: Pins a cluster so its configuration is retained (and it keeps appearing in `w.clusters.list()`) even after it has been terminated for more than 30 days. Pinning does not keep the cluster running.

### Method
`w.clusters.pin()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to pin.

### Returns
`None`

---

### Unpin Cluster
**Description**: Unpins a cluster, allowing its configuration to be removed once it has been terminated for more than 30 days.
### Method
`w.clusters.unpin()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to unpin.

### Returns
`None`

---

### Select Spark Version
**Description**: Selects a Spark version based on specified criteria.

### Method
`w.clusters.select_spark_version()`

### Parameters
- `latest` (bool, optional)
- `long_term_support` (bool, optional)
- `ml` (bool, optional)
- `gpu` (bool, optional)
- `...` (other optional parameters)

### Returns
`str` - The selected Spark version.

---

### Select Node Type
**Description**: Selects a node type based on specified criteria.

### Method
`w.clusters.select_node_type()`

### Parameters
- `min_memory_gb` (int, optional)
- `min_cores` (int, optional)
- `local_disk` (bool, optional)
- `...` (other optional parameters)

### Returns
`str` - The selected node type.

---

### List Node Types
**Description**: Lists available node types for clusters.

### Method
`w.clusters.list_node_types()`

### Parameters
None

### Returns
`ListNodeTypesResponse`

---

### List Spark Versions
**Description**: Lists available Spark versions.

### Method
`w.clusters.spark_versions()`

### Parameters
None

### Returns
`GetSparkVersionsResponse`

---

### List Zones
**Description**: Lists the availability zones in which clusters can be created.

### Method
`w.clusters.list_zones()`

### Parameters
None

### Returns
`ListAvailableZonesResponse`

---

### Ensure Cluster is Running
**Description**: Ensures a specified cluster is running, starting it if necessary.

### Method
`w.clusters.ensure_cluster_is_running()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster to ensure is running.

### Returns
`None`

---

### Change Cluster Owner
**Description**: Changes the owner of a cluster.

### Method
`w.clusters.change_owner()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster.
- **owner_username** (str) - Required - The username of the new owner.

### Returns
`None`

---

### Get Cluster Permissions
**Description**: Retrieves the permissions for a cluster.
### Method
`w.clusters.get_permissions()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster.

### Returns
`ClusterPermissions`

---

### Set Cluster Permissions
**Description**: Sets the permissions for a cluster, replacing any existing direct permissions.

### Method
`w.clusters.set_permissions()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster.
- `access_control_list` (list, optional) - A list of access control entries.

### Returns
`ClusterPermissions`

---

### Update Cluster Permissions
**Description**: Updates the permissions for a cluster, changing only the entries included in the request.

### Method
`w.clusters.update_permissions()`

### Parameters
- **cluster_id** (str) - Required - The ID of the cluster.
- `access_control_list` (list, optional) - A list of access control entries to update.

### Returns
`ClusterPermissions`
```

--------------------------------

### Manage SQL Warehouses

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Code snippets for listing, getting, creating, starting, and stopping SQL warehouses. The `create_and_wait` method blocks until the warehouse has been created.

```python
# List warehouses
for wh in w.warehouses.list():
    print(f"{wh.name}: {wh.state}")
```

```python
# Get warehouse
warehouse = w.warehouses.get(id="abc123")
```

```python
# Create warehouse
created = w.warehouses.create_and_wait(
    name="my-warehouse",
    cluster_size="Small",
    max_num_clusters=1,
    auto_stop_mins=15,
)
```

```python
# Start/stop
w.warehouses.start(id="abc123").result()
```

```python
w.warehouses.stop(id="abc123").result()
```

--------------------------------

### Example Instructions for Knowledge Assistant

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-agent-bricks/1-knowledge-assistants.md

Provides detailed instructions for a Knowledge Assistant on how to answer questions, including citation requirements and how to handle information that is not in the documents.

```text
Be helpful and professional. When answering:
1. Always cite the specific document and section
2. If multiple documents are relevant, mention all of them
3. If the information isn't in the documents, clearly say so
4. Use bullet points for multi-part answers
```

--------------------------------

### Create Custom Tool with @tool Decorator

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-model-serving/4-tools-integration.md

Define custom tools for your agent using the `@tool` decorator. This example shows a tool that returns the current time.

```python
from langchain_core.tools import tool


@tool
def get_current_time(timezone: str = "UTC") -> str:
    """Get the current time in the specified timezone.

    Args:
        timezone: The timezone (e.g., 'UTC', 'America/New_York')
    """
    from datetime import datetime

    import pytz

    tz = pytz.timezone(timezone)
    now = datetime.now(tz)
    return now.strftime("%Y-%m-%d %H:%M:%S %Z")
```

--------------------------------

### Basic Dashboard Setup (NYC Taxi)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-aibi-dashboards/3-examples.md

This snippet demonstrates the initial steps for setting up a basic dashboard: checking the table schema and testing SQL queries. Ensure the `samples` catalog and its `nyctaxi` schema are accessible.
```python
import json

# Step 1: Check table schema
table_info = get_table_stats_and_schema(catalog="samples", schema="nyctaxi")

# Step 2: Test queries
execute_sql(
    "SELECT COUNT(*) as trips, AVG(fare_amount) as avg_fare, AVG(trip_distance) as avg_distance "
    "FROM samples.nyctaxi.trips"
)
execute_sql("""
    SELECT pickup_zip, COUNT(*) as trip_count
    FROM samples.nyctaxi.trips
    GROUP BY pickup_zip
    ORDER BY trip_count DESC
    LIMIT 10
""")
```

--------------------------------

### Pipelines API (Delta Live Tables)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/doc-index.md

Manage Delta Live Tables pipelines, including listing, getting, creating, updating, deleting, starting, and stopping pipelines.

```APIDOC
## Pipelines API (Delta Live Tables)

### list_pipelines
**Description**: Lists Delta Live Tables pipelines.
**Signature**: `w.pipelines.list_pipelines([filter, max_results, ...]) → Iterator[PipelineStateInfo]`

### get
**Description**: Retrieves details of a specific Delta Live Tables pipeline.
**Signature**: `w.pipelines.get(pipeline_id) → GetPipelineResponse`

### create
**Description**: Creates a new Delta Live Tables pipeline.
**Signature**: `w.pipelines.create([name, clusters, libraries, ...]) → CreatePipelineResponse`

### update
**Description**: Updates an existing Delta Live Tables pipeline.
**Signature**: `w.pipelines.update(pipeline_id, [...]) → None`

### delete
**Description**: Deletes a Delta Live Tables pipeline.
**Signature**: `w.pipelines.delete(pipeline_id) → None`

### start_update
**Description**: Starts an update for a Delta Live Tables pipeline.
**Signature**: `w.pipelines.start_update(pipeline_id, [full_refresh, ...]) → StartUpdateResponse`

### stop
**Description**: Stops a Delta Live Tables pipeline.
**Signature**: `w.pipelines.stop(pipeline_id) → Wait[GetPipelineResponse]`

### stop_and_wait
**Description**: Stops a Delta Live Tables pipeline and waits for completion.
**Signature**: `w.pipelines.stop_and_wait(pipeline_id, [timeout]) → GetPipelineResponse`

### list_updates
**Description**: Lists updates for a Delta Live Tables pipeline.
**Signature**: `w.pipelines.list_updates(pipeline_id, [...]) → ListUpdatesResponse`

### get_update
**Description**: Retrieves details of a specific Delta Live Tables pipeline update.
**Signature**: `w.pipelines.get_update(pipeline_id, update_id) → GetUpdateResponse`

### list_pipeline_events
**Description**: Lists events for a Delta Live Tables pipeline.
**Signature**: `w.pipelines.list_pipeline_events(pipeline_id, [...]) → Iterator[PipelineEvent]`

### get_permissions
**Description**: Retrieves permissions for a Delta Live Tables pipeline.
**Signature**: `w.pipelines.get_permissions(pipeline_id) → PipelinePermissions`

### set_permissions
**Description**: Sets permissions for a Delta Live Tables pipeline.
**Signature**: `w.pipelines.set_permissions(pipeline_id, [...]) → PipelinePermissions`
```

--------------------------------

### Setup UC Functions Toolkit

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-model-serving/4-tools-integration.md

Initialize the `UCFunctionToolkit` by specifying function names, then add these tools to your agent's tool list.

```python
from databricks_langchain import UCFunctionToolkit

# Specify functions by name
uc_toolkit = UCFunctionToolkit(
    function_names=[
        "catalog.schema.my_function",
        "catalog.schema.another_function",
        "system.ai.python_exec",  # Built-in Python interpreter
    ]
)

# Add to your tools list
tools = []
tools.extend(uc_toolkit.tools)
```

--------------------------------

### Dynamically Get Current Package Versions

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-model-serving/9-package-requirements.md

Retrieves the currently installed versions of MLflow, Databricks LangChain, and LangGraph dynamically using `pkg_resources`.
This is useful for creating reproducible environments.

```python
from pkg_resources import get_distribution

pip_requirements = [
    f"mlflow=={get_distribution('mlflow').version}",
    f"databricks-langchain=={get_distribution('databricks-langchain').version}",
    f"langgraph=={get_distribution('langgraph').version}",
]
```

--------------------------------

### Scaffold a new Databricks AppKit app

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-apps-python/SKILL.md

Use this command to interactively scaffold a new Databricks application project. It handles project setup, dependency installation, and optional deployment.

```bash
databricks apps init
```

--------------------------------

### Example Trigger Interval Calculations

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-spark-structured-streaming/trigger-and-cost-optimization.md

Demonstrates calculating trigger intervals from different Service Level Agreements (SLAs), using one third of the SLA as the trigger interval. For real-time requirements, `realTime=True` is recommended.

```python
# `df` is a streaming DataFrame; the trigger is set on its DataStreamWriter.

# Example 1: 1 hour SLA
sla = 60  # minutes
trigger = sla / 3  # 20 minutes
df.writeStream.trigger(processingTime="20 minutes")

# Example 2: 15 minute SLA
sla = 15  # minutes
trigger = sla / 3  # 5 minutes
df.writeStream.trigger(processingTime="5 minutes")

# Example 3: Real-time requirement
df.writeStream.trigger(realTime=True)  # < 800ms
```

--------------------------------

### Create and Manage Databricks Clusters

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Examples for creating, starting, stopping, and deleting Databricks clusters using the SDK. The `create_and_wait` method provides a blocking call for cluster creation.
```python
from datetime import timedelta

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

cluster = w.clusters.create_and_wait(
    cluster_name="my-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="i3.xlarge",
    num_workers=2,
    timeout=timedelta(minutes=30),
)
```

```python
w.clusters.start(cluster_id="...").result()
```

```python
# Stop (terminate) a cluster: the SDK has no separate stop() method,
# delete() terminates the cluster and keeps its configuration.
w.clusters.delete(cluster_id="...").result()
```

```python
# Permanently delete a cluster
w.clusters.permanent_delete(cluster_id="...")
```

--------------------------------

### Example Prompts for Spark Data Source Creation

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/spark-python-data-source/SKILL.md

These prompts illustrate various use cases for creating Spark data sources, covering different data stores, features such as sharding and pagination, and delivery guarantees.

```text
Create a Spark data source for reading from MongoDB with sharding support
Build a streaming connector for RabbitMQ with at-least-once delivery
Implement a batch writer for Snowflake with staged uploads
Write a data source for REST API with OAuth2 authentication and pagination
```

--------------------------------

### Initialize New Pipeline Project (Interactive)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-spark-declarative-pipelines/references/1-project-initialization.md

Use this command to interactively scaffold a new Declarative Automation Bundle (DAB) project. Follow the prompts to define the project name, catalog, schema usage, and initial language.

```bash
databricks pipelines init --output-dir .
```

--------------------------------

### JSON Ingestion Quick Start

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-zerobus-ingest/2-python-client.md

Ingest data as Python dictionaries whose keys match the target table's column names. This example demonstrates synchronous ingestion and waiting for offset confirmation.
```python
import os

from zerobus.sdk.sync import ZerobusSdk
from zerobus.sdk.shared import RecordType, StreamConfigurationOptions, TableProperties

server_endpoint = os.environ["ZEROBUS_SERVER_ENDPOINT"]
workspace_url = os.environ["DATABRICKS_WORKSPACE_URL"]
table_name = os.environ["ZEROBUS_TABLE_NAME"]
client_id = os.environ["DATABRICKS_CLIENT_ID"]
client_secret = os.environ["DATABRICKS_CLIENT_SECRET"]

sdk = ZerobusSdk(server_endpoint, workspace_url)
options = StreamConfigurationOptions(record_type=RecordType.JSON)
table_props = TableProperties(table_name)

stream = sdk.create_stream(client_id, client_secret, table_props, options)
try:
    for i in range(100):
        record = {"device_name": f"sensor-{i}", "temp": 22, "humidity": 55}
        offset = stream.ingest_record_offset(record)
        stream.wait_for_offset(offset)  # Block until durably written
finally:
    stream.close()
```

--------------------------------

### Manage MLflow Production Monitoring Scorers

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-mlflow-evaluation/references/patterns-trace-ingestion.md

List, get, update, stop, start, and delete production monitoring scorers. Stopping a scorer pauses monitoring without removing its registration.
```python
from mlflow.genai.scorers import list_scorers, get_scorer, delete_scorer, ScorerSamplingConfig

# List all registered scorers for the active experiment
scorers = list_scorers()
for s in scorers:
    print(f"  {s.name}: sample_rate={s.sampling_config.sample_rate if s.sampling_config else 'N/A'}")

# Get a specific scorer
safety_scorer = get_scorer(name="production_safety")

# Update sample rate (e.g., increase from 50% to 80%)
safety_scorer = safety_scorer.update(
    sampling_config=ScorerSamplingConfig(sample_rate=0.8)
)

# Stop monitoring (keeps registration for later re-start)
safety_scorer = safety_scorer.stop()

# Re-start monitoring
safety_scorer = safety_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=0.5)
)

# Delete entirely (removes registration)
delete_scorer(name="production_safety")
```

--------------------------------

### Minimal SQL Scripting Compound Statement

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-dbsql/sql-scripting.md

A basic SQL script must start and end with a `BEGIN...END` block. This example demonstrates the simplest valid compound statement.

```sql
BEGIN
  SELECT 'Hello, SQL Scripting!';
END;
```

--------------------------------

### Install AI Dev Kit (Mac/Linux)

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/README.md

Installs the AI Dev Kit using the default profile for project-level scope. Run this from the project directory.

```bash
bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)
```

--------------------------------

### Clusters API - List, Get, Create

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-python-sdk/SKILL.md

Examples for interacting with the Databricks Clusters API using the Python SDK, including listing all clusters, retrieving details for a specific cluster, and creating a new cluster.
````APIDOC
### Clusters API

**Doc:** https://databricks-sdk-py.readthedocs.io/en/latest/workspace/compute/clusters.html

```python
# List all clusters
for cluster in w.clusters.list():
    print(f"{cluster.cluster_name}: {cluster.state}")

# Get cluster details
cluster = w.clusters.get(cluster_id="0123-456789-abcdef")

# Create a cluster (returns Wait object)
wait = w.clusters.create(
    cluster_name="my-cluster",
    spark_version=w.clusters.select_spark_version(latest=True),
    node_type_id=w.clusters.select_node_type(local_disk=True),
    num_workers=2,
)
cluster = wait.result()  # Wait for cluster to be running
```
````

--------------------------------

### Initialize Dash App with Bootstrap and Font Awesome

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-apps-python/3-frameworks.md

Initialize a Dash application using `dash-bootstrap-components` for layout and styling, and include Font Awesome for icons. This setup gives the app consistent theming.

```python
import dash
import dash_bootstrap_components as dbc

app = dash.Dash(
    __name__,
    external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.FONT_AWESOME],
    title="My Dashboard",
)
```

--------------------------------

### Start and Monitor an Existing Pipeline Run

Source: https://github.com/databricks-solutions/ai-dev-kit/blob/main/databricks-skills/databricks-spark-declarative-pipelines/references/2-mcp-approach.md

Use `manage_pipeline_run` with `action="start"` to initiate a run for an existing pipeline. Options include requesting a full refresh, waiting for completion, and setting a timeout.

```python
# MCP Tool: manage_pipeline_run
manage_pipeline_run(
    action="start",
    pipeline_id="",
    full_refresh=True,
    wait=True,  # Wait for completion
    timeout=1800,  # 30 minute timeout
)
```
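For comparison, the same start, wait, and timeout flow can be sketched directly against the SDK Pipelines API methods `start_update` and `get_update`. This is a minimal sketch, not the MCP tool's implementation. It assumes `w` behaves like a `databricks.sdk.WorkspaceClient`, that `get_update(...).update.state` reports the update state, and that `COMPLETED`, `FAILED`, and `CANCELED` are the terminal states worth stopping on; the helper name `run_pipeline_update` and its polling parameters are hypothetical.

```python
import time
from typing import Any

# States treated as terminal in this sketch (assumed UpdateInfoState values).
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def _state_name(state: Any) -> str:
    """Return the state as a string; SDK enums expose it via `.value`."""
    return getattr(state, "value", state)


def run_pipeline_update(w: Any, pipeline_id: str, full_refresh: bool = False,
                        timeout_s: float = 1800, poll_s: float = 10) -> str:
    """Start a pipeline update and poll until it finishes or times out.

    Hypothetical helper: `w` is expected to behave like a
    databricks.sdk.WorkspaceClient.
    """
    started = w.pipelines.start_update(pipeline_id=pipeline_id,
                                       full_refresh=full_refresh)
    update_id = started.update_id
    deadline = time.monotonic() + timeout_s

    while True:
        info = w.pipelines.get_update(pipeline_id=pipeline_id,
                                      update_id=update_id)
        state = _state_name(info.update.state)
        if state in TERMINAL_STATES:
            return state  # e.g. "COMPLETED" or "FAILED"
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Update {update_id} still {state} after {timeout_s}s")
        time.sleep(poll_s)
```

Calling `run_pipeline_update(WorkspaceClient(), pipeline_id, full_refresh=True)` roughly mirrors `full_refresh=True, wait=True, timeout=1800` in the MCP call. Taking the client as a parameter keeps the polling logic testable with a stand-in object.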