### Install Project Dependencies

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Installs the project's dependencies defined in the setup file. The '-e' flag installs in editable mode, and '.[local,test]' includes dependencies required for local development and running tests.

```bash
pip install -e ".[local,test]"
```

--------------------------------

### Example Deployment Configuration (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This snippet shows a partial example of the `conf/deployment.yml` file used to define workflows and tasks for deployment on Databricks. It highlights the use of a custom section for repetitive configurations.

```yaml
# Custom section is used to store configurations that might be repetitive.
```

--------------------------------

### Install dbx Tool

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Installs the dbx command-line tool using pip, making it available in the activated Conda environment.

```bash
pip install dbx
```

--------------------------------

### Example Project Tree Structure (Shell)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This shell output displays a sample directory tree for a dbx project, illustrating the typical layout including configuration files (.dbx, conf), source code (charming_aurora), notebooks, tests, and CI/CD workflows (.github).

```shell
.
├── .dbx #(1)
│   ├── lock.json #(2)
│   └── project.json #(3)
├── .github #(4)
│   └── workflows
│       ├── onpush.yml #(5)
│       └── onrelease.yml #(6)
├── .gitignore #(7)
├── README.md #(8)
├── charming_aurora #(9)
│   ├── __init__.py #(10)
│   ├── common.py #(11)
│   └── tasks #(12)
│       ├── __init__.py
│       ├── sample_etl_task.py #(13)
│       └── sample_ml_task.py #(14)
├── conf #(15)
│   ├── deployment.yml #(16)
│   └── tasks #(17)
│       ├── sample_etl_config.yml #(18)
│       └── sample_ml_config.yml #(19)
├── notebooks #(20)
│   └── sample_notebook.py
├── pyproject.toml #(21)
├── setup.py #(22)
└── tests #(23)
    ├── entrypoint.py #(24)
    ├── integration
    │   └── e2e_test.py
    └── unit #(25)
        ├── conftest.py #(26)
        └── sample_test.py #(27)
```

--------------------------------

### Exporting Coverage Reports (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

These Bash commands demonstrate how to run unit tests with coverage and export the results into specific formats using the `--cov-report` flag. Examples are provided for exporting to HTML and XML formats.

```bash
pytest tests/unit --cov --cov-report=html
pytest tests/unit --cov --cov-report=xml
```

--------------------------------

### Change Directory to Project Folder

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Navigates into the newly created project directory 'charming-aurora' which was generated by the 'dbx init' command.

```bash
cd charming-aurora
```

--------------------------------

### Install OpenJDK for Local Spark Tests

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Installs OpenJDK version 11.0.15 from the conda-forge channel, which is required for running local Apache Spark tests.

```bash
conda install -c conda-forge openjdk=11.0.15
```

--------------------------------

### Initialize dbx Project with Custom Artifact Location

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Initializes a new dbx project with the same options as the basic `dbx init` call (CI/CD tool, cloud, project name, profile), and additionally sets a custom cloud-based artifact storage location (S3, WASBS, or GS) in place of the default DBFS path. This is recommended for production setups.

```bash
dbx init \
    -p "cicd_tool=GitHub Actions" \
    -p "cloud=<your-cloud>" \
    -p "project_name=charming-aurora" \
    -p "profile=charming-aurora" \
    -p "artifact_location=<s3://some/path OR wasbs://some/path OR gs://some/path>" \
    --no-input
```

--------------------------------

### Install Project Locally with Pip

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Installs the project package locally in editable mode using pip. This command includes local and test dependencies, making the project ready for development and testing.

```bash
pip install -e ".[local,test]"
```

--------------------------------

### Verify Databricks CLI Profile

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Verifies that the configured Databricks CLI profile 'charming-aurora' is working correctly by listing the root directory of the Databricks workspace filesystem.

```bash
databricks --profile charming-aurora workspace ls /
```

--------------------------------

### Running Unit Tests (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Bash command executes the unit tests located in the `tests/unit` directory using the pytest framework. It will typically start a local Spark session, run the tests, and shut down the session upon completion.

```bash
pytest tests/unit
```

--------------------------------

### Mapping Entrypoint Alias in setup.py (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Python code snippet from "setup.py" defines console script entry points. It maps the alias "etl" to the "entrypoint" function within the "charming_aurora.tasks.sample_etl_task" module, allowing the task to be launched via this alias.

```python
# irrelevant content is omitted
setup(
    # ... other arguments omitted
    entry_points={
        "console_scripts": [
            "etl = charming_aurora.tasks.sample_etl_task:entrypoint",
            "ml = charming_aurora.tasks.sample_ml_task:entrypoint",
        ]
    },
    # ... other arguments omitted
)
```
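
To make the `alias = module:function` mapping concrete, here is a minimal sketch of how such a spec string decomposes. The helper `parse_console_script` is hypothetical and written only for illustration; setuptools and dbx do their own parsing.

```python
from typing import Tuple


def parse_console_script(spec: str) -> Tuple[str, str, str]:
    """Split an 'alias = package.module:function' spec into its three parts."""
    alias, _, target = spec.partition("=")
    module, _, func = target.strip().partition(":")
    return alias.strip(), module.strip(), func.strip()


spec = "etl = charming_aurora.tasks.sample_etl_task:entrypoint"
alias, module, func = parse_console_script(spec)
# alias is the name used in `entry_point:` of the deployment file;
# module and func locate the Python callable that gets invoked.
print(alias, module, func)
```

The alias is the value referenced by `entry_point` in a `python_wheel_task`, while the module and function identify the code that actually runs.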

--------------------------------

### Initialize dbx Project with GitHub Actions

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Initializes a new dbx project named 'charming-aurora' using the default template. It configures the project for GitHub Actions CI/CD, specifies the cloud provider, and links it to the 'charming-aurora' Databricks CLI profile. The --no-input flag prevents interactive prompts.

```bash
dbx init \
    -p "cicd_tool=GitHub Actions" \
    -p "cloud=<your-cloud>" \
    -p "project_name=charming-aurora" \
    -p "profile=charming-aurora" \
    --no-input
```

--------------------------------

### Launching Deployed Databricks Workflow (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This command uses the dbx tool to trigger a run of the previously deployed Databricks Job corresponding to the "charming-aurora-sample-etl" workflow. The "--trace" flag can be added to wait for completion.

```bash
dbx launch charming-aurora-sample-etl
```

--------------------------------

### Create Conda Environment for Project

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Creates a new Conda environment named 'charming-aurora' with Python version 3.9 to isolate project dependencies.

```bash
conda create -n charming-aurora python=3.9
```

--------------------------------

### Activate Conda Environment

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Activates the newly created Conda environment 'charming-aurora' to ensure subsequent installations are within this isolated environment.

```bash
conda activate charming-aurora
```

--------------------------------

### Initializing dbx Project from Git Template (Example) - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Provides a concrete example of initializing a dbx project from a Git repository template using the `--path` argument with a placeholder URL. This demonstrates the simplest form of using a Git-based template.

```bash
dbx init --path=https://git/repo/with/template.git
```

--------------------------------

### Configuring Python Wheel Task Deployment (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This YAML snippet shows how to configure a "python_wheel_task" within a dbx deployment file. It specifies the package name, the entry point alias ("etl"), and parameters passed to the task.

```yaml
# relevant section of the deployment file, some parts are omitted
environments:
  default:
    workflows:
      - name: "charming-aurora-sample-etl"
        tasks:
          - task_key: "main"
            <<: *basic-static-cluster
            python_wheel_task:
              package_name: "charming_aurora"
              entry_point: "etl" # take a look at the setup.py entry_points section for details on how to define an entrypoint
              parameters: ["--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml"]
```

--------------------------------

### Task Configuration Loading Utility (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Python `Task` class provides a base for Databricks job tasks, handling initialization and configuration loading. The `_provide_config` method reads configuration from a file specified by the `--conf-file` command-line argument, parsed by `_get_conf_file` using `argparse`. The `_read_config` method uses `yaml.safe_load` to parse the configuration file, supporting `file:fuse://` references for files uploaded to the workspace. The `__init__` method allows passing initial configuration or defaults to reading from the file.

```Python
# some lines were intentionally omitted

class Task(ABC):

    def __init__(self, spark=None, init_conf=None):
        self.spark = self._prepare_spark(spark)
        self.logger = self._prepare_logger()
        self.dbutils = self.get_dbutils()
        if init_conf: #(1)
            self.conf = init_conf
        else:
            self.conf = self._provide_config()
        self._log_conf()

    def _provide_config(self):
        self.logger.info("Reading configuration from --conf-file job option")
        conf_file = self._get_conf_file()
        if not conf_file:
            self.logger.info(
                "No conf file was provided, setting configuration to empty dict. "
                "Please override configuration in subclass init method"
            )
            return {}
        else:
            self.logger.info(f"Conf file was provided, reading configuration from {conf_file}")
            return self._read_config(conf_file)

    @staticmethod
    def _get_conf_file(): #(2)
        p = ArgumentParser()
        p.add_argument("--conf-file", required=False, type=str)
        namespace = p.parse_known_args(sys.argv[1:])[0]
        return namespace.conf_file

    @staticmethod
    def _read_config(conf_file) -> Dict[str, Any]:
        config = yaml.safe_load(pathlib.Path(conf_file).read_text()) #(3)
        return config
```
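
The `--conf-file` lookup can be tried outside Databricks. The sketch below mirrors `_get_conf_file` as a standalone function; the name `get_conf_file` and the explicit `argv` parameter are assumptions made so the behavior can be checked without touching `sys.argv`. Using `parse_known_args` is what lets unrelated job arguments pass through without raising an error.

```python
from argparse import ArgumentParser
from typing import List, Optional


def get_conf_file(argv: List[str]) -> Optional[str]:
    """Return the value of --conf-file, or None when the option is absent."""
    parser = ArgumentParser()
    parser.add_argument("--conf-file", required=False, type=str)
    # parse_known_args returns (namespace, leftover_args) and ignores unknown options
    namespace, _unknown = parser.parse_known_args(argv)
    return namespace.conf_file


print(get_conf_file(["--conf-file", "conf/tasks/sample_etl_config.yml", "--other", "x"]))
print(get_conf_file([]))
```

When the option is missing, the function returns `None`, which corresponds to the branch where the base class falls back to an empty configuration dict.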

--------------------------------

### Defining Python Task Entrypoint (Databricks)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Python code defines an "entrypoint" function used by "python_wheel_task" and a "__main__" block for "spark_python_task". The "entrypoint" function initializes and launches the main task logic.

```python
# if you're using python_wheel_task, you'll need the entrypoint function to be used in setup.py
def entrypoint():  # pragma: no cover
    task = SampleETLTask()
    task.launch()

# if you're using spark_python_task, you'll need the __main__ block to start the code execution
if __name__ == '__main__':
    entrypoint()
```

--------------------------------

### Deploying Databricks Workflow as Job (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This command uses the dbx tool to deploy the specified workflow ("charming-aurora-sample-etl") configuration, typically defined in "deployment.yml", as a Databricks Job in the target environment.

```bash
dbx deploy charming-aurora-sample-etl
```

--------------------------------

### Setup Local Python Environment with Conda

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Creates and activates a local Python environment using Conda for project development. This step is necessary to isolate project dependencies. Requires Conda to be installed and configured.

```bash
conda create -n {{cookiecutter.project_slug}} python=3.9
conda activate {{cookiecutter.project_slug}}
```

--------------------------------

### Executing Databricks Task on All-Purpose Cluster (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This command executes a specific task ("main") of the "charming-aurora-sample-etl" workflow on a named all-purpose (interactive) Databricks cluster. It handles package building, uploading to DBFS, and running the task in isolation.

```bash
dbx execute charming-aurora-sample-etl --task=main --cluster-name="some-interactive-cluster-name"
```

--------------------------------

### Run Local Unit Tests with Coverage

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Executes the unit tests located in the 'tests/unit' directory using pytest. The '--cov' flag generates a code coverage report.

```bash
pytest tests/unit --cov
```

--------------------------------

### Example .dbx/project.json Configuration

Source: https://github.com/databrickslabs/dbx/blob/main/docs/reference/project.md

Example structure of the `.dbx/project.json` file. It defines environments, mapping them to Databricks workspaces, specifying artifact storage type (currently only `mlflow`), and referencing a local Databricks CLI profile for authentication. The `inplace_jinja_support` flag enables Jinja templating within the project.

```JSON
{
    "environments": {
        "default": {
            "profile": "charming-aurora",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/charming_aurora",
                "artifact_location": "dbfs:/Shared/dbx/projects/charming_aurora"
            }
        }
    },
    "inplace_jinja_support": true
}
```
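
As a rough illustration of how this file maps an environment name to a profile and storage settings, the following sketch parses the same JSON with the standard library. The helper `get_environment` is hypothetical and is not part of the dbx API.

```python
import json

PROJECT_JSON = """
{
    "environments": {
        "default": {
            "profile": "charming-aurora",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/charming_aurora",
                "artifact_location": "dbfs:/Shared/dbx/projects/charming_aurora"
            }
        }
    },
    "inplace_jinja_support": true
}
"""


def get_environment(project: dict, name: str = "default") -> dict:
    """Look up one environment block, failing loudly if it is not defined."""
    environments = project.get("environments", {})
    if name not in environments:
        raise KeyError(f"Environment {name!r} is not defined in project.json")
    return environments[name]


project = json.loads(PROJECT_JSON)
env = get_environment(project)
print(env["profile"], env["properties"]["artifact_location"])
```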

--------------------------------

### Generate Project Tree (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This bash command generates a tree view of the project directory structure up to 3 levels deep, excluding common build and version control artifacts like __pycache__, .git, .pytest_cache, and .coverage.

```bash
tree -L 3 -I __pycache__ -a -I .git -I .pytest_cache -I .coverage
```

--------------------------------

### Install JDK with Conda

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Installs a specific version of OpenJDK using Conda. This is needed only if no JDK is already present on the local machine; running Apache Spark locally (for example in unit tests) requires a JVM.

```bash
conda install -c conda-forge openjdk=11.0.15
```

--------------------------------

### Defining Databricks Job Workflow (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This YAML configuration defines a Databricks job workflow using `dbx`. It utilizes YAML anchors (`&`, `<<: *`) to reuse cluster configurations. The workflow includes a single task named 'main' which is a `python_wheel_task` executing the 'etl' entry point from the 'charming_aurora' package. It demonstrates passing configuration via a file reference using `file:fuse://`.

```YAML
# Please read YAML documentation for details on how to use substitutions and anchors.
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "10.4.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "<some-node-type-id>" # this value will be different depending on your cloud provider

environments:
  default:
    workflows:
      #######################################################################################
      # this is an example job with single ETL task based on 2.1 API and wheel_task format #
      ######################################################################################
      - name: "charming-aurora-sample-etl"
        tasks:
          - task_key: "main"
            <<: *basic-static-cluster
            python_wheel_task:
              package_name: "charming_aurora"
              entry_point: "etl" # take a look at the setup.py entry_points section for details on how to define an entrypoint
              parameters: ["--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml"]
```
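
For readers less familiar with YAML merge keys, the effect of `<<: *basic-cluster-props` can be approximated in plain Python with dictionary unpacking. This is an analogy only, not how dbx or a YAML parser is implemented, and the node type value is a placeholder.

```python
# Plays the role of the &basic-cluster-props anchor
basic_cluster_props = {"spark_version": "10.4.x-cpu-ml-scala2.12"}

# Plays the role of &basic-static-cluster, which merges the props above
basic_static_cluster = {
    "new_cluster": {
        **basic_cluster_props,  # comparable to `<<: *basic-cluster-props`
        "num_workers": 1,
        "node_type_id": "<some-node-type-id>",
    }
}

# The task merges the cluster block and then adds its own keys
task = {
    "task_key": "main",
    **basic_static_cluster,  # comparable to `<<: *basic-static-cluster`
    "python_wheel_task": {
        "package_name": "charming_aurora",
        "entry_point": "etl",
        "parameters": ["--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml"],
    },
}

print(task["new_cluster"])
```

Keys written explicitly next to a merge key take precedence over merged ones, which is why the unpacking appears before the explicit keys in this sketch.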

--------------------------------

### Accessing Task Utilities in dbx Subclass (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This snippet illustrates how to access common utilities provided by the dbx Task base class within a subclass. It shows how to get the SparkSession (self.spark), DBUtils (self.dbutils), logger (self.logger), and configuration dictionary (self.conf).

```Python
from charming_aurora.common import Task

class SomeTask(Task):
    def launch(self):
        self.spark.range(100)  # self.spark provides access to the SparkSession instance
        self.dbutils(...)  # self.dbutils provides access to DBUtils
        self.logger.info("some msg")  # access to the Spark-level driver logger
        self.conf.get("database")  # access to the configuration dict, which comes from a conf file or from a dict object in tests
```
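
The pattern of injecting configuration as a dict can be exercised without a cluster. The sketch below uses a deliberately simplified stand-in for the base class (no Spark, no DBUtils, no `--conf-file` handling); it is an assumption-laden illustration and not the generated `charming_aurora.common.Task`. Returning a value from `launch` is done here only so the result is easy to inspect.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class Task(ABC):
    """Simplified stand-in for the project's Task base class."""

    def __init__(self, init_conf: Optional[Dict[str, Any]] = None):
        self.logger = logging.getLogger(self.__class__.__name__)
        # Tests pass a dict directly; the generated class would otherwise read --conf-file
        self.conf = init_conf if init_conf else {}

    @abstractmethod
    def launch(self):
        ...


class SomeTask(Task):
    def launch(self):
        database = self.conf.get("database", "default")
        self.logger.info("Using database %s", database)
        return database


print(SomeTask({"database": "analytics"}).launch())
print(SomeTask().launch())
```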

--------------------------------

### Viewing Databricks Driver Logs in Real-time (bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This command, run within the Databricks cluster's WebTerminal, tails the active log4j driver log file, providing a real-time stream of log output, including "self.logger" statements.

```bash
tail -f /databricks/driver/logs/log4j-active.log
```

--------------------------------

### Running Unit Tests with Coverage (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Bash command runs the unit tests using pytest and includes the `--cov` flag to generate a code coverage report. This helps in understanding which parts of the code are exercised by the tests.

```bash
pytest tests/unit --cov
```

--------------------------------

### Implementing Sample ETL Task with dbx Task Class (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This snippet defines a sample ETL task using the dbx Task base class. It demonstrates how to fetch data using sklearn, convert it to a Spark DataFrame, and save it as a Delta table. It also shows the standard entrypoint and __main__ structure for dbx tasks, relying on the Task class for Spark/DBUtils/Logger initialization and configuration loading.

```Python
from charming_aurora.common import Task
from sklearn.datasets import fetch_california_housing
import pandas as pd


class SampleETLTask(Task):
    def _write_data(self):
        db = self.conf["output"].get("database", "default")
        table = self.conf["output"]["table"]
        self.logger.info(f"Writing housing dataset to {db}.{table}")
        _data: pd.DataFrame = fetch_california_housing(as_frame=True).frame
        df = self.spark.createDataFrame(_data)
        df.write.format("delta").mode("overwrite").saveAsTable(f"{db}.{table}")
        self.logger.info("Dataset successfully written")

    def launch(self):
        self.logger.info("Launching sample ETL task")
        self._write_data()
        self.logger.info("Sample ETL task finished!")

# if you're using python_wheel_task, you'll need the entrypoint function to be used in setup.py
def entrypoint():  # pragma: no cover
    task = SampleETLTask()
    task.launch()

# if you're using spark_python_task, you'll need the __main__ block to start the code execution
if __name__ == '__main__':
    entrypoint()
```

--------------------------------

### Configure Databricks CLI Token Authentication

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

Configures the Databricks CLI to use token authentication for a specific profile named 'charming-aurora'. The command will prompt for the Databricks workspace URL and the API token.

```bash
databricks configure --profile charming-aurora --token
```

--------------------------------

### Testing ETL and ML Jobs with Pytest (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This Python function demonstrates how to write a unit test for both ETL and ML tasks within a Databricks project using pytest. It shows how to instantiate task classes with configuration, launch them, and assert expected outcomes, such as table row counts for ETL and MLflow experiment/run existence for ML.

```python
from charming_aurora.tasks.sample_etl_task import SampleETLTask
from charming_aurora.tasks.sample_ml_task import SampleMLTask
from pyspark.sql import SparkSession
from pathlib import Path
import mlflow
import logging

def test_jobs(spark: SparkSession, tmp_path: Path):
    logging.info("Testing the ETL job")
    common_config = {"database": "default", "table": "sklearn_housing"}
    test_etl_config = {"output": common_config}
    etl_job = SampleETLTask(spark, test_etl_config)
    etl_job.launch()
    table_name = f"{test_etl_config['output']['database']}.{test_etl_config['output']['table']}"
    _count = spark.table(table_name).count()
    assert _count > 0
    logging.info("Testing the ETL job - done")

    logging.info("Testing the ML job")
    test_ml_config = {
        "input": common_config,
        "experiment": "/Shared/charming-aurora/sample_experiment"
    }
    ml_job = SampleMLTask(spark, test_ml_config)
    ml_job.launch()
    experiment = mlflow.get_experiment_by_name(test_ml_config['experiment'])
    assert experiment is not None
    runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id])
    assert runs.empty is False
    logging.info("Testing the ML job - done")
```

--------------------------------

### Deploy and Launch Workflow on Job Cluster (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Deploys workflow assets to Databricks and then launches a run of the workflow on a job cluster using the deployed assets. This simulates a production job run.

```bash
dbx deploy <workflow-name> --assets-only
dbx launch <workflow-name>  --from-assets --trace
```

--------------------------------

### Create Databricks Repo from Git URL (CLI)

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Creates a new Databricks Repo linked to a specified Git repository URL using the Databricks CLI. This command integrates your Git repository with the Databricks Repos feature.

```bash
databricks repos create --url <your repo URL> --provider <your-provider>
```

--------------------------------

### Initializing dbx Project from Python Package Template - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Demonstrates how to initialize a dbx project using a template that has been installed as a Python package. The `--package` argument specifies the name of the installed package containing the template.

```bash
dbx init --package=my-template-pkg
```

--------------------------------

### Tag and Push Git for Release

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Creates an annotated Git tag for a specific version and pushes the tag to the remote repository. This action is typically used to trigger a CI/CD release pipeline.

```bash
git tag -a v<your-project-version> -m "Release tag for version <your-project-version>"
git push origin --tags
```

--------------------------------

### Testing ETL Job with Pytest (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/python_quickstart.md

This snippet focuses specifically on the ETL task testing portion of the `test_jobs` function. It illustrates how to configure and launch an ETL task programmatically within a test and verify its output, such as checking the row count of the resulting table.

```python
# imports are omitted
def test_jobs(spark: SparkSession, tmp_path: Path):
    logging.info("Testing the ETL job")
    common_config = {"database": "default", "table": "sklearn_housing"}
    test_etl_config = {"output": common_config}
    etl_job = SampleETLTask(spark, test_etl_config)
    etl_job.launch()
    table_name = f"{test_etl_config['output']['database']}.{test_etl_config['output']['table']}"
    _count = spark.table(table_name).count()
    assert _count > 0
    logging.info("Testing the ETL job - done")
    # code of the ML task test is omitted
```

--------------------------------

### Defining Basic dbx Deployment File Structure (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/reference/deployment.md

This snippet illustrates the standard structure of a `dbx` deployment file in YAML format. It includes the `build` configuration, the required `environments` section with a named environment (e.g., `default`), and the required `workflows` section containing workflow definitions. The example shows a basic workflow with a `python_wheel_task`.

```yaml
build: #(1)
  python: "pip"

environments: #(2)
  default: #(3)
    workflows: #(4)
      - name: "workflow1" #(5)
        tasks:
          - task_key: "task1"
            # example task payload
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"
```
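
Once such a file is parsed into a dictionary, the nesting `environments` → environment name → `workflows` can be walked directly. The sketch below assumes the YAML has already been loaded (the dict literal mirrors the example above), and `workflow_names` is a hypothetical helper rather than a dbx function.

```python
from typing import Any, Dict, List

deployment: Dict[str, Any] = {
    "build": {"python": "pip"},
    "environments": {
        "default": {
            "workflows": [
                {
                    "name": "workflow1",
                    "tasks": [
                        {
                            "task_key": "task1",
                            "python_wheel_task": {
                                "package_name": "some-pkg",
                                "entry_point": "some-ep",
                            },
                        }
                    ],
                }
            ]
        }
    },
}


def workflow_names(config: Dict[str, Any], environment: str = "default") -> List[str]:
    """Collect the names of all workflows defined for one environment."""
    env = config["environments"][environment]
    return [workflow["name"] for workflow in env.get("workflows", [])]


print(workflow_names(deployment))
```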

--------------------------------

### Project Structure Example (Shell)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/file_references.md

Illustrates a typical project directory structure for a dbx project, showing the location of source code, tasks, and configuration files. This structure is relevant for understanding how file references are resolved relative to the project root.

```shell
.
├── charming_aurora
│   ├── __init__.py
│   ├── common.py
│   └── tasks
│       ├── __init__.py
│       ├── sample_etl_task.py
│       └── sample_ml_task.py
├── conf
│   ├── deployment.yml
│   └── tasks
│       ├── sample_etl_config.yml
│       └── sample_ml_config.yml
```

--------------------------------

### Example Included Cluster Configuration (JSON)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

A JSON snippet representing a cluster configuration that can be included in a main deployment file using Jinja's 'include' tag, demonstrating reusable configuration blocks.

```json
{
    "spark_version": "some-version",
    "node_type_id": "some-node-type",
    "aws_attributes": {
        "first_on_demand": 0,
        "availability": "SPOT"
    },
    "num_workers": 2
}
```

--------------------------------

### Installing dbx with pip (Shell)

Source: https://github.com/databrickslabs/dbx/blob/main/README.md

This command demonstrates how to install the dbx package using pip, the standard package installer for Python. It requires Python 3.8+ and pip or conda to be installed on the system.

```shell
pip install dbx
```

--------------------------------

### Initializing dbx Project from Versioned Git Template - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Illustrates how to use the `--checkout` flag with `dbx init` to specify a particular version (tag, branch, or commit) when initializing a project from a Git repository template. This is useful for ensuring reproducible project setups.

```bash
# specific tag
dbx init --path=https://git/repo/with/template.git --checkout=v0.0.1

# specific branch
dbx init --path=https://git/repo/with/template.git --checkout=prod

# specific git commit
dbx init --path=https://git/repo/with/template.git --checkout=aaa111bbb
```

--------------------------------

### Installing Python Template Package - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Shows the command to install a Python package containing a dbx template using pip. This step is required before initializing a project from a template shipped as a Python package.

```bash
pip install "my-template-pkg==0.0.1" # or whatever version
```

--------------------------------

### Examples of Passing Parameters for Specific Task Types (dbx launch, bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/passing_parameters.md

These examples illustrate the required structure of the `--parameters` argument for various Databricks job task types (spark_jar_task, notebook_task, spark_python_task, python_wheel_task, spark_submit_task, pipeline_task, sql_task, dbt_task) when launching a workflow using `dbx launch`.

```bash
dbx launch <workflow_name> --parameters='{"jar_params": ["a1", "b1"]}' # spark_jar_task
dbx launch <workflow_name> --parameters='{"notebook_params":{"name":"john doe","age":"35"}}' # notebook_task
dbx launch <workflow_name> --parameters='{"python_params":["john doe","35"]}' # spark_python_task or python_wheel_task
dbx launch <workflow_name> --parameters='{"spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]}' # spark_submit_task
dbx launch <workflow_name> --parameters='{"python_named_params": {"name": "task", "data": "dbfs:/path/to/data.json"}}' # python_wheel_task
dbx launch <workflow_name> --parameters='{"pipeline_params": {"full_refresh": true}}' # pipeline_task as a part of a workflow
dbx launch <workflow_name> --parameters='{"sql_params": {"name": "john doe", "age": "35"}}' # sql_task
dbx launch <workflow_name> --parameters='{"dbt_commands": ["dbt deps", "dbt seed", "dbt run"]}' # dbt_task
```
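
Hand-quoting JSON inside a shell command is error-prone, so one option is to generate the command string programmatically. The following is a minimal sketch using only the standard library; `build_launch_command` and the workflow name `my-workflow` are hypothetical, and the parameter keys are taken from the examples above.

```python
import json
import shlex
from typing import Any, Dict


def build_launch_command(workflow_name: str, parameters: Dict[str, Any]) -> str:
    """Return a shell-safe `dbx launch` command line with JSON-encoded parameters."""
    payload = json.dumps(parameters)
    # shlex.quote wraps the payload so spaces and double quotes survive the shell
    return f"dbx launch {shlex.quote(workflow_name)} --parameters={shlex.quote(payload)}"


command = build_launch_command("my-workflow", {"python_params": ["john doe", "35"]})
print(command)
```

The resulting string can be pasted into a terminal or split with `shlex.split` and handed to `subprocess.run`.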

--------------------------------

### Run Unit Tests with Pytest

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Executes unit tests located in the `tests/unit` directory using pytest. The `--cov` flag generates a coverage report to assess code test coverage.

```bash
pytest tests/unit --cov
```

--------------------------------

### Launching DLT Pipeline with Parameters using dbx

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/delta_live_tables.md

These Bash commands illustrate how to pass parameters to a Delta Live Tables (DLT) pipeline update when launching it via dbx. The --parameters flag is used, followed by a JSON string containing the desired parameters. Examples include triggering a full refresh or specifying a selection of tables for refresh, following the structure expected by the DLT API's Start Update endpoint.

```bash
dbx launch <pipeline-name> --parameters='{ "full_refresh": "true" }' # for full refresh
```

```bash
dbx launch <pipeline-name> --parameters='{ "refresh_selection": ["sales_orders_cleaned", "sales_order_in_chicago"] }' # start an update of selected tables
```

```bash
dbx launch <pipeline-name> --parameters='{
  "refresh_selection": ["sales_orders_cleaned", "sales_order_in_chicago"],
  "full_refresh_selection": ["customers", "sales_orders_raw"]
}' # start a full update of selected tables
```
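Hand-quoting JSON inside a shell string is easy to get wrong once the payload grows. The following is a minimal sketch, assuming the command is assembled from Python; `build_launch_command` is a hypothetical helper (not part of dbx), and only the `dbx launch ... --parameters=...` shape is taken from the commands above.

```python
import json
import shlex


def build_launch_command(pipeline_name, parameters):
    """Return an argv list for `dbx launch` with JSON-encoded parameters.

    Hypothetical helper for illustration: it only assembles the command
    and never invokes dbx.
    """
    return ["dbx", "launch", pipeline_name, f"--parameters={json.dumps(parameters)}"]


argv = build_launch_command(
    "<pipeline-name>",
    {
        "refresh_selection": ["sales_orders_cleaned", "sales_order_in_chicago"],
        "full_refresh_selection": ["customers", "sales_orders_raw"],
    },
)

# shlex.join quotes every argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

The `argv` list can also be handed to `subprocess.run` directly, which avoids shell quoting altogether.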

--------------------------------

### Install Extras During dbx Execute (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/dependency_management.md

This Bash command shows how to instruct the dbx tool to install specific dependency 'extras' (defined in setup.py or pyproject.toml) when executing a job on a Databricks cluster, allowing environment-specific dependency installation.

```bash
dbx execute ... --pip-install-extras="test,other-extra,one-more-extra"
```

--------------------------------

### Running dbx Development Commands - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/contrib/CONTRIBUTING.md

These commands demonstrate how to use the `make` tool for common development tasks within the dbx project. They cover displaying help, cleaning and installing dependencies, running tests (including specific test files), fixing code formatting, and running linters.

```bash
make help
make clean install

make test
make test /tests/path/to/blah_test.py

make fix
make lint
```

--------------------------------

### Including Configuration Snippets in dbx Deployment (JSON/Jinja)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

Example of a dbx deployment JSON file using Jinja's 'include' tag to insert content from another file ('includes/cluster-test.json.j2') into the 'new_cluster' field, promoting modularity.

```json
{
    "environments": {
        "default": {
            "jobs": [
                {
                    "name": "your-job-name",
                    "new_cluster": {% include 'includes/cluster-test.json.j2' %},
                    "libraries": [],
                    "max_retries": 0,
                    "spark_python_task": {
                        "python_file": "file://placeholder_1.py"
                    }
                }
            ]
        }
    }
}
```

--------------------------------

### Execute Workflow on All-Purpose Cluster (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Runs a specified dbx workflow on an existing all-purpose Databricks cluster. This is useful for testing workflows in an interactive environment.

```bash
dbx execute <workflow-name> --cluster-name=<name of all-purpose cluster>
```

--------------------------------

### Configure JVM Project Deployment - dbx YAML

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/jvm/jvm_devloop.md

Provides an example of a dbx deployment file (deployment.yml) for a JVM project. It defines cluster properties, build commands (mvn clean package), and a workflow task that uses an instance pool, specifies a JAR library built locally, disables default packaging, and sets the main class name for a spark_jar_task. It highlights using Jinja functions and instance pools for faster iteration.

```YAML
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "10.4.x-cpu-ml-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 2
      instance_pool_name: "dev-instance-pool-created-above" #(1)
      driver_instance_pool_name: "dev-instance-pool-created-above" #(2)

build:
    commands:
        - "mvn clean package" #(3)

environments:
  default:
    workflows:
      - name: "charming-aurora-sample-jvm"
        tasks:
          - task_key: "main"
            <<: *basic-static-cluster
            libraries:
              - jar: "{{ 'file://' + dbx.get_last_modified_file('target/scala-2.12', 'jar') }}" #(4)
            deployment_config: #(5)
              no_package: true
            spark_jar_task:
                main_class_name: "org.some.main.ClassName"
                parameters: []
```

--------------------------------

### Installing dbx with Cloud Storage Dependencies (Bash)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/concepts/artifact_storage.md

These snippets show the extra identifiers to append to the `pip install dbx` command to include necessary libraries for interacting with cloud-based artifact storage locations. `dbx[azure]` adds support for `wasbs://`, `dbx[aws]` for `s3://`, and `dbx[gcp]` for `gs://`. These extras are required for `dbx` to perform upload/download operations on these storage types.

```Bash
dbx[azure]
```

```Bash
dbx[aws]
```

```Bash
dbx[gcp]
```
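The scheme-to-extra mapping described above can be captured in a small lookup. This is a sketch only: `dbx_requirement_for` is a hypothetical helper, and the fallback to plain `dbx` for any other scheme is an assumption made for the example.

```python
from urllib.parse import urlparse

# Scheme-to-extra mapping taken from the description above.
EXTRA_BY_SCHEME = {"wasbs": "azure", "s3": "aws", "gs": "gcp"}


def dbx_requirement_for(artifact_location):
    """Return the pip requirement string matching an artifact location.

    Hypothetical helper: schemes not listed above fall back to plain `dbx`
    (an assumption for this sketch).
    """
    scheme = urlparse(artifact_location).scheme
    extra = EXTRA_BY_SCHEME.get(scheme)
    return f"dbx[{extra}]" if extra else "dbx"


# The bucket name is a placeholder.
print(dbx_requirement_for("s3://my-bucket/dbx/artifacts"))  # prints dbx[aws]
```

The returned string is what would follow `pip install`, quoted so the shell does not interpret the brackets.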

--------------------------------

### Example Jinja Variables File (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

A YAML file defining variables ('TASK_CLUSTER', 'TASK_NAME') that can be referenced in dbx deployment Jinja templates using the 'var['...']' syntax, providing a structured way to manage parameters.

```yaml
TASK_CLUSTER:
    MIN_WORKERS: 1
    MAX_WORKERS: 5
TASK_NAME: 'main'
```

--------------------------------

### Install Python Wheel with Notebook-Scoped Libraries using dbx

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/dependency_management.md

This command is used within a Databricks notebook cell to install a Python wheel containing project code and dependencies. It relies on notebook-scoped libraries, so the installation applies only to the current notebook session and doesn't require a cluster restart. The `--force-reinstall` flag ensures the package is reinstalled even if a previous version exists.

```python
%pip install --force-reinstall <versioned-wheel-path>/package.whl
```

--------------------------------

### Installing Package from Custom Repository in Notebook (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/devloop/mixed.md

Installs a Python package from a configured custom PyPI repository (such as Artifactory) directly within a notebook cell using the `%pip` magic command. This is useful for consuming packaged code pinned to specific versions.

```python
%pip install package-from-artifactory
```

--------------------------------

### Referencing Environment Variables in dbx Deployment (YAML/Jinja)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

Example of a dbx deployment YAML file using Jinja syntax ('{{ env['VAR_NAME'] }}') to dynamically set a job tag based on an environment variable, adding flexibility to deployments.

```yaml
# only relevant block shown
environments:
  default:
    - name: "job-with-tags"
      tags:
       - job_group: "{{ env['JOB_GROUP'] }}"
```

--------------------------------

### Accessing packaged files with pkg_resources (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/packaging_files.md

Shows how to use the `pkg_resources.resource_filename` function from the `setuptools` package to get the filesystem path to files included in the Python package via `package_data`. This allows the application code to read or process these files.

```Python
import pkg_resources

raw_csv_path = pkg_resources.resource_filename(
    "<package-name>", "resources/raw/username.csv"
)
query_path = pkg_resources.resource_filename(
    "<package-name>", "resources/sql/create_table.sql"
)
```
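Recent setuptools releases mark `pkg_resources` as deprecated in favour of the standard library. As an alternative sketch (not taken from the dbx docs), the same lookup can be done with `importlib.resources.files`, available from Python 3.9. The package `demo_pkg` and its contents are hypothetical and are created on the fly only so that the example runs anywhere.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package on disk; "demo_pkg" stands in for <package-name>.
tmp_root = Path(tempfile.mkdtemp())
sql_dir = tmp_root / "demo_pkg" / "resources" / "sql"
sql_dir.mkdir(parents=True)
(tmp_root / "demo_pkg" / "__init__.py").write_text("", encoding="utf-8")
(sql_dir / "create_table.sql").write_text("CREATE TABLE demo (id INT);", encoding="utf-8")

# Make the temporary package importable.
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

# files() returns a path-like object rooted at the package directory.
query_file = files("demo_pkg") / "resources" / "sql" / "create_table.sql"
query = query_file.read_text(encoding="utf-8")
print(query)
```

In a real project only the last three statements are needed, with the installed package name in place of `demo_pkg`.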

--------------------------------

### Execute Workflow Interactively on Cluster (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Executes a dbx workflow interactively on a specified Databricks cluster. This command is useful for development and debugging directly on the cluster environment.

```bash
dbx execute <workflow-name> \
    --cluster-name="<some-cluster-name>"
```

--------------------------------

### dbx Deployment Configuration (Policy & Init Script)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/named_properties.md

Example of a dbx deployment configuration in YAML format. It demonstrates how to reference a cluster policy by name or ID and how to define additional init scripts directly within the cluster definition. dbx will merge these with the policy's init scripts.

```yaml
# irrelevant parts are omitted
environments:
  default:
    workflows:
      - name: workflow_name
        job_clusters:
        - new_cluster:
            policy_id: "cluster-policy://policy-with-pip-install-script"
            init_scripts:
            - dbfs:
                destination: dbfs:/some/path/install_sql_driver.sh
        tasks:
         ...
```

--------------------------------

### Referencing Variables File in dbx Deployment (YAML/Jinja)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

Example of a dbx deployment YAML file using Jinja syntax ('{{ var['...'] }}') to dynamically set task key and cluster autoscale parameters based on variables defined in a separate variables file.

```yaml
# irrelevant config parts are omitted
environments:
  default:
    workflows:
      - name: "charming-aurora-sample-etl"
        tasks:
          - task_key: "{{ var['TASK_NAME'] }}"
            new_cluster:
                autoscale:
                    min_workers: {{ var['TASK_CLUSTER']['MIN_WORKERS'] }}
                    max_workers: {{ var['TASK_CLUSTER']['MAX_WORKERS'] }}
```

--------------------------------

### Configure Custom PyPI Index with Bash Init Script

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/dependency_management.md

This bash script, intended as an init script, modifies the /etc/pip.conf file to add a custom PyPI index URL alongside the default one. This allows pip to install packages from the specified private repository during cluster initialization.

```Bash
echo """[global]
index-url=https://pypi.org/simple
extra-index-url=https://my.custom.pypi.example.com/simple/
""" > /etc/pip.conf
```

--------------------------------

### Passing Parameters using dbx launch --from-assets (Jobs API 2.0, notebook_task)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/passing_parameters.md

Example of passing parameters to a `notebook_task` using `dbx launch --from-assets` with the Jobs API 2.0 format. Parameters are provided as a JSON object under the 'base_parameters' key.

```bash
dbx launch <workflow_name> --from-assets --parameters='{"base_parameters": {"key1": "value1", "key2": "value2"}}'
```

--------------------------------

### Invalid Combination of Build Options in dbx Deployment YAML

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/build_management.md

Shows an example of an invalid `deployment.yml` configuration in which mutually exclusive build options (here `python` and `commands`) are specified at the same time. This configuration will not work.

```YAML
build:
 python: "pip"
 commands:
 - "echo 'building!'"
 - "sleep 5"
 - "mvn clean package"
```
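A pre-deployment check for this mistake could look like the sketch below. It operates on the `build` section after it has been loaded into a dictionary. `check_build_section` is a hypothetical helper rather than dbx's own validation, and the option list contains only the two names shown above.

```python
# Option names taken from the example above; treat the tuple as an assumption
# rather than the complete list of build options dbx recognises.
EXCLUSIVE_BUILD_OPTIONS = ("python", "commands")


def check_build_section(build):
    """Raise ValueError when more than one exclusive build option is set."""
    present = [key for key in EXCLUSIVE_BUILD_OPTIONS if key in build]
    if len(present) > 1:
        raise ValueError(f"build options are mutually exclusive: {present}")
    return present


invalid = {
    "python": "pip",
    "commands": ["echo 'building!'", "sleep 5", "mvn clean package"],
}

try:
    check_build_section(invalid)
except ValueError as err:
    print(err)
```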

--------------------------------

### Passing Parameters using dbx launch (Jobs API 2.0, spark_submit_task)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/passing_parameters.md

Example of passing parameters to a `spark_submit_task` using the standard `dbx launch` command with the Jobs API 2.0 format. Parameters are provided as an array under the 'spark_submit_params' key.

```bash
dbx launch <workflow_name> --parameters='{"spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]}'
```

--------------------------------

### Execute Specific Task in Multitask Job (dbx)

Source: https://github.com/databrickslabs/dbx/blob/main/src/dbx/templates/projects/python_basic/render/{{cookiecutter.project_name}}/README.md

Executes a single task within a multitask job definition on an all-purpose Databricks cluster using dbx. This allows for targeted testing of individual job components.

```bash
dbx execute <workflow-name> \
    --cluster-name=<name of all-purpose cluster> \
    --job=<name of the job to test> \
    --task=<task-key-from-job-definition>
```

--------------------------------

### Configuring Different Workflow Types in dbx Deployment File (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/reference/deployment.md

This snippet demonstrates how to define different types of workflows within the `workflows` section of a `dbx` deployment file. It shows examples for `jobs-v2.1` (identified by the `tasks` section), `jobs-v2.0` (identified by the absence of `tasks`), and `pipeline` (explicitly set using `workflow_type: "pipeline"`).

```yaml
build:
  python: "pip"

environments:
  default:
    workflows:

      ################################################
      - name: "workflow-in-v2.1-format"
        tasks:
          - task_key: "task1"
            python_wheel_task:
              package_name: "some-pkg"
              entry_point: "some-ep"

      ################################################
      - name: "workflow-in-v2.0-format"
        spark_python_task:
          python_file: "file://some/file.py"

      ################################################
      - name: "workflow-in-pipeline-format"
        target: "some-target-db"
        workflow_type: "pipeline" # enforces the recognition
        libraries:
          - notebook:
              path: "/Repos/some/path"
```
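The recognition rules described above can be written down as a short function. This is an illustrative sketch of the stated rules, not dbx's implementation; `infer_workflow_type` is a hypothetical name, and the dictionaries mirror the three YAML entries.

```python
def infer_workflow_type(workflow):
    """Classify a workflow definition using the rules described above.

    An explicit `workflow_type` wins; otherwise the presence of `tasks`
    distinguishes the Jobs API 2.1 format from the 2.0 format.
    """
    explicit = workflow.get("workflow_type")
    if explicit:
        return explicit
    if "tasks" in workflow:
        return "jobs-v2.1"
    return "jobs-v2.0"


workflows = [
    {"name": "workflow-in-v2.1-format", "tasks": [{"task_key": "task1"}]},
    {"name": "workflow-in-v2.0-format", "spark_python_task": {"python_file": "file://some/file.py"}},
    {"name": "workflow-in-pipeline-format", "workflow_type": "pipeline", "target": "some-target-db"},
]

for wf in workflows:
    print(wf["name"], "->", infer_workflow_type(wf))
```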

--------------------------------

### Databricks Workflow Deployment with Git Source (YAML)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/devops/mixed.md

Example deployment configuration file (`conf/deployment.yml`) for a mixed-mode Databricks project. It defines a workflow with two tasks: a notebook task sourced from a remote Git repository (`git_source`) and a Python wheel task. The notebook task sets `no_package: true` in its `deployment_config`, so the project package is not added to that task as a dependency; the notebook imports the package code manually instead.

```YAML
environments:
  default:
    workflows:
      - name: "mixed-mode-workflow"
        job_clusters:
        # omitted
        git_source:
          git_url: https://some-git-provider.com/some/remote/repo.git
          git_provider: "git-provider-name"
          git_branch: "main" # or git_tag or git_commit
        tasks:
          - task_key: "notebook-remote"
            notebook_task:
              notebook_path: "notebooks/sample_notebook"
            deployment_config:
              no_package: true
            job_cluster_key: "default"
          - task_key: "packaged"
            python_wheel_task:
              package_name: "<your-package-name>"
              entry_point: "<your-entry-point>"
            job_cluster_key: "default"
```

--------------------------------

### Using Custom Jinja Functions in dbx Deployment (YAML/Jinja)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/jinja_support.md

Example of a dbx deployment YAML file using Jinja syntax ('{{ custom.<function_name>(...) }}') to call a custom Python function ('custom.multiply_by_two') to dynamically set a cluster configuration value.

```yaml
# irrelevant config parts are omitted
environments:
  default:
    workflows:
      - name: "charming-aurora-sample-etl"
        tasks:
          - task_key: "some-task"
            new_cluster:
                autoscale:
                    min_workers: 1
                    max_workers: {{ custom.multiply_by_two(2) }}
```
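The template above only calls `custom.multiply_by_two`; the function itself is defined in Python. A plausible definition is sketched below. The body is an assumption inferred from the function name, and the file in which dbx expects custom Jinja functions is covered by its Jinja support documentation rather than by this snippet.

```python
def multiply_by_two(x: int) -> int:
    """Return twice the input (assumed behaviour, inferred from the name)."""
    return 2 * x


# With this definition, `{{ custom.multiply_by_two(2) }}` would render as 4.
print(multiply_by_two(2))
```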

--------------------------------

### Initializing dbx Project from Git Template (Generic) - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Shows the basic syntax for initializing a dbx project using a template located in a Git repository. The `--path` argument specifies the repository URL, and the optional `--checkout` argument allows specifying a branch, tag, or commit.

```bash
dbx init --path PATH [--checkout LOC]
```

--------------------------------

### Passing Task-Specific Parameters using dbx launch --from-assets (Jobs API 2.1, pipeline_task)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/passing_parameters.md

Example of passing task-specific parameters to a `pipeline_task` using `dbx launch --from-assets` with the Jobs API 2.1 format. Parameters are provided as an array of objects, each specifying 'task_key' and parameters like 'full_refresh'.

```bash
dbx launch <workflow_name> --from-assets --parameters='[
    {"task_key": "some", "full_refresh": true}
]'
```
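When several tasks need their own parameters, the array-of-objects payload can be generated from a mapping keyed by task. This is a sketch under that assumption; `task_parameters` is a hypothetical helper, and the single entry reproduces the payload from the command above.

```python
import json


def task_parameters(per_task):
    """Turn {task_key: params} into the array-of-objects payload shown above."""
    return [{"task_key": key, **params} for key, params in per_task.items()]


payload = json.dumps(task_parameters({"some": {"full_refresh": True}}))
print(payload)  # prints [{"task_key": "some", "full_refresh": true}]
```

Note that `json.dumps` emits the JSON literal `true` for Python's `True`, which matches the unquoted boolean in the command.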

--------------------------------

### Define Dependencies with Setuptools in Python

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/dependency_management.md

This Python snippet shows a typical setup.py file structure using setuptools to define different sets of dependencies: main package requirements, local development requirements, and test requirements, using the extras_require mechanism.

```python
from setuptools import find_packages, setup
from your_package_name import __version__

PACKAGE_REQUIREMENTS = ["pyyaml"] #(1)

LOCAL_REQUIREMENTS = [ #(2)
  "pyspark==3.2.1",
  "delta-spark==1.1.0",
  "scikit-learn",
  "pandas",
  "mlflow",
]

TEST_REQUIREMENTS = [ #(3)
  # development & testing tools
  "pytest",
  "coverage[toml]",
  "pytest-cov",
  "dbx>=0.8"
]

setup(
  name="your_package_name",
  packages=find_packages(exclude=["tests", "tests.*"]),
  setup_requires=["setuptools","wheel"],
  install_requires=PACKAGE_REQUIREMENTS,
  extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS}, #(4)
  version=__version__,
)
```
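To make the effect of `extras_require` concrete, the sketch below models which requirements end up selected for a given set of extras: the base `install_requires` list plus the list of each requested extra. It is a simplified model of the selection only (no version resolution or de-duplication), and `resolve_requirements` is a hypothetical helper; the lists are copied from the `setup.py` above.

```python
PACKAGE_REQUIREMENTS = ["pyyaml"]
EXTRAS = {
    "local": ["pyspark==3.2.1", "delta-spark==1.1.0", "scikit-learn", "pandas", "mlflow"],
    "test": ["pytest", "coverage[toml]", "pytest-cov", "dbx>=0.8"],
}


def resolve_requirements(selected_extras):
    """Return base requirements followed by those of each selected extra."""
    resolved = list(PACKAGE_REQUIREMENTS)
    for extra in selected_extras:
        resolved.extend(EXTRAS[extra])
    return resolved


# Roughly what `pip install ".[test]"` would ask for.
print(resolve_requirements(["test"]))
```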

--------------------------------

### Initializing dbx Project with Default Template - Bash

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/general/custom_templates.md

Shows how to use the `--template` option to initialize a dbx project using one of the built-in templates provided by dbx. Currently, `python_basic` is the only default template available.

```bash
dbx init --template=python_basic
```

--------------------------------

### Bash Commands to Run Tests on Job Cluster

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/integration_tests.md

These bash commands deploy and launch the 'sample-tests' workflow on a job cluster using assets-based deployment. The first command deploys only the workflow's assets, and the second launches a run from those deployed assets.

```Bash
dbx deploy sample-tests --assets-only
dbx launch sample-tests --from-assets
```

--------------------------------

### Databricks Cluster Policy Definition (Init Script)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/features/named_properties.md

Example of a Databricks cluster policy definition in JSON format that enforces the inclusion of a specific init script by setting its destination as a fixed property.

```json
{
  "init_scripts.0.dbfs.destination": {
    "type": "fixed",
    "value": "dbfs://some/path/script.sh"
  }
}
```

--------------------------------

### Configuring package_data in setup.py (Python)

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/packaging_files.md

Demonstrates how to use the `package_data` field in a `setup.py` file to include arbitrary files (like SQL or CSV) within a Python package. It shows how to specify file patterns relative to the package directory.

```Python
from setuptools import setup

setup(
    # ... other setup() arguments omitted ...
    package_data={"": ["resources/sql/*.sql", "resources/raw/*.csv"]},
    # ... other setup() arguments omitted ...
)
```

--------------------------------

### YAML Workflow Configuration for Tests

Source: https://github.com/databrickslabs/dbx/blob/main/docs/guides/python/integration_tests.md

This YAML configuration defines a 'sample-tests' workflow within the default environment. It sets up a task named 'main' that uses a basic static cluster and executes the Python entrypoint script `tests/entrypoint.py` as a Spark Python task, passing parameters to pytest to run tests and collect coverage.

```YAML
environments:
  default:
    workflows:
      - name: "sample-tests"
        tasks:
          - task_key: "main"
            <<: *basic-static-cluster
            spark_python_task:
                python_file: "file://tests/entrypoint.py"
                # this call supports all standard pytest arguments
                parameters: ["file:fuse://tests/integration", "--cov=<insert-your-package-name>"]
```

--------------------------------

### Enabling Photon Runtime Engine for Databricks Job Clusters

Source: https://github.com/databrickslabs/dbx/blob/main/docs/reference/deployment.md

Provides a YAML configuration snippet demonstrating how to specify the `runtime_engine: PHOTON` property within a job cluster definition in the `dbx` deployment file to enable the Databricks Photon runtime.

```YAML
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "your-spark-version"
    node_type_id: "your-node-type-id"
    spark_conf:
      spark.databricks.delta.preview.enabled: 'true'
    instance_pool_name: <enter pool name>
    driver_instance_pool_name: <enter pool name>
    runtime_engine: PHOTON
    init_scripts:
      - dbfs:
          destination: dbfs:/<enter your path>
```