Databricks MLOps Stacks
https://github.com/databricks/mlops-stacks
# Databricks MLOps Stacks

Databricks MLOps Stacks is a customizable [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for bootstrapping production-grade machine learning projects on the Databricks Lakehouse platform. It provides a complete, opinionated project scaffold that covers three modular pillars: example ML code (model training, batch inference, feature engineering), ML resources defined as code via Databricks CLI bundles (jobs, experiments, model registry entries), and CI/CD pipelines for GitHub Actions, GitHub Actions for GitHub Enterprise Servers, Azure DevOps, and GitLab. The stack targets three runtime environments — dev, staging, and prod — and is compatible with AWS, Azure, and GCP.

Out of the box, MLOps Stacks generates a project where data scientists can iterate on ML notebooks and Python modules, open pull requests that automatically trigger unit and integration tests in an isolated staging workspace, and promote tested code to production via a release branch. Optional extensions include Databricks Feature Store integration, MLflow Recipes-based training pipelines, and Unity Catalog model registration with three-level namespace support (`<catalog>.<schema>.<model>`). The template is implemented as a Go-template bundle and is instantiated entirely through the `databricks bundle init` CLI command — no Python cookiecutter tooling is required.

---

## Project Initialization

### `databricks bundle init` — Create a new MLOps project interactively

Runs the bundle template wizard, prompting for all required parameters and generating the full project directory structure. Requires Databricks CLI ≥ v0.236.0.

```bash
# Install / upgrade Databricks CLI (Linux / macOS)
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

# Authenticate the CLI
databricks configure --host https://your-workspace.cloud.databricks.com

# Initialize a new MLOps Stacks project interactively
databricks bundle init mlops-stacks

# The wizard will prompt for:
#   input_setup_cicd_and_project            : CICD_and_Project | Project_Only | CICD_Only
#   input_project_name                      : my-fraud-detection (≥3 chars, no spaces / . \ /)
#   input_cloud                             : azure | aws | gcp
#   input_cicd_platform                     : github_actions | github_actions_for_github_enterprise_servers |
#                                             azure_devops | gitlab
#   input_databricks_staging_workspace_host : https://adb-xxxx.xx.azuredatabricks.net
#   input_databricks_prod_workspace_host    : https://adb-yyyy.yy.azuredatabricks.net
#   input_default_branch                    : main
#   input_release_branch                    : release
#   ... (additional optional parameters below)
```
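In automation or CI, the CLI is usually authenticated through environment variables rather than the interactive `databricks configure` prompt. A minimal sketch using a personal access token (host and token values are placeholders):

```bash
# Token-based authentication via environment variables (placeholder values);
# the Databricks CLI picks these up automatically, so no config prompt is needed.
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# Sanity-check the CLI version and authentication before initializing a project
databricks --version
databricks current-user me
```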
---

## Non-Interactive Initialization via Config File

### `databricks bundle init --config-file` — Instantiate a project from a JSON parameter file

Skips the interactive wizard entirely. Ideal for automation, onboarding scripts, and CI pipelines that must spin up new ML projects programmatically.

```bash
# AWS project with GitHub Actions and Unity Catalog enabled
cat > /tmp/my-project-config.json << 'EOF'
{
  "input_setup_cicd_and_project": "CICD_and_Project",
  "input_root_dir": "my-fraud-project",
  "input_project_name": "my-fraud-project",
  "input_cloud": "aws",
  "input_cicd_platform": "github_actions",
  "input_databricks_staging_workspace_host": "https://your-staging-workspace.cloud.databricks.com",
  "input_databricks_prod_workspace_host": "https://your-prod-workspace.cloud.databricks.com",
  "input_default_branch": "main",
  "input_release_branch": "release",
  "input_read_user_group": "ml-team",
  "input_include_feature_store": "no",
  "input_include_mlflow_recipes": "no",
  "input_include_models_in_unity_catalog": "yes",
  "input_staging_catalog_name": "staging",
  "input_prod_catalog_name": "prod",
  "input_test_catalog_name": "test",
  "input_schema_name": "my_fraud_project",
  "input_unity_catalog_read_user_group": "account users",
  "input_inference_table_name": "dev.my_fraud_project.predictions"
}
EOF

databricks bundle init mlops-stacks \
  --config-file /tmp/my-project-config.json \
  --output-dir /workspace/projects

# Generated structure:
# /workspace/projects/my-fraud-project/
#   my_fraud_project/
#     databricks.yml                 ← root bundle config
#     resources/                     ← ML job / experiment / model definitions
#     training/notebooks/            ← Train.py
#     deployment/batch_inference/    ← BatchInference.py
#     deployment/model_deployment/
#     tests/
#   .github/workflows/               ← CI/CD workflow files
#   docs/
#   README.md
```

---

## Setup Mode: Project Only

### `input_setup_cicd_and_project: Project_Only` — Bootstrap ML code without CI/CD

Generates only the ML code scaffold and Databricks bundle resource configs. Designed for data scientists who want to start iterating immediately; CI/CD can be added later by an ML engineer using `CICD_Only` mode.

```bash
cat > /tmp/project-only.json << 'EOF'
{
  "input_setup_cicd_and_project": "Project_Only",
  "input_root_dir": "my-churn-project",
  "input_project_name": "my-churn-project",
  "input_cloud": "azure",
  "input_read_user_group": "data-science-team",
  "input_include_feature_store": "no",
  "input_include_mlflow_recipes": "no",
  "input_include_models_in_unity_catalog": "no",
  "input_schema_name": "schema",
  "input_unity_catalog_read_user_group": "account users",
  "input_inference_table_name": "dev.my_churn_project.predictions"
}
EOF

databricks bundle init mlops-stacks \
  --config-file /tmp/project-only.json \
  --output-dir ~/projects

# After iterating on ML code, an ML engineer can add CI/CD later:
cat > /tmp/cicd-only.json << 'EOF'
{
  "input_setup_cicd_and_project": "CICD_Only",
  "input_root_dir": "my-churn-project",
  "input_cloud": "azure",
  "input_cicd_platform": "github_actions",
  "input_databricks_staging_workspace_host": "https://adb-staging.azuredatabricks.net",
  "input_databricks_prod_workspace_host": "https://adb-prod.azuredatabricks.net",
  "input_default_branch": "main",
  "input_release_branch": "release"
}
EOF

databricks bundle init mlops-stacks \
  --config-file /tmp/cicd-only.json \
  --output-dir ~/projects
```

---

## Databricks Bundle Resource Config (`databricks.yml`)

### `databricks.yml` — Root bundle configuration defining targets and variables

Declares per-environment workspace URLs, Unity Catalog catalog overrides, and references to all ML resource YAML files. This file is the entry point for all `databricks bundle` commands run inside a generated project.
```yaml
# <project_root>/my_fraud_project/databricks.yml (rendered from template)
bundle:
  name: my-fraud-project

variables:
  experiment_name:
    description: Experiment name for the model training.
    default: /Users/${workspace.current_user.userName}/${bundle.target}-my-fraud-project-experiment
  model_name:
    description: Model name for the model training.
    default: my-fraud-project-model  # Unity Catalog: ${var.catalog_name}.schema.model-name
  catalog_name:
    description: The catalog name to save the trained model

include:
  - ./resources/batch-inference-workflow-resource.yml
  - ./resources/ml-artifacts-resource.yml
  - ./resources/model-workflow-resource.yml
  # - ./resources/monitoring-resource.yml  # uncomment after inference table exists

targets:
  dev:
    mode: development
    default: true
    variables:
      catalog_name: dev
    workspace:
      host: # TODO: add dev workspace URL
  staging:
    variables:
      catalog_name: staging
    workspace:
      host: https://adb-staging.11.azuredatabricks.net
  prod:
    variables:
      catalog_name: prod
    workspace:
      host: https://adb-prod.22.azuredatabricks.net
  test:
    variables:
      catalog_name: test
    workspace:
      host: https://adb-staging.11.azuredatabricks.net  # reuse staging for tests

# Deploy resources to staging:
#   databricks bundle deploy --target staging
#
# Run the model training job in dev:
#   databricks bundle run model_training_job --target dev
```

---

## Model Training Job Resource

### `model-workflow-resource.yml` — Databricks job definition for the training pipeline

Defines a scheduled Databricks Jobs workflow with three sequential tasks: `Train → ModelValidation → ModelDeployment`. Cloud-specific cluster node types are resolved automatically from the `input_cloud` parameter.

```yaml
# resources/model-workflow-resource.yml (rendered example for AWS + Unity Catalog)
new_cluster: &new_cluster
  new_cluster:
    num_workers: 3
    spark_version: 15.3.x-cpu-ml-scala2.12
    node_type_id: i3.xlarge  # azure: Standard_D3_v2 / gcp: n2-highmem-4
    data_security_mode: "SINGLE_USER"
    custom_tags:
      clusterSource: mlops-stacks_0.4

resources:
  jobs:
    model_training_job:
      name: ${bundle.target}-my-fraud-project-model-training-job
      tasks:
        - task_key: Train
          notebook_task:
            notebook_path: ../training/notebooks/Train.py
            base_parameters:
              env: ${bundle.target}
              training_data_path: /databricks-datasets/nyctaxi-with-zipcodes/subsampled
              experiment_name: ${var.experiment_name}
              model_name: ${var.catalog_name}.my_fraud_project.${var.model_name}
              git_source_info: "url:${bundle.git.origin_url}; branch:${bundle.git.branch}"
        - task_key: ModelValidation
          depends_on: [{task_key: Train}]
          notebook_task:
            notebook_path: ../validation/notebooks/ModelValidation.py
            base_parameters:
              experiment_name: ${var.experiment_name}
              run_mode: dry_run  # options: disabled | dry_run | enabled
              enable_baseline_comparison: "false"
              validation_input: "SELECT * FROM delta.`dbfs:/databricks-datasets/nyctaxi-with-zipcodes/subsampled`"
              model_type: regressor
              targets: fare_amount
              custom_metrics_loader_function: custom_metrics
              validation_thresholds_loader_function: validation_thresholds
        - task_key: ModelDeployment
          depends_on: [{task_key: ModelValidation}]
          notebook_task:
            notebook_path: ../deployment/model_deployment/notebooks/ModelDeployment.py
            base_parameters:
              env: ${bundle.target}
      schedule:
        quartz_cron_expression: "0 0 9 * * ?"  # daily at 9 AM UTC
        timezone_id: UTC
      permissions:
        - level: CAN_VIEW
          group_name: users
```

---

## Batch Inference Job Resource

### `batch-inference-workflow-resource.yml` — Databricks job for scheduled batch scoring

Runs a batch inference notebook on a schedule, reading from a configured input table and writing predictions back to a Delta output table. Supports both Unity Catalog and workspace model registry.

```yaml
# resources/batch-inference-workflow-resource.yml (rendered example)
resources:
  jobs:
    batch_inference_job:
      name: ${bundle.target}-my-fraud-project-batch-inference-job
      tasks:
        - task_key: BatchInference
          notebook_task:
            notebook_path: ../deployment/batch_inference/notebooks/BatchInference.py
            base_parameters:
              env: ${bundle.target}
              # Unity Catalog model URI
              model_name: ${var.catalog_name}.my_fraud_project.${var.model_name}
              # Input table to run inference on
              input_table_name: staging.my_fraud_project.input_features
              # Output table to write predictions to
              output_table_name: staging.my_fraud_project.predictions
              git_source_info: "url:${bundle.git.origin_url}; branch:${bundle.git.branch}"
      schedule:
        quartz_cron_expression: "0 0 12 * * ?"  # daily at noon UTC
        timezone_id: UTC
      permissions:
        - level: CAN_VIEW
          group_name: users

# Deploy and run manually:
#   databricks bundle deploy --target staging
#   databricks bundle run batch_inference_job --target staging
```

---

## Feature Store Integration

### `input_include_feature_store: yes` — Enable Databricks Feature Store components

When enabled, the generated project includes feature engineering Python modules, a `GenerateAndWriteFeatures` notebook, Feature Store job resource configs, and integration tests for the feature pipeline. Feature tables follow the naming convention `<catalog>.<schema>.trip_pickup_features` / `trip_dropoff_features`.

```bash
# Initialize with Feature Store enabled
cat > /tmp/fs-project.json << 'EOF'
{
  "input_setup_cicd_and_project": "CICD_and_Project",
  "input_project_name": "taxi-fare-prediction",
  "input_root_dir": "taxi-fare-prediction",
  "input_cloud": "aws",
  "input_cicd_platform": "github_actions",
  "input_databricks_staging_workspace_host": "https://staging.cloud.databricks.com",
  "input_databricks_prod_workspace_host": "https://prod.cloud.databricks.com",
  "input_default_branch": "main",
  "input_release_branch": "release",
  "input_read_user_group": "users",
  "input_include_feature_store": "yes",
  "input_include_mlflow_recipes": "no",
  "input_include_models_in_unity_catalog": "yes",
  "input_staging_catalog_name": "staging",
  "input_prod_catalog_name": "prod",
  "input_test_catalog_name": "test",
  "input_schema_name": "taxi_fare_prediction",
  "input_unity_catalog_read_user_group": "account users",
  "input_inference_table_name": "dev.taxi_fare_prediction.predictions"
}
EOF

databricks bundle init mlops-stacks --config-file /tmp/fs-project.json --output-dir ~/projects

# Generated Feature Store artifacts:
# taxi_fare_prediction/
#   feature_engineering/
#     __init__.py
#     features/
#       pickup_features.py          ← FeatureFunction definitions
#       dropoff_features.py
#     notebooks/
#       GenerateAndWriteFeatures.py
#   resources/
#     feature-engineering-workflow-resource.yml
#   tests/
#     feature_engineering/
#       pickup_features_test.py
#       dropoff_features_test.py
```
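The generated feature modules ship with plain pytest unit tests, so the feature pipeline can be exercised locally before anything is deployed to a workspace. A small sketch, assuming the `taxi_fare_prediction` layout above and the per-project `test-requirements.txt` referenced by the CI workflows:

```bash
# Run the generated feature engineering unit tests locally (illustrative sketch;
# paths follow the taxi_fare_prediction project generated above)
cd ~/projects/taxi-fare-prediction
pip install -r taxi_fare_prediction/test-requirements.txt
pytest taxi_fare_prediction/tests/feature_engineering/
```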
---

## MLflow Recipes Integration

### `input_include_mlflow_recipes: yes` — Enable MLflow Recipes-based training

Replaces the default `Train.py` notebook with a `TrainWithMLflowRecipes.py` notebook that leverages MLflow Recipes steps and per-environment profile YAML files. Note: cannot be combined with `input_include_feature_store: yes` or `input_include_models_in_unity_catalog: yes`.

```bash
cat > /tmp/recipes-project.json << 'EOF'
{
  "input_setup_cicd_and_project": "CICD_and_Project",
  "input_project_name": "credit-scoring",
  "input_root_dir": "credit-scoring",
  "input_cloud": "azure",
  "input_cicd_platform": "azure_devops",
  "input_databricks_staging_workspace_host": "https://adb-staging.azuredatabricks.net",
  "input_databricks_prod_workspace_host": "https://adb-prod.azuredatabricks.net",
  "input_default_branch": "main",
  "input_release_branch": "release",
  "input_read_user_group": "users",
  "input_include_feature_store": "no",
  "input_include_mlflow_recipes": "yes",
  "input_include_models_in_unity_catalog": "no",
  "input_schema_name": "schema",
  "input_unity_catalog_read_user_group": "account users",
  "input_inference_table_name": "dev.credit_scoring.predictions"
}
EOF

databricks bundle init mlops-stacks --config-file /tmp/recipes-project.json --output-dir ~/projects

# Generated MLflow Recipes artifacts:
# credit_scoring/training/
#   notebooks/
#     TrainWithMLflowRecipes.py     ← replaces Train.py
#   profiles/
#     databricks-dev.yaml           ← dev workspace recipe config
#     databricks-staging.yaml       ← staging recipe config
#     databricks-prod.yaml          ← prod recipe config
#     databricks-test.yaml          ← integration test recipe config
```

---

## GitHub Actions CI/CD Workflows

### Generated GitHub Actions workflows — Automated testing and deployment pipeline

The generated `.github/workflows/` directory contains four workflow files that implement the full CI/CD loop: PR validation, unit tests, bundle CI (validate resource configs), and bundle CD (deploy to staging on merge to main, deploy to prod on merge to release).

```yaml
# .github/workflows/my-fraud-project-run-tests.yml (rendered example)
name: Run Tests for my-fraud-project
on:
  workflow_dispatch:
  pull_request:
    paths:
      - 'my_fraud_project/**'

env:
  DATABRICKS_HOST: https://adb-staging.azuredatabricks.net
  ARM_TENANT_ID: ${{ secrets.STAGING_AZURE_SP_TENANT_ID }}
  ARM_CLIENT_ID: ${{ secrets.STAGING_AZURE_SP_APPLICATION_ID }}
  ARM_CLIENT_SECRET: ${{ secrets.STAGING_AZURE_SP_CLIENT_SECRET }}

jobs:
  unit_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@v0.236.0
      - uses: actions/setup-python@v5
        with: { python-version: "3.10" }
      - run: pip install -r my_fraud_project/test-requirements.txt
      - run: pytest my_fraud_project/tests/

  integration_tests:
    runs-on: ubuntu-latest
    needs: [unit_tests]
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@v0.236.0
      - name: Deploy bundle to test target
        run: databricks bundle deploy --target test
      - name: Run integration test job
        run: databricks bundle run my_fraud_project_integration_test --target test

# Required GitHub secrets for Azure:
#   STAGING_AZURE_SP_TENANT_ID, STAGING_AZURE_SP_APPLICATION_ID, STAGING_AZURE_SP_CLIENT_SECRET
#   PROD_AZURE_SP_TENANT_ID, PROD_AZURE_SP_APPLICATION_ID, PROD_AZURE_SP_CLIENT_SECRET
#   WORKFLOW_TOKEN (GitHub PAT with workflow scope)
#
# Required GitHub secrets for AWS / GCP:
#   STAGING_WORKSPACE_TOKEN
#   PROD_WORKSPACE_TOKEN
#   WORKFLOW_TOKEN
```
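Registering those secrets is a one-time, per-repository step; one way to script it is with the GitHub CLI. A sketch for the Azure case (repository name and all values are placeholders):

```bash
# Register the CI/CD secrets with the GitHub CLI (placeholder repo and values).
# Azure uses service-principal credentials; AWS / GCP use workspace PATs instead.
REPO="your-org/my-fraud-project"

gh secret set STAGING_AZURE_SP_TENANT_ID      --repo "$REPO" --body "<staging-tenant-id>"
gh secret set STAGING_AZURE_SP_APPLICATION_ID --repo "$REPO" --body "<staging-client-id>"
gh secret set STAGING_AZURE_SP_CLIENT_SECRET  --repo "$REPO" --body "<staging-client-secret>"
gh secret set PROD_AZURE_SP_TENANT_ID         --repo "$REPO" --body "<prod-tenant-id>"
gh secret set PROD_AZURE_SP_APPLICATION_ID    --repo "$REPO" --body "<prod-client-id>"
gh secret set PROD_AZURE_SP_CLIENT_SECRET     --repo "$REPO" --body "<prod-client-secret>"
gh secret set WORKFLOW_TOKEN                  --repo "$REPO" --body "<github-pat-with-workflow-scope>"
```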
---

## Deploy CI/CD Workflow (Monorepo Support)

### `deploy-cicd.yml` — Dynamically wire CI/CD to an already-initialized project

A manually dispatched GitHub Actions workflow that generates and commits the project-specific CI/CD workflow files into the repository. Designed for monorepo setups where a single CI/CD scaffold serves multiple ML projects initialized with `Project_Only`.

```bash
# Trigger via GitHub CLI
gh workflow run deploy-cicd.yml \
  --repo your-org/your-monorepo \
  --field project_name=my-fraud-project

# The workflow will:
#   1. Extract cicd.tar.gz from the repo root
#   2. Merge cicd_params.json with the project's project_params.json
#   3. Validate that input_cloud matches between CICD and project configs
#   4. Append target workspace hosts to the project's databricks.yml
#   5. Run: databricks bundle init ./cicd --config-file cicd_params.json
#   6. Commit generated .github/workflows/ and databricks.yml changes
#   7. Open a PR against the default branch for review

# After merging the PR, the following project-specific workflows are active:
#   .github/workflows/my-fraud-project-run-tests.yml
#   .github/workflows/my-fraud-project-bundle-ci.yml
#   .github/workflows/my-fraud-project-bundle-cd-staging.yml
#   .github/workflows/my-fraud-project-bundle-cd-prod.yml
```

---

## Bundle Deployment Commands

### `databricks bundle deploy` and `databricks bundle run` — Deploy and execute ML jobs

Core operational commands used inside generated CI/CD workflows and by developers for manual deployment and job execution.

```bash
# From within the generated project directory (e.g. my-fraud-project/)

# Validate bundle configuration (runs in CI on every PR)
databricks bundle validate --target staging

# Deploy all ML resources (jobs, experiments, models) to staging
databricks bundle deploy --target staging

# Deploy to production (triggered on merge to release branch)
databricks bundle deploy --target prod

# Manually trigger the model training job in dev
databricks bundle run model_training_job --target dev

# Manually trigger batch inference in staging
databricks bundle run batch_inference_job --target staging

# Destroy all resources in a target (use with caution)
databricks bundle destroy --target staging

# Check deployed job run status
databricks jobs list --output json | jq '.[] | select(.settings.name | contains("my-fraud-project"))'
```

---

## Template Parameter Validation

### `databricks_template_schema.json` — Template schema with parameter constraints

Defines all accepted parameters, their types, defaults, enum values, regex validation patterns, and conditional `skip_prompt_if` rules. Invalid parameters cause `databricks bundle init` to fail with a descriptive error.

```bash
# Parameters that will cause initialization to FAIL:
databricks bundle init mlops-stacks --config-file - << 'EOF'
{
  "input_project_name": "a",                                     # ERROR: too short (< 3 chars)
  "input_project_name": "my project",                            # ERROR: contains space
  "input_project_name": "my/project",                            # ERROR: contains slash
  "input_project_name": "my.project",                            # ERROR: contains period
  "input_databricks_staging_workspace_host": "http://no-https"   # ERROR: must start with https
}
EOF

# Valid project name examples:
#   my-fraud-detection   ✓ (hyphens allowed)
#   my_fraud_detection   ✓ (underscores allowed)
#   FraudDetection2024   ✓ (alphanumeric + caps)

# Minimum Databricks CLI version check (enforced at init time):
#   min_databricks_cli_version: v0.236.0
databricks --version  # must be ≥ v0.236.0

# Check the full schema:
databricks bundle schema
```

---

## Template Library Functions

### `library/template_variables.tmpl` — Go-template helpers for workspace URL normalization

Template functions that strip query parameters from workspace URLs, resolve cloud-specific cluster node types, and normalize project names to alphanumeric-underscore format. These run at project generation time.
```bash
# Workspace URLs with trailing fragments are automatically sanitized:
#   Input:  https://adb-mycoolworkspace.11.azuredatabricks.net/?o=123456789#job/1234/run/9234
#   Output: https://adb-mycoolworkspace.11.azuredatabricks.net
#
# Project name normalization (hyphens → underscores, special chars stripped):
#   Input:  my-fraud-project-2024
#   Output: my_fraud_project_2024 (used as Python module/directory name)
#
# Cloud-specific default cluster node types:
#   aws:   i3.xlarge
#   azure: Standard_D3_v2
#   gcp:   n2-highmem-4
#
# Default model and experiment naming:
#   model_name      : <project_name>-model
#   experiment_name : /Users/<user>/<target>-<project_name>-experiment
#
# Unity Catalog three-level model name (when UC enabled):
#   <catalog_name>.<schema_name>.<project_name>-model
#   e.g. prod.my_fraud_project.my-fraud-project-model

# Verify generated parameter substitution by inspecting the test params file
# (only present in generated projects, not in the template repo itself):
cat my-fraud-project/_params_testing_only.txt
```

---

## Running Tests on the Template Repository

### `pytest tests` — Test suite for validating template generation correctness

The test suite (in the `tests/` directory of the template repo itself) verifies that project generation succeeds for all combinations of cloud, CI/CD platform, and optional features, and that no template strings remain un-substituted in the output.

```bash
# Install development dependencies
pip install -r dev-requirements.txt
# Also install: actionlint, databricks CLI, npm, act (for integration tests)

# Run unit tests only (fast, no Databricks connection required)
pytest tests

# Run all tests including slow integration tests
pytest tests --large

# Run integration tests only
pytest tests --large-only

# Generate an example project locally for manual inspection (Azure + Azure DevOps):
MLOPS_STACKS_PATH=~/mlops-stacks
databricks bundle init "$MLOPS_STACKS_PATH" \
  --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/azure/azure-devops.json" \
  --output-dir /tmp/preview

# Generate an AWS + GitHub Actions example:
databricks bundle init "$MLOPS_STACKS_PATH" \
  --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/aws/aws-github.json" \
  --output-dir /tmp/preview

# Generate a GCP + GitHub Actions example:
databricks bundle init "$MLOPS_STACKS_PATH" \
  --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/gcp/gcp-github.json" \
  --output-dir /tmp/preview
```

---

## Summary

Databricks MLOps Stacks is primarily used in two scenarios: greenfield ML project creation and productionization of existing ML work. For new projects, teams run `databricks bundle init mlops-stacks` (interactively or with a config JSON) to get a fully wired project directory with training/inference notebooks, Databricks Jobs resource definitions, and CI/CD workflows in a single step. The generated project immediately supports the full ML development loop: code in notebooks → push a PR → CI runs unit and integration tests in the staging workspace → merge to main → staging ML resources auto-deploy → cut a release branch → production jobs pick up the new code. Optional Feature Store and MLflow Recipes extensions are activated by a single parameter flag at init time.
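As a concrete illustration of that loop, the day-to-day commands look roughly as follows; this is a sketch only, assuming the default `main`/`release` branch names used in the examples above and the GitHub CLI for the PR steps:

```bash
# Typical development loop against a generated project (illustrative sketch;
# assumes the default main/release branch names used in the examples above)
git checkout -b improve-training-features
# ... edit notebooks / modules under my_fraud_project/ ...
git commit -am "Improve training features"
git push -u origin improve-training-features

gh pr create --base main --fill   # PR triggers unit + integration tests in staging
gh pr merge --squash              # merge to main -> staging resources auto-deploy

git checkout release
git merge main
git push origin release           # release branch -> prod jobs pick up the new code
```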
For teams adopting MLOps Stacks in an existing repository or monorepo, the modular design allows copying individual components — just the `.github/workflows` CI/CD files, just the `resources/` bundle YAML definitions, or just the Python training modules — into an existing project. The `CICD_Only` and `Project_Only` initialization modes support staged adoption: data scientists can start with the ML code scaffold and have an ML engineer layer in CI/CD later using the `deploy-cicd.yml` dispatch workflow.

The template itself is a standard Databricks asset bundle template and can be forked and customized by updating `databricks_template_schema.json` to remove unnecessary parameters, hardcode organizational defaults, or add new resource types such as Delta Live Tables pipelines — making it a reusable internal platform template for any data science organization running on Databricks.
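For example, an internal fork might pin the cloud and CI/CD platform so project teams are never prompted for them. A rough sketch, assuming `databricks_template_schema.json` follows the standard bundle template layout with a top-level `properties` map (the fork location and `jq` edit are illustrative):

```bash
# Fork the template and hardcode organizational defaults (illustrative sketch)
git clone https://github.com/databricks/mlops-stacks ~/acme-mlops-stacks
cd ~/acme-mlops-stacks

# Pin the cloud and CI/CD platform defaults in the template schema
# (assumes a top-level "properties" map in databricks_template_schema.json)
jq '.properties.input_cloud.default = "azure"
    | .properties.input_cicd_platform.default = "azure_devops"' \
  databricks_template_schema.json > schema.tmp && mv schema.tmp databricks_template_schema.json

# New projects are then initialized from the internal fork instead of mlops-stacks
databricks bundle init ~/acme-mlops-stacks --config-file /tmp/my-project-config.json
```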