### Copy Test Environment Example Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Copy the example test environment file to your local directory. Update this file with your Databricks workspace details. ```bash cp test.env.example test.env ``` -------------------------------- ### Install and Create Hatch Environment Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Installs Hatch globally and creates the default development environment. Hatch manages Python versions automatically. ```bash pip install hatch hatch env create ``` -------------------------------- ### Install dbt-databricks Adapter Source: https://github.com/databricks/dbt-databricks/blob/main/README.md Install the dbt-databricks adapter using pip. Ensure you have Python 3.7 or above. ```bash pip install dbt-databricks ``` -------------------------------- ### Basic Macro Test Setup Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Sets up a basic test for Jinja2 macros using `MacroTestBase`. Requires `template_name` and `macro_folders_to_load` fixtures. ```python from unittest.mock import Mock import pytest from tests.unit.macros.base import MacroTestBase class TestCreateViewMacros(MacroTestBase): @pytest.fixture(scope="class") def template_name(self) -> str: """Required: Name of the template file to test""" return "create.sql" # File in dbt/include/databricks/macros/ @pytest.fixture(scope="class") def macro_folders_to_load(self) -> list: """Required: Macro directories to load relative to dbt/include/databricks/""" return ["macros", "macros/relations/view"] ``` -------------------------------- ### Install Hatch for Linux Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Installs Hatch, a recommended tool for managing Python environments and development workflows, on Linux systems. Ensure your PATH is updated to include the Hatch binary. ```bash curl -Lo hatch.tar.gz https://github.com/pypa/hatch/releases/latest/download/hatch-x86_64-unknown-linux-gnu.tar.gz tar -xzf hatch.tar.gz mkdir -p $HOME/bin mv hatch $HOME/bin/hatch chmod +x $HOME/bin/hatch echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc export PATH="$HOME/bin:$PATH" hatch env create ``` -------------------------------- ### Jinja2 Whitespace Control Example Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md An example demonstrating the use of Jinja2 whitespace control characters (`-`) to prevent blank lines in generated SQL, ensuring cleaner output. ```jinja select column1, column2 {%- if config.get('extra_column') -%} , extra_column {%- endif %} from table ``` -------------------------------- ### Macro Test Example: Create Table Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md A macro test for the `create_table` macro, which generates SQL for creating a table. This test verifies the generated SQL against an expected output. ```python from tests.unit.macros.base import MacroTestBase class TestCreateTable(MacroTestBase): @pytest.fixture(scope="class") def template_name(self) -> str: return "create.sql" # File in macros/relations/table/ @pytest.fixture(scope="class") def macro_folders_to_load(self) -> list: return ["macros", "macros/relations/table"] def test_create_table_sql(self, template_bundle): result = self.run_macro(template_bundle.template, "create_table", template_bundle.relation, "select 1") expected = "create table `database`.`schema`.`table` as (select 1)" self.assert_sql_equal(result, expected) ``` -------------------------------- ### Databricks Notebook-Scoped Library Installation Commands Source: https://github.com/databricks/dbt-databricks/blob/main/docs/workflow-job-submission.md Shows the Python commands prepended to model code when notebook-scoped libraries are enabled. This includes %pip install for specified packages and dbutils.library.restartPython() to ensure proper loading. ```python %pip install -q pandas numpy==1.24.0 scikit-learn>=1.0 dbutils.library.restartPython() # Your model code follows... ``` -------------------------------- ### Cluster-Level Python Package Installation for dbt Models Source: https://github.com/databricks/dbt-databricks/blob/main/docs/workflow-job-submission.md Installs Python packages at the cluster level for dbt models. Packages are installed when the cluster starts and are available for the entire cluster lifecycle. This is the default behavior. ```yaml models: - name: my_model config: packages: ["pandas", "numpy==1.24.0", "scikit-learn>=1.0"] index_url: "https://pypi.org/simple" # Optional: custom PyPI index notebook_scoped_libraries: false # Default behavior ``` -------------------------------- ### Notebook-Scoped Python Package Installation for dbt Models Source: https://github.com/databricks/dbt-databricks/blob/main/docs/workflow-job-submission.md Installs Python packages at the notebook level using %pip install magic commands. This method is suitable for serverless compute and all-purpose clusters, allowing different models to use different package versions without cluster restarts. ```yaml models: - name: my_model config: packages: ["pandas", "numpy==1.24.0", "scikit-learn>=1.0"] index_url: "https://pypi.org/simple" # Optional: custom PyPI index notebook_scoped_libraries: true # Enable notebook-scoped installation ``` -------------------------------- ### dbt profiles.yml configuration for Databricks Source: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-jobs.md Example configuration for a dbt profiles.yml file to connect to a Databricks cluster. It uses environment variables for sensitive information and specifies connection parameters. ```yaml databricks_demo: target: databricks_cluster outputs: databricks_cluster: type: databricks connect_retries: 5 connect_timeout: 180 schema: "" threads: 8 # This can be increased or decreased to control the parallism host: "{{ env_var('DBT_HOST') }}" http_path: "sql/protocolv1/o/{{ env_var('DBT_ORG_ID') }}/{{ env_var('DBT_CLUSTER_ID') }}" token: "{{ env_var('DBT_ACCESS_TOKEN') }}" ``` -------------------------------- ### Implementing Test Cleanup with Fixtures Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Use pytest fixtures with `yield` to define setup and cleanup logic for tests. Ensure resources like tables created during tests are properly dropped. ```python class TestWithCleanup: @pytest.fixture(scope="class", autouse=True) def setUp(self, project): # Setup yield # Cleanup project.run_sql("DROP TABLE IF EXISTS my_test_table") ``` -------------------------------- ### Unit Test Example: Redact Credentials Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md A unit test for the `redact_credentials` utility function, which masks sensitive values in SQL queries. This test ensures that credentials are correctly redacted. ```python from dbt.adapters.databricks.utils import redact_credentials def test_redact_credentials(): sql = "WITH (credential ('KEY' = 'SECRET_VALUE'))" expected = "WITH (credential ('KEY' = '[REDACTED]'))" assert redact_credentials(sql) == expected ``` -------------------------------- ### Configure Databricks CLI and get task details Source: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-jobs.md Configure the Databricks CLI for API version 2.1 and retrieve task details for a specific job run. This is the first step to identify the dbt task's run ID. ```bash $ databricks jobs configure --version=2.1 $ databricks runs get --run-id TASK_RUN_ID | jq .tasks ``` -------------------------------- ### Reusable Job Cluster Configuration for dbt Models Source: https://github.com/databricks/dbt-databricks/blob/main/docs/workflow-job-submission.md Defines a reusable job cluster configuration for dbt models, including task settings, post-hook tasks, and job cluster specifications. This allows for consistent environment setup for dbt jobs. ```yaml models: - name: my_model config: python_job_config: additional_task_settings: { task_key: "task_a", job_cluster_key: "cluster_a" } post_hook_tasks: [ { depends_on: [{ "task_key": "task_a" }], task_key: "OPTIMIZE_AND_VACUUM", job_cluster_key: "cluster_a", notebook_task: { notebook_path: "/OPTIMIZE_AND_VACUUM", source: "WORKSPACE", base_parameters: { database: "{{ target.database }}", schema: "{{ target.schema }}", table_name: "my_model", }, }, }, ] job_clusters: [ { job_cluster_key: "cluster_a", new_cluster: { spark_version: "{{ var('dbr_versions')['lts_v14'] }}", node_type_id: "{{ var('cluster_node_types')['large_job'] }}", runtime_engine: "{{ var('job_cluster_defaults.runtime_engine') }}", autoscale: { "min_workers": 1, "max_workers": 2 }, }, }, ] ``` ``` -------------------------------- ### Running dbt-databricks Macros Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Demonstrates four methods for executing macros: basic execution with arguments, relation-first pattern, raw execution preserving whitespace, and a custom helper method for complex macros. ```python def test_macro_execution_patterns(self, template_bundle): # Method 1: Basic macro execution with arguments # Use for simple macros that take explicit arguments result = self.run_macro(template_bundle.template, "macro_name", "arg1", "arg2") # Method 2: Relation-first pattern (very common in dbt-databricks) # Use for macros that follow the pattern macro(relation, other_args...) # Automatically passes the test relation as the first argument result = self.render_bundle(template_bundle, "create_table_as", "select 1") # Method 3: Raw execution (preserves whitespace/formatting) # Use when you need to test exact formatting, spacing, or newlines # Most tests should use run_macro() instead, which normalizes whitespace raw_result = self.run_macro_raw(template_bundle.template, "macro_name", "arg1") # Method 4: Custom helper method for complex macros # Use for macros with many arguments or complex setup # Makes tests more readable and reduces duplication result = self.render_create_view_as(template_bundle, sql="select 1") ``` ```python def render_create_view_as(self, template_bundle, sql="select 1"): """Helper method reduces complexity in individual test methods""" return self.run_macro( template_bundle.template, "databricks__create_view_as", # Full macro name with adapter prefix template_bundle.relation, # Relation as first argument sql # SQL to wrap in view ) ``` -------------------------------- ### V1 View Flow Diagram Source: https://github.com/databricks/dbt-databricks/blob/main/docs/view_flow.md Illustrates the process for V1 view flow, including pre-hooks, dropping existing relations, creating or replacing views, applying grants, and post-hooks. ```mermaid flowchart LR PRE[Run pre-hooks] DROP[Drop existing relation] CREATE[Create or replace view] GRANTS[Apply grants] TAGS[Apply tags via alter] POST[Run post-hooks] D1{Existing relation, and not view?} PRE-->D1 D1--yes-->DROP D1--"no"-->CREATE DROP-->CREATE CREATE-->GRANTS GRANTS-->TAGS TAGS-->POST ``` -------------------------------- ### Run Unit Tests Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Execute the project's unit tests using tox. This does not require a Databricks account and should be run before submitting a pull request. ```bash tox -e unit ``` -------------------------------- ### Run dbt-databricks Unit Tests with Hatch Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Executes only the unit tests, which are fast and do not require external dependencies. This is useful for quick checks during development. ```bash # Unit tests only (fast) hatch run unit ``` -------------------------------- ### Configure dbt-databricks Test Environment Variables Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Sets up essential environment variables for connecting to Databricks test environments. Choose one authentication method and configure compute-specific paths and Unity Catalog settings as needed. ```bash # Required for all environments export DBT_DATABRICKS_HOST_NAME="your-workspace.databricks.com" # Authentication (choose one method) # Option 1: Personal Access Token export DBT_DATABRICKS_TOKEN="your-token" # Option 2: Databricks OAuth export DBT_DATABRICKS_CLIENT_ID="your-client-id" export DBT_DATABRICKS_CLIENT_SECRET="your-client-secret" # Option 3: Azure AD OAuth export DBT_DATABRICKS_AZURE_CLIENT_ID="your-azure-client-id" export DBT_DATABRICKS_AZURE_CLIENT_SECRET="your-azure-client-secret" # Compute-specific paths export DBT_DATABRICKS_CLUSTER_HTTP_PATH="/sql/protocolv1/o/.../..." export DBT_DATABRICKS_UC_CLUSTER_HTTP_PATH="/sql/protocolv1/o/.../..." export DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH="/sql/warehouses/..." # Unity Catalog settings export DBT_DATABRICKS_UC_INITIAL_CATALOG="main" export DBT_DATABRICKS_UC_INITIAL_SCHEMA="default_schema" ``` -------------------------------- ### Get dbt artifact URL and download archive Source: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-jobs.md Extract the dbt artifact URL from the task output and download the artifact archive using curl. Ensure you have the correct DBT_TASK_RUN_ID. ```bash $ DBT_ARTIFACT_URL=$(databricks runs get-output --run-id DBT_TASK_RUN_ID | jq -r .dbt_output.artifacts_link) $ curl $DBT_ARTIFACT_URL --output artifact.tar.gz ``` -------------------------------- ### V1 Table Flow Diagram Source: https://github.com/databricks/dbt-databricks/blob/main/docs/table_flow.md Illustrates the V1 table flow, detailing steps from pre-hooks to post-hooks, including decisions for relation existence, replaceability, and language. ```mermaid flowchart LR PRE[Run pre-hooks] DROP[Drop existing relation] CSQL[create table...] CSQLD[create or replace table...] CPY[Create table with python] GRANTS[Apply grants] PYTBL[Apply tblproperties via alter] TAGS[Apply tags via alter] DOCS[Persist docs via alter] CONST[Apply constraints via alter] OPT[Run optimize] POST[Run post-hooks] D1{Existing relation?} D2{Replaceable?} D3{Language?} D4{Delta?} D5{Language?} PRE-->D1 D1--yes-->D2 D1--"no"-->D3 D2--yes-->D3 D2--"no"-->DROP DROP-->D3 D3--SQL-->D4 D3--Python-->CPY D4--yes-->CSQLD D4--"no"-->CSQL CPY-->GRANTS CSQLD-->GRANTS CSQL-->GRANTS GRANTS-->D5 D5--Python-->PYTBL D5--SQL-->TAGS PYTBL-->TAGS TAGS-->DOCS DOCS-->CONST CONST-->OPT OPT-->POST ``` -------------------------------- ### Build dbt-databricks Project Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Use Hatch to build the project's distribution packages. This command also cleans the dist folder before building. ```bash hatch build -c ``` -------------------------------- ### dbt Functional Test Class Example Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md A Python test class for dbt functional tests. It defines models using fixtures and includes a test method to run dbt and verify results. ```python from dbt.tests import util from tests.functional.adapter.my_feature import fixtures class TestIncrementalModel: @pytest.fixture(scope="class") def models(self): return { "my_model.sql": fixtures.my_model_sql, "schema.yml": fixtures.my_schema_yml, } def test_incremental_run(self, project): results = util.run_dbt(["run"]) assert len(results) == 1 # Verify table exists and has expected data results = project.run_sql("select count(*) from my_model", fetch="all") assert results[0][0] == 1 ``` -------------------------------- ### Run Integration Tests with All-Purpose Cluster (UC) Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Execute integration tests using an all-purpose cluster with Unity Catalog enabled. This requires a Databricks account and specific compute resources. ```bash tox -e integration-databricks-uc-cluster ``` -------------------------------- ### Invoke databricks_copy_into Macro Source: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-copy-into-macro-aws.md Load a CSV file from S3 into a Databricks Delta table using the `databricks_copy_into` macro. This example specifies temporary AWS credentials, file format options like header and schema merging, and copy options. ```sql dbt run-operation databricks_copy_into --args " target_table: source: 's3:///path' file_format: csv source_credential: AWS_ACCESS_KEY: '' AWS_SECRET_KEY: '' AWS_SESSION_TOKEN: '' format_options: mergeSchema: 'true' header: true copy_options: mergeSchema: 'true' " ``` -------------------------------- ### Use Capability System for Databricks Version Checks Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Prefer using the adapter's capability system over direct version comparisons for checking Databricks features. This ensures compatibility across different Databricks Runtime versions. ```python if adapter.has_capability(DBRCapability.COMMENT_ON_COLUMN): ``` ```jinja {% if adapter.has_dbr_capability('comment_on_column') %} ``` -------------------------------- ### Run Linters and Formatters with Tox Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Execute all linting and static type checks using Tox. This command should be run before committing changes to ensure code quality and style adherence. ```shell tox -e linter ``` -------------------------------- ### Run Integration Tests with SQL Warehouse (UC) Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Execute integration tests using a SQL Warehouse with Unity Catalog enabled. This requires a Databricks account and specific compute resources. ```bash tox -e integration-databricks-uc-sql-endpoint ``` -------------------------------- ### Run Code Quality Checks Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Executes code formatting, linting, and type-checking using Hatch. This command should be run before committing changes. ```bash hatch run code-quality ``` -------------------------------- ### SQL Comparison and Assertion Methods Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Explains three methods for comparing generated SQL: exact string comparison (preferred for handling differences), manual normalization for debugging, and partial matching for testing specific elements. ```python def test_sql_comparison_methods(self, template_bundle): result = self.run_macro(template_bundle.template, "create_table", template_bundle.relation) # Method 1: Exact string comparison (after normalization) - PREFERRED # Use for most tests - handles whitespace, case, and formatting differences # Provides clear error messages showing exactly what differs expected = "create table `some_database`.`some_schema`.`some_table` as (select 1)" self.assert_sql_equal(result, expected) # Method 2: Manual normalization for debugging # Use when assert_sql_equal fails and you need to see the normalized versions # Helpful for understanding why a test is failing clean_result = self.clean_sql(result) clean_expected = self.clean_sql(expected) assert clean_result == clean_expected, f"Expected: {clean_expected}, Got: {clean_result}" # Method 3: Partial matching for complex queries # Use when testing specific SQL elements without caring about exact formatting # Good for testing that required elements are present assert "`some_database`.`some_schema`.`some_table`" in result assert "create table" in result.lower() # Method 4: Complex validation # Use for testing multiple specific requirements lines = result.split('\n') assert any('CREATE TABLE' in line.upper() for line in lines) assert any('TBLPROPERTIES' in line.upper() for line in lines) ``` -------------------------------- ### Run dbt-databricks Tests with Hatch Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Use hatch commands to specify the correct test profile for different Databricks environments when running tests from the command line. ```bash hatch run cluster-e2e-dev # Tests with databricks_cluster ``` ```bash hatch run uc-cluster-e2e-dev # Tests with databricks_uc_cluster (CI uses sharded prepare-shards path) ``` ```bash hatch run sqlw-e2e-dev # Tests with databricks_uc_sql_endpoint (CI uses sharded prepare-shards path) ``` -------------------------------- ### Hatch Environment Management Commands Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Provides commands for managing development environments with Hatch, including creation, removal, and entering the environment shell. ```bash # Environment management hatch env create hatch env remove hatch shell ``` -------------------------------- ### Troubleshoot Hatch Environments Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md If Hatch is not reflecting changes in pyproject.toml, prune and recreate the environments to resolve issues. ```bash hatch env prune ``` ```bash hatch env create ``` -------------------------------- ### Run Relevant Tests Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Executes unit tests and functional tests as needed. Unit tests should always be run, while functional tests are for adapter changes. ```bash hatch run unit hatch run cluster-e2e-dev ``` -------------------------------- ### Run Integration Tests with All-Purpose Cluster Source: https://github.com/databricks/dbt-databricks/blob/main/CONTRIBUTING.MD Execute integration tests using a standard all-purpose cluster. This requires a Databricks account and specific compute resources. ```bash tox -e integration-databricks-cluster ``` -------------------------------- ### V2 Table Flow Diagram Source: https://github.com/databricks/dbt-databricks/blob/main/docs/table_flow.md Illustrates the V2 table flow, featuring more detailed steps like intermediate materialization, staging tables, and specific constraint/tagging applications. ```mermaid flowchart LR PRE[Run pre-hooks] CLEANUP[Remove existing staging] INT[Create intermediate materialization of model via SQL] INTPY[Create intermediate materialization of model via Python] STAGE[Create staging table by model schema] FINAL[Create or replace target table by model schema] DROP[Drop existing relation] CHECK[Add check constraints via alter to target table] CHECK2[Add check constraints via alter to staging table] INSERT[Insert intermediate materialization into target table] INSERT2[Insert intermediate materialization into staging table] RENAME[Rename existing to backup] RENAME2[Rename staging to target] DROP2[Drop backup] TAGS[Apply tags via alter to target] TAGS2[Apply tags via alter to staging] GRANTS[Apply grants] OPT[Run optimize] POST[Run post-hooks] F1{{Flag: CanRename}} D0{Language?} D1{Existing relation?} D2{Replaceable?} CLEANUP-->PRE PRE-->D0 D0--SQL-->INT D0--Python-->INTPY INT-->D1 INTPY-->D1 D1--yes-->F1 D1--"no"-->FINAL F1--yes-->STAGE F1--"no"-->D2 D2--"yes"-->FINAL D2--"no"-->DROP DROP-->FINAL FINAL-->CHECK STAGE-->CHECK2 CHECK-->TAGS TAGS-->INSERT CHECK2-->TAGS2 TAGS2-->INSERT2 INSERT2-->RENAME RENAME-->RENAME2 RENAME2-->DROP2 DROP2-->GRANTS INSERT-->GRANTS GRANTS-->OPT OPT-->POST ``` -------------------------------- ### Check dbt-databricks Version Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Display the current version of the dbt-databricks project managed by Hatch. ```bash hatch version ``` -------------------------------- ### Check Databricks Runtime Capabilities in Python Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Demonstrates how to check for specific Databricks Runtime (DBR) capabilities in Python code using the adapter's `has_capability` method. ```python # In Python code if adapter.has_capability(DBRCapability.ICEBERG): # Use Iceberg features ``` -------------------------------- ### Essential Hatch Commands for Development Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Runs code quality checks, unit tests, and functional tests using Hatch. These commands ensure code integrity and functionality. ```bash hatch run code-quality hatch run unit hatch run cluster-e2e-dev ``` -------------------------------- ### Optional Macro Test Configuration Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Provides optional fixtures for loading additional Databricks or Spark templates, useful for macro dependencies. ```python @pytest.fixture(scope="class") def databricks_template_names(self) -> list: """Load additional Databricks templates for macro dependencies""" return ["adapters.sql", "relations/table.sql"] ``` ```python @pytest.fixture(scope="class") def spark_template_names(self) -> list: """Load Spark templates when inheriting from dbt-spark macros""" return ["adapters.sql"] ``` -------------------------------- ### Generate HTML Coverage Report Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Run the 'unit-with-cov' hatch command to generate an HTML report detailing test coverage. This report helps in identifying areas of the codebase that are not adequately tested. ```bash hatch run unit-with-cov # Opens HTML coverage report ``` -------------------------------- ### Run dbt-databricks Development Commands Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Execute common development tasks for the dbt-databricks adapter, including code quality checks, unit tests, and functional tests. For specific tests, use pytest directly. ```bash hatch run code-quality # Format, lint, type-check hatch run unit # Run unit tests hatch run cluster-e2e-dev # Run functional tests # For specific tests, use pytest directly: hatch run pytest path/to/test_file.py::TestClass::test_method -v ``` -------------------------------- ### Databricks Adapter Inheritance Chain Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Illustrates the inheritance hierarchy of the DatabricksAdapter within the dbt ecosystem. ```text DatabricksAdapter (impl.py) ↳ SparkAdapter (from dbt-spark) ↳ SQLAdapter (from dbt-core) ↳ BaseAdapter (from dbt-core) ``` -------------------------------- ### Test Table Constraints with Arbitrary SQL Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md This test verifies table constraints by creating a table with constraints and querying the information schema to confirm their presence. ```python def test_table_constraints(self, project): # Create a table with constraints project.run_sql(""" CREATE TABLE test_constraints ( id INT NOT NULL, name STRING ) USING DELTA """) # Query information schema to verify constraints constraints = project.run_sql( f""" SELECT constraint_name, constraint_type FROM information_schema.table_constraints WHERE table_schema = '{project.test_schema}' AND table_name = 'test_constraints' """, fetch="all" ) assert len(constraints) > 0 assert any('NOT NULL' in str(constraint) for constraint in constraints) ``` -------------------------------- ### Test SQL Generation Macro Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Test the SQL generation macro using string assertions without needing to mock adapter calls. This verifies that the macro produces the correct SQL string. ```python def test_metadata_sql_generation(self, template_bundle): """Test SQL generation without mocking adapter calls""" result = self.run_macro(template_bundle.template, "get_table_metadata_sql", template_bundle.relation) expected = "describe table extended `some_database`.`some_schema`.`some_table`" self.assert_sql_equal(result, expected) ``` -------------------------------- ### Define SQL and YAML Fixtures for dbt Tests Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Use a Python file to define SQL models and YAML schemas for dbt functional tests. This promotes reusability and keeps tests clean. ```python my_model_sql = """ {{ config(materialized='incremental', unique_key='id') }} select 1 as id, 'test' as name """ my_schema_yml = """ version: 2 models: - name: my_model columns: - name: id description: 'ID column' """ ``` -------------------------------- ### Define Catalog for a Source in dbt Source: https://github.com/databricks/dbt-databricks/blob/main/docs/uc.md Specify the Unity Catalog for a dbt source by using the 'catalog' keyword. Ensure schema is defined separately. ```yaml version: 2 sources: - name: jaffle_shop catalog: ecommerce schema: product tables: - name: items - name: users ``` -------------------------------- ### V2 View Flow Diagram Source: https://github.com/databricks/dbt-databricks/blob/main/docs/view_flow.md Details the V2 view flow, which includes altering existing views, creating new views, applying grants, and running post-hooks. It also highlights the option to use a 'replace flow' for certain scenarios. ```mermaid flowchart LR PRE[Run pre-hooks] ALTER[Alter existing view] CREATE[Create view] GRANTS[Apply grants] TAGS[Apply tags via alter] POST[Run post-hooks] REPLACE_FLOW[Use replace flow] D1{Existing relation?} D2{{Existing relation is view and update_via_alter?}} D4{Matches project definition?} PRE-->D1 D1--yes-->D2 D1--"no"-->CREATE D2--yes-->D4 D2--"no"-->REPLACE_FLOW REPLACE_FLOW-->TAGS D4--yes-->GRANTS D4--"no"-->ALTER ALTER-->GRANTS CREATE-->TAGS TAGS-->GRANTS GRANTS-->POST ``` -------------------------------- ### Define SQL and CSV Fixtures Separately Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md This pattern separates SQL and CSV content into dedicated fixture files for improved readability and reusability across multiple test files. ```python # tests/functional/adapter/incremental/fixtures.py incremental_model_sql = """ {{ config(materialized='incremental', unique_key='id') }} select 1 as id, 'Alice' as name, 'active' as status union all select 2 as id, 'Bob' as name, 'inactive' as status union all select 3 as id, 'Charlie' as name, 'active' as status """ expected_result_csv = """id,name,status 1,Alice,active 2,Bob,inactive 3,Charlie,active""" ``` ```python # tests/functional/adapter/incremental/test_incremental.py from tests.functional.adapter.incremental import fixtures class TestIncrementalModel: @pytest.fixture(scope="class") def seeds(self): return {"expected_result.csv": fixtures.expected_result_csv} @pytest.fixture(scope="class") def models(self): return {"incremental_model.sql": fixtures.incremental_model_sql} def test_incremental_behavior(self, project): # Test implementation same as above ``` -------------------------------- ### Test Macro Execution Logic Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Test the execution macro by mocking the `run_query` function to verify that it correctly processes and returns results. This focuses on the logic within the execution macro. ```python def test_metadata_execution(self, template_bundle): """Test execution logic with minimal mocking""" template_bundle.context["run_query"] = Mock( return_value=Mock(table=[["col1", "string"], ["col2", "int"]]) ) result = self.run_macro(template_bundle.template, "get_table_metadata", template_bundle.relation) # Test that it returns the table results correctly assert len(result) == 2 ``` -------------------------------- ### Check Databricks Runtime Capabilities in Jinja Macros Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Shows how to conditionally execute SQL based on Databricks Runtime (DBR) capabilities within Jinja macros. ```jinja {% if adapter.has_dbr_capability('comment_on_column') %} COMMENT ON COLUMN ... {% else %} ALTER TABLE ... ALTER COLUMN ... {% endif %} {% if adapter.has_dbr_capability('insert_by_name') %} INSERT INTO table BY NAME SELECT ... {% else %} INSERT INTO table SELECT ... -- positional {% endif %} ``` -------------------------------- ### Manipulate Model Configuration for Macro Testing Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Use this when testing macros that read model configuration or adapter settings. It allows simulating different table properties, storage formats, or clustering configurations. ```python def test_macro_with_config(self, config, template_bundle): """Test how macros respond to different model configurations""" config["tblproperties"] = {"key": "value"} # Model-level table properties config["file_format"] = "delta" # Storage format setting config["liquid_clustering"] = ["col1", "col2"] # Clustering configuration result = self.run_macro(template_bundle.template, "create_table_as", template_bundle.relation, "select 1") # Verify the generated SQL includes the configuration assert "tblproperties" in result assert "delta" in result.lower() ``` -------------------------------- ### Test Materialization Macro Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Pattern for testing materialization macros. This involves setting configuration options like 'materialized' and 'file_format' before rendering the macro. ```python # Pattern 1: Testing materialization macros def test_materialization_macro(self, config, template_bundle): config["materialized"] = "table" config["file_format"] = "delta" result = self.render_bundle(template_bundle, "materialization_macro") ``` -------------------------------- ### Create Temporary Table for Testing Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Use the project's test schema to create temporary tables for testing purposes. This ensures that test data does not persist and is isolated to the test run. ```python def test_creates_temp_table(self, project): table_name = f"{project.database}.{project.test_schema}.temp_table" project.run_sql(f"CREATE TABLE {table_name} (id INT)") # Test logic ``` -------------------------------- ### Run dbt-databricks Functional Tests with Hatch Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Executes functional (integration) tests against specific Databricks environments for local development. These tests require a connection to a Databricks cluster or SQL Warehouse. ```bash # Functional tests by environment (local development) hatch run cluster-e2e-dev # HMS cluster (CI runs this sharded) hatch run uc-cluster-e2e-dev # Unity Catalog cluster (CI runs this sharded) hatch run sqlw-e2e-dev # SQL Warehouse (CI runs this sharded) ``` -------------------------------- ### Run dbt-databricks Unit Tests Across Python Versions with Hatch Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Executes unit tests across multiple Python versions (3.9-3.12) to ensure compatibility. This command is typically used to verify broader Python environment support. ```bash # Unit tests across Python versions (for compatibility) hatch run test:unit # All Python versions (3.9-3.12) ``` -------------------------------- ### Hatch Testing Commands Source: https://github.com/databricks/dbt-databricks/blob/main/docs/dbt-databricks-dev.md Commands for running various types of tests using Hatch, including unit tests across different Python versions and functional tests for different environments. ```bash # Testing hatch run unit hatch run test:unit hatch run cluster-e2e-dev hatch run uc-cluster-e2e-dev hatch run sqlw-e2e-dev ``` -------------------------------- ### Test Temporary AWS Credentials Source: https://github.com/databricks/dbt-databricks/blob/main/docs/databricks-copy-into-macro-aws.md Verify that your temporary AWS credentials are valid by listing objects in a specified S3 bucket and path. Replace placeholders with your temporary credentials and bucket details. ```bash $ AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= AWS_SESSION_TOKEN= aws s3 ls s3:///path ``` -------------------------------- ### Test Adapter Dispatch Macro Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Pattern for testing adapter-specific macros. This involves calling a macro that is expected to be dispatched by the adapter. ```python # Pattern 2: Testing adapter dispatch def test_adapter_specific_macro(self, template_bundle): result = self.run_macro(template_bundle.template, "databricks__specific_macro") ``` -------------------------------- ### Test Macros with Database Operations Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Test macros that interact with the database during rendering by mocking adapter calls. This allows testing macro logic without actual database execution. ```python class TestAdvancedMacroPatterns(MacroTestBase): @pytest.fixture(scope="class") def template_name(self) -> str: return "adapters.sql" @pytest.fixture(scope="class") def databricks_template_names(self) -> list: # Load dependencies when testing complex macros that call other macros # Essential for materialization macros that use table/view creation macros return ["relations/table.sql", "materializations/table.sql"] @pytest.fixture(autouse=True, scope="function") def setup_mocks(self, context): """Auto-setup common mocks for all tests in this class Use autouse fixtures when all tests in a class need the same mocks. Reduces repetition and ensures consistent test setup. """ context["adapter"].get_columns_in_relation = Mock(return_value=[]) context["load_result"] = Mock(return_value={"data": []}) def test_macro_with_database_operations(self, template_bundle): """Test macros that make adapter calls during rendering Some macros call adapter.run_query() to get metadata or check table state. Mock these calls to test the macro logic without database dependencies. """ # Mock adapter calls that the macro will make template_bundle.context["adapter"].run_query = Mock( return_value=Mock(table=[["result1"], ["result2"]]) ) result = self.run_macro(template_bundle.template, "get_table_metadata", template_bundle.relation) # Verify both the SQL generated and adapter calls made expected_sql = "describe table extended `some_database`.`some_schema`.`some_table`" self.assert_sql_equal(result, expected_sql) # Verify adapter was called correctly (important for side-effect testing) template_bundle.context["adapter"].run_query.assert_called_once() ``` -------------------------------- ### Upgrade dbt-databricks Adapter Source: https://github.com/databricks/dbt-databricks/blob/main/README.md Upgrade to the latest version of the dbt-databricks adapter using pip. ```bash pip install --upgrade dbt-databricks ``` -------------------------------- ### Jinja Macro for Multi-Statement SQL Source: https://github.com/databricks/dbt-databricks/blob/main/AGENTS.md Use this pattern to define a macro that returns a list of SQL statements for multi-statement execution. The incremental materialization automatically handles lists of statements. ```jinja {% macro my_multi_statement_strategy(args) %} {%- set statements = [] -%} {#-- Build first statement --#} {%- set statement1 -%} DELETE FROM {{ target_relation }} WHERE some_condition {%- endset -%} {%- do statements.append(statement1) -%} {#-- Build second statement --#} {%- set statement2 -%} INSERT INTO {{ target_relation }} SELECT * FROM {{ source_relation }} {%- endset -%} {%- do statements.append(statement2) -%} {{- return(statements) -}} {% endmacro %} ``` -------------------------------- ### dbt-databricks Test Directory Structure Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Illustrates the standard organization for unit and functional tests within the dbt-databricks project. Includes conftest.py for global pytest configuration and profiles.py for test profile settings. ```directory tests/ ├── unit/ # Unit tests (no external dependencies) │ ├── api_client/ # API client functionality │ ├── relation_configs/ # Relation configuration logic │ ├── python/ # Python model handling │ ├── macros/ # Macro testing framework │ └── test_adapter.py # Core adapter functionality ├── functional/ # Integration tests (requires Databricks) │ └── adapter/ # Adapter-specific functionality │ ├── basic/ # Basic model operations │ ├── incremental/ # Incremental materialization │ ├── python_model/ # Python model execution │ ├── streaming_tables/ # Streaming table features │ └── ... # Feature-specific test suites ├── conftest.py # Global pytest configuration, including the default compute used when running tests in your IDE └── profiles.py # Test profile configurations ``` -------------------------------- ### Mock External Dependencies for Macro Testing Source: https://github.com/databricks/dbt-databricks/blob/main/docs/testing.md Use this when testing macros that call other dbt functions or adapter methods. It isolates the macro under test from external dependencies by mocking responses. ```python def test_macro_with_context_mocks(self, template_bundle): """Mock external dependencies to isolate macro behavior""" # Mock dbt built-in functions that the macro calls template_bundle.context["get_columns_in_query"] = Mock(return_value=[]) template_bundle.context["column_mask_exists"] = Mock(return_value=False) # Mock adapter methods to simulate database state template_bundle.context["adapter"].get_relation = Mock(return_value=None) template_bundle.context["adapter"].run_query = Mock(return_value=Mock(table=[])) result = self.run_macro(template_bundle.template, "my_macro", template_bundle.relation) # Test that the macro handles the mocked responses correctly ```