### Install SSL Certificates for Python on macOS

Source: https://intugle.github.io/data-tools/docs/getting-started

Installs SSL certificates for Python installations from python.org on macOS. Replace '3.XX' with your specific Python version. Not needed for Homebrew Python.

```bash
/Applications/Python\ 3.XX/Install\ Certificates.command
```

--------------------------------

### Configure LLM Environment Variables

Source: https://intugle.github.io/data-tools/docs/getting-started

Sets the environment variables for the LLM provider and API key. This example uses OpenAI's GPT-3.5 Turbo model.

```bash
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```
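
As a rough illustration of the `provider:model` convention used in `LLM_PROVIDER`, the sketch below reads both variables from Python and splits the value at the first colon. How Intugle itself parses the value is not shown in the source; `split_provider` is a hypothetical helper used only for this example.

```python
import os


def split_provider(value: str) -> tuple[str, str]:
    """Split a 'provider:model' string at the first colon (assumed format)."""
    provider, _, model = value.partition(":")
    return provider, model


# Fall back to the example value from the snippet above if the variable is unset.
raw = os.environ.get("LLM_PROVIDER", "openai:gpt-3.5-turbo")
provider, model = split_provider(raw)
print(f"provider={provider!r} model={model!r}")

# Report only whether the key is present; never print the key itself.
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
```

Running a check like this before importing Intugle can catch a missing or misspelled variable early.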

--------------------------------

### Install Intugle Package with Pip

Source: https://intugle.github.io/data-tools/docs/getting-started

Installs the 'intugle' Python package using pip. This command should be run after activating the virtual environment.

```bash
pip install intugle
```

--------------------------------

### Create and Activate Python Virtual Environment

Source: https://intugle.github.io/data-tools/docs/getting-started

Creates a Python virtual environment named '.venv' and activates it. This is a standard practice for managing project dependencies.

```bash
python -m venv .venv
source .venv/bin/activate
```

--------------------------------

### Install libomp on macOS with Homebrew

Source: https://intugle.github.io/data-tools/docs/getting-started

Installs the 'libomp' library on macOS using the Homebrew package manager. This is a dependency for some Python packages on macOS.

```bash
brew install libomp
```

--------------------------------

### Qdrant Vector Database Setup (Bash)

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Shell commands to start a Qdrant vector database using Docker. It maps ports, mounts a volume for persistent storage, and names the container. This is a prerequisite for certain semantic search configurations.

```bash
# Prerequisites: Run Qdrant vector database
docker run -d -p 6333:6333 -p 6334:6334 \
    -v qdrant_storage:/qdrant/storage:z \
    --name qdrant qdrant/qdrant
```

--------------------------------

### Inject Custom LLM Instance in Intugle Settings

Source: https://intugle.github.io/data-tools/docs/getting-started

Demonstrates how to inject a pre-initialized custom LLM instance into Intugle's settings before importing Intugle modules. The custom LLM must inherit from `langchain_core.language_models.chat_models.BaseChatModel`.

```python
# main.py
from intugle.core import settings

# This must be an object that inherits from BaseChatModel
my_llm_instance = ...

# Set the custom instance in the settings
settings.CUSTOM_LLM_INSTANCE = my_llm_instance

# Now, any intugle modules imported after this point will use your custom LLM

# ... rest of your code
```

--------------------------------

### Run Qdrant Instance using Docker

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-search

This command starts a Qdrant vector database instance using Docker. It maps ports 6333 and 6334 for Qdrant communication and uses a Docker volume for persistent storage. Ensure Docker is installed and running on your system.

```bash
docker run -d -p 6333:6333 -p 6334:6334 \
    -v qdrant_storage:/qdrant/storage:z \
    --name qdrant qdrant/qdrant
```

--------------------------------

### Start MCP Server for Vibe Coding

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Provides the command to start the MCP server, which facilitates natural language data product generation. The server runs on localhost:8000 and exposes a semantic layer endpoint.

```bash
# Start MCP server
intugle-mcp

# Server runs on localhost:8000
# Endpoint: http://localhost:8000/semantic_layer/mcp
```

--------------------------------

### Initialize and Use DataSet Object

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Provides an example of initializing a `DataSet` object with data source information, running individual pipeline stages like profiling and key identification, and accessing table-level metadata.

```python
from intugle.analysis.models import DataSet

# Initialize dataset
data_source = {"path": "data/patients.csv", "type": "csv"}
dataset = DataSet(data_source, name="patients")

# Run pipeline stages individually
dataset.profile(save=True)
dataset.identify_datatypes(save=True)
dataset.identify_keys(save=True)
dataset.generate_glossary(domain="Healthcare", save=True)

# Access table-level metadata
print(f"Table Name: {dataset.source_table_model.name}")
print(f"Primary Key: {dataset.source_table_model.key}")
print(f"Description: {dataset.source_table_model.description}")
```

--------------------------------

### Start MCP Server

Source: https://intugle.github.io/data-tools/docs/vibe-coding

This command initiates the built-in MCP server from the project's root directory. The server runs on localhost:8000 by default and mounts the 'semantic_layer' and 'adapter' services, making the semantic layer accessible to AI assistants.

```bash
intugle-mcp
```

--------------------------------

### Install Intugle Python Package

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Demonstrates how to install the Intugle Python library, including optional support for Snowflake and Databricks, and macOS-specific dependencies.

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install base package
pip install intugle

# Install with Snowflake support
pip install "intugle[snowflake]"

# Install with Databricks support
pip install "intugle[databricks]"

# macOS users: install libomp
brew install libomp
```

--------------------------------

### Defining Optional Dependencies in pyproject.toml

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Illustrates how to add a custom adapter's required driver library as an optional dependency in the `pyproject.toml` file, enabling installation via `pip install "intugle[myconnector]"`.

```toml
# In pyproject.toml

[project.optional-dependencies]
# ... other dependencies
myconnector = ["myconnector-driver-library>=1.0.0"]
```

--------------------------------

### Initializing and Running DataSet Stages

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/dataset

Shows how to initialize a DataSet with a data source (a CSV file in this example) and run the analysis stages in sequence: profiling, datatype identification, key identification, and glossary generation. Passing `save=True` saves progress after each stage. Dataset names should not contain whitespace.

```python
from intugle.analysis.models import DataSet

# Initialize the dataset
data_source = {"path": "path/to/my_data.csv", "type": "csv"}
dataset = DataSet(data_source, name="my_data")

# Run each stage individually and save progress
print("Step 1: Profiling...")
dataset.profile(save=True)

print("Step 2: Identifying Datatypes...")
dataset.identify_datatypes(save=True)

print("Step 3: Identifying Keys...")
dataset.identify_keys(save=True)

print("Step 4: Generating Glossary...")
dataset.generate_glossary(domain="my_domain", save=True)
```
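
Because dataset names should not contain whitespace, it can help to normalize a name before passing it to `DataSet`. The following is a minimal sketch; `normalize_dataset_name` is a hypothetical helper and not part of Intugle.

```python
import re


def normalize_dataset_name(name: str) -> str:
    """Replace runs of whitespace with underscores (hypothetical helper)."""
    return re.sub(r"\s+", "_", name.strip())


print(normalize_dataset_name("my data"))
```

The normalized value can then be passed as the `name` argument shown in the snippet above.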

--------------------------------

### Build and Deploy Semantic Model with Snowflake

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Shows how to initialize a `SemanticModel` with Snowflake datasets, build the model, and deploy it back to Snowflake. Includes configuration example for external Snowflake connections.

```python
from intugle import SemanticModel

# Snowflake datasets
datasets = {
    "CUSTOMERS": {
        "identifier": "CUSTOMERS",  # Must match key
        "type": "snowflake"
    },
    "ORDERS": {
        "identifier": "ORDERS",
        "type": "snowflake"
    }
}

sm = SemanticModel(datasets, domain="E-commerce")
sm.build()

# Deploy to Snowflake: sync metadata and create semantic view
sm.deploy(target="snowflake")

# Custom semantic view name
sm.deploy(target="snowflake", model_name="my_custom_semantic_view")
```

```yaml
# profiles.yml for external Snowflake connection
snowflake:
  type: snowflake
  account: your_snowflake_account
  user: your_username
  password: your_password
  role: your_role
  warehouse: your_warehouse
  database: your_database
  schema: your_schema
```
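
The comment in the Python snippet above notes that each `identifier` must match its dictionary key. One way to keep the two in sync is to generate the mapping from a list of table names. This is a sketch only: `snowflake_datasets` is a hypothetical helper, and the table names are the placeholders from the example.

```python
def snowflake_datasets(table_names):
    """Build the datasets mapping so that each identifier equals its key."""
    return {name: {"identifier": name, "type": "snowflake"} for name in table_names}


datasets = snowflake_datasets(["CUSTOMERS", "ORDERS"])
print(datasets)
```

The resulting dictionary has the same shape as the hand-written `datasets` mapping above.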

--------------------------------

### Configure LLM Provider for Intugle

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Shows how to configure the Large Language Model (LLM) provider and API key for Intugle, using OpenAI as an example.

```bash
# Configure LLM provider
export LLM_PROVIDER="openai:gpt-3.5-turbo"
export OPENAI_API_KEY="your-openai-api-key"
```

--------------------------------

### Generate Data Product Specification with Prompt

Source: https://intugle.github.io/data-tools/docs/vibe-coding

This example demonstrates invoking the 'create-dp' prompt within an MCP-compatible client to generate a 'product_spec' dictionary. The prompt takes a natural language request, such as 'show me the top 5 patients with the most claims', and utilizes tools like 'get_tables' and 'get_schema' to construct the specification.

```text
/create-dp show me the top 5 patients with the most claims
```

```json
{
  "name": "top_5_patients_by_claims",
  "fields": [
    {
      "id": "patients.first",
      "name": "first_name"
    },
    {
      "id": "patients.last",
      "name": "last_name"
    },
    {
      "id": "claims.id",
      "name": "number_of_claims",
      "category": "measure",
      "measure_func": "count"
    }
  ],
  "filter": {
    "sort_by": [
      {
        "id": "claims.id",
        "alias": "number_of_claims",
        "direction": "desc"
      }
    ],
    "limit": 5
  }
}
```
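
Since the specification is produced by an AI assistant, a quick sanity check before building can be useful. The sketch below loads the JSON above into a dictionary and verifies that every `alias` used in `sort_by` matches a field `name`. That rule is an assumption inferred from the example, and `unknown_sort_aliases` is a hypothetical helper rather than an Intugle function.

```python
import json

spec_json = """
{
  "name": "top_5_patients_by_claims",
  "fields": [
    {"id": "patients.first", "name": "first_name"},
    {"id": "patients.last", "name": "last_name"},
    {"id": "claims.id", "name": "number_of_claims",
     "category": "measure", "measure_func": "count"}
  ],
  "filter": {
    "sort_by": [
      {"id": "claims.id", "alias": "number_of_claims", "direction": "desc"}
    ],
    "limit": 5
  }
}
"""


def unknown_sort_aliases(spec):
    """Return sort_by aliases that do not match any field name."""
    names = {field["name"] for field in spec["fields"]}
    sort_by = spec.get("filter", {}).get("sort_by", [])
    return [s["alias"] for s in sort_by if "alias" in s and s["alias"] not in names]


product_spec = json.loads(spec_json)
print(unknown_sort_aliases(product_spec))
```

An empty list means every sort alias refers to a declared field.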

--------------------------------

### Install Intugle with Databricks Dependencies

Source: https://intugle.github.io/data-tools/docs/connectors/databricks

Installs the Intugle library with optional dependencies required for Databricks integration. This includes PySpark, sqlglot, and databricks-sql-connector. Ensure you have Python and pip installed.

```bash
pip install "intugle[databricks]"
```

--------------------------------

### Join and Filter Across Tables for Patient Conditions

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product/advanced-examples

This example shows how to join tables and apply filters across them to answer complex questions. It finds patients from Boston diagnosed with conditions containing the word 'fracture'. The snippet defines a product specification and its corresponding generated SQL, demonstrating implicit joins and filtering on fields from different tables.

```python
product_spec = {
    "name": "conditions_of_boston_patients",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "conditions.description", "name": "condition"},
    ],
    "filter": {
        "selections": [
            {"id": "patients.city", "values": ["Boston"]},
        ],
        "wildcards": [
            {
                "id": "conditions.description",
                "value": "fracture",
                "option": "contains",
            }
        ],
        "limit": 10,
    },
}
```

```sql
SELECT
  "patients"."first" as first_name,
  "patients"."last" as last_name,
  "conditions"."description" as condition
FROM conditions
LEFT JOIN patients
  ON "conditions"."patient" = "patients"."id"
WHERE ("patients"."city" IN ('Boston',))
  AND "conditions"."description" LIKE '%fracture%'
LIMIT 10
```
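
To see what a query of this shape returns, the sketch below runs an equivalent statement against an in-memory SQLite database. The tables hold made-up rows, and two details differ from the generated SQL: the trailing comma inside `IN (...)` is dropped because SQLite rejects it, and `LIKE` is case-insensitive for ASCII text in SQLite, which other backends may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema and rows, invented for this illustration only.
conn.executescript(
    """
    CREATE TABLE patients ("id" TEXT PRIMARY KEY, "first" TEXT, "last" TEXT, "city" TEXT);
    CREATE TABLE conditions ("patient" TEXT, "description" TEXT);
    INSERT INTO patients VALUES
        ('p1', 'Ada', 'Lopez', 'Boston'),
        ('p2', 'Ben', 'Chen', 'Quincy');
    INSERT INTO conditions VALUES
        ('p1', 'Fracture of forearm'),
        ('p1', 'Acute bronchitis'),
        ('p2', 'Fracture of ankle');
    """
)

query = """
SELECT
  "patients"."first" AS first_name,
  "patients"."last" AS last_name,
  "conditions"."description" AS condition
FROM conditions
LEFT JOIN patients
  ON "conditions"."patient" = "patients"."id"
WHERE "patients"."city" IN ('Boston')
  AND "conditions"."description" LIKE '%fracture%'
LIMIT 10
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Only the Boston patient's fracture condition survives both filters: the other condition for that patient fails the `LIKE` test, and the second patient fails the city test.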

--------------------------------

### Semantic Search - Standalone Client Initialization (Python)

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Demonstrates how to initialize the SemanticSearch client in Python. It can load configurations from default .yml files or from a custom project path, allowing for flexible setup of semantic search capabilities.

```python
# Standalone semantic search
from intugle.semantic_search import SemanticSearch

# Initialize (loads from .yml files)
search_client = SemanticSearch()

# Or specify custom path
# search_client = SemanticSearch(project_base="/path/to/models")
```

--------------------------------

### Combine Multiple Filters for Patients Data

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product/advanced-examples

This example applies multiple filter criteria to find male patients in Boston. Filter criteria go in the 'selections' and 'wildcards' lists within the 'filter' object; this example uses only 'selections'. By default, conditions are combined with an 'AND' operator. The snippet defines a product specification for a data product and shows the generated SQL.

```python
product_spec = {
    "name": "male_patients_in_boston",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "patients.city", "name": "city"},
        {"id": "patients.gender", "name": "gender"},
    ],
    "filter": {
        "selections": [
            {"id": "patients.city", "values": ["Boston"]},
            {"id": "patients.gender", "values": ["M"]},
        ],
        "limit": 10,
    },
}
```

```sql
SELECT
  "patients"."first" as first_name,
  "patients"."last" as last_name,
  "patients"."city" as city,
  "patients"."gender" as gender
FROM patients
WHERE ("patients"."city" IN ('Boston',))
  AND ("patients"."gender" IN ('M',))
LIMIT 10
```
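
The default 'AND' combination can be pictured as joining one `IN` clause per selection. The sketch below is purely illustrative: `selections_to_where` is a hypothetical function, not Intugle's SQL generator, and it builds the string only for display (real queries should use bound parameters).

```python
def selections_to_where(selections):
    """Render each selection as an IN clause and join the clauses with AND."""
    clauses = []
    for selection in selections:
        table, column = selection["id"].split(".", 1)
        # Quote each value as a SQL string literal, doubling embedded quotes.
        values = ", ".join(
            "'" + str(value).replace("'", "''") + "'" for value in selection["values"]
        )
        clauses.append(f'("{table}"."{column}" IN ({values}))')
    return " AND ".join(clauses)


selections = [
    {"id": "patients.city", "values": ["Boston"]},
    {"id": "patients.gender", "values": ["M"]},
]
where_clause = selections_to_where(selections)
print(where_clause)
```

Each additional entry in 'selections' adds one more clause, narrowing the result further.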

--------------------------------

### Configure Snowflake Connection via profiles.yml

Source: https://intugle.github.io/data-tools/docs/connectors/snowflake

Example 'profiles.yml' file demonstrating how to provide Snowflake connection credentials for external environments. This includes account, user, password, role, warehouse, database, and schema.

```yaml
snowflake:
  type: snowflake
  account: <your_snowflake_account>
  user: <your_username>
  password: <your_password>
  role: <your_role>
  warehouse: <your_warehouse>
  database: <your_database>
  schema: <your_schema>
```

--------------------------------

### Install Intugle with Snowflake Dependencies

Source: https://intugle.github.io/data-tools/docs/connectors/snowflake

Installs the Intugle package with optional dependencies for Snowflake integration, including the 'snowflake-snowpark-python' library.

```bash
pip install "intugle[snowflake]"
```

--------------------------------

### Example Product Spec Combining Patient and Condition Data

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product/joins

This Python example defines a `product_spec` for a data product named 'patient_conditions'. It selects patient names and condition descriptions, implicitly instructing the DataProduct builder to join the 'patients' and 'conditions' tables. A limit of 5 records is also applied.

```python
product_spec = {
    "name": "patient_conditions",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "conditions.description", "name": "condition"},
    ],
    "filter": {"limit": 5}
}
```
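
Field ids in the spec follow a `table.column` pattern, so the tables that need to be joined can be read off the ids. The sketch below shows that idea with a hypothetical helper, `referenced_tables`; how the DataProduct builder actually plans the join is not covered by the source.

```python
def referenced_tables(product_spec):
    """Return the table names used by the spec's fields, in first-seen order."""
    tables = []
    for field in product_spec["fields"]:
        table = field["id"].split(".", 1)[0]
        if table not in tables:
            tables.append(table)
    return tables


product_spec = {
    "name": "patient_conditions",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "conditions.description", "name": "condition"},
    ],
    "filter": {"limit": 5},
}
print(referenced_tables(product_spec))
```

A result with more than one table indicates that the spec relies on a join between them.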

--------------------------------

### Semantic Search - Build and Search Model (Python)

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Provides a Python example for building a semantic model from specified datasets and then performing a semantic search query. It requires the SemanticModel class and assumes datasets are available. This process includes auto-indexing.

```python
from intugle import SemanticModel

# Build semantic model first
datasets = {
    "patients": {"path": "data/patients.csv", "type": "csv"},
    "allergies": {"path": "data/allergies.csv", "type": "csv"},
}

sm = SemanticModel(datasets, domain="Healthcare")
sm.build()

# Perform semantic search (auto-indexes on first run)
results = sm.search("reason for hospital visit")
print(results)
```

--------------------------------

### Generate Data Product with Natural Language via MCP

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Illustrates how to use natural language commands with an MCP-compatible client (like Cursor, Claude Code, or Gemini CLI) to automatically generate a data product specification. The example shows generating a data product for the top 5 patients with the most claims.

```python
# Natural language data product generation
# In an MCP-compatible client (Cursor, Claude Code, Gemini CLI):
# /create-dp show me the top 5 patients with the most claims

# The AI generates this specification automatically:
product_spec = {
    "name": "top_5_patients_by_claims",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {
            "id": "claims.id",
            "name": "number_of_claims",
            "category": "measure",
            "measure_func": "count"
        }
    ],
    "filter": {
        "sort_by": [
            {
                "id": "claims.id",
                "alias": "number_of_claims",
                "direction": "desc"
            }
        ],
        "limit": 5
    }
}
```

--------------------------------

### Configure Databricks Connection in profiles.yml

Source: https://intugle.github.io/data-tools/docs/connectors/databricks

Provides example configurations for the profiles.yml file to connect to Databricks. This file can be used for external connections requiring host, http_path, token, schema, and catalog, or for notebook connections specifying schema and catalog.

```yaml
databricks:
  host: <your_databricks_host>
  http_path: <your_sql_warehouse_http_path>
  token: <your_personal_access_token>
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```

```yaml
databricks:
  schema: <your_schema>
  catalog: <your_catalog> # Optional, for Unity Catalog
```
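
The two profiles differ only in the connection keys they carry. As a small sketch, a profile that has already been loaded into a dictionary can be checked for the keys an external connection needs. `missing_external_keys` is a hypothetical helper, the key list is taken from the first YAML example above (with `catalog` treated as optional), and the values are placeholders.

```python
REQUIRED_EXTERNAL_KEYS = ("host", "http_path", "token", "schema")


def missing_external_keys(profile):
    """List required external-connection keys that are absent or empty."""
    return [key for key in REQUIRED_EXTERNAL_KEYS if not profile.get(key)]


# Placeholder values, not real credentials.
external = {
    "host": "example-host",
    "http_path": "/example/http/path",
    "token": "example-token",
    "schema": "analytics",
}
notebook = {"schema": "analytics", "catalog": "main"}

print(missing_external_keys(external))
print(missing_external_keys(notebook))
```

An empty list means the profile has everything the external form requires; the notebook-style profile is expected to report the three connection keys as missing.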

--------------------------------

### Configure Custom Connector in profiles.yml

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Example YAML configuration for a custom connector named 'myconnector'. It specifies connection details like host, port, user, password, and schema required for the adapter to establish a connection.

```yaml
# profiles.yml for custom connector
myconnector:
  host: localhost
  port: 5432
  user: myuser
  password: mypassword
  schema: public
```

--------------------------------

### Sort Data by Single Column Example (JSON)

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product/sorting

An example product specification, written as JSON, that sorts patient data by healthcare expenses in descending order. It includes field definitions and filter parameters.

```json
{
    "name": "patients_by_expenses",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "patients.healthcare_expenses", "name": "expenses"}
    ],
    "filter": {
        "sort_by": [
            {
                "id": "patients.healthcare_expenses",
                "direction": "desc"
            }
        ],
        "limit": 5
    }
}
```

--------------------------------

### Sort Data by Aggregated Field Example (JSON)

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product/sorting

An example product specification, written as JSON, that sorts cities by their total healthcare expenses in descending order. It defines a sum aggregation and uses the field's alias for sorting.

```json
{
    "name": "total_healthcare_expenses_by_city",
    "fields": [
        {"id": "patients.city", "name": "city"},
        {
            "id": "patients.healthcare_expenses",
            "name": "total_expenses",
            "category": "measure",
            "measure_func": "sum"
        }
    ],
    "filter": {
        "sort_by": [
            {
                "alias": "total_expenses",
                "direction": "desc"
            }
        ],
        "limit": 5
    }
}
```
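
To make the aggregate-then-sort behavior concrete, the sketch below reproduces it in plain Python on a handful of made-up rows: sum expenses per city, order by the total in descending order, and keep at most five results. It illustrates the intended semantics only and is not how Intugle executes the specification.

```python
from collections import defaultdict

# Made-up rows standing in for the patients table.
patients = [
    {"city": "Boston", "healthcare_expenses": 1200.0},
    {"city": "Quincy", "healthcare_expenses": 300.0},
    {"city": "Boston", "healthcare_expenses": 800.0},
    {"city": "Salem", "healthcare_expenses": 950.0},
]

# measure_func "sum": total expenses per city.
totals = defaultdict(float)
for row in patients:
    totals[row["city"]] += row["healthcare_expenses"]

# sort_by the alias in descending order, then apply the limit of 5.
ranked = sorted(
    ({"city": city, "total_expenses": total} for city, total in totals.items()),
    key=lambda item: item["total_expenses"],
    reverse=True,
)[:5]
print(ranked)
```

Sorting refers to `total_expenses`, the name given to the aggregated field, which mirrors the use of `alias` in the specification.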

--------------------------------

### Implement Custom Connector Adapter Class Skeleton

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Provides a basic skeleton for a custom adapter class in Python, inheriting from `intugle.adapters.adapter.Adapter`. This class will contain the core logic for interacting with a specific data source. It includes necessary imports and placeholders for the database driver and abstract method implementations. Users should refer to existing adapters like `DatabricksAdapter` or `SnowflakeAdapter` for comprehensive examples.

```python
from typing import Any, Optional
import pandas as pd
from intugle.adapters.adapter import Adapter
from intugle.adapters.factory import AdapterFactory
from intugle.adapters.models import ColumnProfile, ProfilingOutput
from .models import MyConnectorConfig, MyConnectorConnectionConfig
from intugle.core import settings

# Import your database driver

class MyConnectorAdapter(Adapter):
    # Adapter implementation details go here
    pass

```

--------------------------------

### Access LinkPredictor via SemanticModel in Python

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

This example demonstrates accessing the LinkPredictor functionality through a SemanticModel. It shows how to build a SemanticModel with specified datasets and domain, then retrieve the link predictor instance and discovered links. Shortcuts for accessing links as a list or DataFrame are also provided. The code assumes 'datasets' is a pre-defined list of DataSet objects and 'SemanticModel' is imported.

```python
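from intugle import SemanticModel

# `datasets` is assumed to be defined already (see the description above)
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()

# The LinkPredictor instance used during the build
predictor = sm.link_predictor

# Shortcuts: discovered links as a list and as a DataFrame
discovered_links = sm.links
links_dataframe = sm.links_df

# Display the graph of discovered links
sm.link_predictor.show_graph()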
```

--------------------------------

### Get All Column Profiles DataFrame (Python)

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

This code retrieves a single Pandas DataFrame containing all column profiling metrics from all processed datasets using the 'profiling_df' property of a SemanticModel object. It then prints the first few rows of this DataFrame.

```python
# Get a single DataFrame of all column profiles
all_profiles = sm.profiling_df
print(all_profiles.head())
```

--------------------------------

### Get Consolidated Business Glossary DataFrame (Python)

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

This code retrieves a unified business glossary as a Pandas DataFrame from all datasets using the 'glossary_df' property of a SemanticModel object. The DataFrame includes table name, column name, description, and tags for every column. The first few rows are then printed.

```python
# Get a single, unified business glossary
full_glossary = sm.glossary_df
print(full_glossary.head())
```

--------------------------------

### DataProduct Sorting with Limit in Python

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

This Python snippet shows how to configure sorting and limiting for a DataProduct query. The 'filter' section within the 'product_spec' dictionary is used to define an array of 'sort_by' objects, each specifying the field 'id' and the 'direction' ('asc' or 'desc'). A 'limit' can also be applied to restrict the number of results. This example sorts patients by healthcare expenses in descending order and limits the output to the top 5.

```python
product_spec = {
    "name": "patients_by_expenses",
    "fields": [
        {"id": "patients.first", "name": "first_name"},
        {"id": "patients.last", "name": "last_name"},
        {"id": "patients.healthcare_expenses", "name": "expenses"},
    ],
    "filter": {
        "sort_by": [
            {
                "id": "patients.healthcare_expenses",
                "direction": "desc"
            }
        ],
        "limit": 5
    }
}
```

--------------------------------

### Python: Initialize and Build Semantic Model

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-search

Demonstrates how to define datasets, initialize a SemanticModel with a specified domain, build the model, and perform a semantic search. Assumes dataset configurations are available.

```python
from intugle import SemanticModel

datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    # ... add other datasets
}

# Initialize and build the semantic model
sm = SemanticModel(datasets, domain="Healthcare")
sm.build()

# Perform a semantic search
search_results = sm.search("reason for hospital visit")

# View the search results
print(search_results)
```

--------------------------------

### Build Data Product with Specification - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/data-product

Demonstrates how to define a product specification as a Python dictionary and use the DataProduct class to build a unified data product. It shows how to select fields, specify measures with aggregation functions, and apply sorting and limits. The generated SQL query and the resulting data as a Pandas DataFrame can be accessed.

```python
from intugle import DataProduct

# 1. Define the product specification for your data product
product_spec = {
  "name": "top_patients_by_claim_count",
  "fields": [
    {
      "id": "patients.first",
      "name": "first_name",
    },
    {
      "id": "patients.last",
      "name": "last_name",
    },
    {
      "id": "claims.id",
      "name": "number_of_claims",
      "category": "measure",
      "measure_func": "count"
    }
  ],
  "filter": {
    "sort_by": [
      {
        "id": "claims.id",
        "alias": "number_of_claims",
        "direction": "desc"
      }
    ],
    "limit": 10
  }
}

# 2. Initialize the DataProduct
# It automatically loads the manifest from the current directory
dp = DataProduct()

# 3. Build the data product
data_product = dp.build(product_spec)

# 4. Access the results
# View the data as a Pandas DataFrame
print(data_product.to_df())

# You can also inspect the generated SQL query
print(data_product.sql_query)
```

--------------------------------

### Build and Search with Intugle Search Client

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Demonstrates how to initialize the Intugle search client, build a search index, and perform natural language searches. Results contain metadata about the matched columns.

```python
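from intugle.semantic_search import SemanticSearch

# Create the client (loads configuration from .yml files)
search_client = SemanticSearch()

# Build the search index
search_client.initialize()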

# Search with natural language
results = search_client.search("reason for hospital visit")

# Results include: column_id, score, relevancy, column_name,
# column_glossary, table_name, uniqueness, completeness
print(results)
```

--------------------------------

### Initialize and Use SemanticModel

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-search

This Python code shows the basic usage of the `SemanticModel` for building and performing searches. The `sm.build()` method prepares metadata, and the first call to `sm.search()` automatically vectorizes and indexes the data in Qdrant.

```python
from intugle import SemanticModel

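# Dataset definitions (the paths are placeholders)
datasets = {
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
}
sm = SemanticModel(datasets, domain="Healthcare")

# Build the semantic model metadata
sm.build()

# Perform a search (the first call also indexes the data in Qdrant)
results = sm.search("your natural language query")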

```

--------------------------------

### Build Semantic Model with CSV Datasets

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Demonstrates creating a `SemanticModel` using local CSV files, running the full build pipeline (profile, predict links, generate glossary), and accessing enriched dataset metadata.

```python
from intugle import SemanticModel

# Initialize with datasets
datasets = {
    "patients": {"path": "data/patients.csv", "type": "csv"},
    "claims": {"path": "data/claims.csv", "type": "csv"},
    "allergies": {"path": "data/allergies.csv", "type": "csv"},
}

# Create semantic model with domain context
sm = SemanticModel(datasets, domain="Healthcare")

# Run full pipeline: profile, predict links, generate glossary
sm.build()

# Or run stages individually for granular control
sm.profile()
sm.predict_links()
sm.generate_glossary()

# Force rebuild, ignoring cache
sm.build(force_recreate=True)

# Access enriched datasets
patients_dataset = sm.datasets['patients']
print(f"Primary Key: {patients_dataset.source_table_model.key}")
print(f"Description: {patients_dataset.source_table_model.description}")

# Access utility DataFrames
profiling_data = sm.profiling_df  # All column profiles
relationships = sm.links_df  # Predicted relationships
glossary = sm.glossary_df  # Business glossary
```

--------------------------------

### Build and Deploy Semantic Model with Databricks

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Demonstrates initializing a `SemanticModel` with Databricks datasets, building the model, and deploying it to Databricks with various options. Includes configuration for external Databricks connections.

```python
from intugle import SemanticModel

# Databricks datasets
datasets = {
    "CUSTOMERS": {
        "identifier": "CUSTOMERS",
        "type": "databricks"
    },
    "ORDERS": {
        "identifier": "ORDERS",
        "type": "databricks"
    }
}

sm = SemanticModel(datasets, domain="E-commerce")
sm.build()

# Deploy to Databricks: sync glossary, tags, and set constraints
sm.deploy(target="databricks")

# Control deployment options
sm.deploy(
    target="databricks",
    sync_glossary=True,
    sync_tags=True,
    set_primary_keys=True,
    set_foreign_keys=True
)
```

```yaml
# profiles.yml for external Databricks connection
databricks:
  host: your_databricks_host
  http_path: your_sql_warehouse_http_path
  token: your_personal_access_token
  schema: your_schema
  catalog: your_catalog
```

--------------------------------

### Get Discovered Links DataFrame (Python)

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

This snippet demonstrates how to access a Pandas DataFrame of all discovered relationships (links) using the 'links_df' property of a SemanticModel object. The resulting DataFrame is then printed.

```python
# Get a DataFrame of all predicted links
all_links = sm.links_df
print(all_links)
```

--------------------------------

### Build Unified Data Queries with DataProduct in Python

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

This snippet illustrates how to use the DataProduct class to define and build a unified data query. It involves specifying the product's name, fields (with aliasing and measure functions), and filtering criteria (sorting and limiting). The code shows initializing DataProduct, building the product specification, and accessing the results as a DataFrame or inspecting the generated SQL query. Dependencies include 'intugle.DataProduct'.

```python
from intugle import DataProduct

product_spec = {
    "name": "top_patients_by_claim_count",
    "fields": [
        {
            "id": "patients.first",
            "name": "first_name",
        },
        {
            "id": "patients.last",
            "name": "last_name",
        },
        {
            "id": "claims.id",
            "name": "number_of_claims",
            "category": "measure",
            "measure_func": "count"
        }
    ],
    "filter": {
        "sort_by": [
            {
                "id": "claims.id",
                "alias": "number_of_claims",
                "direction": "desc"
            }
        ],
        "limit": 10
    }
}

dp = DataProduct()

data_product = dp.build(product_spec)

print(data_product.to_df())

print(data_product.sql_query)
```
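
When several data products share the same shape, the specification dictionary can be assembled with small helpers. The sketch below only builds a dict in the format shown above; `field` and `top_n_spec` are hypothetical helpers, not part of Intugle, and the result would still be passed to `DataProduct().build(...)`.

```python
from typing import Optional


def field(field_id: str, name: str, measure_func: Optional[str] = None) -> dict:
    """Build one entry for the spec's "fields" list."""
    entry = {"id": field_id, "name": name}
    if measure_func is not None:
        # Measures carry a category and an aggregation function, as in the example above.
        entry["category"] = "measure"
        entry["measure_func"] = measure_func
    return entry


def top_n_spec(name: str, dimensions: list, measure: dict, limit: int) -> dict:
    """Build a spec that sorts by one measure (descending) and keeps `limit` rows."""
    return {
        "name": name,
        "fields": [*dimensions, measure],
        "filter": {
            "sort_by": [
                {"id": measure["id"], "alias": measure["name"], "direction": "desc"}
            ],
            "limit": limit,
        },
    }


spec = top_n_spec(
    name="top_patients_by_claim_count",
    dimensions=[
        field("patients.first", "first_name"),
        field("patients.last", "last_name"),
    ],
    measure=field("claims.id", "number_of_claims", measure_func="count"),
    limit=10,
)
```

The resulting `spec` has the same content as the hand-written `product_spec` above.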

--------------------------------

### Initialize SemanticModel from Dictionary - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

Initializes the SemanticModel using a dictionary where keys are dataset names and values contain configuration like path and type. This is the recommended and most common method. It requires the 'intugle' library.

```python
from intugle import SemanticModel

datasets = {
    "allergies": {"path": "path/to/allergies.csv", "type": "csv"},
    "patients": {"path": "path/to/patients.csv", "type": "csv"},
    "claims": {"path": "path/to/claims.csv", "type": "csv"},
}

sm = SemanticModel(datasets, domain="Healthcare")
```

--------------------------------

### Initialize SemanticModel from DataSet Objects - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

Initializes the SemanticModel with a list of pre-configured DataSet objects for more advanced scenarios. This requires the 'intugle' library and pre-instantiated DataSet objects.

```python
from intugle import SemanticModel, DataSet

# Create DataSet objects first
dataset_allergies = DataSet(data={"path": "path/to/allergies.csv", "type": "csv"}, name="allergies")
dataset_patients = DataSet(data={"path": "path/to/patients.csv", "type": "csv"}, name="patients")

# Initialize the SemanticModel with the list of objects
sm = SemanticModel([dataset_allergies, dataset_patients], domain="Healthcare")
```

--------------------------------

### Registering MyConnector Adapter with Factory

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Provides Python functions to register a custom adapter (`MyConnectorAdapter`) with the `AdapterFactory`. `can_handle_myconnector` checks if a given data configuration is compatible, and `register` adds the adapter to the factory.

```python
# In src/intugle/adapters/types/myconnector/myconnector.py
from typing import Any

from intugle.adapters.factory import AdapterFactory

def can_handle_myconnector(df: Any) -> bool:
    try:
        MyConnectorConfig.model_validate(df)
        return True
    except Exception:
        return False

def register(factory: AdapterFactory):
    # Check if the required driver is installed
    # if MYCONNECTOR_DRIVER_AVAILABLE:
    factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter)
```
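
To illustrate why `register` takes a name, a predicate, and an adapter class, here is a toy registry that dispatches on the predicate. It is an illustrative stand-in only: `ToyAdapterFactory`, its `create` method, and `ToyCsvAdapter` are hypothetical names, and Intugle's actual `AdapterFactory` internals are not shown in the source.

```python
from typing import Any, Callable


class ToyAdapterFactory:
    """Illustrative stand-in for AdapterFactory; not Intugle's implementation."""

    def __init__(self) -> None:
        # name -> (predicate, adapter class)
        self._registry: dict[str, tuple[Callable[[Any], bool], type]] = {}

    def register(self, name: str, can_handle: Callable[[Any], bool], adapter_cls: type) -> None:
        self._registry[name] = (can_handle, adapter_cls)

    def create(self, data: Any) -> Any:
        # Return an instance of the first adapter whose predicate accepts the data.
        for can_handle, adapter_cls in self._registry.values():
            if can_handle(data):
                return adapter_cls()
        raise ValueError("No registered adapter can handle this data")


class ToyCsvAdapter:
    """Placeholder adapter used only for this demonstration."""


def can_handle_csv(data: Any) -> bool:
    # Accept dict configs that declare type "csv".
    return isinstance(data, dict) and data.get("type") == "csv"


factory = ToyAdapterFactory()
factory.register("csv", can_handle_csv, ToyCsvAdapter)

adapter = factory.create({"path": "path/to/patients.csv", "type": "csv"})
print(type(adapter).__name__)  # ToyCsvAdapter
```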

--------------------------------

### Adding Adapter to Default Plugins List

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Shows how to include the newly created custom adapter module in the `DEFAULT_PLUGINS` list within the `AdapterFactory` to make it discoverable by Intugle.

```python
# In src/intugle/adapters/factory.py

DEFAULT_PLUGINS = [
    "intugle.adapters.types.pandas.pandas",
    # ... other adapters
    "intugle.adapters.types.myconnector.myconnector",
]
```
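
A list of dotted module paths like this is typically consumed by importing each module and calling its `register` function. The sketch below shows that general pattern with the standard library; `load_plugins` is a hypothetical function, and how Intugle actually loads `DEFAULT_PLUGINS` is an assumption here.

```python
import importlib
import sys
import types


def load_plugins(factory, plugin_paths):
    """Import each module path and call its register(factory) if it defines one."""
    for path in plugin_paths:
        module = importlib.import_module(path)
        register = getattr(module, "register", None)
        if callable(register):
            register(factory)


# Build a throwaway in-memory module so the sketch runs without Intugle installed.
demo = types.ModuleType("demo_plugin")
demo.register = lambda factory: factory.append("demo_plugin registered")
sys.modules["demo_plugin"] = demo

registered = []  # a plain list stands in for the factory here
load_plugins(registered, ["demo_plugin"])
print(registered)  # ['demo_plugin registered']
```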

--------------------------------

### Python: Standalone Semantic Search Initialization and Querying

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-search

Shows how to use the `SemanticSearch` class directly for initializing the search index and performing queries. This is useful when bypassing the full `SemanticModel` pipeline. Assumes .yml files are in the default location or a custom path is provided.

```python
from intugle.semantic_search import SemanticSearch

# This assumes your project's .yml files are in the default location.
# You can also specify the path to your models directory:
# search_client = SemanticSearch(project_base="/path/to/your/models")
search_client = SemanticSearch()

# 1. Initialize the search index.
# This reads the .yml files, vectorizes the metadata, and populates Qdrant.
# You only need to run this once, or whenever your source metadata changes.
print("Initializing semantic search index...")
search_client.initialize()
print("Initialization complete.")

# 2. Perform a search.
query = "reason for hospital visit"
search_results = search_client.search(query)

# View the results
print(search_results)
```

--------------------------------

### Implement Custom Connector Adapter

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Provides a skeleton for implementing a custom data adapter in Intugle. It includes methods for initialization, profiling, query execution, and data loading, along with registration logic.

```python
# Step 2: Implement adapter in myconnector.py
from typing import Any, Optional
import pandas as pd
from intugle.adapters.adapter import Adapter
from intugle.adapters.factory import AdapterFactory
from intugle.adapters.models import ColumnProfile, DataSetData, ProfilingOutput
from .models import MyConnectorConfig, MyConnectorConnectionConfig
from intugle.core import settings

class MyConnectorAdapter(Adapter):
    def __init__(self):
        connection_params = settings.PROFILES.get("myconnector", {})
        config = MyConnectorConnectionConfig.model_validate(connection_params)
        # self.connection = myconnector_driver.connect(**config.model_dump())
        pass

    def profile(self, data: Any, table_name: str) -> ProfilingOutput:
        # Return table metadata: row count, column names, dtypes
        raise NotImplementedError()

    def column_profile(self, data: Any, table_name: str, column_name: str,
                      total_count: int) -> Optional[ColumnProfile]:
        # Return column statistics
        raise NotImplementedError()

    def execute(self, query: str):
        # Execute query and return results
        raise NotImplementedError()

    def to_df_from_query(self, query: str) -> pd.DataFrame:
        # Execute and return DataFrame
        raise NotImplementedError()

    def create_table_from_query(self, table_name: str, query: str) -> str:
        # Materialize query as table/view
        raise NotImplementedError()

    def create_new_config_from_etl(self, etl_name: str) -> "DataSetData":
        return MyConnectorConfig(identifier=etl_name)

    def intersect_count(self, table1: "DataSet", column1_name: str,
                       table2: "DataSet", column2_name: str) -> int:
        # Calculate intersecting values count
        raise NotImplementedError()

    def load(self, data: Any, table_name: str):
        pass

    def to_df(self, data: DataSetData, table_name: str):
        config = MyConnectorConfig.model_validate(data)
        return self.to_df_from_query(f"SELECT * FROM {config.identifier}")

    def get_details(self, data: DataSetData):
        config = MyConnectorConfig.model_validate(data)
        return config.model_dump()

# Step 3: Register adapter
def can_handle_myconnector(df: Any) -> bool:
    try:
        MyConnectorConfig.model_validate(df)
        return True
    except Exception:
        return False

def register(factory: AdapterFactory):
    factory.register("myconnector", can_handle_myconnector, MyConnectorAdapter)
```
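
To make the `profile` and `column_profile` contracts more concrete, the sketch below computes comparable statistics against an in-memory SQLite database using only the standard library. It returns plain dicts rather than Intugle's `ProfilingOutput` and `ColumnProfile` models, whose fields are not shown in the source; the function names and dict keys are illustrative assumptions.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table_name: str) -> dict:
    """Table-level metadata: row count, column names, and declared types."""
    # Identifiers are interpolated directly, so they must come from trusted input.
    count = conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return {
        "count": count,
        "columns": [row[1] for row in info],
        "dtypes": {row[1]: row[2] for row in info},
    }


def profile_column(conn: sqlite3.Connection, table_name: str,
                   column_name: str, total_count: int) -> dict:
    """Column-level statistics: null count, distinct count, completeness."""
    null_count, distinct_count = conn.execute(
        f'SELECT SUM("{column_name}" IS NULL), COUNT(DISTINCT "{column_name}") '
        f'FROM "{table_name}"'
    ).fetchone()
    null_count = null_count or 0  # SUM over zero rows yields NULL
    return {
        "column_name": column_name,
        "null_count": null_count,
        "distinct_count": distinct_count,
        "completeness": (total_count - null_count) / total_count if total_count else 0.0,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, None), (4, "Ana")],
)

table = profile_table(conn, "patients")
column = profile_column(conn, "patients", "first_name", table["count"])
print(table)
print(column)
```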

--------------------------------

### Semantic Search - Environment Configuration (Bash)

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Shell commands to set environment variables for configuring semantic search, including Qdrant URL, API keys for OpenAI or Azure OpenAI, and the embedding model name. These variables are crucial for connecting to the necessary services.

```bash
# Configure environment
export QDRANT_URL="http://localhost:6333"
export QDRANT_API_KEY="your-qdrant-api-key"
export EMBEDDING_MODEL_NAME="openai:ada"
export OPENAI_API_KEY="your-openai-api-key"

# Alternative: for Azure OpenAI, use these instead of the OpenAI settings above
export EMBEDDING_MODEL_NAME="azure_openai:ada"
export AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
export AZURE_OPENAI_ENDPOINT="your-azure-openai-endpoint"
export OPENAI_API_VERSION="your-openai-api-version"
```

--------------------------------

### Explore Primary Key Description and Discovered Links (Python)

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

This snippet shows how to print the description of the primary key for a customers dataset and display discovered links using the LinkPredictor. It assumes 'customers_dataset' and 'link_predictor' objects are already initialized.

```python
print(f"Primary Key for customers: {customers_dataset.source_table_model.description}")
print("Discovered Links:")
print(link_predictor.get_links_df())
```

--------------------------------

### Deploy Semantic Model to Databricks

Source: https://intugle.github.io/data-tools/docs/connectors/databricks

Shows how to deploy a built Semantic Model to Databricks using the 'deploy()' method. This synchronizes metadata (comments, tags) and sets constraints (primary, foreign keys). Optional parameters allow granular control over the deployment process.

```python
# Deploy the model to Databricks
sm.deploy(target="databricks")

# You can also control which parts of the deployment to run
sm.deploy(
    target="databricks",
    sync_glossary=True,
    sync_tags=True,
    set_primary_keys=True,
    set_foreign_keys=True
)
```

--------------------------------

### Configure Environment Variables for OpenAI Embeddings

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-search

These environment variables configure the connection to a Qdrant instance and an OpenAI embedding model. Set `QDRANT_URL`, `QDRANT_API_KEY` (if authorization is enabled), `EMBEDDING_MODEL_NAME`, and `OPENAI_API_KEY`.

```bash
# The URL of your running Qdrant instance
export QDRANT_URL="http://localhost:6333"

# Your Qdrant API key (only if you have enabled authorization)
export QDRANT_API_KEY="your-qdrant-api-key"

# The embedding model to use (the default is openai:ada)
export EMBEDDING_MODEL_NAME="openai:ada"

# Your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key"
```
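
On the Python side, these variables can be read once and checked early. The sketch below is a minimal example using only the standard library; `SearchSettings` and `load_search_settings` are hypothetical names, and treating `QDRANT_URL` and `OPENAI_API_KEY` as required is an assumption of this sketch rather than documented Intugle behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchSettings:
    qdrant_url: str
    qdrant_api_key: Optional[str]
    embedding_model_name: str
    openai_api_key: str


def load_search_settings(env: Mapping[str, str] = os.environ) -> SearchSettings:
    """Read the variables above, failing early if an assumed-required one is missing."""
    missing = [name for name in ("QDRANT_URL", "OPENAI_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return SearchSettings(
        qdrant_url=env["QDRANT_URL"],
        # Only needed when Qdrant authorization is enabled.
        qdrant_api_key=env.get("QDRANT_API_KEY") or None,
        # The configuration above states that openai:ada is the default.
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "openai:ada"),
        openai_api_key=env["OPENAI_API_KEY"],
    )


# Pass an explicit mapping here so the example does not depend on the real environment.
settings = load_search_settings({
    "QDRANT_URL": "http://localhost:6333",
    "OPENAI_API_KEY": "your-openai-api-key",
})
print(settings.embedding_model_name)  # openai:ada
```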

--------------------------------

### Run Full Semantic Model Pipeline - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

Executes the entire SemanticModel pipeline, including profiling, link prediction, and business glossary generation, in the correct sequence. The build() method can also force a re-run of all stages, ignoring cached results.

```python
# Run the full pipeline from start to finish
sm.build()

# You can also force it to re-run everything, ignoring any cached results
sm.build(force_recreate=True)
```

--------------------------------

### Generate Business Glossary - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

Executes the business glossary generation stage, using an LLM to create business-friendly context for the data. This stage assumes the profile() stage has already been run. The generated information is saved back into each dataset's .yml file.

```python
# Run the glossary generation stage
# This assumes profile() has already been run
sm.generate_glossary()
```

--------------------------------

### MyConnectorAdapter Class Definition

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Defines `MyConnectorAdapter`, a custom data adapter that subclasses the `Adapter` base class. It outlines the methods for profiling, querying, and data manipulation that a concrete adapter must provide; the ones that raise `NotImplementedError` are placeholders to fill in for your data source.

```python
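from typing import Any, Optional

import pandas as pd

from intugle.adapters.adapter import Adapter
from intugle.adapters.models import ColumnProfile, DataSetData, ProfilingOutput
from intugle.core import settings

from .models import MyConnectorConfig, MyConnectorConnectionConfig


class MyConnectorAdapter(Adapter):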
    def __init__(self):
        # Initialize your connection here
        connection_params = settings.PROFILES.get("myconnector", {})
        config = MyConnectorConnectionConfig.model_validate(connection_params)
        # self.connection = myconnector_driver.connect(**config.model_dump())
        pass

    # --- Must be implemented ---

    def profile(self, data: Any, table_name: str) -> ProfilingOutput:
        # Return table-level metadata: row count, column names, and dtypes
        raise NotImplementedError()

    def column_profile(self, data: Any, table_name: str, column_name: str, total_count: int) -> Optional[ColumnProfile]:
        # Return column-level statistics: null count, distinct count, samples, etc.
        raise NotImplementedError()

    def execute(self, query: str):
        # Execute a query and return the raw results
        raise NotImplementedError()

    def to_df_from_query(self, query: str) -> pd.DataFrame:
        # Execute a query and return the result as a pandas DataFrame
        raise NotImplementedError()

    def create_table_from_query(self, table_name: str, query: str) -> str:
        # Materialize a query as a new table or view
        raise NotImplementedError()

    def create_new_config_from_etl(self, etl_name: str) -> "DataSetData":
        # Return a new MyConnectorConfig for a materialized table
        return MyConnectorConfig(identifier=etl_name)

    def intersect_count(self, table1: "DataSet", column1_name: str, table2: "DataSet", column2_name: str) -> int:
        # Calculate the count of intersecting values between two columns
        raise NotImplementedError()

    # --- Other required methods ---

    def load(self, data: Any, table_name: str):
        # For database adapters, this is often a no-op
        pass

    def to_df(self, data: DataSetData, table_name: str):
        # Read an entire table into a pandas DataFrame
        config = MyConnectorConfig.model_validate(data)
        return self.to_df_from_query(f"SELECT * FROM {config.identifier}")

    def get_details(self, data: DataSetData):
        config = MyConnectorConfig.model_validate(data)
        return config.model_dump()
```
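
For SQL-backed sources, `intersect_count` can often be expressed as a single query. The sketch below demonstrates one such query against an in-memory SQLite database; it is a standalone function, not an `Adapter` method, and counting distinct non-null values is an assumption about the intended semantics.

```python
import sqlite3


def intersect_count(conn: sqlite3.Connection, table1: str, column1: str,
                    table2: str, column2: str) -> int:
    """Count distinct non-null values that appear in both columns."""
    # Identifiers are interpolated directly, so they must come from trusted input.
    # INTERSECT already removes duplicates, so no DISTINCT is needed.
    query = (
        f'SELECT COUNT(*) FROM ('
        f'SELECT "{column1}" FROM "{table1}" WHERE "{column1}" IS NOT NULL '
        f'INTERSECT '
        f'SELECT "{column2}" FROM "{table2}" WHERE "{column2}" IS NOT NULL)'
    )
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER)")
conn.execute("CREATE TABLE claims (patient_id INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO claims VALUES (?)", [(2,), (2,), (3,), (9,), (None,)])

shared = intersect_count(conn, "patients", "id", "claims", "patient_id")
print(shared)  # 2 (the values 2 and 3 appear in both columns)
```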

--------------------------------

### Run Profiling Stage - Python

Source: https://intugle.github.io/data-tools/docs/core-concepts/semantic-intelligence/semantic-model

Executes the profiling stage of the SemanticModel pipeline, which performs a deep analysis of each dataset, including structure, content, datatype identification, and key identification. Progress is saved to a .yml file for each dataset.

```python
# Run only the profiling and key identification stage
sm.profile()
```

--------------------------------

### Update DataSetData Type Hint with New Connector Model

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Integrates the newly defined `MyConnectorConfig` into the `DataSetData` type hint within `src/intugle/adapters/models.py`. This ensures that the `intugle` factory can recognize and handle configurations for the new connector type, allowing it to be used alongside other supported data sources.

```python
# src/intugle/adapters/models.py

# ... other imports
from intugle.adapters.types.myconnector.models import MyConnectorConfig

DataSetData = pd.DataFrame | DuckdbConfig | ... | MyConnectorConfig
```

--------------------------------

### Define Pydantic Models for Connector Configuration

Source: https://intugle.github.io/data-tools/docs/connectors/implementing-a-connector

Defines Pydantic models for connection parameters and data identification for a custom connector. The `MyConnectorConnectionConfig` model specifies connection details like host, port, user, and password, while `MyConnectorConfig` defines how to identify a specific table or asset. These models are essential for configuring and interacting with the data source.

```python
from typing import Optional
from intugle.common.schema import SchemaBase

class MyConnectorConnectionConfig(SchemaBase):
    host: str
    port: int
    user: str
    password: str
    schema: Optional[str] = None

class MyConnectorConfig(SchemaBase):
    identifier: str
    type: str = "myconnector"
```

--------------------------------

### DataProduct - Wildcard Filtering (LIKE) (Python)

Source: https://context7.com/context7/intugle_github_io_data-tools/llms.txt

Shows how to perform wildcard filtering on string fields using the 'contains', 'starts_with', 'ends_with', or 'exactly_matches' option. This is useful for flexible text-based searches. Requires the DataProduct class.

```python
from intugle import DataProduct

# Wildcard filtering (LIKE)
product_spec = {
    "name": "fracture_conditions",
    "fields": [
        {"id": "conditions.description", "name": "condition_description"},
    ],
    "filter": {
        "wildcards": [
            {
                "id": "conditions.description",
                "value": "fracture",
                "option": "contains",  # starts_with, ends_with, exactly_matches
            }
        ],
    },
}

dp = DataProduct()
result = dp.build(product_spec)
```
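
The four options correspond naturally to SQL `LIKE` patterns. The sketch below shows the conventional mapping; `like_pattern` is a hypothetical helper, and the SQL that Intugle actually generates for wildcards is not shown in the source, so treat this as an illustration of the idea only.

```python
def like_pattern(value: str, option: str) -> str:
    """Translate a wildcard option into a conventional SQL LIKE pattern."""
    # Values containing % or _ would need escaping; that is omitted in this sketch.
    patterns = {
        "contains": f"%{value}%",
        "starts_with": f"{value}%",
        "ends_with": f"%{value}",
        "exactly_matches": value,
    }
    if option not in patterns:
        raise ValueError(f"Unsupported wildcard option: {option!r}")
    return patterns[option]


print(like_pattern("fracture", "contains"))     # %fracture%
print(like_pattern("fracture", "starts_with"))  # fracture%
```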