### Database Filtering with EntitySet

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Explains how to filter database schemas to include or exclude specific tables and columns using the EntitySet pattern. This is useful for managing sensitive data or focusing on relevant parts of a large schema. Examples show including specific columns and excluding sensitive ones.

```python
from nl2sql.datasets.base import Dataset, Database

# Create dataset
dataset = Dataset.from_connection_strings(
    name_connstr_map={"mydb": "postgresql://user:pass@localhost/mydb"}
)

database = dataset.get_database("mydb")

# Filter to only include specific tables/columns
filtered_db = database.filter(
    filters=[
        "mydb.users.*",
```

```python
        "mydb.orders.id",    # Include only id from orders
        "mydb.orders.total"  # Include only total from orders
    ],
    filter_type="only"  # "only" keeps matching, "exclude" removes matching
)

# Filter to exclude sensitive columns
safe_db = database.filter(
    filters=[
        "mydb.users.password",
        "mydb.users.ssn",
        "*.*.secret_*"  # Wildcard patterns supported
    ],
    filter_type="exclude"
)

# Execute queries on filtered database
result = safe_db.execute("SELECT * FROM users LIMIT 10")
print(result)  # Returns pandas DataFrame
```

--------------------------------

### Initialize Dataset from Connection Strings

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Demonstrates how to create a Dataset object from multiple database connection strings, including options for excluding sensitive entities and providing a data dictionary for improved context.

```python
dataset = Dataset.from_connection_strings(
    name_connstr_map={
        "ecommerce": "postgresql://user:pass@localhost/ecommerce",
        "analytics": "bigquery://project/analytics_dataset"
    },
    exclude_entities=[
        "ecommerce.users.password_hash",
        "ecommerce.users.ssn",
        "*.*.created_at"
    ],
    data_dictionary={
        "ecommerce": {
            "description": "E-commerce transaction database",
            "tables": {
                "orders": {
                    "description": "Customer orders",
                    "columns": {
                        "total_amount": {"description": "Order total in USD", "type": "DECIMAL"}
                    }
                }
            }
        }
    },
    enum_limit=10
)
db = dataset.get_database("ecommerce")
print(dataset.list_databases)
```

--------------------------------

### Creating Custom Prompts for SQL Generation

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Demonstrates how to define and use custom prompt templates for the CoreSqlGenerator. This involves creating Langchain PromptTemplates, defining output parsers, and configuring a custom prompt object that can be passed to the generator.

```python
from langchain.prompts import PromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from nl2sql.tasks.sql_generation.core import CoreSqlGenerator
from nl2sql.tasks.sql_generation.core import prompts as sg_prompts
from nl2sql.llms.vertexai import text_bison_32k

# Define custom output parser
custom_parser = StructuredOutputParser.from_response_schemas([
    ResponseSchema(name="thoughts", description="Step-by-step reasoning"),
    ResponseSchema(name="query", description="The SQL query")
])

# Create custom prompt template
custom_template = PromptTemplate(
    input_variables=["question", "table_info", "dialect"],
    template="""You are a SQL expert. Given the following database schema:
{table_info}

Write a {dialect} query to answer: {question}

{format_instructions}
"""
)

# Create custom prompt configuration
custom_prompt = sg_prompts.custom_prompt(
    prompt_template=custom_template,
    parser=custom_parser,
    post_processor=lambda x: x.get("query"),
    prompt_template_id="my_custom_prompt"
)

# Use custom prompt with generator
llm = text_bison_32k()
generator = CoreSqlGenerator(llm=llm, prompt=custom_prompt)
```

--------------------------------

### Vertex AI LLM Configuration (text_bison_32k and text_bison_latest)

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Shows how to create instances of Vertex AI LLM models, specifically text_bison_32k for larger context windows and text_bison_latest for standard use. Includes configuration of parameters like max_output_tokens, temperature, top_p, top_k, and candidate_count, as well as utility methods for token counting.

```python
from nl2sql.llms.vertexai import text_bison_32k, text_bison_latest

# Standard text-bison model (3K input tokens)
llm_standard = text_bison_latest(
    max_output_tokens=1024,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    candidate_count=3  # Generate multiple candidates
)

# text-bison-32k for large schemas (24K input tokens)
llm_32k = text_bison_32k(
    max_output_tokens=8000,
    temperature=0.1,
    top_p=0.8,
    top_k=40
)

# Utility methods
token_count = llm_32k.get_num_tokens("SELECT * FROM users")
max_tokens = llm_32k.get_max_input_tokens()  # Returns 24000
```

--------------------------------

### Execute NL2SQL Pipeline with CoreLinearExecutor

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Demonstrates how to initialize the CoreLinearExecutor using a standard dataset and execute a natural language query. It shows how to access the generated SQL, intermediate reasoning steps, and fetch results as a DataFrame.

```python
from nl2sql.datasets import fetch_dataset
from nl2sql.executors.linear_executor.core import CoreLinearExecutor
from nl2sql.llms.vertexai import text_bison_32k

dataset = fetch_dataset("spider.test")
executor = CoreLinearExecutor(dataset=dataset)

db_name = "pets_1"
question = "Find the average weight for each pet type."

result = executor(db_name, question)
print(result.generated_query)
print(result.selected_tables)
print(result.selected_columns)

df = executor.fetch_result(result)
print(df)
```

--------------------------------

### Configure Custom NL2SQL Pipeline with Excel Data

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Illustrates creating an executor from an Excel file while injecting custom task configurations. This includes defining specific LLM settings and few-shot Chain-of-Thought prompts for table, column, and SQL generation tasks.

```python
from nl2sql.executors.linear_executor.core import CoreLinearExecutor
from nl2sql.llms.vertexai import text_bison_32k
from nl2sql.tasks.table_selection.core import CoreTableSelector, prompts as ts_prompts
from nl2sql.tasks.column_selection.core import CoreColumnSelector, prompts as cs_prompts
from nl2sql.tasks.sql_generation.core import CoreSqlGenerator, prompts as sg_prompts

llm = text_bison_32k()
table_selector = CoreTableSelector(llm=llm, prompt=ts_prompts.CURATED_FEW_SHOT_COT_PROMPT)
column_selector = CoreColumnSelector(llm=llm, prompt=cs_prompts.CURATED_FEW_SHOT_COT_PROMPT)
sql_generator = CoreSqlGenerator(llm=llm, prompt=sg_prompts.CURATED_FEW_SHOT_COT_PROMPT)

executor = CoreLinearExecutor.from_excel(
    filepath="data/my_database.xlsx",
    dataset_name="my_dataset",
    project_id="gcp-project-id",
    core_table_selector=table_selector,
    core_column_selector=column_selector,
    core_sql_generator=sql_generator
)

result = executor("my_dataset", "What is the average order price?")
print(result.generated_query)
df = executor.fetch_result(result)
print(df)
```

--------------------------------

### Dataset Management

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Methods for initializing datasets from connection strings and fetching standard benchmark datasets.

```APIDOC
## Dataset.from_connection_strings

### Description
Initializes a dataset object from a map of database names to connection strings, with support for entity exclusion and data dictionaries.

### Method
POST (Python Class Method)

### Parameters
- **name_connstr_map** (dict) - Required - Map of database names to connection strings.
- **exclude_entities** (list) - Optional - List of entities to exclude in database.table.column format.
- **data_dictionary** (dict) - Optional - Metadata for tables and columns.

### Response
- **Dataset** (object) - Returns a Dataset instance containing the specified databases.
```

--------------------------------

### Initialize CoreLinearExecutor from Connection Strings

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Shows how to create an executor instance by mapping database names to SQLAlchemy connection strings. This allows the framework to interact with diverse database systems like PostgreSQL and BigQuery.

```python
from nl2sql.executors.linear_executor.core import CoreLinearExecutor

connection_map = {
    "sales_db": "postgresql://user:pass@localhost:5432/sales",
    "inventory_db": "bigquery://project-id/inventory_dataset"
}

executor = CoreLinearExecutor.from_connection_string_map(
    connection_string_map=connection_map
)

result = executor("sales_db", "What are the top 10 products by revenue?")
print(result.generated_query)
```

--------------------------------

### Few-shot Chain-of-Thought Prompting for Complex Queries

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Demonstrates using a few-shot Chain-of-Thought prompt with CoreSqlGenerator for complex SQL queries. It shows how to instantiate the generator and access the intermediate reasoning steps and the generated query.

```python
generator3 = CoreSqlGenerator(
    llm=llm,
    prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT
)
result3 = generator3(db=database, question=question)

# Access the reasoning process
print(result3.intermediate_steps[0]['parsed_response']['thoughts'])
print(result3.intermediate_steps[0]['parsed_response']['query'])
```

--------------------------------

### CoreEvalFix: Evaluating and Fixing SQL Queries

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Illustrates the CoreEvalFix task for evaluating generated SQL queries and automatically fixing syntax errors using retry logic. It includes fetching a dataset, initializing an LLM, defining a broken query, and then attempting to fix it.

```python
from nl2sql.datasets import fetch_dataset
from nl2sql.llms.vertexai import text_bison_32k
from nl2sql.tasks.eval_fix.core import CoreEvalFix, prompts

dataset = fetch_dataset("spider.test")
database = dataset.get_database("pets_1")
llm = text_bison_32k()

# A query with potential issues
question = "Find the average weight for each pet type."
broken_query = "SELECT pet_type, AVERAGE(weight) FROM pet GROUP BY pet_type"

# Create eval/fix task with retry configuration
eval_fix = CoreEvalFix(
    llm=llm,
    prompt=prompts.CURATED_ZERO_SHOT_PROMPT,
    num_retries=10  # Maximum fix attempts
)

# Attempt to fix the query
result = eval_fix(db=database, question=question, query=broken_query)

print(f"Original query: {result.original_query}")
print(f"Fixed query: {result.modified_query}")
# Output: Fixed query: SELECT pet_type, AVG(weight) FROM pets GROUP BY pet_type

# View fix attempts in intermediate steps
for step in result.intermediate_steps:
    print(step)
```

--------------------------------

### Environment Configuration for NL2SQL

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Shows how to configure environment variables for the NL2SQL project, specifically enabling local analytics and logging by setting the NL2SQL_ENABLE_ANALYTICS environment variable to '1'.

```python
import os

# Enable local analytics/logging
os.environ["NL2SQL_ENABLE_ANALYTICS"] = "1"
```

--------------------------------

### CoreSqlGenerator

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Task for generating executable SQL queries from natural language questions.

```APIDOC
## CoreSqlGenerator

### Description
Generates SQL queries using filtered tables and columns, supporting dialect-specific templates.

### Method
POST (Task Execution)

### Parameters
- **db** (Database) - Required - The database instance.
- **question** (str) - Required - The natural language query.

### Response
- **result** (object) - Contains 'generated_query' string.
```

--------------------------------

### Configure NL2SQL Logging and GCP Project

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Sets environment variables to configure logging to a Google Cloud Storage bucket and disable system info for privacy. It also sets the Google Cloud project for BigQuery dataset operations. These settings are crucial for managing logs and specifying the GCP environment for the NL2SQL framework.

```python
import os
os.environ["NL2SQL_LOG_BUCKET"] = "my-logs-bucket"
os.environ["NL2SQL_DISABLE_SYSINFO"] = "1"
os.environ["GOOGLE_CLOUD_PROJECT"] = "my-gcp-project"
```

--------------------------------

### Initialize CoreLinearExecutor for NL2SQL

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Imports and initializes the CoreLinearExecutor from the nl2sql library. This executor is fundamental for processing natural language queries and converting them into SQL. Once initialized, all subsequent executor and task operations will be logged according to the previously set environment variables.

```python
from nl2sql.executors.linear_executor.core import CoreLinearExecutor
# Now all executor and task operations will be logged
```

--------------------------------

### Fetch Benchmark Datasets

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Utility function to load standard benchmark datasets like Spider for testing and evaluation purposes.

```python
from nl2sql.datasets import fetch_dataset

train_dataset = fetch_dataset("spider.train")
test_dataset = fetch_dataset("spider.test")
database = train_dataset.get_database("culture_company")
print(train_dataset.list_databases)
```

--------------------------------

### Generate SQL Queries

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Uses CoreSqlGenerator to produce SQL queries from natural language questions, utilizing filtered tables and columns with support for various prompt strategies.

```python
from nl2sql.datasets import fetch_dataset
from nl2sql.llms.vertexai import text_bison_32k
from nl2sql.tasks.sql_generation.core import CoreSqlGenerator, prompts

dataset = fetch_dataset("spider.test")
database = dataset.get_database("pets_1")
llm = text_bison_32k()
question = "Find the average weight for each pet type."

generator1 = CoreSqlGenerator(llm=llm)
result1 = generator1(db=database, question=question)

generator2 = CoreSqlGenerator(llm=llm, prompt=prompts.CURATED_ZERO_SHOT_PROMPT)
result2 = generator2(db=database, question=question)
```

--------------------------------

### CoreLinearExecutor from Excel

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Initializes the executor from an Excel file, automatically creating a BigQuery dataset schema.

```APIDOC
## POST /executor/from_excel

### Description
Creates an executor instance by parsing an Excel file and mapping it to a BigQuery dataset.

### Method
POST

### Endpoint
/executor/from_excel

### Parameters
#### Request Body
- **filepath** (string) - Required - Path to the Excel file.
- **dataset_name** (string) - Required - Name for the target dataset.
- **project_id** (string) - Required - GCP project ID.

### Request Example
{
  "filepath": "data/my_database.xlsx",
  "dataset_name": "my_dataset",
  "project_id": "gcp-project-id"
}
```

--------------------------------

### CoreLinearExecutor from Connection String

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Initializes the executor using a mapping of database names to SQLAlchemy connection strings.

```APIDOC
## POST /executor/from_connection_string

### Description
Configures the executor to interface with external databases like PostgreSQL, MySQL, or BigQuery via connection strings.

### Method
POST

### Endpoint
/executor/from_connection_string

### Parameters
#### Request Body
- **connection_string_map** (object) - Required - Dictionary mapping database names to connection strings.

### Request Example
{
  "connection_string_map": {
    "sales_db": "postgresql://user:pass@localhost:5432/sales",
    "inventory_db": "bigquery://project-id/inventory_dataset"
  }
}
```

--------------------------------

### CoreLinearExecutor Execution

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Orchestrates the complete NL2SQL pipeline including table selection, column selection, and SQL generation.

```APIDOC
## POST /executor/run

### Description
Executes the linear pipeline to convert a natural language question into a SQL query using a specified database context.

### Method
POST

### Endpoint
/executor/run

### Parameters
#### Request Body
- **db_name** (string) - Required - The identifier for the database connection.
- **question** (string) - Required - The natural language query to be converted.

### Request Example
{
  "db_name": "pets_1",
  "question": "Find the average weight for each pet type."
}

### Response
#### Success Response (200)
- **generated_query** (string) - The resulting SQL query.
- **selected_tables** (list) - List of tables identified for the query.
- **selected_columns** (list) - List of columns identified for the query.
```

--------------------------------

### Perform Table Selection

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Uses CoreTableSelector to identify relevant tables from a database based on a natural language question. Supports zero-shot and few-shot Chain-of-Thought prompting.

```python
from nl2sql.datasets import fetch_dataset
from nl2sql.llms.vertexai import text_bison_latest
from nl2sql.tasks.table_selection.core import CoreTableSelector, prompts

dataset = fetch_dataset("spider.train")
database = dataset.get_database("culture_company")
llm = text_bison_latest()
question = "What are all company names that have a corresponding movie directed in 1999?"

selector1 = CoreTableSelector(llm=llm)
result1 = selector1(db=database, question=question)

selector2 = CoreTableSelector(llm=llm, prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT)
result2 = selector2(db=database, question=question)
```

--------------------------------

### CoreTableSelector

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Task for identifying relevant tables from a database based on a natural language question.

```APIDOC
## CoreTableSelector

### Description
Selects relevant tables from a database using LLM prompting strategies like zero-shot or few-shot Chain-of-Thought.

### Method
POST (Task Execution)

### Parameters
- **db** (Database) - Required - The database instance.
- **question** (str) - Required - The natural language query.
- **prompt** (PromptTemplate) - Optional - The prompting strategy to use.

### Response
- **result** (object) - Contains 'available_tables' and 'selected_tables'.
```

--------------------------------

### Perform Column Selection

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Uses CoreColumnSelector to identify relevant columns from pre-selected tables based on a natural language question.

```python
from nl2sql.datasets import fetch_dataset
from nl2sql.llms.vertexai import text_bison_32k
from nl2sql.tasks.column_selection.core import CoreColumnSelector, prompts

dataset = fetch_dataset("spider.test")
database = dataset.get_database("pets_1")
llm = text_bison_32k()
question = "Find the average weight for each pet type."

selector = CoreColumnSelector(llm=llm)
result = selector(db=database, question=question)

selector_cot = CoreColumnSelector(llm=llm, prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT)
result_cot = selector_cot(db=database, question=question)
```

--------------------------------

### CoreColumnSelector

Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt

Task for selecting relevant columns from pre-selected tables based on a natural language question.

```APIDOC
## CoreColumnSelector

### Description
Filters columns from a database based on the user's question, typically following table selection.

### Method
POST (Task Execution)

### Parameters
- **db** (Database) - Required - The database instance.
- **question** (str) - Required - The natural language query.

### Response
- **result** (object) - Contains 'available_columns' and 'selected_columns'.
```

=== COMPLETE CONTENT === This response contains all available snippets from this library. No additional content exists. Do not make further requests.