### Database Filtering with EntitySet Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Explains how to filter database schemas to include or exclude specific tables and columns using the EntitySet pattern. This is useful for managing sensitive data or focusing on relevant parts of a large schema. Examples show including specific columns and excluding sensitive ones. ```python from nl2sql.datasets.base import Dataset, Database # Create dataset dataset = Dataset.from_connection_strings( name_connstr_map={"mydb": "postgresql://user:pass@localhost/mydb"} ) database = dataset.get_database("mydb") # Filter to only include specific tables/columns filtered_db = database.filter( filters=[ "mydb.users.*", ``` ```python "mydb.orders.id", # Include only id from orders "mydb.orders.total" # Include only total from orders ], filter_type="only" # "only" keeps matching, "exclude" removes matching ) # Filter to exclude sensitive columns safe_db = database.filter( filters=[ "mydb.users.password", "mydb.users.ssn", "*.*.secret_*" # Wildcard patterns supported ], filter_type="exclude" ) # Execute queries on filtered database result = safe_db.execute("SELECT * FROM users LIMIT 10") print(result) # Returns pandas DataFrame ``` -------------------------------- ### Initialize Dataset from Connection Strings Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Demonstrates how to create a Dataset object from multiple database connection strings, including options for excluding sensitive entities and providing a data dictionary for improved context. ```python dataset = Dataset.from_connection_strings( name_connstr_map={ "ecommerce": "postgresql://user:pass@localhost/ecommerce", "analytics": "bigquery://project/analytics_dataset" }, exclude_entities=[ "ecommerce.users.password_hash", "ecommerce.users.ssn", "*.*.created_at" ], data_dictionary={ "ecommerce": { "description": "E-commerce transaction database", "tables": { "orders": { "description": "Customer orders", "columns": { "total_amount": {"description": "Order total in USD", "type": "DECIMAL"} } } } } }, enum_limit=10 ) db = dataset.get_database("ecommerce") print(dataset.list_databases) ``` -------------------------------- ### Creating Custom Prompts for SQL Generation Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Demonstrates how to define and use custom prompt templates for the CoreSqlGenerator. This involves creating Langchain PromptTemplates, defining output parsers, and configuring a custom prompt object that can be passed to the generator. ```python from langchain.prompts import PromptTemplate from langchain.output_parsers import ResponseSchema, StructuredOutputParser from nl2sql.tasks.sql_generation.core import CoreSqlGenerator from nl2sql.tasks.sql_generation.core import prompts as sg_prompts from nl2sql.llms.vertexai import text_bison_32k # Define custom output parser custom_parser = StructuredOutputParser.from_response_schemas([ ResponseSchema(name="thoughts", description="Step-by-step reasoning"), ResponseSchema(name="query", description="The SQL query") ]) # Create custom prompt template custom_template = PromptTemplate( input_variables=["question", "table_info", "dialect"], template="""You are a SQL expert. Given the following database schema: {table_info} Write a {dialect} query to answer: {question} {format_instructions} """ ) # Create custom prompt configuration custom_prompt = sg_prompts.custom_prompt( prompt_template=custom_template, parser=custom_parser, post_processor=lambda x: x.get("query"), prompt_template_id="my_custom_prompt" ) # Use custom prompt with generator llm = text_bison_32k() generator = CoreSqlGenerator(llm=llm, prompt=custom_prompt) ``` -------------------------------- ### Vertex AI LLM Configuration (text_bison_32k and text_bison_latest) Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Shows how to create instances of Vertex AI LLM models, specifically text_bison_32k for larger context windows and text_bison_latest for standard use. Includes configuration of parameters like max_output_tokens, temperature, top_p, top_k, and candidate_count, as well as utility methods for token counting. ```python from nl2sql.llms.vertexai import text_bison_32k, text_bison_latest # Standard text-bison model (3K input tokens) llm_standard = text_bison_latest( max_output_tokens=1024, temperature=0.1, top_p=0.8, top_k=40, candidate_count=3 # Generate multiple candidates ) # text-bison-32k for large schemas (24K input tokens) llm_32k = text_bison_32k( max_output_tokens=8000, temperature=0.1, top_p=0.8, top_k=40 ) # Utility methods token_count = llm_32k.get_num_tokens("SELECT * FROM users") max_tokens = llm_32k.get_max_input_tokens() # Returns 24000 ``` -------------------------------- ### Execute NL2SQL Pipeline with CoreLinearExecutor Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Demonstrates how to initialize the CoreLinearExecutor using a standard dataset and execute a natural language query. It shows how to access the generated SQL, intermediate reasoning steps, and fetch results as a DataFrame. ```python from nl2sql.datasets import fetch_dataset from nl2sql.executors.linear_executor.core import CoreLinearExecutor from nl2sql.llms.vertexai import text_bison_32k dataset = fetch_dataset("spider.test") executor = CoreLinearExecutor(dataset=dataset) db_name = "pets_1" question = "Find the average weight for each pet type." result = executor(db_name, question) print(result.generated_query) print(result.selected_tables) print(result.selected_columns) df = executor.fetch_result(result) print(df) ``` -------------------------------- ### Configure Custom NL2SQL Pipeline with Excel Data Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Illustrates creating an executor from an Excel file while injecting custom task configurations. This includes defining specific LLM settings and few-shot Chain-of-Thought prompts for table, column, and SQL generation tasks. ```python from nl2sql.executors.linear_executor.core import CoreLinearExecutor from nl2sql.llms.vertexai import text_bison_32k from nl2sql.tasks.table_selection.core import CoreTableSelector, prompts as ts_prompts from nl2sql.tasks.column_selection.core import CoreColumnSelector, prompts as cs_prompts from nl2sql.tasks.sql_generation.core import CoreSqlGenerator, prompts as sg_prompts llm = text_bison_32k() table_selector = CoreTableSelector(llm=llm, prompt=ts_prompts.CURATED_FEW_SHOT_COT_PROMPT) column_selector = CoreColumnSelector(llm=llm, prompt=cs_prompts.CURATED_FEW_SHOT_COT_PROMPT) sql_generator = CoreSqlGenerator(llm=llm, prompt=sg_prompts.CURATED_FEW_SHOT_COT_PROMPT) executor = CoreLinearExecutor.from_excel( filepath="data/my_database.xlsx", dataset_name="my_dataset", project_id="gcp-project-id", core_table_selector=table_selector, core_column_selector=column_selector, core_sql_generator=sql_generator ) result = executor("my_dataset", "What is the average order price?") print(result.generated_query) df = executor.fetch_result(result) print(df) ``` -------------------------------- ### Dataset Management Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Methods for initializing datasets from connection strings and fetching standard benchmark datasets. ```APIDOC ## Dataset.from_connection_strings ### Description Initializes a dataset object from a map of database names to connection strings, with support for entity exclusion and data dictionaries. ### Method POST (Python Class Method) ### Parameters - **name_connstr_map** (dict) - Required - Map of database names to connection strings. - **exclude_entities** (list) - Optional - List of entities to exclude in database.table.column format. - **data_dictionary** (dict) - Optional - Metadata for tables and columns. ### Response - **Dataset** (object) - Returns a Dataset instance containing the specified databases. ``` -------------------------------- ### Initialize CoreLinearExecutor from Connection Strings Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Shows how to create an executor instance by mapping database names to SQLAlchemy connection strings. This allows the framework to interact with diverse database systems like PostgreSQL and BigQuery. ```python from nl2sql.executors.linear_executor.core import CoreLinearExecutor connection_map = { "sales_db": "postgresql://user:pass@localhost:5432/sales", "inventory_db": "bigquery://project-id/inventory_dataset" } executor = CoreLinearExecutor.from_connection_string_map( connection_string_map=connection_map ) result = executor("sales_db", "What are the top 10 products by revenue?") print(result.generated_query) ``` -------------------------------- ### Few-shot Chain-of-Thought Prompting for Complex Queries Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Demonstrates using a few-shot Chain-of-Thought prompt with CoreSqlGenerator for complex SQL queries. It shows how to instantiate the generator and access the intermediate reasoning steps and the generated query. ```python generator3 = CoreSqlGenerator( llm=llm, prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT ) result3 = generator3(db=database, question=question) # Access the reasoning process print(result3.intermediate_steps[0]['parsed_response']['thoughts']) print(result3.intermediate_steps[0]['parsed_response']['query']) ``` -------------------------------- ### CoreEvalFix: Evaluating and Fixing SQL Queries Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Illustrates the CoreEvalFix task for evaluating generated SQL queries and automatically fixing syntax errors using retry logic. It includes fetching a dataset, initializing an LLM, defining a broken query, and then attempting to fix it. ```python from nl2sql.datasets import fetch_dataset from nl2sql.llms.vertexai import text_bison_32k from nl2sql.tasks.eval_fix.core import CoreEvalFix, prompts dataset = fetch_dataset("spider.test") database = dataset.get_database("pets_1") llm = text_bison_32k() # A query with potential issues question = "Find the average weight for each pet type." broken_query = "SELECT pet_type, AVERAGE(weight) FROM pet GROUP BY pet_type" # Create eval/fix task with retry configuration eval_fix = CoreEvalFix( llm=llm, prompt=prompts.CURATED_ZERO_SHOT_PROMPT, num_retries=10 # Maximum fix attempts ) # Attempt to fix the query result = eval_fix(db=database, question=question, query=broken_query) print(f"Original query: {result.original_query}") print(f"Fixed query: {result.modified_query}") # Output: Fixed query: SELECT pet_type, AVG(weight) FROM pets GROUP BY pet_type # View fix attempts in intermediate steps for step in result.intermediate_steps: print(step) ``` -------------------------------- ### Environment Configuration for NL2SQL Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Shows how to configure environment variables for the NL2SQL project, specifically enabling local analytics and logging by setting the NL2SQL_ENABLE_ANALYTICS environment variable to '1'. ```python import os # Enable local analytics/logging os.environ["NL2SQL_ENABLE_ANALYTICS"] = "1" ``` -------------------------------- ### CoreSqlGenerator Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Task for generating executable SQL queries from natural language questions. ```APIDOC ## CoreSqlGenerator ### Description Generates SQL queries using filtered tables and columns, supporting dialect-specific templates. ### Method POST (Task Execution) ### Parameters - **db** (Database) - Required - The database instance. - **question** (str) - Required - The natural language query. ### Response - **result** (object) - Contains 'generated_query' string. ``` -------------------------------- ### Configure NL2SQL Logging and GCP Project Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Sets environment variables to configure logging to a Google Cloud Storage bucket and disable system info for privacy. It also sets the Google Cloud project for BigQuery dataset operations. These settings are crucial for managing logs and specifying the GCP environment for the NL2SQL framework. ```python import os os.environ["NL2SQL_LOG_BUCKET"] = "my-logs-bucket" os.environ["NL2SQL_DISABLE_SYSINFO"] = "1" os.environ["GOOGLE_CLOUD_PROJECT"] = "my-gcp-project" ``` -------------------------------- ### Initialize CoreLinearExecutor for NL2SQL Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Imports and initializes the CoreLinearExecutor from the nl2sql library. This executor is fundamental for processing natural language queries and converting them into SQL. Once initialized, all subsequent executor and task operations will be logged according to the previously set environment variables. ```python from nl2sql.executors.linear_executor.core import CoreLinearExecutor # Now all executor and task operations will be logged ``` -------------------------------- ### Fetch Benchmark Datasets Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Utility function to load standard benchmark datasets like Spider for testing and evaluation purposes. ```python from nl2sql.datasets import fetch_dataset train_dataset = fetch_dataset("spider.train") test_dataset = fetch_dataset("spider.test") database = train_dataset.get_database("culture_company") print(train_dataset.list_databases) ``` -------------------------------- ### Generate SQL Queries Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Uses CoreSqlGenerator to produce SQL queries from natural language questions, utilizing filtered tables and columns with support for various prompt strategies. ```python from nl2sql.datasets import fetch_dataset from nl2sql.llms.vertexai import text_bison_32k from nl2sql.tasks.sql_generation.core import CoreSqlGenerator, prompts dataset = fetch_dataset("spider.test") database = dataset.get_database("pets_1") llm = text_bison_32k() question = "Find the average weight for each pet type." generator1 = CoreSqlGenerator(llm=llm) result1 = generator1(db=database, question=question) generator2 = CoreSqlGenerator(llm=llm, prompt=prompts.CURATED_ZERO_SHOT_PROMPT) result2 = generator2(db=database, question=question) ``` -------------------------------- ### CoreLinearExecutor from Excel Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Initializes the executor from an Excel file, automatically creating a BigQuery dataset schema. ```APIDOC ## POST /executor/from_excel ### Description Creates an executor instance by parsing an Excel file and mapping it to a BigQuery dataset. ### Method POST ### Endpoint /executor/from_excel ### Parameters #### Request Body - **filepath** (string) - Required - Path to the Excel file. - **dataset_name** (string) - Required - Name for the target dataset. - **project_id** (string) - Required - GCP project ID. ### Request Example { "filepath": "data/my_database.xlsx", "dataset_name": "my_dataset", "project_id": "gcp-project-id" } ``` -------------------------------- ### CoreLinearExecutor from Connection String Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Initializes the executor using a mapping of database names to SQLAlchemy connection strings. ```APIDOC ## POST /executor/from_connection_string ### Description Configures the executor to interface with external databases like PostgreSQL, MySQL, or BigQuery via connection strings. ### Method POST ### Endpoint /executor/from_connection_string ### Parameters #### Request Body - **connection_string_map** (object) - Required - Dictionary mapping database names to connection strings. ### Request Example { "connection_string_map": { "sales_db": "postgresql://user:pass@localhost:5432/sales", "inventory_db": "bigquery://project-id/inventory_dataset" } } ``` -------------------------------- ### CoreLinearExecutor Execution Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Orchestrates the complete NL2SQL pipeline including table selection, column selection, and SQL generation. ```APIDOC ## POST /executor/run ### Description Executes the linear pipeline to convert a natural language question into a SQL query using a specified database context. ### Method POST ### Endpoint /executor/run ### Parameters #### Request Body - **db_name** (string) - Required - The identifier for the database connection. - **question** (string) - Required - The natural language query to be converted. ### Request Example { "db_name": "pets_1", "question": "Find the average weight for each pet type." } ### Response #### Success Response (200) - **generated_query** (string) - The resulting SQL query. - **selected_tables** (list) - List of tables identified for the query. - **selected_columns** (list) - List of columns identified for the query. ``` -------------------------------- ### Perform Table Selection Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Uses CoreTableSelector to identify relevant tables from a database based on a natural language question. Supports zero-shot and few-shot Chain-of-Thought prompting. ```python from nl2sql.datasets import fetch_dataset from nl2sql.llms.vertexai import text_bison_latest from nl2sql.tasks.table_selection.core import CoreTableSelector, prompts dataset = fetch_dataset("spider.train") database = dataset.get_database("culture_company") llm = text_bison_latest() question = "What are all company names that have a corresponding movie directed in 1999?" selector1 = CoreTableSelector(llm=llm) result1 = selector1(db=database, question=question) selector2 = CoreTableSelector(llm=llm, prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT) result2 = selector2(db=database, question=question) ``` -------------------------------- ### CoreTableSelector Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Task for identifying relevant tables from a database based on a natural language question. ```APIDOC ## CoreTableSelector ### Description Selects relevant tables from a database using LLM prompting strategies like zero-shot or few-shot Chain-of-Thought. ### Method POST (Task Execution) ### Parameters - **db** (Database) - Required - The database instance. - **question** (str) - Required - The natural language query. - **prompt** (PromptTemplate) - Optional - The prompting strategy to use. ### Response - **result** (object) - Contains 'available_tables' and 'selected_tables'. ``` -------------------------------- ### Perform Column Selection Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Uses CoreColumnSelector to identify relevant columns from pre-selected tables based on a natural language question. ```python from nl2sql.datasets import fetch_dataset from nl2sql.llms.vertexai import text_bison_32k from nl2sql.tasks.column_selection.core import CoreColumnSelector, prompts dataset = fetch_dataset("spider.test") database = dataset.get_database("pets_1") llm = text_bison_32k() question = "Find the average weight for each pet type." selector = CoreColumnSelector(llm=llm) result = selector(db=database, question=question) selector_cot = CoreColumnSelector(llm=llm, prompt=prompts.CURATED_FEW_SHOT_COT_PROMPT) result_cot = selector_cot(db=database, question=question) ``` -------------------------------- ### CoreColumnSelector Source: https://context7.com/googlecloudplatform/nl2sql/llms.txt Task for selecting relevant columns from pre-selected tables based on a natural language question. ```APIDOC ## CoreColumnSelector ### Description Filters columns from a database based on the user's question, typically following table selection. ### Method POST (Task Execution) ### Parameters - **db** (Database) - Required - The database instance. - **question** (str) - Required - The natural language query. ### Response - **result** (object) - Contains 'available_columns' and 'selected_columns'. ``` === COMPLETE CONTENT === This response contains all available snippets from this library. No additional content exists. Do not make further requests.