### Install SQLGlot with make Source: https://github.com/tobymao/sqlglot/blob/main/README.md Install SQLGlot from a local checkout using the make command. Optionally prefix with UV=1 to use uv for installation. ```bash # Optionally prefix with UV=1 to use uv for the installation make install ``` -------------------------------- ### Install Development Requirements Source: https://github.com/tobymao/sqlglot/blob/main/README.md Install development requirements for SQLGlot using the make command. Optionally prefix with UV=1 to use uv for installation. ```bash # Optionally prefix with UV=1 to use uv for the installation make install-dev ``` -------------------------------- ### Install SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/CLAUDE.md Commands for installing SQLGlot, including development and C extension versions. Supports installation with uv for faster dependency management. ```bash # Basic installation make install ``` ```bash # Development installation (Python only, no C extension) make install-dev ``` ```bash # Development installation with mypyc C extension make install-devc ``` ```bash # Install pre-commit hooks make install-pre-commit ``` ```bash # With uv (faster): UV=1 make install-dev ``` -------------------------------- ### Install SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Install SQLGlot using make commands. Use `install-dev` for Python-only development or `install-devc` with the C extension. `install-pre-commit` sets up hooks. ```bash make install # Development installation (Python only, no C extension) make install-dev # Development installation with mypyc C extension make install-devc # Install pre-commit hooks make install-pre-commit # With uv (faster): UV=1 make install-dev ``` -------------------------------- ### Install SQLGlot with pip Source: https://github.com/tobymao/sqlglot/blob/main/README.md Install the pure Python version of SQLGlot using pip. For C extensions, use `pip3 install "sqlglot[c]"`. ```bash # Pure python version pip3 install sqlglot # C extensions compiled with mypyc # prebuilt wheel if available for your platform, otherwise builds from source pip3 install "sqlglot[c]" ``` -------------------------------- ### Example `token -> Callable` mapping for FUNCTIONS Source: https://github.com/tobymao/sqlglot/blob/main/posts/onboarding.md Demonstrates how SQLGlot maps function names to callable Python functions for AST node creation. This example shows the `FUNCTIONS` dictionary, specifically how `LOG2` and `LOG10` are handled. ```Python3 FUNCTIONS: t.Dict[str, t.Callable] = { "LOG2": lambda args: exp.Log(this=exp.Literal.number(2), expression=seq_get(args, 0)), "LOG10": lambda args: exp.Log(this=exp.Literal.number(10), expression=seq_get(args, 0)), "MOD": build_mod, …, } ``` -------------------------------- ### Example SQL Query Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md This is a sample SQL query that will be used to walk through the steps SQLGlot takes to run it over Python objects. ```sql SELECT bar.a, b + 1 AS b FROM bar JOIN baz ON bar.a = baz.a WHERE bar.a > 1 ``` -------------------------------- ### SQL Predicate Pushdown Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md Illustrates the `pushdown_predicates` optimization rule, which moves filters into the innermost possible subqueries for improved performance. ```sql SELECT * FROM ( SELECT * FROM x AS x ) AS y WHERE y.a = 1; SELECT * FROM ( SELECT * FROM x AS x WHERE y.a = 1 ) AS y WHERE TRUE ``` -------------------------------- ### SQL Table Qualification Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md Demonstrates how `qualify_tables` adds database and catalog qualifiers to tables and forces an alias. It also ensures column ambiguity is resolved and expands star selections. ```sql SELECT * FROM x; SELECT "db"."x" AS "x"; ``` -------------------------------- ### SQL Simplification Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md Illustrates boolean and mathematical simplification rules within SQLGlot's optimizer. This rule transforms expressions into their simplest equivalent forms. ```sql ((NOT FALSE) AND (x = x)) AND (TRUE OR 1 <> 3); x = x; 1 + 1; 2; ``` -------------------------------- ### SQL Query Diff Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/sql_diff.md Demonstrates generating a diff between two SQL queries by parsing them into ASTs and then using the diff function. The output shows the sequence of edits (Remove, Insert, Keep) required to transform the source query into the target query. ```python >>> from sqlglot import parse_one, diff >>> diff(parse_one("SELECT a + b + c, d, e"), parse_one("SELECT a - b + c, e, f")) Remove(Add) Remove(Column(d)) Remove(Identifier(d)) Insert(Sub) Insert(Column(f)) Insert(Identifier(f)) Keep(Select, Select) Keep(Add, Add) Keep(Column(a), Column(a)) Keep(Identifier(a), Identifier(a)) Keep(Column(b), Column(b)) Keep(Identifier(b), Identifier(b)) Keep(Column(c), Column(c)) Keep(Identifier(c), Identifier(c)) Keep(Column(e), Column(e)) Keep(Identifier(e), Identifier(e)) ``` -------------------------------- ### SQL Normalization to DNF and CNF Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md Shows the conversion of SQL predicates into Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF). This is part of the normalization process. ```sql -- DNF (A AND B) OR (B AND C AND D); -- CNF (A OR C) AND (A OR D) AND B; ``` -------------------------------- ### Optimized SQL Diff with Schema Source: https://github.com/tobymao/sqlglot/blob/main/posts/sql_diff.md Compares two SQL queries after optimizing them using a provided schema. This example shows how to parse, optimize, and then diff queries, finally counting the non-Keep edits. ```python >>> schema={\"t\": {\"a\": \"INT\", \"b\": \"INT\", \"c\": \"INT\", \"d\": \"INT\"}} >>> source = """ ... SELECT 1 + 1 + a ... FROM t ... WHERE b = 1 OR (c = 2 AND d = 3) ... """ >>> target = """ ... SELECT 2 + a ... FROM t ... WHERE (b = 1 OR c = 2) AND (b = 1 OR d = 3) ... """ >>> optimized_source = optimize(parse_one(source), schema=schema) >>> optimized_target = optimize(parse_one(target), schema=schema) >>> edit_script = diff(optimized_source, optimized_target) >>> sum(0 if isinstance(e, Keep) else 1 for e in edit_script) 0 ``` -------------------------------- ### Inspect SQL AST Representation Source: https://github.com/tobymao/sqlglot/blob/main/README.md Demonstrates how to get the Abstract Syntax Tree (AST) representation of a parsed SQL query using the `repr` function. ```python from sqlglot import parse_one print(repr(parse_one("SELECT a + 1 AS z"))) ``` ```text Select( expressions=[ Alias( this=Add( this=Column( this=Identifier(this=a, quoted=False)), expression=Literal(this=1, is_string=False)), alias=Identifier(this=z, quoted=False))]) ``` -------------------------------- ### SQL Unnesting Subqueries Example Source: https://github.com/tobymao/sqlglot/blob/main/posts/python_sql_engine.md Demonstrates how `unnest_subqueries` converts subqueries in predicates into equivalent left joins, simplifying the query structure. ```sql -- The subquery can be converted into a left join SELECT * FROM x AS x WHERE ( SELECT y.a AS a FROM y AS y WHERE x.a = y.a ) = 1; SELECT * FROM x AS x LEFT JOIN ( SELECT y.a AS a FROM y AS y WHERE TRUE GROUP BY y.a ) AS "_u_0" ON x.a = "_u_0".a WHERE ("_u_0".a = 1 AND NOT "_u_0".a IS NULL) ``` -------------------------------- ### Transform AST Nodes Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Applies a transformation function to each node in an AST, allowing for modification of the SQL structure. This example replaces columns named 'a' with a function call. ```python from sqlglot import parse_one, exp def transformer(node): if isinstance(node, exp.Column) and node.name == "a": return parse_one("FUN(a)", dialect="dialect") return node transformed = tree.transform(transformer) ``` -------------------------------- ### Transpile SQL to Spark SQL with Identifier Delimitation Source: https://github.com/tobymao/sqlglot/blob/main/README.md Translate a SQL query to Spark SQL, format it, and delimit all identifiers using backticks. This example also demonstrates casting to `FLOAT` which is Spark SQL's equivalent of `REAL`. ```python import sqlglot # Spark SQL requires backticks (`) for delimited identifiers and uses `FLOAT` over `REAL` sql = """WITH baz AS (SELECT a, c FROM foo WHERE a = 1) SELECT f.a, b.b, baz.c, CAST("b"."a" AS REAL) d FROM foo f JOIN bar b ON f.a = b.a LEFT JOIN baz ON f.a = baz.a""" # Translates the query into Spark SQL, formats it, and delimits all of its identifiers print(sqlglot.transpile(sql, write="spark", identify=True, pretty=True)[0]) ``` ```sql WITH `baz` AS ( SELECT `a`, `c` FROM `foo` WHERE `a` = 1 ) SELECT `f`.`a`, `b`.`b`, `baz`.`c`, CAST(`b`.`a` AS FLOAT) AS `d` FROM `foo` AS `f` JOIN `bar` AS `b` ON `f`.`a` = `b`.`a` LEFT JOIN `baz` ON `f`.`a` = `baz`.`a` ``` -------------------------------- ### Serve API Documentation Locally Source: https://github.com/tobymao/sqlglot/blob/main/README.md Use the `make docs-serve` command to build and serve the SQLGlot API documentation locally using pdoc. ```bash make docs-serve ``` -------------------------------- ### Run SQLGlot Tests and Linters Source: https://github.com/tobymao/sqlglot/blob/main/README.md Execute various testing and linting commands using make. Options include style checks, unit tests, integration tests, and compiled tests. ```bash make style # Only linter checks ``` ```bash make unit # Only unit tests (pure Python) ``` ```bash make test # Unit and integration tests (pure Python) ``` ```bash make unitc # Only unit tests (mypyc compiled) ``` ```bash make testc # Unit and integration tests (mypyc compiled) ``` ```bash make check # Full test suite & linter checks ``` ```bash make clean # Remove compiled C artifacts (.so files, build dirs) ``` -------------------------------- ### Benchmark SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Run benchmarks using make commands. `bench` runs the parsing benchmark, and `bench-optimize` runs the optimization benchmark. ```bash # Run parsing benchmark make bench # Run optimization benchmark make bench-optimize ``` -------------------------------- ### Test SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Run tests using make commands. `test` runs all tests, `testc` includes the C extension. `unit` and `unitc` run only unit tests. Specific tests can be run with `python -m unittest`. ```bash # Run all tests (pure Python, hides .so files during run) make test # Run all tests with mypyc C extension (builds extension first) make testc # Run only unit tests (skip integration tests, pure Python) make unit # Run only unit tests with C extension make unitc # Run specific test file python -m unittest tests.test_expressions # Run specific test class python -m unittest tests.test_expressions.TestExpressions # Run specific test method python -m unittest tests.test_expressions.TestExpressions.test_alias ``` -------------------------------- ### Deploy New SQLGlot Version Source: https://github.com/tobymao/sqlglot/blob/main/README.md Follow these steps to deploy a new SQLGlot version: pull the latest changes, tag the new version, and push the tags. ```bash git pull ``` ```bash git tag v28.5.0 ``` ```bash git push && git push --tags ``` -------------------------------- ### Define Custom SQL Dialect in Python Source: https://github.com/tobymao/sqlglot/blob/main/README.md Subclass the Dialect class to define custom SQL tokenizers and generators. This example shows how to set custom quotes, identifiers, keywords, and type mappings. ```python from sqlglot import exp from sqlglot.dialects.dialect import Dialect from sqlglot.generator import Generator from sqlglot.tokens import Tokenizer, TokenType class Custom(Dialect): class Tokenizer(Tokenizer): QUOTES = ["'", '"'] IDENTIFIERS = ["`"] KEYWORDS = { **Tokenizer.KEYWORDS, "INT64": TokenType.BIGINT, "FLOAT64": TokenType.DOUBLE, } class Generator(Generator): TRANSFORMS = {exp.Array: lambda self, e: f"[{self.expressions(e)}]"} TYPE_MAPPING = { exp.DataType.Type.TINYINT: "INT64", exp.DataType.Type.SMALLINT: "INT64", exp.DataType.Type.INT: "INT64", exp.DataType.Type.BIGINT: "INT64", exp.DataType.Type.DECIMAL: "NUMERIC", exp.DataType.Type.FLOAT: "FLOAT64", exp.DataType.Type.DOUBLE: "FLOAT64", exp.DataType.Type.BOOLEAN: "BOOL", exp.DataType.Type.TEXT: "STRING", } print(Dialect["custom"]) ``` -------------------------------- ### Build SQL Expressions Incrementally Source: https://github.com/tobymao/sqlglot/blob/main/README.md Shows how to construct SQL queries by chaining methods to build expressions, including adding conditions. ```python from sqlglot import select, condition where = condition("x=1").and_("y=1") select("*").from_("y").where(where).sql() ``` ```sql 'SELECT * FROM y WHERE x = 1 AND y = 1' ``` -------------------------------- ### Run Optimization Benchmark with SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/README.md Execute the optimization benchmark suite for SQLGlot. This command runs tests focused on the performance of SQLGlot's optimization features. ```bash make bench-optimize ``` -------------------------------- ### Register Custom SQLGlot Dialect Source: https://github.com/tobymao/sqlglot/blob/main/README.md Register a custom dialect for SQLGlot using setuptools entry points in your setup.py file. This allows your custom dialect to be discovered and used by SQLGlot. ```python from setuptools import setup setup( name="mydb-sqlglot-dialect", entry_points={ "sqlglot.dialects": [ "mydb = my_package.dialect:MyDB", ], }, ) ``` -------------------------------- ### Optimize SQL Queries with Schema Source: https://github.com/tobymao/sqlglot/blob/main/README.md Shows how to optimize a SQL query using the `optimize` function, providing a schema for type inference and analysis. ```python import sqlglot from sqlglot.optimizer import optimize print( optimize( sqlglot.parse_one(""" SELECT A OR (B OR (C AND D)) FROM x WHERE Z = date '2021-01-01' + INTERVAL '1' month OR 1 = 0 """), schema={"x": {"A": "INT", "B": "INT", "C": "INT", "D": "INT", "Z": "STRING"}} ).sql(pretty=True) ) ``` ```sql SELECT ( "x"."a" <> 0 OR "x"."b" <> 0 OR "x"."c" <> 0 ) AND ( "x"."a" <> 0 OR "x"."b" <> 0 OR "x"."d" <> 0 ) AS "_col_0" FROM "x" AS "x" WHERE CAST("x"."z" AS DATE) = CAST('2021-02-01' AS DATE) ``` -------------------------------- ### Instantiate AST Nodes Directly Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md Create AST nodes directly by instantiating `exp.Column` or other expression classes. Ensure the arguments are valid to produce a correctly generating SQL AST. ```python col = exp.Column( this=exp.to_identifier("c") ) node.append("expressions", col) ``` -------------------------------- ### Run Parsing Benchmark with SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/README.md Execute the parsing benchmark suite for SQLGlot. This command initiates a series of tests to measure the performance of SQLGlot's parsing capabilities. ```bash make bench ``` -------------------------------- ### Build Initial Matching Set from Candidates Source: https://github.com/tobymao/sqlglot/blob/main/posts/sql_diff.md Builds the initial set of matched leaf nodes by extracting pairs with the highest similarity scores from a candidate heap. ```python matching_set = set() while candidate_matchings: ``` -------------------------------- ### Benchmark SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/CLAUDE.md Commands for running performance benchmarks in SQLGlot, specifically for parsing and optimization. ```bash # Run parsing benchmark make bench ``` ```bash # Run optimization benchmark make bench-optimize ``` -------------------------------- ### Lint and Type Check SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Use make commands for linting and type checking. `style` runs the linter and formatter. `check` performs full checks including tests. ```bash # Run linter and formatter only make style # Run full checks (style + pure Python tests + C extension tests) make check ``` -------------------------------- ### SQLGlot: Use Expression Builders for Building Expressions Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Utilize helper functions like `exp.func()`, `exp.array()`, `exp.and_()`, etc., for building expressions. These provide a cleaner interface and handle parsing automatically compared to direct class instantiation. ```python exp.func("name", *args) ``` ```python exp.array(e1, e2, ...) ``` ```python exp.and_(e1, e2, ...) ``` ```python exp.or_(e1, e2, ...) ``` ```python exp.case().when(cond, val).else_(default) ``` ```python exp.cast(expr, "TYPE") ``` ```python exp.column("col", "table") ``` ```python exp.null() ``` -------------------------------- ### SQLGlot: Use SQL Templates for Complex Expressions Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md For complex SQL generation, use templates with `exp.maybe_parse()` and `exp.replace_placeholders()`. This approach is more maintainable than manually constructing SQL strings. ```python # Define template with :placeholder syntax MY_TEMPLATE: exp.Expression = exp.maybe_parse( "CASE WHEN :arg IS NULL THEN NULL ELSE :result END" ) # In generator method def myfunc_sql(self, expression): result = exp.replace_placeholders( self.MY_TEMPLATE.copy(), arg=expression.this, result=some_expression, ) return self.sql(result) ``` -------------------------------- ### Find actual tables using scope analysis Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md After building the scope, iterate through `scope.selected_sources` to identify actual `exp.Table` instances, distinguishing them from CTEs or subqueries. This provides a reliable method for finding tables. ```python tables = [ source # Traverse the Scope tree, not the AST for scope in root.traverse() # `selected_sources` contains sources that have been selected in this scope, e.g. in a FROM or JOIN clause. # `alias` is the name of this source in this particular scope. # `node` is the AST node instance # if the selected source is a subquery (including common table expressions), # then `source` will be the Scope instance for that subquery. # if the selected source is a table, # then `source` will be a Table instance. for alias, (node, source) in scope.selected_sources.items() if isinstance(source, exp.Table) ] for table in tables: print(table) # y -- Success! ``` -------------------------------- ### SQLGlot: Use Existing Expression Classes Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Prefer using existing, dedicated expression classes like `exp.FromBase64` over `exp.Anonymous` when available. Check `expressions.py` for existing classes before creating anonymous ones. ```python from_base64 = exp.FromBase64(this=input_expr) ``` -------------------------------- ### Test SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/CLAUDE.md Commands for running tests in SQLGlot, including options for pure Python, C extension, unit tests, and specific test files or methods. ```bash # Run all tests (pure Python, hides .so files during run) make test ``` ```bash # Run all tests with mypyc C extension (builds extension first) make testc ``` ```bash # Run only unit tests (skip integration tests, pure Python) make unit ``` ```bash # Run only unit tests with C extension make unitc ``` ```bash # Run specific test file python -m unittest tests.test_expressions ``` ```bash # Run specific test class python -m unittest tests.test_expressions.TestExpressions ``` ```bash # Run specific test method python -m unittest tests.test_expressions.TestExpressions.test_alias ``` -------------------------------- ### Handle Unsupported Transpilation Errors Source: https://github.com/tobymao/sqlglot/blob/main/README.md Demonstrates how to handle unsupported translations by default (warning) or by raising an error using `unsupported_level`. ```python import sqlglot sqlglot.transpile("SELECT APPROX_DISTINCT(a, 0.1) FROM foo", read="presto", write="hive") ``` ```sql APPROX_COUNT_DISTINCT does not support accuracy 'SELECT APPROX_COUNT_DISTINCT(a) FROM foo' ``` ```python import sqlglot sqlglot.transpile("SELECT APPROX_DISTINCT(a, 0.1) FROM foo", read="presto", write="hive", unsupported_level=sqlglot.ErrorLevel.RAISE) ``` ```text sqlglot.errors.UnsupportedError: APPROX_COUNT_DISTINCT does not support accuracy ``` -------------------------------- ### Build AST with High-Level Methods Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md Use high-level builder methods to programmatically construct SQL expressions. String arguments are parsed, so be mindful of the dialect. Pass Expressions directly to avoid parsing. ```python ast = ( exp .select("a", "b") .from_("x") .where("b < 4") .limit(10) ) ``` -------------------------------- ### Initialize MappingSchema in Python Source: https://github.com/tobymao/sqlglot/blob/main/posts/onboarding.md Create a MappingSchema object to represent database schema structure. This is used by other SQLGlot modules like the optimizer and for column-level lineage. ```Python schema = MappingSchema({"t": {"x": "int", "y": "datetime"}}) ``` -------------------------------- ### Lint and Type Check SQLGlot Source: https://github.com/tobymao/sqlglot/blob/main/CLAUDE.md Commands for linting and type checking the SQLGlot codebase. 'make style' runs linter and formatter, while 'make check' performs full checks including tests. ```bash # Run linter and formatter only make style ``` ```bash # Run full checks (style + pure Python tests + C extension tests) make check ``` -------------------------------- ### Transpile SQL between dialects Source: https://github.com/tobymao/sqlglot/blob/main/README.md Specify both read and write dialects when transpiling SQL. The default is the SQLGlot dialect. ```python parse_one(sql, dialect="spark").sql(dialect="duckdb") ``` ```python transpile(sql, read="spark", write="duckdb") ``` -------------------------------- ### Calculate Semantic Difference Between SQL Expressions Source: https://github.com/tobymao/sqlglot/blob/main/README.md Shows how to use the `diff` function to find the semantic differences between two SQL expressions and represent them as a sequence of actions. ```python from sqlglot import diff, parse_one diff(parse_one("SELECT a + b, c, d"), parse_one("SELECT c, a - b, d")) ``` ```text [ Remove(expression=Add( this=Column( this=Identifier(this=a, quoted=False)), expression=Column( this=Identifier(this=b, quoted=False)))), Insert(expression=Sub( this=Column( this=Identifier(this=a, quoted=False)), expression=Column( this=Identifier(this=b, quoted=False)))), Keep( source=Column(this=Identifier(this=a, quoted=False)), target=Column(this=Identifier(this=a, quoted=False))), ... ] ``` -------------------------------- ### Inspect AST Structure with repr Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md Utilize Python's `repr` function to view the hierarchical structure of a SQLGlot AST. ```python repr(ast) # Select( # expressions=[ # Column( # this=Identifier(this=a, quoted=False))], # from=From( # this=Subquery( # this=Select( # expressions=[ # Column( # this=Identifier(this=a, quoted=False))], # from=From( # this=Table( # this=Identifier(this=x, quoted=False)))) # alias=TableAlias( # this=Identifier(this=x, quoted=False))))) # ``` -------------------------------- ### SQL Expression with Dateutil Dependency Source: https://github.com/tobymao/sqlglot/blob/main/README.md This SQL expression demonstrates the use of interval arithmetic. The optimizer requires the `dateutil` module to simplify expressions like this, specifically for month-based intervals. ```sql x + interval '1' month ``` -------------------------------- ### Low-Level AST Manipulation Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md Utilize low-level builder methods like `set` and `append` when high-level methods are unavailable. Prefer these over direct mutation of `Expression.args` to ensure proper reference updates. ```python node = ast.args["from"].this.this # These all do the same thing: # high-level node.select("c", copy=False) # low-level node.set("expressions", node.expressions + [exp.column("c")]) node.append("expressions", exp.column("c")) node.replace(node.copy().select("c")) ``` -------------------------------- ### Execute SQL with Python Objects Source: https://github.com/tobymao/sqlglot/blob/main/README.md Use the `execute` function to run SQL queries against Python dictionaries representing tables. This is useful for unit testing and native execution. ```python from sqlglot.executor import execute tables = { "sushi": [ {"id": 1, "price": 1.0}, {"id": 2, "price": 2.0}, {"id": 3, "price": 3.0}, ], "order_items": [ {"sushi_id": 1, "order_id": 1}, {"sushi_id": 1, "order_id": 1}, {"sushi_id": 2, "order_id": 1}, {"sushi_id": 3, "order_id": 2}, ], "orders": [ {"id": 1, "user_id": 1}, {"id": 2, "user_id": 2}, ], } execute( """ SELECT o.user_id, SUM(s.price) AS price FROM orders o JOIN order_items i ON o.id = i.order_id JOIN sushi s ON i.sushi_id = s.id GROUP BY o.user_id """, tables=tables ) ``` -------------------------------- ### Modify Parsed SQL Trees Source: https://github.com/tobymao/sqlglot/blob/main/README.md Illustrates how to parse a SQL query and then modify its structure, such as changing the FROM clause. ```python from sqlglot import parse_one parse_one("SELECT x FROM y").from_("z").sql() ``` ```sql 'SELECT x FROM z' ``` -------------------------------- ### Transpile DuckDB to Hive Date/Time Functions Source: https://github.com/tobymao/sqlglot/blob/main/README.md Translate date/time functions from DuckDB dialect to Hive dialect. Ensure the input SQL is valid DuckDB syntax. ```python import sqlglot sqlglot.transpile("SELECT EPOCH_MS(1618088028295)", read="duckdb", write="hive")[0] ``` ```sql 'SELECT FROM_UNIXTIME(1618088028295 / POW(10, 3))' ``` -------------------------------- ### Build SQL Programmatically Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Constructs SQL queries programmatically using SQLGlot's expression builders. Useful for dynamically generating complex SQL statements. ```python from sqlglot import select, condition select("*").from_("y").where(condition("x=1").and_("y=1")).sql() ``` -------------------------------- ### Trace column lineage using scope and qualified AST Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md After qualifying the AST, use `find_all_in_scope` with the scope root to identify the source (table or subquery) for each column. This method avoids traversing into subqueries, focusing on direct sources. ```python from sqlglot.optimizer.scope import find_all_in_scope root = build_scope(ast) # `find_all_in_scope` is similar to `Expression.find_all`, except it doesn't traverse into subqueries for column in find_all_in_scope(root.expression, exp.Column): print(f"{column} => {root.sources[column.table]}") # x.a => Scope # x.b => Scope ``` -------------------------------- ### Parse SQL to AST Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Parses a SQL string into an Abstract Syntax Tree (AST) for a specified dialect. Useful for analyzing or transforming SQL. ```python import sqlglot expression = sqlglot.parse_one("SELECT * FROM table", dialect="spark") ``` -------------------------------- ### Qualify Tables and Columns in a Query Source: https://github.com/tobymao/sqlglot/blob/main/posts/onboarding.md Employ the `qualify` function to unambiguously qualify table and column names within a SQL query. Provide a schema to define table structures and ensure proper aliasing and prefixing. ```Python import sqlglot from sqlglot.optimizer.qualify import qualify schema = {"tbl": {"col": "INT"}} expression = sqlglot.parse_one("SELECT col FROM tbl") qualify(expression, schema=schema).sql() # 'SELECT "tbl"."col" AS "col" FROM "tbl" AS "tbl"' ``` -------------------------------- ### Find All Tables in SQL JOIN Statements Source: https://github.com/tobymao/sqlglot/blob/main/README.md Use `find_all(exp.Table)` to locate all table references within a SQL query, including those involved in JOIN operations. This is useful for schema analysis. ```python from sqlglot import parse_one, exp # find all tables (x, y, z) for table in parse_one("SELECT * FROM x JOIN y JOIN z").find_all(exp.Table): print(table.name) ``` -------------------------------- ### Generate Candidate Leaf Matchings Source: https://github.com/tobymao/sqlglot/blob/main/posts/sql_diff.md Generates a list of candidate matchings between leaf nodes of source and target SQL ASTs. Filters based on node type and similarity score (>= 0.6). ```python from heapq import heappush, heappop candidate_matchings = [] source_leaves = _get_leaves(self._source) target_leaves = _get_leaves(self._target) for source_leaf in source_leaves: for target_leaf in target_leaves: if _is_same_type(source_leaf, target_leaf): similarity_score = dice_coefficient( source_leaf, target_leaf ) if similarity_score >= 0.6: heappush( candidate_matchings, ( -similarity_score, len(candidate_matchings), source_leaf, target_leaf, ), ) ``` -------------------------------- ### SQL Generation with Generator Helper Methods Source: https://github.com/tobymao/sqlglot/blob/main/CLAUDE.md For generating function calls within generator methods, use `self.func()` for simplicity and clarity. Avoid constructing `exp.Func` directly. ```python def myfunc_sql(self, expression): # Don't: return self.sql(exp.Func(this="MY_FUNC", expressions=[expression.this])) # Do: return self.func("MY_FUNC", expression.this) ``` -------------------------------- ### SQLGlot: Use TRANSFORMS for Renaming Functions Source: https://github.com/tobymao/sqlglot/blob/main/AGENTS.md Use the `TRANSFORMS` dictionary for simple one-liners like `rename_func("OTHER_NAME")` or when a function needs to be renamed. For any single entry point function, prefer auto-discovered methods. ```python class Generator: TRANSFORMS = { exp.ArrayLength: rename_func("LENGTH"), } ``` -------------------------------- ### Build and traverse scope for semantic analysis Source: https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md Use `build_scope` to create a scope tree from an AST, which provides semantic context for queries. Traverse the scope tree using `.traverse()` to analyze query structure beyond the raw AST. ```python from sqlglot.optimizer.scope import build_scope ast = parse_one('' WITH x AS ( SELECT a FROM y ) SELECT a FROM x '') root = build_scope(ast) for scope in root.traverse(): print(scope) # Scope