### Install dbt-postgres Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Install the dbt-postgres adapter using pip. This command also installs dbt-core and other necessary dependencies. ```bash pip install dbt-postgres ``` -------------------------------- ### Compile and Install Hydra Columnar from Source Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/README.md Clone the Hydra repository, navigate to the columnar directory, configure, and install the extension. This is for advanced users or custom builds. ```bash git clone https://github.com/hydradatabase/hydra.git cd columnar && ./configure && make install ``` -------------------------------- ### Start Hydra Docker Instance Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Fetches the Hydra Docker image and starts a new instance named 'hydra'. It configures the database name, user, and password, and maps port 5432. This command is run in your terminal. ```bash docker run --name hydra -e POSTGRES_DB=db \ -e POSTGRES_USER=user \ -e POSTGRES_PASSWORD=password \ -p 5432:5432 -d ghcr.io/hydradatabase/hydra ``` -------------------------------- ### dbt Postgres Profile Configuration Example Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Example of a dbt profile configuration for a PostgreSQL target in `profiles.yml`. Includes essential connection parameters and optional advanced settings. ```yaml company-name: target: dev outputs: dev: type: postgres host: [hostname] user: [username] password: [password] port: [port] dbname: [database name] schema: [dbt schema] threads: [1 or more] keepalives_idle: 0 # default 0, indicating the system default. See below connect_timeout: 10 # default 10 seconds retries: 1 # default 1 retry on error/timeout when opening connections search_path: [optional, override the default postgres search_path] role: [optional, set the role dbt assumes when executing queries] sslmode: [optional, set the sslmode used to connect to the database] ``` -------------------------------- ### Install Dependencies for Compiling from Source Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/README.md Install necessary dependencies for compiling Hydra Columnar from source on a Linux system. Specify the PostgreSQL version. ```bash POSTGRES_VERSION=16 \ apt-get install lsb-release gcc make libssl-dev autoconf pkg-config \ postgresql-${POSTGRES_VERSION} postgresql-server-dev-${POSTGRES_VERSION} \ libcurl4-gnutls-dev liblz4-dev libzstd-dev ``` -------------------------------- ### IMMV Creation and Usage Example Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Demonstrates creating an IMMV named 'm' from table 't0', showing the initial state, inserting a new row into 't0', and the automatically updated state of 'm'. Note the potential notice about indexing for efficient maintenance. ```sql SELECT create_immv('m', 'SELECT * FROM t0'); -- NOTICE: could not create an index on immv "m" automatically -- DETAIL: This target list does not have all the primary key columns, or this view does not contain DISTINCT clause. -- HINT: Create an index on the immv for efficient incremental maintenance. -- create_immv -- ------------- -- 3 --(1 row) SELECT * FROM m; -- automatically updated -- i -- --- -- 1 -- 2 -- 3 --(3 rows) INSERT INTO t0 VALUES (4); -- INSERT 0 1 SELECT * FROM m; -- automatically updated -- i -- --- -- 1 -- 2 -- 3 -- 4 --(4 rows) ``` -------------------------------- ### Install pgspecial for \copy Command Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Installs pgspecial version less than 2, recommended via conda for compatibility with the \copy command. Use this command in your conda environment. ```bash %conda install "pgspecial<2" -c conda-forge -y --quiet ``` -------------------------------- ### Enable Columnar Extension Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/using-hydra-columnar.md Install the columnar extension on your database. This is a one-time setup required for open-source installations. ```sql CREATE EXTENSION IF NOT EXISTS columnar; ``` -------------------------------- ### Install JupySQL and Dependencies Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Installs necessary Python packages for JupySQL, Pandas, PyArrow, and Psycopg2. Use this command in your Jupyter environment. ```python %pip install jupysql pandas pyarrow psycopg2-binary --quiet ``` -------------------------------- ### Install Hydra Columnar on macOS Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/README.md Use Homebrew and pgxman to install Hydra Columnar on macOS. Ensure pgxman is installed first. ```shell brew install pgxman/tap/pgxman pgxman install hydra_columnar ``` -------------------------------- ### Create a Columnar Materialized View for Sales Summary Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Example of creating a materialized view using the `columnar` storage method to compute monthly sales summaries. ```sql CREATE MATERIALIZED VIEW sales_summary USING columnar AS SELECT date_trunc('month', order_date) AS month, product_id, SUM(quantity) AS total_quantity, SUM(quantity * price) AS total_sales FROM sales GROUP BY date_trunc('month', order_date), product_id; ``` -------------------------------- ### Run Hydra Columnar with Docker Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/README.md Clone the Hydra repository, set up environment variables, and start the Docker Compose service to run Hydra Columnar. Connect using psql. ```bash git clone https://github.com/hydradatabase/hydra && cd hydra cp .env.example .env docker compose up # in another tab psql postgres://postgres:hydra@127.0.0.1:5432 ``` -------------------------------- ### Create Table for CSV Import Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/load/from-local-csv-file.md Before importing, create a table with columns that match the order and data types of your CSV file. This example shows a basic structure. ```sql -- Create a table with the same columns and data types as the CSV file CREATE TABLE my_table ( name text, email text, ... ); ``` -------------------------------- ### Review task execution logs Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Query the `cron.job_run_details` table to inspect the execution history of tasks, ordered by start time. ```sql SELECT * FROM cron.job_run_details ORDER BY start_time DESC LIMIT 10 ``` -------------------------------- ### Configure New Database Default Table Type Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/using-hydra-columnar.md After creating a new database, set its default table access method to columnar. Requires the columnar extension to be installed. ```sql CREATE EXTENSION IF NOT EXISTS columnar; ALTER DATABASE new_database_name SET default_table_access_method = 'columnar'; ``` -------------------------------- ### Track Segment Event Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/segment.md Example of tracking a 'deployments_show' event with properties using the Segment analytics API. ```javascript analytics.track('deployments_show', { deployment_name: 'heroic-rabbitmq-62', deployment_type: 'RabbitMQ' }); ``` -------------------------------- ### Install Hydra Columnar on Linux Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/README.md Install Hydra Columnar on Ubuntu 22.04 Jammy using the pgxman script. This method is the easiest for Linux. ```bash curl -sfL https://install.pgx.sh | sh - pgxman install hydra_columnar ``` -------------------------------- ### Get IMMV Definition Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Use `get_immv_def` to reconstruct the underlying SELECT command for an IMMV. ```sql get_immv_def(immv regclass) RETURNS text ``` -------------------------------- ### Populate SaaS Retention Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/saas-retention-tracking-and-churn-with-powerbi.md SQL script to populate the user_activity table with sample data. ```sql INSERT INTO user_activity (record_date, user_id, monthly_retention, monthly_churn) SELECT ("date")::date AS record_date, user_id, retention AS monthly_retention, churn AS monthly_churn FROM ( SELECT date_trunc('month', generate_series(date '2015-01-01', date '2022-12-01', interval '1 month')) AS "date", (random() * 10000)::int AS user_id, CASE WHEN random() > 0.1 THEN round(random(), 4) ELSE 0 END AS retention, CASE WHEN random() < 0.1 THEN round(random(), 4) ELSE 0 END AS churn FROM generate_series(1, 96) ) AS generated_data; ``` -------------------------------- ### Set up Postgres External Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/from-a-database-dbms.md Use these SQL commands to set up a foreign data wrapper for a remote PostgreSQL database. Replace placeholders with your server details. ```sql CREATE EXTENSION postgres_fdw; CREATE SERVER remote_pg_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '...', port '...', dbname '...'); CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg_server OPTIONS (user '...', password '...'); ``` -------------------------------- ### Set up MySQL External Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/from-a-database-dbms.md Use these SQL commands to set up a foreign data wrapper for a remote MySQL database. Replace placeholders with your server details. ```sql CREATE EXTENSION mysql_fdw; CREATE SERVER remote_mysql_server FOREIGN DATA WRAPPER mysql_fdw OPTIONS (host '...', port '...'); CREATE USER MAPPING FOR CURRENT_USER SERVER remote_mysql_server OPTIONS (username '...', password '...'); ``` -------------------------------- ### Create Incremental Materialized View (IMMV) Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Use `create_immv` to define and populate a new IMMV. Triggers and unique indexes are automatically created if possible. ```sql create_immv(immv_name text, view_definition text) RETURNS bigint ``` -------------------------------- ### Create SaaS Retention Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/saas-retention-tracking-and-churn-with-powerbi.md SQL script to create the user_activity table for SaaS retention tracking. ```sql CREATE TABLE user_activity ( record_date DATE PRIMARY KEY, user_id INT, monthly_retention DECIMAL(5, 4), monthly_churn DECIMAL(5, 4) ); ``` -------------------------------- ### Get Hydra Container ID Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Filters and retrieves the quiet ID of the Hydra container. This ID is used for subsequent stop and remove commands. ```python %%capture out ! docker container ls --filter ancestor=ghcr.io/hydradatabase/hydra --quiet ``` -------------------------------- ### Connect with sslmode=verify-full and sslrootcert Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/tls.md Configure `psql` to connect to Hydra with full certificate verification by specifying the root certificate bundle in the connection string. ```shell psql "postgres://.../d123456?sslmode=verify-full&sslrootcert=/etc/ssl/cert.pem" ``` -------------------------------- ### Get a task's job ID Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Query the `cron.job` table using the task name to retrieve its unique job ID, which is needed for modifications. ```sql SELECT cron.jobid FROM cron.job WHERE jobname = 'refresh-abc'; ``` -------------------------------- ### Create a Database Backup with pg_dump Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/load/from-hydra-open-source.md Use pg_dump to create a compressed backup file of your PostgreSQL database. Ensure you have the correct hostname, username, password, and database name. ```shell pg_dump -Fc --no-acl --no-owner \ -h your.db.hostname \ -U username \ databasename > mydb.dump ``` -------------------------------- ### Connect to Hydra using Service File Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/tls.md Connect to your Hydra data warehouse using `psql` by referencing the service entry defined in your `~/.pg_service.conf` file. ```shell psql service=hydra ``` -------------------------------- ### Connect to PostgreSQL Database Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Establishes a connection to a PostgreSQL database using JupySQL. Ensure your connection string is correct. ```python %sql postgresql://user:password@localhost/db ``` -------------------------------- ### Generated Index Creation DDL Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Example of the 'create index' DDL statements generated by dbt for configured indexes. These are executed within the same transaction as the main table creation. ```sql create index if not exists "3695050e025a7173586579da5b27d275" on "my_target_database"."my_target_schema"."indexed_model" using hash (column_a); create unique index if not exists "1bf5f4a6b48d2fd1a9b0470f754c1b0d" on "my_target_database"."my_target_schema"."indexed_model" (column_a, column_b); ``` -------------------------------- ### psql \COPY Command Syntax Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/load/from-local-csv-file.md Reference for the psql \COPY command syntax, detailing its parameters and options for data import. ```sql \COPY { table_name | ( query ) } [ ( column [, ...] ) ] FROM { 'filename' | STDIN } [ [ WITH ] ( option [, ...] ) ] ``` -------------------------------- ### Create Row-Based Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/faqs.md Use `USING heap` to create a traditional row-based table instead of the default columnar format. ```sql CREATE TABLE new_table (...) USING heap; ``` -------------------------------- ### Create Google Sheets Foreign Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/load/from-google-sheets.md Use this SQL code to create a foreign table in Hydra that connects to a Google Spreadsheet. Ensure you have the multicorn extension installed and a Google Service Account configured. ```sql CREATE EXTENSION multicorn; CREATE SERVER multicorn_gspreadsheet FOREIGN DATA WRAPPER multicorn OPTIONS ( wrapper 'gspreadsheet_fdw.GspreadsheetFdw' ); CREATE FOREIGN TABLE test_spreadsheet ( id character varying, name character varying ) server multicorn_gspreadsheet options( gskey '...', serviceaccount '...' ); ``` -------------------------------- ### Modified DDL for Hydra Compatibility Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/migrate-from-redshift.md Example of a Redshift table DDL after modifications for Hydra. Comments indicate original Redshift-specific clauses like ENCODE, DISTSTYLE, DISTKEY, SORTKEY, and ownership ALTER statements that are not compatible with Hydra. ```sql --DROP TABLE tickit.users; CREATE TABLE IF NOT EXISTS tickit.users ( userid INTEGER NOT NULL -- ENCODE RAW ,username CHAR(8) -- ENCODE lzo ,firstname VARCHAR(30) -- ENCODE lzo ,lastname VARCHAR(30) -- ENCODE lzo ,city VARCHAR(30) -- ENCODE lzo ,state CHAR(2) -- ENCODE lzo ,email VARCHAR(100) -- ENCODE lzo ,phone CHAR(14) -- ENCODE lzo ,likesports BOOLEAN -- ENCODE RAW ,liketheatre BOOLEAN -- ENCODE RAW ,likeconcerts BOOLEAN -- ENCODE RAW ,likejazz BOOLEAN -- ENCODE RAW ,likeclassical BOOLEAN -- ENCODE RAW ,likeopera BOOLEAN -- ENCODE RAW ,likerock BOOLEAN -- ENCODE RAW ,likevegas BOOLEAN -- ENCODE RAW ,likebroadway BOOLEAN -- ENCODE RAW ,likemusicals BOOLEAN -- ENCODE RAW ) -- DISTSTYLE KEY -- DISTKEY (userid) -- SORTKEY ( -- userid -- ) ; -- ALTER TABLE tickit.users owner to "admin"; -- Permissions -- GRANT RULE, SELECT, REFERENCES, TRIGGER ON TABLE tickit.users TO "admin"; -- GRANT SELECT ON TABLE tickit.users TO public; ``` -------------------------------- ### Standard Materialized View Creation Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md This is the standard SQL command to create a regular materialized view, shown for comparison with IMMV creation. ```sql CREATE MATERIALIZED VIEW sales_test AS SELECT * FROM sales; ``` -------------------------------- ### Load GitHub Events SQL File Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/getting-started/loading-sample-data.md Use this command to load the GitHub Events dataset from a SQL file using psql. ```bash psql ... -f path_to_github_events.sql ``` -------------------------------- ### Create SQLAlchemy Engine and Load Data with Pandas Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/predictive-analytics-with-deepnote.md Create a database engine using SQLAlchemy with provided credentials and load the 'Salary' table from the 'public' schema into a pandas DataFrame. This is necessary for performing predictions using Python. ```python import pandas as pd from sqlalchemy import create_engine engine = create_engine('postgresql://user:pass@server:5432/database') data = pd.read_sql_table('Salary',con= engine, schema= 'public') ``` -------------------------------- ### Review all scheduled tasks Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Query the `cron.job` table to view all currently configured tasks and their parameters. ```sql SELECT * FROM cron.job; ``` -------------------------------- ### Create and Seed Sample Data Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/with-metabase.md Creates a 'sample_data' table with random integer samples and timestamps in Hydra using psql. Ensure your Hydra connection string is set or the PGCONN environment variable is configured. ```shell psql "$PGCONN" \ -c "CREATE TABLE sample_data (id uuid NOT NULL PRIMARY KEY DEFAULT gen_random_uuid(), sample integer, timestamp timestamptz) USING columnar;" \ -c "INSERT INTO sample_data (sample, timestamp) SELECT floor((random() + random() + random() + random() + random() + random()) / 6 * 100000000)::int, to_timestamp(EXTRACT(epoch from NOW()) - floor(random() * 2600000)) FROM generate_series(1, 50000);" ``` -------------------------------- ### Configure psql Service File for TLS Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/tls.md Set up a service entry in `~/.pg_service.conf` to manage Hydra connection parameters, including SSL mode and root certificate path. ```ini [hydra] user=u123456 password=UdW3zJT9FLfIpJrli47HMmL1 host=1ed22fba-b20a-6680-afd9-91fc7c62485e.us-east-1.aws.hydradb.io port=5432 dbname=d123456 sslmode=verify-full sslrootcert=/etc/ssl/cert.pem ``` -------------------------------- ### Import CSV Data using psql \COPY Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/load/from-local-csv-file.md Use the \COPY command to load data from a local CSV file into your table. Ensure the path, delimiter, and CSV/HEADER options are correctly set. ```sql -- Import the data from the CSV file into the table \COPY my_table FROM 'path/to/file.csv' DELIMITER ',' CSV HEADER; ``` -------------------------------- ### Analyze a Simple Query in PostgreSQL Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/query-optimization.md Use EXPLAIN ANALYZE to view the execution plan and actual performance of a query. This is useful for identifying bottlenecks in simple SELECT statements. ```sql EXPLAIN ANALYZE SELECT * FROM customer; ``` -------------------------------- ### Full Schema Vacuuming with Custom Schema Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/updates-and-deletes.md Execute a full schema vacuum on a specified schema. ```sql SELECT columnar.vacuum_full('public'); ``` -------------------------------- ### Load GitHub Events SQL File in psql Console Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/getting-started/loading-sample-data.md Execute this command within the psql console to load the GitHub Events dataset from a SQL file. ```sql \i path_to_github_events.sql ``` -------------------------------- ### Set PGSSLROOTCERT Environment Variable Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/tls.md Configure applications using `libpq` to validate certificates by setting the `PGSSLROOTCERT` environment variable to the path of your root certificate bundle. ```shell PGSSLROOTCERT=/etc/ssl/cert.pem ``` -------------------------------- ### Create Base Backup for PITR Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/disaster-recovery-and-point-in-time-recovery-pitr.md Use `pg_basebackup` to create a base backup of your PostgreSQL database, including WAL files, which is essential for PITR. ```bash pg_basebackup -h your_host -p your_port -U your_user -D /path/to/your/backup --wal-method=stream ``` -------------------------------- ### Run Summary Statistics Query Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/with-metabase.md Executes a SQL query in Metabase to calculate minimum, maximum, count, average, and standard deviation for the 'sample' column in the 'sample_data' table. ```sql SELECT MIN(sample), MAX(sample), COUNT(sample), AVG(sample), STDDEV(sample) FROM sample_data; ``` -------------------------------- ### Check Psycopg2 Version Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Imports the psycopg2 library and prints its version. This is the PostgreSQL adapter for Python, crucial for database connections. ```python import psycopg2; psycopg2.__version__ ``` -------------------------------- ### Query Salary Table with SQL Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/predictive-analytics-with-deepnote.md Run a SQL query to retrieve the first 100 rows from the public.salary table. This is useful for verifying the data connection and structure. ```sql SELECT * FROM public.salary LIMIT 100 ``` -------------------------------- ### Create a Materialized View Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Defines the syntax for creating a materialized view. Use `USING columnar` for large views optimized for analytics. ```sql CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] view_name [ (column_name [, ...] ) ] [ USING method ] AS query [ WITH [ NO ] DATA ] ``` -------------------------------- ### Full Schema Vacuuming with Custom Stripe Count Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/updates-and-deletes.md Initiate a full schema vacuum, specifying the number of stripes to vacuum per table. ```sql SELECT columnar.vacuum_full(stripe_count => 1000); ``` -------------------------------- ### Create Partitioned Table with Mixed Storage Types Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/using-hydra-columnar.md Define a partitioned table where individual partitions can be either row-based (heap) or columnar. ```sql CREATE TABLE parent(ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); -- columnar partition CREATE TABLE p0 PARTITION OF parent FOR VALUES FROM ('2020-01-01') TO ('2020-02-01') USING columnar; -- columnar partition CREATE TABLE p1 PARTITION OF parent FOR VALUES FROM ('2020-02-01') TO ('2020-03-01') USING columnar; -- row partition CREATE TABLE p2 PARTITION OF parent FOR VALUES FROM ('2020-03-01') TO ('2020-04-01') USING heap; INSERT INTO parent VALUES ('2020-01-15', 10, 100, 'one thousand'); -- columnar INSERT INTO parent VALUES ('2020-02-15', 20, 200, 'two thousand'); -- columnar INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row ``` -------------------------------- ### Explicitly Create Row or Columnar Tables Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/using-hydra-columnar.md Specify the storage format ('heap' for row-based, 'columnar') when creating a table using the USING keyword. ```sql CREATE TABLE heap_table (...) USING heap; ``` ```sql CREATE TABLE columnar_table (...) USING columnar; ``` -------------------------------- ### Create Incrementally Updated Materialized View (IMMV) Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Creates an incrementally updated materialized view using the `create_immv` function. This allows for automatic content updates. ```sql test=# SELECT create_immv('immv', 'SELECT a.aid, b.bid, a.abalance, b.bbalance FROM pgbench_accounts a JOIN pgbench_branches b USING(bid)'); NOTICE: created index "immv_index" on immv "immv" create_immv ------------- 10000000 (1 row) ``` -------------------------------- ### Connect to PostgreSQL and switch to 'postgres' database Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Connect to your Hydra instance using psql and then switch to the 'postgres' database to administer scheduler tasks. ```bash $ psql service=hydra d123456=> \c postgres psql (13.7, server 13.6 (Ubuntu 13.6-1.pgdg18.04+1)) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) You are now connected to database "postgres" as user "u123456". postgres=> ``` -------------------------------- ### Check JupySQL Version Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Imports the JupySQL library and prints its version. Useful for ensuring compatibility and reproducibility. ```python # jupysql import sql; sql.__version__ ``` -------------------------------- ### Prepare Data and Train Linear Regression Model Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/predictive-analytics-with-deepnote.md Define independent (X) and dependent (Y) variables from the dataset, then train a Linear Regression model. This prepares the model to learn patterns for prediction. ```python #defining variables X and Y X = data.iloc[:,0].values.reshape(-1,1) print(X) Y = data.iloc[:,1].values.reshape(-1,1) print(Y) #creating a Linear Regression model for prediction model = LinearRegression() #Fitting the model on the Dataset model.fit(X,Y) Y_pred = model.predict(X) # make predictions ``` -------------------------------- ### Configure Columnar Storage for IMMV Creation Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Sets the default table access method to 'columnar' and limits parallel workers to 1 for creating an IMMV on a columnar source table. These settings are temporary and specific to the current session. ```sql SET default_table_access_method = 'columnar'; SET max_parallel_workers = 1; SELECT create_immv('sales_test', 'SELECT * FROM sales'); ``` -------------------------------- ### Create a new scheduled task Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Use the `cron.schedule_in_database` function to create a new task with a specified name, schedule, target database, and SQL command. ```sql SELECT cron.schedule_in_database( 'refresh-abc', -- task name, '30 * * * *', -- schedule 'd123456', -- database $$ REFRESH MATERIALIZED VIEW CONCURRENTLY abc $$ -- command ); ``` -------------------------------- ### Enable pg_ivm Extension Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md This SQL command enables the pg_ivm extension on your database. It should be run once before using any pg_ivm functionalities. ```sql CREATE EXTENSION IF NOT EXISTS pg_ivm; ``` -------------------------------- ### Create User Mapping for S3 Access Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/migrate-from-redshift.md Establishes a user mapping for the current user to access the S3 foreign server, using provided AWS access keys. ```sql CREATE USER MAPPING FOR CURRENT_USER SERVER parquet_s3_srv OPTIONS (user '/ACCESSKEY/', password '/SECRETKEY/'); ``` -------------------------------- ### Configure Indexes for Multiple Resources Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Configure indexes for multiple dbt resources at once using project-level or subdirectory configurations in your dbt project. ```yaml models: project_name: subdirectory: +indexes: - columns: ['column_a'] type: hash ``` -------------------------------- ### Check PgSpecial Version Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Imports the pgspecial library and prints its version. This library provides special SQL commands for PostgreSQL. ```python import pgspecial; pgspecial.__version__ ``` -------------------------------- ### Load JupySQL Extension Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Loads the JupySQL extension to enable SQL execution within Jupyter notebooks. ```python %load_ext sql ``` -------------------------------- ### Convert Table to Columnar Access Method Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/database-health-metrics.md Use this function to convert an existing heap table to columnar storage. Ensure the table exists before execution. ```sql CREATE TABLE my_table (i INT8) USING heap; -- convert to columnar SELECT columnar.alter_table_set_access_method('my_table', 'columnar'); ``` -------------------------------- ### Locate Root Certificate Bundle Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/tls.md Use `curl` to find the path to your system's CA certificate bundle, which is needed for certificate validation. ```shell $ curl -v https://hydras.io/ 2>&1 | grep -i CAfile * CAfile: /etc/ssl/cert.pem ``` -------------------------------- ### Check PyArrow Version Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Imports the pyarrow library and prints its version. Important for efficient data handling and interoperability. ```python import pyarrow; pyarrow.__version__ ``` -------------------------------- ### Count Specific Event Occurrences Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/segment.md Query to count the number of times a specific event, 'deployments_show', occurred. Assumes data is loaded into the 'production' schema. ```sql SELECT COUNT(id) -- Don't forget the schema: FROM . FROM production.deployments_show WHERE deployment_name = 'heroic-rabbitmq-62'; ``` -------------------------------- ### Import Libraries for Linear Regression Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/predictive-analytics-with-deepnote.md Import necessary libraries for data visualization and linear regression modeling. ```python #import Classes and lib needed import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression ``` -------------------------------- ### Columnar Aggregate Query (COUNT) Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/faqs.md Demonstrates a simple COUNT aggregation on a columnar table, highlighting performance improvements. ```sql SELECT COUNT(*) FROM hits; ``` -------------------------------- ### Grant Connect Permission to Database Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/segment.md Use this SQL statement to grant a user permission to connect to a specific database. Replace `` with the actual username. ```sql GRANT CONNECT ON DATABASE TO ``` -------------------------------- ### Create Unified Events View Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/cloud-warehouse-operations/using-hydra-scheduler.md Create a view that combines data from both the columnar and row-based event tables. This allows querying all events as a single dataset. ```sql CREATE VIEW events_all AS SELECT * FROM events_columnar UNION SELECT * FROM events_row; ``` -------------------------------- ### Create an Incrementally Maintainable Materialized View (IMMV) Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md This SQL function creates an IMMV named 'sales_test' based on the query 'SELECT * FROM sales'. Triggers are automatically set up for immediate updates upon base table modifications. ```sql SELECT create_immv('sales_test', 'SELECT * FROM sales'); ``` -------------------------------- ### Update and Refresh Standard Materialized View Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/materialized-views.md Demonstrates updating a base table and then refreshing a standard materialized view. Note the significant time taken for the refresh operation. ```sql test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; UPDATE 1 Time: 9.052 ms test=# REFRESH MATERIALIZED VIEW mv_normal ; REFRESH MATERIALIZED VIEW Time: 20575.721 ms (00:20.576) ``` -------------------------------- ### Visualize Actual vs. Predicted Salaries Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/predictive-analytics-with-deepnote.md Reshape predictions, create a DataFrame comparing real and predicted salaries, and plot the first 10 data points as a bar graph to visualize the model's performance. ```python Y_pred = Y_pred.reshape(-1) difference = pd.DataFrame({'Real' : data. salary, 'Predicted': Y_pred, }) print(difference) graph = difference.head(10) graph.plot(kind='bar') plt.title('Actual vs Predicted') plt.ylabel('salary') ``` -------------------------------- ### Calculate Revenue in Sigma Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/key-performance-indicators-with-sigma.md Use this formula to calculate total revenue based on unit price and quantity. This is a foundational step for many financial KPIs. ```sql Revenue: [Unit Price] * [Quantity] ``` -------------------------------- ### Create Parquet S3 Foreign Data Wrapper Extension Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/guides/migrate-from-redshift.md Enables access to S3 data stored in Parquet format using the parquet_s3_fdw extension. ```sql CREATE EXTENSION parquet_s3_fdw; ``` -------------------------------- ### Configure Unlogged Table Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Use the 'unlogged=True' config to create tables that are not written to the write-ahead log, offering potential performance gains at the cost of safety. ```sql {{ config(materialized='table', unlogged=True) }} select ... ``` ```yaml models: +unlogged: true ``` -------------------------------- ### Force Index Scan with pg_hint_plan Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/concepts/optimizing-query-performance.md Use pg_hint_plan to force a specific query plan, such as an IndexScan on the 'hits' table. This is useful when the default query planner does not select the optimal plan. ```sql EXPLAIN ANALYZE /*+ IndexScan(hits) */ SELECT url, COUNT(*) FROM hits; ``` -------------------------------- ### Check Pandas Version Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/analyze/jupyter.md Imports the pandas library and prints its version. Essential for data manipulation tasks and reproducibility. ```python import pandas; pandas.__version__ ``` -------------------------------- ### Grant Create Schema Permission on Database Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/centralize-data/stream/segment.md Use this SQL statement to grant a user permission to create schemas within a database. Replace `` and `` with the appropriate names. ```sql GRANT CREATE ON DATABASE TO ``` -------------------------------- ### Configure Table Indexes Source: https://github.com/hydradatabase/columnar-docs/blob/main/columnar/organize/data-hygiene/dbt-transformation-and-model-management.md Define indexes for tables to optimize queries involving joins or where clauses. Specify columns, uniqueness, and index type. ```sql {{ config( materialized = 'table', indexes=[ {'columns': ['column_a'], 'type': 'hash'}, {'columns': ['column_a', 'column_b'], 'unique': True}, ] )}} select ... ```