### Install from Source Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Install the PostgreSQL HLL extension after building it from source. This command typically requires superuser privileges. ```bash sudo make install ``` -------------------------------- ### Install and Verify PostgreSQL-HLL Extension Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Install the hll extension and verify its installation by calling an empty HLL function. This is the first step before using any HLL functionalities. ```sql CREATE EXTENSION hll; SELECT hll_empty(); ``` -------------------------------- ### Run PostgreSQL HLL Extension Tests Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md After starting the PostgreSQL server, execute the test suite for the HLL extension using 'make installcheck'. This command verifies the functionality of the installed extension. ```bash make installcheck ``` -------------------------------- ### Start PostgreSQL Server Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Start a PostgreSQL server instance using 'pg_ctl'. This command requires the data directory ('-D data'), a log file ('-l logfile'), and a configuration flag ('-c'). Ensure the server is running on the default port. ```bash pg_ctl -D data -l logfile -c start ``` -------------------------------- ### Verify HLL Extension Installation Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md After creating the HLL extension, use this command in psql to list all installed extensions and confirm that 'hll' is present. This also shows its version, schema, and description. ```sql \dx ``` -------------------------------- ### HLL Operator Precedence and Associativity Examples Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/07-operators-and-expressions.md Demonstrates the left-associativity of the '||' operator in HLL operations. Shows examples of adding then unioning, and unioning then adding. ```sql -- Parsed as ((a || b) || c) a || b || c -- Add then union (hll_empty() || hll_hash_integer(1)) || another_hll -- Union then add (sketch1 || sketch2) || hll_hash_integer(100) ``` -------------------------------- ### Install HLL Extension Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/10-reference-index.md Use this SQL command to enable the HLL extension in your PostgreSQL database. ```sql CREATE EXTENSION hll; ``` -------------------------------- ### Initialize and Add Data to HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Demonstrates creating a table with an hll type, initializing an empty HLL, adding hashed integers and text, and querying the cardinality. This is a basic 'Hello World' example for using the extension. ```sql --- Make a dummy table CREATE TABLE helloworld ( id integer, set hll ); --- Insert an empty HLL INSERT INTO helloworld(id, set) VALUES (1, hll_empty()); --- Add a hashed integer to the HLL UPDATE helloworld SET set = hll_add(set, hll_hash_integer(12345)) WHERE id = 1; --- Or add a hashed string to the HLL UPDATE helloworld SET set = hll_add(set, hll_hash_text('hello world')) WHERE id = 1; --- Get the cardinality of the HLL SELECT hll_cardinality(set) FROM helloworld WHERE id = 1; ``` -------------------------------- ### Install PostgreSQL HLL RPM Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Install the built PostgreSQL HLL RPM package. This command installs the extension for use with PostgreSQL. ```bash rpm -Uv rpmbuild/RPMS/x86_64/postgresql11-hll-2.21.x86_64.rpm ``` -------------------------------- ### Basic HLL Usage Example Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/10-reference-index.md Demonstrates creating a table to store HLL data, aggregating distinct user IDs using HLL functions, and querying the estimated cardinality. ```sql -- Create storage CREATE TABLE daily_uniques ( date DATE PRIMARY KEY, users hll ); -- Aggregate distinct user IDs INSERT INTO daily_uniques (date, users) SELECT DATE(event_time), hll_add_agg(hll_hash_integer(user_id)) FROM events GROUP BY DATE(event_time); -- Query cardinality SELECT date, hll_cardinality(users) AS daily_users FROM daily_uniques ORDER BY date DESC LIMIT 30; ``` -------------------------------- ### HLL Add Operator Example Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Demonstrates the use of the `||` operator as a shorthand for the `hll_add` function, showing different operand orders. ```sql hll_add(users, hll_hash_integer(123)) ``` ```sql users || hll_hash_integer(123) ``` ```sql hll_hash_integer(123) || users ``` -------------------------------- ### Example of HLL Hash Function for Binary Data Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_bytea` to add binary data (bytea) to an HLL sketch. Useful for hashing raw data blobs. ```sql SELECT hll_empty() || hll_hash_bytea(data) ``` -------------------------------- ### Cumulative Add Format Example Source: https://github.com/citusdata/postgresql-hll/blob/master/sql/data/README.txt Illustrates the format for cumulative add operations, where new values are added to an existing multiset accumulator. ```text Format: cumulative add ``` -------------------------------- ### Install Debugging Build RPM Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Install the debugging symbols RPM for PostgreSQL HLL. This is useful for debugging purposes. ```bash rpm -Uv rpmbuild/RPMS/x86_64/postgresql11-hll-debuginfo-2.21.x86_64.rpm ``` -------------------------------- ### Cumulative Union Format Example Source: https://github.com/citusdata/postgresql-hll/blob/master/sql/data/README.txt Illustrates the format for cumulative union operations, where multisets are combined with an accumulator. ```text Format: cumulative union ``` -------------------------------- ### Hash Functions Returning hll_hashval Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Shows examples of hash functions that return values of type hll_hashval. ```sql -- Hash functions return hll_hashval SELECT hll_hash_integer(42); -- Returns hll_hashval SELECT hll_hash_text('hello'); -- Returns hll_hashval ``` -------------------------------- ### Example of HLL Hash Function for Any Type Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_any` to add values of any data type to an HLL sketch. This provides flexibility when the exact type is unknown or varies. ```sql SELECT hll_empty() || hll_hash_any(column) ``` -------------------------------- ### Compute Cardinality Bounds During Aggregation Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/11-utility-and-advanced.md Demonstrates how to get floor and ceiling estimates of cardinality during an aggregation process. Requires an accumulator state and uses functions like `hll_floor_card_unpacked` and `hll_ceil_card_unpacked`. ```sql DO $$ DECLARE acc INTERNAL; floor_card INT8; ceil_card INT8; avg_card FLOAT; BEGIN -- Build accumulator (simplified for example) acc := hll_add_trans0(NULL, hll_hash_integer(1)); -- Get bounds floor_card := hll_floor_card_unpacked(acc); ceil_card := hll_ceil_card_unpacked(acc); avg_card := hll_card_unpacked(acc); RAISE NOTICE 'Floor: %, Avg: %, Ceiling: %', floor_card, avg_card, ceil_card; END $$; ``` -------------------------------- ### Getting HLL Expthresh Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Retrieves a 2-tuple of the specified and effective EXPLICIT promotion cutoffs for an HLL object. ```sql SELECT hll_expthresh(hll); ``` -------------------------------- ### Example of HLL Hash Function for Boolean Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_boolean` to add boolean values to an HLL sketch. This is useful for tracking distinct boolean states. ```sql SELECT hll_empty() || hll_hash_boolean(is_active) ``` -------------------------------- ### Nested Union Cardinality Estimates in SQL Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/09-algorithm-and-accuracy.md Demonstrates equivalent cardinality estimates for nested HLL unions using SQL. These examples show different ways to achieve the same result, highlighting the associative property of HLL unions. ```sql SELECT #hll_union(a, hll_union(b, c)); SELECT #hll_union(hll_union(a, b), c); SELECT #hll_union_agg(ARRAY[a, b, c]); ``` -------------------------------- ### Basic HLL Table Declaration Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Example of creating a table with basic HLL columns without specifying type modifiers. ```sql CREATE TABLE user_sketches ( date DATE PRIMARY KEY, all_users hll, us_users hll, eu_users hll ); ``` -------------------------------- ### Example of HLL Hash Function for Smallint Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_smallint` to add small integer values to an HLL sketch. Ensure the column is cast to smallint if necessary. ```sql SELECT hll_empty() || hll_hash_smallint(status::smallint) ``` -------------------------------- ### Example of HLL Hash Function for Text Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_text` to add text values to an HLL sketch. Commonly used for hashing strings like email addresses. ```sql SELECT hll_empty() || hll_hash_text(email) ``` -------------------------------- ### HLL Union Operator Example Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Illustrates the use of the `||` operator for merging two hll structures, equivalent to `hll_union`. ```sql hll_union(male_users, female_users) ``` ```sql male_users || female_users ``` ```sql female_users || male_users ``` -------------------------------- ### HLL Type Conversion Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/MANIFEST.md Shows examples of HLL type conversion functions used for input, output, and serialization/deserialization of HLL states. These are essential for data import, export, and storage. ```SQL SELECT hll_in(text_representation) FROM hll_data; SELECT hll_out(hll_state) FROM hll_table; SELECT hll_recv(binary_data) FROM hll_blobs; SELECT hll_send(hll_state) FROM hll_table; ``` -------------------------------- ### Example of HLL Hash Function for Integer Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_integer` to add integer values to an HLL sketch. This is commonly used for user IDs or other integer identifiers. ```sql SELECT hll_empty() || hll_hash_integer(user_id) ``` -------------------------------- ### Get HLL Explicit Thresholds (Manual) Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Retrieves the manually set explicit representation promotion thresholds for an HLL. The effective threshold matches the specified value for manual settings. ```sql SELECT (hll_expthresh(hll_empty(11, 5, 10))).*; -- Returns: specified=10, effective=10 ``` -------------------------------- ### Custom Intermediate Cardinality Computation Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/11-utility-and-advanced.md Provides an example of computing the cardinality of an intermediate HLL state without finalizing the aggregation. This function is a placeholder and would require access to aggregate context in a real implementation. ```sql -- Compute cardinality of intermediate state without finalizing CREATE FUNCTION approx_cardinality_during_agg() RETURNS TABLE (intermediate_card FLOAT) AS $$ BEGIN -- Pseudo-code; real implementation would need aggregate context RETURN QUERY SELECT hll_card_unpacked( hll_add_trans0(NULL, hll_hash_integer(x)) )::FLOAT FROM generate_series(1, 1000) AS s(x) GROUP BY (x / 100); END; $$ LANGUAGE plpgsql; ``` -------------------------------- ### Create Table with HLL Columns Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Demonstrates creating a table with multiple hll columns, showcasing default parameters and various custom configurations using type modifiers. ```sql -- Create table with HLL columns CREATE TABLE daily_stats ( date DATE, users hll, -- Default parameters products hll(12), -- Custom log2m only sessions hll(13, 6), -- Custom log2m and regwidth events hll(11, 5, -1, 1) -- All parameters specified ); -- Ensure column follows parameters INSERT INTO daily_stats VALUES ('2024-01-01', hll_empty()), ('2024-01-01', hll_empty(12)); ``` -------------------------------- ### Getting HLL Register Width Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Retrieves the register bit-width of an HLL object. ```sql SELECT hll_regwidth(hll); ``` -------------------------------- ### Initialize PostgreSQL Data Directory Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Before running tests, initialize a PostgreSQL data directory using 'initdb'. This command sets up the necessary file structure for a new database cluster. Specify a directory name, e.g., 'data'. ```bash initdb -D data ``` -------------------------------- ### Getting HLL Schema Version Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Retrieves the schema version of an HLL object. ```sql SELECT hll_schema_version(hll); ``` -------------------------------- ### Create Empty HLL with Full Custom Configuration Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/01-core-functions.md Creates an empty HyperLogLog structure with all parameters (log2m, regwidth, expthresh, sparseon) explicitly defined. ```sql SELECT hll_empty(12, 5, -1, 1); ``` -------------------------------- ### Creating hll_hashval via Hashing Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Illustrates the recommended method of creating hll_hashval using hash functions. ```sql -- Recommended: use hash functions SELECT hll_hash_integer(user_id); ``` -------------------------------- ### HyperLogLog Operation Steps Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/09-algorithm-and-accuracy.md Outlines the four main steps involved in processing an input value's hash within the HyperLogLog algorithm. ```text 1. Hash the input value to a 64-bit hash 2. Extract the first log2m bits → register index j 3. Extract remaining (64 - log2m) bits → look for leading zero position ρ 4. Update register j with max(register[j], ρ) ``` -------------------------------- ### Getting HLL Type Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Retrieves the type value of an HLL object, which is specific to its schema version. ```sql SELECT hll_type(hll); ``` -------------------------------- ### Create and Populate Multi-Dimensional Event Summary Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/08-common-patterns.md This snippet sets up a table for multi-dimensional HLL aggregation (e.g., by country and device type) and populates it with aggregated user data from events. It supports detailed segmentation of user activity. ```sql CREATE TABLE event_summary ( date DATE, country VARCHAR(2), device_type VARCHAR(20), all_users hll, new_users hll, returning_users hll, PRIMARY KEY (date, country, device_type) ); INSERT INTO event_summary (date, country, device_type, all_users) SELECT DATE(event_time), country, device_type, hll_add_agg(hll_hash_bigint(user_id)) FROM events GROUP BY DATE(event_time), country, device_type; ``` -------------------------------- ### Import/Export HLL Data with Bytea Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Demonstrates how to insert bytea-formatted HLL data into an hll column and how to serialize an hll column to bytea for export. Use these patterns for data persistence and inter-system communication. ```sql -- Import/export patterns INSERT INTO hll_column VALUES (E'\\x...'::hll); SELECT hll_serialize(hll_column) FROM table; ``` -------------------------------- ### Valid HLL Function Call Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Example of a valid function call with HLL parameters within the allowed ranges. ```sql -- Valid SELECT hll_empty(11, 5, -1, 1); ``` -------------------------------- ### Aggregating HLLs with hll_add_agg Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Example of using `hll_add_agg` to create an HLL from a table or result set, grouping by a specific column. ```sql SELECT date, hll_add_agg(hll_hash_integer(user_id)) FROM facts GROUP BY 1; ``` -------------------------------- ### Create Empty HLL with Default Parameters Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/01-core-functions.md Creates an empty HyperLogLog structure using default settings for log2m and regwidth. ```sql SELECT hll_empty(); ``` -------------------------------- ### Domain Separation with Different Hash Seeds Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/02-hash-functions.md Demonstrates how to use different seeds to create independent HLL sketches for distinct data domains. This allows for separate analysis of data partitioned by a 'domain' column, with Sketch 1 using seed 0 and Sketch 2 using seed 1. ```sql -- Separate sketches for different hash domains SELECT hll_add_agg(hll_hash_text(user_email, 0)) FROM users GROUP BY domain; -- Sketch 1: seed 0 SELECT hll_add_agg(hll_hash_text(user_email, 1)) FROM users GROUP BY domain; -- Sketch 2: seed 1 (independent from seed 0) ``` -------------------------------- ### Build from Source with Default pg_config Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Build the PostgreSQL HLL extension from source using the pg_config found in the system's PATH. This is the simplest build command when pg_config is accessible. ```bash make ``` -------------------------------- ### Get HLL Schema Version Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Retrieve the storage schema version of a HyperLogLog object. Currently, the version is always 1. ```sql SELECT hll_schema_version(hll_empty()); -- Returns: 1 SELECT hll_schema_version(my_hll); ``` -------------------------------- ### HyperLogLog Representation Promotion Sequence Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/09-algorithm-and-accuracy.md Illustrates the automatic promotion sequence between the different representation levels of the HyperLogLog algorithm. This sequence is based on cardinality thresholds and memory usage. ```text EMPTY → EXPLICIT → SPARSE → FULL ``` -------------------------------- ### Create Fact Table Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Defines a sample fact table to store user visit data. This table is intended to be large, necessitating efficient aggregation techniques. ```sql CREATE TABLE facts ( date date, user_id integer, activity_type smallint, referrer varchar(255) ); ``` -------------------------------- ### Valid HLL Type Declaration Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/06-types-and-type-modifiers.md Example of a valid CREATE TABLE statement with HLL type modifiers within the allowed ranges. ```sql -- Valid CREATE TABLE t (x hll(11, 5, -1, 1)); ``` -------------------------------- ### Domain Separation with Hash Seeds Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/09-algorithm-and-accuracy.md Demonstrates how different seeds produce different hash values for the same input, enabling domain separation. Using the same seed ensures interoperability between HLL sketches. ```sql -- Same input, different seeds, different hashes SELECT hll_hash_integer(42, 0); -- Hash A SELECT hll_hash_integer(42, 1); -- Hash B (different) -- Same seed allows interoperable HLLs SELECT hll_hash_integer(user_id, 0); -- Seed 0 ``` -------------------------------- ### Getting HLL Log2m Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Retrieves the log-base-2 of the number of registers for an HLL object. For non-FULL/SPARSE types, it returns the value that would be used if promoted. ```sql SELECT hll_log2m(hll); ``` -------------------------------- ### Get Ceiling Cardinality for Unpacked HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/11-utility-and-advanced.md Returns a conservative integer upper bound of the cardinality for an unpacked HLL accumulator. Useful for capacity planning. ```sql hll_ceil_card_unpacked(internal) ``` -------------------------------- ### Get HLL Explicit Thresholds (Auto) Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Retrieves the auto-tuned explicit representation promotion thresholds for an HLL. The effective threshold is implementation-dependent when auto mode is used. ```sql SELECT (hll_expthresh(hll_empty(11, 5, -1))).*; -- Returns: specified=-1, effective=(implementation-dependent) ``` -------------------------------- ### Build from Source with Alternate Compiler Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Build the PostgreSQL HLL extension from source, specifying an alternate C/C++ compiler like 'gcc'. This is useful if the default compiler (e.g., an older clang) is not desired. ```bash make CC=gcc CXX=gcc ``` -------------------------------- ### Convert Deserialized HLL to Cardinality Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/05-configuration-functions.md Deserializes a bytea representation of an HyperLogLog and then calculates its cardinality. This is useful for quickly getting the count of distinct elements represented by the HLL. ```sql SELECT hll_cardinality(hll_deserialize(sketch)) FROM hll_archive; ``` -------------------------------- ### Execute rpmbuild Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Build the PostgreSQL HLL RPM package using the created tarball. This command initiates the RPM build process. ```bash rpmbuild -tb postgresql${PGSHRT}-hll-${VER}.tar.gz ``` -------------------------------- ### Create hll Table with Custom Parameters Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Use this snippet to create a table with an HLL column, specifying custom parameters for log2m, regwidth, expthresh, and sparseon. ```sql CREATE TABLE t (sketch hll(log2m, regwidth, expthresh, sparseon)); ``` -------------------------------- ### Multi-dimensional Breakdown of Unique Users Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Break down unique user counts by multiple dimensions, such as country and device type. This example aggregates HLL data grouped by these dimensions. ```sql SELECT country, device_type, hll_cardinality(hll_add_agg(hll_hash_integer(user_id))) FROM events GROUP BY country, device_type; ``` -------------------------------- ### Build from Source with Specific pg_config Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Build the PostgreSQL HLL extension from source, specifying the path to pg_config if it's not in the system's PATH. This ensures the extension is built against the correct PostgreSQL version. ```bash PG_CONFIG=/usr/pgsql-9.11/bin/pg_config make ``` -------------------------------- ### Get HLL Internal Representation Type Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Determine the internal representation type of a HyperLogLog object. This helps understand the current algorithm being used for storing distinct values. ```sql -- Monitor representation as data is added SELECT hll_type(hll_empty()); -- 1 (EMPTY) SELECT hll_type(hll_empty() || hll_hash_integer(1)); -- 2 (EXPLICIT) -- Check type of stored HLL SELECT hll_type(users) FROM daily_uniques; -- Track type changes during aggregation SELECT date, hll_type(users) FROM daily_uniques ORDER BY date; ``` -------------------------------- ### Hashing Smallint Values with hll_hash_smallint Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Shows how to hash smallint values using `hll_hash_smallint`. An optional hash seed can be provided. ```sql hll_hash_smallint(4) ``` ```sql hll_hash_smallint(4, 123/*hash seed*/) ``` -------------------------------- ### Create hll Table with Defaults Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Use this snippet to create a table with an HLL column using default parameters. ```sql CREATE TABLE t (sketch hll); ``` -------------------------------- ### Create and Populate Daily Unique User Storage Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/08-common-patterns.md This snippet demonstrates how to create a table to store daily HLL sketches for unique users and populate it by aggregating user IDs from events. It's useful for building historical unique user counts. ```sql CREATE TABLE daily_users ( date DATE PRIMARY KEY, users hll ); INSERT INTO daily_users (date, users) SELECT DATE(event_time), hll_add_agg(hll_hash_integer(user_id)) FROM events GROUP BY DATE(event_time); ``` -------------------------------- ### Memory Calculation for FULL Representation Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/09-algorithm-and-accuracy.md Demonstrates the memory calculation for the FULL representation of HLL sketches based on log2m and regwidth parameters. ```sql Memory = (log2m + regwidth) × 2^log2m bits = (log2m + regwidth) × 2^(log2m - 3) bytes For log2m=11, regwidth=5: Memory = 16 × 256 = 4,096 bytes = 4 KB For log2m=14, regwidth=5: Memory = 19 × 2,048 = 38,912 bytes = 38 KB ``` -------------------------------- ### Example of HLL Hash Function for Bigint Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Demonstrates using `hll_hash_bigint` to add bigint values to an HLL sketch. Suitable for large integer identifiers like transaction IDs. ```sql SELECT hll_empty() || hll_hash_bigint(transaction_id) ``` -------------------------------- ### HLL Aggregate Transition Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/MANIFEST.md Illustrates the use of aggregate transition functions for HLL, such as union and addition, which are crucial for parallel aggregation and building HLL states incrementally. These functions manage the internal state transitions during aggregation. ```SQL SELECT hll_union_trans(hll_state1, hll_state2) FROM hll_tables; SELECT hll_add_trans0(hll_state, value) FROM hll_table; SELECT hll_add_trans1(hll_state, value) FROM hll_table; SELECT hll_add_trans2(hll_state, value) FROM hll_table; SELECT hll_add_trans3(hll_state, value) FROM hll_table; SELECT hll_add_trans4(hll_state, value) FROM hll_table; SELECT hll_union_internal(hll_state1, hll_state2) FROM hll_tables; ``` -------------------------------- ### Configure HLL Sparse to Full Promotion Threshold Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/05-configuration-functions.md Configures the memory threshold for promoting SPARSE to FULL representation of HyperLogLog data. Changes apply only to the current database connection. ```sql SELECT hll_set_max_sparse(-1); ``` ```sql SELECT hll_set_max_sparse(0); ``` ```sql SELECT hll_set_max_sparse(512); ``` ```sql SELECT hll_set_max_sparse(256); -- 2^8 registers in SPARSE ``` -------------------------------- ### SQL Indexing with HLL Equality Operator Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/07-operators-and-expressions.md Shows how to create an index on an HLL column using the equality operator, which supports indexing. ```sql CREATE INDEX ON hll_table (users); -- Uses = operator ``` -------------------------------- ### Get Human-Readable HLL Representation Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/01-core-functions.md Use `hll_print` to obtain a string representation of an HyperLogLog's internal state. This is useful for debugging and understanding the HLL's structure and contents. ```sql -- View the structure of an HLL SELECT hll_print(hll_empty()); ``` ```sql -- Debug an HLL after operations SELECT hll_print(my_hll) FROM user_sketches; ``` -------------------------------- ### Creating an Empty HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Creates an empty HLL object with optional parameters for log2m, regwidth, expthresh, and sparseon. Defaults will be used if parameters are omitted. ```sql SELECT hll_empty([log2m[, regwidth[, expthresh[, sparseon]]]]); ``` -------------------------------- ### Import HLL Sketches from External Systems Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/08-common-patterns.md Load HLL sketches created by other libraries or systems. This snippet demonstrates creating a table to store external sketches and then deserializing the bytea data to calculate cardinality. ```sql CREATE TABLE external_sketches ( source_system VARCHAR(50), sketch_date DATE, sketch_data bytea ); SELECT source_system, sketch_date, hll_cardinality(hll_deserialize(sketch_data)) AS user_count FROM external_sketches WHERE sketch_date >= '2024-01-01'; ``` -------------------------------- ### Configuration and Control Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/MANIFEST.md Allows for setting session-level defaults, controlling output versions, and managing the serialization/deserialization of HyperLogLog structures. ```APIDOC ## Configuration and Control Functions ### Description Allows for setting session-level defaults, controlling output versions, and managing the serialization/deserialization of HyperLogLog structures. ### Functions - `hll_set_defaults(log2m, regwidth, expthresh, sparseon)` — Set session defaults - `hll_set_output_version(integer)` — Set binary format version - `hll_set_max_sparse(integer)` — Configure promotion threshold - `hll_serialize(hll)` — Export to bytea - `hll_deserialize(bytea)` — Import from bytea ### Includes - Connection-level scope documentation - Threshold value interpretation - Performance tuning guidance - Export/import patterns - Parameter validation details ``` -------------------------------- ### Count Distinct Users in the Last 30 Days Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md Efficiently count distinct users within the last 30 days using the cardinality operator '#'. This is a concise way to get a quick estimate. ```sql SELECT #hll_add_agg(hll_hash_integer(user_id)) FROM events WHERE date >= CURRENT_DATE - 30; ``` -------------------------------- ### Send Serialized HLL to External System Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/05-configuration-functions.md Shows how to serialize an HLL and then encode it to a hex string, typically for sending to an external system that expects a string representation of binary data. ```sql SELECT encode(hll_serialize(my_hll), 'hex'); ``` -------------------------------- ### Get Cardinality of HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/07-operators-and-expressions.md Use the # operator to retrieve the distinct count of elements within an HLL. This can be used in SELECT lists, WHERE clauses, ORDER BY clauses, window functions, aggregates, and arithmetic operations. ```sql SELECT #my_hll; ``` ```sql SELECT * FROM daily_stats WHERE #users > 1000; ``` ```sql SELECT date, #users FROM daily_stats ORDER BY #users DESC; ``` ```sql SELECT date, #hll_union_agg(users) OVER ( ORDER BY date ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS week_users FROM daily_uniques; ``` ```sql SELECT AVG(#users) FROM daily_stats; ``` ```sql SELECT #morning_users + #afternoon_users FROM daily_splits; ``` -------------------------------- ### Conditional Cardinality Check in Expressions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/07-operators-and-expressions.md Uses the '#' operator to get the cardinality of an HLL sketch within a CASE statement to categorize data based on count thresholds. This is useful for analyzing traffic levels or other metrics. ```sql -- Count based on threshold SELECT CASE WHEN #users > 10000 THEN 'high' WHEN #users > 1000 THEN 'medium' ELSE 'low' END AS traffic_level FROM daily_stats; ``` -------------------------------- ### Build Debug Version from Source Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Build a debug version of the PostgreSQL HLL extension from source. Set the DEBUG environment variable to 1 before running make. ```bash DEBUG=1 make ``` -------------------------------- ### Get Floor Cardinality of Unpacked HLL State Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/11-utility-and-advanced.md Returns the floor (lower bound) of cardinality for an unpacked HyperLogLog. Useful for conservative cardinality estimates, often used in aggregate functions where both floor and ceiling estimates are needed. ```sql hll_floor_card_unpacked(internal) RETURNS int8 ``` -------------------------------- ### Casting Bytea to HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/REFERENCE.md Demonstrates casting a bytea representation to the hll data type. ```sql SELECT hll_cardinality(E'\xDEADBEEF'); ``` ```sql SELECT hll_cardinality(E'\xDEADBEEF'::hll); ``` -------------------------------- ### Get Log2 of Registers (hll_log2m) Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Use hll_log2m to retrieve the log base 2 of the number of registers in an HLL structure. This value indicates the trade-off between accuracy and memory usage. Higher values mean better accuracy but more memory. ```sql SELECT hll_log2m(hll_empty(11)); -- Returns: 11 SELECT hll_log2m(hll_empty(13)); -- Returns: 13 -- Check all stored HLLs SELECT hll_log2m(users) FROM daily_uniques; ``` -------------------------------- ### Export HLL Sketches to JSON with HEX Encoding Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/08-common-patterns.md Serialize HLL sketches into a JSON format with HEX encoding for use in external systems. This example generates a JSON object containing the date, estimated user count, and the sketch as a hex string. ```sql SELECT json_build_object( 'date', date, 'user_count_estimate', hll_cardinality(users), 'sketch_hex', encode(hll_serialize(users), 'hex') ) AS hll_data FROM daily_users WHERE date >= CURRENT_DATE - INTERVAL '30 days' \gset output_file 'hll_export.jsonl' ``` -------------------------------- ### HLL Aggregate Functions for Different Scenarios Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/03-aggregate-functions.md Demonstrates the use of hll_add_agg with varying parameters to control precision and handle large cardinalities. Also shows hll_union_agg for combining pre-computed HLLs. ```sql -- Default setup hll_add_agg(hashval) -- Higher precision hll_add_agg(hashval, 13) -- Very large cardinalities hll_add_agg(hashval, 15, 6) -- Exact for small, probabilistic for large hll_add_agg(hashval, 11, 5, 16) -- Combine pre-computed HLLs hll_union_agg(hll) ``` -------------------------------- ### Configuration Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/MANIFEST.md Reference for the 5 configuration functions used in postgresql-hll. ```APIDOC ## Configuration Functions ### Description Reference for the configuration functions. ### Function List - hll_set_log2m(log2m) - hll_set_sparse_on(sparse_on) - hll_set_precision(precision) - hll_set_seed(seed) - hll_reset_config() ### Parameters - **log2m** (integer) - Required - The base-2 logarithm of the number of registers. - **sparse_on** (boolean) - Required - Whether to use sparse representation. - **precision** (integer) - Required - The precision of the sketch. - **seed** (integer) - Required - The seed for hashing. ### Return Value - **void** - **void** - **void** - **void** - **void** ``` -------------------------------- ### Get Register Bit Width (hll_regwidth) Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/04-metadata-functions.md Use hll_regwidth to find the number of bits used per register in an HLL structure. This, along with log2m, determines the maximum cardinality that can be accurately estimated. Higher values increase maximum cardinality but use more memory per register. ```sql SELECT hll_regwidth(hll_empty(11, 5)); -- Returns: 5 SELECT hll_regwidth(hll_empty(14, 6)); -- Returns: 6 ``` -------------------------------- ### HLL Add Aggregate with Custom Log2m, Regwidth, and Expthresh Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/03-aggregate-functions.md This function allows for custom configuration of log2m, regwidth, and expthresh for HyperLogLog aggregation. The expthresh parameter controls the promotion threshold from EXPLICIT to SPARSE representation, enabling exact precision for small sets before switching to probabilistic counting. ```sql SELECT hll_add_agg(hll_hash_integer(user_id), 11, 5, 16) FROM events GROUP BY event_type; ``` ```sql SELECT hll_add_agg(hll_hash_text(value), 12, 4, 0) FROM data_stream; ``` -------------------------------- ### Set HLL and PostgreSQL Versions Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Set environment variables for the HLL extension version and the PostgreSQL major version. These are used in subsequent commands. ```bash export VER=2.21 export PGSHRT=11 ``` -------------------------------- ### Create and Populate Daily Uniques Aggregate Table Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Sets up an aggregate table to store daily unique user counts using the hll data type. It hashes user IDs and aggregates them into a single hll value per day. ```sql -- Create the destination table CREATE TABLE daily_uniques ( date date UNIQUE, users hll ); -- Fill it with the aggregated unique statistics INSERT INTO daily_uniques(date, users) SELECT date, hll_add_agg(hll_hash_integer(user_id)) FROM facts GROUP BY 1; ``` -------------------------------- ### Import HLL Data Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/10-reference-index.md Use `hll_deserialize` to import HLL data from binary format. This can be done directly from a bytea type or by decoding a hex string. ```sql SELECT hll_deserialize(bytea_data); SELECT hll_deserialize(decode(hex_string, 'hex')); ``` -------------------------------- ### Hashing Integer Values with hll_hash_integer Source: https://github.com/citusdata/postgresql-hll/blob/master/README.md Illustrates hashing integer values with `hll_hash_integer`. An optional hash seed can be provided. ```sql hll_hash_integer(21474836) ``` ```sql hll_hash_integer(21474836, 123/*hash seed*/) ``` -------------------------------- ### Building HLLs with Aggregate Functions Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/03-aggregate-functions.md Illustrates the efficient use of hll_add_agg within a GROUP BY clause to build HLL structures per group. Contrasts this with the inefficient scalar function approach. ```sql -- Using aggregate: one operation per group SELECT date, hll_cardinality(hll_add_agg(hll_hash_integer(user_id))) FROM events GROUP BY date; -- Not: multiple scalar add operations per row -- (inefficient and requires post-processing) SELECT hll_cardinality(hll_add(hll_add(hll_empty(), ...), ...)); ``` -------------------------------- ### Chain Multiple Additions to HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/01-core-functions.md Illustrates adding multiple distinct hashed values (text and integer) to an empty HyperLogLog structure sequentially using the `||` operator. ```sql SELECT hll_empty() || hll_hash_text('user1') || hll_hash_text('user2') || hll_hash_integer(99); ``` -------------------------------- ### Count Daily Unique Users with HLL Source: https://github.com/citusdata/postgresql-hll/blob/master/_autodocs/README.md This snippet demonstrates how to store and query daily unique users using HLL. It includes creating a table, populating it with aggregated HLL data from raw events, and querying daily and weekly unique counts. ```sql CREATE TABLE daily_stats ( date DATE PRIMARY KEY, users hll ); INSERT INTO daily_stats (date, users) SELECT DATE(event_time), hll_add_agg(hll_hash_integer(user_id)) FROM events GROUP BY DATE(event_time); SELECT date, hll_cardinality(users) AS daily_users FROM daily_stats ORDER BY date DESC LIMIT 30; SELECT DATE_TRUNC('week', date)::DATE AS week, hll_cardinality(hll_union_agg(users)) AS weekly_users FROM daily_stats GROUP BY DATE_TRUNC('week', date) ORDER BY week DESC; ```