### Install Dependencies and Run v2 Dev Server

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/README.md

Sets up the v2 Node.js/React development environment and starts the development server.

```bash
cd benchmarks-website
npm install
npm run dev
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-web/README.md

Run this command to install all necessary Node.js dependencies for the project.

```bash
npm install
```

--------------------------------

### Install Development Prerequisites (macOS)

Source: https://github.com/vortex-data/vortex/blob/develop/README.md

Set up the recommended dependencies for Vortex development on macOS, including the Rust toolchain and Git submodules.

```bash
# Optional but recommended dependencies
brew install flatbuffers protobuf  # For .fbs and .proto files
brew install duckdb                # For benchmarks

# Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# or
brew install rustup

# Initialize submodules
git submodule update --init --recursive

# Setup dependencies with uv
uv sync --all-packages
```

--------------------------------

### Start Storybook Development Server

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-web/README.md

Launches the Storybook development server for isolated UI component development and previewing. This does not require Rust or WASM setup.

```bash
npm run storybook
```

--------------------------------

### Run Vortex C Examples

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-ffi/README.md

Commands to build and run the C examples provided with Vortex. Ensure the build directory is configured with `BUILD_EXAMPLES=1`.
```sh
cmake -Bbuild -DBUILD_EXAMPLES=1
cmake --build build
./build/examples/dtype
./build/examples/scan
./build/examples/scan_to_arrow
./build/examples/write_sample
```

--------------------------------

### Install bench-orchestrator CLI

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Installs the bench-orchestrator CLI tool using uv. This command makes the `vx-bench` command available.

```bash
uv tool install "bench_orchestrator @ ./bench-orchestrator/"
```

--------------------------------

### Python Example: Read and Filter Vortex Files

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/duckdb.md

Demonstrates how to use the DuckDB Python client to install, load, and query Vortex files, including filtering.

```python
import duckdb

duckdb.sql("INSTALL vortex")
duckdb.sql("LOAD vortex")

result = duckdb.sql("SELECT * FROM read_vortex('data.vortex') WHERE age > 30")
result.show()
```

--------------------------------

### Build and Run C++ Example with CMake

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-cxx/examples/README.md

Use these commands to build and execute the C++ example project with CMake. Ensure you are in the project's root directory.

```bash
mkdir -p build
cd build
cmake ..
make -j$(nproc)
./hello-vortex
```

--------------------------------

### Install Vortex TUI

Source: https://github.com/vortex-data/vortex/blob/develop/README.md

Install the `vx` command-line tool for browsing Vortex files. The pre-built binary is recommended for speed.
```bash
# Install pre-built binary (fast, recommended)
cargo binstall vortex-tui

# Or build from source
cargo install vortex-tui --locked

# Or run via Python without installing
uvx --from vortex-data vx --help

# Usage
vx browse
```

--------------------------------

### Cross-engine comparison example

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

An example command to run benchmarks across different query engines on the same dataset, facilitating direct performance comparisons.

```bash
# Run all engines on the same data
```

--------------------------------

### Start Development Server (Full App)

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-web/README.md

Starts the Vite development server. This command first builds the WebAssembly (WASM) module in debug mode before launching the server.

```bash
# Start dev server (builds WASM in debug mode, then starts Vite)
npm run dev
```

--------------------------------

### Verify vx CLI Installation

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/install.md

Confirms that the vx CLI has been installed correctly by checking its help command.

```bash
vx --help
```

--------------------------------

### Install Doxygen

Source: https://github.com/vortex-data/vortex/blob/develop/docs/README.md

Install the Doxygen tool, which is required for building the Vortex documentation. This command uses Homebrew for installation.

```bash
brew install doxygen
```

--------------------------------

### Install Cargo Nextest

Source: https://github.com/vortex-data/vortex/blob/develop/CLAUDE.md

Install the cargo-nextest tool if it is not already available. This is a prerequisite for using `cargo nextest run`.
```bash
cargo install --locked cargo-nextest
```

--------------------------------

### Serve Vortex Documentation with Live Reloading

Source: https://github.com/vortex-data/vortex/blob/develop/docs/README.md

Start a local server to build the Vortex documentation with live-reloading capabilities. This is useful for development to see changes as they are made.

```bash
uv run make serve
```

--------------------------------

### Local Development: Run Benchmark Server

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/AGENTS.md

Starts the benchmark server locally. Requires an ingest bearer token.

```bash
INGEST_BEARER_TOKEN=dev cargo run -p vortex-bench-server
```

--------------------------------

### Install vx CLI with Binstall

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/install.md

Downloads a pre-built binary for the vx CLI. Requires cargo-binstall.

```bash
cargo binstall vortex-tui
```

--------------------------------

### Install Vortex Data

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/python.rst

Use pip to install the vortex-data package.

```bash
pip install vortex-data
```

--------------------------------

### Install Docker Compose Plugin

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Downloads and installs the Docker Compose plugin for Linux aarch64. Ensures the plugin is executable.

```bash
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-aarch64 -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
```

--------------------------------

### Install and Load Vortex Extension

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/duckdb.md

Use these SQL commands to install and load the Vortex extension in DuckDB.
This is required before using Vortex-specific functions.

```sql
INSTALL vortex;
LOAD vortex;
```

--------------------------------

### Install vx CLI from source with Cargo

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/install.md

Builds the vx CLI from source. This can be slow due to the large dependency tree.

```bash
cargo install vortex-tui
```

--------------------------------

### Pull and Start v3 Container

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Pulls the latest image for the v3 server and starts it in detached mode. Includes a smoke test to verify the service is responding.

```bash
cd /opt/benchmarks-website
docker compose pull vortex-bench-server
docker compose up -d vortex-bench-server

# Smoke-check on the host:
curl -sf http://127.0.0.1:3001/health || echo "v3 not responding"
```

--------------------------------

### Open and read a Vortex file

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/vortex-python.md

Open a Vortex file lazily using `vx.open` and get its length. The example first writes `example.vortex` from `_static/example.parquet`, so that Parquet file must exist.

```python
import pyarrow.parquet as pq
import vortex as vx

# Write a Vortex file from the sample Parquet data so there is something to open.
vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')

f = vx.open('example.vortex')
len(f)
```

--------------------------------

### Deploy Benchmarks Website with Docker Compose

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Copies the docker-compose.yml file to the application directory and starts the services in detached mode.

```bash
sudo mkdir -p /opt/benchmarks-website
sudo cp docker-compose.yml /opt/benchmarks-website/
cd /opt/benchmarks-website
docker compose up -d
```

--------------------------------

### Add Vortex Python Package

Source: https://github.com/vortex-data/vortex/blob/develop/README.md

Install the Vortex Python package using uv.
```bash
uv add vortex-data
```

--------------------------------

### Install Docker on Amazon Linux 2023

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Installs Docker, enables the service, and adds the current user to the docker group. Requires a new login or `newgrp docker` to take effect.

```bash
sudo yum install -y docker
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
newgrp docker
```

--------------------------------

### Run vx CLI without installation using uvx

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/install.md

Executes the vx CLI without a full installation. Requires uv.

```bash
uvx --from vortex-data vx --help
```

--------------------------------

### Install Daily DuckDB Backup Cron Job

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Installs a backup script and configures a daily cron job to run it. The script is copied to a stable location, and the cron entry is added to /etc/cron.d/.

```bash
# Copy the backup script from the repo checkout to a stable location.
sudo install -m 755 -o root -g root \
    benchmarks-website/server/scripts/backup.sh \
    /usr/local/bin/vortex-bench-backup.sh

# Cron entry: 06:00 UTC daily, after the nightly bench finishes.
sudo tee /etc/cron.d/vortex-bench-backup >/dev/null <<'CRON'
0 6 * * * root /usr/local/bin/vortex-bench-backup.sh >> /var/log/vortex-bench-backup.log 2>&1
CRON
sudo chmod 644 /etc/cron.d/vortex-bench-backup

# The instance IAM role already permits writes to
# s3://vortex-ci-benchmark-results/ (same role v2's cat-s3.sh uses).
```

--------------------------------

### DictArray with Slice and RunEndArray

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/internals/execution.md

Example of a DictArray using a sliced RunEndArray, illustrating how incremental execution preserves optimization opportunities like Dict-RLE.
```pseudocode
dict:
  values: primitive(...)
  codes: slice(runend(...))   # Dict-RLE pattern hidden by slice
```

--------------------------------

### Run TPC-H Benchmarks

Source: https://github.com/vortex-data/vortex/blob/develop/README.md

Execute TPC-H benchmarks using `vx-bench`, comparing different engines and formats. Ensure the benchmark orchestrator is installed.

```bash
# Install the benchmark orchestrator
uv tool install "bench_orchestrator @ ./bench-orchestrator/"

# Run TPC-H benchmarks
vx-bench run tpch --engine datafusion,duckdb --format parquet,vortex

# Compare results
vx-bench compare --run latest
```

--------------------------------

### Read Vortex Files with Filters

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/duckdb.md

Example of reading a Vortex file while applying a filter to select specific rows. This demonstrates predicate pushdown.

```sql
SELECT name, age
FROM read_vortex('data.vortex')
WHERE age > 30;
```

--------------------------------

### Local Development: Migrator E2E with S3 Dump

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/AGENTS.md

Runs the migrator end-to-end against a real S3 dump and then starts the server.

```bash
cargo run -p vortex-bench-migrate -- run --output ./bench.duckdb
VORTEX_BENCH_DB=./bench.duckdb INGEST_BEARER_TOKEN=dev \
    cargo run -p vortex-bench-server
```

--------------------------------

### Install Dependencies on macOS

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-bench/README.md

Install necessary packages on macOS using Homebrew to resolve 'Failed to compress to parquet' errors. This includes duckdb, cmake, ninja, pkg-config, and vcpkg.

```bash
brew install duckdb cmake ninja pkg-config vcpkg
```

--------------------------------

### Open Vortex file for reading

Source: https://context7.com/vortex-data/vortex/llms.txt

Use `session.open_options()` to get a `VortexOpenOptions` builder.
Options include `open_path` for local files, `open_buffer` for in-memory data, and `open_object_store` for object storage.

```rust
use vortex::VortexSessionDefault;
use vortex::file::OpenOptionsSessionExt;
use vortex::session::VortexSession;

#[tokio::main]
async fn main() -> vortex::error::VortexResult<()> {
    let session = VortexSession::default();

    let file = session
        .open_options()
        .with_file_size(8_192) // optional hint; avoids one extra I/O
        .open_path("output.vortex")
        .await?;

    println!("rows={}", file.len()?);
    Ok(())
}
```

--------------------------------

### Rust Binary: `generate` command structure

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-test/compat-gen/README.md

The `generate` command in the `vortex-compat` Rust binary follows three phases: Setup (downloading external data), Build (constructing arrays in parallel), and Write (serializing arrays to .vortex files and creating a manifest).

```bash
# Generate fixtures locally
cargo run -p vortex-compat --release -- generate --output /tmp/fixtures
```

--------------------------------

### Read Vortex File in Java

Source: https://github.com/vortex-data/vortex/blob/develop/docs/api/java/index.rst

Basic example demonstrating how to open a Vortex file and read an array from it using the Vortex Java API. Ensure the file path is correct.

```java
import dev.vortex.api.File;
import dev.vortex.api.Array;

// Open a Vortex file
File vortexFile = File.open("path/to/file.vortex");

// Read arrays from the file
Array array = vortexFile.readArray();

// Work with the array data
System.out.println("Array length: " + array.getLength());
```

--------------------------------

### Serve Vortex TUI Browser via WASM

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-tui/README.md

Build and serve the TUI browser for WASM by running `make serve` from the `vortex-tui` directory. This requires `wasm-pack` to be installed.
```bash
# From the vortex-tui directory:
make serve
```

--------------------------------

### Basic Test Example with Views

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-sqllogictest/README.md

A reusable test pattern for basic operations, utilizing views over Vortex files. This approach is used because DuckDB and DataFusion lack a shared syntax for creating tables backed by external storage formats. Ensure substitution is enabled using `control substitution on` for `$__TEST_DIR__`.

```text
query I
COPY (values (1, 2), (3, 4)) TO '$__TEST_DIR__/test.vortex';
----
2

statement ok
CREATE VIEW foo AS SELECT * FROM '$__TEST_DIR__/test.vortex';

query II
SELECT * FROM foo;
----
1 2
3 4

statement ok
DROP VIEW IF EXISTS foo;
```

--------------------------------

### Build Vortex C++ Bindings

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-cxx/README.md

Standard build steps using CMake and Ninja. Ensure CMake version 3.22+ and a C++20 compatible compiler are installed.

```bash
mkdir build
cmake -Bbuild -GNinja
cmake --build build -j
```

--------------------------------

### Basic performance comparison workflow

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

A workflow demonstrating how to run benchmarks on a baseline, switch to a feature branch, run benchmarks again, and then compare the results.

```bash
git checkout main
vx-bench run tpch -e datafusion -f parquet,vortex -l baseline
git checkout feature/my-optimization
vx-bench run tpch -e datafusion -f parquet,vortex -l feature
vx-bench compare --runs baseline,feature
```

--------------------------------

### Run ClickBench Benchmarks with query_bench

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-bench/README.md

Execute the ClickBench benchmark suite using the unified query_bench runner. Ensure the 'query_bench' binary is available.
```bash
cargo run --bin query_bench -- clickbench
```

--------------------------------

### Run gpu-scan-cli Benchmark

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-cuda/gpu-scan-cli/README.md

Execute the gpu-scan-cli tool to benchmark Vortex files. Set environment variables for layout and logging. Use --json for trace output.

```bash
FLAT_LAYOUT_INLINE_ARRAY_NODE=true RUST_LOG=vortex_cuda=trace,info \
    cargo run --release --bin gpu-scan-cli -- ./path/to/file.vortex
```

--------------------------------

### Build Vortex Documentation

Source: https://github.com/vortex-data/vortex/blob/develop/docs/README.md

Build the HTML documentation for Vortex. This command uses `uv run make html` to generate the documentation.

```bash
uv run make html
```

--------------------------------

### Build Executable with Vortex FFI Shared Library (write_sample)

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-ffi/examples/CMakeLists.txt

This snippet sets up a build target for 'write_sample.c', linking it against the 'vortex_ffi_shared' library.

```cmake
add_executable(write_sample write_sample.c)
target_link_libraries(write_sample PRIVATE vortex_ffi_shared)
```

--------------------------------

### Python Orchestrator: List Versions

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-test/compat-gen/README.md

List all available versions stored in the compatibility fixture store. If a specific version is provided, its manifest.json is printed.

```bash
python scripts/compat.py list [--store ] [--version ]
```

--------------------------------

### Install Vortex Python Package

Source: https://github.com/vortex-data/vortex/blob/develop/docs/api/python/index.rst

Install the core Vortex Python package using pip. For optional integrations with libraries like Polars, Pandas, NumPy, DuckDB, or Ray, use the extras syntax.
```bash
pip install vortex-data
```

```bash
# Quoted so that shells such as zsh do not treat the brackets as a glob pattern.
pip install "vortex-data[polars,pandas,numpy,duckdb,ray]"
```

--------------------------------

### Download Sample Data

Source: https://github.com/vortex-data/vortex/blob/develop/docs/getting-started/convert.md

Use `curl` to download the sample Parquet data for conversion.

```bash
curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet
```

--------------------------------

### Open Vortex File and Get Length

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/pyarrow.md

Lazily open a Vortex file using `vortex.open` and check its length.

```python
f = vx.open('example.vortex')
len(f)
```

--------------------------------

### Analyze Storage Format Performance

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Run a comprehensive benchmark to analyze the performance of different storage formats. Use the compare command to analyze within the run or specify a baseline for comparison.

```bash
vx-bench run tpch \
    -e datafusion \
    -f parquet,vortex,vortex-compact \
    -i 10 \
    -l format-analysis
```

```bash
vx-bench compare --run format-analysis
```

```bash
vx-bench compare --run format-analysis --baseline datafusion:parquet
```

--------------------------------

### Generate Fixtures Locally

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-test/compat-gen/README.md

Use this command to generate compatibility fixtures in a specified directory.

```bash
cargo run -p vortex-compat --release -- generate --output /tmp/fixtures
```

--------------------------------

### VortexSession::default

Source: https://context7.com/vortex-data/vortex/llms.txt

Creates a fully-configured Vortex session by wiring together all built-in components. This is the recommended starting point for applications.
```APIDOC
## VortexSession::default - Create a fully-configured Vortex session

### Description
A `VortexSession` is the central registry for array encodings, layout strategies, scalar functions, optimizer kernels, and the async I/O runtime. `VortexSession::default()` (via the `VortexSessionDefault` trait) wires all built-in components together and is the recommended starting point for any application.

### Code Example
```rust
use vortex::VortexSessionDefault;
use vortex::session::VortexSession;

let session = VortexSession::default();
// session is now ready for reading, writing, and in-memory compression
```
```

--------------------------------

### Run TPC-H Benchmarks with query_bench

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-bench/README.md

Execute the TPC-H benchmark suite using the unified query_bench runner. Ensure the 'query_bench' binary is available.

```bash
cargo run --bin query_bench -- tpch
```

--------------------------------

### Write Storybook Stories

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-web/README.md

Example of how to define a Storybook story for a React component. Place story files alongside components as `*.stories.tsx`.

```tsx
import type {Meta, StoryObj} from '@storybook/react-vite';
import {MyComponent} from './MyComponent';

const meta: Meta = {
    component: MyComponent,
};
export default meta;

type Story = StoryObj;

export const Default: Story = {
    args: {},
};
```

--------------------------------

### Vortex ExecutionStep Enum

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/internals/execution.md

Defines the possible outcomes of an encoding's decode step. Used to guide the scheduler on the next action.

```rust
pub enum ExecutionStep {
    /// Push the parent onto the stack, focus a single child, and resume the
    /// parent once that child matches the predicate.
    ExecuteSlot(usize, DonePredicate),

    /// Detach a child, append it into the current activation's builder, and
    /// keep the parent as current_array for the next iteration.
    AppendChild(usize),

    /// Execution is complete. If a builder is active, it is finalized here.
    Done,
}
```

--------------------------------

### Run SQL Benchmarks with Orchestrator

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/benchmarking.md

Runs SQL benchmarks using the bench-orchestrator CLI, specifying targets, output file, and build options.

```bash
uv run --project bench-orchestrator vx-bench run tpch \
    --targets-json '[{"engine":"datafusion","format":"parquet"},{"engine":"duckdb","format":"vortex"}]' \
    --output results.json \
    --no-build
```

--------------------------------

### Query with Filter and Projection Pushdown

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/datafusion.md

Demonstrates how filters and projections are pushed down into the Vortex scan for efficient data processing. Unsupported filters fall back to post-scan evaluation.

```rust
let df = ctx.sql("SELECT col2, col1 FROM my_table WHERE col1 < 50 AND col3 = 'abc'").await.unwrap();
df.show().await.unwrap();
```

--------------------------------

### Within-Run Comparison Table Format

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Example of the pivot table format for within-run comparisons, showing query performance across different engine and format combinations.
```text
┌───────┬──────────────────────┬────────────────────────┐
│ Query │ duckdb:parquet (base)│ duckdb:vortex          │
├───────┼──────────────────────┼────────────────────────┤
│   1   │ 100.5ms              │ 80.2ms (0.80x)         │
│   2   │ 200.1ms              │ 150.0ms (0.75x)        │
└───────┴──────────────────────┴────────────────────────┘
```

--------------------------------

### Separate Setup from Profiled Code

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/benchmarking.md

Use `bencher.with_inputs(|| ...)` to ensure that fixture construction is excluded from benchmark timing. This is crucial for accurate performance measurements.

```rust
bencher
    // Fixture construction runs here and is not timed.
    .with_inputs(|| bench_fixture())
    // Only this closure is measured.
    .bench_refs(|(array, indices)| {
        array.take(indices.to_array()).unwrap()
    });
```

--------------------------------

### Walkthrough: Chunked Bool Array Execution via AppendChild

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/internals/execution.md

Trace of executing a Chunked Bool Array using the builder path, demonstrating AppendChild and chunk processing.
```text
Input:  Chunked {
          chunks[0] = Bool[true, false],
          chunks[1] = Bool[false],
          chunks[2] = Bool[true, true],
        }
Goal:   Canonical BoolArray

Iteration 1:
  Step 1  → not done
  Step 2a → skipped (root, no stacked parent)
  Step 2b → None
  Step 3  → AppendChild(1)
            create current_builder = BoolBuilder []
            append chunks[0]
            current_array = Chunked(next_builder_slot = 2)
            current_builder = BoolBuilder [true, false]

Iteration 2:
  Step 1  → not done
  Step 2a / 2b → skipped (builder active; current_array is partially consumed)
  Step 3  → AppendChild(2)
            append chunks[1]
            current_array = Chunked(next_builder_slot = 3)
            current_builder = BoolBuilder [true, false, false]

Iteration 3:
  Step 1  → not done
  Step 2a / 2b → skipped
  Step 3  → AppendChild(3)
            append chunks[2]
            current_array = Chunked(next_builder_slot = 4)
            current_builder = BoolBuilder [true, false, false, true, true]

Iteration 4:
  Step 1  → not done
  Step 2a / 2b → skipped
  Step 3  → Done
            finish current_builder
            result = BoolArray [true, false, false, true, true]

→ Result: BoolArray [true, false, false, true, true]
```

--------------------------------

### Build Vortex C++ Bindings with CMake

Source: https://github.com/vortex-data/vortex/blob/develop/docs/api/cpp/index.rst

Build the C++ bindings using CMake. Ensure you have CMake 3.22+, a C++20 compatible compiler, and a Rust toolchain installed.

```bash
cd vortex-cxx
mkdir build && cd build
cmake ..
make -j$(nproc)
```

--------------------------------

### Multi-Run Comparison Table Format

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Example of the pivot table format for multi-run comparisons, displaying query performance across different runs, engines, and formats with ratios.
```text
┌───────┬────────┬─────────┬──────────────┬──────────────────┐
│ Query │ Engine │ Format  │ run1 (base)  │ run2             │
├───────┼────────┼─────────┼──────────────┼──────────────────┤
│   1   │ duckdb │ parquet │ 100ms        │ 95ms (0.95x)     │
│   1   │ duckdb │ vortex  │ 80ms         │ 75ms (0.94x)     │
└───────┴────────┴─────────┴──────────────┴──────────────────┘
```

--------------------------------

### Example Projection Expression in Vortex

Source: https://github.com/vortex-data/vortex/blob/develop/docs/concepts/expressions.md

Illustrates how a projection expression is applied to an array, resulting in a deferred computation structure. This structure represents the application of the expression before actual computation.

```yaml
bitpacked:
  bitwidth: 4
  buffer: Buffer
```

```yaml
scalar_fn(struct.pack):
  names: ["x", "y"]
  inputs:
    - bitpacked:
        bitwidth: 4
        buffer: Buffer
    - scalar_fn(binary.add):
        inputs:
          - bitpacked:
              bitwidth: 4
              buffer: Buffer
          - constant(1)
```

--------------------------------

### Create DuckDB Data Directory

Source: https://github.com/vortex-data/vortex/blob/develop/benchmarks-website/ec2-init.txt

Sets up the data directory for DuckDB, assuming an EBS volume is mounted at the specified path. Ensures correct ownership and permissions.

```bash
# Assumes an EBS volume is already mounted at /opt/benchmarks-website/data.
sudo mkdir -p /opt/benchmarks-website/data
sudo chown root:root /opt/benchmarks-website/data
sudo chmod 755 /opt/benchmarks-website/data
```

--------------------------------

### Profile query_bench with Instruments

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-bench/README.md

Profile the 'query_bench' binary using 'cargo instruments' for performance analysis. This command opens the results in Instruments on macOS.
```bash
cargo instruments -p vortex-bench --bin query_bench --template Time --profile bench -- tpch
```

--------------------------------

### Create a default Vortex session

Source: https://context7.com/vortex-data/vortex/llms.txt

Use `VortexSession::default()` to initialize a fully-configured Vortex session with all built-in components wired together. This is the recommended starting point for applications.

```rust
use vortex::VortexSessionDefault;
use vortex::session::VortexSession;

let session = VortexSession::default();
// session is now ready for reading, writing, and in-memory compression
```

--------------------------------

### Compare engine:format combinations within a single run

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Compares different engine:format combinations within the most recent benchmark run. Results are presented in a pivot table format.

```bash
vx-bench compare --run latest
```

--------------------------------

### Vortex File Structure

Source: https://github.com/vortex-data/vortex/blob/develop/docs/specs/file-format.md

Defines the basic structure of a Vortex file, starting and ending with a magic number and containing segments of binary data, a version tag, and postscript length.

```text
<4 bytes> magic number 'VTXF'
... segments of binary data, optionally with inter-segment padding ...
postscript data
<2 bytes> u16 version tag
<2 bytes> u16 postscript length
<4 bytes> magic number 'VTXF'
```

--------------------------------

### Walkthrough: RunEndArray Execution to Canonical

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/internals/execution.md

Trace of executing a RunEndArray to a Canonical form, detailing steps like ExecuteSlot and child processing.
```text
Input:  RunEndArray { ends: [3, 7, 10], values: [A, B, C], len: 10 }
Goal:   Canonical (PrimitiveArray or similar)

Iteration 1:
  Step 1  → not done
  Step 2a → skipped (root, no stacked parent)
  Step 2b → None
  Step 3  → ends are not Primitive yet?
            ExecuteSlot(0, Primitive::matches)
  Stack:  [(RunEnd, child_idx=0, Primitive::matches)]
  Focus on: ends
  current_builder = None

Iteration 2:
  Step 1  → done (ends already match Primitive)
  Pop stack → replace child 0 in RunEnd

Iteration 3:
  Step 1  → not done
  Step 2a → skipped (root again after the pop)
  Step 2b → None
  Step 3  → values are not Canonical yet?
            ExecuteSlot(1, AnyCanonical::matches)
  Stack:  [(RunEnd, child_idx=1, AnyCanonical::matches)]
  Focus on: values

Iteration 4:
  Step 1  → done (values already match AnyCanonical)
  Pop stack → replace child 1 in RunEnd

Iteration 5:
  Step 1  → not done
  Step 2a → skipped (root)
  Step 2b → None
  Step 3  → all children ready, decode runs:
            [A, A, A, B, B, B, B, C, C, C]
            Done → return PrimitiveArray

→ Result: PrimitiveArray [A, A, A, B, B, B, B, C, C, C]
```

--------------------------------

### Configure Catch2 Compilation

Source: https://github.com/vortex-data/vortex/blob/develop/vortex-ffi/test/CMakeLists.txt

Configures Catch2 compilation by defining specific preprocessor macros. This example disables POSIX signals for Catch2, which can be useful for cross-platform compatibility or specific testing scenarios.

```cmake
target_compile_definitions(Catch2 PRIVATE CATCH_CONFIG_NO_POSIX_SIGNALS)
```

--------------------------------

### Incrementally build a Utf8 array with a builder

Source: https://context7.com/vortex-data/vortex/llms.txt

Use `builder_with_capacity` and `ArrayBuilder` to construct arrays row-by-row without pre-allocating all data. This example demonstrates building a non-nullable UTF-8 string array.
```rust
use vortex::array::builders::{ArrayBuilder, builder_with_capacity};
use vortex::array::dtype::{DType, Nullability};
use vortex::array::{LEGACY_SESSION, VortexSessionExecute};

let mut builder = builder_with_capacity(&DType::Utf8(Nullability::NonNullable), 4);
builder.append_scalar(&"alpha".into()).unwrap();
builder.append_scalar(&"beta".into()).unwrap();
builder.append_scalar(&"gamma".into()).unwrap();
builder.append_scalar(&"delta".into()).unwrap();
let array = builder.finish();

let mut ctx = LEGACY_SESSION.create_execution_ctx();
assert_eq!(array.execute_scalar(0, &mut ctx).unwrap(), "alpha".into());
```

--------------------------------

### Run SQL Benchmarks Directly

Source: https://github.com/vortex-data/vortex/blob/develop/docs/developer-guide/benchmarking.md

Executes SQL benchmarks using per-engine binaries for DataFusion and DuckDB.

```bash
cargo run --release --bin datafusion-bench --
```

```bash
cargo run --release --bin duckdb-bench --
```

--------------------------------

### Read Vortex File into Pandas DataFrame

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/pandas.md

Open a Vortex file, scan its data into memory, and convert it to a Pandas DataFrame. Use `.scan()` to get an iterator, `.read_all()` to collect batches, and `.to_pandas()` for conversion.

```python
import vortex as vx
import pyarrow.parquet as pq

vx.io.write(pq.read_table("_static/example.parquet"), 'example.vortex')

f = vx.open('example.vortex')
df = f.scan().read_all().to_pandas()
df[['tip_amount', 'fare_amount']].head(3)
```

--------------------------------

### Run benchmarks with options

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Executes a specified benchmark suite with custom options for engines, formats, queries, and iterations. The `--targets-json` option allows for precise specification of engine-format pairs.
```bash
vx-bench run [options]
```

```bash
vx-bench run tpch --engine datafusion,duckdb --format parquet,vortex --queries "1,2,5" --iterations 5 --label "my-run" --track-memory --output "/path/to/output.jsonl"
```

```bash
vx-bench run tpch --targets-json '[{"engine":"datafusion","format":"parquet"}]'
```

--------------------------------

### Scan Vortex File with Filter and Projection Pushdown in Rust

Source: https://context7.com/vortex-data/vortex/llms.txt

Use `.scan()` to get a builder, then apply `.with_filter()` and `.with_projection()` before materializing the stream. Filters and projections are pushed into the file reader for efficient data processing.

```rust
use vortex::VortexSessionDefault;
use vortex::array::stream::ArrayStreamExt;
use vortex::array::expr::{gt, lit, root, select};
use vortex::file::OpenOptionsSessionExt;
use vortex::session::VortexSession;

#[tokio::main]
async fn main() -> vortex::error::VortexResult<()> {
    let session = VortexSession::default();

    // Read rows where the root value > 2
    let result = session
        .open_options()
        .open_path("output.vortex")
        .await?
        .scan()?
        .with_filter(gt(root(), lit(2u64)))
        .into_array_stream()?
        .read_all()
        .await?;
    println!("rows after filter={}", result.len()); // 2 (values 3, 4)

    // Projection: read only the "value" column from a struct file
    let projected = session
        .open_options()
        .open_path("struct.vortex")
        .await?
        .scan()?
        .with_projection(select(["value"], root()))
        .into_array_stream()?
        .read_all()
        .await?;
    println!("projected rows={}", projected.len());

    Ok(())
}
```

--------------------------------

### Filter Data Using Nested Field Access

Source: https://github.com/vortex-data/vortex/blob/develop/docs/api/python/expr.rst

Demonstrates how to filter a Vortex array based on a value within a nested structure. This example shows accessing a field 'yy' within a nested 'y' object and comparing it to a string.
```python
import vortex as vx
import vortex.expr as ve
import pyarrow as pa

# Two rows, each with a nested struct column "y" holding a string field "yy".
array = pa.array([
    {"x": 1, "y": {"yy": "a"}},
    {"x": 2, "y": {"yy": "b"}},
])
vx.io.write(vx.array(array), '/tmp/foo.vortex')

# Keep only the rows where y.yy == "a".
(vx.file.open('/tmp/foo.vortex')
    .scan(expr=ve.column("y")["yy"] == "a")
    .read_all()
    .to_pylist()
)
```

--------------------------------

### Create External Table and Query with SQL

Source: https://github.com/vortex-data/vortex/blob/develop/docs/user-guide/datafusion.md

Create an external table pointing to Vortex files and query it using SQL. The Vortex format must already be registered with the session context (`ctx`) before these statements run.

```rust
// Assumes `ctx` is a DataFusion `SessionContext` with the Vortex format already registered.
ctx.sql("CREATE EXTERNAL TABLE my_table STORED AS vortex LOCATION '/path/to/vortex/files'")
    .await
    .unwrap();

// Query the table like any other DataFusion table and print the result.
let df = ctx.sql("SELECT * FROM my_table WHERE col1 > 10").await.unwrap();
df.show().await.unwrap();
```

--------------------------------

### Run Benchmark and Compare Engines

Source: https://github.com/vortex-data/vortex/blob/develop/bench-orchestrator/README.md

Run a benchmark for a specific dataset and compare the performance of different engines and formats. The comparison table is displayed automatically after the run.

```bash
vx-bench run tpch -e datafusion,duckdb -f parquet -l engine-comparison
```

```bash
vx-bench compare --run engine-comparison
```
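--------------------------------

### Sketch: Run-End Decoding in Plain Python

The RunEndArray execution trace earlier in this section ends with a "decode runs" step that turns `ends: [3, 7, 10]` and `values: [A, B, C]` into ten elements. The following is a minimal, standalone sketch of that expansion only. It is not Vortex's implementation and uses no Vortex API; the function name `decode_run_end` and its error handling are hypothetical, and it assumes each entry in `ends` is the exclusive end offset of its run.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decode_run_end(ends: Sequence[int], values: Sequence[T], length: int) -> List[T]:
    """Expand run-end encoded data into a flat list (illustrative helper).

    Assumption: ends[i] is the exclusive end offset of run i, so run i
    covers positions ends[i-1] .. ends[i]-1, with the first run starting at 0.
    """
    if len(ends) != len(values):
        raise ValueError("ends and values must have the same length")

    out: List[T] = []
    start = 0
    for end, value in zip(ends, values):
        if end < start:
            raise ValueError("ends must be non-decreasing")
        # Clamp to the logical length so a trailing run cannot overshoot it.
        stop = min(end, length)
        out.extend([value] * (stop - start))
        start = stop

    if len(out) != length:
        raise ValueError("runs do not cover the requested length")
    return out


# Same input as the trace: three runs of lengths 3, 4, and 3.
decoded = decode_run_end([3, 7, 10], ["A", "B", "C"], 10)
print(decoded)  # ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C']
```

Storing end offsets rather than run lengths means the run containing a given position can be found with a binary search over `ends`; the sketch above only covers the full expansion.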