### Run Arrow Examples Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md Examples demonstrating Arrow array construction and data handling can be executed using the `cargo run --example` command. ```bash cargo run --example builders cargo run --example dynamic_types cargo run --example read_csv ``` -------------------------------- ### Install and Run fastavro Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Instructions to install the fastavro library and run the script to generate an Avro file. Ensure Python 3 is available before installation. ```bash python -m pip install --upgrade fastavro ``` ```bash python create_comprehensive_avro_file.py ``` ```bash mv comprehensive_e2e.avro arrow-avro/test/data/ ``` -------------------------------- ### Install fastavro Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Install the fastavro library to create Avro files. This is a prerequisite for running the creation script. ```bash pip install fastavro ``` -------------------------------- ### Prepare Development Environment Source: https://github.com/apache/arrow-rs/blob/main/arrow-pyarrow-integration-testing/README.md Installs necessary tools like maturin, toml, pytest, pytz, and pyarrow for development and testing. ```bash python -m venv venv virtualenv/bin/pip install maturin toml pytest pytz pyarrow>=5.0 ``` -------------------------------- ### Install CLI Binaries Source: https://github.com/apache/arrow-rs/blob/main/parquet/CONTRIBUTING.md Install the command-line interface binaries for Arrow Rust, such as parquet-schema, parquet-read, and parquet-rowcount, using 'cargo install --features cli'. ```bash cargo install --features cli ``` -------------------------------- ### Compile with Docker Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Builds the project using Rust's official Docker image, ensuring a consistent environment. Installs rustfmt and builds the project. ```bash docker run --rm -v $(pwd):/arrow-rs -it rust /bin/bash -c "cd /arrow-rs && rustup component add rustfmt && cargo build" ``` -------------------------------- ### Install fastavro and download script Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md These bash commands install the 'fastavro' Python package and download the Avro file generation script from a GitHub Gist. Ensure Python 3 is available before running. ```bash # 1) Ensure Python 3 is available, then install fastavro python -m pip install --upgrade fastavro # 2) Fetch the script curl -L -o create_avro_decimal_files.py \ https://gist.githubusercontent.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7/raw/create_avro_decimal_files.py ``` -------------------------------- ### Install FlightSQL CLI Client Source: https://github.com/apache/arrow-rs/blob/main/arrow-flight/README.md Install the FlightSQL command-line interface client using Cargo. This command enables the 'cli', 'flight-sql', and 'tls' features. ```bash $ cargo install --features=cli,flight-sql,tls --bin=flight_sql_client --path=. --locked ``` -------------------------------- ### Initialize Git Submodules Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Pulls down test data required for running tests and examples. This populates data in `./parquet-testing/data` and `./testing`. ```bash git submodule update --init ``` -------------------------------- ### Expand Thrift Macros Example Source: https://github.com/apache/arrow-rs/blob/main/parquet/THRIFT.md Command to expand Thrift macros for a specific module and features, useful for understanding generated code. ```sh % cargo expand -p parquet --lib --all-features basic ``` -------------------------------- ### Install Archery Tool Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Install the Archery tool, which is used for managing Arrow projects and running integration tests. Ensure you are in the arrow directory and use pip with the integration extra. ```shell cd arrow pip install -e dev/archery[integration] ``` -------------------------------- ### Install fastavro for Avro generation Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Install the fastavro Python package, which is required for generating the Avro test file. Ensure Python 3 is available. ```bash python3 -m pip install --upgrade fastavro ``` -------------------------------- ### Release Voting Email Template Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md This is an example email template to be sent to dev@arrow.apache.org for initiating a release vote. It includes details about the release candidate, commit hash, tarball location, and changelog. ```email To: dev@arrow.apache.org Subject: [VOTE][RUST] Release Apache Arrow Hi, I would like to propose a release of Apache Arrow Rust Implementation, version 4.1.0. This release candidate is based on commit: a5dd428f57e62db20a945e8b1895de91405958c4 [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Please download, verify checksums and signatures, run the unit tests, and vote on the release. The vote will be open for at least 72 hours. [ ] +1 Release this as Apache Arrow Rust [ ] +0 [ ] -1 Do not release this as Apache Arrow Rust because... [1]: https://github.com/apache/arrow-rs/tree/a5dd428f57e62db20a945e8b1895de91405958c4 [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-4.1.0 [3]: https://github.com/apache/arrow-rs/blob/a5dd428f57e62db20a945e8b1895de91405958c4/CHANGELOG.md ``` -------------------------------- ### Async reading from object stores with object_store feature Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Utilize the 'object_store' feature for asynchronous reading of Avro files from various object storage backends. This example demonstrates reading from a local file system. ```rust use std::sync::Arc; use arrow_avro::reader::{AsyncAvroFileReader, AvroObjectReader}; use futures::TryStreamExt; use object_store::ObjectStore; use object_store::local::LocalFileSystem; use object_store::path::Path; #[tokio::main] async fn main() -> anyhow::Result<()> { let store: Arc = Arc::new(LocalFileSystem::new()); let path = Path::from("data/example.avro"); let meta = store.head(&path).await?; let reader = AvroObjectReader::new(store, path); let stream = AsyncAvroFileReader::builder(reader, meta.size, 1024) .try_build() .await?; let batches: Vec<_> = stream.try_collect().await?; for batch in batches { println!("rows: {}", batch.num_rows()); } Ok(()) } ``` -------------------------------- ### Execute a FlightSQL Statement Query Source: https://github.com/apache/arrow-rs/blob/main/arrow-flight/README.md Example of executing a SQL statement query using the FlightSQL CLI client. This demonstrates connecting to a host and running a simple SELECT query. ```bash $ flight_sql_client --host example.com statement-query "SELECT 1;" +----------+ | Int64(1) | +----------+ | 1 | +----------+ ``` -------------------------------- ### Rust Deprecation Attribute Example Source: https://github.com/apache/arrow-rs/blob/main/README.md Use the `#[deprecated]` attribute to mark APIs for deprecation. Specify the version in which it was deprecated and provide a note suggesting the preferred alternative. ```rust #[deprecated(since = "51.0.0", note = "Use `date_part` instead")] ``` -------------------------------- ### Run All Tests Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Execute all unit and integration tests for the project. Use the `-p` flag to specify a particular crate. ```bash cargo test ``` ```bash cargo test -p arrow ``` ```bash cargo test -p parquet ``` ```bash cargo test -p arrow --all-features ``` ```bash cargo test --doc ``` -------------------------------- ### Run All Benchmarks Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Execute all benchmarks within the project. This is useful for verifying performance changes. ```bash cargo bench ``` -------------------------------- ### Build Documentation with Cargo Source: https://github.com/apache/arrow-rs/blob/main/parquet/CONTRIBUTING.md Generate project documentation using 'cargo doc --no-deps --all-features'. To view it in a browser, add the '--open' flag. ```bash cargo doc --no-deps --all-features --open ``` -------------------------------- ### Query Available Target CPUs and Features Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md Use `rustc` commands to list all supported target CPUs and features for fine-grained optimization. ```shell $ rustc --print target-cpus $ rustc --print target-features ``` -------------------------------- ### Generate Arrow Flight Rust Code Source: https://github.com/apache/arrow-rs/blob/main/arrow-flight/CONTRIBUTING.md Run this script to regenerate Rust code for Arrow Flight. This is required after modifying protobuf definitions or the dependencies of the gen directory. Ensure you have protoc installed. ```bash ./regen.sh ``` -------------------------------- ### Build C++ Integration Binaries Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Build the C++ integration test binaries. This involves navigating to the C++ directory, configuring the build with CMake, and then compiling using ninja. ```shell # build cpp binaries cd arrow/cpp mkdir build cd build cmake -DARROW_BUILD_INTEGRATION=ON -DARROW_FLIGHT=ON --preset ninja-debug-minimal .. ninja ``` -------------------------------- ### Create and Push Git Tag Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md This command creates a git tag for a release candidate and pushes it to the 'apache' remote. Replace `` and `` with the appropriate release version and release candidate number. ```bash git fetch apache git tag - apache/main # push tag to apache git push apache - ``` -------------------------------- ### Enable SSE4.2 Instructions for Build Source: https://github.com/apache/arrow-rs/blob/main/parquet/CONTRIBUTING.md To leverage SSE4.2 instructions for performance improvements, prepend the RUSTFLAGS environment variable to your cargo build command. ```bash RUSTFLAGS="-C target-feature=+sse4.2" cargo build ``` -------------------------------- ### Check Code Formatting Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Verify code formatting using `rustfmt`. The first command checks all crates, while the second specifically checks the parquet crate with additional configuration. ```bash cargo +stable fmt --all -- --check ``` ```bash cargo fmt -p parquet -- --check --config skip_children=true `find ./parquet -name "*.rs" \! -name format.rs` ``` -------------------------------- ### Checkout Arrow Repository Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Clone the Apache Arrow repository to your local machine. This is the first step before linking Rust source code. ```shell # check out arrow git clone git@github.com:apache/arrow.git ``` -------------------------------- ### Run Archery Integration Tests with Multiple Implementations and Golden Files Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Execute Archery integration tests with a comprehensive set of Arrow implementations and multiple golden directories. This command reproduces the test results typically seen in CI. ```shell archery integration --run-flight --with-cpp=1 --with-csharp=1 --with-java=1 --with-js=1 --with-go=1 --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/0.14.1 --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/0.17.1 --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-bigendian --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/1.0.0-littleendian --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/2.0.0-compression --gold-dirs=/arrow/testing/data/arrow-ipc-stream/integration/4.0.0-shareddict ``` -------------------------------- ### Compile Arrow Rust Project Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Builds the entire project using Cargo. Assumes you are in the project's root directory. ```bash cargo build ``` -------------------------------- ### Arrow Avro Build for Async Object Store Access Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Enable asynchronous reading from object stores like S3 and GCS by including the 'object_store' feature. ```toml arrow-avro = { version = "58", features = ["object_store"] } ``` -------------------------------- ### Develop and Test Arrow Integration Source: https://github.com/apache/arrow-rs/blob/main/arrow-pyarrow-integration-testing/README.md Activates the virtual environment, builds the Rust crate for development, and runs pytest to execute integration tests. ```bash source venv/bin/activate maturin develop pytest -v . ``` -------------------------------- ### Run Rust Flight Integration Test Client Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Directly execute the Rust Flight integration test client. This command connects to a running server and performs integration tests, often used for debugging. ```shell # run rust client (you can see file names if you run archery --debug $ arrow/rust/target/debug/flight-test-integration-client --host localhost --port=49153 --path /tmp/generated_dictionary_unsigned.json ``` -------------------------------- ### Build Rust Binaries Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Compile the Rust components of the Arrow project. This command builds all Rust crates within the arrow-rs project. ```shell # build rust: cd ../arrow-rs cargo build --all ``` -------------------------------- ### Fetch Avro generation script Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Download the Python script used to generate the Avro test file. This script utilizes fastavro to create an OCF file with the specified schema and random data. ```bash curl -L -o generate_duration_avro.py \ https://gist.githubusercontent.com/nathaniel-d-ef/c253cb180b041023e3ccfe9df20ccef7/raw/06c8ca1321efcd8e1c8746fd65aa013e1a566944/generate_duration_avro.py ``` -------------------------------- ### Run Clippy Lints Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Execute Clippy with default settings to check for lints across the entire workspace, all targets, and all features. Warnings are treated as errors. ```bash cargo clippy --workspace --all-targets --all-features -- -D warnings ``` -------------------------------- ### Set RUSTFLAGS for AVX512 and Native CPU Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md Enable native CPU features and full AVX512 usage by setting specific flags in `RUSTFLAGS`. ```ignore RUSTFLAGS="-C target-cpu=native -C target-feature=-prefer-256-bit" ``` -------------------------------- ### Minimal Arrow Avro Build Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Use this configuration for common pipelines requiring minimal dependencies. It enables deflate and snappy compression. ```toml arrow-avro = { version = "58", default-features = false, features = ["deflate", "snappy"] } ``` -------------------------------- ### Run C++ Flight Integration Test Server Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Directly execute the C++ Flight integration test server. This is useful for debugging specific test scenarios. ```shell # Run cpp server $ arrow/cpp/build/debug/flight-test-integration-server -port 49153 ``` -------------------------------- ### Documenting Unsafe Code Usage in Rust Source: https://github.com/apache/arrow-rs/blob/main/arrow/CONTRIBUTING.md When using `unsafe` blocks, provide documentation explaining the benefits, the reasons why safe code is not feasible, and the invariants the user must uphold for soundness. Use `debug_assert`s for relevant checks. ```rust // JUSTIFICATION // Benefit // Describe the benefit of using unsafe. E.g. // "30% performance degradation if the safe counterpart is used, see bench X." // Soundness // Describe why the code remains sound (according to the definition of rust's unsafe code guidelines). E.g. // "We bounded check these values at initialization and the array is immutable." let ... = unsafe { ... }; ``` ```rust // Soundness // This is not sound because .... see https://issues.apache.org/jira/browse/ARROW-nnnnn ``` -------------------------------- ### Run Archery Integration Tests Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Execute the Archery integration tests. This command runs producers and consumers for enabled Arrow implementations (e.g., C++ and Rust) and compares their results. ```shell archery integration --with-cpp=true --with-rust=true ``` -------------------------------- ### Arrow Avro Build with Schema Fingerprint Helpers Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Enable MD5 and SHA256 schema fingerprinting by including the 'md5' and 'sha256' features. This is useful for custom prefixing or validation. ```toml arrow-avro = { version = "58", features = ["md5", "sha256"] } ``` -------------------------------- ### Run Specific Benchmarks Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Execute benchmarks for a specific crate or a particular benchmark function within a crate. This allows for targeted performance testing. ```bash cargo bench -p arrow ``` ```bash cargo bench -p arrow-cast --bench parse_time ``` -------------------------------- ### Set Benchmark Baseline Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Establish a baseline for performance benchmarks by saving the results of the main branch. This is crucial for comparing performance improvements. ```bash git checkout main cargo bench --bench parse_time -- --save-baseline main git checkout feature cargo bench --bench parse_time -- --baseline main ``` -------------------------------- ### Format Code with Cargo Source: https://github.com/apache/arrow-rs/blob/main/parquet/CONTRIBUTING.md Before submitting a pull request, use 'cargo fmt --all' to ensure consistent code formatting across the project. ```bash cargo fmt --all ``` -------------------------------- ### Create Release Candidate Tarball Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md Use this script to create and upload a release candidate tarball to the Apache distribution server. It also generates an email template for the release voting process. ```shell ./dev/release/create-tarball.sh 4.1.0 2 ``` -------------------------------- ### Run Archery Integration Tests with Golden Files Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Run Archery integration tests, specifying golden directories for comparison. This is often used in CI to verify test results against predefined golden files. ```shell archery integration --with-cpp=true --with-rust=true --gold-dirs=/path/to/arrow/testing/data/arrow-ipc-stream/integration/0.14.1 --gold-dirs=/path/to/arrow/testing/data/arrow-ipc-stream/integration/0.17.1 ``` -------------------------------- ### Arrow Avro Build with Zstandard Compression Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Include Zstandard compression for modern data lake scenarios. This builds upon the minimal configuration by adding zstd support. ```toml arrow-avro = { version = "58", default-features = false, features = ["deflate", "snappy", "zstd"] } ``` -------------------------------- ### Set RUSTFLAGS for Native CPU Optimization Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md Override LLVM defaults to optimize for the current CPU's features by setting the `RUSTFLAGS` environment variable. ```ignore RUSTFLAGS="-C target-cpu=native" ``` -------------------------------- ### Generate Avro decimal files Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md This command executes the downloaded Python script to generate Avro decimal files. The '-o' flag specifies the output directory, defaulting to 'arrow-avro/test/data'. The script prints a verification dump by default. ```bash # 3) Generate the files (prints a verification dump by default) python create_avro_decimal_files.py -o arrow-avro/test/data ``` -------------------------------- ### Add arrow-avro with specific features Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Disable default features and explicitly enable desired features like compression codecs (e.g., deflate, snappy) for reduced build size and dependencies. ```toml [dependencies] arrow-avro = { version = "58.0.0", default-features = false, features = ["deflate", "snappy"] } ``` -------------------------------- ### Set LD_LIBRARY_PATH for Library Loading Source: https://github.com/apache/arrow-rs/blob/main/parquet/CONTRIBUTING.md If you encounter a 'Library not loaded' error, ensure that LD_LIBRARY_PATH is correctly set to include the Arrow Rust library path. ```bash export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib ``` -------------------------------- ### Push Release Tag to GitHub Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md After a release is approved, push the release tag to GitHub. This command creates a tag with the specified version and release candidate number and then pushes it to the 'apache' remote. ```shell git tag - git push apache ``` -------------------------------- ### Link Rust Source Code Source: https://github.com/apache/arrow-rs/blob/main/arrow-integration-testing/README.md Create a symbolic link from your arrow-rs source code directory into the checked-out arrow repository. This allows the build system to find the Rust components. ```shell # link rust source code into arrow ln -s arrow/rust ``` -------------------------------- ### Read Avro OCF file into Arrow RecordBatch Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Use ReaderBuilder to create an Avro reader from a file and iterate over RecordBatches. Ensure the file path is correct. ```rust use std::fs::File; use std::io::BufReader; use arrow_avro::reader::ReaderBuilder; use arrow_array::RecordBatch; fn main() -> anyhow::Result<()> { let file = BufReader::new(File::open("data/example.avro")?); let mut reader = ReaderBuilder::new().build(file)?; while let Some(batch) = reader.next() { let batch: RecordBatch = batch?; println!("rows: {}", batch.num_rows()); } Ok(()) } ``` -------------------------------- ### Verify Release Candidate Tarball Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md This script assists in the verification process of a release candidate tarball. Ensure you have downloaded the tarball and its associated signatures before running. ```shell ./dev/release/verify-release-candidate.sh 4.1.0 2 ``` -------------------------------- ### Set Test Data Environment Variables Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Optionally specifies custom locations for test data if they are not in the default submodule paths. ```bash # Optionally specify a different location for test data export PARQUET_TEST_DATA=$(cd ./parquet-testing/data; pwd) export ARROW_TEST_DATA=$(cd ./testing/data; pwd) ``` -------------------------------- ### Add Dependencies to Cargo.toml Source: https://github.com/apache/arrow-rs/blob/main/parquet_derive/README.md Include the parquet and parquet_derive crates in your project's Cargo.toml file to use their functionalities. ```toml [dependencies] parquet = "39.0.0" parquet_derive = "39.0.0" ``` -------------------------------- ### Write Arrow RecordBatch to Avro OCF (in-memory) Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/README.md Create an AvroWriter with a schema and a sink (e.g., Vec) to serialize an Arrow RecordBatch into Avro format. The resulting bytes can be retrieved using into_inner(). ```rust use std::sync::Arc; use arrow_avro::writer::AvroWriter; use arrow_array::{ArrayRef, Int32Array, RecordBatch}; use arrow_schema::{DataType, Field, Schema}; fn main() -> anyhow::Result<()> { let schema = Schema::new(vec![Field::new("id", DataType::Int32, false)]); let batch = RecordBatch::try_new( Arc::new(schema.clone()), vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef], )?; let sink: Vec = Vec::new(); let mut w = AvroWriter::new(sink, schema)?; w.write(&batch)?; w.finish()?; assert!(!w.into_inner().is_empty()); Ok(()) } ``` -------------------------------- ### Publish Arrow Crates to Crates.io Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md Publish individual Arrow Rust crates to crates.io after an official project release has been made. Ensure you have an account and are added as an owner to the 'arrow' crate on crates.io. ```shell (cd arrow-buffer && cargo publish) ``` ```shell (cd arrow-schema && cargo publish) ``` ```shell (cd arrow-data && cargo publish) ``` ```shell (cd arrow-array && cargo publish) ``` ```shell (cd arrow-select && cargo publish) ``` ```shell (cd arrow-ord && cargo publish) ``` ```shell (cd arrow-cast && cargo publish) ``` ```shell (cd arrow-ipc && cargo publish) ``` ```shell (cd arrow-csv && cargo publish) ``` ```shell (cd arrow-json && cargo publish) ``` ```shell (cd arrow-avro && cargo publish) ``` ```shell (cd arrow-string && cargo publish) ``` ```shell (cd arrow-row && cargo publish) ``` ```shell (cd arrow-pyarrow && cargo publish) ``` ```shell (cd arrow && cargo publish) ``` ```shell (cd arrow-avro && cargo publish) ``` ```shell (cd arrow-flight && cargo publish) ``` ```shell (cd parquet-variant && cargo publish) ``` ```shell (cd parquet-variant-json && cargo publish) ``` ```shell (cd parquet-variant-compute && cargo publish) ``` ```shell (cd parquet-geospatial && cargo publish) ``` ```shell (cd parquet && cargo publish) ``` ```shell (cd parquet_derive && cargo publish) ``` ```shell (cd arrow-integration-test && cargo publish) ``` -------------------------------- ### Write List Header and Elements Source: https://github.com/apache/arrow-rs/blob/main/parquet/THRIFT.md Writes a list to a Thrift protocol by first writing the list header with the element count and type, then serializing each element individually. ```rust // write the list header writer.write_list_begin(ElementType::Struct, page_locations.len)?; // write the elements for i in 0..len { page_locations[i].write_thrift(writer)?; } ``` -------------------------------- ### Create Avro Union File Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Execute the Python script to generate the union_fields.avro file. This script uses fastavro to write various Avro union types. ```bash python3 create_avro_union_file.py # Outputs: ./union_fields.avro ``` -------------------------------- ### Compile Specific Workspace Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Builds a specific workspace within the Arrow Rust project. Navigate to the workspace directory first. ```bash cd arrow && cargo build ``` -------------------------------- ### Update Rust Toolchain Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Ensures you are using the latest stable version of Rust for testing and development. ```bash rustup update stable ``` -------------------------------- ### Update Versions and Generate Changelog Source: https://github.com/apache/arrow-rs/blob/main/dev/release/README.md This script updates versions in Cargo.toml and markdown files, generates a changelog from GitHub issues, and commits these changes. Ensure your GitHub token is set as an environment variable. ```bash git checkout main git pull git checkout -b # Update versions. Make sure to run it before the next step since we do not want CHANGELOG-old.md affected. sed -i '' -e 's/14.0.0/39.0.0/g' `find . -name 'Cargo.toml' -or -name '*.md' | grep -v CHANGELOG.md | grep -v README.md` git commit -a -m 'Update version' # ensure your github token is available export ARROW_GITHUB_API_TOKEN= # manually edit ./dev/release/update_change_log.sh to reflect the release version # create the changelog ./dev/release/update_change_log.sh # commit the initial changes git commit -a -m 'Create changelog' # run automated script to copy labels to issues based on referenced PRs # (NOTE 1: this must be done by a committer / other who has # write access to the repository) # # NOTE 2: this must be done after creating the initial CHANGELOG file python dev/release/label_issues.py # review change log / edit issues and labels if needed, rerun, repeat as necessary # note you need to revert changes to CHANGELOG-old.md if you want to rerun the script ./dev/release/update_change_log.sh # Commit the changes git commit -a -m 'Update changelog' git push ``` -------------------------------- ### Crate Root Declarations Source: https://github.com/apache/arrow-rs/blob/main/parquet_derive/README.md Declare the necessary crates in your crate's root file (e.g., lib.rs or main.rs) to enable the derive macros and functionalities. ```rust extern crate parquet; #[macro_use] extern crate parquet_derive; ``` -------------------------------- ### Link Git Pre-commit Hook Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Safely create a symbolic link to the pre-commit.sh script to enable git pre-commit hooks. ```bash ln -s ../../pre-commit.sh .git/hooks/pre-commit ``` -------------------------------- ### Read List Header and Elements Source: https://github.com/apache/arrow-rs/blob/main/parquet/THRIFT.md Reads a list from a Thrift protocol by first obtaining the list header to determine the number of elements, then iterating to read each element. ```rust // read the list header let list_ident = prot.read_list_begin()?; // allocate vector with enough capacity let mut page_locations = Vec::with_capacity(list_ident.size as usize); // read elements for _ in 0..list_ident.size { page_locations.push(read_page_location(prot)?); } ``` -------------------------------- ### Generate Avro duration logical types file Source: https://github.com/apache/arrow-rs/blob/main/arrow-avro/test/data/README.md Execute the Python script to generate the Avro OCF file containing duration logical types. The output defaults to the arrow-avro/test/data directory. Use the --num-rows option to specify the number of rows. ```bash python3 generate_duration_avro.py -o arrow-avro/test/data ``` -------------------------------- ### Set RUSTFLAGS for CPUs Newer Than Haswell Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md Optimize for CPUs released after 2013 (Haswell) by setting the `target-cpu` flag in `RUSTFLAGS`. ```ignore RUSTFLAGS="-C target-cpu=haswell" ``` -------------------------------- ### Configure Arrow for WASM in Cargo.toml Source: https://github.com/apache/arrow-rs/blob/main/arrow/README.md To compile Arrow for `wasm32-unknown-unknown`, disable default features and specify desired features, excluding test dependencies. ```toml [dependencies] arrow = { version = "5.0", default-features = false, features = ["csv", "ipc"] } ``` -------------------------------- ### Check Git Pre-commit Hook Existence Source: https://github.com/apache/arrow-rs/blob/main/CONTRIBUTING.md Before linking a new pre-commit hook, check if one already exists in the .git/hooks/pre-commit path. ```bash ls -l .git/hooks/pre-commit ```