### Install Project Files

Source: https://github.com/apache/orc/blob/main/CMakeLists.txt

Installs the LICENSE and NOTICE files to the `share/doc/orc` directory so that they are available after installation.

```cmake
install(FILES LICENSE NOTICE DESTINATION "share/doc/orc")
```

--------------------------------

### Set Example Directory

Source: https://github.com/apache/orc/blob/main/CMakeLists.txt

Defines the directory that holds the example files. The C++ tests read this path through the `ORC_EXAMPLE_DIR` environment variable, so it must point at the repository's `examples` directory.

```cmake
set(EXAMPLE_DIRECTORY ${PROJECT_SOURCE_DIR}/examples)
```

--------------------------------

### Install Tool Binaries

Source: https://github.com/apache/orc/blob/main/tools/src/CMakeLists.txt

Installs all the C++ tool executables listed in `CPP_TOOL_NAMES` to the `bin` directory.

```cmake
install(TARGETS ${CPP_TOOL_NAMES} DESTINATION bin)
```

--------------------------------

### Install PyArrow and Pandas

Source: https://github.com/apache/orc/blob/main/site/_docs/pyarrow.md

Install the recommended PyArrow version, plus Pandas for data manipulation.

```bash
pip3 install pyarrow==20.0.0
pip3 install pandas
```

--------------------------------

### Install Pandas and PyArrow

Source: https://github.com/apache/orc/blob/main/site/_docs/pandas.md

Install Pandas and PyArrow, which Pandas uses for ORC support. Use a compatible Pandas version, such as the one pinned below.

```bash
pip3 install pandas==2.3.3
pip3 install pyarrow
```

--------------------------------

### Install ORC JARs

Source: https://github.com/apache/orc/blob/main/java/CMakeLists.txt

Installs the built ORC JAR files to the `share` directory.

```cmake
install(
  FILES ${ORC_JARS}
  DESTINATION share)
```

--------------------------------

### Run Docker Container for Site Preview

Source: https://github.com/apache/orc/blob/main/site/README.md

Starts a Docker container that serves a live preview of the ORC documentation site on port 4000. The current directory is mounted into the container, so local edits show up in the preview.
```bash
docker run -d --name orc-container -p 4000:4000 -v $PWD:/home/orc/site apache/orc-dev:site
```

--------------------------------

### Install Dask with ORC support

Source: https://github.com/apache/orc/blob/main/site/_docs/dask.md

Install Dask with DataFrame support, plus Pandas. Use a compatible Dask version, such as the one pinned below.

```bash
pip3 install "dask[dataframe]==2025.5.1"
pip3 install pandas
```

--------------------------------

### Direct Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Illustrates how an unsigned integer sequence is serialized with RLEv2 direct encoding. The header fields are the encoding type, the bit width, and the length; each is followed in parentheses by the value actually stored in the header.

```text
The unsigned sequence of [23713, 43806, 57005, 48879] would be serialized with direct encoding (1), a width of 16 bits (15), and length of 4 (3) as [0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad, 0xbe, 0xef].
```

--------------------------------

### Install Meson Dependency

Source: https://github.com/apache/orc/blob/main/README.md

Install a dependency for the ORC project using Meson's wrap system. Run the command from the project root; `meson wrap install` takes the name of the wrap to install as its argument.

```shell
meson wrap install
```

--------------------------------

### Build and Preview Website with Docker

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Build the ORC website with Docker and serve it locally to preview changes: build the image, start a container, check the site in a browser, then copy the generated files out and stop the container.

```bash
% docker build -t orc-site .
% CONTAINER=$(docker run -d -p 4000:4000 orc-site)
```

Check the website at [http://0.0.0.0:4000/](http://0.0.0.0:4000/), then copy the generated site out of the container and stop it:

```bash
% docker cp $CONTAINER:/home/orc/site/target .
% docker stop $CONTAINER
```

--------------------------------

### Integer Run Length Encoding v1 - Run Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv1.md

Demonstrates RLEv1 for a run whose values differ by a small fixed delta. The run is encoded as a header byte holding the run length minus 3, then the delta as a signed byte, then the first value as a varint. Example: 100 instances of 7.

```text
For example, if the sequence is 100 instances of 7 the encoding would start with 100 - 3, followed by a delta of 0, and a varint of 7 for an encoding of [0x61, 0x00, 0x07].
```

--------------------------------

### CMake Dependencies and Installation

Source: https://github.com/apache/orc/blob/main/c++/src/CMakeLists.txt

Declares a build dependency between targets and installs the `orc` target together with an exported CMake targets file. Check that the install destination and the `orc::` namespace suit your project.

```cmake
add_dependencies(orc orc-format_ep)

install(TARGETS orc EXPORT orc_targets)

install(EXPORT orc_targets
  DESTINATION ${ORC_INSTALL_CMAKE_DIR}
  NAMESPACE "orc::"
  FILE "orcTargets.cmake")
```

--------------------------------

### Delta Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Illustrates how an unsigned sequence is serialized with RLEv2 delta encoding. The encoded run consists of the header, the base value, the first delta, and the remaining deltas bit-packed at the header's width.

```text
The unsigned sequence of [2, 3, 5, 7, 11, 13, 17, 19, 23, 29] would be serialized with delta encoding (3), a width of 4 bits (3), length of 10 (9), a base of 2 (2), and first delta of 1 (2). The resulting sequence is [0xc6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46].
```

--------------------------------

### Install ORC Configuration Header

Source: https://github.com/apache/orc/blob/main/c++/include/CMakeLists.txt

Installs the generated ORC configuration header to the system include directory, which makes build-specific settings available to users of the library.

```cmake
install(FILES
  "${CMAKE_CURRENT_BINARY_DIR}/orc/orc-config.hh"
  DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/orc"
)
```

--------------------------------

### Direct Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv1.md

Serializes an unsigned integer sequence with RLEv2 direct encoding. These are the bytes for the sequence [23713, 43806, 57005, 48879] with a 16-bit width and a length of 4.

```text
[0x5e, 0x03, 0x5c, 0xa1, 0xab, 0x1e, 0xde, 0xad, 0xbe, 0xef]
```

--------------------------------

### Add Java Examples Test

Source: https://github.com/apache/orc/blob/main/java/CMakeLists.txt

Defines a CMake test named `java-examples-test` that runs the ORC examples JAR to perform a write operation.

```cmake
add_test(
  NAME java-examples-test
  COMMAND java -jar examples/orc-examples-${ORC_VERSION}-uber.jar write
  WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
```

--------------------------------

### Install ORC Header Directory

Source: https://github.com/apache/orc/blob/main/c++/include/CMakeLists.txt

Installs every `*.hh` header from the ORC header directory to the system include path, so that all the header files needed to use the ORC library are available.

```cmake
install(DIRECTORY
  "orc/"
  DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/orc"
  FILES_MATCHING PATTERN "*.hh"
)
```

--------------------------------

### Set Test Environment Variable for Example Directory

Source: https://github.com/apache/orc/blob/main/c++/test/CMakeLists.txt

Sets the `ORC_EXAMPLE_DIR` environment variable for the `orc-test` test so that tests can locate the example files.
```cmake
set_property(TEST orc-test
  PROPERTY ENVIRONMENT
  "ORC_EXAMPLE_DIR=${EXAMPLE_DIRECTORY}"
)
```

--------------------------------

### Byte Run Length Encoding Examples

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Shows Byte Run Length Encoding, a lightweight scheme for compressing runs of identical byte values. A run of 3 to 130 identical bytes is stored as a header byte holding the run length minus 3, followed by the value. A sequence of 1 to 128 literal bytes is stored as a header byte holding the negated count, followed by the bytes.

```text
For example, a hundred 0's is encoded as [0x61, 0x00] and the sequence 0x44, 0x45 would be encoded as [0xfe, 0x44, 0x45].
```

--------------------------------

### Patched Base Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv1.md

Serializes an integer sequence whose bit widths vary widely using RLEv2 patched base encoding. The bytes contain the header, the base value, the data values, and the patch list.

```text
[0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82, 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8]
```

--------------------------------

### Patched Base Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Demonstrates how an unsigned integer sequence with widely varying bit widths is serialized with patched base encoding. It lists the header fields (encoding type, bit width, length, base value width, patch width, patch gap width, and patch list length) and the resulting bytes.

```text
The unsigned sequence of [2030, 2000, 2020, 1000000, 2040, 2050, 2060, 2070, 2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190] has a minimum of 2000, which makes the adjusted sequence [30, 0, 20, 998000, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190].

It has an encoding of patched base (2), a bit width of 8 (7), a length of 20 (19), a base value width of 2 bytes (1), a patch width of 12 bits (11), patch gap width of 2 bits (1), and a patch list length of 1 (1). The base value is 2000 and the combined result is [0x8e, 0x13, 0x2b, 0x21, 0x07, 0xd0, 0x1e, 0x00, 0x14, 0x70, 0x28, 0x32, 0x3c, 0x46, 0x50, 0x5a, 0x64, 0x6e, 0x78, 0x82, 0x8c, 0x96, 0xa0, 0xaa, 0xb4, 0xbe, 0xfc, 0xe8]
```

--------------------------------

### Write Advanced ORC File with Map Column

Source: https://github.com/apache/orc/blob/main/site/_docs/core-java.md

This example writes an ORC file that contains a map column. Each row's map is addressed through `map.offsets` and `map.lengths`, which index into the key and value child vectors shared by the whole batch; `map.childCount` tracks how many child entries are already in use.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.MapColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

Path testFilePath = new Path("advanced-example.orc");
Configuration conf = new Configuration();

TypeDescription schema =
    TypeDescription.fromString("struct<first:int,second:int,third:map<string,int>>");

Writer writer =
    OrcFile.createWriter(testFilePath,
        OrcFile.writerOptions(conf).setSchema(schema));

VectorizedRowBatch batch = schema.createRowBatch();
LongColumnVector first = (LongColumnVector) batch.cols[0];
LongColumnVector second = (LongColumnVector) batch.cols[1];

// Define the map. The key and value vectors must also be cast.
MapColumnVector map = (MapColumnVector) batch.cols[2];
BytesColumnVector mapKey = (BytesColumnVector) map.keys;
LongColumnVector mapValue = (LongColumnVector) map.values;

// Each map has 5 elements
final int MAP_SIZE = 5;
final int BATCH_SIZE = batch.getMaxSize();

// Ensure the child vectors can hold a full batch of maps
mapKey.ensureSize(BATCH_SIZE * MAP_SIZE, false);
mapValue.ensureSize(BATCH_SIZE * MAP_SIZE, false);

// Add 1500 rows to the file
for (int r = 0; r < 1500; ++r) {
  int row = batch.size++;

  first.vector[row] = r;
  second.vector[row] = r * 3;

  // This row's map starts right after the child entries already used
  map.offsets[row] = map.childCount;
  map.lengths[row] = MAP_SIZE;
  map.childCount += MAP_SIZE;

  for (int mapElem = (int) map.offsets[row];
       mapElem < map.offsets[row] + MAP_SIZE; ++mapElem) {
    String key = "row " + r + "." + (mapElem - map.offsets[row]);
    mapKey.setVal(mapElem, key.getBytes(StandardCharsets.UTF_8));
    mapValue.vector[mapElem] = mapElem;
  }

  // Write the batch out once it is full
  if (row == BATCH_SIZE - 1) {
    writer.addRowBatch(batch);
    batch.reset();
  }
}
// Write any remaining rows
if (batch.size != 0) {
  writer.addRowBatch(batch);
  batch.reset();
}
writer.close();
```

--------------------------------

### Build ORC C++ and Java Release

Source: https://github.com/apache/orc/blob/main/site/_docs/building.md

Standard build of a release version of ORC, including the C++ and Java components. Make sure all prerequisites are installed before running it.

```shell
% mkdir build
% cd build
% cmake ..
% make package test-out
```

--------------------------------

### Build and Package Benchmarks

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Run this command from the parent `java` directory to build the benchmark library: it performs a clean build with the `benchmark` profile, packages the result, and skips the tests. Then change into the `bench` directory.

```bash
% ./mvnw clean package -Pbenchmark -DskipTests
% cd bench
```

--------------------------------

### Build Java Project with Maven Wrapper

Source: https://github.com/apache/orc/blob/main/AGENTS.md

Builds the Java project without running the tests. Requires Java 17 or higher.

```bash
cd java
./mvnw package -DskipTests
```

--------------------------------

### Integer Run Length Encoding v1 - Decreasing Run Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv1.md

Shows RLEv1 for a decreasing sequence. The header byte holds the run length minus 3, the delta byte is negative, and the first value is varint-encoded. Example: the numbers from 100 down to 1.

```text
To encode the sequence of numbers running from 100 to 1, the first byte is 100 - 3, the delta is -1, and the varint is 100 for an encoding of [0x61, 0xff, 0x64].
```

--------------------------------

### Fetch Source Data

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Run this script to download the source data for the benchmarks. Note that it fetches roughly 4 GB of data.

```bash
% ./fetch-data.sh
```

--------------------------------

### Create ORC Writer with Schema

Source: https://github.com/apache/orc/blob/main/site/_docs/core-cpp.md

Create an ORC writer from an output file, a schema, and writer options. The declarations come from `orc/OrcFile.hh`.

```cpp
#include <orc/OrcFile.hh>

using namespace orc;

std::unique_ptr<OutputStream> outStream =
    writeLocalFile("my-file.orc");
std::unique_ptr<Type> schema(
    Type::buildTypeFromString("struct<x:int,y:int>"));
WriterOptions options;
std::unique_ptr<Writer> writer =
    createWriter(*schema, outStream.get(), options);
```

--------------------------------

### Build ORC C++ and Java with Verbose Output

Source: https://github.com/apache/orc/blob/main/site/_docs/building.md

Runs the build with the full make commands printed, which helps when debugging build failures.

```shell
% make package test-out VERBOSE=1
```

--------------------------------

### Run Write Benchmark

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Starts a write benchmark using the Hive module. This requires the Hive ORC benchmarks uber JAR.

```bash
% java -jar hive/target/orc-benchmarks-hive-*-uber.jar write data
```

--------------------------------

### Write and Read ORC File with Pandas

Source: https://github.com/apache/orc/blob/main/site/_docs/pandas.md

Creates a Pandas DataFrame, writes it to an ORC file, and reads the file back, optionally selecting specific columns.
```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", None]})
df.to_orc("test.orc")

# Read the whole file back
pd.read_orc("test.orc")

# Read only the selected columns
pd.read_orc("test.orc", columns=["col1"])
```

--------------------------------

### Run Full Read Benchmark

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Runs a full read benchmark using the Hive module. This requires the Hive ORC benchmarks uber JAR.

```bash
% java -jar hive/target/orc-benchmarks-hive-*-uber.jar read-all data
```

--------------------------------

### Boolean Run Length Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Illustrates Boolean Run Length Encoding: bits are packed into bytes from most significant to least significant, and the bytes are then compressed with byte run length encoding. In the example, 0xff is a literal header for a single byte, and that byte, 0x80 (binary 1000 0000), holds the eight boolean values.

```text
For example, the byte sequence [0xff, 0x80] would be one true followed by seven false values.
```

--------------------------------

### Checkout and Prepare Release Branch

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Check out the release branch, edit `CMakeLists.txt` to remove `SNAPSHOT` from the version, and regenerate the build directory. This is the first step in preparing a release.

```bash
% git checkout branch-X.Y
% edit CMakeLists.txt
% (mkdir build; cd build; cmake ..)
```

--------------------------------

### Build with Sanitizers (Meson)

Source: https://github.com/apache/orc/blob/main/README.md

Enable AddressSanitizer (ASAN) and UndefinedBehaviorSanitizer (UBSAN) in a Meson debug build. This helps detect memory errors and undefined behavior.
```shell
meson setup build -Dbuildtype=debug -Db_sanitize=address,undefined --reconfigure
meson compile -C build
meson test -C build
```

--------------------------------

### Build ORC Java Only

Source: https://github.com/apache/orc/blob/main/site/_docs/building.md

Builds only the Java component of ORC. Requires a Java installation; the build runs through the Maven wrapper (`./mvnw`).

```shell
% cd java
% ./mvnw package
```

--------------------------------

### Test Java Project with Maven Wrapper

Source: https://github.com/apache/orc/blob/main/AGENTS.md

Runs the test suite for the Java project using the Maven wrapper.

```bash
cd java
./mvnw test
```

--------------------------------

### Define ORC Table Schema

Source: https://github.com/apache/orc/blob/main/site/_docs/types.md

Example of a table schema in ORC that nests a struct inside a map.

```sql
create table Foobar (
  myInt int,
  myMap map<string,
    struct<myString : string,
    myDouble: double>>,
  myTime timestamp
);
```

--------------------------------

### Set ORC include directories

Source: https://github.com/apache/orc/blob/main/c++/src/CMakeLists.txt

Configures the include directories for the ORC library: interface, public, and private include paths for both the installed and the in-tree build configurations.

```cmake
target_include_directories (orc
  INTERFACE
    $
  PUBLIC
    $
    $
    $
    $
  PRIVATE
    ${CMAKE_CURRENT_BINARY_DIR}
    ${CMAKE_CURRENT_SOURCE_DIR}
)
```

--------------------------------

### Build OS-Specific Docker Image with JDK 21

Source: https://github.com/apache/orc/blob/main/docker/README.md

Build a Docker image for a specific operating system with JDK 21. Change into the OS-specific directory (`$os`) first.

```bash
cd docker/$os
# For JDK 21:
docker build -t "orc-$os-jdk21" --build-arg jdk=21 .
```

--------------------------------

### Publish Website from Target Directory

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Commit and push the website changes from the `target` directory, which tracks the `asf-site` branch, to publish the updated site.
```bash
% cd target
% git commit -am "Publish site for X.Y.Z"
% git push origin asf-site
```

--------------------------------

### Get ORC File Metadata

Source: https://github.com/apache/orc/blob/main/site/_docs/spark-ddl.md

Use the `orc-tools meta` command to print metadata about an ORC file. Replace `<path_to_file>` with the path to your ORC file.

```bash
% orc-tools meta <path_to_file>
```

--------------------------------

### Test C++ Project

Source: https://github.com/apache/orc/blob/main/AGENTS.md

Runs the C++ test suite from the build directory once the project has been built.

```bash
cd build
make test-out
```

--------------------------------

### Copy Site Files from Docker Container

Source: https://github.com/apache/orc/blob/main/site/README.md

Copies the generated documentation site files from the running Docker container to the local `target` directory.

```bash
docker cp orc-container:/home/orc/site/target .
```

--------------------------------

### Bloom Filter Hashing Algorithm Steps

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Outlines how the k hash codes are derived from a 64-bit base hash code and how each one sets a bit position in the Bloom filter.

```plaintext
1. Get the 64-bit base hash code from Murmur3 or Thomas Wang's hash algorithm.
2. Split the above hash code into two 32-bit hash codes (say hash1 and hash2).
3. The k'th hash code (where k > 0) is obtained by:
   * combinedHash = hash1 + (k * hash2)
4. If combinedHash is negative, flip all the bits:
   * combinedHash = ~combinedHash
5. The bit set position is obtained by taking the modulo with m:
   * position = combinedHash % m
6. Set the position in the bit set. The bits above the lowest 6 (position >>> 6)
   select the long within the bit set, and the lowest 6 bits select the bit
   within that long, in little-endian order (Java's << on a long only uses the
   lowest 6 bits of the shift count):
   * bitset[position >>> 6] |= (1L << position);
```

--------------------------------

### Create ORC Reader

Source: https://github.com/apache/orc/blob/main/site/_docs/core-cpp.md

Create an ORC reader from an input stream and reader options. The declarations come from `orc/OrcFile.hh`. `createReader` takes ownership of the stream, so the `std::unique_ptr` is passed with `std::move`.

```cpp
#include <orc/OrcFile.hh>

#include <utility>

using namespace orc;

std::unique_ptr<InputStream> inStream = readLocalFile("my-file.orc");
ReaderOptions options;
std::unique_ptr<Reader> reader = createReader(std::move(inStream), options);
```

--------------------------------

### Run Full Data Scan

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Scans all of the generated data, using the ORC benchmarks core uber JAR.

```bash
% java -jar core/target/orc-benchmarks-core-*-uber.jar scan data
```

--------------------------------

### Mapper for ORC Files with Struct Schema

Source: https://github.com/apache/orc/blob/main/site/_docs/mapred.md

Implement a custom `Mapper` to process ORC files whose schema is a struct with a string field followed by an int field. This example emits the string field as the key and the int field as the value.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.orc.mapred.OrcStruct;

public class MyMapper
    implements Mapper<NullWritable, OrcStruct, Text, IntWritable> {

  // Input should be a struct whose first field is a string and whose
  // second field is an int
  public void map(NullWritable key, OrcStruct value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    output.collect((Text) value.getFieldValue(0),
                   (IntWritable) value.getFieldValue(1));
  }

  public void configure(JobConf conf) { }

  public void close() { }
}
```

--------------------------------

### Base 128 Varint Encoding Examples

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Illustrates how unsigned integers are serialized as Base 128 varints: the low 7 bits of each byte carry data, least significant group first, and the most significant bit is set when more bytes follow. Small numbers therefore take fewer bytes.
```text
Unsigned Original | Serialized
:---------------- | :---------
0                 | 0x00
1                 | 0x01
127               | 0x7f
128               | 0x80, 0x01
129               | 0x81, 0x01
16,383            | 0xff, 0x7f
16,384            | 0x80, 0x80, 0x01
16,385            | 0x81, 0x80, 0x01
```

--------------------------------

### CMakeLists.txt for ORC Integration

Source: https://github.com/apache/orc/blob/main/conan/all/test_package/CMakeLists.txt

This CMake script configures a C++ project that uses the Apache ORC library. ORC must be installed where CMake's `find_package` can find it. The script links the executable against ORC and requires C++17.

```cmake
cmake_minimum_required(VERSION 3.25.0)
project(test_package LANGUAGES CXX)

find_package(orc REQUIRED CONFIG)

add_executable(${PROJECT_NAME} test_package.cpp)
target_link_libraries(${PROJECT_NAME} PRIVATE orc::orc)
target_compile_features(${PROJECT_NAME} PRIVATE cxx_std_17)
```

--------------------------------

### Deploy Artifacts with Maven

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Deploy the release artifacts to Maven Central with the Maven wrapper and the `apache-release` profile. Your environment must be set up for Apache releases.

```bash
% cd java
% ./mvnw -Papache-release clean deploy
```

--------------------------------

### RLEv2 Short Repeat Encoding Example

Source: https://github.com/apache/orc/blob/main/site/specification/ORCv2.md

Illustrates RLEv2 Short Repeat encoding for short runs of a repeated value. The one-byte header holds the encoding type (2 bits), the width of the repeated value in bytes minus 1 (3 bits, for 1 to 8 bytes), and the repeat count minus 3 (3 bits, for 3 to 10 repeats). The value follows in big-endian order, zigzag-encoded first if the column is signed.

```text
The unsigned sequence of [10000, 10000, 10000, 10000, 10000] would be serialized with short repeat encoding (0), a width of 2 bytes (1), and repeat count of 5 (2) as [0x0a, 0x27, 0x10].
```

--------------------------------

### Send GPG Public Key to Server

Source: https://github.com/apache/orc/blob/main/site/develop/index.md

Publishes your GPG public key to a public keyserver. Replace `<fingerprint>` with the key fingerprint shown by `gpg --list-secret-keys`.

```bash
gpg --send-key <fingerprint>
```

--------------------------------

### Run All Docker Tests

Source: https://github.com/apache/orc/blob/main/AGENTS.md

Runs every test suite through the scripts in the `docker/` directory. Doing this before submitting changes is recommended.

```bash
cd docker && ./run-all.sh local main
```

--------------------------------

### Test ORC build with vcpkg

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Build the ORC package with vcpkg to check that all changes are correct before committing.

```bash
% ./vcpkg build orc
```

--------------------------------

### Create ORC package with Conan

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Create an ORC package of a given version by running `conan create` in the recipe directory.

```bash
% conan create . --version=X.Y.Z
```

--------------------------------

### Run Multiple Java Test Files with Maven

Source: https://github.com/apache/orc/blob/main/site/develop/index.md

Runs every Java test class in the `core` module whose name matches a pattern.

```bash
./mvnw test -pl core -Dtest='Test*Reader*'
```

--------------------------------

### Create Encrypted ORC Table with Column Masking

Source: https://github.com/apache/orc/blob/main/site/_docs/spark-config.md

Use this SQL statement to create an ORC table that encrypts the `ssn` and `email` columns with the `pii` key and masks them for readers without access to the key: `ssn` is nullified and `email` is replaced by its SHA-256 hash. Make sure the key provider path points at your KMS.

```sql
CREATE TABLE encrypted (
  ssn STRING,
  email STRING,
  name STRING
)
USING ORC
OPTIONS (
  hadoop.security.key.provider.path "kms://http@localhost:9600/kms",
  orc.key.provider "hadoop",
  orc.encrypt "pii:ssn,email",
  orc.mask "nullify:ssn;sha256:email"
)
```

--------------------------------

### Generate Source Tarball and Checksums

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Download the release tag archive, extract it, repackage it under the release name, and generate the SHA-256 checksum and detached GPG signature for the release artifact.

```bash
% wget https://github.com/apache/orc/archive/release-X.Y.Zrc0.tar.gz
% tar xzf release-X.Y.Zrc0.tar.gz
% mv orc-release-X.Y.Zrc0 orc-X.Y.Z
% tar czf orc-X.Y.Z.tar.gz orc-X.Y.Z
% mkdir orc-X.Y.Z-rc0
% mv orc-X.Y.Z.tar.gz orc-X.Y.Z-rc0
% cd orc-X.Y.Z-rc0
% shasum -a 256 orc-X.Y.Z.tar.gz > orc-X.Y.Z.tar.gz.sha256
% gpg --detach-sig --armor orc-X.Y.Z.tar.gz
```

--------------------------------

### Run Individual Java Test File with Maven

Source: https://github.com/apache/orc/blob/main/site/develop/index.md

Runs a single Java test class in the `core` module.

```bash
./mvnw test -pl core -Dtest=TestRecordReaderImpl
```

--------------------------------

### Configure JobConf for ORC InputFormat

Source: https://github.com/apache/orc/blob/main/site/_docs/mapreduce.md

Set `mapreduce.job.inputformat.class` to `org.apache.orc.mapreduce.OrcInputFormat` and point `mapreduce.input.fileinputformat.inputdir` at your input directory.

```properties
mapreduce.job.inputformat.class = org.apache.orc.mapreduce.OrcInputFormat
mapreduce.input.fileinputformat.inputdir = your input directory
```

--------------------------------

### Run Spark Benchmark

Source: https://github.com/apache/orc/blob/main/java/bench/README.md

Runs the Spark benchmark. This requires the Spark ORC benchmarks JAR and the `ORC_VERSION` environment variable.
```bash
% java -jar spark/target/orc-benchmarks-spark-${ORC_VERSION}.jar spark data
```

--------------------------------

### Run ORC Tools Jar

Source: https://github.com/apache/orc/blob/main/site/_docs/java-tools.md

Basic command for running the ORC tools. Replace X.Y.Z with the actual version number.

```shell
java -jar orc-tools-X.Y.Z-uber.jar
```

--------------------------------

### Clone vcpkg repository

Source: https://github.com/apache/orc/blob/main/site/develop/make-release.md

Clone the vcpkg repository to your machine to start adding ORC to vcpkg.

```bash
% git clone git@github.com:microsoft/vcpkg.git
```
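--------------------------------

### Sketch: Base 128 Varint Encoder and Decoder

A minimal Python sketch of the scheme in the Base 128 Varint Encoding Examples above, assuming unsigned, non-negative inputs (signed ORC streams zigzag-encode values first, which this sketch does not do). `encode_varint` and `decode_varint` are illustrative names, not part of any ORC library. The loop at the end checks every row of the varint table.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch only encodes unsigned values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # MSB set: another byte follows
        else:
            out.append(low7)  # MSB clear: this is the last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# (original value, serialized bytes) rows from the varint table
TABLE = [
    (0, [0x00]), (1, [0x01]), (127, [0x7F]),
    (128, [0x80, 0x01]), (129, [0x81, 0x01]),
    (16_383, [0xFF, 0x7F]), (16_384, [0x80, 0x80, 0x01]),
    (16_385, [0x81, 0x80, 0x01]),
]

for original, serialized in TABLE:
    assert list(encode_varint(original)) == serialized
    assert decode_varint(bytes(serialized)) == (original, len(serialized))
```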
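--------------------------------

### Sketch: Byte and Boolean RLE Decoders

The Byte Run Length Encoding and Boolean Run Length Encoding examples above can be checked with a small Python decoder. This is a minimal sketch that assumes a complete, well-formed stream held in memory and does no validation; `decode_byte_rle` and `decode_booleans` are hypothetical names. In the boolean case, 0xff is a literal header for one byte, and that byte, 0x80, unpacks to one true followed by seven false values.

```python
from __future__ import annotations


def decode_byte_rle(data: bytes) -> list[int]:
    """Decode ORC byte RLE: runs of 3..130 copies, or literals of 1..128 bytes."""
    out: list[int] = []
    pos = 0
    while pos < len(data):
        header = data[pos]
        pos += 1
        if header < 0x80:
            # Non-negative header: repeat the next byte (header + 3) times.
            out.extend([data[pos]] * (header + 3))
            pos += 1
        else:
            # Negative header -n (as a signed byte): n literal bytes follow.
            count = 256 - header
            out.extend(data[pos:pos + count])
            pos += count
    return out


def decode_booleans(data: bytes, count: int) -> list[bool]:
    """Byte-RLE decode, then unpack each byte from most to least significant bit."""
    bits = [
        bool((byte >> shift) & 1)
        for byte in decode_byte_rle(data)
        for shift in range(7, -1, -1)
    ]
    # The last byte may be padded, so keep only the requested number of values.
    return bits[:count]


assert decode_byte_rle(bytes([0x61, 0x00])) == [0] * 100
assert decode_byte_rle(bytes([0xFE, 0x44, 0x45])) == [0x44, 0x45]
assert decode_booleans(bytes([0xFF, 0x80]), 8) == [True] + [False] * 7
```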
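--------------------------------

### Sketch: RLEv1 Integer Decoder

A Python sketch of an RLEv1 integer decoder that reproduces both RLEv1 run examples above (100 instances of 7, and 100 down to 1). It assumes an unsigned stream; for signed columns each varint would additionally be zigzag-decoded, which is omitted here. The function names are hypothetical.

```python
from __future__ import annotations


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one base 128 varint at data[pos]; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_int_rle_v1(data: bytes) -> list[int]:
    """Decode an unsigned RLEv1 integer stream."""
    out: list[int] = []
    pos = 0
    while pos < len(data):
        header = data[pos]
        pos += 1
        if header < 0x80:
            # Run: (header + 3) values, a signed delta byte, then the base varint.
            length = header + 3
            delta = data[pos] - 256 if data[pos] >= 0x80 else data[pos]
            base, pos = decode_varint(data, pos + 1)
            out.extend(base + i * delta for i in range(length))
        else:
            # Literals: the header is -n as a signed byte; n varints follow.
            for _ in range(256 - header):
                value, pos = decode_varint(data, pos)
                out.append(value)
    return out


assert decode_int_rle_v1(bytes([0x61, 0x00, 0x07])) == [7] * 100
assert decode_int_rle_v1(bytes([0x61, 0xFF, 0x64])) == list(range(100, 0, -1))
```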
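--------------------------------

### Sketch: RLEv2 Short Repeat Encoder

A Python sketch that builds and reads back the one-byte header described in the RLEv2 Short Repeat Encoding Example above. It assumes an unsigned value (signed streams zigzag-encode first) and uses the smallest whole-byte width that fits; `encode_short_repeat` and `decode_short_repeat` are illustrative names.

```python
from __future__ import annotations


def encode_short_repeat(value: int, count: int) -> bytes:
    """Encode `count` copies of an unsigned `value` as an RLEv2 SHORT_REPEAT run."""
    if value < 0:
        raise ValueError("this sketch only handles unsigned values")
    if not 3 <= count <= 10:
        raise ValueError("short repeat covers 3 to 10 repetitions")
    width = max(1, (value.bit_length() + 7) // 8)  # value width in whole bytes
    if width > 8:
        raise ValueError("value does not fit in 8 bytes")
    # Header: encoding 0 (2 bits) | width - 1 (3 bits) | count - 3 (3 bits)
    header = (0 << 6) | ((width - 1) << 3) | (count - 3)
    return bytes([header]) + value.to_bytes(width, "big")


def decode_short_repeat(data: bytes) -> list[int]:
    """Decode a single SHORT_REPEAT run from the start of `data`."""
    header = data[0]
    if header >> 6 != 0:
        raise ValueError("not a SHORT_REPEAT header")
    width = ((header >> 3) & 0x07) + 1
    count = (header & 0x07) + 3
    return [int.from_bytes(data[1:1 + width], "big")] * count


encoded = encode_short_repeat(10000, 5)
assert list(encoded) == [0x0A, 0x27, 0x10]
assert decode_short_repeat(encoded) == [10000] * 5
```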
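--------------------------------

### Sketch: RLEv2 Direct Encoder

Both Direct Encoding Examples above can be reproduced with a short Python encoder. This is a sketch under stated assumptions: values are unsigned (signed streams zigzag-encode first), the bit width is the smallest that fits the largest value, and only widths of 1 to 24 bits are handled, because those map to header code width - 1; larger widths use a separate lookup that the sketch leaves out. `pack_bits` and `encode_direct` are hypothetical helpers.

```python
from __future__ import annotations


def pack_bits(values: list[int], width: int) -> bytes:
    """Pack values MSB-first at a fixed bit width, zero-padding the last byte."""
    out = bytearray()
    acc, nbits = 0, 0
    for value in values:
        acc = (acc << width) | value
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def encode_direct(values: list[int]) -> bytes:
    """Encode unsigned values as one RLEv2 DIRECT run (widths up to 24 bits)."""
    if not 1 <= len(values) <= 512:
        raise ValueError("an RLEv2 run holds 1 to 512 values")
    if min(values) < 0:
        raise ValueError("this sketch only handles unsigned values")
    width = max(1, max(v.bit_length() for v in values))
    if width > 24:
        raise ValueError("widths above 24 bits need the full width table")
    # 16-bit header: encoding 1 (2 bits) | width - 1 (5 bits) | length - 1 (9 bits)
    header = (1 << 14) | ((width - 1) << 9) | (len(values) - 1)
    return header.to_bytes(2, "big") + pack_bits(values, width)


expected = [0x5E, 0x03, 0x5C, 0xA1, 0xAB, 0x1E, 0xDE, 0xAD, 0xBE, 0xEF]
assert list(encode_direct([23713, 43806, 57005, 48879])) == expected
```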
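--------------------------------

### Sketch: RLEv2 Delta Encoder

A Python sketch of an RLEv2 delta encoder that reproduces the Delta Encoding Example above. Assumptions: values are unsigned and monotonic, the base is written as an unsigned varint, and the first delta as a zigzag (signed) varint. The example packs the remaining deltas at 4 bits even though the largest of them, 6, fits in 3, so the sketch accepts an explicit `width`; a decoder only needs the width to be large enough. The sketch treats header width code 0 as a fixed delta with no packed deltas, so a needed width of 1 is raised to 2. All names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def encode_varint(value: int) -> bytes:
    """Unsigned base 128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        low7, value = value & 0x7F, value >> 7
        out.append(low7 | 0x80 if value else low7)
        if not value:
            return bytes(out)


def zigzag(value: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return value * 2 if value >= 0 else -value * 2 - 1


def pack_bits(values: list[int], width: int) -> bytes:
    """Pack values MSB-first at a fixed bit width, zero-padding the last byte."""
    out = bytearray()
    acc, nbits = 0, 0
    for value in values:
        acc, nbits = (acc << width) | value, nbits + width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def encode_delta(values: list[int], width: Optional[int] = None) -> bytes:
    """Encode a monotonic, unsigned sequence as one RLEv2 DELTA run."""
    if not 2 <= len(values) <= 512 or min(values) < 0:
        raise ValueError("this sketch needs 2 to 512 unsigned values")
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not (all(d >= 0 for d in deltas) or all(d <= 0 for d in deltas)):
        raise ValueError("delta encoding needs a monotonic sequence")
    rest = [abs(d) for d in deltas[1:]]  # the sign comes from the first delta
    if all(d == deltas[0] for d in deltas[1:]):
        width_code, packed = 0, b""  # fixed delta: nothing to pack
    else:
        needed = max(2, max(d.bit_length() for d in rest))
        width = needed if width is None else width
        if not needed <= width <= 24:
            raise ValueError("width must fit the deltas and be at most 24")
        width_code, packed = width - 1, pack_bits(rest, width)
    # 16-bit header: encoding 3 (2 bits) | width code (5 bits) | length - 1 (9 bits)
    header = (3 << 14) | (width_code << 9) | (len(values) - 1)
    return (header.to_bytes(2, "big") + encode_varint(values[0])
            + encode_varint(zigzag(deltas[0])) + packed)


primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
expected = [0xC6, 0x09, 0x02, 0x02, 0x22, 0x42, 0x42, 0x46]
assert list(encode_delta(primes, width=4)) == expected
```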
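--------------------------------

### Sketch: RLEv2 Patched Base Decoder

A Python sketch of a decoder for a single RLEv2 patched base run, checked against the byte sequence shared by both Patched Base Encoding Examples above. It assumes the run starts at the first byte and follows the header layout those examples list. The width table (codes 0 to 23 mean 1 to 24 bits; codes 24 to 31 mean 26, 28, 30, 32, 40, 48, 56 and 64 bits) and the rounding of each patch entry up to the next width in that table are this sketch's reading of the ORC specification. All names are hypothetical. In the example, the 14-bit patch entry 0x3f3a carries a gap of 3 and a patch of 0xf3a, which restores 998000 at index 3.

```python
from __future__ import annotations

# Bit widths for the 5-bit width codes 0..31.
WIDTHS = list(range(1, 25)) + [26, 28, 30, 32, 40, 48, 56, 64]


class BitReader:
    """Read big-endian, MSB-first bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bitpos = 0

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.bitpos // 8]
            value = (value << 1) | ((byte >> (7 - self.bitpos % 8)) & 1)
            self.bitpos += 1
        return value

    def align(self) -> None:
        """Skip padding bits up to the next byte boundary."""
        self.bitpos = (self.bitpos + 7) // 8 * 8


def decode_patched_base(data: bytes) -> list[int]:
    r = BitReader(data)
    if r.read(2) != 2:
        raise ValueError("not a PATCHED_BASE run")
    width = WIDTHS[r.read(5)]
    length = r.read(9) + 1
    base_bytes = r.read(3) + 1
    patch_width = WIDTHS[r.read(5)]
    gap_width = r.read(3) + 1
    patch_count = r.read(5)

    # Base value: big-endian, with the top bit as the sign (sign-magnitude).
    raw = r.read(base_bytes * 8)
    sign = 1 << (base_bytes * 8 - 1)
    base = -(raw & ~sign) if raw & sign else raw

    # Data values, packed at `width` bits and padded to a byte boundary.
    values = [r.read(width) for _ in range(length)]
    r.align()

    # Patch entries: gap in the high bits, patch in the low `patch_width` bits.
    entry_width = next(w for w in WIDTHS if w >= gap_width + patch_width)
    index = 0
    for _ in range(patch_count):
        entry = r.read(entry_width)
        index += entry >> patch_width
        values[index] |= (entry & ((1 << patch_width) - 1)) << width
    return [base + v for v in values]


encoded = bytes([
    0x8E, 0x13, 0x2B, 0x21, 0x07, 0xD0, 0x1E, 0x00, 0x14, 0x70, 0x28, 0x32,
    0x3C, 0x46, 0x50, 0x5A, 0x64, 0x6E, 0x78, 0x82, 0x8C, 0x96, 0xA0, 0xAA,
    0xB4, 0xBE, 0xFC, 0xE8,
])
expected = [2030, 2000, 2020, 1000000] + list(range(2040, 2200, 10))
assert decode_patched_base(encoded) == expected
```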
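--------------------------------

### Sketch: Bloom Filter Bit Positions

The Bloom Filter Hashing Algorithm Steps above can be sketched in Python. Java `int` arithmetic wraps around at 32 bits, so the sketch emulates that with `to_int32`, and it models the bit set as a list of unsigned 64-bit words. It assumes hash1 is the low 32 bits and hash2 the high 32 bits of the base hash. Computing the base hash itself (Murmur3 or Thomas Wang's) is out of scope, so the functions take it as an argument. `bit_positions`, `add_hash`, and `might_contain` are hypothetical names.

```python
from __future__ import annotations


def to_int32(x: int) -> int:
    """Wrap an integer to Java's signed 32-bit int range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bit_positions(hash64: int, num_hash_functions: int, num_bits: int) -> list[int]:
    """Steps 2 to 5: derive one bit position per hash function."""
    hash1 = to_int32(hash64)  # low 32 bits
    hash2 = to_int32(hash64 >> 32)  # high 32 bits
    positions = []
    for k in range(1, num_hash_functions + 1):
        combined = to_int32(hash1 + k * hash2)
        if combined < 0:
            combined = ~combined  # flip all the bits
        positions.append(combined % num_bits)
    return positions


def add_hash(bitset: list[int], hash64: int, num_hash_functions: int) -> None:
    """Step 6: set each position; position >> 6 picks the word, the low 6 bits the bit."""
    num_bits = len(bitset) * 64
    for position in bit_positions(hash64, num_hash_functions, num_bits):
        bitset[position >> 6] |= 1 << (position & 63)


def might_contain(bitset: list[int], hash64: int, num_hash_functions: int) -> bool:
    num_bits = len(bitset) * 64
    return all(bitset[p >> 6] >> (p & 63) & 1
               for p in bit_positions(hash64, num_hash_functions, num_bits))


bits = [0]  # one 64-bit word, so m = 64
add_hash(bits, (5 << 32) | 3, 3)  # hash1 = 3, hash2 = 5: positions 8, 13, 18
assert bits == [(1 << 8) | (1 << 13) | (1 << 18)]
assert might_contain(bits, (5 << 32) | 3, 3)
```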