### Starting a Flight Server

Source: https://github.com/apache/arrow-java/blob/main/docs/source/flight.rst

Provides a complete example of how to build and start a Flight server in Java. It defines a producer, binds the server to a listening location, and manages the server lifecycle. Binding to port `0` lets the operating system choose a free port, which `server.getPort()` reports once the server has started.

```Java
import org.apache.arrow.flight.FlightServer;
import org.apache.arrow.flight.Location;
import org.apache.arrow.flight.NoOpFlightProducer;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

// Extending NoOpFlightProducer lets you override only the methods you need;
// implementing FlightProducer directly requires overriding all of its methods.
class TutorialFlightProducer extends NoOpFlightProducer {
    // Override methods such as getStream or listFlights here.
}

// Port 0 asks the operating system for any free port.
Location location = Location.forGrpcInsecure("0.0.0.0", 0);
try (
    BufferAllocator allocator = new RootAllocator();
    FlightServer server = FlightServer.builder(
        allocator,
        location,
        new TutorialFlightProducer()
    ).build();
) {
    server.start();
    System.out.println("Server listening on port " + server.getPort());
    // Block until the server is shut down.
    server.awaitTermination();
} catch (Exception e) {
    e.printStackTrace();
}
```

--------------------------------

### Build Gandiva Java Project

Source: https://github.com/apache/arrow-java/blob/main/gandiva/README.md

This command compiles and installs the Gandiva Java project using Maven. Pass the directory containing the Gandiva C++ build artifacts through the `gandiva.cpp.build.dir` property.

```bash
cd java
mvn install -Dgandiva.cpp.build.dir=
```

--------------------------------

### Running Java Code with --add-opens via Environment Variable

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

Example of how to run Java code through the Maven exec plugin with the `--add-opens` flag supplied in the `JDK_JAVA_OPTIONS` environment variable. This is an alternative to configuring the Surefire plugin directly.

```bash
JDK_JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED" mvn exec:java -Dexec.mainClass="YourMainCode"
```

--------------------------------

### Create VectorSchemaRoot

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Demonstrates how to create a VectorSchemaRoot with a defined schema (an integer field and a string field) and populate it with data. It allocates memory for the vectors, sets values, and prints the content in TSV format.
Requires Arrow memory allocators and vector types.

```Java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import java.nio.charset.StandardCharsets;
import static java.util.Arrays.asList;

Field age = new Field("age",
        FieldType.nullable(new ArrowType.Int(32, true)),
        /*children*/ null
);
Field name = new Field("name",
        FieldType.nullable(new ArrowType.Utf8()),
        /*children*/ null
);
Schema schema = new Schema(asList(age, name), /*metadata*/ null);
try (
    BufferAllocator allocator = new RootAllocator();
    VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
    IntVector ageVector = (IntVector) root.getVector("age");
    VarCharVector nameVector = (VarCharVector) root.getVector("name");
) {
    ageVector.allocateNew(3);
    ageVector.set(0, 10);
    ageVector.set(1, 20);
    ageVector.set(2, 30);
    nameVector.allocateNew(3);
    nameVector.set(0, "Dave".getBytes(StandardCharsets.UTF_8));
    nameVector.set(1, "Peter".getBytes(StandardCharsets.UTF_8));
    nameVector.set(2, "Mary".getBytes(StandardCharsets.UTF_8));
    root.setRowCount(3);
    System.out.println("VectorSchemaRoot created: \n" + root.contentToTSVString());
}
```

--------------------------------

### Example: Bump Version to 19.0.0-SNAPSHOT

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Example usage of the bump version script.

```console
dev/release/bump_version.sh 19.0.0-SNAPSHOT
```

--------------------------------

### Example: Verify Release 19.0.0 RC1

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Example usage of the RC verification script. The arguments are the release version (`19.0.0`) and the release candidate number (`1`).
```console
dev/release/verify_rc.sh 19.0.0 1
```

--------------------------------

### Create Schema

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Illustrates how to construct an Arrow Schema, which is a collection of Fields defining the structure of tabular data. This example creates a schema with two nullable fields, an int32 column named `A` and a UTF-8 string column named `B`, along with schema-level metadata.

```Java
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import java.util.HashMap;
import java.util.Map;
import static java.util.Arrays.asList;

Map<String, String> metadata = new HashMap<>();
metadata.put("K1", "V1");
metadata.put("K2", "V2");
Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null);
Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null);
Schema schema = new Schema(asList(a, b), metadata);
System.out.println("Schema created: " + schema);
```

--------------------------------

### Expected Output (Shell)

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Illustrates the expected output when reading an Arrow IPC file containing one record batch with `age` and `name` columns.

```shell
Record batches in file: 1
VectorSchemaRoot read:
age	name
10	Dave
20	Peter
30	Mary
```

--------------------------------

### Create Field with Metadata

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Demonstrates the creation of an Arrow Field, which represents a column in tabular data. This example sets the field name, data type (UTF-8 string), and nullability, and associates key-value metadata with the field.
```Java
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import java.util.HashMap;
import java.util.Map;

Map<String, String> metadata = new HashMap<>();
metadata.put("A", "Id card");
metadata.put("B", "Passport");
metadata.put("C", "Visa");
Field document = new Field("document",
        new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata),
        /*children*/ null);
System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata());
```

```shell
Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa}
```

--------------------------------

### Pull Request Title and Body Example

Source: https://github.com/apache/arrow-java/blob/main/CONTRIBUTING.md

Examples of how to format a pull request title and body. The title is prefixed with the GitHub issue number, and the body ends by closing the referenced issue.

```markdown
GH-12345: Document the pull request process

Explain how to open a pull request and what the title, body, and labels should be.

Closes #12345.
```

```markdown
GH-42424: Expose Netty server builder in Flight

Allow direct usage of gRPC APIs for low-level control.

Closes #42424.
```

--------------------------------

### Example: Release 19.0.0 RC1

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Example usage of the release script for a specific version and release candidate.

```console
dev/release/release.sh 19.0.0 1
```

--------------------------------

### Define Installation Paths and Install JNI Library

Source: https://github.com/apache/arrow-java/blob/main/gandiva/CMakeLists.txt

This snippet sets the installation directories for the Gandiva JNI library from `CMAKE_INSTALL_PREFIX` and an architecture-specific subdirectory. It then uses the `install` command to place the compiled shared library in those locations, making it available for deployment.
```CMake
set(ARROW_JAVA_JNI_GANDIVA_LIBDIR
    "${CMAKE_INSTALL_PREFIX}/lib/gandiva_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
set(ARROW_JAVA_JNI_GANDIVA_BINDIR
    "${CMAKE_INSTALL_PREFIX}/bin/gandiva_jni/${ARROW_JAVA_JNI_ARCH_DIR}")

install(TARGETS arrow_java_jni_gandiva
        LIBRARY DESTINATION ${ARROW_JAVA_JNI_GANDIVA_LIBDIR}
        RUNTIME DESTINATION ${ARROW_JAVA_JNI_GANDIVA_BINDIR})
```

--------------------------------

### Substrait Query Results Example

Source: https://github.com/apache/arrow-java/blob/main/docs/source/substrait.rst

Example output from running a Substrait query against a dataset.

```text
// Results example:
FieldPath(0)	FieldPath(1)	FieldPath(2)	FieldPath(3)
0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
```

--------------------------------

### Maven Configuration for Apache Arrow Dependencies

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

This XML snippet shows how to configure a Maven project's `pom.xml` to include Apache Arrow Java modules. It sets the Arrow version in a property and declares dependencies on `arrow-vector` and `arrow-memory-netty`, which Maven then downloads from Maven Central.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <arrow.version>9.0.0</arrow.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-memory-netty</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>
</project>
```

--------------------------------

### Java Module Compatibility with JDK Internals

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

This snippet shows how to expose the JDK internals that Apache Arrow needs when running on the Java module system. It avoids "module does not open" errors by passing the `--add-opens` JVM argument, either directly on the command line or indirectly through the `JDK_JAVA_OPTIONS` environment variable.
```shell
# Directly on the command line
$ java --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
# Indirectly via environment variables
$ env JDK_JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
```

--------------------------------

### Create IntVector

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Creates a ValueVector for 32-bit integers, showing how to allocate space, set values (including nulls), and set the vector's value count. IntVector uses a fixed-size primitive layout.

```Java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

try (
    BufferAllocator allocator = new RootAllocator();
    IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator);
) {
    intVector.allocateNew(3);
    intVector.set(0, 1);
    intVector.setNull(1);
    intVector.set(2, 2);
    intVector.setValueCount(3);
    System.out.println("Vector created in memory: " + intVector);
}
```

```shell
Vector created in memory: [1, null, 2]
```

--------------------------------

### Verify Local Maven Repository Installation

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

This command verifies that the Arrow Java packages have been installed into the local Maven repository (`~/.m2/repository`). It displays the directory tree so you can confirm that the JAR and POM files for each Arrow component are present.

```shell
tree ~/.m2/repository/org/apache/arrow
```

--------------------------------

### Read Arrow IPC File (Java)

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Reads data from a random-access Arrow IPC file. It iterates over the record blocks, loads each batch, and prints the content of the VectorSchemaRoot. Requires the Arrow Java libraries.
```Java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.message.ArrowBlock;
import org.apache.arrow.vector.VectorSchemaRoot;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

try (
    BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
    FileInputStream fileInputStream = new FileInputStream(new File("random_access_file.arrow"));
    ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), allocator);
) {
    System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
    for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
        // Load the batch into the reader's VectorSchemaRoot.
        reader.loadRecordBatch(arrowBlock);
        VectorSchemaRoot root = reader.getVectorSchemaRoot();
        System.out.println("VectorSchemaRoot read: \n" + root.contentToTSVString());
    }
} catch (IOException e) {
    e.printStackTrace();
}
```

--------------------------------

### Build Documentation with Sphinx

Source: https://github.com/apache/arrow-java/blob/main/docs/README.md

Builds the documentation with Sphinx: change into the `docs` directory, install the required Python packages from `requirements.txt`, then run `make html` to generate the HTML documentation.

```bash
cd docs
pip install -r requirements.txt
make html
```

--------------------------------

### Mailing List Announcement Example

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Example content for the email announcing a new release to the Apache mailing lists.

```text
To: announce@apache.org
CC: dev@arrow.apache.org, user@arrow.apache.org
Subject: [ANNOUNCE] Apache Arrow Java 18.2.0 released

The Apache Arrow community is pleased to announce the Arrow Java 18.2.0 release.

The release is available now from our website:
  https://arrow.apache.org/install/
and
  https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-java-18.2.0/

Read about what's new in the release at:
  https://arrow.apache.org/blog/2025/02/19/arrow-java-18.2.0/

Read the full changelog:
  https://github.com/apache/arrow-java/commits/v18.2.0

What is Apache Arrow?
-------------------------------

Apache Arrow is a universal columnar format and multi-language toolbox
for fast data interchange and in-memory analytics. It houses a set of
canonical in-memory representations of flat and hierarchical data along
with multiple language-bindings for structure manipulation. It also
provides low-overhead streaming and batch messaging, zero-copy
interprocess communication (IPC), and vectorized in-memory analytics
libraries.

Please report any feedback to the mailing lists:
  https://lists.apache.org/list.html?dev@arrow.apache.org

Regards,
The Apache Arrow community.
```

--------------------------------

### Defining Installation Directories

Source: https://github.com/apache/arrow-java/blob/main/dataset/CMakeLists.txt

Sets variables for the installation directories of the JNI dataset library and binaries, based on the installation prefix and architecture.

```cmake
set(ARROW_JAVA_JNI_DATASET_LIBDIR
    "${CMAKE_INSTALL_PREFIX}/lib/arrow_dataset_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
set(ARROW_JAVA_JNI_DATASET_BINDIR
    "${CMAKE_INSTALL_PREFIX}/bin/arrow_dataset_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
```

--------------------------------

### Installing Arrow Java Artifacts to Local Maven Repository

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

These shell commands use `mvn install:install-file` to manually install downloaded Apache Arrow Java POM and JAR files into the local Maven repository. Each command specifies the file, group ID, artifact ID, version, and packaging type.
```shell
$ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
```

--------------------------------

### Build Default Java Modules with Archery

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

Builds the default Java modules inside Docker using Archery, Arrow's development tool for managing builds. Requires Archery to be installed.

```bash
cd arrow/java
export JAVA_HOME=
java --version
archery docker run java
```

--------------------------------

### Arrow Flight Java Example Usage

Source: https://github.com/apache/arrow-java/blob/main/flight/flight-core/README.md

Usage examples for the Arrow Flight Java package, covering how to integrate it and use it for data transfer, are maintained outside the flight-core README.
See the [Arrow Cookbook](https://arrow.apache.org/cookbook/java/flight.html).

--------------------------------

### Maven Surefire Plugin Configuration for Testing

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

Configuration for the Maven Surefire plugin that passes the `--add-opens` flag to the test JVM through `argLine`. This is necessary for running unit tests on Java versions that require explicit module access.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>3.0.0-M6</version>
            <configuration>
                <argLine>--add-opens=java.base/java.nio=ALL-UNNAMED</argLine>
            </configuration>
        </plugin>
    </plugins>
</build>
```

--------------------------------

### Install Arrow Java JARs and POMs Locally

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

This snippet installs individual Arrow Java library files (JARs and POMs) into the local Maven repository. This is useful when building from source or when using specific versions that are not yet published to a remote repository. Each `mvn install:install-file` command needs the correct file path, groupId, artifactId, version, and packaging type.
```shell
mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
```

--------------------------------

### Build Default Java Modules with Docker Compose

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

Builds the default Java modules using Docker Compose. Requires Docker and Docker Compose to be installed.

```bash
cd arrow/java
export JAVA_HOME=
java --version
docker compose run java
```

--------------------------------

### Applying Java Code Style Fixes with Spotless

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/development.rst

This Maven command automatically formats the Java source code to comply with the project's style guide.

```bash
$ mvn spotless:apply
```

--------------------------------

### Buffer Allocation Example

Source: https://github.com/apache/arrow-java/blob/main/docs/source/memory.rst

Demonstrates how to allocate a buffer using BufferAllocator and ArrowBuf in Java. It creates a RootAllocator limited to 8 KiB, allocates a 4 KiB buffer from it, prints the buffer's details, and then closes the buffer and the allocator.
```Java
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

// The root allocator may hand out at most 8 KiB in total.
try (BufferAllocator bufferAllocator = new RootAllocator(8 * 1024)) {
    ArrowBuf arrowBuf = bufferAllocator.buffer(4 * 1024);
    System.out.println(arrowBuf);
    // Release the buffer before the allocator itself is closed.
    arrowBuf.close();
}
```

--------------------------------

### Install Apache Arrow (Java) with C-Data Module

Source: https://github.com/apache/arrow-java/blob/main/c/README.md

Command to build and install the Apache Arrow Java packages with the C Data module enabled through the `arrow-c-data` Maven profile. Run it from the project root directory.

```bash
cd java
mvn -Parrow-c-data install
```

--------------------------------

### Installing the Shared Library

Source: https://github.com/apache/arrow-java/blob/main/dataset/CMakeLists.txt

Installs the built shared library (`arrow_dataset_jni`) to the specified library and runtime directories.

```cmake
install(TARGETS arrow_java_jni_dataset
        LIBRARY DESTINATION ${ARROW_JAVA_JNI_DATASET_LIBDIR}
        RUNTIME DESTINATION ${ARROW_JAVA_JNI_DATASET_BINDIR})
```

--------------------------------

### Install JNI CData Library and Binaries

Source: https://github.com/apache/arrow-java/blob/main/c/CMakeLists.txt

Installs the compiled JNI CData shared library to the library directory and any runtime binaries to the binary directory, each organized by architecture.

```cmake
install(TARGETS arrow_java_jni_cdata
        LIBRARY DESTINATION ${ARROW_JAVA_JNI_C_LIBDIR}
        RUNTIME DESTINATION ${ARROW_JAVA_JNI_C_BINDIR})
```

--------------------------------

### Creating Dataset with User-Defined Schema

Source: https://github.com/apache/arrow-java/blob/main/docs/source/dataset.rst

This Java example creates an Arrow Dataset from a user-defined schema instead of an inferred one. It assumes a `FileSystemDatasetFactory` has already been built from a `BufferAllocator` and a `NativeMemoryPool`, and passes the schema to its `finish(Schema schema)` method.
```Java
// createUserSchema() stands for your own code that builds the Schema;
// factory is an existing DatasetFactory such as FileSystemDatasetFactory.
Schema schema = createUserSchema();
Dataset dataset = factory.finish(schema);
```

--------------------------------

### JDBC Connection URI Example

Source: https://github.com/apache/arrow-java/blob/main/docs/source/flight_sql_jdbc_driver.rst

Demonstrates how to connect to an Arrow Flight SQL service through the JDBC driver using a connection URI. The URI specifies the host and port; `useEncryption=0` disables encryption, and `database=mydb` is a parameter the driver does not handle itself, so it is forwarded to the service as a gRPC header.

```text
jdbc:arrow-flight-sql://localhost:12345/?useEncryption=0&database=mydb
```

--------------------------------

### IntVector Access using get()

Source: https://github.com/apache/arrow-java/blob/main/docs/source/vector.rst

Demonstrates reading a value from an IntVector with the `get(index)` method, which is available on primitive vectors. The example retrieves the value at index 5.

```Java
int value = vector.get(5);  // value == 25
```

--------------------------------

### Apache Arrow Java Maven Dependencies

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

Example Maven dependency declarations for the Apache Arrow Java modules `arrow-vector` and `arrow-memory-netty`.

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-vector</artifactId>
        <version>${arrow.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-memory-netty</artifactId>
        <version>${arrow.version}</version>
    </dependency>
</dependencies>
```

--------------------------------

### Build and Run Tests

Source: https://github.com/apache/arrow-java/blob/main/c/README.md

Builds the Apache Arrow Java project and runs its tests with Maven.

```bash
mvn test
```

--------------------------------

### Arrow IPC File Write Output

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

The expected output after successfully writing a VectorSchemaRoot to an Arrow IPC file, reporting the number of record batches and rows written.

```shell
Record batches written: 1. Number of rows written: 3
```

--------------------------------

### Create VarCharVector

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Creates a ValueVector for UTF-8 encoded strings, showing how to allocate space for variable-length data, set string values by encoding them as UTF-8 bytes, and finalize the vector. VarCharVector uses a variable-size layout.

```Java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarCharVector;
import java.nio.charset.StandardCharsets;

try (
    BufferAllocator allocator = new RootAllocator();
    VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator);
) {
    varCharVector.allocateNew(3);
    varCharVector.set(0, "one".getBytes(StandardCharsets.UTF_8));
    varCharVector.set(1, "two".getBytes(StandardCharsets.UTF_8));
    varCharVector.set(2, "three".getBytes(StandardCharsets.UTF_8));
    varCharVector.setValueCount(3);
    System.out.println("Vector created in memory: " + varCharVector);
}
```

```shell
Vector created in memory: [one, two, three]
```

--------------------------------

### Clone Arrow Repository

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

Clones the Apache Arrow repository and initializes its submodules, a prerequisite for building.

```bash
git clone https://github.com/apache/arrow.git
cd arrow
git submodule update --init --recursive
```

--------------------------------

### Running Maven Build with JNI

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/development.rst

This command builds the Arrow Java project with JNI support, specifying the native library directories and enabling the C Data module.
```bash
$ cd arrow/java
$ mvn \
    -Darrow.cpp.build.dir=../java-dist/lib -Parrow-jni \
    -Darrow.c.jni.dist.dir=../java-dist/lib -Parrow-c-data \
    clean install
```

--------------------------------

### Java Module Compatibility for Arrow Dataset

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

This snippet shows the JVM arguments needed when using the Apache Arrow dataset module. `--add-opens` opens the `java.nio` package of `java.base` to both `org.apache.arrow.dataset` and `org.apache.arrow.memory.core`, as well as to classpath code via `ALL-UNNAMED`. This prevents an `InaccessibleObjectException` when these modules access internal Java NIO classes.

```shell
# Directly on the command line
$ java --add-opens=java.base/java.nio=org.apache.arrow.dataset,org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
# Indirectly via environment variables
$ env JDK_JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.dataset,org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
```

--------------------------------

### Write VectorSchemaRoot to Arrow IPC File

Source: https://github.com/apache/arrow-java/blob/main/docs/source/quickstartguide.rst

Demonstrates writing a VectorSchemaRoot to an Arrow IPC file in the random-access format. It uses an ArrowFileWriter to write the record batch and handles IO errors. The output confirms the number of record batches and rows written.
```Java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import static java.util.Arrays.asList;

Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null);
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null);
Schema schema = new Schema(asList(age, name));
try (
    BufferAllocator allocator = new RootAllocator();
    VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
    IntVector ageVector = (IntVector) root.getVector("age");
    VarCharVector nameVector = (VarCharVector) root.getVector("name");
) {
    ageVector.allocateNew(3);
    ageVector.set(0, 10);
    ageVector.set(1, 20);
    ageVector.set(2, 30);
    nameVector.allocateNew(3);
    nameVector.set(0, "Dave".getBytes(StandardCharsets.UTF_8));
    nameVector.set(1, "Peter".getBytes(StandardCharsets.UTF_8));
    nameVector.set(2, "Mary".getBytes(StandardCharsets.UTF_8));
    root.setRowCount(3);
    File file = new File("random_access_file.arrow");
    try (
        FileOutputStream fileOutputStream = new FileOutputStream(file);
        ArrowFileWriter writer = new ArrowFileWriter(root, /*provider*/ null, fileOutputStream.getChannel());
    ) {
        writer.start();
        // Writes the current contents of root as one record batch.
        writer.writeBatch();
        writer.end();
        System.out.println("Record batches written: " + writer.getRecordBlocks().size()
                + ". Number of rows written: " + root.getRowCount());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```

--------------------------------

### Resource Management with try-with-resources

Source: https://github.com/apache/arrow-java/blob/main/docs/source/dataset.rst

Illustrates the proper way to manage the native resources behind a FileSystemDataset: the try-with-resources statement closes the scanner, dataset, factory, and allocator in reverse order of creation.

```Java
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

String uri = "file:/opt/example.parquet";
ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
try (
    BufferAllocator allocator = new RootAllocator();
    DatasetFactory factory = new FileSystemDatasetFactory(
            allocator, NativeMemoryPool.getDefault(),
            FileFormat.PARQUET, uri);
    Dataset dataset = factory.finish();
    Scanner scanner = dataset.newScan(options)
) {
    // Use scanner here
}
```

--------------------------------

### Downloading Arrow Java Artifacts

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

These shell commands download the POM and JAR files for various Apache Arrow Java components from a nightly release URL and collect them in a dedicated directory.
```shell
$ mkdir nightly-packaging-2022-07-30-0-github-java-jars
$ cd nightly-packaging-2022-07-30-0-github-java-jars
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
$ tree
```

--------------------------------

### Java Module Compatibility for Flight Core

Source: https://github.com/apache/arrow-java/blob/main/docs/source/install.rst

This snippet shows the JVM arguments required when using Apache Arrow's flight-core module. `--add-reads` lets the `org.apache.arrow.flight.core` module read the unnamed module (classpath code), and `--add-opens` opens the `java.nio` package of `java.base`. Both are needed to prevent an `IllegalAccessError` related to internal class access.

```shell
# Directly on the command line
$ java --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
# Indirectly via environment variables
$ env JDK_JAVA_OPTIONS="--add-reads=org.apache.arrow.flight.core=ALL-UNNAMED --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
```

--------------------------------

### Prepare Release Environment

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Sets up the environment variables and tools needed for release management. This includes obtaining a GitHub personal access token (`GH_TOKEN`) and a GPG key ID (`GPG_KEY_ID`), and installing the GitHub CLI (`gh`).
```console
$ cp dev/release/.env{.example,}
$ chmod go-r dev/release/.env
$ editor dev/release/.env
```

```console
# Install gh command
# See https://github.com/cli/cli#installation
```

```console
# Prepare PGP key
# See https://infra.apache.org/release-signing.html#genegrate
```

```console
# Update KEYS file in SVN repository
$ svn co https://dist.apache.org/repos/dist/dev/arrow
$ cd arrow
$ head KEYS
(This shows how to update KEYS)
$ svn ci KEYS
```

--------------------------------

### Install JNI ORC Shared Library

Source: https://github.com/apache/arrow-java/blob/main/adapter/orc/CMakeLists.txt

Specifies the installation rules for the `arrow_java_jni_orc` shared library. During the install step, CMake copies the library file to `ARROW_JAVA_JNI_ORC_LIBDIR`. Runtime artifacts, such as a Windows DLL, go to `ARROW_JAVA_JNI_ORC_BINDIR`.

```CMake
install(TARGETS arrow_java_jni_orc
        LIBRARY DESTINATION ${ARROW_JAVA_JNI_ORC_LIBDIR}
        RUNTIME DESTINATION ${ARROW_JAVA_JNI_ORC_BINDIR})
```

--------------------------------

### Build Default Java Modules with Maven

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/building.rst

Builds the default Java modules using Maven. Requires setting `JAVA_HOME` and ensuring Java is on the `PATH`.

```bash
cd arrow/java
export JAVA_HOME=
java --version
mvn clean install
```

--------------------------------

### Executing Performance Benchmarks with Conbench

Source: https://github.com/apache/arrow-java/blob/main/docs/source/developers/development.rst

This command runs Java microbenchmarks with Conbench. It specifies the number of iterations, the commit hash, the Java home, the source directory and a benchmark filter.
```console
$ cd benchmarks
$ conbench java-micro \
    --iterations=1 \
    --commit=e90472e35b40f58b17d408438bb8de1641bfe6ef \
    --java-home= \
    --src= \
    --benchmark-filter=org.apache.arrow.adapter.AvroAdapterBenchmarks.testAvroToArrow
```

--------------------------------

### Snappy Compression Library License and Homepage

Source: https://github.com/apache/arrow-java/blob/main/flight/flight-integration-tests/src/shade/NOTICE.txt

Information about Snappy, a compression library from Google Inc. Includes license and homepage.

```APIDOC
Snappy:
  LICENSE: license/LICENSE.snappy.txt (New BSD License)
  HOMEPAGE: https://github.com/google/snappy
```

--------------------------------

### Define Installation Paths for JNI ORC Library

Source: https://github.com/apache/arrow-java/blob/main/adapter/orc/CMakeLists.txt

These commands define the installation directories for the JNI ORC shared library, separating library files from runtime executables. The paths are constructed using CMake installation prefixes and architecture-specific subdirectories to ensure proper deployment.

```CMake
set(ARROW_JAVA_JNI_ORC_LIBDIR
    "${CMAKE_INSTALL_PREFIX}/lib/arrow_orc_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
set(ARROW_JAVA_JNI_ORC_BINDIR
    "${CMAKE_INSTALL_PREFIX}/bin/arrow_orc_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
```

--------------------------------

### Installation Paths for JNI CData

Source: https://github.com/apache/arrow-java/blob/main/c/CMakeLists.txt

Defines the installation directories for the JNI CData shared library and its associated binaries, based on the CMAKE_INSTALL_PREFIX and architecture.
```cmake
set(ARROW_JAVA_JNI_C_LIBDIR
    "${CMAKE_INSTALL_PREFIX}/lib/arrow_cdata_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
set(ARROW_JAVA_JNI_C_BINDIR
    "${CMAKE_INSTALL_PREFIX}/bin/arrow_cdata_jni/${ARROW_JAVA_JNI_ARCH_DIR}")
```

--------------------------------

### Dictionary Encoded Vectors Example

Source: https://github.com/apache/arrow-java/blob/main/docs/source/ipc.rst

Provides an example of creating and using dictionary encoded vectors within the Arrow streaming format. It includes setting up a dictionary provider and dictionary vectors.

```Java
// create provider
DictionaryProvider.MapDictionaryProvider provider = new DictionaryProvider.MapDictionaryProvider();

try (
  final VarCharVector dictVector = new VarCharVector("dict", allocator);
  final VarCharVector vector = new VarCharVector("vector", allocator);
) {
  // create dictionary vector
  dictVector.allocateNewSafe();
  dictVector.setSafe(0, "aa".getBytes());
  dictVector.setSafe(1, "bb".getBytes());
  dictVector.setSafe(2, "cc".getBytes());
  dictVector.setValueCount(3);

  // create dictionary
  Dictionary dictionary =
```

--------------------------------

### Protobuf Dependency Information and License

Source: https://github.com/apache/arrow-java/blob/main/flight/flight-sql-jdbc-driver/src/shade/LICENSE.txt

Details for the Protobuf library used in the project, including version, copyright, home page, and license. Includes the full BSD license text.

```text
This binary artifact contains Protobuf 4.30.2.

Copyright: Copyright 2008 Google Inc. All rights reserved.
Home page: https://protobuf.dev/
License: https://github.com/protocolbuffers/protobuf/blob/v4.30.1/LICENSE (BSD)
License text:
| Copyright 2008 Google Inc. All rights reserved.
|
| Redistribution and use in source and binary forms, with or without
| modification, are permitted provided that the following conditions are
| met:
|
| * Redistributions of source code must retain the above copyright
| notice, this list of conditions and the following disclaimer.
| * Redistributions in binary form must reproduce the above
| copyright notice, this list of conditions and the following disclaimer
| in the documentation and/or other materials provided with the
| distribution.
| * Neither the name of Google Inc. nor the names of its
| contributors may be used to endorse or promote products derived from
| this software without specific prior written permission.
|
| THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
| "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
| LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
| A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
| OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
| SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
| LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
| DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
| THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
| (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
| OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
| Code generated by the Protocol Buffer compiler is owned by the owner
| of the input file used when generating it. This code is not
| standalone and requires a support library to be linked with it. This
| support library is itself covered by the above license.
```

--------------------------------

### Accessing BigIntVector Values via get API

Source: https://github.com/apache/arrow-java/blob/main/docs/source/vector.rst

Shows how to read values from a BigIntVector with the direct `get` API. The loop visits every slot, skips null entries using `isNull`, and prints each non-null `long` value.
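The loop checks `isNull(i)` before calling `get(i)`, and a toy model of a fixed-width vector's layout shows why. An Arrow fixed-width vector pairs a validity bitmap with a data buffer:

- The bitmap has one bit per slot. Per the Arrow columnar format, bit `i` lives in byte `i / 8` at bit position `i % 8`, least-significant bit first.
- The data buffer holds one 8-byte value per slot.

The stdlib-only sketch below models this with a hypothetical `ToyLongVector` class. It is not part of Arrow and omits buffers, allocators, and resizing. Its `get` throws on a null slot, which, to our understanding, matches Arrow Java's default behavior of raising `IllegalStateException` when `get(i)` hits a null slot.

```java
/**
 * Hypothetical toy model of a nullable 64-bit vector (NOT Arrow's BigIntVector):
 * a validity bitmap with one bit per slot plus a fixed-width data array.
 */
final class ToyLongVector {
  private final byte[] validity; // bit i set => slot i holds a value (LSB-first within each byte)
  private final long[] data;     // slot i is meaningful only when its validity bit is set

  ToyLongVector(int valueCount) {
    this.validity = new byte[(valueCount + 7) / 8]; // all bits start at 0, so every slot starts null
    this.data = new long[valueCount];
  }

  int getValueCount() {
    return data.length;
  }

  void set(int index, long value) {
    validity[index >> 3] |= (byte) (1 << (index & 7)); // mark slot as non-null
    data[index] = value;
  }

  boolean isNull(int index) {
    return (validity[index >> 3] & (1 << (index & 7))) == 0;
  }

  long get(int index) {
    if (isNull(index)) {
      // Guarded read: refuse to return the meaningless value behind a null slot.
      throw new IllegalStateException("Value at index " + index + " is null");
    }
    return data[index];
  }

  /** Same loop shape as the BigIntVector example: skip nulls, then read with get. */
  static long sumNonNull(ToyLongVector vector) {
    long sum = 0;
    for (int i = 0; i < vector.getValueCount(); i++) {
      if (!vector.isNull(i)) {
        sum += vector.get(i);
      }
    }
    return sum;
  }
}
```

With the real `BigIntVector`, the same null-check-then-get pattern is written as follows.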
```Java
// access via get API
for (int i = 0; i < vector.getValueCount(); i++) {
  if (!vector.isNull(i)) {
    System.out.println(vector.get(i));
  }
}
```

--------------------------------

### Prepare Release Candidate (RC)

Source: https://github.com/apache/arrow-java/blob/main/dev/release/README.md

Generates a release candidate (RC) for the project. This requires being an Apache Arrow committer, having a PGP key configured, and having Maven set up for Apache releases. Run the script on a clone of the main apache/arrow-java repository.

```bash
# Configure Maven for Apache releases (settings-security.xml, settings.xml)
# Test Maven configuration:
# export GPG_TTY=$(tty)
mvn clean install -Papache-release
```

```console
$ git clone git@github.com:apache/arrow-java.git
$ cd arrow-java
$ dev/release/release_rc.sh ${RC}
# Send a vote email to dev@arrow.apache.org
```

```console
# Example to release RC1:
$ dev/release/release_rc.sh 1
```
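--------------------------------

### Sketch: Dictionary Encoding in Plain Java

The Dictionary Encoded Vectors Example above stops before the `Dictionary` is built. The idea it relies on is to store each distinct value once in a dictionary and replace the column's values with integer indices into that dictionary. The stdlib-only sketch below shows this mapping with a hypothetical `ToyDictionaryEncoding` class. It is not Arrow's API. In Arrow Java, real vectors are encoded with `DictionaryEncoder` in `org.apache.arrow.vector.dictionary`, and the dictionary is registered in a `DictionaryProvider` as in the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of dictionary encoding (NOT Arrow's DictionaryEncoder):
 * each distinct value is stored once, and the column keeps only int indices.
 */
final class ToyDictionaryEncoding {
  final List<String> dictionary = new ArrayList<>(); // distinct values in first-seen order
  final int[] indices;                                // indices[i] points into dictionary

  ToyDictionaryEncoding(List<String> values) {
    Map<String, Integer> idOf = new HashMap<>();
    indices = new int[values.size()];
    for (int i = 0; i < values.size(); i++) {
      String value = values.get(i);
      Integer id = idOf.get(value);
      if (id == null) {
        // First time this value is seen: append it to the dictionary.
        id = dictionary.size();
        idOf.put(value, id);
        dictionary.add(value);
      }
      indices[i] = id;
    }
  }

  /** Decoding replaces every index with the dictionary entry it points to. */
  List<String> decode() {
    List<String> decoded = new ArrayList<>(indices.length);
    for (int id : indices) {
      decoded.add(dictionary.get(id));
    }
    return decoded;
  }
}
```

For the input `bb, aa, bb, cc, aa`, the sketch produces the dictionary `[bb, aa, cc]` and the indices `[0, 1, 0, 2, 1]`. Decoding maps the indices back to the original values.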