### Substrait-Spark Justfile Targets

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

Lists the available targets in the `justfile` for managing the Substrait-Spark example applications. These targets simplify building, running Spark, and executing specific examples.

```bash
just
Available recipes:
    buildapp    # Builds the application into a JAR file
    consume arg # Consumes the Substrait plan file passed as the argument
    dataset     # Runs a Spark dataset api query and produces a Substrait plan
    spark       # Starts a simple Spark cluster locally in docker
    sql         # Runs a Spark SQL api query and produces a Substrait plan
```

--------------------------------

### Build and Run Substrait-Spark Examples

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This section provides bash commands for building the application JAR, setting up directories, copying datafiles, and running the Spark cluster using Docker Compose. It also shows how to submit the application to the Spark master.

```bash
just buildapp

# or

./gradlew build
mkdir -p ./_data  && chmod g+w ./_data
mkdir -p ./_apps

cp ./app/build/libs/app.jar ./_apps
cp ./app/src/main/resources/*.csv ./_data
```

```bash
just spark
```

```bash
docker exec -it subtrait-spark-spark-1 bash
/opt/spark/bin/spark-submit --master spark://subtrait-spark-spark-1:7077  --driver-memory 1G --executor-memory 1G /opt/spark-apps/app.jar <args>
```

--------------------------------

### Native Compilation with Gradle

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Builds the Isthmus-CLI as a native executable using Gradle. Requires GraalVM to be installed and its location set in the GRAALVM_HOME environment variable.

```bash
./gradlew nativeCompile
```

--------------------------------

### View Textual Representation of Substrait Plan (Example)

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This snippet shows a sample textual representation of a Substrait plan, generated from a Spark query. It details the relations, functions, and data sources involved in the query execution. This format is useful for understanding the plan's structure and can be generated using the `SubstraitStringify` utility.

```text
<Substrait Plan>
Root ::  ImmutableSort [colour, count]
<Substrait Relation>
+- Sort:: FieldRef#/I64/StructField{offset=1}  ASC_NULLS_FIRST
   +- Project:: [Str, I64, Str, I64]
      +- Aggregate:: FieldRef#/Str/StructField{offset=0}
         +- Project:: [Str, Str, Str, Str]
            +- Join:: INNER <ScalarFn>equal:any_any
                : arg0 = FieldRef#/Str/StructField{offset=0}
                : arg1 = FieldRef#/Str/StructField{offset=2}
               +- Project:: [Str, Str, Str, Str, Str, Str, Str, Str, Str]
                  +- Filter:: <ScalarFn>is_not_null:any
                      : arg0 = FieldRef#/Str/StructField{offset=0}
                     +- LocalFiles::
                         :  file:///opt/spark-data/vehicles_subset_2023.csv len=1547 partition=0 start=0
               +- Project:: [Str, Str, Str, Str, Str, Str, Str, Str, Str]
                  +- Filter:: <ScalarFn>and:bool
                      : arg0 = <ScalarFn>and:bool
                         : arg0 = <ScalarFn>is_not_null:any
                            : arg0 = FieldRef#/Str/StructField{offset=5}
                         : arg1 = <ScalarFn>equal:any_any
                            : arg0 = FieldRef#/Str/StructField{offset=5}
                            : arg1 = <StrLiteral P>
                      : arg1 = <ScalarFn>is_not_null:any
                         : arg0 = FieldRef#/Str/StructField{offset=1}
                     +- LocalFiles::
                         :  file:///opt/spark-data/tests_subset_2023.csv len=1491 partition=0 start=0
```

--------------------------------

### Execute Isthmus Tool with SQL Query

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This shell command shows an example of how to use the downloaded Isthmus binary package. It executes a SQL query against a defined table schema, demonstrating the tool's capability to process Substrait plans.

```shell
$ ./isthmus-macOS-1.0.0  -c "CREATE TABLE Persons ( firstName VARCHAR, lastName VARCHAR, zip INT )" "SELECT lastName, FROM Persons WHERE zip = 90210"
```

--------------------------------

### Display Isthmus-CLI Version

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Displays the installed version of the Isthmus-CLI. This command is executed from the compiled output directory.

```bash
$ ./isthmus-cli/build/native/nativeCompile/isthmus --version
```

--------------------------------

### Using Built-in Substrait Extension Functions in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Demonstrates accessing and utilizing Substrait extension functions from the default catalog for arithmetic, comparison, and string operations. It covers function URNs and how to build expressions using SubstraitBuilder. This example requires the substrait-java library.

```java
import io.substrait.dsl.SubstraitBuilder;
import io.substrait.extension.DefaultExtensionCatalog;
import io.substrait.extension.SimpleExtension;
import io.substrait.expression.Expression;
import io.substrait.relation.Rel;
import io.substrait.type.TypeCreator;
import java.util.Arrays;

SubstraitBuilder b = new SubstraitBuilder(
    DefaultExtensionCatalog.DEFAULT_COLLECTION
);
TypeCreator R = TypeCreator.REQUIRED;

// Available function URNs in default catalog
// DefaultExtensionCatalog.FUNCTIONS_ARITHMETIC
// DefaultExtensionCatalog.FUNCTIONS_BOOLEAN
// DefaultExtensionCatalog.FUNCTIONS_COMPARISON
// DefaultExtensionCatalog.FUNCTIONS_STRING
// DefaultExtensionCatalog.FUNCTIONS_DATETIME
// DefaultExtensionCatalog.FUNCTIONS_ROUNDING

Rel scan = b.namedScan(
    Arrays.asList("products"),
    Arrays.asList("product_id", "name", "price", "discount", "category"),
    Arrays.asList(R.I64, R.STRING, R.FP64, R.FP64, R.STRING)
);

// Using arithmetic functions: final_price = price * (1 - discount)
Rel projected = b.project(
    input -> Arrays.asList(
        b.fieldReference(input, 0),  // product_id
        b.fieldReference(input, 1),  // name
        // Calculate: price * (1.0 - discount)
        b.scalarFn(
            DefaultExtensionCatalog.FUNCTIONS_ARITHMETIC,
            "multiply:fp64_fp64",
            R.FP64,
            b.fieldReference(input, 2),  // price
            b.scalarFn(
                DefaultExtensionCatalog.FUNCTIONS_ARITHMETIC,
                "subtract:fp64_fp64",
                R.FP64,
                b.fp64(1.0),
                b.fieldReference(input, 3)  // discount
            )
        ),
        // String uppercase: UPPER(category)
        b.scalarFn(
            DefaultExtensionCatalog.FUNCTIONS_STRING,
            "upper:str",
            R.STRING,
            b.fieldReference(input, 4)
        )
    ),
    scan
);

// Filter with string matching: name LIKE '%widget%'
Rel filtered = b.filter(
    input -> b.scalarFn(
        DefaultExtensionCatalog.FUNCTIONS_STRING,
        "like:str_str",
        R.BOOLEAN,
        b.fieldReference(input, 1),  // name
        b.str("%widget%")
    ),
    projected
);

Plan plan = b.plan(b.root(filtered));

```

--------------------------------

### Example Isthmus Output Structure

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This JSON snippet represents a partial output from the Isthmus tool, illustrating the structure of a Substrait plan. It includes details about extension URIs, extensions, and relations, showcasing how the tool serializes query plans.

```json
{
  "extensionUris": [{
    "extensionUriAnchor": 1,
    "uri": "/functions_comparison.yaml"
  }],
  "extensions": [{
    "extensionFunction": {
      "extensionUriReference": 1,
      "functionAnchor": 0,
      "name": "equal:any_any"
    }
  }],
  "relations": [{
    "root": {
...
}
```

--------------------------------

### Maintenance Branch Creation (Shell)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Example commands for creating maintenance branches based on existing release tags. This is used for applying fixes to specific versions.

```shell
# Current version on production: 2.5.7, version planned to use for maintenance: 2.1.7
$ git checkout -b 2.1.x v2.1.7 # for fix
$ git checkout -b 2.x v2.1.7 # for maintenance
```

--------------------------------

### Pre-release Branch Creation (Shell)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Example command for creating a pre-release branch for experimenting with breaking changes. Commits to this branch are considered pre-releases.

```shell
# Current version on production: 2.5.7, version planned to use for maintenance: 2.1.7
$ git checkout -b new_feature_lots_changes beta
```

--------------------------------

### Example Field Access Depth Map Computation

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/OuterReferenceResolver.md

This example shows the resulting fieldAccessDepthMap for the given SQL query, illustrating how the depth is assigned to correlated fields based on their nesting level within subqueries. This map is crucial for resolving outer references correctly.

```text
fieldAccessDepthMap:
    l_partkey=$corr0.p_partkey:    $corr0.p_partkey -> 1
    ps_partkey = $corr0.p_partkey: $corr0.p_partkey -> 2
    ps_suppkey = $corr2.l_suppkey: $corr2.l_suppkey -> 1
```

--------------------------------

### Display Isthmus-CLI Help Information

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Shows the usage instructions and available options for the Isthmus-CLI. This includes options for input SQL, create statements, expressions, output format, and identifier casing.

```bash
$ ./isthmus-cli/build/native/nativeCompile/isthmus --help
```

--------------------------------

### Load and Convert Substrait Plan to Spark Plan in Java

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

Loads a binary protobuf Substrait plan from a file, converts it into a Substrait Plan object, and then transforms it into a Spark LogicalPlan for execution. Dependencies include standard Java IO and the Substrait-Java library. Inputs are a file path to the plan and a SparkSession.

```java
byte[] buffer = Files.readAllBytes(Paths.get("spark_sql_substrait.plan"));
io.substrait.proto.Plan proto = io.substrait.proto.Plan.parseFrom(buffer);

ProtoPlanConverter protoToPlan = new ProtoPlanConverter();
Plan plan = protoToPlan.from(proto);

ToLogicalPlan substraitConverter = new ToLogicalPlan(spark);
LogicalPlan sparkPlan = substraitConverter.convert(plan);
```

--------------------------------

### Create and Serialize Substrait Plan (Java)

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This Java code demonstrates how to obtain an optimized Spark LogicalPlan, convert it into a Substrait Plan object using `ToSubstraitRel`, and then serialize this Substrait Plan into a binary protobuf file. Error handling for file writing is included.

```java
LogicalPlan optimised = result.queryExecution().optimizedPlan();
System.out.println(optimised);

ToSubstraitRel toSubstrait = new ToSubstraitRel();
io.substrait.plan.Plan plan = toSubstrait.convert(optimised);

PlanProtoConverter planToProto = new PlanProtoConverter();
byte[] buffer = planToProto.toProto(plan).toByteArray();
try {
  Files.write(Paths.get(ROOT_DIR, "spark_sql_substrait.plan"), buffer);
} catch (IOException e){
  e.printStackTrace();
}
```

--------------------------------

### GPG Key Generation and Management (Shell)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Commands for generating, listing, and publishing GPG keys used for signing release artifacts. Ensure keyserver is accessible and the correct key ID is used.

```shell
gpg --full-gen-key
gpg --list-keys
gpg --keyserver keyserver.ubuntu.com --send-keys C8BA52B3
gpg --export-secret-keys C8BA52B3 | base64
```

--------------------------------

### SQL Query with Nested Subqueries

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/OuterReferenceResolver.md

This SQL query demonstrates a common pattern involving nested subqueries and correlations. It is used as an example to illustrate how the OuterReferenceResolver computes field access depths in the Substrait Java project.

```sql
SELECT p_partkey, p_size
FROM part p
WHERE p_size <
      (SELECT sum(l_orderkey)
       FROM lineitem l
       WHERE l.l_partkey = p.p_partkey
         AND l_linenumber >
             (SELECT count(*) cnt
              FROM partsupp ps
              WHERE ps.ps_partkey = p.p_partkey
                AND   PS.ps_suppkey = l.l_suppkey))
```

--------------------------------

### Download and Verify Substrait-Java POM and JAR

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This section shows how to download the core-1.0.0.pom and core-1.0.0.jar files for Substrait-Java, along with their associated PGP signatures. It also includes commands to verify the integrity and authenticity of these downloaded artifacts using GPG.

```shell
wget https://s01.oss.sonatype.org/content/repositories/staging/io/substrait/core/1.0.0/core-1.0.0.pom
wget https://s01.oss.sonatype.org/content/repositories/staging/io/substrait/core/1.0.0/core-1.0.0.pom.asc
gpg --verify /Users/substrait/core-1.0.0.pom.asc
```

```shell
wget https://s01.oss.sonatype.org/content/repositories/release/io/substrait/core/1.0.0/core-1.0.0.jar
wget https://s01.oss.sonatype.org/content/repositories/release/io/substrait/core/1.0.0/core-1.0.0.jar.asc
gpg --verify /Users/substrait/core-1.0.0.jar.asc
```

```shell
wget https://s01.oss.sonatype.org/content/repositories/release/io/substrait/core/1.0.0/core-1.0.0.pom
wget https://s01.oss.sonatype.org/content/repositories/release/io/substrait/core/1.0.0/core-1.0.0.pom.asc
gpg --verify /Users/substrait/core-1.0.0.pom.asc
```

--------------------------------

### Creating Substrait Window Functions in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Illustrates how to define and use window functions in Substrait for analytical queries using Java. This includes setting up partitioning, ordering, and specifying bounds for functions like ROW_NUMBER. Requires the substrait-java library.

```java
import io.substrait.dsl.SubstraitBuilder;
import io.substrait.extension.DefaultExtensionCatalog;
import io.substrait.expression.Expression;
import io.substrait.expression.WindowBound;
import io.substrait.relation.Rel;
import io.substrait.type.TypeCreator;
import java.util.Arrays;

SubstraitBuilder b = new SubstraitBuilder(
    DefaultExtensionCatalog.DEFAULT_COLLECTION
);
TypeCreator R = TypeCreator.REQUIRED;

// Query: SELECT employee_id, dept_id, salary,
//               ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) as rank
//        FROM employees
Rel scan = b.namedScan(
    Arrays.asList("employees"),
    Arrays.asList("employee_id", "dept_id", "salary"),
    Arrays.asList(R.I64, R.I64, R.FP64)
);

// Create window function with partitioning and ordering
Expression.WindowFunctionInvocation rowNumber =
    Expression.WindowFunctionInvocation.builder()
        .declaration(
            DefaultExtensionCatalog.DEFAULT_COLLECTION.getWindowFunction(
                SimpleExtension.FunctionAnchor.of(
                    DefaultExtensionCatalog.FUNCTIONS_WINDOW,
                    "row_number:"
                )
            )
        )
        .outputType(R.I64)
        .partitionBy(Arrays.asList(
            b.fieldReference(scan, 1)  // PARTITION BY dept_id
        ))
        .sorts(Arrays.asList(
            Expression.SortField.builder()
                .expr(b.fieldReference(scan, 2))  // ORDER BY salary
                .direction(Expression.SortDirection.DESC)
                .build()
        ))
        .boundsType(Expression.WindowBoundsType.ROWS)
        .lowerBound(WindowBound.Unbounded.INSTANCE)
        .upperBound(WindowBound.CurrentRow.INSTANCE)
        .build();

// Project with window function
Rel projected = b.project(
    input -> Arrays.asList(
        b.fieldReference(input, 0),  // employee_id
        b.fieldReference(input, 1),  // dept_id
        b.fieldReference(input, 2),  // salary
        rowNumber                     // row_number
    ),
    scan
);

Plan plan = b.plan(b.root(projected));

```

--------------------------------

### GPG Key Import (Shell)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Command to import the public GPG key from a keyserver, necessary for verifying signed artifacts.

```shell
$ gpg --keyserver keyserver.ubuntu.com --recv-keys C8BA52B3
gpg: key F4A1E652C8BA52B3: public key "Substrait Java Artifacts (Java artifact signing key for the Substrait project) <security@substrait.io>" imported
gpg: Total processed: 1
gpg:               imported: 1
```

--------------------------------

### Configure Maven for Substrait-Java Staging Repository

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This XML snippet demonstrates how to configure your Maven `pom.xml` file to include the Sonatype staging repository. This allows Maven to resolve and download Substrait-Java artifacts from the staging environment.

```xml
<repositories>
        <repository>
            <id>sonatype-staging</id>
            <url>https://s01.oss.sonatype.org/content/groups/staging</url>
        </repository>
    </repositories>

    <dependencies>
      ...
      <dependency>
          <groupId>io.substrait</groupId>
          <artifactId>core</artifactId>
          <version>1.0.0</version>
      </dependency>
      <dependency>
          <groupId>io.substrait</groupId>
          <artifactId>isthmus</artifactId>
          <version>1.0.0</version>
      </dependency>
      ...
    </dependencies>
```

--------------------------------

### Configure Gradle for Substrait-Java Staging Repository

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This Groovy snippet shows how to configure your Gradle build script to use the Sonatype staging repository. It adds the necessary repository URL and dependency declarations for `core` and `isthmus` artifacts.

```groovy
repositories {
    maven {
        url = uri("https://s01.oss.sonatype.org/content/groups/staging")
    }
}
dependencies {
    ...
    implementation 'io.substrait:core:1.0.0'
    implementation 'io.substrait:isthmus:1.0.0'
    ...
}
```

--------------------------------

### Serialize Substrait Plan to Protobuf in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Demonstrates how to serialize a Substrait Plan object into protobuf format and deserialize it back. This is essential for storing plans or transmitting them over a network. It involves using `PlanProtoConverter` and `ProtoPlanConverter`.

```java
import io.substrait.plan.Plan;
import io.substrait.plan.PlanProtoConverter;
import io.substrait.plan.ProtoPlanConverter;
import io.substrait.extension.DefaultExtensionCatalog;
import java.nio.file.Files;
import java.nio.file.Paths;

// Plan to protobuf bytes
Plan plan = /* constructed plan */;
PlanProtoConverter toProto = new PlanProtoConverter();
io.substrait.proto.Plan protoPlan = toProto.toProto(plan);
byte[] bytes = protoPlan.toByteArray();

// Save to file
Files.write(Paths.get("query.plan"), bytes);

// Load from file
byte[] loadedBytes = Files.readAllBytes(Paths.get("query.plan"));
io.substrait.proto.Plan loadedProtoPlan =
    io.substrait.proto.Plan.parseFrom(loadedBytes);

// Protobuf to Plan object
ProtoPlanConverter fromProto = new ProtoPlanConverter(
    DefaultExtensionCatalog.DEFAULT_COLLECTION
);
Plan reconstructedPlan = fromProto.from(loadedProtoPlan);

```

--------------------------------

### Execute Substrait Query Plan with Python

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This Python code snippet demonstrates how to read a Substrait plan from a file, decode it, and execute it using a provided table provider. It assumes the plan is base64 encoded.

```python
with open(PLAN_FILE, "wb") as file:
   planbytes = file.read()
   reader = substrait.run_query(
      base64.b64decode(planbytes),
         table_provider=self.simple_provider,
     )
   result = reader.read_all()
```

--------------------------------

### Convert Substrait Plan to Spark Logical Plan in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Loads and executes Substrait plans in Apache Spark using `ToLogicalPlan`. This process involves creating a SparkSession, registering necessary tables, loading a Substrait plan from a protobuf file, converting it to a Substrait `Plan` object, and then transforming it into a Spark `LogicalPlan`. The resulting Spark plan can then be executed within Spark.

```java
import io.substrait.spark.logical.ToLogicalPlan;
import io.substrait.plan.Plan;
import io.substrait.plan.ProtoPlanConverter;
import io.substrait.extension.DefaultExtensionCatalog;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
import java.nio.file.Files;
import java.nio.file.Paths;

// Create Spark session
SparkSession spark = SparkSession.builder()
    .appName("SubstraitToSpark")
    .master("local[*]")
    .getOrCreate();

// Register tables that the plan references
spark.read()
    .option("header", "true")
    .csv("employees.csv")
    .createOrReplaceTempView("employees");

spark.read()
    .option("header", "true")
    .csv("departments.csv")
    .createOrReplaceTempView("departments");

// Load Substrait plan from file
byte[] bytes = Files.readAllBytes(Paths.get("spark_query.plan"));
io.substrait.proto.Plan protoPlan = 
    io.substrait.proto.Plan.parseFrom(bytes);

// Convert protobuf to Plan object
ProtoPlanConverter protoConverter = new ProtoPlanConverter(
    DefaultExtensionCatalog.DEFAULT_COLLECTION
);
Plan plan = protoConverter.from(protoPlan);

// Convert Substrait plan to Spark logical plan
ToLogicalPlan converter = new ToLogicalPlan(spark);
LogicalPlan sparkPlan = converter.convert(plan);

// Execute query in Spark
Dataset<Row> result = Dataset.ofRows(spark, sparkPlan);
result.show();

// Save results
result.write()
    .option("header", "true")
    .csv("query_results");
```

--------------------------------

### Build Aggregation Queries with GROUP BY in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Illustrates constructing aggregation queries using SubstraitBuilder in Java. It defines grouping keys and multiple aggregate functions such as COUNT, SUM, and AVG, along with ordering. This requires the io.substrait library for plan construction.

```java
import io.substrait.dsl.SubstraitBuilder;
import io.substrait.extension.DefaultExtensionCatalog;
import io.substrait.plan.Plan;
import io.substrait.relation.Rel;
import io.substrait.type.TypeCreator;
import java.util.Arrays;

SubstraitBuilder b = new SubstraitBuilder(
    DefaultExtensionCatalog.DEFAULT_COLLECTION
);
TypeCreator R = TypeCreator.REQUIRED;

// Query: SELECT dept, COUNT(*), SUM(salary), AVG(age)
//        FROM employees GROUP BY dept ORDER BY dept
Rel scan = b.namedScan(
    Arrays.asList("employees"),
    Arrays.asList("id", "name", "dept", "salary", "age"),
    Arrays.asList(R.I64, R.STRING, R.STRING, R.FP64, R.I32)
);

Rel agg = b.aggregate(
    input -> b.grouping(input, 2),  // GROUP BY dept (index 2)
    input -> Arrays.asList(
        b.countStar(),              // COUNT(*)
        b.sum(input, 3),            // SUM(salary)
        b.avg(input, 4)             // AVG(age)
    ),
    scan
);

Rel sorted = b.sort(
    input -> b.sortFields(input, 0),  // ORDER BY dept
    agg
);

Plan plan = b.plan(b.root(sorted));

```

--------------------------------

### GPG Key Verification (Shell)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Steps to download and verify the GPG signature of release artifacts using the public key. This ensures the integrity and authenticity of the downloaded JAR and POM files.

```shell
wget https://s01.oss.sonatype.org/content/repositories/staging/io/substrait/core/1.0.0/core-1.0.0.jar
wget https://s01.oss.sonatype.org/content/repositories/staging/io/substrait/core/1.0.0/core-1.0.0.jar.asc
gpg --verify /Users/substrait/core-1.0.0.jar.asc
```

--------------------------------

### Build Substrait Plans with Fluent Java DSL

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Demonstrates using SubstraitBuilder to construct a query plan programmatically with a fluent DSL. It chains methods for scanning, filtering, projecting, and sorting data, utilizing lambda expressions for defining operations. Requires io.substrait libraries.

```java
import io.substrait.dsl.SubstraitBuilder;
import io.substrait.extension.DefaultExtensionCatalog;
import io.substrait.extension.SimpleExtension;
import io.substrait.plan.Plan;
import io.substrait.relation.Rel;
import io.substrait.type.TypeCreator;
import java.util.Arrays;

// Initialize builder with extension catalog
SimpleExtension.ExtensionCollection extensions =
    DefaultExtensionCatalog.DEFAULT_COLLECTION;
SubstraitBuilder b = new SubstraitBuilder(extensions);
TypeCreator R = TypeCreator.REQUIRED;
TypeCreator N = TypeCreator.NULLABLE;

// Build query: SELECT name, age FROM users WHERE age > 18 ORDER BY age DESC
Rel scan = b.namedScan(
    Arrays.asList("users"),
    Arrays.asList("id", "name", "age", "email"),
    Arrays.asList(R.I64, R.STRING, R.I32, R.STRING)
);

Rel filtered = b.filter(
    input -> b.scalarFn(
        DefaultExtensionCatalog.FUNCTIONS_COMPARISON,
        "gt:i32_i32",
        R.BOOLEAN,
        b.fieldReference(input, 2),
        b.i32(18)
    ),
    scan
);

Rel projected = b.project(
    input -> Arrays.asList(
        b.fieldReference(input, 1),  // name
        b.fieldReference(input, 2)   // age
    ),
    filtered
);

Rel sorted = b.sort(
    input -> b.sortField(input, 1, Rel.SortDirection.DESC),
    projected
);

Plan plan = b.plan(b.root(sorted));

```

--------------------------------

### Configure Gradle for Substrait-Java Maven Central

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This Groovy snippet demonstrates how to add Substrait-Java dependencies (`core` and `isthmus` version 1.0.0) to your Gradle build file for use with Maven Central.

```groovy
dependencies {
    ...
    implementation 'io.substrait:core:1.0.0'
    implementation 'io.substrait:isthmus:1.0.0'
    ...
}
```

--------------------------------

### Branch Configuration for Semantic Release (JSON)

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

Configuration defining the branch patterns used by semantic release for determining release types (e.g., pre-release, maintenance, stable).

```json
{
  "branches": [
    { "name": "+([0-9])?(.{+([0-9]),x}).x" },
    { "name": "main" },
    { "name": "next" },
    { "name": "next-major" },
    { "name": "beta", "prerelease": true },
    { "name": "alpha", "prerelease": true }
  ]
}
```

--------------------------------

### Convert SQL to Substrait Plan in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Converts a SQL query into a Substrait Plan using Apache Calcite for parsing and optimization. This involves defining a database schema using SQL `CREATE TABLE` statements, processing them into a catalog, and then using the `SqlToSubstrait` utility to perform the conversion. The resulting plan can then be serialized.

```java
import io.substrait.isthmus.SqlToSubstrait;
import io.substrait.isthmus.sql.SubstraitCreateStatementParser;
import io.substrait.plan.Plan;
import io.substrait.plan.PlanProtoConverter;
import org.apache.calcite.prepare.Prepare;

// Define database schema using CREATE TABLE statements
String schema = """
    CREATE TABLE lineitem (
        l_orderkey BIGINT,
        l_partkey BIGINT,
        l_suppkey BIGINT,
        l_quantity DECIMAL(15,2),
        l_extendedprice DECIMAL(15,2),
        l_discount DECIMAL(15,2),
        l_shipdate DATE
    );

    CREATE TABLE orders (
        o_orderkey BIGINT,
        o_custkey BIGINT,
        o_orderstatus VARCHAR(1),
        o_totalprice DECIMAL(15,2),
        o_orderdate DATE
    );
    """;

// Create catalog from DDL
Prepare.CatalogReader catalog =
    SubstraitCreateStatementParser.processCreateStatementsToCatalog(schema);

// Convert SQL query to Substrait plan
SqlToSubstrait sqlToSubstrait = new SqlToSubstrait();
String sql = """
    SELECT l.l_orderkey, SUM(l.l_quantity) as total_qty,
           SUM(l.l_extendedprice) as total_price
    FROM lineitem l
    INNER JOIN orders o ON l.l_orderkey = o.o_orderkey
    WHERE o.o_orderdate >= DATE '2023-01-01'
      AND l.l_shipdate >= DATE '2023-01-01'
    GROUP BY l.l_orderkey
    HAVING SUM(l.l_quantity) > 100
    ORDER BY total_price DESC
    LIMIT 10
    """;

Plan plan = sqlToSubstrait.convert(sql, catalog);

// Serialize to protobuf
PlanProtoConverter planToProto = new PlanProtoConverter();
byte[] bytes = planToProto.toProto(plan).toByteArray();

```

--------------------------------

### Spark and Substrait Inner Join Plan Comparison

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

Illustrates the structural similarity between Spark's LogicalPlan representation of an inner join and its equivalent in the Substrait format. This comparison highlights how the Substrait-Java library translates Spark operations. The Substrait format here is a custom string representation for readability.

```text
         +- Join Inner, (vehicle_id#2 = vehicle_id#10)
            :- Project [vehicle_id#2, colour#5]
            :  +- Filter isnotnull(vehicle_id#2)

            +- Project [vehicle_id#10]
               +- Filter ((isnotnull(test_result#14) AND (test_result#14 = P)) AND isnotnull(vehicle_id#10))
```

```substrait
            +- Join:: INNER <ScalarFn>equal:any_any
                : arg0 = FieldRef#/Str/StructField{offset=0}
                : arg1 = FieldRef#/Str/StructField{offset=2}
               +- Project:: [Str, Str, Str, Str, Str, Str, Str, Str, Str]
                  +- Filter:: <ScalarFn>is_not_null:any
                      : arg0 = FieldRef#/Str/StructField{offset=0}

               +- Project:: [Str, Str, Str, Str, Str, Str, Str, Str, Str]
                  +- Filter:: <ScalarFn>and:bool
                      : arg0 = <ScalarFn>and:bool
                         : arg0 = <ScalarFn>is_not_null:any
                            : arg0 = FieldRef#/Str/StructField{offset=5}
                         : arg1 = <ScalarFn>equal:any_any
                            : arg0 = FieldRef#/Str/StructField{offset=5}
                            : arg1 = <StrLiteral P>
                      : arg1 = <ScalarFn>is_not_null:any
                         : arg0 = FieldRef#/Str/StructField{offset=1}
```

--------------------------------

### Display Spark Logical and Optimized Query Plans

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This Java code retrieves and prints the logical and optimized query plans generated by Spark for a given DataFrame. It relies on Spark's queryExecution API. The input is a Spark DataFrame, and the output is the printed query plans to the console.

```java
      LogicalPlan logical = result.logicalPlan();
      System.out.println(logical);

      LogicalPlan optimised = result.queryExecution().optimizedPlan();
      System.out.println(optimised);
```

--------------------------------

### Convert Spark Logical Plan to Substrait in Java

Source: https://context7.com/substrait-io/substrait-java/llms.txt

Converts Apache Spark logical plans to the Substrait format using `ToSubstraitRel`. This process involves creating a SparkSession, loading data, executing a Spark SQL query, obtaining the optimized Spark logical plan, and then converting it to a Substrait `Plan` object which is serialized to protobuf. This enables cross-engine query execution.

```java
import io.substrait.spark.logical.ToSubstraitRel;
import io.substrait.plan.Plan;
import io.substrait.plan.PlanProtoConverter;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
import java.nio.file.Files;
import java.nio.file.Paths;

// Create Spark session
SparkSession spark = SparkSession.builder()
    .appName("SparkToSubstrait")
    .master("local[*]")
    .getOrCreate();

// Load data and create views
spark.read()
    .option("header", "true")
    .csv("employees.csv")
    .createOrReplaceTempView("employees");

spark.read()
    .option("header", "true")
    .csv("departments.csv")
    .createOrReplaceTempView("departments");

// Execute SQL query in Spark
String sql = """
    SELECT d.dept_name, COUNT(*) as emp_count, AVG(e.salary) as avg_salary
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    WHERE e.hire_date >= '2020-01-01'
    GROUP BY d.dept_name
    HAVING COUNT(*) > 5
    ORDER BY avg_salary DESC
    """;

Dataset<Row> dataset = spark.sql(sql);
dataset.show();

// Get optimized Spark logical plan
LogicalPlan sparkPlan = dataset.queryExecution().optimizedPlan();

// Convert to Substrait
ToSubstraitRel converter = new ToSubstraitRel();
Plan substraitPlan = converter.convert(sparkPlan);

// Serialize to protobuf
PlanProtoConverter planToProto = new PlanProtoConverter();
byte[] bytes = planToProto.toProto(substraitPlan).toByteArray();
Files.write(Paths.get("spark_query.plan"), bytes);

System.out.println("Substrait plan saved to spark_query.plan");
```

--------------------------------

### Add Substrait-Spark Dependency (Gradle)

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This snippet demonstrates how to include the Substrait-Spark library in your Gradle project (Groovy syntax). This requires Java 17 or higher.

```groovy
// https://mvnrepository.com/artifact/io.substrait/spark
implementation 'io.substrait:spark:0.36.0'
```

--------------------------------

### Load Datasets for PyArrow Table Provider

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This Python code loads data into PyArrow tables from Parquet files. These tables will be used by a custom table provider function to supply data to the Substrait query execution in PyArrow.

```python
test = pq.read_table(TESTS_PQ_FILE)
vehicles = pq.read_table(VEHICLES_PQ_FILE)
```

--------------------------------

### Projection using Substrait Java

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Demonstrates how to perform a projection operation using the Substrait Java CLI. It takes a SQL CREATE TABLE statement and an expression for the new column. The output is a JSON representation of the Substrait plan.

```shell
$ ./isthmus-cli/build/native/nativeCompile/isthmus -c "CREATE TABLE NATION (N_NATIONKEY BIGINT NOT NULL, N_NAME CHAR(25), N_REGIONKEY BIGINT NOT NULL, N_COMMENT VARCHAR(152))" \
    -e  "N_REGIONKEY + 10"
```

```json
{
  "extensionUris": [{
    "extensionUriAnchor": 1,
    "uri": "/functions_arithmetic.yaml"
  }],
  "extensions": [{
    "extensionFunction": {
      "extensionUriReference": 1,
      "functionAnchor": 0,
      "name": "add:i64_i64"
    }
  }],
  "referredExpr": [{
    "expression": {
      "scalarFunction": {
        "functionReference": 0,
        "args": [],
        "outputType": {
          "i64": {
            "typeVariationReference": 0,
            "nullability": "NULLABILITY_REQUIRED"
          }
        },
        "arguments": [{
          "value": {
            "selection": {
              "directReference": {
                "structField": {
                  "field": 2
                }
              },
              "rootReference": {
              }
            }
          }
        }, {
          "value": {
            "cast": {
              "type": {
                "i64": {
                  "typeVariationReference": 0,
                  "nullability": "NULLABILITY_REQUIRED"
                }
              },
              "input": {
                "literal": {
                  "i32": 10,
                  "nullable": false,
                  "typeVariationReference": 0
                }
              },
              "failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
            }
          }
        }],
        "options": []
      }
    },
    "outputNames": ["new-column"]
  }],
  "baseSchema": {
    "names": ["N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT"],
    "struct": {
      "types": [{
        "i64": {
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_REQUIRED"
        }
      }, {
        "fixedChar": {
          "length": 25,
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_NULLABLE"
        }
      }, {
        "i64": {
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_REQUIRED"
        }
      }, {
        "varchar": {
          "length": 152,
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_NULLABLE"
        }
      }],
      "typeVariationReference": 0,
      "nullability": "NULLABILITY_REQUIRED"
    }
  },
  "expectedTypeUrls": []
}
```

--------------------------------

### SQL to Substrait Plan Conversion using isthmus-cli

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Converts a SQL CREATE TABLE statement and a SELECT query into a Substrait Plan. This process involves parsing the SQL, mapping it to Substrait's relational algebra, and generating a JSON representation of the plan. The command requires the SQL schema and query as arguments.

```bash
> $ ./isthmus-cli/build/native/nativeCompile/isthmus \
  -c "CREATE TABLE Persons ( firstName VARCHAR, lastName VARCHAR, zip INT )" \
     "SELECT lastName, firstName FROM Persons WHERE zip = 90210"

{
  "extensionUris": [{
    "extensionUriAnchor": 1,
    "uri": "/functions_comparison.yaml"
  }],
  "extensions": [{
    "extensionFunction": {
      "extensionUriReference": 1,
      "functionAnchor": 0,
      "name": "equal:any1_any1"
    }
  }],
  "relations": [{
    "root": {
      "input": {
        "project": {
          "common": {
            "emit": {
              "outputMapping": [3, 4]
            }
          },
          "input": {
            "filter": {
              "common": {
                "direct": {
                }
              },
              "input": {
                "read": {
                  "common": {
                    "direct": {
                    }
                  },
                  "baseSchema": {
                    "names": ["FIRSTNAME", "LASTNAME", "ZIP"],
                    "struct": {
                      "types": [{
                        "string": {
                          "typeVariationReference": 0,
                          "nullability": "NULLABILITY_NULLABLE"
                        }
                      }, {
                        "string": {
                          "typeVariationReference": 0,
                          "nullability": "NULLABILITY_NULLABLE"
                        }
                      }, {
                        "i32": {
                          "typeVariationReference": 0,
                          "nullability": "NULLABILITY_NULLABLE"
                        }
                      }],
                      "typeVariationReference": 0,
                      "nullability": "NULLABILITY_REQUIRED"
                    }
                  },
                  "namedTable": {
                    "names": ["PERSONS"]
                  }
                }
              },
              "condition": {
                "scalarFunction": {
                  "functionReference": 0,
                  "args": [{
                    "selection": {
                      "directReference": {
                        "structField": {
                          "field": 2
                        }
                      },
                      "rootReference": {
                      }
                    }
                  }, {
                    "literal": {
                      "i32": 90210,
                      "nullable": false,
                      "typeVariationReference": 0
                    }
                  }],
                  "outputType": {
                    "bool": {
                      "typeVariationReference": 0,
                      "nullability": "NULLABILITY_NULLABLE"
                    }
                  },
                  "arguments": []
                }
              }
            }
          },
          "expressions": [{
            "selection": {
              "directReference": {
                "structField": {
                  "field": 1
                }
              },
              "rootReference": {
              }
            }
          }, {
            "selection": {
              "directReference": {
                "structField": {
                  "field": 0
                }
              },
              "rootReference": {
              }
            }
          }]
        }
      },
      "names": ["LASTNAME", "FIRSTNAME"]
    }
  }],
  "expectedTypeUrls": []
}
```

--------------------------------

### Add Substrait-Spark Dependency (Maven)

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This snippet shows how to add the Substrait-Spark library as a Maven dependency to your project. Ensure you are using Java 17 or greater.

```xml
<!-- https://mvnrepository.com/artifact/io.substrait/spark -->
<dependency>
    <groupId>io.substrait</groupId>
    <artifactId>spark</artifactId>
    <version>0.36.0</version>
</dependency>
```

--------------------------------

### Filtering using Substrait Java

Source: https://github.com/substrait-io/substrait-java/blob/main/isthmus-cli/README.md

Illustrates how to apply a filter operation using the Substrait Java CLI. It involves a SQL CREATE TABLE statement and a comparison expression for filtering. The output is a JSON representing the Substrait plan.

```shell
$ ./isthmus-cli/build/native/nativeCompile/isthmus -c "CREATE TABLE NATION (N_NATIONKEY BIGINT NOT NULL, N_NAME CHAR(25), N_REGIONKEY BIGINT NOT NULL, N_COMMENT VARCHAR(152))" \
    -e  "N_REGIONKEY > 10"
```

```json
{
  "extensionUris": [{
    "extensionUriAnchor": 1,
    "uri": "/functions_comparison.yaml"
  }],
  "extensions": [{
    "extensionFunction": {
      "extensionUriReference": 1,
      "functionAnchor": 0,
      "name": "gt:any_any"
    }
  }],
  "referredExpr": [{
    "expression": {
      "scalarFunction": {
        "functionReference": 0,
        "args": [],
        "outputType": {
          "bool": {
            "typeVariationReference": 0,
            "nullability": "NULLABILITY_REQUIRED"
          }
        },
        "arguments": [{
          "value": {
            "selection": {
              "directReference": {
                "structField": {
                  "field": 2
                }
              },
              "rootReference": {
              }
            }
          }
        }, {
          "value": {
            "cast": {
              "type": {
                "i64": {
                  "typeVariationReference": 0,
                  "nullability": "NULLABILITY_REQUIRED"
                }
              },
              "input": {
                "literal": {
                  "i32": 10,
                  "nullable": false,
                  "typeVariationReference": 0
                }
              },
              "failureBehavior": "FAILURE_BEHAVIOR_UNSPECIFIED"
            }
          }
        }],
        "options": []
      }
    },
    "outputNames": ["new-column"]
  }],
  "baseSchema": {
    "names": ["N_NATIONKEY", "N_NAME", "N_REGIONKEY", "N_COMMENT"],
    "struct": {
      "types": [{
        "i64": {
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_REQUIRED"
        }
      }, {
        "fixedChar": {
          "length": 25,
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_NULLABLE"
        }
      }, {
        "i64": {
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_REQUIRED"
        }
      }, {
        "varchar": {
          "length": 152,
          "typeVariationReference": 0,
          "nullability": "NULLABILITY_NULLABLE"
        }
      }],
      "typeVariationReference": 0,
      "nullability": "NULLABILITY_REQUIRED"
    }
  },
  "expectedTypeUrls": []
}
```

--------------------------------

### Configure Maven for Substrait-Java Maven Central

Source: https://github.com/substrait-io/substrait-java/blob/main/ci/release/README.md

This XML snippet shows how to declare dependencies for Substrait-Java artifacts (`core` and `isthmus` with version 1.0.0) in your Maven `pom.xml` when using Maven Central as the repository.

```xml
<dependencies>
      ...
      <dependency>
          <groupId>io.substrait</groupId>
          <artifactId>core</artifactId>
          <version>1.0.0</version>
      </dependency>
      <dependency>
          <groupId>io.substrait</groupId>
          <artifactId>isthmus</artifactId>
          <version>1.0.0</version>
      </dependency>
      ...
    </dependencies>
```

--------------------------------

### Reading CSV Data with Spark

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This code snippet demonstrates how to read data from CSV files into Spark Datasets. It specifies options for the delimiter and header, and then displays the content of the datasets. This is a prerequisite for further data manipulation.

```java
      dsVehicles = spark.read().option("delimiter", ",").option("header", "true").csv(vehiclesFile);
      dsVehicles.show();

      dsTests = spark.read().option("delimiter", ",").option("header", "true").csv(testsFile);
      dsTests.show();
```

--------------------------------

### Create and Execute SQL Query in Spark

Source: https://github.com/substrait-io/substrait-java/blob/main/examples/substrait-spark/README.md

This snippet defines a standard SQL query to count vehicles by color that passed a safety test and then executes it using Spark SQL. It utilizes Java's string templating for the SQL query and Spark's sql() method. The input is a SparkSession and the output is a Spark DataFrame with the query results.

```java
      String sqlQuery = """
           SELECT vehicles.colour, count(*) as colourcount
           FROM vehicles
           INNER JOIN tests ON vehicles.vehicle_id=tests.vehicle_id
           WHERE tests.test_result = 'P'
           GROUP BY vehicles.colour
           ORDER BY count(*)
          """;
      var result = spark.sql(sqlQuery);
      result.show();
```