### Build Apache Accumulo from Source with Maven

Source: https://github.com/apache/accumulo/blob/main/README.md

This command uses Maven to compile, test, and package the Apache Accumulo source code into a binary tar.gz archive. The resulting file will be located in 'assemble/target/accumulo-<version>-bin.tar.gz'. Users can append '-DskipTests' to the command to bypass test execution during the build process.

```bash
mvn package
```

--------------------------------

### Apache Accumulo Standalone Cluster Configuration Properties

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Defines the essential properties for setting up and managing an Apache Accumulo standalone cluster. This includes general cluster parameters, paths for client and server configurations, ZooKeeper quorum details, and specific properties for managing user principals and credentials, accommodating both Kerberos-enabled and unsecure installations.

```APIDOC
Configuration Properties for Standalone Clusters:
  accumulo.it.cluster.type:
    Required: Yes
    Description: The type of cluster being defined.
    Valid Options: MINI, STANDALONE
  accumulo.it.cluster.clientconf:
    Required: Yes
    Description: Path to accumulo-client.properties.
  accumulo.it.cluster.standalone.admin.principal:
    Required: Yes
    Description: Standalone cluster principal (user) with all System permissions.
  accumulo.it.cluster.standalone.admin.password:
    Required: Yes (only valid w/o Kerberos)
    Description: Password for the principal.
  accumulo.it.cluster.standalone.admin.keytab:
    Required: Yes (only valid w/ Kerberos)
    Description: Keytab for the principal.
  accumulo.it.cluster.standalone.zookeepers:
    Required: Yes
    Description: ZooKeeper quorum used by the standalone cluster.
  accumulo.it.cluster.standalone.instance.name:
    Required: Yes
    Description: Accumulo instance name for the cluster.
  accumulo.it.cluster.standalone.hadoop.conf:
    Required: Yes
    Description: Hadoop configuration directory.
  accumulo.it.cluster.standalone.home:
    Required: Yes
    Description: Accumulo installation directory on cluster.
  accumulo.it.cluster.standalone.client.conf:
    Required: Yes
    Description: Accumulo conf directory on client.
  accumulo.it.cluster.standalone.server.conf:
    Required: Yes
    Description: Accumulo conf directory on server.
  accumulo.it.cluster.standalone.client.cmd.prefix:
    Required: No (Optional)
    Description: Prefix that will be added to Accumulo client commands.
  accumulo.it.cluster.standalone.server.cmd.prefix:
    Required: No (Optional)
    Description: Prefix that will be added to Accumulo service commands.

User Credential Properties (for Kerberos or unsecure installations):
  Note: Each property is suffixed with an integer ($x) to group keytab/password with username.
  accumulo.it.cluster.standalone.users.$x:
    Required: Yes (when Kerberos enabled or for unsecure)
    Description: The principal name for user $x.
  accumulo.it.cluster.standalone.passwords.$x:
    Required: Yes (only valid w/o Kerberos)
    Description: The password for user $x.
  accumulo.it.cluster.standalone.keytabs.$x:
    Required: Yes (only valid w/ Kerberos)
    Description: The path to the keytab for user $x.

Setting Properties:
  Command Line: -D<property_name>=<value> (e.g., -Daccumulo.it.cluster.standalone.admin.principal=root)
  Properties File: Reference a file using "accumulo.it.properties".
  Precedence: Command line properties override file properties.
```
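As a rough illustration of the properties listed above, the following sketch writes a properties file for a standalone cluster without Kerberos. Every value (paths, principal, passwords, ZooKeeper address, instance name) and the file name `my_cluster.properties` are placeholder assumptions, and starting the user suffix at `0` is also an assumption, since the listing only says the suffix is an integer.

```bash
# Write a sample properties file for a standalone cluster without Kerberos.
# All values are placeholders (assumptions); replace them with your own.
props_file="./my_cluster.properties"   # hypothetical location
cat > "$props_file" <<'EOF'
accumulo.it.cluster.type=STANDALONE
accumulo.it.cluster.clientconf=/path/to/accumulo-client.properties
accumulo.it.cluster.standalone.admin.principal=root
accumulo.it.cluster.standalone.admin.password=secret
accumulo.it.cluster.standalone.zookeepers=localhost:2181
accumulo.it.cluster.standalone.instance.name=test-instance
accumulo.it.cluster.standalone.hadoop.conf=/path/to/hadoop/conf
accumulo.it.cluster.standalone.home=/path/to/accumulo
accumulo.it.cluster.standalone.client.conf=/path/to/accumulo/conf
accumulo.it.cluster.standalone.server.conf=/path/to/accumulo/conf
# One numbered group per test user; the shared suffix pairs user and password.
# Starting the suffix at 0 is an assumption.
accumulo.it.cluster.standalone.users.0=user0
accumulo.it.cluster.standalone.passwords.0=user0pass
EOF

# Report how many accumulo.it.* keys were written.
echo "wrote $(grep -c '^accumulo\.it\.' "$props_file") properties to $props_file"
```

For a Kerberos-enabled cluster, the `password` and `passwords.$x` entries would be replaced by the corresponding `keytab` and `keytabs.$x` entries. Any of these keys passed on the command line as `-D<property_name>=<value>` would take precedence over the value in the file.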

--------------------------------

### Generate and export Accumulo test data using shell commands

Source: https://github.com/apache/accumulo/blob/main/test/src/main/resources/v2_import_test/README.md

This sequence of Accumulo shell commands demonstrates how to create a table ('tableA'), add splits for distribution, insert sample data, compact the table to write data to files, offline the table, and finally export it to a specified HDFS directory. This process prepares a dataset for potential import into another Accumulo instance.

```Accumulo Shell
createtable tableA
addsplits -t tableA 2 4 6
insert -t tableA 1 1
insert -t tableA 1 cf cq 1
insert -t tableA 2 cf cq 2
insert -t tableA 3 cf cq 3
insert -t tableA 4 cf cq 4
insert -t tableA 5 cf cq 5
insert -t tableA 6 cf cq 6
insert -t tableA 7 cf cq 7

compact -w -t tableA

# To see the current tablet files:

scan -t accumulo.metadata -c file -np
offline -t tableA

exporttable -t tableA /accumulo/export_test
```

--------------------------------

### Run SpotBugs Static Code Analysis with Maven

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Perform a thorough static code analysis using SpotBugs, including a security plugin. This command runs the `verify` phase with the `sec-bugs` profile enabled and skips regular tests, focusing solely on code quality and potential security vulnerabilities.

```bash
mvn clean verify -Psec-bugs -DskipTests
```

--------------------------------

### Run a Specific Apache Accumulo Integration Test

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Execute a single integration test, such as `WriteAheadLogIT`, using Maven. `-Dit.test` selects the integration test class, and `-Dspotbugs.skip` skips SpotBugs analysis. `-Dtest=foo` appears intended to name a unit test that does not exist, so that no unit tests run. This is useful for focused debugging or verification. Integration tests require significant memory and disk space (3-4GB RAM, 10GB disk).

```bash
mvn clean verify -Dit.test=WriteAheadLogIT -Dtest=foo -Dspotbugs.skip
```

--------------------------------

### Run Standalone Apache Accumulo Cluster Integration Tests

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Execute integration tests against a pre-configured standalone Accumulo cluster. This command specifies a properties file for the cluster and restricts the run to tests categorized as `StandaloneCapableCluster`, because not all ITs are suitable for standalone execution. Before running, ensure the `accumulo-test` jar is copied to the cluster's lib folder.

```bash
mvn clean verify -Dtest=foo -Daccumulo.it.properties=/home/user/my_cluster.properties -Dfailsafe.groups=StandaloneCapableCluster -Dspotbugs.skip
```

--------------------------------

### Run Apache Accumulo Unit Tests with Maven

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Execute all unit tests for Apache Accumulo by invoking the Maven `package` phase. This ensures all modules are built and resolvable, preventing issues with stale artifacts. The `maven-surefire-plugin` is used by default to run JUnit tests.

```bash
mvn clean package
```

--------------------------------

### Run MiniAccumuloCluster Integration Tests

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Execute integration tests that use MiniAccumuloCluster (MAC), a multi-process Accumulo implementation managed via Java APIs. These tests can use the local filesystem or a MiniDFSCluster. They run by default during the `integration-test` lifecycle phase, though they incur extra startup and shutdown time.

```bash
mvn clean verify -Dspotbugs.skip
```

--------------------------------

### Create HDFS directory for Accumulo export

Source: https://github.com/apache/accumulo/blob/main/test/src/main/resources/v2_import_test/README.md

This command creates a new directory under '/accumulo' in HDFS, which will serve as the destination for the exported Accumulo table data. Run it before initiating the table export.

```HDFS Shell
hadoop fs -mkdir /accumulo/export_test
```

--------------------------------

### Run SunnyDay Apache Accumulo Integration Tests

Source: https://github.com/apache/accumulo/blob/main/TESTING.md

Execute the `SunnyDay` category of integration tests, which represent a minimal set designed to verify basic Accumulo functionality. These tests are typically run before submitting patches or bug fixes to quickly ensure no core functions were broken by changes.

```bash
mvn clean verify -Psunny
```

--------------------------------

### View contents of Accumulo export distcp file

Source: https://github.com/apache/accumulo/blob/main/test/src/main/resources/v2_import_test/README.md

This command uses the Hadoop HDFS shell to display the contents of the 'distcp.txt' file, which is generated during the Accumulo table export. This file lists the paths to the tablet files and the export metadata zip file that were part of the exported dataset.

```HDFS Shell
hadoop fs -cat /accumulo/export_test/distcp.txt
```

--------------------------------

### Mutation Version 2 Serialization Format

Source: https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/data/doc-files/mutation-serialization.html

Describes the byte-level layout for the Version 2 serialization format of Apache Accumulo's `Mutation` data. This format introduces a control byte, uses variable-length encoding for integers and longs, and details the main data layout and the 'data' block layout for each entry, including fields like row ID, data length, column families, qualifiers, visibilities, timestamps, and values.

```APIDOC
Mutation Version 2 Data Serialization Layout:
  byte 0: control byte (top bit = 1 for version 2; bottom bit = values present flag)
  next integer: length of row ID
  next _n_ bytes: row ID
  next integer: data length
  next _n_ bytes: data (see "Version 2 Data Block Layout")
  next integer: number of entries
  next integer: number of values (if values present flag is set)
  next _n_ sets of integers and byte arrays: _n_ value lengths and value data bytes (if values present flag is set)
  (Note: integers and longs use variable-length encoding.)

Version 2 "data" Block Layout for Each Entry:
  first long and byte array: column family length and bytes
  next long and byte array: column qualifier length and bytes
  next long and byte array: column visibility length and bytes
  next boolean: has timestamp flag
  next long: timestamp (only present if timestamp flag is set)
  next boolean: deleted flag
  next long: value length (if negative, value bytes are same as already-read value (-length - 1))
  next _n_ bytes: value bytes (if value length is non-negative)
```
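The control byte and the negative value length can be illustrated with a little shell arithmetic. This is a minimal sketch, not Accumulo's implementation: the function names `decode_control_byte` and `value_index` are hypothetical, and reading `(-length - 1)` as an index into the list of values is an assumption based on the layout above.

```bash
# Decode a control byte given as a decimal or 0x-prefixed integer.
# Top bit (0x80) marks version 2; bottom bit (0x01) is the values present flag.
decode_control_byte() {
  byte=$(( $1 ))
  if [ $(( byte & 0x80 )) -ne 0 ]; then version2=yes; else version2=no; fi
  if [ $(( byte & 0x01 )) -ne 0 ]; then values=yes; else values=no; fi
  echo "version2=$version2 values_present=$values"
}

# Map a negative value length to (-length - 1).
# Assumption: the result is an index into the list of values.
value_index() {
  echo $(( -($1) - 1 ))
}

decode_control_byte 0x81   # prints "version2=yes values_present=yes"
value_index -3             # prints "2"
```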

--------------------------------

### Mutation Version 1 Serialization Format

Source: https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/data/doc-files/mutation-serialization.html

Describes the byte-level layout for the Version 1 serialization format of Apache Accumulo's `Mutation` data. This format includes a main data layout and a specific 'data' block layout for each entry, detailing fields like row ID, data length, column families, qualifiers, visibilities, timestamps, and values.

```APIDOC
Mutation Version 1 Data Serialization Layout:
  bytes 0-3: 4-byte integer (length of row ID)
  next _n_ bytes: row ID
  next integer: data length
  next _n_ bytes: data (see "Version 1 Data Block Layout")
  next integer: number of entries
  next boolean: values present flag
  next integer: number of values (if values present flag is set)
  next _n_ sets of integers and byte arrays: _n_ value lengths and value data bytes (if values present flag is set)

Version 1 "data" Block Layout for Each Entry:
  first integer and byte array: column family length and bytes
  next integer and byte array: column qualifier length and bytes
  next integer and byte array: column visibility length and bytes
  next boolean: has timestamp flag
  next long: timestamp
  next boolean: deleted flag
  next integer: value length (if negative, value bytes are same as already-read value (-length - 1))
  next _n_ bytes: value bytes (if value length is non-negative)
```
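To make the start of the Version 1 layout concrete, the sketch below builds a tiny sample containing only the row ID length and the row ID, then reads them back. It assumes the 4-byte integer is big-endian (the usual Java `DataOutput` convention, which the layout above does not state), and the scratch file name is hypothetical.

```bash
sample="./mutation_v1_sample.bin"   # hypothetical scratch file

# Row ID length 3 as a 4-byte integer, followed by the row ID bytes "abc".
printf '\000\000\000\003abc' > "$sample"

# Read bytes 0-3 as unsigned decimals and combine them.
# Big-endian byte order is an assumption.
set -- $(od -An -tu1 -N4 "$sample")
row_len=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))

# The row ID is the next row_len bytes after the 4-byte length.
row_id=$(dd if="$sample" bs=1 skip=4 count="$row_len" 2>/dev/null)

echo "row_len=$row_len row_id=$row_id"   # prints "row_len=3 row_id=abc"
```

The remaining fields (data length, data block, entry count, values) would follow the same pattern of reading a length and then that many bytes.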
