### Start ServerKillingMonkeyFactory

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/building-and-developing/tests.mdx

Example command to start ChaosMonkey with ServerKillingMonkeyFactory, which includes actions like rolling restarts of RegionServers and Masters.

```bash
$ bin/hbase org.apache.hadoop.hbase.chaos.util.ChaosMonkeyRunner -m org.apache.hadoop.hbase.chaos.factories.ServerKillingMonkeyFactory
```

--------------------------------

### HBase Setup and ZooKeeper Startup Commands

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/zookeeper.mdx

Commands to clone the HBase repository, run tests, and start ZooKeeper, master, and regionserver processes for initial configuration.

```bash
git clone https://gitbox.apache.org/repos/asf/hbase.git
cd hbase
mvn clean test -Dtest=TestZooKeeperACL
```

```bash
bin/hbase zookeeper &
bin/hbase master &
bin/hbase regionserver &
```

--------------------------------

### Starting HBase Cluster Components

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/zookeeper.mdx

Commands to start the ZooKeeper, Master, and RegionServer components of an HBase cluster. These commands should be executed on the appropriate hosts as per your cluster setup.

```bash
bin/hbase zookeeper start
```

```bash
bin/hbase master start
```

```bash
bin/hbase regionserver start
```

--------------------------------

### Setting up Symlinks for Native Libraries

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/compression.mdx

This example demonstrates how to create symbolic links to Hadoop's native libraries within the HBase directory structure. This is a common method to make them discoverable by HBase, especially when Hadoop and HBase installations are in adjacent locations.

```bash
$ mkdir -p ~/hbase/lib/native/
$ cd ~/hbase/lib/native/
$ ln -s ~/hadoop/lib/native Linux-amd64-64
$ ls -la
Linux-amd64-64 -> /home/stack/hadoop/lib/native
```

--------------------------------

### Starting HBase Cluster Components

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Commands to start HBase cluster services. Ensure Kerberos is properly configured before starting.

```bash
bin/hbase zookeeper start
bin/hbase master start
bin/hbase regionserver start
```

--------------------------------

### Minor Compaction File Selection - Example 2

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/architecture/regions.mdx

Illustrates a scenario where no compaction is initiated because too few files qualify to meet the minimum threshold. This example uses the same parameters as Example 1 but with different file sizes.

```text
- hbase.hstore.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)

The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest). With the above parameters, no compaction will be started.

Why?
- 100 → No, because sum(25, 12, 12) * 1.0 = 47
- 25 → No, because sum(12, 12) * 1.0 = 24
- 12 → No. Candidate because sum(12) * 1.0 = 12, there are only 2 files to compact and that is less than the threshold of 3
- 12 → No. Candidate because the previous StoreFile was, but there are not enough files to compact
```

--------------------------------

### Install Dependencies

Source: https://github.com/apache/hbase/blob/master/dev-support/git-jira-release-audit/README.md

Install project dependencies from the requirements.txt file into the active virtual environment.

```shell
$ ./venv/bin/pip install -r ./requirements.txt
...
Successfully installed...
```

--------------------------------

### Start HBase Services

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/configuration/confirm.mdx

Run this command from the HBASE_HOME directory to start the HBase services. Ensure HDFS and ZooKeeper are running beforehand, or HBase will start them for you.

```bash
bin/start-hbase.sh
```

--------------------------------

### Start a New Regionserver

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/operational-management/node.mdx

Starts a new regionserver process. The new regionserver will register itself with the master. It is recommended to also start a DataNode on the same machine for local file access.

```bash
$ ./bin/hbase-daemon.sh start regionserver
```

--------------------------------

### Install HBase JARs Locally

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/building-and-developing/building.mdx

The 'install' target compiles HBase and installs the resulting JARs into your local Maven repository (~/.m2/).

```bash
mvn install
```

--------------------------------

### HBase Website Download Link Example

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Example of a link to download HBase releases from the official website.

```text
https://hbase.apache.org/downloads
```

--------------------------------

### Starting HBase Cluster

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/zookeeper.mdx

Commands to start HBase master and regionserver processes.

```bash
bin/hbase master start
bin/hbase regionserver start
```

--------------------------------

### Install Docker CE

Source: https://github.com/apache/hbase/blob/master/dev-support/create-release/README.txt

Install Docker Community Edition (CE), the command-line interface (CLI), and containerd.io on your system.

```bash
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
```

--------------------------------

### HBase Cluster Startup Output Example

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Example output demonstrating the sequence of service startups when 'start-hbase.sh' is executed. It shows ZooKeeper, master, and regionserver initialization on different nodes.

```text
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
```

--------------------------------

### BulkPut Example with DStreams in Scala

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Shows a basic setup for using bulkPut with Spark Streaming DStreams in Scala. This example requires an existing SparkContext and HBaseConfiguration.

```scala
val sc = new SparkContext("local", "test")
val config = new HBaseConfiguration()
```

--------------------------------

### Test Docker Installation

Source: https://github.com/apache/hbase/blob/master/dev-support/create-release/README.txt

Run the 'hello-world' Docker image to verify that Docker is installed and functioning correctly. This is a standard post-installation test.

```bash
docker run hello-world
```

--------------------------------

### HBase Table Region Boundaries Example

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/bulk-data-generator-tool.mdx

Illustrates the start and end keys for regions in a pre-split HBase table. This example shows a table with a split count of 10, resulting in 11 regions.

```text
(-000001, 000001-000002, 000002-000003, ...., 000009-000010, 0000010-)
```

--------------------------------

### Example: Create Full Backup with Specific Tables

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/backup-restore/commands.mdx

This example demonstrates creating a full backup for specific tables (SALES2, SALES3) to an HDFS path, utilizing three parallel workers for the operation.

```bash
$ hbase backup create full hdfs://host5:9000/data/backup -t SALES2,SALES3 -w 3
```

--------------------------------

### HBase MapReduce Mapper Accessing Other Tables Example

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Example of a Mapper that accesses another HBase table for lookups. An HBase Table instance is created in the setup method using a Connection.

```Java
public class MyMapper extends TableMapper<Text, LongWritable> {
  private Table myOtherTable;

  public void setup(Context context) throws IOException {
    // In here create a Connection to the cluster and save it or use the Connection
    // from the existing table ('connection' below stands for that Connection)
    myOtherTable = connection.getTable(TableName.valueOf("myOtherTable"));
  }

  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
    // process Result...
    // use 'myOtherTable' for lookups
  }
}
```

--------------------------------

### Initiate Release Build

Source: https://github.com/apache/hbase/blob/master/dev-support/release-vm/README.md

Start the release build process within the Guest VM using the provided script. Ensure you have created the necessary build directory beforehand.

```shell
guest:~$ mkdir ~/build-2.3.1-rc0
guest:~$ cd repos/hbase
guest:~/repos/hbase$ ./dev-support/create-release/do-release-docker.sh -d ~/build-2.3.1-rc0/ ...
```

--------------------------------

### Check Python Version

Source: https://github.com/apache/hbase/blob/master/dev-support/git-jira-release-audit/README.md

Verify that Python 3.7.6 or a compatible version is installed before proceeding with setup.

```shell
$ python3 --version
Python 3.7.6
```

--------------------------------

### Import and Verify GPG in Guest

Source: https://github.com/apache/hbase/blob/master/dev-support/release-vm/README.md

Import your public GPG key into the Guest VM and verify GPG agent passthrough by signing and verifying a file. Always use '--no-autostart' for GPG commands in the Guest.

```shell
guest:~$ gpg --no-autostart --import /vagrant/gpg..apache.pub
...
gpg: Total number processed: 1
gpg: imported: 1
guest:~$ gpg --no-autostart --detach --armor --sign repos/hbase/pom.xml
guest:~$ gpg --no-autostart --verify repos/hbase/pom.xml.asc
gpg: assuming signed data in 'repos/hbase/pom.xml'
...
guest:~$ rm repos/hbase/pom.xml.asc
```

--------------------------------

### Example HBase:slowlog Scan Result Rows

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/operational-management/metrics-and-monitoring.mdx

This example shows the structure of data stored in the hbase:slowlog system table, including details like call details, client address, method name, and start time.

```text
\x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347
\x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan
\x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_metrics: false
\x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18
\x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0
\x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1
\x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575
\x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer
\x024\xC1\x03\xE9\x04\xF5@
column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732
\x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL
\x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2
\x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
```

--------------------------------

### Display Help for Release Build Script

Source: https://github.com/apache/hbase/blob/master/dev-support/create-release/README.txt

Execute this command to view the available options and usage instructions for the release-build.sh script.

```bash
$ release-build.sh --help
```

--------------------------------

### Run ChaosMonkey Runner

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/building-and-developing/tests.mdx

Start the ChaosMonkey runner from a single host to initiate chaos testing. This example uses the default PeriodicRandomActionPolicy.

```bash
$ bin/hbase org.apache.hadoop.hbase.chaos.util.ChaosMonkeyRunner
```

--------------------------------

### Example WALPlayer Usage

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/operational-management/tools.mdx

An example demonstrating how to use WALPlayer to replay WAL files from a backup directory, mapping old tables to new ones.

```bash
$ bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2
```

--------------------------------

### CopyTable Example with Specific Options

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Example of copying a table with specified start and end times, peer cluster address, and column families. Note the use of --peer.adr for older versions and the recommended scanner caching and speculative execution settings.

```bash
bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
```

--------------------------------

### Filter by Regex String Comparator

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/architecture/client-request-filters.mdx

Use RegexStringComparator with SingleColumnValueFilter to perform regular expression matching on column values. This example filters for values starting with 'my'.

```java
RegexStringComparator comp = new RegexStringComparator("my."); // any value that starts with 'my'
SingleColumnValueFilter filter = new SingleColumnValueFilter(
  cf,
  column,
  CompareOperator.EQUAL,
  comp
);
scan.setFilter(filter);
```

--------------------------------

### Minor Compaction File Selection - Example 1

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/architecture/regions.mdx

Demonstrates a basic scenario for minor compaction file selection with specific parameters. It shows which files are selected based on size and the compaction ratio.

```text
- hbase.hstore.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)

The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.

Why?
- 100 → No, because sum(50, 23, 12, 12) * 1.0 = 97.
- 50 → No, because sum(23, 12, 12) * 1.0 = 47.
- 23 → Yes, because sum(12, 12) * 1.0 = 24.
- 12 → Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
- 12 → Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
```

--------------------------------

### Build Binary Tarball with Maven (Two Steps)

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Build the binary tarball using Maven. Running the build in two steps is recommended: first install to the local repository, then generate the site documentation and assemble the tarball.

```bash
mvn clean install -DskipTests -Prelease
```

```bash
mvn install -DskipTests site assembly:single -Prelease
```

--------------------------------

### Invoke Endpoint Coprocessor from Client

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Create a connection to HBase, get a table instance, and use `coprocessorService` to invoke the endpoint. This example sends a request to the `getSum` method.

```java
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
TableName tableName = TableName.valueOf("users");
Table table = connection.getTable(tableName);

final Sum.SumRequest request = Sum.SumRequest.newBuilder().setFamily("salaryDet").setColumn("gross").build();
try {
  // One partial sum per region, keyed by region name
  Map<byte[], Long> results = table.coprocessorService(
    Sum.SumService.class,
    null, /* start key */
    null, /* end key */
    new Batch.Call<Sum.SumService, Long>() {
      @Override
      public Long call(Sum.SumService aggregate) throws IOException {
        BlockingRpcCallback<Sum.SumResponse> rpcCallback = new BlockingRpcCallback<>();
        aggregate.getSum(null, request, rpcCallback);
        Sum.SumResponse response = rpcCallback.get();
        return response.hasSum() ? response.getSum() : 0L;
      }
    });
} catch (Throwable e) {
  e.printStackTrace();
}
```

--------------------------------

### HBase REST GET: Multi-Get with URL-Encoded Filter

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Performs a Multi-Get operation with a filter specified in Thrift Filter Language and URL-encoded. This example uses PrefixFilter('row1').

```bash
curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/multiget?row=row1&row=row2/cf:a&filter=PrefixFilter%28%27row1%27%29"
```

--------------------------------

### LoadTestTool Example: Write and Read

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Demonstrates how to use LoadTestTool for a write and read workload. Specifies write parameters, number of keys, read verification percentage, number of tables, data block encoding, and table name.

```bash
$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 -read 100:30 -num_tables 1 -data_block_encoding NONE -tn load_test_tool_NONE
```

--------------------------------

### Set Canary Test Timeout

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/operational-management/tools.mdx

Enforce a timeout for Canary tests to prevent them from getting stuck on unresponsive RegionServers. This example sets the timeout to 60 seconds (60000 milliseconds).

```bash
$ ${HBASE_HOME}/bin/hbase canary -t 60000
```

--------------------------------

### Start Backup HMaster Servers

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Use the local-master-backup.sh script to start backup HMaster instances for testing. Provide port offsets to avoid conflicts with the primary HMaster.

```bash
$ ./bin/local-master-backup.sh start 2 3 5
```

--------------------------------

### Get Slow Log Responses from Specific RegionServers

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Retrieve slow log responses from a specified list of RegionServers. Server names should include host, port, and start code.

```shell
hbase> get_slowlog_responses ['SERVER_NAME1', 'SERVER_NAME2']
```

--------------------------------

### Start Local HBase Cluster

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/building-and-developing/developer-guidelines.mdx

Run HBase directly from source in local-mode for testing changes against a more realistic cluster. Ensure HBase is installed in your local Maven repository first.

```bash
${HBASE_HOME}/bin/start-hbase.sh
```

--------------------------------

### HBase REST GET: Multi-Get with Base64 Encoded Filter

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Performs a Multi-Get operation with a filter specified in Thrift Filter Language and URL-safe base64 encoded. This example uses PrefixFilter('row1').

```bash
curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/multiget?row=row1&row=row2/cf:a&filter_b64=UHJlZml4RmlsdGVyKCdyb3cxJyk"
```

--------------------------------

### Override HBase Configuration in Shell

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/shell.mdx

Override HBase configuration properties when starting the shell by using the `-D` flag followed by the configuration key and value. This example sets the ZooKeeper quorum and a custom property.

```bash
$ ./bin/hbase shell -Dhbase.zookeeper.quorum=ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org -Draining=false
...
```

```jruby
hbase(main):001:0> @shell.hbase.configuration.get("hbase.zookeeper.quorum")
=> "ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org"
hbase(main):002:0> @shell.hbase.configuration.get("raining")
=> "false"
```

--------------------------------

### HBase CompactionTool Examples

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

These examples demonstrate how to use the CompactionTool for different compaction scenarios, including using MapReduce and targeting specific column families.

```bash
$ hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred hdfs://hbase/data/default/TestTable
```

```bash
$ hbase org.apache.hadoop.hbase.regionserver.CompactionTool hdfs://hbase/data/default/TestTable/abc/x
```

--------------------------------

### Import GPG Key and Test Signing

Source: https://github.com/apache/hbase/blob/master/dev-support/create-release/README.txt

After SSHing into the VM, import your public GPG key and test the signing process. Ensure your local GPG agent is accessible via the forwarded socket.

```bash
gpg --no-autostart --import gpg.example.apache.pub
```

```bash
echo "foo" > foo.txt
```

```bash
gpg --no-autostart --detach --armor --sign foo.txt
```

```bash
gpg --no-autostart --verify foo.txt.asc
```

--------------------------------

### Integration Test with TestingHBaseCluster (HBase 2.5.0+)

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/unit-testing.mdx

Recommended for HBase 2.5.0 and later, this example uses TestingHBaseCluster for integration tests. It includes setup and teardown methods for the cluster, connection, and admin, then performs an insert and verification.

```java
public class MyHBaseIntegrationTest {

  private TestingHBaseCluster cluster;
  private Connection conn;
  private Admin admin;
  private TableName tableName = TableName.valueOf("MyTest");

  byte[] CF = "CF".getBytes();
  byte[] CQ1 = "CQ-1".getBytes();
  byte[] CQ2 = "CQ-2".getBytes();

  @Before
  public void setUp() throws Exception {
    cluster = TestingHBaseCluster.create(TestingHBaseClusterOption.builder().build());
    cluster.start();
    conn = ConnectionFactory.createConnection(cluster.getConf());
    admin = conn.getAdmin();
    admin.createTable(TableDescriptorBuilder.newBuilder(tableName)
      .setColumnFamily(ColumnFamilyDescriptorBuilder.of(CF)).build());
  }

  @After
  public void tearDown() throws Exception {
    admin.close();
    conn.close();
    cluster.stop();
  }

  @Test
  public void testInsert() throws Exception {
    try (Table table = conn.getTable(tableName)) {
      HBaseTestObj obj = new HBaseTestObj();
      obj.setRowKey("ROWKEY-1");
      obj.setData1("DATA-1");
      obj.setData2("DATA-2");
      MyHBaseDAO.insertRecord(table, obj);

      Get get1 = new Get(Bytes.toBytes(obj.getRowKey()));
      get1.addColumn(CF, CQ1);
      Result result1 = table.get(get1);
      assertEquals(Bytes.toString(result1.getRow()), obj.getRowKey());
      assertEquals(Bytes.toString(result1.value()), obj.getData1());

      Get get2 = new Get(Bytes.toBytes(obj.getRowKey()));
      get2.addColumn(CF, CQ2);
      Result result2 = table.get(get2);
      assertEquals(Bytes.toString(result2.getRow()), obj.getRowKey());
      assertEquals(Bytes.toString(result2.value()), obj.getData2());
    }
  }
}
```

--------------------------------

### SyncTable Example Command

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Example of running SyncTable for a dry run from a remote source cluster to a local target cluster using ZooKeeper cluster keys.

```bash
$ bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true --sourcezkcluster=zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase hdfs://nn:9000/hashes/tableA tableA tableA
```

--------------------------------

### HBase Table Creation, Population, Get, and Delete with Jython

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

This Jython script demonstrates how to create, populate, fetch, and delete an HBase table. It includes setup for connection, table descriptor, column families, and data insertion/retrieval.

```Jython
import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, TableName
from org.apache.hadoop.hbase.client import Admin, Connection, ConnectionFactory, Get, Put, Result, Table
from org.apache.hadoop.conf import Configuration

# First get a conf object. This will read in the configuration
# that is out in your hbase-*.xml files such as location of the
# hbase master node.
conf = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(conf)
admin = connection.getAdmin()

# Create a table named 'test' that has a column family
# named 'content'.
tableName = TableName.valueOf("test")
table = connection.getTable(tableName)

desc = HTableDescriptor(tableName)
desc.addFamily(HColumnDescriptor("content"))

# Drop and recreate if it exists
if admin.tableExists(tableName):
    admin.disableTable(tableName)
    admin.deleteTable(tableName)

admin.createTable(desc)

# Add content to 'column:' on a row named 'row_x'
row = 'row_x'
put = Put(row)
put.addColumn("content", "qual", "some content")
table.put(put)

# Now fetch the content just added, returns a byte[]
get = Get(row)

result = table.get(get)
data = java.lang.String(result.getValue("content", "qual"), "UTF8")

print "The fetched row contains the value '%s'" % data
```

--------------------------------

### Example GC Log Entry: CMS-initial-mark

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

This log entry indicates the start of a CMS (Concurrent Mark Sweep) garbage collection cycle, specifically the initial marking phase. The duration shown is the time the entire VM was paused.

```text
64898.952: [GC [1 CMS-initial-mark: 2811538K(3055704K)] 2812179K(3061272K), 0.0007360 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
```

--------------------------------

### Monitor System Resources with top

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Use 'top' to view real-time system process information and resource consumption. Typing 'c' expands process details, and '1' shows per-CPU usage.

```bash
top - 14:46:59 up 39 days, 11:55, 1 user, load average: 3.75, 3.57, 3.84
Tasks: 309 total, 1 running, 308 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.5%us, 1.6%sy, 0.0%ni, 91.7%id, 1.4%wa, 0.1%hi, 0.6%si, 0.0%st
Mem: 24414432k total, 24296956k used, 117476k free, 7196k buffers
Swap: 16008732k total, 14348k used, 15994384k free, 11106908k cached

  PID USER    PR  NI  VIRT  RES   SHR  S %CPU %MEM   TIME+   COMMAND
15558 hadoop  18  -2  3292m 2.4g  3556 S   79 10.4  6523:52  java
13268 hadoop  18  -2  8967m 8.2g  4104 S   21 35.1  5170:30  java
 8895 hadoop  18  -2  1581m 497m  3420 S   11  2.1  4002:32  java
…
```

--------------------------------

### Build Protobuf for Apple Silicon

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/building-and-developing/building.mdx

Commands to download, patch, configure, and build the protobuf 2.5.0 library for Apple Silicon (aarch64) and install it locally for Maven to use.

```bash
curl -sSL https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz | tar zx -
cd protobuf-2.5.0
curl -L -O https://gist.githubusercontent.com/liusheng/64aee1b27de037f8b9ccf1873b82c413/raw/118c2fce733a9a62a03281753572a45b6efb8639/protobuf-2.5.0-arm64.patch
patch -p1 < protobuf-2.5.0-arm64.patch
./configure --disable-shared
make
mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=2.5.0 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=src/protoc
```

--------------------------------

### HBase MapReduce Job Setup for File Output

Source: https://github.com/apache/hbase/blob/master/hbase-website/public/book.html

Configures a MapReduce job to read from an HBase table and write summarized results to HDFS. The mapper remains the same as the table output example, but a different reducer and output path are specified.

```java
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer

Scan scan = new Scan();
scan.setCaching(500);       // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
  sourceTable,       // input table
  scan,              // Scan instance to control CF and attribute selection
  MyMapper.class,    // mapper class
  Text.class,        // mapper output key
  IntWritable.class, // mapper output value
  job);
job.setReducerClass(MyReducer.class); // reducer class
job.setNumReduceTasks(1);             // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as required

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
```

--------------------------------

### Example LoadTestTool Command for Performance Testing

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/compression.mdx

An example command demonstrating how to use the LoadTestTool to test write and read performance with specific configurations for data block encoding and table naming.

```bash
$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:10:100 -num_keys 1000000 \
  -read 100:30 -num_tables 1 -data_block_encoding NONE -tn load_test_tool_NONE
```

--------------------------------

### HBase Get with Byte Array Conversion

Source: https://github.com/apache/hbase/blob/master/hbase-website/app/pages/_docs/docs/_mdx/(multi-page)/performance.mdx

This example demonstrates a common but inefficient way to retrieve data from HBase by repeatedly converting column family and attribute names to byte arrays. This can be costly, especially within loops or MapReduce jobs.

```java
Get get = new Get(rowkey);
Result r = table.get(get);
byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr")); // returns current version of value
```
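
The cost described above comes from re-encoding the same strings on every call. A minimal sketch of the usual remedy, encoding each name once into a constant, is shown below using only the Java standard library. The class name `ColumnBytes` and the helper `encode` are hypothetical and exist only for illustration; `String.getBytes(UTF_8)` stands in for HBase's `Bytes.toBytes(String)` so the example runs without an HBase dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical holder class: column names encoded once and reused. */
final class ColumnBytes {
    // Encoded a single time when the class is loaded.
    static final byte[] CF = "cf".getBytes(StandardCharsets.UTF_8);
    static final byte[] ATTR = "attr".getBytes(StandardCharsets.UTF_8);

    // Stand-in for a per-call conversion: allocates a fresh array on every invocation.
    static byte[] encode(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] perCall = encode("cf");
        System.out.println(Arrays.equals(perCall, CF)); // prints true: same contents
        System.out.println(perCall == CF);              // prints false: a new array was allocated
    }
}
```

With constants like these in place, the lookup in the snippet above would presumably become `r.getValue(CF, ATTR)`, which avoids allocating two arrays per call inside a loop or a MapReduce task.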