### Basic GET Request Example

Source: https://github.com/apache/amoro/blob/master/http/README.md

This example demonstrates a basic GET request with common headers like Content-Type and Authorization. Ensure you replace `` with your actual authentication token.

```http
GET http://localhost:8080/api/example
Content-Type: application/json
Authorization: Bearer
```

--------------------------------

### Start Development Server

Source: https://github.com/apache/amoro/blob/master/amoro-web/README.md

Run this command in the `amoro-web` directory to start the development server if you are not a frontend developer.

```bash
pnpm dev
```

--------------------------------

### Start Mock Development Server

Source: https://github.com/apache/amoro/blob/master/amoro-web/README.md

Run this command in the `amoro-web` directory to start a development server with mock data for the dashboard app.

```bash
pnpm dev:mock
```

--------------------------------

### Docker Compose for Amoro Environment Setup

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/quick-start.md

Use this docker-compose.yml file to set up MinIO object storage and the Amoro service. Ensure the Docker and Docker Compose CLIs are installed.

```yaml
version: "3"
services:
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    networks:
      amoro_network:
        aliases:
          - warehouse.minio
    ports:
      - 9001:9001
      - 9000:9000
    command: [ "server", "/data", "--console-address", ":9001" ]
  mc:
    depends_on:
      - minio
    image: minio/mc
    container_name: mc
    networks:
      amoro_network:
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc alias set minio http://minio:9000 admin password) do echo '...waiting...'
      && sleep 1; done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc anonymous set public minio/warehouse;
      tail -f /dev/null
      "
  amoro:
    image: apache/amoro
    container_name: amoro
    ports:
      - 8081:8081
      - 1630:1630
      - 1260:1260
    environment:
      - JVM_XMS=1024
    networks:
      amoro_network:
    volumes:
      - ./amoro:/tmp/warehouse
    command: ["/entrypoint.sh", "ams"]
    tty: true
    stdin_open: true

networks:
  amoro_network:
    driver: bridge
```

--------------------------------

### Get Helm Charts and Build Dependencies

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment-on-kubernetes.md

Clone the Amoro repository and navigate to the charts directory. Build Helm chart dependencies before installation.

```shell
git clone https://github.com/apache/amoro.git
cd amoro/charts
helm dependency build ./amoro
```

--------------------------------

### Install Thrift Compiler

Source: https://github.com/apache/amoro/blob/master/amoro-common/src/main/thrift/README.md

Installs the Thrift compiler from source. Ensure you have build tools and dependencies installed.

```shell
wget -nv https://archive.apache.org/dist/thrift/0.20.0/thrift-0.20.0.tar.gz
tar xzf thrift-0.20.0.tar.gz
cd thrift-0.20.0
chmod +x ./configure
./configure --disable-libs
sudo make install -j
```

--------------------------------

### Frontend Development Server

Source: https://github.com/apache/amoro/blob/master/AGENTS.md

Starts the frontend development server with mock data enabled. Used for developing the Amoro dashboard UI.

```bash
pnpm install && pnpm dev:mock
```

--------------------------------

### Example Login Response with Roles and Privileges

Source: https://github.com/apache/amoro/blob/master/docs/configuration/ams-config.md

An example JSON response from the `/login/current` endpoint, showing the authenticated user's name, assigned roles, and effective privileges.
```json
{
  "userName": "alice",
  "roles": ["CATALOG_ADMIN"],
  "privileges": [
    "VIEW_CATALOG",
    "MANAGE_CATALOG",
    "VIEW_TABLE",
    "MANAGE_TABLE"
  ]
}
```

--------------------------------

### Run Amoro Docs Site Locally

Source: https://github.com/apache/amoro/blob/master/site/README.md

Command to start the Amoro documentation site locally using Hugo.

```shell
(cd site/amoro-docs && hugo serve)
```

--------------------------------

### Run Amoro Site Locally

Source: https://github.com/apache/amoro/blob/master/site/README.md

Command to start the main Amoro site locally using Hugo.

```shell
(cd site/amoro-site && hugo serve)
```

--------------------------------

### Toolchains XML Configuration for JDK 17

Source: https://github.com/apache/amoro/blob/master/README.md

Example configuration for Maven toolchains.xml to specify JDK 17. Replace '${YourJDK17Home}' with the actual path to your JDK 17 installation.

```xml
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>17</version>
      <vendor>sun</vendor>
    </provides>
    <configuration>
      <jdkHome>${YourJDK17Home}</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
```

--------------------------------

### Install helm-unittest Plugin

Source: https://github.com/apache/amoro/blob/master/charts/amoro/README.md

Install the helm-unittest plugin, which is required for running unit tests on the Amoro Helm chart.

```shell
helm plugin install https://github.com/helm-unittest/helm-unittest.git
```

--------------------------------

### Start Amoro AMS Service

Source: https://github.com/apache/amoro/blob/master/CONTRIBUTING.md

Start the Amoro AMS service by running the main container class in IntelliJ IDEA. Assumes the project is already imported and configured.

```java
{base_dir}/amoro-ams/src/main/java/org/apache/amoro/server/AmoroServiceContainer.java
```

--------------------------------

### Install Amoro with Helm

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment-on-kubernetes.md

Install Amoro on your Kubernetes cluster using the Helm chart. Replace `` with your desired name.
```shell
helm install ./amoro
```

--------------------------------

### Start, Restart, and Stop AMS Service

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment.md

Navigate to the Amoro installation directory and use the `ams.sh` script to manage the AMS service. Ensure you are in the `amoro-x.y.z` directory before executing commands.

```shell
$ cd amoro-x.y.z
$ bin/ams.sh start
```

```shell
$ bin/ams.sh restart
```

```shell
$ bin/ams.sh stop
```

--------------------------------

### Start Docker Compose Containers

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/quick-start.md

Execute this command in your terminal to launch the Amoro environment defined in your docker-compose.yml file.

```shell
docker-compose up
```

--------------------------------

### Deploy Amoro with Custom Values

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment-on-kubernetes.md

Install Amoro using a custom configuration file. This allows for specific overrides of default Helm chart settings.

```shell
helm install ./amoro -f my-values.yaml
```

--------------------------------

### Start Standalone Optimizer

Source: https://github.com/apache/amoro/blob/master/CONTRIBUTING.md

Run the standalone optimizer in IntelliJ IDEA with specific program arguments. This allows for local testing and debugging of optimizer functionalities.

```java
{base_dir}/amoro-optimizer/amoro-optimizer-standalone/src/main/java/org/apache/amoro/optimizer/standalone/StandaloneOptimizer.java
```

--------------------------------

### Build Project for Deployment

Source: https://github.com/apache/amoro/blob/master/amoro-web/README.md

Execute this command in the `amoro-web` directory to prepare the dashboard for deployment.

```bash
pnpm build
```

--------------------------------

### Run Optimizer with Parameters

Source: https://github.com/apache/amoro/blob/master/CONTRIBUTING.md

Execute the standalone optimizer with command-line arguments.
The parameters specify the Thrift server address, parallelism, and group name for the optimizer.

```shell
-a thrift://127.0.0.1:1261 -p 1 -g local
```

--------------------------------

### Build Amoro with Aliyun OSS SDK

Source: https://github.com/apache/amoro/blob/master/README.md

Builds modules with the Aliyun OSS SDK enabled. Tests are skipped.

```bash
./mvnw clean package -DskipTests -Paliyun-oss-sdk
```

--------------------------------

### Amoro Documentation and Build Info Links

Source: https://github.com/apache/amoro/blob/master/charts/amoro/templates/NOTES.txt

Links to documentation and version build information are provided. The links differ based on whether the chart is deployed from a master snapshot or a specific version.

```go-template
{{ if eq .Chart.AppVersion "master-snapshot" }}
* Documentation: https://amoro.apache.org/docs/latest/
* Version build Info : https://github.com/apache/amoro
{{ else }}
* Documentation: https://amoro.apache.org/docs/{{ .Chart.AppVersion }}/
* Version build Info : https://github.com/apache/amoro/releases/tag/v{{ .Chart.AppVersion }}
{{ end }}
```

--------------------------------

### Example Watermark Query Result

Source: https://github.com/apache/amoro/blob/master/docs/user-guides/using-tables.md

This is an example of the output when querying a table's watermark. It displays the key and its corresponding timestamp value.

```text
+-----------------+---------------+
| key             | value         |
+-----------------+---------------+
| watermark.table | 1668579055000 |
+-----------------+---------------+
```

--------------------------------

### Create Sample Table for Overwrite Demonstration

Source: https://github.com/apache/amoro/blob/master/docs/engines/spark/spark-writes.md

Define a sample table using the `mixed_iceberg` format, partitioned by a timestamp column. This table is used to demonstrate `INSERT OVERWRITE` behavior.
```sql
CREATE TABLE mixed_catalog.db.sample (
    id int,
    data string,
    ts timestamp,
    primary key (id))
USING mixed_iceberg
PARTITIONED BY (days(ts))
```

--------------------------------

### Install Thrift on macOS with Homebrew

Source: https://github.com/apache/amoro/blob/master/amoro-common/src/main/thrift/README.md

Installs Thrift 0.20.0 using Homebrew and updates the PATH environment variable. This is an alternative for macOS users.

```shell
brew install thrift
export PATH="/usr/local/opt/thrift@0.20.0/bin:$PATH"
```

--------------------------------

### Clone Repository and Configure

Source: https://github.com/apache/amoro/blob/master/CONTRIBUTING.md

Clone the Amoro repository and set up the initial configuration file for local development. This includes creating the `conf` directory and pointing the Derby path in config.yaml at it. The `sed -i ''` form is the BSD/macOS syntax; GNU sed takes `-i` without the empty argument.

```shell
git clone https://github.com/apache/amoro.git
cd amoro
base_dir=$(pwd)
mkdir -p conf
cp dist/src/main/amoro-bin/conf/config.yaml conf/config.yaml
sed -i '' "s|/tmp/amoro/derby|${base_dir}/conf/derby|g" conf/config.yaml
```

--------------------------------

### Build All Modules with Toolchain Configuration

Source: https://github.com/apache/amoro/blob/master/README.md

Builds all modules with the `toolchain` and `build-mixed-format-trino` profiles enabled. Requires a 'toolchains.xml' configuration in '~/.m2/'. Tests are skipped.

```bash
./mvnw clean package -DskipTests -Ptoolchain,build-mixed-format-trino
```

--------------------------------

### Build Amoro Spark Optimizer Docker Image

Source: https://github.com/apache/amoro/blob/master/docker/README.md

Use this command to build the Amoro Spark optimizer Docker image. Package the project first by running `./mvnw clean package -DskipTests`.

```shell
./build.sh amoro-spark-optimizer
```

--------------------------------

### Custom Role Policy Example

Source: https://github.com/apache/amoro/blob/master/docs/configuration/ams-config.md

An example of a custom role policy defined using Casbin's policy format.
This CSV-formatted string specifies permissions for the 'CATALOG_ADMIN' role, granting it rights to view and manage catalogs and tables.

```csv
p, CATALOG_ADMIN, CATALOG, GLOBAL, VIEW_CATALOG, allow
p, CATALOG_ADMIN, CATALOG, GLOBAL, MANAGE_CATALOG, allow
p, CATALOG_ADMIN, TABLE, GLOBAL, VIEW_TABLE, allow
p, CATALOG_ADMIN, TABLE, GLOBAL, MANAGE_TABLE, allow
```

--------------------------------

### Build Amoro Docker Image

Source: https://github.com/apache/amoro/blob/master/docker/README.md

Use this command to build the base Amoro Docker image. Ensure the project is packaged first by running `./mvnw clean package -DskipTests`.

```shell
./build.sh amoro
```

--------------------------------

### Build Distribution Package with All Formats

Source: https://github.com/apache/amoro/blob/master/README.md

Builds a distribution package that integrates all supported formats. Unlike the other build commands, this one does not pass `-DskipTests`, so tests run as part of the build.

```bash
./mvnw clean package -Psupport-all-formats
```

--------------------------------

### Create Table Like

Source: https://github.com/apache/amoro/blob/master/docs/engines/spark/spark-ddl.md

Copy the structure of an existing table, including primary keys and partitions, to a new table using `CREATE TABLE ... LIKE`. Data is not copied. Only the two-part `db.table` form within the same catalog is supported.

```sql
USE mixed_catalog;
CREATE TABLE db.sample
LIKE db.sample2
USING mixed_iceberg
TBLPROPERTIES ('owner'='xxxx');
```

--------------------------------

### Show Tables

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-ddl.md

List all table names present in the current database.

```sql
SHOW TABLES;
```

--------------------------------

### Custom Base64 Decryption Implementation

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/using-customized-encryption-method-for-configurations.md

An example Java implementation of the `ConfigShade` interface using standard Base64 decoding. The `getIdentifier` method returns 'base64-custom'.
```java
package com.example.shade;

import org.apache.amoro.config.Configurations;
import org.apache.amoro.config.shade.ConfigShade;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Custom Base64 decryption implementation for AMS.
 */
public class Base64CustomConfigShade implements ConfigShade {
  @Override
  public String getIdentifier() {
    return "base64-custom"; // Use this identifier in shade.identifier
  }

  @Override
  public String decrypt(String content) {
    // Decode with an explicit charset so the result does not depend on the platform default.
    return new String(Base64.getDecoder().decode(content), StandardCharsets.UTF_8);
  }
}
```

--------------------------------

### Show Create Table

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-ddl.md

Display the full DDL statement used to create the current table, including all properties and configurations.

```sql
SHOW CREATE TABLE;
```

--------------------------------

### Configure Table Watermark

Source: https://github.com/apache/amoro/blob/master/docs/concepts/table-watermark.md

Configure the event time field and allowed lateness for watermark calculation. This setup helps in handling out-of-order data writes.

```sql
'table.event-time-field' = 'op_time',
'table.watermark-allowed-lateness-second' = '60'
```

--------------------------------

### Prometheus Scrape Configuration

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment.md

Example configuration for Prometheus to scrape metrics from the Amoro Prometheus Exporter. Ensure the target address matches the configured exporter port.

```yaml
# Your Prometheus config file.
scrape_configs:
  - job_name: 'amoro-exporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090'] # The host and port that you configured in the Amoro plugins config file.
```

--------------------------------

### Build Binary Release

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/release-guide.md

Build the Amoro binary release using the provided script. Ensure AMORO_SOURCE_HOME is set.
```shell
cd ${AMORO_SOURCE_HOME}/tools
RELEASE_VERSION=0.8.0-incubating bash ./releasing/create_binary_release.sh
```

--------------------------------

### Access Amoro Pod Logs

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment-on-kubernetes.md

Retrieve logs from Amoro pods to troubleshoot issues. First, list all pods, then specify the pod name to get its logs.

```shell
kubectl get pod
kubectl logs {amoro-pod-name}
```

--------------------------------

### Spark Optimizer Configuration for YARN Client Mode

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/managing-optimizers.md

Configure the Spark optimizer container for YARN client mode. Ensure Spark is installed and Hadoop configurations are accessible.

```yaml
containers:
  - name: sparkContainer
    container-impl: org.apache.amoro.server.manager.SparkOptimizerContainer
    properties:
      spark-home: /opt/spark/ # Spark install home
      master: yarn # The cluster manager to connect to
      deploy-mode: client # Spark runs in client mode
      export.HADOOP_CONF_DIR: /etc/hadoop/conf/ # Hadoop config dir
      export.HADOOP_USER_NAME: hadoop # Hadoop user that submits to YARN
      export.JVM_ARGS: -Djava.security.krb5.conf=/opt/krb5.conf # Spark launch JVM args, such as the Kerberos config when using Kerberos
      export.SPARK_CONF_DIR: /etc/hadoop/conf/ # Spark config dir
```

--------------------------------

### Build Amoro (Skip Tests/Frontend)

Source: https://github.com/apache/amoro/blob/master/AGENTS.md

Builds the Amoro project, skipping all tests and the frontend dashboard build. Useful for quick builds when only backend changes are made.

```bash
./mvnw clean package -DskipTests -Pskip-dashboard-build
```

--------------------------------

### Execute Helm Unit Tests

Source: https://github.com/apache/amoro/blob/master/charts/amoro/README.md

Run unit tests for the Amoro Helm chart using the installed helm-unittest plugin. Ensure all tests pass before submitting pull requests.
```shell
helm unittest ../amoro
```

--------------------------------

### Deploy Staging JARs

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/release-guide.md

Deploy the required JAR files to the Apache Nexus repository using the provided script. Ensure AMORO_SOURCE_HOME is set.

```shell
cd ${AMORO_SOURCE_HOME}/tools
RELEASE_VERSION=0.8.0-incubating bash ./releasing/deploy_staging_jars.sh
```

--------------------------------

### Render Helm Chart Templates Server-Side

Source: https://github.com/apache/amoro/blob/master/charts/amoro/README.md

This command performs a dry run of a Helm installation to render templates on the server side. It is useful for debugging and previewing the deployment configuration.

```shell
helm install --dry-run --debug --generate-name ../amoro
```

--------------------------------

### Build Docs Site for Local Testing

Source: https://github.com/apache/amoro/blob/master/site/README.md

Builds the Amoro documentation site with a custom base URL and output directory for local testing, intended to be served alongside the main site.

```shell
cd ../amoro-docs
hugo -b http://localhost:5500/docs/latest/ -d ../../public/docs/latest
```

--------------------------------

### Render Helm Chart Templates Locally

Source: https://github.com/apache/amoro/blob/master/charts/amoro/README.md

Use this command to view the rendered Helm chart templates without installing the chart. It helps in debugging template rendering issues.

```shell
helm template --debug ../amoro
```

--------------------------------

### Check GPG Signature

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/validate-release.md

Download the KEYS file, import it, and then trust the key used for the release. Finally, verify the signature of the release artifacts.
```shell
$ curl https://downloads.apache.org/incubator/amoro/KEYS > KEYS # Download KEYS
$ gpg --import KEYS # Import KEYS to local
```

```shell
$ gpg --edit-key xxxxxxxxxx #KEY user used in this version
gpg (GnuPG) 2.2.21; Copyright (C) 2020 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.
gpg> trust #trust

Please decide how far you trust this user to correctly verify other users' keys
(by looking at passports, checking fingerprints from different sources, etc.)

  1 = I don't know or won't say
  2 = I do NOT trust
  3 = I trust marginally
  4 = I trust fully
  5 = I trust ultimately
  m = back to the main menu

Your decision? 5 #choose 5
Do you really want to set this key to ultimate trust? (y/N) y #choose y
```

```shell
$ for i in *.tar.gz; do echo $i; gpg --verify $i.asc $i; done
```

--------------------------------

### Flink Optimizer for Flink Session Mode

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/managing-optimizers.md

Configure the Flink optimizer container for Flink session mode. This example includes settings for high availability and connecting to a session cluster.
```yaml
containers:
  - name: flinkContainer
    container-impl: org.apache.amoro.server.manager.FlinkOptimizerContainer
    properties:
      target: session # Flink runs in a session cluster
      job-uri: "local:///opt/flink/usrlib/optimizer-job.jar" # Optimizer job main jar
      ams-optimizing-uri: thrift://ams.amoro.service.local:1261 # AMS optimizing uri
      export.FLINK_CONF_DIR: /opt/flink/conf/ # Flink config dir; flink-conf.yaml should be in this dir and contain the REST connection parameters of the session cluster
      flink-conf.high-availability: zookeeper # Flink high availability mode, reference: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#high-availability
      flink-conf.high-availability.zookeeper.quorum: xxx:2181
      flink-conf.high-availability.zookeeper.path.root: /flink
      flink-conf.high-availability.cluster-id: amoro-optimizer-cluster
      flink-conf.high-availability.storageDir: hdfs://xxx/xxx/xxx
      flink-conf.rest.address: localhost:8081 # If the session cluster is not in high availability mode, configure the REST address of the JobManager
```

--------------------------------

### Write Data to Amoro Table using DataFrame API

Source: https://github.com/apache/amoro/blob/master/docs/engines/spark/spark-get-started.md

Use the DataFrame API in a JAR job to write data to an Amoro table. This example demonstrates overwriting partitions.

```scala
val df = spark.read.load("/path-to-table")
df.writeTo("test_db.table1").overwritePartitions()
```

--------------------------------

### Build Source Release

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/release-guide.md

Build the Amoro source release using the provided script. Ensure AMORO_SOURCE_HOME is set.
```shell
cd ${AMORO_SOURCE_HOME}/tools
RELEASE_VERSION=0.8.0-incubating bash ./releasing/create_source_release.sh
```

--------------------------------

### TPC-H Query Examples

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/benchmark/benchmark.md

These SQL queries are TPC-H-style analytical queries used to test data warehousing performance. They cover operations such as aggregation, joins, and filtering.

```sql
-- query1
SELECT ol_number,
       sum(ol_quantity) AS sum_qty,
       sum(ol_amount) AS sum_amount,
       avg(ol_quantity) AS avg_qty,
       avg(ol_amount) AS avg_amount,
       count(*) AS count_order
FROM order_line
WHERE ol_delivery_d > '2007-01-02 00:00:00.000000'
GROUP BY ol_number
ORDER BY ol_number;
```

```sql
-- query2
SELECT su_suppkey, su_name, n_name, i_id, i_name, su_address, su_phone, su_comment
FROM item, supplier, stock, nation, region,
     (SELECT s_i_id AS m_i_id,
             MIN(s_quantity) AS m_s_quantity
      FROM stock, supplier, nation, region
      WHERE MOD((s_w_id*s_i_id), 10000) = su_suppkey
        AND su_nationkey = n_nationkey
        AND n_regionkey = r_regionkey
        AND r_name LIKE 'Europ%'
      GROUP BY s_i_id) m
WHERE i_id = s_i_id
  AND MOD((s_w_id * s_i_id), 10000) = su_suppkey
  AND su_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND i_data LIKE '%b'
  AND r_name LIKE 'Europ%'
  AND i_id=m_i_id
  AND s_quantity = m_s_quantity
ORDER BY n_name, su_name, i_id;
```

```sql
-- query3
SELECT ol_o_id, ol_w_id, ol_d_id,
       sum(ol_amount) AS revenue,
       o_entry_d
FROM customer, new_order, oorder, order_line
WHERE c_state LIKE 'A%'
  AND c_id = o_c_id
  AND c_w_id = o_w_id
  AND c_d_id = o_d_id
  AND no_w_id = o_w_id
  AND no_d_id = o_d_id
  AND no_o_id = o_id
  AND ol_w_id = o_w_id
  AND ol_d_id = o_d_id
  AND ol_o_id = o_id
  AND o_entry_d > '2007-01-02 00:00:00.000000'
GROUP BY ol_o_id, ol_w_id, ol_d_id, o_entry_d
ORDER BY revenue DESC, o_entry_d;
```

--------------------------------

### Synchronize Data to Data Lake Tables

Source:
https://github.com/apache/amoro/blob/master/site/amoro-site/content/benchmark/benchmark-guide.md

Start the data ingestion job to synchronize data from MySQL to data lake tables. Specify the configuration directory, sink type (arctic/iceberg/hudi), and sink database.

```shell
java -cp lakehouse-benchmark-ingestion-1.0-SNAPSHOT.jar com.netease.arctic.benchmark.ingestion.MainRunner -confDir [confDir] -sinkType [arctic/iceberg/hudi] -sinkDatabase [dbName]
```

--------------------------------

### Show Databases

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-ddl.md

List all available database names within the currently active catalog.

```sql
SHOW DATABASES;
```

--------------------------------

### Get Debezium Deserialize Schemas

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-cdc-ingestion.md

Generates a map of Debezium deserialize schemas for tables. It configures the physical row type, user-defined converter factory, metadata converters, and result type information for each table.

```java
private static Map<String, RowDataDebeziumDeserializeSchema> getDebeziumDeserializeSchemas(
    final List<Tuple2<ObjectPath, ResolvedCatalogTable>> pathAndTable) {
  return pathAndTable.stream()
      .collect(toMap(e -> e.f0.toString(), e -> RowDataDebeziumDeserializeSchema.newBuilder()
          .setPhysicalRowType(
              (RowType) e.f1.getResolvedSchema().toPhysicalRowDataType().getLogicalType())
          .setUserDefinedConverterFactory(MySqlDeserializationConverterFactory.instance())
          .setMetadataConverters(
              new MetadataConverter[] {TABLE_NAME.getConverter(), DATABASE_NAME.getConverter()})
          .setResultTypeInfo(TypeInformation.of(RowData.class)).build()));
}
```

--------------------------------

### Create Table Like Existing Table

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-ddl.md

Create a new table with the same structure, partitions, and properties as an existing table using the LIKE clause.
```sql
CREATE TABLE `mixed_catalog`.`mixed_db`.`test_table` (
    id BIGINT,
    name STRING,
    op_time TIMESTAMP
);

CREATE TABLE `mixed_catalog`.`mixed_db`.`test_table_like`
    LIKE `mixed_catalog`.`mixed_db`.`test_table`;
```

--------------------------------

### Create Table As Select

Source: https://github.com/apache/amoro/blob/master/docs/engines/spark/spark-ddl.md

Create a table and populate it with query results using `CREATE TABLE ... AS SELECT`. Primary keys, partitions, and properties must be specified explicitly, as in the second statement. Uniqueness checks for primary keys can be enabled with `spark.sql.mixed-format.check-source-data-uniqueness.enabled = true`.

```sql
CREATE TABLE mixed_catalog.db.sample
USING mixed_iceberg
AS SELECT ...
```

```sql
CREATE TABLE mixed_catalog.db.sample
PRIMARY KEY(id)
USING mixed_iceberg
PARTITIONED BY (pt)
TBLPROPERTIES ('prop1'='val1', 'prop2'='val2')
AS SELECT ...
```

--------------------------------

### Configure Terminal Local Backend

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment.md

Configure the terminal backend to use the local implementation, which starts an embedded Spark environment. This includes settings for handling timestamps and using the Spark session catalog for Hive tables.

```yaml
ams:
  terminal:
    backend: local
    local.spark.sql.iceberg.handle-timestamp-without-timezone: false
    # When the catalog type is Hive, it automatically uses the Spark session catalog to access Hive tables.
    local.using-session-catalog-for-hive: true
```

--------------------------------

### Flink SQL Streaming Mode Configuration

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-dml.md

Configure Flink tasks to run in streaming mode and enable dynamic table options so that SQL hints can be used. This setup is necessary for real-time data processing and dynamic option adjustments.
```sql
SET execution.runtime-mode = streaming;
SET table.dynamic-table-options.enabled = true;
```

--------------------------------

### Build Main Site for Local Testing

Source: https://github.com/apache/amoro/blob/master/site/README.md

Builds the main Amoro site with a custom base URL and output directory for local testing.

```shell
cd site/amoro-site
hugo -b http://localhost:5500/ -d ../../public
```

--------------------------------

### Ingest MySQL CDC Data to Mixed-Iceberg Table

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-cdc-ingestion.md

This SQL example demonstrates writing MySQL CDC data to a Mixed-Iceberg table. It requires the Flink SQL Connector MySQL CDC and Amoro JARs in the Flink engine's lib directory.

```sql
CREATE TABLE products (
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = '123456',
    'database-name' = 'mydb',
    'table-name' = 'products'
);

CREATE CATALOG amoro_catalog WITH (
    'type'='amoro',
    'ams.uri'='thrift://:/'
);

CREATE TABLE IF NOT EXISTS `amoro_catalog`.`db`.`test_tb`(
    id INT,
    name STRING,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
);

INSERT INTO `amoro_catalog`.`db`.`test_tb` SELECT * FROM products;
```

--------------------------------

### Configure LDAP Role Mapping for RBAC

Source: https://github.com/apache/amoro/blob/master/docs/configuration/ams-config.md

Configure Amoro to use LDAP for role mapping to enable RBAC. This setup includes specifying the LDAP server details, user patterns, and mapping LDAP groups to Amoro roles. Ensure the bind DN and password are set for authentication.
```yaml
ams:
  http-server:
    login-auth-provider: org.apache.amoro.server.authentication.LdapPasswdAuthenticationProvider
    login-auth-ldap-url: "ldap://ldap.example.com:389"
    login-auth-ldap-user-pattern: "uid={0},ou=people,dc=example,dc=com"
    authorization:
      enabled: true
      ldap-role-mapping:
        enabled: true
        group-member-attribute: "member"
        user-dn-pattern: "uid={0},ou=people,dc=example,dc=com"
        bind-dn: "cn=service-account,dc=example,dc=com"
        bind-password: "service-password"
        groups:
          - group-dn: "cn=amoro-service-admins,ou=groups,dc=example,dc=com"
            role: SERVICE_ADMIN
          - group-dn: "cn=amoro-viewers,ou=groups,dc=example,dc=com"
            role: VIEWER
          - group-dn: "cn=amoro-catalog-admins,ou=groups,dc=example,dc=com"
            role: CATALOG_ADMIN
```

--------------------------------

### Create and Push Release Tag

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/release-guide.md

Create an annotated Git tag for the release candidate and push it to the Apache repository.

```shell
git tag -a v0.8.0-rc1 -m "Release Apache Amoro 0.8.0 rc1"
git push apache v0.8.0-rc1
```

--------------------------------

### Build Amoro Flink Optimizer Docker Image

Source: https://github.com/apache/amoro/blob/master/docker/README.md

Use this command to build the Amoro Flink optimizer Docker image. The project must be packaged beforehand using `./mvnw clean package -DskipTests`.

```shell
./build.sh amoro-flink-optimizer
```

--------------------------------

### Initialize Source Tables

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-cdc-ingestion.md

Initializes a list of Tuple2 containing ObjectPath and ResolvedCatalogTable for source tables. This includes defining schemas and primary keys for 'user' and 'product' tables.
```java
private static List<Tuple2<ObjectPath, ResolvedCatalogTable>> initSourceTables() {
  List<Tuple2<ObjectPath, ResolvedCatalogTable>> pathAndTable = new ArrayList<>();
  // build table "user"
  Schema userSchema = Schema.newBuilder()
      .column("id", DataTypes.INT().notNull())
      .column("name", DataTypes.STRING())
      .column("op_time", DataTypes.TIMESTAMP())
      .primaryKey("id")
      .build();
  List<Column> userTableCols = Stream.of(
      Column.physical("id", DataTypes.INT().notNull()),
      Column.physical("name", DataTypes.STRING()),
      Column.physical("op_time", DataTypes.TIMESTAMP())).collect(Collectors.toList());
  Schema.UnresolvedPrimaryKey userPrimaryKey = userSchema.getPrimaryKey().orElseThrow(() -> new RuntimeException("table user required pk "));
  ResolvedSchema userResolvedSchema = new ResolvedSchema(userTableCols, Collections.emptyList(), UniqueConstraint.primaryKey(
      userPrimaryKey.getConstraintName(), userPrimaryKey.getColumnNames()));
  ResolvedCatalogTable userTable = new ResolvedCatalogTable(
      CatalogTable.of(userSchema, "", Collections.emptyList(), new HashMap<>()), userResolvedSchema);
  pathAndTable.add(Tuple2.of(new ObjectPath("test_db", "user"), userTable));

  // build table "product"
  Schema productSchema = Schema.newBuilder()
      .column("productId", DataTypes.INT().notNull())
      .column("price", DataTypes.DECIMAL(12, 6))
      .column("saleCount", DataTypes.INT())
      .primaryKey("productId")
      .build();
  List<Column> productTableCols = Stream.of(
      Column.physical("productId", DataTypes.INT().notNull()),
      Column.physical("price", DataTypes.DECIMAL(12, 6)),
      Column.physical("saleCount", DataTypes.INT())).collect(Collectors.toList());
```

--------------------------------

### Download Release Candidate

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/validate-release.md

Check out the release candidate directory using SVN or download the material files directly using wget.
```shell
# If svn is installed locally, check out the release candidate directory
$ svn co https://dist.apache.org/repos/dist/dev/incubator/amoro/${release_version}-${rc_version}/
# or download the material files directly
$ wget https://dist.apache.org/repos/dist/dev/incubator/amoro/${release_version}-${rc_version}/
```

--------------------------------

### Create Database

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/flink-ddl.md

Create a new database within a specified catalog. The USE statement sets the current database context.

```sql
CREATE DATABASE [catalog_name.]mixed_db;
USE mixed_db;
```

--------------------------------

### Compile Apache Amoro from Source

Source: https://github.com/apache/amoro/blob/master/docs/admin-guides/deployment.md

Build the Apache Amoro project from its source code using Maven. This process generates the binary distribution and engine-specific runtime JARs.

```shell
$ git clone https://github.com/apache/amoro.git
$ cd amoro
$ base_dir=$(pwd)
$ ./mvnw clean package -DskipTests
$ cd dist/target/
$ ls
amoro-x.y.z-bin.zip # AMS release package
$ cd ${base_dir}/amoro-format-mixed/amoro-mixed-flink/v1.18/amoro-mixed-flink-runtime-1.18/target
$ ls
amoro-format-mixed-flink-runtime-1.18-x.y.z.jar # Flink 1.18 runtime package
$ cd ${base_dir}/amoro-format-mixed/amoro-mixed-spark/v3.3/amoro-mixed-spark-runtime-3.3/target
$ ls
amoro-format-mixed-spark-runtime-3.3-x.y.z.jar # Spark v3.3 runtime package
```

--------------------------------

### Create Database in Amoro Catalog

Source: https://github.com/apache/amoro/blob/master/docs/engines/spark/spark-get-started.md

Switch to the configured Amoro catalog and create a new database using Spark SQL. Ensure the database exists before creating tables.
```sql
-- switch to mixed catalog defined in spark conf
use local_catalog;
-- create database first
create database if not exists test_db;
```

--------------------------------

### Maven settings.xml Configuration

Source: https://github.com/apache/amoro/blob/master/site/amoro-site/content/community/release-guide.md

Configure your `~/.m2/settings.xml` file to include server credentials for Apache snapshots and releases, and profile settings for GPG. This ensures secure and automated builds.

```xml
<settings>
  <servers>
    <server>
      <id>apache.snapshots.https</id>
      <username>amoro</username>
      <password>{/ZLaH78TWboH5IRqNv9pgU4uamuqm9fCIbw0gRWT01c=}</password>
    </server>
    <server>
      <id>apache.releases.https</id>
      <username>amoro</username>
      <password>{/ZLaH78TWboH5IRqNv9pgU4uamuqm9fCIbw0gRWT01c=}</password>
    </server>
  </servers>
  <profiles>
    <profile>
      <id>apache-release</id>
      <properties>
        <gpg.keyname>05016886</gpg.keyname>
        <gpg.useagent>true</gpg.useagent>
        <gpg.passphrase>passphrase for your gpg key</gpg.passphrase>
      </properties>
    </profile>
  </profiles>
</settings>
```

--------------------------------

### Create Amoro Table with LogStore Enabled (SQL)

Source: https://github.com/apache/amoro/blob/master/docs/engines/flink/using-logstore.md

Use this SQL statement to create an Amoro table with LogStore enabled. Configure the LogStore topic and address as required.

```sql
CREATE TABLE db.log_table (
    id int,
    name string,
    ts timestamp,
    primary key (id)
) using mixed_iceberg
tblproperties (
    "log-store.enabled" = "true",
    "log-store.topic" = "topic_log_test",
    "log-store.address" = "localhost:9092"
);
```
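If the topic and address differ per environment, the `tblproperties` clause can be generated rather than written by hand. The following is a minimal sketch under that assumption: the class `LogStoreDdl` and its methods are hypothetical helpers, not part of Amoro, and they only assemble the three LogStore properties shown in the statement above into a string. Values are not escaped, so the sketch assumes they contain no double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not an Amoro API): renders LogStore settings
// as a tblproperties clause for a CREATE TABLE statement.
public class LogStoreDdl {

    // Collects the three LogStore properties used in the statement above.
    public static Map<String, String> logStoreProps(String topic, String address) {
        Map<String, String> props = new LinkedHashMap<>(); // preserves insertion order
        props.put("log-store.enabled", "true");
        props.put("log-store.topic", topic);
        props.put("log-store.address", address);
        return props;
    }

    // Renders the map as: tblproperties ("k1" = "v1", "k2" = "v2")
    public static String tblProperties(Map<String, String> props) {
        return props.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\" = \"" + e.getValue() + "\"")
            .collect(Collectors.joining(", ", "tblproperties (", ")"));
    }

    public static void main(String[] args) {
        String ddl =
            "CREATE TABLE db.log_table (id int, name string, ts timestamp, primary key (id)) "
                + "using mixed_iceberg "
                + tblProperties(logStoreProps("topic_log_test", "localhost:9092"));
        System.out.println(ddl);
    }
}
```

Keeping the properties in an ordered map makes it easy to swap the topic or broker address per environment while the rest of the statement stays fixed.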