### Streamloader Example with Specific Parameters

Source: https://github.com/apache/doris-streamloader/blob/master/README.md

An example demonstrating how to load data from 'data.csv' into 'testdb.testtbl' with custom column separators and specified columns.

```shell
doris-streamloader --source_file="data.csv" --url="http://localhost:8330" --header="column_separator:|?columns:col1,col2" --db="testdb" --table="testtbl"
```

--------------------------------

### Build Streamloader on Windows

Source: https://github.com/apache/doris-streamloader/blob/master/README.md

Execute this command in the doris-streamloader directory to build the project on Windows systems. Ensure Go is installed.

```shell
cd doris-streamloader && build.bat
```

--------------------------------

### Build Streamloader on Linux/macOS

Source: https://github.com/apache/doris-streamloader/blob/master/README.md

Execute this command in the doris-streamloader directory to build the project on Linux or macOS systems. Ensure Go is installed.

```shell
cd doris-streamloader && sh build.sh
```

--------------------------------

### Build Doris Streamloader from Source

Source: https://context7.com/apache/doris-streamloader/llms.txt

Build the Doris Streamloader binary from source using the provided build scripts. Requires Go version 1.20 or higher. The build process includes formatting, generating version information, and compiling.

```bash
# Linux / macOS
git clone https://github.com/apache/doris-streamloader.git
cd doris-streamloader
sh build.sh
# Output: doris-streamloader (binary in repo root)

# Windows
cd doris-streamloader
build.bat

# Manual equivalent
go generate   # writes version.go from current git tag + commit
go build      # produces ./doris-streamloader
```

--------------------------------

### Basic Streamloader Usage

Source: https://github.com/apache/doris-streamloader/blob/master/README.md

Use this command to initiate data loading.
Specify the source file(s), Doris server URL, stream load headers, target database, and table.

```shell
doris-streamloader --source_file={FILE_LIST} --url={FE_OR_BE_SERVER_URL}:{PORT} --header={STREAMLOAD_HEADER} --db={TARGET_DATABASE} --table={TARGET_TABLE}
```

--------------------------------

### Configure Stream Load CLI Options

Source: https://context7.com/apache/doris-streamloader/llms.txt

Use --timeout to set HTTP client timeouts and --check_utf8 to control UTF-8 validation. Disabling --check_utf8 can improve performance if encoding is known to be clean.

```bash
doris-streamloader \
  --source_file="data.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --timeout=7200 \
  --check_utf8=false

# Default timeout: 36000 seconds (10 hours)
# Disabling check_utf8 improves performance when encoding is guaranteed clean
```

--------------------------------

### Configure Logging

Source: https://context7.com/apache/doris-streamloader/llms.txt

Configure log file output, enable logging to stderr, and set the log level. Debug mode enables verbose logging and prints all resolved flag values at startup.

```bash
# Write logs to a rotating file (100 MB max, 7-day retention, 30 backups)
doris-streamloader \
  --source_file="data.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --log_filename="/var/log/streamloader/load.log" \
  --log_also_to_stderr=true

# Enable debug mode (prints all flags and sets log level to debug)
doris-streamloader \
  --source_file="data.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --debug=true
```

--------------------------------

### Configure Batch Size and Task Splitting

Source: https://context7.com/apache/doris-streamloader/llms.txt

Control the number of rows per chunk and the maximum byte size for individual HTTP PUT requests to Doris.
Defaults are provided for batch size, byte size per batch, and maximum byte size per task.

```bash
doris-streamloader \
  --source_file="data.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --batch=8192 \
  --batch_byte=209715200 \
  --max_byte_per_task=5368709120

# Defaults:
#   --batch              4096 rows
#   --batch_byte         943718400 bytes (~900 MB)
#   --max_byte_per_task  9663676416 bytes (~9 GB)
```

--------------------------------

### Check Stream Loader Version

Source: https://context7.com/apache/doris-streamloader/llms.txt

Displays the current version of the Doris Stream Loader CLI.

```bash
doris-streamloader --version
# Output: version v1.0.0, git commit a3f9c1b2d...
```

--------------------------------

### Basic Single-File Load with Custom Delimiter and Columns

Source: https://context7.com/apache/doris-streamloader/llms.txt

Loads a single CSV file into Doris, specifying a pipe delimiter and explicit column mapping. Requires authentication credentials.

```bash
doris-streamloader \
  --source_file="data.csv" \
  --url="http://doris-fe.example.com:8030" \
  --header="column_separator:|?columns:id,name,score" \
  --db="analytics" \
  --table="user_scores" \
  -u admin \
  -p "secret"
```

--------------------------------

### Load Multiple Files and Wildcard Patterns

Source: https://context7.com/apache/doris-streamloader/llms.txt

Loads data from multiple specified files and all CSV files within a directory using glob patterns. Supports recursive directory traversal.
```bash
# Comma-separated list mixing a glob pattern with individual files
doris-streamloader \
  --source_file="/mnt/data/2024/*.csv,/tmp/extra1.csv,/tmp/extra2.csv" \
  --url="http://doris-be.example.com:8040" \
  --header="format:csv?column_separator:,?columns:ts,event,user_id" \
  --db="events" \
  --table="raw_events" \
  -u loader \
  -p "loader_pass"
```

```bash
# Directory path; the directory is traversed recursively
doris-streamloader \
  --source_file="/data/partitioned/year=2024" \
  --url="http://doris-fe.example.com:8030" \
  --db="warehouse" \
  --table="fact_orders" \
  -u etl_user \
  -p ""
```

--------------------------------

### Configure Stream Load HTTP Headers

Source: https://context7.com/apache/doris-streamloader/llms.txt

Pass custom HTTP headers to the Doris Stream Load API using the `--header` flag, written as `key:value` entries separated by `?`. Supported headers include column separators, line delimiters, column mapping, and merge types.

```bash
# column_separator - field delimiter (default: \t)
# line_delimiter   - row delimiter (default: \n)
# columns          - column mapping/transformation
# format           - csv (default) or json
# label            - custom load label (a suffix is appended per request, e.g. daily_load_0_1)
# max_filter_ratio - fraction of rows allowed to fail (0.0–1.0)
# where            - filter expression applied on Doris side
# merge_type       - APPEND (default), DELETE, MERGE
# delete           - delete condition, used together with merge_type MERGE

doris-streamloader \
  --source_file="upsert.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="orders" \
  -u root -p "" \
  --header="column_separator:,?columns:order_id,amount,status?max_filter_ratio:0.05?label:daily_load?merge_type:MERGE?delete:status='deleted'"
```

--------------------------------

### LZ4 Compression for Large Data Loads

Source: https://context7.com/apache/doris-streamloader/llms.txt

Enables LZ4 compression for data transfer, reducing network bandwidth at the expense of increased CPU usage. Sets appropriate HTTP headers for compression.
```bash
doris-streamloader \
  --source_file="/data/large_dump.csv" \
  --url="http://doris-fe.example.com:8030" \
  --header="column_separator:," \
  --db="mydb" \
  --table="big_table" \
  --compress=true \
  -u root \
  -p ""
```

--------------------------------

### Manual Partial Retry with --auto_retry

Source: https://context7.com/apache/doris-streamloader/llms.txt

Resume a failed load by naming the worker and task indices to retry, which avoids reprocessing data that has already been loaded. The value is a `;`-separated list of `{WORKER_INDEX},{TASK_INDEX}` pairs.

```bash
# Resume worker 0 from task 2, and worker 3 from task 1
doris-streamloader \
  --source_file="/data/large/*.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --workers=4 \
  --auto_retry="0,2;3,1" \
  --auto_retry_times=3 \
  --auto_retry_interval=60

# Format: --auto_retry="{WORKER_INDEX},{TASK_INDEX}[;{WORKER_INDEX},{TASK_INDEX}...]"
```

--------------------------------

### Parallel Workers and Throughput Tuning

Source: https://context7.com/apache/doris-streamloader/llms.txt

Explicitly sets the number of parallel workers and provides hints for disk and stream load throughput to influence auto-scaling. When `--workers` is omitted, the worker count is inferred automatically.

```bash
doris-streamloader \
  --source_file="/data/huge/*.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="dw" \
  --table="fact_sales" \
  -u etl \
  -p "pass" \
  --workers=8 \
  --disk_throughput=400 \
  --streamload_throughput=80
```

--------------------------------

### JSON Format Load (JSON Array with Stripped Outer Array)

Source: https://context7.com/apache/doris-streamloader/llms.txt

Loads a JSON array, stripping the outer array structure for processing. This is useful for loading data that is enclosed in a top-level JSON array.
```bash
doris-streamloader \
  --source_file="batch.json" \
  --url="http://doris-fe.example.com:8030" \
  --header="format:json?strip_outer_array:true" \
  --db="app_db" \
  --table="events" \
  -u root \
  -p ""
```

--------------------------------

### JSON Format Load (Newline-Delimited JSON)

Source: https://context7.com/apache/doris-streamloader/llms.txt

Loads newline-delimited JSON (NDJSON) data. Streamloader automatically disables parallel splitting for JSON format.

```bash
doris-streamloader \
  --source_file="events.jsonl" \
  --url="http://doris-fe.example.com:8030" \
  --header="format:json?strip_outer_array:false?fuzzy_parse:true" \
  --db="app_db" \
  --table="events" \
  -u root \
  -p ""
```

--------------------------------

### Configure Automatic Retries

Source: https://context7.com/apache/doris-streamloader/llms.txt

Set the number of automatic retry attempts and the interval between them for failed workers. A retry command is printed to stdout if all retries are exhausted.

```bash
doris-streamloader \
  --source_file="/data/large/*.csv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="mytable" \
  -u root -p "" \
  --auto_retry_times=5 \
  --auto_retry_interval=30

# On exhausted retries, output example:
# load has some error, and auto retry failed, you can retry by :
# ./doris_streamloader --source_file /data/large/*.csv --url="http://..." \
# --header="" --db="mydb" --table="mytable" -u root -p "" \
# --auto_retry="0,2;3,1" --auto_retry_times=5 --auto_retry_interval=30
```

--------------------------------

### Aggregated Load Result Schema

Source: https://context7.com/apache/doris-streamloader/llms.txt

This JSON schema represents the aggregated load result printed to stdout after all workers complete. It includes total rows, failed rows, loaded rows, and load statistics.
```json
{
  "Status": "Success",
  "TotalRows": 200000,
  "FailLoadRows": 0,
  "LoadedRows": 199950,
  "FilteredRows": 50,
  "UnselectedRows": 0,
  "LoadBytes": 26214400,
  "LoadTimeMs": 8430,
  "LoadFiles": [
    "/data/2024/part-00000.csv",
    "/data/2024/part-00001.csv",
    "/data/2024/part-00002.csv"
  ]
}
```

--------------------------------

### StreamLoad HTTP Response Schema

Source: https://context7.com/apache/doris-streamloader/llms.txt

This JSON schema represents the response body from a single HTTP PUT request to the Stream Load API. Streamloader aggregates these per-request statistics atomically across all workers.

```json
{
  "TxnId": 1001,
  "Label": "daily_load_0_1",
  "Status": "Success",
  "Message": "OK",
  "NumberTotalRows": 50000,
  "NumberLoadedRows": 49975,
  "NumberFilteredRows": 25,
  "NumberUnselectedRows": 0,
  "LoadBytes": 6291456,
  "LoadTimeMs": 1842,
  "BeginTxnTimeMs": 10,
  "StreamLoadPutTimeMs": 5,
  "ReadDataTimeMs": 980,
  "WriteDataTimeMs": 820,
  "CommitAndPublishTimeMs": 27,
  "ErrorURL": "http://doris-be.example.com:8040/api/_load_error_log?file=error_log_0"
}
```

--------------------------------

### Specify Custom Line Delimiter

Source: https://context7.com/apache/doris-streamloader/llms.txt

Override the default line delimiter (`\n`) with the `line_delimiter` header. Hex escape sequences (`\xNN`) are supported for delimiters such as tab or carriage return.

```bash
# Tab-separated values where records end with a carriage return (\x0d)
doris-streamloader \
  --source_file="windows_export.tsv" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="tsv_table" \
  -u root -p "" \
  --header="column_separator:\t?line_delimiter:\x0d"

# Custom single-character delimiter via \xNN hex escape
doris-streamloader \
  --source_file="pipe_delim.dat" \
  --url="http://doris-fe.example.com:8030" \
  --db="mydb" \
  --table="pipe_table" \
  -u root -p "" \
  --header="line_delimiter:\x7c"
```
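
--------------------------------

### Assemble a --header Value in a Script (Sketch)

The `--header` values in the examples above are `key:value` entries joined with `?`, which gets hard to read once several headers are combined. The following is a minimal sketch of one way to assemble such a value in a wrapper script. `build_header` is a hypothetical helper written for this illustration and is not part of doris-streamloader; the `?` separator is taken from the examples in this document, and the header keys shown are the ones already used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of doris-streamloader): joins its arguments,
# each a "key:value" entry, with "?" to form a single --header value.
# The body runs in a subshell so its variables do not leak to the caller.
build_header() (
    header=""
    for pair in "$@"; do
        if [ -z "$header" ]; then
            header="$pair"
        else
            header="${header}?${pair}"
        fi
    done
    printf '%s\n' "$header"
)

# Build the header once, then print it for inspection.
HEADER=$(build_header "column_separator:|" "columns:id,name,score" "max_filter_ratio:0.05")
printf '%s\n' "$HEADER"
```

The resulting string can then be passed as `--header="$HEADER"`. Keeping each entry as a separate quoted argument avoids hand-editing one long string and makes it harder to drop a `?` between entries.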