### Example Pipeline with Multiple Steps

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PIPELINE.md

A comprehensive example demonstrating a pipeline with environment variables, logging, replication, conditional command execution, and an HTTP notification step.

```yaml
env:
  TARGET_SCHEMA: "analytics"

steps:
  - type: log
    message: "Pipeline started at {timestamp.datetime}"

  - type: replication
    path: replications/main_replication.yaml
    env:
      SLING_THREADS: 8
    id: main_replication

  - type: command
    command: "grep -q 'ERROR' /var/log/app.log"
    if: "false" # This step will be skipped

  - type: http
    url: "https://hooks.slack.com/services/..."
    method: "POST"
    payload: |
      {
        "text": "Pipeline completed successfully for schema: {env.TARGET_SCHEMA}"
      }
    if: state.main_replication.status == "success"
```

--------------------------------

### Install Sling CLI using Scoop on Windows

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Add the Sling repository to Scoop and install the Sling CLI on Windows. Verify the installation with `sling -h`.

```powershell
scoop bucket add sling https://github.com/slingdata-io/scoop-sling.git
scoop install sling

# You're good to go!
sling -h
```

--------------------------------

### Getting Connection Documentation with MCP

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION.md

Demonstrates how to use the 'connection' tool with the 'docs' action to retrieve comprehensive documentation for connections.

```json
{
  "tool": "connection",
  "action": "docs",
  "input": {}
}
```

--------------------------------

### Basic File Listing Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Demonstrates listing only files within a specific directory in an S3 connection.
```json
{
  "action": "list",
  "input": {
    "connection": "MY_S3",
    "path": "data/csv_files/",
    "recursive": false,
    "only": "files"
  }
}
```

--------------------------------

### Extend Default Setup and Teardown Sequences

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Use `+setup`, `setup+`, `+teardown`, and `teardown+` modifiers to prepend or append custom setup and teardown calls relative to default sequences.

```yaml
defaults:
  setup:
    - request:
        url: "{base_url}/auth/refresh"

endpoints:
  my_endpoint:
    +setup: # Runs BEFORE default setup
      - request:
          url: "{base_url}/pre-check"

    teardown+: # Runs AFTER default teardown
      - request:
          url: "{base_url}/cleanup"

    request:
      url: "{base_url}/data"
```

--------------------------------

### Install Sling CLI Binary on Linux

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Download the latest Sling CLI binary for Linux (amd64), extract it, and clean up the archive. Check the installation with `sling -h`.

```bash
curl -LO 'https://github.com/slingdata-io/sling-cli/releases/latest/download/sling_linux_amd64.tar.gz' \
  && tar xzf sling_linux_amd64.tar.gz \
  && rm -f sling_linux_amd64.tar.gz

# You're good to go!
sling -h
```

--------------------------------

### Install Sling CLI via Python Wrapper

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Install the Sling CLI using pip. This makes the `sling` command available on your system.

```bash
pip install sling
```

--------------------------------

### HTTP and Command Hook Examples

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Illustrates using 'http' hooks for notifications and 'command' hooks for cleanup tasks. Ensure necessary environment variables like SLACK_TOKEN are set.
```yaml
hooks:
  start:
    # Notification
    - type: http
      url: 'https://slack.com/api/chat.postMessage'
      method: POST
      headers:
        Authorization: 'Bearer ${SLACK_TOKEN}'
      body: |
        {
          "channel": "#data-pipeline",
          "text": "Starting data replication"
        }

  end:
    # Cleanup
    - type: command
      command: 'rm -f /tmp/temp_files_*'

    # Update status
    - type: query
      connection: METADATA_DB
      sql: |
        INSERT INTO replication_log (job_id, status, completed_at)
        VALUES ('repl_001', 'completed', NOW())
```

--------------------------------

### Full Refresh Replication Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Configures a full-refresh replication from a source to a target, with options for object naming, column casing, and chunking for large tables.

```yaml
source: POSTGRES_PROD
target: SNOWFLAKE_DW

defaults:
  mode: full-refresh
  object: 'analytics.{stream_schema}_{stream_table}'
  target_options:
    column_casing: snake

streams:
  # Single table
  public.customers:

  # All sales tables
  sales.*:
    object: 'sales_data.{stream_table}'

  # Large table with chunking
  public.transactions:
    source_options:
      chunk_count: 10

env:
  SLING_THREADS: 5
  SLING_LOADED_AT_COLUMN: true
```

--------------------------------

### Install Sling CLI using Homebrew on Mac

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Use Homebrew to install the Sling CLI on macOS. After installation, verify by running `sling -h`.

```shell
brew install slingdata-io/sling/sling

# You're good to go!
sling -h
```

--------------------------------

### Local to Cloud Backup Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Back up local files to cloud storage recursively. Note the double slash in the source location: the `LOCAL/` connection prefix is followed by the absolute path `/home/user/documents/`.
```json
{
  "action": "copy",
  "input": {
    "source_location": "LOCAL//home/user/documents/",
    "target_location": "BACKUP_S3/daily-backup/documents/",
    "recursive": true
  }
}
```

--------------------------------

### Single File Copy Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Demonstrates copying a single file from a local path to an S3 bucket. Ensure the source and target locations are correctly formatted.

```json
{
  "action": "copy",
  "input": {
    "source_location": "local/path/to/source.csv",
    "target_location": "s3/bucket/folder/destination.csv",
    "recursive": false
  }
}
```

--------------------------------

### Pattern Matching Examples for Streams

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Combine schema and file wildcards with custom configurations for specific tables or patterns. Includes setting mode, object names, and disabling streams.

```yaml
source: POSTGRES
target: SNOWFLAKE

defaults:
  mode: incremental
  object: 'warehouse.{stream_schema}_{stream_table}'

streams:
  # All user-related tables
  public.user_*:
    primary_key: [user_id]

  # All tables except sensitive ones
  public.*:
  public.passwords:
    disabled: true
  public.credit_cards:
    disabled: true

  # Specific tables with custom config
  public.large_table:
    source_options:
      chunk_size: 12h
```

--------------------------------

### Incremental Replication Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Sets up incremental replication, defining primary and update keys, and handling source/target options like empty values and new columns. Supports custom SQL for incremental logic.
```yaml
source: MYSQL_APP
target: POSTGRES_DW

defaults:
  mode: incremental
  object: 'warehouse.{stream_table}'
  primary_key: [id]
  update_key: updated_at
  source_options:
    empty_as_null: false
  target_options:
    column_casing: snake
    add_new_columns: true

streams:
  # Standard incremental
  app.users:
  app.orders:
    primary_key: [order_id]

  # Append-only (no primary key)
  app.events:
    primary_key: []
    update_key: created_at

  # Custom SQL with incremental
  app.user_summary:
    sql: |
      SELECT
        user_id,
        COUNT(*) as order_count,
        MAX(created_at) as last_order_date
      FROM orders
      WHERE updated_at > coalesce({incremental_value}, '1900-01-01')
      GROUP BY user_id
    update_key: last_order_date

env:
  SLING_THREADS: 3
  SLING_RETRIES: 2
```

--------------------------------

### Full Sling API Spec Structure with Authentication

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Illustrates a comprehensive Sling API specification, including queues, detailed authentication methods (basic and OAuth2), and endpoint configurations. This example covers more advanced setup options.
```yaml
# Name of the API
name: "Example API"

# Description of what the API does
description: "This API provides access to example data"

# Queues pass data between endpoints (write-then-read, temporary storage)
queues:
  - user_ids

# Authentication configuration for accessing the API
authentication:
  # Type of authentication: "basic", "oauth2", "aws-sigv4", "sequence", or empty for none
  type: "basic"

  # Basic authentication credentials
  username: "{secrets.username}"
  password: "{secrets.password}"

  # Alternative: OAuth2 Client Credentials Flow (most common for API integrations);
  # use these keys instead of the basic settings above, not alongside them
  type: "oauth2"
  flow: "client_credentials"
  client_id: "{secrets.oauth_client_id}"
  client_secret: "{secrets.oauth_client_secret}"
  authentication_url: "https://api.example.com/oauth/token"
  scopes: ["read:data", "write:data"]
  expires: 3600 # Re-auth interval in seconds; automatic before each request if expired
  refresh_on_expire: true # Auto-refresh OAuth2 tokens (requires refresh_token)
```

--------------------------------

### Install Sling Python Package

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PYTHON.md

Install the Sling Python package using pip. Include the `[arrow]` extra for Apache Arrow support for high-performance streaming.

```bash
pip install sling
```

```bash
# Quoted so that shells such as zsh do not expand the brackets
pip install "sling[arrow]"
```

--------------------------------

### File Discovery Workflow - Browse Root

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Start a file discovery process by listing the contents of the root directory of a specified connection. This is the first step in exploring storage.

```json
{
  "action": "list",
  "input": {
    "connection": "MY_STORAGE",
    "path": ""
  }
}
```

--------------------------------

### Directory Copy Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Use this to copy an entire directory recursively from S3 to a local path.
The `recursive` parameter must be set to `true`.

```json
{
  "action": "copy",
  "input": {
    "source_location": "s3/bucket/source_folder/",
    "target_location": "local/backup/target_folder/",
    "recursive": true
  }
}
```

--------------------------------

### Replication-Level Hooks

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Define commands or SQL queries to execute at the start and end of a replication process. Supports command-line execution and database queries.

```yaml
hooks:
  start:
    - type: command
      command: 'echo "Starting replication at $(date)"'

    - type: query
      connection: MY_DB
      # Single quotes for SQL string literals; double quotes denote identifiers in standard SQL
      sql: "UPDATE job_status SET status = 'running' WHERE job_id = 'repl_001'"

  end:
    - type: command
      command: 'echo "Replication completed at $(date)"'
```

--------------------------------

### Simple Database-to-Database Replication

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Example of a basic database-to-database replication. Configures source and target connections, default object naming, and specifies tables to replicate, including exclusions.

```yaml
source: MY_POSTGRES
target: MY_SNOWFLAKE

defaults:
  mode: full-refresh
  object: 'warehouse.{stream_schema}_{stream_table}'

streams:
  # Single table
  public.customers:

  # All tables in schema
  public.*:

  # Exclude specific table
  public.sensitive_data:
    disabled: true

env:
  SLING_THREADS: 5
```

--------------------------------

### Simple File-to-Database Replication

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Example of replicating data from files to a database. Specifies source and target, default object naming for files, and source options for CSV format.
```yaml
source: LOCAL
target: POSTGRES

defaults:
  mode: full-refresh
  object: 'staging.{stream_file_name}'
  source_options:
    format: csv
    header: true

streams:
  'file://data/customers.csv':
  'file://data/products.csv':
  'data/*.csv': # All CSV files in directory

env:
  SLING_THREADS: 3
```

--------------------------------

### Get Database Documentation

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md

Use the `docs` action to retrieve comprehensive documentation for a specific database. This action requires a Pro token and is subject to rate limits.

```json
{
  "action": "docs",
  "input": {}
}
```

--------------------------------

### State Variable Rendering Order Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Demonstrates the automatic topological sort for state variable dependencies. Ensure variables are resolved before requests are made. Circular dependencies will cause errors.

```yaml
state:
  base_url: "https://api.com" # No dependencies
  users_url: "{state.base_url}/users" # Depends on base_url
  full_url: "{state.users_url}?limit=100" # Depends on users_url
  # Renders: base_url → users_url → full_url

# ❌ Circular dependency error:
state:
  var_a: "{state.var_b}" # A → B
  var_b: "{state.var_a}" # B → A (circular!)
```

--------------------------------

### Date Parsing Examples

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Demonstrates how to parse date strings into time objects using 'auto' detection or a specified strftime format. Ensure the input string matches the provided format for successful parsing.
```yaml
# Auto-detect format
- expression: date_parse("05/15/2022", "auto")
  output: state.parsed_auto
```

```yaml
# Specify strftime format
- expression: date_parse("15-May-2022 10:30", "%d-%b-%Y %H:%M")
  output: state.parsed_specific
```

--------------------------------

### Date Formatting Examples

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Shows how to format a time object into a string using a specified strftime format. This is useful for creating human-readable dates or preparing dates for API requests.

```yaml
# Format a date object
- expression: date_format(state.parsed_auto, "%Y%m%d")
  output: state.formatted_compact
```

```yaml
# Format for API parameter (ISO 8601 with timezone)
request:
  parameters:
    updated_since: "{date_format(date_add(now(), -1, 'hour'), '%Y-%m-%dT%H:%M:%SZ')}"
```

--------------------------------

### Multi-Schema Replication Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Demonstrates replicating data across multiple schemas, including schema mapping and table exclusions. Configures target options like column casing and sort keys.
```yaml
source: ORACLE_ERP
target: REDSHIFT_DW

defaults:
  mode: incremental
  primary_key: [id]
  update_key: last_modified
  object: '{stream_schema}_data.{stream_table}'
  target_options:
    column_casing: lower
    table_keys:
      sort: [id, last_modified]

streams:
  # Finance schema
  finance.*:
    target_options:
      table_keys:
        sort: [account_id, transaction_date]

  # HR schema
  hr.*:
    object: 'human_resources.{stream_table}'

  # Sensitive table exclusions
  hr.salaries:
    disabled: true
  finance.audit_log:
    disabled: true

env:
  SLING_THREADS: 8
```

--------------------------------

### Simple Data Load and Query Pipeline

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PIPELINE.md

An example pipeline that first loads data from a CSV file into a PostgreSQL table and then queries the first 5 rows, conditional on the load success.

```yaml
steps:
  - type: run
    source: "file://data/users.csv"
    target: "MY_POSTGRES.public.users"
    mode: "full-refresh"
    id: load_data

  - type: query
    connection: "MY_POSTGRES"
    query: "SELECT * FROM public.users LIMIT 5;"
    if: state.load_data.status == "success"
```

--------------------------------

### Loop Over a List of Files

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PIPELINE.md

Iterate over a list of files using a 'group' step. This example processes each CSV file, loading its content into a staging table.

```yaml
- type: group
  loop:
    - "customers.csv"
    - "products.csv"
    - "orders.csv"
  steps:
    - type: log
      message: "Processing file {loop.index + 1}: {loop.value}"

    - type: run
      source: "file://data/{loop.value}"
      target: "STAGING.{loop.value.split('.')[0]}"
      mode: "truncate"
```

--------------------------------

### Multi-Cloud Copy Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Copy files between different cloud providers, such as AWS S3 and Google Cloud Storage. The action handles the underlying transfer.
```json
{
  "action": "copy",
  "input": {
    "source_location": "AWS_S3/data/export.csv",
    "target_location": "GCS_BUCKET/imports/data.csv"
  }
}
```

--------------------------------

### Authorization Code Flow (Interactive Mode)

Source: https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/api/OAUTH2_EXAMPLES.md

For CLI applications. Leave redirect_uri empty to automatically start a local server and open your browser for authentication.

```yaml
authentication:
  type: oauth2
  flow: authorization_code
  client_id: "${secrets.OAUTH_CLIENT_ID}"
  client_secret: "${secrets.OAUTH_CLIENT_SECRET}"
  authentication_url: "https://api.example.com/oauth/token"
  # redirect_uri: "" # Leave empty for automatic localhost server
  scopes:
    - "read:data"
```

--------------------------------

### Dynamic Endpoints Generation

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

This example shows how to dynamically generate API endpoints at runtime based on discovery or catalog APIs. It first fetches a list of items (e.g., tables) and then iterates over this list to create individual endpoints, each templated with data from the iterated item.
```yaml
dynamic_endpoints:
  - setup: # Optional: fetch list of items
      - request:
          url: "{state.base_url}/catalog"
        response:
          processors:
            - expression: 'jmespath(response.json, "tables")'
              output: "state.table_list"

    iterate: "state.table_list" # Or JSON literal: '["a","b","c"]'
    into: "state.current_table" # Must be state.*

    endpoint: # Template with {state.current_table} access
      name: "{state.current_table.name}"
      description: "Table: {state.current_table.description}"
      state:
        table_id: "{state.current_table.id}"
      request:
        url: "{state.base_url}/data/{state.table_id}"
      response:
        records:
          jmespath: "rows[]"
          primary_key: ["id"]
```

--------------------------------

### Backfill Historical Data Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Configures backfill replication for historical data with specified date ranges and chunking strategies. Adjust thread count for conservative historical loads.

```yaml
source: POSTGRES_OLD
target: BIGQUERY_DW

defaults:
  mode: backfill
  object: 'historical.{stream_table}'
  update_key: created_date
  source_options:
    range: '2020-01-01,2023-12-31'
    chunk_size: 30d # 30-day chunks

streams:
  transactions:
    primary_key: [transaction_id]

  user_activity:
    source_options:
      chunk_size: 7d # Smaller chunks for large table

env:
  SLING_THREADS: 2 # Conservative for historical loads
```

--------------------------------

### CSV Import Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Configures importing CSV files into a database, specifying source options like format, header, encoding, and handling of empty values. Defines column types for specific files.
```yaml
source: LOCAL
target: POSTGRES

defaults:
  mode: full-refresh
  object: 'staging.{stream_file_name}'
  source_options:
    format: csv
    header: true
    encoding: utf8
    empty_as_null: true
    skip_blank_lines: true

streams:
  'data/customers.csv':
    columns:
      customer_id: bigint
      email: string(255)
      created_at: datetime

  'data/products.csv':
    target_options:
      column_casing: snake

  # All CSV files in directory
  'imports/*.csv':

env:
  SLING_THREADS: 3
```

--------------------------------

### Start MCP Server with Sling CLI

Source: https://context7.com/slingdata-io/sling-cli/llms.txt

Launch the Model Context Protocol (MCP) server using the 'sling serve mcp' command. This server supports stdio transport and is compatible with MCP clients.

```bash
# Start the MCP server (stdio transport)
sling serve mcp
```

--------------------------------

### Replication Hooks Configuration

Source: https://context7.com/slingdata-io/sling-cli/llms.txt

Define shell commands, SQL queries, or HTTP calls to execute before or after a replication or its individual streams. Supports 'start' and 'end' hooks at the replication level, and 'pre' and 'post' hooks under `defaults` or on individual streams.
```yaml
source: MY_PG
target: MY_SNOWFLAKE

hooks:
  start:
    - type: query
      connection: MY_SNOWFLAKE
      sql: "UPDATE job_log SET status='running', started_at=NOW() WHERE job='daily_sync'"
    - type: http
      url: "https://hooks.slack.com/services/T00/B00/XXXX"
      method: POST
      body: '{"text": "Sling replication started"}'
  end:
    - type: query
      connection: MY_SNOWFLAKE
      sql: "UPDATE job_log SET status='done', ended_at=NOW() WHERE job='daily_sync'"
    - type: command
      command: 'echo "Replication finished at $(date)"'

defaults:
  mode: incremental
  update_key: updated_at
  primary_key: [id]
  hooks:
    pre:
      - type: query
        connection: MY_SNOWFLAKE
        sql: "CREATE SCHEMA IF NOT EXISTS analytics"
    post:
      - type: query
        connection: MY_SNOWFLAKE
        sql: "ANALYZE {object_name}"

streams:
  public.users:
  public.orders:
    hooks:
      post:
        - type: command
          command: 'python validate_orders.py'
```

--------------------------------

### Backup and Sync Pattern - Inspect Source

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Inspect the source directory recursively to understand its size and contents before creating a backup. This helps in planning and verification.

```json
{
  "action": "inspect",
  "input": {
    "connection": "LOCAL",
    "path": "/important/data/",
    "recursive": true
  }
}
```

--------------------------------

### List All Connections

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION.md

Retrieve a list of all configured connections across different sources like environment files, dbt profiles, and environment variables.

```json
{
  "action": "list",
  "input": {}
}
```

--------------------------------

### Compile Sling CLI from Source on Linux/Mac

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Clone the Sling CLI repository, navigate into the directory, and build the project using the provided script. Run `./sling --help` to verify.
```bash
git clone https://github.com/slingdata-io/sling-cli.git
cd sling-cli
bash scripts/build.sh

./sling --help
```

--------------------------------

### Test Sling Connectivity

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/DEBUG.md

Tests the connectivity of Sling. Use this to ensure proper connection setup before executing queries.

```bash
./sling conns test
```

--------------------------------

### Data Migration Workflow - List Source Files

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Begin a data migration by listing all files recursively from the source storage. This provides an overview of the data to be moved.

```json
{
  "action": "list",
  "input": {
    "connection": "OLD_STORAGE",
    "path": "legacy_data/",
    "recursive": true
  }
}
```

--------------------------------

### Get Value by Path

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Retrieve a value from a nested object using dot notation with `get_path()`. The path can include array indices.

```sling
get_path(response.json, "user.profile.email")
```

--------------------------------

### Load Replication Config from File in Go

Source: https://context7.com/slingdata-io/sling-cli/llms.txt

Loads a replication configuration from a YAML file, compiles it to resolve wildcards and apply defaults, and iterates through the generated tasks. Ensure 'replication.yaml' exists and is correctly formatted.
```go
package main

import (
	"github.com/slingdata-io/sling-cli/core/sling"
	"github.com/flarco/g"
)

func main() {
	// Load replication config from YAML file
	replication, err := sling.LoadReplicationConfigFromFile("replication.yaml")
	if err != nil {
		g.LogFatal(err)
	}

	// Compile: resolve wildcards, apply defaults, build task list
	err = replication.Compile(nil) // pass nil or a *sling.Config overwrite
	if err != nil {
		g.LogFatal(err)
	}

	g.Info("Will run %d streams", len(replication.Tasks))

	// Iterate compiled tasks
	for _, task := range replication.Tasks {
		g.Info("Stream: %s -> %s", task.Source.Stream, task.Target.Object)
	}
}
```

--------------------------------

### Inspect Files or Directories Recursively

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md

Get metadata and statistics for files or directories. Set `recursive` to `true` to include all contents of a directory.

```json
{
  "action": "inspect",
  "input": {
    "connection": "MY_S3",
    "path": "data/large_dataset/",
    "recursive": true
  }
}
```

--------------------------------

### Load Replication Configuration from YAML

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PYTHON.md

Initialize a Replication object by specifying a file path to a YAML configuration file.

```python
from sling import Replication

# Build the replication from a YAML file, then execute it
replication = Replication(file_path="replication.yaml")
replication.run()
```

--------------------------------

### Conditional Execution with 'if'

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PIPELINE.md

Use the 'if' key to conditionally execute a step based on an expression. This example checks if the day of the week is a weekend.
```yaml
steps:
  - type: command
    command: "date +%u" # Returns 1-7 (Mon-Sun)
    id: "day_of_week"

  - type: replication
    path: "weekend_job.yaml"
    if: "cast(state.day_of_week.output.stdout, 'int') > 5" # Only run on Sat or Sun
```

--------------------------------

### Explore Tables within a Schema

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md

Get a list of tables within a specific schema. This action helps in navigating and understanding the structure of the database.

```json
{
  "action": "get_schemata",
  "input": {
    "connection": "MY_DB",
    "level": "table",
    "schema_name": "production"
  }
}
```

--------------------------------

### Basic MCP Tool Usage

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION.md

Illustrates the general JSON structure for interacting with MCP tools, specifying the tool, action, and input parameters.

```json
{
  "tool": "connection",
  "action": "action_name",
  "input": {
    "parameter1": "value1",
    "parameter2": "value2"
  }
}
```

--------------------------------

### Get Schemata with Schema Level Detail

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md

List all available schemas within a database connection. This provides a high-level overview of the database organization.

```json
{
  "action": "get_schemata",
  "input": {
    "connection": "MY_POSTGRES",
    "level": "schema"
  }
}
```

--------------------------------

### Iterate Over ID Chunks

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Example of iterating over batches of IDs from a queue using the `chunk` function. The batch is joined into a comma-separated string for the API parameter.
```yaml
iterate:
  # Use the chunk() function to process IDs in batches of 50
  over: "chunk(queue.variant_ids, 50)"
  into: "state.variant_id_batch" # state.variant_id_batch will be an array/list
  concurrency: 5

request:
  parameters:
    # Join the batch of IDs into a comma-separated string for the API parameter
    ids: '{join(state.variant_id_batch,",")}'
```

--------------------------------

### Iterate Over Date Range

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Example of iterating over a date range using the `range` function. The current date is stored in `state.current_day` and formatted for the request parameters.

```yaml
iterate:
  over: >
    range(
      date_trunc(date_add(now(), -7, "day"), "day"), # Start date: 7 days ago
      date_trunc(now(), "day"), # End date: today
      "1d" # Step: 1 day
    )
  into: state.current_day
  concurrency: 10

request:
  parameters:
    date: '{date_format(state.current_day, "%Y-%m-%d")}'
    # ... other params ...
```

--------------------------------

### Endpoint Definition Example

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Defines an API endpoint within the Sling configuration. The key used for the endpoint (e.g., 'users') serves as its internal name.

```yaml
endpoints:
  # The key 'users' is the effective name
  users:
    description: "Retrieve users from the API"
    # ... other endpoint config ...

  # The key 'get_details' is the effective name
  get_details:
    description: "Get item details"
    # ... other endpoint config ...
```

--------------------------------

### Run Replication with Connection Test

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

This snippet demonstrates running a replication job after testing a connection. It uses the 'run' action with a specified file path.
```json
{
  "action": "run",
  "input": {
    "file_path": "/path/to/replication.yaml"
  }
}
```

--------------------------------

### Execute Replication

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md

Use the 'run' action to execute a replication job based on a configuration file. Options include selecting specific streams, setting a working directory, specifying a backfill range, overriding the mode, and passing environment variables.

```json
{
  "action": "run",
  "input": {
    "file_path": "/path/to/replication.yaml",
    "select_streams": ["specific_table"],
    "working_dir": "/project/directory",
    "range": "2024-01-01,2024-01-31",
    "mode": "incremental",
    "env": {
      "CUSTOM_VAR": "value",
      "SLING_THREADS": "5"
    }
  }
}
```

--------------------------------

### Sling MCP Database Tool Actions

Source: https://context7.com/slingdata-io/sling-cli/llms.txt

Query databases in a read-only manner. Supports querying data, retrieving schemata, and getting table columns.

```json
{
  "action": "query",
  "input": {
    "connection": "MY_PG",
    "query": "SELECT status, COUNT(*) FROM orders GROUP BY status",
    "description": "Get order counts by status",
    "limit": 100
  }
}
```

```json
{
  "action": "get_schemata",
  "input": {
    "connection": "MY_PG",
    "level": "column",
    "schema_name": "public",
    "table_names": ["users", "orders"]
  }
}
```

```json
{
  "action": "get_columns",
  "input": {
    "connection": "MY_PG",
    "table_name": "public.orders"
  }
}
```

--------------------------------

### Compile Sling CLI from Source on Windows (PowerShell)

Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md

Clone the Sling CLI repository, change directory, and build the project using the PowerShell script. Execute `.\sling --help` to confirm.
```powershell
git clone https://github.com/slingdata-io/sling-cli.git
cd sling-cli

.\scripts\build.ps1

.\sling --help
```

--------------------------------

### Get Schemata with Specific Tables and Levels

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md

Retrieve schema information for specific tables and levels of detail. This allows for targeted exploration of your database structure.

```json
{
  "action": "get_schemata",
  "input": {
    "connection": "MY_POSTGRES",
    "level": "table",
    "schema_name": "public",
    "table_names": ["users", "orders"]
  }
}
```

--------------------------------

### Get Schemata with Table Level Detail

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md

Retrieve a list of tables within a specific schema. This is useful for understanding the structure of your database at the table level.

```json
{
  "action": "get_schemata",
  "input": {
    "connection": "MY_POSTGRES",
    "level": "table",
    "schema_name": "public"
  }
}
```

--------------------------------

### Sync State Validation: Valid Configuration

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

Ensures that each key in the `sync` array has a corresponding processor that writes to `state.`. This example shows a valid configuration.

```yaml
endpoints:
  valid:
    sync: ["last_id"]
    response:
      processors:
        - expression: "record.id"
          output: "state.last_id" # ✅ Matches sync key
          aggregation: "last"
```

--------------------------------

### Backfill with Incremental Fallback using Context Variables

Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md

This configuration demonstrates how to set up a replication process that supports backfilling historical data and falling back to incremental updates.
It utilizes context variables like `context.range_start`, `context.range_end`, and `sync.last_date` to define the iteration range, with `coalesce` ensuring a fallback mechanism. ```yaml endpoints: events: sync: [last_date] iterate: over: > range( coalesce(context.range_start, sync.last_date, "2024-01-01"), # Priority order coalesce(context.range_end, date_format(now(), "%Y-%m-%d")), "1d" ) into: "state.current_date" response: processors: - expression: "state.current_date" output: "state.last_date" aggregation: "maximum" # Replication config for backfill: # source_options: # range: '2024-01-01,2024-01-31' # Sets context.range_start/end ``` -------------------------------- ### Default Table Keys Configuration Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md Define default table keys for primary, index, cluster, and partition columns, with specific examples for different database systems. ```yaml defaults: target_options: table_keys: primary: [id] index: [customer_id, created_at] cluster: [region] # BigQuery/Snowflake partition: [date_column] # PostgreSQL/ClickHouse ``` -------------------------------- ### File Discovery Workflow - Inspect Files Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md Inspect specific files identified during the discovery process to get their metadata. This is useful for understanding file properties before further action. ```json { "action": "inspect", "input": { "connection": "MY_STORAGE", "path": "data/2024/large_dataset.parquet" } } ``` -------------------------------- ### Backup and Sync Pattern - Create Backup Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md Create a timestamped backup of local data to cloud storage recursively. This ensures that backups are organized by date. 
```json { "action": "copy", "input": { "source_location": "LOCAL//important/data/", "target_location": "BACKUP_S3/backups/2024-08-24/data/", "recursive": true } } ``` -------------------------------- ### Directory Statistics Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md Get recursive statistics for a local directory, including total size, file count, and directory count. This is useful for understanding directory contents. ```json { "action": "inspect", "input": { "connection": "LOCAL_FS", "path": "/var/log/", "recursive": true } } ``` -------------------------------- ### Backup and Sync Pattern - Verify Backup Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_FILE.md Verify the integrity and size of the created backup by inspecting the backup location recursively. This confirms the backup process completed successfully. ```json { "action": "inspect", "input": { "connection": "BACKUP_S3", "path": "backups/2024-08-24/", "recursive": true } } ``` -------------------------------- ### Aggregate Data with IF Condition Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md Example of using an IF condition before an expression for aggregation. The processor is skipped if the IF condition is false, and the aggregation is applied to the 'state.total_amount' if the condition is true and the expression evaluates. 
```yaml processors: # Example using aggregation with IF condition - expression: "record.amount" # Optional: Control whether processor is evaluated with IF condition # - Evaluated BEFORE the main expression # - Must return a boolean value (true/false) # - If false, this processor is COMPLETELY SKIPPED for the current record # - Has access to full state map: record, state, response, env, secrets # - Common use: Filter records, skip nulls, conditional logic if: '!is_null(record.amount) && record.amount > 0' # Target output for aggregation must be 'state.' output: "state.total_amount" # Aggregation type: maximum, minimum, collect, first, last (default: none) aggregation: "maximum" ``` -------------------------------- ### Using 'store' for Cross-Step Communication Source: https://context7.com/slingdata-io/sling-cli/llms.txt Utilize the 'store' step to save values and access them in subsequent steps using the 'store.' syntax, enabling inter-step data sharing. ```yaml steps: # Use store values for cross-step communication - type: store key: target_env value: "production" - type: log message: "Deploying to {store.target_env}" if: "store.target_env == 'production'" ``` -------------------------------- ### Structuring Configuration with Defaults Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md Define common replication patterns in a 'defaults' section to reduce repetition. Override defaults only when necessary for specific streams. 
```yaml # Define common patterns in defaults defaults: mode: incremental object: 'warehouse.{stream_schema}_{stream_table}' primary_key: [id] update_key: updated_at target_options: column_casing: snake add_new_columns: true # Override only when necessary streams: public.users: # Uses all defaults public.logs: # Override for specific needs mode: full-refresh primary_key: [] ``` -------------------------------- ### Daily Data Warehouse Load Pipeline Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PIPELINE.md An example pipeline that performs a data warehouse load, runs dbt models, and sends Slack notifications for success or failure. ```yaml env: SLACK_WEBHOOK_URL: "..." steps: - type: log message: "Starting daily data warehouse load" - type: replication path: "replications/pg_to_sflake.yaml" mode: "incremental" id: dw_load - type: command command: "dbt run --select my_models" working_dir: "/path/to/dbt/project" if: state.dw_load.status == "success" - type: http url: "{env.SLACK_WEBHOOK_URL}" method: "POST" payload: '{"text": "Daily DW load completed successfully!"}' if: state.dw_load.status == "success" - type: http url: "{env.SLACK_WEBHOOK_URL}" method: "POST" payload: '{"text": "ERROR: Daily DW load failed!"}' if: state.dw_load.status == "error" ``` -------------------------------- ### Configure Source and Target Options Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_PYTHON.md Customize source and target behavior using `SourceOptions` and `TargetOptions`, specifying formats, delimiters, compression, and file settings. 
```python src_opts = SourceOptions( format=Format.CSV, delimiter="|", header=True, null_if="NULL", ) tgt_opts = TargetOptions( format=Format.PARQUET, compression=Compression.ZSTD, file_max_rows=100000, column_casing="snake" ) sling = Sling( src_stream="file://input.csv", src_options=src_opts, tgt_object="file://output.parquet", tgt_options=tgt_opts ) sling.run() ``` -------------------------------- ### MySQL: Get Database Size by Schema Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md Calculate and display the size of each database schema in megabytes for a MySQL instance, providing a storage overview. Requires a MySQL connection. ```json // Get database size { "action": "query", "input": { "connection": "MY_MYSQL", "query": "SELECT table_schema, ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'DB Size in MB' FROM information_schema.tables GROUP BY table_schema", "description": "Get size of each database schema in MB for storage overview" } } ``` -------------------------------- ### Programmatically Configure and Execute Sling Task in Go Source: https://context7.com/slingdata-io/sling-cli/llms.txt Builds a Sling configuration object programmatically, prepares it for execution by resolving connections and validating settings, determines the task type, and then executes the task. This is useful for dynamic task creation. 
```go package main import ( "github.com/slingdata-io/sling-cli/core/sling" "github.com/flarco/g" ) func main() { // Build config programmatically cfg := &sling.Config{ Source: sling.Source{ Conn: "MY_POSTGRES", Stream: "public.transactions", UpdateKey: "updated_at", PrimaryKeyI: []string{"transaction_id"}, Options: &sling.SourceOptions{ Limit: g.Int(10000), }, }, Target: sling.Target{ Conn: "MY_SNOWFLAKE", Object: "analytics.transactions", Options: &sling.TargetOptions{ AddNewColumns: g.Bool(true), UseBulk: g.Bool(true), }, }, Mode: sling.IncrementalMode, Env: map[string]string{ "SLING_THREADS": "5", }, } // Prepare resolves connections, validates config, sets defaults err := cfg.Prepare() if err != nil { g.LogFatal(err, "could not prepare config") } // Determine task type: DbToDb, FileToDB, DbToFile, ApiToDB, etc. taskType, err := cfg.DetermineType() if err != nil { g.LogFatal(err) } g.Info("Task type: %s", taskType) // Task type: db-db // Execute task := sling.NewTask("my-exec-id", cfg) if err = task.Execute(); err != nil { g.LogFatal(err) } g.Info("Done. Rows: %d", task.GetCount()) } ``` -------------------------------- ### Sample Data from a Table Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md Preview a small number of rows from a table to inspect data values and format. This is a common step in basic data exploration. ```json { "action": "query", "input": { "connection": "MY_DB", "query": "SELECT * FROM production.users LIMIT 3", "description": "Sample 3 rows from production.users to inspect data values and format" } } ``` -------------------------------- ### Discover Available Streams Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md Use the discover action to list available streams from a source connection, optionally filtering by a pattern. 
```json { "action": "discover", "input": { "connection": "MY_SOURCE_DB", "pattern": "schema.*" } } ``` -------------------------------- ### Advanced Transformations with Staged Logic Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md Define complex business logic using staged transformations. This example shows data cleansing, computed metrics, segmentation, and risk scoring. ```yaml streams: customer_analytics: transforms: # Stage 1: Data cleansing - email: 'lower(trim_space(value))' phone: 'replace(value, "[^0-9+]", "")' name: 'trim_space(value)' # Stage 2: Computed metrics - days_since_signup: 'date_diff(now(), record.created_at, "day")' lifetime_value: 'coalesce(record.total_spent, 0) * 1.2' # Stage 3: Segmentation - customer_segment: | record.lifetime_value >= 10000 ? "enterprise" : ( record.lifetime_value >= 1000 ? "professional" : ( record.days_since_signup <= 30 ? "new" : "standard" ) ) # Stage 4: Risk scoring - risk_score: | (record.failed_payments * 0.3) + (record.days_since_last_login * 0.1) + (record.support_tickets * 0.2) ``` -------------------------------- ### Build Sling Binary Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/DEBUG.md Builds the sling binary in the cmd/sling directory. This is a prerequisite for debugging. ```bash cd cmd/sling go build . ``` -------------------------------- ### Enable Debug Logging for Run Action Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_REPLICATION.md Run a replication with the DEBUG environment variable set to true to enable detailed debug logging.
```json { "action": "run", "input": { "file_path": "/path/to/replication.yaml", "env": { "DEBUG": "true" } } } ``` -------------------------------- ### Query Data Distribution by Status Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md Analyze the distribution of data across different categories, such as order statuses, to understand data breakdown. This example groups by status and limits results. ```json { "action": "query", "input": { "connection": "MY_DB", "query": "SELECT status, COUNT(*) as count FROM sales.orders GROUP BY status ORDER BY count DESC LIMIT 10", "description": "Analyze order status distribution to understand the breakdown of order states" } } ``` -------------------------------- ### get_user_details Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_API_SPEC.md Retrieve details for each user ID from the queue. This endpoint iterates over user IDs provided in the 'user_ids' queue, making concurrent GET requests to fetch user details. ```APIDOC ## GET /users/{state.current_user_id} ### Description Retrieve details for each user ID from the queue. This endpoint iterates over user IDs provided in the 'user_ids' queue, making concurrent GET requests to fetch user details. ### Method GET ### Endpoint {state.base_url}/users/{state.current_user_id} ### Parameters #### Path Parameters - **state.current_user_id** (string) - Required - The ID of the user to retrieve details for, iterated from the 'user_ids' queue. ### Request Example ```json { "example": "No request body for GET request" } ``` ### Response #### Success Response (200) - **user** (object) - Contains the user details. - **id** (string) - The unique identifier for the user. 
#### Response Example ```json { "user": { "id": "123e4567-e89b-12d3-a456-426614174000", "name": "John Doe", "email": "john.doe@example.com" } } ``` ### Iteration Configuration - **over**: "queue.user_ids" - **into**: "state.current_user_id" - **concurrency**: 10 ``` -------------------------------- ### Query Total Records in a Table Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md Use this action to get the total row count of a table to understand data volume. Requires a valid connection name and SQL query. ```json { "action": "query", "input": { "connection": "MY_DB", "query": "SELECT COUNT(*) as total_records FROM sales.orders", "description": "Get total row count of sales.orders to understand data volume" } } ``` -------------------------------- ### Run Specific Test Numbers/Ranges with Go Test Source: https://github.com/slingdata-io/sling-cli/blob/main/README.md Select individual test numbers, ranges, or subsequent tests by passing a selector after the `--` separator in `go test`.
```bash
# Individual test numbers
go test -v -run TestCLI -- "1,2,3"
# A range of tests
go test -v -run TestSuiteFileS3 -- "1-5"
# A test and the subsequent ones
go test -v -run TestCLI -- "3+"
```
-------------------------------- ### Get Schema Names Source: https://github.com/slingdata-io/sling-cli/blob/main/cmd/sling/resource/llm_CONNECTION_DATABASE.md Retrieve a simple list of all schema names available in the specified database connection. This is a quick way to see available schemas without detailed metadata. ```json { "action": "get_schemas", "input": { "connection": "MY_POSTGRES" } } ```
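The database tool requests above all share one shape: an `action` name plus an `input` object carrying the `connection` and any action-specific fields. As a minimal sketch, assuming you assemble these payloads yourself before handing them to an MCP client, the hypothetical helper `build_request` (not part of Sling) reproduces that shape with the standard library only:

```python
import json


def build_request(action: str, connection: str, **extra) -> str:
    """Serialize a request in the {"action": ..., "input": {...}} shape used above.

    Hypothetical helper for illustration; it is not part of Sling.
    """
    payload = {"action": action, "input": {"connection": connection, **extra}}
    return json.dumps(payload, indent=2)


# List schema names first, then drill into one schema at table level.
print(build_request("get_schemas", "MY_POSTGRES"))
print(build_request("get_schemata", "MY_POSTGRES", level="table", schema_name="public"))
```

The helper only builds the JSON text. How the request is sent depends on the MCP client in use, which this sketch does not cover.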