### Install csv-diff using pip

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Installs the csv-diff package from PyPI. This is the primary method for getting the command-line tool.

```bash
pip install csv-diff
```

--------------------------------

### Docker: Build and Run csv-diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to build a Docker image for csv-diff and run it with volume mounting to compare files. Includes examples for basic usage, full options with JSON output, and TSV file processing.

```bash
# Build the Docker image
docker build -t csvdiff .

# Run with volume mount
docker run --rm -v $(pwd):/files csvdiff one.csv two.csv --key=id

# Example with full options
docker run --rm -v $(pwd):/files csvdiff \
  data/old.csv data/new.csv \
  --key=id \
  --json \
  --format=csv

# Using with TSV files
docker run --rm -v $(pwd):/files csvdiff \
  reports/2024-01.tsv reports/2024-02.tsv \
  --key=report_id \
  --format=tsv \
  --singular=report \
  --plural=reports
```

--------------------------------

### Build csv-diff Docker image

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Builds a Docker image for the csv-diff tool. This allows the tool to be run in an isolated environment without local installation.

```bash
$ docker build -t csvdiff .
```

--------------------------------

### Get JSON output for CSV differences

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files and outputs the differences in a machine-readable JSON format. This is useful for programmatic processing of diff results.

```bash
$ csv-diff one.csv two.csv --key=id --json
```

--------------------------------

### CLI: Custom Terminology for Output

Source: https://context7.com/simonw/csv-diff/llms.txt

Allows customization of singular and plural terms used in the human-readable output for domain-specific language.
```bash
cat > trees1.csv << EOF
id,name,age
1,Cleo,4
EOF

cat > trees2.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
4,Carl,7
EOF

csv-diff trees1.csv trees2.csv --key=id --singular=tree --plural=trees

# Output:
# 1 tree changed, 2 trees added
#
# 1 tree changed
#
#   id: 1
#     age: "4" => "5"
#
# 2 trees added
#
#   id: 3
#   name: Bailey
#   age: 1
#
#   id: 4
#   name: Carl
#   age: 7
```

--------------------------------

### CLI Command - Custom Terminology

Source: https://context7.com/simonw/csv-diff/llms.txt

Uses custom singular and plural terms for domain-specific output formatting.

```APIDOC
## CLI Command - Custom Terminology

### Description
Allows the use of custom singular and plural terms for domain-specific output formatting in the human-readable diff.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--singular` (string) - Optional - Custom singular term (e.g., `tree`).
- `--plural` (string) - Optional - Custom plural term (e.g., `trees`).

### Request Example
```bash
cat > trees1.csv << EOF
id,name,age
1,Cleo,4
EOF

cat > trees2.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
4,Carl,7
EOF

csv-diff trees1.csv trees2.csv --key=id --singular=tree --plural=trees
```

### Response
#### Success Response (Human-readable text)
Outputs differences using the specified custom singular and plural terms.

#### Response Example
```text
1 tree changed, 2 trees added

1 tree changed

  id: 1
    age: "4" => "5"

2 trees added

  id: 3
  name: Bailey
  age: 1

  id: 4
  name: Carl
  age: 7
```
```

--------------------------------

### CLI Command - Show Unchanged Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Displays unchanged field values alongside changes for complete context.
```APIDOC
## CLI Command - Show Unchanged Fields

### Description
Includes unchanged field values in the output for changed rows, providing complete context when reviewing modifications.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--show-unchanged` - Optional - Flag to include unchanged fields in the output for changed rows.

### Request Example
```bash
csv-diff one.csv two.csv --key=id --show-unchanged
```

### Response
#### Success Response (Human-readable text)
Outputs changed rows, also listing the fields that remained unchanged for that row.

#### Response Example
```text
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"
```
```

--------------------------------

### CLI: Show Unchanged Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Includes unchanged field values in the output alongside the modifications, providing complete context for reviewed changes.

```bash
csv-diff one.csv two.csv --key=id --show-unchanged

# Output:
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
#     Unchanged:
#       name: "Cleo"
```

--------------------------------

### CLI Command - JSON Output

Source: https://context7.com/simonw/csv-diff/llms.txt

Outputs differences in machine-readable JSON format for programmatic processing.

```APIDOC
## CLI Command - JSON Output

### Description
Outputs differences in machine-readable JSON format for programmatic processing and integration with other tools.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--json` - Required - Flag to output results in JSON format.

### Request Example
```bash
csv-diff one.csv two.csv --key=id --json
```

### Response
#### Success Response (JSON)
Returns a JSON object listing added, removed, and changed rows, plus any added or removed columns.

#### Response Example
```json
{
  "added": [
    {
      "id": "3",
      "name": "Bailey",
      "age": "1"
    }
  ],
  "removed": [
    {
      "id": "2",
      "name": "Pancakes",
      "age": "2"
    }
  ],
  "changed": [
    {
      "key": "1",
      "changes": {
        "age": [
          "4",
          "5"
        ]
      }
    }
  ],
  "columns_added": [],
  "columns_removed": []
}
```
```

--------------------------------

### Run csv-diff Docker container

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Runs the csv-diff Docker container with the current directory mounted at `/files` so the container can read the CSV files, then diffs `one.csv` against `two.csv`.

```bash
$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
```

--------------------------------

### CLI Command - Basic Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files and displays human-readable differences using a specified unique key column.

```APIDOC
## CLI Command - Basic Diff

### Description
Compares two CSV files and displays human-readable differences with a specified unique key column.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--json` - Optional - Flag to output JSON instead of the default human-readable text.
### Request Example
```bash
cat > one.csv << EOF
id,name,age
1,Cleo,4
2,Pancakes,2
EOF

cat > two.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
EOF

csv-diff one.csv two.csv --key=id
```

### Response
#### Success Response (Human-readable text)
Lists added, removed, and changed rows, including the specific field modifications.

#### Response Example
```text
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2
```
```

--------------------------------

### CLI: Add Custom Formatted Fields with CSV-Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to use the `--extra` option with templated Python format strings to add custom fields to the output, such as search URLs. Each extra field is generated from the content of its row.

```bash
cat > animals1.json << EOF
[
  {"id": 1, "name": "Cleo", "type": "dog"},
  {"id": 2, "name": "Suna", "type": "chicken"}
]
EOF

cat > animals2.json << EOF
[
  {"id": 2, "name": "Suna", "type": "pretty chicken"},
  {"id": 3, "name": "Artie", "type": "bunny"}
]
EOF

csv-diff animals1.json animals2.json --key=id --format=json \
  --extra search "https://www.google.com/search?q={name}"

# Output:
# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 2
#     type: "chicken" => "pretty chicken"
#   extras:
#     search: https://www.google.com/search?q=Suna
#
# 1 row added
#
#   id: 3
#   name: Artie
#   type: bunny
#   extras:
#     search: https://www.google.com/search?q=Artie
#
# 1 row removed
#
#   id: 1
#   name: Cleo
#   type: dog
#   extras:
#     search: https://www.google.com/search?q=Cleo
```

--------------------------------

### CLI: Basic CSV Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files and displays human-readable differences using a specified unique key column. Outputs added, removed, and changed rows with details of modifications.
```bash
# Create sample CSV files
cat > one.csv << EOF
id,name,age
1,Cleo,4
2,Pancakes,2
EOF

cat > two.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
EOF

# Run comparison
csv-diff one.csv two.csv --key=id

# Output:
# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
# 1 row added
#
#   id: 3
#   name: Bailey
#   age: 1
#
# 1 row removed
#
#   id: 2
#   name: Pancakes
#   age: 2
```

--------------------------------

### CLI Command - Extra Templated Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to add custom formatted fields to the diff output using Python format strings, for links or additional metadata. This is useful for enriching the output with external references or values derived from each row.

```APIDOC
## CLI Command - Extra Templated Fields

### Description
Add custom formatted fields to output using Python format strings for links or additional metadata.

### Method
CLI Command

### Endpoint
N/A

### Parameters
#### Query Parameters
- **--extra** (name, template) - Optional - Adds an extra field called `name` to each row in the output. `template` is a Python format string whose `{column}` placeholders are filled in from that row.
- **--key** (string) - Required - The key column to use for diffing.
- **--format** (string) - Optional - The input format (e.g., `json`). It is used here because the inputs are JSON files.

### Request Example
```bash
cat > animals1.json << EOF
[
  {"id": 1, "name": "Cleo", "type": "dog"},
  {"id": 2, "name": "Suna", "type": "chicken"}
]
EOF

cat > animals2.json << EOF
[
  {"id": 2, "name": "Suna", "type": "pretty chicken"},
  {"id": 3, "name": "Artie", "type": "bunny"}
]
EOF

csv-diff animals1.json animals2.json --key=id --format=json \
  --extra search "https://www.google.com/search?q={name}"
```

### Response
#### Success Response (Human-readable text)
Lists added, removed, and changed rows as in the basic diff. Each row includes an `extras:` section containing the rendered extra fields.
#### Response Example
```text
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 2
    type: "chicken" => "pretty chicken"
  extras:
    search: https://www.google.com/search?q=Suna

1 row added

  id: 3
  name: Artie
  type: bunny
  extras:
    search: https://www.google.com/search?q=Artie

1 row removed

  id: 1
  name: Cleo
  type: dog
  extras:
    search: https://www.google.com/search?q=Cleo
```
```

--------------------------------

### CLI Command - JSON File Comparison

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares JSON array files, handling nested objects and arrays.

```APIDOC
## CLI Command - JSON File Comparison

### Description
Compares JSON array files where nested objects and arrays are automatically serialized for comparison.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first JSON file.
- `file2` (string) - Required - Path to the second JSON file.

#### Query Parameters
- `--key` (string) - Required - The field name within the JSON objects to use as the unique key for matching records.
- `--format` (string) - Required - Set to `json` to indicate JSON file input.
- `--json` - Optional - Flag to output results in JSON format.

### Request Example
```bash
# Create JSON files
cat > one.json << EOF
[
  {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
  {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]
EOF

cat > two.json << EOF
[
  {"id": 1, "name": "Cleo", "nested": {"foo": 3, "bar": 5}, "extra": 1},
  {"id": 2, "name": "Pancakes!", "nested": {"foo": 3}, "extra": 1}
]
EOF

# Compare JSON files
csv-diff one.json two.json --key=id --format=json --json
```

### Response
#### Success Response (JSON)
Returns a JSON object detailing differences between the JSON array elements, including changes in nested structures.
#### Response Example
```json
{
  "added": [],
  "removed": [],
  "changed": [
    {
      "key": 1,
      "changes": {
        "nested": ["{\"foo\": 3}", "{\"foo\": 3, \"bar\": 5}"]
      }
    },
    {
      "key": 2,
      "changes": {
        "name": ["Pancakes", "Pancakes!"],
        "extra": [null, 1]
      }
    }
  ],
  "columns_added": [],
  "columns_removed": []
}
```
```

--------------------------------

### Compare two CSV files from the command line

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files, `one.csv` and `two.csv`, using `id` as the key column. Outputs a human-readable summary of added, removed, and changed rows.

```bash
$ csv-diff one.csv two.csv --key=id
```

--------------------------------

### CLI Command - TSV and Format Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Automatically detects TSV format or allows explicit specification.

```APIDOC
## CLI Command - TSV and Format Detection

### Description
Automatically detects tab-separated values (TSV) or allows explicit specification of the file format for proper parsing.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first TSV file.
- `file2` (string) - Required - Path to the second TSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--format` (string) - Optional - Explicitly set the file format (e.g., `tsv`, `csv`, `json`). Defaults to auto-detection.
- `--json` - Optional - Flag to output results in JSON format.

### Request Example
```bash
# Create TSV files. printf turns each \t into a real tab character;
# a heredoc would write a literal backslash followed by "t" instead.
printf 'id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n' > data1.tsv
printf 'id\tname\tage\n1\tCleo\t5\n2\tPancakes\t2\n' > data2.tsv

# Auto-detect format
csv-diff data1.tsv data2.tsv --key=id --json

# Explicitly specify TSV format
csv-diff data1.tsv data2.tsv --key=id --format=tsv --json
```

### Response
#### Success Response (JSON)
Returns a JSON object with the same structure as the basic JSON output, parsed from TSV input.
#### Response Example
```json
{
  "added": [],
  "removed": [],
  "changed": [
    {
      "key": "1",
      "changes": {
        "age": ["4", "5"]
      }
    }
  ],
  "columns_added": [],
  "columns_removed": []
}
```
```

--------------------------------

### Python API - Schema Change Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Illustrates how the `compare` function automatically detects and reports schema changes, such as added or removed columns, when comparing datasets with different structures.

```APIDOC
## Python API - Schema Change Detection

### Description
Automatically detect and report added or removed columns when comparing datasets with different schemas.

### Method
`compare(previous_data, current_data)`

### Endpoint
N/A (Python function)

### Parameters
N/A

### Request Example
```python
from csv_diff import load_csv, compare
import io

previous = """id,name,age
1,Cleo,5
3,Bailey,1"""

current = """id,name,weight
1,Cleo,48
3,Bailey,20"""

prev_data = load_csv(io.StringIO(previous), key="id")
curr_data = load_csv(io.StringIO(current), key="id")

diff = compare(prev_data, curr_data)
print(diff)
```

### Response
#### Success Response (200)
- **diff** (dict): The comparison result dictionary will include `columns_added` and `columns_removed` lists indicating schema differences.

#### Response Example
```python
{
    'added': [],
    'removed': [],
    'changed': [],  # Schema changes are ignored in row comparisons
    'columns_added': ['weight'],
    'columns_removed': ['age']
}
```
```

--------------------------------

### Use csv-diff as a Python library

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Demonstrates how to use the csv-diff library within a Python script to load and compare CSV files. The `compare` function returns the same JSON structure as the command-line `--json` option.
```python
from csv_diff import load_csv, compare

# Open both files with newline="" as the csv module recommends,
# and let the with-block close them once they are loaded
with open("one.csv", newline="") as previous, open("two.csv", newline="") as current:
    diff = compare(
        load_csv(previous, key="id"),
        load_csv(current, key="id"),
    )
```

--------------------------------

### CLI: TSV and Format Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares TSV files by automatically detecting the tab-separated format or allowing explicit specification. Outputs differences in JSON format.

```bash
# Create TSV files (printf expands \t into real tab characters)
printf 'id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n' > data1.tsv
printf 'id\tname\tage\n1\tCleo\t5\n2\tPancakes\t2\n' > data2.tsv

# Auto-detect format
csv-diff data1.tsv data2.tsv --key=id --json

# Explicitly specify TSV format
csv-diff data1.tsv data2.tsv --key=id --format=tsv --json

# Output (the same for both commands):
# {
#   "added": [],
#   "removed": [],
#   "changed": [
#     {
#       "key": "1",
#       "changes": {
#         "age": ["4", "5"]
#       }
#     }
#   ],
#   "columns_added": [],
#   "columns_removed": []
# }
```

--------------------------------

### CLI: JSON File Comparison

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares JSON array files, handling nested objects and arrays by serializing them for comparison. Outputs differences in JSON format.
```bash
# Create JSON files
cat > one.json << EOF
[
  {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
  {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]
EOF

cat > two.json << EOF
[
  {"id": 1, "name": "Cleo", "nested": {"foo": 3, "bar": 5}, "extra": 1},
  {"id": 2, "name": "Pancakes!", "nested": {"foo": 3}, "extra": 1}
]
EOF

# Compare JSON files
csv-diff one.json two.json --key=id --format=json --json

# Output:
# {
#   "added": [],
#   "removed": [],
#   "changed": [
#     {
#       "key": 1,
#       "changes": {
#         "nested": ["{\"foo\": 3}", "{\"foo\": 3, \"bar\": 5}"]
#       }
#     },
#     {
#       "key": 2,
#       "changes": {
#         "name": ["Pancakes", "Pancakes!"],
#         "extra": [null, 1]
#       }
#     }
#   ],
#   "columns_added": [],
#   "columns_removed": []
# }
```

--------------------------------

### Python API: Compare Datasets with `compare` Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two loaded datasets (from `load_csv` or `load_json`) and returns a structured dictionary detailing added, removed, and changed rows. Changes are represented as a dictionary of original and new values. An option `show_unchanged` can include unchanged fields in the output.
```python
from csv_diff import load_csv, compare
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
print(diff)
# Output:
# {
#     'added': [
#         {'id': '3', 'name': 'Bailey', 'age': '1'}
#     ],
#     'removed': [
#         {'id': '2', 'name': 'Pancakes', 'age': '2'}
#     ],
#     'changed': [
#         {
#             'key': '1',
#             'changes': {'age': ['4', '5']}
#         }
#     ],
#     'columns_added': [],
#     'columns_removed': []
# }

# With show_unchanged=True
diff_verbose = compare(previous, current, show_unchanged=True)
print(diff_verbose['changed'][0])
# Output:
# {
#     'key': '1',
#     'changes': {'age': ['4', '5']},
#     'unchanged': {'name': 'Cleo'}
# }
```

--------------------------------

### Python API - compare Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two loaded datasets (from `load_csv` or `load_json`) and returns a detailed dictionary of differences, including added, removed, and changed rows. It can also report unchanged fields if `show_unchanged` is set to True.

```APIDOC
## Python API - compare Function

### Description
Compare two loaded datasets and return a structured dictionary of differences including changes, additions, and removals.

### Method
`compare(previous_data, current_data, show_unchanged=False)`

### Endpoint
N/A (Python function)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

#### Function Arguments
- **previous_data** (dict): The dataset from a previous state (e.g., loaded from `load_csv` or `load_json`). The key column is chosen when the data is loaded.
- **current_data** (dict): The dataset from the current state.
- **show_unchanged** (bool, optional): If True, each entry in the `changed` list also includes an `unchanged` dictionary of the fields that did not differ.
### Request Example
```python
from csv_diff import load_csv, compare
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
print(diff)

# With show_unchanged=True
diff_verbose = compare(previous, current, show_unchanged=True)
print(diff_verbose['changed'][0])
```

### Response
#### Success Response (200)
- **diff** (dict): A dictionary containing the comparison results with the following keys:
  - `added` (list): Rows present in `current_data` but not in `previous_data`.
  - `removed` (list): Rows present in `previous_data` but not in `current_data`.
  - `changed` (list): Rows present in both datasets but with differing values. Each item includes the `key` and a `changes` dictionary detailing the differing fields. If `show_unchanged` is True, it also includes an `unchanged` dictionary.
  - `columns_added` (list): Columns present in `current_data` but not in `previous_data`.
  - `columns_removed` (list): Columns present in `previous_data` but not in `current_data`.

#### Response Example
```python
{
    'added': [
        {'id': '3', 'name': 'Bailey', 'age': '1'}
    ],
    'removed': [
        {'id': '2', 'name': 'Pancakes', 'age': '2'}
    ],
    'changed': [
        {
            'key': '1',
            'changes': {'age': ['4', '5']}
        }
    ],
    'columns_added': [],
    'columns_removed': []
}
```
```

--------------------------------

### Show unchanged rows in CSV comparison

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files, including full details of unchanged values for rows that have at least one change, using `id` as the key. This option is useful for detailed analysis.

```bash
$ csv-diff one.csv two.csv --key=id --show-unchanged
```

--------------------------------

### Python API - load_csv Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads CSV data from a file handle into a dictionary.
It can use a specified key column for dictionary keys or automatically generate SHA1 hashes of the content if no key is provided.

```APIDOC
## Python API - load_csv Function

### Description
Load CSV data from a file handle into a dictionary keyed by a unique identifier or content hash.

### Method
`load_csv(file_handle, key=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

#### Function Arguments
- **file_handle**: A file-like object containing the CSV data.
- **key** (string, optional): The name of the column to use as the key for the output dictionary. If None, a SHA1 hash of the row content is used.

### Request Example
```python
from csv_diff import load_csv
import io

# Load CSV with explicit key column
csv_data = """id,name,age
1,Cleo,4
2,Pancakes,2"""

data = load_csv(io.StringIO(csv_data), key="id")
print(data)

# Load CSV without key (uses SHA1 hash of content)
csv_no_key = """name,age
Cleo,4
Pancakes,2"""

data_hash = load_csv(io.StringIO(csv_no_key))
print(data_hash)

# Load from actual file
# with open('data.csv', newline='') as f:
#     data = load_csv(f, key="id")
```

### Response
#### Success Response (200)
- **data** (dict): A dictionary where keys are either the values from the specified `key` column or SHA1 hashes of the row content, and values are dictionaries representing each row.

#### Response Example
```python
{
    '1': {'id': '1', 'name': 'Cleo', 'age': '4'},
    '2': {'id': '2', 'name': 'Pancakes', 'age': '2'}
}
```
(If key is None, keys will be SHA1 hashes)
```

--------------------------------

### Add templated extras to CSV diff output

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files and includes additional, user-defined information for each row based on a Python format string. This allows for custom annotations in the diff output.
```bash
csv-diff one.csv two.csv --key=id \
  --extra latest "https://news.ycombinator.com/latest?id={id}"
```

--------------------------------

### Python API: Compare JSON Datasets

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two JSON datasets provided as Python objects. It serializes each dataset to a JSON string, loads both with `load_json` using the specified key field, and compares them. It returns the diff together with a summary of additions, removals, row changes, and schema changes.

```python
from csv_diff import load_json, compare
import json
import io

# Example 2: JSON comparison workflow
def compare_json_datasets(dataset1, dataset2, key_field):
    # Convert datasets to JSON strings
    json1 = json.dumps(dataset1)
    json2 = json.dumps(dataset2)

    # Load and compare
    prev = load_json(io.StringIO(json1), key=key_field)
    curr = load_json(io.StringIO(json2), key=key_field)
    diff = compare(prev, curr)

    # Check for specific types of changes
    has_additions = len(diff['added']) > 0
    has_removals = len(diff['removed']) > 0
    has_changes = len(diff['changed']) > 0
    has_schema_changes = (len(diff['columns_added']) > 0 or
                          len(diff['columns_removed']) > 0)

    return {
        'diff': diff,
        'summary': {
            'has_additions': has_additions,
            'has_removals': has_removals,
            'has_changes': has_changes,
            'has_schema_changes': has_schema_changes,
            'total_changes': (len(diff['added']) +
                              len(diff['removed']) +
                              len(diff['changed']))
        }
    }
```

--------------------------------

### Python API: Load CSV Data with `load_csv`

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads CSV data from a file handle into a dictionary. It can use a specified key column for the dictionary keys or automatically generate SHA1 hashes of the row content if no key is provided. When reading from a file on disk, open it with `newline=''`, as the Python `csv` module recommends.
```python
from csv_diff import load_csv
import io

# Load CSV with explicit key column
csv_data = """id,name,age
1,Cleo,4
2,Pancakes,2"""

data = load_csv(io.StringIO(csv_data), key="id")
print(data)
# Output:
# {
#     '1': {'id': '1', 'name': 'Cleo', 'age': '4'},
#     '2': {'id': '2', 'name': 'Pancakes', 'age': '2'}
# }

# Load CSV without key (uses SHA1 hash of content)
csv_no_key = """name,age
Cleo,4
Pancakes,2"""

data_hash = load_csv(io.StringIO(csv_no_key))
print(data_hash)
# Output: Dictionary with SHA1 hashes as keys
# {
#     'hash1...': {'name': 'Cleo', 'age': '4'},
#     'hash2...': {'name': 'Pancakes', 'age': '2'}
# }

# Load from actual file
with open('data.csv', newline='') as f:
    data = load_csv(f, key="id")
```

--------------------------------

### Python API: Load JSON Data with `load_json`

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads JSON array data, automatically serializing nested objects and arrays into strings for consistent comparison. Keys that appear in some rows but not others are filled with `None` in the rows that lack them. Supports loading from file-like objects and actual files.

```python
from csv_diff import load_json
import io
import json

# Load JSON with key
json_data = [
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]

json_str = json.dumps(json_data)
data = load_json(io.StringIO(json_str), key="id")
print(data)
# Output:
# {
#     1: {
#         'id': 1,
#         'name': 'Cleo',
#         'nested': '{"foo": 3}',  # Nested objects serialized
#         'extra': 1
#     },
#     2: {
#         'id': 2,
#         'name': 'Pancakes',
#         'nested': '{"foo": 3}',
#         'extra': None  # Missing keys filled with None
#     }
# }

# Load from file
with open('data.json') as f:
    data = load_json(f, key="id")
```

--------------------------------

### Python API - human_text Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Converts the structured difference dictionary produced by the `compare` function into a human-readable text format, suitable for direct display or reporting.
```APIDOC
## Python API - human_text Function

### Description
Convert comparison results into human-readable formatted text for display or reporting.

### Method
`human_text(diff_dict, key=None, singular=None, plural=None, current=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Function Arguments
- **diff_dict** (dict): The dictionary of differences obtained from the `compare` function.
- **key** (string, optional): The key column name used during comparison, for display purposes.
- **singular**, **plural** (string, optional): Nouns to use in place of "row" and "rows", matching the CLI's `--singular` and `--plural` options.
- **current** (dict, optional): The current dataset, as loaded by `load_csv` or `load_json`.

### Request Example
```python
from csv_diff import load_csv, compare, human_text
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")
diff = compare(previous, current)

output = human_text(diff, key="id")
print(output)
```

### Response
#### Success Response (200)
- **output** (string): A formatted string summarizing the differences between the two datasets in a human-readable format.

#### Response Example
```text
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2
```
```

--------------------------------

### Python API: Detect Schema Changes with `compare`

Source: https://context7.com/simonw/csv-diff/llms.txt

The `compare` function automatically detects and reports schema differences between datasets, specifically identifying added and removed columns. These schema-level changes are reported in `columns_added` and `columns_removed` lists within the diff output, separate from row-level comparisons.
```python
from csv_diff import load_csv, compare
import io

previous = """id,name,age
1,Cleo,5
3,Bailey,1"""

current = """id,name,weight
1,Cleo,48
3,Bailey,20"""

prev_data = load_csv(io.StringIO(previous), key="id")
curr_data = load_csv(io.StringIO(current), key="id")

diff = compare(prev_data, curr_data)
print(diff)
# Output:
# {
#     'added': [],
#     'removed': [],
#     'changed': [],  # Schema changes are ignored in row comparisons
#     'columns_added': ['weight'],
#     'columns_removed': ['age']
# }
```

--------------------------------

### Python API - load_json Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads JSON data from a file handle into a dictionary. It automatically serializes nested objects and arrays within the JSON data, making them comparable as strings.

```APIDOC
## Python API - load_json Function

### Description
Load JSON array data with automatic serialization of nested objects and arrays for comparison.

### Method
`load_json(file_handle, key=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

#### Function Arguments
- **file_handle**: A file-like object containing the JSON data (expected to be a JSON array).
- **key** (string, optional): The name of the field in each JSON object to use as the key for the output dictionary. If None, a SHA1 hash of the row content is used.
### Request Example
```python
from csv_diff import load_json
import io
import json

# Load JSON with key
json_data = [
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]

json_str = json.dumps(json_data)
data = load_json(io.StringIO(json_str), key="id")
print(data)

# Load from file
# with open('data.json') as f:
#     data = load_json(f, key="id")
```

### Response
#### Success Response (200)
- **data** (dict): A dictionary where keys are either the values from the specified `key` field or SHA1 hashes of the JSON object content, and values are dictionaries representing each JSON object. Nested objects/arrays are serialized into strings.

#### Response Example
```python
{
    1: {
        'id': 1,
        'name': 'Cleo',
        'nested': '{"foo": 3}',  # Nested objects serialized
        'extra': 1
    },
    2: {
        'id': 2,
        'name': 'Pancakes',
        'nested': '{"foo": 3}',
        'extra': None  # Missing keys filled with None
    }
}
```
```

--------------------------------

### Python API: Compare CSV Files

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files based on a specified key column. It loads the data, performs the comparison, and generates both human-readable and JSON outputs. Handles file not found and key column errors.
```python
from csv_diff import load_csv, compare, human_text
import json

# Example 1: CSV comparison workflow
def compare_csv_files(file1_path, file2_path, key_column):
    try:
        with open(file1_path, newline='') as f1, \
             open(file2_path, newline='') as f2:
            previous = load_csv(f1, key=key_column)
            current = load_csv(f2, key=key_column)

        diff = compare(previous, current, show_unchanged=False)

        # Generate both outputs
        human_output = human_text(
            diff,
            key=key_column,
            singular="row",
            plural="rows",
            current=current
        )
        json_output = json.dumps(diff, indent=2)

        return {
            'human': human_output,
            'json': json_output,
            'diff': diff
        }
    except FileNotFoundError as e:
        return {'error': f'File not found: {e}'}
    except KeyError as e:
        # load_csv raises KeyError when the key column is missing
        return {'error': f'Key column not found: {e}'}

# Usage
result = compare_csv_files('old_data.csv', 'new_data.csv', 'id')
if 'error' not in result:
    print(result['human'])
    print("\nJSON output available:", len(result['json']), "characters")

    # Access specific changes
    for change in result['diff']['changed']:
        print(f"Row {change['key']} changed: {change['changes']}")
```

--------------------------------

### Python API: Format Differences with `human_text`

Source: https://context7.com/simonw/csv-diff/llms.txt

Converts the structured difference dictionary produced by the `compare` function into a human-readable text format. This is useful for generating user-friendly reports or displaying comparison results directly.

```python
from csv_diff import load_csv, compare, human_text
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")
diff = compare(previous, current)

output = human_text(diff, key="id")
print(output)
# Output:
# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
# 1 row added
#
#   id: 3
#   name: Bailey
#   age: 1
#
# 1 row removed
#
#   id: 2
#   name: Pancakes
#   age: 2
```
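
--------------------------------

### Sketch: Keyed Row Matching in Plain Python

Every comparison on this page pairs rows by the `--key` column (or the `key=` argument) and then sorts them into added, removed, and changed buckets. The following is a rough, standard-library-only sketch of that idea, meant as an illustration rather than csv-diff's actual implementation. `load_keyed` and `keyed_diff` are hypothetical helpers. The sketch assumes both files share the same columns, and it ignores column additions and removals, JSON input, and hash keys for keyless rows.

```python
import csv
import io


def load_keyed(text, key):
    """Parse CSV text into {key_value: row_dict} (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def keyed_diff(previous, current):
    """Sort rows into added, removed and changed buckets by matching keys.

    Simplified illustration: assumes both inputs share the same columns and
    does not report column additions or removals.
    """
    added = [row for k, row in current.items() if k not in previous]
    removed = [row for k, row in previous.items() if k not in current]
    changed = []
    for k, row in current.items():
        old = previous.get(k)
        if old is not None and old != row:
            # Map each differing field to [old value, new value]
            changes = {
                field: [old.get(field), value]
                for field, value in row.items()
                if old.get(field) != value
            }
            changed.append({"key": k, "changes": changes})
    return {"added": added, "removed": removed, "changed": changed}


one = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
two = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"
result = keyed_diff(load_keyed(one, "id"), load_keyed(two, "id"))
print(result)
```

With the `one.csv`/`two.csv` sample data, this yields the same `added`, `removed`, and `changed` entries as the `compare` examples. For real work, `load_csv` and `compare` from csv-diff remain the tools to use.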
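
--------------------------------

### Sketch: Summary Line from a Diff Dictionary

The human-readable output starts with a title line such as `1 tree changed, 2 trees added`, built from the counts in the diff dictionary and the chosen singular and plural nouns. As a minimal sketch, the same line can be rebuilt from a `compare`-style result. This assumes the changed/added/removed ordering seen in the example outputs. `summary_line` is a hypothetical helper, not part of csv-diff, and `human_text` is still the function to use for full reports.

```python
def summary_line(diff, singular="row", plural="rows"):
    """Build a title line such as '1 row changed, 1 row added, 1 row removed'.

    Hypothetical helper: counts only the 'changed', 'added' and 'removed'
    lists of a compare()-style dictionary and ignores column changes.
    """
    parts = []
    for bucket in ("changed", "added", "removed"):
        count = len(diff.get(bucket, []))
        if count:
            noun = singular if count == 1 else plural
            parts.append(f"{count} {noun} {bucket}")
    return ", ".join(parts)


diff = {
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": [],
}
print(summary_line(diff))  # 1 row changed, 1 row added, 1 row removed

trees = {"added": [{"id": "3"}, {"id": "4"}], "removed": [], "changed": [{"key": "1"}]}
print(summary_line(trees, singular="tree", plural="trees"))  # 1 tree changed, 2 trees added
```

An empty string means there were no row-level differences. A CI script could, for example, fail only when `summary_line(diff)` is non-empty and check `columns_added` and `columns_removed` separately.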
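
--------------------------------

### Sketch: Expanding `--extra` Templates

The `--extra name template` option is described as taking a Python format string whose `{column}` placeholders are filled from each row, as in `https://www.google.com/search?q={name}`. Assuming plain `str.format` semantics, the expansion for a single row looks roughly like the following. `render_extras` is a hypothetical helper for illustration, not a csv-diff function.

```python
def render_extras(row, extras):
    """Fill each template's {column} placeholders from one row.

    Hypothetical helper: assumes plain str.format semantics; values are
    inserted verbatim, with no URL encoding.
    """
    return {name: template.format(**row) for name, template in extras.items()}


row = {"id": 2, "name": "Suna", "type": "pretty chicken"}
extras = {
    "search": "https://www.google.com/search?q={name}",
    "latest": "https://news.ycombinator.com/latest?id={id}",
}
print(render_extras(row, extras))
```

Under `str.format`, a template that names a column the row lacks raises `KeyError`. Because nothing is escaped, a value such as `pretty chicken` would land in a URL with a literal space. When building URLs this way yourself, applying `urllib.parse.quote` to the values is one option.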