### Install csv-diff using pip

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Installs the csv-diff package from PyPI. This is the primary method for getting the command-line tool.

```bash
pip install csv-diff
```

--------------------------------

### Docker: Build and Run csv-diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to build a Docker image for csv-diff and run it with volume mounting to compare files. Includes examples for basic usage, full options with JSON output, and TSV file processing.

```bash
# Build the Docker image
docker build -t csvdiff .

# Run with volume mount
docker run --rm -v $(pwd):/files csvdiff one.csv two.csv --key=id

# Example with full options
docker run --rm -v $(pwd):/files csvdiff \
    data/old.csv data/new.csv \
    --key=id \
    --json \
    --format=csv

# Using with TSV files
docker run --rm -v $(pwd):/files csvdiff \
    reports/2024-01.tsv reports/2024-02.tsv \
    --key=report_id \
    --format=tsv \
    --singular=report \
    --plural=reports
```

--------------------------------

### Build csv-diff Docker image

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Builds a Docker image for the csv-diff tool. This allows the tool to be run in an isolated environment without local installation.

```bash
$ docker build -t csvdiff .
```

--------------------------------

### Get JSON output for CSV differences

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files and outputs the differences in a machine-readable JSON format. This is useful for programmatic processing of diff results.

```bash
$ csv-diff one.csv two.csv --key=id --json
```
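A downstream consumer of this output might look like the following sketch. It assumes the payload has the five top-level keys that appear in this document's `--json` examples (`added`, `removed`, `changed`, `columns_added`, `columns_removed`). The helper name `summarize_diff` and the embedded sample string are illustrative only and are not part of csv-diff.

```python
import json

# Sample payload in the shape shown in this document's --json examples.
SAMPLE = """
{
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": []
}
"""


def summarize_diff(payload: str) -> dict:
    """Count the entries in each category of a diff document (hypothetical helper)."""
    diff = json.loads(payload)
    return {name: len(entries) for name, entries in diff.items()}


counts = summarize_diff(SAMPLE)
print(counts)
```

In practice the payload would come from the captured standard output of the command above, or from a file it was redirected to.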

--------------------------------

### CLI: Custom Terminology for Output

Source: https://context7.com/simonw/csv-diff/llms.txt

Allows customization of singular and plural terms used in the human-readable output for domain-specific language.

```bash
cat > trees1.csv << EOF
id,name,age
1,Cleo,4
EOF

cat > trees2.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
4,Carl,7
EOF
csv-diff trees1.csv trees2.csv --key=id --singular=tree --plural=trees

# Output:
# 1 tree changed, 2 trees added
#
# 1 tree changed
#
#   id: 1
#     age: "4" => "5"
#
# 2 trees added
#
#   id: 3
#   name: Bailey
#   age: 1
#
#   id: 4
#   name: Carl
#   age: 7
```
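The effect of `--singular` and `--plural` on the summary line can be pictured with a small sketch. This is not csv-diff's own code: `summary_line` is a hypothetical helper, and the category order (changed, added, removed) is inferred from the example output above.

```python
def summary_line(diff: dict, singular: str = "row", plural: str = "rows") -> str:
    """Build a one-line summary such as '1 tree changed, 2 trees added'."""
    parts = []
    for label in ("changed", "added", "removed"):
        count = len(diff.get(label, []))
        if count:
            # Pick the singular noun only when exactly one item is affected.
            noun = singular if count == 1 else plural
            parts.append(f"{count} {noun} {label}")
    return ", ".join(parts)


diff = {
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "added": [
        {"id": "3", "name": "Bailey", "age": "1"},
        {"id": "4", "name": "Carl", "age": "7"},
    ],
    "removed": [],
}
line = summary_line(diff, singular="tree", plural="trees")
print(line)
```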

--------------------------------

### CLI Command - Custom Terminology

Source: https://context7.com/simonw/csv-diff/llms.txt

Uses custom singular and plural terms for domain-specific output formatting.

```APIDOC
## CLI Command - Custom Terminology

### Description
Allows the use of custom singular and plural terms for domain-specific output formatting in the human-readable diff.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--singular` (string) - Optional - Custom singular term (e.g., `tree`).
- `--plural` (string) - Optional - Custom plural term (e.g., `trees`).

### Request Example
```bash
cat > trees1.csv << EOF
id,name,age
1,Cleo,4
EOF

cat > trees2.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
4,Carl,7
EOF

csv-diff trees1.csv trees2.csv --key=id --singular=tree --plural=trees
```

### Response
#### Success Response (Human-readable text)
Outputs differences using the specified custom singular and plural terms.

#### Response Example
```text
1 tree changed, 2 trees added

1 tree changed

  id: 1
    age: "4" => "5"

2 trees added

  id: 3
  name: Bailey
  age: 1

  id: 4
  name: Carl
  age: 7
```
```

--------------------------------

### CLI Command - Show Unchanged Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Displays unchanged field values alongside changes for complete context.

```APIDOC
## CLI Command - Show Unchanged Fields

### Description
Includes unchanged field values in the output for changed rows, providing complete context when reviewing modifications.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--show-unchanged` - Optional - Flag to include unchanged fields in the output for changed rows.

### Request Example
```bash
csv-diff one.csv two.csv --key=id --show-unchanged
```

### Response
#### Success Response (Human-readable text)
Outputs changed rows, also listing the fields that remained unchanged for that row.

#### Response Example
```text
1 row changed

  id: 1
    age: "4" => "5"

    Unchanged:
      name: "Cleo"
```
```

--------------------------------

### CLI: Show Unchanged Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Includes unchanged field values in the output alongside the modifications, providing complete context for reviewed changes.

```bash
csv-diff one.csv two.csv --key=id --show-unchanged

# Output:
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
#     Unchanged:
#       name: "Cleo"
```
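As a rough illustration of what "unchanged" means here, the sketch below splits one pair of matching rows into changed and unchanged fields. The function `split_fields` and its `key_column` parameter are hypothetical names, and leaving the key column out of the unchanged set is an assumption based on the example output above.

```python
def split_fields(previous_row: dict, current_row: dict, key_column: str = "id"):
    """Return (changes, unchanged) for two versions of the same row."""
    changes = {
        field: [previous_row[field], current_row[field]]
        for field in previous_row
        if field in current_row and previous_row[field] != current_row[field]
    }
    # Everything that did not change, excluding the key column itself.
    unchanged = {
        field: value
        for field, value in previous_row.items()
        if field not in changes and field != key_column
    }
    return changes, unchanged


changes, unchanged = split_fields(
    {"id": "1", "name": "Cleo", "age": "4"},
    {"id": "1", "name": "Cleo", "age": "5"},
)
print(changes, unchanged)
```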

--------------------------------

### CLI Command - JSON Output

Source: https://context7.com/simonw/csv-diff/llms.txt

Outputs differences in machine-readable JSON format for programmatic processing.

```APIDOC
## CLI Command - JSON Output

### Description
Outputs differences in machine-readable JSON format for programmatic processing and integration with other tools.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--json` - Required - Flag to output results in JSON format.

### Request Example
```bash
csv-diff one.csv two.csv --key=id --json
```

### Response
#### Success Response (JSON)
Returns a JSON object detailing added, removed, changed rows, and column differences.

#### Response Example
```json
{
    "added": [
        {
            "id": "3",
            "name": "Bailey",
            "age": "1"
        }
    ],
    "removed": [
        {
            "id": "2",
            "name": "Pancakes",
            "age": "2"
        }
    ],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": [
                    "4",
                    "5"
                ]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
```
```

--------------------------------

### Run csv-diff Docker container

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Runs the csv-diff Docker container, mounting the current directory to '/files' to access CSV files. It then executes the diff command on 'one.csv' and 'two.csv'.

```bash
$ docker run --rm -v $(pwd):/files csvdiff one.csv two.csv
```

--------------------------------

### CLI Command - Basic Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files and displays human-readable differences using a specified unique key column.

```APIDOC
## CLI Command - Basic Diff

### Description
Compares two CSV files and displays human-readable differences with a specified unique key column.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first CSV file.
- `file2` (string) - Required - Path to the second CSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--json` - Optional - Flag to output results as JSON instead of the default human-readable text.

### Request Example
```bash
cat > one.csv << EOF
id,name,age
1,Cleo,4
2,Pancakes,2
EOF

cat > two.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
EOF

csv-diff one.csv two.csv --key=id
```

### Response
#### Success Response (Human-readable text)
Output details added, removed, and changed rows, including specific field modifications.

#### Response Example
```text
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2
```
```

--------------------------------

### CLI: Add Custom Formatted Fields with CSV-Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to use the --extra option with templated Python format strings in the CLI to add custom fields to the output, such as URLs for searching. This allows for dynamic metadata generation based on row content.

```bash
cat > animals1.json << EOF
[
    {"id": 1, "name": "Cleo", "type": "dog"},
    {"id": 2, "name": "Suna", "type": "chicken"}
]
EOF

cat > animals2.json << EOF
[
    {"id": 2, "name": "Suna", "type": "pretty chicken"},
    {"id": 3, "name": "Artie", "type": "bunny"}
]
EOF

csv-diff animals1.json animals2.json --key=id --format=json \
    --extra search "https://www.google.com/search?q={name}"

# Output:
# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 2
#     type: "chicken" => "pretty chicken"
#   extras:
#     search: https://www.google.com/search?q=Suna
#
# 1 row added
#
#   id: 3
#   name: Artie
#   type: bunny
#   extras:
#     search: https://www.google.com/search?q=Artie
#
# 1 row removed
#
#   id: 1
#   name: Cleo
#   type: dog
#   extras:
#     search: https://www.google.com/search?q=Cleo
```
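The templating itself is ordinary Python string formatting, which the following sketch demonstrates in isolation. It is a simplified picture rather than csv-diff's implementation: `render_extras` is a hypothetical helper, and using `str.format` with the row's fields as keyword arguments is an assumption about how the placeholders are filled.

```python
def render_extras(row: dict, templates: dict) -> dict:
    """Expand each named template using the row's fields as placeholders."""
    return {name: template.format(**row) for name, template in templates.items()}


row = {"id": 2, "name": "Suna", "type": "pretty chicken"}
extras = render_extras(row, {"search": "https://www.google.com/search?q={name}"})
print(extras)
```

A placeholder that names a field missing from the row would raise `KeyError` in this sketch, so templates should only refer to columns present in the data.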

--------------------------------

### CLI: Basic CSV Diff

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files and displays human-readable differences using a specified unique key column. Outputs added, removed, and changed rows with details of modifications.

```bash
# Create sample CSV files
cat > one.csv << EOF
id,name,age
1,Cleo,4
2,Pancakes,2
EOF

cat > two.csv << EOF
id,name,age
1,Cleo,5
3,Bailey,1
EOF

# Run comparison
csv-diff one.csv two.csv --key=id

# Output:
# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
# 1 row added
#
#   id: 3
#   name: Bailey
#   age: 1
#
# 1 row removed
#
#   id: 2
#   name: Pancakes
#   age: 2
```

--------------------------------

### CLI Command - Extra Templated Fields

Source: https://context7.com/simonw/csv-diff/llms.txt

Demonstrates how to add custom formatted fields to the CSV diff output using Python format strings for links or additional metadata. This is useful for enriching the output with external references or calculated values.

```APIDOC
## CLI Command - Extra Templated Fields

### Description
Add custom formatted fields to output using Python format strings for links or additional metadata.

### Method
CLI Command

### Endpoint
N/A

### Parameters
#### Query Parameters
- **--extra** (string) - Required - Specifies a field to add to the output, followed by a Python format string.
- **--key** (string) - Required - The key column to use for diffing.
- **--format** (string) - Optional - The input file format (`csv`, `tsv`, or `json`); set to `json` here because the inputs are JSON files.

### Request Example
```bash
cat > animals1.json << EOF
[
    {"id": 1, "name": "Cleo", "type": "dog"},
    {"id": 2, "name": "Suna", "type": "chicken"}
]
EOF

cat > animals2.json << EOF
[
    {"id": 2, "name": "Suna", "type": "pretty chicken"},
    {"id": 3, "name": "Artie", "type": "bunny"}
]
EOF

csv-diff animals1.json animals2.json --key=id --format=json \
    --extra search "https://www.google.com/search?q={name}"
```

### Response
#### Success Response (Human-readable text)
Outputs the added, removed, and changed rows, each followed by an `extras` block containing the rendered template values.

#### Response Example
```
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 2
    type: "chicken" => "pretty chicken"
  extras:
    search: https://www.google.com/search?q=Suna

1 row added

  id: 3
  name: Artie
  type: bunny
  extras:
    search: https://www.google.com/search?q=Artie

1 row removed

  id: 1
  name: Cleo
  type: dog
  extras:
    search: https://www.google.com/search?q=Cleo
```
```

--------------------------------

### CLI Command - JSON File Comparison

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares JSON array files, handling nested objects and arrays.

```APIDOC
## CLI Command - JSON File Comparison

### Description
Compares JSON array files where nested objects and arrays are automatically serialized for comparison.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first JSON file.
- `file2` (string) - Required - Path to the second JSON file.

#### Query Parameters
- `--key` (string) - Required - The field name within the JSON objects to use as the unique key for matching records.
- `--format` (string) - Required - Set to `json` to indicate JSON file input.
- `--json` - Optional - Flag to output results in JSON format (highly recommended for JSON input).

### Request Example
```bash
# Create JSON files
cat > one.json << EOF
[
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]
EOF

cat > two.json << EOF
[
    {"id": 1, "name": "Cleo", "nested": {"foo": 3, "bar": 5}, "extra": 1},
    {"id": 2, "name": "Pancakes!", "nested": {"foo": 3}, "extra": 1}
]
EOF

# Compare JSON files
csv-diff one.json two.json --key=id --format=json --json
```

### Response
#### Success Response (JSON)
Returns a JSON object detailing differences between the JSON array elements, including changes in nested structures.

#### Response Example
```json
{
    "added": [],
    "removed": [],
    "changed": [
        {
            "key": 1,
            "changes": {
                "nested": ["{\"foo\": 3}", "{\"foo\": 3, \"bar\": 5}"]
            }
        },
        {
            "key": 2,
            "changes": {
                "name": ["Pancakes", "Pancakes!"],
                "extra": [null, 1]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
```
```

--------------------------------

### Compare two CSV files from the command line

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files, 'one.csv' and 'two.csv', using 'id' as the key column. Outputs a human-readable summary of added, removed, and changed rows.

```bash
$ csv-diff one.csv two.csv --key=id
```

--------------------------------

### CLI Command - TSV and Format Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Automatically detects TSV format or allows explicit specification.

```APIDOC
## CLI Command - TSV and Format Detection

### Description
Automatically detects tab-separated values (TSV) or allows explicit specification of the file format for proper parsing.

### Method
`csv-diff` (CLI command)

### Parameters
#### Path Parameters
- `file1` (string) - Required - Path to the first TSV file.
- `file2` (string) - Required - Path to the second TSV file.

#### Query Parameters
- `--key` (string) - Required - The column name to use as the unique key for matching rows.
- `--format` (string) - Optional - Explicitly set the file format (e.g., `tsv`, `csv`, `json`). Defaults to auto-detection.
- `--json` - Optional - Flag to output results in JSON format.

### Request Example
```bash
# Create TSV files
# printf expands \t into real tab characters; a heredoc would write the backslash sequences literally
printf 'id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n' > data1.tsv

printf 'id\tname\tage\n1\tCleo\t5\n2\tPancakes\t2\n' > data2.tsv

# Auto-detect format
csv-diff data1.tsv data2.tsv --key=id --json

# Explicitly specify TSV format
csv-diff data1.tsv data2.tsv --key=id --format=tsv --json
```

### Response
#### Success Response (JSON)
Returns a JSON object detailing differences, similar to the basic JSON output but parsed from TSV.

#### Response Example
```json
{
    "added": [],
    "removed": [],
    "changed": [
        {
            "key": "1",
            "changes": {
                "age": ["4", "5"]
            }
        }
    ],
    "columns_added": [],
    "columns_removed": []
}
```
```

--------------------------------

### Python API - Schema Change Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Illustrates how the `compare` function automatically detects and reports schema changes, such as added or removed columns, when comparing datasets with different structures.

```APIDOC
## Python API - Schema Change Detection

### Description
Automatically detect and report added or removed columns when comparing datasets with different schemas.

### Method
`compare(previous_data, current_data)`

### Endpoint
N/A (Python function)

### Parameters
N/A

### Request Example
```python
from csv_diff import load_csv, compare
import io

previous = """id,name,age
1,Cleo,5
3,Bailey,1"""

current = """id,name,weight
1,Cleo,48
3,Bailey,20"""

prev_data = load_csv(io.StringIO(previous), key="id")
curr_data = load_csv(io.StringIO(current), key="id")

diff = compare(prev_data, curr_data)
print(diff)
```

### Response
#### Success Response (200)
- **diff** (dict): The comparison result dictionary will include `columns_added` and `columns_removed` lists indicating schema differences.

#### Response Example
```python
{
    'added': [],
    'removed': [],
    'changed': [],  # Schema changes are ignored in row comparisons
    'columns_added': ['weight'],
    'columns_removed': ['age']
}
```
```

--------------------------------

### Use csv-diff as a Python library

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Demonstrates how to use the csv-diff library within a Python script to load and compare CSV files. The `compare` function returns the same JSON structure as the command-line `--json` option.

```python
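from csv_diff import load_csv, compare

# Open with newline="" as the csv module recommends, and close both files when done
with open("one.csv", newline="") as one, open("two.csv", newline="") as two:
    diff = compare(load_csv(one, key="id"), load_csv(two, key="id"))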

--------------------------------

### CLI: TSV and Format Detection

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares TSV files by automatically detecting the tab-separated format or allowing explicit specification. Outputs differences in JSON format.

```bash
# Create TSV files
cat > data1.tsv << EOF
id	name	age
1	Cleo	4
2	Pancakes	2
EOF

cat > data2.tsv << EOF
id	name	age
1	Cleo	5
2	Pancakes	2
EOF

# Auto-detect format
csv-diff data1.tsv data2.tsv --key=id --json

# Explicitly specify TSV format
csv-diff data1.tsv data2.tsv --key=id --format=tsv --json

# Output:
# {
#     "added": [],
#     "removed": [],
#     "changed": [
#         {
#             "key": "1",
#             "changes": {
#                 "age": ["4", "5"]
#             }
#         }
#     ],
#     "columns_added": [],
#     "columns_removed": []
# }
```
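Delimiter auto-detection of this kind can be reproduced with the standard library's `csv.Sniffer`, as in the sketch below. Whether csv-diff uses the same candidate delimiter list is an assumption; the sample string simply mirrors `data1.tsv`.

```python
import csv
import io

sample = "id\tname\tage\n1\tCleo\t4\n2\tPancakes\t2\n"

# Restrict the candidates so ordinary letters are never picked as a delimiter.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t;")

# Parse the same text with the detected dialect.
rows = list(csv.DictReader(io.StringIO(sample), dialect=dialect))
print(repr(dialect.delimiter), rows)
```

Passing an explicit format skips this guessing step, which is the safer choice for small or irregular files where sniffing could pick the wrong delimiter.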

--------------------------------

### CLI: JSON File Comparison

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares JSON array files, handling nested objects and arrays by serializing them for comparison. Outputs differences in JSON format.

```bash
# Create JSON files
cat > one.json << EOF
[
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]
EOF

cat > two.json << EOF
[
    {"id": 1, "name": "Cleo", "nested": {"foo": 3, "bar": 5}, "extra": 1},
    {"id": 2, "name": "Pancakes!", "nested": {"foo": 3}, "extra": 1}
]
EOF

# Compare JSON files
csv-diff one.json two.json --key=id --format=json --json

# Output:
# {
#     "added": [],
#     "removed": [],
#     "changed": [
#         {
#             "key": 1,
#             "changes": {
#                 "nested": ["{\"foo\": 3}", "{\"foo\": 3, \"bar\": 5}"]
#             }
#         },
#         {
#             "key": 2,
#             "changes": {
#                 "name": ["Pancakes", "Pancakes!"],
#                 "extra": [null, 1]
#             }
#         }
#     ],
#     "columns_added": [],
#     "columns_removed": []
# }
```

--------------------------------

### Python API: Compare Datasets with `compare` Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two loaded datasets (from `load_csv` or `load_json`) and returns a structured dictionary detailing added, removed, and changed rows. Each changed row carries a `changes` dictionary that maps a field name to a two-item list of `[old_value, new_value]`. Passing `show_unchanged=True` also includes the unchanged fields of each changed row.

```python
from csv_diff import load_csv, compare
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
print(diff)
# Output:
# {
#     'added': [
#         {'id': '3', 'name': 'Bailey', 'age': '1'}
#     ],
#     'removed': [
#         {'id': '2', 'name': 'Pancakes', 'age': '2'}
#     ],
#     'changed': [
#         {
#             'key': '1',
#             'changes': {'age': ['4', '5']}
#         }
#     ],
#     'columns_added': [],
#     'columns_removed': []
# }

# With show_unchanged=True
diff_verbose = compare(previous, current, show_unchanged=True)
print(diff_verbose['changed'][0])
# Output:
# {
#     'key': '1',
#     'changes': {'age': ['4', '5']},
#     'unchanged': {'name': 'Cleo'}
# }
```
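To make the shape of the result easier to reason about, here is a deliberately simplified, standard-library-only sketch of a keyed comparison. It is not csv-diff's implementation: `keyed_diff` is a hypothetical function, it assumes both inputs are dictionaries of rows keyed by identifier, and it ignores schema changes entirely.

```python
def keyed_diff(previous: dict, current: dict) -> dict:
    """Compare two {key: row} mappings and report added, removed, and changed rows."""
    added = [row for key, row in current.items() if key not in previous]
    removed = [row for key, row in previous.items() if key not in current]
    changed = []
    for key, old in previous.items():
        new = current.get(key)
        if new is None or new == old:
            continue
        # Record [old, new] for every field whose value differs.
        changes = {
            field: [old[field], new[field]]
            for field in old
            if field in new and old[field] != new[field]
        }
        if changes:
            changed.append({"key": key, "changes": changes})
    return {"added": added, "removed": removed, "changed": changed}


previous = {
    "1": {"id": "1", "name": "Cleo", "age": "4"},
    "2": {"id": "2", "name": "Pancakes", "age": "2"},
}
current = {
    "1": {"id": "1", "name": "Cleo", "age": "5"},
    "3": {"id": "3", "name": "Bailey", "age": "1"},
}
result = keyed_diff(previous, current)
print(result)
```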

--------------------------------

### Python API - compare Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two loaded datasets (from `load_csv` or `load_json`) and returns a detailed dictionary of differences, including added, removed, and changed rows. It can also highlight unchanged fields if `show_unchanged` is set to True.

```APIDOC
## Python API - compare Function

### Description
Compare two loaded datasets and return a structured dictionary of differences including changes, additions, and removals.

### Method
`compare(previous_data, current_data, show_unchanged=False)`

### Endpoint
N/A (Python function)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

#### Function Arguments
- **previous_data** (dict): The dataset from a previous state (e.g., loaded from `load_csv` or `load_json`).
- **current_data** (dict): The dataset from the current state.
- **show_unchanged** (bool, optional): If True, each entry in the `changed` list also includes an `unchanged` dictionary of the fields that did not differ.

### Request Example
```python
from csv_diff import load_csv, compare
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
print(diff)

# With show_unchanged=True
diff_verbose = compare(previous, current, show_unchanged=True)
print(diff_verbose['changed'][0])
```

### Response
#### Success Response (200)
- **diff** (dict): A dictionary containing the comparison results with the following keys:
    - `added` (list): Rows present in `current_data` but not in `previous_data`.
    - `removed` (list): Rows present in `previous_data` but not in `current_data`.
    - `changed` (list): Rows present in both datasets but with differing values. Each item includes the `key` and a `changes` dictionary detailing the differing fields. If `show_unchanged` is True, it also includes an `unchanged` dictionary.
    - `columns_added` (list): Columns present in `current_data` but not in `previous_data`.
    - `columns_removed` (list): Columns present in `previous_data` but not in `current_data`.

#### Response Example
```python
{
    'added': [
        {'id': '3', 'name': 'Bailey', 'age': '1'}
    ],
    'removed': [
        {'id': '2', 'name': 'Pancakes', 'age': '2'}
    ],
    'changed': [
        {
            'key': '1',
            'changes': {'age': ['4', '5']}
        }
    ],
    'columns_added': [],
    'columns_removed': []
}
```
```

--------------------------------

### Show unchanged values in CSV comparison

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files, including full details of unchanged values for rows that have at least one change, using 'id' as the key. This option is useful for detailed analysis.

```bash
$ csv-diff one.csv two.csv --key=id --show-unchanged
```

--------------------------------

### Python API - load_csv Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads CSV data from a file handle into a dictionary. It can use a specified key column for dictionary keys or automatically generate SHA1 hashes of the content if no key is provided.

```APIDOC
## Python API - load_csv Function

### Description
Load CSV data from a file handle into a dictionary keyed by a unique identifier or content hash.

### Method
`load_csv(file_handle, key=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

#### Function Arguments
- **file_handle**: A file-like object containing the CSV data.
- **key** (string, optional): The name of the column to use as the key for the output dictionary. If None, a SHA1 hash of the row content is used.

### Request Example
```python
from csv_diff import load_csv
import io

# Load CSV with explicit key column
csv_data = """id,name,age
1,Cleo,4
2,Pancakes,2"""

data = load_csv(io.StringIO(csv_data), key="id")
print(data)

# Load CSV without key (uses SHA1 hash of content)
csv_no_key = """name,age
Cleo,4
Pancakes,2"""

data_hash = load_csv(io.StringIO(csv_no_key))
print(data_hash)

# Load from actual file
# with open('data.csv', newline='') as f:
#     data = load_csv(f, key="id")
```

### Response
#### Success Response (200)
- **data** (dict): A dictionary where keys are either the values from the specified `key` column or SHA1 hashes of the row content, and values are dictionaries representing each row.

#### Response Example
```python
{
    '1': {'id': '1', 'name': 'Cleo', 'age': '4'},
    '2': {'id': '2', 'name': 'Pancakes', 'age': '2'}
}
```
(If key is None, keys will be SHA1 hashes)
```

--------------------------------

### Add templated extras to CSV diff output

Source: https://github.com/simonw/csv-diff/blob/main/README.md

Compares two CSV files and includes additional, user-defined information for each row based on a Python format string. This allows for custom annotations in the diff output.

```bash
csv-diff one.csv two.csv --key=id \
  --extra latest "https://news.ycombinator.com/latest?id={id}"
```

--------------------------------

### Python API: Compare JSON Datasets

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two JSON datasets provided as Python objects. It converts the datasets to JSON strings, loads them using a specified key field, and compares them. It also provides a summary of the comparison results including additions, removals, and changes.

```python
from csv_diff import load_json, compare
import json
import io

def compare_json_datasets(dataset1, dataset2, key_field):
    """Diff two lists of dicts keyed on key_field and summarize the result."""

    # Convert datasets to JSON strings
    json1 = json.dumps(dataset1)
    json2 = json.dumps(dataset2)

    # Load and compare
    prev = load_json(io.StringIO(json1), key=key_field)
    curr = load_json(io.StringIO(json2), key=key_field)

    diff = compare(prev, curr)

    # Check for specific types of changes
    has_additions = len(diff['added']) > 0
    has_removals = len(diff['removed']) > 0
    has_changes = len(diff['changed']) > 0
    has_schema_changes = (len(diff['columns_added']) > 0 or
                          len(diff['columns_removed']) > 0)

    return {
        'diff': diff,
        'summary': {
            'has_additions': has_additions,
            'has_removals': has_removals,
            'has_changes': has_changes,
            'has_schema_changes': has_schema_changes,
            'total_changes': (len(diff['added']) +
                            len(diff['removed']) +
                            len(diff['changed']))
        }
    }
```

--------------------------------

### Python API: Load CSV Data with `load_csv`

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads CSV data from a file handle into a dictionary. It uses the specified key column for the dictionary keys, or SHA1 hashes of each row's content if no key is provided. When reading from a real file, open it with `newline=''` so the `csv` module handles embedded newlines correctly.

```python
from csv_diff import load_csv
import io

# Load CSV with explicit key column
csv_data = """id,name,age
1,Cleo,4
2,Pancakes,2"""

data = load_csv(io.StringIO(csv_data), key="id")
print(data)
# Output:
# {
#     '1': {'id': '1', 'name': 'Cleo', 'age': '4'},
#     '2': {'id': '2', 'name': 'Pancakes', 'age': '2'}
# }

# Load CSV without key (uses SHA1 hash of content)
csv_no_key = """name,age
Cleo,4
Pancakes,2"""

data_hash = load_csv(io.StringIO(csv_no_key))
print(data_hash)
# Output: Dictionary with SHA1 hashes as keys
# {
#     'hash1...': {'name': 'Cleo', 'age': '4'},
#     'hash2...': {'name': 'Pancakes', 'age': '2'}
# }

# Load from actual file
with open('data.csv', newline='') as f:
    data = load_csv(f, key="id")
```
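The content-hash fallback can be approximated with the standard library, as sketched below. The exact serialization csv-diff hashes is an assumption here (JSON with sorted keys, UTF-8 encoded), and `content_key` is a hypothetical helper name.

```python
import hashlib
import json


def content_key(row: dict) -> str:
    """Derive a stable identifier from a row's content."""
    # Sorting the keys makes the hash independent of column order.
    serialized = json.dumps(row, sort_keys=True).encode("utf8")
    return hashlib.sha1(serialized).hexdigest()


rows = [{"name": "Cleo", "age": "4"}, {"name": "Pancakes", "age": "2"}]
keyed = {content_key(row): row for row in rows}
print(list(keyed))
```

Under this scheme any edit to a row changes its key, so an edited row would show up as one removal plus one addition rather than as a change. Supplying a real key column avoids that.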

--------------------------------

### Python API: Load JSON Data with `load_json`

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads JSON array data, automatically serializing nested objects and arrays into strings for consistent comparison. Missing keys in subsequent rows are filled with `None`. Supports loading from file-like objects and actual files.

```python
from csv_diff import load_json
import io
import json

# Load JSON with key
json_data = [
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]

json_str = json.dumps(json_data)
data = load_json(io.StringIO(json_str), key="id")
print(data)
# Output:
# {
#     1: {
#         'id': 1,
#         'name': 'Cleo',
#         'nested': '{"foo": 3}',  # Nested objects serialized
#         'extra': 1
#     },
#     2: {
#         'id': 2,
#         'name': 'Pancakes',
#         'nested': '{"foo": 3}',
#         'extra': None  # Missing keys filled with None
#     }
# }

# Load from file
with open('data.json') as f:
    data = load_json(f, key="id")
```
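The two normalization steps described above (serializing nested values and filling missing keys) can be sketched without the library. This is an illustration of the idea only: `normalize_rows` is a hypothetical function, and the use of `json.dumps` with default settings is an assumption chosen to match the output shown above.

```python
import json


def normalize_rows(rows: list) -> list:
    """Flatten nested values to JSON strings and give every row the same keys."""
    # Collect every key seen in any row, preserving first-seen order.
    all_keys = []
    for row in rows:
        for key in row:
            if key not in all_keys:
                all_keys.append(key)

    normalized = []
    for row in rows:
        flat = {}
        for key in all_keys:
            value = row.get(key)  # None when the key is missing from this row
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            flat[key] = value
        normalized.append(flat)
    return normalized


normalized = normalize_rows([
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}},
])
print(normalized)
```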

--------------------------------

### Python API - human_text Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Converts the structured difference dictionary produced by the `compare` function into a human-readable text format, suitable for direct display or reporting.

```APIDOC
## Python API - human_text Function

### Description
Convert comparison results into human-readable formatted text for display or reporting.

### Method
`human_text(diff_dict, key=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Function Arguments
- **diff_dict** (dict): The dictionary of differences obtained from the `compare` function.
- **key** (string, optional): The key column name used during comparison, for display purposes.

### Request Example
```python
from csv_diff import load_csv, compare, human_text
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
output = human_text(diff, key="id")
print(output)
```

### Response
#### Success Response (200)
- **output** (string): A formatted string summarizing the differences between the two datasets in a human-readable format.

#### Response Example
```
1 row changed, 1 row added, 1 row removed

1 row changed

  id: 1
    age: "4" => "5"

1 row added

  id: 3
  name: Bailey
  age: 1

1 row removed

  id: 2
  name: Pancakes
  age: 2
```
```

--------------------------------

### Python API: Detect Schema Changes with `compare`

Source: https://context7.com/simonw/csv-diff/llms.txt

The `compare` function automatically detects and reports schema differences between datasets, specifically identifying added and removed columns. These schema-level changes are reported in `columns_added` and `columns_removed` lists within the diff output, separate from row-level comparisons.

```python
from csv_diff import load_csv, compare
import io

previous = """id,name,age
1,Cleo,5
3,Bailey,1"""

current = """id,name,weight
1,Cleo,48
3,Bailey,20"""

prev_data = load_csv(io.StringIO(previous), key="id")
curr_data = load_csv(io.StringIO(current), key="id")

diff = compare(prev_data, curr_data)
print(diff)
# Output:
# {
#     'added': [],
#     'removed': [],
#     'changed': [],  # Added/removed columns are excluded from row-level comparison
#     'columns_added': ['weight'],
#     'columns_removed': ['age']
# }
```
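
The column-level detection described above can be pictured as a comparison of the two headers. The following is a minimal sketch of that idea, not csv-diff's actual implementation: `schema_changes` is a hypothetical helper, and it assumes each dataset is a non-empty dict of row dicts shaped like the output of `load_csv`, with every row in a dataset sharing the same columns.

```python
def schema_changes(previous, current):
    """Return (columns_added, columns_removed) for two keyed datasets.

    Hypothetical helper: assumes all rows in a dataset share the same
    columns, so inspecting the first row of each dataset is enough.
    """
    prev_cols = list(next(iter(previous.values())).keys())
    curr_cols = list(next(iter(current.values())).keys())
    added = [c for c in curr_cols if c not in prev_cols]
    removed = [c for c in prev_cols if c not in curr_cols]
    return added, removed


# Hand-written datasets mirroring the CSV strings in the example above.
previous = {
    "1": {"id": "1", "name": "Cleo", "age": "5"},
    "3": {"id": "3", "name": "Bailey", "age": "1"},
}
current = {
    "1": {"id": "1", "name": "Cleo", "weight": "48"},
    "3": {"id": "3", "name": "Bailey", "weight": "20"},
}

added, removed = schema_changes(previous, current)
print(added, removed)  # ['weight'] ['age']
```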

--------------------------------

### Python API - load_json Function

Source: https://context7.com/simonw/csv-diff/llms.txt

Loads JSON data from a file handle into a dictionary. It automatically serializes nested objects and arrays within the JSON data, making them comparable as strings.

```APIDOC
## Python API - load_json Function

### Description
Load JSON array data with automatic serialization of nested objects and arrays for comparison.

### Method
`load_json(file_handle, key=None)`

### Endpoint
N/A (Python function)

### Parameters
#### Function Arguments
- **file_handle**: A file-like object containing the JSON data (expected to be a JSON array).
- **key** (string, optional): The name of the field in each JSON object to use as the key for the output dictionary. If None, a SHA1 hash of the row content is used.

### Request Example
```python
from csv_diff import load_json
import io
import json

# Load JSON with key
json_data = [
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}}
]

json_str = json.dumps(json_data)
data = load_json(io.StringIO(json_str), key="id")
print(data)

# Load from file
# with open('data.json') as f:
#     data = load_json(f, key="id")
```

### Response
#### Return Value
- **data** (dict): A dictionary where keys are either the values from the specified `key` field or SHA1 hashes of the JSON object content, and values are dictionaries representing each JSON object. Nested objects/arrays are serialized into strings.

#### Response Example
```python
{
    1: {
        'id': 1,
        'name': 'Cleo',
        'nested': '{"foo": 3}',  # Nested objects serialized
        'extra': 1
    },
    2: {
        'id': 2,
        'name': 'Pancakes',
        'nested': '{"foo": 3}',
        'extra': None  # Missing keys filled with None
    }
}
```
```
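
The behaviours shown in the response (nested values serialized to strings, missing fields filled with `None`, and a SHA1 fallback key) can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `normalize_rows` is a hypothetical name, and hashing the sorted-key JSON of each row is just one plausible way to derive a content hash.

```python
import hashlib
import json


def normalize_rows(rows, key=None):
    """Key a list of JSON objects and make every value comparable.

    Hypothetical stand-in for the behaviour load_json is described as having.
    """
    # Union of all field names, preserving first-seen order.
    all_fields = []
    for row in rows:
        for field in row:
            if field not in all_fields:
                all_fields.append(field)

    result = {}
    for row in rows:
        if key is not None:
            row_key = row[key]
        else:
            # Assumption: identify the row by a SHA1 of its JSON content.
            row_key = hashlib.sha1(
                json.dumps(row, sort_keys=True).encode("utf-8")
            ).hexdigest()
        simplified = {}
        for field in all_fields:
            value = row.get(field)  # missing fields become None
            if isinstance(value, (dict, list)):
                value = json.dumps(value)  # nested values become strings
            simplified[field] = value
        result[row_key] = simplified
    return result


rows = [
    {"id": 1, "name": "Cleo", "nested": {"foo": 3}, "extra": 1},
    {"id": 2, "name": "Pancakes", "nested": {"foo": 3}},
]
by_id = normalize_rows(rows, key="id")
by_hash = normalize_rows(rows)  # keys are 40-character hex digests
print(by_id[2])
```

Serializing nested values to strings means two rows compare equal only when their nested content serializes identically, which keeps the later row comparison a plain dictionary equality check.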

--------------------------------

### Python API: Compare CSV Files

Source: https://context7.com/simonw/csv-diff/llms.txt

Compares two CSV files based on a specified key column. It loads the data, performs the comparison, and generates both human-readable and JSON outputs. Handles file not found and key column errors.

```Python
from csv_diff import load_csv, compare, human_text
import json

# CSV comparison workflow
def compare_csv_files(file1_path, file2_path, key_column):
    try:
        with open(file1_path, newline='') as f1, \
             open(file2_path, newline='') as f2:
            previous = load_csv(f1, key=key_column)
            current = load_csv(f2, key=key_column)

            diff = compare(previous, current, show_unchanged=False)

            # Generate both outputs
            human_output = human_text(
                diff,
                key=key_column,
                singular="row",
                plural="rows",
                current=current
            )
            json_output = json.dumps(diff, indent=2)

            return {
                'human': human_output,
                'json': json_output,
                'diff': diff
            }
    except FileNotFoundError as e:
        return {'error': f'File not found: {e}'}
    except KeyError as e:
        return {'error': f'Key column not found: {e}'}

# Usage
result = compare_csv_files('old_data.csv', 'new_data.csv', 'id')
if 'error' not in result:
    print(result['human'])
    print("\nJSON output available:", len(result['json']), "characters")

    # Access specific changes
    for change in result['diff']['changed']:
        print(f"Row {change['key']} changed: {change['changes']}")
else:
    # Surface the error instead of failing silently
    print(result['error'])
```
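
Once a diff has been produced, downstream code usually only needs to walk the dictionary. The sketch below works on a hand-written diff in the shape used throughout this document (`added`, `removed`, `changed`, `columns_added`, `columns_removed`). `count_changes` and `has_changes` are hypothetical helpers, and the `[old, new]` pair format inside `changes` is an assumption about the diff structure.

```python
def count_changes(diff):
    """Count entries per category in a compare()-style diff dict."""
    categories = ("added", "removed", "changed",
                  "columns_added", "columns_removed")
    return {name: len(diff.get(name, [])) for name in categories}


def has_changes(diff):
    """Return True if any category of the diff is non-empty."""
    return any(count_changes(diff).values())


# Hand-written sample; the [old, new] pairs are an assumed format.
sample_diff = {
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "columns_added": [],
    "columns_removed": [],
}

counts = count_changes(sample_diff)
print(counts)

# Flatten field-level changes into one line per changed field.
lines = []
for change in sample_diff["changed"]:
    for field, (old, new) in change["changes"].items():
        lines.append(f"{change['key']}.{field}: {old} -> {new}")
print("\n".join(lines))
```

A check like `has_changes` is handy for failing a scheduled job or CI step when two exports diverge.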

--------------------------------

### Python API: Format Differences with `human_text`

Source: https://context7.com/simonw/csv-diff/llms.txt

Converts the structured difference dictionary produced by the `compare` function into a human-readable text format. This is useful for generating user-friendly reports or displaying comparison results directly.

```python
from csv_diff import load_csv, compare, human_text
import io

one = """id,name,age
1,Cleo,4
2,Pancakes,2"""

two = """id,name,age
1,Cleo,5
3,Bailey,1"""

previous = load_csv(io.StringIO(one), key="id")
current = load_csv(io.StringIO(two), key="id")

diff = compare(previous, current)
output = human_text(diff, key="id")
print(output)
# Output:
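# 1 row changed, 1 row added, 1 row removed
#
# 1 row changed
#
#   id: 1
#     age: "4" => "5"
#
# 1 row added
#
#   id: 3
#   name: Bailey
#   age: 1
#
# 1 row removed
#
#   id: 2
#   name: Pancakes
#   age: 2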
```
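
The first line of the output is a summary of how many rows changed, were added, and were removed. As a rough sketch of how such a line can be assembled from a diff dictionary (an illustration only: `summary_line` is a hypothetical helper and not part of csv-diff), with `singular` and `plural` parameters mirroring the keyword arguments passed to `human_text` elsewhere in this document:

```python
def summary_line(diff, singular="row", plural="rows"):
    """Build a one-line summary such as '1 row changed, 1 row added'."""
    parts = []
    for category in ("changed", "added", "removed"):
        count = len(diff.get(category, []))
        if count:
            # Pick the noun form that matches the count.
            noun = singular if count == 1 else plural
            parts.append(f"{count} {noun} {category}")
    return ", ".join(parts)


# Hand-written diff matching the example above.
diff = {
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
}

print(summary_line(diff))
# 1 row changed, 1 row added, 1 row removed
print(summary_line(diff, singular="tree", plural="trees"))
# 1 tree changed, 1 tree added, 1 tree removed
```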
