### Install pdf-diff from PyPI

Source: https://github.com/joshdata/pdf-diff/blob/primary/README.md

Use pip to install the pdf-diff package. This is the recommended method for most users.

```bash
pip install pdf-diff
```

--------------------------------

### Install pdf-diff and System Dependencies

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Install the system dependencies first (the first command is for Ubuntu, the second for macOS with Homebrew), then install the pdf-diff Python package with pip.

```bash
sudo apt-get install python3-lxml poppler-utils
```

```bash
brew install libxml2 libxslt poppler
```

```bash
pip install pdf-diff
```
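
After installing, it can help to confirm that the Poppler command-line tools are visible on `PATH`. The following is a minimal sketch, not part of pdf-diff: the helper name `find_missing_tools` is hypothetical, `pdftotext` is named elsewhere in this document, and `pdftoppm` is an assumption about which other Poppler tool is needed.

```python
import shutil

# `pdftotext` is mentioned in this document; `pdftoppm` is an assumption
# about the tool used for rasterizing pages.
REQUIRED_TOOLS = ("pdftotext", "pdftoppm")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Missing Poppler tools:", ", ".join(missing))
else:
    print("All required Poppler tools were found.")
```

If a tool is reported missing, rerun the system package command for your platform before using pdf-diff.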

--------------------------------

### Install pdf-diff from Source

Source: https://github.com/joshdata/pdf-diff/blob/primary/README.md

Install the pdf-diff package directly from its source code. This is useful for development or when a specific version is needed.

```bash
sudo python3 setup.py install
```

--------------------------------

### Deploy pdf-diff

Source: https://github.com/joshdata/pdf-diff/blob/primary/README.md

Commands to prepare and upload a new release of the pdf-diff package using setuptools, wheel, and twine.

```bash
python3 -m pip install --user --upgrade setuptools wheel twine
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*
```

--------------------------------

### Compare PDFs and Output PNG to Stdout

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Basic command-line usage to compare two PDF files and direct the resulting comparison image to standard output.

```bash
pdf-diff before.pdf after.pdf > comparison.png
```

--------------------------------

### Run pdf-diff to Compare PDFs

Source: https://github.com/joshdata/pdf-diff/blob/primary/README.md

Execute the pdf-diff script to compare two PDF files and output the differences as a PNG image. The output is redirected to a file.

```bash
pdf-diff before.pdf after.pdf > comparison_output.png
```

--------------------------------

### Command Line Interface - pdf-diff

Source: https://context7.com/joshdata/pdf-diff/llms.txt

The main entry point for comparing two PDF documents and generating a visual diff output as a PNG image. Various options are available for customization.

```APIDOC
## Command Line Interface - pdf-diff

### Description
The main entry point for comparing two PDF documents and generating a visual diff output as a PNG image.

### Method
CLI

### Endpoint
`pdf-diff <before.pdf> <after.pdf>`

### Parameters
#### Command Line Arguments
- `--format` (string) - Optional - Save as different format (e.g., gif, jpeg, ppm, tiff). Default is PNG.
- `--style` (string) - Optional - Customize diff marking styles (e.g., box, strike, underline). Format: `deletion_style,addition_style`. Default is `strike,underline`.
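- `--top-margin` (integer) - Optional - Ignore content above this position, given as a percentage of page height measured from the top of the page (use it to skip headers). Default is 0.
- `--bottom-margin` (integer) - Optional - Ignore content below this position, given as a percentage of page height measured from the top of the page (use it to skip footers). For example, 95 ignores the bottom 5%. Default is 100.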
- `--result-width` (integer) - Optional - Adjust output image width in pixels. Default is 900.
- `--changes` - Optional - Render from pre-computed changes JSON (read from stdin).

### Request Example
```bash
# Basic usage - compare two PDFs and output PNG to stdout
pdf-diff before.pdf after.pdf > comparison.png

# Save as different format (gif)
pdf-diff before.pdf after.pdf --format gif > comparison.gif

# Customize diff marking styles (box for deletions, underline for additions)
pdf-diff before.pdf after.pdf --style box,underline > comparison.png

# Ignore headers/footers with margin settings (percentage of page height)
pdf-diff before.pdf after.pdf --top-margin 5 --bottom-margin 95 > comparison.png

# Adjust output image width (default: 900px)
pdf-diff before.pdf after.pdf --result-width 1200 > comparison.png

# Render from pre-computed changes JSON (read from stdin)
cat changes.json | pdf-diff --changes > comparison.png
```

### Response
Outputs a PNG image (or other specified format) to stdout representing the comparison.
```

--------------------------------

### render_changes(changes, styles, width)

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Takes a list of change objects and renders them into a side-by-side comparison image with visual annotations. Returns a PIL Image object that can be saved in various formats.

```APIDOC
## render_changes(changes, styles, width)

### Description
Takes a list of change objects and renders them into a side-by-side comparison image with visual annotations. Returns a PIL Image object that can be saved in various formats.

### Method
Python Function

### Parameters
- **changes** (list) - Required - A list of change objects, typically generated by `compute_changes`.
- **styles** (list of strings) - Required - A list of two strings specifying the style for deletions and additions, respectively. Available styles: "box", "strike", "underline". Example: `["strike", "underline"]`.
- **width** (integer) - Required - The desired width of the output image in pixels.

### Request Example
```python
from pdf_diff.command_line import compute_changes, render_changes

# Compute differences
changes = compute_changes("old.pdf", "new.pdf")

# Render with default styles (strike for deletions, underline for additions)
styles = ["strike", "underline"]
img = render_changes(changes, styles, width=900)

# Save as PNG
img.save("comparison.png", "PNG")

# Render with box style for both
styles = ["box", "box"]
img = render_changes(changes, styles, width=1200)
img.save("comparison_boxes.png", "PNG")
```

### Response
- **img** (PIL.Image.Image) - A PIL Image object representing the side-by-side comparison. This object can be saved to a file using its `save()` method.
```

--------------------------------

### Render from Pre-computed Changes JSON

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Generates a comparison image from a JSON file containing pre-computed text changes, read from standard input.

```bash
cat changes.json | pdf-diff --changes > comparison.png
```
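
A `changes.json` file can be produced from Python with the standard `json` module. The sketch below assumes that the JSON read by `--changes` has the same shape as the list returned by `compute_changes`; that is an assumption, not something this document confirms. The helpers `save_changes` and `load_changes` are hypothetical, and the sample data is hand-written rather than computed from real PDFs.

```python
import json
import os
import tempfile


def save_changes(changes, path):
    """Write a list of change objects (and "*" markers) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(changes, f, indent=2)


def load_changes(path):
    """Read a list of change objects back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hand-written sample in the shape documented for compute_changes
sample_changes = [
    {
        "index": 0,
        "pdf": {"index": 0, "file": "before.pdf"},
        "page": {"number": 1, "width": 612.0, "height": 792.0},
        "x": 72.0,
        "y": 100.5,
        "width": 150.0,
        "height": 12.0,
        "text": "deleted text ",
    },
    "*",  # alignment marker between change groups
]

# Write to a temporary directory so the example leaves no files behind
path = os.path.join(tempfile.mkdtemp(), "changes.json")
save_changes(sample_changes, path)
print(load_changes(path) == sample_changes)
```

In real use, `sample_changes` would be replaced by the result of `compute_changes(...)`, and the saved file piped to `pdf-diff --changes`.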

--------------------------------

### Save Comparison as Different Image Format

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDFs and saves the output image in a specified format like GIF.

```bash
pdf-diff before.pdf after.pdf --format gif > comparison.gif
```

--------------------------------

### Adjust Output Image Width

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDFs and sets a custom width for the generated comparison image. The default width is 900px.

```bash
pdf-diff before.pdf after.pdf --result-width 1200 > comparison.png
```
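
When pdf-diff is driven from a script, the options can be assembled into an argument list and run with `subprocess`. This is a minimal sketch under the assumption that the `pdf-diff` executable is on `PATH`; `build_pdf_diff_command` and `run_pdf_diff` are hypothetical helper names, and only flags that appear in this document are used.

```python
import subprocess


def build_pdf_diff_command(before, after, fmt=None, style=None,
                           top_margin=None, bottom_margin=None,
                           result_width=None):
    """Build the argument list for a pdf-diff invocation."""
    cmd = ["pdf-diff", before, after]
    if fmt is not None:
        cmd += ["--format", fmt]
    if style is not None:
        cmd += ["--style", style]
    if top_margin is not None:
        cmd += ["--top-margin", str(top_margin)]
    if bottom_margin is not None:
        cmd += ["--bottom-margin", str(bottom_margin)]
    if result_width is not None:
        cmd += ["--result-width", str(result_width)]
    return cmd


def run_pdf_diff(cmd, output_path):
    """Run the command and write the image bytes from stdout to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


cmd = build_pdf_diff_command("before.pdf", "after.pdf", result_width=1200)
print(" ".join(cmd))
# prints: pdf-diff before.pdf after.pdf --result-width 1200
```

Calling `run_pdf_diff(cmd, "comparison.png")` would then have the same effect as the shell redirection shown above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.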

--------------------------------

### Render Differences into a Comparison Image

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Renders a list of change objects into a side-by-side comparison image using specified styles. Returns a PIL Image object.

```python
from pdf_diff.command_line import compute_changes, render_changes

# Compute differences
changes = compute_changes("old.pdf", "new.pdf")

# Render with default styles (strike for deletions, underline for additions)
styles = ["strike", "underline"]
img = render_changes(changes, styles, width=900)

# Save as PNG
img.save("comparison.png", "PNG")
```

```python
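from pdf_diff.command_line import compute_changes, render_changes

# Compute differences (repeated here so this snippet runs on its own)
changes = compute_changes("old.pdf", "new.pdf")

# Render with box style for both
styles = ["box", "box"]
img = render_changes(changes, styles, width=1200)
img.save("comparison_boxes.png", "PNG")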
```

--------------------------------

### Customize Diff Marking Styles

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDFs and applies custom styles for marking deletions (first style) and additions (second style).

```bash
pdf-diff before.pdf after.pdf --style box,underline > comparison.png
```
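
The same `deletion_style,addition_style` string can be turned into the two-element list that `render_changes` expects. The sketch below is illustrative only: `parse_style` is a hypothetical helper, not a pdf-diff function, and the set of valid names is taken from the styles listed in this document.

```python
# Style names listed in this document
VALID_STYLES = ("box", "strike", "underline")


def parse_style(value="strike,underline"):
    """Split 'deletion_style,addition_style' into a validated two-item list."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'deletion_style,addition_style'")
    for part in parts:
        if part not in VALID_STYLES:
            raise ValueError(f"unknown style {part!r}; expected one of {VALID_STYLES}")
    return parts


print(parse_style("box,underline"))
# prints: ['box', 'underline']
```

The first element applies to deletions and the second to additions, matching the order used by `--style`.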

--------------------------------

### compute_changes(pdf_fn_1, pdf_fn_2, top_margin=0, bottom_margin=100)

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDF files and returns a list of change objects representing text differences between them. Each change object contains bounding box coordinates, page information, and the changed text content.

```APIDOC
## compute_changes(pdf_fn_1, pdf_fn_2, top_margin=0, bottom_margin=100)

### Description
Compares two PDF files and returns a list of change objects representing text differences between them. Each change object contains bounding box coordinates, page information, and the changed text content.

### Method
Python Function

### Parameters
- **pdf_fn_1** (string) - Required - Path to the first PDF file.
- **pdf_fn_2** (string) - Required - Path to the second PDF file.
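- **top_margin** (integer) - Optional - Position, as a percentage of page height measured from the top of the page, above which text is ignored (for headers). Default is 0.
- **bottom_margin** (integer) - Optional - Position, as a percentage of page height measured from the top of the page, below which text is ignored (for footers). For example, 95 ignores the bottom 5%. Default is 100.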

### Request Example
```python
from pdf_diff.command_line import compute_changes
import json

# Compare two PDF documents
changes = compute_changes("document_v1.pdf", "document_v2.pdf")

# Ignore top 5% and bottom 5% of each page (headers/footers)
changes = compute_changes(
    "document_v1.pdf",
    "document_v2.pdf",
    top_margin=5,
    bottom_margin=95
)

# Output changes as JSON
print(json.dumps(changes, indent=2, default=str))
```

### Response
- **changes** (list) - A list of change objects. Each object represents a text difference and includes details like page number, coordinates, and the text content. The list can also contain the string `"*"`, which marks the boundary between change groups. Returns an empty list if no changes are found.

#### Response Example
```json
[
  {
    "index": 0,
    "pdf": {"index": 0, "file": "document_v1.pdf"},
    "page": {"number": 1, "width": 612.0, "height": 792.0},
    "x": 72.0,
    "y": 100.5,
    "width": 150.0,
    "height": 12.0,
    "text": "deleted text ",
    "startIndex": 0,
    "textLength": 13
  },
  "*",  # Alignment marker between change groups
  ...
]
```
```

--------------------------------

### Rasterize PDF Page to Image

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Uses the pdftopng function to convert a specific PDF page into a PIL Image object. Specify the PDF file, page number, and desired width for rasterization. The returned image is in RGBA mode.

```python
from pdf_diff.command_line import pdftopng

# Rasterize page 1 at 900px width
img = pdftopng("document.pdf", 1, 900)
img.save("page1.png", "PNG")

# Rasterize page 3 at higher resolution
img = pdftopng("document.pdf", 3, 1800)
img.save("page3_highres.png", "PNG")

# The returned image is in RGBA mode (img is now the page 3 raster from above)
print(f"Image size: {img.size}, Mode: {img.mode}")
# Prints the pixel dimensions of the page 3 raster, followed by "Mode: RGBA"
```

--------------------------------

### Ignore Headers/Footers with Margin Settings

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDFs while ignoring specified top and bottom margins, expressed as a percentage of page height.

```bash
pdf-diff before.pdf after.pdf --top-margin 5 --bottom-margin 95 > comparison.png
```
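
Both margins are positions measured from the top of the page, so `--top-margin 5 --bottom-margin 95` keeps the band between 5% and 95% of the page height. The arithmetic can be sketched as follows. This is an illustration, not pdf-diff's implementation: `keep_region` and `is_kept` are hypothetical names, and the rule that a box is dropped only when it lies entirely outside the band is an assumption.

```python
def keep_region(page_height, top_margin=0, bottom_margin=100):
    """Return (y_min, y_max) of the band that is kept, in page units."""
    return (page_height * top_margin / 100.0,
            page_height * bottom_margin / 100.0)


def is_kept(box_y, box_height, page_height, top_margin=0, bottom_margin=100):
    """Return True when a box overlaps the kept band.

    box_y is the top edge of the box, measured from the top of the page.
    Assumption: a box is dropped only if it is entirely above or below the band.
    """
    y_min, y_max = keep_region(page_height, top_margin, bottom_margin)
    box_bottom = box_y + box_height
    return box_bottom >= y_min and box_y <= y_max


# On a 792-unit-tall page, margins of 5 and 95 keep roughly 39.6 to 752.4
print(keep_region(792.0, 5, 95))
print(is_kept(20.0, 10.0, 792.0, 5, 95))    # header text near the top
print(is_kept(100.5, 12.0, 792.0, 5, 95))   # body text
print(is_kept(770.0, 11.0, 792.0, 5, 95))   # footer text near the bottom
```

With the defaults of 0 and 100 the band covers the whole page, so nothing is filtered out.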

--------------------------------

### Compute Changes Between Two PDF Documents

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Compares two PDF files programmatically and returns a list of change objects. Requires importing `compute_changes`.

```python
from pdf_diff.command_line import compute_changes

# Compare two PDF documents
changes = compute_changes("document_v1.pdf", "document_v2.pdf")
```

```python
from pdf_diff.command_line import compute_changes

# Ignore top 5% and bottom 5% of each page (headers/footers)
changes = compute_changes(
    "document_v1.pdf",
    "document_v2.pdf",
    top_margin=5,
    bottom_margin=95
)
```

```python
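import json

from pdf_diff.command_line import compute_changes

# Compute the changes first so this snippet runs on its own
changes = compute_changes("document_v1.pdf", "document_v2.pdf")

# Output changes as JSON
print(json.dumps(changes, indent=2, default=str))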
```
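
The returned list can be post-processed without rendering an image. The following sketch groups the changed text by source PDF and splits groups at the `"*"` alignment markers. It assumes that `pdf.index` is 0 for the first file and 1 for the second; `split_changes` is a hypothetical helper, and the sample data is hand-written so the snippet runs without any PDFs.

```python
def split_changes(changes):
    """Group changed text by PDF index, starting a new group at each "*" marker."""
    groups = []
    current = {0: [], 1: []}
    for item in changes:
        if item == "*":
            # A marker closes the current group, if it holds anything
            if current[0] or current[1]:
                groups.append(current)
            current = {0: [], 1: []}
            continue
        current[item["pdf"]["index"]].append(item["text"])
    if current[0] or current[1]:
        groups.append(current)
    return groups


# Hand-written sample; real change objects carry more keys (page, x, y, ...)
sample = [
    {"pdf": {"index": 0, "file": "document_v1.pdf"}, "text": "deleted text "},
    {"pdf": {"index": 1, "file": "document_v2.pdf"}, "text": "added text "},
    "*",
    {"pdf": {"index": 1, "file": "document_v2.pdf"}, "text": "more "},
]

for group in split_changes(sample):
    print("removed:", "".join(group[0]), "| added:", "".join(group[1]))
```

Passing the result of `compute_changes(...)` instead of `sample` would give a plain-text summary of the differences.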

--------------------------------

### pdf_to_bboxes(pdf_index, fn, top_margin=0, bottom_margin=100)

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Generator function that extracts text bounding boxes from a PDF file using pdftotext. Yields dictionaries containing position, dimensions, and text content for each word.

```APIDOC
## pdf_to_bboxes(pdf_index, fn, top_margin=0, bottom_margin=100)

### Description
Generator function that extracts text bounding boxes from a PDF file using pdftotext. Yields dictionaries containing position, dimensions, and text content for each word.

### Method
Python Function

### Parameters
- **pdf_index** (integer) - Required - An index for the PDF file (used internally).
- **fn** (string) - Required - Path to the PDF file.
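- **top_margin** (integer) - Optional - Position, as a percentage of page height measured from the top of the page, above which text is ignored. Default is 0.
- **bottom_margin** (integer) - Optional - Position, as a percentage of page height measured from the top of the page, below which text is ignored. For example, 90 ignores the bottom 10%. Default is 100.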

### Request Example
```python
from pdf_diff.command_line import pdf_to_bboxes

# Extract all text bounding boxes from a PDF
for bbox in pdf_to_bboxes(0, "document.pdf"):
    print(f"Page {bbox['page']['number']}: '{bbox['text']}' at ({bbox['x']}, {bbox['y']})")

# With margin filtering (ignore top 10% and bottom 10%)
for bbox in pdf_to_bboxes(0, "document.pdf", top_margin=10, bottom_margin=90):
    print(f"Text: {bbox['text']}, Width: {bbox['width']}, Height: {bbox['height']}")
```

### Response
- **bbox** (dict) - A dictionary representing a text bounding box. Each dictionary contains:
    - `index` (integer): Sequential box index.
    - `pdf` (dict): Information about the PDF file (`index`, `file`).
    - `page` (dict): Information about the page (`number`, `width`, `height`).
    - `x` (float): Left edge coordinate (PDF coordinates).
    - `y` (float): Top edge coordinate (PDF coordinates).
    - `width` (float): Width of the bounding box.
    - `height` (float): Height of the bounding box.
    - `text` (string): The extracted text content within the bounding box.

#### Response Example
```json
{
    "index": 0,
    "pdf": {"index": 0, "file": "document.pdf"},
    "page": {"number": 1, "width": 612.0, "height": 792.0},
    "x": 72.0,
    "y": 720.5,
    "width": 45.2,
    "height": 11.0,
    "text": "Hello"
}
```
```

--------------------------------

### Function Flow Diagram

Source: https://github.com/joshdata/pdf-diff/blob/primary/README.md

A diagram illustrating the flow of operations within the pdf-diff script, from computing changes to rendering and stacking pages.

```text
compute_changes
│
├── serialize_pdf (called twice)
│    ├── pdf_to_bboxes
│    ├── mark_eol_hyphens
│    │    └── mark_eol_hyphen
│    └── Processes bounding boxes and text
│
├── perform_diff
│    └── Calls external `fast_diff_match_patch`
│
└── process_hunks
     ├── Iterates through diff hunks
     └── mark_difference (called multiple times)

render_changes
│
├── simplify_changes
├── make_pages_images
│    └── pdftopng (converts PDF pages to images)
├── realign_pages
│    ├── Splits pages into sub-pages
│    └── Adjusts box coordinates
├── draw_red_boxes
│    └── Annotates images with rectangles or lines
└── zealous_crop
     └── Crops the image to reduce unnecessary margins

stack_pages
│
└── Combines processed images into a final output
```
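
The `compute_changes` branch of the diagram (serialize the boxes, diff the text, map hunks back to boxes) can be sketched in a few lines. This is a simplified illustration, not the library's code: it swaps in the standard library's `difflib.SequenceMatcher` for the external `fast_diff_match_patch`, compares whole words rather than characters, and the function name `sketch_changes` and the sample boxes are hypothetical.

```python
from difflib import SequenceMatcher


def sketch_changes(left_boxes, right_boxes):
    """Diff two lists of text boxes word by word and collect the changed boxes."""
    left_words = [box["text"] for box in left_boxes]
    right_words = [box["text"] for box in right_boxes]
    matcher = SequenceMatcher(a=left_words, b=right_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.extend(left_boxes[i1:i2])    # boxes only in the first PDF
        changes.extend(right_boxes[j1:j2])   # boxes only in the second PDF
        changes.append("*")                  # marker between change groups
    return changes


# Hand-written boxes with only the keys this sketch needs
left = [{"pdf": {"index": 0}, "text": word} for word in ["The", "quick", "fox"]]
right = [{"pdf": {"index": 1}, "text": word} for word in ["The", "slow", "fox"]]

for item in sketch_changes(left, right):
    print(item)
```

Here the only differing word produces one box from each side followed by a `"*"` marker, which mirrors the shape of the list that `render_changes` consumes.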

--------------------------------

### Extract Text Bounding Boxes from PDF

Source: https://context7.com/joshdata/pdf-diff/llms.txt

Generator function to extract text bounding boxes from a PDF file using pdftotext. Yields dictionaries with position, dimensions, and text.

```python
from pdf_diff.command_line import pdf_to_bboxes

# Extract all text bounding boxes from a PDF
for bbox in pdf_to_bboxes(0, "document.pdf"):
    print(f"Page {bbox['page']['number']}: '{bbox['text']}' at ({bbox['x']}, {bbox['y']})")
```

```python
from pdf_diff.command_line import pdf_to_bboxes

# With margin filtering (ignore top 10% and bottom 10%)
for bbox in pdf_to_bboxes(0, "document.pdf", top_margin=10, bottom_margin=90):
    print(f"Text: {bbox['text']}, Width: {bbox['width']}, Height: {bbox['height']}")
```
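
Because each yielded dictionary carries its page number, the boxes are easy to regroup into per-page text. The sketch below uses the dictionary shape documented for `pdf_to_bboxes`; `text_by_page` is a hypothetical helper, and the sample boxes are hand-written so the snippet runs without a PDF.

```python
from collections import defaultdict


def text_by_page(bboxes):
    """Join the text of each box into one string per page number."""
    pages = defaultdict(list)
    for bbox in bboxes:
        pages[bbox["page"]["number"]].append(bbox["text"])
    return {number: " ".join(words) for number, words in sorted(pages.items())}


# Hand-written sample; real boxes also include index, pdf, x, y, width, height
sample_boxes = [
    {"page": {"number": 1, "width": 612.0, "height": 792.0}, "text": "Hello"},
    {"page": {"number": 1, "width": 612.0, "height": 792.0}, "text": "world"},
    {"page": {"number": 2, "width": 612.0, "height": 792.0}, "text": "Appendix"},
]

for number, text in text_by_page(sample_boxes).items():
    print(f"Page {number}: {text}")
```

The helper accepts any iterable, so the generator returned by `pdf_to_bboxes(0, "document.pdf")` could be passed in directly.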
