### Start DocTR API Server

Source: https://github.com/mindee/doctr/blob/main/api/README.md

Clone the repository, navigate to the API directory, and run the make command to start the FastAPI server. Ensure Docker is installed and running.

```shell
git clone https://github.com/mindee/doctr.git
cd doctr/api
make run
```

--------------------------------

### Install Documentation Environment

Source: https://github.com/mindee/doctr/blob/main/docs/README.md

Installs the development environment and documentation dependencies. Ensure you are at the repository root before running.

```bash
# Make sure you are at the root of the repository before executing these commands
python -m pip install --upgrade pip
# Quote the extras so shells such as zsh do not expand the brackets
pip install -e ".[viz,html]"
pip install -e ".[docs]"
```

--------------------------------

### Install Dependencies and Run Streamlit Demo

Source: https://github.com/mindee/doctr/blob/main/demo/README.md

Navigate to the demo directory, install Python dependencies, and run the Streamlit application.

```bash
cd demo
pip install -r pt-requirements.txt
streamlit run app.py
```

--------------------------------

### Install docTR

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Install docTR and its requirements using pip.

```shell
pip install -e . --upgrade
pip install -r references/requirements.txt
```

--------------------------------

### Install Demo Dependencies

Source: https://github.com/mindee/doctr/blob/main/README.md

Install the required dependencies for the docTR demo application. This includes libraries like Streamlit.

```shell
pip install -r demo/pt-requirements.txt
```

--------------------------------

### Install Contrib Module

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_contrib_modules.rst

The contrib module depends on `onnxruntime`. Install it through the `contrib` extra, or install `onnxruntime` (or `onnxruntime-gpu`) directly with pip.

```bash
pip install "python-doctr[contrib]"
# Or
pip install onnxruntime  # pip install onnxruntime-gpu
```

--------------------------------

### Install API Dependencies

Source: https://github.com/mindee/doctr/blob/main/README.md

Install the necessary dependencies to run the docTR API template locally. This includes Poetry for dependency management.

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

--------------------------------

### Install docTR in Developer Mode

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Installs docTR with all development dependencies and sets up pre-commit hooks. Ensure pip is up-to-date before running.

```shell
python -m pip install --upgrade pip
pip install -e '.[dev]'
pre-commit install
```

--------------------------------

### Install docTR in Developer Mode from a Local Clone

Source: https://github.com/mindee/doctr/blob/main/README.md

Installs the docTR package in editable (developer) mode from a local clone in `doctr/`. This command installs only the core dependencies; optional extras must be requested explicitly.

```shell
pip install -e doctr/.
```

--------------------------------

### Install Python Doctr with Optional Dependencies

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Install python-doctr with additional packages for visualization, HTML support, and the contrib module. This is useful for extended functionality.

```bash
pip install "python-doctr[viz,html,contrib]"
```

--------------------------------

### Run Document Analysis Script

Source: https://github.com/mindee/doctr/blob/main/README.md

Execute the example script for analyzing a PDF or image file. Use --help to see all available arguments.

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

--------------------------------

### Clone and Install Python Doctr in Developer Mode

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Clone the Doctr repository from GitHub and install it in editable mode using pip. This is recommended for development.

```bash
git clone https://github.com/mindee/doctr.git
```

```bash
pip install -e doctr/.
```

--------------------------------

### Basic Training Command

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

Start training a text detection model using the `db_resnet50` architecture. Specify paths to your training and validation datasets and the number of epochs.

```shell
python references/detection/train.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### Multi-GPU Training with torchrun

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

Launch training across multiple GPUs using `torchrun`. This example exposes GPUs 0 and 2 through `CUDA_VISIBLE_DEVICES`, starts two processes on the node (one per visible GPU), and uses the `nccl` backend for distributed data parallelism.

```shell
CUDA_VISIBLE_DEVICES=0,2 \
 torchrun --nproc_per_node=2 references/detection/train.py \
  db_resnet50 \
  --train_path path/to/train \
  --val_path   path/to/val \
  --epochs 5 \
  --backend nccl
```

--------------------------------

### Install Python Doctr Stable Release

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Use this command to install the latest stable version of the python-doctr package via pip.

```bash
pip install python-doctr
```

--------------------------------

### Run Local API Server

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR API locally using uvicorn. This command starts a development server with hot-reloading.

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

--------------------------------

### Install Custom Font on Linux

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Install a custom TrueType font (.ttf) on a Linux system by copying it to the system fonts directory and rebuilding the font cache.

```shell
sudo cp custom-font.ttf /usr/local/share/fonts/
fc-cache -f -v
```

--------------------------------

### Train Text Recognition Model (Single GPU)

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Start training a text recognition model using the provided script. Specify the model architecture, training/validation paths, and number of epochs.

```shell
python references/recognition/train.py crnn_vgg16_bn --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### Install OnnxTR for ONNX Model Inference

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst

Install the OnnxTR package, which provides a lightweight way to perform inference with ONNX exported models using ONNXRuntime, without requiring PyTorch or TensorFlow.

```shell
pip install "onnxtr[cpu]"
```

--------------------------------

### Load Documents from Various Sources

Source: https://github.com/mindee/doctr/blob/main/README.md

Read documents from PDF files, single or multiple images, or URLs. For URLs, ensure `weasyprint` is installed.

```python
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```

--------------------------------

### Using DocTR Recognition Predictor

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiate a recognition model using `recognition_predictor` and pass a dummy image to get OCR output. Ensure numpy is imported for image generation.

```python3
import numpy as np
from doctr.models import recognition_predictor
model = recognition_predictor('crnn_vgg16_bn')
dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
out = model([dummy_img])
```

--------------------------------

### Run API Server with Docker Compose

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR API server using Docker Compose. This command builds the image and starts the container in detached mode.

```shell
PORT=8002 docker-compose up -d --build
```

--------------------------------

### Train Character Classification Model

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Start training a character classification model in PyTorch. Specify the model architecture, number of epochs, and device.

```shell
python references/classification/train_character.py mobilenet_v3_large --epochs 5 --device 0
```

--------------------------------

### Example OCR JSON Response

Source: https://github.com/mindee/doctr/blob/main/api/README.md

This is an example of the JSON output you can expect after a successful OCR request. It details the detected image name, orientation, language, dimensions, and extracted text items with their respective geometries and confidence scores.

```json
[
  {
    "name": "117319856-fc35bf00-ae8b-11eb-9b51-ca5aba673466.jpg",
    "orientation": {
      "value": 0,
      "confidence": null
    },
    "language": {
      "value": null,
      "confidence": null
    },
    "dimensions": [2339, 1654],
    "items": [
      {
        "blocks": [
          {
            "geometry": [
              0.7471996155154171,
              0.1787109375,
              0.9101580212741838,
              0.2080078125
            ],
            "objectness_score": 0.5,
            "lines": [
              {
                "geometry": [
                  0.7471996155154171,
                  0.1787109375,
                  0.9101580212741838,
                  0.2080078125
                ],
                "objectness_score": 0.5,
                "words": [
                  {
                    "value": "Hello",
                    "geometry": [
                      0.7471996155154171,
                      0.1796875,
                      0.8272978149561669,
                      0.20703125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  },
                  {
                    "value": "world!",
                    "geometry": [
                      0.8176307908857315,
                      0.1787109375,
                      0.9101580212741838,
                      0.2080078125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
```
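
To work with this response in Python, the relative geometry can be mapped back to pixel coordinates. The sketch below is a minimal illustration, assuming `geometry` is `[xmin, ymin, xmax, ymax]` relative to the page and `dimensions` is `[height, width]`. The helper names `iter_words` and `to_pixels` are hypothetical, and the embedded response is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the example response: one block, one line, two words.
RESPONSE = json.loads("""
[
  {
    "name": "117319856-fc35bf00-ae8b-11eb-9b51-ca5aba673466.jpg",
    "dimensions": [2339, 1654],
    "items": [
      {"blocks": [{"lines": [{"words": [
        {"value": "Hello", "confidence": 1.0,
         "geometry": [0.7471996155154171, 0.1796875, 0.8272978149561669, 0.20703125]},
        {"value": "world!", "confidence": 1.0,
         "geometry": [0.8176307908857315, 0.1787109375, 0.9101580212741838, 0.2080078125]}
      ]}]}]}
    ]
  }
]
""")


def iter_words(document):
    """Yield every word dict of one document entry, in the order returned."""
    for item in document["items"]:
        for block in item["blocks"]:
            for line in block["lines"]:
                yield from line["words"]


def to_pixels(geometry, dimensions):
    """Convert a relative [xmin, ymin, xmax, ymax] box to integer pixels.

    Assumption: dimensions is [height, width].
    """
    height, width = dimensions
    xmin, ymin, xmax, ymax = geometry
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


for document in RESPONSE:
    for word in iter_words(document):
        box = to_pixels(word["geometry"], document["dimensions"])
        print(word["value"], word["confidence"], box)
```

If the axis order of `dimensions` differs in your version of the API, swap `height` and `width` in `to_pixels`.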

--------------------------------

### Run GPU-enabled Docker Container

Source: https://github.com/mindee/doctr/blob/main/README.md

Run a docTR Docker container with GPU support enabled. Ensure your host system has NVIDIA Container Toolkit installed and configured.

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash
```

--------------------------------

### Push Trained Model to Huggingface Hub

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Push a trained recognition model to the Hugging Face Hub. This requires a Hugging Face account and a Git LFS installation. Existing repositories are not overwritten.

```python3
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
```

--------------------------------

### Perform OCR via REST API using curl

Source: https://context7.com/mindee/doctr/llms.txt

Provides a `curl` command example for performing OCR on a document via the docTR REST API. Specifies detection and recognition architectures.

```bash
# Send a PDF to the OCR route and select the detection and recognition architectures
curl -X POST "http://localhost:8080/ocr/" \
     -F "files=@document.pdf" \
     -F "det_arch=db_resnet50" \
     -F "reco_arch=crnn_vgg16_bn"
```

--------------------------------

### Display Recognition Training Help

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

View all available command-line arguments and options for the training script by using the --help flag.

```shell
python references/recognition/train.py --help
```

--------------------------------

### Display Detection Training Help

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

View all available command-line options for the training script to customize your training process.

```shell
python references/detection/train.py --help
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Builds the project documentation locally using Sphinx. Modified files are rebuilt by default. To force a complete rebuild, delete the `_build` directory.

```shell
make docs-single-version
```

--------------------------------

### Display OCR Result Visualization

Source: https://github.com/mindee/doctr/blob/main/README.md

Displays the visualization of the OCR result. Requires matplotlib and mplcursors to be installed.

```python
result.show()
```

--------------------------------

### Build Custom Docker Image

Source: https://github.com/mindee/doctr/blob/main/README.md

Build a custom docTR Docker image with specified framework, Python version, and docTR version using build arguments.

```shell
docker build -t doctr --build-arg FRAMEWORK=torch --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```

--------------------------------

### Initialize and Use KIE Predictor

Source: https://github.com/mindee/doctr/blob/main/README.md

Initializes a KIE predictor with specified detection and recognition architectures, analyzes a PDF document, and prints predictions for each detected class.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```

--------------------------------

### Sample hOCR XML Output

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

This is an example of the XML byte string output generated by the `export_as_xml` method, representing OCR results in hOCR format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>docTR - hOCR</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="ocr-system" content="doctr 0.11.0" />
    <meta name="ocr-capabilities" content="ocr_page ocr_carea ocr_par ocr_line ocrx_word" />
  </head>
  <body>
    <div class="ocr_page" id="page_1" title="image; bbox 0 0 3456 3456; ppageno 0" />
      <div class="ocr_carea" id="block_1_1" title="bbox 857 529 2504 2710">
        <p class="ocr_par" id="par_1_1" title="bbox 857 529 2504 2710">
          <span class="ocr_line" id="line_1_1" title="bbox 857 529 2504 2710; baseline 0 0; x_size 0; x_descenders 0; x_ascenders 0">
            <span class="ocrx_word" id="word_1_1" title="bbox 1552 540 1778 580; x_wconf 99">Hello</span>
            <span class="ocrx_word" id="word_1_2" title="bbox 1782 529 1900 583; x_wconf 99">XML</span>
            <span class="ocrx_word" id="word_1_3" title="bbox 1420 597 1684 641; x_wconf 81">World</span>
          </span>
        </p>
      </div>
  </body>
</html>
```
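
Because hOCR is XML, the word text, bounding boxes, and confidences can be read back with the standard library. The following is a minimal sketch, assuming word spans carry a `title` of the form `bbox x0 y0 x1 y1; x_wconf N` as in the sample above. The `parse_title` and `extract_words` helpers are hypothetical, and the embedded document is a shortened version of the sample.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Shortened hOCR document in the same shape as the sample above.
HOCR = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <body>
    <div class="ocr_carea" id="block_1_1" title="bbox 857 529 2504 2710">
      <span class="ocr_line" id="line_1_1" title="bbox 857 529 2504 2710">
        <span class="ocrx_word" id="word_1_1" title="bbox 1552 540 1778 580; x_wconf 99">Hello</span>
        <span class="ocrx_word" id="word_1_2" title="bbox 1782 529 1900 583; x_wconf 99">XML</span>
        <span class="ocrx_word" id="word_1_3" title="bbox 1420 597 1684 641; x_wconf 81">World</span>
      </span>
    </div>
  </body>
</html>
"""


def parse_title(title):
    """Split an hOCR title such as 'bbox 1 2 3 4; x_wconf 99' into a dict of int lists."""
    props = {}
    for part in title.split(";"):
        if not part.strip():
            continue  # ignore empty segments
        key, *values = part.split()
        props[key] = [int(v) for v in values]
    return props


def extract_words(hocr_bytes):
    """Return (text, bbox, confidence) for every ocrx_word span."""
    root = ET.fromstring(hocr_bytes)
    words = []
    for span in root.iter(f"{XHTML}span"):
        if span.get("class") != "ocrx_word":
            continue  # skip line-level spans
        props = parse_title(span.get("title", ""))
        words.append((span.text, tuple(props["bbox"]), props["x_wconf"][0]))
    return words


for text, bbox, conf in extract_words(HOCR):
    print(text, bbox, conf)
```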

--------------------------------

### Run Local Demo App

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR demo application locally using Streamlit. The app will open in your default browser.

```shell
streamlit run demo/app.py
```

--------------------------------

### Accessing Model Vocabulary in DocTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Use `recognition_predictor` to get a model and then access its configuration to print the vocabulary. This is useful for understanding the character set a model was trained on.

```python3
from doctr.models import recognition_predictor
predictor = recognition_predictor('crnn_vgg16_bn')
print(predictor.model.cfg['vocab'])
```

--------------------------------

### Orientation Classification Help

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

View advanced options for training the orientation classification model.

```shell
python references/classification/train_orientation.py --help
```

--------------------------------

### Build Docker Image Locally

Source: https://github.com/mindee/doctr/blob/main/README.md

Build a docTR Docker image locally. This command creates an image tagged as 'doctr'.

```shell
docker build -t doctr .
```

--------------------------------

### Push Model via Command Line

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Train a recognition model and push the result to the Hugging Face Hub by passing `--push-to-hub` to the training script. The path in this command assumes the repository is cloned to `~/doctr`.

```bash
python3 ~/doctr/references/recognition/train.py crnn_mobilenet_v3_large --name doctr-crnn-mobilenet-v3-large --push-to-hub
```

--------------------------------

### Initialize OCR Predictor Model

Source: https://github.com/mindee/doctr/blob/main/README.md

Instantiate an OCR predictor with specified text detection and recognition architectures. Pass `pretrained=True` to load the pretrained weights.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
```

--------------------------------

### Train Orientation Classification Model

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Start training an orientation classification model in PyTorch. Specify the model architecture, type (page or crop), training/validation paths, and epochs.

```shell
python references/classification/train_orientation.py resnet18 --type page --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### OCR Predictor Document Structure Example

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Illustrates the nested structure of the Document object returned by the OCR predictor, including pages, blocks, lines, and words with their properties.

```python
Document(
    (pages): [Page(
      dimensions=(340, 600)
      (blocks): [Block(
        (lines): [Line(
          (words): [
            Word(value='No.', confidence=0.91),
            Word(value='RECEIPT', confidence=0.99),
            Word(value='DATE', confidence=0.96),
          ]
        )]
        (artefacts): []
      )]
    )]
  )
```

--------------------------------

### Load CORD Dataset (Straight and Rotated Boxes)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load the CORD dataset for training. Use `use_polygons=True` to load rotated boxes.

```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True)
img, target = train_set[0]
```

--------------------------------

### Hugging Face Hub Integration for DocTR Models

Source: https://context7.com/mindee/doctr/llms.txt

Shows how to load pre-trained models from Hugging Face Hub and push custom-trained models to the Hub for sharing. Requires authentication with a Hugging Face token.

```python
from doctr.models import from_hub, push_to_hf_hub, login_to_hub
from doctr.models import ocr_predictor
from doctr.models.recognition import crnn_mobilenet_v3_small

# Load a pre-trained model from Hugging Face Hub
custom_reco_model = from_hub("mindee/crnn_vgg16_bn")

# Use custom model in OCR pipeline
predictor = ocr_predictor(
    det_arch='db_resnet50',
    reco_arch=custom_reco_model,  # Use loaded model directly
    pretrained=True
)

# Push your own model to Hugging Face Hub
login_to_hub()  # Authenticate with HF token

# Train or load your model
my_model = crnn_mobilenet_v3_small(pretrained=True)

# Push to hub
push_to_hf_hub(
    model=my_model,
    model_name='my-username/my-custom-crnn',
    task='recognition',
    arch='crnn_mobilenet_v3_small'
)
```

--------------------------------

### Load CORD Dataset for Text Detection (Python)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load the CORD dataset for text detection. Use `use_polygons=True` to load rotated boxes instead of straight boxes.

```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True, detection_task=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True, detection_task=True)
img, target = train_set[0]
```

--------------------------------

### Initialize OCR Predictor with Custom Detection Thresholds

Source: https://context7.com/mindee/doctr/llms.txt

Initializes an OCR predictor and then adjusts the detection model's post-processing parameters, specifically the binarization threshold (`bin_thresh`) and the box threshold (`box_thresh`). No retraining is involved.

```python
from doctr.models import ocr_predictor

# Create predictor
predictor = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)

# Adjust binarization and box thresholds
predictor.det_predictor.model.postprocessor.bin_thresh = 0.5  # Higher = less noise
predictor.det_predictor.model.postprocessor.box_thresh = 0.2  # Lower = more detections
```
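
As a rough intuition for these two parameters: `bin_thresh` decides which pixels of the predicted probability map count as text, and `box_thresh` discards candidate boxes whose average score is too low. The following is a simplified, pure-Python illustration of that idea, not docTR's actual post-processor. The `binarize` and `box_score` helpers and the toy probability map are hypothetical.

```python
def binarize(prob_map, bin_thresh):
    """Mark pixels whose text probability reaches bin_thresh."""
    return [[1 if p >= bin_thresh else 0 for p in row] for row in prob_map]


def box_score(prob_map, box):
    """Mean probability inside an inclusive (xmin, ymin, xmax, ymax) pixel box."""
    xmin, ymin, xmax, ymax = box
    values = [prob_map[y][x]
              for y in range(ymin, ymax + 1)
              for x in range(xmin, xmax + 1)]
    return sum(values) / len(values)


# Toy 3x4 probability map: a confident region on the left, faint noise on the right.
PROB_MAP = [
    [0.9, 0.8, 0.1, 0.4],
    [0.9, 0.7, 0.0, 0.4],
    [0.8, 0.9, 0.1, 0.0],
]

strong_box = (0, 0, 1, 2)  # covers the confident region
weak_box = (3, 0, 3, 1)    # covers the faint noise

# A higher bin_thresh removes the faint pixels from the binary map.
for bin_thresh in (0.3, 0.5):
    print(bin_thresh, binarize(PROB_MAP, bin_thresh))

# A box is kept only if its mean score reaches box_thresh.
box_thresh = 0.5
for box in (strong_box, weak_box):
    score = box_score(PROB_MAP, box)
    print(box, round(score, 2), "kept" if score >= box_thresh else "dropped")
```

In this toy setup, lowering `box_thresh` to 0.4 or below would keep the weak box as well, which is the trade-off between recall and noise that the two settings control.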

--------------------------------

### Send Request to OCR Route

Source: https://github.com/mindee/doctr/blob/main/README.md

Example Python script using the requests library to send a document file to the docTR API's OCR route. Supports PDF, JPEG, and PNG.

```python
import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/path/to/your/doc.jpg', 'rb') as f:
    files = [  # application/pdf, image/jpeg, image/png supported
        ("files", ("doc.jpg", f.read(), "image/jpeg")),
    ]
print(requests.post("http://localhost:8080/ocr", params=params, files=files).json())
```

--------------------------------

### Initialize Recognition Predictor with Default Vocab

Source: https://context7.com/mindee/doctr/llms.txt

Initializes a recognition predictor using a specified architecture and pre-trained weights. Note that pre-trained models use language-specific vocabularies by default.

```python
# Recognition predictor with the default pretrained vocabulary
from doctr.models import recognition_predictor

# No vocabulary is passed here; it comes with the pretrained weights
french_predictor = recognition_predictor(
    arch='crnn_vgg16_bn',
    pretrained=True
)
# Note: Pre-trained models use 'french' vocab by default
```

--------------------------------

### Export OCR Result as XML

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Use the `export_as_xml` method to get the OCR output in hOCR format. This method returns a list of tuples, where each tuple contains the XML byte string and the corresponding XML element.

```python
xml_output = result.export_as_xml()
for output in xml_output:
    xml_bytes_string = output[0]
    xml_element = output[1]
```

--------------------------------

### Load Custom Detection, Recognition, and OCR Datasets

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load custom datasets for detection, recognition, or OCR tasks by specifying image and label paths. The format of labels depends on the dataset type.

```python
from doctr.datasets import DetectionDataset, RecognitionDataset, OCRDataset
# Load a detection dataset
train_set = DetectionDataset(img_folder="/path/to/images", label_path="/path/to/labels.json")
# Load a recognition Dataset
train_set = RecognitionDataset(img_folder="/path/to/images", labels_path="/path/to/labels.json")
# Load an OCR dataset which contains annotations for the boxes and labels
train_set = OCRDataset(img_folder="/path/to/images", label_file="/path/to/labels.json")
img, target = train_set[0]
```
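
For illustration, a recognition label file can be produced with the standard library. This sketch assumes the recognition format is a JSON object mapping each image file name to its text, as described in the docTR recognition reference; verify this against the README for your version. The file names, labels, and the `write_recognition_labels` helper are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_recognition_labels(labels, folder):
    """Write {image file name: text} to <folder>/labels.json and return the path."""
    path = Path(folder) / "labels.json"
    path.write_text(json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# Hypothetical word crops and their transcriptions.
LABELS = {"img_1.jpg": "I", "img_2.jpg": "am", "img_3.jpg": "a", "img_4.jpg": "Jedi"}

with tempfile.TemporaryDirectory() as tmp:
    labels_path = write_recognition_labels(LABELS, tmp)
    # Read the file back to confirm it round-trips.
    reloaded = json.loads(labels_path.read_text(encoding="utf-8"))
    print(labels_path.name, len(reloaded), "entries")
```

The resulting path is what would be passed as `labels_path` to `RecognitionDataset`, next to an `img_folder` that contains the listed images.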

--------------------------------

### Configure OCR Predictor Batch Sizes

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Adjust the batch sizes for the detection and recognition models to optimize performance based on your hardware. This example sets detection batch size to 4 and recognition batch size to 1024.

```python3
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True, det_bs=4, reco_bs=1024)
```

--------------------------------

### Instantiate Detection Predictor

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiates a detection predictor with specified parameters. Use `pretrained=True` to load pre-trained weights. `assume_straight_pages` and `preserve_aspect_ratio` can be set for specific document types.

```python
import numpy as np
from doctr.models import detection_predictor
model = detection_predictor('db_resnet50')
dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
out = model([dummy_img])
```

```python
from doctr.models import detection_predictor
predictor = detection_predictor('db_resnet50', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
```

--------------------------------

### Run OCR Predictor on Apple Silicon (MPS)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiate an OCR predictor and move it to an Apple Silicon (MPS) GPU for accelerated inference. Falls back to CPU if MPS is not available.

```python
import torch
from doctr.models import ocr_predictor

# For Apple Silicon (MPS)
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
predictor = ocr_predictor(pretrained=True).to(device)
```

--------------------------------

### Apply Probabilistic Transforms with RandomApply

Source: https://context7.com/mindee/doctr/llms.txt

Applies a transform with a specified probability using `RandomApply`. This allows for conditional data augmentation.

```python
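import torch
from doctr.transforms import ColorInversion, RandomApply

image = torch.rand(3, 32, 128)  # dummy CHW image tensor with values in [0, 1]
# Apply the color inversion with probability 0.5
maybe_invert = RandomApply(ColorInversion(), p=0.5)
result = maybe_invert(image)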
```

--------------------------------

### Run All Style Checks

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Executes all code style checks to verify adherence to project formatting and style guidelines.

```shell
make style
```

--------------------------------

### Load Pretrained Models from Huggingface Hub

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Load custom detection and recognition models from the Hugging Face Hub and plug them into an OCR predictor.

```python3
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# Plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
```

--------------------------------

### Train Text Recognition Model (Multi-GPU with torchrun)

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Use `torchrun` for distributed data parallel training across multiple GPUs. This example exposes GPUs 0 and 2 through `CUDA_VISIBLE_DEVICES`, starts two processes on the node, and uses the `nccl` communication backend.

```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/recognition/train.py \
  crnn_vgg16_bn \
  --train_path path/to/train \
  --val_path   path/to/val \
  --epochs 5 \
  --backend nccl
```

--------------------------------

### Document Structure Navigation and Export

Source: https://context7.com/mindee/doctr/llms.txt

Demonstrates how to navigate the hierarchical structure of DocTR's `Document` objects (Page, Block, Line, Word) and export the results as plain text or a structured JSON dictionary.

```python
from doctr.models import ocr_predictor
from doctr.io import DocumentFile

predictor = ocr_predictor(pretrained=True)
pages = DocumentFile.from_pdf("document.pdf")
result = predictor(pages)

# Navigate document structure
for page in result.pages:
    print(f"Page dimensions: {page.dimensions}")
    print(f"Page orientation: {page.orientation}")
    print(f"Page language: {page.language}")

    for block in page.blocks:
        print(f"  Block geometry: {block.geometry}")
        for line in block.lines:
            print(f"    Line: {line.render()}")
            for word in line.words:
                print(f"      Word: '{word.value}' "
                      f"(confidence: {word.confidence:.2f}, "
                      f"geometry: {word.geometry})")

# Export as plain text
plain_text = result.render()

# Export as structured dictionary (JSON-compatible)
json_dict = result.export()
# Structure: {'pages': [{'page_idx': 0, 'dimensions': (h, w),
#             'blocks': [{'geometry': ..., 'lines': [...]}]}]}
```
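
The exported dictionary can be post-processed without docTR itself. The sketch below flags low-confidence words. It assumes each line holds a `words` list whose entries carry `value` and `confidence` keys, mirroring the `Word` attributes used above. The `EXPORT` sample and the `low_confidence_words` helper are hypothetical.

```python
# Hand-written stand-in for result.export(), trimmed to the keys used here.
EXPORT = {
    "pages": [
        {
            "page_idx": 0,
            "dimensions": (340, 600),
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "No.", "confidence": 0.91},
                                {"value": "RECEIPT", "confidence": 0.99},
                                {"value": "DATE", "confidence": 0.96},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}


def low_confidence_words(export, threshold):
    """Return (page_idx, value, confidence) for words scored below threshold."""
    flagged = []
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        flagged.append((page["page_idx"], word["value"], word["confidence"]))
    return flagged


print(low_confidence_words(EXPORT, threshold=0.95))
```

A list like this is a convenient starting point for routing uncertain words to manual review.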

--------------------------------

### Character Classification Help

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

View advanced options for training the character classification model.

```shell
python references/classification/train_character.py --help
```

--------------------------------

### Run All Code Quality Checks

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Executes all code quality checks, including style verification and other tests. This command ensures the codebase adheres to project standards.

```shell
make quality
```

--------------------------------

### Load Model with Customized Preprocessor in docTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst

Instantiate custom detection and recognition predictors with customized `PreProcessor` objects. This allows fine-tuning parameters like input size, batch size, mean, and standard deviation for each stage. Finally, combine these predictors into an `OCRPredictor`.

```python3
import torch
from doctr.models.predictor import OCRPredictor
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor
from doctr.models import db_resnet50, crnn_vgg16_bn

det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('<path_to_pt>')
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('<path_to_pt>')

det_predictor = DetectionPredictor(
    PreProcessor(
        (1024, 1024),
        batch_size=1,
        mean=(0.798, 0.785, 0.772),
        std=(0.264, 0.2749, 0.287)
    ),
    det_model
)

reco_predictor = RecognitionPredictor(
    PreProcessor(
        (32, 128),
        preserve_aspect_ratio=True,
        batch_size=32,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301)
    ),
    reco_model
)

predictor = OCRPredictor(det_predictor, reco_predictor)
```
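
To make the role of `mean` and `std` more concrete, here is a minimal pure-Python sketch. It assumes the conventional channel-wise normalization `(value - mean) / std` on pixel values scaled to `[0, 1]`; docTR's actual `PreProcessor` internals are not reproduced here. The helper names `channel_stats` and `normalize_pixel` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def channel_stats(pixels: Sequence[Pixel]) -> Tuple[Pixel, Pixel]:
    """Compute per-channel mean and population std over a set of RGB pixels."""
    channels = list(zip(*pixels))  # regroup values by channel (R, G, B)
    mean = tuple(fmean(c) for c in channels)
    std = tuple(pstdev(c) for c in channels)
    return mean, std


def normalize_pixel(pixel: Pixel, mean: Pixel, std: Pixel) -> Pixel:
    """Apply (value - mean) / std independently to each channel."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


# Detection statistics taken from the snippet above
det_mean = (0.798, 0.785, 0.772)
det_std = (0.264, 0.2749, 0.287)

# A pixel equal to the mean maps to zero in every channel
centered = normalize_pixel(det_mean, det_mean, det_std)
# A pure white pixel ends up a fraction of a standard deviation above zero
white = normalize_pixel((1.0, 1.0, 1.0), det_mean, det_std)

# Deriving statistics from your own data (two toy pixels here)
toy_mean, toy_std = channel_stats([(0.0, 0.5, 1.0), (1.0, 0.5, 0.0)])
print(centered, white, toy_mean, toy_std)
```

In practice the statistics would be computed over a representative sample of training images rather than two pixels.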

--------------------------------

### Load Custom Detection and Recognition Models in docTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst

Load both custom detection and recognition models by specifying their respective .pt file paths. Both `pretrained` and `pretrained_backbone` should be False for each model. The `pretrained` argument for `ocr_predictor` should be set to False if custom models are loaded.

```python3
import torch
from doctr.models import ocr_predictor, db_resnet50, crnn_vgg16_bn

# Load custom detection and recognition model
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('<path_to_pt>', map_location="cpu")
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('<path_to_pt>', map_location="cpu")
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model, pretrained=False)
```

--------------------------------

### Build and Upload Conda Package

Source: https://github.com/mindee/doctr/wiki/Home

Commands to build a conda package and upload it to Anaconda. Set `BUILD_VERSION` to the release version before building.

```bash
conda build purge
conda build purge-all
rm -rf conda-dist/
BUILD_VERSION='X.Y.Z' python setup.py sdist
mkdir conda-dist
conda-build ./conda/ -c pytorch --output-folder conda-dist
ls -l conda-dist/noarch/*conda

anaconda upload conda-dist/noarch/*conda -u mindee
```

--------------------------------

### Create Recognition Predictor

Source: https://context7.com/mindee/doctr/llms.txt

Initializes a recognition predictor with a specified architecture and batch size. Accesses the model's vocabulary and processes cropped text images.

```python
import numpy as np
from doctr.models import recognition_predictor

reco_model = recognition_predictor(
    arch='crnn_vgg16_bn',    # Architecture: crnn_vgg16_bn, sar_resnet31, parseq, vitstr_small, etc.
    pretrained=True,
    batch_size=128
)

# Access the model's vocabulary
vocab = reco_model.model.cfg['vocab']
print(f"Model vocabulary: {vocab[:50]}...")

# Process cropped text images (expected shape: height~32, variable width)
crop1 = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
crop2 = (255 * np.random.rand(32, 200, 3)).astype(np.uint8)
results = reco_model([crop1, crop2])

# Results contain (text, confidence) tuples
for text, confidence in results:
    print(f"Recognized: '{text}' (confidence: {confidence:.2%})")
```

--------------------------------

### Load CORD Dataset with DataLoader

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load the CORD dataset and wrap it with a DataLoader for batch processing. Create an iterator to fetch batches of images and targets.

```python
from doctr.datasets import CORD, DataLoader
train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32)
train_iter = iter(train_loader)
images, targets = next(train_iter)
```

--------------------------------

### Run Unit Tests Locally

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Executes the project's unit tests to ensure code correctness. This command mirrors the tests run in CI workflows.

```shell
make test
```

--------------------------------

### Reconstitution Utilities

Source: https://github.com/mindee/doctr/blob/main/docs/source/modules/utils.rst

Functions for reconstituting pages from model outputs.

```APIDOC
## synthesize_page

### Description
Function to synthesize a page from model outputs.

### Method
(Not specified, likely a Python function call)

### Endpoint
(Not applicable, this is a utility function)

### Parameters
(Not specified in the provided text)

### Request Example
(Not applicable)

### Response
(Not specified in the provided text)
```

--------------------------------

### Data Format for Multiple Classes

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

For multi-class training, the `polygons` field in `labels.json` should be a dictionary mapping class names to their respective polygons.

```json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": {
            "class_name_1": [[[x10, y10], [x20, y20], [x30, y30], [x40, y40]], ...],
            "class_name_2": [[[x11, y11], [x21, y21], [x31, y31], [x41, y41]], ...]
        }
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": {
            "class_name_1": [[[x12, y12], [x22, y22], [x32, y32], [x42, y42]], ...],
            "class_name_2": [[[x13, y13], [x23, y23], [x33, y33], [x43, y43]], ...]
        }
    },
    ...
}
```
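
Before launching a multi-class training run, it can help to sanity-check the label file. The following is a rough sketch, not part of docTR: `count_polygons_per_class` is a hypothetical helper, the coordinates are made up, and the four-points-per-polygon check simply mirrors the example above.

```python
from typing import Any, Dict


def count_polygons_per_class(labels: Dict[str, Any]) -> Dict[str, int]:
    """Validate a multi-class label mapping and count polygons per class."""
    counts: Dict[str, int] = {}
    for img_name, entry in labels.items():
        polygons = entry["polygons"]
        if not isinstance(polygons, dict):
            raise ValueError(f"{img_name}: 'polygons' must map class names to polygons")
        for class_name, polys in polygons.items():
            for poly in polys:
                # Assumption: each polygon has four [x, y] points, as in the example
                if len(poly) != 4 or any(len(point) != 2 for point in poly):
                    raise ValueError(f"{img_name}/{class_name}: malformed polygon {poly}")
            counts[class_name] = counts.get(class_name, 0) + len(polys)
    return counts


# Made-up labels following the multi-class layout
sample_labels = {
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": {
            "class_name_1": [
                [[10, 10], [100, 10], [100, 40], [10, 40]],
                [[10, 60], [100, 60], [100, 90], [10, 90]],
            ],
            "class_name_2": [[[200, 10], [300, 10], [300, 40], [200, 40]]],
        },
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": {
            "class_name_1": [[[50, 50], [150, 50], [150, 80], [50, 80]]],
        },
    },
}

counts = count_polygons_per_class(sample_labels)
print(counts)
```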

--------------------------------

### Data Format for Single Class

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

The `labels.json` file should map image filenames to their dimensions, SHA256 hash, and polygons. Polygons are lists of [x, y] coordinates.

```json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    ...
}
```
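
As an illustration of how such a file might be produced, the sketch below builds one entry with the standard library. `make_label_entry` is a hypothetical helper, the image bytes are placeholders rather than a real PNG, and the dimension order is copied from the example above without further interpretation.

```python
import hashlib
import json
from typing import Any, Dict, List, Sequence


def make_label_entry(
    image_bytes: bytes,
    img_dimensions: Sequence[int],
    polygons: List[List[List[int]]],
) -> Dict[str, Any]:
    """Build one labels.json entry: dimensions, SHA256 of the file, polygons."""
    return {
        "img_dimensions": list(img_dimensions),
        "img_hash": hashlib.sha256(image_bytes).hexdigest(),
        "polygons": polygons,
    }


# Placeholder bytes standing in for the contents of an image file
fake_image = b"placeholder bytes, not a real image"

labels = {
    "sample_img_01.png": make_label_entry(
        fake_image,
        (900, 600),
        [[[10, 10], [100, 10], [100, 40], [10, 40]]],
    )
}

# Serialize exactly as it would be written to labels.json
serialized = json.dumps(labels, indent=4)
print(serialized)
```

With real data, `image_bytes` would come from reading each image file in binary mode.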

--------------------------------

### Compile PyTorch Models with torch.compile

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst

Compile PyTorch models using `torch.compile` to optimize them for faster inference and reduced memory overhead. Note that the `master` recognition architecture is not supported for compilation.

```python
import torch
from doctr.io import DocumentFile
from doctr.models import (
    ocr_predictor,
    vitstr_small,
    fast_base,
    mobilenet_v3_small_crop_orientation,
    mobilenet_v3_small_page_orientation,
    crop_orientation_predictor,
    page_orientation_predictor
)

# Compile the models
detection_model = torch.compile(
    fast_base(pretrained=True).eval()
)
recognition_model = torch.compile(
    vitstr_small(pretrained=True).eval()
)
crop_orientation_model = torch.compile(
    mobilenet_v3_small_crop_orientation(pretrained=True).eval()
)
page_orientation_model = torch.compile(
    mobilenet_v3_small_page_orientation(pretrained=True).eval()
)

predictor = ocr_predictor(
    detection_model, recognition_model, assume_straight_pages=False
)
# NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification
# Set the orientation predictors
predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model)

# Placeholder input: replace "document.pdf" with your own file
doc = DocumentFile.from_pdf("document.pdf")
compiled_out = predictor(doc)
```

--------------------------------

### Load Custom KIE Detection Model in docTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst

Load a custom trained Key Information Extraction (KIE) detection model. Specify the path to the .pt file and define custom class names for the model. This model is then used to initialize a KIE predictor.

```python3
import torch
from doctr.models import kie_predictor, db_resnet50

det_model = db_resnet50(pretrained=False, pretrained_backbone=False, class_names=['total', 'date'])
det_model.from_pretrained('<path_to_pt>')
# Keep a reference to the predictor so it can be called on documents
predictor = kie_predictor(det_arch=det_model, reco_arch='crnn_vgg16_bn', pretrained=True)
```

--------------------------------

### Accessing Model Vocabulary

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Illustrates how to retrieve the vocabulary used by a specific recognition model.

````APIDOC
## Accessing Model Vocabulary

### Description
This snippet demonstrates how to access and print the vocabulary associated with a recognition model.

### Method
Instantiate `recognition_predictor` and access the `model.cfg['vocab']` attribute.

### Endpoint
N/A (Local library usage)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
from doctr.models import recognition_predictor
predictor = recognition_predictor('crnn_vgg16_bn')
print(predictor.model.cfg['vocab'])
```

### Response
#### Success Response (200)
Prints the vocabulary of the specified model.

#### Response Example
```
['a', 'b', 'c', ...]
```
````

--------------------------------

### Perform OCR via REST API using Python Requests

Source: https://context7.com/mindee/doctr/llms.txt

Demonstrates how to perform OCR on a document by sending a POST request to a deployed docTR REST API. Includes file upload and parameter configuration.

```python
# Python client example using requests
import requests

# Perform OCR on a document
url = "http://localhost:8080/ocr/"
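params = {
    "det_arch": "db_resnet50",
    "reco_arch": "crnn_vgg16_bn"
}

# Open the file in a context manager so the handle is closed after the upload
with open("document.pdf", "rb") as f:
    files = [("files", ("document.pdf", f, "application/pdf"))]
    response = requests.post(url, files=files, params=params)
response.raise_for_status()  # Fail early on HTTP errors
results = response.json()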

# Process results
for doc_result in results:
    print(f"Document: {doc_result['name']}")
    print(f"Orientation: {doc_result['orientation']}")
    print(f"Language: {doc_result['language']}")
    print(f"Dimensions: {doc_result['dimensions']}")

    for page in doc_result['items']:
        for block in page['blocks']:
            for line in block['lines']:
                for word in line['words']:
                    print(f"  {word['value']} (confidence: {word['confidence']})")
```
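
Once the JSON response is parsed, the nested structure can be flattened into plain text. Here is a small sketch assuming the response layout used in the loop above (`items` → `blocks` → `lines` → `words`). The `sample_results` payload is hand-written for illustration and `document_text` is a hypothetical helper.

```python
from typing import Any, Dict, List


def document_text(doc_result: Dict[str, Any]) -> str:
    """Join the words of each line with spaces and the lines with newlines."""
    text_lines: List[str] = []
    for page in doc_result["items"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                text_lines.append(" ".join(word["value"] for word in line["words"]))
    return "\n".join(text_lines)


# Hand-written payload shaped like the response iterated over above
sample_results = [
    {
        "name": "document.pdf",
        "items": [
            {
                "blocks": [
                    {
                        "lines": [
                            {
                                "words": [
                                    {"value": "Hello", "confidence": 0.99},
                                    {"value": "world", "confidence": 0.95},
                                ]
                            },
                            {
                                "words": [
                                    {"value": "Total:", "confidence": 0.90},
                                    {"value": "42", "confidence": 0.80},
                                ]
                            },
                        ]
                    }
                ]
            }
        ],
    }
]

texts = {doc["name"]: document_text(doc) for doc in sample_results}
print(texts["document.pdf"])
```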

--------------------------------

### Train Classification and Orientation Model

Source: https://github.com/mindee/doctr/wiki/Home

Training configuration for classification and orientation models. Adjust batch_size according to machine capacity.

```bash
--epochs 40
- default config (`batch_size` adjusted to machine capacity)
```

--------------------------------

### GPU Acceleration for DocTR Predictors

Source: https://context7.com/mindee/doctr/llms.txt

Demonstrates how to run DocTR predictors on GPU (NVIDIA CUDA or Apple Silicon MPS) for faster inference. Includes checks for device availability and optional half-precision.

```python
import torch
from doctr.models import ocr_predictor

# Check available devices
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")

# NVIDIA GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    predictor = ocr_predictor(pretrained=True).to(device)
    # Or shorthand: predictor = ocr_predictor(pretrained=True).cuda()

# Apple Silicon (MPS)
elif torch.backends.mps.is_available():
    device = torch.device('mps')
    predictor = ocr_predictor(pretrained=True).to(device)

# CPU fallback
else:
    device = torch.device('cpu')
    predictor = ocr_predictor(pretrained=True)

# Enable half-precision for faster inference (GPU only)
if device.type in ('cuda', 'mps'):
    predictor = predictor.half()

# Process documents
from doctr.io import DocumentFile
pages = DocumentFile.from_pdf("document.pdf")
result = predictor(pages)
```

--------------------------------

### Create Custom Data Augmentation Pipeline

Source: https://context7.com/mindee/doctr/llms.txt

Builds a custom data augmentation pipeline using docTR's transform modules for training. Includes image-only transforms and transforms that adjust bounding boxes.

```python
import numpy as np
import torch
from doctr.transforms import (
    SampleCompose, ImageTransform, ColorInversion,
    RandomRotate, RandomCrop, OneOf, RandomApply
)

# Create a transform pipeline for training
train_transforms = SampleCompose([
    # Image-only transforms wrapped for (image, target) compatibility
    ImageTransform(ColorInversion(min_val=0.6)),

    # Random rotation with bounding box adjustment
    RandomRotate(max_angle=10, expand=False),

    # Random crop with box clipping
    RandomCrop(scale=(0.5, 1.0), ratio=(0.75, 1.33)),
])

# Apply transforms to image and bounding boxes
image = torch.rand(3, 256, 256)
boxes = np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.9]])  # Relative coordinates

augmented_image, augmented_boxes = train_transforms(image, boxes)
```
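
To illustrate what "box clipping" means for relative coordinates, the following is a pure-Python sketch of the underlying arithmetic. It is not docTR's `RandomCrop` implementation, which may handle edge cases differently; `crop_boxes` and the crop window are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), relative


def crop_boxes(boxes: Sequence[Box], crop: Box) -> List[Box]:
    """Clip boxes to a crop window and re-express them relative to that window."""
    cx0, cy0, cx1, cy1 = crop
    crop_w, crop_h = cx1 - cx0, cy1 - cy0
    cropped: List[Box] = []
    for xmin, ymin, xmax, ymax in boxes:
        # Clip the box to the crop window
        xmin_c, ymin_c = max(xmin, cx0), max(ymin, cy0)
        xmax_c, ymax_c = min(xmax, cx1), min(ymax, cy1)
        # Drop boxes that fall entirely outside the window
        if xmin_c >= xmax_c or ymin_c >= ymax_c:
            continue
        # Rescale so that the crop window becomes the new unit square
        cropped.append((
            (xmin_c - cx0) / crop_w,
            (ymin_c - cy0) / crop_h,
            (xmax_c - cx0) / crop_w,
            (ymax_c - cy0) / crop_h,
        ))
    return cropped


# Same boxes as in the snippet above, cropped to the top-left quarter of the image
boxes = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.8, 0.9)]
result_boxes = crop_boxes(boxes, crop=(0.0, 0.0, 0.5, 0.5))
print(result_boxes)
```

The first box survives and doubles in relative size because the crop is half as wide and half as tall; the second lies outside the window and is discarded.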

--------------------------------

### Load CORD Dataset for Recognition Task

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Loads the CORD dataset for a recognition task. Use `use_polygons=True` to crop rotated boxes (always regular).

```python
from doctr.datasets import CORD

# Crop boxes as is (can contain irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]
```

--------------------------------

### Enable Half-Precision Inference (PyTorch GPU)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst

Use this snippet to enable half-precision (FP16) inference for PyTorch models on GPU devices. This reduces memory usage and speeds up inference.

```python3
import torch
from doctr.models import ocr_predictor
predictor = ocr_predictor(
    reco_arch="crnn_mobilenet_v3_small",
    det_arch="linknet_resnet34",
    pretrained=True
).cuda().half()
```

--------------------------------

### Access and Print Available Character Vocabularies

Source: https://context7.com/mindee/doctr/llms.txt

Retrieves and prints a list of available character vocabularies provided by docTR. Shows the first 20 available keys.

```python
from doctr.datasets.vocabs import VOCABS

# Available vocabularies
print(f"Available vocabularies: {list(VOCABS.keys())[:20]}...")

# Get vocabulary for specific languages
english_vocab = VOCABS['english']
french_vocab = VOCABS['french']
multilingual_vocab = VOCABS['multilingual']

# Language-specific vocabularies
print(f"English vocab length: {len(english_vocab)}")
print(f"French vocab length: {len(french_vocab)}")
print(f"Multilingual vocab length: {len(multilingual_vocab)}")
```
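
When choosing a vocabulary for training, a common check is whether it covers every character in your labels. The sketch below assumes a vocabulary is a plain string of characters; `toy_vocab` is a made-up stand-in rather than one of the `VOCABS` entries, and `uncovered_chars` is a hypothetical helper.

```python
import string
from typing import List


def uncovered_chars(text: str, vocab: str) -> List[str]:
    """Return the sorted characters of `text` that are missing from `vocab`."""
    return sorted(set(text) - set(vocab))


# Made-up vocabulary: lowercase ASCII letters, digits, and the space character
toy_vocab = string.ascii_lowercase + string.digits + " "

missing = uncovered_chars("café 2024", toy_vocab)
print(missing)
```

A non-empty result suggests either switching to a broader vocabulary or cleaning the labels before training.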