### Start DocTR API Server
Source: https://github.com/mindee/doctr/blob/main/api/README.md
Clone the repository, navigate to the API directory, and run the make command to start the FastAPI server. Ensure Docker is installed and running.
```shell
git clone https://github.com/mindee/doctr.git
cd doctr/api
make run
```
--------------------------------
### Install Documentation Environment
Source: https://github.com/mindee/doctr/blob/main/docs/README.md
Installs the development environment and documentation dependencies. Ensure you are at the repository root before running.
```bash
# Make sure you are at the root of the repository before executing these commands
python -m pip install --upgrade pip
pip install -e '.[viz,html]'
pip install -e '.[docs]'
```
--------------------------------
### Install Dependencies and Run Streamlit Demo
Source: https://github.com/mindee/doctr/blob/main/demo/README.md
Navigate to the demo directory, install Python dependencies, and run the Streamlit application.
```bash
cd demo
pip install -r pt-requirements.txt
streamlit run app.py
```
--------------------------------
### Install docTR
Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md
Install docTR and its requirements using pip.
```shell
pip install -e . --upgrade
pip install -r references/requirements.txt
```
--------------------------------
### Install Demo Dependencies
Source: https://github.com/mindee/doctr/blob/main/README.md
Install the required dependencies for the docTR demo application. This includes libraries like Streamlit.
```shell
pip install -r demo/pt-requirements.txt
```
--------------------------------
### Install Contrib Module
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_contrib_modules.rst
To use the contrib module, install the `onnxruntime` package. You can install it using pip.
```bash
pip install "python-doctr[contrib]"
# Or install the runtime directly
pip install onnxruntime  # or: pip install onnxruntime-gpu
```
--------------------------------
### Install API Dependencies
Source: https://github.com/mindee/doctr/blob/main/README.md
Install the necessary dependencies to run the docTR API template locally. This includes Poetry for dependency management.
```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```
--------------------------------
### Install docTR in Developer Mode
Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md
Installs docTR with all development dependencies and sets up pre-commit hooks. Ensure pip is up-to-date before running.
```shell
python -m pip install --upgrade pip
pip install -e '.[dev]'
pre-commit install
```
--------------------------------
### Install docTR in Developer Mode from Source
Source: https://github.com/mindee/doctr/blob/main/README.md
Installs docTR in editable (developer) mode from a local clone of the repository. The command as shown does not request any optional extras.
```shell
pip install -e doctr/.
```
--------------------------------
### Install Python Doctr with Optional Dependencies
Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst
Install python-doctr with additional packages for visualization, HTML support, and the contrib module. This is useful for extended functionality.
```bash
pip install "python-doctr[viz,html,contrib]"
```
--------------------------------
### Run Document Analysis Script
Source: https://github.com/mindee/doctr/blob/main/README.md
Execute the example script for analyzing a PDF or image file. Use --help to see all available arguments.
```shell
python scripts/analyze.py path/to/your/doc.pdf
```
--------------------------------
### Clone and Install Python Doctr in Developer Mode
Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst
Clone the Doctr repository from GitHub and install it in editable mode using pip. This is recommended for development.
```bash
git clone https://github.com/mindee/doctr.git
```
```bash
pip install -e doctr/.
```
--------------------------------
### Basic Training Command
Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md
Start training a text detection model using the `db_resnet50` architecture. Specify paths to your training and validation datasets and the number of epochs.
```shell
python references/detection/train.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```
--------------------------------
### Multi-GPU Training with torchrun
Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md
Configure and launch training across multiple GPUs using `torchrun`. This example specifies two processes per node and uses the `nccl` backend for distributed data parallelism.
```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/detection/train.py \
db_resnet50 \
--train_path path/to/train \
--val_path path/to/val \
--epochs 5 \
--backend nccl
```
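With `CUDA_VISIBLE_DEVICES=0,2`, each process sees the two selected GPUs renumbered as devices 0 and 1. The sketch below illustrates that renumbering for a given local rank (the per-worker index that `torchrun` exposes as `LOCAL_RANK`). `physical_gpu_for` is a hypothetical helper written for this illustration, not part of docTR or PyTorch.

```python
def physical_gpu_for(local_rank: int, visible_devices: str) -> int:
    """Return the physical GPU index behind a process-local device index."""
    # CUDA renumbers the visible devices from 0, in the order they are listed.
    device_ids = [int(d) for d in visible_devices.split(",") if d.strip()]
    if not 0 <= local_rank < len(device_ids):
        raise ValueError(f"local_rank {local_rank} has no visible device")
    return device_ids[local_rank]


# Two workers (--nproc_per_node=2) launched with CUDA_VISIBLE_DEVICES=0,2
for rank in range(2):
    print(f"local rank {rank} -> physical GPU {physical_gpu_for(rank, '0,2')}")
```

The number of entries in `CUDA_VISIBLE_DEVICES` should match `--nproc_per_node`, otherwise a worker has no device of its own.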
--------------------------------
### Install Python Doctr Stable Release
Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst
Use this command to install the latest stable version of the python-doctr package via pip.
```bash
pip install python-doctr
```
--------------------------------
### Run Local API Server
Source: https://github.com/mindee/doctr/blob/main/README.md
Run the docTR API locally using uvicorn. This command starts a development server with hot-reloading.
```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```
--------------------------------
### Install Custom Font on Linux
Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md
Install a custom TrueType font (.ttf) on a Linux system by copying it to the system fonts directory and rebuilding the font cache.
```shell
sudo cp custom-font.ttf /usr/local/share/fonts/
fc-cache -f -v
```
--------------------------------
### Train Text Recognition Model (Single GPU)
Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md
Start training a text recognition model using the provided script. Specify the model architecture, training/validation paths, and number of epochs.
```shell
python references/recognition/train.py crnn_vgg16_bn --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```
--------------------------------
### Install OnnxTR for ONNX Model Inference
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst
Install the OnnxTR package, which provides a lightweight way to perform inference with ONNX exported models using ONNXRuntime, without requiring PyTorch or TensorFlow.
```shell
pip install "onnxtr[cpu]"
```
--------------------------------
### Load Documents from Various Sources
Source: https://github.com/mindee/doctr/blob/main/README.md
Read documents from PDF files, single or multiple images, or URLs. For URLs, ensure `weasyprint` is installed.
```python
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```
--------------------------------
### Using DocTR Recognition Predictor
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Instantiate a recognition model using `recognition_predictor` and pass a dummy image to get OCR output. Ensure numpy is imported for image generation.
```python3
import numpy as np
from doctr.models import recognition_predictor
model = recognition_predictor('crnn_vgg16_bn')
dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
out = model([dummy_img])
```
--------------------------------
### Run API Server with Docker Compose
Source: https://github.com/mindee/doctr/blob/main/README.md
Run the docTR API server using Docker Compose. This command builds the image and starts the container in detached mode.
```shell
PORT=8002 docker-compose up -d --build
```
--------------------------------
### Train Character Classification Model
Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md
Start training a character classification model in PyTorch. Specify the model architecture, number of epochs, and device.
```shell
python references/classification/train_character.py mobilenet_v3_large --epochs 5 --device 0
```
--------------------------------
### Example OCR JSON Response
Source: https://github.com/mindee/doctr/blob/main/api/README.md
This is an example of the JSON output you can expect after a successful OCR request. It details the detected image name, orientation, language, dimensions, and extracted text items with their respective geometries and confidence scores.
```json
[
  {
    "name": "117319856-fc35bf00-ae8b-11eb-9b51-ca5aba673466.jpg",
    "orientation": {
      "value": 0,
      "confidence": null
    },
    "language": {
      "value": null,
      "confidence": null
    },
    "dimensions": [2339, 1654],
    "items": [
      {
        "blocks": [
          {
            "geometry": [
              0.7471996155154171,
              0.1787109375,
              0.9101580212741838,
              0.2080078125
            ],
            "objectness_score": 0.5,
            "lines": [
              {
                "geometry": [
                  0.7471996155154171,
                  0.1787109375,
                  0.9101580212741838,
                  0.2080078125
                ],
                "objectness_score": 0.5,
                "words": [
                  {
                    "value": "Hello",
                    "geometry": [
                      0.7471996155154171,
                      0.1796875,
                      0.8272978149561669,
                      0.20703125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  },
                  {
                    "value": "world!",
                    "geometry": [
                      0.8176307908857315,
                      0.1787109375,
                      0.9101580212741838,
                      0.2080078125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
```
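A response of this shape can be flattened into a list of words with pixel coordinates. The sketch below is one way to do that, under two assumptions that the response itself does not state: that `dimensions` is ordered `(height, width)` and that each `geometry` is `(xmin, ymin, xmax, ymax)` relative to the page size. `words_with_pixel_boxes` is a hypothetical helper, not part of the API.

```python
import json


def words_with_pixel_boxes(response: list) -> list:
    """Flatten an OCR response into (text, confidence, pixel_box) tuples."""
    rows = []
    for document in response:
        # Assumption: dimensions are (height, width); geometry is relative
        # (xmin, ymin, xmax, ymax).
        height, width = document["dimensions"]
        for item in document["items"]:
            for block in item["blocks"]:
                for line in block["lines"]:
                    for word in line["words"]:
                        xmin, ymin, xmax, ymax = word["geometry"]
                        box = (
                            round(xmin * width),
                            round(ymin * height),
                            round(xmax * width),
                            round(ymax * height),
                        )
                        rows.append((word["value"], word["confidence"], box))
    return rows


# Small hand-written sample with the same nesting as the response shown above
sample = json.loads("""
[{"dimensions": [200, 100],
  "items": [{"blocks": [{"lines": [{"words": [
      {"value": "Hello", "confidence": 1.0, "geometry": [0.5, 0.25, 0.75, 0.5]}
  ]}]}]}]}]
""")
print(words_with_pixel_boxes(sample))
```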
--------------------------------
### Run GPU-enabled Docker Container
Source: https://github.com/mindee/doctr/blob/main/README.md
Run a docTR Docker container with GPU support enabled. Ensure your host system has NVIDIA Container Toolkit installed and configured.
```shell
docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash
```
--------------------------------
### Push Trained Model to Huggingface Hub
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst
Push a trained recognition model to the Hugging Face Hub. Requires a Hugging Face account and Git LFS. Existing repositories will not be overwritten.
```python3
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
```
--------------------------------
### Perform OCR via REST API using curl
Source: https://context7.com/mindee/doctr/llms.txt
Provides a `curl` command example for performing OCR on a document via the docTR REST API. Specifies detection and recognition architectures.
```bash
# Send a document to the OCR route with curl
curl -X POST "http://localhost:8080/ocr/" \
  -F "files=@document.pdf" \
  -F "det_arch=db_resnet50" \
  -F "reco_arch=crnn_vgg16_bn"
```
--------------------------------
### Display Training Help
Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md
View all available command-line arguments and options for the training script by using the --help flag.
```shell
python references/recognition/train.py --help
```
--------------------------------
### Display Training Help
Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md
View all available command-line options for the training script to customize your training process.
```shell
python references/detection/train.py --help
```
--------------------------------
### Build Documentation Locally
Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md
Builds the project documentation locally using Sphinx. Modified files are rebuilt by default. To force a complete rebuild, delete the `_build` directory.
```shell
make docs-single-version
```
--------------------------------
### Display OCR Result Visualization
Source: https://github.com/mindee/doctr/blob/main/README.md
Displays the visualization of the OCR result. Requires matplotlib and mplcursors to be installed.
```python
result.show()
```
--------------------------------
### Build Custom Docker Image
Source: https://github.com/mindee/doctr/blob/main/README.md
Build a custom docTR Docker image with specified framework, Python version, and docTR version using build arguments.
```shell
docker build -t doctr --build-arg FRAMEWORK=torch --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```
--------------------------------
### Initialize and Use KIE Predictor
Source: https://github.com/mindee/doctr/blob/main/README.md
Initializes a KIE predictor with specified detection and recognition architectures, analyzes a PDF document, and prints predictions for each detected class.
```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor
# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```
--------------------------------
### Sample hOCR XML Output
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
This is an example of the XML byte string output generated by the `export_as_xml` method, representing OCR results in hOCR format.
```xml
docTR - hOCR
```
--------------------------------
### Run Local Demo App
Source: https://github.com/mindee/doctr/blob/main/README.md
Run the docTR demo application locally using Streamlit. The app will open in your default browser.
```shell
streamlit run demo/app.py
```
--------------------------------
### Recognition Predictor Usage
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Instantiate `recognition_predictor` with a model name, then call it on a list of images to obtain the recognition output. This is local library usage, so no endpoint or request parameters apply.
```python
import numpy as np
from doctr.models import recognition_predictor
model = recognition_predictor('crnn_vgg16_bn')
dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
out = model([dummy_img])
```
--------------------------------
### Accessing Model Vocabulary in DocTR
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Use `recognition_predictor` to get a model and then access its configuration to print the vocabulary. This is useful for understanding the character set a model was trained on.
```python3
from doctr.models import recognition_predictor
predictor = recognition_predictor('crnn_vgg16_bn')
print(predictor.model.cfg['vocab'])
```
--------------------------------
### Orientation Classification Help
Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md
View advanced options for training the orientation classification model.
```shell
python references/classification/train_orientation.py --help
```
--------------------------------
### Build Docker Image Locally
Source: https://github.com/mindee/doctr/blob/main/README.md
Build a docTR Docker image locally. This command creates an image tagged as 'doctr'.
```shell
docker build -t doctr .
```
--------------------------------
### Push Model via Command Line
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst
Train a recognition model and push it to the Hugging Face Hub in one step by passing `--push-to-hub` to the training script. The path shown assumes the doctr repository is cloned at `~/doctr`.
```bash
python3 ~/doctr/references/recognition/train.py crnn_mobilenet_v3_large --name doctr-crnn-mobilenet-v3-large --push-to-hub
```
--------------------------------
### Initialize OCR Predictor Model
Source: https://github.com/mindee/doctr/blob/main/README.md
Instantiate an OCR predictor with specified text detection and recognition architectures. Pass `pretrained=True` to load the pretrained weights.
```python
from doctr.models import ocr_predictor
model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
```
--------------------------------
### Train Orientation Classification Model
Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md
Start training an orientation classification model in PyTorch. Specify the model architecture, type (page or crop), training/validation paths, and epochs.
```shell
python references/classification/train_orientation.py resnet18 --type page --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```
--------------------------------
### OCR Predictor Document Structure Example
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Illustrates the nested structure of the Document object returned by the OCR predictor, including pages, blocks, lines, and words with their properties.
```python
Document(
(pages): [Page(
dimensions=(340, 600)
(blocks): [Block(
(lines): [Line(
(words): [
Word(value='No.', confidence=0.91),
Word(value='RECEIPT', confidence=0.99),
Word(value='DATE', confidence=0.96),
]
)]
(artefacts): []
)]
)]
)
```
--------------------------------
### Load CORD Dataset (Straight and Rotated Boxes)
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst
Load the CORD dataset for training. Use `use_polygons=True` to load rotated boxes.
```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True)
img, target = train_set[0]
```
--------------------------------
### Hugging Face Hub Integration for DocTR Models
Source: https://context7.com/mindee/doctr/llms.txt
Shows how to load pre-trained models from Hugging Face Hub and push custom-trained models to the Hub for sharing. Requires authentication with a Hugging Face token.
```python
from doctr.models import from_hub, push_to_hf_hub, login_to_hub
from doctr.models import ocr_predictor
from doctr.models.recognition import crnn_mobilenet_v3_small
# Load a pre-trained model from Hugging Face Hub
custom_reco_model = from_hub("mindee/crnn_vgg16_bn")
# Use custom model in OCR pipeline
predictor = ocr_predictor(
    det_arch='db_resnet50',
    reco_arch=custom_reco_model,  # Use loaded model directly
    pretrained=True
)
# Push your own model to Hugging Face Hub
login_to_hub()  # Authenticate with HF token
# Train or load your model
my_model = crnn_mobilenet_v3_small(pretrained=True)
# Push to hub
push_to_hf_hub(
    model=my_model,
    model_name='my-username/my-custom-crnn',
    task='recognition',
    arch='crnn_mobilenet_v3_small'
)
```
--------------------------------
### Load CORD Dataset for Text Detection (Python)
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst
Load the CORD dataset for text detection. Use `use_polygons=True` to load rotated boxes instead of straight boxes.
```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True, detection_task=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True, detection_task=True)
img, target = train_set[0]
```
--------------------------------
### Initialize OCR Predictor with Custom Detection Thresholds
Source: https://context7.com/mindee/doctr/llms.txt
Initializes an OCR predictor and then adjusts the detection model's post-processing parameters, namely the binarization threshold and the box threshold. No retraining is involved; only the attributes of the post-processor change.
```python
from doctr.models import ocr_predictor
# Create predictor
predictor = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
# Adjust binarization and box thresholds
predictor.det_predictor.model.postprocessor.bin_thresh = 0.5 # Higher = less noise
predictor.det_predictor.model.postprocessor.box_thresh = 0.2 # Lower = more detections
```
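To give an intuition for how two such thresholds interact, here is a toy illustration in plain Python. It is not docTR's post-processor: `keep_candidates`, the candidate names, and the scoring by mean probability are all assumptions made for this sketch.

```python
def keep_candidates(candidates, bin_thresh, box_thresh):
    """Toy two-stage filter over {name: [pixel probabilities]}."""
    kept = []
    for name, probs in candidates.items():
        # Stage 1: binarization keeps only pixels at or above bin_thresh.
        foreground = [p for p in probs if p >= bin_thresh]
        if not foreground:
            continue  # nothing survives binarization, so no box is proposed
        # Stage 2: the region score (mean probability here) must reach box_thresh.
        score = sum(probs) / len(probs)
        if score >= box_thresh:
            kept.append(name)
    return kept


candidates = {
    "text": [0.9, 0.8, 0.7],
    "faint": [0.4, 0.35, 0.3],
    "speck": [0.55, 0.05, 0.05],
}
print(keep_candidates(candidates, bin_thresh=0.3, box_thresh=0.1))
print(keep_candidates(candidates, bin_thresh=0.5, box_thresh=0.2))
```

In this toy setup, raising the binarization threshold removes the faint region entirely, while the box threshold decides whether a weakly supported region is reported.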
--------------------------------
### Send Request to OCR Route
Source: https://github.com/mindee/doctr/blob/main/README.md
Example Python script using the requests library to send a document file to the docTR API's OCR route. Supports PDF, JPEG, and PNG.
```python
import requests
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}
with open('/path/to/your/doc.jpg', 'rb') as f:
    files = [  # application/pdf, image/jpeg, image/png supported
        ("files", ("doc.jpg", f.read(), "image/jpeg")),
    ]
print(requests.post("http://localhost:8080/ocr", params=params, files=files).json())
```
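When sending files of different types, the MIME type in each tuple has to match the file. A minimal sketch using the standard-library `mimetypes` module to build one entry of the `files` list; `file_entry` and `SUPPORTED_TYPES` are hypothetical names for this illustration, and the supported set is taken from the comment in the snippet above.

```python
import mimetypes

# Content types named as supported in the request example
SUPPORTED_TYPES = {"application/pdf", "image/jpeg", "image/png"}


def file_entry(filename: str, data: bytes) -> tuple:
    """Build one ("files", (name, bytes, mime_type)) tuple for requests.post."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file type for {filename}: {mime_type}")
    return ("files", (filename, data, mime_type))


# Placeholder bytes; in practice, read them from disk in binary mode
files = [file_entry("doc.jpg", b"..."), file_entry("scan.pdf", b"...")]
print([entry[1][2] for entry in files])
```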
--------------------------------
### Initialize Recognition Predictor with Default Vocab
Source: https://context7.com/mindee/doctr/llms.txt
Initializes a recognition predictor using a specified architecture and pre-trained weights. Note that pre-trained models use language-specific vocabularies by default.
```python
# Recognition predictor with the default (pretrained) vocabulary
from doctr.models import recognition_predictor
# No vocabulary is passed, so the pretrained model's own vocabulary is used
french_predictor = recognition_predictor(
    arch='crnn_vgg16_bn',
    pretrained=True
)
# Note: Pre-trained models use 'french' vocab by default
```
--------------------------------
### Export OCR Result as XML
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Use the `export_as_xml` method to get the OCR output in hOCR format. This method returns a list of tuples, where each tuple contains the XML byte string and the corresponding XML element.
```python
xml_output = result.export_as_xml()
for output in xml_output:
    xml_bytes_string = output[0]
    xml_element = output[1]
```
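The byte strings can be written straight to disk. The sketch below assumes the list holds one tuple per page; `save_hocr_pages` is a hypothetical helper, and a stand-in list replaces `result.export_as_xml()` so the example runs without a model.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def save_hocr_pages(xml_output, out_dir):
    """Write one .hocr file per (xml_bytes, xml_element) tuple."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page_idx, (xml_bytes, _xml_element) in enumerate(xml_output):
        target = out_dir / f"page_{page_idx}.hocr"
        target.write_bytes(xml_bytes)  # bytes are written unchanged
        written.append(target)
    return written


# Stand-in for `result.export_as_xml()`
root = ET.Element("html")
fake_output = [(ET.tostring(root), ET.ElementTree(root))]
paths = save_hocr_pages(fake_output, tempfile.mkdtemp())
print([p.name for p in paths])
```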
--------------------------------
### Load Custom Detection, Recognition, and OCR Datasets
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst
Load custom datasets for detection, recognition, or OCR tasks by specifying image and label paths. The format of labels depends on the dataset type.
```python
from doctr.datasets import DetectionDataset, RecognitionDataset, OCRDataset
# Load a detection dataset
train_set = DetectionDataset(img_folder="/path/to/images", label_path="/path/to/labels.json")
# Load a recognition Dataset
train_set = RecognitionDataset(img_folder="/path/to/images", labels_path="/path/to/labels.json")
# Load an OCR dataset which contains annotations for the boxes and labels
train_set = OCRDataset(img_folder="/path/to/images", label_file="/path/to/labels.json")
img, target = train_set[0]
```
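As one illustration of a label file, the sketch below writes a `labels.json` for a recognition dataset. It assumes that this file maps each image filename to its transcription; that layout is an assumption here, so check the dataset documentation before relying on it. `write_recognition_labels` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def write_recognition_labels(labels: dict, path) -> Path:
    """Write a {filename: transcription} mapping as UTF-8 JSON."""
    path = Path(path)
    path.write_text(json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# Assumed layout: one entry per cropped word image
labels = {"img_1.jpg": "Hello", "img_2.jpg": "world!"}
labels_path = write_recognition_labels(labels, Path(tempfile.mkdtemp()) / "labels.json")
print(labels_path.name)
```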
--------------------------------
### Configure OCR Predictor Batch Sizes
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Adjust the batch sizes for the detection and recognition models to optimize performance based on your hardware. This example sets detection batch size to 4 and recognition batch size to 1024.
```python3
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True, det_bs=4, reco_bs=1024)
```
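The effect of a batch size on the number of forward passes is simple arithmetic. A short sketch, assuming detection batches whole pages and recognition batches word crops; the page and crop counts are made up for illustration.

```python
import math


def num_batches(num_items: int, batch_size: int) -> int:
    """Number of batches needed to process num_items items."""
    return math.ceil(num_items / batch_size)


print(num_batches(10, 4))       # 10 pages with det_bs=4 -> 3
print(num_batches(2500, 1024))  # 2500 word crops with reco_bs=1024 -> 3
```

Larger batches mean fewer passes but more memory per pass, which is why the right values depend on the hardware.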
--------------------------------
### Instantiate Detection Predictor
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Instantiates a detection predictor with specified parameters. Use `pretrained=True` to load pre-trained weights. `assume_straight_pages` and `preserve_aspect_ratio` can be set for specific document types.
```python
import numpy as np
from doctr.models import detection_predictor
model = detection_predictor('db_resnet50')
dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
out = model([dummy_img])
```
```python
from doctr.models import detection_predictor
predictor = detection_predictor('db_resnet50', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
```
--------------------------------
### Run OCR Predictor on Apple Silicon (MPS)
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Instantiate an OCR predictor and move it to an Apple Silicon (MPS) GPU for accelerated inference. Falls back to CPU if MPS is not available.
```python
import torch
from doctr.models import ocr_predictor
# For Apple Silicon (MPS)
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
predictor = ocr_predictor(pretrained=True).to(device)
```
--------------------------------
### Apply Probabilistic Transforms with RandomApply
Source: https://context7.com/mindee/doctr/llms.txt
Applies a transform with a specified probability using `RandomApply`. This allows for conditional data augmentation.
```python
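import torch
from doctr.transforms import ColorInversion, RandomApply
# Use RandomApply for probabilistic transforms
image = torch.rand(3, 64, 64)  # placeholder image tensor (C, H, W)
maybe_invert = RandomApply(ColorInversion(), p=0.5)
result = maybe_invert(image)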
```
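The idea behind a probabilistic wrapper can be shown without any library. The sketch below is a plain-Python illustration of the concept, not docTR's implementation; `MaybeApply` and `invert` are hypothetical names.

```python
import random


class MaybeApply:
    """Apply `transform` with probability `p`, otherwise return the input."""

    def __init__(self, transform, p=0.5, rng=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.transform = transform
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, x):
        # random() is in [0, 1), so p=0 never applies and p=1 always applies.
        return self.transform(x) if self.rng.random() < self.p else x


def invert(pixels):
    """Invert 8-bit pixel values."""
    return [255 - v for v in pixels]


always = MaybeApply(invert, p=1.0)
never = MaybeApply(invert, p=0.0)
print(always([0, 255]), never([0, 255]))
```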
--------------------------------
### Run All Style Checks
Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md
Executes all code style checks to verify adherence to project formatting and style guidelines.
```shell
make style
```
--------------------------------
### Load Pretrained Models from Huggingface Hub
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst
Load custom detection and recognition models from the Huggingface Hub and integrate them into an OCR predictor. Ensure the necessary doctr modules are imported.
```python3
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
```
--------------------------------
### Train Text Recognition Model (Multi-GPU with torchrun)
Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md
Utilize torchrun for distributed data parallel training across multiple GPUs. Configure the number of processes per node and the communication backend.
```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/recognition/train.py \
crnn_vgg16_bn \
--train_path path/to/train \
--val_path path/to/val \
--epochs 5 \
--backend nccl
```
--------------------------------
### Document Structure Navigation and Export
Source: https://context7.com/mindee/doctr/llms.txt
Demonstrates how to navigate the hierarchical structure of DocTR's `Document` objects (Page, Block, Line, Word) and export the results as plain text or a structured JSON dictionary.
```python
from doctr.models import ocr_predictor
from doctr.io import DocumentFile
predictor = ocr_predictor(pretrained=True)
pages = DocumentFile.from_pdf("document.pdf")
result = predictor(pages)
# Navigate document structure
for page in result.pages:
    print(f"Page dimensions: {page.dimensions}")
    print(f"Page orientation: {page.orientation}")
    print(f"Page language: {page.language}")
    for block in page.blocks:
        print(f" Block geometry: {block.geometry}")
        for line in block.lines:
            print(f" Line: {line.render()}")
            for word in line.words:
                print(f" Word: '{word.value}' "
                      f"(confidence: {word.confidence:.2f}, "
                      f"geometry: {word.geometry})")
# Export as plain text
plain_text = result.render()
# Export as structured dictionary (JSON-compatible)
json_dict = result.export()
# Structure: {'pages': [{'page_idx': 0, 'dimensions': (h, w),
# 'blocks': [{'geometry': ..., 'lines': [...]}]}]}
```
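Once exported, the dictionary can be processed without any docTR objects. The sketch below collects words under a confidence threshold. It assumes each line holds a `words` list whose entries carry `value` and `confidence` keys, which goes slightly beyond the structure comment above; `low_confidence_words` is a hypothetical helper and the sample dictionary is hand-written.

```python
def low_confidence_words(export_dict: dict, threshold: float = 0.5) -> list:
    """Return (page_idx, value, confidence) for words below the threshold."""
    flagged = []
    for page in export_dict["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                # Assumption: each line has a "words" list of dicts
                for word in line["words"]:
                    if word["confidence"] < threshold:
                        flagged.append(
                            (page["page_idx"], word["value"], word["confidence"])
                        )
    return flagged


# Hand-written stand-in for `result.export()`
sample_export = {
    "pages": [{
        "page_idx": 0,
        "dimensions": (340, 600),
        "blocks": [{"lines": [{"words": [
            {"value": "No.", "confidence": 0.91},
            {"value": "RECEIPT", "confidence": 0.99},
            {"value": "DATE", "confidence": 0.4},
        ]}]}],
    }]
}
print(low_confidence_words(sample_export))
```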
--------------------------------
### Character Classification Help
Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md
View advanced options for training the character classification model.
```shell
python references/classification/train_character.py --help
```
--------------------------------
### Run All Code Quality Checks
Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md
Executes all code quality checks, including style verification and other tests. This command ensures the codebase adheres to project standards.
```shell
make quality
```
--------------------------------
### Load Model with Customized Preprocessor in docTR
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst
Instantiate detection and recognition predictors with customized `PreProcessor` objects, which lets you set the input size, batch size, mean, and standard deviation for each stage, then combine them into an `OCRPredictor`. The checkpoint paths passed to `from_pretrained` are left empty here; supply the paths to your own weights.
```python3
import torch
from doctr.models.predictor import OCRPredictor
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor
from doctr.models import db_resnet50, crnn_vgg16_bn
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('')
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('')
det_predictor = DetectionPredictor(
    PreProcessor(
        (1024, 1024),
        batch_size=1,
        mean=(0.798, 0.785, 0.772),
        std=(0.264, 0.2749, 0.287)
    ),
    det_model
)
reco_predictor = RecognitionPredictor(
    PreProcessor(
        (32, 128),
        preserve_aspect_ratio=True,
        batch_size=32,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301)
    ),
    reco_model
)
predictor = OCRPredictor(det_predictor, reco_predictor)
```
--------------------------------
### Load Custom Detection and Recognition Models in docTR
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst
Load custom detection and recognition models from their respective `.pt` files. Set both `pretrained` and `pretrained_backbone` to `False` for each model, and pass `pretrained=False` to `ocr_predictor` when custom models are loaded. The paths passed to `from_pretrained` are left empty here; supply the paths to your own `.pt` files.
```python3
import torch
from doctr.models import ocr_predictor, db_resnet50, crnn_vgg16_bn
# Load custom detection and recognition model
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('', map_location="cpu")
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('', map_location="cpu")
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model, pretrained=False)
```
--------------------------------
### Build and Upload Conda Package
Source: https://github.com/mindee/doctr/wiki/Home
Commands to build a conda package and upload it to anaconda. Ensure BUILD_VERSION is set correctly before building.
```bash
conda build purge
conda build purge-all
rm -rf conda-dist/
BUILD_VERSION='X.Y.Z' python setup.py sdist
mkdir conda-dist
conda-build ./conda/ -c pytorch --output-folder conda-dist
ls -l conda-dist/noarch/*conda
anaconda upload conda-dist/noarch/*conda -u mindee
```
--------------------------------
### Create Recognition Predictor
Source: https://context7.com/mindee/doctr/llms.txt
Initializes a recognition predictor with a specified architecture and batch size. Accesses the model's vocabulary and processes cropped text images.
```python
import numpy as np
from doctr.models import recognition_predictor
reco_model = recognition_predictor(
    arch='crnn_vgg16_bn',  # Architecture: crnn_vgg16_bn, sar_resnet31, parseq, vitstr_small, etc.
    pretrained=True,
    batch_size=128
)
# Access the model's vocabulary
vocab = reco_model.model.cfg['vocab']
print(f"Model vocabulary: {vocab[:50]}...")
# Process cropped text images (expected shape: height~32, variable width)
crop1 = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
crop2 = (255 * np.random.rand(32, 200, 3)).astype(np.uint8)
results = reco_model([crop1, crop2])
# Results contain (text, confidence) tuples
for text, confidence in results:
    print(f"Recognized: '{text}' (confidence: {confidence:.2%})")
```
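Because each result is a `(text, confidence)` tuple, low-confidence predictions can be filtered in plain Python. The sketch below uses hard-coded sample results rather than a real model call; the helper `filter_confident` and the 0.5 threshold are illustrative assumptions, not part of docTR.

```python
def filter_confident(results, min_confidence=0.5):
    """Keep only the (text, confidence) pairs at or above the threshold."""
    return [(text, conf) for text, conf in results if conf >= min_confidence]


# Stand-in for the output of `reco_model([crop1, crop2])` (made-up values)
sample_results = [("invoice", 0.97), ("t0tal", 0.31), ("2024", 0.88)]

confident = filter_confident(sample_results, min_confidence=0.5)
print(confident)
```

A suitable threshold depends on the model and the data, so treat the value here as a starting point only.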
--------------------------------
### Load CORD Dataset with DataLoader
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst
Load the CORD dataset and wrap it with a DataLoader for batch processing. Create an iterator to fetch batches of images and targets.
```python
from doctr.datasets import CORD, DataLoader
train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32)
train_iter = iter(train_loader)
images, targets = next(train_iter)
```
--------------------------------
### Run Unit Tests Locally
Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md
Executes the project's unit tests to ensure code correctness. This command mirrors the tests run in CI workflows.
```shell
make test
```
--------------------------------
### Reconstitution Utilities
Source: https://github.com/mindee/doctr/blob/main/docs/source/modules/utils.rst
Functions for reconstituting pages from model outputs.
```APIDOC
## synthesize_page
### Description
Function to synthesize a page from model outputs.
### Method
(Not specified, likely a Python function call)
### Endpoint
(Not applicable, this is a utility function)
### Parameters
(Not specified in the provided text)
### Request Example
(Not applicable)
### Response
(Not specified in the provided text)
```
--------------------------------
### Data Format for Multiple Classes
Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md
For multi-class training, the `polygons` field in `labels.json` should be a dictionary mapping class names to their respective polygons.
```json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": {
            "class_name_1": [[[x10, y10], [x20, y20], [x30, y30], [x40, y40]], ...],
            "class_name_2": [[[x11, y11], [x21, y21], [x31, y31], [x41, y41]], ...]
        }
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": {
            "class_name_1": [[[x12, y12], [x22, y22], [x32, y32], [x42, y42]], ...],
            "class_name_2": [[[x13, y13], [x23, y23], [x33, y33], [x43, y43]], ...]
        }
    },
    ...
}
```
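One way to produce the class-to-polygons mapping is to group flat annotations by class name. This is a minimal sketch: the helper `group_polygons_by_class`, the class names, and the coordinates are all hypothetical, and the hash is the placeholder string from the format example rather than a real digest.

```python
import json
from collections import defaultdict


def group_polygons_by_class(annotations):
    """Turn (class_name, polygon) pairs into {class_name: [polygon, ...]}."""
    grouped = defaultdict(list)
    for class_name, polygon in annotations:
        grouped[class_name].append(polygon)
    return dict(grouped)


# Hypothetical annotations for one image: (class name, 4-point polygon)
annotations = [
    ("total", [[10, 10], [110, 10], [110, 40], [10, 40]]),
    ("date", [[20, 60], [90, 60], [90, 80], [20, 80]]),
    ("total", [[10, 100], [110, 100], [110, 130], [10, 130]]),
]

entry = {
    "img_dimensions": [900, 600],
    "img_hash": "theimagedumpmyhash",  # placeholder, not a real hash
    "polygons": group_polygons_by_class(annotations),
}
labels_json = json.dumps({"sample_img_01.png": entry}, indent=2)
print(labels_json)
```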
--------------------------------
### Data Format for Single Class
Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md
The `labels.json` file should map image filenames to their dimensions, SHA256 hash, and polygons. Polygons are lists of [x, y] coordinates.
```json
{
    "sample_img_01.png": {
        "img_dimensions": [900, 600],
        "img_hash": "theimagedumpmyhash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    "sample_img_02.png": {
        "img_dimensions": [900, 600],
        "img_hash": "thisisahash",
        "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]
    },
    ...
}
```
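The sketch below shows one way such an entry could be assembled and serialized. It assumes the SHA256 hash is taken over the raw file bytes; `make_label_entry` is a hypothetical helper, and the image bytes and polygon are placeholders.

```python
import hashlib
import json


def make_label_entry(image_bytes, img_dimensions, polygons):
    """Build one labels.json entry; the hash is computed over the raw bytes."""
    return {
        "img_dimensions": list(img_dimensions),
        "img_hash": hashlib.sha256(image_bytes).hexdigest(),
        "polygons": polygons,
    }


# Placeholder bytes; in practice read the image file in binary mode
fake_image_bytes = b"not a real image"
polygons = [[[10, 10], [110, 10], [110, 40], [10, 40]]]

entry = make_label_entry(fake_image_bytes, (900, 600), polygons)
labels = {"sample_img_01.png": entry}
serialized = json.dumps(labels, indent=2)
print(serialized)
```

Writing the file through `json.dumps` guarantees valid JSON (double quotes, arrays instead of tuples), which hand-edited files can easily get wrong.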
--------------------------------
### Compile PyTorch Models with torch.compile
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst
Compile PyTorch models using `torch.compile` to optimize them for faster inference and reduced memory overhead. Note that the `master` recognition architecture is not supported for compilation.
```python
import torch
from doctr.models import (
    ocr_predictor,
    vitstr_small,
    fast_base,
    mobilenet_v3_small_crop_orientation,
    mobilenet_v3_small_page_orientation,
    crop_orientation_predictor,
    page_orientation_predictor
)
# Compile the models
detection_model = torch.compile(
    fast_base(pretrained=True).eval()
)
recognition_model = torch.compile(
    vitstr_small(pretrained=True).eval()
)
crop_orientation_model = torch.compile(
    mobilenet_v3_small_crop_orientation(pretrained=True).eval()
)
page_orientation_model = torch.compile(
    mobilenet_v3_small_page_orientation(pretrained=True).eval()
)
predictor = ocr_predictor(
    detection_model, recognition_model, assume_straight_pages=False
)
# NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification
# Set the orientation predictors
predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model)
# `doc` is assumed to be a list of page images, e.g. loaded with `doctr.io.DocumentFile`
compiled_out = predictor(doc)
```
--------------------------------
### Load Custom KIE Detection Model in docTR
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst
Load a custom trained Key Information Extraction (KIE) detection model. Specify the path to the .pt file and define custom class names for the model. This model is then used to initialize a KIE predictor.
```python3
import torch
from doctr.models import kie_predictor, db_resnet50
det_model = db_resnet50(pretrained=False, pretrained_backbone=False, class_names=['total', 'date'])
det_model.from_pretrained('')
predictor = kie_predictor(det_arch=det_model, reco_arch='crnn_vgg16_bn', pretrained=True)
```
--------------------------------
### Accessing Model Vocabulary
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst
Illustrates how to retrieve the vocabulary used by a specific recognition model.
````APIDOC
## Accessing Model Vocabulary
### Description
This snippet demonstrates how to access and print the vocabulary associated with a recognition model.
### Method
Instantiate `recognition_predictor` and read `model.cfg['vocab']`.
### Endpoint
N/A (local library usage)
### Parameters
None
### Request Example
```python
from doctr.models import recognition_predictor
predictor = recognition_predictor('crnn_vgg16_bn')
print(predictor.model.cfg['vocab'])
```
### Response
Prints the vocabulary of the specified model, a string containing every character the model can output.
````
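Once the vocabulary is known, a practical check is whether a label only uses characters the model can output. The following is a minimal sketch, assuming the vocabulary is a string of characters; `out_of_vocab_chars` and `toy_vocab` are hypothetical stand-ins so the example runs without docTR installed.

```python
def out_of_vocab_chars(text, vocab):
    """Return the sorted characters of `text` that do not appear in `vocab`."""
    return sorted(set(text) - set(vocab))


# Stand-in vocabulary; with docTR you could pass predictor.model.cfg['vocab'] instead
toy_vocab = "abcdefghijklmnopqrstuvwxyz0123456789 "

missing = out_of_vocab_chars("café 42", toy_vocab)
print(missing)
```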
--------------------------------
### Perform OCR via REST API using Python Requests
Source: https://context7.com/mindee/doctr/llms.txt
Demonstrates how to perform OCR on a document by sending a POST request to a deployed docTR REST API. Includes file upload and parameter configuration.
```python
# Python client example using requests
import requests
# Perform OCR on a document
url = "http://localhost:8080/ocr/"
# Open the file in a context manager so it is closed after the upload
with open("document.pdf", "rb") as f:
    files = [("files", ("document.pdf", f, "application/pdf"))]
    params = {
        "det_arch": "db_resnet50",
        "reco_arch": "crnn_vgg16_bn"
    }
    response = requests.post(url, files=files, params=params)
response.raise_for_status()
results = response.json()
# Process results
for doc_result in results:
    print(f"Document: {doc_result['name']}")
    print(f"Orientation: {doc_result['orientation']}")
    print(f"Language: {doc_result['language']}")
    print(f"Dimensions: {doc_result['dimensions']}")
    for page in doc_result['items']:
        for block in page['blocks']:
            for line in block['lines']:
                for word in line['words']:
                    print(f"  {word['value']} (confidence: {word['confidence']})")
```
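The nested page/block/line/word traversal can be wrapped in a small generator so callers do not repeat four loops. This sketch runs on a hand-written dictionary that mimics the nesting used in the client example; it is not real API output, and `iter_words` is a hypothetical helper.

```python
def iter_words(doc_result):
    """Yield (value, confidence) for every word in one document result."""
    for page in doc_result["items"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    yield word["value"], word["confidence"]


# Hand-written sample with the same nesting as the response (illustrative only)
sample_doc = {
    "name": "document.pdf",
    "items": [
        {"blocks": [{"lines": [{"words": [
            {"value": "Hello", "confidence": 0.99},
            {"value": "world", "confidence": 0.87},
        ]}]}]}
    ],
}

words = list(iter_words(sample_doc))
text = " ".join(value for value, _ in words)
print(text)
```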
--------------------------------
### Train Classification and Orientation Model
Source: https://github.com/mindee/doctr/wiki/Home
Training configuration for classification and orientation models. Adjust batch_size according to machine capacity.
```text
--epochs 40
- default config (`batch_size` adjusted to machine capacity)
```
--------------------------------
### GPU Acceleration for DocTR Predictors
Source: https://context7.com/mindee/doctr/llms.txt
Demonstrates how to run DocTR predictors on GPU (NVIDIA CUDA or Apple Silicon MPS) for faster inference. Includes checks for device availability and optional half-precision.
```python
import torch
from doctr.models import ocr_predictor
# Check available devices
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")
# NVIDIA GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    predictor = ocr_predictor(pretrained=True).to(device)
    # Or shorthand: predictor = ocr_predictor(pretrained=True).cuda()
# Apple Silicon (MPS)
elif torch.backends.mps.is_available():
    device = torch.device('mps')
    predictor = ocr_predictor(pretrained=True).to(device)
# CPU fallback
else:
    device = torch.device('cpu')
    predictor = ocr_predictor(pretrained=True)
# Enable half-precision for faster inference (GPU only)
if device.type in ('cuda', 'mps'):
    predictor = predictor.half()
# Process documents
from doctr.io import DocumentFile
pages = DocumentFile.from_pdf("document.pdf")
result = predictor(pages)
```
--------------------------------
### Create Custom Data Augmentation Pipeline
Source: https://context7.com/mindee/doctr/llms.txt
Builds a custom data augmentation pipeline using docTR's transform modules for training. Includes image-only transforms and transforms that adjust bounding boxes.
```python
import numpy as np
import torch
from doctr.transforms import (
    SampleCompose, ImageTransform, ColorInversion,
    RandomRotate, RandomCrop, OneOf, RandomApply
)
# Create a transform pipeline for training
train_transforms = SampleCompose([
    # Image-only transforms wrapped for (image, target) compatibility
    ImageTransform(ColorInversion(min_val=0.6)),
    # Random rotation with bounding box adjustment
    RandomRotate(max_angle=10, expand=False),
    # Random crop with box clipping
    RandomCrop(scale=(0.5, 1.0), ratio=(0.75, 1.33)),
])
# Apply transforms to image and bounding boxes
image = torch.rand(3, 256, 256)
boxes = np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.9]]) # Relative coordinates
augmented_image, augmented_boxes = train_transforms(image, boxes)
```
--------------------------------
### Load CORD Dataset for Recognition Task
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst
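Loads the CORD dataset for a recognition task, where each sample is a cropped word image and its text. By default boxes are cropped as-is, so crops can be irregular; pass `use_polygons=True` to crop the rotated boxes instead, which always yields regular crops.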
```python
from doctr.datasets import CORD
# Crop boxes as is (can contain irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]
```
--------------------------------
### Enable Half-Precision Inference (PyTorch GPU)
Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst
Use this snippet to enable half-precision (FP16) inference for PyTorch models on GPU devices. This reduces memory usage and speeds up inference.
```python3
import torch
from doctr.models import ocr_predictor
predictor = ocr_predictor(
    reco_arch="crnn_mobilenet_v3_small",
    det_arch="linknet_resnet34",
    pretrained=True
).cuda().half()
```
--------------------------------
### Access and Print Available Character Vocabularies
Source: https://context7.com/mindee/doctr/llms.txt
Lists the character vocabularies shipped with docTR (first 20 keys) and prints the sizes of the English, French, and multilingual vocabularies.
```python
from doctr.datasets.vocabs import VOCABS
# Available vocabularies
print(f"Available vocabularies: {list(VOCABS.keys())[:20]}...")
# Get vocabulary for specific languages
english_vocab = VOCABS['english']
french_vocab = VOCABS['french']
multilingual_vocab = VOCABS['multilingual']
# Language-specific vocabularies
print(f"English vocab length: {len(english_vocab)}")
print(f"French vocab length: {len(french_vocab)}")
print(f"Multilingual vocab length: {len(multilingual_vocab)}")
```