### Start DocTR API Server

Source: https://github.com/mindee/doctr/blob/main/api/README.md

Clone the repository, navigate to the API directory, and run the make command to start the FastAPI server. Ensure Docker is installed and running.

```shell
git clone https://github.com/mindee/doctr.git
cd doctr/api
make run
```

--------------------------------

### Install Documentation Environment

Source: https://github.com/mindee/doctr/blob/main/docs/README.md

Installs the development environment and documentation dependencies. Ensure you are at the repository root before running.

```bash
# Make sure you are at the root of the repository before executing these commands
python -m pip install --upgrade pip
pip install -e .[viz,html]
pip install -e .[docs]
```

--------------------------------

### Install Dependencies and Run Streamlit Demo

Source: https://github.com/mindee/doctr/blob/main/demo/README.md

Navigate to the demo directory, install Python dependencies, and run the Streamlit application.

```bash
cd demo
pip install -r pt-requirements.txt
streamlit run app.py
```

--------------------------------

### Install docTR

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Install docTR and its requirements using pip.

```shell
pip install -e . --upgrade
pip install -r references/requirements.txt
```

--------------------------------

### Install Demo Dependencies

Source: https://github.com/mindee/doctr/blob/main/README.md

Install the required dependencies for the docTR demo application. This includes libraries like Streamlit.

```shell
pip install -r demo/pt-requirements.txt
```

--------------------------------

### Install Contrib Module

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_contrib_modules.rst

To use the contrib module, install the `onnxruntime` package. You can install it using pip.
```bash
pip install python-doctr[contrib]
# Or
pip install onnxruntime
# pip install onnxruntime-gpu
```

--------------------------------

### Install API Dependencies

Source: https://github.com/mindee/doctr/blob/main/README.md

Install the necessary dependencies to run the docTR API template locally. This includes Poetry for dependency management.

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

--------------------------------

### Install docTR in Developer Mode

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Installs docTR with all development dependencies and sets up pre-commit hooks. Ensure pip is up-to-date before running.

```shell
python -m pip install --upgrade pip
pip install -e '.[dev]'
pre-commit install
```

--------------------------------

### Install Doctr in Developer Mode with Dependencies

Source: https://github.com/mindee/doctr/blob/main/README.md

Installs the Doctr package in developer (editable) mode from a local clone in `doctr/`. As written, the command requests no optional extras, so only the core dependencies are installed.

```shell
pip install -e doctr/.
```

--------------------------------

### Install Python Doctr with Optional Dependencies

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Install python-doctr with additional packages for visualization, HTML support, and the contrib module. This is useful for extended functionality.

```bash
pip install "python-doctr[viz,html,contrib]"
```

--------------------------------

### Run Document Analysis Script

Source: https://github.com/mindee/doctr/blob/main/README.md

Execute the example script for analyzing a PDF or image file. Use --help to see all available arguments.

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

--------------------------------

### Clone and Install Python Doctr in Developer Mode

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Clone the Doctr repository from GitHub and install it in editable mode using pip.
This is recommended for development.

```bash
git clone https://github.com/mindee/doctr.git
```

```bash
pip install -e doctr/.
```

--------------------------------

### Basic Training Command

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

Start training a text detection model using the `db_resnet50` architecture. Specify paths to your training and validation datasets and the number of epochs.

```shell
python references/detection/train.py db_resnet50 --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### Multi-GPU Training with torchrun

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

Configure and launch training across multiple GPUs using `torchrun`. This example specifies two processes per node and uses the `nccl` backend for distributed data parallelism.

```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/detection/train.py \
  db_resnet50 \
  --train_path path/to/train \
  --val_path path/to/val \
  --epochs 5 \
  --backend nccl
```

--------------------------------

### Install Python Doctr Stable Release

Source: https://github.com/mindee/doctr/blob/main/docs/source/getting_started/installing.rst

Use this command to install the latest stable version of the python-doctr package via pip.

```bash
pip install python-doctr
```

--------------------------------

### Run Local API Server

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR API locally using uvicorn. This command starts a development server with hot-reloading.

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

--------------------------------

### Install Custom Font on Linux

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Install a custom TrueType font (.ttf) on a Linux system by copying it to the system fonts directory and rebuilding the font cache.
```shell
sudo cp custom-font.ttf /usr/local/share/fonts/
fc-cache -f -v
```

--------------------------------

### Train Text Recognition Model (Single GPU)

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Start training a text recognition model using the provided script. Specify the model architecture, training/validation paths, and number of epochs.

```shell
python references/recognition/train.py crnn_vgg16_bn --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### Install OnnxTR for ONNX Model Inference

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst

Install the OnnxTR package, which provides a lightweight way to perform inference with ONNX exported models using ONNXRuntime, without requiring PyTorch or TensorFlow.

```shell
pip install onnxtr[cpu]
```

--------------------------------

### Load Documents from Various Sources

Source: https://github.com/mindee/doctr/blob/main/README.md

Read documents from PDF files, single or multiple images, or URLs. For URLs, ensure `weasyprint` is installed.

```python
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```

--------------------------------

### Using DocTR Recognition Predictor

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiate a recognition model using `recognition_predictor` and pass a dummy image to get OCR output. Ensure numpy is imported for image generation.
```python3
import numpy as np
from doctr.models import recognition_predictor

model = recognition_predictor('crnn_vgg16_bn')
dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
out = model([dummy_img])
```

--------------------------------

### Run API Server with Docker Compose

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR API server using Docker Compose. This command builds the image and starts the container in detached mode.

```shell
PORT=8002 docker-compose up -d --build
```

--------------------------------

### Train Character Classification Model

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Start training a character classification model in PyTorch. Specify the model architecture, number of epochs, and device.

```shell
python references/classification/train_character.py mobilenet_v3_large --epochs 5 --device 0
```

--------------------------------

### Example OCR JSON Response

Source: https://github.com/mindee/doctr/blob/main/api/README.md

This is an example of the JSON output you can expect after a successful OCR request. It details the detected image name, orientation, language, dimensions, and extracted text items with their respective geometries and confidence scores.
```json
[
  {
    "name": "117319856-fc35bf00-ae8b-11eb-9b51-ca5aba673466.jpg",
    "orientation": {
      "value": 0,
      "confidence": null
    },
    "language": {
      "value": null,
      "confidence": null
    },
    "dimensions": [2339, 1654],
    "items": [
      {
        "blocks": [
          {
            "geometry": [
              0.7471996155154171,
              0.1787109375,
              0.9101580212741838,
              0.2080078125
            ],
            "objectness_score": 0.5,
            "lines": [
              {
                "geometry": [
                  0.7471996155154171,
                  0.1787109375,
                  0.9101580212741838,
                  0.2080078125
                ],
                "objectness_score": 0.5,
                "words": [
                  {
                    "value": "Hello",
                    "geometry": [
                      0.7471996155154171,
                      0.1796875,
                      0.8272978149561669,
                      0.20703125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  },
                  {
                    "value": "world!",
                    "geometry": [
                      0.8176307908857315,
                      0.1787109375,
                      0.9101580212741838,
                      0.2080078125
                    ],
                    "objectness_score": 0.5,
                    "confidence": 1.0,
                    "crop_orientation": {"value": 0, "confidence": null}
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
```

--------------------------------

### Run GPU-enabled Docker Container

Source: https://github.com/mindee/doctr/blob/main/README.md

Run a docTR Docker container with GPU support enabled. Ensure your host system has NVIDIA Container Toolkit installed and configured.

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash
```

--------------------------------

### Push Trained Model to Huggingface Hub

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Push a trained recognition model to the Huggingface Hub. Requires Huggingface account and Git LFS installation. Existing repositories will not be overwritten.
```python3
from doctr.models import recognition, login_to_hub, push_to_hf_hub

login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
```

--------------------------------

### Perform OCR via REST API using curl

Source: https://context7.com/mindee/doctr/llms.txt

Provides a `curl` command example for performing OCR on a document via the docTR REST API. Specifies detection and recognition architectures.

```bash
# Using curl:
curl -X POST "http://localhost:8080/ocr/" \
  -F "files=@document.pdf" \
  -F "det_arch=db_resnet50" \
  -F "reco_arch=crnn_vgg16_bn"
```

--------------------------------

### Display Training Help

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

View all available command-line arguments and options for the training script by using the --help flag.

```shell
python references/recognition/train.py --help
```

--------------------------------

### Display Training Help

Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md

View all available command-line options for the training script to customize your training process.

```shell
python references/detection/train.py --help
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Builds the project documentation locally using Sphinx. Modified files are rebuilt by default. To force a complete rebuild, delete the `_build` directory.

```shell
make docs-single-version
```

--------------------------------

### Display OCR Result Visualization

Source: https://github.com/mindee/doctr/blob/main/README.md

Displays the visualization of the OCR result. Requires matplotlib and mplcursors to be installed.
```python
result.show()
```

--------------------------------

### Build Custom Docker Image

Source: https://github.com/mindee/doctr/blob/main/README.md

Build a custom docTR Docker image with specified framework, Python version, and docTR version using build arguments.

```shell
docker build -t doctr --build-arg FRAMEWORK=torch --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```

--------------------------------

### Initialize and Use KIE Predictor

Source: https://github.com/mindee/doctr/blob/main/README.md

Initializes a KIE predictor with specified detection and recognition architectures, analyzes a PDF document, and prints predictions for each detected class.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```

--------------------------------

### Sample hOCR XML Output

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

This is an example of the XML byte string output generated by the `export_as_xml` method, representing OCR results in hOCR format.

```xml
docTR - hOCR

Hello XML World

```

--------------------------------

### Run Local Demo App

Source: https://github.com/mindee/doctr/blob/main/README.md

Run the docTR demo application locally using Streamlit. The app will open in your default browser.

```shell
streamlit run demo/app.py
```

--------------------------------

### Recognition Predictor Usage

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Demonstrates how to initialize and use a recognition predictor with a specified model.

```APIDOC
## Recognition Predictor Usage

### Description
This section shows how to instantiate a recognition predictor and use it for OCR tasks.

### Method
Instantiate `recognition_predictor` with a model name.

### Endpoint
N/A (Local library usage)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
import numpy as np
from doctr.models import recognition_predictor

model = recognition_predictor('crnn_vgg16_bn')
dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
out = model([dummy_img])
```

### Response
#### Success Response (200)
Output of the recognition model.

#### Response Example
```json
{
  "example": "[OCR Output]"
}
```
```

--------------------------------

### Accessing Model Vocabulary in DocTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Use `recognition_predictor` to get a model and then access its configuration to print the vocabulary. This is useful for understanding the character set a model was trained on.

```python3
from doctr.models import recognition_predictor

predictor = recognition_predictor('crnn_vgg16_bn')
print(predictor.model.cfg['vocab'])
```

--------------------------------

### Orientation Classification Help

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

View advanced options for training the orientation classification model.
```shell
python references/classification/train_orientation.py --help
```

--------------------------------

### Build Docker Image Locally

Source: https://github.com/mindee/doctr/blob/main/README.md

Build a docTR Docker image locally. This command creates an image tagged as 'doctr'.

```shell
docker build -t doctr .
```

--------------------------------

### Push Model via Command Line

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Push a trained model to the Huggingface Hub using the command line interface. This command assumes you are in the doctr repository and have a trained model script.

```bash
python3 ~/doctr/references/recognition/train.py crnn_mobilenet_v3_large --name doctr-crnn-mobilenet-v3-large --push-to-hub
```

--------------------------------

### Initialize OCR Predictor Model

Source: https://github.com/mindee/doctr/blob/main/README.md

Instantiate an OCR predictor with specified text detection and recognition architectures. Ensure `pretrained=True` to load weights.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
```

--------------------------------

### Train Orientation Classification Model

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

Start training an orientation classification model in PyTorch. Specify the model architecture, type (page or crop), training/validation paths, and epochs.

```shell
python references/classification/train_orientation.py resnet18 --type page --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5
```

--------------------------------

### OCR Predictor Document Structure Example

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Illustrates the nested structure of the Document object returned by the OCR predictor, including pages, blocks, lines, and words with their properties.
```python
Document(
  (pages): [Page(
    dimensions=(340, 600)
    (blocks): [Block(
      (lines): [Line(
        (words): [
          Word(value='No.', confidence=0.91),
          Word(value='RECEIPT', confidence=0.99),
          Word(value='DATE', confidence=0.96),
        ]
      )]
      (artefacts): []
    )]
  )]
)
```

--------------------------------

### Load CORD Dataset (Straight and Rotated Boxes)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load the CORD dataset for training. Use `use_polygons=True` to load rotated boxes.

```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True)
img, target = train_set[0]
```

--------------------------------

### Hugging Face Hub Integration for DocTR Models

Source: https://context7.com/mindee/doctr/llms.txt

Shows how to load pre-trained models from Hugging Face Hub and push custom-trained models to the Hub for sharing. Requires authentication with a Hugging Face token.

```python
from doctr.models import from_hub, push_to_hf_hub, login_to_hub
from doctr.models import ocr_predictor
from doctr.models.recognition import crnn_mobilenet_v3_small

# Load a pre-trained model from Hugging Face Hub
custom_reco_model = from_hub("mindee/crnn_vgg16_bn")

# Use custom model in OCR pipeline
predictor = ocr_predictor(
    det_arch='db_resnet50',
    reco_arch=custom_reco_model,  # Use loaded model directly
    pretrained=True
)

# Push your own model to Hugging Face Hub
login_to_hub()  # Authenticate with HF token

# Train or load your model
my_model = crnn_mobilenet_v3_small(pretrained=True)

# Push to hub
push_to_hf_hub(
    model=my_model,
    model_name='my-username/my-custom-crnn',
    task='recognition',
    arch='crnn_mobilenet_v3_small'
)
```

--------------------------------

### Load CORD Dataset for Text Detection (Python)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load the CORD dataset for text detection.
Use `use_polygons=True` to load rotated boxes instead of straight boxes.

```python
from doctr.datasets import CORD
# Load straight boxes
train_set = CORD(train=True, download=True, detection_task=True)
# Load rotated boxes
train_set = CORD(train=True, download=True, use_polygons=True, detection_task=True)
img, target = train_set[0]
```

--------------------------------

### Initialize OCR Predictor with Custom Detection Thresholds

Source: https://context7.com/mindee/doctr/llms.txt

Initializes an OCR predictor and then fine-tunes the detection model's post-processing parameters, specifically adjusting the binarization and box thresholds.

```python
from doctr.models import ocr_predictor

# Create predictor
predictor = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)

# Adjust binarization and box thresholds
predictor.det_predictor.model.postprocessor.bin_thresh = 0.5  # Higher = less noise
predictor.det_predictor.model.postprocessor.box_thresh = 0.2  # Lower = more detections
```

--------------------------------

### Send Request to OCR Route

Source: https://github.com/mindee/doctr/blob/main/README.md

Example Python script using the requests library to send a document file to the docTR API's OCR route. Supports PDF, JPEG, and PNG.

```python
import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/path/to/your/doc.jpg', 'rb') as f:
    files = [  # application/pdf, image/jpeg, image/png supported
        ("files", ("doc.jpg", f.read(), "image/jpeg")),
    ]
print(requests.post("http://localhost:8080/ocr", params=params, files=files).json())
```

--------------------------------

### Initialize Recognition Predictor with Default Vocab

Source: https://context7.com/mindee/doctr/llms.txt

Initializes a recognition predictor using a specified architecture and pre-trained weights. Note that pre-trained models use language-specific vocabularies by default.
```python
# Recognition predictor with the default (pre-trained) vocabulary
from doctr.models import recognition_predictor

# Create the predictor; no custom vocabulary is passed here
french_predictor = recognition_predictor(
    arch='crnn_vgg16_bn',
    pretrained=True
)
# Note: Pre-trained models use 'french' vocab by default
```

--------------------------------

### Export OCR Result as XML

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Use the `export_as_xml` method to get the OCR output in hOCR format. This method returns a list of tuples, where each tuple contains the XML byte string and the corresponding XML element.

```python
xml_output = result.export_as_xml()
for output in xml_output:
    xml_bytes_string = output[0]
    xml_element = output[1]
```

--------------------------------

### Load Custom Detection, Recognition, and OCR Datasets

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst

Load custom datasets for detection, recognition, or OCR tasks by specifying image and label paths. The format of labels depends on the dataset type.

```python
from doctr.datasets import DetectionDataset, RecognitionDataset, OCRDataset

# Load a detection dataset
train_set = DetectionDataset(img_folder="/path/to/images", label_path="/path/to/labels.json")
# Load a recognition dataset
train_set = RecognitionDataset(img_folder="/path/to/images", labels_path="/path/to/labels.json")
# Load an OCR dataset which contains annotations for the boxes and labels
train_set = OCRDataset(img_folder="/path/to/images", label_file="/path/to/labels.json")
img, target = train_set[0]
```

--------------------------------

### Configure OCR Predictor Batch Sizes

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Adjust the batch sizes for the detection and recognition models to optimize performance based on your hardware. This example sets detection batch size to 4 and recognition batch size to 1024.
```python3
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, det_bs=4, reco_bs=1024)
```

--------------------------------

### Instantiate Detection Predictor

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiates a detection predictor with specified parameters. Use `pretrained=True` to load pre-trained weights. `assume_straight_pages` and `preserve_aspect_ratio` can be set for specific document types.

```python
import numpy as np
from doctr.models import detection_predictor

model = detection_predictor('db_resnet50')
dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
out = model([dummy_img])
```

```python
from doctr.models import detection_predictor

predictor = detection_predictor('db_resnet50', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
```

--------------------------------

### Run OCR Predictor on Apple Silicon (MPS)

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst

Instantiate an OCR predictor and move it to an Apple Silicon (MPS) GPU for accelerated inference. Falls back to CPU if MPS is not available.

```python
import torch
from doctr.models import ocr_predictor

# For Apple Silicon (MPS)
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
predictor = ocr_predictor(pretrained=True).to(device)
```

--------------------------------

### Apply Probabilistic Transforms with RandomApply

Source: https://context7.com/mindee/doctr/llms.txt

Applies a transform with a specified probability using `RandomApply`. This allows for conditional data augmentation.
```python
import torch
from doctr.transforms import ColorInversion, RandomApply

# Dummy (C, H, W) image tensor, used here only for illustration
image = torch.rand(3, 32, 128)

# Use RandomApply for probabilistic transforms
maybe_invert = RandomApply(ColorInversion(), p=0.5)
result = maybe_invert(image)
```

--------------------------------

### Run All Style Checks

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Executes all code style checks to verify adherence to project formatting and style guidelines.

```shell
make style
```

--------------------------------

### Load Pretrained Models from Huggingface Hub

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/sharing_models.rst

Load custom detection and recognition models from the Huggingface Hub and integrate them into an OCR predictor. Ensure the necessary doctr modules are imported.

```python3
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub

image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
```

--------------------------------

### Train Text Recognition Model (Multi-GPU with torchrun)

Source: https://github.com/mindee/doctr/blob/main/references/recognition/README.md

Utilize torchrun for distributed data parallel training across multiple GPUs. Configure the number of processes per node and the communication backend.
```shell
CUDA_VISIBLE_DEVICES=0,2 \
torchrun --nproc_per_node=2 references/recognition/train.py \
  crnn_vgg16_bn \
  --train_path path/to/train \
  --val_path path/to/val \
  --epochs 5 \
  --backend nccl
```

--------------------------------

### Document Structure Navigation and Export

Source: https://context7.com/mindee/doctr/llms.txt

Demonstrates how to navigate the hierarchical structure of DocTR's `Document` objects (Page, Block, Line, Word) and export the results as plain text or a structured JSON dictionary.

```python
from doctr.models import ocr_predictor
from doctr.io import DocumentFile

predictor = ocr_predictor(pretrained=True)
pages = DocumentFile.from_pdf("document.pdf")
result = predictor(pages)

# Navigate document structure
for page in result.pages:
    print(f"Page dimensions: {page.dimensions}")
    print(f"Page orientation: {page.orientation}")
    print(f"Page language: {page.language}")

    for block in page.blocks:
        print(f" Block geometry: {block.geometry}")

        for line in block.lines:
            print(f" Line: {line.render()}")

            for word in line.words:
                print(f" Word: '{word.value}' "
                      f"(confidence: {word.confidence:.2f}, "
                      f"geometry: {word.geometry})")

# Export as plain text
plain_text = result.render()

# Export as structured dictionary (JSON-compatible)
json_dict = result.export()
# Structure: {'pages': [{'page_idx': 0, 'dimensions': (h, w),
#             'blocks': [{'geometry': ..., 'lines': [...]}]}]}
```

--------------------------------

### Character Classification Help

Source: https://github.com/mindee/doctr/blob/main/references/classification/README.md

View advanced options for training the character classification model.

```shell
python references/classification/train_character.py --help
```

--------------------------------

### Run All Code Quality Checks

Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md

Executes all code quality checks, including style verification and other tests. This command ensures the codebase adheres to project standards.
```shell
make quality
```

--------------------------------

### Load Model with Customized Preprocessor in docTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst

Instantiate custom detection and recognition predictors with customized `PreProcessor` objects. This allows fine-tuning parameters like input size, batch size, mean, and standard deviation for each stage. Finally, combine these predictors into an `OCRPredictor`.

```python3
import torch
from doctr.models.predictor import OCRPredictor
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models.preprocessor import PreProcessor
from doctr.models import db_resnet50, crnn_vgg16_bn

det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('')
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('')

det_predictor = DetectionPredictor(
    PreProcessor(
        (1024, 1024),
        batch_size=1,
        mean=(0.798, 0.785, 0.772),
        std=(0.264, 0.2749, 0.287)
    ),
    det_model
)

reco_predictor = RecognitionPredictor(
    PreProcessor(
        (32, 128),
        preserve_aspect_ratio=True,
        batch_size=32,
        mean=(0.694, 0.695, 0.693),
        std=(0.299, 0.296, 0.301)
    ),
    reco_model
)

predictor = OCRPredictor(det_predictor, reco_predictor)
```

--------------------------------

### Load Custom Detection and Recognition Models in docTR

Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst

Load both custom detection and recognition models by specifying their respective .pt file paths. Both `pretrained` and `pretrained_backbone` should be False for each model. The `pretrained` argument for `ocr_predictor` should be set to False if custom models are loaded.
```python3
import torch
from doctr.models import ocr_predictor, db_resnet50, crnn_vgg16_bn

# Load custom detection and recognition model
det_model = db_resnet50(pretrained=False, pretrained_backbone=False)
det_model.from_pretrained('', map_location="cpu")
reco_model = crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.from_pretrained('', map_location="cpu")
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model, pretrained=False)
```

--------------------------------

### Build and Upload Conda Package

Source: https://github.com/mindee/doctr/wiki/Home

Commands to build a conda package and upload it to anaconda. Ensure BUILD_VERSION is set correctly before building.

```bash
conda build purge
conda build purge-all
rm -rf conda-dist/
BUILD_VERSION='X.Y.Z' python setup.py sdist
mkdir conda-dist
conda-build ./conda/ -c pytorch --output-folder conda-dist
ls -l conda-dist/noarch/*conda
anaconda upload conda-dist/noarch/*conda -u mindee
```

--------------------------------

### Create Recognition Predictor

Source: https://context7.com/mindee/doctr/llms.txt

Initializes a recognition predictor with a specified architecture and batch size. Accesses the model's vocabulary and processes cropped text images.

```python
import numpy as np
from doctr.models import recognition_predictor

reco_model = recognition_predictor(
    arch='crnn_vgg16_bn',  # Architecture: crnn_vgg16_bn, sar_resnet31, parseq, vitstr_small, etc.
pretrained=True, batch_size=128 ) # Access the model's vocabulary vocab = reco_model.model.cfg['vocab'] print(f"Model vocabulary: {vocab[:50]}...") # Process cropped text images (expected shape: height~32, variable width) crop1 = (255 * np.random.rand(32, 128, 3)).astype(np.uint8) crop2 = (255 * np.random.rand(32, 200, 3)).astype(np.uint8) results = reco_model([crop1, crop2]) # Results contain (text, confidence) tuples for text, confidence in results: print(f"Recognized: '{text}' (confidence: {confidence:.2%})") ``` -------------------------------- ### Load CORD Dataset with DataLoader Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst Load the CORD dataset and wrap it with a DataLoader for batch processing. Create an iterator to fetch batches of images and targets. ```python from doctr.datasets import CORD, DataLoader train_set = CORD(train=True, download=True) train_loader = DataLoader(train_set, batch_size=32) train_iter = iter(train_loader) images, targets = next(train_iter) ``` -------------------------------- ### Run Unit Tests Locally Source: https://github.com/mindee/doctr/blob/main/CONTRIBUTING.md Executes the project's unit tests to ensure code correctness. This command mirrors the tests run in CI workflows. ```shell make test ``` -------------------------------- ### Reconstitution Utilities Source: https://github.com/mindee/doctr/blob/main/docs/source/modules/utils.rst Functions for reconstituting pages from model outputs. ```APIDOC ## synthesize_page ### Description Function to synthesize a page from model outputs. 
### Method (Not specified, likely a Python function call) ### Endpoint (Not applicable, this is a utility function) ### Parameters (Not specified in the provided text) ### Request Example (Not applicable) ### Response (Not specified in the provided text) ``` -------------------------------- ### Data Format for Multiple Classes Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md For multi-class training, the `polygons` field in `labels.json` should be a dictionary mapping class names to their respective polygons. ```json { "sample_img_01.png": { "img_dimensions": [900, 600], "img_hash": "theimagedumpmyhash", "polygons": { "class_name_1": [[[x10, y10], [x20, y20], [x30, y30], [x40, y40]], ...], "class_name_2": [[[x11, y11], [x21, y21], [x31, y31], [x41, y41]], ...] } }, "sample_img_02.png": { "img_dimensions": [900, 600], "img_hash": "thisisahash", "polygons": { "class_name_1": [[[x12, y12], [x22, y22], [x32, y32], [x42, y42]], ...], "class_name_2": [[[x13, y13], [x23, y23], [x33, y33], [x43, y43]], ...] } }, ... } ``` -------------------------------- ### Data Format for Single Class Source: https://github.com/mindee/doctr/blob/main/references/detection/README.md The `labels.json` file should map image filenames to their dimensions, SHA256 hash, and polygons. Each polygon is a list of four [x, y] corner points. ```json { "sample_img_01.png": { "img_dimensions": [900, 600], "img_hash": "theimagedumpmyhash", "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...] }, "sample_img_02.png": { "img_dimensions": [900, 600], "img_hash": "thisisahash", "polygons": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...] }, ... } ``` -------------------------------- ### Compile PyTorch Models with torch.compile Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst Compile PyTorch models using `torch.compile` to optimize them for faster inference and reduced memory overhead.
Note that the `master` recognition architecture is not supported for compilation. ```python import torch from doctr.io import DocumentFile from doctr.models import ( ocr_predictor, vitstr_small, fast_base, mobilenet_v3_small_crop_orientation, mobilenet_v3_small_page_orientation, crop_orientation_predictor, page_orientation_predictor ) # Compile the models detection_model = torch.compile( fast_base(pretrained=True).eval() ) recognition_model = torch.compile( vitstr_small(pretrained=True).eval() ) crop_orientation_model = torch.compile( mobilenet_v3_small_crop_orientation(pretrained=True).eval() ) page_orientation_model = torch.compile( mobilenet_v3_small_page_orientation(pretrained=True).eval() ) predictor = ocr_predictor( detection_model, recognition_model, assume_straight_pages=False ) # NOTE: Only required for non-straight pages (`assume_straight_pages=False`) and non-disabled orientation classification # Set the orientation predictors predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model) predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model) # Load a document (placeholder path, replace with your own file) doc = DocumentFile.from_pdf("path/to/your/doc.pdf") compiled_out = predictor(doc) ``` -------------------------------- ### Load Custom KIE Detection Model in docTR Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/custom_models_training.rst Load a custom trained Key Information Extraction (KIE) detection model. Specify the path to the .pt file and define custom class names for the model. This model is then used to initialize a KIE predictor.
```python3 import torch from doctr.models import kie_predictor, db_resnet50 det_model = db_resnet50(pretrained=False, pretrained_backbone=False, class_names=['total', 'date']) det_model.from_pretrained('') predictor = kie_predictor(det_arch=det_model, reco_arch='crnn_vgg16_bn', pretrained=True) ``` -------------------------------- ### Accessing Model Vocabulary Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst Illustrates how to retrieve the vocabulary used by a specific recognition model. ```APIDOC ## Accessing Model Vocabulary ### Description This snippet demonstrates how to access and print the vocabulary associated with a recognition model. ### Method Instantiate `recognition_predictor` and access the `model.cfg['vocab']` attribute. ### Endpoint N/A (Local library usage) ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example ```python from doctr.models import recognition_predictor predictor = recognition_predictor('crnn_vgg16_bn') print(predictor.model.cfg['vocab']) ``` ### Response #### Success Response (200) Prints the vocabulary of the specified model. #### Response Example ``` ['a', 'b', 'c', ...] ``` ``` -------------------------------- ### Perform OCR via REST API using Python Requests Source: https://context7.com/mindee/doctr/llms.txt Demonstrates how to perform OCR on a document by sending a POST request to a deployed docTR REST API. Includes file upload and parameter configuration.
```python # Python client example using requests import requests # Perform OCR on a document url = "http://localhost:8080/ocr/" files = [ ("files", ("document.pdf", open("document.pdf", "rb"), "application/pdf")) ] params = { "det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn" } response = requests.post(url, files=files, params=params) results = response.json() # Process results for doc_result in results: print(f"Document: {doc_result['name']}") print(f"Orientation: {doc_result['orientation']}") print(f"Language: {doc_result['language']}") print(f"Dimensions: {doc_result['dimensions']}") for page in doc_result['items']: for block in page['blocks']: for line in block['lines']: for word in line['words']: print(f" {word['value']} (confidence: {word['confidence']})") ``` -------------------------------- ### Train Classification and Orientation Model Source: https://github.com/mindee/doctr/wiki/Home Training configuration for classification and orientation models. Adjust batch_size according to machine capacity. ```bash --epochs 40 - default config (`batch_size` adjusted to machine capacity) ``` -------------------------------- ### GPU Acceleration for DocTR Predictors Source: https://context7.com/mindee/doctr/llms.txt Demonstrates how to run DocTR predictors on GPU (NVIDIA CUDA or Apple Silicon MPS) for faster inference. Includes checks for device availability and optional half-precision. 
```python import torch from doctr.models import ocr_predictor # Check available devices print(f"CUDA available: {torch.cuda.is_available()}") print(f"MPS available: {torch.backends.mps.is_available()}") # NVIDIA GPU if torch.cuda.is_available(): device = torch.device('cuda') predictor = ocr_predictor(pretrained=True).to(device) # Or shorthand: predictor = ocr_predictor(pretrained=True).cuda() # Apple Silicon (MPS) elif torch.backends.mps.is_available(): device = torch.device('mps') predictor = ocr_predictor(pretrained=True).to(device) # CPU fallback else: device = torch.device('cpu') predictor = ocr_predictor(pretrained=True) # Enable half-precision for faster inference (GPU only) if device.type in ('cuda', 'mps'): predictor = predictor.half() # Process documents from doctr.io import DocumentFile pages = DocumentFile.from_pdf("document.pdf") result = predictor(pages) ``` -------------------------------- ### Create Custom Data Augmentation Pipeline Source: https://context7.com/mindee/doctr/llms.txt Builds a custom data augmentation pipeline using docTR's transform modules for training. Includes image-only transforms and transforms that adjust bounding boxes. 
```python import numpy as np import torch from doctr.transforms import ( SampleCompose, ImageTransform, ColorInversion, RandomRotate, RandomCrop, OneOf, RandomApply ) # Create a transform pipeline for training train_transforms = SampleCompose([ # Image-only transforms wrapped for (image, target) compatibility ImageTransform(ColorInversion(min_val=0.6)), # Random rotation with bounding box adjustment RandomRotate(max_angle=10, expand=False), # Random crop with box clipping RandomCrop(scale=(0.5, 1.0), ratio=(0.75, 1.33)), ]) # Apply transforms to image and bounding boxes image = torch.rand(3, 256, 256) boxes = np.array([[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.8, 0.9]]) # Relative coordinates augmented_image, augmented_boxes = train_transforms(image, boxes) ``` -------------------------------- ### Load CORD Dataset for Recognition Task Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_datasets.rst Loads the CORD dataset for a recognition task. Use `use_polygons=True` to crop rotated boxes (always regular). ```python from doctr.datasets import CORD # Crop boxes as is (can contain irregular) train_set = CORD(train=True, download=True, recognition_task=True) # Crop rotated boxes (always regular) train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True) img, target = train_set[0] ``` -------------------------------- ### Enable Half-Precision Inference (PyTorch GPU) Source: https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_model_export.rst Use this snippet to enable half-precision (FP16) inference for PyTorch models on GPU devices. This reduces memory usage and speeds up inference. 
```python3 import torch from doctr.models import ocr_predictor predictor = ocr_predictor( reco_arch="crnn_mobilenet_v3_small", det_arch="linknet_resnet34", pretrained=True ).cuda().half() ``` -------------------------------- ### Access and Print Available Character Vocabularies Source: https://context7.com/mindee/doctr/llms.txt Retrieves and prints a list of available character vocabularies provided by docTR. Shows the first 20 available keys. ```python from doctr.datasets.vocabs import VOCABS # Available vocabularies print(f"Available vocabularies: {list(VOCABS.keys())[:20]}...") # Get vocabulary for specific languages english_vocab = VOCABS['english'] french_vocab = VOCABS['french'] multilingual_vocab = VOCABS['multilingual'] # Language-specific vocabularies print(f"English vocab length: {len(english_vocab)}") print(f"French vocab length: {len(french_vocab)}") print(f"Multilingual vocab length: {len(multilingual_vocab)}") ```
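-------------------------------- ### Check Label Coverage Against a Vocabulary (illustrative sketch) A vocabulary is a plain string of allowed characters, so a common follow-up is checking whether a training label contains characters outside it. The sketch below is not part of docTR: `unsupported_chars` and `STAND_IN_VOCAB` are hypothetical names, and the stand-in vocabulary is built from the standard library so the example runs without docTR installed. In practice you would probably pass something like `VOCABS['french']` instead.

```python
import string


def unsupported_chars(text, vocab):
    """Return the characters of `text` missing from `vocab`, in first-seen order."""
    allowed = set(vocab)
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(ch for ch in text if ch not in allowed))


# Hypothetical stand-in for a docTR vocabulary string (assumption, not a docTR constant)
STAND_IN_VOCAB = string.digits + string.ascii_letters + string.punctuation

# Neither the accented letter nor the space is part of the stand-in vocabulary
missing = unsupported_chars("café 2024", STAND_IN_VOCAB)
print(missing)
```

Running a check like this over a label file before training can flag samples that the chosen vocabulary cannot encode.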