### Install DocClassifier from whl File

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Install the generated wheel file for docclassifier-docsaid.

```bash
pip install dist/docclassifier_docsaid-*-py3-none-any.whl
```

--------------------------------

### Install Wheel Package

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Install the `wheel` package, which provides the `bdist_wheel` command for setuptools.

```bash
pip install wheel
```

--------------------------------

### Install docclassifier-docsaid via PyPI

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Use this command to install the docclassifier-docsaid package from the Python Package Index.

```bash
pip install docclassifier-docsaid
```

--------------------------------

### Initialize and Use DocClassifier

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Demonstrates basic and advanced initialization of DocClassifier, including loading an image, classifying it, and accessing model information. Make sure required libraries such as OpenCV are installed.

```python
import cv2
import numpy as np
from docclassifier import DocClassifier, ModelType

# Basic initialization with default settings
classifier = DocClassifier()

# Advanced initialization with custom parameters
classifier = DocClassifier(
    model_type=ModelType.margin_based,  # Model architecture type
    model_cfg='20240326',  # Model configuration/version
    backend='cpu',  # 'cpu' or 'cuda'
    gpu_id=0,  # GPU device ID if using CUDA
    threshold=0.6,  # Similarity threshold (default: 0.627)
    register_root='/path/to/custom/templates'  # Custom registration folder
)

# Load an image for classification
img = cv2.imread('document.jpg')

# Classify the document - returns (label, score) tuple
most_similar, max_score = classifier(img)
print(f'Document type: {most_similar}, Confidence: {max_score:.4f}')
# Output: Document type: Taiwan driver's license front, Confidence: 0.6116

# If no match above threshold, returns (None, 0.0)
if most_similar is None:
    print("No matching document type found above threshold")

# List available model configurations
available_models = classifier.list_models()
print(f'Available models: {available_models}')
# Output: Available models: ['20240326']

# Access the registered document bank
print(f'Registered documents: {list(classifier.bank.keys())}')

# Access current threshold
print(f'Current threshold: {classifier.threshold}')
```

--------------------------------

### Run DocClassifier Demo Script

Source: https://github.com/docsaidlab/docclassifier/blob/main/demo/README.md

Execute the `demo_ipcam.py` script from your terminal to start the DocClassifier web demo. This will generate a URL to access the demo in your web browser.

```bash
python demo_ipcam.py
```

--------------------------------

### Verify docclassifier Installation

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Run this Python command to check if the docclassifier library is installed correctly and print its version.

```python
import docclassifier
print(docclassifier.__version__)
```

--------------------------------

### Perform Document Classification Inference

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

This example loads an image, initializes DocClassifier, and runs inference to find the most similar registered document. With the default threshold, `most_similar` may be `None` when the similarity is low.

```python
import cv2
from skimage import io
from docclassifier import DocClassifier

# skimage reads the image as RGB; convert to BGR for the classifier
img = io.imread('https://github.com/DocsaidLab/DocClassifier/blob/main/docs/test_driver.jpg?raw=true')
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

model = DocClassifier()

most_similar, max_score = model(img)
print(f'most_similar: {most_similar}, max_score: {max_score:.4f}')
```

--------------------------------

### IP Camera Demo Integration with DocAligner and DocClassifier

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Integrate DocClassifier with IP camera feeds for real-time document processing. This example uses DocAligner for document detection and alignment, and Capybara for visualization.

```python
import capybara as cb
from docaligner import DocAligner  # Separate package for document alignment
from docclassifier import DocClassifier

# IP camera address (requires IP Camera mobile app)
IPADDR = '192.168.0.179'

# Initialize models
aligner = DocAligner()  # Detects and aligns documents in frame
classifier = DocClassifier(threshold=0.5)


def process_frame(img):
    """Process a single frame from IP camera."""
    # Detect and align document in the frame
    doc_info = aligner(img)

    if doc_info.has_doc_polygon:
        # Draw detected document boundary
        img = cb.draw_polygon(img, doc_info.doc_polygon, (0, 255, 0), 3)

        # Classify the aligned document image
        max_sim, max_score = classifier(doc_info.doc_flat_img)

        # Overlay classification result
        img = cb.draw_text(
            img,
            f'{max_sim} {max_score:.2f}',
            (30, 30),
            (0, 255, 0),
            36
        )

    return img


# Start web demo server
# Access via browser at workstation IP:5001
demo = cb.WebDemo(IPADDR, pipelines=[process_frame])
demo.run()
```

--------------------------------

### Model Training Configuration

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Example YAML configuration for training custom document classification models using PyTorch Lightning. Specifies model architecture, loss function, optimizer, and learning rate scheduler.

```yaml
# Example configuration: model/config/lcnet050_cosface_f256_r128_squeeze_lbn_imagenet.yaml
global_settings:
  image_size: [128, 128]

common:
  use_imagenet: true  # Use ImageNet-1K for training data diversity
  use_clip: true  # Align features with CLIP for stability
  preview_batch: 100

model:
  backbone:
    name: Backbone
    options:
      name: lcnet_050  # PP-LCNet 0.5x as feature extractor
      pretrained: true
  head:
    name: FeatureLearningSqueezeLBNHead  # Combined LayerNorm + BatchNorm
    options:
      in_dim: 256
      embed_dim: 256
      feature_map_size: 4
      squeeze_ratio: 0.25
  loss:
    name: CosFace  # Margin-based loss function
    embed_dim: 256
    options:
      s: 64.0  # Scale factor
      m: 0.4  # Margin parameter

optimizer:
  name: AdamW
  options:
    lr: 0.001
    weight_decay: 0.05

lr_scheduler:
  name: CosineAnnealingWarmRestarts
  options:
    T_0: 10
    T_mult: 2
```

--------------------------------

### Adjust Threshold for Document Classification

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Lower the `threshold` parameter when initializing DocClassifier to get a classification result for images whose similarity to the registered documents falls below the default threshold.

```python
from docclassifier import DocClassifier

model = DocClassifier(
    threshold=0.6
)

# Re-run the inference (`img` is the image loaded in the inference example)
most_similar, max_score = model(img)
print(f'most_similar: {most_similar}, max_score: {max_score:.4f}')
```

--------------------------------

### Build DocClassifier Wheel Package

Source: https://github.com/docsaidlab/docclassifier/blob/main/README.md

Build the wheel file for the DocClassifier package after cloning the repository.

```bash
pip install wheel
cd DocClassifier
python setup.py bdist_wheel
```

--------------------------------

### Build whl File for DocClassifier

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Navigate to the DocClassifier directory and build the wheel file for the package.

```bash
cd DocClassifier
python setup.py bdist_wheel
```

--------------------------------

### Initialize DocClassifier with Custom Registration

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Initialize DocClassifier with a custom registration folder and threshold. Labels are automatically generated from the folder structure.

```python
from docclassifier import DocClassifier

# Initialize with custom registration folder
classifier = DocClassifier(
    register_root='/path/to/register',
    threshold=0.5  # Lower threshold for more permissive matching
)

# Labels are generated from folder structure:
# - "id_card_front", "id_card_back", "passport_template", "drivers_license_template"
print(f'Registered labels: {list(classifier.bank.keys())}')

# Reload registration from a different folder
new_bank = classifier.get_register('/path/to/another/register')
print(f'New bank contains {len(new_bank)} templates')

# Supported image formats: .jpg, .jpeg, .png
# Images are automatically resized to 128x128 for feature extraction
```

--------------------------------

### DocClassifier Initialization and Classification

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Demonstrates how to initialize the DocClassifier with default or custom parameters and perform document classification.

```APIDOC
## DocClassifier Class

### Description
The main interface for document classification that wraps the underlying inference model. It provides methods for classifying document images, extracting features, and managing registered document templates. The classifier automatically downloads required models on first use.

### Method
`__call__` (invoke the instance directly, e.g. `classifier(img)`)

### Endpoint
N/A (Python Class Method)

### Parameters
#### Initialization Parameters
- **model_type** (ModelType) - Optional - Model architecture type (e.g., `ModelType.margin_based`).
- **model_cfg** (string) - Optional - Model configuration/version (e.g., '20240326').
- **backend** (string) - Optional - Inference backend ('cpu' or 'cuda'). Defaults to 'cpu'.
- **gpu_id** (int) - Optional - GPU device ID if using CUDA.
- **threshold** (float) - Optional - Similarity threshold for classification. Defaults to 0.627.
- **register_root** (string) - Optional - Path to a custom registration folder for document templates.

#### Classification Input (when calling the classifier instance)
- **img** (numpy.ndarray) - Required - The document image loaded using a library like OpenCV.

### Request Example
```python
import cv2
from docclassifier import DocClassifier, ModelType

# Basic initialization
classifier = DocClassifier()

# Advanced initialization
classifier = DocClassifier(
    model_type=ModelType.margin_based,
    model_cfg='20240326',
    backend='cpu',
    threshold=0.6,
    register_root='/path/to/custom/templates'
)

# Load an image
img = cv2.imread('document.jpg')

# Classify the document
most_similar, max_score = classifier(img)
print(f'Document type: {most_similar}, Confidence: {max_score:.4f}')

if most_similar is None:
    print("No matching document type found above threshold")
```

### Response
#### Success Response
- **most_similar** (string or None) - The label of the most similar document type found, or None if no match scores above the threshold.
- **max_score** (float) - The similarity score of the best match, ranging from 0.0 to 1.0.

#### Response Example
```json
{
  "most_similar": "Taiwan driver's license front",
  "max_score": 0.6116
}
```

#### No-Match Response Example
```json
{
  "most_similar": null,
  "max_score": 0.0
}
```

### Additional Methods and Attributes
- **list_models()**: Returns a list of available model configurations.
- **bank**: Accesses the registered document bank (dictionary).
- **threshold**: Gets or sets the current similarity threshold.
```

--------------------------------

### Initialize MarginBasedInference Engine

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Initialize the MarginBasedInference engine with specified parameters. This class handles model loading, preprocessing, and feature comparison for direct control over the inference pipeline.

```python
import numpy as np
import cv2
from docclassifier import MarginBasedInference

# Initialize inference engine
inference = MarginBasedInference(
    gpu_id=0,
    backend='cpu',  # or 'cuda' for GPU
    model_cfg='20240326',
    threshold=0.627,
    register_root=None  # Uses default registration
)

# Available model configurations
print(f'Available configs: {list(inference.configs.keys())}')
# Output: ['20240326']

# Model configuration details
cfg = inference.configs['20240326']
print(f'Input size: {cfg["img_size_infer"]}')  # (128, 128)
print(f'Default threshold: {cfg["threshold"]}')  # 0.627 (FPR=0.01)

# Load and preprocess image
img = cv2.imread('document.jpg')

# Extract features
features = inference.extract_feature(img)
print(f'Feature vector shape: {features.shape}')  # (256,)

# Compare two feature vectors
feat1 = inference.extract_feature(cv2.imread('doc1.jpg'))
feat2 = inference.extract_feature(cv2.imread('doc2.jpg'))
similarity = inference.compare(feat1, feat2)
print(f'Similarity: {similarity:.4f}')

# Run classification
most_similar, max_score = inference(img)
print(f'Result: {most_similar} ({max_score:.4f})')
```

--------------------------------

### Set Indoor and ImageNet Dataset Paths

Source: https://github.com/docsaidlab/docclassifier/blob/main/model/README.md

Defines the root directories for the indoor scene recognition and ImageNet datasets. Modify these variables if your datasets are located elsewhere.

```python
INDOOR_ROOT = '/data/Dataset/indoor_scene_recognition/Images'
IMAGENET_ROOT = '/data/Dataset/ILSVRC2012/train'
```

--------------------------------

### Custom Registration

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Details on how to set up and manage custom registration folders for defining new document types.

```APIDOC
## Custom Registration

### Description
Create and manage custom document registration folders to define which document types the classifier can recognize. Images in the registration folder become the reference templates that input documents are compared against.

### Method
N/A (Configuration via `register_root` parameter during initialization)

### Endpoint
N/A (File System Operation)

### Parameters
- **register_root** (string) - Path to the custom directory containing reference document images.

### Process
1. Create a directory (e.g., `/path/to/custom/templates`).
2. Place representative images of each new document type within this directory.
3. The filename (without extension) can optionally serve as the document label, or labels can be managed programmatically.
4. Initialize `DocClassifier` with the `register_root` parameter pointing to this directory.
   ```python
   from docclassifier import DocClassifier

   classifier = DocClassifier(register_root='/path/to/custom/templates')
   ```
5. The classifier will automatically load these images as reference templates upon initialization.
6. New document images can then be classified against this custom bank of templates.

### Example Structure
```
/path/to/custom/templates/
├── ID_Card_Front.jpg
├── Passport_Page1.png
└── Drivers_License_Back.jpeg
```
In this example, the classifier would recognize 'ID_Card_Front', 'Passport_Page1', and 'Drivers_License_Back' as distinct document types if their similarity scores exceed the threshold.
```

--------------------------------

### Clone DocClassifier Repository

Source: https://github.com/docsaidlab/docclassifier/blob/main/README_tw.md

Download the DocClassifier project files from GitHub using git clone.

```bash
git clone https://github.com/DocsaidLab/DocClassifier.git
```

--------------------------------

### Configure IP Camera Address

Source: https://github.com/docsaidlab/docclassifier/blob/main/demo/README.md

Modify the IPADDR variable in `demo_ipcam.py` to match your IP camera's network address. Ensure your workstation and mobile device are on the same network.

```python
IPADDR = '192.168.0.179'  # Change this to your IP camera address
```

--------------------------------

### Custom Document Registration

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Explains the process of creating and managing custom document registration folders. Images placed in these folders serve as reference templates for the classifier.

```python
import os
from docclassifier import DocClassifier
```

--------------------------------

### ONNX Model Inference with Capybara

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Loads an ONNX model using Capybara's `ONNXEngine` and performs inference on an input image. Demonstrates image preprocessing, including resizing, transposing, and normalization.

```python
# Using the ONNX model for inference
import capybara as cb
import numpy as np

# Load ONNX model
model = cb.ONNXEngine(
    'model.onnx',
    gpu_id=0,
    backend='cpu'  # or 'cuda'
)

# Prepare input (128x128 RGB image normalized to [0, 1])
img = cb.imread('document.jpg')
img = cb.imresize(img, size=(128, 128))
img = np.transpose(img, (2, 0, 1)).astype('float32')
img = img[None] / 255.0  # Add batch dimension and normalize

# Run inference
output = model(img=img)
features = output['feats'][0]  # 256-dimensional normalized features
print(f'Feature shape: {features.shape}')
```

--------------------------------

### Extract Features and Compare Documents

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Shows how to extract normalized 256-dimensional feature vectors from document images using `extract_feature`. It also demonstrates computing cosine similarity and building a custom document bank for classification.

```python
import numpy as np
from docclassifier import DocClassifier
import cv2

classifier = DocClassifier()

# Load document images
doc1 = cv2.imread('document1.jpg')
doc2 = cv2.imread('document2.jpg')

# Extract feature vectors (256-dimensional, normalized)
feat1 = classifier.extract_feature(doc1)
feat2 = classifier.extract_feature(doc2)

print(f'Feature shape: {feat1.shape}')
# Output: Feature shape: (256,)
print(f'Feature norm: {np.linalg.norm(feat1):.4f}')
# Output: ~1.0 (normalized)


# Compute similarity between two documents (same method used internally)
def compare_features(feat1, feat2):
    # Cosine similarity scaled to [0, 1] range
    return (np.dot(feat1, feat2) + 1) / 2


similarity = compare_features(feat1, feat2)
print(f'Similarity score: {similarity:.4f}')

# Build custom document bank
custom_bank = {}
template_images = ['id_card.jpg', 'passport.jpg', 'drivers_license.jpg']
labels = ['ID Card', 'Passport', 'Drivers License']

for img_path, label in zip(template_images, labels):
    img = cv2.imread(img_path)
    if img is not None:
        custom_bank[label] = classifier.extract_feature(img)

# Find best match from custom bank
query_img = cv2.imread('unknown_document.jpg')
query_feat = classifier.extract_feature(query_img)

best_match = None
best_score = 0.0
threshold = 0.6

for label, template_feat in custom_bank.items():
    score = compare_features(query_feat, template_feat)
    if score > threshold and score > best_score:
        best_match = label
        best_score = score

print(f'Best match: {best_match}, Score: {best_score:.4f}')
```

--------------------------------

### Export PyTorch Model to ONNX

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Converts a trained PyTorch classifier model to ONNX format using the `main_classifier_torch2onnx` function. The exported model includes metadata like input shape, FLOPs, and configuration details.

```python
from model.to_onnx import main_classifier_torch2onnx

# Export model to ONNX format
# Generates: lcnet050_cosface_f256_r128_squeeze_lbn_imagenet_finetune_20240326_fp32.onnx
main_classifier_torch2onnx('lcnet050_cosface_f256_r128_squeeze_lbn_imagenet_finetune')
```

--------------------------------

### Feature Extraction

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Explains how to extract 256-dimensional feature vectors from document images for custom similarity comparisons.

```APIDOC
## Feature Extraction

### Description
Extract normalized feature vectors from document images for custom similarity comparisons or building your own document bank. The feature vectors are 256-dimensional and L2-normalized for cosine similarity computation.

### Method
`extract_feature(img)`

### Endpoint
N/A (Python Class Method)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Input Image
- **img** (numpy.ndarray) - Required - The document image loaded using a library like OpenCV.

### Request Example
```python
import numpy as np
from docclassifier import DocClassifier
import cv2

classifier = DocClassifier()

# Load document images
doc1 = cv2.imread('document1.jpg')
doc2 = cv2.imread('document2.jpg')

# Extract feature vectors
feat1 = classifier.extract_feature(doc1)
feat2 = classifier.extract_feature(doc2)

print(f'Feature shape: {feat1.shape}')
print(f'Feature norm: {np.linalg.norm(feat1):.4f}')


# Compute similarity
def compare_features(feat1, feat2):
    return (np.dot(feat1, feat2) + 1) / 2


similarity = compare_features(feat1, feat2)
print(f'Similarity score: {similarity:.4f}')
```

### Response
#### Success Response
- **feature_vector** (numpy.ndarray) - A 256-dimensional, L2-normalized feature vector.

#### Response Example
```python
# feat1 will be a numpy array like:
# [ 0.0123 -0.4567 ... 0.7890]
# Shape: (256,)
# Norm: ~1.0
```

### Usage Example (Custom Document Bank)
```python
# ... (feature extraction code above) ...

custom_bank = {}
template_images = ['id_card.jpg', 'passport.jpg', 'drivers_license.jpg']
labels = ['ID Card', 'Passport', 'Drivers License']

for img_path, label in zip(template_images, labels):
    img = cv2.imread(img_path)
    if img is not None:
        custom_bank[label] = classifier.extract_feature(img)

query_img = cv2.imread('unknown_document.jpg')
query_feat = classifier.extract_feature(query_img)

best_match = None
best_score = 0.0
threshold = 0.6

for label, template_feat in custom_bank.items():
    score = compare_features(query_feat, template_feat)
    if score > threshold and score > best_score:
        best_match = label
        best_score = score

print(f'Best match: {best_match}, Score: {best_score:.4f}')
```
```

--------------------------------

### Find Threshold at Specific FPR Values

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Iterates through target False Positive Rate (FPR) values to find corresponding True Positive Rate (TPR) and threshold values. Useful for calibrating model performance.

```python
import numpy as np

# Assumes `fpr`, `tpr`, and `thresholds` are the arrays returned by
# sklearn.metrics.roc_curve in the benchmarking example.
fpr_targets = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]

for target_fpr in fpr_targets:
    # Pick the ROC point whose FPR is closest to the target
    idx = np.argmin(np.abs(fpr - target_fpr))
    print(f'FPR={target_fpr:.0e}: TPR={tpr[idx]:.3f}, Threshold={thresholds[idx]:.4f}')
```

--------------------------------

### Sequential Feature Embedding with Normalization

Source: https://github.com/docsaidlab/docclassifier/blob/main/README.md

Defines a sequential module for embedding features that stacks Linear layers, each followed by Layer Normalization and Batch Normalization. Use this for processing extracted features before classification.

```python
self.embed_feats = nn.Sequential(
    nn.Linear(in_dim_flatten, embed_dim, bias=False),
    nn.LayerNorm(embed_dim),
    nn.BatchNorm1d(embed_dim),
    nn.Linear(embed_dim, embed_dim, bias=False),
    nn.LayerNorm(embed_dim),
    nn.BatchNorm1d(embed_dim),
)
```

--------------------------------

### Document Classification with Lowered Threshold

Source: https://github.com/docsaidlab/docclassifier/blob/main/README.md

Run inference with a lowered threshold when the default threshold is too high to return a match. This helps when the input image is not a perfect match but is still similar to a registered document type.

```python
model = DocClassifier(
    threshold=0.6
)

# Re-run the inference
most_similar, max_score = model(img)
print(f'most_similar: {most_similar}, max_score: {max_score:.4f}')
```

--------------------------------

### Benchmarking DocClassifier Performance

Source: https://context7.com/docsaidlab/docclassifier/llms.txt

Evaluate model performance using ROC curves and TPR/FPR metrics. This involves calculating pairwise similarities and binary labels from extracted features.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import roc_curve
from docclassifier import DocClassifier
import capybara as cb

# Initialize model
model = DocClassifier(model_cfg='20240326')


def calculate_combinations(embeddings, labels):
    """Calculate pairwise similarities and labels."""
    pairs = np.array(list(combinations(range(len(embeddings)), 2)))
    base_idx, target_idx = pairs[:, 0], pairs[:, 1]

    # Compute cosine similarity scaled to [0, 1]
    scores = np.sum(embeddings[base_idx] * embeddings[target_idx], axis=-1)
    scores = (scores + 1) / 2

    # Binary labels: 1 if same class, 0 if different
    pair_labels = np.where(labels[base_idx] == labels[target_idx], 1, 0)
    return scores, pair_labels


# Example: Evaluate on test dataset
test_images = ['doc1.jpg', 'doc2.jpg', 'doc3.jpg', 'doc4.jpg']
test_labels = np.array([0, 0, 1, 1])  # Document type labels

# Extract features
features = []
for img_path in test_images:
    img = cb.imread(img_path)
    feat = model.extract_feature(img)
    features.append(feat)

features = np.stack(features)
scores, labels = calculate_combinations(features, test_labels)

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(labels, scores)
```

--------------------------------

### BibTeX Citation for DocClassifier

Source: https://github.com/docsaidlab/docclassifier/blob/main/README.md

Use this BibTeX entry to cite the DocClassifier GitHub repository in academic work. Ensure the URL and note fields are included.

```bibtex
@misc{lin2024docclassifier,
  author = {Kun-Hsiang Lin and Ze Yuan},
  title = {DocClassifier},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/DocsaidLab/DocClassifier},
  note = {GitHub repository}
}
```
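
--------------------------------

### Sketch: Picking a Threshold for a Target FPR (illustrative, not from the repository)

The benchmarking and FPR snippets earlier in this collection describe scoring every pair of embeddings, labelling each pair as same-class or different-class, and then reading off a threshold for a chosen false positive rate. The following is a minimal standard-library sketch of that idea, so it runs without DocClassifier, NumPy, or scikit-learn. The 2-D embeddings are made-up toy values, and the helper names (`scaled_cosine`, `pairwise_scores`, `rates_at`, `threshold_for_fpr`) are hypothetical and not part of the library.

```python
import math
from itertools import combinations


def scaled_cosine(a, b):
    """Cosine similarity mapped from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (dot / norm + 1) / 2


def pairwise_scores(embeddings, labels):
    """Score every unordered pair; the pair label is 1 when classes match."""
    scores, pair_labels = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        scores.append(scaled_cosine(embeddings[i], embeddings[j]))
        pair_labels.append(1 if labels[i] == labels[j] else 0)
    return scores, pair_labels


def rates_at(threshold, scores, pair_labels):
    """Return (TPR, FPR) when pairs scoring >= threshold are accepted.

    Assumes at least one same-class and one different-class pair exist.
    """
    tp = sum(1 for s, l in zip(scores, pair_labels) if l == 1 and s >= threshold)
    fp = sum(1 for s, l in zip(scores, pair_labels) if l == 0 and s >= threshold)
    positives = sum(pair_labels)
    negatives = len(pair_labels) - positives
    return tp / positives, fp / negatives


def threshold_for_fpr(target_fpr, scores, pair_labels):
    """Smallest observed score whose FPR does not exceed target_fpr."""
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(t, scores, pair_labels)
        if fpr <= target_fpr:
            return t, tpr, fpr
    return None


# Toy data (hypothetical): two classes, two embeddings each
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1]

scores, pair_labels = pairwise_scores(embeddings, labels)
threshold, tpr, fpr = threshold_for_fpr(0.25, scores, pair_labels)
print(f'threshold={threshold:.4f}, TPR={tpr:.3f}, FPR={fpr:.3f}')
```

On real data the same procedure would run over feature vectors from `extract_feature`, and a vectorised ROC routine is the practical choice once the number of pairs grows, since this sketch rescans all pairs for every candidate threshold.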