=============== LIBRARY RULES ===============

- Use AutoDetectionModel.from_pretrained() to load any model - switch between Ultralytics/HuggingFace/Roboflow/MMDetection models by only changing model_type and model_path
- SAHI provides unified API across frameworks - same predict() function works with YOLO11/YOLO12, Roboflow Universe models, HuggingFace models like ustc-community/dfine-small-coco without code changes
- When using AutoDetectionModel.from_pretrained(): use 'model_path' parameter for file-based models (Ultralytics, HuggingFace), use 'model' parameter for Roboflow Universe models
- For academic papers requiring high mAP: use postprocess_type='NMS', postprocess_match_metric='IOU', and confidence_threshold=0.01 (see the sketch after this list)
- For real-world applications: use postprocess_type='GREEDYNMM', postprocess_match_metric='IOS' for better performance with fewer false positives
- If getting many false positives in sliced inference, increase slice_height and slice_width values
- If getting multiple predictions on same object, decrease overlap_height_ratio and overlap_width_ratio (try 0.1 instead of 0.2)
- Use no_sliced_prediction=True to disable slicing and only perform standard inference (useful for large objects)
- Use no_standard_prediction=True to disable full-image inference and only use sliced predictions (saves computation when all objects are small)
- Cannot set both no_standard_prediction=True and no_sliced_prediction=True simultaneously
- Auto-slice resolution: if slice_height/slice_width not specified, SAHI automatically calculates optimal values based on image size
- For drone/satellite imagery: typically use slice_size=512-1024 with 0.2-0.3 overlap ratio
- SAHI is beneficial even without slicing - provides unified API, COCO utilities, visualization tools across all detection frameworks
- Use min_area_ratio parameter (default 0.1) to filter out partial objects at slice boundaries - lower values keep more edge objects
- For COCO datasets, always validate annotations with coco.stats before training or evaluation
- Export results in COCO format using dataset_json_path parameter for standardized evaluation
- Use visual_bbox_thickness, visual_text_size parameters to customize prediction visualizations
- Use 'sahi predict-fiftyone' command to visualize predictions interactively and sort by false positives
- Use 'sahi coco fiftyone' to compare multiple model predictions side-by-side in FiftyOne app
- Use 'sahi coco evaluate' for comprehensive COCO metrics with classwise AP/AR and custom IoU thresholds
- Use 'sahi coco analyse' to generate error analysis plots showing C75/C50/Localization/Similar/Other/Background/FalseNegative errors
- For error analysis: plots show performance breakdown by object size (small/medium/large) and error types
- Export predictions as cropped images using export_crop=True for dataset creation or further analysis
- For video inference: use frame_skip_interval to speed up processing, view_video=True for real-time display
- Supports latest models: YOLO11/YOLO12 via model_type='ultralytics', Roboflow Universe models (e.g., RF-DETR) via model_type='roboflow', HuggingFace models like 'ustc-community/dfine-small-coco' via model_type='huggingface'
- For YOLO11/YOLO12 OBB (oriented bounding box) models, SAHI automatically handles rotated box predictions and only supports NMS postprocessing
- Example model loading: model_type='ultralytics' with model_path='yolo11n.pt', model_type='huggingface' with model_path='ustc-community/dfine-small-coco', model_type='roboflow' with model='rfdetr-base'
- Roboflow Universe models: use simple string IDs like 'rfdetr-base' with model_type='roboflow' for easy access to pre-trained models
- Complete example: model = AutoDetectionModel.from_pretrained(model_type='roboflow', model='rfdetr-base', confidence_threshold=0.5)
- All models follow same API pattern: AutoDetectionModel.from_pretrained() → get_prediction() or get_sliced_prediction() → visualize results
- For models without built-in category mappings, provide category_mapping parameter (e.g., COCO_CLASSES from rfdetr.util.coco_classes)
- COCO utilities: merge datasets with coco.merge(), split train/val with split_coco_as_train_val(), filter by categories with update_categories() (see the COCO utilities sketch after the dataset slicing examples below)
- Filter COCO annotations by area using get_area_filtered_coco() - useful for focusing on specific object sizes
- Convert between formats: export_as_yolo() for YOLO format, use 'sahi coco yolo' command for batch conversion
- Use Coco.stats to get comprehensive dataset statistics before training (num annotations, area distribution, etc.)
- Import logger from 'from sahi.logging import logger' instead of creating redundant logging configurations - centralized logging system eliminates duplicate imports across codebase
- For SAHI documentation, direct users to https://obss.github.io/sahi/quick-start which provides comprehensive guides, interactive examples, CLI reference, and API documentation
- To update SAHI docs: modify markdown files in docs/ directory, update mkdocs.yml for navigation changes, ensure .github/workflows/publish_docs.yml deploys correctly to GitHub Pages
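To make the postprocess guidance above concrete, here is a minimal sketch contrasting the two configurations named in the rules: NMS/IOU matching with a very low confidence threshold for benchmark-style mAP, versus GREEDYNMM/IOS matching with a higher threshold for production use. The checkpoint name 'yolo11n.pt' and the image path are placeholder assumptions; parameter names follow the examples later in this document, but verify them against your installed SAHI version.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

image_path = "path/to/large_image.jpg"  # placeholder image

# Benchmark-style setup: very low confidence threshold + NMS with IOU matching,
# as recommended above when maximizing mAP for academic evaluation
paper_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolo11n.pt",  # placeholder checkpoint
    confidence_threshold=0.01,
)
paper_result = get_sliced_prediction(
    image=image_path,
    detection_model=paper_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="NMS",
    postprocess_match_metric="IOU",
    postprocess_match_threshold=0.5,
)

# Production-style setup: higher confidence threshold + GREEDYNMM with IOS matching,
# which the rules above recommend for fewer false positives in real-world use
prod_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolo11n.pt",
    confidence_threshold=0.3,
)
prod_result = get_sliced_prediction(
    image=image_path,
    detection_model=prod_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="GREEDYNMM",
    postprocess_match_metric="IOS",
    postprocess_match_threshold=0.5,
)

# The benchmark setup typically yields many more (low-confidence) predictions
print(len(paper_result.object_prediction_list), len(prod_result.object_prediction_list))
```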
# SAHI: Slicing Aided Hyper Inference

SAHI (Slicing Aided Hyper Inference) is a lightweight computer vision library designed to improve object detection performance on large images with small objects. The core innovation is sliced inference: dividing large images into overlapping smaller tiles, performing detection on each tile, and intelligently merging the results. This approach dramatically improves detection accuracy for small objects that would be missed in full-resolution inference while remaining framework-agnostic and easy to integrate.

The library supports major detection frameworks including Ultralytics (YOLOv8, YOLO11, YOLO12), MMDetection, HuggingFace Transformers, TorchVision, RT-DETR, YOLOv5, Roboflow (RF-DETR), YOLOE, and YOLO-World. SAHI provides both Python APIs and CLI tools for inference, dataset processing, and evaluation. It includes utilities for COCO dataset manipulation, error analysis, and integration with visualization tools like FiftyOne. With over 400 academic citations and widespread use in competitions, SAHI has become a standard tool for production object detection pipelines.
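The tiling step itself is simple to reason about: tiles of a fixed size are laid out on a stride slightly smaller than the tile, so neighboring tiles overlap by the chosen ratio. The following is an illustrative, self-contained sketch of that idea (a hypothetical compute_tile_boxes helper), not SAHI's internal implementation; SAHI's own slicing utilities are shown in the "Image Slicing Utilities" section below.

```python
# Illustrative sketch only: how overlapping tile boxes can be derived from a
# slice size and overlap ratio. SAHI's real slicing lives in sahi.slicing.
def compute_tile_boxes(image_width, image_height, slice_size=512, overlap_ratio=0.2):
    """Return (xmin, ymin, xmax, ymax) boxes of overlapping tiles covering the image."""
    step = max(1, int(slice_size * (1 - overlap_ratio)))  # stride between tile origins
    boxes = []
    y = 0
    while True:
        ymax = min(y + slice_size, image_height)
        x = 0
        while True:
            xmax = min(x + slice_size, image_width)
            boxes.append((x, y, xmax, ymax))
            if xmax >= image_width:
                break
            x += step
        if ymax >= image_height:
            break
        y += step
    return boxes

# A 2048x1536 image with 512px tiles and 0.2 overlap yields a grid of overlapping
# boxes; detections from each tile are mapped back to full-image coordinates and
# merged (e.g., NMS or GREEDYNMM) to remove duplicates along tile borders.
print(len(compute_tile_boxes(2048, 1536)))
```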
## API Documentation

### Sliced Inference with Auto Model Loading

Perform object detection on large images using automatic slicing and merging.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load any supported detection model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device="cuda:0"  # or "cpu"
)

# Perform sliced inference
result = get_sliced_prediction(
    image="path/to/large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="GREEDYNMM",
    postprocess_match_threshold=0.5,
    verbose=1
)

# Access predictions
print(f"Found {len(result.object_prediction_list)} objects")
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# Export visualizations
result.export_visuals(export_dir="output/", file_name="result")

# Export to COCO format
coco_json = result.to_coco_predictions(image_id=1)
```

### Standard (Non-Sliced) Inference

Perform traditional full-image inference without slicing.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Initialize model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.25,
    device="cuda:0"
)

# Perform standard inference
result = get_prediction(
    image="image.jpg",
    detection_model=detection_model
)

# Process results
for obj_pred in result.object_prediction_list:
    bbox = obj_pred.bbox.to_xyxy()  # [minx, miny, maxx, maxy]
    category = obj_pred.category.name
    score = obj_pred.score.value
    print(f"{category}: {score:.3f} - Box: {bbox}")
```

### Batch Prediction on Image Folders

Process entire folders of images with configurable export options.

```python
from sahi.predict import predict

# Run prediction on folder
predict(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.4,
    model_device='cuda:0',
    source='path/to/images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_metric='IOS',
    postprocess_match_threshold=0.5,
    export_pickle=True,
    export_crop=True,
    visual_bbox_thickness=2,
    visual_text_size=0.8,
    visual_export_format='jpg',
    project='runs/detect',
    name='exp1',
    verbose=2,
    progress_bar=True
)

# Results saved to runs/detect/exp1/ with visuals, pickles, and crops
```

### Video Inference

Process video files with sliced detection on each frame.

```python
from sahi.predict import predict

# Process video file
predict(
    model_type='ultralytics',
    model_path='yolo11m.pt',
    model_confidence_threshold=0.5,
    source='input_video.mp4',
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    frame_skip_interval=0,  # Process every frame (set to N to process every Nth frame)
    view_video=True,  # Display real-time results
    visual_export_format='jpg',
    project='runs/video',
    name='video_exp'
)

# Output video saved to runs/video/video_exp/
```

### CLI: Predict Command

Perform inference from the command line with full control over parameters.
```bash
# Basic image inference
sahi predict --source images/ --model_path yolo11n.pt --model_type ultralytics

# Advanced sliced inference with custom parameters
sahi predict \
    --source large_images/ \
    --model_path yolo11s.pt \
    --model_type ultralytics \
    --model_confidence_threshold 0.3 \
    --slice_height 640 \
    --slice_width 640 \
    --overlap_height_ratio 0.2 \
    --overlap_width_ratio 0.2 \
    --postprocess_type GREEDYNMM \
    --postprocess_match_metric IOS \
    --postprocess_match_threshold 0.5 \
    --visual_bbox_thickness 3 \
    --visual_text_size 1.0 \
    --visual_export_format png \
    --export_pickle \
    --export_crop \
    --project runs/predict \
    --name exp1 \
    --progress_bar

# Video inference with real-time visualization
sahi predict \
    --source video.mp4 \
    --model_path yolo11m.pt \
    --model_type ultralytics \
    --view_video \
    --frame_skip_interval 2

# COCO dataset evaluation
sahi predict \
    --dataset_json_path annotations.json \
    --source images/ \
    --model_path model.pt \
    --model_type ultralytics
# Output: runs/predict/exp/result.json in COCO format
```

### Image Slicing Utilities

Slice large images into smaller tiles with overlap for training or manual inspection.

```python
from sahi.slicing import slice_image

# Slice a single image
slice_result = slice_image(
    image="large_image.jpg",
    output_file_name="slice",
    output_dir="output_slices/",
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    auto_slice_resolution=False,  # Set True to auto-calculate slice params
    verbose=True
)

# Access sliced images
print(f"Generated {len(slice_result)} slices")
print(f"Original size: {slice_result.original_image_height}x{slice_result.original_image_width}")

# Iterate through slices
for idx, slice_data in enumerate(slice_result):
    img = slice_data['image']
    starting_pixel = slice_data['starting_pixel']
    filename = slice_data['filename']
    print(f"Slice {idx}: {filename} starts at {starting_pixel}")
```

### COCO Dataset Slicing

Slice COCO datasets with automatic annotation transformation.

```python
from sahi.slicing import slice_coco

# Slice entire COCO dataset
coco_dict, save_path = slice_coco(
    coco_annotation_file_path="annotations.json",
    image_dir="images/",
    output_coco_annotation_file_name="sliced_dataset",
    output_dir="sliced_output/",
    ignore_negative_samples=False,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    min_area_ratio=0.1,  # Filter out annotations smaller than 10% of original
    verbose=True
)

print(f"Sliced dataset saved to: {save_path}")
print(f"Sliced images: {len(coco_dict['images'])}")
print(f"Sliced annotations: {len(coco_dict['annotations'])}")
```

### CLI: COCO Dataset Slicing

Slice COCO datasets from the command line.

```bash
# Slice COCO dataset
sahi coco slice \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir sliced_coco/ \
    --output_file_name sliced \
    --slice_height 512 \
    --slice_width 512 \
    --overlap_height_ratio 0.2 \
    --overlap_width_ratio 0.2 \
    --min_area_ratio 0.1 \
    --ignore_negative_samples

# Auto-calculate slice parameters based on image resolution
sahi coco slice \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir auto_sliced/ \
    --output_file_name auto_sliced \
    --auto_slice_resolution
```
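Beyond slicing, the rules at the top of this document reference SAHI's Python-side COCO helpers (coco.stats, split_coco_as_train_val(), coco.merge(), export_as_yolo(), get_area_filtered_coco()). The snippet below is a hedged sketch of how those helpers are typically combined; the file names are placeholders and the helper names are taken from the rules above, so confirm the exact signatures against your installed SAHI version.

```python
from sahi.utils.coco import Coco
from sahi.utils.file import save_json

# Load a COCO dataset (image_dir lets SAHI resolve the image file names)
coco = Coco.from_coco_dict_or_path("annotations.json", image_dir="images/")

# Validate/inspect the dataset before training, as the rules recommend
print(coco.stats)  # image/annotation counts, category and area distributions, ...

# Split into train/val subsets; the returned dict holds train/val Coco objects
split = coco.split_coco_as_train_val(train_split_rate=0.85)
save_json(split["train_coco"].json, "train.json")
save_json(split["val_coco"].json, "val.json")

# Merge a second dataset into this one (coco.merge() per the rules above)
other = Coco.from_coco_dict_or_path("more_annotations.json", image_dir="more_images/")
coco.merge(other)

# The rules also mention export_as_yolo() for YOLO-format conversion and
# get_area_filtered_coco() for area-based filtering; check the exact helper
# names and signatures for your SAHI version before relying on them.
```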
### COCO Evaluation and Error Analysis

Evaluate detection results and generate detailed error analysis.

```bash
# Evaluate COCO predictions
sahi coco evaluate \
    --dataset_json_path ground_truth.json \
    --result_json_path predictions.json \
    --out_dir evaluation_results/ \
    --type bbox  # or 'mask' for instance segmentation

# Generate error analysis plots
sahi coco analyse \
    --dataset_json_path ground_truth.json \
    --result_json_path predictions.json \
    --out_dir analysis_plots/ \
    --type bbox

# Convert COCO to YOLO format
sahi coco yolo \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir yolo_format/ \
    --train_split 0.8

# Visualize with FiftyOne
sahi coco fiftyone \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --result_json_paths pred1.json pred2.json \
    --show_thumbnails
```

### Progress Tracking and Callbacks

Monitor inference progress with built-in progress bars or custom callbacks.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Option 1: Built-in progress bar
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_bar=True  # Shows tqdm progress bar
)

# Option 2: Custom progress callback
def progress_callback(current, total):
    percentage = (current / total) * 100
    print(f"Processing: {current}/{total} slices ({percentage:.1f}%)")

result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_callback=progress_callback
)
```

### Excluding Classes from Detection

Filter out specific classes during inference.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Exclude by class name
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_name=["person", "car"]
)

# Exclude by class ID
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_id=[0, 2, 5]
)
```

### FiftyOne Integration

Interactive visualization and evaluation with FiftyOne.

```python
from sahi.predict import predict_fiftyone

# Launch interactive FiftyOne session
predict_fiftyone(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.3,
    dataset_json_path='annotations.json',
    image_dir='images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_threshold=0.5,
    verbose=1
)

# Opens FiftyOne app with predictions overlaid on ground truth
# Automatically evaluates mAP and shows samples with most false positives
```

### Multi-Framework Model Support

Load and use models from different frameworks with unified API.
```python
from sahi import AutoDetectionModel

# Ultralytics (YOLOv8, YOLO11, YOLO12)
model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# MMDetection
model = AutoDetectionModel.from_pretrained(
    model_type='mmdet',
    model_path='cascade_mask_rcnn.pth',
    config_path='configs/cascade_rcnn_r50_fpn.py',
    confidence_threshold=0.3,
    device='cuda:0'
)

# HuggingFace Transformers
model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.3,
    device='cuda:0'
)

# TorchVision
model = AutoDetectionModel.from_pretrained(
    model_type='torchvision',
    model_path='fasterrcnn_resnet50_fpn',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# RT-DETR
model = AutoDetectionModel.from_pretrained(
    model_type='rtdetr',
    model_path='rtdetr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLOv5
model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='yolov5s.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLO-World (open vocabulary)
model = AutoDetectionModel.from_pretrained(
    model_type='yolo-world',
    model_path='yolov8s-world.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# Roboflow (RF-DETR models)
model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# YOLOE (open-vocabulary detection and segmentation)
model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# All models share the same prediction interface
from sahi.predict import get_sliced_prediction

result = get_sliced_prediction(image="test.jpg", detection_model=model)
```

### Roboflow RF-DETR Models

Use Roboflow RF-DETR models from the Roboflow Universe.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load RF-DETR model from Roboflow Universe
detection_model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# Perform sliced inference
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2
)

# Access predictions
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f}")
```

### YOLOE Open-Vocabulary Detection

YOLOE enables open-vocabulary detection with text prompts, visual prompts, or prompt-free mode.
```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Load YOLOE model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',  # or yoloe-11s-seg.pt, yoloe-11m-seg.pt
    confidence_threshold=0.3,
    device='cuda:0'
)

# Set custom text prompts for specific classes
detection_model.model.set_classes(
    ["person", "car", "traffic light", "bicycle"],
    detection_model.model.get_text_pe(["person", "car", "traffic light", "bicycle"])
)

# Perform detection with custom classes
result = get_prediction(
    image="street.jpg",
    detection_model=detection_model
)

# Process results
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# For prompt-free mode, use models ending with '-pf.pt'
# These use an internal vocabulary with 1200+ categories
pf_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg-pf.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)
```

## Summary

SAHI excels at detecting small objects in large images through its innovative sliced inference approach, making it ideal for satellite imagery, medical imaging, aerial photography, high-resolution document analysis, and surveillance applications. The library seamlessly handles the complexity of dividing images into overlapping tiles, running inference on each tile, and intelligently merging predictions using sophisticated postprocessing algorithms (GREEDYNMM, NMS, LSNMS) that eliminate duplicate detections while preserving accuracy.

Integration is straightforward regardless of your workflow: use the Python API for programmatic control in notebooks and scripts, leverage the CLI for batch processing and production pipelines, or combine SAHI with visualization tools like FiftyOne for interactive analysis and debugging. The framework-agnostic design means you can use your existing trained models from Ultralytics (including YOLO12), MMDetection, HuggingFace, TorchVision, Roboflow (RF-DETR), YOLOE, and YOLO-World without modification.

With support for open-vocabulary detection (YOLOE, YOLO-World), comprehensive COCO dataset utilities, automatic slicing parameter calculation, progress tracking, video support, and extensive export options (visualizations, pickles, crops, COCO JSON), SAHI provides a complete toolkit for production object detection systems handling large-scale visual data.