SAHI - Slicing Aided Hyper Inference
https://github.com/obss/sahi
A unified vision library providing a common API for multiple object detection and instance
...
# SAHI: Slicing Aided Hyper Inference

SAHI (Slicing Aided Hyper Inference) is a lightweight computer vision library designed to improve object detection performance on large images with small objects. The core idea is sliced inference: dividing a large image into overlapping smaller tiles, performing detection on each tile, and merging the results. This improves detection accuracy for small objects that tend to be missed when the whole image is passed to the detector at once, while remaining framework-agnostic and easy to integrate.

The library supports major detection frameworks including Ultralytics (YOLOv8, YOLO11, YOLO12), MMDetection, HuggingFace Transformers, TorchVision, RT-DETR, YOLOv5, Roboflow (RF-DETR), YOLOE, and YOLO-World. SAHI provides both Python APIs and CLI tools for inference, dataset processing, and evaluation. It includes utilities for COCO dataset manipulation, error analysis, and integration with visualization tools like FiftyOne. With over 400 academic citations and widespread use in competitions, SAHI is widely used in production object detection pipelines.

## API Documentation

### Sliced Inference with Auto Model Loading

Perform object detection on large images using automatic slicing and merging.
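
As a rough illustration of what slicing with overlap means, the sketch below computes an overlapping tile grid from a slice size and an overlap ratio. It is a standalone approximation for intuition, not SAHI's implementation: `tile_boxes` is a hypothetical helper, and the border handling (shifting the last tile back so it keeps its full size) is an assumption.

```python
def tile_boxes(image_height, image_width,
               slice_height=512, slice_width=512,
               overlap_height_ratio=0.2, overlap_width_ratio=0.2):
    """Return [x_min, y_min, x_max, y_max] tiles that cover the image with overlap.

    Hypothetical helper for illustration only; not part of the SAHI API.
    """
    # Distance between the starts of two neighbouring tiles.
    y_step = slice_height - int(slice_height * overlap_height_ratio)
    x_step = slice_width - int(slice_width * overlap_width_ratio)

    boxes = []
    y_min = 0
    while True:
        y_max = min(y_min + slice_height, image_height)
        # At the border, shift the tile back so it keeps its full height if possible.
        y_start = max(0, y_max - slice_height)
        x_min = 0
        while True:
            x_max = min(x_min + slice_width, image_width)
            x_start = max(0, x_max - slice_width)
            boxes.append([x_start, y_start, x_max, y_max])
            if x_max >= image_width:
                break
            x_min += x_step
        if y_max >= image_height:
            break
        y_min += y_step
    return boxes


tiles = tile_boxes(1024, 1024)
print(len(tiles), tiles[0], tiles[-1])
```

For a 1024x1024 image with 512-pixel slices and a 0.2 overlap ratio, this produces a 3x3 grid of tiles, so the detector runs nine times on that image.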

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load any supported detection model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device="cuda:0"  # or "cpu"
)

# Perform sliced inference
result = get_sliced_prediction(
    image="path/to/large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="GREEDYNMM",
    postprocess_match_threshold=0.5,
    verbose=1
)

# Access predictions
print(f"Found {len(result.object_prediction_list)} objects")
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# Export visualizations
result.export_visuals(export_dir="output/", file_name="result")

# Export to COCO format
coco_json = result.to_coco_predictions(image_id=1)
```

### Standard (Non-Sliced) Inference

Perform traditional full-image inference without slicing.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Initialize model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.25,
    device="cuda:0"
)

# Perform standard inference
result = get_prediction(
    image="image.jpg",
    detection_model=detection_model
)

# Process results
for obj_pred in result.object_prediction_list:
    bbox = obj_pred.bbox.to_xyxy()  # [minx, miny, maxx, maxy]
    category = obj_pred.category.name
    score = obj_pred.score.value
    print(f"{category}: {score:.3f} - Box: {bbox}")
```

### Batch Prediction on Image Folders

Process entire folders of images with configurable export options.

```python
from sahi.predict import predict

# Run prediction on folder
predict(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.4,
    model_device='cuda:0',
    source='path/to/images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_metric='IOS',
    postprocess_match_threshold=0.5,
    export_pickle=True,
    export_crop=True,
    visual_bbox_thickness=2,
    visual_text_size=0.8,
    visual_export_format='jpg',
    project='runs/detect',
    name='exp1',
    verbose=2,
    progress_bar=True
)
# Results saved to runs/detect/exp1/ with visuals, pickles, and crops
```

### Video Inference

Process video files with sliced detection on each frame.

```python
from sahi.predict import predict

# Process video file
predict(
    model_type='ultralytics',
    model_path='yolo11m.pt',
    model_confidence_threshold=0.5,
    source='input_video.mp4',
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    frame_skip_interval=0,  # 0 processes every frame; larger values skip frames
    view_video=True,  # Display real-time results
    visual_export_format='jpg',
    project='runs/video',
    name='video_exp'
)
# Output video saved to runs/video/video_exp/
```

### CLI: Predict Command

Perform inference from the command line with full control over parameters.

```bash
# Basic image inference
sahi predict --source images/ --model_path yolo11n.pt --model_type ultralytics

# Advanced sliced inference with custom parameters
sahi predict \
  --source large_images/ \
  --model_path yolo11s.pt \
  --model_type ultralytics \
  --model_confidence_threshold 0.3 \
  --slice_height 640 \
  --slice_width 640 \
  --overlap_height_ratio 0.2 \
  --overlap_width_ratio 0.2 \
  --postprocess_type GREEDYNMM \
  --postprocess_match_metric IOS \
  --postprocess_match_threshold 0.5 \
  --visual_bbox_thickness 3 \
  --visual_text_size 1.0 \
  --visual_export_format png \
  --export_pickle \
  --export_crop \
  --project runs/predict \
  --name exp1 \
  --progress_bar

# Video inference with real-time visualization
sahi predict \
  --source video.mp4 \
  --model_path yolo11m.pt \
  --model_type ultralytics \
  --view_video \
  --frame_skip_interval 2

# COCO dataset evaluation
sahi predict \
  --dataset_json_path annotations.json \
  --source images/ \
  --model_path model.pt \
  --model_type ultralytics
# Output: runs/predict/exp/result.json in COCO format
```

### Image Slicing Utilities

Slice large images into smaller tiles with overlap for training or manual inspection.

```python
from sahi.slicing import slice_image

# Slice a single image
slice_result = slice_image(
    image="large_image.jpg",
    output_file_name="slice",
    output_dir="output_slices/",
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    auto_slice_resolution=False,  # Set True to auto-calculate slice params
    verbose=True
)

# Access sliced images
print(f"Generated {len(slice_result)} slices")
print(
    "Original size (height x width): "
    f"{slice_result.original_image_height}x{slice_result.original_image_width}"
)

# Iterate through slices
for idx, slice_data in enumerate(slice_result):
    img = slice_data['image']
    starting_pixel = slice_data['starting_pixel']
    filename = slice_data['filename']
    print(f"Slice {idx}: {filename} starts at {starting_pixel}")
```

### COCO Dataset Slicing

Slice COCO datasets with automatic annotation transformation.

```python
from sahi.slicing import slice_coco

# Slice entire COCO dataset
coco_dict, save_path = slice_coco(
    coco_annotation_file_path="annotations.json",
    image_dir="images/",
    output_coco_annotation_file_name="sliced_dataset",
    output_dir="sliced_output/",
    ignore_negative_samples=False,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    # Drop annotations whose cropped area is under 10% of their original area
    min_area_ratio=0.1,
    verbose=True
)

# coco_dict describes the sliced dataset, so these counts refer to slices
print(f"Sliced dataset saved to: {save_path}")
print(f"Sliced images: {len(coco_dict['images'])}")
print(f"Sliced annotations: {len(coco_dict['annotations'])}")
```

### CLI: COCO Dataset Slicing

Slice COCO datasets from the command line.

```bash
# Slice COCO dataset
sahi coco slice \
  --image_dir images/ \
  --dataset_json_path annotations.json \
  --output_dir sliced_coco/ \
  --output_file_name sliced \
  --slice_height 512 \
  --slice_width 512 \
  --overlap_height_ratio 0.2 \
  --overlap_width_ratio 0.2 \
  --min_area_ratio 0.1 \
  --ignore_negative_samples

# Auto-calculate slice parameters based on image resolution
sahi coco slice \
  --image_dir images/ \
  --dataset_json_path annotations.json \
  --output_dir auto_sliced/ \
  --output_file_name auto_sliced \
  --auto_slice_resolution
```

### COCO Evaluation and Error Analysis

Evaluate detection results and generate detailed error analysis.

```bash
# Evaluate COCO predictions
sahi coco evaluate \
  --dataset_json_path ground_truth.json \
  --result_json_path predictions.json \
  --out_dir evaluation_results/ \
  --type bbox  # or 'mask' for instance segmentation

# Generate error analysis plots
sahi coco analyse \
  --dataset_json_path ground_truth.json \
  --result_json_path predictions.json \
  --out_dir analysis_plots/ \
  --type bbox

# Convert COCO to YOLO format
sahi coco yolo \
  --image_dir images/ \
  --dataset_json_path annotations.json \
  --output_dir yolo_format/ \
  --train_split 0.8

# Visualize with FiftyOne
sahi coco fiftyone \
  --image_dir images/ \
  --dataset_json_path annotations.json \
  --result_json_paths pred1.json pred2.json \
  --show_thumbnails
```

### Progress Tracking and Callbacks

Monitor inference progress with built-in progress bars or custom callbacks.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Option 1: Built-in progress bar
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_bar=True  # Shows tqdm progress bar
)

# Option 2: Custom progress callback
def progress_callback(current, total):
    percentage = (current / total) * 100
    print(f"Processing: {current}/{total} slices ({percentage:.1f}%)")

result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_callback=progress_callback
)
```

### Excluding Classes from Detection

Filter out specific classes during inference.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Exclude by class name
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_name=["person", "car"]
)

# Exclude by class ID
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_id=[0, 2, 5]
)
```

### FiftyOne Integration

Interactive visualization and evaluation with FiftyOne.

```python
from sahi.predict import predict_fiftyone

# Launch interactive FiftyOne session
predict_fiftyone(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.3,
    dataset_json_path='annotations.json',
    image_dir='images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_threshold=0.5,
    verbose=1
)
# Opens FiftyOne app with predictions overlaid on ground truth
# Automatically evaluates mAP and shows samples with most false positives
```

### Multi-Framework Model Support

Load and use models from different frameworks with a unified API.

```python
from sahi import AutoDetectionModel

# Ultralytics (YOLOv8, YOLO11, YOLO12)
model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# MMDetection
model = AutoDetectionModel.from_pretrained(
    model_type='mmdet',
    model_path='cascade_mask_rcnn.pth',
    config_path='configs/cascade_rcnn_r50_fpn.py',
    confidence_threshold=0.3,
    device='cuda:0'
)

# HuggingFace Transformers
model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.3,
    device='cuda:0'
)

# TorchVision
model = AutoDetectionModel.from_pretrained(
    model_type='torchvision',
    model_path='fasterrcnn_resnet50_fpn',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# RT-DETR
model = AutoDetectionModel.from_pretrained(
    model_type='rtdetr',
    model_path='rtdetr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLOv5
model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='yolov5s.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLO-World (open vocabulary)
model = AutoDetectionModel.from_pretrained(
    model_type='yolo-world',
    model_path='yolov8s-world.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# Roboflow (RF-DETR models)
model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# YOLOE (open-vocabulary detection and segmentation)
model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# All models share the same prediction interface
from sahi.predict import get_sliced_prediction

result = get_sliced_prediction(image="test.jpg", detection_model=model)
```

### Roboflow RF-DETR Models

Use Roboflow RF-DETR models from the Roboflow Universe.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load RF-DETR model from Roboflow Universe
detection_model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# Perform sliced inference
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2
)

# Access predictions
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f}")
```

### YOLOE Open-Vocabulary Detection

YOLOE enables open-vocabulary detection with text prompts, visual prompts, or prompt-free mode.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Load YOLOE model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',  # or yoloe-11s-seg.pt, yoloe-11m-seg.pt
    confidence_threshold=0.3,
    device='cuda:0'
)

# Set custom text prompts for specific classes
class_names = ["person", "car", "traffic light", "bicycle"]
detection_model.model.set_classes(
    class_names,
    detection_model.model.get_text_pe(class_names)
)

# Perform detection with custom classes
result = get_prediction(
    image="street.jpg",
    detection_model=detection_model
)

# Process results
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# For prompt-free mode, use models ending with '-pf.pt'
# These use an internal vocabulary with 1200+ categories
pf_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg-pf.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)
```

## Summary

SAHI detects small objects in large images through sliced inference, which makes it well suited to satellite imagery, medical imaging, aerial photography, high-resolution document analysis, and surveillance applications. The library handles dividing images into overlapping tiles, running inference on each tile, and merging predictions with postprocessing algorithms (GREEDYNMM, NMS, LSNMS) that eliminate duplicate detections while preserving accuracy.

Integration is straightforward regardless of your workflow: use the Python API for programmatic control in notebooks and scripts, use the CLI for batch processing and production pipelines, or combine SAHI with visualization tools like FiftyOne for interactive analysis and debugging.
The framework-agnostic design means you can use your existing trained models from Ultralytics (including YOLO12), MMDetection, HuggingFace, TorchVision, Roboflow (RF-DETR), YOLOE, and YOLO-World without modification. With support for open-vocabulary detection (YOLOE, YOLO-World), comprehensive COCO dataset utilities, automatic slicing parameter calculation, progress tracking, video support, and extensive export options (visualizations, pickles, crops, COCO JSON), SAHI provides a complete toolkit for production object detection systems handling large-scale visual data.
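
The merge step mentioned above can be illustrated with a small, library-free sketch: boxes predicted inside a slice are shifted by the slice's starting pixel into full-image coordinates, and overlapping duplicates are then suppressed using an IoU or IoS match metric. This is plain greedy suppression written for intuition, not SAHI's GREEDYNMM implementation (which merges matched boxes rather than simply dropping them). The helpers `shift_box`, `match_score`, and `greedy_suppress` are hypothetical, and the `(x, y)` order of the starting pixel is an assumption.

```python
def shift_box(box, starting_pixel):
    """Map a slice-local [x_min, y_min, x_max, y_max] box to full-image coordinates."""
    shift_x, shift_y = starting_pixel  # assumed (x, y) order
    x_min, y_min, x_max, y_max = box
    return [x_min + shift_x, y_min + shift_y, x_max + shift_x, y_max + shift_y]


def area(box):
    """Area of a box; zero if the box is empty or inverted."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def match_score(a, b, metric="IOS"):
    """Overlap between two boxes: IoU (over union) or IoS (over the smaller box)."""
    inter = area([max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])])
    if inter == 0:
        return 0.0
    if metric == "IOU":
        return inter / (area(a) + area(b) - inter)
    return inter / min(area(a), area(b))


def greedy_suppress(detections, metric="IOS", threshold=0.5):
    """Keep the highest-scoring boxes and drop any box that matches a kept one.

    detections: list of (box, score) tuples in full-image coordinates.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(match_score(box, kept_box, metric) < threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# The same object seen by two overlapping slices, plus one unrelated object.
detections = [
    (shift_box([400, 100, 500, 200], (0, 0)), 0.9),
    (shift_box([0, 100, 90, 200], (410, 0)), 0.6),  # truncated duplicate
    (shift_box([190, 600, 240, 650], (410, 0)), 0.5),
]
for box, score in greedy_suppress(detections):
    print(box, score)
```

IoS divides the intersection by the smaller box, so a truncated duplicate at a slice border still matches the full box strongly; this is one reason to choose `IOS` over `IOU` as the match metric for sliced predictions.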