=============== LIBRARY RULES ===============

- Use AutoDetectionModel.from_pretrained() to load any model - switch between Ultralytics/HuggingFace/Roboflow/MMDetection models by only changing model_type and model_path
- SAHI provides unified API across frameworks - same predict() function works with YOLO11/YOLO12, Roboflow Universe models, HuggingFace models like ustc-community/dfine-small-coco without code changes
- When using AutoDetectionModel.from_pretrained(): use 'model_path' parameter for file-based models (Ultralytics, HuggingFace), use 'model' parameter for Roboflow Universe models
- For academic papers requiring high mAP: use postprocess_type='NMS', postprocess_match_metric='IOU', and confidence_threshold=0.01 (see the sketch after this list)
- For real-world applications: use postprocess_type='GREEDYNMM', postprocess_match_metric='IOS' for better performance with fewer false positives
- If getting many false positives in sliced inference, increase slice_height and slice_width values
- If getting multiple predictions on same object, decrease overlap_height_ratio and overlap_width_ratio (try 0.1 instead of 0.2)
- Use no_sliced_prediction=True to disable slicing and only perform standard inference (useful for large objects)
- Use no_standard_prediction=True to disable full-image inference and only use sliced predictions (saves computation when all objects are small)
- Cannot set both no_standard_prediction=True and no_sliced_prediction=True simultaneously
- Auto-slice resolution: if slice_height/slice_width not specified, SAHI automatically calculates optimal values based on image size
- For drone/satellite imagery: typically use slice_size=512-1024 with 0.2-0.3 overlap ratio
- SAHI is beneficial even without slicing - provides unified API, COCO utilities, visualization tools across all detection frameworks
- Use min_area_ratio parameter (default 0.1) to filter out partial objects at slice boundaries - lower values keep more edge objects
- For COCO datasets, always validate annotations with coco.stats before training or evaluation
- Export results in COCO format using dataset_json_path parameter for standardized evaluation
- Use visual_bbox_thickness, visual_text_size parameters to customize prediction visualizations
- Use 'sahi predict-fiftyone' command to visualize predictions interactively and sort by false positives
- Use 'sahi coco fiftyone' to compare multiple model predictions side-by-side in FiftyOne app
- Use 'sahi coco evaluate' for comprehensive COCO metrics with classwise AP/AR and custom IoU thresholds
- Use 'sahi coco analyse' to generate error analysis plots showing C75/C50/Localization/Similar/Other/Background/FalseNegative errors
- For error analysis: plots show performance breakdown by object size (small/medium/large) and error types
- Export predictions as cropped images using export_crop=True for dataset creation or further analysis
- For video inference: use frame_skip_interval to speed up processing, view_video=True for real-time display
- Supports latest models: YOLO11/YOLO12 via model_type='ultralytics', Roboflow Universe models (e.g., RF-DETR) via model_type='roboflow', HuggingFace models like 'ustc-community/dfine-small-coco' via model_type='huggingface'
- For YOLO11/YOLO12 OBB (oriented bounding box) models, SAHI automatically handles rotated box predictions and only supports NMS postprocessing
- Example model loading: model_type='ultralytics' with model_path='yolo11n.pt', model_type='huggingface' with model_path='ustc-community/dfine-small-coco', model_type='roboflow' with model='rfdetr-base'
- Roboflow Universe models: use simple string IDs like 'rfdetr-base' with model_type='roboflow' for easy access to pre-trained models
- Complete example: model = AutoDetectionModel.from_pretrained(model_type='roboflow', model='rfdetr-base', confidence_threshold=0.5)
- All models follow same API pattern: AutoDetectionModel.from_pretrained() → get_prediction() or get_sliced_prediction() → visualize results
- For models without built-in category mappings, provide category_mapping parameter (e.g., COCO_CLASSES from rfdetr.util.coco_classes)
- COCO utilities: merge datasets with coco.merge(), split train/val with split_coco_as_train_val(), filter by categories with update_categories() (see the COCO utilities sketch after the dataset slicing examples below)
- Filter COCO annotations by area using get_area_filtered_coco() - useful for focusing on specific object sizes
- Convert between formats: export_as_yolo() for YOLO format, use 'sahi coco yolo' command for batch conversion
- Use Coco.stats to get comprehensive dataset statistics before training (num annotations, area distribution, etc.)
- Import logger from 'from sahi.logging import logger' instead of creating redundant logging configurations - centralized logging system eliminates duplicate imports across codebase
- For SAHI documentation, direct users to https://obss.github.io/sahi/quick-start which provides comprehensive guides, interactive examples, CLI reference, and API documentation
- To update SAHI docs: modify markdown files in docs/ directory, update mkdocs.yml for navigation changes, ensure .github/workflows/publish_docs.yml deploys correctly to GitHub Pages
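To make the postprocess guidance above concrete, here is a minimal sketch contrasting the two configurations named in the rules: NMS/IOU matching with a very low confidence threshold for benchmark-style mAP, versus GREEDYNMM/IOS matching with a higher threshold for production use. The checkpoint name 'yolo11n.pt' and the image path are placeholder assumptions; parameter names follow the examples later in this document, but verify them against your installed SAHI version.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

image_path = "path/to/large_image.jpg"  # placeholder image

# Benchmark-style setup: very low confidence threshold + NMS with IOU matching,
# as recommended above when maximizing mAP for academic evaluation
paper_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolo11n.pt",  # placeholder checkpoint
    confidence_threshold=0.01,
)
paper_result = get_sliced_prediction(
    image=image_path,
    detection_model=paper_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="NMS",
    postprocess_match_metric="IOU",
    postprocess_match_threshold=0.5,
)

# Production-style setup: higher confidence threshold + GREEDYNMM with IOS matching,
# which the rules above recommend for fewer false positives in real-world use
prod_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",
    model_path="yolo11n.pt",
    confidence_threshold=0.3,
)
prod_result = get_sliced_prediction(
    image=image_path,
    detection_model=prod_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="GREEDYNMM",
    postprocess_match_metric="IOS",
    postprocess_match_threshold=0.5,
)

# The benchmark setup typically yields many more (low-confidence) predictions
print(len(paper_result.object_prediction_list), len(prod_result.object_prediction_list))
```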
# SAHI: Slicing Aided Hyper Inference

SAHI (Slicing Aided Hyper Inference) is a lightweight computer vision library designed to improve object detection performance on large images with small objects. The core innovation is sliced inference: dividing large images into overlapping smaller tiles, performing detection on each tile, and intelligently merging the results. This approach dramatically improves detection accuracy for small objects that would be missed in full-resolution inference while remaining framework-agnostic and easy to integrate.

The library supports major detection frameworks including Ultralytics (YOLOv8, YOLO11, YOLO12), MMDetection, HuggingFace Transformers, TorchVision, RT-DETR, YOLOv5, Roboflow (RF-DETR), YOLOE, and YOLO-World. SAHI provides both Python APIs and CLI tools for inference, dataset processing, and evaluation. It includes utilities for COCO dataset manipulation, error analysis, and integration with visualization tools like FiftyOne. With over 400 academic citations and widespread use in competitions, SAHI has become a standard tool for production object detection pipelines.
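The tiling step itself is simple to reason about: tiles of a fixed size are laid out on a stride slightly smaller than the tile, so neighboring tiles overlap by the chosen ratio. The following is an illustrative, self-contained sketch of that idea (a hypothetical compute_tile_boxes helper), not SAHI's internal implementation; SAHI's own slicing utilities are shown in the "Image Slicing Utilities" section below.

```python
# Illustrative sketch only: how overlapping tile boxes can be derived from a
# slice size and overlap ratio. SAHI's real slicing lives in sahi.slicing.
def compute_tile_boxes(image_width, image_height, slice_size=512, overlap_ratio=0.2):
    """Return (xmin, ymin, xmax, ymax) boxes of overlapping tiles covering the image."""
    step = max(1, int(slice_size * (1 - overlap_ratio)))  # stride between tile origins
    boxes = []
    y = 0
    while True:
        ymax = min(y + slice_size, image_height)
        x = 0
        while True:
            xmax = min(x + slice_size, image_width)
            boxes.append((x, y, xmax, ymax))
            if xmax >= image_width:
                break
            x += step
        if ymax >= image_height:
            break
        y += step
    return boxes

# A 2048x1536 image with 512px tiles and 0.2 overlap yields a grid of overlapping
# boxes; detections from each tile are mapped back to full-image coordinates and
# merged (e.g., NMS or GREEDYNMM) to remove duplicates along tile borders.
print(len(compute_tile_boxes(2048, 1536)))
```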
## API Documentation

### Sliced Inference with Auto Model Loading

Perform object detection on large images using automatic slicing and merging.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load any supported detection model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device="cuda:0"  # or "cpu"
)

# Perform sliced inference
result = get_sliced_prediction(
    image="path/to/large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type="GREEDYNMM",
    postprocess_match_threshold=0.5,
    verbose=1
)

# Access predictions
print(f"Found {len(result.object_prediction_list)} objects")
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# Export visualizations
result.export_visuals(export_dir="output/", file_name="result")

# Export to COCO format
coco_json = result.to_coco_predictions(image_id=1)
```

### Standard (Non-Sliced) Inference

Perform traditional full-image inference without slicing.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Initialize model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.25,
    device="cuda:0"
)

# Perform standard inference
result = get_prediction(
    image="image.jpg",
    detection_model=detection_model
)

# Process results
for obj_pred in result.object_prediction_list:
    bbox = obj_pred.bbox.to_xyxy()  # [minx, miny, maxx, maxy]
    category = obj_pred.category.name
    score = obj_pred.score.value
    print(f"{category}: {score:.3f} - Box: {bbox}")
```

### Batch Prediction on Image Folders

Process entire folders of images with configurable export options.

```python
from sahi.predict import predict

# Run prediction on folder
predict(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.4,
    model_device='cuda:0',
    source='path/to/images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_metric='IOS',
    postprocess_match_threshold=0.5,
    export_pickle=True,
    export_crop=True,
    visual_bbox_thickness=2,
    visual_text_size=0.8,
    visual_export_format='jpg',
    project='runs/detect',
    name='exp1',
    verbose=2,
    progress_bar=True
)

# Results saved to runs/detect/exp1/ with visuals, pickles, and crops
```

### Video Inference

Process video files with sliced detection on each frame.

```python
from sahi.predict import predict

# Process video file
predict(
    model_type='ultralytics',
    model_path='yolo11m.pt',
    model_confidence_threshold=0.5,
    source='input_video.mp4',
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    frame_skip_interval=0,  # Process every frame (set to N to process every Nth frame)
    view_video=True,  # Display real-time results
    visual_export_format='jpg',
    project='runs/video',
    name='video_exp'
)

# Output video saved to runs/video/video_exp/
```

### CLI: Predict Command

Perform inference from the command line with full control over parameters.
```bash
# Basic image inference
sahi predict --source images/ --model_path yolo11n.pt --model_type ultralytics

# Advanced sliced inference with custom parameters
sahi predict \
    --source large_images/ \
    --model_path yolo11s.pt \
    --model_type ultralytics \
    --model_confidence_threshold 0.3 \
    --slice_height 640 \
    --slice_width 640 \
    --overlap_height_ratio 0.2 \
    --overlap_width_ratio 0.2 \
    --postprocess_type GREEDYNMM \
    --postprocess_match_metric IOS \
    --postprocess_match_threshold 0.5 \
    --visual_bbox_thickness 3 \
    --visual_text_size 1.0 \
    --visual_export_format png \
    --export_pickle \
    --export_crop \
    --project runs/predict \
    --name exp1 \
    --progress_bar

# Video inference with real-time visualization
sahi predict \
    --source video.mp4 \
    --model_path yolo11m.pt \
    --model_type ultralytics \
    --view_video \
    --frame_skip_interval 2

# COCO dataset evaluation
sahi predict \
    --dataset_json_path annotations.json \
    --source images/ \
    --model_path model.pt \
    --model_type ultralytics
# Output: runs/predict/exp/result.json in COCO format
```

### Image Slicing Utilities

Slice large images into smaller tiles with overlap for training or manual inspection.

```python
from sahi.slicing import slice_image

# Slice a single image
slice_result = slice_image(
    image="large_image.jpg",
    output_file_name="slice",
    output_dir="output_slices/",
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    auto_slice_resolution=False,  # Set True to auto-calculate slice params
    verbose=True
)

# Access sliced images
print(f"Generated {len(slice_result)} slices")
print(f"Original size: {slice_result.original_image_height}x{slice_result.original_image_width}")

# Iterate through slices
for idx, slice_data in enumerate(slice_result):
    img = slice_data['image']
    starting_pixel = slice_data['starting_pixel']
    filename = slice_data['filename']
    print(f"Slice {idx}: {filename} starts at {starting_pixel}")
```

### COCO Dataset Slicing

Slice COCO datasets with automatic annotation transformation.

```python
from sahi.slicing import slice_coco

# Slice entire COCO dataset
coco_dict, save_path = slice_coco(
    coco_annotation_file_path="annotations.json",
    image_dir="images/",
    output_coco_annotation_file_name="sliced_dataset",
    output_dir="sliced_output/",
    ignore_negative_samples=False,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    min_area_ratio=0.1,  # Filter out annotations smaller than 10% of original
    verbose=True
)

print(f"Sliced dataset saved to: {save_path}")
print(f"Sliced images: {len(coco_dict['images'])}")
print(f"Sliced annotations: {len(coco_dict['annotations'])}")
```

### CLI: COCO Dataset Slicing

Slice COCO datasets from the command line.

```bash
# Slice COCO dataset
sahi coco slice \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir sliced_coco/ \
    --output_file_name sliced \
    --slice_height 512 \
    --slice_width 512 \
    --overlap_height_ratio 0.2 \
    --overlap_width_ratio 0.2 \
    --min_area_ratio 0.1 \
    --ignore_negative_samples

# Auto-calculate slice parameters based on image resolution
sahi coco slice \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir auto_sliced/ \
    --output_file_name auto_sliced \
    --auto_slice_resolution
```
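Beyond slicing, the rules at the top of this document reference SAHI's Python-side COCO helpers (coco.stats, split_coco_as_train_val(), coco.merge(), export_as_yolo(), get_area_filtered_coco()). The snippet below is a hedged sketch of how those helpers are typically combined; the file names are placeholders and the helper names are taken from the rules above, so confirm the exact signatures against your installed SAHI version.

```python
from sahi.utils.coco import Coco
from sahi.utils.file import save_json

# Load a COCO dataset (image_dir lets SAHI resolve the image file names)
coco = Coco.from_coco_dict_or_path("annotations.json", image_dir="images/")

# Validate/inspect the dataset before training, as the rules recommend
print(coco.stats)  # image/annotation counts, category and area distributions, ...

# Split into train/val subsets; the returned dict holds train/val Coco objects
split = coco.split_coco_as_train_val(train_split_rate=0.85)
save_json(split["train_coco"].json, "train.json")
save_json(split["val_coco"].json, "val.json")

# Merge a second dataset into this one (coco.merge() per the rules above)
other = Coco.from_coco_dict_or_path("more_annotations.json", image_dir="more_images/")
coco.merge(other)

# The rules also mention export_as_yolo() for YOLO-format conversion and
# get_area_filtered_coco() for area-based filtering; check the exact helper
# names and signatures for your SAHI version before relying on them.
```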
### COCO Evaluation and Error Analysis

Evaluate detection results and generate detailed error analysis.

```bash
# Evaluate COCO predictions
sahi coco evaluate \
    --dataset_json_path ground_truth.json \
    --result_json_path predictions.json \
    --out_dir evaluation_results/ \
    --type bbox  # or 'mask' for instance segmentation

# Generate error analysis plots
sahi coco analyse \
    --dataset_json_path ground_truth.json \
    --result_json_path predictions.json \
    --out_dir analysis_plots/ \
    --type bbox

# Convert COCO to YOLO format
sahi coco yolo \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --output_dir yolo_format/ \
    --train_split 0.8

# Visualize with FiftyOne
sahi coco fiftyone \
    --image_dir images/ \
    --dataset_json_path annotations.json \
    --result_json_paths pred1.json pred2.json \
    --show_thumbnails
```

### Progress Tracking and Callbacks

Monitor inference progress with built-in progress bars or custom callbacks.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Option 1: Built-in progress bar
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_bar=True  # Shows tqdm progress bar
)

# Option 2: Custom progress callback
def progress_callback(current, total):
    percentage = (current / total) * 100
    print(f"Processing: {current}/{total} slices ({percentage:.1f}%)")

result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    progress_callback=progress_callback
)
```

### Excluding Classes from Detection

Filter out specific classes during inference.

```python
from sahi.predict import get_sliced_prediction
from sahi import AutoDetectionModel

detection_model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt'
)

# Exclude by class name
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_name=["person", "car"]
)

# Exclude by class ID
result = get_sliced_prediction(
    image="street.jpg",
    detection_model=detection_model,
    slice_height=512,
    slice_width=512,
    exclude_classes_by_id=[0, 2, 5]
)
```

### FiftyOne Integration

Interactive visualization and evaluation with FiftyOne.

```python
from sahi.predict import predict_fiftyone

# Launch interactive FiftyOne session
predict_fiftyone(
    model_type='ultralytics',
    model_path='yolo11s.pt',
    model_confidence_threshold=0.3,
    dataset_json_path='annotations.json',
    image_dir='images/',
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
    postprocess_type='GREEDYNMM',
    postprocess_match_threshold=0.5,
    verbose=1
)

# Opens FiftyOne app with predictions overlaid on ground truth
# Automatically evaluates mAP and shows samples with most false positives
```

### Multi-Framework Model Support

Load and use models from different frameworks with unified API.
```python
from sahi import AutoDetectionModel

# Ultralytics (YOLOv8, YOLO11, YOLO12)
model = AutoDetectionModel.from_pretrained(
    model_type='ultralytics',
    model_path='yolo11n.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# MMDetection
model = AutoDetectionModel.from_pretrained(
    model_type='mmdet',
    model_path='cascade_mask_rcnn.pth',
    config_path='configs/cascade_rcnn_r50_fpn.py',
    confidence_threshold=0.3,
    device='cuda:0'
)

# HuggingFace Transformers
model = AutoDetectionModel.from_pretrained(
    model_type='huggingface',
    model_path='facebook/detr-resnet-50',
    confidence_threshold=0.3,
    device='cuda:0'
)

# TorchVision
model = AutoDetectionModel.from_pretrained(
    model_type='torchvision',
    model_path='fasterrcnn_resnet50_fpn',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# RT-DETR
model = AutoDetectionModel.from_pretrained(
    model_type='rtdetr',
    model_path='rtdetr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLOv5
model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='yolov5s.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# YOLO-World (open vocabulary)
model = AutoDetectionModel.from_pretrained(
    model_type='yolo-world',
    model_path='yolov8s-world.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# Roboflow (RF-DETR models)
model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# YOLOE (open-vocabulary detection and segmentation)
model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)

# All models share the same prediction interface
from sahi.predict import get_sliced_prediction

result = get_sliced_prediction(image="test.jpg", detection_model=model)
```

### Roboflow RF-DETR Models

Use Roboflow RF-DETR models from the Roboflow Universe.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load RF-DETR model from Roboflow Universe
detection_model = AutoDetectionModel.from_pretrained(
    model_type='roboflow',
    model_path='rf-detr-x.pt',
    confidence_threshold=0.3,
    device='cuda:0',
    load_at_init=True
)

# Perform sliced inference
result = get_sliced_prediction(
    image="large_image.jpg",
    detection_model=detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2
)

# Access predictions
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f}")
```

### YOLOE Open-Vocabulary Detection

YOLOE enables open-vocabulary detection with text prompts, visual prompts, or prompt-free mode.
```python
from sahi import AutoDetectionModel
from sahi.predict import get_prediction

# Load YOLOE model
detection_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg.pt',  # or yoloe-11s-seg.pt, yoloe-11m-seg.pt
    confidence_threshold=0.3,
    device='cuda:0'
)

# Set custom text prompts for specific classes
detection_model.model.set_classes(
    ["person", "car", "traffic light", "bicycle"],
    detection_model.model.get_text_pe(["person", "car", "traffic light", "bicycle"])
)

# Perform detection with custom classes
result = get_prediction(
    image="street.jpg",
    detection_model=detection_model
)

# Process results
for pred in result.object_prediction_list:
    print(f"{pred.category.name}: {pred.score.value:.2f} at {pred.bbox.to_xyxy()}")

# For prompt-free mode, use models ending with '-pf.pt'
# These use an internal vocabulary with 1200+ categories
pf_model = AutoDetectionModel.from_pretrained(
    model_type='yoloe',
    model_path='yoloe-11l-seg-pf.pt',
    confidence_threshold=0.3,
    device='cuda:0'
)
```

## Summary

SAHI excels at detecting small objects in large images through its innovative sliced inference approach, making it ideal for satellite imagery, medical imaging, aerial photography, high-resolution document analysis, and surveillance applications. The library seamlessly handles the complexity of dividing images into overlapping tiles, running inference on each tile, and intelligently merging predictions using sophisticated postprocessing algorithms (GREEDYNMM, NMS, LSNMS) that eliminate duplicate detections while preserving accuracy.

Integration is straightforward regardless of your workflow: use the Python API for programmatic control in notebooks and scripts, leverage the CLI for batch processing and production pipelines, or combine SAHI with visualization tools like FiftyOne for interactive analysis and debugging. The framework-agnostic design means you can use your existing trained models from Ultralytics (including YOLO12), MMDetection, HuggingFace, TorchVision, Roboflow (RF-DETR), YOLOE, and YOLO-World without modification.

With support for open-vocabulary detection (YOLOE, YOLO-World), comprehensive COCO dataset utilities, automatic slicing parameter calculation, progress tracking, video support, and extensive export options (visualizations, pickles, crops, COCO JSON), SAHI provides a complete toolkit for production object detection systems handling large-scale visual data.