# deep-sort-realtime

`deep-sort-realtime` (v1.3.2) is a real-time adaptation of the Deep SORT (Simple Online and Realtime Tracking with a Deep Association Metric) algorithm for multi-object tracking in video streams. It extends the [original Deep SORT repo](https://github.com/nwojke/deep_sort) by removing offline/academic-style processing in favour of a frame-by-frame update API suited to live detection pipelines. The library takes raw bounding-box detections from any object detector, associates them across frames using a combination of Kalman-filter motion prediction and deep appearance features, and returns a stable list of tracked objects with persistent IDs.

At its core, the library couples a Kalman-filter-based `Tracker` with pluggable appearance embedders (PyTorch MobileNetV2, TorchReID, CLIP, or TensorFlow MobileNetV2). The embedder computes an appearance feature for each detection crop, and features are compared by cosine distance. Detections are matched to existing tracks via a cascaded matching strategy that first uses appearance features and then falls back to IoU. Tracks pass through a tentative → confirmed → deleted lifecycle and only surface as confirmed after `n_init` consecutive hits. Extra features include polygon-shaped detections, per-detection supplementary data passthrough, background-masked embedding, daily track-ID resets, and an `override_track_class` hook for supplying a custom `Track` subclass.
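
The track lifecycle can be pictured as a small state machine. The sketch below is illustrative only and is not the library's code: `SketchTrack`, `mark_hit`, and `mark_missed` are hypothetical names, and dropping a tentative track on its first miss is an assumption taken from the original Deep SORT design.

```python
from dataclasses import dataclass

TENTATIVE, CONFIRMED, DELETED = "tentative", "confirmed", "deleted"


@dataclass
class SketchTrack:
    """Hypothetical stand-in that models only the lifecycle counters."""
    n_init: int = 3
    max_age: int = 30
    hits: int = 1               # the detection that created the track counts as a hit
    time_since_update: int = 0
    state: str = TENTATIVE

    def mark_hit(self) -> None:
        """A detection was matched to this track in the current frame."""
        self.hits += 1
        self.time_since_update = 0
        if self.state == TENTATIVE and self.hits >= self.n_init:
            self.state = CONFIRMED

    def mark_missed(self) -> None:
        """No detection was matched to this track in the current frame."""
        self.time_since_update += 1
        # Assumption: tentative tracks die on the first miss,
        # confirmed tracks die once they exceed max_age.
        if self.state == TENTATIVE or self.time_since_update > self.max_age:
            self.state = DELETED


track = SketchTrack(n_init=3, max_age=2)
track.mark_hit()                    # second consecutive hit, still tentative
state_after_two = track.state
track.mark_hit()                    # third consecutive hit, now confirmed
state_after_three = track.state
for _ in range(3):                  # three misses exceed max_age=2
    track.mark_missed()
final_state = track.state
```

In this sketch a larger `n_init` delays confirmation, while a larger `max_age` keeps a confirmed track alive through longer gaps in detection.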

---

## Installation

```bash
# From PyPI (recommended)
pip install deep-sort-realtime

# From source
git clone https://github.com/levan92/deep_sort_realtime
cd deep_sort_realtime && pip install .

# Optional embedder backends
pip install torch torchvision                          # PyTorch MobileNetV2 (default)
pip install torchreid gdown tensorboard               # TorchReID person-ReID models
pip install git+https://github.com/openai/CLIP.git    # CLIP embedder
pip install tensorflow                                 # TensorFlow MobileNetV2
```

---

## `DeepSort.__init__` — Create a tracker instance

Instantiates the multi-target tracker with all hyperparameters and selects an appearance embedder. The embedder is loaded once at construction time and warmed up automatically.

```python
from datetime import datetime
from deep_sort_realtime.deepsort_tracker import DeepSort

# --- Minimal usage: default MobileNetV2 embedder ---
tracker = DeepSort(max_age=30)

# --- Full parameter control ---
tracker = DeepSort(
    max_iou_distance=0.7,       # gating threshold on IoU cost; associations with a larger cost are ignored
    max_age=30,                 # frames a track survives without a detection match
    n_init=3,                   # consecutive hits needed to confirm a new track
    nms_max_overlap=1.0,        # NMS threshold (1.0 = NMS disabled)
    max_cosine_distance=0.2,    # cosine-distance threshold for appearance matching
    nn_budget=100,              # max stored appearance features per track (None = unlimited)
    gating_only_position=False, # True → gate only on (x,y); False → gate on (x,y,a,h)
    override_track_class=None,  # supply a Track subclass for custom per-track logic
    embedder="mobilenet",       # one of: mobilenet | torchreid | clip_RN50 | clip_RN101 |
                                #         clip_RN50x4 | clip_RN50x16 | clip_ViT-B/32 | clip_ViT-B/16
    half=True,                  # FP16 inference (CUDA only, mobilenet embedder)
    bgr=True,                   # True if frames are BGR (OpenCV default)
    embedder_gpu=True,          # run embedder on GPU
    embedder_model_name=None,   # torchreid: model name from model zoo
    embedder_wts=None,          # explicit path to embedder weights file
    polygon=False,              # True → detections are polygons, not axis-aligned BBs
    today=datetime.now().date() # supply date to enable daily track-ID resets
)
```
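
To make the `max_iou_distance` gate more concrete, here is a minimal sketch that treats the IoU cost as `1 - IoU` between two `[left, top, width, height]` boxes. That cost definition is an assumption carried over from the original Deep SORT, and `iou_ltwh` and `passes_iou_gate` are hypothetical helpers, not library functions.

```python
def iou_ltwh(box_a, box_b):
    """Intersection-over-union of two [left, top, width, height] boxes."""
    al, at, aw, ah = box_a
    bl, bt, bw, bh = box_b
    inter_w = max(0.0, min(al + aw, bl + bw) - max(al, bl))
    inter_h = max(0.0, min(at + ah, bt + bh) - max(at, bt))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def passes_iou_gate(track_box, det_box, max_iou_distance=0.7):
    """True when the IoU cost (1 - IoU) is within the gate."""
    return (1.0 - iou_ltwh(track_box, det_box)) <= max_iou_distance


track_box = [100, 100, 50, 100]
near_det = [110, 100, 50, 100]    # same size, shifted 10 px to the right
far_det = [300, 300, 50, 100]     # no overlap at all

near_ok = passes_iou_gate(track_box, near_det)
far_ok = passes_iou_gate(track_box, far_det)
```

Under this reading, lowering `max_iou_distance` demands more overlap before a detection can be associated with a track by IoU.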

---

## `DeepSort.update_tracks` — Run tracker for one frame

The primary API call. Accepts raw detections and either a frame image (for built-in embedding) or pre-computed feature vectors, then returns the current list of all active `Track` objects.

```python
import cv2
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=5, embedder="mobilenet", embedder_gpu=False)

cap = cv2.VideoCapture("video.mp4")

while cap.isOpened():
    ret, frame = cap.read()  # BGR frame, shape (H, W, 3)
    if not ret:
        break

    # raw_detections: list of ([left, top, width, height], confidence, class_name)
    raw_detections = [
        ([120, 80,  60, 120], 0.92, "person"),
        ([300, 150, 80, 160], 0.85, "person"),
        ([500, 200, 40,  80], 0.70, "car"),
    ]

    tracks = tracker.update_tracks(raw_detections, frame=frame)

    for track in tracks:
        if not track.is_confirmed():
            continue  # skip tentative tracks

        track_id  = track.track_id          # e.g. "1", "2", or "2024-01-15_1" with today=
        ltrb      = track.to_ltrb()         # [left, top, right, bottom] — Kalman predicted
        ltwh      = track.to_ltwh()         # [left, top, width, height]
        det_class = track.get_det_class()   # "person" / "car" / None
        det_conf  = track.get_det_conf()    # float or None if no match this frame

        l, t, r, b = [int(v) for v in ltrb]
        cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
        cv2.putText(frame, f"ID:{track_id} {det_class}", (l, t - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cap.release()
```
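
Many detectors report boxes as corner coordinates `[x1, y1, x2, y2]` rather than `[left, top, width, height]`. The sketch below shows one way to convert such output into the tuple format that `update_tracks` expects. `xyxy_to_raw_detections` and its `min_conf` parameter are hypothetical and not part of the library.

```python
def xyxy_to_raw_detections(boxes_xyxy, confidences, class_names, min_conf=0.0):
    """Convert corner-format boxes to ([left, top, width, height], conf, class) tuples.

    Boxes with non-positive width or height, or with a confidence below
    min_conf, are skipped.
    """
    raw_detections = []
    for (x1, y1, x2, y2), conf, name in zip(boxes_xyxy, confidences, class_names):
        width, height = x2 - x1, y2 - y1
        if conf < min_conf or width <= 0 or height <= 0:
            continue
        raw_detections.append(([x1, y1, width, height], float(conf), name))
    return raw_detections


boxes = [(120, 80, 180, 200), (300, 150, 380, 310), (10, 10, 10, 40)]
confs = [0.92, 0.85, 0.50]
names = ["person", "person", "car"]

raw_detections = xyxy_to_raw_detections(boxes, confs, names)
```

The third box has zero width, so it is dropped; the remaining two match the first two hard-coded detections in the example above.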

---

## `DeepSort.update_tracks` with pre-computed embeddings — External embedder

When you have your own ReID model, pass feature vectors directly via `embeds` and omit `frame`.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

# embedder=None → library will not load any built-in embedder
tracker = DeepSort(max_age=10, embedder=None)

# Simulated inputs from your own detector + embedder
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
raw_detections = [
    ([50,  30, 80, 160], 0.91, "person"),
    ([200, 60, 70, 150], 0.88, "person"),
]

# Your own embedder produces a 512-d (or any-d) feature per detection crop.
# Tracks start out tentative, so feed several frames before expecting confirmed
# tracks (the default n_init is 3).
for frame_idx in range(3):
    your_embeddings = [np.random.rand(512).astype(np.float32) for _ in raw_detections]
    tracks = tracker.update_tracks(raw_detections, embeds=your_embeddings)

for track in tracks:
    if track.is_confirmed():
        print(f"Track {track.track_id}: {track.to_ltrb()}")
# After the third frame both tracks are confirmed and print their
# [left, top, right, bottom] boxes.
```

---

## `DeepSort.update_tracks` with polygon detections

When `polygon=True` is set at construction, detections are passed as a triplet of (polygons, classes, confidences). The polygon's bounding rectangle is used for tracking; if embedding is enabled the polygon area masks the crop so only foreground pixels feed the embedder.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, polygon=True, embedder="mobilenet", embedder_gpu=False)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)

# Polygon format: list of [x1,y1,x2,y2,...] flat arrays
polygons    = [[100, 100, 150, 80, 200, 120, 160, 180, 110, 170],   # pentagon
               [300, 200, 360, 195, 370, 260, 295, 265]]            # quadrilateral
classes     = ["person", "car"]
confidences = [0.89, 0.76]

# raw_detections for polygon mode: [polygons_list, classes_list, confidences_list]
# Tracks start out tentative, so run a few frames before filtering on
# is_confirmed() (the default n_init is 3). The triplet is rebuilt for every frame.
for frame_idx in range(3):
    raw_detections = [polygons, classes, confidences]
    tracks = tracker.update_tracks(raw_detections, frame=frame)

for track in tracks:
    if track.is_confirmed():
        polygon_coords = track.get_det_supplementary()  # original polygon stored here
        print(f"Track {track.track_id} | bbox {track.to_ltrb()} | polygon {polygon_coords}")
```
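
As a rough illustration of how a flat polygon maps to the rectangle that is tracked, the sketch below computes an axis-aligned bounding box from the `[x1, y1, x2, y2, ...]` layout. `polygon_to_ltwh` is a hypothetical helper; the library's own rounding and pixel conventions may differ from this plain min/max version.

```python
def polygon_to_ltwh(flat_polygon):
    """Axis-aligned [left, top, width, height] box around a flat [x1, y1, x2, y2, ...] polygon."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("expected at least three (x, y) pairs")
    left, top = min(xs), min(ys)
    return [left, top, max(xs) - left, max(ys) - top]


pentagon = [100, 100, 150, 80, 200, 120, 160, 180, 110, 170]
quadrilateral = [300, 200, 360, 195, 370, 260, 295, 265]

pentagon_box = polygon_to_ltwh(pentagon)
quadrilateral_box = polygon_to_ltwh(quadrilateral)
```

The two polygons are the ones from the example above, so the resulting boxes give a feel for the regions the tracker would follow.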

---

## `DeepSort.update_tracks` with supplementary detection data

Arbitrary per-detection payloads (e.g. segmentation masks, metadata dicts) are forwarded to the associated track and retrievable via `Track.get_det_supplementary()`.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, embedder="mobilenet", embedder_gpu=False)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

raw_detections = [
    ([50,  30, 80, 160], 0.91, "person"),
    ([200, 60, 70, 150], 0.78, "person"),
]

# Supplementary info — one entry per detection (any Python object)
others = [
    {"reid_score": 0.95, "zone": "entry"},
    {"reid_score": 0.80, "zone": "exit"},
]

tracks = tracker.update_tracks(raw_detections, frame=frame, others=others)

for track in tracks:
    supp = track.get_det_supplementary()  # None if no match this frame
    if supp is not None:
        print(f"Track {track.track_id} zone={supp['zone']}, reid={supp['reid_score']}")
```

---

## `DeepSort.update_tracks` with instance masks (background masking)

Boolean instance masks suppress background pixels before the crop reaches the embedder, reducing background bias. One mask per detection, same spatial size as the full frame.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, embedder="mobilenet", embedder_gpu=False)
H, W = 720, 1280
frame = np.random.randint(0, 255, (H, W, 3), dtype=np.uint8)

raw_detections = [([100, 50, 120, 200], 0.90, "person")]

# Boolean mask — True where foreground (object pixels), False = background
mask = np.zeros((H, W), dtype=bool)
mask[50:250, 100:220] = True   # roughly cover the bounding box

tracks = tracker.update_tracks(
    raw_detections,
    frame=frame,
    instance_masks=[mask],
)

for track in tracks:
    inst_mask = track.get_instance_mask()  # stored mask (None if no match this frame)
    print(f"Track {track.track_id} | has_mask={inst_mask is not None}")
```
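
Because each mask covers the full frame, it has to be cropped with the same box as the image before it can suppress background pixels. The sketch below illustrates that idea on a tiny single-channel grid using plain Python lists. `crop_region`, `mask_background`, and the `fill` value are hypothetical, and the library itself operates on NumPy image arrays rather than nested lists.

```python
def crop_region(grid, ltwh):
    """Cut a [left, top, width, height] window out of a 2-D grid (list of rows)."""
    left, top, width, height = ltwh
    return [row[left:left + width] for row in grid[top:top + height]]


def mask_background(frame, mask, ltwh, fill=0):
    """Crop frame and mask with the same box, then replace background pixels with fill."""
    crop = crop_region(frame, ltwh)
    crop_mask = crop_region(mask, ltwh)
    return [
        [pixel if keep else fill for pixel, keep in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(crop, crop_mask)
    ]


# 4x4 "frame" whose pixel value encodes its position: 10 * row + column
frame = [[10 * r + c for c in range(4)] for r in range(4)]

# Full-frame mask: only column 1 of rows 1 and 2 is foreground
mask = [[False] * 4 for _ in range(4)]
mask[1][1] = mask[2][1] = True

masked_crop = mask_background(frame, mask, ltwh=[1, 1, 2, 2])
```

Only the foreground column survives in the crop; everything else is replaced by the fill value before it would reach an embedder.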

---

## `Track.to_ltrb` / `Track.to_ltwh` — Bounding box retrieval

Both methods return the Kalman-filter estimate by default. With `orig=True` they return the box of the raw detection associated with the track in the current frame, falling back to the Kalman estimate when no detection was matched. Adding `orig_strict=True` returns `None` in that unmatched case instead of falling back.

```python
for track in tracks:
    if not track.is_confirmed():
        continue

    # Kalman-filter predicted position (always available)
    kf_ltrb = track.to_ltrb()                          # [l, t, r, b]
    kf_ltwh = track.to_ltwh()                          # [l, t, w, h]

    # Original detection bbox when a detection was matched this frame
    orig_ltrb = track.to_ltrb(orig=True)               # falls back to KF if unmatched
    orig_ltwh = track.to_ltwh(orig=True)

    # Strict: returns None when track has no detection match this frame
    strict_ltrb = track.to_ltrb(orig=True, orig_strict=True)
    if strict_ltrb is not None:
        l, t, r, b = strict_ltrb
        print(f"Track {track.track_id} detection-only bbox: ({l},{t}) → ({r},{b})")
    else:
        print(f"Track {track.track_id} coasting (no detection this frame)")
```
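
The interaction between `orig` and `orig_strict` can be summarised as a small decision function. This is a sketch of the behaviour described above, not the library's implementation; `select_ltwh` and `ltwh_to_ltrb` are hypothetical helpers operating on plain lists.

```python
def ltwh_to_ltrb(ltwh):
    """Convert [left, top, width, height] to [left, top, right, bottom]."""
    left, top, width, height = ltwh
    return [left, top, left + width, top + height]


def select_ltwh(kalman_ltwh, detection_ltwh, orig=False, orig_strict=False):
    """Pick which box to report for a track.

    detection_ltwh is None when no detection was matched this frame.
    """
    if orig:
        if detection_ltwh is not None:
            return list(detection_ltwh)
        if orig_strict:
            return None          # strict mode: no detection, no box
    return list(kalman_ltwh)     # default and non-strict fallback


kalman_box = [98.5, 50.2, 40.0, 80.0]
detection_box = [100, 50, 40, 80]

default_box = select_ltwh(kalman_box, detection_box)
matched_orig = select_ltwh(kalman_box, detection_box, orig=True)
coasting_orig = select_ltwh(kalman_box, None, orig=True)
coasting_strict = select_ltwh(kalman_box, None, orig=True, orig_strict=True)
```

In this sketch `orig_strict` only has an effect together with `orig=True`, which mirrors how the two flags are combined in the example above.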

---

## `Track` state and metadata accessors

Full set of read-only attributes and methods exposed by every `Track` object returned from `update_tracks`.

```python
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(max_age=5, embedder="mobilenet", embedder_gpu=False)
frame   = np.zeros((480, 640, 3), dtype=np.uint8)
dets    = [([10, 10, 50, 80], 0.95, "person"), ([200, 100, 60, 90], 0.80, "bicycle")]
tracks  = tracker.update_tracks(dets, frame=frame)

for t in tracks:
    # Lifecycle state
    print(t.is_tentative())   # True for new tracks not yet confirmed
    print(t.is_confirmed())   # True after n_init consecutive hits
    print(t.is_deleted())     # True when track is stale (never returned after deletion)

    # Identity & timing
    print(t.track_id)             # unique str ID, e.g. "1" or "2024-06-01_3"
    print(t.hits)                 # total measurement updates
    print(t.age)                  # total frames since first occurrence
    print(t.time_since_update)    # frames since last detection match (0 = matched this frame)

    # Detection-associated data (reset to None each predict step, repopulated on match)
    print(t.get_det_class())         # class name string or None
    print(t.get_det_conf())          # float confidence or None
    print(t.get_instance_mask())     # boolean mask array or None
    print(t.get_det_supplementary()) # custom payload or None
    print(t.get_feature())           # latest appearance feature vector (np.ndarray)
```
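
A common use of these accessors is to decide which tracks to draw. The sketch below filters for confirmed tracks that were matched recently. `StubTrack` is a hypothetical stand-in that mimics only `track_id`, `is_confirmed()`, and `time_since_update`, and the `max_coast_frames` policy is an example choice rather than library behaviour.

```python
from dataclasses import dataclass


@dataclass
class StubTrack:
    """Hypothetical stand-in exposing a few of the accessors listed above."""
    track_id: str
    confirmed: bool
    time_since_update: int

    def is_confirmed(self) -> bool:
        return self.confirmed


def drawable_tracks(tracks, max_coast_frames=1):
    """Keep confirmed tracks that were matched within the last max_coast_frames frames."""
    return [
        t for t in tracks
        if t.is_confirmed() and t.time_since_update <= max_coast_frames
    ]


tracks = [
    StubTrack("1", confirmed=True, time_since_update=0),   # matched this frame
    StubTrack("2", confirmed=False, time_since_update=0),  # still tentative
    StubTrack("3", confirmed=True, time_since_update=4),   # coasting for 4 frames
]

drawable_ids = [t.track_id for t in drawable_tracks(tracks)]
```

The same function should work on real `Track` objects, since it relies only on `is_confirmed()` and `time_since_update`.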

---

## Custom `Track` subclass — `override_track_class`

Inject application-specific logic (e.g. activity counters, zone triggers) by subclassing `Track` and passing it at construction.

```python
from deep_sort_realtime.deep_sort.track import Track
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

class MyTrack(Track):
    """Extended track that counts how many frames it spent in a region of interest."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.roi_frame_count = 0

    def check_roi(self, roi_ltrb):
        """Call each frame after update_tracks to accumulate ROI dwell time."""
        l, t, r, b = self.to_ltrb()
        rl, rt, rr, rb = roi_ltrb
        in_roi = l >= rl and t >= rt and r <= rr and b <= rb
        if in_roi:
            self.roi_frame_count += 1
        return in_roi

tracker = DeepSort(max_age=30, override_track_class=MyTrack, embedder_gpu=False)

frame  = np.zeros((720, 1280, 3), dtype=np.uint8)
roi    = (100, 100, 500, 400)  # [left, top, right, bottom]

for _ in range(10):  # simulate 10 frames
    dets = [([150, 120, 80, 160], 0.92, "person")]
    tracks = tracker.update_tracks(dets, frame=frame)
    for track in tracks:
        if track.is_confirmed():
            in_roi = track.check_roi(roi)
            print(f"Track {track.track_id}: ROI dwell={track.roi_frame_count} frames, in_roi={in_roi}")
```

---

## Daily track-ID reset with `today`

Providing a date object enables date-prefixed track IDs (e.g. `"2024-06-01_1"`) and resets the counter each calendar day, preventing ID overflow in long-running deployments.

```python
from datetime import datetime
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

today = datetime.now().date()
tracker = DeepSort(max_age=30, nn_budget=100, today=today, embedder_gpu=False)

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
dets  = [([0, 0, 50, 50], 0.9, "person"), ([100, 100, 50, 50], 0.85, "person")]

tracks = tracker.update_tracks(dets, frame=frame, today=datetime.now().date())
for track in tracks:
    print(track.track_id)
# Example output:
# 2024-06-01_1
# 2024-06-01_2
```
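
The ID scheme described above can be sketched as a counter that restarts whenever the date changes. `DailyIdSketch` and `next_track_id` are hypothetical names used purely for illustration; the library keeps this logic inside its tracker.

```python
from datetime import date


class DailyIdSketch:
    """Hypothetical ID generator: date-prefixed IDs with a counter that resets daily."""

    def __init__(self, today=None):
        self.today = today
        self._next_id = 1

    def next_track_id(self, today=None):
        # Restart the counter when date prefixes are enabled and the day has changed
        if self.today is not None and today is not None and today != self.today:
            self.today = today
            self._next_id = 1
        prefix = f"{self.today}_" if self.today is not None else ""
        track_id = f"{prefix}{self._next_id}"
        self._next_id += 1
        return track_id


ids = DailyIdSketch(today=date(2024, 6, 1))
first = ids.next_track_id(today=date(2024, 6, 1))
second = ids.next_track_id(today=date(2024, 6, 1))
after_midnight = ids.next_track_id(today=date(2024, 6, 2))

plain = DailyIdSketch().next_track_id()   # no date supplied: plain counter
```

Because the date is part of the ID, tracks from different days stay distinguishable even though the counter restarts at 1.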

---

## `DeepSort.delete_all_tracks` — Reset tracker state

Wipes all active tracks and resets the internal ID counter to 1. Useful when switching scenes or video streams.

```python
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(max_age=10, embedder_gpu=False)
frame   = np.zeros((480, 640, 3), dtype=np.uint8)

tracker.update_tracks([([10, 10, 50, 80], 0.9, "person")], frame=frame)
print(f"Active tracks before reset: {len(tracker.tracker.tracks)}")  # 1

tracker.delete_all_tracks()
print(f"Active tracks after reset:  {len(tracker.tracker.tracks)}")  # 0

# IDs restart from 1 on the next update
tracks = tracker.update_tracks([([10, 10, 50, 80], 0.9, "person")], frame=frame)
print(tracks[0].track_id)  # "1"
```

---

## `MobileNetv2_Embedder.predict` — Standalone PyTorch embedder

The default appearance embedder can be used independently of the tracker to produce 1280-dimensional feature vectors from arbitrary image crops.

```python
import cv2
from scipy.spatial.distance import cosine

from deep_sort_realtime.embedder.embedder_pytorch import MobileNetv2_Embedder

embedder = MobileNetv2_Embedder(half=False, max_batch_size=16, bgr=True, gpu=False)

img1 = cv2.imread("test/smallapple.jpg")   # BGR image
img2 = cv2.imread("test/rock.jpg")

features = embedder.predict([img1, img2])  # list of np.ndarray, shape (1280,)
print(f"Feature dim: {features[0].shape}")  # (1280,)

# Cosine distance between two feature vectors
dist = cosine(features[0], features[1])
print(f"Cosine distance (apple vs rock): {dist:.4f}")  # e.g. 0.4410, i.e. dissimilar
```
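
For readers who want to see what the cosine distance measures, here is a dependency-free sketch. `cosine_distance` and `is_appearance_match` are hypothetical helpers; the threshold simply reuses the `max_cosine_distance=0.2` value shown in the constructor example, and the tracker's own matching logic is more involved than a single comparison.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors (0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def is_appearance_match(feature_a, feature_b, max_cosine_distance=0.2):
    """Illustrative gate: features closer than the threshold count as the same object."""
    return cosine_distance(feature_a, feature_b) <= max_cosine_distance


same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # scaled copy
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])                 # unrelated
```

Scaling a feature vector does not change its cosine distance to another vector, so only the direction of the embedding matters for this comparison.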

---

## `TorchReID_Embedder` — Person re-identification embedder

Uses [Torchreid](https://github.com/KaiyangZhou/deep-person-reid)'s model zoo for person-ReID feature extraction. Default model is `osnet_ain_x1_0` with domain-generalised weights bundled in the package.

```python
# pip install torchreid gdown tensorboard
from deep_sort_realtime.deepsort_tracker import DeepSort
import cv2

tracker = DeepSort(
    max_age=30,
    embedder="torchreid",
    embedder_model_name="osnet_ain_x1_0",   # default; see torchreid model zoo for others
    embedder_wts=None,                       # None → bundled osnet weights
    embedder_gpu=False,
    bgr=True,
)

frame = cv2.imread("test/smallapple.jpg")
dets  = [([0, 0, frame.shape[1]//2, frame.shape[0]], 0.9, "person")]

tracks = tracker.update_tracks(dets, frame=frame)
for track in tracks:
    print(f"Track {track.track_id}: feature_dim={track.get_feature().shape}")
# Track 1: feature_dim=(512,)
```

---

## `Clip_Embedder` — CLIP-based appearance embedder

Uses the [OpenAI CLIP](https://github.com/openai/CLIP) image encoder as the appearance feature extractor. The feature dimensionality depends on the chosen variant (for example, 1024-d for `RN50` and 512-d for `ViT-B/32`). Effective for general object categories beyond persons.

```python
# pip install git+https://github.com/openai/CLIP.git
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(
    max_age=30,
    embedder="clip_ViT-B/32",   # options: clip_RN50 | clip_RN101 | clip_RN50x4 |
                                 #          clip_RN50x16 | clip_ViT-B/32 | clip_ViT-B/16
    embedder_wts=None,           # None → auto-download or look in embedder/weights/
    embedder_gpu=False,
    bgr=True,
)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
dets  = [([50, 50, 100, 150], 0.88, "cat"), ([300, 100, 80, 120], 0.79, "dog")]

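# Tracks start out tentative; with the default n_init of 3 they are confirmed on the third frame
for frame_idx in range(3):
    tracks = tracker.update_tracks(dets, frame=frame)

for track in tracks:
    if track.is_confirmed():
        print(f"Track {track.track_id} | class={track.get_det_class()} | feat_dim={track.get_feature().shape}")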
```

---

## Running the test suite

```bash
# From the repository root
python3 -m unittest
```

---

## Summary

`deep-sort-realtime` is a drop-in multi-object tracker for real-time Python pipelines. Its single-call API (`update_tracks`) works with any detector that emits bounding boxes, such as YOLO, Detectron2, or MMDetection. It needs only a list of `([l, t, w, h], confidence, class)` tuples and either a raw frame or pre-computed embeddings. The returned `Track` objects carry stable IDs across frames, Kalman-estimated and original-detection bounding boxes in `ltrb` and `ltwh` formats, class labels, detection confidence, and arbitrary supplementary payloads. These make it straightforward to build downstream logic such as trajectory analysis, counting, zone alerts, and activity recognition on top of the tracker output.

Integration patterns range from the one-liner `DeepSort(max_age=5)` with the bundled MobileNetV2 embedder to fully custom setups. A custom setup can plug in a domain-specific ReID model via `embedder=None` plus `embeds=`, replace the `Track` class through `override_track_class` for per-track state, enable daily ID resets via `today=`, and pass instance masks for background-masked appearance features. The library installs from PyPI (`pip install deep-sort-realtime`), and the heavier optional dependencies are needed only for the embedder backend you choose.