Deep SORT Realtime
https://github.com/levan92/deep_sort_realtime
# deep-sort-realtime

`deep-sort-realtime` (v1.3.2) is a real-time adaptation of the Deep SORT (Simple Online and Realtime Tracking with a Deep Association Metric) algorithm for multi-object tracking in video streams. It extends the [original Deep SORT repo](https://github.com/nwojke/deep_sort) by removing offline/academic-style processing in favour of a frame-by-frame update API suited to live detection pipelines. The library takes raw bounding-box detections from any object detector, associates them across frames using a combination of Kalman-filter motion prediction and deep appearance features, and returns a stable list of tracked objects with persistent IDs.

At its core the library couples a Kalman-filter-based `Tracker` with pluggable appearance embedders (PyTorch MobileNetV2, TorchReID, CLIP, or TensorFlow MobileNetV2) to compute cosine-distance appearance features for each detection crop. Detections are matched to existing tracks via a cascaded matching strategy that first uses appearance features and then falls back to IoU. Tracks pass through a tentative → confirmed → deleted lifecycle, only surfacing as confirmed after `n_init` consecutive hits. Extra features include polygon-shaped detections, per-detection supplementary data passthrough, background-masked embedding, daily track-ID resets, and a fully overridable `Track` subclass hook.

---

## Installation

```bash
# From PyPI (recommended)
pip install deep-sort-realtime

# From source
git clone https://github.com/levan92/deep_sort_realtime
cd deep_sort_realtime && pip install .

# Optional embedder backends
pip install torch torchvision                        # PyTorch MobileNetV2 (default)
pip install torchreid gdown tensorboard              # TorchReID person-ReID models
pip install git+https://github.com/openai/CLIP.git   # CLIP embedder
pip install tensorflow                               # TensorFlow MobileNetV2
```

---

## `DeepSort.__init__`: Create a tracker instance

Instantiates the multi-target tracker with all hyperparameters and selects an appearance embedder.
The embedder is loaded once at construction time and warmed up automatically.

```python
from datetime import datetime
from deep_sort_realtime.deepsort_tracker import DeepSort

# --- Minimal usage: default MobileNetV2 embedder ---
tracker = DeepSort(max_age=30)

# --- Full parameter control ---
tracker = DeepSort(
    max_iou_distance=0.7,         # IoU gating threshold; associations above this are ignored
    max_age=30,                   # frames a track survives without a detection match
    n_init=3,                     # consecutive hits needed to confirm a new track
    nms_max_overlap=1.0,          # NMS threshold (1.0 = NMS disabled)
    max_cosine_distance=0.2,      # cosine-distance threshold for appearance matching
    nn_budget=100,                # max stored appearance features per track (None = unlimited)
    gating_only_position=False,   # True → gate only on (x,y); False → gate on (x,y,a,h)
    override_track_class=None,    # supply a Track subclass for custom per-track logic
    embedder="mobilenet",         # one of: mobilenet | torchreid | clip_RN50 | clip_RN101 |
                                  # clip_RN50x4 | clip_RN50x16 | clip_ViT-B/32 | clip_ViT-B/16
    half=True,                    # FP16 inference (CUDA only, mobilenet embedder)
    bgr=True,                     # True if frames are BGR (OpenCV default)
    embedder_gpu=True,            # run embedder on GPU
    embedder_model_name=None,     # torchreid: model name from model zoo
    embedder_wts=None,            # explicit path to embedder weights file
    polygon=False,                # True → detections are polygons, not axis-aligned BBs
    today=datetime.now().date(),  # supply date to enable daily track-ID resets
)
```

---

## `DeepSort.update_tracks`: Run tracker for one frame

The primary API call. Accepts raw detections and either a frame image (for built-in embedding) or pre-computed feature vectors, then returns the current list of all active `Track` objects.
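Because the returned list includes tentative tracks as well as confirmed ones, it helps to keep the track lifecycle in mind when reading the results. The toy state machine below is a minimal sketch of how `n_init` and `max_age` interact. It assumes the library follows the lifecycle rules of the original Deep SORT implementation (a tentative track is dropped on its first miss); `ToyTrack` and its `step` method are hypothetical names for illustration and are not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ToyTrack:
    """Hypothetical stand-in for a track; NOT the library's Track class."""
    n_init: int = 3
    max_age: int = 30
    hits: int = 1                # the detection that created the track counts as a hit
    time_since_update: int = 0
    state: str = "tentative"

    def step(self, matched: bool) -> str:
        """Advance one frame; `matched` says whether a detection was associated."""
        if self.state == "deleted":
            return self.state
        if matched:
            self.hits += 1
            self.time_since_update = 0
            if self.state == "tentative" and self.hits >= self.n_init:
                self.state = "confirmed"
        else:
            self.time_since_update += 1
            # Assumption (original Deep SORT behaviour): a tentative track is
            # dropped on its first miss, while a confirmed track survives until
            # it has gone more than max_age frames without a match.
            if self.state == "tentative" or self.time_since_update > self.max_age:
                self.state = "deleted"
        return self.state


track = ToyTrack(n_init=3, max_age=2)
history = [track.step(m) for m in (True, True, False, False, False)]
print(history)
# ['tentative', 'confirmed', 'confirmed', 'confirmed', 'deleted']
```

With the default `n_init=3`, a new object therefore needs three consecutive matched frames before `is_confirmed()` returns `True`. The full per-frame loop with the real tracker follows.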
```python
import cv2
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=5, embedder="mobilenet", embedder_gpu=False)

cap = cv2.VideoCapture("video.mp4")
while cap.isOpened():
    ret, frame = cap.read()  # BGR frame, shape (H, W, 3)
    if not ret:
        break

    # raw_detections: list of ([left, top, width, height], confidence, class_name)
    raw_detections = [
        ([120, 80, 60, 120], 0.92, "person"),
        ([300, 150, 80, 160], 0.85, "person"),
        ([500, 200, 40, 80], 0.70, "car"),
    ]

    tracks = tracker.update_tracks(raw_detections, frame=frame)

    for track in tracks:
        if not track.is_confirmed():
            continue  # skip tentative tracks

        track_id = track.track_id          # e.g. "1", "2", or "2024-01-15_1" with today=
        ltrb = track.to_ltrb()             # [left, top, right, bottom], Kalman predicted
        ltwh = track.to_ltwh()             # [left, top, width, height]
        det_class = track.get_det_class()  # "person" / "car" / None
        det_conf = track.get_det_conf()    # float or None if no match this frame

        l, t, r, b = [int(v) for v in ltrb]
        cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
        cv2.putText(frame, f"ID:{track_id} {det_class}", (l, t - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cap.release()
```

---

## `DeepSort.update_tracks` with pre-computed embeddings: External embedder

When you have your own ReID model, pass feature vectors directly via `embeds` and omit `frame`.
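Whatever model produces the vectors, appearance matching compares them by cosine distance against `max_cosine_distance`, so the embedding dimension and scale should stay consistent from frame to frame. The snippet below is a minimal sketch of that comparison, assuming the nearest-neighbour cosine metric used by the original Deep SORT (the smallest distance to any stored feature of a track). `cosine_distance` and `appearance_match` are hypothetical helper names, not library functions.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def appearance_match(gallery, query, max_cosine_distance=0.2):
    """Return (best_distance, accepted) against a track's stored features.

    `gallery` plays the role of the per-track feature store that `nn_budget`
    bounds; the closest stored feature decides the match (an assumption based
    on the original Deep SORT metric).
    """
    best = min(cosine_distance(g, query) for g in gallery)
    return best, best <= max_cosine_distance


gallery = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]      # features stored for one track
same_dir = appearance_match(gallery, [2.0, 0.0, 0.0])    # same direction as the first feature
orthogonal = appearance_match(gallery, [0.0, 0.0, 1.0])  # unrelated appearance
print(same_dir[1], orthogonal[1])  # True False
```

Cosine distance ignores vector magnitude, which is why the scaled query still matches. The library call with externally computed embeddings looks like this: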
```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

# embedder=None → library will not load any built-in embedder
tracker = DeepSort(max_age=10, embedder=None)

# Simulated inputs from your own detector + embedder
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
raw_detections = [
    ([50, 30, 80, 160], 0.91, "person"),
    ([200, 60, 70, 150], 0.88, "person"),
]

# Your own embedder produces a 512-d (or any-d) feature per detection crop
your_embeddings = [np.random.rand(512).astype(np.float32) for _ in raw_detections]

# Call once per frame; a track is confirmed only after n_init (default 3)
# matched frames, so a single call prints nothing here.
tracks = tracker.update_tracks(raw_detections, embeds=your_embeddings)
for track in tracks:
    if track.is_confirmed():
        print(f"Track {track.track_id}: {track.to_ltrb()}")

# Once the tracks are confirmed, the output resembles (Kalman estimates, so approximate):
# Track 1: [50. 30. 130. 190.]
# Track 2: [200. 60. 270. 210.]
```

---

## `DeepSort.update_tracks` with polygon detections

When `polygon=True` is set at construction, detections are passed as a triplet of (polygons, classes, confidences). The polygon's bounding rectangle is used for tracking; if embedding is enabled, the polygon area masks the crop so only foreground pixels feed the embedder.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, polygon=True, embedder="mobilenet", embedder_gpu=False)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

# Polygon format: list of [x1,y1,x2,y2,...] flat arrays
polygons = [[100, 100, 150, 80, 200, 120, 160, 180, 110, 170],  # pentagon
            [300, 200, 360, 195, 370, 260, 295, 265]]           # quadrilateral
classes = ["person", "car"]
confidences = [0.89, 0.76]

# raw_detections for polygon mode: [polygons_list, classes_list, confidences_list]
raw_detections = [polygons, classes, confidences]

# As above, tracks are confirmed only after n_init matched frames
tracks = tracker.update_tracks(raw_detections, frame=frame)
for track in tracks:
    if track.is_confirmed():
        polygon_coords = track.get_det_supplementary()  # original polygon stored here
        print(f"Track {track.track_id} | bbox {track.to_ltrb()} | polygon {polygon_coords}")
```

---

## `DeepSort.update_tracks` with supplementary detection data

Arbitrary per-detection payloads (e.g. segmentation masks, metadata dicts) are forwarded to the associated track and retrievable via `Track.get_det_supplementary()`.

```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, embedder="mobilenet", embedder_gpu=False)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)

raw_detections = [
    ([50, 30, 80, 160], 0.91, "person"),
    ([200, 60, 70, 150], 0.78, "person"),
]

# Supplementary info: one entry per detection (any Python object)
others = [
    {"reid_score": 0.95, "zone": "entry"},
    {"reid_score": 0.80, "zone": "exit"},
]

tracks = tracker.update_tracks(raw_detections, frame=frame, others=others)
for track in tracks:
    supp = track.get_det_supplementary()  # None if no match this frame
    if supp is not None:
        print(f"Track {track.track_id} zone={supp['zone']}, reid={supp['reid_score']}")
```

---

## `DeepSort.update_tracks` with instance masks (background masking)

Boolean instance masks suppress background pixels before the crop reaches the embedder, reducing background bias. Pass one mask per detection, each with the same spatial size as the full frame.
```python
import numpy as np
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=10, embedder="mobilenet", embedder_gpu=False)

H, W = 720, 1280
frame = np.random.randint(0, 255, (H, W, 3), dtype=np.uint8)
raw_detections = [([100, 50, 120, 200], 0.90, "person")]

# Boolean mask: True where foreground (object pixels), False = background
mask = np.zeros((H, W), dtype=bool)
mask[50:250, 100:220] = True  # roughly cover the bounding box

tracks = tracker.update_tracks(
    raw_detections,
    frame=frame,
    instance_masks=[mask],
)
for track in tracks:
    inst_mask = track.get_instance_mask()  # stored mask (None if no match this frame)
    print(f"Track {track.track_id} | has_mask={inst_mask is not None}")
```

---

## `Track.to_ltrb` / `Track.to_ltwh`: Bounding box retrieval

Returns Kalman-predicted coordinates by default. Setting `orig=True` returns the coordinates of the raw detection associated this frame, falling back to the Kalman values when the track has no match; adding `orig_strict=True` returns `None` instead of the Kalman values in that case.

```python
# `tracks` is the list returned by tracker.update_tracks(...)
for track in tracks:
    if not track.is_confirmed():
        continue

    # Kalman-filter predicted position (always available)
    kf_ltrb = track.to_ltrb()  # [l, t, r, b]
    kf_ltwh = track.to_ltwh()  # [l, t, w, h]

    # Original detection bbox when matched this frame
    orig_ltrb = track.to_ltrb(orig=True)  # falls back to KF if unmatched
    orig_ltwh = track.to_ltwh(orig=True)

    # Strict: returns None when track has no detection match this frame
    strict_ltrb = track.to_ltrb(orig=True, orig_strict=True)
    if strict_ltrb is not None:
        l, t, r, b = strict_ltrb
        print(f"Track {track.track_id} detection-only bbox: ({l},{t}) → ({r},{b})")
    else:
        print(f"Track {track.track_id} coasting (no detection this frame)")
```

---

## `Track` state and metadata accessors

Full set of read-only attributes and methods exposed by every `Track` object returned from `update_tracks`.
```python
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(max_age=5, embedder="mobilenet", embedder_gpu=False)
frame = np.zeros((480, 640, 3), dtype=np.uint8)
dets = [([10, 10, 50, 80], 0.95, "person"), ([200, 100, 60, 90], 0.80, "bicycle")]
tracks = tracker.update_tracks(dets, frame=frame)

for t in tracks:
    # Lifecycle state
    print(t.is_tentative())  # True for new tracks not yet confirmed
    print(t.is_confirmed())  # True after n_init consecutive hits
    print(t.is_deleted())    # True when track is stale (never returned after deletion)

    # Identity & timing
    print(t.track_id)           # unique str ID, e.g. "1" or "2024-06-01_3"
    print(t.hits)               # total measurement updates
    print(t.age)                # total frames since first occurrence
    print(t.time_since_update)  # frames since last detection match (0 = matched this frame)

    # Detection-associated data (reset to None each predict step, repopulated on match)
    print(t.get_det_class())          # class name string or None
    print(t.get_det_conf())           # float confidence or None
    print(t.get_instance_mask())      # boolean mask array or None
    print(t.get_det_supplementary())  # custom payload or None
    print(t.get_feature())            # latest appearance feature vector (np.ndarray)
```

---

## Custom `Track` subclass: `override_track_class`

Inject application-specific logic (e.g. activity counters, zone triggers) by subclassing `Track` and passing it at construction.
```python
from deep_sort_realtime.deep_sort.track import Track
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np


class MyTrack(Track):
    """Extended track that counts how many frames it spent in a region of interest."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.roi_frame_count = 0

    def check_roi(self, roi_ltrb):
        """Call each frame after update_tracks to accumulate ROI dwell time."""
        l, t, r, b = self.to_ltrb()
        rl, rt, rr, rb = roi_ltrb
        in_roi = l >= rl and t >= rt and r <= rr and b <= rb
        if in_roi:
            self.roi_frame_count += 1
        return in_roi


tracker = DeepSort(max_age=30, override_track_class=MyTrack, embedder_gpu=False)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
roi = (100, 100, 500, 400)  # (left, top, right, bottom)

for _ in range(10):  # simulate 10 frames
    dets = [([150, 120, 80, 160], 0.92, "person")]
    tracks = tracker.update_tracks(dets, frame=frame)
    for track in tracks:
        if track.is_confirmed():
            in_roi = track.check_roi(roi)
            print(f"Track {track.track_id}: ROI dwell={track.roi_frame_count} frames, in_roi={in_roi}")
```

---

## Daily track-ID reset with `today`

Providing a date object enables date-prefixed track IDs (e.g. `"2024-06-01_1"`) and resets the counter each calendar day, so IDs do not grow without bound in long-running deployments.

```python
from datetime import datetime
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

today = datetime.now().date()
tracker = DeepSort(max_age=30, nn_budget=100, today=today, embedder_gpu=False)
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
dets = [([0, 0, 50, 50], 0.9, "person"), ([100, 100, 50, 50], 0.85, "person")]

tracks = tracker.update_tracks(dets, frame=frame, today=datetime.now().date())
for track in tracks:
    print(track.track_id)

# Example output (the prefix is the current date):
# 2024-06-01_1
# 2024-06-01_2
```

---

## `DeepSort.delete_all_tracks`: Reset tracker state

Wipes all active tracks and resets the internal ID counter to 1.
Useful when switching scenes or video streams.

```python
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(max_age=10, embedder_gpu=False)
frame = np.zeros((480, 640, 3), dtype=np.uint8)

tracker.update_tracks([([10, 10, 50, 80], 0.9, "person")], frame=frame)
print(f"Active tracks before reset: {len(tracker.tracker.tracks)}")  # 1

tracker.delete_all_tracks()
print(f"Active tracks after reset: {len(tracker.tracker.tracks)}")  # 0

# IDs restart from 1 on the next update
tracks = tracker.update_tracks([([10, 10, 50, 80], 0.9, "person")], frame=frame)
print(tracks[0].track_id)  # "1"
```

---

## `MobileNetv2_Embedder.predict`: Standalone PyTorch embedder

The default appearance embedder can be used independently of the tracker to produce 1280-dimensional feature vectors from arbitrary image crops.

```python
import cv2
import numpy as np
from scipy.spatial.distance import cosine  # requires scipy
from deep_sort_realtime.embedder.embedder_pytorch import MobileNetv2_Embedder

embedder = MobileNetv2_Embedder(half=False, max_batch_size=16, bgr=True, gpu=False)

img1 = cv2.imread("test/smallapple.jpg")  # BGR image
img2 = cv2.imread("test/rock.jpg")

features = embedder.predict([img1, img2])  # list of np.ndarray, shape (1280,)
print(f"Feature dim: {features[0].shape}")  # (1280,)

# Cosine distance between two feature vectors
dist = cosine(features[0], features[1])
print(f"Cosine distance (apple vs rock): {dist:.4f}")  # e.g. 0.4410, i.e. dissimilar
```

---

## `TorchReID_Embedder`: Person re-identification embedder

Uses [Torchreid](https://github.com/KaiyangZhou/deep-person-reid)'s model zoo for person-ReID feature extraction. The default model is `osnet_ain_x1_0`, with domain-generalised weights bundled in the package.
```python
# pip install torchreid gdown tensorboard
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np
import cv2

tracker = DeepSort(
    max_age=30,
    embedder="torchreid",
    embedder_model_name="osnet_ain_x1_0",  # default; see torchreid model zoo for others
    embedder_wts=None,                      # None → bundled osnet weights
    embedder_gpu=False,
    bgr=True,
)

frame = cv2.imread("test/smallapple.jpg")
dets = [([0, 0, frame.shape[1]//2, frame.shape[0]], 0.9, "person")]
tracks = tracker.update_tracks(dets, frame=frame)
for track in tracks:
    print(f"Track {track.track_id}: feature_dim={track.get_feature().shape}")
# Track 1: feature_dim=(512,)
```

---

## `Clip_Embedder`: CLIP-based appearance embedder

Uses the [OpenAI CLIP](https://github.com/openai/CLIP) image encoder as the appearance feature extractor. The feature dimensionality depends on the chosen CLIP model (for example, 1024 for `RN50` and 512 for `ViT-B/32`). Effective for general object categories beyond persons.

```python
# pip install git+https://github.com/openai/CLIP.git
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np

tracker = DeepSort(
    max_age=30,
    embedder="clip_ViT-B/32",  # options: clip_RN50 | clip_RN101 | clip_RN50x4 |
                               # clip_RN50x16 | clip_ViT-B/32 | clip_ViT-B/16
    embedder_wts=None,         # None → auto-download or look in embedder/weights/
    embedder_gpu=False,
    bgr=True,
)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
dets = [([50, 50, 100, 150], 0.88, "cat"), ([300, 100, 80, 120], 0.79, "dog")]

# Tracks are confirmed only after n_init (default 3) matched frames,
# so call update_tracks once per frame before expecting output here.
tracks = tracker.update_tracks(dets, frame=frame)
for track in tracks:
    if track.is_confirmed():
        print(f"Track {track.track_id} | class={track.get_det_class()} | feat_dim={track.get_feature().shape}")
```

---

## Running the test suite

```bash
# From the repository root
python3 -m unittest
```

---

## Summary

`deep-sort-realtime` is a drop-in multi-object tracker for real-time Python pipelines. The single-call API (`update_tracks`) integrates with any detector that emits bounding boxes (YOLO, Detectron2, MMDetection, etc.), requiring only a list of `([l, t, w, h], confidence, class)` tuples and either a raw frame or pre-computed embeddings. The returned `Track` objects carry stable IDs across frames, Kalman-predicted and detection-exact bounding boxes in multiple formats, per-class labels, detection confidence, and arbitrary supplementary payloads, making it straightforward to build downstream logic such as trajectory analysis, counting, zone alerts, and activity recognition on top of the tracker output.

Integration patterns range from the one-liner `DeepSort(max_age=5)` with the bundled MobileNetV2 embedder to fully custom setups that plug in a domain-specific ReID model via `embedder=None` plus manual `embeds=` passing, override the `Track` class with a `MyTrack` subclass for per-track state, enable daily ID resets via `today=`, and use instance masks for background masking to get cleaner appearance features. The library is installable from PyPI (`pip install deep-sort-realtime`), with the heavy optional dependencies gated behind the chosen embedder backend, which keeps the base install lightweight.
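Many detectors report boxes as corner coordinates `(x1, y1, x2, y2)` rather than the `[left, top, width, height]` layout that `update_tracks` expects, so a small adapter is usually the only glue code needed. The helper below is a minimal sketch of that conversion; `ltrb_to_raw_detections` and its `min_conf` parameter are hypothetical names introduced for illustration and are not part of the library.

```python
def ltrb_to_raw_detections(boxes, scores, class_names, min_conf=0.0):
    """Convert corner-format boxes into ([l, t, w, h], conf, cls) tuples.

    Hypothetical adapter: `boxes` holds (x1, y1, x2, y2) entries, and the three
    input sequences are assumed to be aligned index by index.
    """
    raw = []
    for (x1, y1, x2, y2), score, name in zip(boxes, scores, class_names):
        w, h = x2 - x1, y2 - y1
        if score < min_conf or w <= 0 or h <= 0:
            continue  # skip weak or degenerate boxes
        raw.append(([x1, y1, w, h], float(score), name))
    return raw


dets = ltrb_to_raw_detections(
    boxes=[(120, 80, 180, 200), (300, 150, 300, 310)],  # second box has zero width
    scores=[0.92, 0.85],
    class_names=["person", "person"],
)
print(dets)  # [([120, 80, 60, 120], 0.92, 'person')]
```

The resulting list can be passed as the first argument of `update_tracks`, together with either `frame=` or `embeds=`.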