Object Tracking with YOLO and ByteTrack/BoT-SORT Trackers

Article about tracking objects with YOLO and tracking algorithms (ByteTrack & BoT-SORT) in Python.

Object tracking is different from object detection: besides detecting each object, you have to assign it an ID that stays the same from frame to frame; otherwise there will only be anonymous boxes on the screen. To assign IDs, you have to check the similarity between objects across consecutive frames, using cues like color, shape, texture, and IoU (Intersection over Union).
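Since IoU is the similarity measure that comes up most often in tracking, here is a minimal sketch of how it can be computed for two boxes. The `(x1, y1, x2, y2)` box format and the example coordinates are my own assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width/height are clamped to 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prev_box = (100, 100, 200, 200)  # hypothetical box from the previous frame
curr_box = (110, 105, 210, 205)  # hypothetical box from the current frame
print(f"IoU: {iou(prev_box, curr_box):.2f}")
```

A value close to 1 means the two boxes almost coincide, so they probably belong to the same object; a value of 0 means they do not overlap at all.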

So, after detecting objects with an object detection model, you need a separate tracking algorithm. In this article, I will show you how to combine YOLO object detection models with the ByteTrack and BoT-SORT tracking algorithms.

Object Tracking with YOLO and ByteTrack

I also have a YouTube video about this article; you can watch it.

Tracking Algorithms

The table below lists the most well-known trackers. In this article, we will combine pretrained YOLO models with ByteTrack and BoT-SORT.

Thanks to Ultralytics, the implementation is quite easy. Ultralytics supports both tracking algorithms, and their configuration files are ready to use. By using pretrained YOLO models, you don’t have to train any model, and by combining YOLO object detection models with these tracking algorithms, you can track objects with a few lines of code.

Tracker   | Based on       | Key New Idea
SORT      | N/A (baseline) | Kalman + IoU association
Deep SORT | SORT           | Appearance embedding for ID consistency
ByteTrack | SORT           | Two-stage association: high & low confidence detections
BoT-SORT  | Deep SORT      | Better ReID + occlusion-aware association

ByteTrack is inspired by the SORT algorithm, and BoT-SORT is inspired by the Deep SORT algorithm.

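ByteTrack also uses low-confidence detections instead of discarding them: it first matches high-confidence detections to the existing tracks, and then tries to match the remaining tracks with the low-confidence detections. This helps to keep track of occluded objects, because most of the time occluded objects have low confidence.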
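To make the role of low-confidence detections concrete, here is a simplified sketch of a two-stage association. It is not the ByteTrack implementation: it uses a greedy IoU matcher instead of the Hungarian algorithm, it skips the Kalman prediction of track positions, and the thresholds and function names (`greedy_match`, `two_stage_associate`) are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, iou_thresh):
    """Pair tracks with detections, best IoU first (simplification of Hungarian matching).

    tracks: {track_id: box}, detections: list of {"box": ..., "conf": ...}
    Returns (matches, unmatched_tracks).
    """
    pairs = sorted(
        ((iou(t_box, det["box"]), t_id, d_idx)
         for t_id, t_box in tracks.items()
         for d_idx, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, t_id, d_idx in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap even less
        if t_id in used_tracks or d_idx in used_dets:
            continue
        matches[t_id] = detections[d_idx]
        used_tracks.add(t_id)
        used_dets.add(d_idx)
    unmatched = {t: b for t, b in tracks.items() if t not in used_tracks}
    return matches, unmatched


def two_stage_associate(tracks, detections,
                        high_thresh=0.5, low_thresh=0.1, iou_thresh=0.3):
    """Stage 1: high-confidence detections. Stage 2: low-confidence ones
    are offered only to the tracks that stage 1 left unmatched."""
    high = [d for d in detections if d["conf"] >= high_thresh]
    low = [d for d in detections if low_thresh <= d["conf"] < high_thresh]
    first, remaining = greedy_match(tracks, high, iou_thresh)
    second, lost = greedy_match(remaining, low, iou_thresh)
    return {**first, **second}, lost


# Track 2 is partly occluded in the new frame, so its detection has low confidence
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [
    {"box": (1, 1, 11, 11), "conf": 0.9},
    {"box": (51, 51, 61, 61), "conf": 0.2},
]
matches, lost = two_stage_associate(tracks, detections)
print(sorted(matches), lost)
```

With a single confidence threshold of 0.5, the second detection would be thrown away and track 2 would be lost; with the second stage, it keeps its ID.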

BoT-SORT is more focused on object motion: it has an improved Kalman filter to estimate motion, and it compensates for camera motion, so it has better performance when the camera is moving.
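As a rough illustration of what "estimating motion" means here, the sketch below shows only the prediction step of a constant-velocity motion model. It assumes a box state made of center, width, height, and their velocities, which is my reading of how BoT-SORT parameterizes its state, so treat it as an assumption. A full Kalman filter would also propagate a covariance and correct the prediction with each new detection; that part is omitted.

```python
def predict(state, dt=1.0):
    """Advance a box state by one step under a constant-velocity model.

    state = [cx, cy, w, h, v_cx, v_cy, v_w, v_h]
    Only the mean is propagated; the covariance update and the
    measurement correction of a real Kalman filter are left out.
    """
    cx, cy, w, h, v_cx, v_cy, v_w, v_h = state
    return [
        cx + v_cx * dt,  # center moves with its velocity
        cy + v_cy * dt,
        w + v_w * dt,    # box size can grow or shrink as well
        h + v_h * dt,
        v_cx, v_cy, v_w, v_h,  # velocities stay constant
    ]


# Hypothetical track: moving right and slightly up, getting a bit wider
state = [100.0, 50.0, 40.0, 80.0, 5.0, -2.0, 1.0, 0.0]
print(predict(state))
```

The predicted box is what gets compared (for example, by IoU) with the detections of the next frame.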

Object Tracking with YOLO and BoT-SORT (video source)

You can see the FPS comparison in the chart below. Keep in mind that these values include the YOLO model running on the GPU as well, not only the tracker. The FPS values will change depending on your hardware, but we can clearly say that ByteTrack has a significant advantage in terms of FPS.

ByteTrack (blue) vs BoT-SORT (green) in terms of FPS
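If you want to run a similar comparison on your own hardware, per-frame FPS readings are noisy, so it helps to smooth them. Below is a minimal sketch of an exponentially smoothed FPS meter; the `FPSMeter` class and its `alpha` parameter are hypothetical names of mine, not part of Ultralytics or OpenCV.

```python
import time


class FPSMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha      # weight of the newest measurement
        self.prev_time = None   # timestamp of the previous frame
        self.fps = None         # smoothed FPS (None until two frames were seen)

    def tick(self, now=None):
        """Call once per processed frame; returns the smoothed FPS."""
        now = time.perf_counter() if now is None else now
        if self.prev_time is not None:
            dt = now - self.prev_time
            if dt > 0:
                instant = 1.0 / dt
                if self.fps is None:
                    self.fps = instant
                else:
                    self.fps = self.alpha * instant + (1 - self.alpha) * self.fps
        self.prev_time = now
        return self.fps


meter = FPSMeter()
for _ in range(5):
    time.sleep(0.01)  # stands in for detection + tracking on one frame
    fps = meter.tick()
print(fps)
```

Calling `tick()` once per frame inside the tracking loop, once for each tracker, gives numbers that are easier to compare than raw per-frame values.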

YOLO Object Detection Model

You can directly use pretrained YOLO models; you don’t have to prepare a dataset or train the model. Pretrained models are already trained on huge datasets, but of course they have a finite number of classes. If you want to detect specific objects (for example, bird species, plane types, etc.), you need to train a custom model on your own dataset. But no worries, I have an article about it as well; you can read it.
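If you only care about some of the pretrained classes, you don't need a custom model; you can look up their indices and filter on them. The sketch below uses a small hand-written excerpt of a COCO-style name mapping for illustration (in the real script you would use `model.names`), and `class_ids_for` is a hypothetical helper of mine.

```python
def class_ids_for(names, wanted):
    """Return the sorted class indices whose names appear in `wanted`."""
    wanted = {w.lower() for w in wanted}
    return sorted(idx for idx, name in names.items() if name.lower() in wanted)


# Hand-written excerpt for illustration only; use model.names in practice
names = {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane'}

plane_ids = class_ids_for(names, ['airplane'])
print(plane_ids)
```

As far as I know, the resulting list can be passed to Ultralytics through the `classes` argument (for example, `model.track(..., classes=plane_ids)`), so that only those classes are detected and tracked.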

Tracking Objects with YOLO and ByteTrack/BoT-SORT Tracking Algorithms / CODE

I explained everything in detail with comment lines. The main logic is to detect objects with a YOLO object detection model and use tracking algorithms to assign IDs to each detected object.

You can install the necessary libraries from the terminal as shown below.

# YOLO from ultralytics (object detection & tracking)
pip install ultralytics        # https://pypi.org/project/ultralytics/

# OpenCV for video/image processing
pip install opencv-python      # https://pypi.org/project/opencv-python/

# PyTorch CPU version (works without GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# If you have a GPU and want GPU acceleration, uncomment the next line:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

If you encounter any issues installing PyTorch with GPU support, I have an article about it; you can read it.

1. Import Necessary Libraries

from ultralytics import YOLO  # YOLO detection models with built-in tracking
import cv2                    # video reading and drawing
import time                   # timing for the FPS readout
import torch                  # device selection (CPU / GPU)

print(torch.__version__)          # my version --> 2.2.0+cu121
print(torch.cuda.is_available())  # True if a CUDA GPU is available

2. Load YOLO Object Detection Model

If you use a different YOLO model, you need to change the path. The yolov8n.pt model is the lightest YOLOv8 model. If you have powerful hardware, you can use the larger yolov8m.pt or yolov8l.pt.

model = YOLO('yolov8n.pt')

3. YOLO Object Detection + Tracking

For testing different trackers, you can change the tracker_choice parameter.

def main(tracker_choice='bytetrack'):
    input_path = 'videos/plane_video.mp4'  # Your video file path
 
    # Validate tracker choice
    if tracker_choice not in ['botsort', 'bytetrack']:
        print(f"Invalid tracker choice '{tracker_choice}', defaulting to 'bytetrack'")
        tracker_choice = 'bytetrack'
 
    # Load the pretrained YOLOv8 model
    model = YOLO('yolov8n.pt')
 
    # Check if CUDA is available and set device accordingly
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    print(f"Using device: {device}")
 
    # Load the video file
    cap = cv2.VideoCapture(input_path)
    if not cap.isOpened():
        print(f"Error opening video file {input_path}")
        return
 
    # Initialize variables for FPS calculation
    prev_time = 0
 
    """
    Parameters of track method:
    source: str - path to video file or camera index
    tracker: str - tracker configuration file (e.g., 'bytetrack.yaml' or 'botsort.yaml')
    conf: float - confidence threshold for detections
    stream: bool - if True, yields frames one by one for real-time processing, 
                if False, processes the entire video at once
    """
    results = model.track(
        source=input_path, # path to video file 
        tracker=f'{tracker_choice}.yaml',  # 'bytetrack.yaml' or 'botsort.yaml'
        conf=0.3,   # confidence threshold                      
        stream=True # set it to True for continuous video processing                    
    )
 
    # loop through the results
    for frame_result in results:
        # Get the original frame
        img = frame_result.orig_img.copy()
 
        # Calculate FPS (fall back to the video's native FPS on the first frame)
        curr_time = time.time()
        fps = 1 / (curr_time - prev_time) if prev_time != 0 else cap.get(cv2.CAP_PROP_FPS)
        prev_time = curr_time
 
        # loop through the detected boxes and draw them on the frame
        for box in frame_result.boxes:
            # Extract bounding box coordinates
            x1, y1, x2, y2 = map(int, box.xyxy.cpu().numpy()[0])
            # Extract confidence
            conf = box.conf.cpu().item()
            # Extract class
            cls = int(box.cls.cpu().item())
            # Extract track ID 
            track_id = int(box.id.cpu().item()) if box.id is not None else -1
 
            """
            model.names is a dictionary mapping class indices to class names.
            {0: 'person',
            1: 'bicycle',
            2: 'car',
            3: 'motorcycle',
            ...,
            }
            """
            class_name = model.names[cls]
 
            # Draw bounding box and label
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            label = f"{class_name} ID:{track_id} {conf:.2f}"
            cv2.putText(img, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
 
        # Display tracker name and FPS on top-left corner
        cv2.putText(img, f"Tracker: {tracker_choice}", (10, 70),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
        cv2.putText(img, f"FPS: {fps:.2f}", (10, 150),
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
 
        # Display the frame with detections  
        cv2.imshow('YOLOv8 Tracking', img)
 
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
 
    cap.release()
    cv2.destroyAllWindows()
 
if __name__ == '__main__':
    """ There are two trackers available:
        1. ByteTrack (bytetrack.yaml) 
        2. BoT-SORT (botsort.yaml)
    """
    main(tracker_choice='bytetrack')

Tracking Objects with YOLO + Tracking Algorithms
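If you switch between the two trackers often, editing the script each time gets tedious. One option is to read the tracker name from the command line. This is a minimal sketch with the standard `argparse` module; the `--tracker` flag and the `parse_args` helper are my own naming, and the result would be passed to the `main` function above.

```python
import argparse


def parse_args(argv=None):
    """Parse the tracker choice; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser(description="YOLO tracking demo")
    parser.add_argument(
        '--tracker',
        choices=['bytetrack', 'botsort'],  # the two trackers used in this article
        default='bytetrack',
        help="tracking algorithm to use",
    )
    return parser.parse_args(argv)


# Explicit list here so the example runs anywhere; use parse_args() in a script
args = parse_args(['--tracker', 'botsort'])
print(f"Selected tracker: {args.tracker}")
# main(tracker_choice=args.tracker)
```

Because `choices` rejects anything else, the manual validation at the top of `main` becomes a safety net rather than the only check.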