Object Tracking with Meanshift Algorithm using OpenCV

→ Object tracking using the Meanshift algorithm in OpenCV, implemented in both Python and C++.

When we look at an image, we see a bunch of pixels. All of these pixels represent color values, and if we consider an image as a color map, we can follow that color pattern. That is exactly what Meanshift and Camshift do.

In this article, I am going to create an object tracker using the Mean Shift algorithm. I will use Python as the programming language, and you can also find a C++ implementation of this project at the end of the page.

Object Tracking with the Meanshift Algorithm in OpenCV Using Python and C++  (video source)

Mean Shift and Cam Shift Algorithms

Both meanshift and camshift use color histograms for tracking. A color histogram is a representation of the distribution of pixel intensities for each color channel (e.g., Red, Green, Blue) in an image.

Histogram of an image (image source)
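To make this concrete, here is a small illustrative snippet that computes a hue histogram with cv2.calcHist. The file name image.jpg is only a placeholder for any image you want to inspect.

import cv2

# load any image (placeholder path) and convert it to HSV
image = cv2.imread("image.jpg")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# histogram of the hue channel: 180 bins covering hue values 0-179
hue_hist = cv2.calcHist([hsv], [0], None, [180], [0, 180])

print(hue_hist.shape)  # (180, 1): one pixel count per hue bin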

Mean Shift and Cam Shift are similar to each other, but they produce different results.

  • Mean Shift gives simpler results; it can’t handle rotations or properly detect when the object size changes.
  • Cam Shift is more powerful and versatile. It can handle rotations, varying sizes, and more.

In the following two sections, I am going to explain these 2 algorithms in simple terms. You can read Wikipedia and the OpenCV documentation for more in-depth information.

Meanshift

Meanshift is a non-parametric algorithm used for clustering and mode-seeking (finding the most frequently occurring value in a dataset) in data. In object tracking, Meanshift tracks an object by repeatedly adjusting a window to follow the object’s color distribution.

You can see a small demonstration of how the Meanshift algorithm works below.

How Meanshift Algorithm Works in OpenCV (video source)

It is simple and computationally efficient, but it can't handle rotations or changes in object size. When computational resources are limited, the Meanshift algorithm can be used instead of Camshift.
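To make the mode-seeking idea more concrete, below is a tiny, illustrative sketch of the mean-shift update on a one-dimensional toy dataset. The sample values and the bandwidth are made up for this example; they are not part of the tracker we build later.

import numpy as np

# toy 1D data: most of the values cluster around 5 (made-up numbers)
data = np.array([1.0, 4.5, 5.0, 5.2, 5.5, 9.0])

x = data.mean()   # start the window anywhere
bandwidth = 2.0   # window radius (an assumed value for this toy example)

for _ in range(20):
    window = data[np.abs(data - x) < bandwidth]  # points inside the current window
    new_x = window.mean()                        # shift the window to the mean of those points
    if abs(new_x - x) < 1e-3:                    # stop when the window no longer moves
        break
    x = new_x

print(f"estimated mode: {x:.2f}")  # converges near the dense region around 5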

Camshift

Camshift is an advanced version of Meanshift. It adjusts the tracking window's size and orientation to follow objects that rotate or change size, which lets it handle more complex scenarios, at the cost of more computational resources.
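For reference, here is a minimal, self-contained sketch of what the Camshift call looks like in OpenCV. The back-projection image is faked with a synthetic bright blob just so the snippet runs on its own; in the real tracker (shown later in this article) dst, track_window, and term_crit come from the histogram back-projection step.

import cv2
import numpy as np

# a synthetic back-projection image: a bright blob on a dark background stands in
# for the histogram back-projection computed in the tracking loop later in this article
dst = np.zeros((240, 320), dtype=np.uint8)
cv2.circle(dst, (200, 120), 30, 255, -1)

track_window = (160, 80, 80, 80)  # (x, y, w, h): an initial guess that overlaps the blob
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

# CamShift returns a rotated rectangle plus the updated axis-aligned window
rotated_rect, track_window = cv2.CamShift(dst, track_window, term_crit)

# draw the rotated rectangle with polylines instead of an axis-aligned cv2.rectangle
frame = cv2.cvtColor(dst, cv2.COLOR_GRAY2BGR)
points = cv2.boxPoints(rotated_rect).astype(np.int32)
cv2.polylines(frame, [points], True, (0, 255, 0), 2)
print(rotated_rect)  # ((center_x, center_y), (width, height), angle)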

Tracking Objects Using the Meanshift Algorithm with OpenCV

Now, it is time to create an object tracker using the Meanshift algorithm. The main idea is quite simple: first, the user marks the object of interest by right-clicking two corner points of a rectangle on the first frame and then presses the ESC key. After that, a new window appears, and the object is tracked in it.

1. Draw a rectangle around the target object

The user defines the target object by drawing a rectangle on the first frame of the video.

import cv2
import numpy as np

 
# path to video  
video_path=r"your_path/video.mp4"  

video = cv2.VideoCapture(video_path)

# read only the first frame to draw a rectangle for object
ret,frame = video.read()

# initialize x_min and y_min with large values so the first click always updates them
# (if they started at zero, min(x, x_min) would always stay zero)
x_min,y_min,x_max,y_max=36000,36000,0,0

# function for choosing min and max coordinates
def coordinate_chooser(event, x, y, flags, param):
    global x_min, y_min, x_max, y_max

    # a right-click updates the rectangle coordinates
    if event == cv2.EVENT_RBUTTONDOWN:

        # if the current x is lower than x_min it becomes the new x_min; the same rule applies to y_min
        x_min = min(x, x_min)
        y_min = min(y, y_min)

        # if the current x is higher than x_max it becomes the new x_max; the same rule applies to y_max
        x_max = max(x, x_max)
        y_max = max(y, y_max)

        # draw rectangle
        cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 1)


    """
        if you didn't like your rectangle (maybe if you made some misscliks),  reset the coordinates with the middle button of your mouse
        if you press the middle button of your mouse coordinates will reset and you can give new 2-point pair for your rectangle
    """
    if event==cv2.EVENT_MBUTTONDOWN:
        print("reset coordinate  data")
        x_min,y_min,x_max,y_max=36000,36000,0,0

cv2.namedWindow('coordinate_screen')
# set the mouse handler for the "coordinate_screen" window
cv2.setMouseCallback('coordinate_screen', coordinate_chooser)

while True:
    cv2.imshow("coordinate_screen",frame) # show only first frame 
    
    k = cv2.waitKey(5) & 0xFF # after drawing rectangle press ESC   
    if k == 27:
        break
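As a side note, OpenCV also ships a built-in selector for this step. If you don't need the custom mouse callback above, cv2.selectROI lets you drag a box with the left mouse button and confirm with ENTER or SPACE; a minimal alternative sketch (using the same placeholder video path) could look like this:

import cv2

video = cv2.VideoCapture(r"your_path/video.mp4")  # same placeholder path as above
ret, frame = video.read()

# drag a box with the left mouse button, then press ENTER or SPACE to confirm
x, y, w, h = cv2.selectROI("select object", frame, showCrosshair=False)
cv2.destroyWindow("select object")

# convert to the same min/max coordinates used in the rest of this article
x_min, y_min, x_max, y_max = x, y, x + w, y + h
print(x_min, y_min, x_max, y_max)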

2. Detect the color of the target object

# crop the rectangle that the user drew
object_image = frame[y_min:y_max, x_min:x_max, :]

hsv_object=cv2.cvtColor(object_image,cv2.COLOR_BGR2HSV)    

# cx and cy are the centers of the rectangle that the user chose 
height, width, _ = hsv_object.shape
cx = int(width / 2)
cy = int(height / 2)

# take the center pixel to determine the object's color
pixel_center = hsv_object[cy, cx]
hue_value = pixel_center[0]  # index 0 is the hue channel


# map the hue value to a color name
color = ""
if hue_value < 5:
    color = "red"
elif hue_value < 22:
    color = "orange"
elif hue_value < 33:
    color = "yellow"
elif hue_value < 78:
    color = "green"
elif hue_value < 131:
    color = "blue"
elif hue_value < 170:
    color = "violet"
else:
    color = "red"

# hue dict 
hue_dict={ "red":[[[0, 100, 100]],[10, 255, 255]],
           "orange":[[10, 100, 100],[20, 255, 255]],
           "yellow":[[20, 100, 100],[30, 255, 255]],
           "green":[[50, 100, 100],[70, 255, 255]],
           "blue":[[110,50,50],[130,255,255]],
           "violet":[[140, 50, 50],[170, 255, 255]]}

# look up the lower and upper HSV bounds for the detected color
lower_bound, upper_bound = np.asarray(hue_dict[color][0]), np.asarray(hue_dict[color][1])

print(f"detected color : {color}" )

3. Track the Target Object Using the Meanshift Algorithm

# This time, play the whole video, not just the first frame
# (in the first part only the first frame was shown, so the user could select an object by drawing a rectangle)
video = cv2.VideoCapture(video_path)  # Open the video file again

# we need the first frame to create the ROI (region of interest)
ret, first_frame = video.read()  # Read the first frame

# coordinates that the user selected with the mouse
x=x_min  # x coordinate of top-left corner
y=y_min  # y coordinate of top-left corner
w=x_max-x_min  # width of the selected region
h=y_max-y_min  # height of the selected region

track_window = (x, y, w, h)  # Initial tracking window

# set up the ROI for tracking
roi = first_frame[y:y+h, x:x+w]  # Extract ROI from the first frame

hsv_roi =  cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)  # Convert ROI to HSV color space

# use lower_bound and upper_bound  inside of inRange function
mask = cv2.inRange(hsv_roi, lower_bound,upper_bound )  # Create mask using color thresholds
roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180])  # Calculate histogram of hue values in ROI
cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX)  # Normalize histogram

# Setup the termination criteria, either 10 iterations or move by at least 1 pt
term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )

while True:

    ret, frame = video.read()  # Read a new frame from the video

    if not ret:  # Stop when the video ends (or a frame can't be read)
        break

    cv2.putText(frame, f"detected color : {color}", (25, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 1)  # Display detected color

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)  # Convert current frame to HSV
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)  # Backproject the histogram onto the frame

    # apply mean-shift to get the new location
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)  # Update tracking window

    # Draw it on the image
    x, y, w, h = track_window  # Unpack updated tracking window
    img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)  # Draw rectangle around tracked object

    cv2.imshow('img2', img2)  # Show the frame with tracking result

    if cv2.waitKey(30) & 0xff == 27:  # Exit if ESC is pressed
        break

cv2.destroyAllWindows()  # Close all OpenCV windows
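If you need rotation-aware tracking, the only substantial change is to replace the cv2.meanShift call in the loop above with cv2.CamShift and draw the rotated rectangle it returns, as sketched in the Camshift section earlier; the histogram and back-projection setup stay exactly the same.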

Object Tracking with the Meanshift Algorithm in OpenCV Using Python and C++ (video source)