Object tracking using the Meanshift algorithm in OpenCV, implemented in both Python and C++.
When we look at an image, we see a bunch of pixels. All of these pixels represent color values, and if we consider an image as a color map, we can follow that color pattern. That is exactly what Meanshift and Camshift do.
In this article, I am going to create an object tracker using the Mean Shift algorithm. I will use Python as the programming language, and you can also find a C++ implementation of this project at the end of the page.
Both meanshift and camshift use color histograms for tracking. A color histogram is a representation of the distribution of pixel intensities for each color channel (e.g., Red, Green, Blue) in an image.
Mean Shift and Cam Shift are closely related, but they behave differently in practice. In the following two sections, I am going to explain these two algorithms in simple terms. You can read Wikipedia and the OpenCV documentation for more in-depth information.
Meanshift is a non-parametric algorithm used for clustering and mode-seeking (finding the most frequently occurring value in a dataset) in data. In object tracking, Meanshift tracks an object by repeatedly adjusting a window to follow the object’s color distribution.
You can see a small demonstration of how the Meanshift algorithm works below.
It is simple and computationally efficient, but it cannot handle rotation or changes in scale. When computational resources are limited, the Meanshift algorithm can be used instead of Camshift.
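To make the mode-seeking idea concrete, here is a toy one-dimensional sketch (this is not the OpenCV implementation, just an illustration of the core update): the point is repeatedly moved to the mean of the samples inside its window until the shift becomes negligible, which means it has settled on a mode of the data.

```python
import numpy as np

def mean_shift_1d(data, start, bandwidth=1.0, iters=50):
    """Shift `start` toward the densest region of `data` (a 1-D mode)."""
    x = float(start)
    for _ in range(iters):
        window = data[np.abs(data - x) < bandwidth]  # samples inside the current window
        if window.size == 0:
            break
        new_x = window.mean()           # move to the mean of the neighbors
        if abs(new_x - x) < 1e-6:       # converged: the shift is negligible
            break
        x = new_x
    return x

# two clusters, around 0 and around 5; starting near 4 should drift to the mode near 5
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 0.3, 200), rng.normal(5, 0.3, 300)])
print(mean_shift_1d(data, start=4.0))  # converges toward the mode near 5
```

In object tracking the same update happens in two dimensions: the "samples" are pixels weighted by how well their color matches the object's histogram, and the window that gets shifted is the tracking rectangle.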
Camshift (Continuously Adaptive Mean Shift) is an advanced version of Meanshift. It adjusts the tracking window's size and orientation to follow objects that rotate or change size, so it can handle more complex scenarios, at the cost of more computational resources.
Now, it is time to create an object tracker using the Meanshift algorithm. The main idea is quite simple: first, the user draws a rectangle around the area of interest (the object) with mouse right-clicks and presses the ESC key. After that, a new window appears, and the object is tracked within that window.
The user defines the target object by drawing a rectangle to the first frame of the video.
import cv2
import numpy as np
# path to video
video_path=r"your_path/video.mp4"
video = cv2.VideoCapture(video_path)
# read only the first frame to draw a rectangle for object
ret,frame = video.read()
# initialize x_min and y_min with large values; if they started at zero, min() would never update them
x_min,y_min,x_max,y_max=36000,36000,0,0
# function for choosing min and max coordinates
def coordinat_chooser(event,x,y,flags,param):
    global x_min , y_min, x_max , y_max

    # a right-click records the clicked coordinates
    if event==cv2.EVENT_RBUTTONDOWN:
        # if the current x coordinate is lower than x_min, it becomes the new x_min; the same rule applies to y_min
        x_min=min(x,x_min)
        y_min=min(y,y_min)

        # if the current x coordinate is higher than x_max, it becomes the new x_max; the same rule applies to y_max
        x_max=max(x,x_max)
        y_max=max(y,y_max)

        # draw rectangle
        cv2.rectangle(frame,(x_min,y_min),(x_max,y_max),(0,255,0),1)

    """
    If you are not happy with your rectangle (for example, after a misclick), press the middle
    mouse button: the coordinates reset, and you can pick a new pair of points for your rectangle.
    """
    if event==cv2.EVENT_MBUTTONDOWN:
        print("reset coordinate data")
        x_min,y_min,x_max,y_max=36000,36000,0,0
cv2.namedWindow('coordinate_screen')
# Set mouse handler for the specified window, in this case, "coordinate_screen" window
cv2.setMouseCallback('coordinate_screen',coordinat_chooser)
while True:
    cv2.imshow("coordinate_screen",frame) # show only the first frame
    k = cv2.waitKey(5) & 0xFF             # after drawing the rectangle, press ESC
    if k == 27:
        break
# region inside the rectangle that the user drew
object_image=frame[y_min:y_max,x_min:x_max,:]
hsv_object=cv2.cvtColor(object_image,cv2.COLOR_BGR2HSV)
# cx and cy are the centers of the rectangle that the user chose
height, width, _ = hsv_object.shape
cx = int(width / 2)
cy = int(height / 2)
# sample the center pixel to estimate the object's color
pixel_center = hsv_object[cy, cx]
hue_value = pixel_center[0] # axis 0 is hue values
# map the hue value to a color name
color = ""
if hue_value < 5:
    color = "red"
elif hue_value < 22:
    color = "orange"
elif hue_value < 33:
    color = "yellow"
elif hue_value < 78:
    color = "green"
elif hue_value < 131:
    color = "blue"
elif hue_value < 170:
    color = "violet"
else:
    color = "red"
# hue dict : lower and upper HSV bounds for each color
hue_dict={ "red":[[0, 100, 100],[10, 255, 255]],
           "orange":[[10, 100, 100],[20, 255, 255]],
           "yellow":[[20, 100, 100],[30, 255, 255]],
           "green":[[50, 100, 100],[70, 255, 255]],
           "blue":[[110,50,50],[130,255,255]],
           "violet":[[140, 50, 50],[170, 255, 255]]}
# find the upper and lower bounds of the image's color
lower_bound , upper_bound = np.asarray(hue_dict[color][0]) , np.asarray(hue_dict[color][1]) # lower and upper bound sequentially
print(f"detected color : {color}" )
# This time display all video, not just the first frame
# (in the first part only the first frame was displayed on the screen because the user was choosing an object by drawing a rectangle)
video=cv2.VideoCapture(video_path) # Open the video file
# we need the first frame to create the ROI (region of interest)
ret,cap = video.read() # Read the first frame
# coordinates that the user selected with the mouse
x=x_min # x coordinate of top-left corner
y=y_min # y coordinate of top-left corner
w=x_max-x_min # width of the selected region
h=y_max-y_min # height of the selected region
track_window = (x, y, w, h) # Initial tracking window
# set up the ROI for tracking
roi = cap[y:y+h, x:x+w] # Extract ROI from the first frame
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV) # Convert ROI to HSV color space
# use lower_bound and upper_bound inside of inRange function
mask = cv2.inRange(hsv_roi, lower_bound,upper_bound ) # Create mask using color thresholds
roi_hist = cv2.calcHist([hsv_roi],[0],mask,[180],[0,180]) # Calculate histogram of hue values in ROI
cv2.normalize(roi_hist,roi_hist,0,255,cv2.NORM_MINMAX) # Normalize histogram
# Setup the termination criteria, either 10 iterations or move by at least 1 pt
term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
while True:
    ret, frame = video.read() # Read a new frame from the video
    if ret:
        cv2.putText(frame,f"detected color : {color}" , (25,25),cv2.FONT_HERSHEY_SIMPLEX ,1,(0,0,255),1) # Display detected color

        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) # Convert current frame to HSV
        dst = cv2.calcBackProject([hsv],[0],roi_hist,[0,180],1) # Backproject the histogram onto the frame

        # apply mean-shift to get the new location
        ret, track_window = cv2.meanShift(dst, track_window, term_crit) # Update tracking window

        # Draw it on the image
        x,y,w,h = track_window # Unpack updated tracking window
        img2 = cv2.rectangle(frame, (x,y), (x+w,y+h), 255,2) # Draw rectangle around tracked object
        cv2.imshow('img2',img2) # Show the frame with tracking result

        if cv2.waitKey(30) & 0xFF == 27: # Exit if ESC is pressed
            break
    else:
        break # Stop when the video ends

cv2.destroyAllWindows() # Close all OpenCV windows