Step-by-step guide for training YOLO object detection models with any dataset for any task.
Object detection is an important part of computer vision, and when you need both real-time performance and accuracy, YOLO models are among the most popular choices. There are different versions of YOLO models, but with the Ultralytics library, every step is nearly the same.
In this article, I will share my pipeline for training a YOLO object detection model, covering dataset preparation, configuration files, and the different model variants of YOLOv8.
By following this step-by-step guide, you can train your own YOLO object detection models with any dataset for any task.
You need a GPU to train a deep learning model, especially when your task involves images. Training performs an enormous number of calculations in the background, so without a GPU even a small model can take weeks to train.
You might not have a GPU, or even if you do, you might not want to set up a local environment, because there are plenty of compatibility issues between CUDA, cuDNN, and PyTorch versions.
Instead, you can use hosted environments like Kaggle and Google Colab. They are beginner friendly, and after spending a little time with them, everything falls into place.
I recommend Kaggle because it is completely free and offers 30 hours of GPU training time per week, which is more than enough. You just upload your dataset, create a notebook, and then every step is the same: write the code in the cells. :D
Anyway, if you want to create an environment for PyTorch with GPU support, you can watch this video.
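Whichever option you choose, it is worth a quick sanity check before a long training run that PyTorch can actually see the GPU. This is a minimal sketch using only standard PyTorch calls:

import torch

# Quick sanity check that PyTorch can see a CUDA-capable GPU
print(torch.__version__)
print(torch.cuda.is_available())           # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the GPU that will be used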
There are seven main steps in my pipeline, and I will explain them one by one.
Data is one of the most important things in deep learning. Without proper data, it is impossible to train a model that performs well in different environments. There are two main ways to acquire data: you can either create your own or find a dataset on the internet.
I always check the internet first, and if I can’t find data that fits my purpose, I create the data myself. As I said before, creating a dataset from scratch is a time-consuming process, but sometimes you have to do it, especially for specific tasks.
For this article, I will use this dataset.
You need one main project folder, and everything will be stored inside it.
In the end, your folders and files should look like the layout below. Now, step by step, we will create these files and folders.
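A typical Ultralytics-style layout looks roughly like the sketch below (the dataset folder name and its subfolders are only illustrative; match them to your own project):

chess_project/
├── chess_big_data.yaml        # dataset configuration file
├── yolov8m.pt                 # pretrained model weights
├── Train_chess_model.ipynb    # training notebook
└── chess_dataset/
    ├── images/
    │   ├── train/
    │   └── val/
    └── labels/
        ├── train/
        └── val/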
The YAML file contains the path of the dataset, the number of classes, and the class names. I named my YAML file chess_big_data.yaml; you can name it whatever you want, but don’t forget to use the same name wherever it is needed.
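For reference, a minimal dataset YAML in the Ultralytics format looks roughly like this; the paths, class count, and class names below are placeholders, so replace them with the values from your own dataset:

path: chess_dataset            # dataset root folder
train: images/train            # training images, relative to path
val: images/val                # validation images, relative to path
nc: 2                          # number of classes
names: ['class_0', 'class_1']  # class names, in label-index order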
There are a bunch of pretrained YOLOv8 models, and which one to choose depends entirely on your purpose. If you want higher accuracy, pick a more complex (bigger) model; if you need higher FPS, pick a smaller one. There is a trade-off between speed and accuracy.
This does not always hold, though: a bigger model can produce worse metrics, because other factors such as the quality and quantity of the dataset also matter.
If your dataset is not complex or not big, I recommend choosing a smaller model.
YOLOv8m is a great choice when you need both real-time performance and accuracy.
Choose a pretrained model and download it, then place it in the base project folder. I downloaded YOLOv8m, but nothing else changes if you pick another model.
Oh, I almost forgot: bigger models, naturally, take longer to train.
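Note that you don’t strictly have to download the weights by hand: if the file is not already in your working directory, Ultralytics fetches the pretrained weights automatically the first time you reference them. A small sketch, with the printed summary being just a quick way to compare variants:

from ultralytics import YOLO

# If 'yolov8m.pt' is not already in the working directory,
# Ultralytics downloads the pretrained weights automatically.
model = YOLO('yolov8m.pt')
model.info()  # prints a short summary (layers, parameters) of the chosen variant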
Now create a file for training. I prefer Jupyter notebooks, and so do most people who train models. If you are using Kaggle or Google Colab, you already have one.
I named my notebook Train_chess_model.ipynb.
There are hundreds of training parameters, and you can check the Ultralytics documentation for explanations.
from ultralytics import YOLO
import torch

# Check if a GPU is available
print(torch.cuda.get_device_name(0))

# Load the pretrained model
model = YOLO('yolov8m.pt')

# Training
results = model.train(
    data='chess_big_data.yaml',    # .yaml file describing the dataset
    imgsz=416,                     # image size
    epochs=50,                     # number of epochs
    batch=4,                       # batch size; change this depending on your GPU memory
    name='chess_big_data_model4',  # output folder name; model weights and everything else are stored here
    plots=True,                    # plots of metrics (precision, recall, F1 score)
    amp=False,                     # amp=True gives me an error; if it doesn't for you, set it to True
)
To train, simply run the notebook cells (or the Python file). When training begins, don’t be surprised by the many lines printed about the model, labels, and folders; if everything is set up correctly, training will start.
It may take a long time depending on your GPU.
After training finishes, a summary of the model is printed, containing precision, recall, and mAP values.
You will also see a line saying: Results saved to “some_folder”, and this folder matches the name parameter
in the training script. Inside it, there is a lot of important information about the model: you can inspect the metrics and adjust the training parameters to obtain a stronger model.
And most importantly, this folder contains your trained model (best.pt).
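If you want to re-check the metrics later without retraining, one option is to run validation on the saved weights. The path below assumes the default runs/detect output location and the name used above, so adjust it to wherever your results were actually saved:

from ultralytics import YOLO

# Path assumes the default Ultralytics output layout and the 'name' used above
model = YOLO('runs/detect/chess_big_data_model4/weights/best.pt')

metrics = model.val(data='chess_big_data.yaml')  # evaluate on the validation split
print(metrics.box.map50)  # mAP at IoU threshold 0.50
print(metrics.box.map)    # mAP averaged over IoU 0.50-0.95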
Load the trained model, and test it with different images to see how your model performs.
from ultralytics import YOLO
from PIL import Image

model = YOLO("best.pt")            # path to the .pt model file
results = model("test_image.jpg")  # path to the test image

# Show the results
for r in results:
    im_array = r.plot()                        # plot a BGR numpy array of predictions
    im = Image.fromarray(im_array[..., ::-1])  # convert BGR to an RGB PIL image
    im.show()                                  # show image
    im.save('results.jpg')                     # save image
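If you need the raw detections rather than a rendered image, the same results object exposes them. A minimal sketch that prints box coordinates, class indices, and confidences:

# Access the raw predictions instead of a rendered image
for r in results:
    for box in r.boxes:
        print(box.xyxy[0].tolist(),  # [x1, y1, x2, y2] bounding box in pixels
              int(box.cls[0]),       # predicted class index
              float(box.conf[0]))    # confidence score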