Clothes Detection

Train a Clothes Detection Model using YOLOv5 and Google Colab

YOLOv5 is a model based on YOLOv3, part of the YOLO family of object detection models. This family is built on the premise of obtaining all object detections from the entire image at once, in contrast to other models, which divide the image into smaller sections and run separate detections on each one. Using the whole picture means the model does not lose information by splitting the image into smaller sections, but it has the disadvantage of needing more processing capability, since the entire image runs through the model.

The main improvements Ultralytics made in this model compared with previous iterations are a decrease in memory usage and training time and an overall improvement in mAP when tested on the COCO dataset, as well as its implementation in PyTorch. One innovation Ultralytics introduced is mosaic augmentation, which shrinks several images, joins them into a mosaic, and runs the training on these mosaics. Mosaic augmentation allows the model to detect objects even when they are much smaller than what we see in the original images.

This model needs Python 3.8 and all the dependencies in requirements.txt installed.

For this specific model, a Google Colab notebook was implemented; you should make a copy of it in your Google Drive. The notebook was written to be as automatic as possible, so you only need to take a few minor actions to start the training and save the weights to a desired directory in Google Drive.

Preprocessing

The first step is to get pictures of the objects to train on. There are several ways of doing this: Kaggle, the [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html), searching for them in Google Images and then writing a script to download them into a folder, etc.
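
A minimal sketch of the script approach, assuming you have already collected a list of image URLs (the URLs and folder name here are placeholders, not part of any real dataset):

import os
import urllib.request

# Placeholder URLs; in practice these come from your image search results.
urls = [
    "https://example.com/images/shirt_001.jpg",
    "https://example.com/images/jeans_002.jpg",
]

os.makedirs("downloads", exist_ok=True)
for i, url in enumerate(urls):
    # Zero-padded names keep the files sorted in download order.
    urllib.request.urlretrieve(url, os.path.join("downloads", f"img_{i:04d}.jpg"))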

Bear in mind that each image should be accompanied by a .txt file that contains the coordinates and the size of the objects to detect. If you download your dataset from a dataset website, you will most likely get the object labels in some annotation format. There are several label formats (Pascal VOC, COCO JSON, RetinaNet, etc.); however, this specific model requires YOLO darknet labels. If your dataset uses a different annotation format, you can write or look for a script that converts the annotations from their original format to the one you need, or you can upload the images to a website and let it convert the annotations for you.
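
For reference, a YOLO darknet label file has one line per object: the class index followed by the box center and size, all normalized to the image dimensions. Below is a minimal conversion sketch from Pascal VOC-style pixel coordinates; the function name and example values are illustrative, not from any particular library:

def voc_to_yolo(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # VOC boxes are absolute pixel corners; YOLO wants normalized center/size.
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x300 box with its top-left corner at (50, 80) in a 640x480 image, class 0:
print(voc_to_yolo(0, 50, 80, 250, 380, 640, 480))
# -> 0 0.234375 0.479167 0.312500 0.625000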

If you download the images in bulk from some other source that does not provide labels, you can create the annotations yourself using a tool like LabelImg.

There is no magic number of images you should download; however, the larger your dataset, the more robust your model will be, but it will also take longer to train.

After getting your images and labels, you should split them into train/test folders. Again, there is no fixed proportion for this, so you can use whatever split you want (70%-30%, 80%-20%, 66%-34%), as long as the train dataset is larger than the test dataset (a sketch of one way to automate the split is shown after the folder tree below).

For this model in particular, you should save your test and train datasets using the following folder structure:

/Dataset/
|-- test
|   |-- images
|   |-- labels
|-- train
|   |-- images
|   |-- labels
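
A minimal sketch of one way to automate the split into this structure, assuming all images and labels start together under Dataset/all/ (the folder names and the 80%-20% ratio are just an example):

import random
import shutil
from pathlib import Path

random.seed(42)  # make the split reproducible

images = sorted(Path("Dataset/all/images").glob("*.jpg"))
random.shuffle(images)
cut = int(0.8 * len(images))  # 80% train, 20% test

for split, subset in (("train", images[:cut]), ("test", images[cut:])):
    for img in subset:
        # Each image has a matching YOLO label file with the same stem.
        label = Path("Dataset/all/labels") / (img.stem + ".txt")
        for src, kind in ((img, "images"), (label, "labels")):
            dst_dir = Path("Dataset") / split / kind
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst_dir / src.name)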

After arranging your dataset in this structure, compress it into a zip file and upload it to your Google Drive; you can then start using the Google Colab notebook.
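
On Linux or macOS, one way to create the zip from the directory that contains Dataset/ (the archive name is up to you):

$ zip -r Dataset.zip Dataset/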

Setup Google Colab Notebook

As mentioned before, the Google Colab notebook is set up so that you do not need to do much for it to work. Before you run it, you should upload a classes.yaml file, which contains the paths of the datasets, the number of classes, and the names of the classes. An example is shown below; you should not need to modify the dataset paths if you kept the folder structure and names shown above, but you should still change the number of classes and the class names to fit whatever classes you want to train.
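
A minimal example of what classes.yaml can look like with the folder structure above; the exact paths depend on where the notebook unzips the dataset, and the class names here are only illustrative. Note that YOLOv5 reads the test split through the val key:

train: ../Dataset/train/images
val: ../Dataset/test/images

nc: 2
names: ['shirt', 'jeans']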

One of the cells will ask you to grant access to Google Drive; just follow the instructions to authorize it.
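
That cell normally uses the standard Colab Drive mount, which walks you through an authorization prompt:

from google.colab import drive

# Mounts your Drive under /content/drive once you authorize access.
drive.mount('/content/drive')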

If everything goes well, you will reach a cell containing a JavaScript snippet. This snippet keeps you from getting disconnected from the runtime environment due to inactivity. Press Ctrl + Shift + I to open the browser inspection tab; there, go to the console tab, paste the snippet, and press Enter so it starts running in the Google Colab tab.

Follow the notebook and continue with the commands.

Training

Once you reach a cell that contains code like the command below, it is time to start training. You can read more about the hyperparameters in the YOLOv5 repository, but the most essential ones are the following:

  • img: specifies the size of the images fed into the model.
  • batch: indicates the batch size.
  • epochs: gives the number of epochs (full passes over the training data) for the model to train.
  • data: asks for a file (in our case classes.yaml) that contains the locations of the datasets, the number of classes, and their names.
  • cfg: the configuration of the model. The YOLOv5 repository includes several model variants that differ in size and in the number of connections among the neurons.
  • weights: if you do not have a weights file, you can leave this empty. If you have one from a previous training, type the path to it so that the model does not start from scratch and can obtain better results.
  • name: the name of your trained model.
  • device: an extremely important parameter. It tells the model which GPU(s) to use on the machine where it is training. Since this model is based on PyTorch, you may use multiple graphics cards and processors for the training (more information in the YOLOv5 repository, in the Multi-GPU Training section).
!python train.py --img 640 --batch 8 --epochs 35 --data ../classes.yaml --cfg ./models/yolov5x.yaml --weights ../best.pt --name custom_model --device 0

Depending on the number of epochs you choose and the size of your dataset, it may take from an hour to six or more hours to train your model. Even if you leave it running, you should check it regularly in case you get disconnected from the runtime environment. This happens sometimes; in some cases you may lose your progress, while in others you may be lucky and reconnect to your previous environment.

After the training is done, the Google Colab notebook will compress the train folder into a zip file and save a copy of it in the Google Drive directory you specify.

Running the model locally

After you have downloaded your zip file, unzip it to see the files saved from the training. Inside each 'experiment', as the repository calls them, you will find several statistics and images of the training results. Inside the weights folder you will find the best and the last weights files; you may choose either of them for running the model on your computer.

To run your model, first clone the YOLOv5 repository onto your computer and install the necessary dependencies.
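
The standard setup, taken from the YOLOv5 repository's instructions, is:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt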

After the installation, paste your weights file into the directory and run the following command, replacing yolov5s.pt with the path to your own weights file (for example, best.pt):

$ python detect.py --weights yolov5s.pt --source 0  # webcam
                                                 file.jpg  # image 
                                                 file.mp4  # video
                                                 path/  # directory
                                                 path/*.jpg  # glob
                                                 'https://youtu.be/NUsoVlDFqZg'  # YouTube video
                                                 'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Areas of opportunity

The model as it is, with the current weights, is able to detect shirts, skirts, jeans, trousers, accessories, glasses, and suits. The main improvement would be to add more types of clothes (t-shirts, blouses, scarves, etc.), as well as adding a color detection model so that it is capable of determining the color of each garment in addition to the types of clothes a person is wearing.
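
As a starting point for the color idea, here is a minimal sketch that estimates a garment's dominant color by averaging the pixels inside a detected bounding box; the function, file name, and box values are illustrative, and a k-means over the crop's pixels would be a common refinement:

import numpy as np
from PIL import Image

def dominant_color(image_path, box):
    # box = (xmin, ymin, xmax, ymax) in pixels, e.g. from a YOLOv5 detection.
    img = np.asarray(Image.open(image_path).convert("RGB"))
    xmin, ymin, xmax, ymax = box
    crop = img[ymin:ymax, xmin:xmax]
    # Average RGB over the crop: a crude but fast estimate of the main color.
    return tuple(crop.reshape(-1, 3).mean(axis=0).astype(int).tolist())

print(dominant_color("person.jpg", (50, 80, 250, 380)))  # e.g. (102, 51, 153)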