Detection Process - RecycleAI/RecycleIT-A GitHub Wiki

YOLO labeling format

Most annotation platforms can export in the YOLO labeling format, which provides one annotation text file per image. Each text file contains one bounding-box (BBox) annotation per line, one for each object in the image. The annotations are normalized by the image size, so every value after the class ID lies in the range 0 to 1. Each line has the following format:

(object-class-ID) (X center) (Y center) (Box width) (Box height)
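As an illustration of the normalization described above, the following minimal sketch converts a pixel-space corner box into a YOLO label line and back. The function names `to_yolo` and `from_yolo` and the sample box are hypothetical and chosen only for this example; they are not part of any annotation tool.

```python
def to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to a normalized YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo(line, img_w, img_h):
    """Parse a YOLO label line into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized values back to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical box on a 416x416 image, labeled with class ID 3.
label = to_yolo(3, 104, 52, 312, 364, img_w=416, img_h=416)
print(label)  # 3 0.500000 0.500000 0.500000 0.750000
```

Because the values are relative to the image size, the same label file stays valid when the image is resized without cropping.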

YOLOv5s model results on Kaggle dataset

The GitHub repository for this model is https://github.com/ultralytics/yolov5

  • The dataset used here is the Kaggle dataset, which has 4 classes (Aluminium, Glass, PET, HDPE). It consists of over 4700 images of objects photographed against various backgrounds. The dataset is annotated in the YOLO format described in the previous section (one .txt file per image).

    https://www.kaggle.com/datasets/arkadiyhacks/drinking-waste-classification

  • The small variant of YOLOv5 (YOLOv5s) was used.

  • The data is divided into a training set (3.5k images), a test set (475), and a validation set (718).

  • In the preprocessing stage, the images were resized to 416x416 pixels.

  • The Roboflow website was used to augment the dataset. Two augmentations were applied: first, the images were rotated by 15 degrees; then some of them were blurred. The augmented images were added to the dataset.

  • The model was then trained on Google Colab with these parameters: image size 416, batch size 16, 100 epochs.

  • The loss results are shown below:

And here are the metrics:

  • The model was also tested on an object specific to Iran; the result is shown below (class 3 => PET):
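The train/validation/test division listed above can be approximated with a deterministic shuffle. This is only a sketch: the wiki does not say how the split was produced, so the fractions (about 15% validation and 10% test, close to the reported 718 and 475 images), the seed, and the file stems below are all assumptions.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.10, seed=0):
    """Shuffle items deterministically and split them into train/val/test lists."""
    items = sorted(items)  # fixed starting order so the seed fully determines the split
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    n_test = round(len(items) * test_fraction)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


# Hypothetical file stems; each stem stands for an image plus its .txt label file.
stems = [f"img_{i:04d}" for i in range(4700)]
train, val, test = split_dataset(stems)
print(len(train), len(val), len(test))
```

Splitting by file stem keeps each image together with its label file, and the three lists never overlap.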

YOLOv7 model results on Kaggle dataset

The GitHub repository for this model is https://github.com/WongKinYiu/yolov7

  • The labeling format for YOLOv7 is the same as for YOLOv5.
  • The Kaggle dataset was used again in order to compare the results with YOLOv5s.
  • The data is divided into a training set (3.5k images), a test set (475), and a validation set (718).
  • In the preprocessing stage, the images were resized to 416x416 pixels.
  • The Roboflow website was used to augment the dataset. Two augmentations were applied: first, the images were rotated by 15 degrees; then some of them were blurred. The augmented images were added to the dataset.
  • The model was then trained on Google Colab with these parameters: image size 416, batch size 32, 50 epochs, 8 workers.
  • Pre-trained model weights were used in the training process.
  • The loss results are shown below:

And here are the metrics:
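For comparison, the training parameters reported for the two models can be collected as command-line arguments for each repository's `train.py` script. This is a hedged sketch, not the exact commands used by the authors: the dataset YAML path and the weight file names are hypothetical placeholders, and the wiki does not state which checkpoint files were used.

```python
import shlex


def yolov5_train_args(data_yaml, weights):
    """Arguments for the YOLOv5 train.py script with the parameters reported above."""
    return [
        "python", "train.py",
        "--img", "416",
        "--batch-size", "16",
        "--epochs", "100",
        "--data", data_yaml,
        "--weights", weights,
    ]


def yolov7_train_args(data_yaml, weights):
    """Arguments for the YOLOv7 train.py script with the parameters reported above."""
    return [
        "python", "train.py",
        "--img-size", "416", "416",  # YOLOv7 takes separate train and test image sizes
        "--batch-size", "32",
        "--epochs", "50",
        "--workers", "8",
        "--data", data_yaml,
        "--weights", weights,
    ]


# Hypothetical paths: replace them with the exported dataset YAML and the chosen checkpoints.
v5_cmd = shlex.join(yolov5_train_args("data.yaml", "yolov5s.pt"))
v7_cmd = shlex.join(yolov7_train_args("data.yaml", "yolov7.pt"))
print(v5_cmd)
print(v7_cmd)
```

Each command is meant to be run from the root of the corresponding repository. Since the two runs differ in batch size and number of epochs, those differences should be kept in mind when comparing the metrics.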