Test Plan

1. Introduction

1.1 Version Control

1.2 Overview

1.3 Scope

1.4 Terminology

2. Features To Be Tested

3. Features Not To Be Tested

4. Item Pass/Fail Criteria

5. References

1. Introduction

1.1 Version Control

(Table 1: Version control history of this document.)

1.2 Overview

VADS is a simulation project whose structure is based on machine learning and computer vision. The general test plan in this document is prepared around these two concepts.

1.3 Scope

This document contains information about the test plan of the Violent Activity Detection System. The following sections briefly explain what the test criteria will be and how the testing will be carried out.

1.4 Terminology

| Terms | Definitions |
|-------|-------------|
| SRS | Software Requirements Specification |
| SDD | Software Design Document |
| VADS | Violent Activity Detection System |
| GUI | Graphical User Interface |

2. Features To Be Tested

In this section, we will describe our test plan and give general information about the features to be tested.

In general, the data set we created consists of videos downloaded from YouTube. These videos are divided into violent content and normal content. For both categories, care has been taken to use videos recorded with a stationary camera, because the method we use requires this to work correctly and we want the data set we created to give accurate results during testing. If a video meets the criteria we have determined, regardless of its length, it is cut at the designated places and assigned to the relevant category.

The data set we have created consists mostly of fight videos, with exceptions such as gun-use and harassment videos. These exceptions have been included to make the data set more comprehensive.

The data set will be divided into three parts: Training, Validation, and Test. The Training and Validation sets will be used in model development, and the Test set will be used in model evaluation. These terms are explained in the following sections:

| Set | Share of the data set | Purpose |
|------------|------|----------------------------------------------|
| Training | 70% | Fitting the model during development |
| Validation | 15% | Hyper-parameter tuning and model improvement |
| Test | 15% | Final, unbiased evaluation of the model |
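As a minimal sketch of this 70% / 15% / 15% split (assuming the clips have already been turned into a feature array `X` and a violent/normal label array `y`; the names and sizes below are placeholders, not the project's real data pipeline), the division could be done with scikit-learn:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real video features and labels.
X = np.random.rand(1000, 128)
y = np.random.randint(0, 2, size=1000)

# First hold out 30% of the samples, then split that part half-and-half,
# giving roughly 70% training, 15% validation and 15% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # about 700, 150, 150
```

Stratifying on the labels keeps the ratio of violent to normal clips roughly the same in all three parts.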

Validation Set:

We allocate 15% of the data to the validation set. Our purpose is to increase the efficiency of the method being used and to improve the model. We can judge how accurate the model is at any point by calculating its error rate on the validation set. For this, the most suitable coefficients will be found through hyper-parameter tuning; overfitting and under-fitting should be considered in the process. The model's hyper-parameters are then adjusted according to the repeated evaluation results on the validation set. Once the validation results are satisfactory, we move on to the test set.
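A hedged sketch of how the validation set could be used during training in Keras; the tiny model below is only an illustration, not the project's actual architecture, and it reuses the assumed `X_train`/`X_val` arrays from the split sketch above:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy binary classifier standing in for the real violence-detection model.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(128,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The validation data is passed separately so that the loss and accuracy on it
# are reported after every epoch; hyper-parameters are then tuned against them.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=20, batch_size=32, verbose=0)
print(history.history["val_loss"][-1], history.history["val_accuracy"][-1])
```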

Test Set:

The test data set allows the model to be evaluated objectively. It is a data set that follows the same probability distribution as the training data but is kept separate from it. When both validation and test sets are used, the test set evaluates the final model obtained with the help of the validation set. How well the model fitted on the training data generalizes is revealed by the examples it encounters in the test set. In order to evaluate the model impartially, the test set must be separated from the rest of the data before training. It consists of samples that the machine did not see and did not encounter while the model was being built; for the accuracy and reliability of the evaluation, it is important that these are random samples the model has not encountered before. Another point to consider is that the test set should be large enough and, at the same time, representative of the whole data set, because only then do meaningful results emerge. In our project, the data set is divided into three parts: 70 percent forms the training set, and the remaining 30 percent is split evenly, with 15 percent used as the test set and 15 percent as the validation set. While working on an existing data set, the data set we created will be included in the test set and validation set.
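As a short sketch of this final, unbiased evaluation (continuing with the assumed `model` and `X_test`/`y_test` names from the sketches above):

```python
# The test set is touched only once, after hyper-parameter tuning on the
# validation set is finished, so the reported numbers remain unbiased.
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss: {test_loss:.4f}, test accuracy: {test_accuracy:.4f}")
```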

Evaluate models using metrics:

Evaluating the algorithm is an important part of any machine learning project. We must measure the performance and success of the model using evaluation metrics applied to its predictions. These evaluations produce a result by comparing the model's outputs on the data set with the expected values. Simply creating a model and training it with data does not show that the model is valid; therefore, we need to measure its performance so that we can evaluate it and improve it using the results of these metrics. In this project, accuracy and loss will be used while tuning the hyper-parameters, while accuracy, precision, and recall will be used for the final version of the model.
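As an illustration of how these final metrics could be computed with scikit-learn (again using the assumed `model`, `X_test`, and `y_test` names from the sketches above):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Threshold the sigmoid outputs at 0.5 to get hard violent / normal predictions.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
```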

Accuracy:

Accuracy is one of the metrics used to decide which candidate model is best. The better the created model generalizes to unseen data, the more usable it becomes and the better the machine's insights. Accuracy measures how many of all processed data points are predicted correctly. As a formula, it is the ratio of the number of true positives and true negatives to the total number of true positives, false positives, true negatives, and false negatives. The higher this ratio, the better the created model. However, accuracy alone is not the only criterion for the functionality of the model. In this project, accuracy will be used as the reference metric both while tuning the hyper-parameters and while training the final version of the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Loss:

Loss measures how bad the model's predictions are. The lower the loss, the more acceptable the functionality of the model. This metric is generally used in the training process, where the aim is to minimize the error. In this project, the loss metric will also be used while tuning the hyper-parameters.

Precision:

Precision is calculated over the positive results of the created model. It is defined as the ratio of true positives to all predicted positives. In this project, the precision metric will be used for the final version of the model.

Precision = TP / (TP + FP)

Recall:

Recall determines what percentage of the actual positive results are correctly identified. The question asked is: of the results that are actually positive, how many did the model find? It is a metric used as a determining factor in the performance of the model. In this project, the recall metric will be used for the final version of the model.

Recall = TP / (TP + FN)

Confusion matrix:

A confusion matrix is used to summarize the performance of the model: it shows the counts of true and false positives and negatives, from which the metrics above can be calculated. In this project, the confusion matrix is created only after testing, and the metrics are then computed from its values.
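A brief sketch of building the confusion matrix after testing and recovering accuracy from it (using the assumed `y_test` and `y_pred` arrays from the metric sketch above):

```python
from sklearn.metrics import confusion_matrix

# For binary violent / normal labels the matrix is laid out as
# [[TN, FP],
#  [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
print(cm)

tn, fp, fn, tp = cm.ravel()
print("accuracy from the matrix:", (tp + tn) / (tp + tn + fp + fn))
```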

Hyper Parameters:

Learning rate, regularization, and depth of layers are hyper-parameters that can be adjusted during the training of the model. There may be many other hyper-parameters, but we will fine-tune these three so that our model works best.

Under-fitting occurs when the machine learning model cannot reduce the error on either the training or the test set. An under-fitted model is not powerful enough to capture the underlying complexities of the data distribution.

Overfitting occurs when the machine learning model is powerful enough to fit the training set very well while the generalization error increases: the error on the training set stays low, but the error on data the model has not seen grows.
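One common way to watch for overfitting during training, shown here only as an assumed sketch with Keras's built-in early stopping (reusing the assumed `model` and data arrays from the sketches above), is to stop when the validation loss stops improving:

```python
import tensorflow as tf

# Stop training once the validation loss has not improved for 3 epochs
# and roll the weights back to the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=50, callbacks=[early_stop], verbose=0)
```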

Learning Rate:

If the learning rate is too low, training becomes very slow and overfitting may occur. Larger learning rates help training move faster, but if the learning rate is too large, training will diverge. Therefore, it is important to find a learning rate at which training converges rather than diverges. Typically, machine learning libraries preset a default learning rate; as a second option, setting it manually and comparing a few values can show us which one gives the lowest loss.
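A small sketch of trying a few learning rates by hand, as the paragraph suggests; the candidate values and the tiny model are illustrative only and reuse the assumed data arrays from the split sketch above:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Try a few candidate learning rates and keep the one that reaches the
# lowest validation loss.
for lr in (1e-2, 1e-3, 1e-4):
    model = tf.keras.Sequential([
        layers.Dense(64, activation="relu", input_shape=(128,)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=10, verbose=0)
    print(lr, min(history.history["val_loss"]))
```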

Regularization:

The primary cause of overfitting is that the model learns even the smallest details contained in the training data. After learning all the possible patterns it can find, the model performs extremely well on the training set but fails on the development and test sets; its performance falls apart when it encounters data it has never seen before.

One way to avoid overfitting is to reduce the complexity of the model, and this is the function of regularization. When we set the regularization parameter to a large value, the reduction in the weights during the gradient-descent update is greater, so most hidden units end up with weights close to zero.
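A hedged sketch of adding L2 weight regularization in Keras; the regularization factor 0.01 and the layer sizes are example values, not tuned settings for this project:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A larger l2 factor shrinks the weights more strongly during the
# gradient-descent update, pushing most hidden-unit weights toward zero.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(128,),
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```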

Depth of Layers:

Depth describes the number of layers in a neural network: the more layers, the deeper the network. It is important to start with a single layer when building our model and, if necessary, gradually increase the number of layers to deepen it. This approach avoids making the model overly complicated from the very beginning.
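A sketch of growing the depth gradually, as described above; the layer width and the input size are placeholder values:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(depth: int) -> tf.keras.Model:
    """Build a simple classifier with the given number of hidden layers."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(128,)))
    for _ in range(depth):
        model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model

# Start with a single hidden layer and deepen the network only if it under-fits.
for depth in (1, 2, 3):
    print(depth, "hidden layers ->", build_model(depth).count_params(), "parameters")
```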

3. Features Not To Be Tested

VADS is a simulation-based system that deals with the concepts of machine learning and computer vision. As stated in the SRS and SDD, the aim of the system is to detect whether video content is violent or not. Therefore, the GUI requirements and the buttons created for it will not be included in the test plan. Likewise, the hardware requirements will not be included in the test plan.

4. Item Pass/Fail Criteria

In general, according to the metrics described above, the functionality of the model will be considered sufficient for us when the accuracy is as high as possible and the loss is as low as possible.

5. References

[1] IS502_Group1_SRS_V2.0, December 12, 2009

[2] https://serokell.io/blog/machine-learning-testing

[3] https://towardsdatascience.com/hyper-parameter-tuning-techniques-in-deep-learning-4dad592c63c8

[4] https://www.analyticsvidhya.com/blog/2018/11/neural-networks-hyperparameter-tuning-regularization-deeplearning/

[5] https://dergipark.org.tr/en/download/article-file/731590