Software Requirements Specification - CankayaUniversity/ceng-407-408-2020-2021-Violent-Activity-Detection-from-Videos GitHub Wiki

Software Requirements Specification

VADS: Violent Activity Detection System

Arda Efe ŞEN – 201611053

Özgün GÜLER - 201611023

Esin GÖKÇE – 201611022

Version 1.0

4/12/2020

Table of Contents

1 Introduction

1.1 Purpose

1.2 Scope

1.3 Definitions, Abbreviations, Acronyms

1.4 Overview

2 Overall Description

2.1 Product Perspective

2.2 Product functions

2.3 User characteristics

2.4 Constraints

2.5 Assumptions and Dependencies

3 Specific Requirements

3.1 External Interface Requirements

3.1.1 User Interface

3.1.2 Hardware Interface

3.1.3 Software Interface

3.2 Functional Requirements

3.3 Performance Requirements

3.4 Security Requirements

3.5 Design Constraints

3.6 Software System Attributes

References

1 Introduction

The following subsections are an overview of the entire Software Requirements Specification (SRS) document.

1.1 Purpose

This document specifies all the software requirements of the Violent Activity Detection System. In addition to the user characteristics, constraints, specific requirements, and product functions of the system, it covers the main topics that the IEEE Std 1016-1998 standard [1] takes into account. The software, developed to detect violence in recorded videos for a basic surveillance system using Computer Vision (CV), is described extensively in this SRS.

1.2 Scope

The VIOLENT ACTIVITY DETECTION SYSTEM (VADS) is a project that aims to identify violent content in surveillance footage and report it to the user. The system starts when the user opens the graphical interface, which presents multiple videos playing on the screen. The system sends a warning message to the user in case of any violence. VADS learns to perceive violence from a dataset we constructed from videos on the Internet that contain violent activity. VADS aims to provide a fast and reliable system interface that performs detection offline. The purposes of the project are:

  •   To allow any user to access the VADS interface offline.

  •   To detect violent activity in videos.

  •   To inform the users if there is any violent content.
    

1.3 Definitions, Abbreviations, Acronyms

| Term | Definition |
| --- | --- |
| IEEE | Institute of Electrical and Electronics Engineers |
| SRS | Software Requirements Specification |
| CNN | Convolutional Neural Network |
| SVM | Support Vector Machine |
| MBH | Motion Boundary Histogram |
| KNN | K-Nearest Neighbour |
| CKS | Cubic Kernel Support |
| SDD | Software Design Document |
| MoSIFT | Motion Scale-Invariant Feature Transform |
| DBN | Dynamic Bayesian Network |
| KDE | Kernel Density Estimation |
| AI | Artificial Intelligence |
| CV | Computer Vision |

1.4 Overview

This document has three main sections. The first is the Introduction, which explains the purpose and scope of this project. The second, Overall Description, presents the dependencies, constraints, and tools used. The third, Specific Requirements, covers all the requirements, design constraints, and software system attributes of the system. The document closes with the References.

2 Overall Description

2.1 Product Perspective

The VADS system is a general video processing application: a security system simulation built with machine learning and computer vision algorithms. The outline of the method is as follows:

● The first step is to collect videos from the Internet that generally contain violence. This dataset, together with the datasets in the literature, will be the ground truth for this system. The VADS system simulates a security system; therefore, it is assumed that the videos in the dataset were taken by a security camera. All subsequent processing steps are shaped by this assumption.

● Each video played in this dataset will be divided into frames and processed.

● Noise reduction methods will be applied to every extracted frame. Noise reduction is the process of removing noise from an image; noise appears as a grainy texture where pixel values deviate randomly from the underlying light-to-dark transitions. Noise reduction is a pre-processing step and will be used to improve system accuracy.

● We will then examine how many of these denoised frames, taken in sequence, yield remarkable results when processed.

● The interpolation method will be used to preserve the inter-frame connection. Interpolation, in its simplest definition, is the process of using known points to estimate values at unknown points. Image interpolation tries to obtain the best approximation of a pixel's color and intensity based on the values of the surrounding pixels.

● Optical flow can be thought of as the apparent movement of objects in the field of view. As the three-dimensional scene is projected onto the two-dimensional image plane, the edges of an object's image shrink and grow; as the edges move, they have a velocity relative to other objects in the field of view. The optical flow technique finds the magnitude and direction of this velocity. New features will be obtained by applying the optical flow method to the extracted frames, and further features will be studied by combining and processing the features the optical flow method produces.

● Classification results will be obtained from the extracted features using deep learning and traditional machine learning algorithms.

● By examining the classification results, we aim to select the most appropriate method or set of methods.
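The frame-level steps above can be sketched with a minimal, numpy-only example. This is an illustration, not the project's actual code: a box blur stands in for the noise-reduction step, linear blending stands in for inter-frame interpolation, and frame differencing is a crude stand-in for the optical-flow motion measure (a real implementation would likely use a library such as OpenCV).

```python
import numpy as np

def box_blur(frame, k=3):
    """Very simple noise reduction: average each pixel with its k x k neighborhood."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros_like(frame, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return out / (k * k)

def interpolate(frame_a, frame_b, t=0.5):
    """Linear interpolation between two consecutive frames (0 <= t <= 1)."""
    return (1 - t) * frame_a + t * frame_b

def motion_energy(frame_a, frame_b):
    """Mean absolute inter-frame difference: a crude proxy for optical-flow magnitude."""
    return float(np.mean(np.abs(frame_b - frame_a)))

# Two noisy 8x8 grayscale "frames"; the second is shifted to simulate motion.
rng = np.random.default_rng(0)
frame1 = np.zeros((8, 8))
frame1[2:5, 2:5] = 1.0
frame2 = np.roll(frame1, 2, axis=1)          # object moved 2 pixels right
noisy1 = frame1 + 0.05 * rng.standard_normal(frame1.shape)
noisy2 = frame2 + 0.05 * rng.standard_normal(frame2.shape)

clean1, clean2 = box_blur(noisy1), box_blur(noisy2)
mid = interpolate(clean1, clean2)            # estimated in-between frame
energy = motion_energy(clean1, clean2)       # larger when there is more motion
print(round(energy, 3))
```

The motion-energy value would feed the feature-extraction and classification steps that follow.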

The research we have done so far has shown that similar machine learning algorithms and techniques have been used in most studies on this problem. According to our research, the methods and the AI and CV algorithms used to detect violent videos are as follows:

In this study, the MoSIFT descriptor, the Motion Boundary Histogram (MBH), and a movement filtering algorithm are combined to improve both accuracy and complexity. (Febin, Jayasree, & Joy, 2019)

In this study, the three-dimensional Convolutional Neural Network (3D ConvNet) method was used for action recognition, reaching 94.3% accuracy on the crowd violence dataset. (Song, Yu, Zheng, Wang, & Zhang, 2019)

Support vector machines are used in this study. (Chen, Su, & Hsu, 2011)

In this study, it is aimed to detect violent acts from audio cues using CNN and SVM algorithms. (Mu, Cao, & Jin, 2016)

In this project, Kernel Density Estimation (KDE) was used to increase the success rate and to detect the videos better. SVM is used together with the sparse coding method. 94% success was achieved on the hockey and crowd violence datasets. (Xu, Gong, Yang, Wu, & Yao, 2014)

In this project, a Cubic Kernel Support Vector Machine and K-Nearest Neighbour classifiers were tested on three dissimilar datasets using the proposed features and methods. (Kaya & Keçeli, 2019)

A collective aggression indicator was generated using a Dynamic Bayesian Network and data from visual and audio events. (Zajdel, Krijnders, Andringa, & Gavrila, 2007)

This project aimed to detect moving targets in the foreground with the K-Nearest Neighbour (KNN) method. Relief-F and Wrapper algorithms are used to reduce the feature dimension, and SVM is used as the classifier. (Ye, Wang, Ferdinando, Seppanen, & Alasaarela, 2018)

2.2 Product functions

In this project, it is aimed to improve the ability to focus on people's acts of violence rather than simple actions. Several methods for recognizing such actions have been presented in computer vision studies, but they are still developing and maturing. This project, designed broadly as a video-processing machine learning application, consists of three main stages. First, a test dataset is created from videos that can be accessed online, and the model is evaluated on it. Next, a detection method will be designed and its applicability tested on the dataset. In the final stage, simulation software will be created to show the results offline.
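As an illustration of the "traditional machine learning" part of the evaluation stage, the sketch below classifies hand-made motion-feature vectors with a 1-nearest-neighbour rule. The feature values, labels, and function names are invented for the example and are not the project's actual model.

```python
import numpy as np

def knn_predict(train_x, train_y, x, k=1):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_x - x, axis=1)   # Euclidean distance to each sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = train_y[nearest]
    return int(np.bincount(votes).argmax())       # majority label

# Toy features: [mean motion energy, motion variance]; 1 = violent, 0 = non-violent.
train_x = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
train_y = np.array([1, 1, 0, 0])

print(knn_predict(train_x, train_y, np.array([0.85, 0.85])))  # high motion -> 1
print(knn_predict(train_x, train_y, np.array([0.15, 0.15])))  # low motion  -> 0
```

In practice the same evaluation loop would compare several classifiers (SVM, CNN, KNN) on the collected dataset before the best one is chosen.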

2.3 User characteristics

A user will be required to monitor the system interface. When unwanted situations occur, the personnel responsible for the relevant unit will be informed. This notification function will not be elaborated here: an interface that can deliver notifications via SMS, e-mail, or a mobile application will be defined, and it will be implemented if time allows. Apart from this, the user must be familiar with basic computer usage.

2.4 Constraints

The product requires a powerful parallel processing server infrastructure for real-time operation. Since this infrastructure is not available, violent activity detection will be done offline, and the results will be shared with the user in an interface.

2.5 Assumptions and Dependencies

The developed product aims to provide an infrastructure that can work as a web or desktop application. Simulation software will be designed to display the results offline, and the number of screens shown in this simulation will be realized as much as the infrastructure/system to be used allows.

3 Specific Requirements

3.1 External Interface Requirements

3.1.1 User Interface

VADS does not require any user login or registration. It offers multiple video screens that the user can watch for system monitoring, so violence can be monitored and detected on several screens at once. There is a home screen where the user watches the videos; no other user interface is needed.

3.1.2 Hardware Interface

The VADS is a security system simulation. The method to be used in the project requires devices with high computing power. Since this is not available, the results will be computed offline and then displayed on the user interface.

3.1.3 Software Interface

Python ML libraries will be used in the infrastructure of the system to be developed.

3.2 Functional Requirements

The VADS is a security system simulation: when the system is run, the images start to flow on the screen. These images are assumed to come from different cameras. If any act of violence is detected in the images, the user watching the general interface will be informed. In addition, an API will be designed to inform the security personnel responsible for the relevant region.
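A minimal sketch of the warning flow described above: per-frame violence scores come from the (offline) classifier, and the user is informed once the score stays above a threshold for several consecutive frames. The function name, threshold values, and the `notify` hook are assumptions for illustration, not a specified API.

```python
def watch_scores(scores, threshold=0.8, min_frames=3, notify=print):
    """Call notify(frame_index) when min_frames consecutive scores exceed threshold."""
    run = 0
    alerts = []
    for i, s in enumerate(scores):
        run = run + 1 if s >= threshold else 0
        if run == min_frames:          # fire once per sustained burst
            notify(i)
            alerts.append(i)
    return alerts

# Scores from a hypothetical clip: violence appears around frames 4-7,
# so the alert fires at frame 6, the third consecutive high score.
scores = [0.1, 0.2, 0.3, 0.5, 0.9, 0.95, 0.9, 0.85, 0.2]
print(watch_scores(scores, notify=lambda i: None))
```

Requiring several consecutive high-score frames before alerting is one simple way to avoid warning the user on isolated misclassified frames.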

3.3 Performance Requirements

The classification performance of the product may not be very promising when the literature is examined. A separate dataset will also be collected for this study and used together with the datasets from existing studies. The performance of the system may be low, as this will be new and possibly challenging data. Since the overall algorithmic complexity of the system might be high, as mentioned in the previous sections, it is not feasible to reach real-time performance with the current hardware infrastructure.

3.4 Security Requirements

Since the real-time application would be a closed-circuit system, there are no particular security concerns. If the system were web-based, attention should be paid to software security to prevent outside access and to transmit video recordings encrypted. This issue will not be covered within this project.

3.5 Design Constraints

The user interface will be designed for easy use. The user does not have to have high skill or experience. It is sufficient for the user to have basic computer skills to use the product.

3.6 Software System Attributes

Accuracy: The classification performance of the system will be assessed by examining existing methods and making improvements. The performance is not expected to be very high.

References

Chen, L. H., Su, C. W., & Hsu, H. W. (2011). Violent Scene Detection in Movies. International Journal of Pattern Recognition and Artificial Intelligence, pp. 1161-1172.

Febin, I. P., Jayasree, K., & Joy, P. T. (2019). Violence detection in videos for an intelligent surveillance system using MoBSIFT and movement filtering algorithm. Pattern Analysis and Applications.

Kaya, A., & Keçeli, A. S. (n.d.). Violent activity detection with the transfer. Electronics Letters, pp. 1047-1048.

Mu, G., Cao, H., & Jin, Q. (2016). Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features. Pattern Recognition, pp. 451-463.

Song, W., Yu, J., Zheng, R., Wang, A., & Zhang, D. (2019). A Novel Violent Video Detection Scheme Based on Modified 3D Convolutional Neural Networks. IEEE Access, pp. 39172-39179.

Xu, L., Gong, C., Yang, J., Wu, Q., & Yao, L. (2014). Violent Video Detection Based on MoSIFT Feature and Sparse Coding. IEEE International, pp. 3538-3542.

Ye, L., Wang, L., Ferdinando, H., Seppanen, T., & Alasaarela, E. (2018). A Video-Based DT–SVM School Violence Detecting Algorithm. Sensors.

Zajdel, W., Krijnders, J. D., Andringa, T., & Gavrila, D. M. (2007). CASSANDRA: audio-video sensor fusion for aggression detection. IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 200-205.