Software Design Description - CankayaUniversity/ceng-407-408-2020-2021-Violent-Activity-Detection-from-Videos GitHub Wiki

Table of Contents

1 Introduction

1.1 Purpose

1.2 Scope

1.3 Definitions, Abbreviations, Acronyms

1.4 Overview

2 Architecture Design

2.1 Approach

2.2 Tools Used

2.3 Use Case

2.4 Activity Diagram

2.4.1 Image Noise

2.4.2 Interpolation

2.4.3 Optical Flow

2.5 Interface Design

3. System Architecture

3.1 GUI Design

3.2 Datasets Available

4 References

1 Introduction

The Software Design Document describes how the software system will be built. It includes design details, interface diagrams, usage-scenario models, and other graphical representations of the System. This document is prepared to be an understandable guide for any interested reader.

1.1 Purpose

VADS is not only a project but also a research study. Building on previous studies on detecting violence in videos, it aims to detect violent content in videos with a high success rate, using datasets that contain violent videos.

1.2 Scope

VADS is targeted as a simulation of a surveillance-based security system. The Python programming language and the OpenCV framework, which is widely used in computer vision, are used to display several videos on the screen at once. This Software Design Document explains all the components and the design of the System for each module and serves as a guide.

1.3 Definitions, Abbreviations, Acronyms

| Term | Definition |
| ---- | ---------- |
| SRS | Software Requirements Specification |
| SDD | Software Design Document |
| PyCharm | Python IDE |
| OpenCV | Computer vision framework for Python |
| TensorFlow | Machine learning framework for Python |
| VADS | Violent Activity Detection System |
| IDE | Integrated Development Environment |
| GUI | Graphical User Interface |
| NumPy | Numerical computing library for Python |
| Tkinter | Python GUI toolkit |

1.4 Overview

This document contains sections describing the software design. The first chapter gives general information about the document and how the software is developed. The second chapter covers the architecture design and the tools used. The third chapter shows and explains the System Architecture.

2 Architecture Design

2.1 Approach

The VADS project is a surveillance-system-based simulation project. Each image used in this project is taken from the videos registered in the System, and each of these images is assumed to come from a surveillance system. In this context, a conceptual model is used. A conceptual model's main purpose is to make the basic functions of a developed system observable. Conceptual models, generally used in simulation development studies, provide an easy and proper understanding of the problem, and they have an essential role in keeping the scope of a simulation system both technically sound and manageable. Because of cost and equipment constraints, the VADS project is designed as a simulation of a security system, based on a conceptual model.

2.2 Tools Used

In general, the Python programming language and the OpenCV and TensorFlow frameworks, widely used in computer vision, are employed. Each video is split into individual frames, and these frames are processed by machine learning algorithms. PyCharm is used as the IDE, and the NumPy library and the Tkinter GUI toolkit are used for the system interface.

2.3 Use Case


Figure 1: Use Case Diagram

Use Case Description:

As shown in the figure above, the VADS system starts when the user runs the program. The user does not need to register to enter the System; on the system side, users' personal information and identities are not recorded. The system's database contains only the videos in the dataset and the images created from those videos.

When the user enters the System, he/she sees six camera screens simultaneously, all in the same window. The videos in the dataset play in these windows in order, each taking its input from the dataset. On the system side, every video played is divided into frames, which are saved back to the dataset. The recorded images are then preprocessed with the chosen methods and processed by the machine learning algorithms before the output is sent back to the user. If the output video contains violent content, the user is warned.

2.4 Activity Diagram


Figure 2: Activity Diagram

Activity Diagram Description:

This diagram shows the flow of the System to be created. After the System starts, six camera videos are taken offline from the images in the database. These videos are then divided into frames. The frames first undergo noise analysis, and interpolation is used to preserve the connection between frames. Violence analysis is then performed on the denoised data using various methods, and the simulation raises an alarm at the point where violence is encountered. In the background, the features of the frames are obtained using optical flow. If no violence is found, processing continues with the next data in the database. This loop continues until violence is detected, as long as the simulation is kept open.
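The control flow in the diagram can be summarized as a loop over the videos in the database. This is a minimal sketch; every callable argument below is a hypothetical stand-in for a real component (frame splitting, denoising, optical-flow feature extraction, and the violence classifier), not the actual VADS implementation:

```python
def run_surveillance_loop(videos, split, denoise, extract_flow_features,
                          is_violent, alarm):
    """Process each video frame by frame, raising the alarm on the first
    violent frame, as in the activity diagram. Returns the id of the video
    that triggered the alarm, or None if no violence was found."""
    for video_id, video in videos:
        for frame in split(video):           # divide the video into frames
            features = extract_flow_features(denoise(frame))
            if is_violent(features):         # violence analysis step
                alarm(video_id)              # warn the user
                return video_id
    return None                              # continue until data runs out
```

With real components plugged in, `split` would yield image frames, `denoise` would apply the noise reduction described in 2.4.1, and `is_violent` would wrap the trained classifier.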

2.4.1 Image Noise

Image noise is a random variation of brightness or color information in images and is usually an aspect of electronic noise. It can also be defined as an unwanted signal. Image noise can originate in film grain or in the unavoidable shot noise of an ideal photon detector; put more plainly, unwanted electrical fluctuations are called noise. To analyze the images in the video that we split into frames, we need to take image noise into account. Image noise can range from imperceptible specks in a clean, good image to an almost entirely noisy image from which no information can be derived; a high noise level can therefore make it impossible to identify the subject we are trying to analyze. Image noise is studied as several different types. Gaussian noise can be encountered while taking a digital image: the electronic circuits connected to the sensor add their own circuit noise, and the sensors themselves have a natural noise that depends on their illumination levels and temperature. Gaussian noise is an additive noise model; it occurs independently of signal intensity and independently at each pixel [1].

Salt-and-pepper noise can be called sudden or impulse noise. With this kind of noise, dark pixels appear in bright areas and bright pixels in dark areas. It is caused by errors when converting analog images to digital or by bit errors in transmission. It can be eliminated by dark-frame subtraction, by median filtering, or by interpolating around the dark and bright pixels. Shot noise is caused by the variation in the number of photons sensed during the exposure, which is why it is also called photon shot noise. The noise at different pixels is independent; shot noise, whose magnitude is given by a root-mean-square value, follows a Poisson distribution, which approximates a Gaussian distribution except at very low intensity levels.

This noise can be reduced by the dark-frame subtraction method. Quantization noise has an approximately uniform distribution; it can depend on the signal, but it becomes signal-independent when other noise sources are strong enough to cause dithering. Film grain can be regarded as non-directional, signal-dependent noise; since its distribution is close to that of Poisson shot noise, a Gaussian distribution is generally used as an accurate model. A periodic noise source is usually electrical or electromechanical interference that occurs while capturing the image; notch filters can be used to reduce it. To reduce and analyze all this noise, it is necessary to distinguish noise from fine detail in the image. Denoising algorithms try to remove the noise while preserving the fine details, but current algorithms cannot always make this decision well.
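A minimal NumPy sketch of two of the noise models above, together with the 3x3 median filter that is the classic remedy for salt-and-pepper noise (all function names are illustrative, not part of VADS):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility


def add_gaussian_noise(img, sigma=10.0):
    """Additive Gaussian noise: independent of the signal and of other pixels."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def add_salt_and_pepper(img, amount=0.05):
    """Impulse noise: a fraction of pixels forced to pure black or pure white."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = 0          # pepper (dark pixels in bright areas)
    noisy[mask > 1 - amount / 2] = 255    # salt (bright pixels in dark areas)
    return noisy


def median_filter3(img):
    """3x3 median filter (edges padded by reflection): each pixel is replaced
    by the median of its neighborhood, which discards isolated impulses."""
    padded = np.pad(img, 1, mode="reflect")
    stack = [padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
             for dy in range(3) for dx in range(3)]
    return np.median(np.stack(stack), axis=0).astype(img.dtype)
```

Running the median filter on a salt-and-pepper-corrupted frame removes nearly all impulses, while a Gaussian-noise-corrupted frame is better handled by averaging filters, which is why identifying the noise type matters.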


Figure 3: Photon noise simulation. The number of photons per pixel increases from left to right and from the upper row to bottom row [1].

2.4.2 Interpolation

Digital image interpolation is an approach for estimating data at unknown points from the available data. However, image quality decreases each time interpolation occurs, so methods that minimize interpolation losses are needed. Image interpolation works in both directions, producing the best approximation to a pixel's color and intensity based on the surrounding pixels; the more information about surrounding pixels is available, the better the interpolation. Interpolation algorithms are divided into adaptive and non-adaptive. Adaptive algorithms use specialized logic designed to maximize artifact-free detail, whereas non-adaptive algorithms treat all pixels equally; the more adjacent pixels they include, the more accurate they can become, at the cost of longer processing time [2].
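As an illustration of a non-adaptive method, here is a minimal NumPy implementation of bilinear interpolation for grayscale resizing: each output pixel is a distance-weighted average of the four nearest input pixels. The function name and the corner-aligned coordinate mapping are assumptions of this sketch:

```python
import numpy as np


def bilinear_resize(img, new_h, new_w):
    """Resize a 2D grayscale image with bilinear interpolation."""
    h, w = img.shape
    # Map each output coordinate to a (possibly fractional) input coordinate.
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical fractional weights
    wx = (xs - x0)[None, :]  # horizontal fractional weights
    img = img.astype(float)
    # Interpolate horizontally on the two bracketing rows, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

In practice a library routine such as OpenCV's resize would be used; the sketch just makes the "weighted average of surrounding pixels" idea concrete.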

2.4.3 Optical Flow

Optical flow is a motion model: the apparent motion of image content between two consecutive frames. Optical flow has been used in computer vision problems such as video encoding, video re-timing, and stabilization. The goal is to estimate the motion between frames, under the assumptions that neighboring pixels have similar motion and that pixel intensity does not change between frames. Since the optical flow constraint is a single equation with two unknowns, the Lucas-Kanade algorithm, one of the methods that can solve it, is generally used [3].
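The Lucas-Kanade idea can be sketched for a single window in NumPy: the brightness-constancy constraint Ix·u + Iy·v + It = 0 holds at every pixel, so stacking all pixels gives an over-determined system solved by least squares. This is a simplification (the real algorithm solves this per small window around tracked features, often with pyramids); the function name is illustrative:

```python
import numpy as np


def lucas_kanade_single_window(f1, f2):
    """Estimate one (u, v) flow vector for a whole window by least squares,
    assuming brightness constancy and small motion between frames f1 and f2."""
    fx = np.gradient(f1, axis=1)   # spatial image derivatives of frame 1
    fy = np.gradient(f1, axis=0)
    ft = f2 - f1                   # temporal derivative
    # One row of [Ix, Iy] per pixel; solve [Ix Iy] [u v]^T = -It.
    A = np.stack([fx.ravel(), fy.ravel()], axis=1)
    b = -ft.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

In the project itself, OpenCV's built-in Lucas-Kanade tracker would normally be used instead of hand-rolled code; the sketch only shows why two unknowns need many pixel constraints.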


Figure 4: Optical flow representation [3].

2.5 Interface Design


Figure 5 shows the interface of the VADS system, with multiple video windows displayed on the screen. The System is expected to find violent content in the videos played and display it to the user without any action from the user.

3 System Architecture

3.1 GUI Design

There is only one interface between the user and the System in the GUI. It is the screen the user encounters on entering the System, and all operations are followed from it. For example, while the user is viewing videos on this interface, he/she will receive a warning on the same interface if violence is detected.

3.2 Datasets Available

Many datasets consisting of violent videos have been used to achieve high success in projects of this kind. A few of them are MediaEval [4], the Violent-Flows database [5], Hockey Fight [6], and the violent scenes in movies dataset [7]. The MediaEval dataset includes videos showing the presence of blood, fights, fire, weapons and cold weapons, car chases, and gory scenes, as well as violent audio events such as gunshots, explosions, and screams. The Violent-Flows dataset includes 246 videos of real-world crowd violence, covering both violent and non-violent footage. For this project, many violent videos, such as street fights, protests, movie scenes, and security camera footage, were found on YouTube; these videos were divided into frames and are intended to be used as a dataset. In addition, it is planned to draw on the datasets mentioned above.


Figure 6: Hockey Fight Dataset [6]


Figure 7: Sample video frames from the MediaEval VSD2014 dataset [8].

4 References

[1] https://en.wikipedia.org/wiki/Image_noise

[2] https://www.cambridgeincolour.com/tutorials/image-interpolation.htm

[3] https://cs.brown.edu/courses/csci1290/2011/results/final/psastras/

[4] C. H. Demarty, C. Penet, G. Gravier, M. Soleymani, A benchmarking campaign for detecting violent scenes in movies, ECCV2012 workshop on Information Fusion in Computer Vision for Concept Recognition, Firenze, October 2012.

[5] T. Hassner, Y. Itcher, and O. Kliper-Gross, Violent Flows: Real-Time Detection of Violent Crowd Behavior, 3rd IEEE International Workshop on Socially Intelligent Surveillance and Monitoring (SISM) at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Rhode Island, June 2012.

[6] E. Bermejo Nievas, O. Deniz Suarez, G. Bueno García, and R. Sukthankar, Violence Detection in Video Using Computer Vision Techniques.

[7] C. H. Demarty, C. Penet, M. Soleymani, G. Gravier, VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation, Multimedia Tools and Applications, May 2014.

[8] M. Schedl, M. Sjoberg, I. Mironica, B. Ionescu, V. L. Quang, Y. G. Jiang, C. H. Demarty, “VSD2014: A Dataset for Violent Scenes Detection in Hollywood Movies and Web Videos,” In Proc. International Workshop on Content-Based Multimedia Indexing, 2015.