System Function - person-in-hangang/HanRiver GitHub Wiki
Detection Camera
We implemented detection using open source [1]. The original project simply detected multiple objects; we extended it to detect when a person leaves a particular region, capture that moment, and extract additional information such as bounding boxes and photographs.
As a result, camera 1 sends an alarm to the server, extracts bounding-box information, and captures a photo when a person crosses the line assumed to be the railing.
The implementation steps are described in sequence below.
Fall detection is based on the OpenCV computer vision library on Android, and object bounding boxes are obtained with YOLOv3, which we chose for its speed.
Using this, we made the following assumptions and design decisions:

- The camera is fixed on the railing of the bridge, and a virtual line representing the railing is drawn on the screen.
- When a human object is detected, a bounding box is drawn around it.
- When the bounding box crosses the virtual line from left to right, the person is judged to have fallen.
- A picture of the fall and the bounding-box information are sent to the server.
- Once a person is detected on the right side of the line (i.e., has fallen), that person is not detected again until they reappear on the left side. This prevents the same person from being reported as falling multiple times.
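The left-to-right crossing rule with the re-arm condition above can be sketched as a small state machine. This is a minimal illustration with hypothetical names; the actual app applies this logic to bounding boxes produced by YOLOv3 inside the OpenCV Android pipeline:

```python
class FallDetector:
    """Decides whether a bounding box has crossed a vertical 'railing' line.

    After a fall is reported, the detector is disarmed until the person
    is seen on the left side of the line again, so the same person is
    not reported as falling multiple times.
    """

    def __init__(self, line_x):
        self.line_x = line_x   # x position of the virtual railing line
        self.armed = True      # ready to report a new fall

    def update(self, box):
        """box = (x, y, w, h); returns True only when a new fall is detected."""
        x, y, w, h = box
        center_x = x + w / 2
        if center_x < self.line_x:
            # Person is back on the left side: re-arm the detector.
            self.armed = True
            return False
        if self.armed:
            # First frame this person appears right of the line: a fall.
            self.armed = False
            return True
        return False  # already reported; ignore until re-armed


detector = FallDetector(line_x=300)
events = [detector.update(b) for b in
          [(100, 50, 40, 80),    # left of the line
           (310, 60, 40, 80),    # crosses to the right -> fall reported
           (330, 70, 40, 80),    # still right: not reported again
           (120, 50, 40, 80),    # back on the left: re-armed
           (305, 60, 40, 80)]]   # crosses again -> second fall
print(events)
```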
Additionally, whenever a fall was detected, the camera uploaded information such as the bridge name (e.g., Hangang Bridge, Mapo Bridge) and the time of the fall to the database.
As for detection performance, when we compared how well each object class was detected, cups took a long time while people were detected best. We therefore adjusted the frame size appropriately: a larger frame size gives more accurate results, but people are detected well enough at a frame size of 128, so we set it to 128 for faster processing. In this way we focused on detecting people as quickly as possible.
Tracking Camera
We implemented tracking using open source [2]. However, this open source required the user to drag-select the object to be tracked, whereas our project had to target and track people automatically. So we placed a detector that automatically detects people in front of the tracker. As a result, camera 2 detects only humans, extracts a bounding box, feeds that box to the tracker, and starts tracking. While tracking, the bounding box becomes smaller and smaller as the person falls away from the camera. A box size is set in advance; when the tracked box shrinks to that size, the person is judged to have fallen into the water and tracking stops.
This box size was determined through experiments. Since the experiment could not be carried out in the Han River itself, it was conducted with a doll dropped onto a blue cloth. After dropping the doll several times, we recorded the box size shown by the tracker at the moment the doll landed and averaged these sizes.
However, since this box size is based on a doll, the final box size was determined by scaling it by the ratio of the doll's size to a person's size.
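The threshold calculation described above might look like the sketch below. All numbers are hypothetical stand-ins; the real values came from the doll-drop experiment:

```python
def splash_box_threshold(doll_boxes, doll_height_m, person_height_m):
    """Average the doll's box size at the moment of impact, then scale
    it by the doll-to-person size ratio to get the person threshold.

    doll_boxes: list of (w, h) box sizes measured when the doll landed.
    """
    avg_w = sum(w for w, _ in doll_boxes) / len(doll_boxes)
    avg_h = sum(h for _, h in doll_boxes) / len(doll_boxes)
    scale = person_height_m / doll_height_m  # doll-to-person ratio
    return (avg_w * scale, avg_h * scale)

# Hypothetical measurements: three doll drops, a 0.5 m doll, a 1.7 m person.
threshold = splash_box_threshold([(20, 30), (22, 28), (18, 32)], 0.5, 1.7)
print(threshold)  # box size at which tracking stops
```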
Camera 2 also produces a picture showing the path a person followed until falling into the water and sends it to the server. The base picture is the frame in which the person was first recognized by camera 2. On top of it, all tracked positions are saved and drawn as a trajectory made of bounding boxes: the box from the moment the person was first recognized is blue, the box from the moment they fell into the water is red, and the boxes in between are black. A picture of the trajectory is produced in this way.
This picture and the location of the fall point are sent to the server.
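The trajectory-coloring rule (blue first box, red final box, black in between) can be sketched as follows. In the real system the boxes are drawn onto the camera frame with OpenCV; here a plain list of (box, color) pairs stands in for the image, and all names are hypothetical:

```python
def trajectory_colors(boxes):
    """Assign a BGR color to each tracked bounding box:
    blue for the first, red for the last, black in between."""
    BLUE, BLACK, RED = (255, 0, 0), (0, 0, 0), (0, 0, 255)
    colored = []
    for i, box in enumerate(boxes):
        if i == 0:
            color = BLUE       # first recognition of the person
        elif i == len(boxes) - 1:
            color = RED        # moment of hitting the water
        else:
            color = BLACK      # falling in between
        colored.append((box, color))
    return colored

# Hypothetical tracked path: three boxes moving down the frame.
path = [(50, 10, 30, 60), (55, 80, 25, 50), (60, 150, 20, 40)]
for box, color in trajectory_colors(path):
    print(box, color)
```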
Tracker performance was important for following the falling person reliably. We therefore tried all eight trackers provided in OpenCV 3.4.8 and chose CSRT, the best-performing one. Several trackers, such as MedianFlow and KCF, lost objects while tracking or failed to follow fast-moving objects to the end, whereas CSRT kept hold of fast-moving objects without any problems.
Server
We implemented the server using open source [3] and open source [4]. The server manages everything in an integrated way: all communication passes through it, and all information exchange goes through it.
A thread is created for each communication flow and managed by the server. Using multithreading, each piece of information can be handled separately. Most of the heavy functions are also handled by the server, including RootNet, PoseNet, and PAR.
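The thread-per-connection structure described above can be sketched with Python's standard socket and threading modules. The real server dispatches work to RootNet, PoseNet, and PAR; in this minimal sketch each connection is simply acknowledged, and all names are hypothetical:

```python
import socket
import threading

def handle_client(conn):
    """One thread per communication flow: read a message and reply."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"ACK:" + data)   # placeholder for real processing

def serve_once(host="127.0.0.1", port=0):
    """Accept a single connection on an OS-chosen port (demo only)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    chosen_port = srv.getsockname()[1]

    def run():
        conn, _ = srv.accept()
        # Spawn a dedicated thread for this communication flow.
        threading.Thread(target=handle_client, args=(conn,)).start()
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return chosen_port

port = serve_once()
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"fall detected")
    reply = c.recv(1024)
print(reply)
```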
RootNet
The RootNet estimates the camera-centered coordinates of the human root, R = (xR, yR, zR), from a cropped human image. The estimated 2D image coordinates are back-projected into the camera-centered coordinate space using the estimated depth value, which becomes the final output.
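The back-projection step follows the standard pinhole-camera relation: given the estimated image coordinates (xR, yR), the estimated depth zR, and the camera intrinsics (focal lengths fx, fy and principal point cx, cy). The intrinsic values below are made up for illustration:

```python
def back_project(x_img, y_img, depth, fx, fy, cx, cy):
    """Back-project 2D image coordinates plus a depth value into
    camera-centered 3D coordinates (pinhole camera model)."""
    X = (x_img - cx) * depth / fx
    Y = (y_img - cy) * depth / fy
    Z = depth
    return (X, Y, Z)

# Hypothetical intrinsics; estimated root at pixel (640, 360), 5 m deep.
root_3d = back_project(640, 360, 5.0, fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)
print(root_3d)
```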
PoseNet
The PoseNet estimates the root-relative 3D pose from a cropped human image. The first part is the backbone, which extracts a global feature from the cropped human image using ResNet. The second part, pose estimation, takes the feature map from the backbone and upsamples it using three consecutive deconvolutional layers with batch normalization and ReLU activation. A 1-by-1 convolution is then applied to the upsampled feature map to produce a 3D heatmap for each joint.
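The final step turns each joint's 3D heatmap into a coordinate. A simple way to do that is a hard argmax over the heatmap volume, shown below with a plain nested-list heatmap; note that pose networks often use a differentiable soft-argmax instead, so this is only an illustrative sketch:

```python
def heatmap_argmax(heatmap):
    """Return the (z, y, x) index of the maximum of a 3D heatmap,
    i.e. the most likely joint position in the discretized volume."""
    best, best_idx = float("-inf"), (0, 0, 0)
    for z, plane in enumerate(heatmap):
        for y, row in enumerate(plane):
            for x, v in enumerate(row):
                if v > best:
                    best, best_idx = v, (z, y, x)
    return best_idx

# Tiny 2x2x2 heatmap with its peak at depth 1, row 0, column 1.
hm = [[[0.1, 0.2], [0.0, 0.1]],
      [[0.3, 0.9], [0.2, 0.1]]]
print(heatmap_argmax(hm))
```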
PAR
We implemented PAR-Pedestrian Attribute Recognition using open source [5].
Pedestrian attribute recognition predicts multiple attributes of pedestrian images, such as age, gender, and clothing, as semantic descriptions in video surveillance.
This component uses PAR, i.e., a pedestrian attribute extraction algorithm. The Python server manages the socket communication, runs the attribute prediction on the received image, and returns the result.
Currently, we make predictions for a total of 26 labels, select only the three with the highest confidence, and send them to the application. The attributes we send mainly concern gender, bags, clothes, shoes, etc.
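Selecting the three most confident of the 26 attribute predictions can be sketched as follows; the label names and scores below are made up for illustration:

```python
def top_attributes(scores, k=3):
    """Pick the k attribute labels with the highest predicted scores."""
    return [label for label, _ in
            sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]]

# Hypothetical subset of the 26 PAR labels with prediction scores.
scores = {"Female": 0.92, "Backpack": 0.81, "LongCoat": 0.77,
          "Boots": 0.40, "Hat": 0.12}
print(top_attributes(scores))  # the three most confident attributes
```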
Notice App
Notice
On the notice tab of the notice app, you can check the photo of the moment a person fell, information about the person, the picture of the path along which the person fell, and the location where the accident occurred. The notice tab is divided into three blocks.
First Block
In the first block, information from camera 1 is shown.
Camera 1's information is a picture of the moment a person fell; the application receives it from the server. In this block, you can see the picture of the moment the person fell from the bridge.
Second Block
In the second block, the user can check the information about the fallen person sent by the server.
This information is extracted by the server from the picture received from camera 1: roughly how old the person is, what they look like, and what they were wearing. The time of the fall is also shown.
Third Block
In the third block, the user can check the trajectory picture and the latitude and longitude of the accident point.
The trajectory picture that camera 2 sent to the server is passed on to the application and displayed, with the latitude and longitude shown under the picture. Pressing the '지도보기' (View Map) button at the top of the block shows the location corresponding to that latitude and longitude on Google Maps.
In addition, the user can clear all blocks by pressing the reset button in the first block, and can then receive new information.
Graph
The notice app's graph tab provides drowning statistics from the database.
Using Firebase, the application shows graphs containing accident information:
- a graph showing falls over the past six months
- a pie chart showing falls by bridge
What is used here is the Firebase Realtime Database. A Firebase project was created and connected to the Android devices (camera 1 and the notice app), and the Realtime Database is used to upload and download the data. When camera 1 uploads fall information to the Realtime Database, the notice app fetches it on its own. Camera 1 uploads the date, time, and bridge location to the database whenever a fall is detected. The notice app then takes the data, organizes it by date, and produces the two graphs from the last six months of information.
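Grouping the downloaded records by date for the six-month graph could look like the sketch below. The field names and records are hypothetical; the real data comes from the Firebase Realtime Database:

```python
from collections import Counter
from datetime import date

def monthly_fall_counts(records, today, months=6):
    """Count falls per (year, month) over the last `months` months.
    Each record is a dict with 'date' (a datetime.date) and 'bridge'."""
    latest = today.year * 12 + today.month - 1      # 0-based month index
    cutoff = latest - (months - 1)
    counts = Counter()
    for r in records:
        m = r["date"].year * 12 + r["date"].month - 1
        if cutoff <= m <= latest:
            counts[(r["date"].year, r["date"].month)] += 1
    return counts

# Hypothetical records pulled down from the database.
records = [{"date": date(2020, 9, 3), "bridge": "Mapo"},
           {"date": date(2020, 9, 20), "bridge": "Hangang"},
           {"date": date(2020, 5, 1), "bridge": "Mapo"},   # older than 6 months
           {"date": date(2020, 11, 2), "bridge": "Mapo"}]
counts = monthly_fall_counts(records, today=date(2020, 11, 30))
print(counts)
```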
Reference
[1] https://github.com/matteomedioli/AndroidObjectDetection-OpenCV
[2] https://github.com/longpth/Android-Object-Tracking-OpenCV
[4] https://github.com/valencebond/Strong_Baseline_of_Pedestrian_Attribute_Recognition