BKM: Use CSRnet to Count Crowded People - OpenVisualCloud/Smart-City-Sample GitHub Wiki

Problem Statement

It’s a common request to count how many people are in an area, indoor or outdoor, and then to understand their spatial distribution for more accurate and comprehensive information. This can be critical for making correct decisions in high-risk situations such as stampedes and riots. Among many DNN architectures, CSRnet is one that delivers state-of-the-art accuracy on crowd counting tasks.

Model Description

CSRnet uses VGG-16 as the front-end, whose output size is 1/8 of the original input size. Dilated convolution layers are then used as the back-end to extract deeper features and generate a heatmap of the same size as the front-end output. At the final stage, bilinear interpolation with a factor of 8 scales the output back to the resolution of the input.
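The final upsampling step can be illustrated with a minimal pure-Python sketch of bilinear interpolation. The function name and the list-of-lists representation are illustrative only, not taken from the CSRnet code; a real pipeline would use the framework's built-in interpolation.

```python
def bilinear_upsample(m, factor):
    """Upsample a 2D list `m` by an integer factor using bilinear interpolation.

    Illustrative sketch: CSRnet uses factor=8 to restore the back-end
    heatmap (1/8 resolution) to the input resolution.
    """
    h, w = len(m), len(m[0])
    H, W = h * factor, w * factor
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # map the output pixel back into source coordinates
            y = min(i / factor, h - 1)
            x = min(j / factor, w - 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # weighted average of the four surrounding source pixels
            out[i][j] = (m[y0][x0] * (1 - dy) * (1 - dx)
                         + m[y0][x1] * (1 - dy) * dx
                         + m[y1][x0] * dy * (1 - dx)
                         + m[y1][x1] * dy * dx)
    return out
```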

The above image shows CSRnet applied to three images from the ShanghaiTech dataset. The first row contains the original images; the second row contains the ground-truth density maps of the dataset; and the third row shows the CSRnet output heatmaps. Darker pixels indicate denser crowds.

To count the crowd, sum the values of the output matrix of the model.
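In code, this reduces to a single summation over the density map. A minimal sketch (the function name and the 2D-list representation of the model output are illustrative assumptions):

```python
def crowd_count(density_map):
    """Estimate the head count by summing the CSRnet density map.

    `density_map` is assumed to be a 2D list of per-pixel density
    values, as produced by the model's output matrix.
    """
    return sum(sum(row) for row in density_map)
```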

Convert Model to the IR Format

The author shares the training code at https://github.com/leeyeehoo/CSRNet-pytorch in the PyTorch format, but no pre-trained model weights are provided. Another implementation, in the Keras format, is available at https://github.com/Neerajj9/CSRNet-keras; it is also trained on the ShanghaiTech dataset, and we use its weight files in our project.

The Intel OpenVINO Model Optimizer (MO) tool can't convert the Keras model directly, so we first convert it to the TensorFlow format with the h5_to_pb.py Python script (note the input Keras model path in the script). Then use the MO tool to convert the TensorFlow model (.pb) to IR files as follows:

cd $(OPENVINO_PATH)/deployment_tools/model_optimizer
python3 mo.py --framework tf --input_model <path_to_pb> --input_shape [1,768,1024,3] --output_dir <output_path> --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]
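If the conversion is scripted, the invocation above can be assembled programmatically. This is a hedged sketch, not part of the project: the helper name and default paths are hypothetical, and only the flags shown in the command above are used.

```python
import shlex

def mo_command(pb_path, out_dir, h=768, w=1024):
    """Build the Model Optimizer command line shown above.

    `pb_path` and `out_dir` are placeholders supplied by the caller;
    mean/scale values are fixed by the model's training preprocessing.
    """
    return ("python3 mo.py --framework tf"
            f" --input_model {shlex.quote(pb_path)}"
            f" --input_shape [1,{h},{w},3]"
            f" --output_dir {shlex.quote(out_dir)}"
            " --mean_values [123.675,116.28,103.53]"
            " --scale_values [58.395,57.12,57.375]")
```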

The 2nd and 3rd numbers in [input_shape] are the input blob height and width. They can be changed to balance performance and accuracy: a higher resolution gives better accuracy but lowers the throughput of the inference pipeline.
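When choosing a different resolution, it is convenient to keep both dimensions multiples of 8, since the VGG-16 front-end downsamples by 8 and the final ×8 interpolation then restores the input resolution exactly. A small sketch (my own helper, an assumption rather than project code):

```python
def valid_input_shape(h, w, stride=8):
    """Round a desired height/width down to multiples of the network
    stride (8 for CSRnet's VGG-16 front-end), so the 1/8 feature map
    upsamples back to exactly the input size."""
    return (h // stride) * stride, (w // stride) * stride
```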

The [mean_values] and [scale_values] settings are determined by the training process. Because we did not train the model ourselves, these settings should not be changed.
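For reference, the normalization these flags bake into the IR is a simple per-channel (x - mean) / scale on RGB pixels; once embedded by MO, application code feeds raw pixel values and the IR applies it internally. A sketch of what is computed (the helper name is illustrative):

```python
# Per-channel RGB constants from the mo.py flags above
MEAN = [123.675, 116.28, 103.53]
SCALE = [58.395, 57.12, 57.375]

def normalize_pixel(rgb):
    """Apply the (x - mean) / scale preprocessing to one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, SCALE)]
```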

You can then find the converted IR model files in <output_path>.

Generate the INT8 Model

The calibration tool in Intel OpenVINO can convert FP32/FP16 models to INT8 to achieve higher performance. The calibration process has two modes; we choose the simple mode for a larger performance improvement. See also the calibration reference.

cd $(OPENVINO_PATH)/deployment_tools
python3 ./tools/calibrate.py -sm -m <path_to_FP32_IR> -s <path_test_images> -p INT8 -td CPU -e ./inference_engine/lib/intel64/libcpu_extension_avx512.so -o <output_path>