Deployment
A pivotal part of this project was the creation of a device capable of assisting search and rescue operations. For aid groups, charities and other non-governmental organisations to make use of such hardware, its cost would need to be as low as possible, which was one of the key reasons for selecting the Raspberry Pi 5 as the target device, while Hailo's Hailo-8L neural processor was an obvious choice for the accelerator given its cost and speed benefits. Two model formats were selected for deployment: OpenVINO, which can run on the Pi's ARM CPU, and the Hailo Executable File (HEF), which is required for the NPU.
Following training, all models were exported to OpenVINO using the following command:
model.export(format='openvino', int8=True, imgsz=imgsz, data=data_path, batch=1, device='cpu')
The int8 argument enabled quantization to an 8-bit integer format, which was needed for a fair comparison with the HEF models, since these must also be quantized to int8 in order to run on the NPU. Likewise, a batch size of one was selected since this was the expected input to the model from a video stream in real-world deployment.
Exporting to ONNX format followed a similar approach, but without the need to quantize (a sketch of the export call is shown below). The ONNX model could then be inspected using netron.app to identify the end nodes needed for parsing, followed by optimisation of the model (which included quantization to int8) before finally compiling it to HEF format, as detailed in the flowchart below from the Hailo Data Flow Compiler documentation.
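As an illustration only, the ONNX export call would have looked roughly as follows, assuming the same Ultralytics export API used for the OpenVINO export above (the weights path is a placeholder):
from ultralytics import YOLO

# Placeholder weights path; export without quantization, keeping the 640x640
# input resolution used later during parsing
model = YOLO('best.pt')
model.export(format='onnx', imgsz=640, batch=1, device='cpu')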
Hailo Software Suite
The Hailo Software Suite was used to parse the ONNX models, then optimize and compile them. Two methods exist for this: the Data Flow Compiler (DFC) and the Hailo Model Zoo. The Hailo Model Zoo is more user friendly in terms of setup and use, but using customized models with it can be trickier, so the Data Flow Compiler approach was taken. The recommended way to set up the DFC is via Hailo AI's Software Suite (Hailo-AI SW); version 2025.1 was used for this project, set up in a Docker container on an Intel-based Ubuntu 20.04 machine.
Within the Hailo-AI SW container there are a range of tutorials in the form of Jupyter notebooks; in the end these notebooks were adapted for the YOLOv11 and YOLOv5 models used and can be found in the code base.
The parsing step involved identifying end nodes; for example, the following code was used for the YOLOv11n-P2 model:
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(
    onnx_path,
    onnx_model_name,
    start_node_names=["/model.0/conv/Conv"],
    end_node_names=["/model.21/cv2.0/cv2.0.2/Conv",
                    "/model.21/cv3.0/cv3.0.2/Conv",
                    "/model.21/cv2.1/cv2.1.2/Conv",
                    "/model.21/cv3.1/cv3.1.2/Conv",
                    "/model.21/cv2.2/cv2.2.2/Conv",
                    "/model.21/cv3.2/cv3.2.2/Conv"],
    net_input_shapes={"/model.0/conv/Conv": [1, 3, 640, 640]},
)
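The parsed model could then be saved as a Hailo Archive (HAR) file for inspection, for example (the file name here is illustrative):
# Save the parsed model as a HAR file so its renamed layers can be inspected
runner.save_har(f"{onnx_model_name}_parsed.har")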
During optimization, the following model script example was used for the YOLOv11n-P2:
alls = """
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
change_output_activation(conv44, sigmoid)
change_output_activation(conv55, sigmoid)
change_output_activation(conv65, sigmoid)
model_optimization_config(calibration, batch_size=16, calibset_size=64)
nms_postprocess("/local/shared_with_docker/visdrone/postprocess_config/yolov11p2_nms_config_visdrone.json", meta_arch=yolov8, engine=cpu)
allocator_param(width_splitter_defuse=disabled)
"""
These output layers were again identified by inspecting the HAR file (which is the parsed model) and finding the output nodes, which will have been renamed (for example, to conv44). The same output nodes were added to the NMS configuration files, which need to be adapted for these customized models. Similarly, the NMS thresholds were adjusted since this was still the evaluation phase, with an NMS score threshold of $0.001$ and an IoU threshold of $0.6$. The NMS configs can be found in the repo, for example the YOLOv11-P2 config.
The DFC configuration provided above would then handle quantization, which required a calibration dataset and a dataset for post-quantization finetuning. The VisDrone-Humans training dataset was used for both, with the calibration data being a 256-image subset, while 3000 images were used for finetuning. Before compiling, the quantized HAR was evaluated using Faster COCO Eval on the validation dataset and, once complete, compiled to HEF by simply using:
runner = ClientRunner(har=quant_model_har_path, hw_arch=chosen_hw_arch)
hef_model_name = quant_model_har_path.replace(".har", ".hef")
print(f"compiling har model {quant_model_har_path}\nto {hef_model_name}\nhardware arch: {chosen_hw_arch}")
hef = runner.compile()
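The object returned by compile() is the serialized HEF, which can then be written to disk before being copied over to the Raspberry Pi (as done in the Hailo tutorial notebooks):
# Write the compiled HEF out to disk
with open(hef_model_name, "wb") as f:
    f.write(hef)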
Raspberry Pi
All OpenVINO and HEF models were then transferred to the Raspberry Pi's storage. HailoRT (the Hailo Runtime) was installed on the device, and the ‘HRT’ tutorials from the Hailo tutorial library were followed and later adapted for the compiled HEF models. The following dependencies were needed for the Hailo runtime code:
import os

# Import Hailo Runtime dependencies
from hailo_platform import (
    HEF,
    ConfigureParams,
    FormatType,
    HailoSchedulingAlgorithm,
    HailoStreamInterface,
    InferVStreams,
    InputVStreamParams,
    OutputVStreamParams,
    VDevice
)
Inference was run with the following setup:
# Load the compiled HEF to Hailo device
root_path = '/home/trap-fish/uav-human-detection/hailo-ai/shared_with_docker/visdrone/models/best/hef/'
model = 'yolov11s_p2.hef'
hef_path = os.path.join(root_path, model)
hef = HEF(str(hef_path))
# Set VDevice (Virtual Device) params to disable the HailoRT service feature
params = VDevice.create_params()
params.scheduling_algorithm = HailoSchedulingAlgorithm.NONE
# Create a Hailo virtual device with the specified parameters
target = VDevice(params=params)
# Get the "network groups" (connectivity groups, aka. "different networks") information from the .hef
# Configure the device with the HEF and PCIe interface
configure_params = ConfigureParams.create_from_hef(hef=hef, interface=HailoStreamInterface.PCIe)
network_groups = target.configure(hef, configure_params)
# Select the first network group (there's only one in this case)
network_group = network_groups[0]
network_group_params = network_group.create_params()
# Create input and output virtual streams params
# These specify the format of the input and output data (in this case, 32-bit float)
input_vstreams_params = InputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)
# Get information about the input and output virtual streams
input_vstream_info = hef.get_input_vstream_infos()[0]
output_vstream_info = hef.get_output_vstream_infos()[0]
def run_inference(vstream_params, input_data):
    network_group, input_vstreams_params, output_vstreams_params = vstream_params
    # input_data = {input_vstream_info.name: input_tensor}
    with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
        with network_group.activate(network_group_params):
            infer_results = infer_pipeline.infer(input_data)
    return infer_results
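For illustration, a single preprocessed frame could be pushed through this pipeline as follows (a sketch only; the image path is a placeholder and the 640x640 resolution matches the input shape used during parsing, with raw pixel values since normalization is applied on-device):
import cv2
import numpy as np

# Placeholder image path; load a frame and resize it to the network input size
frame = cv2.imread("example_frame.jpg")
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
resized = cv2.resize(frame, (640, 640)).astype(np.float32)

# infer() expects a dict keyed by the input vstream name, with a batch dimension
input_data = {input_vstream_info.name: np.expand_dims(resized, axis=0)}
vstream_params = (network_group, input_vstreams_params, output_vstreams_params)
results = run_inference(vstream_params, input_data)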
Using this setup, inference could be run on a video stream or directly from the Raspberry Pi's camera. However, since the example above was for evaluation, it was only used to stream the validation dataset so that the compiled models could be evaluated. During inference on the OpenVINO and HEF models, the power drawn by the Raspberry Pi was recorded using a USB power meter, as seen below. From this, the power consumption in Watts was logged over the course of each inference session and the average was taken as the result shown in the report.
Finally, all inference results were converted to COCO format for evaluation using Faster COCO Eval, from which the AP50 and AP50-95 metrics were extracted for the final report.
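As an illustration of this final step, a minimal sketch assuming Faster COCO Eval's pycocotools-style API, with hypothetical file names for the ground-truth and detection JSONs:
from faster_coco_eval import COCO, COCOeval_faster

# Hypothetical file names: COCO-format ground truth and converted detections
gt = COCO("visdrone_val_gt.json")
dt = gt.loadRes("hef_model_detections.json")

coco_eval = COCOeval_faster(gt, dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # reports AP50 and AP50-95 among other metrics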