Setup & Use of Pose Detection CNN (ZED2)
This guide covers the installation of TensorFlow on the Jetson Orin computer, with the goal of running the CNN pose detection algorithm developed by Frank Despond. It assumes you are running this algorithm on a relatively fresh Jetson Orin with JetPack 5.1.4-b17 and Ubuntu 20.04 LTS, and that you are familiar with using the remote SSH terminal to enter commands and edit Python code.
Compatibility
It is important to make sure you are using the right versions for this guide. You can check the JetPack version by running the command sudo apt-cache show nvidia-jetpack in the Jetson Orin terminal. You should see something like:
Package: nvidia-jetpack
Version: 5.1.4-b17
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1.4-b17), nvidia-jetpack-dev (= 5.1.4-b17)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.1.4-b17_arm64.deb
Size: 29298
SHA256: 5439dabb8d7a097c215602f7cd11773e4745c5e7d5841d9a0a2551a58b82883e
SHA1: b0c2faa6a9d14e056e394625a11a1a8ed8780327
MD5sum: 05fc4b73de35fd33a535fb887c1947ee
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
To check your Ubuntu version, run the command lsb_release -a and you should see something like this:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
Environment Preparation
For any neural network applications, you will need to make use of TensorFlow. The Jetsons have special builds of this library, so you can't install just any version. For the assumed configuration listed above, the following procedure can be used (see this link for the original steps from NVIDIA: https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html). This process requires you to set up a virtual environment; failure to do so may result in broader problems with the Jetson Orin computer.
- First, ensure you are in the working directory created in the previous guide. If you want to start from a fresh folder, create and navigate to one. I suggest:
cd Documents
mkdir ZED_CNN_Example
cd ZED_CNN_Example
python3.8 -m venv .venv
- Download the last working version of Frank's CNN code if you don't already have a copy: https://www.dropbox.com/scl/fi/i6b40xisvwf7xs8oxql32/FCNN_Release.zip?rlkey=415kz0x4yyld8cqe369fm2qkz&st=4y0ht05p&dl=0
- Extract the entire folder and place it in the working directory you created. You are now ready to set up TensorFlow!
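Later, once you have activated the virtual environment (shown in the next section), you can confirm that you are running its interpreter rather than the system Python. This is a minimal sketch; the example path in the comment assumes the ZED_CNN_Example folder suggested above:
import sys

# Both of these should point at the virtual environment, not the system Python
print(sys.executable)  # e.g. ~/Documents/ZED_CNN_Example/.venv/bin/python3
print(sys.version)     # should report Python 3.8.x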
Installing TensorFlow
- To begin, we need to install the following dependencies by running this command in the terminal:
sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
- Then, enter the virtual environment you created and upgrade pip3:
source ./.venv/bin/activate
python3 -m pip install --upgrade pip
- Next, install the main Python dependencies:
pip3 install -U testresources setuptools==65.5.0
pip3 install -U numpy==1.22 future==0.18.2 mock==3.0.5 keras_preprocessing==1.1.2 keras_applications==1.0.8 gast==0.4.0 protobuf pybind11 cython pkgconfig packaging h5py==3.7.0
pip3 install opencv-python
pip3 install matplotlib
- And finally, install TensorFlow:
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v512 tensorflow==2.12.0+nv23.06
While the command above will work for the JetPack version listed in the compatibility section, you will need to modify it if you deviate from these assumptions in any way. Refer to the official NVIDIA guide to understand how to change the command.
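To confirm that the installation worked before moving on, you can run a quick check from inside the activated virtual environment. This is a minimal sketch; the expected version strings in the comments assume the exact pins used in the commands above:
import numpy
import h5py
import tensorflow as tf

print(numpy.__version__)   # expect 1.22.x, as pinned above
print(h5py.__version__)    # expect 3.7.0, as pinned above
print(tf.__version__)      # expect 2.12.0 for the build installed above
# On a correctly configured Orin this should list at least one GPU device
print(tf.config.list_physical_devices('GPU'))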
Running the CNN
Before we can successfully run the CNN, some changes need to be made to the code. First, navigate into Vision_code and use the file SPOTNet_Combined_Experiment_Alex.py as your starting point. This code starts up the ZED camera, prints the pose and inference time, and sends the pose and confidence value over UDP to a specified address and port. The code looks like this:
import numpy as np
import cv2
from tensorflow import keras
import tensorflow as tf
import sys
import time
import SPOTPoseNet
import os
import CustomVideoDataGenerator
import CustomImageDataGenerator
import matplotlib.pyplot as plt
import math
from datetime import datetime
import socket
import recordStereo
import struct
import signal
should_exit = False
outputYawSize = 128
image_size = 320
stop_count = 500
x_median = 1.46221606
y_median = -0.05725426

# save_dir = os.path.join(os.path.expanduser('~/sdcard'),'Combined_Experiment_Saved_ZED_Images')
# if not os.path.exists(save_dir):
#     os.makedirs(save_dir)

image_buffer = []

# Ctrl+C sets a flag so the main loop can exit cleanly
def signal_handler(sig, frame):
    global should_exit
    print("Images saved. Exiting")
    should_exit = True

##################################################
####          Load SPOTNet Weights           #####
##################################################
print(tf.__version__)
keras.backend.clear_session()
path_to_weights = 'final_models/Final_Trained_Experiment_June_Small_3202021-06-13-08_15.h5' #sys.argv[1]
print("loading with leaky relu")
model = keras.models.load_model(path_to_weights, custom_objects={'LeakyReLU': tf.keras.layers.LeakyReLU,
                                                                 'custom_loss': SPOTPoseNet.custom_loss,
                                                                 'custom_metric': SPOTPoseNet.custom_metric})
model.summary()

zedCamera = recordStereo.StereoCamera(8, 'MJPG', 30.0, 2560, 720)
zedCamera.initImageRecord()

left = np.zeros(shape=(1, image_size, image_size, 1))
right = np.zeros(shape=(1, image_size, image_size, 1))

frame_counter = 0
signal.signal(signal.SIGINT, signal_handler)

while True:
    if should_exit:
        break

    ##################################################
    ####              Run SPOTNet                #####
    ##################################################
    start = time.time()
    image, timestamp = zedCamera.getImageAndTimeStamp()
    if len(image.shape) == 2:
        continue
    # frame_filename = f"{save_dir}/frame_{frame_counter:06d}.png"
    # cv2.imwrite(frame_filename, image)
    frame_counter += 1

    # Split the side-by-side stereo frame into left/right halves and
    # resize each to the network's input size
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    stereoFrame = np.split(image, 2, axis=1)
    left[:, :, :, 0] = cv2.resize(stereoFrame[0], (image_size, image_size))
    right[:, :, :, 0] = cv2.resize(stereoFrame[1], (image_size, image_size))
    stacked_stereo = np.concatenate((left, right), axis=3)
    output = model.predict(stacked_stereo)
    end = time.time()

    # Find the most confident yaw bin and convert it to an angle in (-pi, pi]
    maxIdxVal = 0
    maxIdx = 0
    outputIndexs = output[2]
    for yawIdx in range(0, outputYawSize):
        if float(outputIndexs[0][yawIdx]) > maxIdxVal:
            maxIdxVal = float(outputIndexs[0][yawIdx])
            maxIdx = yawIdx
    yaw = ((maxIdx + 1) / outputYawSize) * 2 * math.pi - math.pi
    x = float(output[0] + x_median)
    y = float(output[1] + y_median)

    print("Network Output")
    print("X: " + str(abs(x)) + " Y: " + str(y) + " Yaw: " + str(yaw) + " Confidence: " + str(float(output[3])))
    print("\nInference Speed: ", str(end - start))

    try:
        # Create a UDP socket
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Define the address and port to send to
        server_address = ('192.168.1.110', 30172)
        # Encode the data as a byte string
        data = bytearray(struct.pack("ffff", x, y, yaw, float(output[3])))
        # Send the data
        sock.sendto(data, server_address)
    except:
        print("Failed to Send Packet")

    # try: # Try to receive data
    #     out_data = "SPOTNet\n" + str(x) + "\n" + str(y) + "\n" + str(yaw) + "\n" + str(float(output[3])) + "\n"
    #     client_socket.send(out_data.encode())
    # except: # If receive fails, we have lost communication with the JetsonRepeater
    #     print("Lost communication with JetsonRepeater")
    #     connected = False
    #     continue # Restart from while True

print("\n\nCompleted\n\n")
sys.exit(0)
Here are the most important parts of the code. The path_to_weights variable indicates where the model weights are located. In this example, they are hard-coded here:
path_to_weights = 'final_models/Final_Trained_Experiment_June_Small_3202021-06-13-08_15.h5' #sys.argv[1]
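The commented-out sys.argv[1] hints that the path can also be supplied on the command line. If you would rather not hard-code it, a small sketch of that approach (the fallback path is the one from the script):
import sys

# Use a weights path from the command line if one was given,
# otherwise fall back to the hard-coded default from the script
if len(sys.argv) > 1:
    path_to_weights = sys.argv[1]
else:
    path_to_weights = 'final_models/Final_Trained_Experiment_June_Small_3202021-06-13-08_15.h5'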
The zedCamera object initializes the camera:
zedCamera = recordStereo.StereoCamera(8, 'MJPG', 30.0, 2560, 720)
The first input is the camera device index, and this likely needs to be changed for your setup. To check which device the ZED is currently using, enter this command into the terminal:
v4l2-ctl --list-devices
You should see an output like this:
NVIDIA Tegra Video Input Device (platform:tegra-camrtc-ca):
/dev/media0
Intel(R) RealSense(TM) 515: Int (usb-3610000.xhci-1.1):
/dev/video0
/dev/video1
/dev/video2
/dev/video3
/dev/video4
/dev/video5
/dev/video6
/dev/video7
/dev/media1
/dev/media2
ZED: ZED (usb-3610000.xhci-1.3):
/dev/video8
/dev/video9
/dev/media3
You'll want to find the ZED entry. The number you need in this example is 8, from the /dev/video8 entry, so the default first argument in the script is already correct.
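For reference, the remaining StereoCamera arguments appear to correspond to the codec, frame rate, and combined side-by-side frame size. The sketch below illustrates roughly what such a capture looks like in plain OpenCV; this is an assumption for illustration only (including the mapping of arguments to capture properties), and the project's recordStereo module remains the actual implementation:
import cv2
import numpy as np

# Illustration only: open the ZED as a single V4L2 device that delivers
# both sensors in one side-by-side frame
cap = cv2.VideoCapture(8)  # device index found with v4l2-ctl above
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG'))
cap.set(cv2.CAP_PROP_FPS, 30.0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 2560)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

ok, frame = cap.read()
if ok:
    left, right = np.split(frame, 2, axis=1)  # two 1280x720 halves
    print(left.shape, right.shape)
cap.release()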
If everything was set up correctly, you are now ready to run the code! To run it, simply type:
python3 SPOTNet_Combined_Experiment_Alex.py
After a short wait, you should see output like this streaming continuously:
Inference Speed: 0.13782334327697754
getImageAndTimeStamp
1/1 [==============================] - 0s 30ms/step
Network Output
X: 0.8505134582519531 Y: -0.013773031532764435 Yaw: -1.227184630308513 Confidence: 0.0003084400959778577
Inference Speed: 0.1278059482574463
getImageAndTimeStamp
1/1 [==============================] - 0s 31ms/step
Network Output
X: 0.8540260791778564 Y: -0.01333414763212204 Yaw: -1.227184630308513 Confidence: 0.0003360812843311578
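To confirm that the pose packets are also arriving at the receiving machine, a minimal UDP listener can be used. This is a sketch, assuming the port and the four-float packet format from the script above (struct.pack uses native byte order, so this also assumes both machines are little-endian, which is true for the Orin and typical x86 PCs):
import socket
import struct

# Listen on the port the Jetson script sends to (30172 in the example above)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 30172))

while True:
    data, addr = sock.recvfrom(1024)
    # Each packet is four 32-bit floats: x, y, yaw, confidence
    x, y, yaw, confidence = struct.unpack("ffff", data)
    print(f"{addr[0]}: X={x:.3f} Y={y:.3f} Yaw={yaw:.3f} Conf={confidence:.6f}")
If no packets arrive, check that the receiving machine actually has the address hard-coded in the script (192.168.1.110) and that both machines are on the same network.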