Robot Vision Concepts
Objective
The objective of this document is to lay out basic Robot Vision concepts.
Introduction
Robot Vision is a tool that can significantly increase the capabilities of a robot. At its core, robot vision is about using cameras as sensors that enable a robot to perceive more about its environment. Cameras can be used to identify objects, track objects, calculate distances, identify friends and foes, etc.
There are essentially two flavors of robot vision: Computer Vision proper and Augmented Reality. Computer Vision is about processing the video feeds and sending information to the robot that it can act upon in order to achieve a task, e.g. "target is 20 ft away, bearing 20 degrees". Augmented Reality is more focused on sending information to the operator, often by overlaying it on a live video feed, which can greatly enhance operator capabilities. Both are very useful.
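For example, the Computer Vision side might turn a detected target's pixel position into a bearing that the robot code can act on. The snippet below is a minimal sketch, assuming a 640-pixel-wide image and a 60-degree horizontal field of view (both values are illustrative assumptions, not figures from this document):

```python
# Rough sketch (not team code): estimating the bearing to a target from its
# pixel position, assuming a camera with a known horizontal field of view.
import math

IMAGE_WIDTH_PX = 640          # assumed capture resolution
HORIZONTAL_FOV_DEG = 60.0     # assumed camera field of view

def bearing_to_target(target_center_x_px):
    """Return the bearing in degrees (negative = left of center)."""
    # Offset of the target from the image center, in pixels.
    offset_px = target_center_x_px - IMAGE_WIDTH_PX / 2.0
    # Pinhole-camera relationship: offset = focal_length_px * tan(bearing).
    focal_length_px = (IMAGE_WIDTH_PX / 2.0) / math.tan(math.radians(HORIZONTAL_FOV_DEG / 2.0))
    return math.degrees(math.atan2(offset_px, focal_length_px))

if __name__ == "__main__":
    # A target centered at x = 480 in a 640-px-wide image sits to the right of center.
    print(round(bearing_to_target(480), 1), "degrees")
```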
Conceptual System Architecture
The key components of any robot vision system are:
- Camera - device that captures images
- Frame Grabber - component that grabs an image frame from a camera
- Image Processor - component that processes image data to extract information or modify the image
- Encoder - component that encodes a series of images into a video stream for file storage or transmission
- Streaming Server - component that transmits video streams to a viewer and/or other components
- Viewer - component that decodes and displays the content of a video stream on a screen
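A minimal sketch of how these pieces fit together, using OpenCV in Python (the camera index, edge-detection step and JPEG quality are illustrative assumptions, not a prescribed design):

```python
# Minimal sketch of the components listed above, using OpenCV.
import cv2

camera = cv2.VideoCapture(0)          # Camera: device that captures images
ok, frame = camera.read()             # Frame Grabber: pull one frame from the camera
if ok:
    # Image Processor: extract information or modify the image (edge detection here)
    edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 50, 150)
    # Encoder: compress the processed frame (JPEG) for storage or transmission
    ok, jpeg_bytes = cv2.imencode(".jpg", edges, [cv2.IMWRITE_JPEG_QUALITY, 60])
    # A Streaming Server would send jpeg_bytes to a Viewer on the driver station;
    # here we just write the frame to disk.
    if ok:
        with open("frame.jpg", "wb") as f:
            f.write(jpeg_bytes.tobytes())
camera.release()
```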
Image processing and encoding can be performed either on the robot or on the driver station, and there are many possible topologies.
There are several constraining factors that influence the topology selection:
- Latency - it is best to minimize the latency between video capture and the output of video processing, especially in robotics, where real-time feedback can be critical for the human operator or for control loops.
- Bandwidth - there is limited bandwidth allocated to each robot during a match. Concurrently streaming multiple feeds to multiple clients is probably not realistic.
- Processing Power - on-board computing power is limited, so on-board processing should be minimized whenever possible.
Given these constraints, a topology that minimizes the number of video streams and the number of processes running on the on-board processors should be favored. Minimizing components and processes also helps to minimize latency, along with careful selection of encoding format, resolution, video quality and processing tasks.
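As a rough illustration of the bandwidth constraint, the back-of-the-envelope estimate below shows why resolution, frame rate and compression choices matter (the per-frame size and the 7 Mbit/s allocation figure are assumptions; check the FMS whitepaper for the actual per-team limit):

```python
# Back-of-the-envelope bandwidth estimate for an MJPEG stream.
RESOLUTION = (320, 240)        # pixels (width, height)
FPS = 15                       # frames per second
BYTES_PER_JPEG_FRAME = 15_000  # assumed average compressed frame size

bits_per_second = BYTES_PER_JPEG_FRAME * 8 * FPS
print(f"~{bits_per_second / 1e6:.1f} Mbit/s for one "
      f"{RESOLUTION[0]}x{RESOLUTION[1]} stream at {FPS} fps")
# With an assumed 7 Mbit/s allocation, even one such stream uses a noticeable
# share, and several concurrent streams would quickly exhaust it.
```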
Products
There are several good products that can be used:
- OpenCV - excellent and well-documented computer vision library
- mjpg-streamer - a lightweight encoding and streaming server
- FFmpeg - an encoding and decoding framework, compatible with OpenCV. It can do limited streaming
- ffserver - a streaming server able to support multiple concurrent feeds, streams and clients
- GStreamer - an advanced library for building media-handling components
Streaming video
- The many ways to stream video using RTP and RTSP explains the preferred approaches to streaming live video
- Several online sources explain that UDP results in lower transmission latency than other transport protocols
- The FMS whitepaper describes the per-team bandwidth allocation limit and the bandwidth requirements of video streaming
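As a concrete sketch of the low-latency UDP idea, the snippet below pushes JPEG-encoded frames over a plain UDP socket with OpenCV. The destination address, port and quality setting are illustrative assumptions, and a real stream would normally use RTP framing rather than raw datagrams:

```python
# Minimal sketch: send JPEG-encoded frames over UDP with OpenCV and the
# standard socket library. Frames larger than one datagram would need
# additional framing/reassembly logic in a real system.
import socket
import cv2

DEST = ("10.1.16.5", 5800)     # hypothetical driver-station address and port
MAX_DATAGRAM = 65000           # keep each packet under the UDP datagram limit

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
camera = cv2.VideoCapture(0)

while True:
    ok, frame = camera.read()
    if not ok:
        break
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 50])
    if ok:
        data = jpeg.tobytes()
        if len(data) <= MAX_DATAGRAM:
            # UDP: no handshakes or retransmission, which keeps latency low but
            # means a lost packet is simply a dropped frame.
            sock.sendto(data, DEST)

camera.release()
```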
Simple Option
A simple option, used by team 0116 and some other teams on The Blue Alliance, is to use mjpg-streamer.
In this option, mjpg-streamer is configured to use the OpenCV input plug-in, which allows the addition of an OpenCV filter to implement image processing code. Possible uses include implementing augmented reality by overlaying information that is useful to the operator, transforming the image to another view (like edge detection) and feeding it into another process, or sending information to the robot. The output can be streamed in a few formats or captured to a file.
This option works well when the image processing is performed on the same machine as the robot code (e.g. on the roboRIO) and uses standard cameras, like a USB or IP camera. It does not allow for non-standard video capture drivers, like the ZED drivers. If the image processing is performed on a separate machine (e.g. on a BeagleBoard, Tegra, etc.), then one needs to solve the discovery and communication problem between processes running on separate machines.
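To give a feel for the kind of filter code this option can host, here is a rough OpenCV sketch that overlays operator-facing information on each frame. The function name and overlay contents are placeholders, not the actual mjpg-streamer plug-in interface:

```python
# Rough sketch of an augmented-reality style filter applied to each frame.
import cv2

def overlay_filter(frame):
    """Draw a simple operator-facing overlay on a BGR frame and return it."""
    height, width = frame.shape[:2]
    center = (width // 2, height // 2)
    # Crosshair at the image center to help the operator line up on a target.
    cv2.line(frame, (center[0] - 20, center[1]), (center[0] + 20, center[1]), (0, 255, 0), 2)
    cv2.line(frame, (center[0], center[1] - 20), (center[0], center[1] + 20), (0, 255, 0), 2)
    # Status text; a real filter might print a computed distance or bearing here.
    cv2.putText(frame, "TARGET: no lock", (10, 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    return frame
```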
Custom Option
An ideal solution would enable additional features, such as:
- discovery of the robot
- configuration
- communication with the console
- flexibility to leverage complex drivers for stereoscopic cameras
These features are more easily implemented in a custom option. The existing configuration and WebSockets framework can be used to configure the robot process and handle the inter-process communication. The console can be used to adjust image capture, processing and streaming settings.
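As a hedged sketch of how such configuration might flow, the snippet below uses the third-party Python websockets package to accept JSON settings messages from a console. The message format, settings keys and port are assumptions, not this team's actual framework:

```python
# Sketch of a vision process accepting configuration messages over a WebSocket.
import asyncio
import json
import websockets

# Assumed settings dictionary; a real process would apply these to the camera,
# image processor and streamer.
settings = {"resolution": [320, 240], "fps": 15, "jpeg_quality": 50}

async def handle_console(websocket, path=None):
    # "path" is unused; kept for compatibility with older websockets versions.
    async for message in websocket:
        # The console sends JSON such as {"jpeg_quality": 40}; apply known keys.
        update = json.loads(message)
        for key, value in update.items():
            if key in settings:
                settings[key] = value
        await websocket.send(json.dumps(settings))  # echo the active settings back

async def main():
    # Listen on all interfaces; the port is an arbitrary example.
    async with websockets.serve(handle_console, "0.0.0.0", 5805):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```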
Things to think about
- RobotPy opencv31 build for RoboRio announcement
- RobotPy opkg feed which contains an opencv31 build that can be deployed to the RoboRio
- Negligible latency using OpenCV and UDP