Robot Vision Concepts
Objective
The objective of this document is to lay out basic Robot Vision concepts.
Introduction
Robot Vision is a tool that can significantly increase the capabilities of a robot. At its core, robot vision is about using cameras as sensors that enable a robot to perceive more about its environment. Cameras can be used to identify objects, track objects, calculate distances, identify friends and foes, etc.
There are essentially two flavors of robot vision: Computer Vision proper and Augmented Reality. Computer Vision is about processing the video feeds and sending information to the robot that it can act upon in order to achieve a task, e.g. "target is 20 ft away, bearing 20 degrees". Augmented Reality is more focused on sending information to the operator, often by overlaying it on a live video feed, which can greatly enhance operator capabilities. Both are very useful.
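For example, the Computer Vision side might turn a detected target's pixel position into a bearing that the robot code can act on. The snippet below is a minimal sketch, assuming a 640-pixel-wide image and a 60-degree horizontal field of view (both values are illustrative assumptions, not figures from this document):

```python
# Rough sketch (not team code): estimating the bearing to a target from its
# pixel position, assuming a camera with a known horizontal field of view.
import math

IMAGE_WIDTH_PX = 640          # assumed capture resolution
HORIZONTAL_FOV_DEG = 60.0     # assumed camera field of view

def bearing_to_target(target_center_x_px):
    """Return the bearing in degrees (negative = left of center)."""
    # Offset of the target from the image center, in pixels.
    offset_px = target_center_x_px - IMAGE_WIDTH_PX / 2.0
    # Pinhole-camera relationship: offset = focal_length_px * tan(bearing).
    focal_length_px = (IMAGE_WIDTH_PX / 2.0) / math.tan(math.radians(HORIZONTAL_FOV_DEG / 2.0))
    return math.degrees(math.atan2(offset_px, focal_length_px))

if __name__ == "__main__":
    # A target centered at x = 480 in a 640-px-wide image sits to the right of center.
    print(round(bearing_to_target(480), 1), "degrees")
```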
Conceptual System Architecture
The key components of any robot vision system are:
- Camera - device that captures images
- Frame Grabber - component that grabs an image frame from a camera
- Image Processor - component that processes image data to extract information or modify the image
- Encoder - component that encodes a series of images into a video stream for file storage or transmission
- Streaming Server - component that transmits video streams to a viewer and/or other components
- Viewer - component that decodes and displays the content of a video stream on a screen
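A minimal sketch of how these pieces fit together, using OpenCV in Python (the camera index, edge-detection step and JPEG quality are illustrative assumptions, not a prescribed design):

```python
# Minimal sketch of the components listed above, using OpenCV.
import cv2

camera = cv2.VideoCapture(0)          # Camera: device that captures images
ok, frame = camera.read()             # Frame Grabber: pull one frame from the camera
if ok:
    # Image Processor: extract information or modify the image (edge detection here)
    edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 50, 150)
    # Encoder: compress the processed frame (JPEG) for storage or transmission
    ok, jpeg_bytes = cv2.imencode(".jpg", edges, [cv2.IMWRITE_JPEG_QUALITY, 60])
    # A Streaming Server would send jpeg_bytes to a Viewer on the driver station;
    # here we just write the frame to disk.
    if ok:
        with open("frame.jpg", "wb") as f:
            f.write(jpeg_bytes.tobytes())
camera.release()
```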
Image processing and encoding can be performed either on the robot or on the driver station, and there are many possible topologies.
There are several constraining factors that influence the topology selection:
- Latency - it is best to minimize the latency between video capture and the output of video processing, especially in robotics, where real-time feedback can be critical for the human operator or for control loops.
- Bandwidth - there is limited bandwidth allocated to each robot during a match. Concurrently streaming multiple feeds to multiple clients is probably not realistic.
- Processing Power - on-board computing power is limited, so on-board processing should be minimized whenever possible.
Given these constraints, a topology that minimizes the number of video streams and the number of processes running on the on-board processors should be favored. Minimizing components and processes also helps to minimize latency, along with careful selection of encoding format, resolution, video quality and processing tasks.
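As a rough illustration of the bandwidth constraint, the back-of-the-envelope estimate below shows why resolution, frame rate and compression choices matter (the per-frame size and the 7 Mbit/s allocation figure are assumptions; check the FMS whitepaper for the actual per-team limit):

```python
# Back-of-the-envelope bandwidth estimate for an MJPEG stream.
RESOLUTION = (320, 240)        # pixels (width, height)
FPS = 15                       # frames per second
BYTES_PER_JPEG_FRAME = 15_000  # assumed average compressed frame size

bits_per_second = BYTES_PER_JPEG_FRAME * 8 * FPS
print(f"~{bits_per_second / 1e6:.1f} Mbit/s for one "
      f"{RESOLUTION[0]}x{RESOLUTION[1]} stream at {FPS} fps")
# With an assumed 7 Mbit/s allocation, even one such stream uses a noticeable
# share, and several concurrent streams would quickly exhaust it.
```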
Products
There are several good products that can be used:
- OpenCV - excellent and well-documented computer vision library
- mjpg-streamer - a lightweight encoding and streaming server
- FFmpeg - an encoding and decoding framework, compatible with OpenCV. It can do limited streaming
- ffserver - a streaming server able to support multiple concurrent feeds, streams and clients
- GStreamer - an advanced library for building media-handling components
Streaming video
- The many ways to stream video using RTP and RTSP explains the preferred approaches to streaming live video
- Several online sources explain that UDP results in lower transmission latency than other transport protocols
- The FMS whitepaper describes the per-team bandwidth allocation limit and the bandwidth requirements of video streaming
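As a concrete sketch of the low-latency UDP idea, the snippet below pushes JPEG-encoded frames over a plain UDP socket with OpenCV. The destination address, port and quality setting are illustrative assumptions, and a real stream would normally use RTP framing rather than raw datagrams:

```python
# Minimal sketch: send JPEG-encoded frames over UDP with OpenCV and the
# standard socket library. Frames larger than one datagram would need
# additional framing/reassembly logic in a real system.
import socket
import cv2

DEST = ("10.1.16.5", 5800)     # hypothetical driver-station address and port
MAX_DATAGRAM = 65000           # keep each packet under the UDP datagram limit

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
camera = cv2.VideoCapture(0)

while True:
    ok, frame = camera.read()
    if not ok:
        break
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 50])
    if ok:
        data = jpeg.tobytes()
        if len(data) <= MAX_DATAGRAM:
            # UDP: no handshakes or retransmission, which keeps latency low but
            # means a lost packet is simply a dropped frame.
            sock.sendto(data, DEST)

camera.release()
```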
Simple Option
A simple option, used by team 0116 and some other teams on The Blue Alliance, is to use mjpg-streamer.
In this option, mjpg-streamer is configured to use the OpenCV input plug-in, which allows the addition of an OpenCV filter to implement image processing code. Possible uses include implementing augmented reality by overlaying information that is useful to the operator, transforming the image to another view (like edge detection) and feeding it into another process, or sending information to the robot. The output can be streamed in a few formats or captured to a file.
This option works well when the image processing is performed on the same machine as the robot code (e.g. on the roboRIO) and uses standard cameras, like a USB or IP camera. It does not allow for non-standard video capture drivers, like the ZED drivers. If the image processing is performed on a separate machine (e.g. on a BeagleBoard, Tegra, etc.), then one needs to solve the discovery and communication problem between processes running on separate machines.
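To give a feel for the kind of filter code this option can host, here is a rough OpenCV sketch that overlays operator-facing information on each frame. The function name and overlay contents are placeholders, not the actual mjpg-streamer plug-in interface:

```python
# Rough sketch of an augmented-reality style filter applied to each frame.
import cv2

def overlay_filter(frame):
    """Draw a simple operator-facing overlay on a BGR frame and return it."""
    height, width = frame.shape[:2]
    center = (width // 2, height // 2)
    # Crosshair at the image center to help the operator line up on a target.
    cv2.line(frame, (center[0] - 20, center[1]), (center[0] + 20, center[1]), (0, 255, 0), 2)
    cv2.line(frame, (center[0], center[1] - 20), (center[0], center[1] + 20), (0, 255, 0), 2)
    # Status text; a real filter might print a computed distance or bearing here.
    cv2.putText(frame, "TARGET: no lock", (10, 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    return frame
```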
Custom Option
An ideal solution would enable additional features, such as:
- discovery of the robot
- configuration
- communication with the console
- flexibility to leverage complex drivers for stereoscopic cameras
These features are more easily implemented in a custom option. The existing configuration and WebSockets framework can be used to configure the robot process and handle the inter-process communication. The console can be used to adjust image capture, processing and streaming settings.
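As a hedged sketch of how such configuration might flow, the snippet below uses the third-party Python websockets package to accept JSON settings messages from a console. The message format, settings keys and port are assumptions, not this team's actual framework:

```python
# Sketch of a vision process accepting configuration messages over a WebSocket.
import asyncio
import json
import websockets

# Assumed settings dictionary; a real process would apply these to the camera,
# image processor and streamer.
settings = {"resolution": [320, 240], "fps": 15, "jpeg_quality": 50}

async def handle_console(websocket, path=None):
    # "path" is unused; kept for compatibility with older websockets versions.
    async for message in websocket:
        # The console sends JSON such as {"jpeg_quality": 40}; apply known keys.
        update = json.loads(message)
        for key, value in update.items():
            if key in settings:
                settings[key] = value
        await websocket.send(json.dumps(settings))  # echo the active settings back

async def main():
    # Listen on all interfaces; the port is an arbitrary example.
    async with websockets.serve(handle_console, "0.0.0.0", 5805):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```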
Things to think about
- RobotPy opencv31 build for RoboRio announcement
- RobotPy opkg feed which contains an opencv31 build that can be deployed to the RoboRio
- Negligible latency using OpenCV and UDP