Localization - umrover/mrover-ros GitHub Wiki

What is Localization?

In simple terms, localization is figuring out where the robot is. For our rover, we need to know its position in 3D space as well as its rotation about each axis. Together, this information is known as a pose (see this page). We need this data at all times, and it must be updated frequently enough to keep up with the rover's motion. We typically estimate it by measuring with a variety of sensors and then processing that sensor data with fusion and filtering algorithms to make it more accurate.

Sensors

GPS

GPS (Global Positioning System) is a system that uses signals from an array of satellites orbiting earth to figure out where you are on the planet (read more about how this works here). We use a GPS unit mounted to the rover, which outputs our position on the earth in degrees latitude and degrees longitude.

TODO: copy/adapt David's RTK wiki page (and any other useful pages) from the old mrover workspace into here

GPS Driver

We use an ArduSimple RTK2B Budget GPS receiver. Once configured, this receiver outputs NMEA messages (full reference here) over USB, which contain all of the GPS data we need for our autonomy system. In order to get them from serial messages on a USB port to ROS messages sent to our autonomy nodes, we need a GPS driver.

Fortunately, there is already a ROS package that handles this for us, called nmea_navsat_driver. This package can be configured to read from a certain USB port at a certain baud rate, and to accept messages that produce either estimated position covariance or estimated velocity. These parameters are configured in config/esw.yaml. The driver then publishes NavSatFix messages to the /fix topic, which contain the information we want.
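To make the driver's job more concrete, here is a minimal sketch of how latitude, longitude, and altitude can be pulled out of a single NMEA GGA sentence. This is not the code of nmea_navsat_driver; the function names are hypothetical, the example sentence is a commonly quoted sample rather than output from our receiver, and checksum validation is deliberately left out.

```python
def nmea_to_decimal_degrees(value: str, hemisphere: str) -> float:
    """Convert an NMEA (d)ddmm.mmmm field to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # remaining digits are minutes
    decimal = degrees + minutes / 60.0
    # South and west are negative by convention.
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence: str) -> dict:
    """Extract latitude, longitude, and altitude from a GGA sentence."""
    # Drop the leading '$' and the trailing '*checksum' (not validated here).
    body = sentence.strip().lstrip("$").split("*")[0]
    fields = body.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    return {
        "latitude": nmea_to_decimal_degrees(fields[2], fields[3]),
        "longitude": nmea_to_decimal_degrees(fields[4], fields[5]),
        "altitude": float(fields[9]),  # meters above mean sea level
    }


# Sample sentence for illustration only (not recorded from the rover).
example = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
fix = parse_gga(example)
```

The resulting values correspond to the latitude, longitude, and altitude fields of a NavSatFix message; the real driver also fills in status and covariance information.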

IMU

An IMU (Inertial Measurement Unit) is a device that contains several sensors that are used in combination to measure orientation and movement. A 9DoF (Degree of Freedom) IMU like the one we use consists of:

A 3 axis gyroscope, which can measure angular velocity in each axis
A 3 axis accelerometer, which can measure linear acceleration in each axis
A 3 axis magnetometer, which measures the magnetic field along each axis and essentially acts as a compass pointing toward magnetic north

All of this data is then combined using sensor fusion algorithms (more on this later) in order to produce a 3D orientation estimate.

IMU Driver

We use an Adafruit BNO055 IMU. It is centered around the Bosch BNO055 chip, which is what contains all of the sensors. This IMU has onboard sensor fusion capabilities, as it can (allegedly) produce a globally accurate 3D orientation measurement using its own onboard black box sensor fusion algorithms. It can provide a wide range of data over I2C and UART.

In order to access the data it measures, both the raw sensor readings and the filtered orientation estimate, we need to read them from the sensor into our computer. Since the only good library for this sensor only supports I2C, and ordinary computers generally can't read I2C directly, we need an intermediary processor to read it for us and convert the data to serial data that the computer can actually read. For this we are using an Arduino Nano Every microcontroller. We have an Arduino sketch kept in the embedded-testbench repository which uses the Adafruit BNO055 Unified Sensor Library to read the IMU data over I2C and then publish it over a serial connection. The Arduino is then plugged into our main Jetson computer over USB so the data is accessible on a serial port.

To get this data from serial to the ROS network, we have an IMU driver node. This node reads the IMU data over serial and then publishes it as several standard ROS messages, which other nodes can then subscribe to. The IMU driver node is configured in config/esw.yaml. The specific data published includes:

IMU: Imu messages on the /imu/data topic
Magnetometer: MagneticField messages on the /imu/magnetometer topic
IMU Temperature: Temperature messages on the /imu/temp topic
IMU Calibration: custom CalibrationStatus messages on the /imu/calibration topic

For information on the design process and decisions behind the IMU driver, read this discussion.
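The orientation in an Imu message is stored as a quaternion (x, y, z, w), which is not very readable on its own. As a rough sketch, assuming the common ZYX (yaw, pitch, roll) Euler convention, the quaternion can be converted to angles like this. The function name is hypothetical and this is not taken from the driver node.

```python
import math


def quaternion_to_euler(x: float, y: float, z: float, w: float):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians, ZYX order."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw


# A rotation of 90 degrees about the z axis, written as a quaternion.
half = math.radians(90.0) / 2.0
roll, pitch, yaw = quaternion_to_euler(0.0, 0.0, math.sin(half), math.cos(half))
```

For a rover driving on roughly flat ground, yaw is the angle most routines care about, since it corresponds to the heading.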

TODO: add section about cameras and visual odometry

GPS Linearization

The geodetic coordinates (latitude, longitude, altitude) that the GPS gives us are hard to work with, since angles on an ellipsoid are inconvenient and unintuitive for measuring small distances. To solve this problem, we can linearize them into Cartesian coordinates (x, y, z) using an ENU (east, north, up) local tangent plane approximation.

This approximation essentially assumes the earth is flat in the area where the rover is driving. In our case this is actually a pretty good approximation, since the rover only ever drives a few kilometers during the auton mission, and this distance is very small relative to the size of the earth.

The linearization is computed by first converting the geodetic coordinates from the GPS to ECEF (earth centered, earth fixed) coordinates using the WGS 84 ellipsoid model of the earth. These ECEF coordinates are cartesian coordinates, but their origin (0,0,0) is at the center of the earth. These coordinates are then transformed to be centered at the tangent plane around the rover. The equations for this transform can be found here.

The local tangent plane is centered at a reference geodetic coordinate, which is a (latitude, longitude, altitude) coordinate that we choose to be close enough to the area of operation that the linearization approximation is accurate. This means the reference coordinate needs to be changed when the rover is operated in a very different geographic location, such as when we move from Michigan to Utah for our competition.
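The two steps described above (geodetic to ECEF on the WGS 84 ellipsoid, then ECEF to ENU around a reference coordinate) can be sketched as follows. This is an illustrative implementation using the standard textbook equations, not necessarily the code in our repository. The function names and the reference coordinate are assumptions made for the example.

```python
import math

# WGS 84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis in meters
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude (degrees) and altitude (m) to ECEF (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Linearize a geodetic coordinate into ENU (m) around a reference point."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    # Rotate the ECEF offset into the tangent plane at the reference point.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up


# Hypothetical reference coordinate, chosen only for this example.
REF = (42.2936, -83.7102, 0.0)
# A point 0.001 degrees of latitude north of the reference.
east, north, up = geodetic_to_enu(42.2946, -83.7102, 0.0, *REF)
```

A point 0.001 degrees of latitude north of the reference lands a little over 110 m along the north axis, with an east component of essentially zero. The small negative up component is the curvature of the earth falling away from the tangent plane.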

Accuracy

It's difficult to measure the accuracy of the tangent plane linearization, since we don't have any pairs of ground truth geodetic coordinates that are a known distance apart (maybe you could find some of these online). What we've done to measure accuracy is use this calculator to calculate the distance between geodetic coordinates (that were around 30 km apart) and compare that distance to the one calculated through linearization. The calculator uses Vincenty's formulae, which are accurate to within a millimeter. Unfortunately the calculator is only precise to 1 meter, so the only hard accuracy bound we can give is within 1 meter across 30 km. If we want a tighter bound, we could find a higher precision calculator that uses the same formulae and compare to that.

Having a tight accuracy bound probably isn't all that important, because small inaccuracies probably won't be reflected in our system's actual performance. This is because our target geodetic waypoints are linearized using the same algorithm, so effectively both our measured position and our target position are being mapped through the same function. The properties of this linearization function mean that even if it maps our current GPS geodetic coordinate to an incorrect cartesian coordinate, it will map our target coordinate to an incorrect cartesian coordinate in the same way. This means our distance to the target will still approach zero as we get closer to the target, and the distance between position and target will have all of the properties required for our navigation to work, even if the actual distances are inaccurate.

Guide to Localization Frames

When determining and defining where the robot is, we have to deal with an inherent tradeoff in data quality. Usually data sources can be either locally accurate or globally accurate, but it is much more difficult to get a single source of data that is both globally and locally accurate.

This problem is addressed by the standards introduced in REP 105 (which you should read to get a general overview). Unfortunately, the REP uses some misleading terms and gives a somewhat confusing explanation, so here is a hopefully clearer way of explaining it.

First, a few important notes and definitions:

local vs global accuracy

In the context of localization, we often talk about local and global accuracy, so we need to be very clear about what these things mean. Locally accurate data must change in a smooth way, i.e. no discrete/discontinuous jumps. However, it may drift over time without bounds, meaning it can gradually accumulate large errors. On the other hand, globally accurate data must not drift significantly over time, but may have discrete/discontinuous jumps at any time. These definitions are of course a little bit vague, since they somewhat depend on the idea of a short vs long period of time, but this isn't usually a problem since we don't deal with a lot of significant edge cases.

abstraction of sensor data sources into transforms

For the purposes of this explanation, we will abstract away any sensor processing/fusion algorithms and just imagine all localization data sources as providing a transform telling us where the robot is located. A global sensor transform will be relative to a fixed frame, in this case the map frame. A local sensor transform on the other hand may be relative to any arbitrary starting point, so long as the transform obeys the rules of local accuracy.

`map`

Since pose is relative (see SE3 wiki entry), we need to define where the robot is relative to some "fixed" frame. Fixed in this case means it does not move relative to the thing/place we are navigating in, which we will call the world. We will use the "map" frame as our fixed frame, and the pose of the robot will be defined relative to that frame, i.e. it will be defined as a transform from map to the robot.

`base_link`

The base_link frame is simply the frame of the robot. It is rigidly attached to the robot, and in our case is located at the center of the chassis (TODO: is this true?).

`odom`

The odom frame is an intermediate frame in between map and base_link. It doesn't really have a physical representation, but it gives us a good way to separate local and global data sources. Odom essentially acts as a local reference frame for the robot. This means that the pose of the robot relative to the odom frame should always be locally accurate, but doesn't need to be globally accurate.

There are two transforms we need to define in order to connect these three frames, which thereby defines the pose of the robot:

`odom_to_base_link`

The transform from the odom frame to the base_link frame, which we will call odom_to_base_link, will be defined using locally accurate sensor data. This means the pose of the robot in the odom frame (which is the exact same thing as this transform) will be locally accurate, i.e. it can drift over time, but must always change smoothly and continuously without discrete jumps.

`map_to_odom`

The transform from the map frame to the odom frame, which we will call map_to_odom in this case, will be defined using globally accurate sensor data. Our goal here is to use this data to make the transform from map to base_link (which is equal to map_to_odom * odom_to_base_link) globally accurate, i.e. it will not drift over time, but it may change non-continuously with discrete jumps. We want to do this by changing only the map_to_odom transform, which means we first have to figure out what the odom_to_base_link transform is (by asking the TF tree) and then "subtract" that from our global sensor transform, i.e. compose the global transform with the inverse of odom_to_base_link, in order to determine the correct map_to_odom transform.
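The "subtraction" is really a composition with an inverse: map_to_odom = map_to_base_link * inverse(odom_to_base_link). Here is a minimal sketch of that computation, simplified to 2D poses (x, y, theta) so it fits in a few lines. The real system works with full 3D transforms, and the helper names and numbers below are made up for illustration.

```python
import math


def compose(a, b):
    """Return the transform a * b for 2D poses given as (x, y, theta)."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            at + bt)


def inverse(t):
    """Return the inverse of a 2D pose (x, y, theta)."""
    x, y, th = t
    # The inverse rotation is -theta; the translation is -R^T * p.
    return (-(math.cos(th) * x + math.sin(th) * y),
            -(-math.sin(th) * x + math.cos(th) * y),
            -th)


# Locally accurate pose of the robot in the odom frame (example values).
odom_to_base_link = (2.0, 1.0, 0.3)
# Globally accurate pose of the robot in the map frame (example values).
map_to_base_link = (10.0, 5.0, 0.5)

# "Subtract" the local transform from the global one.
map_to_odom = compose(map_to_base_link, inverse(odom_to_base_link))

# Chaining the two published transforms recovers the global pose.
recovered = compose(map_to_odom, odom_to_base_link)
```

Because only map_to_odom absorbs the correction, any jump in the global data shows up in that transform while odom_to_base_link stays smooth.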

Using these frames in practice

The main benefit of this system is having an organized way of separately accessing local and global localization data. For applications where you need a locally accurate localization source, you can get the odom_to_base_link transform from the TF tree and use that as your robot pose. Similarly, for applications where you need globally accurate localization, you can get the transform from map to base_link from the TF tree (which chains map_to_odom and odom_to_base_link for you) and use it as your robot pose (for exact syntax, see the wiki section about the TF tree or just google it).

Here's a simple example of using these localization frames: suppose we have a simple 4 wheeled robot equipped with a GPS unit and wheel encoders. The GPS provides a globally accurate source of localization data, while the wheel encoder data can be fed into a sensor processing algorithm to produce a locally accurate source of localization data.

In order to use these two data sources, our first step is to publish the wheel encoder localization data as the odom_to_base_link transform to the TF tree. Similarly, we will also use the GPS data to publish the map_to_odom transform to the TF tree, as described in the map_to_odom section. Once this is done, we can actually use this data for some autonomous routines.

Suppose we want to write a function that rotates the robot 90 degrees in place. Since this is a routine that won't take very much time, is only based on relative measurements, and needs to be quite accurate on a local scale, we want to use a locally accurate data source. To do this, we will get the odom_to_base_link transform from the TF tree, look at its rotation, and then send the robot drive commands until the rotation we read from odom_to_base_link has increased by 90 degrees.

Now suppose we want to write a function to drive to a specific waypoint on our map, a mile away. Since this routine will take a while and is based on absolute measurements (relative to the map frame), we need to use a globally accurate data source to avoid drift and so we can measure relative to the map frame. To do this, we will get the transform from map to base_link from the TF tree (the composition of map_to_odom and odom_to_base_link), look at its position, and then use a drive controller to drive the robot in the direction of the waypoint until the position we read from that transform is close enough to our target waypoint.
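The stopping conditions for these two routines can be sketched in plain Python. The snippet assumes the yaw and position have already been read from the TF tree, and all function names are hypothetical. The main subtlety is that yaw wraps around at plus or minus 180 degrees, so the amount turned has to be computed with a wrapped difference.

```python
import math


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def turn_remaining(start_yaw: float, current_yaw: float,
                   target_delta: float = math.pi / 2) -> float:
    """Radians left to turn, given yaw readings from the odom frame.

    Valid for target turns smaller than 180 degrees.
    """
    turned = wrap_angle(current_yaw - start_yaw)
    return target_delta - turned


def distance_and_bearing(position, waypoint):
    """Distance (m) and bearing (rad) from a map-frame position to a waypoint."""
    dx = waypoint[0] - position[0]
    dy = waypoint[1] - position[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


# Turning: yaw crossed the +/- pi boundary between the two readings.
remaining = turn_remaining(start_yaw=3.0, current_yaw=-3.0)

# Driving: stop once the waypoint is within a chosen tolerance (assumed 1 m).
distance, bearing = distance_and_bearing((0.0, 0.0), (3.0, 4.0))
arrived = distance < 1.0
```

The turn check only ever compares two readings taken a short time apart, which is why drift in the odom frame does not matter for it, while the waypoint check compares against a fixed map coordinate and so needs the drift-free source.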

Global EKF

To get a globally accurate estimate of the rover's pose, we use an Extended Kalman Filter (EKF) that fuses GPS and IMU data. The EKF outputs a more accurate, less noisy pose estimate which is then published to the TF tree effectively as the map->base_link transform.
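To give a feel for the predict/update cycle behind this kind of filter, here is a deliberately simplified sketch: a linear Kalman filter tracking a single position coordinate. This is not our EKF (which handles a full pose and nonlinear models) and it is not taken from the repository. The function names and all noise values are made up for illustration.

```python
def predict(x: float, p: float, velocity: float, dt: float, process_var: float):
    """Propagate the estimate forward in time using a known velocity."""
    x_new = x + velocity * dt   # motion model: constant velocity over dt
    p_new = p + process_var     # uncertainty grows while we only predict
    return x_new, p_new


def update(x: float, p: float, z: float, meas_var: float):
    """Correct the estimate with a measurement z of the position."""
    k = p / (p + meas_var)      # Kalman gain: how much to trust z
    x_new = x + k * (z - x)     # move the estimate toward the measurement
    p_new = (1.0 - k) * p       # uncertainty shrinks after a measurement
    return x_new, p_new


# Start at 0 m with variance 1, then fuse a measurement of 2 m with variance 1.
x, p = update(0.0, 1.0, z=2.0, meas_var=1.0)
# Predict half a second ahead at 2 m/s with a small process noise.
x_pred, p_pred = predict(x, p, velocity=2.0, dt=0.5, process_var=0.1)
```

With equal prior and measurement variances, the update lands halfway between the prior and the measurement and halves the variance. An EKF follows the same cycle but linearizes nonlinear motion and measurement models around the current estimate at each step.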