Stop to Shape Matching - Hillsborough-Transit-Authority/onebusaway-application-modules GitHub Wiki

GTFS data defines a trip as a sequence of stops. There is often also shape data associated with that trip, describing the geographic path traveled by the transit vehicle during the trip. When importing data, OneBusAway attempts to figure out where along the shape each stop lies. The matching is designed to be flexible, but sometimes the stop locations and the shape data don't agree.

Specifically, OneBusAway attempts to find the best location along a shape for each stop. The stop-to-shape matching is smart enough to deal with routes that loop and other complex shapes. However, sometimes we just can't find a good match. In particular, if stop A comes before stop B in the trip sequence, but A comes after B when the stops are matched to the shape data, we have a problem. This usually indicates a problem with the shape data, the stop locations, or both.

When OneBusAway detects such a matching problem, it spits out an error message along with a LOT of debugging information in order to help you debug what's going on. Let's look at an example:

# potential assignments:
# stopLat stopLon stopId
#   locationOnShapeLat locationOnShapeLon distanceAlongShape distanceFromShape shapePointIndex
#   ...
47.62 -122.32 0 1_9320
  47.620079966001924 -122.32915494936833 1145.11 688.00 11
47.61 -122.33 1 1_940
  47.607038769281715 -122.33691959492059 3503.08 615.47 43
47.61 -122.34 2 1_1920
  47.61012527189429 -122.33966533081063 3102.73 28.75 40
  ^ potential problem here ^
47.61 -122.34 3 1_280
  47.61012527189429 -122.33966533081063 3102.73 28.75 40

# shape points:
47.6109478971 -122.340454014 2993.7063787125153
47.610786218 -122.340274078 3016.1819026690114
47.6107186651 -122.340210924 3025.060900783843
47.6102675258 -122.339781771 3084.654962867508
47.6092280261 -122.338930918 3216.673362909483
47.6082381679 -122.338061536 3344.5895835571296
47.6072657033 -122.337131968 3473.233312506266
47.6065280165 -122.336441622 3570.2230287296
47.6061766019 -122.336121383 3616.084785249523

The first bunch of debug output lists the lat-lon location of each stop for the trip, along with the stop's position in the trip sequence (0, 1, 2, ...) and its stop id. For each stop, it also lists the point(s) along the shape where that stop could potentially lie. We call these potential assignments. Each potential assignment lists the best location along the shape for the stop, the distance along the shape for the stop (from the start of the shape, in meters), the distance of the stop from the shape, and finally the index into the list of shape points for the trip in shapes.txt. Regarding distance along the shape: if you imagine snapping each stop to the shape and measuring the distance from the start of the shape to the snapped stop, you'd hope that the distance is always increasing, since a bus should never travel backwards. The second bunch of debug output is the actual shape data we are attempting to match against.
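To make the "snap each stop to the shape" idea concrete, here is a minimal Python sketch. It is not OneBusAway's implementation: the function names are made up for illustration, it uses a simple flat-plane approximation of lat-lon coordinates, and it only returns the single closest point rather than a list of potential assignments. Because the shape points printed above are only an excerpt, the distance along the shape it reports is measured from the first point you pass in, not from the start of the full shape.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in meters


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project a lat-lon onto a flat plane (meters) centered on a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def snap_to_shape(stop, shape):
    """Find the closest point on a polyline to a stop (hypothetical helper).

    stop  -- (lat, lon)
    shape -- list of (lat, lon) shape points, in travel order
    Returns (distance_along_shape, distance_from_shape, segment_index),
    distances in meters, or None if the shape has fewer than two points.
    """
    if len(shape) < 2:
        return None
    ref_lat, ref_lon = shape[0]
    px, py = to_local_xy(stop[0], stop[1], ref_lat, ref_lon)
    points = [to_local_xy(lat, lon, ref_lat, ref_lon) for lat, lon in shape]

    best = None
    travelled = 0.0  # length of all segments before the current one
    for i in range(len(points) - 1):
        (ax, ay), (bx, by) = points[i], points[i + 1]
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            t = 0.0  # degenerate segment: both end points coincide
        else:
            # Fraction along the segment of the perpendicular projection,
            # clamped so the snapped point stays on the segment.
            t = ((px - ax) * dx + (py - ay) * dy) / (seg_len * seg_len)
            t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        offset = math.hypot(px - cx, py - cy)
        if best is None or offset < best[1]:
            best = (travelled + t * seg_len, offset, i)
        travelled += seg_len
    return best


# A few of the shape points from the debug output, and stop 1_1920.
shape_excerpt = [
    (47.6109478971, -122.340454014),
    (47.6102675258, -122.339781771),
    (47.6092280261, -122.338930918),
    (47.6082381679, -122.338061536),
]
print(snap_to_shape((47.61, -122.34), shape_excerpt))
```

The flat-plane projection is a deliberate simplification that is reasonable over the few kilometers a shape excerpt covers; a more careful tool would use a proper geodesic library.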

Let's look at our example debug output again. We started with stop 1_9320, found a point along the shape that's 1145 meters in and all looks good so far. Same for the second stop 1_940 (3503 meters along the shape at this point). But then we look at stop 1_1920 and we see that the distance along the shape is 3102.73 aka the bus moved backwards 400 meters! Something is not right here.
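The check described above is easy to reproduce by hand. The following Python sketch walks the stops in trip order and reports any stop whose distance along the shape is smaller than the previous stop's. The function name is hypothetical and this is not the code OneBusAway runs; the input values are copied from the example debug output.

```python
def find_backwards_stops(assignments):
    """Report stops that lie behind the previous stop along the shape.

    assignments -- list of (stop_id, distance_along_shape) in trip order
    Returns a list of (previous_stop_id, stop_id, meters_backwards).
    """
    problems = []
    for (prev_id, prev_dist), (cur_id, cur_dist) in zip(assignments, assignments[1:]):
        if cur_dist < prev_dist:
            problems.append((prev_id, cur_id, prev_dist - cur_dist))
    return problems


# Stop ids and distances along the shape from the debug output above.
assignments = [
    ("1_9320", 1145.11),
    ("1_940", 3503.08),
    ("1_1920", 3102.73),
    ("1_280", 3102.73),
]

for prev_id, cur_id, backwards in find_backwards_stops(assignments):
    print(f"{cur_id} is {backwards:.2f} m behind {prev_id}")
# prints: 1_1920 is 400.35 m behind 1_940
```

Note that two stops at the same distance (1_1920 and 1_280 here) are not flagged, since the bus has not moved backwards.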

So what's going on? As mentioned, it's usually one of two things:

  1. The shape data is wrong
  2. The stop data is wrong

At this point, it's usually best to visualize what's going on and I even wrote a simple tool to help:

http://developer.onebusaway.org/maps/

This is a little widget I put together for quickly plotting things on a map.

For example, you can copy the following stop locations into the big text box in the upper right:

47.62 -122.32 0 1_9320
47.61 -122.33 1 1_940
47.61 -122.34 2 1_1920
47.61 -122.34 3 1_280

Then hit "Map" with "Points" selected, and it should plot the positions of the four stops. It will even show the stop id in a pop-up info window if you click on a marker, which is handy for figuring out which marker corresponds to which stop.

We can also visualize some shape data. Copy the following points into the text box in the upper right:

47.6109478971 -122.340454014 2993.7063787125153
47.610786218 -122.340274078 3016.1819026690114
47.6107186651 -122.340210924 3025.060900783843
47.6102675258 -122.339781771 3084.654962867508
47.6092280261 -122.338930918 3216.673362909483
47.6082381679 -122.338061536 3344.5895835571296
47.6072657033 -122.337131968 3473.233312506266
47.6065280165 -122.336441622 3570.2230287296

Copy that into the maps widget, but this time select "Polyline" and hit "Map", and you should get a nice black line. At this point, your map should show some stop locations and the shape data.
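If you'd rather pull the shape points straight from your own feed than from the debug output, a small Python sketch like the following can print them in the same space-separated form. The column names (shape_id, shape_pt_lat, shape_pt_lon, shape_pt_sequence) come from the GTFS specification for shapes.txt; the function name, the sample rows, and the shape id are invented for illustration. Emitting the sequence number as a third column is an assumption, based on the examples above where extra columns after the lat-lon pair are tolerated.

```python
import csv
import io


def shape_points_for_widget(shapes_file, shape_id):
    """Return 'lat lon sequence' lines for one shape, ordered by sequence.

    shapes_file -- an open text file (or file-like object) for shapes.txt
    shape_id    -- the shape_id to extract
    """
    rows = [row for row in csv.DictReader(shapes_file) if row["shape_id"] == shape_id]
    # shape_pt_sequence is read as text, so sort numerically.
    rows.sort(key=lambda row: int(row["shape_pt_sequence"]))
    return [
        f'{row["shape_pt_lat"]} {row["shape_pt_lon"]} {row["shape_pt_sequence"]}'
        for row in rows
    ]


# Made-up sample data standing in for a real shapes.txt.
sample = io.StringIO(
    "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence\n"
    "demo_shape,47.6102675258,-122.339781771,2\n"
    "demo_shape,47.6109478971,-122.340454014,1\n"
    "other_shape,47.0,-122.0,1\n"
)
for line in shape_points_for_widget(sample, "demo_shape"):
    print(line)
```

For a real feed, pass `open("shapes.txt", newline="", encoding="utf-8-sig")` instead of the sample; the `utf-8-sig` encoding strips the byte-order mark that some feeds include.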


Most importantly, hopefully you'll notice that one of the stops is nowhere near the shape! That's a bad thing. In fact, the stop location for stop 1_940 turned out to be wrong in our GTFS data.

Building a Bundle with Bad Shape Data

Sometimes, you receive a GTFS feed from an agency with bad stop locations, bad shape data, or possibly both. And sometimes, even after you've carefully explained the issues with the data to the agency, they still won't fix it. In these situations, we provide a few options to tweak how OneBusAway handles shape data.

The first is to specify the following command-line option to the OneBusAway bundle builder:

-P tripEntriesFactory.throwExceptionOnInvalidStopToShapeMappingException=false

This will tell the bundle builder NOT to throw an exception when a stop-to-shape mapping problem is found.

The second option tweaks how matching works:

-P distanceAlongShapeLibrary.localMinimumThreshold=50
-P distanceAlongShapeLibrary.maxDistanceFromStopToShapePoint=1000

The localMinimumThreshold setting controls the situation where there is potentially more than one good assignment of a stop to a location along the shape. Complex shape geometry such as loops, double-backs, and overlapping paths can make it tricky to determine just where the best location lies. We will look for multiple potential assignments when the distance from the stop to the shape is less than the localMinimumThreshold, in meters. So if you have a stop that's 45 meters from one part of the shape but actually needs to be snapped to a section of the shape that's 55 meters away, you might bump the localMinimumThreshold accordingly. The maxDistanceFromStopToShapePoint setting causes an exception to be thrown if the closest match of any kind from a stop to a shape is more than the specified threshold.
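As a rough illustration of how these two thresholds interact, here is a Python sketch of one possible reading of the description above. It is an interpretation, not the actual OneBusAway logic: the function name, the argument names, and the fallback to the single closest candidate are all assumptions.

```python
def potential_assignments(candidates, local_minimum_threshold, max_distance):
    """Pick the candidate locations worth considering for one stop.

    candidates -- list of (distance_along_shape, distance_from_shape), in
                  meters, one per part of the shape the stop passes near
    Raises ValueError if even the closest candidate is farther from the stop
    than max_distance.
    """
    if not candidates:
        raise ValueError("no candidate locations for this stop")
    closest = min(candidates, key=lambda c: c[1])
    if closest[1] > max_distance:
        raise ValueError(
            f"closest match is {closest[1]:.1f} m from the stop, "
            f"more than the {max_distance:.1f} m limit"
        )
    # Keep every candidate closer to the stop than the threshold.
    nearby = [c for c in candidates if c[1] < local_minimum_threshold]
    # Assumed fallback: if nothing is under the threshold, use the closest one.
    return nearby or [closest]


# The stop from the text: 45 m from one part of the shape, 55 m from another.
candidates = [(1200.0, 45.0), (2600.0, 55.0)]
print(potential_assignments(candidates, 50, 1000))  # only the 45 m candidate
print(potential_assignments(candidates, 60, 1000))  # both candidates
```

With the threshold at 50, the section that is 55 meters away is never considered, which is why raising the threshold can help when the farther section is the correct one.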