Step 2: Linking to timetable data - aclong/dummy_data_linkage GitHub Wiki

Code for this step is in this code.

The data are retrieved from a database using SQL, manipulated in R, and the results are then written back to an SQL database.

The journeys can now be separated. Each distinct journey can be compared against the timetables to assign it to a timetabled service, and then each transaction can be assigned to a geolocated stop.

The fare stage indicates how far along the bus route the machine has progressed. The maximum fare stage recorded for a bus route on a given day can be used to calculate this progression, which is then comparable with the stop sequence number held in the timetable information. The journey made by the bus, as shown by its transactions, can then be compared against the timetable data to find the most similar timetabled journey, and the transactions can be assigned to stops along the route.
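As a rough illustration of the idea above, here is a minimal sketch in Python (the project itself does this step in R). It assumes that progression is the fare stage divided by the maximum fare stage, that the timetable's stop sequence is scaled the same way, and that a transaction goes to the stop with the closest scaled value. The function names and the sample numbers are hypothetical and are not taken from the project code.

```python
def progress(values):
    """Scale stage or sequence numbers to the range 0..1 by their maximum."""
    top = max(values)
    return [v / top for v in values]


def nearest_stop(transaction_progress, stop_progress):
    """Return the index of the stop whose scaled position is closest."""
    return min(
        range(len(stop_progress)),
        key=lambda i: abs(stop_progress[i] - transaction_progress),
    )


# Hypothetical data: fare stages seen on one route on one day, and the
# stop sequence numbers of one timetabled journey with 20 stops.
fare_stages = [1, 1, 3, 5, 8, 10]
stop_sequence = list(range(1, 21))

txn_progress = progress(fare_stages)      # assumption: fare stage / max fare stage
stop_progress = progress(stop_sequence)   # assumption: sequence / max sequence

# Stop sequence number assigned to each transaction.
assigned = [stop_sequence[nearest_stop(p, stop_progress)] for p in txn_progress]
print(assigned)
```

Scaling both sides by their own maximum is only one way to make fare stages and stop sequence numbers comparable; it treats fare stages as evenly spread along the route, which may not hold for every service.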

(INSERT FIG OF TRANSACTION VS TIMETABLE DATA)

My initial, simple approach to choosing which service a set of transactions may belong to is to fit a linear regression to each journey. See below.

(INSERT FIG OF THE SAME BUT WITH REGRESSION LINES)

I then find the x intercept of the regression line for the transaction data, compare it against the x intercepts of the timetabled journeys, and choose the nearest one. For that match I store the regression coefficient of the transaction data and that of the chosen timetabled service, and calculate the difference between them. This gives a "degree of similarity" value that I will use to test how accurate the stop assignment is, and how that accuracy relates to the regularity of the bus service.
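The matching described above could be sketched roughly as follows, again in Python rather than the R used in the project. The sketch assumes x is time in minutes after midnight, y is progression along the route scaled to 0..1, and "coefficient" means the slope of the fitted line. The function names, journey identifiers and sample values are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def x_intercept(slope, intercept):
    """x value at which the fitted line crosses y = 0."""
    return -intercept / slope


def match_journey(txn_times, txn_progress, timetable):
    """Pick the timetabled journey whose x intercept is nearest."""
    t_slope, t_icpt = fit_line(txn_times, txn_progress)
    t_x0 = x_intercept(t_slope, t_icpt)
    fits = {}
    for journey_id, (times, prog) in timetable.items():
        slope, icpt = fit_line(times, prog)
        fits[journey_id] = (slope, x_intercept(slope, icpt))
    best = min(fits, key=lambda j: abs(fits[j][1] - t_x0))
    # Difference in slopes, used here as the "degree of similarity".
    return best, t_slope - fits[best][0]


# Hypothetical timetable: two journeys on the same route, 30 minutes apart.
stops = [0.25, 0.5, 0.75, 1.0]
timetable = {
    "0800": ([480, 490, 500, 510], stops),
    "0830": ([510, 520, 530, 540], stops),
}

# Hypothetical transactions from a bus running a few minutes late.
best, similarity = match_journey([483, 492, 503, 511], stops, timetable)
print(best, similarity)
```

With time on the x axis, the x intercept is the moment the fitted line reaches zero progression, so it acts as an estimated departure time. A slope difference close to zero then suggests the bus moved along the route at about the timetabled pace.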