Step 3: Assigning stops

After a transaction journey has been linked with an individual timetabled service, I use two techniques to assign its transactions to actual stops. Until these have been tested against the real transactions, it is not certain which technique works best.

The first assignment uses the time progressed along the journey, so the transaction is matched to a stop along the x axis. Every "nearest" stop in the timetable is recorded and connected to the transaction. "Nearest" can be in either direction: if a stop before and a stop after the transaction time are equally close, both are kept. Likewise, if more than one stop is timetabled at the assigned time, or several are equidistant from it, all of them are added. Because the transactions are recorded to the minute, without seconds, the timetable often gives more than one stop at the "same" time, as they fall within a minute of each other. On average this results in 2 to 3 possible stops for each transaction (I will add proper numbers for this once it has been tested against the real data).
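The time-based matching can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function name, the `(stop_id, minute)` layout, and the sample timetable are all made up for the example, and times are assumed to be whole minutes since midnight.

```python
def nearest_stops_by_time(transaction_minute, timetable):
    """Return every stop whose timetabled minute is closest to the transaction.

    timetable is a list of (stop_id, minute) pairs. This layout is an
    assumption for the sketch, not the real table structure.
    """
    # Smallest absolute gap between the transaction and any timetabled stop.
    smallest_gap = min(abs(minute - transaction_minute) for _, minute in timetable)
    # Keep all stops at that gap: earlier, later, or sharing the same minute.
    return [stop for stop, minute in timetable
            if abs(minute - transaction_minute) == smallest_gap]


# Made-up timetable: stops B and C share a minute, as often happens.
timetable = [("A", 600), ("B", 602), ("C", 602), ("D", 606), ("E", 609)]

print(nearest_stops_by_time(602, timetable))  # ['B', 'C']
print(nearest_stops_by_time(604, timetable))  # ['B', 'C', 'D']
```

The second call shows the "either direction" case: B and C are two minutes before the transaction and D is two minutes after, so all three are kept.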

The second assignment uses the proportion of the journey travelled, so the transaction is matched along the y axis. As there are generally fewer than 15 fare stages, and often fewer still, I round the journey proportion in the transaction data to 1 decimal place. To find as wide a variety of possible matches from the timetable, I also round the timetable's journey proportion variable to 1 decimal place, then carry out the same matching technique as with time. This results in 5 to 6 possible stops for each transaction point.
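A rough sketch of the proportion-based matching is below. The names and sample values are hypothetical, and it makes one implementation choice that is not in the description above: the rounded proportions are held as whole tenths (integers 0 to 10) so that ties can be compared exactly without floating-point surprises.

```python
def to_tenths(proportion):
    # Round a 0-1 journey proportion to 1 decimal place, kept as an integer
    # number of tenths so equal gaps compare exactly.
    return round(proportion * 10)


def nearest_stops_by_proportion(transaction_proportion, timetable):
    """Return every stop whose rounded proportion is closest to the transaction.

    timetable is a list of (stop_id, proportion) pairs (assumed layout).
    """
    target = to_tenths(transaction_proportion)
    rounded = [(stop, to_tenths(p)) for stop, p in timetable]
    smallest_gap = min(abs(p - target) for _, p in rounded)
    # Same rule as the time match: keep every stop at the smallest gap.
    return [stop for stop, p in rounded if abs(p - target) == smallest_gap]


# Made-up proportions along one timetabled journey.
timetable = [("A", 0.0), ("B", 0.12), ("C", 0.14), ("D", 0.27),
             ("E", 0.31), ("F", 0.33), ("G", 0.52), ("H", 1.0)]

print(nearest_stops_by_proportion(0.29, timetable))  # ['D', 'E', 'F']
print(nearest_stops_by_proportion(0.2, timetable))   # ['B', 'C', 'D', 'E', 'F']
```

Rounding both sides makes many stops collapse onto the same tenth, which is why this match tends to return more candidates than the time match.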

The end result is two arrays of "possible stops" for each transaction, one assigned by time and the other by journey proportion. I create a third array containing the stops shared by both. These arrays are recorded in a new data table, along with the difference in coefficient between the timetabled journey and the transaction journey.
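As an illustration, one row of that table could be assembled like this. The field names, the function name, and the sample values are assumptions for the sketch, and the coefficient difference is simply passed in as a number calculated elsewhere.

```python
def build_candidate_row(stops_by_time, stops_by_proportion, coefficient_difference):
    """Combine the two candidate arrays into one record (hypothetical layout)."""
    by_proportion = set(stops_by_proportion)
    # Third array: stops found by both techniques, in time-match order.
    shared = [stop for stop in stops_by_time if stop in by_proportion]
    return {
        "stops_by_time": list(stops_by_time),
        "stops_by_proportion": list(stops_by_proportion),
        "shared_stops": shared,
        "coefficient_difference": coefficient_difference,
    }


# Made-up candidate arrays for a single transaction.
row = build_candidate_row(["B", "C", "D"], ["C", "D", "E", "F", "G"], 0.04)
print(row["shared_stops"])  # ['C', 'D']
```

If the two techniques share no stops the third array is simply empty, which may itself be a useful flag when the results are checked against real data.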