Models - dssg/cta-sim GitHub Wiki
- Required Software :
- Required R libraries :
Boarding Model : Poisson and Negative Binomial Regression
- For a specific stop on a given route, we know the arrival time of each bus and the number of people who boarded it. We bucket these observations into half-hour intervals, each giving the number of bus boardings at the stop in a given half hour on a given day.
- Define $y_{ijkl}$ as the observed count during half-hour interval $i$ on day $j$ for stop $k$ on route $l$, with mean $\lambda_{ijkl} = \mathrm{E}[y_{ijkl}]$. Then we assume that:

  $$\log \lambda_{ijkl} = \alpha^\top h_i + \beta^\top m_j + \gamma\, w_j$$

  where $h_i$ is an indicator of the half hour, $m_j$ is an indicator of what month the current day falls in, and $w_j$ is an indicator of whether the current day falls on a weekday or a weekend. We assume that the factors are independent but the factor levels are dependent. In particular, we assume that:

  $$y_{ijkl} \sim \mathrm{Poisson}(\lambda_{ijkl}), \qquad \mathrm{E}[y_{ijkl}] = \mathrm{Var}[y_{ijkl}] = \lambda_{ijkl}$$

  That is, we have one parameter controlling both the mean behavior and the variability. While this may hold for certain stops, it will not hold in general. We therefore assume a [Negative Binomial](http://en.wikipedia.org/wiki/Negative_binomial_distribution) model. This can be seen as a mixture of Poissons and allows for a dispersion parameter, which can increase the variability beyond that of a simple Poisson regression model.
- The negative binomial model can be found [here](https://github.com/dssg/dssg-cta-project/blob/master/stat-models/passenger_on_models/neg_binom_model/passengeron_negbin_model.R).
- The code outputs a JSON file which is stored in the `./json_output/` subfolder.
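The bucketing step and the mean-versus-variance comparison described above can be sketched in Python. This is a minimal illustration, not the project's R code: the event data, the helper names `half_hour_bucket` and `bucket_boardings`, and the use of the variance-to-mean ratio as a quick overdispersion check are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, variance


def half_hour_bucket(ts):
    """Return (date, half-hour index 0..47) for a timestamp."""
    return ts.date(), ts.hour * 2 + ts.minute // 30


def bucket_boardings(events):
    """Sum boardings per (date, half-hour index) from (arrival time, boardings) pairs."""
    counts = Counter()
    for ts, boardings in events:
        counts[half_hour_bucket(ts)] += boardings
    return counts


# Hypothetical bus arrivals at one stop: (arrival time, passengers boarding).
events = [
    (datetime(2013, 7, 1, 8, 5), 3),
    (datetime(2013, 7, 1, 8, 20), 9),
    (datetime(2013, 7, 1, 8, 45), 4),
    (datetime(2013, 7, 2, 8, 10), 1),
    (datetime(2013, 7, 2, 8, 40), 14),
    (datetime(2013, 7, 3, 8, 15), 2),
]
counts = bucket_boardings(events)

# Counts for the 08:00-08:30 interval (index 16), one value per day.
interval_16 = [c for (day, idx), c in sorted(counts.items()) if idx == 16]

# Under a Poisson model the variance equals the mean, so this ratio should be near 1.
dispersion = variance(interval_16) / mean(interval_16)
print(interval_16, dispersion)
```

A ratio well above 1 suggests more variability than a Poisson model allows, which is the situation the negative binomial model is meant to handle. Half hours with no bus arrival produce no entry in this sketch; a real pipeline would have to decide whether to record them as zero counts.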
Alighting Model : Binomial Regression
- For a specific stop on a given route, we know the arrival time of a bus, the number of passengers on the bus when it arrives at the stop, and the number of people who get off the bus. We aggregate these observations per half hour.
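- Then if $n$ is the number of people on the bus and $x$ is the number of people getting off, we say that:

  $$x \sim \mathrm{Binomial}(n, p)$$

  where $p$ is the probability that a passenger on the bus gets off at the stop.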
- The code outputs a JSON file which is stored in the `./json_output/` subfolder.
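The simplest instance of the binomial model above has no covariates: for each half-hour interval, the maximum-likelihood estimate of the alighting probability is total alightings divided by total passengers on board. The Python sketch below shows only that special case. The function name `alighting_probabilities`, the record layout, and the numbers are hypothetical, and the regression structure of the actual model is not reproduced here.

```python
from collections import defaultdict


def alighting_probabilities(observations):
    """Pool (interval, load, alighted) records and return the binomial MLE per interval."""
    totals = defaultdict(lambda: [0, 0])  # interval -> [passengers on board, alightings]
    for interval, load, alighted in observations:
        if alighted > load:
            raise ValueError("alightings cannot exceed passengers on board")
        totals[interval][0] += load
        totals[interval][1] += alighted
    # Skip intervals where no passengers were on board (the estimate is undefined).
    return {k: off / on for k, (on, off) in totals.items() if on > 0}


# Hypothetical records: (half-hour index, passengers on arrival, passengers alighting).
observations = [(16, 20, 5), (16, 30, 10), (17, 10, 1), (17, 40, 9)]
p_hat = alighting_probabilities(observations)
print(p_hat)
```

Pooling the counts before dividing weights each bus by its load, which is what the binomial likelihood implies; averaging per-bus ratios instead would give a different estimate.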
Schedule Deviation Model : Gaussian Process
- Each route has a handful of time points, for which we know the geographic location (latitude and longitude) and the scheduled arrival time. As a bus makes a trip along the route, a GPS unit records when the bus actually arrives at each time point. From these two pieces of information, the schedule deviation is calculated by subtracting the actual arrival time from the scheduled arrival time. Thus, a negative value implies that the bus is late, while a positive value implies that it is ahead of schedule.
- Let $d_{ijkl}$ be the schedule deviation at time point $i$ on route $j$ on day $k$ for the $l$-th run of the route. Then let $t_{ijkl}$ be the scheduled arrival of this bus. Our base model assumes that the schedule deviations follow a Gaussian Process. That is:

  $$d_{\cdot jkl} \sim \mathcal{GP}\big(\mu(t_{\cdot jkl}),\, K\big), \qquad K_{ii'} = k_\theta(r_{ii'})$$

  where $r_{ii'}$ is the distance between time points $i$ and $i'$. A more complicated model would have $\theta$ depend on time of day and year and on where along the route the time points are. That way, during morning rush hour, the coefficients will reflect possible traffic patterns and any other idiosyncrasies along the route.
- This model assumes that the schedule deviations are normal, but upon further investigation we see a certain skew in the distribution. We can take care of that by incorporating a half-normal component in the mean component,

  $$\mu(t_{ijkl}) + \delta\, |z_{ijkl}|$$

  where $z_{ijkl} \sim \mathcal{N}(0, 1)$.
- The code outputs a JSON file which is stored in the `./json_output/` folder.
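The sign convention for the schedule deviation, and the idea of a covariance that depends on the distance between time points, can be illustrated with a short Python sketch. Everything specific here is an assumption for the example: the squared-exponential kernel, its parameters `sigma` and `length_km`, the use of along-route distance, and the trip data. The wiki does not state which kernel the project uses.

```python
import math


def schedule_deviation(scheduled_min, actual_min):
    """Scheduled minus actual arrival, in minutes: negative = late, positive = early."""
    return scheduled_min - actual_min


def sq_exp_cov(distances_km, sigma=2.0, length_km=1.5):
    """Covariance matrix from pairwise distances (assumed squared-exponential kernel)."""
    return [
        [sigma ** 2 * math.exp(-0.5 * (d / length_km) ** 2) for d in row]
        for row in distances_km
    ]


# Hypothetical trip over three time points.
positions_km = [0.0, 1.5, 4.0]   # position of each time point along the route
scheduled = [480, 488, 500]      # scheduled arrivals, minutes after midnight
actual = [481, 487, 504]         # GPS-recorded arrivals, minutes after midnight

deviations = [schedule_deviation(s, a) for s, a in zip(scheduled, actual)]
distances = [[abs(p - q) for q in positions_km] for p in positions_km]
cov = sq_exp_cov(distances)
print(deviations)
```

With this kernel, deviations at nearby time points are strongly correlated and the correlation decays with distance, so a bus that is late at one time point is expected to still be late at the next.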