Models - dssg/cta-sim GitHub Wiki

Needed installations

  • Required Software :
    • R : Language and environment for statistical computing and graphics
    • JAGS : A program for analysis of Bayesian hierarchical models using MCMC simulation
    • ODBC : standard programming language middleware API for accessing database management systems
  • Required R libraries :
    • CODA : Output analysis and diagnostics for MCMC
    • R2jags : Provides an interface for running JAGS from R
    • rjson : Allows for JSON formatting inside of R

Models

  1. Boarding Model : Poisson and Negative Binomial Regression

    • For a specific stop on a given route, we know the arrival time of each bus and the number of people who boarded it. We convert these records into half-hour buckets, so that each observation is the number of boardings at the stop in a given half hour on a given day.
    • Let $y_{ijkl}$ be the observed count during half-hour interval $i$ on day $j$ for stop $k$ on route $l$. Then we assume that:

    $$y_{ijkl} \sim \mathrm{Poisson}(\lambda_{ijkl}), \qquad \log \lambda_{ijkl} = \alpha_{h(i)} + \beta_{m(j)} + \gamma_{w(j)}$$

    where $h(i)$ is an indicator of the half hour, $m(j)$ is an indicator of the month in which the current day falls, and $w(j)$ is an indicator of whether the current day is a weekday or falls on the weekend. We assume that the factors are independent but that the levels within each factor are dependent. In particular, we assume that:


    • The code for the model can be found in the [Poisson Model](https://github.com/dssg/dssg-cta-project/tree/master/stat-models/passenger_on_models/old_models/poisson_model) section.
    • Unfortunately, the Poisson model assumes that the mean of a count $Y$ equals its variance, $\mathrm{E}[Y] = \mathrm{Var}[Y]$. That is, a single parameter controls both the mean behavior and the variability. While this may hold for certain stops, it will not hold in general. We therefore use a [Negative Binomial](http://en.wikipedia.org/wiki/Negative_binomial_distribution) model instead. This can be seen as a mixture of Poissons, and it adds a dispersion parameter that lets the variability exceed that of a simple Poisson regression model.
    • The negative binomial model can be found [here](https://github.com/dssg/dssg-cta-project/blob/master/stat-models/passenger_on_models/neg_binom_model/passengeron_negbin_model.R).
    • The code outputs a JSON file, which is stored in the ./json_output/ subfolder.
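The half-hour bucketing and the mean-versus-variance check described above can be sketched as follows. This is a minimal Python illustration, not the project's R/JAGS code: the function names and the sample observations are made up for the example, and the real pipeline may handle empty intervals differently.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, variance


def half_hour_index(ts):
    """Return the half-hour interval of the day (0..47) that contains ts."""
    return ts.hour * 2 + (1 if ts.minute >= 30 else 0)


def bucket_boardings(events):
    """Sum boardings per (date, half-hour index).

    events: iterable of (bus arrival datetime, number of boardings).
    """
    counts = defaultdict(int)
    for arrival, boarded in events:
        counts[(arrival.date(), half_hour_index(arrival))] += boarded
    return dict(counts)


def mean_and_variance_by_interval(counts):
    """Return {half-hour index: (mean, sample variance)} across days.

    Intervals observed on fewer than two days are skipped, because the
    sample variance is undefined for a single observation.
    """
    by_interval = defaultdict(list)
    for (_day, interval), total in counts.items():
        by_interval[interval].append(total)
    return {
        interval: (mean(totals), variance(totals))
        for interval, totals in by_interval.items()
        if len(totals) >= 2
    }


# Made-up observations for one stop: (arrival time, boardings).
events = [
    (datetime(2013, 7, 1, 8, 5), 3),
    (datetime(2013, 7, 1, 8, 20), 4),
    (datetime(2013, 7, 1, 8, 40), 2),
    (datetime(2013, 7, 2, 8, 10), 12),
    (datetime(2013, 7, 3, 8, 15), 1),
    (datetime(2013, 7, 3, 8, 25), 1),
]

counts = bucket_boardings(events)
stats = mean_and_variance_by_interval(counts)
for interval, (m, v) in sorted(stats.items()):
    label = "variance exceeds mean" if v > m else "variance does not exceed mean"
    print(interval, m, v, label)
```

A variance well above the mean for an interval is the situation in which a negative binomial model is a better fit than a Poisson model. Note that this sketch only records half hours in which at least one bus arrived; half hours with no arrivals would need to be filled in as zeros before fitting.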
  2. Alighting Model : Binomial Regression

    • For a specific stop on a given route, we know the arrival time of each bus, the number of passengers on the bus when it arrives at the stop, and the number of people who get off. We aggregate these observations per half hour.
    • Then if $n$ is the number of people on the bus and $y$ is the number of people getting off, we say that:

    $$y \sim \mathrm{Binomial}(n, p)$$

    • We assume a logistic model for the probability, p, and model it in the same way as we modeled the log rate parameter in the Poisson and Negative Binomial models.
    • The code that implements the model and all supporting scripts can be found [here](https://github.com/dssg/dssg-cta-project/tree/master/stat-models/passenger_on_models/neg_binom_model).
    • The R script outputs a JSON file of estimated parameter values to ./json_output/.
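To make the logistic link concrete, here is a small Python sketch of how an alighting probability and a binomial log-likelihood could be computed. It is an illustration only: the additive form of the linear predictor, the function names, and the effect values are assumptions made for the example, not values or code from the project's R scripts.

```python
import math


def inv_logit(eta):
    """Map a value on the logit scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def alighting_probability(half_hour_effect, month_effect, day_type_effect):
    """Probability that a passenger on board gets off at the stop.

    Assumes the three effects add on the logit scale (an assumption
    made for this sketch).
    """
    return inv_logit(half_hour_effect + month_effect + day_type_effect)


def binomial_log_likelihood(alighted, on_board, p):
    """Log-probability of `alighted` out of `on_board` under Binomial(on_board, p)."""
    return (
        math.log(math.comb(on_board, alighted))
        + alighted * math.log(p)
        + (on_board - alighted) * math.log(1.0 - p)
    )


# Hypothetical effect values, chosen only to illustrate the calculation.
p = alighting_probability(half_hour_effect=-1.0, month_effect=0.2, day_type_effect=-0.3)
on_board = 40
print("alighting probability:", round(p, 4))
print("expected alightings:", round(on_board * p, 2))
print("log-likelihood of observing 10:", binomial_log_likelihood(10, on_board, p))
```

Working on the logit scale keeps the probability between 0 and 1 whatever the effect values are, which is the same role the log link plays for the rate in the boarding models.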
  3. Schedule Deviation Model : Gaussian Process

    • Each route has a handful of timepoints, for which we know the geographic location (latitude and longitude) and the scheduled arrival time. As a bus makes a trip along the route, a GPS unit records when the bus actually arrives at each timepoint. From these two pieces of information, the schedule deviation is calculated by subtracting the actual arrival time from the scheduled arrival time. Thus, a negative value means that the bus is late, while a positive value means that it is ahead of schedule.
    • Consider the schedule deviation at timepoint i on route j on day k for the l-th run of the route, together with the scheduled arrival time of that bus. Our base model assumes that the schedule deviations follow a Gaussian process whose specification involves the distance between timepoints. A more complicated model would let the coefficients depend on the time of day, the time of year, and where along the route the timepoints are. That way, during the morning rush hour, the coefficients would reflect possible traffic patterns and any other idiosyncrasies along the route.
    • This model assumes that the schedule deviations are normally distributed, but on further investigation we see a skew in the distribution. We can account for that by incorporating a half-normal component in the mean.

    • The files are [here](https://github.com/dssg/dssg-cta-project/tree/master/stat-models/supplyside_models). The R script fits the parameters for a specific route and direction. The JSON output is saved to a ./json_output/ folder.
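The sign convention for schedule deviation, and the skew check mentioned above, can be illustrated with a short Python sketch. The function names and the sample deviations are hypothetical, and the skewness formula is the simple moment-based one rather than anything taken from the project's R code.

```python
from datetime import datetime
from statistics import mean, pstdev


def schedule_deviation_minutes(scheduled, actual):
    """Scheduled minus actual arrival, in minutes.

    Negative means the bus was late; positive means it was early.
    """
    return (scheduled - actual).total_seconds() / 60.0


def sample_skewness(values):
    """Moment-based skewness: mean of the cubed standardized values."""
    center = mean(values)
    spread = pstdev(values)
    return mean([((v - center) / spread) ** 3 for v in values])


scheduled = datetime(2013, 7, 1, 8, 0)
late = schedule_deviation_minutes(scheduled, datetime(2013, 7, 1, 8, 4))
early = schedule_deviation_minutes(scheduled, datetime(2013, 7, 1, 7, 58))
print("late bus:", late)    # 4 minutes behind schedule
print("early bus:", early)  # 2 minutes ahead of schedule

# Made-up deviations (minutes) at one timepoint across several runs.
deviations = [-4.0, 2.0, -1.0, -10.0, 0.5]
print("skewness:", sample_skewness(deviations))
```

A skewness far from zero is the kind of asymmetry that a purely normal model does not capture, which is the motivation given above for adding a half-normal component.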