About project - dssg/cta-sim GitHub Wiki

The Chicago Transit Authority (CTA) has made a concerted effort to reduce crowding on public transportation, which has been a cause for concern over the last few years. The CTA has been collecting information such as the number of people riding a specific bus, the times at which more people board buses at a specific stop, and the delay times on different routes. After some processing, this data is used to understand and modify schedules and routes for the following quarter.

The buses pick up information regarding ridership and bus performance while executing a proposed schedule. The Planning Analytics department then calculates performance metrics such as Load, Flow, Bunching, and Crowding to determine the effectiveness of current CTA strategies.

While current best practices are data-focused, they are retrospective in nature. Only after we implement the proposed schedule and wait for data to be collected (which takes several months, not including data clean-up and aggregation) can we assess the effect of a given schedule on crowding. We propose to turn scheduling into a more prospective exercise through statistical modeling and simulation (described below). This will allow the Planning Analytics department to be proactive and to better understand the impact of scheduling decisions on bus crowding before implementation. Given the richness of the data available, we believe that even a simple statistical model and simulation approach will provide useful insight into bus de-crowding.

Solution

  • Demand Model: At the stop level, we assume that individuals arrive at stops in a stochastic manner. That is, the next individual shows up to the stop after a certain amount of time T, which is random. This is a common approach to waiting time problems.

  • Supply Model: At the route level, we have information on the deviation from the scheduled arrival time at timepoints (designated points along the route, which need not be stops). The supply model will jointly estimate the deviations from the schedule along a route for a specific bus. We expect the deviation at a given timepoint to be positively correlated with the deviations at subsequent timepoints. That is, deviation from the schedule tends to accumulate as a bus runs its route. At the most basic level, we do not wish to attribute this variation to specific sources, but simply to account for it in the simulations that follow.

  • Fitting a Model vs. Running a Simulation

    • The Model component requires the historical data and a specified model for both demand and supply components. Each model has a set of parameters we need to estimate. The historical data is then used to update parameter estimates.

    • The Simulation component only requires the parameter estimates from the model. With these we can simulate the entire CTA bus system for a given time period. This includes simulating arrival of people at bus stops and the buses as they follow their schedule (including the estimated deviations from bus schedule we described above).
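
The split between fitting and simulating can be illustrated with a minimal sketch. It rests on assumptions that are not part of the project description: the random time T between riders is taken to be exponential (a Poisson arrival process), and the schedule deviation at each timepoint is taken to carry over a fixed share of the previous deviation plus Gaussian noise. All function and parameter names (`fit_arrival_rate`, `persistence`, `noise_sd`, and so on) are hypothetical.

```python
import random


def simulate_arrivals(rate_per_min, horizon_min, rng):
    """Rider arrival times (minutes) at one stop.

    Assumption: gaps between riders are exponential with the given rate.
    """
    times = []
    t = rng.expovariate(rate_per_min)
    while t < horizon_min:
        times.append(t)
        t += rng.expovariate(rate_per_min)
    return times


def fit_arrival_rate(arrival_times, horizon_min):
    """Model step: estimate the arrival rate as riders observed per minute."""
    return len(arrival_times) / horizon_min


def simulate_deviations(n_timepoints, persistence, noise_sd, rng):
    """Schedule deviation (minutes) at each timepoint of one bus trip.

    Assumption: each deviation keeps a share of the previous one and adds
    fresh noise, so consecutive timepoints are positively correlated.
    """
    deviations = []
    previous = 0.0
    for _ in range(n_timepoints):
        previous = persistence * previous + rng.gauss(0.0, noise_sd)
        deviations.append(previous)
    return deviations


rng = random.Random(0)

# Stand-in for historical data; real input would come from CTA records.
history = simulate_arrivals(0.5, 600.0, rng)

# Model step: historical data -> parameter estimate.
rate_hat = fit_arrival_rate(history, 600.0)

# Simulation step: only the parameter estimates are needed from here on.
riders = simulate_arrivals(rate_hat, 60.0, rng)
deviations = simulate_deviations(5, 0.8, 1.0, rng)
```

The simulation step never touches `history`, which mirrors the point above: once the parameters are estimated, new scenarios can be simulated without going back to the raw data.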

We would like the CTA to outline how such a tool would be used in practice by different departments (strategic planning, service planning, scheduling, operations) and which features are most important for uptake and success.

  • Web interface
    • The key deliverable is a web interface that allows interactive querying of the simulated bus system. The interface will link to the APC and AVAS datasets so that model parameters can be re-estimated after new data is collected. Due to the computational complexity, the model will be pre-computed, perhaps once per quarter.
    • The interface will display plots of key metrics similar to what is already in use by service planning but computed with simulated rather than historical data.
      • Users can toggle select parameters, such as headway, and visualize the impact on crowding by time period or route / stop.
    • We hope to update current plots to incorporate the variability in performance metrics and provide a more holistic view.
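
As a rough illustration of what toggling headway might look like behind the interface, the sketch below simulates riders at a single stop and reports the average number boarding each bus. It is a simplification under stated assumptions, not the project's implementation: riders arrive with exponential gaps, buses run exactly on schedule, and capacity is unlimited. The function name `mean_boardings_per_bus` and its parameters are hypothetical.

```python
import random


def mean_boardings_per_bus(headway_min, rate_per_min, horizon_min, seed=0):
    """Average riders boarding each bus at one stop over the horizon.

    Assumptions: exponential gaps between riders, buses exactly on
    schedule, and no capacity limit (everyone waiting boards).
    """
    rng = random.Random(seed)

    # Simulate rider arrival times over the whole horizon.
    arrivals = []
    t = rng.expovariate(rate_per_min)
    while t < horizon_min:
        arrivals.append(t)
        t += rng.expovariate(rate_per_min)

    # Buses are scheduled at headway, 2 * headway, 3 * headway, ...
    n_buses = int(horizon_min // headway_min)
    boardings = [0] * n_buses
    next_rider = 0
    for k in range(n_buses):
        bus_time = headway_min * (k + 1)
        # Everyone who arrived since the previous bus boards this one.
        while next_rider < len(arrivals) and arrivals[next_rider] <= bus_time:
            boardings[k] += 1
            next_rider += 1
    return sum(boardings) / n_buses


# Compare two candidate headways for the same simulated demand.
load_5 = mean_boardings_per_bus(5.0, 1.0, 600.0)
load_10 = mean_boardings_per_bus(10.0, 1.0, 600.0)
print(f"headway  5 min: {load_5:.1f} riders per bus")
print(f"headway 10 min: {load_10:.1f} riders per bus")
```

Because both runs use the same seed, they see the same simulated riders, so any difference in load comes from the headway alone. A fuller version would add schedule deviations and bus capacity before drawing conclusions about crowding.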