Phase 3 Scope of Work

Overview

Phase 3 of ActivitySim development is focused on improving the data pipelining procedures and implementing the remaining sub-models so that additional contributors can more easily join the effort. Once the core features of the system are in place, the plan is for a couple of the AMPO partner staff to assist in implementing the first release in 2018. This scope of work covers the first half of 2017. See the Phase 3 Amendment Scope of Work for work planned in the second half of 2017.

Table of Contents

Task 1: Project Management
Task 2: Data Pipelining
Task 3: Fix Random Number Sequences
Task 4: Logsums in Utility Functions
Task 5: At-Work Subtour Models
Task 6: Completing Phase 1 Models

Task 1: Project Management

The purpose of this first task is to manage the overall project, including invoicing, conference calls with the project team, and coordination with the AMPO agency partners. All deliverables, including meeting notes, software, tests, documentation, and issue tracking, will be managed through GitHub. UrbanLabs will review project progress twice and QA/QC selected project deliverables, as identified by the AMPO partners.

Deliverable(s): (Due 30 weeks from NTP)

  • Management of Bi-Weekly Meetings
  • Pre- and Post-Meeting Notes
  • Invoicing and Progress Reports
  • Client Coordination
  • QA/QC of Select Deliverables

Comments

  • ( Elizabeth ) User Definition / Restructure Existing Processes: Just reiterating my vote that this be an explicit task, possibly named "Design for User Effectiveness" or similar, which would have subtasks to define user types and their needs and then restructure existing code to more effectively serve each of these users.
  • ( Elizabeth ) Agile Project Management Practices: As velocity on this project increases and potentially more developers become involved, I suggest that we start using true agile project management practices with users, user-stories, etc.
  • ( Elizabeth ) Management Plan and User Community Maintenance: At TRB we had discussed specific deliverables and budget associated with developing a management plan for the future and managing any pull requests/coordination/publicity with other users.
  • ( Elizabeth ) Add Additional Contributors: I agree that doubling Jeff's time makes sense as it wasn't too significant to begin with. However, one way to test usability/maintainability/extensibility is to have multiple contributors. @lmz suggested she would be interested in peeling off a task or two and I'm guessing other agency staff and potentially other consulting resources may be good to get involved.

top

Task 2: Data Pipelining

The goal of this task is to better manage the movement and transformation of data within ActivitySim through the development of a consistent, comprehensive, and efficient approach to data pipelining. Currently, ActivitySim uses Orca to define sub-model inputs and to set up and run the sub-models. The outputs of sub-models, which are often the inputs to later sub-models, are not explicitly defined; they are simply stored in memory and available if needed. Nothing, including fundamental outputs such as trip matrices, is currently written to disk.
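For reference, the sketch below illustrates the current Orca-style pattern of registering tables and steps and running them in sequence; the table, step, and values are illustrative only and are not taken from the ActivitySim codebase.

```python
import orca
import pandas as pd

# Register an input table (toy data, not ActivitySim's actual inputs)
orca.add_table('households', pd.DataFrame({'income': [45000, 120000]}, index=[1, 2]))

@orca.step()
def auto_ownership(households):
    # Orca injects the registered 'households' table wrapper by argument name;
    # the step writes its output back in memory as a new column rather than to disk
    choices = (households.income > 60000).astype(int) + 1
    orca.add_column('households', 'autos', choices)

orca.run(['auto_ownership'])
```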

The purpose of this task is to develop software that simplifies the user interface for data inputs and outputs and improves the orchestration of model setup and running. This will include the development of ActivitySim-specific code for reading and writing data, getting and setting inputs and outputs of model steps, and the ability to (re)start and stop model runs midstream for debugging and testing. We will begin this task by evaluating alternatives to Orca, such as Luigi or Airflow, since restarting within a model run is likely required and Orca was not built with this in mind. We will then prototype data pipelining for the first couple of sub-models and share our findings with the AMPO partners. Upon selection of an approach, we will revise the example to use the new data pipelining framework, and the updated software will be described in the online documentation.
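As a rough illustration of the restart requirement (not a proposed design), the sketch below checkpoints each step's output table to an HDF5 data store and records the last completed step so that a run can resume midstream; the function names and store layout are hypothetical.

```python
import pandas as pd

def run_pipeline(steps, store_path='pipeline.h5', resume_after=None):
    """Run (name, function) model steps in order, checkpointing each result to disk."""
    names = [name for name, _ in steps]
    start = names.index(resume_after) + 1 if resume_after else 0
    with pd.HDFStore(store_path) as store:
        for name, step_func in steps[start:]:
            result = step_func(store)                        # a step may read earlier tables from the store
            store.put(name, result)                          # checkpoint this step's output table
            store.put('last_checkpoint', pd.Series([name]))  # record the last completed step

# Toy usage with a single stand-in step
def auto_ownership(store):
    return pd.DataFrame({'autos': [1, 2]}, index=[101, 102])

run_pipeline([('auto_ownership', auto_ownership)])
```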

A note about multi-threading/processing: ActivitySim is currently single-threaded and iterates through chunks of pandas table records when solving a model. The plan is to revise ActivitySim to operate on the chunks concurrently via multi-processing. This means creating cross-process-safe shared data structures, dispatching chunks of choosers to different Python processes, accumulating the results in a process-safe manner, and waiting for all the processes to complete before moving on. We will keep this approach to multi-threading/processing in mind when building the data pipelining procedures.
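The sketch below shows the general chunked multi-processing pattern described above, using Python's standard multiprocessing module: choosers are split into chunks, dispatched to worker processes, and the per-chunk results are accumulated after all workers finish. The chunk solver is a stand-in, not an ActivitySim sub-model.

```python
import numpy as np
import pandas as pd
from multiprocessing import Pool

def solve_chunk(choosers_chunk):
    # Stand-in for a sub-model: make a random choice per chooser row
    return pd.Series(np.random.randint(0, 3, len(choosers_chunk)),
                     index=choosers_chunk.index, name='choice')

def solve_parallel(choosers, num_processes=4, chunk_size=10000):
    chunks = [choosers.iloc[i:i + chunk_size] for i in range(0, len(choosers), chunk_size)]
    with Pool(num_processes) as pool:
        results = pool.map(solve_chunk, chunks)   # blocks until all chunks are solved
    return pd.concat(results).sort_index()        # accumulate the per-chunk results

if __name__ == '__main__':
    choosers = pd.DataFrame({'income': np.arange(25000)})
    choices = solve_parallel(choosers)
```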

Deliverable(s): (Due 8 weeks from NTP)

  • Prototype Data Pipelining Procedures
  • Data Pipelining Procedures
  • Updated Example
  • Updated Documentation and Tests

Comments

  • We can continue to use orca's computed column capability even if we don't use orca as our overall data pipelining technology. It is fairly powerful and concise - but also potentially baffling to newcomers unversed in functional programming and dependency injection. We should compare that approach with a couple of alternatives to assess the tradeoffs between elegance and comprehensibility (see the short computed-column sketch after these comments). (jwd)

  • (Daniels, Clint) This task implements a better-documented transition flow between sub-models. In coordination with that, I would like to see a more detailed data dictionary and formatting considered here. I think that is what is meant by updating documentation, but I would like for it to be clearer.

  • ( Elizabeth ) Review Multiprocessing Libraries: We discussed not implementing multiprocessing at this point, but having Jeff review the Dask library before doing any more writing so as to write new code that would easily work with it later. This is mentioned in this task, but maybe it is a deliverable related to the design notes above in Task 1?
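Regarding the computed-column comment above, the snippet below is a minimal sketch of orca's decorator style: the column is registered against a table, evaluated lazily via dependency injection, and then behaves like any other column when the table is requested. The table and values are illustrative only.

```python
import orca
import pandas as pd

orca.add_table('persons', pd.DataFrame({'age': [8, 35, 67], 'household_id': [1, 1, 2]}))

# A computed column: orca injects the 'persons' table wrapper and evaluates lazily
@orca.column('persons')
def is_adult(persons):
    return (persons.age >= 18).astype(int)

print(orca.get_table('persons').to_frame(['age', 'is_adult']))
```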

top

Task 3: Fix Random Number Sequences

The goal of this task is to add the ability to fix random number sequences during the model run in order to replicate results under various setups. Some example setups include: a) restarting the model run in the middle of the run and getting the same results as before, b) running the same household under different sampling plans and getting the same results (assuming there is no interaction between households, such as through shadow pricing), and c) helping to ensure stable results across alternative network scenarios so that differences in results are due primarily to changes in inputs and not to random number sequencing. MTC travel model one has a separate random number generator object for each household and generates random numbers in sequence from the start of the run. The model also has some additional functionality to keep track of numbers drawn before and after each sub-model in order to support restarting, but these functions were often not called when required.

For ActivitySim, we may implement an improved version with a single random number generator object and a distinct random seed offset for each household, person, and sub-model combination. The random number offsets could be pandas attributes stored in the local data store so they are available to downstream sub-models. A requirement is the ability to request, in a single call, a vector of random numbers given a vector of offsets for each household, person, tour, etc. The random number sequencing procedures and attributes will be described in the online documentation.
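As an illustration only (not the planned implementation), the sketch below pre-generates one random sequence from a single seeded generator and lets each household/person/sub-model combination index into it with its own offset, so repeated requests with the same offsets reproduce the same draws. The offset formula here is a toy hash and a real design would need a collision-free scheme.

```python
import numpy as np
import pandas as pd

GLOBAL_SEED = 17
SEQUENCE_LENGTH = 1_000_000
_sequence = np.random.RandomState(GLOBAL_SEED).uniform(size=SEQUENCE_LENGTH)

def random_for(df, sub_model_offset):
    """Return one reproducible uniform draw per row, given vectorized per-row offsets."""
    offsets = (df['household_id'] * 100 + df['person_num'] + sub_model_offset) % SEQUENCE_LENGTH
    return pd.Series(_sequence[offsets.values], index=df.index)

persons = pd.DataFrame({'household_id': [1, 1, 2], 'person_num': [1, 2, 1]})
draws_auto = random_for(persons, sub_model_offset=0)       # same inputs always give the same draws
draws_tour_freq = random_for(persons, sub_model_offset=7)  # a different sub-model gets different draws
```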

Deliverable(s): (Due 12 weeks from NTP)

  • Improved Random Number Management
  • Updated Sub-Models
  • Updated Documentation and Tests

Comments


top

Task 4: Logsums in Utility Functions

The goal of this task is to develop the functionality to calculate discrete choice model logsums and to then make them accessible to upstream model calculations. The most common application of this is calculating mode choice model logsums (i.e. multimodal accessibility) for each potential destination in a destination choice model. Since ActivitySim is solving each sub-model for all choosers at once, it also needs to solve logsums for all tours/trips/etc. at once, and then store the results in-memory and/or the data store for later use in pandas expressions in other sub-models.
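For all choosers at once, the logsum over mode alternatives is simply the log of the sum of exponentiated utilities, which vectorizes naturally in pandas. The utilities below are made-up numbers, with one row per tour/destination pair.

```python
import numpy as np
import pandas as pd

# Mode utilities for each chooser (one row per tour/destination pair); values are illustrative
utilities = pd.DataFrame({
    'drive':   [-1.2, -0.4, -2.0],
    'transit': [-1.8, -1.1, -0.9],
    'walk':    [-3.0, -2.5, -0.7]})

# Vectorized logsum across the mode alternatives for every chooser at once
logsums = np.log(np.exp(utilities).sum(axis=1))
```

In a destination choice model, logsums like these would then enter the destination utility expressions as an accessibility term.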

We will begin this task by implementing the MTC travel model one tour mode choice model and logsum calculation engine. We will likely simplify the mode choice model by including only a handful of utility expressions in order to focus on the functionality and its correctness, rather than on the actual model design. The simplified implementation will include at least one variable from each type of data object in order to ensure all the required data connections/interfaces are implemented. We will revise the existing ActivitySim sub-models to use the new logsums interface. The new procedures will be described in the online documentation.

Deliverable(s): (Due 20 weeks from NTP)

  • Simplified Mode Choice Logsums Calculation Procedure and Interface
  • Updated Sub-Models to Use Logsums Interface
  • Updated Documentation and Tests

Comments

  • JEF: Should be ok with this task so long as you are sampling destinations, since it will be prohibitively expensive to calculate a logsum for each tour and person. An alternative, iterative approach would be to pre-calculate logsums for segments, and cache them, rather than calculate them on-the-fly. The on-the-fly logsums could then be added later. This would not work in the MAZ world though. And though you are testing this with the tour mode choice model, it should be designed in such a way that we can obtain logsums from any choice model.

  • (Daniels, Clint) I am a little worried this task will lead to memory bloat. Memory usage is one of the biggest problems in the current implementations of CT-RAMP. In coordination with the informed sampling procedures above, I'd like to see if there are ways we can get smarter about what needs to be available all the time and what can be pulled in and out without causing huge runtime performance problems.

top


Task 5: At-Work Subtour Models

The goal of this task is to implement the at-work subtour frequency, scheduling, and destination sub-models. These sub-models are similar in form to the existing (partially) implemented non-mandatory tour frequency, departure and duration, and destination models. However, a few key features missing from the existing models are the processing of each tour purpose, the calculation of time windowing variables, and logsums. These missing expressions and the underlying software will be implemented as faithfully as possible within the available budget (in addition to what is done in other tasks). The new procedures will be described in the online documentation.
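As a rough illustration of the time windowing idea (not the planned implementation), the sketch below tracks which time periods each person has already committed to scheduled tours so that later scheduling models only consider the remaining open windows.

```python
import numpy as np
import pandas as pd

PERIODS = 48  # half-hour periods in a day (illustrative resolution)
person_ids = [1, 2, 3]
windows = pd.DataFrame(np.zeros((len(person_ids), PERIODS), dtype=bool), index=person_ids)

def claim_window(windows, person_id, start, end):
    # mark periods occupied by an already-scheduled tour as unavailable
    windows.loc[person_id, start:end] = True

claim_window(windows, 1, 16, 35)        # e.g. person 1 has a work tour from period 16 to 35
open_periods = (~windows).sum(axis=1)   # remaining open periods per person
```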

Deliverable(s): (Due 24 weeks from NTP)

  • At-Work Subtour Models
  • Updated Example, including Expression and Config Files
  • Updated Documentation and Tests

Comments


top

Task 6: Completing Phase 1 Models

The goal of this task is to continue to verify, and correct as needed, the sub-models implemented in Phase 1. Consultant will work through the sub-models in order and resolve as many outstanding issues as the budget allows, such as implementing missing utility variables, logsum information, time windows, and additional time periods. Consultant may also refactor/reorganize the file/folder setup in order to improve the separation of concerns. Consultant will update the source code, documentation, and tests as a result of any revisions to the framework.

Deliverable(s): (Due 30 weeks from NTP)

  • Full Model Run with All Zones, Skims, and Households of the Sub-Models Implemented
  • Comparisons of Model Results to Expected Results
  • Updated Source Code, Configuration Files, Documentation, and Tests

Comments

  • (Daniels, Clint) Does this get us a fully functional replica of Travel Model One?

top