Data Pipeline - psrc/shiny-fixie GitHub Wiki

  • make sure trip_id stays the same throughout the data pipeline
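
One way to verify this is to compare the set of trip_id values between two pipeline stages. The sketch below is only an illustration: the function name `check_trip_ids` and the row layout (a list of dicts with a `trip_id` key) are assumptions, not part of Rulesy or Shiny-Fixie. Intentional changes, such as trips deleted or added in the app, can be passed in so they are not reported.

```python
def check_trip_ids(before, after, deleted=(), added=()):
    """Compare trip_id sets between two pipeline stages.

    Returns (missing, unexpected): ids that disappeared without being
    listed in `deleted`, and ids that appeared without being listed in `added`.
    """
    before_ids = {row["trip_id"] for row in before}
    after_ids = {row["trip_id"] for row in after}
    missing = before_ids - after_ids - set(deleted)
    unexpected = after_ids - before_ids - set(added)
    return sorted(missing), sorted(unexpected)
```

Both returned lists being empty suggests that trip_id was preserved between the two stages.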

Data Delivery and Prep

  • materials:

    • raw data tables
    • data cleaning codebook
  • data notes:

    • NULL codes: unify all NULL codes to -995 (not 995)
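
A minimal sketch of the NULL code unification, assuming rows are held as dicts. The set of legacy markers is hypothetical apart from 995, which is the only variant named in the note above, and `mode_1` in the usage note is a placeholder column name. The function is restricted to an explicit list of coded columns because 995 can be a legitimate value in a measured field such as a distance.

```python
NULL_CODE = -995

# Hypothetical legacy missing-value markers; only 995 comes from the data notes.
LEGACY_NULLS = {995, None, ""}


def unify_null_codes(rows, coded_columns):
    """Return copies of rows with legacy NULL markers replaced by -995.

    Only columns named in coded_columns are touched; the input rows
    are left unmodified.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in coded_columns:
            if col in new_row and new_row[col] in LEGACY_NULLS:
                new_row[col] = NULL_CODE
        cleaned.append(new_row)
    return cleaned
```

For example, `unify_null_codes(rows, ["mode_1"])` rewrites 995 in `mode_1` but leaves a 995 in any other column alone.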

Step 1: Rulesy

Rulesy cleans data programmatically, performs trip linking, and preps tables for Shiny-Fixie.

Rulesy Processes

  1. Prep trip table
  2. Procedure to recalculate derived fields (used in both Rulesy and Fixie)
  3. Data Corrections
  4. Trip Linking
  5. Mode number standardization, including access and egress characterization
  6. Harmonize trips where possible
  7. Revise travel times for excessive speed trips
  8. Flag inconsistencies for further scrutiny
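
As an illustration of the excessive-speed check in process 7, the sketch below flags trips whose implied speed exceeds a ceiling for their mode. Everything specific here is an assumption: the field names (`mode`, `distance_miles`, `duration_minutes`), the mode labels, and the ceiling values are placeholders and are not taken from Rulesy. The sketch only identifies candidate trips; how travel times are then revised is not covered.

```python
# Hypothetical per-mode speed ceilings in miles per hour (placeholder values).
MAX_SPEED_MPH = {"walk": 10.0, "bike": 30.0, "drive": 90.0}


def flag_excessive_speed(trips, max_speed=MAX_SPEED_MPH):
    """Return (trip_id, speed_mph) for trips faster than their mode's ceiling.

    Trips with an unknown mode or a non-positive duration are skipped,
    because no meaningful speed can be computed for them here.
    """
    flagged = []
    for trip in trips:
        minutes = trip["duration_minutes"]
        limit = max_speed.get(trip["mode"])
        if limit is None or minutes <= 0:
            continue
        speed = trip["distance_miles"] * 60 / minutes
        if speed > limit:
            flagged.append((trip["trip_id"], round(speed, 1)))
    return flagged
```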

Prep Tables for Shiny-Fixie

  • save tables as temporal tables right after the second Rulesy process ("2. Procedure to recalculate derived fields")
  • double-check that the values of all variables match the codebook (example: -995 in hhmember variables)
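
The codebook check can be sketched as a lookup of each value against the set of allowed codes for its variable. The codebook structure used here (a dict mapping column name to a set of allowed values) and the column name `hhmember1` in the usage note are assumptions made for illustration.

```python
def find_codebook_mismatches(rows, codebook):
    """Return (trip_id, column, value) for every value not in the codebook.

    codebook maps a column name to the set of values allowed for it.
    A column missing from a row is reported with value None.
    """
    mismatches = []
    for row in rows:
        for col, allowed in codebook.items():
            value = row.get(col)
            if value not in allowed:
                mismatches.append((row.get("trip_id"), col, value))
    return mismatches
```

With a codebook such as `{"hhmember1": {0, 1, -995}}`, a row that still carries 995 in `hhmember1` is reported, which catches NULL codes that were not unified to -995.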

Step 2: Shiny-Fixie App

The Shiny-Fixie app collects data edits and executes update queries or stored procedures to update tables in the database.
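
The edit-then-update pattern can be sketched as a parameterized UPDATE keyed on trip_id. This is not the app's code: the sketch uses Python's built-in `sqlite3` module as a stand-in database, and the table name `trip`, the column names, and the whitelist of editable columns are all assumptions. Column names are checked against the whitelist before being placed in the SQL text, and values are always passed as parameters.

```python
import sqlite3

# Hypothetical whitelist of columns that an edit may change.
EDITABLE_COLUMNS = {"mode_1", "depart_time", "arrival_time"}


def apply_trip_edit(conn, trip_id, edits):
    """Apply a dict of column -> new value to one trip; return rows updated."""
    unknown = set(edits) - EDITABLE_COLUMNS
    if unknown:
        raise ValueError(f"columns not editable: {sorted(unknown)}")
    if not edits:
        return 0
    # Column names come from the whitelist; values are bound as parameters.
    assignments = ", ".join(f"{col} = ?" for col in edits)
    cur = conn.execute(
        f"UPDATE trip SET {assignments} WHERE trip_id = ?",
        [*edits.values(), trip_id],
    )
    conn.commit()
    return cur.rowcount
```

Because the update is keyed on trip_id and trip_id itself is not editable, an edit cannot change the identifier that the rest of the pipeline relies on.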

Shiny-Fixie Main Features

  1. Edit trip
  2. Add trip (create blank trip, add reverse trip, add return home trip)
  3. Dismiss flag
  4. Delete trip
  5. Trip linking
  6. Trip unlinking

Step 3: Post-Fixie Data Finalization

  • recalculate all derived fields: make sure every variable in the final dataset is updated (e.g., number of trips per day/person/household, number of complete days)
  • weighting
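
The trip counts named in the first bullet above can be recomputed from the final trip table in one pass. The sketch assumes each trip row carries `hh_id`, `person_id`, and `day_id` keys; these names are placeholders for whatever identifiers the final dataset uses.

```python
from collections import Counter


def count_trips(trips):
    """Recount trips per person, per person-day, and per household."""
    per_person = Counter()
    per_person_day = Counter()
    per_household = Counter()
    for trip in trips:
        per_person[trip["person_id"]] += 1
        per_person_day[(trip["person_id"], trip["day_id"])] += 1
        per_household[trip["hh_id"]] += 1
    return per_person, per_person_day, per_household
```

Comparing these recomputed counts with the stored derived fields is one way to find variables that were not refreshed after edits in Shiny-Fixie.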