Required Tables for Data Cleaning - psrc/shiny-fixie GitHub Wiki

Tables
Trip Table (HHSurvey.Trip)
Error Flags (HHSurvey.trip_error_flags)

Trip Table (HHSurvey.Trip)

Why do we need to clean up variables in the trip table?

  • Tailor the trip table output from Rulesy so that the trip table used in Shiny-Fixie includes only the variables needed for cleaning
  • Ensure that all calculated fields in the Trip table are updated after manual data cleaning (a similar process should also be in place for other tables)
  • For the "Add new trip" feature in Shiny-Fixie, identify the variables that should have the same values as the previous trip and copy them over
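
The "Add new trip" idea above can be sketched roughly as follows. This is a minimal illustration, not the Shiny-Fixie implementation: the function name `new_trip_after`, the choice of which variables to copy, and the rule that the new trip starts where the previous one ended are all assumptions made for the example.

```python
# Assumption: these ID variables carry over unchanged from the previous trip.
# "tripid" is left out on purpose, since how it is assigned is not stated here.
COPIED_FROM_PREVIOUS = ["hhid", "person_id", "pernum", "daynum", "day_id"]


def new_trip_after(previous_trip: dict, existing_recids: list) -> dict:
    """Build a skeleton record for a trip inserted right after previous_trip."""
    trip = {name: previous_trip[name] for name in COPIED_FROM_PREVIOUS}

    # recid: next number after the current end of the recid list.
    trip["recid"] = max(existing_recids, default=0) + 1

    # Assumption: tripnum follows the previous trip's number. Renumbering of
    # any later trips for the same person is not shown.
    trip["tripnum"] = previous_trip["tripnum"] + 1

    # Hypothetical rule: the new trip starts where the previous trip ended.
    trip["origin_lat"] = previous_trip["dest_lat"]
    trip["origin_lng"] = previous_trip["dest_lng"]
    trip["origin_purpose"] = previous_trip["dest_purpose"]
    return trip
```

The remaining manual input variables would still be filled in by the analyst, and the calculated fields would be refreshed afterwards.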

Trip Variables

Variables needed in Shiny-Fixie

  1. ID variables (essential ID variables that won't be edited)

    • "hhid"
    • "person_id"
    • "pernum"
    • "daynum"
    • "day_id"
    • "tripid"
  2. variables that need automated assignment

    • "recid": auto-generated as the next number after the end of the current recid list
    • "tripnum"
  3. manual input variables

    • "depart_time_timestamp"
    • "arrival_time_timestamp"
    • "origin_lat"
    • "origin_lng"
    • "dest_lat"
    • "dest_lng"
    • "hhmember1"
    • "hhmember2"
    • "hhmember3"
    • "hhmember4"
    • "hhmember5"
    • "hhmember6"
    • "hhmember7"
    • "hhmember8"
    • "hhmember9"
    • "hhmember10"
    • "hhmember11"
    • "hhmember12"
    • "hhmember13"
    • "driver"
    • "travelers_hh"
    • "travelers_nonhh"
    • "origin_purpose"
    • "dest_purpose"
    • "dest_purpose_other": check logic behind this variable
    • "mode_1"
    • "mode_2"
    • "mode_3"
    • "mode_4"
    • "mode_acc"
    • "mode_egr"
    • "mode_other_specify"
  4. variables calculated by recalculate_after_edit during the cleaning process

    • "distance_miles"
    • "travel_time"
    • "speed_mph"
    • "origin_geog"
    • "dest_geog"
    • "dest_county"
    • "dest_city"
    • "dest_zip"
    • "dest_is_home"
    • "dest_is_work"
    • "travelers_total"
  5. variables used only for cleaning purposes (remove before the final version)

    • "psrc_inserted"
    • "revision_code"
    • "psrc_resolved"
    • "psrc_comment"
    • "modes"
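
To illustrate why the calculated variables in group 4 must be refreshed after a manual edit, here is a minimal sketch of such a recalculation. It is not the actual recalculate_after_edit logic: the function name `recalculate_derived`, the unit of "travel_time" (minutes), and the formulas for "speed_mph" and "travelers_total" are assumptions. The geographic fields are omitted because they would need spatial lookups.

```python
from datetime import datetime


def recalculate_derived(trip: dict) -> dict:
    """Return a copy of trip with a few derived fields refreshed."""
    out = dict(trip)

    elapsed = trip["arrival_time_timestamp"] - trip["depart_time_timestamp"]
    # Assumption: travel_time is stored in minutes.
    out["travel_time"] = elapsed.total_seconds() / 60

    # Assumption: speed is distance over elapsed time; it is left as None
    # when the elapsed time is not positive.
    hours = out["travel_time"] / 60
    out["speed_mph"] = trip["distance_miles"] / hours if hours > 0 else None

    # Assumption: total travelers = household + non-household travelers.
    out["travelers_total"] = trip["travelers_hh"] + trip["travelers_nonhh"]
    return out
```

If an analyst changes "arrival_time_timestamp" by hand, "travel_time" and "speed_mph" become stale until a step like this runs again.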

Variables in the delivered trip table that are not relevant to data cleaning

The variables below will be removed from the HHSurvey.Trip table.

  1. variables calculated after the cleaning process is completed (in the Post-Fixie step)

    • "traveldate": the date of departure, using a 3 a.m. (rather than midnight) day boundary
    • "depart_date"
    • "depart_dow"
    • "depart_time_hour"
    • "depart_time_minute"
    • "depart_time_second"
    • "arrive_date"
    • "arrive_dow"
    • "arrival_time_hour"
    • "arrival_time_minute"
    • "arrival_time_second"
    • "duration_minutes"
    • "o_in_region"
    • "o_puma10"
    • "o_bg"
    • "d_in_region"
    • "d_puma10"
    • "d_bg"
    • "svy_complete"
    • "day_iscomplete"
  2. unnecessary variables: variables that will never be used in data cleaning

    • "distance_meters"
    • "duration_seconds"
    • "dwell_mins"
    • "is_transit"
    • "user_added"
    • "user_merged"
    • "user_split"
    • "analyst_merged"
    • "analyst_split"
    • "analyst_split_loop"
    • "flag_teleport"
    • "is_access"
    • "is_egress"
    • "has_access"
    • "has_egress"
    • "mode_type"
    • "origin_purpose_cat"
    • "dest_purpose_cat"
    • "copied_trip"
    • "speed_flag"
    • "trace_quality_flag"
    • "survey_year"
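
The date and time fields in group 1 can all be derived from the departure timestamp, which is why they can wait until the Post-Fixie step. The sketch below shows one way to do this. It assumes that the 3 a.m. boundary means a trip departing before 3 a.m. counts toward the previous travel day, and that "depart_dow" runs from 1 (Monday) to 7 (Sunday). The function name `post_fixie_date_fields` is hypothetical.

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 3  # the 3 a.m. boundary noted for "traveldate"


def post_fixie_date_fields(depart: datetime) -> dict:
    """Derive departure date/time fields from depart_time_timestamp."""
    return {
        # Shifting back by 3 hours moves departures before 3 a.m. to the prior day.
        "traveldate": (depart - timedelta(hours=DAY_START_HOUR)).date(),
        "depart_date": depart.date(),
        # Assumption: day of week coded 1 = Monday ... 7 = Sunday.
        "depart_dow": depart.isoweekday(),
        "depart_time_hour": depart.hour,
        "depart_time_minute": depart.minute,
        "depart_time_second": depart.second,
    }
```

The arrival fields would follow the same pattern, starting from "arrival_time_timestamp".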

Error Flags (HHSurvey.trip_error_flags)

  • recid
  • person_id
  • tripnum
  • error_flag
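
As a rough illustration of how the error flag rows relate to trips, the sketch below groups flags by "recid" so that each trip's flags can be looked up together. The function name `flags_by_recid` and the flag labels in the usage example are made up, and the sketch assumes a trip may carry more than one flag.

```python
from collections import defaultdict


def flags_by_recid(flag_rows: list) -> dict:
    """Group error_flag values by the recid of the trip they belong to."""
    grouped = defaultdict(list)
    for row in flag_rows:
        grouped[row["recid"]].append(row["error_flag"])
    return dict(grouped)


# Usage with made-up rows shaped like HHSurvey.trip_error_flags:
rows = [
    {"recid": 8, "person_id": 10101, "tripnum": 3, "error_flag": "example flag A"},
    {"recid": 8, "person_id": 10101, "tripnum": 3, "error_flag": "example flag B"},
    {"recid": 9, "person_id": 10101, "tripnum": 4, "error_flag": "example flag A"},
]
lookup = flags_by_recid(rows)
```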