Required Tables for Data Cleaning - psrc/shiny-fixie GitHub Wiki
| Tables |
|---|
| Trip Table (HHSurvey.Trip) |
| Error Flags (HHSurvey.trip_error_flags) |
Trip Table (HHSurvey.Trip)
Why do we need to clean up variables in the trip table?
- Tailor the trip table output from Rulesy so that the trip table used in Shiny-Fixie includes only the variables needed for cleaning.
- Ensure that all calculated fields in the Trip table are updated after manual data cleaning (a similar process should also be in place for other tables).
- For the "Add new trip" feature in Shiny-Fixie, identify the variables that should take the same values as the previous trip and copy them over.
Trip Variables
Variables needed in Shiny-Fixie
ID variables (essential ID variables that won't be edited):
- "hhid"
- "person_id"
- "pernum"
- "daynum"
- "day_id"
- "tripid"
Variables that need automated assignment:
- "recid": auto-generated as the next number at the end of the recid list
- "tripnum"
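As a rough illustration of the "Add new trip" behavior, the sketch below builds a new trip record from the previous trip. It is a minimal sketch, not Shiny-Fixie's implementation: the function name `make_new_trip`, the choice of which ID variables to copy, and the rule that `tripnum` is the previous trip's `tripnum` plus one are all assumptions. Only the "next number at the end of the recid list" rule comes from the list above.

```python
# Hypothetical helper: the copied fields and the tripnum rule are assumptions.
COPIED_ID_FIELDS = ["hhid", "person_id", "pernum", "daynum", "day_id"]


def make_new_trip(previous_trip, existing_recids):
    """Return a new trip dict that follows previous_trip."""
    # Copy the ID variables assumed to be shared with the previous trip.
    new_trip = {field: previous_trip[field] for field in COPIED_ID_FIELDS}
    # recid: next number after the end of the current recid list.
    new_trip["recid"] = max(existing_recids, default=0) + 1
    # tripnum: assumed to follow the previous trip's number.
    new_trip["tripnum"] = previous_trip["tripnum"] + 1
    return new_trip


# Made-up example values.
previous = {
    "hhid": 101, "person_id": 10101, "pernum": 1, "daynum": 2,
    "day_id": 1010102, "tripid": 1010100203, "recid": 57, "tripnum": 3,
}
new_trip = make_new_trip(previous, existing_recids=[55, 56, 57])
print(new_trip["recid"], new_trip["tripnum"])  # 58 4
```

"tripid" is deliberately left out of the copied fields here, since a new trip would presumably need its own value; how that value is generated is not covered by this sketch.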
Manual input variables:
- "depart_time_timestamp"
- "arrival_time_timestamp"
- "origin_lat"
- "origin_lng"
- "dest_lat"
- "dest_lng"
- "hhmember1"
- "hhmember2"
- "hhmember3"
- "hhmember4"
- "hhmember5"
- "hhmember6"
- "hhmember7"
- "hhmember8"
- "hhmember9"
- "hhmember10"
- "hhmember11"
- "hhmember12"
- "hhmember13"
- "driver"
- "travelers_hh"
- "travelers_nonhh"
- "origin_purpose"
- "dest_purpose"
- "dest_purpose_other": check logic behind this variable
- "mode_1"
- "mode_2"
- "mode_3"
- "mode_4"
- "mode_acc"
- "mode_egr"
- "mode_other_specify"
Variables calculated by recalculate_after_edit during the cleaning process:
- "distance_miles"
- "travel_time"
- "speed_mph"
- "origin_geog"
- "dest_geog"
- "dest_county"
- "dest_city"
- "dest_zip"
- "dest_is_home"
- "dest_is_work"
- "travelers_total"
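To make the idea of recalculating fields after an edit concrete, here is a minimal sketch covering three of the variables above. It is not the actual recalculate_after_edit logic, which this page does not describe. The function name `recalculate_sketch` is hypothetical, and the units and formulas are assumptions: `travel_time` in minutes taken from the two timestamps, `speed_mph` as `distance_miles` divided by hours, and `travelers_total` as the sum of `travelers_hh` and `travelers_nonhh`. `distance_miles` and the geography fields are taken as given.

```python
from datetime import datetime


def recalculate_sketch(trip):
    """Recompute a few derived fields on a trip dict in place (illustrative)."""
    elapsed = trip["arrival_time_timestamp"] - trip["depart_time_timestamp"]
    trip["travel_time"] = elapsed.total_seconds() / 60  # assumed unit: minutes
    if trip["travel_time"] > 0:
        trip["speed_mph"] = trip["distance_miles"] / (trip["travel_time"] / 60)
    else:
        trip["speed_mph"] = None  # no meaningful speed without elapsed time
    # Assumption: total travelers is household plus non-household travelers.
    trip["travelers_total"] = trip["travelers_hh"] + trip["travelers_nonhh"]
    return trip


# Made-up example values.
trip = {
    "depart_time_timestamp": datetime(2023, 5, 2, 8, 0),
    "arrival_time_timestamp": datetime(2023, 5, 2, 8, 30),
    "distance_miles": 10.0,
    "travelers_hh": 2,
    "travelers_nonhh": 1,
}
recalculate_sketch(trip)
print(trip["travel_time"], trip["speed_mph"], trip["travelers_total"])  # 30.0 20.0 3
```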
Variables used only for cleaning purposes (remove before the final version):
- "psrc_inserted"
- "revision_code"
- "psrc_resolved"
- "psrc_comment"
- "modes"
Variables in the delivered trip table that are not relevant to data cleaning
The variables below will be removed from the HHSurvey.Trip table.
Variables calculated after the cleaning process is completed (in the Post-Fixie step):
- "traveldate": the date of departure, using a 3am day boundary instead of midnight
- "depart_date"
- "depart_dow"
- "depart_time_hour"
- "depart_time_minute"
- "depart_time_second"
- "arrive_date"
- "arrive_dow"
- "arrival_time_hour"
- "arrival_time_minute"
- "arrival_time_second"
- "duration_minutes"
- "o_in_region"
- "o_puma10"
- "o_bg"
- "d_in_region"
- "d_puma10"
- "d_bg"
- "svy_complete"
- "day_iscomplete"
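The departure fields in this group can all be derived from a single departure timestamp. The sketch below shows one way to do that, including the 3am boundary for "traveldate". It is an illustration under stated assumptions, not the Post-Fixie code: the function name `derive_depart_fields` is hypothetical, departures before 3am are assumed to count toward the previous travel day, and the day-of-week coding (Monday = 1 through Sunday = 7) is an assumption.

```python
from datetime import datetime, timedelta

DAY_BOUNDARY_HOUR = 3  # travel day boundary at 3am, per the list above


def derive_depart_fields(depart):
    """Derive date and time fields from a departure datetime (illustrative)."""
    return {
        # Assumption: a departure before 3am belongs to the previous travel day.
        "traveldate": (depart - timedelta(hours=DAY_BOUNDARY_HOUR)).date(),
        "depart_date": depart.date(),
        "depart_dow": depart.isoweekday(),  # assumed coding: Monday=1 ... Sunday=7
        "depart_time_hour": depart.hour,
        "depart_time_minute": depart.minute,
        "depart_time_second": depart.second,
    }


# A made-up trip departing at 1:30:15 on 3 May 2023.
late = derive_depart_fields(datetime(2023, 5, 3, 1, 30, 15))
print(late["traveldate"], late["depart_date"])  # 2023-05-02 2023-05-03
```

The arrival fields ("arrive_date", "arrival_time_hour", and so on) could presumably be derived the same way from the arrival timestamp.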
Unnecessary variables (variables that will never be used in data cleaning):
- "distance_meters"
- "duration_seconds"
- "dwell_mins"
- "is_transit"
- "user_added"
- "user_merged"
- "user_split"
- "analyst_merged"
- "analyst_split"
- "analyst_split_loop"
- "flag_teleport"
- "is_access"
- "is_egress"
- "has_access"
- "has_egress"
- "mode_type"
- "origin_purpose_cat"
- "dest_purpose_cat"
- "copied_trip"
- "speed_flag"
- "trace_quality_flag"
- "survey_year"
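Removing these variables amounts to filtering each trip record by name. A minimal sketch follows, assuming trips are held as plain dicts; the set `REMOVED_VARIABLES` contains only a few of the names listed above, and `drop_removed_variables` is a hypothetical helper, not part of Shiny-Fixie.

```python
# A few of the variables listed above for removal (not the full list).
REMOVED_VARIABLES = {
    "distance_meters", "duration_seconds", "dwell_mins",
    "is_transit", "survey_year",
}


def drop_removed_variables(trip):
    """Return a copy of the trip dict without the removed variables."""
    return {name: value for name, value in trip.items()
            if name not in REMOVED_VARIABLES}


# Made-up example values.
delivered = {"recid": 58, "distance_miles": 10.0,
             "distance_meters": 16093.4, "survey_year": 2023}
trimmed = drop_removed_variables(delivered)
print(sorted(trimmed))  # ['distance_miles', 'recid']
```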
Error Flags (HHSurvey.trip_error_flags)
- recid
- person_id
- tripnum
- error_flag
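Since "recid" appears in both tables, one plausible use of the error flags table is to look up the flags attached to each trip. The sketch below groups flag rows by "recid". It assumes that "recid" identifies the flagged trip and that a trip can carry more than one flag; the function name `flags_by_recid` and the flag values are made-up placeholders.

```python
from collections import defaultdict


def flags_by_recid(error_flag_rows):
    """Group error_flag values by the recid of the trip they refer to."""
    grouped = defaultdict(list)
    for row in error_flag_rows:
        grouped[row["recid"]].append(row["error_flag"])
    return dict(grouped)


# Placeholder rows; the flag names are invented for illustration.
rows = [
    {"recid": 58, "person_id": 10101, "tripnum": 4, "error_flag": "flag_a"},
    {"recid": 58, "person_id": 10101, "tripnum": 4, "error_flag": "flag_b"},
    {"recid": 61, "person_id": 10102, "tripnum": 1, "error_flag": "flag_a"},
]
lookup = flags_by_recid(rows)
print(lookup[58])  # ['flag_a', 'flag_b']
```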