Clean career placement data - UCSB-MEDS/shiny-dashboard GitHub Wiki
MEDS and MESM career placement data (both initial placement and active placement, or "status", data) must be cleaned before the dashboard can use them. These cleaning pipelines will almost certainly need updates each year as new data arrive. Examples of necessary cleaning include:
- remove repeated alumni entries (this can happen when an alum fills out the career exit survey more than once, e.g. after accepting a new position within the 6-month post-graduation window)
- harmonize state and country codes and names
- fix incorrectly entered data (the Career Team may request these updates)
- fix misspelled employer names
- etc.
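The cleaning examples above can be sketched in base R. This is a minimal illustration on a toy data frame, not the code in the actual cleaning scripts: the column names (`student_id`, `survey_date`, `employer`, `state`), the lookup tables, and the helper `recode_values()` are all hypothetical. Keeping each alum's most recent survey response is also an assumption; confirm with the Career Team which entry should be kept.

```r
# Toy stand-in for a pre-processed placement data frame.
# HYPOTHETICAL column names; check names() on the real data.
placement <- data.frame(
  student_id  = c(1, 2, 2, 3),
  survey_date = as.Date(c("2023-07-01", "2023-06-15", "2023-10-20", "2023-08-05")),
  employer    = c("NOAA", "Acme Corp", "acme corp.", "Natl. Park Service"),
  state       = c("CA", "California", "calif", "WA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace values of x whose lower-cased, trimmed form
# appears in names(lookup); values with no match are returned unchanged.
recode_values <- function(x, lookup) {
  key <- tolower(trimws(x))
  hit <- key %in% names(lookup)
  x[hit] <- unname(lookup[key[hit]])
  x
}

# Remove repeated alumni: sort each alum's responses newest first, then keep
# only the first (most recent) row per student_id. ASSUMPTION: newest wins.
placement <- placement[order(placement$student_id, -as.numeric(placement$survey_date)), ]
placement <- placement[!duplicated(placement$student_id), ]
rownames(placement) <- NULL

# Harmonize state names/codes to two-letter codes (hypothetical lookup table).
state_lookup <- c(ca = "CA", california = "CA", calif = "CA",
                  wa = "WA", washington = "WA")
placement$state <- recode_values(placement$state, state_lookup)

# Fix misspelled or inconsistent employer names (hypothetical lookup table).
employer_lookup <- c("acme corp." = "Acme Corp",
                     "natl. park service" = "National Park Service")
placement$employer <- recode_values(placement$employer, employer_lookup)

print(placement)
```

Matching on lower-cased, trimmed values lets one lookup entry catch variants that differ only in capitalization or stray whitespace; anything not in a lookup table passes through unchanged, so new inconsistencies still need to be spotted and added by hand.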
This requires careful data exploration and modifications to the cleaning pipelines. Take your time and check every variable for inconsistencies. Follow the general steps below:
1. Locate the two career placement cleaning scripts inside the `shiny-dashboard/data-cleaning/` directory:
   - `meds-placement-cleaning.qmd` (for cleaning `meds_placement_YYYY_YYYY.rds`)
   - `mesm-placement-cleaning.qmd` (for cleaning `mesm_placement_YYYY_YYYY.rds`)
2. Choose one to begin with. I recommend MEDS, since there is less data and it generally needs fewer cleaning pipeline modifications.
3. In section 0, update the file name within `readRDS()` to match the new pre-processed data file name (this should only mean changing the year range suffix of the file name).
4. Clean the active placement (status) data by running the code in section 1.
5. Inspect the resulting `meds_status_cleaned` data frame. Newly added data almost always require additional cleaning; update the cleaning pipeline in section 1 accordingly, then rerun it.
6. Clean the initial placement data by running the code in section 2.
7. Inspect the resulting `meds_placement_cleaned` data frame. Again, newly added data almost always require additional cleaning; update the cleaning pipeline in section 2 accordingly, then rerun it.
8. Run section 3 to write both cleaned data frames to file. Both files will be saved to the `shiny-dashboard/bren-student-data-explorer/data/` directory (NOTE: you may need to create the `data/` folder in the `bren-student-data-explorer` directory first).
9. Repeat steps 3-8 for the other program's cleaning script (e.g. if you began with `meds-placement-cleaning.qmd`, move on to `mesm-placement-cleaning.qmd`).
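The inspection in steps 5 and 7 and the write in step 8 might look roughly like the base R sketch below. The helpers `explore_values()` and `write_cleaned()`, the toy `meds_status_cleaned` data frame, and the output file name are hypothetical; check section 3 of each script for the real file names and paths. Because `dir.create()` with `showWarnings = FALSE` quietly does nothing when the folder already exists, it also covers the case where `data/` has not been created yet.

```r
# Hypothetical helper: print the frequency of every distinct value in each
# character/factor column so near-duplicates (e.g. "CA" vs "California")
# show up side by side. NA counts are included when present.
explore_values <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x) || is.factor(x)) {
      cat("\n==", col, "==\n")
      print(table(x, useNA = "ifany"))
    }
  }
  invisible(df)
}

# Hypothetical helper: save a cleaned data frame as .rds, creating the
# output folder (and any missing parents) first if needed.
write_cleaned <- function(df, file_name, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out_dir, file_name)
  saveRDS(df, path)
  invisible(path)
}

# Toy stand-in for meds_status_cleaned (HYPOTHETICAL columns).
meds_status_cleaned <- data.frame(
  employer = c("NOAA", "noaa", "Acme Corp"),
  status   = c("Employed", "Employed", NA),
  stringsAsFactors = FALSE
)

explore_values(meds_status_cleaned)

# In the real workflow out_dir would point at
# shiny-dashboard/bren-student-data-explorer/data/; a temp folder is used here.
out_dir <- file.path(tempdir(), "bren-student-data-explorer", "data")
saved <- write_cleaned(meds_status_cleaned, "meds_status_cleaned.rds", out_dir)
```

Scanning the printed tables for near-duplicates (here `"NOAA"` vs `"noaa"`) is a quick way to find values the pipeline still needs to harmonize before rerunning the section.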
NOTE: Currently, all data cleaning takes place in the `shiny-dashboard` repository, which also contains the dashboard code. The cleaning scripts are in the `data-cleaning/` directory. While it might be more organizationally appropriate to store these scripts in the `career-data` repository, keeping them alongside the app code is more practical. Each year, the cleaning pipeline needs updates to accommodate new data, and these needs often become apparent only when running the app with the latest data. Keeping the cleaning scripts and the app in the same repository makes it easier to test the app, revise the cleaning pipeline code, and regenerate the `.rds` files that the app loads in `global.R`.