Clean career placement data - UCSB-MEDS/shiny-dashboard GitHub Wiki

MEDS and MESM career placement data (both initial placement and active placement, i.e. status) must be cleaned before they can be used by the dashboard. These data cleaning pipelines will almost certainly require updates each year as new data are received. Some examples of necessary cleaning:

remove repeated alumni (may happen if they fill out the career exit survey more than once when accepting a new position within the 6-month post-graduation window)
harmonize state and country codes / names
fix incorrectly entered data (these updates may be requested by the Career Team)
fix incorrectly spelled employer names
etc.
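
The kinds of fixes listed above might look something like the following in R. This is a minimal sketch in base R, not the actual pipeline: the column names (`alum_id`, `response_date`, `employer`, `state`), the toy values, and the rule of keeping each person's most recent response are all assumptions made for illustration, and the real scripts may use different packages and logic.

```r
# Toy data standing in for pre-processed placement data.
# All column names and values here are hypothetical.
placement <- data.frame(
  alum_id       = c(1, 1, 2, 3),
  response_date = as.Date(c("2024-07-01", "2024-09-15", "2024-08-01", "2024-08-20")),
  employer      = c("Acme Consulting", "Example Energy", "Exmaple Energy", "Acme Consulting"),
  state         = c("California", "CA", "ca", "Oregon"),
  stringsAsFactors = FALSE
)

# 1. Remove repeated alumni: sort so the newest response per person comes first,
#    then keep only the first row for each alum_id (assumed rule: keep the latest).
placement <- placement[order(placement$alum_id, placement$response_date, decreasing = TRUE), ]
placement <- placement[!duplicated(placement$alum_id), ]
placement <- placement[order(placement$alum_id), ]
rownames(placement) <- NULL

# 2. Harmonize state names / codes with a lookup table (extend as new variants appear).
#    Values not found in the lookup are left unchanged so they can be reviewed.
state_lookup <- c("california" = "CA", "ca" = "CA", "oregon" = "OR", "or" = "OR")
key <- tolower(trimws(placement$state))
placement$state <- ifelse(key %in% names(state_lookup), state_lookup[key], placement$state)

# 3. Fix incorrectly spelled employer names with an explicit list of corrections.
employer_fixes <- c("Exmaple Energy" = "Example Energy")
needs_fix <- placement$employer %in% names(employer_fixes)
placement$employer[needs_fix] <- unname(employer_fixes[placement$employer[needs_fix]])

print(placement)
```

Keeping the corrections in explicit lookup tables makes it easy to see what was changed and to add new entries when next year's data introduce new variants.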

This requires careful data exploration and cleaning pipeline modifications. It's important to take your time and explore all variables for inconsistencies. Follow the generalized steps below:

1. Locate the two career placement cleaning scripts inside the shiny-dashboard/data-cleaning/ directory:
meds-placement-cleaning.qmd (for cleaning meds_placement_YYYY_YYYY.rds)
mesm-placement-cleaning.qmd (for cleaning mesm_placement_YYYY_YYYY.rds)
2. Choose one to begin with (I recommend MEDS, since there's less data and generally fewer cleaning pipeline modifications that need to be made)
3. In section 0, update the file name within readRDS() to match the new pre-processed data file name (this should only mean changing the year range suffix of the file name).
4. Clean active placement (status) data by running the code in section 1
5. Inspect the resulting meds_status_cleaned data frame. Newly added data almost always necessitate additional data cleaning -- update the cleaning pipeline in section 1 accordingly, then rerun.
6. Clean initial placement data by running the code in section 2
7. Inspect the resulting meds_placement_cleaned data frame. Newly added data almost always necessitate additional data cleaning -- update the cleaning pipeline in section 2 accordingly, then rerun.
8. Run section 3 to write both cleaned data frames to file. Both files will be saved to the shiny-dashboard/bren-student-data-explorer/data/ directory (NOTE: you may need to create the data/ folder in the bren-student-data-explorer directory first).
9. Repeat steps 3-8 for the other program / cleaning script (e.g. if you began with meds-placement-cleaning.qmd, it's time to move on to mesm-placement-cleaning.qmd).
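
The read, inspect, and write steps above can be sketched roughly as follows. This is an illustrative outline only, written in base R: the toy data frame, its column names, the temporary file paths, and the object names are hypothetical stand-ins so that the example runs on its own, and the actual scripts' paths and cleaning logic will differ.

```r
# Hypothetical stand-in for a pre-processed file, written to a temporary
# directory so this sketch is self-contained.
raw_dir <- file.path(tempdir(), "raw")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)
raw_path <- file.path(raw_dir, "placement_example.rds")
saveRDS(
  data.frame(
    employer = c("Acme Consulting", "acme consulting ", "Example Energy"),
    country  = c("USA", "United States", "US"),
    stringsAsFactors = FALSE
  ),
  raw_path
)

# Step 3: read the pre-processed data (in the real script, only the file name changes).
placement_raw <- readRDS(raw_path)

# Steps 5 and 7: tabulate every character column to spot inconsistent
# spellings, codes, or stray whitespace.
is_chr <- vapply(placement_raw, is.character, logical(1))
value_counts <- lapply(
  placement_raw[is_chr],
  function(x) sort(table(x, useNA = "ifany"), decreasing = TRUE)
)
print(value_counts)

# Update the cleaning pipeline based on what the tables reveal, then rerun.
# The two fixes below are placeholders for real cleaning logic.
placement_cleaned <- placement_raw
placement_cleaned$country[placement_cleaned$country %in% c("USA", "United States")] <- "US"
placement_cleaned$employer[tolower(trimws(placement_cleaned$employer)) == "acme consulting"] <-
  "Acme Consulting"

# Step 8: create the output folder if it does not exist yet, then write to file.
out_dir <- file.path(tempdir(), "data")
if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
out_path <- file.path(out_dir, "placement_cleaned_example.rds")
saveRDS(placement_cleaned, out_path)
```

Tabulating each column before and after a pipeline change is a quick way to confirm that a fix collapsed the variants it was meant to, without introducing new ones.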

NOTE: Currently, all data cleaning takes place in the shiny-dashboard repository, which also contains the dashboard code. The cleaning scripts can be found in the data-cleaning/ directory. While it might be more organizationally appropriate to store these scripts in the career-data repository, keeping them alongside the app code is more practical. Each year, the cleaning pipeline requires updates to accommodate new data, and these needs often become apparent when running the app with the latest data. Having the cleaning scripts and the app in the same repository makes it easier to test the app, revise the cleaning pipeline code, and regenerate the .rds files that the app calls in global.R.