Clean admissions data - UCSB-MEDS/shiny-dashboard GitHub Wiki

Like career placement data, admissions data must be cleaned before the dashboard can use it. Beyond cleaning, several differently wrangled data frames must be produced and saved to build the various dashboard visuals. Be sure to explore all variables for inconsistencies (though admissions data should change less from year to year than career placement data). Follow the steps below:

1. Locate the application-cleaning.qmd script inside the shiny-dashboard/data-cleaning/ directory
2. In section 0, update the filename being read in and assigned to the apps object to reflect the new pre-processed data file (this should only mean changing the year range suffix of the file name):

apps <- readRDS(here::here("raw-data", "apps_2017_20XX.rds"))

3. We account for student "melt" by filtering out those applicants by their application_id. If available, create new vectors (with the naming scheme, me**20YY) containing these application_ids in section 0.

Important: While application data is typically available after the April 15th SIR deadline, data on student melt will not be finalized until the Fall quarter is underway. If you’re updating admissions data in the Spring, you’ll need to revisit and update the admissions cleaning pipeline once this new data becomes available. For access to spreadsheets containing student names and application IDs, please contact Kimberly Yom ([email protected]). Save these spreadsheets in the Bren Dashboard Google Drive under data/admissions/raw-data/dropped-applications.
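The melt filter described in step 3 can be sketched as follows. This is a minimal illustration rather than the code in application-cleaning.qmd: the toy data frame, its values, and the vector name `melt_ids_example` are all hypothetical stand-ins, and only the `application_id` column name comes from the steps above.

```r
library(dplyr)

# Toy stand-in for the apps data frame (values are made up for illustration)
apps_example <- data.frame(
  application_id = c(101, 102, 103, 104),
  admission_year = c(2024, 2024, 2025, 2025)
)

# Hypothetical vector of application IDs for students who melted
melt_ids_example <- c(102, 104)

# Drop melted applicants by matching on application_id
apps_no_melt <- apps_example |>
  filter(!application_id %in% melt_ids_example)

apps_no_melt
```

Comparing `nrow()` before and after the filter is a quick way to confirm that every ID in the vector matched a row in the data.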

4. During pre-processing, a column called sheet_name_year is added to the apps dataset by extracting the year from the Google Sheet name. While this can serve as a proxy for a prospective student’s admission year (if accepted), it is error-prone (for example, if the sheet name is entered or modified incorrectly). To improve accuracy, we add a new column, admission_year, during the data cleaning stage. This field infers the admission year based on the application submission date, which typically falls between September and March (there are a few exceptions where applicants are able to submit past the standard December 15th due date). In section 1, update the admission_year variable within the mutate(case_when()) statement to include the new admissions cycle. For example:

mutate(admission_year = case_when(
    
    ..., 

    # 2025 apps: 2024-09-01 to 2025-03-31 ----
    year(submission_date) == 2024 & month(submission_date) >= 9 ~ 2025,
    year(submission_date) == 2025 & month(submission_date) <= 3 ~ 2025

  )) |>
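As a cross-check on the per-cycle rules above, the same September-to-March window can be written once as a general rule. The sketch below is an assumption-laden alternative, not the approach used in the cleaning script: `infer_admission_year()` is a hypothetical helper, and dates outside the window are returned as `NA` so they can be reviewed by hand.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: map a submission date to an admission year using the
# September-to-March window described in step 4
infer_admission_year <- function(submission_date) {
  case_when(
    # September through December: admission year is the following calendar year
    month(submission_date) >= 9 ~ year(submission_date) + 1,
    # January through March: admission year is the same calendar year
    month(submission_date) <= 3 ~ year(submission_date),
    # Anything else falls outside the expected window
    TRUE ~ NA_real_
  )
}

example_dates <- as.Date(c("2024-09-01", "2025-03-31", "2025-06-15"))
inferred <- infer_admission_year(example_dates)
inferred
```

Comparing this output against the `admission_year` column produced in section 1 should surface any cycle that was missed when the `case_when()` statement was updated.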

5. Run sections 2-7, which wrangle the data frames used throughout the dashboard. You should not need to make large updates to the wrangling pipelines in these sections, but always inspect the resulting data frames to confirm they look as expected.
6. Run section 8 to write the following data frames to file. All files will be saved to the shiny-dashboard/bren-student-data-explorer/data/ directory:
   - enrolled
   - admissions
   - ug_geoms
   - ipeds
   - diversity_stats
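After running section 8, it can be worth confirming that all five outputs landed in the data directory. The sketch below assumes nothing about the file format: `missing_outputs()` is a hypothetical helper that compares file names with their extensions stripped, and the `.rds` files in the demo are placeholders created in a temporary directory.

```r
# Hypothetical helper: return the expected output names that have no matching
# file in data_dir, ignoring file extensions
missing_outputs <- function(data_dir, expected) {
  found <- tools::file_path_sans_ext(list.files(data_dir))
  setdiff(expected, found)
}

expected_outputs <- c("enrolled", "admissions", "ug_geoms", "ipeds", "diversity_stats")

# Demo in a throwaway directory that contains only two of the five outputs
# (the .rds extension here is an assumption made for the demo)
demo_dir <- tempfile("data-demo-")
dir.create(demo_dir)
saveRDS(data.frame(x = 1), file.path(demo_dir, "enrolled.rds"))
saveRDS(data.frame(x = 1), file.path(demo_dir, "ipeds.rds"))

still_missing <- missing_outputs(demo_dir, expected_outputs)
still_missing
```

In practice, `data_dir` would point at the bren-student-data-explorer/data/ directory named in step 6, and an empty result would indicate that every expected data frame was written.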