Pre‐process raw admissions data - UCSB-MEDS/shiny-dashboard GitHub Wiki

1. Convert raw data to Google Sheets

  • 1. Open the Bren Dashboard shared Drive (Sam Shanny-Csik or Jamie Montgomery can provide access) and navigate to the data/admissions/raw-data/application-exports/ folder.
  • 2. Add the new year of application data -- this often come in the form of a Google Sheet, but if you receive it as an .xlsx or .csv file, convert to a Google Sheet (open the file, then click File > Save as Google Sheets -- and name it application-export-YYYY
  • 3. Move any .xslx files to data/admissions/raw-data/xlsx files (original), for organization's sake

2. Get shareable links

We use the {googlesheets4} package to read raw data directly into process-admissions-data.qmd (see admissions-data repo). This requires a shareable link:

  • 1. From an open Google Sheet, click Share > make sure General access is set to "Restricted" > Copy link. It should look something like this:

3. Pre-process / harmonize data

The admissions-data repository houses two custom functions and one processing file (process-admissions-data.qmd) which calls those functions to read in (read_apps()) and process (process_apps()) application data:

  • 1. Open process-application-data.qmd in your cloned application-data project
  • 2. Add the shared Google Slides link to the link_str vector in section 1. READ IN DATA. Do not delete links from past years. It is also helpful to add a short annotation next to the link with the year. E.g. application-export-2025 data is read in and saved to link_str like this:
#............create vector of all admissions data URLs...........
link_str <- c(
  "https://docs.google.com/spreadsheets/d/1lXqFidrhbOKDqckcVlN5DoHo3PRCflDVs1XBUISrsnI/edit?usp=sharing", # 2025
  "https://docs.google.com/spreadsheets/d/1wZgn0LXfO6FgDZnrFpGoI_UsC4f4LcqiXp0A481g3V0/edit?usp=sharing", # 2024
  "https://docs.google.com/spreadsheets/d/10GUiqb26IMVj1T5ZnWT4YwzigtL-qjXjoiAAa7tTR0o/edit?usp=sharing", # 2023
  ...
)

  • 3. After adding the new Google Sheet link, read in the data by running all code in section 1. If this is your first time using the {googlesheets4} package, you'll need to set up your account authentication (follow the prompts, which will likely be displayed when you run apps_list <- read_apps(link_str = link_str). If it's been a while since you've used {googlesheets4}, you may need to go through a similar process to re-authenticate (again, follow the prompts).
  • 4. Run section 2 to process all application data. Application export data has historically been provided in the same format each year, however it is possible that you'll need to update process_apps() to accommodate differently-formatted data in the future.
  • 5. Run section 3 to write processed data to an .rds file (NOTE: you may need to create a processed-data/ folder in your project's root directory first.) It is helpful to save with a file name that includes the range of years (e.g. apps_2017_2025.rds).