Pre‐process raw admissions data - UCSB-MEDS/shiny-dashboard GitHub Wiki
1. Convert raw data to Google Sheets
- 1. Open the Bren Dashboard shared Drive (Sam Shanny-Csik or Jamie Montgomery can provide access) and navigate to the
data/admissions/raw-data/application-exports/folder. - 2. Add the new year of application data -- this often come in the form of a Google Sheet, but if you receive it as an
.xlsxor.csvfile, convert to a Google Sheet (open the file, then click File > Save as Google Sheets -- and name itapplication-export-YYYY - 3. Move any
.xslxfiles todata/admissions/raw-data/xlsx files (original), for organization's sake
2. Get shareable links
We use the {googlesheets4} package to read raw data directly into process-admissions-data.qmd (see admissions-data repo). This requires a shareable link:
- 1. From an open Google Sheet, click Share > make sure General access is set to "Restricted" > Copy link. It should look something like this:
3. Pre-process / harmonize data
The admissions-data repository houses two custom functions and one processing file (process-admissions-data.qmd) which calls those functions to read in (read_apps()) and process (process_apps()) application data:
- 1. Open
process-application-data.qmdin your clonedapplication-dataproject - 2. Add the shared Google Slides link to the
link_strvector in section1. READ IN DATA. Do not delete links from past years. It is also helpful to add a short annotation next to the link with the year. E.g.application-export-2025data is read in and saved tolink_strlike this:
#............create vector of all admissions data URLs...........
link_str <- c(
"https://docs.google.com/spreadsheets/d/1lXqFidrhbOKDqckcVlN5DoHo3PRCflDVs1XBUISrsnI/edit?usp=sharing", # 2025
"https://docs.google.com/spreadsheets/d/1wZgn0LXfO6FgDZnrFpGoI_UsC4f4LcqiXp0A481g3V0/edit?usp=sharing", # 2024
"https://docs.google.com/spreadsheets/d/10GUiqb26IMVj1T5ZnWT4YwzigtL-qjXjoiAAa7tTR0o/edit?usp=sharing", # 2023
...
)
- 3. After adding the new Google Sheet link, read in the data by running all code in section 1. If this is your first time using the
{googlesheets4}package, you'll need to set up your account authentication (follow the prompts, which will likely be displayed when you runapps_list <- read_apps(link_str = link_str). If it's been a while since you've used{googlesheets4}, you may need to go through a similar process to re-authenticate (again, follow the prompts). - 4. Run section 2 to process all application data. Application export data has historically been provided in the same format each year, however it is possible that you'll need to update
process_apps()to accommodate differently-formatted data in the future. - 5. Run section 3 to write processed data to an
.rdsfile (NOTE: you may need to create aprocessed-data/folder in your project's root directory first.) It is helpful to save with a file name that includes the range of years (e.g.apps_2017_2025.rds).