Pre‐process raw admissions data - UCSB-MEDS/shiny-dashboard GitHub Wiki
1. Convert raw data to Google Sheets
- 1. Open the Bren Dashboard shared Drive (Sam Shanny-Csik or Jamie Montgomery can provide access) and navigate to the
data/admissions/raw-data/application-exports/
folder. - 2. Add the new year of application data -- this often come in the form of a Google Sheet, but if you receive it as an
.xlsx
or.csv
file, convert to a Google Sheet (open the file, then click File > Save as Google Sheets -- and name itapplication-export-YYYY
- 3. Move any
.xslx
files todata/admissions/raw-data/xlsx files (original)
, for organization's sake
2. Get shareable links
We use the {googlesheets4}
package to read raw data directly into process-admissions-data.qmd
(see admissions-data
repo). This requires a shareable link:
- 1. From an open Google Sheet, click Share > make sure General access is set to "Restricted" > Copy link. It should look something like this:
3. Pre-process / harmonize data
The admissions-data
repository houses two custom functions and one processing file (process-admissions-data.qmd
) which calls those functions to read in (read_apps()
) and process (process_apps()
) application data:
- 1. Open
process-application-data.qmd
in your clonedapplication-data
project - 2. Add the shared Google Slides link to the
link_str
vector in section1. READ IN DATA
. Do not delete links from past years. It is also helpful to add a short annotation next to the link with the year. E.g.application-export-2025
data is read in and saved tolink_str
like this:
#............create vector of all admissions data URLs...........
link_str <- c(
"https://docs.google.com/spreadsheets/d/1lXqFidrhbOKDqckcVlN5DoHo3PRCflDVs1XBUISrsnI/edit?usp=sharing", # 2025
"https://docs.google.com/spreadsheets/d/1wZgn0LXfO6FgDZnrFpGoI_UsC4f4LcqiXp0A481g3V0/edit?usp=sharing", # 2024
"https://docs.google.com/spreadsheets/d/10GUiqb26IMVj1T5ZnWT4YwzigtL-qjXjoiAAa7tTR0o/edit?usp=sharing", # 2023
...
)
- 3. After adding the new Google Sheet link, read in the data by running all code in section 1. If this is your first time using the
{googlesheets4}
package, you'll need to set up your account authentication (follow the prompts, which will likely be displayed when you runapps_list <- read_apps(link_str = link_str)
. If it's been a while since you've used{googlesheets4}
, you may need to go through a similar process to re-authenticate (again, follow the prompts). - 4. Run section 2 to process all application data. Application export data has historically been provided in the same format each year, however it is possible that you'll need to update
process_apps()
to accommodate differently-formatted data in the future. - 5. Run section 3 to write processed data to an
.rds
file (NOTE: you may need to create aprocessed-data/
folder in your project's root directory first.) It is helpful to save with a file name that includes the range of years (e.g.apps_2017_2025.rds
).