FUNCTION EXPLANATIONS - MorganLevineLab/PC-Clocks GitHub Wiki

calcPCClocks

This is the function sourced from the R script run_calcPCClocks.R.

This function uses three main packages: dplyr, tidyr, and tibble. It checks whether each one is installed, installs any that are missing, and warns you if there are issues, so you do not need to install these packages manually beforehand.
The function also assumes that the directories of the PC-Clocks repository were installed as directed on the installation page. You may place other files in these directories if desired, but do not remove or move original files around within the directory.
If you do not receive the message "PCClocks Data successfully loaded" in the RStudio console, you may have forgotten to download the appropriate RData files from the Google Drive links on the installation page. Please check that you have done so.
Because the PC Clocks generate principal components from 78,464 CpGs, as described in our paper, any missing CpGs must be imputed. In theory you could use whatever form of imputation you like, but unless a significant number of CpGs are missing across all samples (i.e. columns missing from the dataframe), the choice should not influence the results much. Therefore, by default the function imputes such CpGs using the mean values from GSE40279:

Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B, Bibikova M, Fan JB, Gao Y, Deconde R, Chen M, Rajapakse I, Friend S, Ideker T, Zhang K. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013 Jan 24;49(2):359-367. doi: 10.1016/j.molcel.2012.10.016. Epub 2012 Nov 21. PMID: 23177740; PMCID: PMC3780611.

If instead only some (but not all) samples in your data are missing beta values for a CpG, mean imputation is performed using the samples that do have values.
PLEASE NOTE: if you have multiple tissues and have missing CpGs (and will thus need to run imputation), you should split the tissues into separate dataframes and run each one through the functions separately, so that imputation is performed within each tissue.
To calculate each PC Clock, the function uses the essential PCs used to generate that clock, as stored in the file CalcAllPCClocks.RData.
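As a rough illustration of the two imputation cases described above, here is a minimal base R sketch on toy data. It is not the code inside calcPCClocks. The CpG IDs, the beta values, and the refMeans vector (a stand-in for the GSE40279 means) are all hypothetical.

```r
# Toy data: rows are samples, columns are CpGs (hypothetical IDs and values).
datMeth <- data.frame(
  cg01 = c(0.10, NA, 0.30),
  cg02 = c(0.80, 0.90, 0.70),
  row.names = c("s1", "s2", "s3")
)

# Hypothetical reference means standing in for the GSE40279 means.
refMeans <- c(cg01 = 0.25, cg02 = 0.85, cg03 = 0.50)

# Case 1: CpGs absent from every sample (missing columns) get the reference mean.
missingCpGs <- setdiff(names(refMeans), colnames(datMeth))
for (cpg in missingCpGs) {
  datMeth[[cpg]] <- refMeans[[cpg]]
}

# Case 2: CpGs missing in only some samples get the mean of the observed samples.
for (cpg in colnames(datMeth)) {
  isMissing <- is.na(datMeth[[cpg]])
  if (any(isMissing)) {
    datMeth[[cpg]][isMissing] <- mean(datMeth[[cpg]], na.rm = TRUE)
  }
}

# Put the columns in the reference order.
datMeth <- datMeth[, names(refMeans)]
```

Because case 2 averages over whatever samples are in the dataframe, running it on one tissue at a time keeps the imputed values tissue-specific, which is the reason for the note about splitting tissues.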

If you are interested in the significance of individual PCs in your data, we encourage you to load and project the PCs for the clocks manually. First, load the PC data with the R command:

load(file = paste(path_to_PCClocks_directory,"CalcAllPCClocks.RData", sep = ""))

Next, run the steps from the function that restrict the CpGs to the 78,464 sites and perform mean imputation as necessary. Then:

sweep(as.matrix(datMeth),2,CalcPCHorvath1$center) %*% CalcPCHorvath1$rotation

To look at a clock other than PCHorvath1, substitute its name in the command above.
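To show what the projection command does, here is a small self-contained sketch on simulated data. It assumes only that the clock object has center and rotation fields, as the command above implies. CalcToyClock, the CpG names, and all values are hypothetical stand-ins, not objects from CalcAllPCClocks.RData.

```r
set.seed(1)

# Toy training matrix: 20 samples by 5 CpGs (simulated values).
trainMeth <- matrix(runif(100), nrow = 20,
                    dimnames = list(NULL, paste0("cg", 1:5)))
pca <- prcomp(trainMeth, center = TRUE, scale. = FALSE)

# Hypothetical stand-in with the same two fields the wiki command uses.
CalcToyClock <- list(center = pca$center, rotation = pca$rotation)

# New samples, with columns in the same CpG order as the training data.
datMeth <- matrix(runif(15), nrow = 3,
                  dimnames = list(paste0("s", 1:3), paste0("cg", 1:5)))

# Subtract each CpG's training mean, then multiply by the PC loadings.
pcScores <- sweep(as.matrix(datMeth), 2, CalcToyClock$center) %*%
  CalcToyClock$rotation

dim(pcScores)  # one row per sample, one column per PC
```

The result has one row per sample and one column per PC, so individual PCs can then be tested against phenotypes of interest. The column order of datMeth has to match the order of the center vector, since sweep() subtracts by position.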

This function will return a dataframe with all of the PC Clocks' scores for each sample, as well as the PCGrimAge components, all appended to the original datPheno data frame.

calcPCClocks_Accel

This is the function sourced from the R script run_calcPCClocks_Accel.R.

This function will give you the option to get residuals for all PCGrimAge components or not. The primary motivation for including this option is that individuals may not want to clutter their output with too many acceleration values, which puts them at further risk of multiple testing issues.
This function will only give simple linear age acceleration for each clock. It does not perform correction for demographic factors, batch, sex, race/ethnicity, or cell intrinsic factors. For more particular epigenetic age acceleration measures, users are requested to perform this manually working from the following base functions:

clockColumns = c("PCHorvath1", "PCHorvath2", "PCHannum", "PCPhenoAge", "PCDNAmTL", "PCGrimAge")

for (i in clockColumns){
  DNAmAge[,paste0(i,"Resid")] = resid(lm(DNAmAge[,i][[1]] ~ DNAmAge$Age + DNAmAge$[your-column-name(s)-here]))
}
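As a concrete version of the template above, the following sketch computes covariate-adjusted residuals on simulated data. The Female covariate, the two clock columns, and all values are made up for illustration. The formula is built with reformulate() rather than the $ syntax of the template, which is a stylistic choice and not the function's own code.

```r
set.seed(42)
n <- 50

# Toy stand-in for the calcPCClocks output (all values are simulated).
DNAmAge <- data.frame(
  Age = runif(n, 20, 80),
  Female = rbinom(n, 1, 0.5)  # hypothetical covariate column
)
DNAmAge$PCHorvath1 <- DNAmAge$Age + 2 * DNAmAge$Female + rnorm(n, sd = 3)
DNAmAge$PCPhenoAge <- 0.9 * DNAmAge$Age + rnorm(n, sd = 5)

clockColumns <- c("PCHorvath1", "PCPhenoAge")

for (i in clockColumns) {
  # reformulate() builds a formula such as PCHorvath1 ~ Age + Female.
  fit <- lm(reformulate(c("Age", "Female"), response = i), data = DNAmAge)
  DNAmAge[, paste0(i, "Resid")] <- resid(fit)
}

head(DNAmAge[, c("Age", "PCHorvath1", "PCHorvath1Resid")])
```

If any clock or covariate values are missing, lm() drops those rows and resid() returns a shorter vector than the dataframe has rows. Passing na.action = na.exclude to lm() pads the residuals with NA so the assignment still lines up.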