RUNNING THE TEMPLATE SCRIPT - MorganLevineLab/PC-Clocks GitHub Wiki

The code implemented in the template_get_PCClocks_script.R is reproduced below with further comments on how to proceed.

Setup and Load in data

FIRST, ensure that you know where you installed this GitHub repository.

As an example, when we download the repository to our Macs, the path looks something like:

clocksDir <- "~/Downloads/PC-Clocks/"

SECOND, we need to read in the essential functions for the code. Now that we know where the PC-Clocks directory lives, this looks like:

source(paste(clocksDir, "run_calcPCClocks.R", sep = ""))
source(paste(clocksDir, "run_calcPCClocks_Accel.R", sep = ""))

These scripts define the functions that calculate the PC Clocks and the age acceleration (residual) values, respectively.
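Before sourcing, it can help to confirm that both scripts are actually present where clocksDir points. The following is a minimal sketch using base R only. The helper check_clocks_files is a hypothetical name introduced for this page and is not part of PC-Clocks.

```r
# Hypothetical helper (not part of PC-Clocks): report which of the
# required script files are present in the given directory.
check_clocks_files <- function(clocksDir,
                               files = c("run_calcPCClocks.R",
                                         "run_calcPCClocks_Accel.R")) {
  paths <- file.path(path.expand(clocksDir), files)
  found <- file.exists(paths)
  names(found) <- files
  found
}

clocksDir <- "~/Downloads/PC-Clocks/"  # adjust to your own install location
found <- check_clocks_files(clocksDir)
if (!all(found)) {
  message("Not found in ", clocksDir, ": ",
          paste(names(found)[!found], collapse = ", "))
}
```

If a file is reported as not found, correct clocksDir before calling source().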

THIRD, load 2 important data frames into your R Workspace:

  1. The Methylation Beta Values (we refer to this as datMeth)
  2. The Clinical/Demographic data (we refer to this as datPheno)

PLEASE NOTE: if you have multiple tissues and some CpGs are missing (so that imputation is needed), split the data into one data frame per tissue and run each tissue through the functions separately. This ensures that imputation is performed within each tissue.
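One way to do the per-tissue split is sketched below on toy data. It assumes datPheno has a column recording the tissue of each sample; the column name 'Tissue', the sample IDs, and the CpG names are placeholders for illustration only.

```r
# Toy stand-ins for datMeth and datPheno (4 samples, 3 CpGs).
# 'Tissue' is a hypothetical column name; use whatever your data has.
datMeth <- matrix(
  c(0.10, 0.20, 0.30, 0.40,
    0.50, 0.60, 0.70, 0.80,
    0.15, 0.25, 0.35, 0.45),
  nrow = 4,
  dimnames = list(paste0("S", 1:4), c("cpgA", "cpgB", "cpgC"))
)
datPheno <- data.frame(
  SampleID = paste0("S", 1:4),
  Age      = c(35, 42, 58, 63),
  Female   = c(1, 0, 1, 0),
  Tissue   = c("Blood", "Skin", "Blood", "Skin"),
  stringsAsFactors = FALSE
)

# Row indices per tissue, then subset both objects with the same indices
# so that the row order stays aligned within each tissue.
rowsByTissue <- split(seq_len(nrow(datPheno)), datPheno$Tissue)
datByTissue <- lapply(rowsByTissue, function(idx) {
  list(datMeth  = datMeth[idx, , drop = FALSE],
       datPheno = datPheno[idx, , drop = FALSE])
})
```

Each element of datByTissue (for example datByTissue$Blood) then holds a matched datMeth and datPheno pair that can be passed through the functions on its own.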

  • Please note that the formatting of these data frames is important! The samples must be in the same row order in both data frames (e.g. if Patient 1's methylation is row 1 of datMeth, their clinical or demographic information must be row 1 of datPheno).
  • Also, be advised that you should have row names (sample Identifiers) and column names (CpG identities) for datMeth.
  • Further, in order to properly calculate the PC Clocks and Age Acceleration, you must have a datPheno column named 'Age' and a column named 'Female' (where Females = 1, and Males = 0). Column names are case-sensitive, so the capital letters are required.
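The formatting requirements above can be checked before running anything. The sketch below is one possible check in base R. The function check_pcclock_inputs and the 'SampleID' column name are hypothetical and are not part of PC-Clocks, whose own functions carry their own error messages.

```r
# Hypothetical helper (not part of PC-Clocks): returns a character vector
# describing any formatting problems found; empty if none were found.
check_pcclock_inputs <- function(datMeth, datPheno, idCol) {
  problems <- character(0)

  # datMeth needs sample IDs as row names and CpG names as column names.
  if (is.null(rownames(datMeth)) || is.null(colnames(datMeth))) {
    problems <- c(problems,
                  "datMeth needs row names (sample IDs) and column names (CpGs)")
  }

  # 'Age' and 'Female' must exist with exactly this capitalisation.
  missingCols <- setdiff(c("Age", "Female"), colnames(datPheno))
  if (length(missingCols) > 0) {
    problems <- c(problems,
                  paste0("datPheno is missing column(s): ",
                         paste(missingCols, collapse = ", ")))
  }

  # Female should be coded 1/0 (missing values are ignored by this check).
  if ("Female" %in% colnames(datPheno)) {
    female <- datPheno$Female[!is.na(datPheno$Female)]
    if (!all(female %in% c(0, 1))) {
      problems <- c(problems, "Female should be coded 1 (female) or 0 (male)")
    }
  }

  # Samples must appear in the same order in both objects.
  if (!idCol %in% colnames(datPheno)) {
    problems <- c(problems, paste0("datPheno has no column named '", idCol, "'"))
  } else if (!identical(as.character(datPheno[[idCol]]), rownames(datMeth))) {
    problems <- c(problems, "sample order differs between datMeth and datPheno")
  }

  problems
}

# Toy example: two samples, two CpGs, correctly formatted.
toyMeth <- matrix(0.5, nrow = 2, ncol = 2,
                  dimnames = list(c("S1", "S2"), c("cpgA", "cpgB")))
toyPheno <- data.frame(SampleID = c("S1", "S2"), Age = c(40, 55),
                       Female = c(1, 0), stringsAsFactors = FALSE)
problems <- check_pcclock_inputs(toyMeth, toyPheno, idCol = "SampleID")
# An empty vector means no problems were found.
```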

If you would like to try out the code with some example data rather than your own, you can use the example data we have provided with this distribution.

load(paste(clocksDir,"Example_PCClock_Data_final.RData",sep=""))

For this example, we use data from GSE55763:
Lehne B, Drong AW, Loh M, Zhang W, Scott WR, Tan ST, Afzal U, Scott J, Jarvelin MR, Elliott P, McCarthy MI, Kooner JS, Chambers JC. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 2015 Feb 15;16(1):37. doi: 10.1186/s13059-015-0600-x. Erratum in: Genome Biol. 2016;17:73. PMID: 25853392; PMCID: PMC4365767.

Running the code

This template implements the simplest-to-use version of our code by calling our two functions. For more details about the functions, see their wiki page.

FIRST, get the PC Clock values for each sample.

PCClock_DNAmAge <- calcPCClocks(path_to_PCClocks_directory = clocksDir, datMeth = datMeth, datPheno = datPheno)

This calls the calcPCClocks function. When it runs, it prompts you to type into the console the name of the datPheno column that holds the sample IDs matching the row names of datMeth. This check ensures that the clock values correspond to the right samples and are appended correctly to datPheno to make "PCClock_DNAmAge".
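If you are unsure which column name to type at the prompt, you can look for it beforehand. The sketch below uses toy data to find the datPheno column(s) whose values match the row names of datMeth exactly and in the same order. The column names 'Barcode' and 'Batch' are made up for illustration.

```r
# Toy stand-ins: 3 samples, 2 CpGs; 'Barcode' and 'Batch' are made-up names.
datMeth <- matrix(0.5, nrow = 3, ncol = 2,
                  dimnames = list(c("S1", "S2", "S3"), c("cpgA", "cpgB")))
datPheno <- data.frame(
  Barcode = c("S1", "S2", "S3"),
  Batch   = c("B1", "B1", "B2"),
  Age     = c(30, 45, 60),
  Female  = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# TRUE for every column whose values equal rownames(datMeth), in order.
isIdColumn <- vapply(
  datPheno,
  function(col) identical(as.character(col), rownames(datMeth)),
  logical(1)
)
idCandidates <- names(isIdColumn)[isIdColumn]
print(idCandidates)
```

If idCandidates comes back empty, either the sample IDs are stored somewhere else or the two objects are not in the same row order.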

SECOND, you can add the accelerations of the PC Clocks to your new data frame.

PCClock_DNAmAge <- calcPCClocks_Accel(PCClock_DNAmAge)

This appends the acceleration values to the data frame you made in step 1. These are typically the preferred values to compare with phenotypic and demographic data. In brief, each is calculated by fitting a linear model of a PC Clock on age and then taking each sample's residual from the line of best fit.
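To make the residual idea concrete, here is a small illustration on made-up numbers. It follows the description above rather than the internals of calcPCClocks_Accel, which may differ in detail; the column names PCClockExample and PCClockExampleResid are hypothetical.

```r
# Made-up data: chronological age and one clock estimate per sample.
toy <- data.frame(
  Age            = c(30, 40, 50, 60, 70),
  PCClockExample = c(33, 38, 55, 58, 71)
)

# Regress the clock on age. na.exclude keeps residuals aligned with the
# original rows if some values are missing (there are none in this toy data).
fit <- lm(PCClockExample ~ Age, data = toy, na.action = na.exclude)

# Acceleration = residual: positive means the clock estimate is higher
# than expected for that age, negative means lower.
toy$PCClockExampleResid <- resid(fit)
print(toy)
```

By construction these residuals average to zero and are uncorrelated with age, which is why they are convenient for comparing against other variables.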

Congratulations! You've successfully run the PC Clocks in your data!

If you have issues running the code, check that your files were installed under the proper subdirectories and that your clocksDir path is correct.
We have tried to cover common data-format mistakes with error messages in our functions, but these are inevitably not comprehensive. If you have other issues, please raise an issue or email the corresponding authors at

Albert Higgins-Chen (a.higginschen[at]yale[dot]edu)
or
Morgan Levine (Morgan.levine[at]yale[dot]edu)