Run GWAS of age acceleration - genetics-of-dna-methylation-consortium/godmc_phase2 GitHub Wiki
MODULE STATUS
Developers: Dr Eilis Hannon and Siyi Wang
Scripts status: Ready
Prerequisite scripts: ./00-setup_folders.sh
, ./01-check_data.sh
, ./02-snp_data.sh
, ./03a-methylation_variables.sh
Data upload method: Manual upload to GoogleDrive after completion of scripts in module 11
GWAS of age acceleration
We will perform a number of genome-wide association anlayses of biological age biomarkers where epigenetic clocks are used to derive measures of biological aging. We have selected three epigenetic clocks to use as proxies for biological again: Horvath’s clock, Levine’s clock, and DunedinPACE clock. We have selected one example of each generation of epigenetic clock. Given the vast number of clocks available, this could have been a never ending list so we have made a practical decision to limit the quantity of data we generate.
Generation of the biological aging measures will be performed by the pipeline. Please make sure you've run the ./01-check_data.sh
, ./02-snp_data.sh
and ./03a-methylation_variables.sh
before you run the following script. For each cohort with sample size less than 1000, this script will take 10-30min to finish.
To initiate the analyses, execute the following script:
./10a-gwas_aar.sh -c /path/to/your/config/file
In cohorts where age is a variable phenotype, there will be a total of six estimated "age acceleration" measures (two for each clock). These will be treated as phenotypes in separate GWAS. GCTA will be utilized for fastGWA computations.
The script will systematically process each phenotype, performing the following steps:
- Compute a sparse genetic relationship matrix (GRM) for autosome.
- Execute fastGWA.
- Generate Manhattan plots, QQ-plots based on the GWAS result.
- There will be two sets of GWAS results. One is
${section_10_dir}/${clock_name}.fastGWA
and another one is${section_10_dir}/${clock_name}_PCA.fastGWA
.
Please check the following graphs:
results/10/*SD_manhattan.pdf
- This graph displays SNPs with a P-value less than 0.01. The y-axis(starts from 2) represents the -log10 of the P-value, while the x-axis indicates the position on the chromosome. Please make sure the manhatten plot did contains data points. Here is the example:
results/10/*SD_qqplot.png
- Observed P-values for each SNP, sorted from largest to smallest are plotted against the expected values from a theoretical χ2-distribution. The genomic lambda may vary from 0.86 and 1.07, if the value is outside of this range this may indicate inflation. Please check the lambda score on the top of the each qq-plot. Here is the example:
Hertability of age acceleration
Since the SNP heritability measures the proportion of phenotypic variance explained by all measured SNPs, accurate estimation of SNP heritability can help us better understand the degree to which measured genetic variants influence phenotypes.
To initiate the analyses, execute the following script:
./10b-heritability_aar.sh -c /path/to/your/config/file
For each phenotype, the result will be saved in the .hsq file under the results/10
folder. The script will take < 1 min.
More details about .hsq file can be found in GCTA.
In the .hsp file, there are 10 parameters:
- V(G) is the genetic variance;
- V(e) is the environmental variance;
- Vp is the phenotypic variation;
- V(G)/Vp is the heritability;
- logL is the log likelihood for the full model;
- logL0 is the log likelihood for the residual model;
- LRT is 2[logL - logL0] which is distributed as a mixture of 0;
- df is the degreed of freedom for chi-squared;
- Pval is the p value;
- n is the sample size;
The .hsp file should looks like this: