Run GWAS of MZ twinning - genetics-of-dna-methylation-consortium/godmc_phase2 GitHub Wiki

MODULE STATUS

Developers: Dr Jenny van Dongen, [email protected]

Scripts status: Under development

Prerequisite scripts: 00-setup_folders.sh, 01-check_data.sh, 02-snp_data.sh

Data upload method: Manual upload to GoogleDrive

Run GWAS of a DNA methylation signature of monozygotic twinning

A genome-wide association study (GWAS) will be conducted using a DNA methylation-derived phenotype for monozygotic (MZ) twinning (https://www.nature.com/articles/s41467-021-25583-7). We hypothesize that this DNA methylation score may also be elevated in individuals with a vanished MZ twin or a propensity for MZ twinning, and may therefore aid in detecting genetic variants associated with MZ twinning. Analyses will be adjusted for the first 10 principal components (PCs) derived from the genotype data. Please make sure you have run 01-check_data.sh and 02-snp_data.sh before running the scripts below. The GWAS script may take 5-10 minutes to complete for a cohort with fewer than 3000 samples.

Additional R-package required: glmnet

To install the package, run:

install.packages("glmnet")

Optional phenotype file for MZ Twinning

Only cohorts that include twins should provide a phenotype file. The file must contain two columns: IID and Twinzygosity. For all twins (including single twins from incomplete pairs), the Twinzygosity column should contain "MZ" (monozygotic), "DZ" (dizygotic) or "UZ" (unknown zygosity). All other individuals in a twin cohort's dataset (for example, parents, siblings and unrelated individuals) should have "non-twin" in this column. An example of the phenotype file that twin cohorts should place in their input_data directory is provided below.

IID Twinzygosity
1 DZ
2 DZ
3 MZ
4 MZ
5 non-twin
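Before running the pipeline, it can help to confirm that the phenotype file matches the layout above. The following is a minimal sketch, not part of the GoDMC scripts: the function name check_mzt_pheno is hypothetical, and it assumes the file is whitespace-delimited with the header line "IID Twinzygosity" as in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): validate a twin phenotype file.
# Assumes whitespace-delimited columns and a header line "IID Twinzygosity".
check_mzt_pheno() {
    local pheno_file="$1"
    awk '
        BEGIN { bad = 0 }
        NR == 1 {
            # The first line must be the expected header.
            if ($1 != "IID" || $2 != "Twinzygosity") {
                print "Unexpected header: " $0
                bad = 1
            }
            next
        }
        NF == 0 { next }   # ignore blank lines
        $2 != "MZ" && $2 != "DZ" && $2 != "UZ" && $2 != "non-twin" {
            print "Line " NR ": unexpected Twinzygosity value \"" $2 "\""
            bad = 1
        }
        END {
            if (NR == 0) { print "File is empty"; bad = 1 }
            exit bad
        }
    ' "$pheno_file"
}

# Example usage (the path is the one used in the config example of this page):
# check_mzt_pheno "${home_directory}/input_data/Pheno_EPIMZT.txt" && echo "Phenotype file looks OK"
```

The function prints one message per problem it finds and returns a non-zero exit status if any line fails the check.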

For twin cohorts, please specify the name (including directory) of your phenotype file in the config file:

phenotypes_MZT="${home_directory}/input_data/Pheno_EPIMZT.txt"

Cohorts without twins should not provide a phenotype file and should leave the default setting in the config file unchanged:

phenotypes_MZT="NULL"
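As a quick sanity check of this setting, something like the following could be run after loading your config. This is only a sketch: the function name check_mzt_setting is hypothetical, and it assumes the config file has been sourced into the current shell so that phenotypes_MZT is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): check the phenotypes_MZT setting.
# Accepts either the default "NULL" or a path to a readable file.
check_mzt_setting() {
    local setting="$1"
    if [ "$setting" = "NULL" ]; then
        echo "No twin phenotype file configured (default)."
    elif [ -r "$setting" ]; then
        echo "Using twin phenotype file: $setting"
    else
        echo "phenotypes_MZT points to a missing or unreadable file: $setting" >&2
        return 1
    fi
}

# Example usage, assuming phenotypes_MZT is already set in the current shell:
# check_mzt_setting "${phenotypes_MZT}"
```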

To run the GWAS, run the following script:

./13a-gwas_MZepi.sh

The script will:

  1. Generate MZ-EpiScores (epigenetic signature of MZ twins)
  2. Classify samples as “predicted MZ twin” or “predicted non-twin” and generate descriptives of the MZ twinning epigenetic score (pdf file with boxplots and .RData file with frequencies of the number of samples predicted as MZ twin and non-twin).
  3. Perform fastGWA
  4. Generate Manhattan plots and QQ plots based on the GWAS results.

Please check the following graphs:

  1. MZEpiscore_distribution.pdf - This graph displays the continuous MZ-EpiScores against the known zygosity of twins (for twin cohorts). Two examples are provided below. For cohorts without twins, a boxplot showing the distribution in non-twins will be generated.
  2. results/13/GWASepiMZ_manhattan.pdf - This graph displays SNPs with a P-value less than 0.01. The y-axis (starting from 2) represents the -log10 of the P-value, while the x-axis indicates the position on the chromosome. Please make sure the Manhattan plot contains data points for each chromosome. Here is an example:
  3. results/13/GWASepiMZ_qqplot.png - This plot shows the observed P-values for each SNP, sorted from largest to smallest and plotted against expected values from a theoretical χ2-distribution.

[Figure: GWASepiMZ_qqplot]
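Before opening the graphs, you may want to confirm that the plot files were actually written. The sketch below is a hypothetical helper, not one of the GoDMC scripts. It only checks that each file passed to it exists and is non-empty, and the paths in the usage comment are the ones named in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): report which expected
# output files exist and are non-empty.
check_outputs() {
    local missing=0 f
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "OK       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is missing or empty.
    return "$missing"
}

# Example usage from the pipeline's home directory:
# check_outputs results/13/GWASepiMZ_manhattan.pdf results/13/GWASepiMZ_qqplot.png
```

A file that exists but is empty is reported as MISSING, since an empty plot file usually means the plotting step did not finish.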

Check, compress and encrypt results files for upload

To check that everything ran successfully and compress the output files from 13, please run:

./13b-check_compress_data.sh 

When the script finishes successfully, two files, MZtwinGWAS_${study_name}.tgz.md5sum and MZtwinGWAS_${study_name}.tgz.gpg, will have been generated and are ready for upload to the GoogleDrive. Please follow the steps below to complete the upload:

  1. Download both files to your local machine.
  2. Upload MZtwinGWAS_${study_name}.tgz.md5sum and MZtwinGWAS_${study_name}.tgz.gpg via this link (to be added).

Please note that this step replaces the need to use the check_upload.sh script.

Thank you for your contribution! Please do not hesitate to contact [email protected] if you have any questions.