Run inversion mQTL analysis - genetics-of-dna-methylation-consortium/godmc_phase2 GitHub Wiki

MODULE STATUS

Developer: Carlos Ruiz Arenas

Scripts status:

  • Scripts 8a and 8b (phase 1): ready
  • Scripts 8c and 8d (phase 2): under development.

Prerequisite scripts: scripts 00, 01, 02 and 03.

Data upload method: sftp to UNAV (Universidad de Navarra) server.

Introduction

The goal of this project is to explore the links between chromosomal inversions, DNA methylation patterns and phenotypes. The project will be run in two phases. In the first phase, we will explore the effect of chromosomal inversions on DNA methylation. In the second phase, we will explore whether DNA methylation mediates the effect of chromosomal inversions on phenotypes.

Phase 1

The first phase will have two steps:

  • Chromosomal inversion calling
  • mQTL analysis

Inclusion criteria

All cohorts are welcome to run this analysis. We are interested in both 450K and EPIC arrays.

Status

This analysis has been tested locally.

Requirements

  • SNP data: SNP data transformed in step 02.
  • Methylation data: inverse rank normalized DNA methylation data after removing the effect of covariates. This dataset is generated during step 03.
  • scoreInvHap: Bioconductor package to genotype chromosomal inversions. Installation instructions are provided below.
  • MatrixEQTL: R package to run eQTL analysis.

Chromosomal inversion calling

Chromosomal inversion calling will be performed using the scoreInvHap Bioconductor package and imputed genetic data. scoreInvHap compares the SNP genotypes in the inversion region with reference SNP genotypes for homozygous inverted, homozygous standard and heterozygous individuals. More information about the method can be found in our paper.

Step 8a requires having scoreInvHap installed. scoreInvHap can be installed with:

## Install BiocManager if not already installed.
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager") 

BiocManager::install("scoreInvHap")

This step can be run with:

./08a-genotypeInversion.sh

This script will generate two files in the results/08/ folder:

  • inversionsSummary.txt: contains the frequency of each inversion that was genotyped.
  • badInversions.txt: contains the inversions that failed to be genotyped.

We expect that some chromosomal inversions will fail during genotyping, because they are small and contain few SNPs. These SNPs might not be available for all cohorts, even after imputing genetic data. Nonetheless, we expect to be able to genotype at least 10 chromosomal inversions.

Three chromosomal inversions (inv8_001, inv17_007 and inv16_009) are large and frequent, so we can use them to evaluate the performance of the inversion genotyping. To do so, please check that the frequencies of these inversions in your cohort are around 58%, 24% and 35%, respectively.
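The frequency check described above can be scripted. The following is a minimal sketch, not part of the pipeline: the helper name freq_close and the default tolerance of 0.10 are assumptions made for illustration, and the observed values still have to be read from results/08/inversionsSummary.txt by hand, since the layout of that file is not described here.

```shell
#!/usr/bin/env bash
# freq_close OBSERVED EXPECTED [TOLERANCE]
# Succeeds (exit status 0) when |OBSERVED - EXPECTED| <= TOLERANCE.
# Frequencies are given as proportions between 0 and 1.
# The default tolerance of 0.10 is an assumption for this sketch,
# not a threshold defined by the consortium.
freq_close() {
    awk -v obs="$1" -v want="$2" -v tol="${3:-0.10}" 'BEGIN {
        diff = obs - want
        if (diff < 0) diff = -diff
        exit !(diff <= tol)
    }'
}

# Reference frequencies quoted in the text above.
expected_inv8_001=0.58
expected_inv17_007=0.24
expected_inv16_009=0.35
```

For example, with the observed frequency of inv8_001 stored in a variable named observed, `freq_close "$observed" "$expected_inv8_001" || echo "inv8_001 frequency differs from the expected value"` prints a warning only when the two values are further apart than the tolerance.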

mQTL analysis

We will compute inversion meQTLs using MatrixEQTL, which is highly optimized for computing eQTL associations. In our analysis, chromosomal inversions take the role of the SNPs and CpGs take the role of the genes. We will use the DNA methylation matrix after adjustment for covariates and genetic principal components and after rank inverse-normal transformation. This adjustment should already have been performed in section 03. Run this step with:

./08b-inversionmeQTL.sh

This step generates an Rdata file containing the summary statistics of the inversion meQTL analysis. This file will be used in the meta-analysis.

Check and upload the results

To check that everything ran successfully, please run:

./check_upload.sh 08 check

This should print the message "Section 08 has been successfully completed!". Then upload the results with:

./check_upload.sh 08 upload

This command checks that everything looks correct and then connects to the sftp server. Results from section 08 are uploaded to the UNAV server. You received your upload password during the installation and setup phase. Once you have entered it, the results files from section 08 will be uploaded.

NOTE: This module requires sshpass to upload the results to the UNAV server.
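To confirm that sshpass is available before starting the upload, a small guard like the one below can help. It is only a sketch: the function name require_tool is hypothetical and is not part of check_upload.sh.

```shell
#!/usr/bin/env bash
# require_tool NAME
# Succeeds when NAME is found on the PATH; otherwise prints a message
# to stderr and returns a non-zero status.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Missing required tool: $1" >&2
        return 1
    fi
}

# Usage (commented out so that the sketch has no side effects):
# require_tool sshpass && ./check_upload.sh 08 upload
```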

Some cohorts have reported the following error:

$ ./check_upload.sh 08 upload     
Already up to date.
-----------------------------------------------

Using config located at: ./config

-----------------------------------------------

Checking log files for 08
Version required: 1.0.0
Version used: 1.0.0
Correct script version
08a-genotypeInversion completed successfully.
Version required: 1.0.0
Version used: 1.0.0
Correct script version
08b-inversionmeQTL.sh completed successfully.

Checking results for 08
Bad inversions file present
Inversion frequency file present
inversionmeQTL statistics file present

Section 08 has been successfully completed!

Enter SFTP password: Host key verification failed.

This error can appear the first time you try to connect to the UNAV sftp server. You need to confirm that you trust the connection to our server. To solve this issue, please run the following command:

sftp -P 22 -oBatchMode=no -b - [email protected]:/usr/local/etc2/ftp/godmc/pub/bristol

Then, you will be asked to enter your password. After that, the following message will appear:

The authenticity of host 'ftp.unav.es (159.237.13.49)' can't be established.
ECDSA key fingerprint is SHA256:RvWE+l5n76bikvRrLXp7gGRfaYt9LaYRiTLyw3VM6zA.
ECDSA key fingerprint is MD5:ad:69:88:c6:e2:eb:7b:5a:7d:ec:9f:21:cf:07:71:67.
Are you sure you want to continue connecting (yes/no)?

Answer yes and exit sftp. The command ./check_upload.sh 08 upload should now work.
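Before answering yes, you may want to compare the fingerprint offered by the server with the SHA256 value shown in the message above. The sketch below illustrates one way to do this with standard OpenSSH tools. The helper name fingerprint_matches is hypothetical, and the sketch assumes that your OpenSSH version accepts `ssh-keygen -lf -` (reading keys from standard input) and that the server still presents the ECDSA key quoted above.

```shell
#!/usr/bin/env bash
# SHA256 fingerprint copied from the host authenticity message quoted above.
expected_fingerprint='SHA256:RvWE+l5n76bikvRrLXp7gGRfaYt9LaYRiTLyw3VM6zA'

# fingerprint_matches TEXT FINGERPRINT
# Succeeds when TEXT (for example, the output of `ssh-keygen -lf`)
# contains FINGERPRINT as a literal string.
fingerprint_matches() {
    printf '%s\n' "$1" | grep -qF -- "$2"
}

# Usage (needs network access, so it is commented out here):
# scanned=$(ssh-keyscan -t ecdsa ftp.unav.es 2>/dev/null | ssh-keygen -lf -)
# fingerprint_matches "$scanned" "$expected_fingerprint" \
#     && echo "Fingerprint matches the documented value"
```

If the fingerprints do not match, do not accept the key; contact the module developer instead.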

Note: In this section, you will have to enter two passwords: the UNAV sftp password and the encryption password. The encryption password is the same one you used for the previous steps, while the UNAV sftp password is only used for this analysis.

Phase 2

The second phase will be run after the results of phase 1 have been collected and meta-analyzed. In this phase, we will select the inversion-CpG pairs where the inversion affects DNA methylation. Then, we will explore which phenotypes are associated with chromosomal inversions and whether the effect of chromosomal inversions on the phenotype is mediated by DNA methylation. The second phase will have two steps:

  • IWAS - Inversion Wide Association Study
  • Mediation study

IWAS results will be meta-analyzed to ensure that the selected phenotypes are associated with chromosomal inversions. Then, we will meta-analyze only the results involving phenotypes and CpGs previously associated with chromosomal inversions.

Inclusion criteria

The phenotypes will be selected based on their availability in the cohorts. Importantly, the analysis of each phenotype is independent of the others. Therefore, we will ask cohorts to run the analyses for every phenotype they have available.

Status

The current scripts have been tested locally. However, the scripts will be modified to select only the chromosomal inversions and CpGs present in mQTLs from phase 1. In addition, the inclusion criteria will be updated once we have identified the phenotypes that are widely available in the cohorts.

Requirements

  • Same requirements as for phase 1
  • Phenotypes: A tabular file with phenotypes in columns. The format is the same as for the covariate data.

IWAS - Inversion Wide Association Study

An IWAS (Inversion Wide Association Study) will be run between the selected chromosomal inversions and the selected phenotypes. The association analysis will be run with GCTA using:

./08c-IWAS_phenotype.sh phenotype

Here, phenotype must be the name of a column in the phenotypes file (which is defined in config). This step will generate the folder results/08/${phenotype}_IWAS/ containing the results of the analysis. Depending on the phenotype, additional covariates may be required.
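Because the script expects a column name, it can be useful to check that the name is present in the header of the phenotypes file before launching the analysis. The sketch below assumes a whitespace-delimited file with a header row. The helper name has_column, the variable phenotypes_file and the column names bmi and height are hypothetical, and the phase 2 scripts are still under development, so treat this as an illustration only.

```shell
#!/usr/bin/env bash
# has_column FILE NAME
# Succeeds when NAME is one of the fields on the first line of FILE.
# Assumes a whitespace-delimited file whose first line is a header.
has_column() {
    awk -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) found = 1; exit }
        END     { exit !found }
    ' "$1"
}

# Usage (commented out; phenotypes_file, bmi and height are placeholders):
# for phenotype in bmi height; do
#     has_column "$phenotypes_file" "$phenotype" \
#         && ./08c-IWAS_phenotype.sh "$phenotype"
# done
```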

Mediation study

A mediation analysis between chromosomal inversions, DNA methylation and phenotypes will be run with the mediation R package. The analysis will be run with the script:

./08d-mediation_inversion_CpG_phenotype.sh phenotype

As before, phenotype must be the name of a column in the phenotypes file (which is defined in config). This step will generate the folder results/08/${phenotype}_mediation/ containing the results of the analysis. Depending on the phenotype, additional covariates may be required.