Step 5. Additional analysis with rank‐based inverse normal transformed beta values - ariadnacilleros/Cis-mQTL-mapping-protocol-for-methylome GitHub Wiki
We will also run an additional model; this will use the rank-based inverse normal transformed beta values (RNT). As covariates, we will consider the known confounders from the previous model (sex, genotype PCs and Planet cell type estimations) and the methylation PCs (mPCs) calculated in the residualized methylation data, with the aim of correcting for extra technical variation. We have generated the code to perform this step based on outputs that you have already obtained in the previous steps. We hope that this will ensure the simplicity of this extra analysis and its execution.
Step 5.1. Obtention of the RNT values and the bed file
In this step, you will calculate the RNT values from the beta values that you have obtained in the previous Step 2.2, and get the BED file. The script in the ReadMe (GSet_to_Bed_RNT.R) is based on GSet_to_Bed.R, and the only difference is located in line 40, where the transformation of the values is performed.
Step 5.2. Get residualized mPCs
In this additional model we will consider the already known confounders (sex, genotype PCs and Planet cell-type estimations) and methylation PCs as covariates. To avoid any correlation with the known confounders and therefore, multicollinearity, we will perform the PCA on the residuals from a multiple linear regression. This regression will include the transformed methylation values as the outcome, and the known confounders as the predictors. Once we get the residuals, we will perform a PCA on them, and select all mPCs that accumulatively explain 80% of the variance. ATTENTION: Once we have filtered the mPCs by the variance, we will keep a maximum number of 20 mPCs. All these steps are included in the script get_residualized_mPCs.R, and the inputs needed have been obtained in previous steps. Make sure that all the variables from the multiple regression are detected as numeric, this will allow you to get a matrix of residuals (Nº samples x Nº CpGs).
Step 5.3. Perform a GWAS with the mPCs
With the aim of avoiding potential correlations with the genotype, we will perform a GWAS considering each of the mPCs calculated in Step 5.2 as phenotype, and the final PLINK file set obtained in Step 3 as the genotype. We have prepared a script that will allow you to create a new PLINK fam file per mPC, perform the GWAS with PLINK1.9, and filter the results. In the case that any SNP is associated with a particular mPC (p-value < 1e-7), we will exclude that mPC.
Step 5.4. Get the covariate file
Finally, we will simply include the selected mPCs in a new covariate file. With this aim, we have developed a smooth script (covariates_RNT.R) where we add the mPCs to the previous covariates.txt file obtained in Step 4. The output from this script should be the same table as covariates.txt plus the selected mPCs.