Run GWAS of the imprinted gene nc886 - genetics-of-dna-methylation-consortium/godmc_phase2 GitHub Wiki
Developers: Emma Raitoharju and Sonja Rajic
Scripts status: Ready
Prerequisite scripts: Scripts 00, 01, 02, 03a
Data upload method: sftp to CSC's allas server
Background
As successful genetic imprinting is crucial for embryonic development, genetic variations disturbing this process often lead to embryonic lethality. Thus, GWAS have not provided candidate genes for this process in humans. nc886 is the only known canonical polymorphically imprinted gene in humans, where the major variation in methylation status is not associated with genetics and where changes do not lead to serious disorder. In addition to the two prevalent patterns (both alleles non-methylated vs. maternal allele methylated), a small percentage (1-6%) of individuals are chimeras of these two cell types and present intermediate methylation levels. The loss or gain of methylation in a subset of cells has been suggested to happen during the global de- and re-methylation of the early embryo, with the proportions of the cells remaining stable afterwards.
We perform a GWAS comparing individuals presenting the intermediate methylation pattern of nc886, which could be determined by genetics, with individuals presenting either of the more prevalent methylation patterns. First, individuals are clustered according to nc886 methylation status; the GWAS is then run on the resulting groups.
The two main analyses are the GWAS on
- intermediately methylated vs. non-methylated individuals and
- intermediately methylated vs. imprinted individuals
To further confirm that the major variation is not associated with genetics, a third GWAS will be run on
- non-methylated vs. imprinted individuals
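The three comparisons above can be sketched as case/control codings. This is only an illustration, not the code used by 14-nc886_gwas.sh: the group labels, the `CONTRASTS` table and the `build_phenotype` helper are hypothetical names, and which group is treated as the case in the third comparison is an assumption.

```python
from typing import List, Optional

# Hypothetical group labels; the pipeline's own coding may differ.
NON_METHYLATED = "non_methylated"
INTERMEDIATE = "intermediate"
IMPRINTED = "imprinted"

# (case group, control group) for each of the three GWAS comparisons.
# The case/control direction of the third comparison is an assumption.
CONTRASTS = {
    "intermediate_vs_non_methylated": (INTERMEDIATE, NON_METHYLATED),
    "intermediate_vs_imprinted": (INTERMEDIATE, IMPRINTED),
    "non_methylated_vs_imprinted": (NON_METHYLATED, IMPRINTED),
}


def build_phenotype(groups: List[str], case: str, control: str) -> List[Optional[int]]:
    """Code cases as 1, controls as 0, and everyone else as None (excluded)."""
    return [1 if g == case else 0 if g == control else None for g in groups]


if __name__ == "__main__":
    example = [IMPRINTED, INTERMEDIATE, NON_METHYLATED, IMPRINTED]
    for name, (case, control) in CONTRASTS.items():
        print(name, build_phenotype(example, case, control))
```

Individuals belonging to neither group of a comparison are coded as missing, so each GWAS only sees the two groups it contrasts.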
INCLUSION CRITERIA Both EPIC and 450K data can be included. There are no restrictions on ancestral origins of the population.
EXCLUSION CRITERIA If the data has too few individuals presenting an intermediately methylated nc886 locus, the cohort cannot be included in the GWAS analyses. If there are fewer than 10 individuals with this methylation status, the code will stop automatically with an error and no results files will be created.
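To see in advance whether your cohort meets this criterion, the rule can be sketched as below. This is an illustration under assumptions, not the pipeline's own check: the function name and the `"intermediate"` label are hypothetical, and only the threshold of 10 comes from the text above.

```python
from collections import Counter
from typing import Iterable

MIN_INTERMEDIATE = 10  # threshold stated in the exclusion criteria


def check_intermediate_count(
    groups: Iterable[str],
    intermediate_label: str = "intermediate",  # hypothetical label
    minimum: int = MIN_INTERMEDIATE,
) -> int:
    """Return the number of intermediately methylated individuals.

    Raises ValueError when the count is below the minimum, mirroring the
    documented behaviour of stopping before any results are written.
    """
    n = Counter(groups)[intermediate_label]
    if n < minimum:
        raise ValueError(
            f"Only {n} intermediately methylated individuals; "
            f"at least {minimum} are required."
        )
    return n
```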
RUNNING THE ANALYSIS
NOTE! This is done using the non-adjusted methylation beta values
Run 14-nc886_gwas.sh
After running the code, check the scatter plot and nc886 group frequencies.
The scatter plot should show the majority of individuals clustering at a methylation beta of approx. 0.5, with another cluster close to 0.1 or below. In addition, the prevalence of the methylation status groups should be 1=20-30%, 2=1-6%, 3=70-80% and 4<1%. If the scatter plot or the prevalences differ greatly from this, email them to [email protected]
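The expected prevalences can also be compared against your own group counts with a few lines of code. The sketch below is not part of the pipeline; it assumes the group codes 1 to 4 from the frequency table, the `prevalence_report` helper is a hypothetical name, and for simplicity the bounds are inclusive (so exactly 1% in group 4 is accepted).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Expected prevalence ranges per group code, taken from the text above.
# Bounds are treated as inclusive, which is a simplification for group 4 (<1%).
EXPECTED: Dict[int, Tuple[float, float]] = {
    1: (0.20, 0.30),
    2: (0.01, 0.06),
    3: (0.70, 0.80),
    4: (0.00, 0.01),
}


def prevalence_report(groups: List[int]) -> Dict[int, Tuple[float, bool]]:
    """Map each group code to (observed fraction, within expected range?)."""
    if not groups:
        raise ValueError("No individuals given.")
    counts = Counter(groups)
    total = len(groups)
    report = {}
    for code, (low, high) in EXPECTED.items():
        fraction = counts.get(code, 0) / total
        report[code] = (fraction, low <= fraction <= high)
    return report


if __name__ == "__main__":
    cohort = [1] * 25 + [2] * 3 + [3] * 72
    for code, (fraction, ok) in prevalence_report(cohort).items():
        print(f"group {code}: {fraction:.1%} {'ok' if ok else 'outside expected range'}")
```

A group falling outside its range is not necessarily an error, but it is a reason to look at the scatter plot more closely before uploading.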
Common issues:
- Individuals don't cluster well. This can be due to pre-processing/batch correction.
- The majority of individuals cluster closer to 0.4, not 0.5.
Instructions on how to proceed will be given by [email protected].
SAVING THE RESULTS
The scatter plot, frequency table and GWAS results are encrypted and uploaded to the sftp server as part of the check upload script.
To check that everything ran successfully, please run:
./check_upload.sh 14 check
This should tell you that Section 14 has been successfully completed!
Now please upload the results like this:
./check_upload.sh 14 upload
It will make sure everything looks correct and connect to the sftp server. Results from section 14 will be uploaded to CSC's allas server. You received your upload password during the installation and setup phase. Once you have entered your password, the script will upload the results files from section 14.
NOTE: The server throws the following error message, though it still uploads results normally. If you get the same message, feel free to ignore it. However, please contact us if it is a different error.
<s:exception>Internal Server Error</s:exception> <s:message> The server was unable to complete your request. If this happens again, please send the technical details below to the server administrator. More details can be found in the server log. </s:message>
Note: In this section, you will have to enter two passwords: the CSC's allas sftp password and the encryption password. The encryption password is the same one you used for the previous steps, while the CSC's allas sftp password is only used for this analysis.