Methylation - folkehelseinstituttet/mobagen GitHub Wiki


Introduction

This page describes the methylation data (raw data and QC results) found in p229methylation/data/durable/Datasets on TSD. (See Getting access to TSD.)

WARNING. PLEASE READ

(... and excuse the caps lock, but this is really important.)

The NIPH has an obligation to remove all data (including methylation data) from MoBa participants who have asked for their data to be deleted.

Participant withdrawal

If participants in the MoBa cohort opt to withdraw their data, their idat files will be deleted with little or no warning, and the QC will be rerun. This is done at suitable intervals and logged in the changelog. See the withdrawal page for more details.

Changelog

We maintain a changelog for the methylation data.

Naming conventions and historical comments

Various PACE publications/projects have historically referred to the MoBa methylation sets as MoBa-1, MoBa-2, etc. Since this may later be confused with other, non-methylation data such as GWAS, we have introduced met001, met002, etc. as the naming convention for methylation data. Along the same lines, GWAS (snpArray) data will be called snp001, snp002, etc.

To keep the naming confusion to a minimum, met001 will correspond to MoBa-1 and so on.
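Scripts that have to handle both conventions can translate between them mechanically. The sketch below is hypothetical (the function names are ours, not part of any MoBa tooling) and assumes the rule above holds for every number, i.e. MoBa-n always maps to metnnn with n zero-padded to three digits. Note that only met001 to met003 are documented with legacy names below.

```python
import re

# Hypothetical helpers. Assumes legacy names look like "MoBa-<n>" (with hyphen)
# and that MoBa-<n> always corresponds to met<n zero-padded to 3 digits>.
LEGACY_RE = re.compile(r"MoBa-(\d+)")
MET_RE = re.compile(r"met(\d{3})")


def legacy_to_met(name: str) -> str:
    """Convert a legacy set name such as 'MoBa-1' to 'met001'."""
    match = LEGACY_RE.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a legacy MoBa set name: {name!r}")
    return f"met{int(match.group(1)):03d}"


def met_to_legacy(name: str) -> str:
    """Convert a set name such as 'met003' back to 'MoBa-3'."""
    match = MET_RE.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a metnnn set name: {name!r}")
    return f"MoBa-{int(match.group(1))}"
```

Requiring the hyphen means that plate labels such as MoBa1 (see met003) are rejected rather than silently mapped to a set.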

Documentation valid for all data-sets

Each dataset is found in its own subdirectory under Datasets, named after its batch (metnnn). This section covers information valid for all sets.

You will also find a directory MetCommon with some data common to all sets.

QC

See Methylation QC details for how the QC is performed and where the results are found (all in the QC subdirectory).

adm

These are various administrative files.

  • md5.yymmdd contains MD5 checksums for all files as of the given date. The content changes slightly from date to date as samples are removed when participants withdraw from the cohort.
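To check a copy of the files against one of these listings, a sketch along the following lines may help. It assumes the md5.yymmdd files use the format written by GNU md5sum (hash, whitespace, path, with an optional `*` marking binary mode) and that the paths are relative to the directory you pass in; check a real file before relying on this. The function names are hypothetical. If the files are indeed in md5sum format, `md5sum -c` on them should work as well.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_file(md5_file: Path, base_dir: Path) -> tuple[list[str], list[str]]:
    """Compare files under base_dir with an md5sum-style listing.

    Returns (missing, mismatched) relative paths. Missing files are expected
    when checking an older listing, since withdrawn samples are deleted.
    """
    missing, mismatched = [], []
    for line in md5_file.read_text().splitlines():
        if not line.strip():
            continue
        # Assumed md5sum format: "<hash>  <path>" or "<hash> *<path>" (binary mode).
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")
        target = base_dir / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif md5_of(target) != expected.lower():
            mismatched.append(rel_path)
    return missing, mismatched
```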

raw-data

These data are almost exactly as delivered by the lab that processed the underlying biological material. We may have done some minor renaming to protect/standardize IDs, and flattened the directory structure. We also remove participants who have withdrawn from the cohort.

This makes it possible for you to run your own QC or reuse pipelines made by others, such as [this one](https://github.com/bhklab/illuminaEPICmethylation). Of course, NIPH/MoBa cannot give you any support here.

idat files

The Red/Green idat files are found in the idats directory, together with the corresponding sample sheet, typically called sampleSheet_metnnn.csv.
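A quick way to confirm that every sample has both channels is to group the files by name. This sketch assumes the usual Illumina file naming (`<basename>_Grn.idat` and `<basename>_Red.idat`, optionally gzipped); since IDs may have been renamed, check the actual files first. `pair_idats` is a hypothetical helper, not part of any MoBa tooling.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: "<basename>_Grn.idat" / "<basename>_Red.idat", optionally ".gz".
# Files named differently (e.g. the sample sheet) are simply ignored.
IDAT_RE = re.compile(r"(?P<basename>.+)_(?P<channel>Grn|Red)\.idat(?:\.gz)?")


def pair_idats(idat_dir: Path) -> tuple[dict[str, tuple[Path, Path]], list[str]]:
    """Group idat files by basename.

    Returns ({basename: (green_path, red_path)}, [basenames missing a channel]).
    """
    channels: dict[str, dict[str, Path]] = defaultdict(dict)
    for path in sorted(idat_dir.iterdir()):
        match = IDAT_RE.fullmatch(path.name)
        if match:
            channels[match["basename"]][match["channel"]] = path
    pairs = {
        base: (found["Grn"], found["Red"])
        for base, found in channels.items()
        if "Grn" in found and "Red" in found
    }
    unpaired = sorted(base for base in channels if base not in pairs)
    return pairs, unpaired
```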

We aim for sample sheets that contain the same information as the Illumina sample sheet format (see Appendix A).

We try to include optional but relevant fields such as sample plate/well. Extra information provided by the scanning lab is usually left untouched; however, we cannot give you any support on these fields.

Whenever Sex is included (it normally is for methylation data), M means male and F means female. Unknown sex may be coded either as U or as a missing value.
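When reading a sample sheet, it can be convenient to fold both spellings of "unknown" into one value. A minimal sketch, assuming the sample sheet is a plain CSV with a header row (no Illumina `[Header]`/`[Data]` sections) and a column named exactly `Sex`; the function names and the added `sex_normalized` key are hypothetical.

```python
import csv
from pathlib import Path
from typing import Optional

# Coding described above: M = male, F = female, U or missing = unknown.
SEX_CODES = {"M": "male", "F": "female"}


def normalize_sex(value: Optional[str]) -> Optional[str]:
    """Return 'male', 'female' or None (unknown) for a Sex cell."""
    if value is None:
        return None
    # Anything other than M/F (U, empty, or unexpected codes) is treated as unknown.
    return SEX_CODES.get(value.strip().upper())


def read_sample_sheet(path: Path) -> list[dict]:
    """Read a plain CSV sample sheet and add a 'sex_normalized' key to each row."""
    with path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        row["sex_normalized"] = normalize_sex(row.get("Sex"))
    return rows
```

Mapping unexpected codes to unknown also means that a column such as the one in met003 (always 1) ends up as unknown rather than being misread.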

For newer sets containing both children and parents, you may see three directories:

  • idats (all the individuals)
  • idats_child (children only; files are (hard) links to the files in ../idats)
  • idats_parents (left as an exercise for the reader)

The subset directories also contain their corresponding sample sheets.
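To verify that the files in a subset directory really are the same files as in idats (and not copies), you can compare device and inode numbers. This sketch assumes the subset files keep the same names as in ../idats; `check_subset_links` is a hypothetical helper.

```python
import os
from pathlib import Path


def check_subset_links(subset_dir: Path, full_dir: Path) -> list[str]:
    """List idat files in subset_dir that are not the same file as full_dir/<name>.

    os.path.samefile compares device and inode numbers, so a hard link (or a
    symlink, which it follows) to the file in full_dir counts as the same file,
    whereas an independent copy does not.
    """
    problems = []
    for path in sorted(subset_dir.glob("*.idat*")):
        target = full_dir / path.name
        if not target.exists() or not os.path.samefile(path, target):
            problems.append(path.name)
    return problems
```

Usage would be, for example, `check_subset_links(Path("idats_child"), Path("idats"))`, which should return an empty list for a consistent set.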

Sample-sheets

Sample sheets may be augmented with extra information, such as sex and sampleTypes (describing the tissue the DNA was extracted from). This information comes from the NIPH biobank; sex is 'legal gender' and can therefore change over time. Sample sheets are found in the idats directory described above.

Individual datasets

The number of samples given for each set is the 'original number of samples', because some participants have since withdrawn from MoBa and their sample information has been deleted. Also, we only count samples for which idat files were actually returned (a pair of raw data files per sample).

The QC results may contain fewer samples than the original because some samples are excluded during QC. All such samples should be traceable through the QC results (see above).

Finally, MoBa participants may have asked to have their data deleted. This does not decrease the 'original number of samples', but their idat files will be gone from both the raw data and all intermediate QC results.

met001

  • NIPH reference of the project: PDB291/Biobankretrieval 406
  • Project lead: Wenche Nystad
  • Selection criteria: Asthma
  • Processed by lab:
  • Chip used: Illumina Infinium 450k BeadChip
  • Date scanned: 2011?
  • Original number of samples: 1204 (including controls, failed/low-concentration samples and duplicates)
  • Also referred to in the literature/elsewhere as MoBa-1

The set contains control samples as well as samples with errors. Check the comment field in the sample sheet for details. We might clean up the set later.

Sample info: Singleton children only, umbilical cord blood

Special considerations

Whenever this data-set is used, the project is responsible for inviting Stephanie London ([email protected]) at the National Institute of Environmental Health Sciences (NIEHS) as a project member.

Warning: As of 14.7.22 there are duplicates in this set; these will be cleaned up.

met002

  • NIPH reference of the project: PDB594/Biobankretrieval 491
  • Project lead: Wenche Nystad
  • Selection criteria: Asthma?
  • Processed by lab:
  • Chip used: Illumina Infinium 450k BeadChip
  • Date scanned: 14.1.2013 - 24.2.2013
  • Original number of samples: 864 (including controls, failed/low-concentration samples and probably duplicates)
  • Also referred to in the literature/elsewhere as MoBa-2

The samplesheet contains (for now) a column called Best_run. This might be the result of a previous QC effort. We might clean up the set later.

Sample info: Singleton children only, umbilical cord blood.

Warning:

  • As of 14.7.22 there are duplicates in this set; these will be cleaned up.
  • Discovered 8.5.23: 19 of the samples are from non-MoBa adults (lab controls) and must be removed. The QC assumes cord blood only, which is not the case for these 19 samples.

met003

  • NIPH reference of the project: PDB1152/Biobankretrieval 555/556
  • Project lead: Monica Munthe-Kaas
  • Selection criteria: Cancer
  • Processed by lab: IARC (Lyon, France)
  • Chip used: Illumina Infinium 450k BeadChip
  • Date scanned: Between August and December 2013
  • Original number of samples: 264
  • Also referred to as MoBa-3

Note that we currently do not have sample_plate/well information in the sample sheet. However, a plate variable with the values MoBa1, MoBa2 and MoBa3 indicates that there were three runs. These names should not be confused with the earlier naming of the sets themselves.

Sample info: Children only, umbilical cord blood

WARNING: The Sex column in the sample sheet cannot be trusted (it is always 1).

met004

  • NIPH reference of the project: PDB2374/Biobankretrieval 972
  • Project lead: Siri Håberg
  • Processed by lab: Life and Brain Laboratory, Bonn, Germany
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip V1 manifest B3
  • Date scanned: 2019.6.3 - 2019.7.4
  • Original number of samples: 6046

Sample info: Triads with k1/k2 blood for parents, umbilical cord blood for children

met005

  • NIPH reference of the project: PDB2327/Biobankretrieval 1003
  • Project lead: Per Magnus
  • Processed by lab: Erasmus (Rotterdam, the Netherlands)
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip V1 manifest B5
  • Date scanned: 2020.06.24
  • Original number of samples: 1000

Consists only of mothers

met006

  • NIPH reference of the project: PDB1299/Biobankretrieval 716/717
  • Project title: Epigenetic effects of paracetamol during pregnancy and the risk of neurodevelopmental disorders (ADHD) in childhood
  • Project lead: Robert Lyle
  • Processed by lab: Norwegian Radium Hospital, Department of Medical Genetics
  • Chip used: Illumina Infinium 450k BeadChip
  • Date scanned: 2015
  • Original number of samples: 384
  • Selection criteria: ADHD/Paracetamol

Sample info: Children only, umbilical cord blood

met007

  • NIPH reference of the project: PDB1299/Biobankretrieval 943
  • Project title: Epigenetic effects of paracetamol during pregnancy and the risk of neurodevelopmental disorders (ADHD) in childhood
  • Project lead: Robert Lyle
  • Processed by lab: Norwegian Radium Hospital, Department of Medical Genetics
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip
  • Date scanned: April-May 2019
  • Original number of samples: 261
  • Selection criteria: ADHD/Paracetamol

Sample info: Children only, umbilical cord blood

met008

  • NIPH reference of the project: PDB315/Biobankretrieval 1994
  • Project lead: Pål Njølstad
  • Processed by lab: Erasmus (Rotterdam)
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip manifest B5 (???)
  • Date scanned: 2021.07.13
  • Original number of samples: 3741 (?) to be checked
  • Selection criteria: Obesity/diabetes for children

Sample info: Same as met004 (see above)

met009

  • NIPH reference of the project: PDB1605/Biobankretrieval 958 and 1161
  • Project lead: Nur Duale
  • Processed by lab: Life and Brain Laboratory, Bonn, Germany
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip V1 manifest B3
  • Date scanned: 2021.07.20 - 2021.07.30
  • Original number of samples: 1456
  • Selection criteria: ADHD

Sample info: Triads with k1/k2/m blood for parents, umbilical cord blood for children

met010

  • NIPH reference of the project: PDB2240/Biobankretrieval 2715
  • Project lead: Robert Lyle
  • Processed by lab: Life and Brain Laboratory, Bonn, Germany
  • Chip used: Illumina Infinium Human MethylationEPIC BeadChip V1 manifest B4
  • Date scanned: 2022.06.30
  • Original number of samples: 1018 (children)
  • Selection criteria: Mothers' use of paracetamol, anti-psychotic medication, no opiates/triptans

Sample info: Children only, umbilical cord blood

met011

  • NIPH reference of the project: PDB3245/Biobankretrieval 4069
  • Project lead: Per Magnus
  • Processed by lab: Life and Brain Laboratory, Bonn, Germany
  • Chip used: Illumina Infinium Human Methylation EPIC-8v2-0_A1
  • Date scanned: 2023.07.06
  • Original number of samples: 109 (children) - should have been 110?
  • Selection criteria: Leukemia (55 children who later developed leukemia, 55 who did not)

Sample info: Children only, umbilical cord blood

Note that as of 5.10.2023, only raw data are available, because we have not yet developed a QC for the relatively new EPICv2 chip.