Description of 450k Manifest File

Infinium Human Methylation450k Manifest File

This file contains the entire BeadChip design for the 450k array. Genomic information in the methylation manifest references Genome Build 37 (hg19). The column headers are explained below:

  1. IlmnID: Unique identifier from the Illumina CG database (the probe ID).
  2. Name: Same as IlmnID.
  3. AddressA_ID: Address ID of probe A on the array.
  4. AlleleA_ProbeSeq: Sequence of the probe corresponding to Address A.
  5. AddressB_ID: Address ID of probe B on the array.
  6. AlleleB_ProbeSeq: Sequence of the probe corresponding to Address B.

Only Type I Infinium probes have both Address A and Address B information. Type II Infinium probes have only Address A information. The two probe types are described in the “Probe Address, Sequence, and Color” section.
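The rule above can be sketched in Python as a check on whether a manifest row carries Address B information. This is a minimal illustration, not part of the manifest specification: the row is assumed to be a dict keyed by the column headers (for example, as produced by `csv.DictReader`), and the probe IDs and addresses below are made up.

```python
def infer_design_type(row):
    """Return "I" if the row carries Address B information, otherwise "II".

    `row` is assumed to be a dict keyed by manifest column names.
    """
    address_b = (row.get("AddressB_ID") or "").strip()
    return "I" if address_b else "II"


# Hypothetical rows; the IDs and addresses are placeholders, not real probes.
type_i_row = {"IlmnID": "cg_example_1", "AddressA_ID": "10001", "AddressB_ID": "10002"}
type_ii_row = {"IlmnID": "cg_example_2", "AddressA_ID": "20001", "AddressB_ID": ""}

print(infer_design_type(type_i_row))
print(infer_design_type(type_ii_row))
```

In practice the Infinium_Design_Type column states the type directly; a check like this is mainly useful for validating that the two columns agree.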

  1. Infinium_Design_Type: Infinium I (2 probes/locus) or Infinium II (1 probe/locus).
  2. Next_Base: For Infinium I probes, the nucleotide immediately following the CpG. Blank for Infinium II.
  3. Color_Channel: For Infinium I probes, the color channel of the “Next_Base” signal (red or green). This part is explained further in a section below.
  4. Forward_Sequence: Plus (+) strand (HapMap) sequence (5'-3') flanking the CG.
  5. Genome_Build: Genome Build referenced by the manifest.
  6. CHR: Chromosome containing the CpG (Build 37).
  7. MAPINFO: Chromosomal coordinates of the CpG (Build 37).
  8. SourceSeq: The original, genomic sequence used for probe design before bisulfite conversion.
  9. Chromosome_36: Chromosome containing the CpG (Build 36).
  10. Coordinate_36: Chromosomal coordinates of the CpG (Build 36).
  11. Strand: The Forward (F) or Reverse (R) designation of the Design Strand.

In methylation manifest files, the Forward Strand = the genomic Plus (+) Strand and the Reverse Strand = the genomic Minus (-) Strand. In this context, Forward and Reverse ARE NOT EQUIVALENT to the Forward and Reverse Strand designations originating from dbSNP or as given in Infinium Genotyping manifests.
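As a small illustration of this convention, the Strand column can be translated to genomic strand symbols with a lookup table. The function name is hypothetical, and the sketch assumes the column holds only the values F and R described above.

```python
# Mapping stated above: Forward = genomic plus strand, Reverse = genomic minus strand.
STRAND_TO_GENOMIC = {"F": "+", "R": "-"}


def genomic_strand(manifest_strand):
    """Translate a methylation-manifest Strand value (F or R) to "+" or "-"."""
    key = manifest_strand.strip().upper()
    if key not in STRAND_TO_GENOMIC:
        raise ValueError(f"unexpected Strand value: {manifest_strand!r}")
    return STRAND_TO_GENOMIC[key]


print(genomic_strand("F"))
print(genomic_strand("R"))
```

This mapping applies to methylation manifests only; it should not be reused for dbSNP or genotyping manifest strand designations.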

  1. Probe_SNPs: rsid(s) of SNP(s) located 10–50 bases from the target CpG.

An rsid (rs number) is the accession number used by researchers and databases to refer to a specific SNP. “rs” stands for Reference SNP cluster ID.

  1. Probe_SNPs_10: rsid(s) of SNP(s) located ≤ 10 bases from the target CpG.
  2. Random_Loci: CpG loci chosen randomly by consortium members during the design process are marked “True”.
  3. Methyl27_Loci: CpGs carried over from the HumanMethylation27 array (95% carryover) are marked “True”.
  4. UCSC_RefGene_Name: Target gene name(s), from the UCSC database. Multiple listings of the same gene name indicate splice variants.
  5. UCSC_RefGene_Accession: The UCSC accession number(s) of the target transcript(s). Accession numbers are given in the same order as the target gene transcripts.
  6. UCSC_RefGene_Group: Gene region feature category describing the CpG position, from UCSC. Features listed in the same order as the target gene transcripts.
    • TSS200 = 0–200 bases upstream of the transcriptional start site (TSS).
    • TSS1500 = 200–1500 bases upstream of the TSS.
    • 5'UTR = Within the 5' untranslated region, between the TSS and the ATG start site.
    • Body = Between the ATG and stop codon; irrespective of the presence of introns, exons, TSS, or promoters.
    • 3'UTR = Between the stop codon and poly-A signal.
  7. UCSC_CpG_Islands_Name: Chromosomal coordinates of the CpG Island from UCSC.
  8. Relation_to_UCSC_CpG_Island: The location of the CpG relative to the CpG island.
    • Shore = 0–2 kb from island.
    • Shelf = 2–4 kb from island.
    • N = upstream (5’) of CpG island.
    • S = downstream (3’) of CpG island.
  9. Phantom: Classifications from the FANTOM (Functional Annotation of the Mammalian Genome) consortium as a low- or high-CpG density region associated with FANTOM 4 promoters.
  10. DMR: Differentially methylated regions (experimentally determined).
    • DMR = Differentially Methylated Region.
    • CDMR = Cancer-specific Differentially Methylated Region.
    • RDMR = Reprogramming-specific Differentially Methylated Region.
  11. Enhancer: Predicted enhancer elements (determined by the ENCODE Consortium using informatics) are marked “True”.
  12. HMM_Island: Hidden Markov Model Islands. Chromosomal map coordinates of computationally predicted CpG islands.
  13. Regulatory_Feature_Name: Chromosomal map coordinates of the regulatory feature (determined by the ENCODE Consortium using informatics).
  14. Regulatory_Feature_Group: Description of the regulatory feature referenced in “Regulatory_Feature_Name” as provided by the Methylation Consortium.
    • Gene_Associated
    • Gene_Associated_Cell_type_specific
    • NonGene_Associated
    • Promoter_Associated
    • Promoter_Associated_Cell_type_specific
    • Unclassified
    • Unclassified_Cell_type_specific
  15. DHS: DNase I Hypersensitivity Site (experimentally determined by the ENCODE project).
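Two of the annotation conventions in the list above lend themselves to a short Python sketch: pairing the UCSC_RefGene_* columns entry by entry, and classifying a CpG relative to an island using the shore and shelf distances. Several details are assumptions made for illustration: multiple entries are taken to be separated by semicolons, island names are taken to have the form `chr:start-end`, "upstream" is taken to mean a lower plus-strand coordinate, and the returned labels (`Island`, `N_Shore`, and so on) are illustrative. The gene names, accessions, and coordinates are made up.

```python
import re


def pair_transcripts(names, accessions, groups, sep=";"):
    """Zip the three UCSC_RefGene_* fields into (name, accession, group) tuples.

    Relies on the fields listing their entries in the same order.
    The separator is an assumption of this sketch.
    """
    if not names:
        return []
    n, a, g = names.split(sep), accessions.split(sep), groups.split(sep)
    if not (len(n) == len(a) == len(g)):
        raise ValueError("RefGene fields have different numbers of entries")
    return list(zip(n, a, g))


# Assumed island name format: chromosome, colon, start, hyphen, end.
ISLAND_RE = re.compile(r"^(chr[^:]+):(\d+)-(\d+)$")


def relation_to_island(cpg_pos, island_name):
    """Classify a CpG coordinate relative to an island named 'chr:start-end'."""
    match = ISLAND_RE.match(island_name)
    if match is None:
        raise ValueError(f"unrecognized island name: {island_name!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start <= cpg_pos <= end:
        return "Island"
    if cpg_pos < start:
        side, distance = "N", start - cpg_pos  # upstream (5') of the island
    else:
        side, distance = "S", cpg_pos - end    # downstream (3') of the island
    if distance <= 2000:
        return f"{side}_Shore"                 # 0-2 kb from the island
    if distance <= 4000:
        return f"{side}_Shelf"                 # 2-4 kb from the island
    return ""                                  # more than 4 kb away


print(pair_transcripts("GENEA;GENEA", "NM_EXAMPLE1;NM_EXAMPLE2", "TSS200;Body"))
print(relation_to_island(9000, "chr1:10000-11000"))
```

When working with a real manifest, the Relation_to_UCSC_CpG_Island column already holds this classification; recomputing it is only a consistency check.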

Probe Address, Sequence, and Color

A Type I assay interrogates a single CpG site with two different probes, one for methylated DNA and one for unmethylated DNA. Probe sequence A (AlleleA_ProbeSeq) detects unmethylated DNA and probe sequence B (AlleleB_ProbeSeq) detects methylated DNA. Both probes of a pair are extended by the same base (Next_Base), so both are read in the same color channel, which can be either green or red depending on the locus. The Color_Channel column records which channel to read for each Type I probe pair.

A Type II assay interrogates a single CpG site in both methylated and unmethylated DNA with one probe. Thus, for this probe type, there is a single probe sequence (AlleleA_ProbeSeq) and a single probe address (AddressA_ID). The methylation state of the DNA is identified by the color of the base that extends the probe.

* Green = methylated DNA
* Red = unmethylated DNA

For this reason, there is no color information for Type II probes in the Color_Channel column.
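The channel logic described in this section can be sketched as a function that picks the methylated and unmethylated intensities for one probe. This is an illustration under stated assumptions, not the implementation used by any particular package: intensities are assumed to be available in a dict keyed by (address, channel), the channel labels are assumed to be "Grn" and "Red", and the allele that carries the methylated Type I probe is passed explicitly, with the default of B being an assumption that follows the convention used in minfi. All addresses and intensity values are made up.

```python
def methylation_signals(probe, intensity, methylated_allele="B"):
    """Return (methylated, unmethylated) intensities for one probe.

    probe: dict with Infinium_Design_Type, AddressA_ID, AddressB_ID, Color_Channel.
    intensity: dict mapping (address, channel) -> intensity, channel "Grn" or "Red".
    methylated_allele: which Type I allele carries the methylated probe (assumption).
    """
    design = probe["Infinium_Design_Type"]
    if design == "II":
        # One address, two colors: green = methylated, red = unmethylated.
        addr = probe["AddressA_ID"]
        return intensity[(addr, "Grn")], intensity[(addr, "Red")]
    if design == "I":
        # Two addresses, one color: both are read in the Color_Channel channel.
        channel = probe["Color_Channel"]
        a = intensity[(probe["AddressA_ID"], channel)]
        b = intensity[(probe["AddressB_ID"], channel)]
        return (b, a) if methylated_allele == "B" else (a, b)
    raise ValueError(f"unknown design type: {design!r}")


# Hypothetical intensities keyed by (address, channel).
intensity = {("1", "Grn"): 900, ("1", "Red"): 100, ("2", "Red"): 50, ("3", "Red"): 800}
type_ii = {"Infinium_Design_Type": "II", "AddressA_ID": "1", "AddressB_ID": "", "Color_Channel": ""}
type_i = {"Infinium_Design_Type": "I", "AddressA_ID": "2", "AddressB_ID": "3", "Color_Channel": "Red"}

print(methylation_signals(type_ii, intensity))
print(methylation_signals(type_i, intensity))
```

Keeping the Type I allele assignment as a parameter makes the convention visible at the call site rather than burying it in the function body.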

References

  1. https://sapac.support.illumina.com/bulletins/2016/05/infinium-methylationk-manifest-column-headings.html
  2. Page 19 of minfi.pdf