IDAT Files and Object Types
IDAT Files
The last step of the Infinium® HumanMethylation450 BeadChip assay is imaging: the Illumina scanner uses red and green lasers to excite the fluorophores attached to the probes on each bead of the array, and measures the fluorescence intensity of each bead.
IDAT files store the raw intensities obtained from each colour channel, red and green, for each sample. Every sample therefore has two IDAT files associated with it.
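Before reading anything, it can help to confirm that every array really has both files. The sketch below uses base R only and assumes the usual Illumina naming scheme, in which the two files of a sample share a basename and end in _Grn.idat and _Red.idat. The helper name find_idat_pairs and the barcodes in the demonstration are made up for illustration.

```r
# Hypothetical helper: report which array basenames have both channel files.
find_idat_pairs <- function(dir) {
  files <- list.files(dir, pattern = "\\.idat$", recursive = TRUE, full.names = TRUE)
  # Strip the channel suffix to recover the basename shared by both files.
  grn <- sub("_Grn\\.idat$", "", files[grepl("_Grn\\.idat$", files)])
  red <- sub("_Red\\.idat$", "", files[grepl("_Red\\.idat$", files)])
  list(
    complete   = intersect(grn, red),                         # both channels present
    incomplete = union(setdiff(grn, red), setdiff(red, grn))  # one channel missing
  )
}

# Demonstration on empty placeholder files (made-up barcodes) in a temporary directory.
demo_dir <- file.path(tempdir(), "idat_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
  "1234567890_R01C01_Grn.idat", "1234567890_R01C01_Red.idat",
  "1234567890_R02C01_Grn.idat"  # red file deliberately left out
))))

pairs <- find_idat_pairs(demo_dir)
print(basename(pairs$complete))
print(basename(pairs$incomplete))
```

On real data, point find_idat_pairs at the folder that holds the scanner output; any basename listed under incomplete is missing one of its two channels.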
Reading IDAT Files
RGChannelSet
IDAT files are binary files and cannot be read with a standard text editor. Several open-source R packages can read them; I'm using the most popular one, minfi. Minfi reads IDAT files with the function read.metharray.exp(targets = targets), where targets is a data.frame holding the sample sheet. The sample sheet therefore has to be loaded first, because it specifies which IDAT files are to be read. The function returns an object of class RGChannelSet. The data in the RGChannelSet is organized at the probe level, not the CpG level. This means it contains data for each probe sequence (identified by its probe address, Address A or Address B), not for the actual CpG position in the DNA. It also contains data for the control probes. No object further downstream retains this probe-level information.
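A minimal sketch of this reading step is given here. It assumes the example data package minfiData is installed (it is not part of my own workflow and only stands in for a real data folder) and uses minfi's read.metharray.sheet to load the sample sheet as a data.frame. For real data, baseDir would be the folder containing your sample sheet and IDAT files.

```r
library(minfi)
library(minfiData)  # assumption: example data package, used only for illustration

# Folder that holds the sample sheet (CSV) and the IDAT files.
baseDir <- system.file("extdata", package = "minfiData")

# Load the sample sheet as a data.frame; its Basename column points at the IDAT files.
targets <- read.metharray.sheet(baseDir)

# Read the red and green IDAT files of every sample listed in the sheet.
rgSet <- read.metharray.exp(targets = targets)
rgSet
```

Printing rgSet shows the class, the dimensions (probe addresses by samples) and the array annotation.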
This object contains details of the type of array (450k or EPIC), all the information from the sample sheet, and the raw intensities in two matrices (green and red). Each matrix holds one intensity per probe (row) and per sample (column). You can see the raw intensities with the commands rgSet@assays@data@listData[["Green"]] and rgSet@assays@data@listData[["Red"]].
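The same information can also be pulled out with accessor functions instead of reaching into the slots directly. The sketch below assumes the minfiData package, whose example object RGsetEx stands in for an RGChannelSet read from your own files.

```r
library(minfi)
library(minfiData)  # assumption: provides the example RGChannelSet `RGsetEx`

rgSet <- RGsetEx

annotation(rgSet)          # array type and annotation package
head(colData(rgSet))       # the sample sheet columns, one row per sample

green <- getGreen(rgSet)   # green-channel intensities: probe addresses x samples
red   <- getRed(rgSet)     # red-channel intensities, same layout

dim(green)
green[1:3, 1:2]            # first few probe addresses for the first two samples
```

The row names of both matrices are probe addresses, not CpG identifiers, which reflects the probe-level organization of the RGChannelSet.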
MethylSet
After the IDAT files are read, the next step is to convert the RGChannelSet into a MethylSet object, which organizes the raw intensities at the CpG locus level. This means that instead of being defined for each probe sequence, as in the RGChannelSet, the data is now defined for each CpG position in the DNA (a CpG probe ID that starts with "cg"). However, the data is not yet mapped to the genome. The MethylSet holds the data as two signals: methylated and unmethylated. This object has fewer features than the RGChannelSet because some loci are measured using two probes, which collapse into a single row per locus.
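As a rough illustration of this conversion, the sketch below uses minfi's preprocessRaw, which builds a MethylSet without any normalisation. It assumes the minfiData example object RGsetEx in place of your own RGChannelSet.

```r
library(minfi)
library(minfiData)  # assumption: provides the example RGChannelSet `RGsetEx`

rgSet <- RGsetEx

# Convert probe-level intensities into locus-level methylated/unmethylated signals.
mSet <- preprocessRaw(rgSet)

meth   <- getMeth(mSet)    # methylated signal: loci x samples
unmeth <- getUnmeth(mSet)  # unmethylated signal, same layout

nrow(rgSet)                # number of probe addresses
nrow(mSet)                 # number of loci (fewer rows)
head(rownames(mSet))       # locus identifiers rather than probe addresses
```

Comparing nrow(rgSet) with nrow(mSet) shows the drop in the number of features described above.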
Minfi provides an option to perform the kind of normalisation that Illumina's GenomeStudio software performs on its data. This comprises background correction and control normalisation (normalising values based on the control probes). The call is: preprocessIllumina(rgSet, bg.correct = TRUE, normalize = "controls").
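To get a feel for what this normalisation changes, one option is to build a MethylSet both ways and compare the signals. This is only a sketch: it assumes the minfiData example object RGsetEx, and uses preprocessRaw (no normalisation) as the baseline.

```r
library(minfi)
library(minfiData)  # assumption: provides the example RGChannelSet `RGsetEx`

rgSet <- RGsetEx

# GenomeStudio-style processing: background correction plus control normalisation.
mSetIllumina <- preprocessIllumina(rgSet, bg.correct = TRUE, normalize = "controls")

# Baseline without any normalisation, for comparison.
mSetRaw <- preprocessRaw(rgSet)

# Methylated signal of the first sample, before and after.
summary(as.vector(getMeth(mSetRaw)[, 1]))
summary(as.vector(getMeth(mSetIllumina)[, 1]))
```

Both objects are MethylSets with the same loci and samples, so either one can be passed to the later steps.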
GenomicMethylSet
When you map the MethylSet to a genome using the function mapToGenome, you get a GenomicMethylSet. The genome in this case is hg19, because the Illumina probe annotation uses hg19 coordinates. This is the object used when the SNP readings have to be discarded. In further downstream analyses, only the GenomicMethylSet with the SNPs discarded will be used.
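The mapping and SNP filtering might look like the sketch below. It assumes the minfiData example object RGsetEx and uses minfi's dropLociWithSnps for the filtering. The choice of snps = c("SBE", "CpG") and maf = 0 is an illustrative setting, not necessarily the one used in my analysis.

```r
library(minfi)
library(minfiData)  # assumption: provides the example RGChannelSet `RGsetEx`

mSet <- preprocessRaw(RGsetEx)

# Attach genomic coordinates (hg19) to every locus.
gmSet <- mapToGenome(mSet)
head(granges(gmSet), 3)

# Drop loci with a SNP at the CpG site or at the single-base extension site.
gmSetNoSnps <- dropLociWithSnps(gmSet, snps = c("SBE", "CpG"), maf = 0)

nrow(gmSet)        # loci before filtering
nrow(gmSetNoSnps)  # loci after filtering
```

The difference between the two row counts is the number of loci removed because of SNPs.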