Demultiplexing - nsc-norway/pipeline GitHub Wiki

Illumina demultiplexing

Illumina sequencing is typically done with multiple samples pooled and processed in a single experiment, because a single sample seldom needs the full data output of a run. The samples are identified by adding "index sequences" (also known as barcode sequences) to the DNA fragments to be sequenced.

The demultiplexing process splits the data from each experiment into separate data files for each sample. It also converts the data from BCL format into compressed fastq format (fastq.gz). Fastq is a de facto standard format for DNA and RNA sequence data, used by almost all downstream applications for research and clinical purposes.
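The core idea can be illustrated with a minimal sketch that assigns each read to a sample by comparing its index sequence with the expected index of every sample. This is not how the actual demultiplexing tools are implemented. The function names, the sample names, the index sequences and the "Undetermined" bucket label are all assumptions made for illustration, and the reads are plain tuples rather than BCL data.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=0):
    """Group reads by sample.

    reads: iterable of (index_sequence, read_sequence) tuples (toy input).
    sample_indexes: dict mapping a sample name to its expected index sequence.
    Reads that match no sample, or more than one, go to "Undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        matches = [
            sample
            for sample, expected in sample_indexes.items()
            if len(expected) == len(index_seq)
            and hamming(expected, index_seq) <= max_mismatches
        ]
        key = matches[0] if len(matches) == 1 else "Undetermined"
        by_sample[key].append(read_seq)
    return dict(by_sample)


# Hypothetical samples and reads, for illustration only.
sample_indexes = {"sampleA": "ACGTAC", "sampleB": "TTGCAA"}
reads = [
    ("ACGTAC", "GATTACA"),
    ("TTGCAA", "CCCGGG"),
    ("ACGTAC", "TTTTAA"),
    ("GGGGGG", "ACACAC"),  # index matches no sample
]
result = demultiplex(reads, sample_indexes)
```

Allowing a small number of mismatches (the `max_mismatches` parameter in this sketch) lets reads with a sequencing error in the index still be assigned, provided the match remains unambiguous.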

Demultiplexing applications for Illumina sequencers:

Demultiplexing details

Demultiplexing results:

  • One directory per project
  • One subdirectory per sample in a project
    • Contains fastq.gz files
  • Global demultiplexing statistics
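As a rough sketch of how the layout above could be traversed, the following collects the fastq.gz files for each project and sample. It assumes only the project/sample/file nesting described in the list. The function name, the directory names and the file names are hypothetical and do not reflect the actual NSC naming scheme.

```python
import tempfile
from pathlib import Path


def list_fastq_files(run_dir):
    """Map (project, sample) to the sorted fastq.gz file names found under it."""
    found = {}
    # Assumed layout: <run_dir>/<project>/<sample>/<file>.fastq.gz
    for path in Path(run_dir).glob("*/*/*.fastq.gz"):
        sample_dir = path.parent
        key = (sample_dir.parent.name, sample_dir.name)
        found.setdefault(key, []).append(path.name)
    return {key: sorted(names) for key, names in found.items()}


# Build a tiny made-up directory tree to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp) / "ProjectX" / "sampleA"
    sample_dir.mkdir(parents=True)
    (sample_dir / "sampleA_R1.fastq.gz").touch()
    (sample_dir / "sampleA_R2.fastq.gz").touch()
    listing = list_fastq_files(tmp)
```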

The structure and naming of the fastq files are changed by the NSC scripts to keep the data delivery format consistent even when the underlying tools change.

Although it is not required, samples from different projects are typically kept separate and are not multiplexed together.

The demultiplexing is configured using a "sample sheet", which is a CSV file in a predefined format. The sample sheet may be generated by the LIMS.
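To make the sample sheet idea concrete, here is a minimal sketch that reads the sample table from a sheet and builds a sample-to-index mapping. It assumes the commonly used Illumina layout in which the sample rows follow a `[Data]` section header, with columns such as `Sample_ID`, `index` and `Sample_Project`. The exact format and columns used in this pipeline may differ, and the sheet contents and function name below are made up for illustration.

```python
import csv

# Made-up sample sheet; the real format and columns may differ.
EXAMPLE_SHEET = """\
[Header]
Investigator Name,Example
[Data]
Lane,Sample_ID,index,Sample_Project
1,sampleA,ACGTAC,ProjectX
1,sampleB,TTGCAA,ProjectX
"""


def read_data_section(text):
    """Return the rows after the [Data] header as a list of dicts."""
    lines = text.splitlines()
    # Section headers may carry trailing commas, so match on the prefix only.
    start = next(
        i for i, line in enumerate(lines) if line.strip().startswith("[Data]")
    )
    # The first line after [Data] holds the column names.
    return list(csv.DictReader(lines[start + 1:]))


rows = read_data_section(EXAMPLE_SHEET)
indexes = {row["Sample_ID"]: row["index"] for row in rows}
```

A mapping like `indexes` is the information a demultiplexer needs to decide which reads belong to which sample.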