5. Output - sneuensc/mapache GitHub Wiki

Columns in FASTQ_stats.csv

Each row in this file corresponds to one FASTQ file (that is, one row of the samples file specified in the config file) mapped to one reference genome. For example, if 5 FASTQ files were mapped to two reference genomes, this file should have 5 * 2 rows (+ 1 row for the header).

  • genome: Genome ID to which the reads were mapped.
  • SM: Sample ID.
  • LB: Library ID.
  • ID: ID of the FASTQ file analysed, as specified in the samples file.
  • reads_raw: Number of starting raw reads in the FASTQ file.
  • reads_trim: Number of reads that passed the trimming step.
  • trim_prop: Proportion of reads that passed the trimming step.
  • mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • length_reads_raw: Average length of starting raw reads in the FASTQ file.
  • length_reads_trimmed: Average length of reads that passed the trimming step.
  • length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.

Note that no statistics on duplicated reads are reported in this file, because duplicates are removed or flagged only at the library level.
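The row count and the endogenous_raw definition above can be checked with a short script. This is a minimal sketch, not part of mapache: it assumes the file is comma-separated with a header using the column names listed above, and `check_fastq_stats` is a hypothetical helper name.

```python
import csv


def check_fastq_stats(lines, n_fastq, n_genomes, tol=1e-3):
    """Sanity-check FASTQ_stats.csv content (hypothetical helper).

    `lines` is any iterable of CSV lines (an open file or a list of strings).
    Assumes comma-separated values with the column names documented above.
    """
    rows = list(csv.DictReader(lines))
    # Expect one row per (FASTQ file, genome) pair; DictReader consumes the header.
    if len(rows) != n_fastq * n_genomes:
        return False
    for row in rows:
        reads_raw = float(row["reads_raw"])
        mapped_raw = float(row["mapped_raw"])
        # endogenous_raw is documented as mapped_raw / reads_raw; the tolerance
        # allows for rounding of the values written to the file.
        if abs(float(row["endogenous_raw"]) - mapped_raw / reads_raw) > tol:
            return False
    return True
```

For example, `with open("FASTQ_stats.csv", newline="") as fh: ok = check_fastq_stats(fh, n_fastq=5, n_genomes=2)`, where the path is a placeholder for wherever the file was written in your run.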

Columns in LB_stats.csv

The statistics reported in this table are grouped by library and genome. Thus, if your samples file had 3 libraries mapped to 4 different genomes, this table should have 3 * 4 rows (+ 1 row for the header).

  • genome: Genome ID to which the reads were mapped.
  • SM: Sample ID.
  • LB: Library ID.
  • reads_raw: Number of starting raw reads in all FASTQ files of the library.
  • reads_trim: Number of reads that passed the trimming step.
  • trim_prop: Proportion of reads that passed the trimming step.
  • mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • duplicates: Number of mapped reads that were identified as duplicates.
  • duplicates_prop: Proportion of mapped reads that were identified as duplicates.
  • mapped_unique: Number of mapped reads passing the mapping quality filter, after removing duplicates.
  • length_reads_raw: Average length of starting raw reads in the library.
  • length_reads_trimmed: Average length of reads that passed the trimming step.
  • length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • length_mapped_unique: Average length of mapped reads passing the mapping quality filter, after removing duplicates.
  • endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.
  • endogenous_unique: Endogenous proportion, computed as mapped_unique / reads_raw.
  • Sex: Sex inferred for the individual, if sex inference was requested (otherwise this cell contains a message).
  • read_depth: Average read depth.
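The following sketch illustrates how the library-level counts relate to the FASTQ-level ones and to each other. It rests on assumptions not stated above: that library counts are sums over the library's FASTQ files, that mapped_unique equals mapped_raw minus duplicates, and that trim_prop uses reads_raw as its denominator. `library_stats` is a hypothetical helper, not mapache's own code.

```python
def library_stats(fastq_rows, duplicates):
    """Combine FASTQ-level counts of one library (one genome) into library-level stats.

    fastq_rows: list of dicts with numeric reads_raw, reads_trim and mapped_raw.
    duplicates: number of duplicates found for the whole library.
    """
    # Assumption: library counts are the sums over the library's FASTQ files.
    reads_raw = sum(r["reads_raw"] for r in fastq_rows)
    reads_trim = sum(r["reads_trim"] for r in fastq_rows)
    mapped_raw = sum(r["mapped_raw"] for r in fastq_rows)
    # Assumption: unique reads are the mapped reads minus the duplicates.
    mapped_unique = mapped_raw - duplicates
    return {
        "reads_raw": reads_raw,
        "reads_trim": reads_trim,
        "trim_prop": reads_trim / reads_raw,
        "mapped_raw": mapped_raw,
        "duplicates": duplicates,
        "duplicates_prop": duplicates / mapped_raw,  # proportion of mapped reads
        "mapped_unique": mapped_unique,
        "endogenous_raw": mapped_raw / reads_raw,
        "endogenous_unique": mapped_unique / reads_raw,
    }
```

Because duplicates are only identified per library, they cannot be split back onto individual FASTQ files, which is consistent with their absence from FASTQ_stats.csv.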

Optional (see options below)

If you requested the average depth of coverage for specific chromosomes, the table contains one extra column per chromosome, named depth_ followed by the chromosome name.

  • depth_chromosome_name: Average read depth for chromosome chromosome_name.
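Since these optional columns share the depth_ prefix, they can be picked out of a parsed row by name. A minimal sketch, assuming rows come from `csv.DictReader` (so values are strings) and using the hypothetical helper name `chromosome_depths`:

```python
def chromosome_depths(row):
    """Return {chromosome_name: depth} for the optional depth_ columns of one row.

    `row` is a dict such as those produced by csv.DictReader. The read_depth
    column is not matched because it does not start with the depth_ prefix.
    """
    prefix = "depth_"
    return {
        key[len(prefix):]: float(value)
        for key, value in row.items()
        if key.startswith(prefix)
    }
```

For a row with hypothetical columns depth_X and depth_Y, this returns a dict keyed by "X" and "Y".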

Columns in SM_stats.csv

  • genome: Genome ID to which the reads were mapped.
  • SM: Sample ID.
  • reads_raw: Number of starting raw reads in all FASTQ files of all the libraries of the sample.
  • reads_trim: Number of reads that passed the trimming step.
  • trim_prop: Proportion of reads that passed the trimming step.
  • mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • duplicates: Number of mapped reads that were identified as duplicates.
  • duplicates_prop: Proportion of mapped reads that were identified as duplicates.
  • mapped_unique: Number of mapped reads passing the mapping quality filter, after removing duplicates.
  • length_reads_raw: Average length of starting raw reads in the sample.
  • length_reads_trimmed: Average length of reads that passed the trimming step.
  • length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
  • length_mapped_unique: Average length of mapped reads passing the mapping quality filter, after removing duplicates.
  • endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.
  • endogenous_unique: Endogenous proportion, computed as mapped_unique / reads_raw.
  • Sex: Sex inferred for the individual, if sex inference was requested (otherwise this cell contains a message).
  • read_depth: Average read depth.
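Because duplicates are removed only within libraries, sample-level counts can be related to the library-level rows. The sketch below assumes that sample counts are sums over the sample's libraries and that each average length is a mean over exactly the reads it is paired with, so the pooled average is the count-weighted mean. mapache may compute these values directly from the merged data instead; `sample_stats` is a hypothetical helper for cross-checking the two tables.

```python
def sample_stats(library_rows):
    """Pool library-level rows (one genome, one sample) into sample-level values.

    Each row needs numeric reads_raw, mapped_unique and length_mapped_unique.
    """
    reads_raw = sum(r["reads_raw"] for r in library_rows)
    mapped_unique = sum(r["mapped_unique"] for r in library_rows)
    if mapped_unique == 0:
        raise ValueError("no unique mapped reads to average over")
    # Pooled mean = sum(n_i * mean_i) / sum(n_i), assuming each library mean
    # is taken over its own mapped_unique reads.
    length = sum(
        r["mapped_unique"] * r["length_mapped_unique"] for r in library_rows
    ) / mapped_unique
    return {
        "reads_raw": reads_raw,
        "mapped_unique": mapped_unique,
        "length_mapped_unique": length,
        "endogenous_unique": mapped_unique / reads_raw,
    }
```

The same weighting applies to the other length_ columns, using the matching count column (for example reads_trim for length_reads_trimmed).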

Optional (see options below)

If you requested the average depth of coverage for specific chromosomes, the table contains one extra column per chromosome, named depth_ followed by the chromosome name.