5. Output - sneuensc/mapache GitHub Wiki

Columns in FASTQ_stats.csv

Each row in this file corresponds to one row of the samples file specified in the config file, for each reference genome. For example, if 5 FASTQ files were mapped to two reference genomes, this file should have 5 * 2 rows (+ 1 row for the header).
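The expected row count can be checked with a short script. The sketch below uses only Python's standard `csv` module and assumes the file is comma-separated with a single header line. The genome and sample names in the example are hypothetical. The file's location depends on your output settings and is not given here.

```python
import csv
import io


def has_expected_rows(csv_text: str, n_fastq: int, n_genomes: int) -> bool:
    """Check that a FASTQ-level table has one data row per (FASTQ file, genome) pair."""
    # DictReader consumes the first line as the header, so only data rows are counted.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return len(rows) == n_fastq * n_genomes


# Hypothetical example: 2 FASTQ files mapped to 2 genomes -> 4 data rows + 1 header.
example = """genome,SM,LB,ID
hg19,ind1,lib1,fq1
hg19,ind1,lib1,fq2
mt,ind1,lib1,fq1
mt,ind1,lib1,fq2
"""
print(has_expected_rows(example, n_fastq=2, n_genomes=2))  # True
```

To check a real file, pass the text read from it (e.g. `open(path, newline="").read()`) instead of `example`.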

genome: Genome ID to which the reads were mapped.
SM: Sample ID
LB: Library ID
ID: ID of the analysed FASTQ file, as specified in the samples file.
reads_raw: Number of starting raw reads in the FASTQ file.
reads_trim: Number of reads that passed the trimming step.
trim_prop: Proportion of reads that passed the trimming step.
mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
length_reads_raw: Average length of starting raw reads in the FASTQ file.
length_reads_trimmed: Average length of reads that passed the trimming step.
length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.
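The proportion columns above are ratios of the count columns. The following sketch recomputes them from one row of the table. The formula for `endogenous_raw` is the one documented above. Using `reads_raw` as the denominator of `trim_prop` is an assumption based on its description. The `ratio` helper is hypothetical and returns NaN instead of failing when a count is zero.

```python
import math


def ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def fastq_proportions(row: dict) -> dict:
    """Recompute the proportion columns of one FASTQ_stats.csv row from its counts."""
    reads_raw = float(row["reads_raw"])
    reads_trim = float(row["reads_trim"])
    mapped_raw = float(row["mapped_raw"])
    return {
        # Assumption: trim_prop uses reads_raw as the denominator.
        "trim_prop": ratio(reads_trim, reads_raw),
        # As documented: endogenous_raw = mapped_raw / reads_raw.
        "endogenous_raw": ratio(mapped_raw, reads_raw),
    }


# Hypothetical row, with values as strings as they come out of csv.DictReader.
row = {"reads_raw": "1000", "reads_trim": "800", "mapped_raw": "200"}
print(fastq_proportions(row))  # {'trim_prop': 0.8, 'endogenous_raw': 0.2}
```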

Notice that no statistics for duplicated reads are reported. This is because duplicates are removed or flagged only at the library level.

Columns in LB_stats.csv

The statistics reported in this table are grouped by library. Thus, if your samples file had 3 libraries mapped to 4 different genomes, this table should have 3 * 4 rows (+1 row for the header).
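The grouping described above can be sketched by pooling FASTQ-level rows that share the same genome, sample and library. Summing `reads_raw` follows its description ("all FASTQ files of the library"). Pooling `reads_trim` and `mapped_raw` the same way is an assumption. Average-length columns are left out because they cannot simply be summed, and how mapache combines them is not stated here. The example rows are hypothetical, and the counts are assumed to parse as integers.

```python
from collections import defaultdict

# Count columns that are summed when pooling FASTQ files into a library
# (reads_raw as documented; reads_trim and mapped_raw by assumption).
COUNT_COLUMNS = ("reads_raw", "reads_trim", "mapped_raw")


def pool_by_library(fastq_rows):
    """Sum count columns of FASTQ-level rows sharing the same (genome, SM, LB)."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0))
    for row in fastq_rows:
        key = (row["genome"], row["SM"], row["LB"])
        for col in COUNT_COLUMNS:
            totals[key][col] += int(row[col])
    return dict(totals)


rows = [
    {"genome": "hg19", "SM": "ind1", "LB": "lib1", "reads_raw": "1000", "reads_trim": "900", "mapped_raw": "100"},
    {"genome": "hg19", "SM": "ind1", "LB": "lib1", "reads_raw": "500", "reads_trim": "450", "mapped_raw": "60"},
    {"genome": "hg19", "SM": "ind1", "LB": "lib2", "reads_raw": "2000", "reads_trim": "1500", "mapped_raw": "300"},
]
pooled = pool_by_library(rows)
print(len(pooled))  # 2 libraries x 1 genome -> 2 rows
print(pooled[("hg19", "ind1", "lib1")])  # {'reads_raw': 1500, 'reads_trim': 1350, 'mapped_raw': 160}
```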

genome: Genome ID to which the reads were mapped.
SM: Sample ID
LB: Library ID
reads_raw: Number of starting raw reads in all FASTQ files of the library.
reads_trim: Number of reads that passed the trimming step.
trim_prop: Proportion of reads that passed the trimming step.
mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
duplicates: Number of mapped reads that were identified as duplicates.
duplicates_prop: Proportion of mapped reads that were identified as duplicates.
mapped_unique: Number of mapped reads passing the mapping quality filter, after removing duplicates.
length_reads_raw: Average length of starting raw reads in the library.
length_reads_trimmed: Average length of reads that passed the trimming step.
length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
length_mapped_unique: Average length of mapped reads passing the mapping quality filter, after removing duplicates.
endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.
endogenous_unique: Endogenous proportion, computed as mapped_unique / reads_raw.
Sex: Sex inferred for the individual, if sex inference was requested (otherwise, this cell contains a message).
read_depth: Average read depth.
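At the library level, the duplicate and endogenous columns follow from three counts. In the sketch below, `duplicates_prop` is taken relative to `mapped_raw` ("proportion of mapped reads"), and `endogenous_raw` and `endogenous_unique` use the formulas listed above. Computing `mapped_unique` as `mapped_raw - duplicates` is an assumption drawn from "after removing duplicates". The function names and input values are hypothetical.

```python
def ratio(numerator, denominator):
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def library_metrics(reads_raw: int, mapped_raw: int, duplicates: int) -> dict:
    """Derive the library-level proportions from the three underlying counts."""
    # Assumption: mapped_unique is what remains of mapped_raw once duplicates are removed.
    mapped_unique = mapped_raw - duplicates
    return {
        "duplicates_prop": ratio(duplicates, mapped_raw),
        "mapped_unique": mapped_unique,
        "endogenous_raw": ratio(mapped_raw, reads_raw),
        "endogenous_unique": ratio(mapped_unique, reads_raw),
    }


print(library_metrics(reads_raw=10_000, mapped_raw=2_000, duplicates=500))
# {'duplicates_prop': 0.25, 'mapped_unique': 1500, 'endogenous_raw': 0.2, 'endogenous_unique': 0.15}
```

Because both endogenous proportions share the denominator `reads_raw`, `endogenous_unique` can never exceed `endogenous_raw`.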

Optional (see options below)

If you requested the average depth of coverage for specific chromosomes, the table contains extra columns named depth_ followed by the name of the chromosome.

depth_chromosome_name: Average read depth for chromosome chromosome_name.
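Because these optional columns share the depth_ prefix, they can be found from the table header. The sketch below maps each chromosome name to its column name. The header and the chromosome names X and Y in the example are hypothetical. The header itself could be read with `next(csv.reader(f))` from Python's `csv` module.

```python
def depth_columns(header: list) -> dict:
    """Map chromosome name -> column name for every optional depth_ column."""
    prefix = "depth_"
    # read_depth does not start with "depth_", so only the per-chromosome columns match.
    return {col[len(prefix):]: col for col in header if col.startswith(prefix)}


# Hypothetical header of an LB_stats.csv produced with per-chromosome depths for X and Y.
header = ["genome", "SM", "LB", "read_depth", "depth_X", "depth_Y"]
print(depth_columns(header))  # {'X': 'depth_X', 'Y': 'depth_Y'}
```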

Columns in SM_stats.csv

The statistics reported in this table are grouped by sample.

genome: Genome ID to which the reads were mapped.
SM: Sample ID
reads_raw: Number of starting raw reads in all FASTQ files of all the libraries of the sample.
reads_trim: Number of reads that passed the trimming step.
trim_prop: Proportion of reads that passed the trimming step.
mapped_raw: Number of reads that were mapped, passing the mapping quality threshold (duplicates included).
duplicates: Number of mapped reads that were identified as duplicates.
duplicates_prop: Proportion of mapped reads that were identified as duplicates.
mapped_unique: Number of mapped reads passing the mapping quality filter, after removing duplicates.
length_reads_raw: Average length of starting raw reads in the sample.
length_reads_trimmed: Average length of reads that passed the trimming step.
length_mapped_raw: Average length of reads that were mapped, passing the mapping quality threshold (duplicates included).
length_mapped_unique: Average length of mapped reads passing the mapping quality filter, after removing duplicates.
endogenous_raw: Raw endogenous proportion, computed as mapped_raw / reads_raw.
endogenous_unique: Endogenous proportion, computed as mapped_unique / reads_raw.
Sex: Sex inferred for the individual, if sex inference was requested (otherwise, this cell contains a message).
read_depth: Average read depth.
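Sample-level `reads_raw` counts the reads of all FASTQ files of all libraries of the sample. The sample-level `endogenous_raw` is therefore a ratio of pooled counts, not an average of the library values. This assumes sample-level `mapped_raw` is pooled the same way. The sketch below contrasts the two calculations on hypothetical counts. The unweighted mean is shown only to illustrate how far it can drift when libraries differ in size.

```python
def pooled_proportion(pairs):
    """Proportion from summed counts: sum(numerators) / sum(denominators)."""
    pairs = list(pairs)
    return sum(n for n, _ in pairs) / sum(d for _, d in pairs)


def mean_of_proportions(pairs):
    """Unweighted mean of per-library proportions (shown only for contrast)."""
    pairs = list(pairs)
    return sum(n / d for n, d in pairs) / len(pairs)


# Hypothetical (mapped_raw, reads_raw) pairs for two libraries of one sample.
libraries = [(900, 1_000), (1_000, 100_000)]
print(round(pooled_proportion(libraries), 4))    # 0.0188
print(round(mean_of_proportions(libraries), 4))  # 0.455
```

The small, highly endogenous library dominates the unweighted mean. The pooled ratio reflects where most of the reads actually come from.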

Optional (see options below)

If you requested the average depth of coverage for specific chromosomes, the table contains extra columns named depth_ followed by the name of the chromosome.