Output files - Snitkin-Lab-Umich/QCD GitHub Wiki

QCD output files—where to find your results

This page describes the main output files generated by the QCD workflow: the assembly, the annotation, and the quality control (QC) summary. Knowing where these files are and what they contain is essential for downstream analysis.


1. Assembly files (spades)

Location:

results/{prefix}/spades/{sample}/{sample}_contigs_l1000.fasta

Description:

  • This file contains the assembled contigs of at least 1 kb for each sample.
  • The _l1000.fasta suffix indicates that only contigs of at least 1000 base pairs are included.
  • Why use this file?
    Short contigs (<1000 bp) are often low quality or uninformative. Filtering for contigs ≥1000 bp improves downstream annotation and analysis by focusing on more reliable sequence data.
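As a quick sanity check on the length filter described above, the following sketch reads a contigs FASTA file and reports any contig shorter than 1000 bp. It is a minimal illustration using only the Python standard library, not part of the QCD workflow itself; the function names and the example prefix and sample values are hypothetical.

```python
from pathlib import Path


def contig_lengths(fasta_path):
    """Return {contig_id: length} for every record in a FASTA file."""
    lengths = {}
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first whitespace-delimited token of the header as the ID.
                parts = line[1:].split()
                current = parts[0] if parts else f"contig_{len(lengths) + 1}"
                lengths[current] = 0
            elif current is not None:
                # Sequence lines may be wrapped, so accumulate across lines.
                lengths[current] += len(line)
    return lengths


def short_contigs(fasta_path, min_length=1000):
    """Return the IDs of contigs shorter than min_length bases."""
    return [cid for cid, n in contig_lengths(fasta_path).items() if n < min_length]


if __name__ == "__main__":
    # Hypothetical prefix and sample names; replace with your own.
    prefix, sample = "my_run", "sample_A"
    fasta = Path("results") / prefix / "spades" / sample / f"{sample}_contigs_l1000.fasta"
    if fasta.exists():
        print(f"{len(contig_lengths(fasta))} contigs, short ones: {short_contigs(fasta)}")
```

If the file was filtered as described, short_contigs should return an empty list.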

2. Annotation and genbank files (Prokka)

Location:

results/{prefix}/prokka/{sample}/{sample}.gff

Description:

  • The .gff file contains gene predictions and functional annotations in GFF format.
  • The .gbk file, written to the same folder as {sample}.gbk, contains the annotated assembly in GenBank format.
  • Other Prokka outputs (in the same folder) include:
    • {sample}.faa — Protein sequences
    • {sample}.ffn — Nucleotide sequences of predicted genes
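To get a quick overview of what an annotation contains, the sketch below counts the feature types (CDS, tRNA, and so on) in a GFF file. It assumes the standard nine-column, tab-separated GFF3 layout and stops at the ##FASTA directive, after which Prokka appends the contig sequences. The function name and the example prefix and sample values are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_feature_types(gff_path):
    """Count GFF3 features by type (column 3), ignoring the trailing FASTA section."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                # Everything after this directive is sequence, not annotation.
                break
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical prefix and sample names; replace with your own.
    prefix, sample = "my_run", "sample_A"
    gff = Path("results") / prefix / "prokka" / sample / f"{sample}.gff"
    if gff.exists():
        for feature_type, n in count_feature_types(gff).most_common():
            print(f"{feature_type}\t{n}")
```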

3. Summary report

Location:

results/{prefix}/{prefix}_Report/data/{prefix}_QC_summary.csv

Description:

  • This CSV file summarizes key QC metrics for all samples, including coverage, assembly statistics, annotation results, and pass/fail status.
  • Use this file to quickly assess which samples passed all QC steps and to review detailed metrics for each sample.
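One way to pull the passing samples out of the summary is sketched below with Python's csv module. The column names in the summary are not listed on this page, so the sketch first returns the header and takes the sample and status column names as arguments. The names "Sample" and "QC_Check", the value "PASS", and the prefix used here are placeholders, not confirmed QCD column names; check the header of your own file first.

```python
import csv
from pathlib import Path


def read_qc_summary(csv_path):
    """Return (column_names, rows), with each row as a dict keyed by column name."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def samples_matching(rows, sample_column, status_column, wanted="PASS"):
    """Return sample names whose status column equals `wanted` (case-insensitive)."""
    return [
        row[sample_column]
        for row in rows
        if (row.get(status_column) or "").strip().upper() == wanted.upper()
    ]


if __name__ == "__main__":
    prefix = "my_run"  # hypothetical prefix
    summary = Path("results") / prefix / f"{prefix}_Report" / "data" / f"{prefix}_QC_summary.csv"
    if summary.exists():
        columns, rows = read_qc_summary(summary)
        print("Columns found:", columns)
        # "Sample" and "QC_Check" are assumed names; adjust to the columns printed above.
        if "Sample" in columns and "QC_Check" in columns:
            print(samples_matching(rows, "Sample", "QC_Check"))
```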

4. Additional notes

  • All output files are organized under the results/{prefix}/ directory for easy navigation.
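Because every location on this page follows the same results/{prefix}/ pattern, the paths can be built from a prefix and a sample name. The helper below is a small sketch of that idea; the function name, the dictionary keys, and the example values are hypothetical, and the results directory is assumed to be relative to where the workflow was run.

```python
from pathlib import Path


def qcd_output_paths(prefix, sample, results_dir="results"):
    """Build the output paths described on this page for one sample."""
    base = Path(results_dir) / prefix
    prokka_dir = base / "prokka" / sample
    return {
        "contigs": base / "spades" / sample / f"{sample}_contigs_l1000.fasta",
        "gff": prokka_dir / f"{sample}.gff",
        "gbk": prokka_dir / f"{sample}.gbk",
        "qc_summary": base / f"{prefix}_Report" / "data" / f"{prefix}_QC_summary.csv",
    }


if __name__ == "__main__":
    # Hypothetical prefix and sample names; replace with your own.
    for name, path in qcd_output_paths("my_run", "sample_A").items():
        status = "found" if path.exists() else "missing"
        print(f"{name}\t{path}\t{status}")
```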

Best Practices:

  • Always use the _l1000.fasta assembly file for annotation and downstream analysis to avoid including unreliable short contigs.
  • If you are running variant calling, refer to the .gff and .gbk files in the Prokka output directory.
  • Use the QC summary to filter or flag samples for further analysis.