Output files - Snitkin-Lab-Umich/QCD GitHub Wiki
QCD output files—where to find your results
This page describes the main output files generated by the QCD workflow: the assembly, the annotation, and the quality control summary. Understanding where to find these files and what they represent is essential for downstream analysis.
1. Assembly files (SPAdes)
Location:
results/{prefix}/spades/{sample}/{sample}_contigs_l1000.fasta
Description:
- This file contains the assembled contigs of at least 1 kb for each sample.
- The _l1000.fasta suffix indicates that only contigs of at least 1000 base pairs are included.
- Why use this file? Short contigs (<1000 bp) are often low quality or uninformative. Filtering for contigs ≥1000 bp improves downstream annotation and analysis by focusing on more reliable sequence data.
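To make the length cutoff concrete, here is a minimal Python sketch of what a "contigs of at least 1000 bp" filter does. It only illustrates the idea and is not the code QCD uses to produce the _l1000.fasta file; the function names read_fasta and filter_contigs are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=1000):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

Running the same check on an existing _l1000.fasta file (for example with `filter_contigs(read_fasta(open(path)))`) should return every contig in it, which is a quick way to confirm that the file you picked up is the filtered one.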
2. Annotation and GenBank files (Prokka)
Location:
results/{prefix}/prokka/{sample}/{sample}.gff
Description:
- The .gbk file contains the annotated assembly in GenBank format.
- The .gff file contains gene predictions and functional annotations in GFF format.
- Other Prokka outputs (in the same folder) include:
  - {sample}.gbk: GenBank format annotation (described above)
  - {sample}.faa: Protein sequences
  - {sample}.ffn: Nucleotide sequences of predicted genes
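As a quick sanity check on an annotation, you can count how many features of each type the .gff file holds. The sketch below assumes the standard GFF3 layout (nine tab-separated columns, comment lines starting with #, and an optional ##FASTA section at the end of the file); count_feature_types is a hypothetical helper and not part of QCD.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3) from an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Sequence data may follow a ##FASTA directive; stop there.
        if line.startswith("##FASTA"):
            break
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts
```

For example, `count_feature_types(open("results/{prefix}/prokka/{sample}/{sample}.gff"))`, with the placeholders filled in, gives a per-type tally such as the number of CDS features for that sample.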
3. Summary report
Location:
results/{prefix}/{prefix}_Report/data/{prefix}_QC_summary.csv
Description:
- This CSV file summarizes key QC metrics for all samples, including coverage, assembly statistics, annotation results, and pass/fail status.
- Use this file to quickly assess which samples passed all QC steps and to review detailed metrics for each sample.
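A short Python sketch of pulling the passing samples out of the summary is shown below. The column names Sample and QC_status and the value PASS are placeholders, not the real headers: check the header row of your own {prefix}_QC_summary.csv and pass the actual names as arguments.

```python
import csv


def passing_samples(lines, sample_col="Sample", status_col="QC_status",
                    pass_value="PASS"):
    """Return sample names whose status column equals pass_value.

    The default column names are hypothetical; replace them with the
    headers found in your own QC summary file.
    """
    reader = csv.DictReader(lines)
    wanted = pass_value.strip().upper()
    return [
        row[sample_col]
        for row in reader
        if row[status_col].strip().upper() == wanted
    ]
```

With a real file this would be called as `passing_samples(open(path, newline=""), sample_col=..., status_col=...)`, and the resulting list can then drive which assemblies and annotations you carry forward.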
4. Additional Notes
- All output files are organized under the results/{prefix}/ directory for easy navigation.
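Because every location on this page follows the same pattern, the paths can be built from a prefix and a sample name. The helper below is a sketch that simply mirrors the locations listed above; qcd_output_paths and the results_dir argument are hypothetical and not part of QCD.

```python
from pathlib import PurePosixPath


def qcd_output_paths(prefix, sample, results_dir="results"):
    """Build the output paths described on this page for one sample."""
    root = PurePosixPath(results_dir) / prefix
    prokka = root / "prokka" / sample
    return {
        "assembly": root / "spades" / sample / f"{sample}_contigs_l1000.fasta",
        "gff": prokka / f"{sample}.gff",
        "gbk": prokka / f"{sample}.gbk",
        "faa": prokka / f"{sample}.faa",
        "ffn": prokka / f"{sample}.ffn",
        # The QC summary is per run, not per sample.
        "qc_summary": root / f"{prefix}_Report" / "data" / f"{prefix}_QC_summary.csv",
    }
```

For example, `qcd_output_paths("run1", "S1")["assembly"]` gives results/run1/spades/S1/S1_contigs_l1000.fasta, where run1 and S1 are made-up names.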
Best Practices:
- Always use the _l1000.fasta assembly file for annotation and downstream analysis to avoid including unreliable short contigs.
- If you are running variant calling, refer to the .gff and .gbk files in the Prokka output directory.
- Use the QC summary to filter or flag samples for further analysis.