Home - StephenFordham/BacGenomePipeline GitHub Wiki
Welcome to the BacGenomePipeline wiki!
BacGenomePipeline
Complete Bacterial Genome Assembly and Annotation Pipeline
Program developed by Stephen Fordham
General Description
BacGenomePipeline is a complete, convenient bacterial genome assembly pipeline. Assembled and annotated bacterial genomes can be created with only raw Oxford Nanopore long reads as input! BacGenomePipeline accepts either fastq or gzipped fastq files.
Relax and grab a coffee while BacGenomePipeline does the genomic heavy lifting.
This pipeline filters raw reads to retain the best 500 Mb of reads. The filtering process also places weight on read quality, to ensure that short, high-quality reads are not discarded. This is considered vital to aid the recovery of small plasmids present within bacterial strains.
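The idea behind quality-weighted filtering can be illustrated with a minimal Python sketch. This is not the pipeline's actual filtering code: the `Read` type, the `select_best_reads` function, the scoring formula, and the `quality_weight` parameter are all hypothetical and chosen only for illustration. The sketch scores each read by its relative length plus a weighted relative quality, then keeps the top-scoring reads until a target number of bases (500 Mb by default) is reached.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    name: str
    length: int          # read length in bases
    mean_quality: float  # mean Phred quality of the read


def select_best_reads(
    reads: List[Read],
    target_bases: int = 500_000_000,
    quality_weight: float = 10.0,
) -> List[Read]:
    """Keep the highest-scoring reads until target_bases is reached.

    Hypothetical scoring: relative length plus quality_weight times
    relative quality, so a large weight favours short, accurate reads.
    """
    if not reads:
        return []

    max_len = max(r.length for r in reads) or 1
    max_q = max(r.mean_quality for r in reads) or 1.0

    def score(r: Read) -> float:
        return r.length / max_len + quality_weight * (r.mean_quality / max_q)

    kept: List[Read] = []
    total = 0
    for r in sorted(reads, key=score, reverse=True):
        if total >= target_bases:
            break
        kept.append(r)
        total += r.length
    return kept


if __name__ == "__main__":
    demo = [
        Read("long_lowq", 50_000, 8.0),
        Read("short_highq", 3_000, 20.0),
        Read("mid", 20_000, 12.0),
    ]
    for read in select_best_reads(demo, target_bases=20_000):
        print(read.name)
```

With a high quality weight, the short high-quality read is ranked first and survives the cut. With the weight set to zero, only length matters and the short read would be dropped, which is the situation the pipeline's filtering is described as avoiding.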
Optionally, the user can run NanoStat to assess read quality metrics. The best reads are then assembled using the Flye genome assembler, with settings adjusted to help recovery of plasmids with an imbalanced distribution. Optionally, the assembly is then polished with one round of medaka-consensus polishing. The polished assembly is annotated using staramr and abricate. staramr scans bacterial genome contigs against the ResFinder, PointFinder, and PlasmidFinder databases (used by the ResFinder web service and other web services offered by the Center for Genomic Epidemiology). A summary report of detected antimicrobial resistance and virulence genes is then compiled.
The default settings selected in BacGenomePipeline have been tested against challenging genomes, such as Klebsiella pneumoniae strain ATCC700721/MGH78578. This strain contains two small plasmids (3.4 kb and 4.2 kb), two medium-sized plasmids (88 kb and 107.5 kb), and one large plasmid (175 kb) in addition to the chromosome (5.3 Mb). The pipeline was able to build all of these structures to closure (i.e. assemble each as a circular unitig) using ONT long reads exclusively!
BacGenomePipeline can now be run in four modes: pipeline, pipe_red_mem, assembly, and annotation. These modes give the user more flexibility when using BacGenomePipeline. For example, the user may want to run only an assembly, or may already have a genome assembly in FASTA format and want to annotate it for antimicrobial resistance and virulence genes.
Each mode is selected with its own command-line flag:
- Running the entire pipeline workflow.
--pipeline
- Running the pipeline with reduced memory by setting parameters for genome size and coverage for the initial disjointigs.
--pipe_red_mem
- Running a genome assembly only.
--assembly
- Running the annotation step on a pre-existing genome assembly in FASTA format.
--annotation
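As an illustration of how four mutually exclusive mode flags like these could be handled, here is a minimal Python sketch using `argparse`. Only the four flag names come from the list above. The program name passed to `prog`, the helper functions `build_parser` and `selected_mode`, and the choice to treat each flag as a simple on/off switch are assumptions, and the real tool may parse its arguments differently (for example, it may expect input files or extra parameters alongside each flag).

```python
import argparse
from typing import List

MODES = ("pipeline", "pipe_red_mem", "assembly", "annotation")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: exactly one of the four mode flags must be given.
    parser = argparse.ArgumentParser(prog="BacGenomePipeline")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--pipeline", action="store_true",
                       help="run the entire pipeline workflow")
    group.add_argument("--pipe_red_mem", action="store_true",
                       help="run the pipeline with reduced memory")
    group.add_argument("--assembly", action="store_true",
                       help="run a genome assembly only")
    group.add_argument("--annotation", action="store_true",
                       help="annotate a pre-existing FASTA assembly")
    return parser


def selected_mode(argv: List[str]) -> str:
    """Return the name of the single mode chosen on the command line."""
    args = build_parser().parse_args(argv)
    for mode in MODES:
        if getattr(args, mode):
            return mode
    raise ValueError("no mode selected")  # unreachable: group is required


if __name__ == "__main__":
    print(selected_mode(["--assembly"]))
```

Making the group required and mutually exclusive means that omitting a mode, or passing two at once, is rejected with a usage message before any work starts.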