Tools - NBISweden/workshop-genome_assembly GitHub Wiki
Scripts and Code snippets for tools used in Genome Assembly.
Uppmax-specific information:
- Writing slurm scripts: How to include the code snippets below as slurm scripts.
- Tool installation on Uppmax: How to install tools for yourself on Uppmax.
General Scripting:
- Pure Bash Bible: A collection of pure bash alternatives to external processes.
Basecalling and Format conversion:
- Flappie: ONT Fast5 base-calling and Fastq conversion ( > R9.X ).
- Guppy: ONT Fast5 base-calling and Fastq conversion ( > R9.X ).
- SMRT Tools: PacBio HDF5 format conversion to BAM and Fastq.
- Seqret: General format conversion tool, including Sanger trace signals to Fasta and Fastq
Read QC:
- Fastq Validation: Format validation
- pbvalidate: PacBio data validation
- Data Quantity: Data quantity assessment
- FastQC: General data property assessment, Illumina
- Kmer Analysis Toolkit: Data quantity assessment, Bias evaluation, Illumina
- Kraken: Data contamination assessment
- FastQ Screen: Data contamination assessment
- Mash Screen: Data contamination assessment
Filtering:
- Trimmomatic: Adapter trimming
- Fastp: Adapter Trimming, Quality trimming, Nova/NextSeq poly-G removal, merging, ...
- Subsampling: Data reduction
- Normalisation: Data reduction
- Reference Filtering: Contaminant removal
- Pacasus: Chimeric read correction for long read WGA data
- yacrd: Chimeric read detection for long read data.
Assemblers:
- Spades: Illumina, Hybrid
- MaSuRCA: Illumina, Hybrid
- Abyss: Illumina, Hybrid
- Canu: ONT, PacBio
- Miniasm: ONT, PacBio
- HGAP4: PacBio
- Falcon-integrate + Falcon-Unzip: PacBio diploid aware assembler
- wtdbg2: ONT, PacBio - (also known as Redbean)
- Marvel: ONT, PacBio - Assembler for highly repetitive genomes
- Shasta: ONT, PacBio
- Supernova: 10X Genomics diploid aware assembler
Scaffolding, Gap-Filling, and Assembly reconciliation:
- Links: ONT, PacBio, 10X scaffolding.
- Sealer: ONT, PacBio, 10X gap-filling.
- HapCut2: Hi-C Pipeline.
- Juicer: Hi-C Pipeline.
- Salsa: Reference-free Hi-C scaffolding
- RaGOO: Reference-guided scaffolding
- Metassembler: Assembly reconciliation
- GAM-NGS: Assembly reconciliation
- NucMerge: Assembly reconciliation
- QuickMerge: Assembly reconciliation
Consensus and Polishing:
- Racon: Illumina, ONT, PacBio, 10X.
- Pilon: Illumina, (ONT, PacBio, 10X).
- Arrow: PacBio Sequel
- Quiver: PacBio RSII
- Medaka: ONT base level
- Nanopolish: ONT signal level
- ntEdit: Scalable polishing, Illumina.
- Indel correction pipeline: Fix indels remaining after polishing.
Assembly QC:
- Preseq: Data quantification
- Quast: Assembly sequence metrics
- Kmer Analysis Toolkit: Assembly completeness metrics, Illumina
- FRCBam: Assembly accuracy metrics, Illumina
- NucBreak: Assembly accuracy metrics, Illumina
- TigMint: Assembly accuracy metrics, 10X, (ONT?, PacBio?)
- Busco: Assembly gene space metrics
- Bandage: Assembly graph visualisation and manipulation
- HBAR-DTK: PacBio HGAP assembler graph visualisation (also any Celera-based assembler).
- Blast: Assembly contamination metrics
- Blobtools: Assembly contamination metrics
- Kraken: Assembly contamination metrics
- MashMap: Assembly build comparison