HGAP4 - NBISweden/workshop-genome_assembly GitHub Wiki
HGAP4 / Falcon assembly
Notes:
- HGAP4 is based on the Falcon assembler.
- HGAP4 is available through the SMRT Link package, and SMRT Link must be set up correctly to either use the queuing system or run locally. `pbsmrtpipe` can be forced to run locally using `--local-only`.
- Falcon-Unzip is not compatible with HGAP4 output, but the folder structure can be modified with some work. See https://pb-falcon.readthedocs.io/en/latest/hgap4_adapt.html for details.
- If running locally on a node, modify the workflow options accordingly after running `pbsmrtpipe show-workflow-options`.
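As a rough sketch of the local-run note above, the wrapper below assembles the HGAP4 command used later on this page with `--local-only` added. The values of `PREFIX` and `OUTDIR` and the `DRY_RUN` switch are placeholders introduced here for illustration, and it is assumed that `--local-only` is accepted by the `pipeline-id` subcommand. By default the script only prints the command, so it can be checked before anything is launched.

```shell
set -eu

# Placeholder values (not from the original notes); adjust to your data.
PREFIX="sample"
OUTDIR="results"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = execute

# Build the command as positional parameters so quoting survives intact.
set -- pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.polished_falcon_fat \
    --local-only \
    -e "eid_subread:${PREFIX}.subreadset.xml" \
    -o "${OUTDIR}_hgap4asm" \
    --preset-xml "${PREFIX}.preset.xml"

CMD_LINE="$*"

if [ "$DRY_RUN" = 1 ]; then
    # Show what would be run without starting the pipeline.
    echo "$CMD_LINE"
else
    "$@"
fi
```

Running it with `DRY_RUN=0` executes the command instead of printing it, which requires `pbsmrtpipe` on the `PATH` and the input files in place.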
Assembly with HGAP4
This uses the preconfigured `pbsmrtpipe` pipeline to run HGAP4.
# 1) Convert subreads BAM to a subreadset XML
dataset create ${PREFIX}.subreadset.xml ${PREFIX}.subreads.bam
# 2) Create a preset XML for the pipeline parameters, and print a description of the parameters to the screen.
pbsmrtpipe show-template-details -o ${PREFIX}.preset.xml pbsmrtpipe.pipelines.polished_falcon_fat
# 3) Change important parameters in the ${PREFIX}.preset.xml such as estimated genome size (`HGAP_GenomeLength_str`).
# Add --placeGapConsistently to pbalign parameters (`pbalign.task_options.algorithm_options`).
# Data can be downsampled by changing the `pbcoretools.task_options.downsample_factor`. See screen output for description.
vim ${PREFIX}.preset.xml
# 4) Run HGAP4 through pbsmrtpipe
pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.polished_falcon_fat -e eid_subread:${PREFIX}.subreadset.xml -o ${OUTDIR}_hgap4asm --preset-xml ${PREFIX}.preset.xml
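Step 3 can also be done without an interactive editor. The sketch below is only an illustration: it writes a small mock preset file and changes the genome length with `sed`. The XML layout, the `falcon_ns` prefix on the option id, the starting values, and `GENOME_SIZE` are all assumptions made for the example. Inspect the real `${PREFIX}.preset.xml` before adapting the pattern, since the edit relies on `<value>` sitting on the line directly after the option id.

```shell
set -eu

# Mock stand-in for ${PREFIX}.preset.xml. The real file written by
# show-template-details has many more options and may be laid out differently.
PRESET="$(mktemp)"
cat > "$PRESET" <<'EOF'
<pipeline-template-preset>
  <task-options>
    <option id="falcon_ns.task_options.HGAP_GenomeLength_str">
      <value>5000000</value>
    </option>
    <option id="pbcoretools.task_options.downsample_factor">
      <value>0</value>
    </option>
  </task-options>
</pipeline-template-preset>
EOF

GENOME_SIZE=12000000   # assumed estimated genome size in bp

# Find the option id, step to the next line, and replace the <value> contents.
# A backup of the unedited file is kept as "$PRESET.bak".
sed -i.bak "/HGAP_GenomeLength_str/{n;s|<value>[^<]*</value>|<value>${GENOME_SIZE}</value>|;}" "$PRESET"

# Show the edited option for a quick visual check.
grep -A1 'HGAP_GenomeLength_str' "$PRESET"
```

Only the option whose id matches is touched, so the downsampling factor in the mock file keeps its original value. An XML-aware tool would be more robust than `sed` if the real file's layout differs.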