
HGAP4 / Falcon assembly

Notes:

  • HGAP4 is based on the Falcon assembler.
  • HGAP4 is available through the SMRT Link package. SMRT Link must be set up correctly to either use the queuing system or run locally.
  • pbsmrtpipe can be forced to run locally using --local-only.
  • Falcon-Unzip is not compatible with HGAP4 output, but the folder structure can be modified with some work. See https://pb-falcon.readthedocs.io/en/latest/hgap4_adapt.html for details.
  • If running locally on a node, run pbsmrtpipe show-workflow-options to list the workflow options, then adjust them to match that node.
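
A local run can be wrapped as in the minimal sketch below. The function name `run_hgap4_local` and its two arguments are hypothetical; the pbsmrtpipe arguments are the ones used in the assembly commands on this page, with `--local-only` added as described in the notes above.

```shell
# Hypothetical wrapper: run the HGAP4 pipeline on the current node only.
# $1 = file prefix (expects <prefix>.subreadset.xml and <prefix>.preset.xml)
# $2 = output directory prefix
run_hgap4_local() {
    local prefix="$1" outdir="$2"
    pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.polished_falcon_fat \
        --local-only \
        -e "eid_subread:${prefix}.subreadset.xml" \
        -o "${outdir}_hgap4asm" \
        --preset-xml "${prefix}.preset.xml"
}

# Example call (run pbsmrtpipe show-workflow-options first to review the defaults):
#   run_hgap4_local "${PREFIX}" "${OUTDIR}"
```

The wrapper only forces local execution; settings that depend on the node still need to be reviewed by hand.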

Assembly with HGAP4

This uses the preconfigured pbsmrtpipe pipeline pbsmrtpipe.pipelines.polished_falcon_fat to run HGAP4.

# 1) Create a subreadset XML that points to the subreads BAM
dataset create "${PREFIX}.subreadset.xml" "${PREFIX}.subreads.bam"

# 2) Create a preset XML for the pipeline parameters, and print a description of the parameters to the screen.
pbsmrtpipe show-template-details -o "${PREFIX}.preset.xml" pbsmrtpipe.pipelines.polished_falcon_fat

# 3) Change important parameters in ${PREFIX}.preset.xml, such as the estimated genome size (`HGAP_GenomeLength_str`).
# Add --placeGapConsistently to the pbalign parameters (`pbalign.task_options.algorithm_options`).
# Data can be downsampled by changing `pbcoretools.task_options.downsample_factor`. See the screen output from step 2 for a description of each parameter.
vim "${PREFIX}.preset.xml"

# 4) Run HGAP4 through pbsmrtpipe
pbsmrtpipe pipeline-id pbsmrtpipe.pipelines.polished_falcon_fat -e "eid_subread:${PREFIX}.subreadset.xml" -o "${OUTDIR}_hgap4asm" --preset-xml "${PREFIX}.preset.xml"
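
Step 3 edits the preset by hand in vim. For scripted runs, the same change can be made non-interactively, as in the rough sketch below. The helper `set_preset_value` is hypothetical and not part of SMRT Link. It assumes the preset XML puts each `<value>` on its own line between the `<option id="...">` and `</option>` lines, so check the generated file before relying on it.

```shell
# Hypothetical helper: set the <value> of one option in a preset XML.
# Assumed layout (verify against your own preset file):
#   <option id="some.task_options.name">
#       <value>old</value>
#   </option>
# The new value must not contain '|' or '&', which are special to sed here.
set_preset_value() {
    local file="$1" option_id="$2" new_value="$3"
    # Restrict the substitution to the block of the requested option only.
    sed "/<option id=\"${option_id}\">/,/<\/option>/ s|<value>[^<]*</value>|<value>${new_value}</value>|" \
        "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example (5000000 is an illustrative genome size, not a recommendation):
#   grep -n 'HGAP_GenomeLength_str' "${PREFIX}.preset.xml"   # find the full option id
#   set_preset_value "${PREFIX}.preset.xml" "<full option id>" 5000000
```

Writing to a temporary file and moving it into place avoids depending on `sed -i`, whose syntax differs between GNU and BSD sed.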