HBAR DTK - NBISweden/workshop-genome_assembly GitHub Wiki

HBAR-DTK: HGAP Assembly layout tool

Notes:

  • This tool is used to lay out the assembly graph produced by the HGAP3 pipeline.
  • Dependencies: AMOS 3.1.0
  • Writes a GML file rather than GFA. GML can be opened with yEd.

Install notes:

module load python/2.7.15
python -m virtualenv HGAP3_env --system-site-packages
cd HGAP3_env
git clone https://github.com/PacificBiosciences/HBAR-DTK
source bin/activate
pip install networkx biopython

Patch notes for CA_best_edge_to_GML.py (CA_best_edge_to_GML.patch) to make it work with the current version of AMOS:

34c34
<     os.system("tigStore -g %s -t %s 1 -d fr -u %d > frag_list" % ( gkp_store, tig_store, unitig_id) )
---
>     os.system("tigStore -g %s -t %s 1 -d frags -u %d > frag_list" % ( gkp_store, tig_store, unitig_id) )
36c36
<     args = shlex.split( "tigStore -g %s -t %s 1 -d fr -u %d" % ( gkp_store, tig_store, unitig_id) )
---
>     args = shlex.split( "tigStore -g %s -t %s 1 -d frags -u %d" % ( gkp_store, tig_store, unitig_id) )
51c51
<         id1, lib_id, best5, o1, best3, o3 = l
---
>         id1, lib_id, best5, o1, best3, o3, e_rate5, e_rate3 = l

To apply the patch, run: patch originalfile -i patchfile.patch -o updatedfile. This reads originalfile, applies the changes in patchfile.patch, and writes the result to updatedfile, leaving originalfile unchanged.
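The third hunk of the patch exists because newer best.edges files carry two extra columns (named e_rate5 and e_rate3 in the patch), so the original six-name unpacking fails. The sketch below shows one way to accept both layouts. It is not part of HBAR-DTK: the helper name parse_best_edge_line and the BestEdge tuple are hypothetical, and it assumes whitespace-separated columns with comment lines starting with "#".

```python
from collections import namedtuple

# Field names follow the patched unpacking line in CA_best_edge_to_GML.py.
# e_rate5 / e_rate3 are set to None for older six-column files.
BestEdge = namedtuple(
    "BestEdge", "id1 lib_id best5 o1 best3 o3 e_rate5 e_rate3"
)


def parse_best_edge_line(line):
    """Parse one best.edges record; return None for blank or comment lines.

    Assumption: columns are whitespace-separated and comments start with '#'.
    All values are returned as strings, as in the original script.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) == 6:
        # Older layout without the two error-rate columns.
        fields += [None, None]
    elif len(fields) != 8:
        raise ValueError(
            "expected 6 or 8 columns, got %d: %r" % (len(fields), line)
        )
    return BestEdge(*fields)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    print(parse_best_edge_line("1 1 2 5' 3 3'"))
    print(parse_best_edge_line("1 1 2 5' 3 3' 0.01 0.02"))
```

Checking the column count explicitly gives a clear error message if the file format changes again, instead of a bare unpacking failure.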

Code to make GML file (run_draw_assembly_graph.sh):

#! /usr/bin/env bash

PROJ='/proj/uppstoreXXXX'
HGAP3_ENV="$PROJ/NBIS_Assembly/tools/HGAP3_env"
HGAP3_TOOLS="$HGAP3_ENV/HBAR-DTK/src"
# Add Celera tools (AMOS) to path
PATH="$PATH:/sw/apps/bioinfo/cabog/8.1/bianca/bin"

GKP_STORE="$PROJ/NGI_deliveryXXX/pb_XXX/SMRTportal/pb_XXXX/XXXXX/data/celera-assembler.gkpStore"
TIG_STORE="$PROJ/NGI_deliveryXXX/pb_XXX/SMRTportal/pb_XXXX/XXXXX/data/celera-assembler.tigStore"
BEST_EDGE="$PROJ/NGI_deliveryXXX/pb_XXX/SMRTportal/pb_XXXX/XXXXX/data/4-unitigger/best.edges"
GML_OUTPUT="assembly_graph.gml"

source "$HGAP3_ENV/bin/activate"
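# Assumes the patched CA_best_edge_to_GML.py has replaced the original in $HGAP3_TOOLS
"$HGAP3_TOOLS/CA_best_edge_to_GML.py" "$GKP_STORE" "$TIG_STORE" "$BEST_EDGE" "$GML_OUTPUT"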
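Before opening the GML output in yEd, it can help to confirm that the file actually contains a graph. The following is a rough sketch using only the standard library; summarize_gml is a hypothetical helper, and it assumes the GML layout has each "node [" and "edge [" opener at the start of its own line. It counts blocks with a regular expression and is not a full GML parser.

```python
import re

# Match the opening of a node or edge block at the start of a line.
NODE_RE = re.compile(r"^\s*node\s*\[", re.MULTILINE)
EDGE_RE = re.compile(r"^\s*edge\s*\[", re.MULTILINE)


def summarize_gml(text):
    """Return (node_count, edge_count) for GML text.

    Assumption: every node/edge block opens on its own line.
    """
    return len(NODE_RE.findall(text)), len(EDGE_RE.findall(text))


if __name__ == "__main__":
    import sys

    # Usage: python summarize_gml.py assembly_graph.gml
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            nodes, edges = summarize_gml(handle.read())
        print("nodes: %d, edges: %d" % (nodes, edges))
```

A count of zero nodes or edges suggests the run failed or the input paths were wrong. Since networkx is installed in the virtualenv, its GML reader is an alternative for a stricter check.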