Autocycler resolve - rrwick/Autocycler GitHub Wiki

Basics

This step is performed on a per-cluster basis. It aims to resolve repeats and ambiguities in the cluster by following these steps:

  • Identifying anchors: unitigs which occur exactly once in each of the cluster's contigs.
  • Creating bridges: connections between anchors that follow the most common path.
  • Applying bridges: first the unique bridges (those with no conflicts), then the most-supported conflicting bridges.
  • Merging linear sequences into 'consentigs'.
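The anchor definition in the first step can be illustrated with a small sketch. This is not Autocycler's implementation. It assumes a hypothetical text input in which each line is one contig, written as a whitespace-separated path of numeric unitig IDs with a trailing strand sign (e.g. `1+ 2- 3+`). The function name `find_anchors` is also made up for this illustration.

```shell
# Hypothetical helper: reads contig paths on stdin (one contig per line,
# e.g. "1+ 2- 3+") and prints the unitig IDs that occur exactly once in
# every contig, ignoring strand.
find_anchors() {
    awk '
    NF == 0 { next }                      # skip blank lines
    {
        contigs++
        split("", seen)                   # reset per-contig counts
        for (i = 1; i <= NF; i++) {
            id = $i
            sub(/[+-]$/, "", id)          # drop the strand sign
            seen[id]++
        }
        for (id in seen)
            if (seen[id] == 1) once[id]++ # single-copy in this contig
    }
    END {
        for (id in once)
            if (once[id] == contigs) print id
    }' | sort -n
}
```

For example, `printf '1+ 2+ 3+ 2-\n3- 1+ 2+\n' | find_anchors` reports unitigs 1 and 3, because unitig 2 occurs twice in the first contig.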

Ideally, this will result in a single consentig for the cluster.
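One rough way to check whether a cluster reached this ideal is to count the segment lines in its 5_final.gfa file. The sketch below assumes that each segment (`S` line, per the GFA format) in that file corresponds to one consentig, which is an assumption made here and not something stated on this page. The function name `count_segments` is hypothetical.

```shell
# Hypothetical helper: prints the number of segment (S) lines in a GFA file.
# Assumption: in 5_final.gfa, one segment corresponds to one consentig.
count_segments() {
    awk -F'\t' '$1 == "S" { n++ } END { print n + 0 }' "$1"
}
```

Running `count_segments "$c/5_final.gfa"` inside the cluster loop and looking for values other than 1 would then flag clusters that may deserve a closer look.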

Example command

for c in autocycler_out/clustering/qc_pass/cluster_*; do
    autocycler resolve -c "$c"
done

Autocycler resolve is typically run on each of the QC-pass clusters, so the command above is in a Bash loop.

This command takes a cluster directory (represented by "$c" in the example) which must contain a 2_trimmed.gfa file (created by the previous step, Autocycler trim). It will create 3_bridged.gfa, 4_merged.gfa and 5_final.gfa files in the cluster directory.
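The input and output files described above can be checked around the call. The following is a minimal sketch, not part of Autocycler: the helper names `has_trimmed_graph`, `missing_outputs` and `resolve_all` are hypothetical, and only the `autocycler resolve -c` invocation and the file names come from this page.

```shell
# Hypothetical helper: succeeds only if the cluster directory contains the
# 2_trimmed.gfa input that Autocycler resolve requires.
has_trimmed_graph() {
    [ -f "$1/2_trimmed.gfa" ]
}

# Hypothetical helper: prints the name of each expected output file that is
# absent from the cluster directory (prints nothing if all are present).
missing_outputs() {
    for f in 3_bridged.gfa 4_merged.gfa 5_final.gfa; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Hypothetical wrapper: runs resolve on every cluster_* directory under the
# given qc_pass directory, skipping clusters without a trimmed graph and
# reporting any expected output that was not created.
resolve_all() {
    for c in "$1"/cluster_*; do
        if ! has_trimmed_graph "$c"; then
            echo "skipping $c: no 2_trimmed.gfa" >&2
            continue
        fi
        autocycler resolve -c "$c"
        missing=$(missing_outputs "$c")
        [ -z "$missing" ] || echo "$c is missing: $missing" >&2
    done
}
```

With these definitions loaded, `resolve_all autocycler_out/clustering/qc_pass` would behave like the loop in the example command, with the extra checks.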

Full usage

Usage: autocycler resolve [OPTIONS] --cluster_dir <CLUSTER_DIR>

Options:
  -c, --cluster_dir <CLUSTER_DIR>  Autocycler directory (required)
      --verbose                    Enable verbose output
  -h, --help                       Print help
  -V, --version                    Print version

Implementation details

Bridges can either be unique or conflicting. Unique bridges are where the anchor order is consistent across all input contigs, and conflicting bridges are where different input contigs have different anchor orders. The 3_bridged.gfa and 4_merged.gfa files contain unique bridges but not conflicting bridges. The 5_final.gfa file also has the most-supported conflicting bridges applied. If a cluster has no conflicting bridges, 4_merged.gfa and 5_final.gfa will be the same.
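Because 4_merged.gfa and 5_final.gfa are the same when no conflicting bridges exist, comparing the two files gives a quick hint about which clusters had conflicts. The sketch below assumes "the same" means byte-identical, which this page does not state explicitly. The function name `merged_equals_final` is hypothetical.

```shell
# Hypothetical helper: succeeds if the 4_merged.gfa and 5_final.gfa files in
# a cluster directory are byte-identical. It fails if they differ and also
# if either file is missing, so check that both exist before interpreting
# a failure as "this cluster had conflicting bridges".
merged_equals_final() {
    cmp -s "$1/4_merged.gfa" "$1/5_final.gfa"
}
```

For instance, `merged_equals_final "$c" || echo "$c may have had conflicting bridges"` could be added to the cluster loop after resolve has run.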

For example, if all input contigs agree that anchor 11+ is followed by anchor 18-, then 11+ → 18- is a unique bridge. However, if some input contigs have anchor 11+ followed by anchor 18- but others have anchor 11+ followed by anchor 22+, then Autocycler will make two conflicting bridges: 11+ → 18- and 11+ → 22+. To resolve conflicts, Autocycler iteratively discards the least-supported conflicting bridge (present in the fewest input contigs) until no conflicts remain.
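The iterative discarding described above can be sketched as follows. This is an illustration under stated assumptions, not Autocycler's code. The input format is hypothetical: one bridge per line, written as `start end support`, where support is the number of input contigs containing that bridge. Two bridges are treated as conflicting when they share a start anchor or an end anchor. Reverse-strand equivalents are ignored, and ties are broken by dropping the earliest-listed bridge. None of these details are specified on this page.

```shell
# Hypothetical helper: reads bridges on stdin as "start end support" and
# prints the bridges that remain after repeatedly discarding the
# least-supported conflicting bridge.
resolve_conflicts() {
    awk '
    { from[NR] = $1; to[NR] = $2; sup[NR] = $3 + 0; keep[NR] = 1; n = NR }
    END {
        while (1) {
            # Count how many kept bridges leave/enter each anchor.
            split("", outc); split("", inc)
            for (i = 1; i <= n; i++)
                if (keep[i]) { outc[from[i]]++; inc[to[i]]++ }
            # Find the least-supported bridge that is still in conflict.
            worst = 0
            for (i = 1; i <= n; i++)
                if (keep[i] && (outc[from[i]] > 1 || inc[to[i]] > 1))
                    if (worst == 0 || sup[i] < sup[worst]) worst = i
            if (worst == 0) break          # no conflicts remain
            keep[worst] = 0                # discard and re-evaluate
        }
        for (i = 1; i <= n; i++)
            if (keep[i]) print from[i], to[i], sup[i]
    }'
}
```

Given the bridges from the example with made-up support values, `printf '11+ 18- 5\n11+ 22+ 2\n' | resolve_conflicts` keeps only `11+ 18- 5`, because the bridge to 22+ is present in fewer input contigs.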

Notes

  • Examining the 4_merged.gfa file can reveal locations of structural heterogeneity in the genome – sites where input assemblies disagreed on the ordering of anchor unitigs.

Toy example

(The toy example is introduced on the Autocycler compress page.)

The image below shows the two clusters as they progress through Autocycler resolve. Cluster 1 has structural heterogeneity, so it remains incomplete in the 4_merged.gfa file, where only unique bridges are applied, but it is fully resolved in the 5_final.gfa file after conflicting bridges are applied.

(Image: Autocycler resolve graphs)
