Algorithm - Oshlack/Lace GitHub Wiki

The algorithm to construct the SuperTranscript can be conceptually depicted with the figure below:

https://github.com/Quarkins/SuperTranscript/blob/master/WikiFigs/Software_flow_v2.png

Breaking this down into steps:
1) Input a list of transcript sequences in a FASTA file, together with a text file of clustering information specifying which gene/cluster each transcript belongs to.

For each gene (or defined cluster):
2) Using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html), pairwise align the transcripts in the cluster to find the regions where they overlap.

3) Construct a directed graph, where each node is a base on one of the transcripts and the directed edge retains the ordering of the bases in each transcript. Using the pairwise alignments of all transcripts in a cluster, merge shared bases (nodes) together.

4) Simplify the graph and remove all cycles (see below) to create a Directed Acyclic Graph (DAG), which is required for topological sorting.

5) Topologically sort the nodes (each node is now a string of bases from the original unsimplified graph) using Kahn's algorithm, which gives a valid but non-unique ordering of the bases.

6) Extract the annotations (both SuperBlock style and transcript style).
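
Steps 3 to 5 can be illustrated with a minimal Python sketch. This is not Lace's actual code: it assumes the alignment and merging have already been done, so each transcript is given as an ordered list of hypothetical block IDs, and a shared ID stands in for a run of merged bases. The names `build_graph`, `kahn_sort`, `sequences` and `transcripts` are made up for this example.

```python
from collections import defaultdict, deque


def build_graph(transcripts):
    """Build a directed graph from transcripts given as ordered block IDs.

    A block ID shared by two transcripts plays the role of bases that were
    merged using the pairwise alignments (an assumption of this sketch).
    """
    successors = defaultdict(set)
    nodes = []  # nodes in order of first appearance
    for blocks in transcripts.values():
        for block in blocks:
            if block not in successors:
                successors[block] = set()
                nodes.append(block)
        # Edges keep the ordering of blocks within each transcript.
        for a, b in zip(blocks, blocks[1:]):
            successors[a].add(b)
    return nodes, successors


def kahn_sort(nodes, successors):
    """Topologically sort a DAG with Kahn's algorithm."""
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for m in successors[n]:
            indegree[m] += 1
    # Start from nodes with no incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        # sorted() only makes this sketch deterministic; any order is valid.
        for m in sorted(successors[n]):
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; break cycles before sorting")
    return order


# Two transcripts sharing the first and last block, differing in the middle.
sequences = {"A": "ATG", "B": "CCC", "C": "GGA", "D": "TTT"}
transcripts = {"t1": ["A", "B", "D"], "t2": ["A", "C", "D"]}

nodes, successors = build_graph(transcripts)
order = kahn_sort(nodes, successors)
supertranscript = "".join(sequences[n] for n in order)
print(order, supertranscript)
```

In this toy case blocks B and C are unordered relative to each other, so A, C, B, D would be an equally valid sort. That is the non-uniqueness mentioned in step 5. The `ValueError` branch shows why cycles have to be removed in step 4: Kahn's algorithm can never release a node that sits on a cycle.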

Example of a cycle break: