Links - graph-genome/Schematize GitHub Wiki

Links for Rearrangements

The Matrix cannot depict nonlinear structural rearrangements on its own. We therefore break the Matrix into blocks, called Components, whose ends correspond to the breakpoints of inversions and translocations in the genome. Colored lines, which we call “Links”, show the structural variations present in the pangenome by joining Components in a different order for each individual.

A Link is an alternative Edge representing a structural rearrangement; together, the Links show the order of Matrix Components in each individual. Graph Genomes use Edges for every kind of variant. Our Component Segmentation software (https://github.com/graph-genome/component_segmentation) identifies the rare few Edges that are nonlinear rearrangements and groups individuals sharing the same structural rearrangement. If rearrangements are rare, one Component can contain thousands of Graph Genome Nodes.
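As a rough illustration of the distinction between ordinary Edges and Links, the following Python sketch models a Link as a record of its two Components and the individuals that share it. The `Link` class, its field names, and the rule that a Link between consecutive Components counts as linear are assumptions made for this example; they are not taken from the Component Segmentation data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical record for one Link; field names are illustrative."""
    departure: int            # index of the Component the Link leaves
    arrival: int              # index of the Component the Link enters
    individuals: frozenset    # individuals that share this Link


def is_rearrangement(link: Link) -> bool:
    # Assumption: a Link to the very next Component is the linear case;
    # anything else is treated as a structural rearrangement.
    return link.arrival != link.departure + 1


links = [
    Link(0, 1, frozenset({"A", "B"})),   # adjacent Components
    Link(0, 2, frozenset({"C"})),        # skips Component 1
    Link(2, 1, frozenset({"C"})),        # jumps backwards
]
rearrangements = [link for link in links if is_rearrangement(link)]
```

In this toy data set only individual C carries rearrangement Links, so C would be grouped separately from A and B.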

To show clearly which Link applies to which individual, we introduce “Departure” and “Arrival” columns at the ends of each Component. Links are drawn with an arrowhead on the Arrival side, and the presence or absence of each Link is shown for each individual in the Link Column below it.

There can be more than one Departure or Arrival Column at the edge of a Component, to show multiple structural rearrangements at the same breakpoint. The rule for reading multiple Links is that each individual follows each Link at most once.
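The reading rule can be sketched in a few lines of Python. This is a simplified model, not the browser's implementation: Components are numbered left to right, a Link is a hypothetical `(departure, arrival)` pair of Component indices, and an individual with no unused Link at the end of a Component is assumed to continue to the adjacent Component.

```python
def read_path(first, last, links):
    """Return the Components one individual visits, in order.

    links: (departure, arrival) pairs this individual participates in.
    """
    unused = list(links)
    current = first
    path = [current]
    while True:
        # Take the first Link leaving this Component that is still unused.
        link = next((l for l in unused if l[0] == current), None)
        if link is not None:
            unused.remove(link)       # each Link is followed at most once
            current = link[1]
        elif current < last:
            current += 1              # assumed default: adjacent Component
        else:
            break                     # end of the last Component
        path.append(current)
    return path


duplication = read_path(first=0, last=3, links=[(2, 1)])
```

With a single Link from Component 2 back to Component 1, the individual passes through Components 1 and 2 twice and then carries on to Component 3. Because the Link is removed once used, the walk cannot loop forever.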

Show Rearrangements Only view: Each box is one Component. Adjacent Components are connected by black Links. Alternative Links are structural variants shown in various colors. Links can be decorated with an allele frequency based on the number of individuals that share the structural variant.
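One plausible way to compute the frequency shown on a Link is the fraction of individuals in the pangenome that share it. The function below is a minimal sketch under that assumption; `link_frequency` and its parameters are hypothetical names, and the browser may compute or display the value differently.

```python
def link_frequency(individuals_with_link, all_individuals):
    """Fraction of individuals in the pangenome that share a Link."""
    everyone = set(all_individuals)
    if not everyone:
        raise ValueError("the pangenome has no individuals")
    # Ignore any name that is not part of this pangenome.
    carriers = set(individuals_with_link) & everyone
    return len(carriers) / len(everyone)


frequency = link_frequency({"C", "E"}, {"A", "B", "C", "D", "E"})
```

Here two of five individuals share the Link, giving a frequency of 0.4.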

The sequence Matrix can be hidden with the “Show Rearrangements Only” view, which replaces sequence information with the frequency of the different structural variants within the pangenome. The browser has two affordances for following a particular individual.

First, the user can hover over an individual to highlight that individual's entire path. Second, because Links show nonlinear connections between two Components, the far end of a Link is sometimes a Component outside the viewport; clicking the Link jumps the genome browser to that Component.
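The click behavior can be pictured with a small Python sketch. Everything here is an assumption for illustration: the viewport is taken to be an inclusive range of Matrix columns, a Link is a `(departure_column, arrival_column)` pair, and the jump centers the far end of the Link in the view. The actual browser logic may differ.

```python
def jump_target(link, clicked_end, viewport):
    """Return the viewport to show after clicking a Link, or None.

    link: (departure_column, arrival_column)
    clicked_end: "departure" or "arrival"
    viewport: (first_column, last_column), both inclusive
    """
    departure, arrival = link
    other = arrival if clicked_end == "departure" else departure
    first, last = viewport
    if first <= other <= last:
        return None                         # other end is already visible
    width = last - first
    new_first = max(0, other - width // 2)  # center on the other end
    return (new_first, new_first + width)


new_view = jump_target((10, 500), "departure", (0, 100))
```

Returning `None` when the other end is already on screen mirrors the description above, where the jump is only needed for Components outside the viewport.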

Figure: Schematic Layout for Graph Genomes. Top: The five aligned sequences that were used to generate the Graph, with color-coded Components. Bottom: Schematic showing all the information available in a graph genome: SNPs, indels, structural rearrangements, and copy number variation. The last row of GGTT is colored more darkly because the same Component is traversed twice.

Figure: A) Five example sequences in a multiple sequence alignment. Similar sequences have been colored by hand to indicate rearrangements that a multiple sequence alignment does not show. B) A preview of a Pangenome Schematic showing the same variation in the five individuals (SNPs, duplications, and rearrangements), including the transpositions and duplications. Every Component reads from left to right; follow each Link only once.

Next: Inversions