The Slope Method - mbassalbioinformatics/SLICER GitHub Wiki

The "slope" method for de novo reference generation operates on the principle that authentic, unaltered sequences within a library will be substantially more prevalent than variants arising from sequencing errors or minor biological mutations.

The process starts by sorting all identified valid barcodes in descending order of read count. The percentage decrease in read count between successive sorted barcodes, termed the "slope" of the read count distribution, is then calculated. A significant, abrupt increase in this slope (a steep drop in read abundance) serves as a heuristic for separating highly abundant barcodes (presumed to be true references) from their less frequent counterparts, which are more likely to be erroneous or variant sequences. Barcodes that appear in the distribution before this inflection point are selected as putative reference barcodes.
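The barcode selection step can be illustrated with a minimal Python sketch. This is not SLICER's actual code: the function names, the `min_drop_pct` threshold and its default of 50%, and the fallback of keeping every barcode when no steep drop is found are all assumptions made for illustration.

```python
def percent_drops(sorted_counts):
    """Percentage decrease between each pair of successive counts."""
    return [
        100.0 * (a - b) / a if a > 0 else 0.0
        for a, b in zip(sorted_counts, sorted_counts[1:])
    ]


def select_reference_barcodes(read_counts, min_drop_pct=50.0):
    """Return barcodes ranked before the first steep drop in read count.

    read_counts: dict mapping barcode -> read count.
    min_drop_pct: hypothetical threshold (assumption, not a SLICER default)
        for what counts as an abrupt drop between neighbouring barcodes.
    """
    # Sort barcodes from most to least abundant.
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    counts = [count for _, count in ranked]

    # The first drop at or above the threshold marks the inflection point;
    # everything ranked before it is kept as a putative reference.
    for i, drop in enumerate(percent_drops(counts)):
        if drop >= min_drop_pct:
            return [barcode for barcode, _ in ranked[: i + 1]]

    # Assumption: with no steep drop, keep every barcode.
    return [barcode for barcode, _ in ranked]


example_counts = {"AAAA": 1000, "CCCC": 950, "GGGG": 900, "AAAT": 40, "CCCA": 35}
references = select_reference_barcodes(example_counts)
```

In this toy input the drops between the first three barcodes are about 5% each, while the drop from `GGGG` to `AAAT` is about 96%, so only the first three barcodes are kept.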

For each selected reference barcode, SLICER then identifies all unique core sequences associated with it in the sequenced data. To determine the most representative core sequence for a given reference barcode, pairwise Levenshtein distances are computed among all its associated unique core sequences. The core sequence exhibiting the lowest average pairwise dissimilarity score to all other associated core sequences is designated as the "true" core sequence for that barcode. The final predicted reference sequence is then constructed by concatenating the Left Flanking Sequence (LFS), the selected reference BARCODE, the Right Flanking Sequence (RFS), and the determined "true" Core Sequence.
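The core sequence selection can be sketched as follows, again as an illustration, not SLICER's implementation. The helper names are hypothetical, the Levenshtein distance is a plain dynamic-programming version written here so that the example needs no external library, and ties are assumed to go to the first candidate encountered.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def pick_true_core(cores):
    """Return the core with the lowest average distance to all other cores."""
    unique = list(dict.fromkeys(cores))  # de-duplicate, preserving order
    if len(unique) == 1:
        return unique[0]

    def mean_distance(candidate):
        others = [c for c in unique if c != candidate]
        return sum(levenshtein(candidate, c) for c in others) / len(others)

    # min() keeps the first candidate on ties (an assumption of this sketch).
    return min(unique, key=mean_distance)


def build_reference(lfs, barcode, rfs, core):
    """Concatenate the parts in the order given in the text above."""
    return lfs + barcode + rfs + core


cores = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "TTTTACGT"]
true_core = pick_true_core(cores)
reference = build_reference("GG", "AAAA", "CC", true_core)
```

Here `ACGTACGT` sits one edit away from two of the other candidates and three edits from the third, giving it the lowest average distance, so it is chosen as the representative core.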

[Figure: fig3_slope]