02. How it works - cbg-ethz/LongSom GitHub Wiki
LongSom overview
LongSom takes an aligned BAM file and a barcode-to-cell-type file as input. It detects variants in both cancer and non-cancer cells, then calls somatic variants, i.e. variants found only in cancer cells.
LongSom first corrects the cell type annotation based on the mutational burden of each cell. The rationale is that cancer cells misannotated as non-cancer cause somatic variants to be detected in non-cancer cells, so those variants are filtered out as germline variants (false negatives). To avoid this, LongSom detects cancer cells misannotated as non-cancer and reannotates them as cancer cells.
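The reannotation idea can be illustrated with a minimal sketch. This is not LongSom's actual implementation: the function name `reannotate_cells`, the `"Cancer"` / `"Non-Cancer"` labels, the variant identifiers, and the `min_mutated` threshold are all assumptions chosen for illustration. The sketch relabels a non-cancer cell as cancer when it carries enough high-confidence cancer variants.

```python
from typing import Dict, Set


def reannotate_cells(
    annotations: Dict[str, str],
    mutated_variants: Dict[str, Set[str]],
    high_conf_variants: Set[str],
    min_mutated: int = 2,  # hypothetical threshold, not taken from LongSom
) -> Dict[str, str]:
    """Return a copy of `annotations` in which non-cancer cells carrying at
    least `min_mutated` high-confidence cancer variants are relabelled."""
    updated = dict(annotations)
    for barcode, label in annotations.items():
        if label == "Cancer":
            continue  # already annotated as cancer, nothing to correct
        # Mutational burden: number of high-confidence variants seen in this cell.
        burden = len(mutated_variants.get(barcode, set()) & high_conf_variants)
        if burden >= min_mutated:
            updated[barcode] = "Cancer"
    return updated


# Toy input: barcode -> cell type, and barcode -> variants observed in that cell.
annotations = {"AAAC": "Cancer", "AAAG": "Non-Cancer", "AAAT": "Non-Cancer"}
mutated_variants = {
    "AAAC": {"chr1:100:A>T", "chr2:200:C>G"},
    "AAAG": {"chr1:100:A>T", "chr2:200:C>G"},
    "AAAT": {"chr3:300:G>A"},
}
high_conf = {"chr1:100:A>T", "chr2:200:C>G"}

reannotated = reannotate_cells(annotations, mutated_variants, high_conf)
```

In this toy input, barcode `AAAG` carries both high-confidence variants and is relabelled as cancer, while `AAAT` carries none and stays non-cancer. Without the relabelling, the two variants seen in `AAAG` would look like germline variants and be filtered out.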
After reannotation, LongSom calls SNVs using a modified version of SComatic and fusions using ctat-LR-fusion. It then uses BnpC, a Bayesian non-parametric clustering method, to cluster cells into subclones based on those somatic SNVs and fusions. In parallel, LongSom uses inferCNV to call CNAs and cluster cells into subclones based on them.
For more information, see the workflow below or read Dondi et al. 2024.
Workflow
Figure 1: Overview of LongSom.
a. LongSom’s methodology for detecting somatic SNVs, fusions, and CNAs and subsequently inferring cancer subclones in LR scRNA-seq data from individual patients. (1) SNV and (2) fusion candidates are detected from pseudo-bulk samples. (3) High-confidence cancer variants (SNVs and fusions) are selected based on the mutated cell fraction in cancer and non-cancer cells. (4) Cells are reannotated based on high-confidence cancer variants. (5) A new set of candidate variants is called based on the reannotated barcodes. (6) Candidate SNVs are passed through a set of 10 filters. (7) Cells are clustered based on somatic fusions and SNVs. In parallel, (8) gene expression per cell is computed, (9) CNAs are detected, (10) cells are clustered based on CNAs, and (11) CNA clones are incorporated into the fusion and SNV clustering matrix.
b. Candidate nuclear SNV filtering steps. Candidates passing all 10 steps are called as somatic SNVs (Methods).
c. Candidate mtSNV filtering steps. ΔMCF represents the difference in mutated cell fraction between cancer and non-cancer cells. Candidates passing all 5 steps are called as somatic mtSNVs (Methods).
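As a rough illustration of the ΔMCF quantity mentioned in the caption, the sketch below computes a mutated cell fraction per group and takes the difference. It assumes that the mutated cell fraction is the number of mutated cells divided by the number of cells with coverage at the locus; this denominator, the function names, and the example counts are assumptions, not taken from LongSom's code.

```python
def mutated_cell_fraction(n_mutated: int, n_covered: int) -> float:
    """Fraction of covered cells carrying the variant (0.0 if no cell is covered)."""
    return n_mutated / n_covered if n_covered else 0.0


def delta_mcf(
    cancer_mutated: int,
    cancer_covered: int,
    noncancer_mutated: int,
    noncancer_covered: int,
) -> float:
    """Difference in mutated cell fraction: cancer minus non-cancer."""
    return mutated_cell_fraction(cancer_mutated, cancer_covered) - mutated_cell_fraction(
        noncancer_mutated, noncancer_covered
    )


# Hypothetical counts: 30 of 100 covered cancer cells are mutated,
# versus 1 of 50 covered non-cancer cells.
example = delta_mcf(30, 100, 1, 50)
```

With these made-up counts the result is about 0.28: the variant is far more frequent in cancer cells than in non-cancer cells, which is the pattern expected of a somatic variant. A value near zero would instead suggest a germline variant or a technical artifact shared by both groups.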