07. Understanding LongSom output - cbg-ethz/LongSom GitHub Wiki

Output directory

Provided your sample map is:

sample
SampleID1
SampleID2

The relevant files in the output directory will be organized as follows:

output_dir
--| SNVCalling
   --| BaseCellCalling
      --| SampleID1.BaseCellCalling.step3.tsv
      --| SampleID2.BaseCellCalling.step3.tsv
--| FusionCalling
   --| SampleID1.somatic_fusion_predictions.tsv
   --| SampleID2.somatic_fusion_predictions.tsv
--| SingleCellGenotype
   --| SampleID1.AltMatrix.tsv
   --| SampleID1.DpMatrix.tsv
   --| SampleID1.VAFMatrix.tsv
   --| SampleID1.BinaryMatrix.tsv
   --| SampleID1.SingleCellGenotype.tsv
   --| SampleID2.AltMatrix.tsv
   --| SampleID2.DpMatrix.tsv
   --| SampleID2.VAFMatrix.tsv
   --| SampleID2.BinaryMatrix.tsv
   --| SampleID2.SingleCellGenotype.tsv
--| BnpC
   --| SampleID1
      --| genoCluster_posterior_mean_raw.pdf
      --| assignement.txt
   --| SampleID2
      --| genoCluster_posterior_mean_raw.pdf
      --| assignement.txt
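
As a convenience, the layout above can be turned into a small check that a run produced every listed file. This is only a sketch based on the tree shown here: the function names `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of LongSom, and the file names are copied verbatim from the listing.

```python
from pathlib import Path

# Matrix prefixes taken from the SingleCellGenotype listing above.
MATRIX_KINDS = ("Alt", "Dp", "VAF", "Binary")


def expected_outputs(output_dir, sample_id):
    """Return the paths listed in the tree above for one sample (hypothetical helper)."""
    root = Path(output_dir)
    genotype_dir = root / "SingleCellGenotype"
    paths = [
        root / "SNVCalling" / "BaseCellCalling" / f"{sample_id}.BaseCellCalling.step3.tsv",
        root / "FusionCalling" / f"{sample_id}.somatic_fusion_predictions.tsv",
        genotype_dir / f"{sample_id}.SingleCellGenotype.tsv",
        root / "BnpC" / sample_id / "genoCluster_posterior_mean_raw.pdf",
        root / "BnpC" / sample_id / "assignement.txt",  # spelled as in the tree above
    ]
    paths += [genotype_dir / f"{sample_id}.{kind}Matrix.tsv" for kind in MATRIX_KINDS]
    return paths


def missing_outputs(output_dir, sample_ids):
    """List every expected path that does not exist on disk."""
    return [
        path
        for sample_id in sample_ids
        for path in expected_outputs(output_dir, sample_id)
        if not path.exists()
    ]
```

Calling `missing_outputs("output_dir", ["SampleID1", "SampleID2"])` returns an empty list when all files in the tree are present.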

Somatic SNVs

For each SampleID, LongSom outputs a SampleID.BaseCellCalling.step3.tsv file containing all information on the detected somatic SNVs. Each field of the .tsv is explained in the file's header.

Somatic fusions

For each SampleID, LongSom outputs a SampleID.somatic_fusion_predictions.tsv file containing all information on the detected somatic fusions.

Cell-variant matrices

LongSom also outputs cell-variant matrices (covering both SNVs and fusions) named SampleID.{}Matrix.tsv, where {} is one of:

  • Alt, the count of reads supporting the alternative allele or fusion
  • Dp, the total number of reads mapped to the locus (equal to Alt for fusions)
  • VAF, the variant allele frequency, i.e. the element-wise ratio of the Alt and Dp matrices
  • Binary, the binarized mutation status (1 mutated, 0 not mutated), based on a Beta-Binomial test for SNVs and on whether the fusion is present for fusions
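
To make the relationship between the Alt, Dp and VAF matrices concrete, here is a minimal sketch that recomputes VAF from toy Alt and Dp tables using only the Python standard library. Several things are assumptions for illustration: the orientation (variants as rows, cells as columns), the variant and cell labels, the toy counts, and the choice to report `None` where Dp is 0. How LongSom itself encodes uncovered positions is not stated here, so check the real files before relying on this.

```python
import csv
import io


def read_matrix(handle):
    """Read a tab-separated count matrix: header row of column labels, row label first."""
    reader = csv.reader(handle, delimiter="\t")
    columns = next(reader)[1:]
    return {row[0]: dict(zip(columns, map(int, row[1:]))) for row in reader if row}


def vaf_matrix(alt, dp):
    """Element-wise Alt/Dp; None where no read covers the locus (Dp == 0)."""
    return {
        row: {col: (alt[row][col] / depth if depth > 0 else None)
              for col, depth in cols.items()}
        for row, cols in dp.items()
    }


# Toy data with hypothetical labels; real files would be opened with open(path).
alt_tsv = "\tcellA\tcellB\nchr1:100:A>T\t2\t0\nGENE1--GENE2\t3\t0\n"
dp_tsv = "\tcellA\tcellB\nchr1:100:A>T\t4\t0\nGENE1--GENE2\t3\t0\n"

alt = read_matrix(io.StringIO(alt_tsv))
dp = read_matrix(io.StringIO(dp_tsv))
vaf = vaf_matrix(alt, dp)

print(vaf["chr1:100:A>T"]["cellA"])  # 2 supporting reads out of 4
print(vaf["GENE1--GENE2"]["cellA"])  # fusion: Dp equals Alt
```

In the toy fusion row, Dp equals Alt, as described in the list above, so the ratio is 1 wherever the fusion is observed.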

It also outputs a 'long' format file, SampleID.SingleCellGenotype.tsv, which contains the information from all the matrices above, with one line per cell-variant combination.
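
The sketch below illustrates what "one line per cell-variant combination" means by flattening two toy wide matrices into long records. The field names (`variant`, `cell`, `Alt`, `Dp`, `VAF`), the labels and the counts are hypothetical and chosen only for this example; the actual column names of SampleID.SingleCellGenotype.tsv should be read from the file itself.

```python
def to_long(alt, dp):
    """Flatten wide {variant: {cell: count}} matrices into one record per cell-variant pair."""
    records = []
    for variant, cells in dp.items():
        for cell, depth in cells.items():
            supporting = alt[variant][cell]
            records.append({
                "variant": variant,
                "cell": cell,
                "Alt": supporting,
                "Dp": depth,
                # Left undefined (None) when no read covers the locus.
                "VAF": supporting / depth if depth > 0 else None,
            })
    return records


# Toy wide matrices (hypothetical labels): variants as rows, cells as columns.
alt = {"chr1:100:A>T": {"cellA": 2, "cellB": 0}, "GENE1--GENE2": {"cellA": 3, "cellB": 0}}
dp = {"chr1:100:A>T": {"cellA": 4, "cellB": 0}, "GENE1--GENE2": {"cellA": 3, "cellB": 0}}

long_records = to_long(alt, dp)
for record in long_records:
    print(record)
```

Two variants across two cells give four records, one for each cell-variant pair.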

Clones

LongSom uses BnpC to cluster all cells (including non-cancer cells) based on SNVs and fusions (SampleID.BinaryMatrix.tsv). For each SampleID, it outputs a genoCluster_posterior_mean_raw.pdf, along with an assignment file that associates cell barcodes with clusters. Example figure: git_BnpC_example