07. Understanding LongSom output - cbg-ethz/LongSom GitHub Wiki
Output directory
Provided your sample map is:
sample
SampleID1
SampleID2
The relevant files in the output directory will be organized as follows:
output_dir
--| SNVCalling
    --| BaseCellCalling
        --| SampleID1.BaseCellCalling.step3.tsv
        --| SampleID2.BaseCellCalling.step3.tsv
--| FusionCalling
    --| SampleID1.somatic_fusion_predictions.tsv
    --| SampleID2.somatic_fusion_predictions.tsv
--| SingleCellGenotype
    --| SampleID1.AltMatrix.tsv
    --| SampleID1.DpMatrix.tsv
    --| SampleID1.VAFMatrix.tsv
    --| SampleID1.BinaryMatrix.tsv
    --| SampleID1.SingleCellGenotype.tsv
    --| SampleID2.AltMatrix.tsv
    --| SampleID2.DpMatrix.tsv
    --| SampleID2.VAFMatrix.tsv
    --| SampleID2.BinaryMatrix.tsv
    --| SampleID2.SingleCellGenotype.tsv
--| BnpC
    --| SampleID1
        --| genoCluster_posterior_mean_raw.pdf
        --| assignement.txt
    --| SampleID2
        --| genoCluster_posterior_mean_raw.pdf
        --| assignement.txt
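As a rough illustration of the layout listed above, the following Python sketch builds the expected file paths for each sample ID. The function name `expected_outputs` and the dictionary keys are hypothetical, and the nesting (BaseCellCalling inside SNVCalling, one BnpC subdirectory per sample) is an assumption read from the listing rather than a documented guarantee. The paths are only constructed, not checked for existence.

```python
from pathlib import Path


def expected_outputs(output_dir, sample_ids):
    """Map each sample ID to the output paths named in the listing.

    Hypothetical helper: the directory nesting is assumed from the
    listing, and no path is checked for existence.
    """
    root = Path(output_dir)
    genotype_dir = root / "SingleCellGenotype"
    matrix_names = ["AltMatrix", "DpMatrix", "VAFMatrix", "BinaryMatrix"]
    outputs = {}
    for sid in sample_ids:
        bnpc_dir = root / "BnpC" / sid
        outputs[sid] = {
            "snvs": root / "SNVCalling" / "BaseCellCalling"
            / f"{sid}.BaseCellCalling.step3.tsv",
            "fusions": root / "FusionCalling"
            / f"{sid}.somatic_fusion_predictions.tsv",
            "matrices": {m: genotype_dir / f"{sid}.{m}.tsv" for m in matrix_names},
            "long": genotype_dir / f"{sid}.SingleCellGenotype.tsv",
            "clones_pdf": bnpc_dir / "genoCluster_posterior_mean_raw.pdf",
            # File name spelled exactly as it appears in the listing.
            "clones_assignment": bnpc_dir / "assignement.txt",
        }
    return outputs


paths = expected_outputs("output_dir", ["SampleID1", "SampleID2"])
print(paths["SampleID1"]["snvs"])
```

A helper like this can be handy for checking that a run produced every expected file, for example by calling `.exists()` on each path.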
Somatic SNVs
LongSom outputs a SampleID.BaseCellCalling.step3.tsv file for each SampleID, containing all information regarding the somatic SNVs detected. Each .tsv field is explained in the file's header.
Somatic fusions
LongSom outputs a SampleID.somatic_fusion_predictions.tsv file for each SampleID, containing all information regarding the somatic fusions detected.
Cell-variant matrices
LongSom also outputs cell-variant (SNVs and fusions) matrices: SampleID.{}Matrix.tsv, {} being either:
Alt, count of reads supporting the alternative allele/fusion
Dp, total reads mapped to the locus (equal to Alt for fusions)
VAF, the element-wise division of the Alt matrix by the Dp matrix
Binary, binarized 1/0 mutated/not mutated status, based on a Beta-Binomial test for SNVs, and on whether the fusion is present or not.
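The relationship between the Alt, Dp and VAF matrices can be sketched in a few lines of Python. This is a toy illustration, not LongSom's own code: the values are made up, the orientation (variants as rows, cells as columns) is an assumption, and representing zero-depth entries as NaN is a choice made here because the page does not say how LongSom encodes them.

```python
import math


def vaf_matrix(alt, dp):
    """Element-wise Alt / Dp; entries with zero depth become NaN (assumption)."""
    return [
        [a / d if d > 0 else math.nan for a, d in zip(alt_row, dp_row)]
        for alt_row, dp_row in zip(alt, dp)
    ]


# Toy values, not real LongSom output: rows are variants, columns are cells.
alt = [[2, 0, 0],
       [1, 3, 0]]
dp = [[4, 5, 0],
      [1, 6, 2]]

vaf = vaf_matrix(alt, dp)
print(vaf)
```

The Binary matrix is not reproduced here, since the parameters of the Beta-Binomial test are not given on this page.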
It also outputs a 'long' format: SampleID.SingleCellGenotype.tsv, containing the information from all matrices above, with one line for each cell-variant combination.
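To make the 'long' format concrete, here is a minimal Python sketch that flattens several matrices into one record per cell-variant combination. The function `to_long`, the variant and cell labels, and the record keys are all hypothetical; the actual column names of SampleID.SingleCellGenotype.tsv are not given on this page.

```python
def to_long(variants, cells, matrices):
    """Yield one dict per cell-variant combination, with one key per matrix.

    Assumes every matrix has variants as rows and cells as columns.
    """
    for i, variant in enumerate(variants):
        for j, cell in enumerate(cells):
            row = {"variant": variant, "cell": cell}
            for name, matrix in matrices.items():
                row[name] = matrix[i][j]
            yield row


# Toy labels and values, not real LongSom output.
variants = ["variant1", "variant2"]
cells = ["cellA", "cellB"]
matrices = {
    "Alt": [[2, 0], [1, 3]],
    "Dp": [[4, 5], [1, 6]],
}

rows = list(to_long(variants, cells, matrices))
for row in rows:
    print(row)
```

Two variants and two cells give four records, each carrying the Alt and Dp values for that pair.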
Clones
LongSom uses BnpC to cluster all cells (including non-cancer cells) based on SNVs and fusions (SampleID.BinaryMatrix.tsv) and outputs a genoCluster_posterior_mean_raw.pdf for each SampleID, along with an assignment file (assignement.txt) associating barcodes with cell clusters. Here is an example: