04. Preparing input - cbg-ethz/LongSom GitHub Wiki
Input Files
LongSom is designed to run multiple samples in parallel.
Sample map
LongSom first reads a samplemap.tsv file containing the names of all samples to be analyzed. These SampleIDs are used to name files and to identify each sample throughout the workflow.
samplemap.tsv should look like:
sample
SampleID1
SampleID2
For each SampleID, LongSom takes as input an aligned SampleID.bam file together with a SampleID.tsv file linking barcodes to their cell type annotations.
Input directory
The input directory has to be organized as follows:
input_dir
--| samplemap.tsv
--| bam
----| SampleID1.bam
----| SampleID2.bam
--| barcodes
----| SampleID1.tsv
----| SampleID2.tsv
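Before launching a run, it can help to confirm that every SampleID in samplemap.tsv has both of its input files in place. The following is a minimal sketch and not part of LongSom itself; the helper names `read_sample_ids` and `missing_inputs` are hypothetical, and the sketch assumes the layout described above with a single `sample` header line in samplemap.tsv.

```python
import csv
from pathlib import Path


def read_sample_ids(samplemap_path):
    """Return the SampleIDs listed under the 'sample' header of samplemap.tsv."""
    with open(samplemap_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or "sample" not in reader.fieldnames:
            raise ValueError("samplemap.tsv needs a 'sample' header line")
        # Skip rows whose sample field is empty or whitespace only.
        return [row["sample"].strip() for row in reader if (row["sample"] or "").strip()]


def missing_inputs(input_dir):
    """List the expected bam/ and barcodes/ files that are absent from input_dir."""
    input_dir = Path(input_dir)
    missing = []
    for sample_id in read_sample_ids(input_dir / "samplemap.tsv"):
        expected = (
            input_dir / "bam" / f"{sample_id}.bam",
            input_dir / "barcodes" / f"{sample_id}.tsv",
        )
        for path in expected:
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `missing_inputs("input_dir")` returns an empty list when the directory is complete, and otherwise the paths that still need to be added.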
BAM files
The input BAM files must be aligned and barcoded, i.e. each read must carry a CB tag.
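One way to spot-check this requirement is to scan the first reads of a BAM file and count those that are unmapped or lack a CB tag. The sketch below is an illustration only, not a LongSom utility: `summarize_reads` and `check_bam` are hypothetical names, and `check_bam` assumes the third-party pysam package is installed.

```python
def summarize_reads(reads, limit=10000):
    """Count unmapped reads and reads without a CB tag among the first `limit` reads."""
    summary = {"checked": 0, "unmapped": 0, "missing_cb": 0}
    for read in reads:
        if summary["checked"] >= limit:
            break
        summary["checked"] += 1
        if read.is_unmapped:
            summary["unmapped"] += 1
        if not read.has_tag("CB"):
            summary["missing_cb"] += 1
    return summary


def check_bam(bam_path, limit=10000):
    """Open a BAM file with pysam and summarize its first `limit` reads."""
    import pysam  # third-party dependency, imported lazily (assumed to be installed)

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return summarize_reads(bam, limit)
```

For example, `check_bam("input_dir/bam/SampleID1.bam")` returns a dictionary of counts; non-zero `unmapped` or `missing_cb` values suggest the file does not yet meet the requirements above.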
Barcodes files
Barcodes files should have an Index column, containing unique barcodes, and a Cell_type column, containing the cell type annotation of each barcode:
Index Cell_type
AAACCCATCGAGATAA HGSOC
AAAGTGATCCAACTGA T.cell
ACACCAAAGGTCCAGA Fibroblast
ACATGCAGTACGGATG HGSOC
etc.
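A barcodes file can be checked against these two requirements (both columns present, barcodes unique) before running the workflow. This is a minimal sketch with a hypothetical helper name, `load_barcodes`; it assumes the columns are tab-separated, as the .tsv extension suggests.

```python
import csv


def load_barcodes(path):
    """Read a barcodes file into a {barcode: cell_type} dict, checking its format."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")  # assumption: tab-separated
        missing = {"Index", "Cell_type"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        cell_types = {}
        for row in reader:
            barcode = row["Index"]
            if barcode in cell_types:
                raise ValueError(f"duplicate barcode: {barcode}")
            cell_types[barcode] = row["Cell_type"]
    return cell_types
```

The function raises a `ValueError` on the first problem it finds, so a clean return means the file has the expected header and no repeated barcodes.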
LongSom compares "cancer" and "non-cancer" cells. To do so, specify in the config/config.yaml file which cell type should be treated as "cancer". All other cell types are aggregated and treated as "non-cancer". LongSom currently supports only one cancer cell type.
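To see how the annotations collapse into two groups, the grouping can be reproduced outside the workflow. The sketch below only illustrates the idea and is not LongSom's internal code; `label_cells` is a hypothetical helper, and "HGSOC" is used as the cancer cell type purely because it appears in the example barcodes file above.

```python
from collections import Counter


def label_cells(cell_types, cancer_cell_type):
    """Map each barcode to 'cancer' or 'non-cancer' given one cancer cell type."""
    return {
        barcode: "cancer" if cell_type == cancer_cell_type else "non-cancer"
        for barcode, cell_type in cell_types.items()
    }


# Annotations taken from the example barcodes file.
example = {
    "AAACCCATCGAGATAA": "HGSOC",
    "AAAGTGATCCAACTGA": "T.cell",
    "ACACCAAAGGTCCAGA": "Fibroblast",
    "ACATGCAGTACGGATG": "HGSOC",
}
labels = label_cells(example, cancer_cell_type="HGSOC")
group_sizes = Counter(labels.values())  # 2 cancer cells, 2 non-cancer cells
```

Counting the groups this way is a quick check that the chosen cancer cell type matches the spelling used in the Cell_type column; a count of zero "cancer" cells usually points to a mismatch.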