STAR-Fusion on Single Cell Transcriptomes
STAR-Fusion supports single cell transcriptome data from SmartSeq2, which typically consists of a set of paired fastq files, where each pair of files corresponds to a single cell.
The process for running STAR-Fusion on such data involves:
- Define your input data sets using a 'samples.txt' file with the following tab-delimited format (cell name, left reads file, right reads file):
cellnameA /path/to/cellA_1.fastq.gz /path/to/cellA_2.fastq.gz
cellnameB /path/to/cellB_1.fastq.gz /path/to/cellB_2.fastq.gz
....
cellnameZ /path/to/cellZ_1.fastq.gz /path/to/cellZ_2.fastq.gz
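If the fastq files follow the naming shown above (`<cell>_1.fastq.gz` and `<cell>_2.fastq.gz`), the sample sheet can be generated rather than typed by hand. The following is a minimal sketch under that assumption; the helper names `build_sample_rows` and `write_sample_sheet` are hypothetical and are not part of STAR-Fusion. It also assumes file base names are unique across directories.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <cell>_1.fastq.gz holds the left reads,
# <cell>_2.fastq.gz holds the right reads (as in the example above).
R1_PATTERN = re.compile(r"^(?P<cell>.+)_1\.fastq\.gz$")


def build_sample_rows(fastq_paths):
    """Pair left/right fastq files by cell name.

    Returns a list of (cell_name, left_fastq, right_fastq) tuples sorted
    by file name. Raises ValueError if a left file has no matching mate.
    """
    by_name = {Path(p).name: str(p) for p in fastq_paths}
    rows = []
    for name, left in sorted(by_name.items()):
        match = R1_PATTERN.match(name)
        if match is None:
            continue  # not a left-reads file; mates are looked up below
        cell = match.group("cell")
        right_name = f"{cell}_2.fastq.gz"
        if right_name not in by_name:
            raise ValueError(f"missing mate file for cell {cell}: {right_name}")
        rows.append((cell, left, by_name[right_name]))
    return rows


def write_sample_sheet(rows, out_path="samples.txt"):
    """Write the rows as a tab-delimited file, one cell per line."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

For example, `write_sample_sheet(build_sample_rows(Path("/path/to").glob("*.fastq.gz")))` would write one line per cell found under `/path/to`.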
- Define batches of cells for STAR-Fusion according to N cells per execution (typically, we use N=24):
${STAR_FUSION_BASEDIR}/util/sc/prep_distributed_jobs.py \
--sample_sheet samples.txt \
--cells_per_job 24 \
--output_dir star_fusion_SingleCellPE
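To make the effect of `--cells_per_job` concrete, the sketch below splits a list of cells into consecutive groups of at most N. It only illustrates the batching idea; it is not the implementation of `prep_distributed_jobs.py`, and the function name `chunk_cells` is hypothetical.

```python
def chunk_cells(cells, cells_per_job=24):
    """Split a list of cells into consecutive batches of at most cells_per_job."""
    if cells_per_job < 1:
        raise ValueError("cells_per_job must be at least 1")
    return [
        cells[start:start + cells_per_job]
        for start in range(0, len(cells), cells_per_job)
    ]


# 50 cells at 24 cells per job give three batches; the last one is partial.
cells = [f"cell{i}" for i in range(50)]
batch_sizes = [len(batch) for batch in chunk_cells(cells)]
print(batch_sizes)  # [24, 24, 2]
```

Each batch corresponds to one STAR-Fusion execution, so a larger N means fewer, longer-running jobs.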
- Generate STAR-Fusion commands for execution:
${STAR_FUSION_BASEDIR}/util/sc/write_sc_starF_cmds.py \
--batches_list_file star_fusion_SingleCellPE.batches.list \
--genome_lib_dir ${CTAT_GENOME_LIB} \
--use_shared_mem \
> star_fusion_SingleCellPE.batches.starF.cmds
- Execute the commands in parallel, where ${CPU} is the number of commands to run concurrently:
${STAR_FUSION_BASEDIR}/util/sc/run_distributed_jobs_locally.py \
--cmds_file star_fusion_SingleCellPE.batches.starF.cmds \
--num_parallel_exec ${CPU} \
--genome_lib_dir ${CTAT_GENOME_LIB}
The above first loads the genome into shared memory, and then runs parallel executions of STAR-Fusion on the batches of single cell transcriptomes.
- Aggregate fusion predictions and generate cell-level fusion findings:
The following aggregates the results from each batch of cells and generates a final STAR-Fusion report of the fusions found in each cell.
${STAR_FUSION_BASEDIR}/util/sc/aggregate_and_deconvolve_fusion_outputs.py \
--batches_list_file star_fusion_SingleCellPE.batches.list \
--output_prefix star_fusion_SingleCellPE
The results are written to 'star_fusion_SingleCellPE.fusions.tsv' and 'star_fusion_SingleCellPE.fusions.abridged.tsv'. Both files use the standard STAR-Fusion output format, with an additional 'Cell' column indicating the cell in which each fusion was identified.
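Because each row carries a 'Cell' value, the report can be summarized per fusion with a few lines of Python. The following is a minimal sketch, not part of STAR-Fusion: the function name `cells_per_fusion` is hypothetical, and it assumes the fusion name is in a column called `#FusionName`, as in standard STAR-Fusion output. Adjust the column names if your header differs.

```python
import csv
from collections import defaultdict


def cells_per_fusion(lines, fusion_col="#FusionName", cell_col="Cell"):
    """Map each fusion name to the sorted list of cells it was found in.

    `lines` is any iterable of text lines from a tab-delimited report,
    such as an open file handle; the first line must be the header.
    """
    cells = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        cells[row[fusion_col]].add(row[cell_col])
    return {fusion: sorted(names) for fusion, names in cells.items()}


def load_cells_per_fusion(tsv_path):
    """Convenience wrapper that reads the report from a file path."""
    with open(tsv_path, newline="") as fh:
        return cells_per_fusion(fh)
```

Calling `load_cells_per_fusion("star_fusion_SingleCellPE.fusions.abridged.tsv")` returns a dictionary whose values list the cells supporting each fusion, which makes it easy to rank fusions by the number of cells they recur in.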
For an example of how fusion detection at single cell resolution was performed, see Jerby-Arnon et al., "Opposing immune and genetic mechanisms shape oncogenic programs in synovial sarcoma," Nat Med. 2021 Feb;27(2):289-300.
For running FusionInspector on single cell data to focus on specific fusion occurrences, see the FusionInspector sc-RNAseq documentation, which largely mirrors the processes here.