
STAR-Fusion on Single Cell Transcriptomes

STAR-Fusion supports single cell transcriptome data from SmartSeq2, which typically consists of many pairs of fastq files, where each pair of files corresponds to a single cell.

The process for running STAR-Fusion on such data involves the following steps:

  • Define your input data sets using a 'samples.txt' file with the following format (tab-delimited, one cell per line: cell name, left fastq, right fastq):
cellnameA    /path/to/cellA_1.fastq.gz     /path/to/cellA_2.fastq.gz
cellnameB    /path/to/cellB_1.fastq.gz     /path/to/cellB_2.fastq.gz
....
cellnameZ    /path/to/cellZ_1.fastq.gz     /path/to/cellZ_2.fastq.gz
  • Define batches of cells for STAR-Fusion according to N cells per execution (typically, we use N=24):
${STAR_FUSION_BASEDIR}/util/sc/prep_distributed_jobs.py \
    --sample_sheet  samples.txt \
    --cells_per_job 24 \
    --output_dir star_fusion_SingleCellPE
  • Generate STAR-Fusion commands for execution:
${STAR_FUSION_BASEDIR}/util/sc/write_sc_starF_cmds.py \
    --batches_list_file star_fusion_SingleCellPE.batches.list \
    --genome_lib_dir ${CTAT_GENOME_LIB} \
    --use_shared_mem \
    > star_fusion_SingleCellPE.batches.starF.cmds
  • Execute the commands in parallel, setting ${CPU} to the number of batches to run concurrently:
${STAR_FUSION_BASEDIR}/util/sc/run_distributed_jobs_locally.py \
    --cmds_file star_fusion_SingleCellPE.batches.starF.cmds \
    --num_parallel_exec ${CPU} \
    --genome_lib_dir ${CTAT_GENOME_LIB}

The above first loads the genome into shared memory, and then runs parallel executions of STAR-Fusion on the batches of single cell transcriptomes.
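The 'samples.txt' file from the first step can be generated with a short script rather than written by hand. The following is a minimal sketch, not part of STAR-Fusion: the function names `build_sample_sheet` and `preview_batches` are hypothetical, and it assumes every cell's files follow the `<cellname>_1.fastq.gz` / `<cellname>_2.fastq.gz` naming shown in the example above. The batching helper only illustrates how N cells per job translates into a number of jobs; it is not the implementation used by prep_distributed_jobs.py.

```python
import csv
from pathlib import Path


def build_sample_sheet(fastq_dir, out_path="samples.txt"):
    """Write one tab-delimited row per cell: cell name, left fastq, right fastq.

    Assumes files are named <cellname>_1.fastq.gz and <cellname>_2.fastq.gz
    (an assumption taken from the example sample sheet, adjust as needed).
    """
    fastq_dir = Path(fastq_dir)
    left_suffix = "_1.fastq.gz"
    rows = []
    for left in sorted(fastq_dir.glob("*" + left_suffix)):
        cell = left.name[: -len(left_suffix)]
        right = left.with_name(cell + "_2.fastq.gz")
        if not right.exists():
            # Fail early: a cell without its mate file cannot be run as paired-end.
            raise FileNotFoundError(f"missing mate file for cell {cell}: {right}")
        rows.append((cell, str(left.resolve()), str(right.resolve())))

    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return rows


def preview_batches(rows, cells_per_job=24):
    """Split rows into consecutive groups of at most cells_per_job cells.

    Illustration only: shows how many jobs a given batch size would produce.
    """
    return [rows[i:i + cells_per_job] for i in range(0, len(rows), cells_per_job)]
```

For example, `rows = build_sample_sheet("/path/to/fastqs")` followed by `len(preview_batches(rows, 24))` gives a quick estimate of how many batch jobs to expect before choosing a value for --num_parallel_exec.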

  • Aggregate fusion predictions and generate cell-level fusion findings

The following command aggregates the results from all batches of cells and generates a final STAR-Fusion report listing the fusions found in each cell.

${STAR_FUSION_BASEDIR}/util/sc/aggregate_and_deconvolve_fusion_outputs.py \
    --batches_list_file star_fusion_SingleCellPE.batches.list \
    --output_prefix star_fusion_SingleCellPE

The results are written to 'star_fusion_SingleCellPE.fusions.tsv' and 'star_fusion_SingleCellPE.fusions.abridged.tsv'. Both use the standard STAR-Fusion output format, with an additional 'Cell' column indicating the cell in which each fusion was identified.
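Because each row of the aggregated report carries a 'Cell' value, the report can be summarized per fusion with a few lines of Python. The sketch below is illustrative only: the function name `cells_per_fusion` is hypothetical, and it assumes the header line names the fusion column `#FusionName` (as in standard STAR-Fusion output) and the cell column `Cell`. Columns are looked up by name, so their position in the file does not matter.

```python
import csv
from collections import defaultdict


def cells_per_fusion(tsv_path, fusion_col="FusionName", cell_col="Cell"):
    """Map each fusion name to the sorted list of cells it was reported in.

    Assumes a tab-delimited file whose first line is a header; a leading '#'
    on header names (e.g. '#FusionName') is stripped before lookup.
    """
    cells = defaultdict(set)
    with open(tsv_path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = [name.lstrip("#") for name in next(reader)]
        fusion_idx = header.index(fusion_col)
        cell_idx = header.index(cell_col)
        for row in reader:
            if not row:
                continue  # skip blank lines
            cells[row[fusion_idx]].add(row[cell_idx])
    return {fusion: sorted(names) for fusion, names in cells.items()}
```

Calling `cells_per_fusion("star_fusion_SingleCellPE.fusions.abridged.tsv")` returns a dictionary keyed by fusion name, which makes it easy to see which fusions recur across many cells and which appear in only one.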

For an example of fusion detection at single cell resolution, see Jerby-Arnon et al., "Opposing immune and genetic mechanisms shape oncogenic programs in synovial sarcoma", Nat Med. 2021 Feb;27(2):289-300.

To run FusionInspector on single cell data and focus on specific fusion occurrences, see the FusionInspector sc-RNAseq documentation, which largely mirrors the process described here.