Installing STAR-Fusion

STAR-Fusion requires the following software and data resources to be installed.

Note, if you can use our Docker or Singularity images, then you'll have all the software pre-installed and can hit the ground running. Just install the data resources required below.

1. Downloading a STAR-Fusion Release (Preferred)

Visit https://github.com/STAR-Fusion/STAR-Fusion/releases and be sure to download the 'FULL' version. The others are auto-generated by GitHub and are missing required submodules.

After unpacking the tar.gz file, run 'make' in the base installation directory.

2. Installing from GitHub Clone:

%  git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git

The --recursive parameter is needed to integrate the required submodules.

Type 'make' in the base installation directory.
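
If a clone was made without --recursive, the required submodules will be missing. The following is a minimal sketch, not part of STAR-Fusion, for spotting that case from the base installation directory. The helper name uninitialized_submodules is hypothetical; it relies on 'git submodule status' prefixing uninitialized submodules with '-', and it assumes submodule paths contain no spaces.

```shell
# Hypothetical helper: read `git submodule status --recursive` output on stdin
# and print the path of every submodule that has not been initialized.
# git prefixes such entries with "-".
uninitialized_submodules() {
    awk 'substr($0, 1, 1) == "-" { print $2 }'
}

# Only attempt the check when run inside a git working tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    missing=$(git submodule status --recursive | uninitialized_submodules)
    if [ -n "$missing" ]; then
        echo "Uninitialized submodules:"
        echo "$missing"
        echo "To fetch them: git submodule update --init --recursive"
    else
        echo "All submodules are initialized."
    fi
fi
```

If anything is reported, fetch the submodules as suggested and then re-run 'make'.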

Tools Required:

   A typical perl module installation may involve:
   perl -MCPAN -e shell
   install DB_File
   install URI::Escape
   install Set::IntervalTree
   install Carp::Assert
   install JSON::XS
   install PerlIO::gzip
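
To confirm that the modules listed above are actually loadable by the perl on your PATH, something like the following may help. It is a sketch only: check_perl_modules is a hypothetical helper name, and it simply asks perl to load each module with -M.

```shell
# Hypothetical helper: report whether each named Perl module can be loaded.
# Returns non-zero if any module is missing.
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

check_perl_modules DB_File URI::Escape Set::IntervalTree \
                   Carp::Assert JSON::XS PerlIO::gzip \
    || echo "Install the modules marked MISSING before running STAR-Fusion."
```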

If you plan on using the included FusionInspector for 'inspect' or 'validate' modes, please install the FusionInspector dependencies.

Computing / Hardware Requirements and Execution Times

Memory requirements

If you're planning to run STAR to align reads to the human genome, then you'll need ~30G RAM. If you've already run STAR and are just planning on running STAR-Fusion on the existing STAR outputs, then only modest resources are required and it should run on any commodity hardware.

When the '--FusionInspector validate' mode is used, memory requirements can increase to 40G or 50G. If '--FusionInspector inspect' mode is used, additional RAM should generally not be required.
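
As a rough pre-flight check against the figures above, the sketch below reads total RAM from /proc/meminfo. It assumes a Linux host; the helper names total_ram_gb and have_ram are hypothetical, and the value is truncated to whole GiB, so a machine sold as "32G" may report 31.

```shell
# Hypothetical helper: print total RAM in whole GiB, read from a
# /proc/meminfo-style file (Linux only; MemTotal is reported in kB).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}" 2>/dev/null
}

# Hypothetical helper: succeed if total RAM is at least NEEDED_GB.
# Usage: have_ram NEEDED_GB [MEMINFO_FILE]
have_ram() {
    total=$(total_ram_gb "${2:-/proc/meminfo}")
    [ -n "$total" ] && [ "$total" -ge "$1" ]
}

# Thresholds taken from the figures above: ~30G for STAR alignment,
# up to ~50G when '--FusionInspector validate' is used.
for need in 30 50; do
    if have_ram "$need"; then
        echo "At least ${need}G RAM detected."
    else
        echo "Less than ${need}G RAM detected (or /proc/meminfo unavailable)."
    fi
done
```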

Execution times

Execution times are largely determined by how long it takes STAR to align reads. The fusion-finding component generally takes minutes on large samples. If '--FusionInspector validate' mode is used, expect the total execution time to roughly double, as STAR must perform an additional full alignment of the reads in FusionInspector mode.

Data Resources Required:

A reference genome and corresponding protein-coding gene annotation set, including blast-matching gene pairs, must be provided to STAR-Fusion. We provide several alternative resources for human fusion transcript detection, depending on whether you want to use the GRCh37 or GRCh38 human reference genome and the corresponding Gencode annotation set. Options are available here: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. Choose one; below we refer to it as 'CTAT_resource_lib.tar.gz'. The 'plug-n-play' libs are just that: download one and unpack it (tar -zxvf filename.tar.gz).

Use the plug-n-play library if possible. Building from the source archive takes hours to days and is available only as a last resort.
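
The unpacking step can be wrapped as below. This is a sketch under assumptions: unpack_genome_lib is a hypothetical helper, 'CTAT_resource_lib.tar.gz' is the placeholder name used on this page rather than a real download name, and nothing is assumed about the directory layout inside the archive beyond it having a single top-level directory.

```shell
# Hypothetical helper: unpack a genome lib archive into DEST (default: current
# directory) and print the path of the unpacked top-level directory.
# Usage: unpack_genome_lib ARCHIVE [DEST]
unpack_genome_lib() {
    archive=$1
    dest=${2:-.}
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # Top-level directory = first path component of the first archive entry.
    top=$(tar -tzf "$archive" | awk -F/ '{ sub(/^\.\//, "") } $1 != "" { print $1; exit }')
    tar -xzf "$archive" -C "$dest" || return 1
    echo "$dest/$top"
}

# 'CTAT_resource_lib.tar.gz' is the placeholder name used on this page.
if [ -f CTAT_resource_lib.tar.gz ]; then
    lib_dir=$(unpack_genome_lib CTAT_resource_lib.tar.gz) \
        && echo "Genome lib unpacked to: $lib_dir"
fi
```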

The latest release of STAR-Fusion will be compatible with the currently available version of the CTAT resource genome lib. For older versions of STAR-Fusion, see the STAR-Fusion release and CTAT Genome Lib Compatibility Matrix.

If you're looking to apply STAR-Fusion using a different target, you'll need to generate the required resources as described by our ctat-genome-lib-builder resource builder. The ctat-genome-lib-builder comes included in the STAR-Fusion software.

Preparing the genome resource lib (Only if not using plug-n-play)

Preferred: if you downloaded the large (30G) 'plug-n-play' resource lib, then just unpack the tar.gz archive and use it directly.

Otherwise, if you downloaded the much smaller (~4G) unprocessed resource lib, then you'll need to prep it for use with STAR-Fusion as follows:

 (only if building from source data archive - note plug-n-play is preferred!)   

 %  tar xvf CTAT_resource_lib.tar.gz

 %  cd CTAT_resource_lib/

 %  $STAR_FUSION_HOME/ctat-genome-lib-builder/prep_genome_lib.pl \
                         --genome_fa ref_genome.fa \
                         --gtf gencode.*.annotation.gtf \
                         --fusion_annot_lib fusion_lib.*.dat.gz \
                         --annot_filter_rule AnnotFilterRule.pm \
                         --pfam_db current \
                         --dfam_db human \
                         --human_gencode_filter
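
The command above passes shell glob patterns for the annotation and fusion library files, so it quietly depends on each pattern matching exactly one file. A small guard like the following can be run first from inside the unpacked resource lib directory. It is a sketch: expect_one is a hypothetical helper, and it assumes file names without spaces.

```shell
# Hypothetical helper: expand a glob pattern and succeed only if it matches
# exactly one existing file, printing that file's name.
expect_one() {
    # Left unquoted on purpose so the shell expands the pattern.
    set -- $1
    if [ "$#" -eq 1 ] && [ -e "$1" ]; then
        echo "$1"
    else
        echo "pattern did not match exactly one file: $*" >&2
        return 1
    fi
}

# Run from inside the unpacked resource lib directory.
for pattern in 'gencode.*.annotation.gtf' 'fusion_lib.*.dat.gz'; do
    expect_one "$pattern" || echo "check '$pattern' before running prep_genome_lib.pl"
done
```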

When building the human ctat genome lib with the --human_gencode_filter, certain reference annotation and genome sequence modifications are performed to facilitate the identification of certain more challenging fusions (usually involving IGH). These differences are described here.

Note, the above builder has a number of additional software requirements, including blast and hmmer, among others. See the ctat-genome-lib-builder wiki for full installation details. Using our Docker or Singularity images for this step is easiest and preferred if you're planning to go this route. For example, if you have Singularity installed, you can use the Singularity image we provide on our release downloads page and run it like so:

 (only if building from source data archive - note plug-n-play is preferred!)       

% singularity exec -e star-fusion.simg \
   /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl \
      --genome_fa genome.primary.fa \
      --gtf gencode.*.annotation.gtf \
      --fusion_annot_lib fusion_lib.*.dat.gz \
      --annot_filter_rule AnnotFilterRule.pm \
      --pfam_db current \
      --dfam_db human \
      --human_gencode_filter  # include only if human data

Once the build process completes successfully, you can point STAR-Fusion at the resulting genome lib like so:

   STAR-Fusion --genome_lib_dir /path/to/your/CTAT_resource_lib   ...