HowToSetUpJAFFA - Oshlack/JAFFA GitHub Wiki

In this wiki we describe how to install JAFFA and give some basic instructions to start running it. JAFFA is designed to be run on the bash command-line in Linux, or within a container. Having an understanding of bash (and R) would be useful to understand what the pipeline is doing, but isn't essential.

Installing

Container installation

Docker and Apptainer are container runtimes which can provide a consistent, stable, and reproducible environment to execute code in. The benefit to the user is that what you get is exactly the same as what we have, down to the exact structure of the filesystem.

JAFFA has an official container image at davidsongroup/jaffa. The container image bundles all dependencies, so all you will need is a container platform such as Docker or Apptainer (formerly Singularity). As an added bonus, everything is prebuilt, so all you need to do is wait for a ~700MB compressed image download. This image has the same functionality as the regular version of JAFFA.

The JAFFA image does not bundle reference files. You will need to download the reference files and extract them into any directory, let's say $REF_DIR. JAFFA expects references to be found under /ref/, so you will need to bind this. All pipeline files (JAFFA_direct.groovy, JAFFA_assembly.groovy, JAFFA_hybrid.groovy, JAFFAL.groovy) can be found under the /JAFFA/ directory.

To run JAFFA in a container:

REF_DIR=/path/to/references
INPUT_PATH=/path/to/input

# native command, after installing and configuring the reference directory
JAFFA_PATH=/path/to/jaffa
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFA_direct.groovy $INPUT_PATH/*.fasta <parameters>

# Docker command (Docker bind mounts use -v host_path:container_path)
docker run                                \
    -v $REF_DIR:/ref                      \
    -v $INPUT_PATH:$INPUT_PATH            \
    davidsongroup/jaffa:latest            \
    /JAFFA/JAFFA_direct.groovy            \
    $INPUT_PATH/*.fasta                   \
    <parameters>

# Apptainer/Singularity command
apptainer run                             \
    -B $REF_DIR:/ref                      \
    -B $INPUT_PATH                        \
    docker://davidsongroup/jaffa:latest   \
    /JAFFA/JAFFA_direct.groovy            \
    $INPUT_PATH/*.fasta                   \
    <parameters>

Warning

Docker and Apptainer do not typically resolve symbolic links, and a binding whose path passes through a symbolic link may silently fail. Ensure that no folder you bind is under a symbolic link, and that your current directory is not under one either. If you aren't sure, run cd "$(readlink -f .)" to switch to the fully resolved path.
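The check described in the warning can be sketched as a small shell helper. This is a minimal sketch, assuming GNU readlink (as found on most Linux systems); resolve_dir is a hypothetical name used for illustration and is not part of JAFFA.

```shell
# resolve_dir is a hypothetical helper, not part of JAFFA.
# It prints the canonical form of a path with every symbolic link resolved.
resolve_dir() {
    readlink -f "$1"
}

# Move the shell to the physical location of the current directory,
# so that the working directory is not under a symbolic link.
cd "$(resolve_dir .)" || echo "could not resolve the current directory" >&2
echo "Working directory: $PWD"
```

The same helper could be applied to the directories you intend to bind before starting the container, for example REF_DIR=$(resolve_dir "$REF_DIR").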

Native installation

If you do not have access to Docker or Apptainer, or would prefer to download and install JAFFA the manual way, you can follow these steps:

  1. Download the JAFFA package and untar it: tar -zxvf JAFFA-version-X.XX.tar.gz (replacing X.XX with the version number)
  2. Download the JAFFA reference files and untar inside the JAFFA-version-X.XX directory: tar -zxvf JAFFA_REFERENCE_FILES......tar.gz
  3. For hg19 and mm10 reference files, follow the additional steps described here
  4. Before running JAFFA, several other programs must be installed. For this you will need gcc version >= 4.9 and wget. Run the following script in the directory where you installed JAFFA (only Linux is currently supported). When it has finished, check that all paths are filled in the file tools.groovy.
./install_linux64.sh
  5. If you don't already have it, you will need to install R. Note that the R package IRanges must be installed.
  6. If needed, configure the JAFFA pipeline options for your data. This is often not necessary, as JAFFA can work out of the box with default values. Changing the defaults can be done either by editing the JAFFA_stages.groovy file, or by passing the parameters to bpipe when you run JAFFA.
       • readLayout - change to "single" if you have single-end reads; otherwise paired-end is assumed.
       • genomeFasta - the path to the human genome. If you leave this unchanged it will default to the directory of the JAFFA package and use hg38.
       • fastqInputFormat - tells bpipe how to split files into samples and how to group read pairs. The default should work if your reads are named like SampleA_1.fastq.gz SampleA_2.fastq.gz SampleB_1.fastq.gz SampleB_2.fastq.gz etc. JAFFA will create one directory for each sample. If this does not happen in the way you expect, you might need to adjust this variable. See this bpipe doc page for more information. You may also need to change this parameter if your reads have the fq extension instead of fastq.
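To illustrate the naming convention that the default fastqInputFormat expects, the sketch below derives a sample name from each read file. It is only an approximation written for illustration: sample_name is a hypothetical helper, and JAFFA itself does this grouping through bpipe's pattern matching, not through this function.

```shell
# sample_name is a hypothetical helper for illustration only.
# It strips the directory, the .fastq.gz extension and the _1/_2 pair suffix.
sample_name() {
    local name
    name=$(basename "$1")
    name=${name%.fastq.gz}   # drop the extension
    name=${name%_[12]}       # drop the read-pair suffix
    printf '%s\n' "$name"
}

# List the distinct samples implied by a set of paired-end files.
for reads in SampleA_1.fastq.gz SampleA_2.fastq.gz SampleB_1.fastq.gz SampleB_2.fastq.gz; do
    sample_name "$reads"
done | sort -u
# prints:
# SampleA
# SampleB
```

Files that do not reduce to a sensible sample name under this convention are the ones most likely to need a different fastqInputFormat value.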

Input Type

The input to JAFFA should be either gzipped reads, i.e. files with an ending like ".fastq.gz", or an uncompressed fasta file of contigs with an ending like ".fasta". JAFFA assumes there is one file (single-end) or a pair of files (paired-end) per sample.
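A quick way to check inputs against these two endings before starting a run is sketched below. The function name input_kind and its labels are assumptions made for this example; other endings (such as fq.gz) are reported as "unrecognised" here only because they fall outside the two endings named above.

```shell
# input_kind is a hypothetical helper, not part of JAFFA.
# It labels a file by its ending: gzipped reads, fasta contigs, or neither.
input_kind() {
    case "$1" in
        *.fastq.gz) echo "reads" ;;
        *.fasta)    echo "contigs" ;;
        *)          echo "unrecognised" ;;
    esac
}

# Example: report the kind of each candidate input file name.
for candidate in SampleA_1.fastq.gz contigs.fasta SampleA_1.fastq; do
    printf '%s\t%s\n' "$candidate" "$(input_kind "$candidate")"
done
```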

Running

Create and change into the directory where you intend the output files of JAFFA to be placed. You then have a choice of four JAFFA running modes: Direct, Hybrid, Assembly and Long. Which mode to use will depend on your read length and error rate.

When to use which mode?

  • For low error rate sequencing with 100bp reads or longer (most common), we recommend the direct mode, JAFFA_direct.groovy. This would include Illumina sequencing as well as long assembled data.
  • For high error long reads, use the long mode, JAFFAL.groovy. This would include ONT or PacBio data.
  • For low error rate sequencing of 70-95bp, the hybrid mode is the most sensitive, JAFFA_hybrid.groovy. However, because it involves assembly, it requires a lot of memory and CPU time. If computational resources are a constraint, we recommend using the direct method.
  • For low error rate short reads of <70bp you can use the Assembly mode, JAFFA_assembly.groovy. Assembly may be useful if you are interested in the full transcript sequence of fusion genes as these will be reconstructed in this mode.
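The guidance above can be summarised as a small decision function. This is a sketch under stated assumptions: choose_mode is a hypothetical helper, the error rate is passed as the word "low" or "high", "high" is taken to mean noisy long reads, and read lengths of 96-99bp are reported as "unspecified" because the guidance does not cover them.

```shell
# choose_mode is a hypothetical helper, not part of JAFFA.
# Usage: choose_mode <read_length_bp> <low|high>
choose_mode() {
    local read_length=$1 error_rate=$2
    if [ "$error_rate" = "high" ]; then
        echo "JAFFAL.groovy"            # noisy long reads (ONT, PacBio)
    elif [ "$read_length" -ge 100 ]; then
        echo "JAFFA_direct.groovy"      # low error rate, 100bp or longer
    elif [ "$read_length" -ge 70 ] && [ "$read_length" -le 95 ]; then
        echo "JAFFA_hybrid.groovy"      # low error rate, 70-95bp
    elif [ "$read_length" -lt 70 ]; then
        echo "JAFFA_assembly.groovy"    # low error rate, shorter than 70bp
    else
        echo "unspecified"              # 96-99bp: not covered by the guidance
    fi
}

choose_mode 150 low    # prints JAFFA_direct.groovy
choose_mode 80 low     # prints JAFFA_hybrid.groovy
```

Note that the function always suggests hybrid mode for 70-95bp reads; if memory or CPU time is limited, the direct mode remains the recommended alternative for that range.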

Direct

JAFFA will map reads to the known reference transcriptome and extract reads which do not map. It will then search for fusions from amongst the unmapped reads.

# native runtime
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFA_direct.groovy $INPUT_PATH/*.fastq.gz

# container runtime
apptainer run                           \
  -B $REF_DIR:/ref                      \
  -B $INPUT_PATH                        \
  docker://davidsongroup/jaffa:latest   \
  /JAFFA/JAFFA_direct.groovy $INPUT_PATH/*.fastq.gz

In this mode, you can also search for fusions in pre-assembled transcriptomes by providing a fasta file as input. In this case, the step that filters for unmapped sequences is skipped.

# native runtime
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFA_direct.groovy $INPUT_PATH/*.fasta

# container runtime
apptainer run                           \
  -B $REF_DIR:/ref                      \
  -B $INPUT_PATH                        \
  docker://davidsongroup/jaffa:latest   \
  /JAFFA/JAFFA_direct.groovy $INPUT_PATH/*.fasta

Long (JAFFAL)

For noisy long reads such as ONT or PacBio data, use JAFFAL. It is similar in concept to the Direct pipeline, but uses the long-read aligner minimap2 to maximise sensitivity for fusion detection.

# native runtime
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFAL.groovy $INPUT_PATH/*.fastq.gz

# container runtime
apptainer run                           \
  -B $REF_DIR:/ref                      \
  -B $INPUT_PATH                        \
  docker://davidsongroup/jaffa:latest   \
  /JAFFA/JAFFAL.groovy $INPUT_PATH/*.fastq.gz

Unzipped .fasta files may also be provided to the pipeline.

Assembly

JAFFA will call Velvet and Oases to assemble the reads. It will then search for fusions from amongst the assembled contigs.

# native runtime
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFA_assembly.groovy $INPUT_PATH/*.fastq.gz

# container runtime
apptainer run                           \
  -B $REF_DIR:/ref                      \
  -B $INPUT_PATH                        \
  docker://davidsongroup/jaffa:latest   \
  /JAFFA/JAFFA_assembly.groovy $INPUT_PATH/*.fastq.gz

Hybrid

This is a combination of the Direct and Assembly modes. First JAFFA will call Velvet and Oases to assemble the reads. It will then search for fusions from amongst the assembled contigs. Next it will map reads to both the known reference transcriptome and the assembled transcriptome. It will then search for fusions from amongst the unmapped reads.

# native runtime
$JAFFA_PATH/tools/bin/bpipe run $JAFFA_PATH/JAFFA_hybrid.groovy $INPUT_PATH/*.fastq.gz

# container runtime
apptainer run                           \
  -B $REF_DIR:/ref                      \
  -B $INPUT_PATH                        \
  docker://davidsongroup/jaffa:latest   \
  /JAFFA/JAFFA_hybrid.groovy $INPUT_PATH/*.fastq.gz