Running barraCUDA

UPDATE: DO NOT USE THIS TO ALIGN--ALIGNMENTS HAVE UNSOLVED ISSUES. USE BWA-MEM INSTEAD.

The lab recently (Fall 2016) purchased a GPU server, Gustave, in order to run massively parallel jobs. The main intention was to speed up genome alignments and get them off of the other servers to free up CPU there. Because this speed-up comes from using the GPU rather than more CPU, there are a few kinks you have to get used to in order to run things efficiently.

What is on the server?

Gustave is a small server with 12 fast CPU cores and a Tesla K80 GPU. The K80 is a dual GPU card: it is essentially two GPUs, each with roughly 12 GB of RAM and about 2,500 CUDA cores (nearly 5,000 in total). Each CUDA core is individually much less powerful than a CPU core, but with so many of them we can run jobs much faster through massive parallelization. This is essentially a cheap, brute-force way of making something run fast that doesn't require exceptionally efficient code or the costly use of a few expensive CPU cores. Gustave uses the graphics processing unit for computation through Nvidia's CUDA GPGPU (general-purpose GPU) interface, taking advantage of hardware originally intended for complex graphics processing.

There are two hard drives on Gustave, /home and /storage, each 1 TB, and the main drive on Ohta is also mounted as /ohta. Due to limited space on the server, it is recommended that you either put raw data on the /storage drive and write alignments to /home, making sure to remove both raw data and alignments when your jobs are done, or keep everything on Ohta and simply use the full path to your data.
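Before launching a job, it can be useful to check what the two GPUs are doing, for example to pick an idle device for the -C option described below. Nvidia's standard monitoring tool does this (this assumes the usual Nvidia driver utilities are installed alongside CUDA on Gustave):

nvidia-smi

This lists the two halves of the K80 as devices 0 and 1, along with their memory use and any processes currently running on them.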

barraCUDA

As we mostly intended this server to be used for speeding up alignments, one of the most immediate uses will be running the barraCUDA aligner. BarraCUDA is an aligner based on the Burrows-Wheeler transform, making it essentially equivalent to bwa, but written for the CUDA interface so it runs several times faster with fewer CPU resources. As of this writing, barraCUDA seems to be the most accurate GPU aligner available.

Running barraCUDA

BarraCUDA is fairly simple to run in a few steps. The first step is to index your reference fasta:

barracuda index /path/to/fasta

This will output several index files in the same directory as your fasta, and only needs to be run once for all alignments as long as you keep those files. This step is very fast.
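For example, indexing the Capsella reference used in the sample script further down (assuming, as there, that it lives in ~/RawData):

barracuda index ~/RawData/Capsella_rubella_v1.0_combined.fasta

The index files are written alongside the fasta, so later aln calls just point at the original fasta path.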

The next step is to align your raw reads to this reference fasta with

barracuda aln -C 1 /path/to/fasta /path/to/raw/data.fq > /path/to/output.sai

The option -C here tells barraCUDA which GPU to use; it is not required, but it can come in handy (Gustave's GPUs are 0 and 1). If your raw data is paired-end, you have to run the aln function on both fastq files for each sample.
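For a paired-end sample, that means two aln calls, one per fastq. On Gustave you can point each mate at a different GPU so both run at once; a minimal sketch, assuming data.1.fq and data.2.fq are the two mate files:

barracuda aln -C 0 /path/to/fasta /path/to/raw/data.1.fq > /path/to/output.1.sai
barracuda aln -C 1 /path/to/fasta /path/to/raw/data.2.fq > /path/to/output.2.sai

This is exactly what the sample script at the bottom of this page does, using the shell's & and wait to run the two calls in parallel.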

The next step takes your alignment and converts it into the standard sam format, and differs for paired- and single-end reads. For single-end reads:

barracuda samse -t 4 /path/to/fasta /path/to/output.sai /path/to/raw/data.fq > /path/to/output.sam

For paired end reads:

barracuda sampe -t 4 /path/to/fasta /path/to/output.1.sai /path/to/output.2.sai /path/to/raw/data.1.fq /path/to/raw/data.2.fq > /path/to/output.sam

Note that the option -t denotes the number of CPU threads to use (this portion does not use the GPU); in this case I'm using 4 cores.
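Since the sam output goes to stdout, one way to avoid keeping a large intermediate sam on disk is to pipe it straight into samtools (a sketch, assuming samtools is installed on Gustave, as the sample script below suggests it is):

barracuda sampe -t 4 /path/to/fasta /path/to/output.1.sai /path/to/output.2.sai /path/to/raw/data.1.fq /path/to/raw/data.2.fq | samtools view -Sb - > /path/to/output.bam

The sample setup below instead writes the sam, converts it, and then deletes it, which amounts to the same thing.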

barraCUDA speed

For the aln step, I'm seeing speeds between 15,000 and 28,000 reads per second, or nearly 2.5 million bp per second. For paired-end reads, using both GPUs and 4 CPU cores to make the sam, it takes roughly 45 minutes to 1 hour to finish a sample.

Sample setup

While the above should give you a sense of how to run the aligner, below is a sample script that I use, which takes advantage of both GPUs and keeps clutter off of Gustave:

#!/bin/bash

barracuda index ~/RawData/Capsella_rubella_v1.0_combined.fasta #rubella ref

while read sample
do
# decompress both mates at once, waiting for both to finish
gunzip /storage/tyler.kent/${sample}.1.fq.gz &
gunzip /storage/tyler.kent/${sample}.2.fq.gz &
wait

# align each mate on its own GPU, in parallel
barracuda aln -C 1 ~/RawData/Capsella_rubella_v1.0_combined.fasta /storage/tyler.kent/${sample}.1.fq > ~/RawData/${sample}.1.sai 2>>aln.err &
barracuda aln -C 0 ~/RawData/Capsella_rubella_v1.0_combined.fasta /storage/tyler.kent/${sample}.2.fq > ~/RawData/${sample}.2.sai 2>>aln2.err &
wait

# combine the two .sai files into a single sam using 4 CPU threads
barracuda sampe -t 4 ~/RawData/Capsella_rubella_v1.0_combined.fasta ~/RawData/${sample}.1.sai ~/RawData/${sample}.2.sai /storage/tyler.kent/${sample}.1.fq /storage/tyler.kent/${sample}.2.fq > ~/Alignments/${sample}.sam

# convert to bam and remove the bulky sam
samtools view -Sb ~/Alignments/${sample}.sam > ~/Alignments/${sample}.bam
rm ~/Alignments/${sample}.sam

# recompress the raw data and the finished bam
gzip /storage/tyler.kent/${sample}.1.fq &
gzip /storage/tyler.kent/${sample}.2.fq &
wait
gzip ~/Alignments/${sample}.bam

done < ~/RawData/fastq/names.txt

This script reads a text file with one sample name per line and aligns one sample at a time. The & and wait commands mark the places where I'm running two commands at once and waiting until both finish before moving to the next step. One addition that should be made is to ensure all data and alignments are either read from and written to ohta, or moved to ohta within the loop (see the sketch below). With many samples, it is easy to accidentally fill Gustave, in which case alignments can finish without actually writing any data.
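A minimal sketch of that addition, assuming a hypothetical destination directory /ohta/your_username/Alignments/ already exists and that the bam has been gzipped as in the script above; this line would go just before the done:

mv ~/Alignments/${sample}.bam.gz /ohta/your_username/Alignments/

Moving each finished alignment off Gustave at the end of the loop iteration keeps /home from filling up as the samples accumulate.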