Step 3b: Transcriptome Alignment (BWA, Bowtie2) - srkoppolu/SK_RNA-Seq GitHub Wiki

For aligning reads to a reference transcriptome, which comprises RNA transcripts without introns, unspliced aligners such as BWA, Bowtie2 or MAQ are sufficient. Unspliced aligners do not allow large gaps in the reads, which makes them an appropriate choice for mapping against reference transcriptomes. This type of alignment is, however, limited to the identification of known exons and junctions. Bowtie2 aligns reads quickly and accurately and is also used for ChIP-Seq data. BWA is generally slower than Bowtie2 with similar sensitivity, but it is slightly more accurate and reports a mapping quality for each alignment, which indicates which alignments are trustworthy. This makes it the preferred mapper for variant calling applications, where accuracy is paramount.

The most popular aligners for transcriptome-based alignment include BWA, Bowtie2, MAQ and NovoAlign. For a complete list of unspliced aligners, click here.


BWA

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads. [source]

Depending on read length, BWA has different modes optimized for different sequence lengths:

  • BWA-backtrack: designed for Illumina sequence reads up to 100bp; runs in three steps (bwa index, bwa aln, then bwa samse or bwa sampe)

  • BWA-SW: designed for longer sequences ranging from 70bp to 1Mbp, long-read support and split alignment

  • BWA-MEM: shares similar features to BWA-SW, but faster and more accurate.
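The guidance above can be sketched as a small shell helper. The function name `choose_bwa_mode` and the 70bp cutoff are illustrative assumptions taken from the read-length ranges listed here; they are not BWA settings.

```shell
# Hypothetical helper: suggest a BWA algorithm from the read length (in bp).
# The 70 bp cutoff is a rule of thumb derived from the list above.
choose_bwa_mode() {
    read_len=$1
    if [ "$read_len" -lt 70 ]; then
        echo "backtrack"   # bwa aln followed by bwa samse/sampe
    else
        echo "mem"         # bwa mem
    fi
}

choose_bwa_mode 50    # prints: backtrack
choose_bwa_mode 100   # prints: mem
```

Reads of 70bp or longer are routed to BWA-MEM because it is reported to outperform BWA-backtrack in the 70-100bp range.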

To download and install BWA:

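tar xvfj bwa-0.7.17.tar.bz2
cd bwa-0.7.17
make

# Add the BWA directory to PATH permanently, then reload the shell configuration
echo 'export PATH=$PATH:/path/to/bwa-0.7.17' >> ~/.bashrc

source ~/.bashrc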
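After installation it can help to confirm that the shell actually finds the executable. The following is a minimal sketch; `require_tool` is a hypothetical helper name, and it relies only on the standard `command -v` builtin.

```shell
# Hypothetical helper: report whether a tool is reachable through PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

require_tool bwa || echo "add the bwa-0.7.17 directory to PATH and retry"
```

The same helper can be called with `bowtie2` once that tool is installed.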


Indexing:

As with the genome alignment tools, the first step in BWA alignment is to create an index of the transcriptome. The transcriptome FASTA file used for other tools such as Kallisto can be reused here, but index files are tool-specific, so a Kallisto index cannot be loaded by BWA.

To create an index for the reference transcriptome using bwa:

bwa index -a bwtsw indices/transcripts.fa
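`bwa index` writes five files next to the FASTA file, with the extensions `.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. A minimal sketch for checking that indexing finished, assuming the `indices/transcripts.fa` path used above (`bwa_index_ready` is a hypothetical helper name):

```shell
# Hypothetical helper: check that the five files written by `bwa index`
# exist and are non-empty for the given FASTA prefix.
bwa_index_ready() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing $prefix.$ext" >&2
            return 1
        fi
    done
    echo "index complete for $prefix"
}

bwa_index_ready indices/transcripts.fa || echo "run bwa index first"
```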


Alignment:

After generating (or downloading) the transcriptome index, we need to align the paired-end sequences to the indexed transcriptome. Briefly, the algorithm works by seeding alignments with Maximal Exact Matches (MEMs) and then extending the seeds with the affine-gap Smith-Waterman (SW) algorithm. For aligning with BWA-MEM, use the following command:

bwa mem -t 16 indices/transcripts.fa sample_1_sortmerna_trimmomatic_1.fq.gz sample_1_sortmerna_trimmomatic_2.fq.gz | gzip -3 > sample_1_sortmerna_trimmomatic_BWAmem_sam.gz
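Each alignment in the SAM output carries a mapping quality (MAPQ) in column 5, which is how BWA signals how trustworthy an alignment is. The sketch below counts alignments at or above a threshold in the gzipped SAM file. The helper name `count_mapq_at_least` and the threshold of 30 are illustrative assumptions; in a transcriptome, reads shared between isoforms tend to receive low MAPQ, so a suitable cutoff depends on the analysis.

```shell
# Hypothetical helper: count alignments in a gzipped SAM file whose MAPQ
# (column 5) is at least the given threshold. Header lines start with "@".
count_mapq_at_least() {
    sam_gz=$1
    min_mapq=$2
    gzip -dc "$sam_gz" |
        awk -F '\t' -v q="$min_mapq" '!/^@/ && $5 >= q { n++ } END { print n + 0 }'
}

# Only runs when the alignment output from the command above is present.
if [ -f sample_1_sortmerna_trimmomatic_BWAmem_sam.gz ]; then
    count_mapq_at_least sample_1_sortmerna_trimmomatic_BWAmem_sam.gz 30
fi
```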

Note: minimap2 has replaced BWA-MEM for PacBio and Nanopore read alignment. It retains all major BWA-MEM features, but is ~50 times as fast, more versatile, more accurate and produces better base-level alignment. A beta version of BWA-MEM2 has been released for short-read mapping. BWA-MEM2 produces alignment identical to bwa-mem and is ~80% faster.


Bowtie2

To download and install Bowtie2:

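unzip bowtie2-2.3.5.1-linux-x86_64.zip
cd bowtie2-2.3.5.1-linux-x86_64

# The linux-x86_64 archive ships precompiled binaries, so no make step is needed.
# Add the Bowtie2 directory to PATH permanently, then reload the shell configuration
echo 'export PATH=$PATH:/path/to/bowtie2-2.3.5.1-linux-x86_64' >> ~/.bashrc

source ~/.bashrc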


Indexing:

To create a Bowtie2 index for the reference transcriptome, run:

bowtie2-build indices/transcripts.fa transcripts

You can use bowtie2-build to create an index for a set of FASTA files obtained from any source, including sites such as UCSC, NCBI, and Ensembl. When indexing multiple FASTA files, list all of them as a single comma-separated argument. You can also bypass this step by obtaining a pre-built Bowtie2 index. Index files are tool-specific, so a Kallisto index cannot be used here, although the same transcriptome FASTA file can be indexed by both tools.
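When several FASTA files are indexed together, the comma-separated argument can be assembled in the shell. This is a minimal sketch: `join_by_comma` is a hypothetical helper, and `cdna.fa` and `ncrna.fa` are placeholder file names. The command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical helper: join file names with commas, the form bowtie2-build
# expects when several FASTA files are indexed together.
join_by_comma() {
    ( IFS=,; echo "$*" )   # "$*" joins arguments with the first character of IFS
}

fasta_list=$(join_by_comma cdna.fa ncrna.fa)   # placeholder file names
echo "bowtie2-build $fasta_list transcripts"
# prints: bowtie2-build cdna.fa,ncrna.fa transcripts
```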

Alignment:

To align the paired-end reads against the transcriptome index, stay in the same directory and run:

bowtie2 -x transcripts -1 sample_1_sortmerna_trimmomatic_1.fq.gz -2 sample_1_sortmerna_trimmomatic_2.fq.gz -S sample_1_sortmerna_trimmomatic_bowtie2.sam
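For more than one sample, the command can be generated from the sample name. The sketch below only prints the commands (a dry run). It assumes the `<sample>_sortmerna_trimmomatic_{1,2}.fq.gz` naming used above; `bowtie2_cmd`, the `_bowtie2.sam` output suffix and `sample_2` are illustrative assumptions, and `-p` sets the number of Bowtie2 threads.

```shell
# Hypothetical helper: print the Bowtie2 command for one paired-end sample.
# Arguments: sample name, index prefix, number of threads.
bowtie2_cmd() {
    sample=$1
    index=$2
    threads=$3
    base="${sample}_sortmerna_trimmomatic"
    echo "bowtie2 -p $threads -x $index -1 ${base}_1.fq.gz -2 ${base}_2.fq.gz -S ${base}_bowtie2.sam"
}

for s in sample_1 sample_2; do   # sample_2 is a hypothetical second sample
    bowtie2_cmd "$s" transcripts 16
done
```

Printing the commands first makes it easy to review them before running them.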

For more information, click here.