Selective sequence subtraction using bowtie2 in prep for Trinity assembly - trinityrnaseq/BerlinTrinityWorkshop2018 GitHub Wiki

If your reads come from multiple species and you want to keep only the reads that map (or do not map) to one of them, you can align the reads to a target with bowtie2 and capture those that align (or fail to align).

First, build a bowtie2 index for your target sequences. The target can be a genome FASTA file, a reference transcriptome FASTA file, or a combination of both. Because bowtie2 is not splice-aware, reads that span exon junctions may not align well to a genome alone, so a combined genome and reference transcriptome target is the better choice for RNA-seq reads.
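A combined target can be made by concatenating the FASTA files. The following is a minimal sketch, not a prescribed procedure: the input names `genome.fa` and `transcripts.fa` are hypothetical, and tiny toy files are created so the snippet runs on its own. It also lists any sequence name that occurs more than once, since repeated names would make the alignment records hard to interpret.

```shell
# Toy inputs standing in for real files (hypothetical names).
workdir=$(mktemp -d)
printf '>chr1\nACGTACGTAC\n' > "$workdir/genome.fa"
printf '>tx1\nACGTAC\n>tx2\nGTACGT\n' > "$workdir/transcripts.fa"

# Concatenate genome and transcriptome into one target FASTA.
# Each input must end with a newline, or adjacent records get fused.
cat "$workdir/genome.fa" "$workdir/transcripts.fa" > "$workdir/target.genome.fa"

# Count sequences and collect any sequence name that appears more than once.
n_seqs=$(grep -c '^>' "$workdir/target.genome.fa")
dup_names=$(grep '^>' "$workdir/target.genome.fa" | cut -d' ' -f1 | sort | uniq -d)

echo "sequences in target: $n_seqs"
if [ -n "$dup_names" ]; then
    echo "duplicate sequence names found:" >&2
    echo "$dup_names" >&2
fi
```

With real data, replace the toy files with your own genome and transcriptome FASTA files and keep only the `cat` and the checks.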

Given a 'target.genome.fa' file that contains target sequences, build the bowtie2 index like so:

%  bowtie2-build target.genome.fa target.genome.fa
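Before aligning, it can help to confirm that the index files are in place. A small sketch, assuming a standard (small) index, which bowtie2-build writes as six files next to the given prefix; `have_bt2_index` is a hypothetical helper name, and large indexes (suffix `.bt2l`) are not handled here.

```shell
# Return success only if all six small-index files exist for the given prefix.
# Large indexes use the .bt2l suffix instead and are not checked here.
have_bt2_index() {
    prefix=$1
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        [ -f "$prefix.$suffix" ] || return 1
    done
    return 0
}

if have_bt2_index target.genome.fa; then
    echo "index found"
else
    echo "index missing: run bowtie2-build first"
fi
```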

The approach below is the simplest one: it ignores read pairing and passes both read files to bowtie2 as unpaired input (-U).

Selective capture

Next, run bowtie2 to align the reads. To capture the reads that align to your target sequences, run:

%  bowtie2 --threads 4 --local --no-unal -x target.genome.fa -q -k 1 \
   --al aligned_reads.fastq \
   -U reads_1.fastq,reads_2.fastq > bowtie2.sam

Selective depletion

Alternatively, if you want to capture the unaligned reads, you would run:

%  bowtie2 --threads 4 --local --no-unal -x target.genome.fa -q -k 1 \
   --un unaligned_reads.fastq \
   -U reads_1.fastq,reads_2.fastq > bowtie2.sam
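After either run, a quick read count shows how much of the input was kept. The sketch below assumes plain four-line FASTQ records; `count_fastq_reads` and `kept_reads.fastq` are hypothetical names, and toy files are created so the snippet runs without bowtie2. The result can be compared with the alignment summary that bowtie2 prints to standard error.

```shell
# Number of records in a FASTQ file, assuming exactly four lines per record.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Toy FASTQ files standing in for the real input and output (hypothetical names).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$workdir/reads_1.fastq"
printf '@r1\nTTAA\n+\nIIII\n@r2\nCCGG\n+\nIIII\n' > "$workdir/reads_2.fastq"
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/kept_reads.fastq"

# Both input files were given to -U, so the total is the sum of the two.
total=$(( $(count_fastq_reads "$workdir/reads_1.fastq") + $(count_fastq_reads "$workdir/reads_2.fastq") ))
kept=$(count_fastq_reads "$workdir/kept_reads.fastq")
echo "kept $kept of $total reads"
```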

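Running Trinity

Assemble the retained reads as single-end data, passing whichever FASTQ file you wrote above (aligned or unaligned reads) to --single:

%  Trinity --seqType fq --CPU 6 \
   --max_memory 10G --single aligned_or_unaligned_reads.fastq \
   2>&1 | tee trinity.log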