3. RNA assembly - Sara-SL/GenomeAnalysis GitHub Wiki

Method

To assemble the RNA data I planned to use the software Trinity. Since I had a reference genome, I wanted to run a genome-guided transcriptome assembly. For this I first needed to run Tophat to generate a BAM file, and before that I needed to build a Bowtie2 index.

Bowtie2

To generate a bowtie index file I created a batch script containing the following command:

bowtie2-build -f /home/sarasl/git/GenomeAnalysis/Data/scaffold/sel4_NW_015503979.fna bowtie2_index

Here the first parameter is the reference genome and the second parameter is the prefix of the output index files.
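As a rough illustration of what the prefix means in practice: for a genome of ordinary size, bowtie2-build writes six files named after the prefix (`.1.bt2` to `.4.bt2`, plus `.rev.1.bt2` and `.rev.2.bt2`). The helper below is a hypothetical check, not part of the original batch script, that tests whether all six are present before moving on to Tophat. It assumes the small-index naming; very large genomes get a `.bt2l` extension instead.

```shell
#!/bin/sh
# Hypothetical helper (not from the original batch script):
# return 0 only if all six small-index files exist for the given prefix.
index_complete() {
    for part in 1 2 3 4 rev.1 rev.2; do
        [ -e "$1.$part.bt2" ] || return 1
    done
    return 0
}

# Check the prefix used in the bowtie2-build command above.
if index_complete bowtie2_index; then
    echo "bowtie2_index looks complete"
else
    echo "bowtie2_index is missing files; rerun bowtie2-build"
fi
```

Running this in the directory where the index was built gives a quick sanity check before submitting the next job.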

Tophat

Once I had the index files I could run Tophat. I created a batch script containing a command of the following format:

tophat /path/bowtie2_index /path/read1_1P.fq.gz,/path/read2_1P.fq.gz...,/path/read1_1U.fq.gz,/path/read2_1U.fq.gz...,/path/read1_2U.fq.gz,/path/read2_2U.fq.gz... /path/read1_2P.fq.gz,/path/read2_2P.fq.gz

Here, for example, read1_1P.fq.gz represents FL_CS15_13.trim_1P.fastq.gz, read2_1P.fq.gz represents FL_CS15_16.trim_1P.fastq.gz, and so on.
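Typing the comma-separated lists by hand is error-prone, so here is a minimal sketch of how they could be built from sample names. It only mirrors the layout of the command format above and prints the command rather than running it. The function name `read_list` is hypothetical, `/path` is the same placeholder as above, and the `1U`, `2U` and `2P` file names are assumed to follow the same pattern as the two `1P` names given.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated list of read files
# for one suffix (1P, 1U, 2U or 2P) from a list of sample names.
read_list() {
    suffix="$1"; shift
    list=""
    for s in "$@"; do
        # Append with a comma only when the list is already non-empty.
        list="${list:+$list,}/path/${s}.trim_${suffix}.fastq.gz"
    done
    echo "$list"
}

# Only the two samples named in the text; add the rest in the same way.
set -- FL_CS15_13 FL_CS15_16

first="$(read_list 1P "$@"),$(read_list 1U "$@"),$(read_list 2U "$@")"
second="$(read_list 2P "$@")"

# Dry run: print the Tophat command instead of executing it.
echo tophat /path/bowtie2_index "$first" "$second"
```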

Trinity

After running Tophat I could run Trinity by creating a batch script containing the following command:

Trinity --genome_guided_bam /home/sarasl/git/GenomeAnalysis/3_rna_assembly/tophat_out/accepted_hits.bam --max_memory 12G --genome_guided_max_intron 10000 --CPU 2

I used --max_memory 12G since I used 2 cores and each core has around 6 GB of memory.
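The memory setting in the Trinity command follows from the number of cores multiplied by the memory per core. A small sketch of that arithmetic, with the 6 GB per core taken as the approximate figure mentioned above (the variable names are illustrative only):

```shell
#!/bin/sh
cores=2          # matches --CPU 2 in the Trinity command
gb_per_core=6    # approximate memory per core, as stated in the text

# 2 cores x 6 GB gives the value passed to --max_memory.
max_memory="$((cores * gb_per_core))G"

# Dry run: print the options rather than starting Trinity.
echo "--max_memory $max_memory --CPU $cores"
```

Deriving the value this way keeps the two options consistent if the number of cores requested for the job changes.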

Evaluation

To evaluate my RNA assembly I ran Nucmer and MUMmerplot by creating a batch job containing the following commands:

nucmer -p Nucmer_rna_out /home/sarasl/git/GenomeAnalysis/Data/scaffold/sel4_NW_015503979.fna /home/sarasl/git/GenomeAnalysis/3_rna_assembly/trinity_out_dir/Trinity-GG.fasta

mummerplot -p MUMmerplot_out --png --layout --filter /home/sarasl/git/GenomeAnalysis/3_rna_assembly/Evaluation/Nucmer_rna_out.delta
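The two commands are linked through the `-p` prefix: nucmer writes its alignments to `<prefix>.delta`, and that file is the input to mummerplot. The sketch below makes the link explicit with variables. The `run` wrapper and the `DRY_RUN` switch are hypothetical additions, so by default the commands are only printed; setting `DRY_RUN=0` would execute them.

```shell
#!/bin/sh
ref=/home/sarasl/git/GenomeAnalysis/Data/scaffold/sel4_NW_015503979.fna
asm=/home/sarasl/git/GenomeAnalysis/3_rna_assembly/trinity_out_dir/Trinity-GG.fasta

nucmer_prefix="Nucmer_rna_out"
delta="${nucmer_prefix}.delta"   # file written by nucmer -p <prefix>

# Hypothetical wrapper: print each command, run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run nucmer -p "$nucmer_prefix" "$ref" "$asm"
run mummerplot -p MUMmerplot_out --png --layout --filter "$delta"
```

This assumes both steps run in the same working directory, so the relative path to the delta file resolves.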

Result

Figure 1: MUMmerplot of RNA assembly

[Image: MUMmerplot_rna]

Figure 1 shows the result from MUMmerplot.

Discussion

As with the DNA assembly, a linear graph with slope 1 represents a good assembly. The plot in figure 1 is more or less linear, which implies a good assembly. It is not perfectly linear, but since RNA assembly is much more difficult than DNA assembly, I would say it is good enough.