3. RNA assembly - Sara-SL/GenomeAnalysis GitHub Wiki
Method
To assemble the RNA data I planned to use Trinity. Since I had a reference genome, I wanted to run a genome-guided transcriptome assembly. Trinity needs a BAM file of reads aligned to the genome for this, so I first had to run Tophat, and before that I had to build a Bowtie2 index of the reference for Tophat to align against.
Bowtie2
To generate the Bowtie2 index files I created a batch script containing the following command:
bowtie2-build -f /home/sarasl/git/GenomeAnalysis/Data/scaffold/sel4_NW_015503979.fna bowtie2_index
Here the first argument is the reference genome (the -f flag marks it as FASTA) and the second argument is the prefix of the output index files.
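Before starting Tophat it can help to confirm that the index was written completely. The sketch below is only an illustration: index_complete is a hypothetical helper, not part of Bowtie2. It relies on bowtie2-build writing six files per index, with the suffixes .1, .2, .3, .4, .rev.1 and .rev.2, ending in .bt2 for a small index or .bt2l for a large one.

```shell
#!/usr/bin/env bash
# index_complete PREFIX
# Hypothetical helper: succeeds only if all six Bowtie2 index files
# for PREFIX exist, as either a small (.bt2) or a large (.bt2l) index.
index_complete() {
    local prefix="$1"
    local ext suffix ok
    for ext in bt2 bt2l; do
        ok=1
        for suffix in 1 2 3 4 rev.1 rev.2; do
            # Any missing file marks this index type as incomplete.
            [ -f "${prefix}.${suffix}.${ext}" ] || ok=0
        done
        if [ "$ok" -eq 1 ]; then
            return 0
        fi
    done
    return 1
}

# Example use in a batch script (prefix taken from the command above):
#   index_complete bowtie2_index || { echo "index incomplete" >&2; exit 1; }
```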
Tophat
Once I had the index files I could run Tophat. I created a batch script containing a command with the following format:
tophat /path/bowtie2_index /path/read1_1P.fq.gz,/path/read2_1P.fq.gz...,/path/read1_1U.fq.gz,/path/read2_1U.fq.gz...,/path/read1_2U.fq.gz,/path/read2_2U.fq.gz... /path/read1_2P.fq.gz,/path/read2_2P.fq.gz
Here, for example, read1_1P.fq.gz represents FL_CS15_13.trim_1P.fastq.gz, read2_1P.fq.gz represents FL_CS15_16.trim_1P.fastq.gz, and so on.
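Typing the comma-separated read lists by hand is error-prone, so they can be built from the file names instead. This is a minimal sketch, not the script I actually ran: join_reads is a hypothetical helper, and the directory and glob pattern are placeholders. Sorting by name keeps the samples in the same order in the _1P and _2P lists, which Tophat needs in order to pair the mates correctly.

```shell
#!/usr/bin/env bash
# join_reads DIR PATTERN
# Hypothetical helper: prints every file in DIR whose name matches
# PATTERN as a single comma-separated line, sorted by file name.
join_reads() {
    local dir="$1" pattern="$2"
    find "$dir" -maxdepth 1 -type f -name "$pattern" \
        | LC_ALL=C sort \
        | paste -sd, -
}

# Example use (paths are placeholders, as in the command format above):
#   left="$(join_reads /path '*_1P.fastq.gz')"
#   right="$(join_reads /path '*_2P.fastq.gz')"
#   tophat /path/bowtie2_index "$left" "$right"
```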
Trinity
Once Tophat had finished I could run Trinity by creating a batch script containing the following command:
Trinity --genome_guided_bam /home/sarasl/git/GenomeAnalysis/3_rna_assembly/tophat_out/accepted_hits.bam --max_memory 12G --genome_guided_max_intron 10000 --CPU 2
I set --max_memory to 12G since I used 2 cores and each core has around 6 GB of memory.
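The memory limit is simply the number of cores times the memory available per core. As a small sketch, assuming roughly 6 GB per core as stated above (trinity_resources is a hypothetical helper name), the two resource options can be derived from the core count so they stay consistent if the job size changes:

```shell
#!/usr/bin/env bash
# trinity_resources CORES GB_PER_CORE
# Hypothetical helper: prints the Trinity resource options for a job
# with CORES cores, assuming GB_PER_CORE gigabytes of memory per core.
trinity_resources() {
    local cores="$1" gb_per_core="$2"
    local total_gb=$(( cores * gb_per_core ))
    echo "--max_memory ${total_gb}G --CPU ${cores}"
}

# Example use: 2 cores with about 6 GB each
#   trinity_resources 2 6
```

With 2 cores and 6 GB per core this prints `--max_memory 12G --CPU 2`, which matches the values in the Trinity command.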
Evaluation
To evaluate my RNA assembly I ran Nucmer and MUMmerplot by creating a batch job containing the following commands:
nucmer -p Nucmer_rna_out /home/sarasl/git/GenomeAnalysis/Data/scaffold/sel4_NW_015503979.fna /home/sarasl/git/GenomeAnalysis/3_rna_assembly/trinity_out_dir/Trinity-GG.fasta
mummerplot -p MUMmerplot_out --png --layout --filter /home/sarasl/git/GenomeAnalysis/3_rna_assembly/Evaluation/Nucmer_rna_out.delta
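A quick sanity check on the Nucmer output, before looking at the plot, is to count how many assembled transcripts aligned to the scaffold at all. The sketch below is not part of my original batch job; count_aligned_queries is a hypothetical helper. It relies on the delta format, in which every aligned reference/query pair is introduced by a header line of the form `>reference_id query_id reference_length query_length`.

```shell
#!/usr/bin/env bash
# count_aligned_queries DELTA_FILE
# Hypothetical helper: prints the number of distinct query sequences
# (here: Trinity transcripts) that have at least one alignment in a
# Nucmer delta file.
count_aligned_queries() {
    local delta="$1"
    awk '/^>/ { seen[$2] = 1 }
         END  { n = 0; for (q in seen) n++; print n }' "$delta"
}

# Example use (file name from the commands above):
#   count_aligned_queries Nucmer_rna_out.delta
```

Comparing this count with the number of sequences in Trinity-GG.fasta gives a rough idea of how much of the assembly is represented in the plot.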
Result
Figure 1: MUMmerplot of RNA assembly
Figure 1 shows the result from MUMmerplot.
Discussion
As with the DNA assembly, a linear graph with slope 1 represents a good assembly, since it means the assembled sequences align to the reference scaffold in the same order and orientation. The plot in Figure 1 is more or less linear, which implies a good assembly. It is not perfectly linear, but since RNA assembly is much more difficult than DNA assembly I would say it is good enough.