Align the experimental transcripts to the reference genome - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki

We will use GMAP, a splice-aware aligner suitable for long reads, to align the experimental datasets to the reference genome. A splice-aware aligner maps RNA-seq reads directly to the genome rather than only to the transcriptome, so each read is placed at its genomic coordinates, with introns appearing as gaps in the alignment. This makes it possible, for example, to detect new isoforms beyond the annotated ones (which constitute the transcriptome).

We chose this aligner for two main reasons: (1) GMAP was among the best aligners in our benchmark work, judged on accuracy and speed, and (2) it can write its output directly in GFF3 format, which is very handy for comparison with the reference annotation (next section).

First, we need to download the reference sequence (here, chromosome 4 of Drosophila melanogaster) from Ensembl Genomes (you can also use the assembly you built in the genome assembly part):

cd $reference
wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-36/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa.gz
gunzip -d Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa.gz
mv Drosophila_melanogaster.BDGP6.dna.chromosome.4.fa Dmel_chr4.fasta
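
Before indexing, it can be worth checking that the download and decompression worked. The following is a minimal sketch and not part of the original workshop material: the function name `fasta_stats` is hypothetical, and it assumes only a standard `awk`. It prints the number of sequences and the total number of bases in a FASTA file.

```shell
# fasta_stats: hypothetical helper (not a GMAP tool) that prints
# "<number of sequences> <total bases>" for the FASTA file given as $1.
fasta_stats() {
    awk '
        /^>/ { nseq++; next }          # header line: count one sequence
             { nbases += length($0) }  # sequence line: add its length
        END  { print nseq + 0, nbases + 0 }
    ' "$1"
}
```

For example, `fasta_stats Dmel_chr4.fasta` should report a single sequence, since the downloaded file holds only chromosome 4.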

Then, we need to build the GMAP hash index for the reference genome. This will create a subfolder, named after the index and containing the index files for your genome, inside the directory given with -D:

gmap_build -d <gmap_index_name> -D <path/to/index/> Dmel_chr4.fasta
# -d: name of the index (genome database) to create
# -D: directory in which the index will be stored
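
Since the alignment step needs the same index name and directory, a quick check that the index is where you expect can save a confusing error later. This is a small sketch under one assumption: that `gmap_build` stores the index in a subfolder named after the `-d` value inside the `-D` directory. The function name `gmap_index_exists` is hypothetical.

```shell
# gmap_index_exists: hypothetical helper (not a GMAP tool).
#   $1 = index name (the -d value), $2 = index directory (the -D value)
# Assumes the index lives in <index directory>/<index name>.
# Returns success (0) if that subfolder exists, failure otherwise.
gmap_index_exists() {
    index_name=$1
    index_dir=$2
    [ -d "$index_dir/$index_name" ]
}
```

It can be used as `gmap_index_exists <gmap_index_name> <path/to/index/> || echo "index not found"`.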

Finally, we can run the alignment:

gmap -d <gmap_index_name> -D </path/to/index/> -f gff3_match_cdna <input_reads> > <gmap_output>.gff3
# -d: name of the index built with gmap_build
# -D: directory in which the index is stored
# -f: output format, in our case gff3_match_cdna (GFF3 with cDNA_match features)
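
To get a first impression of the GFF3 file produced by the alignment, you can tally its records by feature type. The sketch below is not part of the original workshop material: `gff3_feature_counts` is a hypothetical helper that relies only on the standard nine-column, tab-separated GFF3 layout. With the `gff3_match_cdna` format one would expect the records to be of type `cDNA_match`.

```shell
# gff3_feature_counts: hypothetical helper that tallies GFF3 records by
# feature type (column 3) in the file given as $1.
gff3_feature_counts() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }   # skip directives, comments and incomplete lines
        { count[$3]++ }           # tally by feature type
        END { for (t in count) print t "\t" count[t] }
    ' "$1" | sort
}
```

Run it as `gff3_feature_counts <gmap_output>.gff3`; an empty result suggests that no reads were aligned.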

Repeat this step for both fastq files: the PacBio Iso-Seq reads and the MinION reads corrected with Canu.
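
Running the same command on several read files can be wrapped in a small loop. The following is a sketch only: the function names `align_reads` and `align_all` and the `GMAP_BIN` variable are hypothetical conveniences of this example, not GMAP settings, and each output file is named after its input with the extension replaced by `.gff3`.

```shell
# align_reads: hypothetical wrapper around the gmap call shown above.
#   $1 = index name, $2 = index directory, $3 = reads file
# Writes <reads file name without extension>.gff3 in the current directory.
# GMAP_BIN is an assumption of this sketch; it defaults to "gmap" and
# lets you point at a specific gmap binary.
align_reads() {
    index_name=$1
    index_dir=$2
    reads=$3
    out="$(basename "${reads%.*}").gff3"
    "${GMAP_BIN:-gmap}" -d "$index_name" -D "$index_dir" \
        -f gff3_match_cdna "$reads" > "$out"
}

# align_all: run align_reads on every reads file listed after the
# index name and index directory; stop at the first failure.
align_all() {
    index_name=$1
    index_dir=$2
    shift 2
    for reads in "$@"; do
        align_reads "$index_name" "$index_dir" "$reads" || return 1
    done
}
```

A call could look like `align_all <gmap_index_name> </path/to/index/> isoseq.fastq minion_corrected.fastq`, where the two read file names are placeholders for your own files.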

Next

Go to tutorial Compare the experimental transcripts to the reference annotation.

Go to checkpoint.

Go back to Table of contents.