TranscriptomeRefBased - aechchiki/SIB_LongReadsWorkshop_Zurich18 GitHub Wiki

Compare to reference annotation

Section: Transcriptome assessment [3/4].

We will use Cuffcompare to compare our set of experimental transcripts to the reference annotation. This is one software among others to compare transcript annotation files.

Other possibilities include:

If you would like to know more about this topic, you can check this Bioinformatics SE thread. You are very welcome to contribute if any other software comes to your mind.

Get the reference annotation

First, we need to download the reference annotation from Ensembl:

wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-36/gff3/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3.gz
gunzip -d Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3.gz
mv Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3 Dmel_chr4.gff3

Time for conversion

Reference

And convert those gff to GTF using gffread:

gffread Dmel_chr4.gff3 -T -o Dmel_chr4.gtf

Experimental

Convert the experimental gff3 files too! (the ones you generated just before).

Now, we have to convert the experimental GFF3 file we got in the previous section to a GTF format. To do this, we prepared a parsing script to make the annotation file compatible with the cuffcompare software (trickey trick!):

wget https://drive.switch.ch/index.php/s/ySKNwPmD16GuOQ0/download -O gmap_gff2gtf.py

To run this script, simply make it executable and call it on your experimental annotation file:

chmod +x gmap_gff2gtf.py
./gmap_gff2gtf.py <gmap_output>.gff3 > <gmap_output>.gtf

Do it for all the files for which you generated a gff3 output.

Backup

Here's backup of the conversion if you need that.

wget https://drive.switch.ch/index.php/s/BhI4u7C4RXhK5yl/download -O gtf.tar.gz

Compare to the reference

And now you're ready to compare your annotation to the reference set:

cuffcompare -o <prefix> -G -r <reference_gtf>.gtf <gmap_output>.gtf
# -o: prepends <prefix> to outfile name
# -G: tells cuffcompare that the input experimental GTF file might not come from Cufflinks (it did not, ha!)
# -r: specifies the reference file

The files you're interested in are written to <prefix>.*. Example:

training@e4c716e7a973:~$ ls -lha cuff_isoseq.*
-rw-rw-r-- 1 training training  14K Nov  8 22:33 cuff_isoseq.combined.gtf
-rw-rw-r-- 1 training training 1.2K Nov  8 22:33 cuff_isoseq.isoseq_out.gtf.refmap
-rw-rw-r-- 1 training training  41K Nov  8 22:33 cuff_isoseq.isoseq_out.gtf.tmap
-rw-rw-r-- 1 training training  14K Nov  8 22:33 cuff_isoseq.loci
-rw-rw-r-- 1 training training 1.1K Nov  8 22:33 cuff_isoseq.stats
-rw-rw-r-- 1 training training  41K Nov  8 22:33 cuff_isoseq.tracking

Go check on them! This page might be useful to check and understand the output.

Next

Back

⚠️ **GitHub.com Fallback** ⚠️