Compare the experimental transcripts to the reference annotation - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki
We will use Cuffcompare to compare our set of experimental transcripts to the reference annotation. It is one of several tools for comparing transcript annotation files; alternatives include the PASA pipeline (Program to Assemble Spliced Alignments) and TACO (Multi-sample transcriptome assembly from RNA-Seq).
If you would like to know more about this topic, you can check this Bioinformatics SE thread. You are very welcome to contribute there if you know of other software.
First, we need to download the reference annotation from Ensembl:
wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-36/gff3/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3.gz
gunzip -d Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3.gz
mv Drosophila_melanogaster.BDGP6.36.chromosome.4.gff3 Dmel_chr4.gff3
And convert it to GTF using gffread:
gffread Dmel_chr4.gff3 -T -o Dmel_chr4.gtf
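Before moving on, it can be worth checking that the conversion produced something sensible. The sketch below counts the distinct `transcript_id` values in a GTF file. Note that `count_gtf_transcripts` is a hypothetical helper written for this page, not part of gffread, and it assumes the standard GTF layout with the attributes in the ninth tab-separated column.

```shell
# Hypothetical helper (not part of gffread): count distinct transcript_id
# values in a GTF file, assuming attributes sit in the 9th tab-separated column.
count_gtf_transcripts() {
    awk -F'\t' '
        # skip comment lines, then pull out the transcript_id "..." attribute
        !/^#/ && match($9, /transcript_id "[^"]+"/) {
            seen[substr($9, RSTART, RLENGTH)] = 1
        }
        END { n = 0; for (id in seen) n++; print n }
    ' "$1"
}

# Only runs if the converted reference is present in the working directory.
if [ -f Dmel_chr4.gtf ]; then
    echo "transcripts in Dmel_chr4.gtf: $(count_gtf_transcripts Dmel_chr4.gtf)"
fi
```

A count of zero would suggest that the conversion did not write any transcript records, in which case it is worth looking at the first lines of the file with `head`.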
Now we have to convert the experimental GFF3 file we got in the previous section to GTF format. To do this, we prepared a parsing script that makes the annotation file compatible with cuffcompare:
wget https://drive.switch.ch/index.php/s/ySKNwPmD16GuOQ0/download -O gmap_gff2gtf.py
To run this script, simply make it executable and call it on your experimental annotation file:
chmod +x gmap_gff2gtf.py
./gmap_gff2gtf.py <gmap_output>.gff3 > <gmap_output>.gtf
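If you want to check the converted file before handing it to cuffcompare, a rough sanity check is sketched below. `check_gtf` is a hypothetical helper, not part of the workshop script. It only tests two assumptions about GTF: that every record has nine tab-separated columns, and that the ninth column carries a `transcript_id` attribute. It does not validate coordinates or feature types.

```shell
# Hypothetical helper: report records that do not look like GTF.
# Assumes 9 tab-separated columns with a transcript_id attribute in column 9.
check_gtf() {
    awk -F'\t' '
        /^#/ || /^[ \t]*$/ { next }    # skip comments and blank lines
        NF != 9 || $9 !~ /transcript_id "[^"]+"/ {
            printf "line %d looks malformed: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }         # non-zero exit status if anything was flagged
    ' "$1"
}

# Example call (replace the file name with your own converted file):
#   check_gtf my_transcripts.gtf && echo "GTF looks fine"
```

The function prints each suspicious line and returns a non-zero exit status if it found any, so it can be chained with `&&` in front of the next step.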
And now you're ready to compare your annotation to the reference set:
cuffcompare -G -r <reference_gtf>.gtf <gmap_output>.gtf
# -G: tells cuffcompare that the input experimental GTF file might not come from Cufflinks
# -r: specifies the reference file
The files you're interested in are written to cuffcmp.*. Go check on them! This page might be useful to check and understand the output.
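One quick way to get an overview is to tally the class codes that cuffcompare assigns to each experimental transcript, for example `=` for a complete match of the intron chain, `j` for a potentially novel isoform, and `u` for an unknown, intergenic transcript. The sketch below assumes the `.tmap` layout described in the Cufflinks manual: tab-separated, with a header row and the class code in the third column. `tally_class_codes` is a hypothetical helper, and the exact name of your `.tmap` file depends on the name of your input file, so please check it with `ls`.

```shell
# Hypothetical helper: count how many transcripts fall into each class code.
# Assumes a tab-separated .tmap file with a header row and the class code
# in the third column.
tally_class_codes() {
    awk -F'\t' '
        NR > 1 { n[$3]++ }                                  # skip the header row
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | sort -k2,2nr                                   # most frequent code first
}

# Example call (replace the file name with your own .tmap file):
#   tally_class_codes my_transcripts.tmap
```

A large share of `=` and `j` codes would indicate that most experimental transcripts overlap annotated genes, while many `u` codes point to transcripts outside the reference annotation.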
Next
Go to checkpoint.
Go back to Table of contents.