Example: combining a reference and assembled transcriptome - Oshlack/Lace GitHub Wiki

In this example, we show how to build a comprehensive yet compact superTranscriptome from a reference annotation, a genome-guided assembly and a de novo assembly (i.e. to get the best gene models using all available information). The superTranscriptome can then be used for downstream analyses such as differential gene/isoform expression analysis and variant calling, or it can be used to identify novel sequence that is absent from a reference genome. We demonstrate this using chicken and the galGal4 reference genome. This is a simplified version of what was done for our paper.

####Step 1 - Assemble the RNA-Seq data####

This example demonstrates how to combine a reference annotation, a genome-guided assembly and de novo assembled data into a superTranscriptome. You will therefore need to start by:

  • Downloading the appropriate reference genome annotation (as a .gtf file), for example an Ensembl or RefSeq annotation.
  • Performing a genome-guided assembly. Refer to the cufflinks website for instructions. You will also use tools from the cufflinks suite.
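
Before moving on, it can help to sanity-check the downloaded annotation. The following is a minimal sketch, not part of the Lace workflow itself, that counts transcripts per gene from GTF records. It assumes the standard nine-column, tab-separated GTF layout with `gene_id` and `transcript_id` attributes in the last column. The function name `transcripts_per_gene` and the example records are made up for illustration.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "geneA";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids seen in the GTF lines."""
    genes = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed nine-column GTF record
        attrs = dict(_ATTR.findall(fields[8]))
        # Gene-level records carry no transcript_id, so they are skipped here.
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return dict(genes)


# Made-up records for illustration only (not real galGal4 annotation).
example = [
    "#!made-up header line\n",
    'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "geneA";\n',
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n',
    'chr1\tdemo\texon\t100\t250\t.\t+\t.\tgene_id "geneA"; transcript_id "txA2";\n',
    'chr2\tdemo\texon\t500\t600\t.\t-\t.\tgene_id "geneB"; transcript_id "txB1";\n',
]

summary = transcripts_per_gene(example)
for gene, transcripts in sorted(summary.items()):
    print(gene, len(transcripts))
```

To run this on a real annotation, pass an open file handle instead of the example list, for instance `transcripts_per_gene(open("annotation.gtf"))`, where the file name is a placeholder for whichever annotation you downloaded.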

####Step 2 - Combine the genome reference annotation with the genome guided assembly####

####Step 3 - Cluster the de novo assembly with the genome based superTranscriptome####

####Step 4 - Cluster the remaining assembled contigs using a related species####

####Step 5 - Assemble all transcripts using Lace####

####Step 6 - Annotate the superTranscripts####