TrinotateWeb: Graphical Interface for Navigating Trinotate Annotations and Expression Analyses

Background

TrinotateWeb provides a web-based graphical interface for navigating Trinotate annotations and differential expression data on your own machine. Run the included lightweight webserver, open your web browser, point it to your Trinotate SQLite database, and begin exploring your data. Note that Trinotate is not yet a full-featured application and is still at a very early stage of development. This is an open source, community-driven software project, and contributions are always welcome.

Trinotate makes extensive use of the CanvasXpress infrastructure, as developed and made publicly available by Isaac Neuhaus. "CanvasXpress was developed as the core visualization component for bioinformatics and systems biology analysis at Bristol-Myers Squibb". More about CanvasXpress.

TrinotateWeb Showcase

The current capabilities of TrinotateWeb are showcased below (note, the interface is continually improved across software releases):

Populating Trinotate.sqlite with Expression Data

TrinotateWeb is currently built around the differential expression (DE) analysis pipeline supported by the Trinity suite, as described here: first generate abundance estimates using the quantification pipeline, then perform the DE analysis using Bioconductor.

After having run the above abundance estimation and DE analysis steps, you can begin loading the expression data like so:

 # import the FPKM expression values and the DE analysis results
 $TRINOTATE_HOME/util/transcript_expression/import_expression_and_DE_results.pl \
        --sqlite Trinotate.sqlite \
        --samples_file samples_n_reads_described.txt \
        --count_matrix Trinity_trans.counts.matrix \
        --expr_matrix Trinity_trans.counts.matrix.TMM_normalized.FPKM \
        --DE_dir edgeR_trans/ \
        --transcript_mode

The above is based on using the transcript-level abundance estimates and corresponding DE analyses. Abundance estimation and DE analysis can be performed at the 'gene' level as well. Optionally, run the above using '--gene_mode' with the corresponding gene-based files. Both transcript-level and gene-level data can be loaded into the single Trinotate.sqlite instance for analysis.
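For the gene-level case, a minimal sketch might wrap the same importer in a shell function that checks its inputs before running. The function name load_gene_level_expression is hypothetical, and the gene-level file names in the usage comment are assumptions rather than names taken from the Trinity or Trinotate documentation; only the option names come from the command above.

```shell
# Hypothetical helper: load gene-level expression and DE results.
# Arguments: sqlite_db samples_file count_matrix expr_matrix DE_dir
load_gene_level_expression() {
    if [ "$#" -ne 5 ]; then
        echo "usage: load_gene_level_expression sqlite_db samples_file count_matrix expr_matrix DE_dir" >&2
        return 2
    fi

    # Refuse to start if any input is missing, so a failure is reported
    # before the importer is invoked.
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done

    # Same importer and options as the transcript-level command,
    # but with --gene_mode and the gene-based files.
    "$TRINOTATE_HOME/util/transcript_expression/import_expression_and_DE_results.pl" \
        --sqlite "$1" \
        --samples_file "$2" \
        --count_matrix "$3" \
        --expr_matrix "$4" \
        --DE_dir "$5" \
        --gene_mode
}

# Example (file names are assumptions; use whatever your pipeline produced):
#   load_gene_level_expression Trinotate.sqlite samples_n_reads_described.txt \
#       Trinity_genes.counts.matrix Trinity_genes.counts.matrix.TMM_normalized.FPKM edgeR_gene/
```

Checking the inputs up front is a design choice of this sketch, not a requirement of the importer.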

Populate Trinotate.sqlite with Transcript Expression Clusters

The Trinity analysis framework describes methods for clustering transcripts based on related expression patterns here. After defining clusters, load these into the Trinotate.sqlite database like so:

# import transcript clusters
$TRINOTATE_HOME/util/transcript_expression/import_transcript_clusters.pl \
     --group_name edgeR_DE_analysis \
     --analysis_name edgeR_trans/diffExpr.P0.001_C2.matrix.R.all.RData.clusters_fixed_P_20 \
     --sqlite Trinotate.sqlite \
     edgeR_trans/diffExpr.P0.001_C2.matrix.R.all.RData.clusters_fixed_P_20/*matrix

The step of defining expression clusters can be run multiple times using different parameters according to the Trinity documentation. Each set of clusters can be loaded in separately as above and then separately studied within TrinotateWeb.
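When several cluster sets have been generated, the import command above can be repeated once per set. The following is a rough sketch under the assumption that each set lives in its own directory containing '*matrix' files, as in the example above; the function name import_cluster_sets and the DRY_RUN variable are hypothetical conveniences, not part of Trinotate.

```shell
# Hypothetical helper: import one or more cluster directories.
# Arguments: sqlite_db cluster_dir [cluster_dir ...]
# Set DRY_RUN=1 to print the commands instead of running them.
import_cluster_sets() {
    db="$1"
    shift
    for cluster_dir in "$@"; do
        cluster_dir="${cluster_dir%/}"   # drop a trailing slash, if any

        # Skip directories that contain no cluster matrix files.
        if ! ls "$cluster_dir"/*matrix >/dev/null 2>&1; then
            echo "skipping $cluster_dir: no *matrix files" >&2
            continue
        fi

        # The directory path doubles as the analysis name, mirroring the
        # command above, so each set stays distinguishable in TrinotateWeb.
        ${DRY_RUN:+echo} "$TRINOTATE_HOME/util/transcript_expression/import_transcript_clusters.pl" \
            --group_name edgeR_DE_analysis \
            --analysis_name "$cluster_dir" \
            --sqlite "$db" \
            "$cluster_dir"/*matrix
    done
}

# Example (the glob is an assumption about how your cluster directories are named):
#   DRY_RUN=1 import_cluster_sets Trinotate.sqlite edgeR_trans/*.clusters_fixed_P_*
```

Running with DRY_RUN=1 first shows exactly which directories and matrix files would be loaded.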

Populate Trinotate.sqlite with Annotation Text

Any free-text annotation can be applied to your input transcripts, and these data can then be queried by text searches within TrinotateWeb. This was the simplest way to get text searching up and running flexibly; more sophisticated methods for querying the data will be included in future releases. If you've used Trinotate to generate annotations (e.g., generated the Trinotate_report.xls tab-delimited summary data file), you can simply import that data table back into Trinotate as the textual annotation for the transcripts (highly recommended). Alternatively, if you have a tab-delimited file in the format:

gene_id (tab) transcript_id (tab) annotation text

you can use that file instead. Keep in mind that TrinotateWeb text querying is currently based solely on this 'annotation text'. Load the annotations into the Trinotate.sqlite database like so:

 # Load annotations
 $TRINOTATE_HOME/util/annotation_importer/import_transcript_names.pl Trinotate.sqlite Trinotate_report.tsv
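If you supply your own annotation file in the three-column format described above, it may be worth checking its layout before loading it. The sketch below assumes that every non-empty line should have a gene_id, a transcript_id, and at least one further tab-separated field; the function name check_annotation_file is hypothetical, and the importer itself may be more or less strict than this.

```shell
# Hypothetical pre-flight check for a custom annotation file.
# Succeeds only if every non-empty line has at least three tab-separated
# fields and non-empty gene_id and transcript_id columns.
check_annotation_file() {
    awk -F '\t' '
        NF == 0 { next }                          # ignore blank lines
        NF < 3 || $1 == "" || $2 == "" {
            printf "line %d: expected gene_id<TAB>transcript_id<TAB>annotation text\n", NR > "/dev/stderr"
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example (my_annotations.tsv is a placeholder name):
#   check_annotation_file my_annotations.tsv && \
#       $TRINOTATE_HOME/util/annotation_importer/import_transcript_names.pl Trinotate.sqlite my_annotations.tsv
```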

Sample data

A full set of sample data for loading a Trinotate.sqlite database and populating it with expression and annotation data according to the above steps is provided at

 $TRINOTATE_HOME/testing/

Simply run './runMe.sh' in that directory, followed by './runMe.add_expression_prep_TrinotateWeb.sh', to generate the fully populated 'Trinotate.sqlite' database that's ready for exploration using TrinotateWeb.

Running TrinotateWeb

To run TrinotateWeb, you'll need a light-weight webserver installed. We recommend using lighttpd, which is free and easy to obtain for Linux and Mac. Once installed, you can launch it via:

  cd $TRINOTATE_HOME

  ./run_TrinotateWebserver.pl 8080

and leave it running within your terminal window. To stop it, press Ctrl-C or exit the terminal.

The number 8080 is the port on which the server listens for connections from your web browser. You can use any open port.

Then, go to your web browser and visit the URL: 'http://localhost:8080/cgi-bin/index.cgi'

You should be prompted to enter the path to your Trinotate SQLite database.
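Because the port given to run_TrinotateWebserver.pl has to match the port in the URL, a small helper can keep the two in sync. This is only a sketch: the function name trinotateweb_url is hypothetical, and it assumes the '/cgi-bin/index.cgi' path shown above stays the same when a different port is used.

```shell
# Hypothetical helper: print the TrinotateWeb URL for a given port
# (defaults to 8080, the port used in the example above).
trinotateweb_url() {
    port="${1:-8080}"

    # Reject anything that is not a plain decimal number.
    case "$port" in
        ''|*[!0-9]*)
            echo "not a port number: $port" >&2
            return 1
            ;;
    esac

    # Valid TCP ports are 1 through 65535.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi

    printf 'http://localhost:%s/cgi-bin/index.cgi\n' "$port"
}

# Example, with './run_TrinotateWebserver.pl 9000' running in another terminal:
#   curl -sSf -o /dev/null "$(trinotateweb_url 9000)" && echo "TrinotateWeb is responding"
```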

Navigating TrinotateWeb

Overview Tab

The overview tab will show basic summary statistics.

Differential Expression Analysis

Differential expression can be explored from interactive volcano plots, MA plots, and heatmaps, starting from the 'Differential Expression' tab.

An example volcano plot for a pairwise comparison between two samples is shown below:

Just double-click on a point to visit that gene or transcript's expression and annotation report. Drag and select a range to zoom in.

Gene/Transcript ID or Keyword Searches

Keyword Search

You can search for genes or transcripts via keyword searches. For example, from the keyword search tab, I type in 'transporter' like so:

Submitting the search returns a list of all entries where 'transporter' was found in the annotation text.

Gene/Transcript ID Search

Given a gene or transcript identifier, the feature can be searched for directly via the 'Gene or Transcript ID Search' tab:

Gene/Transcript Expression and Annotation Reports

A given gene or transcript search will lead to a gene/transcript expression and annotation report page:

Annotation information will be displayed below, including a view of the position of the ORF on the transcript and any homology match information.