EPI View - HealthHackAu2013/wiki GitHub Wiki

Team names & bios

Matt Ritchie, Medical Researcher/Bioinformatician working in genomics

Michael Walker, PhD student, MSB-group, Pathology, MDHS, Uni of Melbourne

Philip Goebel, Amateur Dev, Physiotherapist

John Kavadias, Software Engineer+Architect, Distributed Computing, Machine Learning

The Problem

I want to display a heap of genetic information at once …

We need a simple and highly visually effective software package to display genetic and epigenetic marks, where we can view many different layers of information at once, and easily switch between viewing a single region of one gene, to viewing many genes, to even viewing a whole chromosome.

Many publicly available datasets could be used to build such visualisation software, and the systems currently available for viewing this kind of data (e.g. Galaxy, UCSC Genome Browser, IGV, SeqMonk) are worth considering. While each of these has its benefits, none provides a holistic view of the data that allows optimal visualisation.

Every cell in the body contains all the genes of the genome – the entire DNA content of a cell – but not every cell in the body is the same. For example, a heart cell is very different to a neuron. How can these cells be so different if they all contain the same genes?

The answer is that each cell type turns on, or uses, a distinct set of genes. This means that each cell type makes its own complement of protein products that help determine the cell type’s function.

So what controls whether a gene is turned on or off? We know this is influenced by how easily accessible the gene is to the factors required to turn the gene on: the more tightly packaged the gene is, the less likely it is to be accessible to factors that bind and activate it, and the less likely it will be turned on.

We now know that the different packaging of the DNA can be correlated with various marks that are made to the DNA, called ‘epigenetic marks’. These epigenetic marks can be considered as punctuation marks in the genome – they allow the cell to interpret how to read the information contained in the DNA sequence.

Due to transformational changes in the way we examine these marks throughout the whole genome, biologists are creating a wealth of data that reports not only the DNA sequence, but also the amount of various epigenetic marks, and the amount of different factors bound to the DNA throughout the whole genome. This landslide of data is highly complex, and difficult for bench biologists to interrogate.

The Solution

An application that biologists can use to load genome data files to visualise the entire genome in a Circos plot.
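To illustrate the idea behind a circular whole-genome layout, here is a minimal sketch of how a genomic position can be mapped onto a circle. It is not the project's code (the application itself relied on RCircos in R); the function names and the toy chromosome lengths are assumptions made for illustration, and the gaps that Circos-style plots usually draw between chromosomes are ignored.

```python
import math


def build_offsets(chrom_lengths):
    """Return each chromosome's cumulative start offset and the total genome length."""
    offsets = {}
    total = 0
    for name, length in chrom_lengths.items():
        offsets[name] = total
        total += length
    return offsets, total


def position_to_angle(chrom, pos, offsets, total):
    """Map a 0-based position on a chromosome to an angle in radians in [0, 2*pi)."""
    return 2 * math.pi * (offsets[chrom] + pos) / total


def angle_to_xy(angle, radius=1.0):
    """Convert an angle to x/y, with 0 rad at 12 o'clock and angles increasing clockwise."""
    return radius * math.sin(angle), radius * math.cos(angle)


# Toy lengths (hypothetical, not real chromosome sizes).
toy_lengths = {"chrA": 100, "chrB": 300}
offsets, total = build_offsets(toy_lengths)

# Position 100 on chrB sits halfway around the circle.
angle = position_to_angle("chrB", 100, offsets, total)
x, y = angle_to_xy(angle)
```

Each data track (for example, one epigenetic mark) would then be drawn at its own radius, so that many layers of information share the same angular coordinate.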

Application/Relevance

Pretty genome pictures!

Datasets

BAM files from the ENCODE project (http://genome.ucsc.edu/encode/)

Links

Source code is available from the Git repository (http://github.com/mritchie/epiview)

Tech stack

R libraries, including RCircos

Shiny, to interact with R via a web page

D3.js, to enhance the Shiny-generated web page with dynamic and interactive visualisation

Future functionality

Ability to upload multiple samples, and select from existing samples

Ability to zoom in on data from a particular gene in a separate panel

Ability to show multiple plots in the one panel

Handling of more compact file types to speed up data upload / import

Sliding scale to determine visualisation window size

Choice of colour scales

Pointer driven direct interaction with plots / charts for zooming

Animated transitions and manipulation of plots / charts (in browser)
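As a rough sketch of what the visualisation window size item above could involve, the following shows one way to average a per-position signal into consecutive fixed-size windows before plotting. This is an assumption about a possible approach, not a description of the existing application; the function name `bin_signal` and the sample values are hypothetical.

```python
def bin_signal(positions, values, window_size):
    """Average values into consecutive, non-overlapping windows of window_size bases.

    Returns a dict mapping each window's start position to the mean value
    of the data points that fall inside it. Empty windows are omitted.
    """
    sums = {}
    counts = {}
    for pos, val in zip(positions, values):
        idx = pos // window_size  # index of the window containing this position
        sums[idx] = sums.get(idx, 0.0) + val
        counts[idx] = counts.get(idx, 0) + 1
    return {idx * window_size: sums[idx] / counts[idx] for idx in sorted(sums)}


# Hypothetical signal: five data points summarised with a 10-base window.
binned = bin_signal([0, 5, 12, 19, 25], [1, 3, 2, 4, 10], window_size=10)
```

A slider in the web page could pass its current value as `window_size`, so that a larger window gives a coarser, faster-to-draw summary and a smaller one shows more detail.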