
eXpress-D

In eXpress-D, we've implemented a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that scales by leveraging compute clusters in datacenters ("the cloud"). eXpress-D uses the same model as eXpress but achieves better accuracy, because it optimizes that model with batch EM rather than the online (streaming) EM used by eXpress.
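To make the batch EM idea concrete, here is a minimal single-machine sketch in Python. It is not eXpress-D's implementation (which runs on Spark) and leaves out the details of the eXpress likelihood model. It assumes each fragment has already been reduced to a hypothetical dict of per-transcript likelihoods P(fragment | transcript), and `batch_em` is an illustrative name. The E-step is written as a per-fragment map and the M-step as a sum-reduce, which is the pattern that distributes naturally across a cluster.

```python
from collections import defaultdict
from functools import reduce


def batch_em(fragments, num_iters=100):
    """Illustrative batch EM for fragment assignment (hypothetical helper).

    fragments: list of dicts mapping transcript id -> P(fragment | transcript),
               assumed positive for every compatible transcript.
    Returns a dict mapping transcript id -> relative abundance (sums to 1).
    """
    transcripts = {t for frag in fragments for t in frag}
    # Start from uniform abundances over all transcripts seen.
    rho = {t: 1.0 / len(transcripts) for t in transcripts}

    for _ in range(num_iters):
        # E-step ("map"): split each fragment across its compatible
        # transcripts in proportion to abundance * likelihood.
        def e_step(frag):
            weights = {t: rho[t] * p for t, p in frag.items()}
            total = sum(weights.values())
            return {t: w / total for t, w in weights.items()}

        # M-step ("reduce"): sum the fractional assignments over ALL
        # fragments (the batch step), then normalize by fragment count.
        def merge(acc, resp):
            for t, r in resp.items():
                acc[t] += r
            return acc

        counts = reduce(merge, map(e_step, fragments), defaultdict(float))
        n = len(fragments)
        rho = {t: counts[t] / n for t in transcripts}

    return rho
```

For example, with fragments `[{"A": 1.0}, {"A": 1.0}, {"A": 1.0, "B": 1.0}]`, the ambiguous third fragment is attributed more and more to A on each pass, since B has no uniquely mapping evidence, and `rho["A"]` approaches 1. Every iteration revisits every fragment before updating the abundances; an online EM instead updates after each fragment it streams past.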

The sections below offer guides to running eXpress-D locally and on EC2. More advanced users can consult the configuration and tuning guides, which document the eXpress-D, Spark, and JVM parameters that can be used to optimize eXpress-D's performance.


User Documentation

All issues and source code are tracked on GitHub (e.g., commits and pull requests). For questions about these guides, or about eXpress-D in general, you can post on the eXpress users group.

Setting Up and Running eXpress-D

Running on EC2: Launch EC2 clusters and run eXpress-D on them.

Notes on Configuration and Tuning

Publications

Roberts A (2013). Ambiguous fragment assignment for high-throughput sequencing experiments. Thesis, EECS Department, University of California, Berkeley. [link]

Roberts A, Feng H, and Pachter L (2013). Fragment assignment in the cloud with eXpress-D. BMC Bioinformatics. [link]

Roberts A and Pachter L (2013). Streaming fragment assignment for real-time analysis of sequencing experiments. Nature Methods. [link]