High dimensional sampling with applications to structural biology - GeomScale/gsoc2020 GitHub Wiki

Overview

In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. The corresponding polyhedra typically lie in hundreds or thousands of dimensions. Fast convergence to the stationary uniform distribution is crucial from a computational point of view, to enable reliable and tractable sampling of genome-scale biochemical networks.

Related work

Currently the state-of-the-art method is coordinate Hit-and-Run with rounding. Implementation is in opencobra MATLAB toolbox. See also this paper.

Details of your coding project

The project could be split in the following tasks:

  • The student should understand the theory and the algorithms for constraint-based metabolic modelling reading the literature in the bonding period.
  • Apply sampling methods implemented in volesti, e.g. Billiard walk and Coordinate HnR.
  • Implement exponential sampling under a linear biological objective function.
  • Adapt and try the rounding algorithms in volesti.
  • Implement useful methods from open-cobra to Geomscale
  • Implement R API
  • Write tests and documentation

Expected impact

This is an important project for the structural biology community. Moreover, having an R package for basic operations in constraint-based metabolic modelling will not compete open-cobra but will give an efficient open-source alternative.

Mentors

Students, please contact both mentors below after completing at least one of the tests below.

  • Vissarion Fisikopoulos <vissarion.fisikopoulos at gmail.com> is an expert in mathematical software, computational geometry and optimization, and has previous GSOC mentoring experience with Boost C++ libraries (2016-2019) and the R-project (2017-2019).

  • Apostolos Chalkis <tolis.chal at gmail.com> is a PhD student in Computer Science. His research focuses on mathematical computing, optimization and computational finance. He has previous experience in GSoC 2018 and 2019 as a student under Org. R-project, implementing state-of-the-art algorithms for sampling from high dimensional multivariate distributions. He is one of the authors of volesti.

  • Elias Tsigaridas <elias.tsigaridas at inria.fr> is an expert in computational nonlinear algebra and geometry with experience in mathematical software. He has contributed to the implementation, in C and C++, of several solving algorithms for various open source computer algebra libraries and has previous GSOC mentoring experience with the R-project (2019).

Tests

Students, please do one or more of the following tests before contacting the mentors above.

MENTORS: write several tests that potential students can do to demonstrate their capabilities for this particular project.

  • Easy: compile and run VolEsti. Use the R extension to visualize sampling in a polytope.
  • Medium: import the data from bigg.ucsd.edu/models/e_coli_core and create a matrix in R
  • Hard: support lower dimensional polytopes in volesti and use existing methods to sample from them

Solutions of tests

Students, please post a link to your test results here.

  • EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.