High dimensional sampling with applications to structural biology - GeomScale/gsoc2020 GitHub Wiki
Overview
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. The corresponding polyhedra typically lie in hundreds or thousands of dimensions. Fast convergence to the stationary uniform distribution is crucial from a computational point of view, to enable reliable and tractable sampling of genome-scale biochemical networks.
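For concreteness, the feasible set is usually written as follows. This is the standard steady-state formulation, stated here as an assumption since the page does not spell it out: $S$ denotes the $m \times n$ stoichiometric matrix, $v$ the flux vector, and $\ell$, $u$ the lower and upper flux bounds.

```latex
P = \{\, v \in \mathbb{R}^n \;:\; S v = 0,\ \ \ell \le v \le u \,\}
```

Because of the equality constraints $S v = 0$, this set is generally not full-dimensional in $\mathbb{R}^n$.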
Related work
The current state-of-the-art method is coordinate Hit-and-Run with rounding, implemented in the opencobra MATLAB toolbox. See also this paper.
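To make the idea concrete, here is a minimal Python sketch of coordinate Hit-and-Run on a polytope given as {x : A x <= b}. It is not the opencobra or volesti implementation: the function names are hypothetical, the rounding step is omitted, and the starting point is assumed to lie strictly inside a bounded polytope.

```python
import random

def chord_along_axis(A, b, x, k):
    """Return (t_min, t_max) such that x + t*e_k stays inside {y : A y <= b}."""
    t_min, t_max = float("-inf"), float("inf")
    for row, rhs in zip(A, b):
        # Slack of this constraint at x; non-negative when x is feasible.
        slack = rhs - sum(a * xi for a, xi in zip(row, x))
        a_k = row[k]
        if a_k > 0:
            t_max = min(t_max, slack / a_k)
        elif a_k < 0:
            t_min = max(t_min, slack / a_k)
    return t_min, t_max

def coordinate_hit_and_run(A, b, x0, n_samples, rng=None):
    """Hypothetical helper: uniform coordinate Hit-and-Run, no rounding."""
    rng = rng or random.Random(0)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        k = rng.randrange(len(x))            # pick a random coordinate axis
        t_min, t_max = chord_along_axis(A, b, x, k)
        x[k] += rng.uniform(t_min, t_max)    # uniform point on the chord
        samples.append(tuple(x))
    return samples

# Toy example: the unit square 0 <= x, y <= 1.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
samples = coordinate_hit_and_run(A, b, [0.5, 0.5], 1000)
```

Each step changes a single coordinate, so the chord computation only needs one column of A. That low per-step cost is one reason the coordinate variant is attractive in high dimensions.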
Details of your coding project
The project could be split in the following tasks:
- The student should understand the theory and algorithms for constraint-based metabolic modelling by reading the literature during the community bonding period.
- Apply sampling methods implemented in volesti, e.g. Billiard walk and Coordinate HnR.
- Implement exponential sampling under a linear biological objective function.
- Adapt and try the rounding algorithms in volesti.
- Port useful methods from open-cobra to GeomScale.
- Implement R API
- Write tests and documentation
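As an illustration of the exponential sampling task, one possible building block is a draw from a truncated exponential along a chord. The sketch below assumes a target density proportional to exp(c·x / T) on the polytope, where the objective vector c and the temperature T are illustrative symbols not taken from this page. Along a chord of a given length in direction e_k the density is then proportional to exp(lam * s) with lam = c_k / T. The function name is hypothetical and this is not volesti's implementation.

```python
import math
import random

def sample_truncated_exponential(lam, length, u):
    """Inverse-CDF draw of s in [0, length] with density proportional to exp(lam * s).

    `u` is a uniform random number in [0, 1).
    """
    if abs(lam * length) < 1e-12:
        # Nearly flat density: fall back to the uniform case.
        return u * length
    if lam > 0:
        # Reflect the chord so the exponent is negative and cannot overflow.
        return length - sample_truncated_exponential(-lam, length, u)
    # lam < 0: invert F(s) = expm1(lam * s) / expm1(lam * length).
    return math.log1p(u * math.expm1(lam * length)) / lam

# Draws on a chord of length 1 with lam = 3 concentrate near s = 1.
rng = random.Random(0)
draws = [sample_truncated_exponential(3.0, 1.0, rng.random()) for _ in range(5000)]
```

Replacing the uniform chord draw of a Hit-and-Run step with a draw of this kind biases the walk towards higher objective values. Smaller T gives a stronger bias.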
Expected impact
This is an important project for the structural biology community. Moreover, an R package for basic operations in constraint-based metabolic modelling will not compete with open-cobra but will offer an efficient open-source alternative.
Mentors
Students, please contact both mentors below after completing at least one of the tests below.
- Vissarion Fisikopoulos <vissarion.fisikopoulos at gmail.com> is an expert in mathematical software, computational geometry and optimization, and has previous GSOC mentoring experience with Boost C++ libraries (2016-2019) and the R-project (2017-2019).
- Apostolos Chalkis <tolis.chal at gmail.com> is a PhD student in Computer Science. His research focuses on mathematical computing, optimization and computational finance. He has previous experience in GSoC 2018 and 2019 as a student under the R-project organization, implementing state-of-the-art algorithms for sampling from high dimensional multivariate distributions. He is one of the authors of volesti.
- Elias Tsigaridas <elias.tsigaridas at inria.fr> is an expert in computational nonlinear algebra and geometry with experience in mathematical software. He has contributed to the implementation, in C and C++, of several solving algorithms for various open source computer algebra libraries and has previous GSOC mentoring experience with the R-project (2019).
Tests
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: compile and run VolEsti. Use the R extension to visualize sampling in a polytope.
- Medium: import the data from bigg.ucsd.edu/models/e_coli_core and create a matrix in R
- Hard: support lower dimensional polytopes in volesti and use existing methods to sample from them
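As a rough illustration of the data layout behind the medium test, the Python sketch below builds a stoichiometric matrix from a model in JSON form. The test itself asks for R, so this only shows the idea. It assumes a JSON layout with a `metabolites` list and a `reactions` list whose entries carry a `metabolites` mapping plus `lower_bound` and `upper_bound` fields, which should be checked against the actual download. The three-reaction model is an invented toy, not data from e_coli_core, and the function name is hypothetical.

```python
import json

# Invented toy model in the assumed JSON layout (not real BiGG data).
TOY_MODEL_JSON = """
{
  "metabolites": [{"id": "A"}, {"id": "B"}],
  "reactions": [
    {"id": "uptake_A", "metabolites": {"A": 1.0},
     "lower_bound": 0.0, "upper_bound": 10.0},
    {"id": "A_to_B", "metabolites": {"A": -1.0, "B": 1.0},
     "lower_bound": 0.0, "upper_bound": 1000.0},
    {"id": "export_B", "metabolites": {"B": -1.0},
     "lower_bound": 0.0, "upper_bound": 1000.0}
  ]
}
"""

def stoichiometric_matrix(model):
    """Return (S, lower, upper): rows of S are metabolites, columns are reactions."""
    met_index = {m["id"]: i for i, m in enumerate(model["metabolites"])}
    reactions = model["reactions"]
    S = [[0.0] * len(reactions) for _ in met_index]
    for j, rxn in enumerate(reactions):
        for met_id, coeff in rxn["metabolites"].items():
            S[met_index[met_id]][j] = coeff
    lower = [rxn["lower_bound"] for rxn in reactions]
    upper = [rxn["upper_bound"] for rxn in reactions]
    return S, lower, upper

model = json.loads(TOY_MODEL_JSON)
S, lower, upper = stoichiometric_matrix(model)
```

The matrix and the two bound vectors together describe the flux polytope that the sampling methods operate on.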
Solutions of tests
Students, please post a link to your test results here.
- EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.