Leveraging the increased statistical value of flux sampling - GeomScale/gsoc24 GitHub Wiki

Overview

The project will introduce new features for further statistical analysis of the samples and visualizations generated by dingo.

These will include at least the following features:

  • inference of pairwise correlated reactions
  • construction of a weighted graph of the model's reactions with the correlation coefficients as weights
  • annotation of these weights to the metabolic model and extraction to an annotated SBML file
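The first two features can be sketched as follows. This is a minimal illustration under stated assumptions, not dingo's implementation: the samples are assumed to be a NumPy array with one row per reaction and one column per sample, the function name `correlated_pairs` and the `cutoff` parameter are hypothetical, and Pearson correlation is used only as one possible choice of coefficient. The synthetic matrix below stands in for real flux samples.

```python
import numpy as np

def correlated_pairs(samples, reaction_ids, cutoff=0.8):
    """Return {(id_i, id_j): r} for reaction pairs with |Pearson r| >= cutoff.

    samples: array of shape (n_reactions, n_samples), one row per reaction
    (assumed layout). The returned dict is a weighted edge list of the
    reaction graph, with the correlation coefficient as the edge weight.
    """
    samples = np.asarray(samples, dtype=float)
    corr = np.corrcoef(samples)  # rows are treated as variables
    edges = {}
    n = len(reaction_ids)
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i, j]
            # Skip undefined coefficients (e.g. a reaction with constant flux).
            if np.isfinite(r) and abs(r) >= cutoff:
                edges[(reaction_ids[i], reaction_ids[j])] = float(r)
    return edges

# Synthetic stand-in for flux samples of three hypothetical reactions.
rng = np.random.default_rng(0)
v1 = rng.uniform(0.0, 10.0, size=2000)
v2 = 2.0 * v1 + rng.normal(0.0, 0.1, size=2000)  # tightly coupled to v1
v3 = rng.uniform(-5.0, 5.0, size=2000)           # independent of v1 and v2
samples = np.vstack([v1, v2, v3])

edges = correlated_pairs(samples, ["R1", "R2", "R3"], cutoff=0.8)
print(edges)
```

The resulting dictionary can be passed to any graph library as a weighted edge list; only the pair (R1, R2) should survive the cutoff in this toy example.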

All methods will be implemented in Python and merged into the dingo library.

The contributor will also run experiments with several metabolic networks to investigate the scaling of their findings.

Related work

In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors.
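In the usual notation (the symbols here are assumptions, as the text above does not fix any), this set can be written as a polytope, where S is the m × n stoichiometric matrix of a network with m metabolites and n reactions, v is the flux vector, and v_lb, v_ub are the lower and upper flux bounds:

```latex
P = \left\{ v \in \mathbb{R}^{n} \;:\; S v = 0,\quad v_{lb} \le v \le v_{ub} \right\}
```

The equality encodes the steady-state assumption and the inequalities encode the capacity bounds on each reaction.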

Unlike Flux Balance Analysis (FBA), flux sampling does not depend on an objective function [1]. Sampling this set therefore provides an unbiased characterization of the metabolic capabilities of a biochemical network. Moreover, with a sufficient number of samples, one can study the properties of individual components of the network and derive biological insights, such as which reactions and/or pathways are correlated.

To sample uniformly from a convex polytope, dingo uses the Multiphase Monte Carlo Sampling method, which is based on the Billiard walk [2].

Metabolic models are usually in SBML format and can be visualized through Cytoscape.

For more information contact the mentors.

References:

[1] Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos, Geometric algorithms for sampling the flux space of metabolic networks, 2021.

[2] Wedmark, Y. K., Vik, J. O., & Øyås, O. (2023). A hierarchy of metabolite exchanges in metabolic models of microbial species and communities. bioRxiv, 2023-09.

Details of your coding project

The contributor will start by implementing a post-processing statistical analysis of the returned samples. They will then extend dingo's illustrations.py to visualize the reaction pairs found to be correlated. Both the Plotly and plotnine libraries will be considered.

Finally, they will run a few experiments on benchmark metabolic networks to assess how their methods scale to real-world metabolic networks, and write a brief report of the results.

Difficulty: Medium

Size

Large (350 hours)

Skills

  • Required: Python, basic knowledge of mathematics (especially linear algebra and/or geometry)
  • Preferred: experience with mathematical software, C++, and/or biology

Expected impact

The project will greatly help in interpreting sampling results. This benefits both the biology community, which would gain novel insight, and the geometry community, by highlighting the added value of random sampling methods. It also brings together the GeomScale organization and the NRNB community, which supports Cytoscape.

Mentors

  • Haris Zafeiropoulos <haris.zafeiropoulos at kuleuven.be> is working on metabolic modeling software development and applications as a post-doc in the Lab of Systems Biology at KU Leuven and has previous GSoC student experience (2021) and mentoring experience with GeomScale (2022) and NRNB (2023).

  • Apostolos Chalkis <tolis.chal at gmail.com> is a Research Engineer at Quantagonia GmbH. He is an expert in statistical software, computational geometry, and optimization, and has previous GSoC student experience (2018 & 2019) and mentoring experience with GeomScale (from 2020 to 2023).

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: compile and run dingo. Use the documentation to sample from the flux space of the e_coli model.

  • Medium: Compare the FBA solution with your samples. Choose random pairs of reactions and check whether they are correlated.

  • Hard: For a pair of correlated reactions, show whether all the reactions of the metabolic pathway they belong to are also correlated.
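One way to approach the Hard test is sketched below. It is only an illustration under assumptions: the helper `pathway_fully_correlated`, the `cutoff` value, the reaction ids, and the pathway membership lists are all hypothetical, the samples are synthetic rather than produced by dingo, and "correlated" is taken to mean an absolute Pearson coefficient above the cutoff.

```python
import numpy as np
from itertools import combinations

def pathway_fully_correlated(corr, reaction_ids, pathway, cutoff=0.8):
    """Check whether every pair of reactions in `pathway` has |r| >= cutoff.

    corr: square correlation matrix ordered like `reaction_ids`.
    Returns (all_correlated, weak_pairs), where weak_pairs lists the
    (id_a, id_b, r) triples that fall below the cutoff.
    """
    index = {rid: k for k, rid in enumerate(reaction_ids)}
    weak = []
    for a, b in combinations(pathway, 2):
        r = corr[index[a], index[b]]
        if not (np.isfinite(r) and abs(r) >= cutoff):
            weak.append((a, b, float(r)))
    return len(weak) == 0, weak

# Synthetic samples: one row per reaction, one column per sample.
rng = np.random.default_rng(1)
base = rng.uniform(0.0, 10.0, size=3000)
samples = np.vstack([
    base,                                     # R1
    base + rng.normal(0.0, 0.2, size=3000),   # R2 follows R1
    -base + rng.normal(0.0, 0.2, size=3000),  # R3 is anti-correlated with R1
    rng.uniform(0.0, 10.0, size=3000),        # R4 is independent
])
reaction_ids = ["R1", "R2", "R3", "R4"]
corr = np.corrcoef(samples)

# Two hypothetical pathways that both contain the correlated pair (R1, R2).
ok_linear, _ = pathway_fully_correlated(corr, reaction_ids, ["R1", "R2", "R3"])
ok_branch, weak = pathway_fully_correlated(corr, reaction_ids, ["R1", "R2", "R4"])
print(ok_linear, ok_branch, [(a, b) for a, b, _ in weak])
```

With real data, the pathway lists would come from the model's annotations, and the list of weak pairs shows which reactions break the correlation within the pathway.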