# Efficient MCMC sampling based on the underdamped Langevin diffusion

## Overview

This project is about sampling from high-dimensional log-concave distributions. The student will implement in C++ a Markov chain Monte Carlo (MCMC) algorithm based on the underdamped Langevin diffusion (ULD) and add the implementation to the `volesti` package. In particular, they will use a framework for discretizing stochastic differential equations to simulate the ULD, which converges to the target distribution. This framework is the current theoretical state of the art for sampling from log-concave densities and is based on the paper at https://arxiv.org/pdf/1909.05503.pdf.
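
For orientation, the ULD for a target density proportional to $e^{-f(x)}$ is commonly written as the system below. The notation is an assumption made here for illustration and does not come from this page: $x_t$ is the position, $v_t$ the velocity, $\gamma > 0$ a friction coefficient, $u > 0$ an inverse-mass parameter, and $B_t$ a standard Brownian motion.

```latex
\begin{aligned}
\mathrm{d}x_t &= v_t \,\mathrm{d}t, \\
\mathrm{d}v_t &= -\gamma v_t \,\mathrm{d}t - u \nabla f(x_t)\,\mathrm{d}t + \sqrt{2\gamma u}\,\mathrm{d}B_t .
\end{aligned}
```

Under these conventions the stationary density of $(x_t, v_t)$ is proportional to $\exp\!\left(-f(x) - \lVert v \rVert^2 / (2u)\right)$, so the $x$-marginal is the target distribution.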

## Related work

There is no open-source software that samples from a probability distribution by solving stochastic differential equations (SDEs). Hence this implementation would be a prototype, and it would aspire to surpass the fastest existing sampling implementations. Beyond the log-concave sampling problem, the proposed framework can be used for any problem that involves simulating SDEs. Various open-source SDE solvers exist, but the proposed method guarantees fast computation within an arbitrarily small error and involves only two gradient evaluations per iteration. An additional outcome of this project would therefore be an efficient SDE solver.

## Details of your coding project

The student has to a) implement an SDE solver, given evaluation oracles for a strongly convex function and its gradient, b) use this solver to sample from the corresponding log-concave distribution without truncation, and c) apply boundary reflections to the trajectory computed by the SDE solver in order to sample from polytopes. The proposed programming language is C++ because the code will be added to the `volesti` package and to the `GeomScale` project in general. The implementation will build on the existing `volesti` software, so the basic geometric components (polytopes, boundary reflections, other random walks) are ready to be used.
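
As a rough illustration of steps a) and b), the sketch below simulates the ULD with a plain Euler-Maruyama step. This is a deliberately simpler scheme than the one in the linked paper, and it is not part of `volesti`: the names `Vec`, `Gradient` and `sample_uld_euler`, the parameters `u`, `gamma` and `h`, and the use of `std::vector` are all assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Vec = std::vector<double>;
// Gradient oracle of the strongly convex function f (hypothetical interface).
using Gradient = std::function<Vec(const Vec&)>;

// Simulates one chain of the underdamped Langevin diffusion
//   dx = v dt,  dv = -gamma v dt - u grad f(x) dt + sqrt(2 gamma u) dB_t
// with an Euler-Maruyama step of size h, starting from position x and
// zero velocity. Returns the position after n_steps steps.
Vec sample_uld_euler(const Gradient& grad_f, Vec x, double u, double gamma,
                     double h, std::size_t n_steps, std::mt19937_64& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    Vec v(x.size(), 0.0);
    // Standard deviation of the Brownian increment over one step.
    const double noise_scale = std::sqrt(2.0 * gamma * u * h);
    for (std::size_t k = 0; k < n_steps; ++k) {
        const Vec g = grad_f(x);  // one gradient evaluation per step
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double v_old = v[i];
            v[i] += h * (-gamma * v_old - u * g[i]) + noise_scale * gauss(rng);
            x[i] += h * v_old;  // position update uses the old velocity
        }
    }
    return x;
}
```

For example, with `grad_f` returning `x` itself (so that `f(x) = ||x||^2 / 2` and the target is a standard Gaussian), calling `sample_uld_euler(grad_f, Vec{0.0}, 1.0, 2.0, 0.05, 400, rng)` repeatedly yields approximately standard normal draws. The Euler step introduces a discretization bias that shrinks with `h`; the discretization framework proposed for this project is intended to control that error far more efficiently.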

The student should also examine possible integrations with Stan, a state-of-the-art platform for statistical modelling and high-performance statistical computation.

## Expected impact

This project is expected to be very important to the `GeomScale` project, as it concerns efficient sampling from log-concave distributions, which appear in many applications. Moreover, it will be the starting point towards efficient SDE solvers, a new area for the `GeomScale` project, with numerous applications that require such computations.

## Mentors

• Apostolos Chalkis <tolis.chal at gmail.com> is a PhD student in Computer Science. His research focuses on mathematical computing, optimization and computational finance. He participated in GSoC 2018 and 2019 as a student under the `R-project` organization, implementing state-of-the-art algorithms for sampling from high-dimensional multivariate distributions. He is one of the authors of `volesti`.
• Zafeirakis Zafeirakopoulos is an expert in implementing and benchmarking geometric and algebraic algorithms and has previous GSoC experience with the `R-project` (2018, 2019).

## Tests

Students, please do one or more of the following tests before contacting the mentors above.

• Easy: Download, compile and run a simple sampling example with both the C++ and R interfaces of `volesti`. For example, you can sample uniformly distributed points from a 100-dimensional cube using all the random walks implemented in `volesti`, and project the points onto the plane to demonstrate the mixing of the random walks.

• Medium: Given an evaluation oracle for a strongly convex function, implement the ball walk to sample from the corresponding log-concave distribution truncated to a polytope. You are free to choose whether the oracle is written in C++ or R.

• Hard: Implement the gradient descent algorithm for the case where an evaluation oracle is additionally given for the gradient of the strongly convex function. Use the step size of the Barzilai–Borwein method. Again, you are free to choose whether the gradient oracle is written in C++ or R.
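
To make the hard test concrete, here is a minimal sketch of gradient descent with the first Barzilai–Borwein step size, `step = (s^T s) / (s^T y)`, where `s` is the difference of consecutive iterates and `y` the difference of consecutive gradients. It is a starting point under stated assumptions rather than a reference solution: the names `Vec`, `Gradient` and `gradient_descent_bb`, the default parameters, and the stopping rule are all choices made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
// Gradient oracle of the strongly convex function f (hypothetical interface).
using Gradient = std::function<Vec(const Vec&)>;

// Gradient descent with the Barzilai-Borwein (BB1) step size.
// Stops when the gradient norm drops below tol or after max_iters steps.
Vec gradient_descent_bb(const Gradient& grad_f, Vec x,
                        double initial_step = 1e-3,
                        std::size_t max_iters = 1000,
                        double tol = 1e-10) {
    Vec g = grad_f(x);
    double step = initial_step;  // used only until the first BB update
    for (std::size_t k = 0; k < max_iters; ++k) {
        double g_norm2 = 0.0;
        for (double gi : g) g_norm2 += gi * gi;
        if (std::sqrt(g_norm2) < tol) break;

        Vec x_new(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) x_new[i] = x[i] - step * g[i];
        Vec g_new = grad_f(x_new);

        // s = x_new - x, y = g_new - g; accumulate s^T s and s^T y.
        double ss = 0.0, sy = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double s = x_new[i] - x[i];
            const double y = g_new[i] - g[i];
            ss += s * s;
            sy += s * y;
        }
        // For a strongly convex f, s^T y > 0 whenever s != 0; keep the old
        // step if rounding makes the product non-positive.
        if (sy > 0.0) step = ss / sy;

        x = std::move(x_new);
        g = std::move(g_new);
    }
    return x;
}
```

For instance, for `f(x) = 0.5 (x_0 - 1)^2 + 5 (x_1 + 2)^2` the oracle returns `{x[0] - 1, 10 * (x[1] + 2)}`, and starting from the origin the iterates approach the minimizer `(1, -2)`. Note that the Barzilai–Borwein iteration is not monotone in `f`, so a production version for general strongly convex functions may need a safeguard such as a nonmonotone line search.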

Students, please post a link to your test results here.

EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.