Machine Learning and Optimization for Finance: Index Replication

Goal:

Replicate the payouts, i.e., the profits and losses (P&L), of a given stock market index (a portfolio of stocks) using only a limited number of liquid (highly traded) stocks. Index replication lets portfolio managers obtain the returns of indices that most investors cannot buy directly, by mimicking the performance of such indices with investable stocks.

Task:

Develop a learner to create a portfolio based on a given set of stocks and their respective characteristics, with the goal of mimicking a predefined index as closely as possible. A major challenge is to define a meaningful objective function that serves as a measure of similarity to the index and to optimize a payoff that reflects the performance of the index.

Initially, your objective is to replicate the methodology introduced by Shu, Shi, and Tian (2020)*, who formulate index tracking as a constrained and regularized regression problem. Subsequently, assess how open-source solvers can tackle this problem. Then, explore techniques for data-driven tuning of the regularization parameters. Next, to enhance the solution, incorporate a mean estimator (i.e., an estimate of the stocks’ expected returns) into the objective function or as a constraint, and investigate methods for calibrating the weight of this component. Finally, backtest the solution on real-world stock market data and evaluate the outcomes in terms of computational resources and economic performance.

*L. Shu, F. Shi & G. Tian (2020): High-dimensional index tracking based on the adaptive elastic net. Quantitative Finance, DOI: 10.1080/14697688.2020.1737328
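
For orientation, a regularized tracking objective of the adaptive elastic net type can be sketched as follows. This is the generic form from the statistics literature, not a transcription of the paper: the symbols $X \in \mathbb{R}^{T \times n}$ (stock returns), $y \in \mathbb{R}^T$ (index returns), the penalty parameters $\lambda_1, \lambda_2 \geq 0$, the exponent $\gamma > 0$, the initial estimate $\tilde{w}$, and the budget constraint are assumptions made here for illustration. The exact constraints and weighting scheme should be taken from Shu, Shi, and Tian (2020).

$$\min_{w \in \mathbb{R}^n} \quad \frac{1}{T} ||Xw - y||_2^2 + \lambda_1 \sum\nolimits_{i=1}^n \hat{\omega}_i |w_i| + \lambda_2 ||w||_2^2 \quad \text{s.t.} \quad \sum\nolimits_{i=1}^n w_i = 1, \qquad \hat{\omega}_i = |\tilde{w}_i|^{-\gamma}$$

Here $\tilde{w}$ would typically come from a first-stage (non-adaptive) fit, so that stocks with small initial weights are penalized more heavily and are more likely to be dropped from the tracking portfolio.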

Data:

Stock market data will be provided.

Methods:

To start, you should follow the model proposed in Shu, Shi, and Tian (2020). However, you are free to utilize any machine learning tool or solver of your choice to address the problem (do not implement their suggested coordinate descent algorithm). Later, as the basic problem is to be enhanced, you are at liberty to modify the model based on your own concepts and ideas.

Difficulty: Medium

Size

Large (350 hours)

Skills

  • Required: Python, R, linear algebra, optimization
  • Preferred: Experience with mathematical software and/or knowledge of computational finance

Expected impact

Index replication lets portfolio managers obtain the returns of indices that most investors cannot buy directly, by mimicking the performance of such indices with investable stocks. Adding this capability would be a great enhancement for the GeomScale packages.

Mentors

  • Bachelard Cyril <cyril.bachelard at quantarea.ch> is the Head of Quant Engineering and a founding partner at Quantarea, a quantitative asset manager in Switzerland. He has 12+ years of experience in quantitative portfolio management and systematic equity research. His areas of expertise include high-dimensional portfolio optimization, machine learning, and signal processing for dynamic asset allocation.

  • Apostolos Chalkis <tolis.chal at gmail.com> is a Research Engineer at Quantagonia GmbH. He is an expert in statistical software, computational geometry, and optimization, and has previous GSoC student experience (2018 & 2019) and mentoring experience with GeomScale (from 2020 to 2023).

  • Vissarion Fisikopoulos <vissarion.fisikopoulos at gmail.com> is an expert in mathematical software, computational geometry, and optimization, and has previous GSoC mentoring experience with Boost C++ libraries (2016-2017) and the R-project (2017).

Tests

Using either R or Python, conduct the following tasks:

  1. Generate a multivariate Gaussian time series $X \in \mathbb{R}^{T \times n}$ with a non-zero mean vector and non-zero correlations. Choose $T = 100$ and $n = 5$.
  2. Generate a univariate Gaussian time series $y \in \mathbb{R}^T$ with a positive mean.
  3. Using the synthetically generated data, implement and solve the following optimization problem using an open-source quadratic programming solver of your choice: $$w^* = \underset{w \in \mathcal{S}}{\mathrm{argmin}} \quad \frac{1}{T} ||Xw - y||_2^2$$

where $\mathcal{S}$ denotes the standard simplex $\mathcal{S} := \lbrace x \in \mathbb{R}^n \ |\ x_i \geq 0 \ \text{for} \ i=1,...,n, \ \sum\nolimits_{i=1}^n x_i = 1 \rbrace$.
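
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference solution: the means, volatilities, and the equicorrelation of 0.3 are arbitrary choices, and SciPy's general-purpose `SLSQP` routine (a sequential quadratic programming method) stands in for a dedicated quadratic programming solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
T, n = 100, 5

# Step 1: multivariate Gaussian series with non-zero mean and correlations.
# The parameter values below are illustrative assumptions.
mu = np.full(n, 0.01)                          # mean vector
vol = np.full(n, 0.02)                         # standard deviations
corr = np.full((n, n), 0.3) + 0.7 * np.eye(n)  # 0.3 off-diagonal, 1 on diagonal
cov = np.outer(vol, vol) * corr
X = rng.multivariate_normal(mu, cov, size=T)   # shape (T, n)

# Step 2: univariate Gaussian series with a positive mean.
y = rng.normal(loc=0.01, scale=0.02, size=T)   # shape (T,)

# Step 3: minimize (1/T) * ||Xw - y||_2^2 over the standard simplex.
def objective(w):
    r = X @ w - y
    return r @ r / T

def gradient(w):
    return 2.0 / T * X.T @ (X @ w - y)

w0 = np.full(n, 1.0 / n)  # equal weights: a feasible starting point
res = minimize(
    objective,
    w0,
    jac=gradient,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n,  # w_i >= 0 (the upper bound is implied by the sum)
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    # Tight tolerance because the objective values are small at this data scale.
    options={"ftol": 1e-12, "maxiter": 500},
)
w_star = res.x

print("weights:", np.round(w_star, 4))
print("mean squared tracking error:", objective(w_star))
```

Starting from the equal-weight portfolio keeps the first iterate feasible, and checking `res.success` together with the sum and sign of `w_star` is a quick sanity check before comparing against another solver.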

Hint: Notice that the squared 2-norm expands as $||Xw - y||_2^2 = w^{\top} \left( X^{\top}X \right) w - 2 (X^{\top}y)^{\top} w + y^{\top} y$.
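
As a sketch of how the hint is used: many quadratic programming solvers expect the problem in the form $\min_w \frac{1}{2} w^{\top} P w + q^{\top} w$ subject to linear constraints. The names $P$, $q$, and $\mathbf{1}$ (the vector of ones) are introduced here for illustration, and the convention assumed below may differ between solvers (some use a different sign for the linear term), so check the documentation of the solver you choose. Dividing the expansion by $T$ and dropping the constant term $\frac{1}{T} y^{\top} y$, which does not affect the minimizer, gives

$$\min_{w} \quad \frac{1}{2} w^{\top} P w + q^{\top} w \quad \text{s.t.} \quad \mathbf{1}^{\top} w = 1, \ w \geq 0, \qquad P = \frac{2}{T} X^{\top} X, \quad q = -\frac{2}{T} X^{\top} y$$

Since $X^{\top}X$ is positive semidefinite, the problem is convex, so any local minimizer the solver returns is also a global one.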