ruptures for change point detection - rstats-gsoc/gsoc2025 GitHub Wiki

Background

ruptures (1.7k stars on github as of Mar 2025) is a popular python implementation of classic and state-of-the-art change-point detection algorithms, which are reviewed in Selective review of offline change point detection methods. Rebecca Killick maintains a helpful web page with references.

Related work

The Time Series CRAN Task View has a section reviewing R packages for change-point detection. However, there are drawbacks to existing R/python packages.

Details of your coding project

The goal of this project is to re-implement classic (binary segmentation) and state-of-the-art (PELT, FPOP) change-point detection in modern C++ (using, for instance, Armadillo), which can be interfaced with R (and eventually Python).

  • Re-implement core algorithms in C++
    • common distributions (normal, Poisson, ...)
    • Dynamic programming
      • PELT for multi-variate data.
      • FPOP for uni-variate data.
    • Binary segmentation
  • R package with consistent API to access each algorithm.
  • Documentation
  • vignettes
  • tests

Expected impact

Ruptures is extremely popular so this project could have a large impact.

Mentors

Contributors, please contact the mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Charles Truong is the maintainer of ruptures in Python.
  • Toby Hocking [email protected] is the author of numerous R packages, and has been a mentor/admin for R-GSOC since 2013.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Write an R function getCumsum
    • Input is $X$, an array of size TxD containing doubles.
    • Output is $Y$, an array of size (T+1)xD such that the first row is filled with 0s and $Y[i] = \sum_{j<i} X[j]$. Here, $Y[i]$ is the i-th row of $Y$.
    • Write a C++ implementation using Armadillo and a R wrapper using RcppArmadillo.
  • Medium: Write a C++ class Cost
    • This class is constructed from a double array $X$ of size TxD.
    • Add a member function double eval(int start, int end) which returns $\sum_{t=\text{start}}^{\text{end}-1} \left| X[t] - m\right|^2$ where $|\cdot|$ is the Euclidean norm and $m$ is the empirical mean of $X$ on the segment $[i, j[$.
  • Hard: Write an efficient Cost::eval function with constant time complexity. Use the following relations to precompute relevant quantities.

$$ \sum_{t=\text{start}}^{\text{end}-1} \left| X[t] - m \right|^2 = \left(\sum_{t} \left| X[t] \right|^2\right) - \left(\left| \sum_{t} X[t] \right|^2 / (\text{end}-\text{star})\right) $$

and

$$ \sum_{t=\text{start}}^{\text{end}-1} \left| X[t] \right|^2 = \left(\sum_{t=1}^{\text{end}-1} \left| X[t] \right|^2 \right) - \left(\sum_{t=1}^{\text{start}-1} \left| X[t] \right|^2 \right) $$

and

$$ \sum_{t=\text{start}}^{\text{end}-1} X[t] = \left(\sum_{t=1}^{\text{end}-1} X[t] \right) - \left(\sum_{t=1}^{\text{start}-1} X[t] \right). $$

Solutions of tests

Contributors, please post a link to your test results here.

Minh Long Nguyen | Github | Solutions for all 3 tests
Dev Goel | Github | Solutions
Olivier Boulant | Github | Solutions