spinebil: Package to provide diagnostics for projection pursuit - rstats-gsoc/gsoc2025 GitHub Wiki

Background

This project will enhance the spinebil package to equip it with new methods for diagnosing projection pursuit (PP) indexes.

Related work

The package ferrn provides diagnostics for the projection pursuit guided tour, which is available in the tourr package. The paper by Laa et al. describes the methods currently available in the spinebil package and contains numerous references to the projection pursuit literature.

Details of your coding project

The project involves

  • preparing the package for acceptance on CRAN
  • adding routines to assess PP index behaviour, including
    • the practical scale of index values observed,
    • the change in index value as the projection moves from pure noise to pure structure,
    • the effect of sample size on the expected value of the index and on selected quantiles
  • testing these routines on existing indexes
  • providing examples of how to use the new functionality
  • developing revised scagnostic indexes that have better behaviour
  • documenting the code
  • writing a vignette with example usage
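
The sample-size routine can be illustrated with a minimal Python sketch (spinebil itself is an R package; Python is used here only for illustration). It simulates an index on pure Gaussian noise for several sample sizes and reports the mean, the 5% and 95% quantiles, and the observed minimum and maximum. The index `holes_like_index` is hypothetical: it follows the form (1 - mean(exp(-||y||²/2))) / (1 - exp(-d/2)), which resembles tourr's holes index, but tourr's exact code may differ. The helper `null_summary`, the sample sizes and the number of replicates are also assumptions.

```python
import math
import random
import statistics


def holes_like_index(points):
    """Hypothetical holes-style PP index for 2D projected data.

    ASSUMPTION: follows the form (1 - mean(exp(-||y||^2 / 2))) / (1 - exp(-d / 2)),
    resembling tourr's holes index; data are assumed centred and scaled.
    """
    d = 2
    n = len(points)
    total = sum(math.exp(-0.5 * (x * x + y * y)) for x, y in points)
    return (1 - total / n) / (1 - math.exp(-d / 2))


def null_summary(n, reps=200, seed=1):
    """Simulate the index on pure Gaussian noise of size n and summarise it."""
    rng = random.Random(seed)
    values = sorted(
        holes_like_index([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)])
        for _ in range(reps)
    )
    cuts = statistics.quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
    return {
        "n": n,
        "mean": statistics.fmean(values),
        "q05": cuts[0],
        "q95": cuts[-1],
        "min": values[0],
        "max": values[-1],
    }


# Compare the null distribution across (hypothetical) sample sizes.
for n in (20, 100, 500):
    s = null_summary(n)
    print(f"n={n:>3}: mean={s['mean']:.3f} q05={s['q05']:.3f} "
          f"q95={s['q95']:.3f} min={s['min']:.3f} max={s['max']:.3f}")
```

For this particular index the expected value under Gaussian noise does not depend on n: for standard bivariate normal data exp(-||y||²/2) is uniform on (0, 1), so the mean is about 0.5 / (1 - e⁻¹) ≈ 0.79, while the 5% to 95% spread narrows roughly like 1/√n. An index whose null mean drifts with n would show that drift in the same summary.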

Expected impact

The availability of this package will enable better development and testing of new projection pursuit indexes. Projection pursuit is widely used to reduce the dimension of high-dimensional data sets and to capture structure and associations that cannot be seen with principal component analysis. Because it is a linear dimension reduction method, it does not suffer from the hallucinations (spurious structure) that can arise from non-linear dimension reduction methods such as t-SNE and UMAP.

Mentors

Contributors, please contact the mentors below after completing at least one of the tests.

  • EVALUATING MENTOR: Di Cook is the author of numerous R packages, including tourr, nullabor and GGally, and has had extensive GSoC experience since 2012.
  • Co-mentor: Jess Leung is an expert in optimisation.
  • Co-mentor: Ursula Laa is the current maintainer of the spinebil package and has two years of GSoC experience.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Fork the package and run the package checks using devtools. Make the fixes needed for it to pass CRAN checks.
  • Medium: Add a GitHub Actions workflow that automatically runs the CRAN checks whenever code is pushed to GitHub.
  • Hard: Write a simulation to check the minimum and maximum values of the stringy index (available in the tourr package) that we might observe for any 2D pattern. Report the data generated for testing, and the minimum and maximum values that would be expected.
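
The Hard test could be approached with a simulation along the following lines. This is a minimal Python sketch, not a call to tourr: it assumes a stringy measure defined, as in one common graph-theoretic scagnostic, as the length of the longest path through the Euclidean minimum spanning tree (MST) divided by the MST's total edge length. tourr's stringy index may be computed or scaled differently, and the helpers `mst_edges`, `tree_diameter`, `stringy_like` and `simulate`, along with the patterns and sample sizes, are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Euclidean minimum spanning tree via Prim's algorithm (O(n^2)).

    Returns a list of (i, j, length) edges.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v], parent[v] = dist, u
    return edges


def tree_diameter(n, edges):
    """Longest path (sum of edge lengths) in a tree, via two farthest-vertex searches."""
    adj = [[] for _ in range(n)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))

    def farthest(start):
        dist = [None] * n
        dist[start] = 0.0
        stack = [start]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + w
                    stack.append(v)
        far = max(range(n), key=lambda i: dist[i])
        return far, dist[far]

    a, _ = farthest(0)
    return farthest(a)[1]


def stringy_like(points):
    """ASSUMPTION: stringy = MST diameter / total MST length; tourr may differ."""
    edges = mst_edges(points)
    total = sum(w for _, _, w in edges)
    return tree_diameter(len(points), edges) / total if total > 0 else 0.0


def simulate(n=100, reps=20, seed=2025):
    """Return (min, max) of the stringy-like value for a few hypothetical patterns."""
    rng = random.Random(seed)
    patterns = {
        "uniform": lambda: [(rng.random(), rng.random()) for _ in range(n)],
        "gaussian": lambda: [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)],
        "line": lambda: [(t, 2 * t) for t in [rng.random() for _ in range(n)]],
    }
    results = {}
    for name, generate in patterns.items():
        values = [stringy_like(generate()) for _ in range(reps)]
        results[name] = (min(values), max(values))
    return results


for name, (lo, hi) in simulate().items():
    print(f"{name:>8}: min={lo:.3f} max={hi:.3f}")
```

Under this definition the value lies in (0, 1]. Points on a straight line give essentially 1 because their MST is a single path, noise patterns give much lower values, and the theoretical minimum depends on n (a star-shaped tree with equal edges gives 2/(n - 1)), so the sample size is worth reporting alongside the observed minimum and maximum.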

Solutions of tests

Contributors, please post a link to your test results here.

  • EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
  1. Chengming Ma, GitHub profile, Test results and patches
  2. Mukul kumar, GitHub profile, Test results
  3. Vidhaan Khare, GitHub profile, Test results
  4. Tina Rashid Jafari, GitHub profile, Test results