changepoint.online - rstats-gsoc/gsoc2018 GitHub Wiki
Detecting changes in statistical properties of a time series is important in a large number of fields. The original motivation for changepoint analysis was in sequential process control, where a decision as to whether a change has occurred must be made in real time. With the increasing amount of sensor and similar online data, there is a clear need for online changepoint detection methods to be available through R packages.
Similarly, more and more users of our changepoint and ecp packages are requesting online functionality.
There are many R packages available for offline changepoint detection but, to our knowledge, only one for online changepoint detection (cpm). That package implements the traditional “resetting” methodology, whereby previous data is forgotten once a change has occurred. In contrast, this project would bring the accuracy benefits of the offline methodology to the online setting, allowing users to apply state-of-the-art offline methods in a computationally efficient manner for online use.
This project will create the changepoint.online R package. The package will mirror the functionality of the changepoint and ecp packages, but for an online setting. More specifically:
- Set up a GitHub repo for changepoint.online, with TravisCI for GNU/Linux testing, AppVeyor for Windows testing and Coveralls for code coverage.
- Break the back end code for the PELT and ecp algorithms into initialization and update functions.
- Write user facing functions and plotting tools.
- Write some extensive test cases using testthat and building on tests in the changepoint and ecp packages. Goal: 100% coverage in both R and C code by the end of summer. If time allows, port applicable tests back to changepoint and ecp to increase coverage.
- Can test on Windows via win-builder.
- Write a vignette describing how to use the package.
- Create a shiny app demonstrating the functionality of the package for real time analysis.
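To illustrate what splitting an algorithm into initialization and update functions could look like, here is a minimal base-R sketch. It is not the changepoint or changepoint.online API: the names op_init, op_update and op_changepoints are hypothetical, the cost is assumed to be the residual sum of squares around each segment mean (a change-in-mean model), and the penalty value is an arbitrary illustrative choice. The update step applies the penalised optimal-partitioning recursion with PELT-style pruning to each new observation, reusing stored cumulative sums rather than revisiting old data.

```r
# Hypothetical sketch: names and structure are assumptions, not package code.

# Initialization: create an empty state holding everything the update needs.
op_init <- function(penalty) {
  list(
    penalty = penalty,
    n       = 0L,          # number of observations seen so far
    csum    = 0,           # csum[i + 1]  = sum of the first i observations
    csum2   = 0,           # csum2[i + 1] = sum of their squares
    opt     = -penalty,    # opt[i + 1] = optimal penalised cost of the first i points
    last    = integer(0),  # last[t] = most recent changepoint before time t
    cand    = 0L           # candidate last-changepoint positions still in play
  )
}

# Update: absorb new observations one at a time.
op_update <- function(state, x_new) {
  for (x in x_new) {
    t <- state$n + 1L
    state$csum  <- c(state$csum,  state$csum[t]  + x)
    state$csum2 <- c(state$csum2, state$csum2[t] + x^2)

    # Cost of the segment (s + 1):t for every candidate s, from cumulative sums.
    s        <- state$cand
    seg_sum  <- state$csum[t + 1L] - state$csum[s + 1L]
    seg_cost <- (state$csum2[t + 1L] - state$csum2[s + 1L]) - seg_sum^2 / (t - s)
    total    <- state$opt[s + 1L] + seg_cost + state$penalty

    best       <- which.min(total)
    state$opt  <- c(state$opt, total[best])
    state$last <- c(state$last, s[best])

    # Pruning: drop candidates that can no longer be optimal in the future.
    keep       <- (total - state$penalty) <= total[best]
    state$cand <- c(s[keep], t)
    state$n    <- t
  }
  state
}

# Read off the current best changepoint locations by backtracking.
op_changepoints <- function(state) {
  cps <- integer(0)
  t <- state$n
  while (t > 0L) {
    s <- state$last[t]
    if (s > 0L) cps <- c(s, cps)
    t <- s
  }
  cps
}

# Usage: feed the data in two batches, as if it arrived over time.
x <- c(rep(0, 50), rep(10, 50))                 # toy data with a mean shift after t = 50
state <- op_init(penalty = 2 * log(length(x)))  # illustrative penalty choice
state <- op_update(state, x[1:60])
state <- op_update(state, x[61:100])
op_changepoints(state)
```

Because op_update processes observations one at a time, feeding the data in batches gives the same state as feeding it all at once, which is the property an online interface needs.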
The package will provide a new and important alternative to the “resetting” algorithms currently available. The package will also include parametric models which are not included in cpm. Additionally, we have received a considerable number of requests for this functionality over the last year, so we expect the package to be well used by the community.
Students, please contact mentors below after completing at least one of the tests below.
- Rebecca Killick <[email protected]>
- David Matteson <[email protected]>
Easy: Download and install the changepoint and ecp packages. Write a for loop to analyze a data set with an increasing number of data points. Graph the output, adding a new timepoint in each iteration of the loop and updating the best changepoint locations.
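As a rough illustration of the kind of loop the easy test describes, the sketch below re-runs an offline method from the changepoint package on a growing prefix of the data and redraws the plot each time. The simulated data, the starting length min_n and the choice of cpt.mean with PELT are assumptions made for the example, not requirements of the test. It assumes the changepoint package is installed.

```r
library(changepoint)  # assumes the package is installed from CRAN

set.seed(1)
# Simulated data (an assumption for this example): one mean shift after t = 50.
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))

min_n <- 10                                # hypothetical starting length
history <- vector("list", length(x))       # changepoints found at each n

for (n in min_n:length(x)) {
  # Re-run the offline analysis on the first n observations.
  fit <- cpt.mean(x[1:n], method = "PELT")
  history[[n]] <- cpts(fit)                # current best changepoint locations

  # Redraw the series seen so far and mark the current changepoints.
  plot(x[1:n], type = "l", xlim = c(1, length(x)), ylim = range(x),
       xlab = "Time", ylab = "Value", main = paste("n =", n))
  abline(v = history[[n]], col = "red", lty = 2)
}
```

Note that this refits from scratch at every step, so the total work grows quickly with the length of the series; avoiding that repeated work is the point of the proposed package.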
Medium: Fork the changepoint, changepoint.np, EnvCpt or ecp packages on GitHub and write some new tests to increase the code coverage. Commit these back to the main repository.
Hard: Make your easy task into R functions, remembering to include checks on your code. Write a package which includes tests for your functions. Upload to GitHub and link in TravisCI testing and code coverage via covr.
Students, please post a link to your test results here.
Name: Andrew Connell
Email: [email protected]
University: Lancaster University
Course: BSc Mathematics and Statistics
Solution to Easy Test: Easy Test
Name: Daniel Morales
Email: [email protected]
University: Instituto Tecnológico de Querétaro, MX
Degree: Computer Systems Engineering
Solution to Easy Test: Easy Test