Need for speed project discussion, summer 2012 - maniteja123/scikit-learn GitHub Wiki
The initial iteration will be over the linear models.
- Choose some datasets for benchmarking the regression problem. These should expose as many of the possible gotchas as we can; maybe use our own dataset generators.
** Shape / simple structure: wide X, tall X, sparse X, etc.
** Mathematical structure: condition number, local optima, spectrum shape
- Set up a (pilot) benchmark runner using these datasets. This will slowly build up into a nice speed.pypy.org-like (but hopefully cleaner) interface so we can monitor the overall performance of the scikit.
** vbench
** cProfile / line_profiler output (or automatically condensed output, i.e. the top-k worst lines)
** buildbot
- Lose nights obsessing over getting the plot to go lower and lower.
** Explore the possibility of Cython + OpenMP on all platforms.
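
To make the dataset item concrete, here is a rough sketch of a generator that controls both the shape of X and its condition number. The helper `make_regression_problem`, the `PROBLEMS` table, and the log-spaced spectrum are all assumptions made for illustration; none of them is an existing scikit-learn function, and a real benchmark might prefer the generators already in the scikit.

```python
import numpy as np


def make_regression_problem(n_samples, n_features, cond=1.0, density=1.0, seed=0):
    """Return (X, y, w) for a synthetic least-squares problem.

    Hypothetical helper, not part of scikit-learn. `cond` is the ratio of the
    largest to the smallest singular value of the dense X. `density` is the
    fraction of entries kept non-zero; masking happens after the spectrum is
    set, so a sparsified X no longer has exactly the requested condition number.
    """
    rng = np.random.default_rng(seed)
    k = min(n_samples, n_features)
    # Orthonormal factors from reduced QR decompositions of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((n_samples, k)))
    V, _ = np.linalg.qr(rng.standard_normal((n_features, k)))
    # Log-spaced singular values from 1 down to 1/cond (one possible spectrum shape).
    s = np.logspace(0, -np.log10(cond), k)
    X = (U * s) @ V.T
    if density < 1.0:
        # Zero out entries at random to get a sparse design matrix.
        X = X * (rng.random(X.shape) < density)
    w = rng.standard_normal(n_features)
    y = X @ w + 0.01 * rng.standard_normal(n_samples)
    return X, y, w


# Illustrative problem sizes only; real benchmarks would likely be larger.
PROBLEMS = {
    "tall": dict(n_samples=500, n_features=20),
    "wide": dict(n_samples=20, n_features=500),
    "ill_conditioned": dict(n_samples=200, n_features=50, cond=1e6),
    "sparse": dict(n_samples=200, n_features=50, density=0.05),
}

datasets = {name: make_regression_problem(**kw) for name, kw in PROBLEMS.items()}
```

Building X from its singular value decomposition makes the conditioning an explicit knob, so a slowdown on the ill-conditioned case can be attributed to the spectrum rather than to the shape.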
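
The pilot benchmark runner could start as small as the sketch below, which uses only the standard library. It times a callable with `timeit` and reduces `cProfile` output to the top-k entries by cumulative time. This is function-level rather than line-level, so it only approximates the "top-k worst lines" idea; line-level output would need line_profiler. The names `benchmark`, `top_k_hotspots`, and `slow_dot` are hypothetical, and `slow_dot` is just a placeholder for an estimator's `fit` call.

```python
import cProfile
import pstats
import timeit


def benchmark(func, repeat=3, number=1):
    """Best wall-clock time in seconds for a single call of `func`."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def top_k_hotspots(func, k=5):
    """Profile one call of `func` and return the k costliest entries.

    Each entry is (cumulative_seconds, function_name, filename, line).
    """
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, line, name) -> (cc, nc, tottime, cumtime, callers).
    rows = [
        (cumtime, name, filename, line)
        for (filename, line, name), (_, _, _, cumtime, _) in stats.stats.items()
    ]
    rows.sort(reverse=True)
    return rows[:k]


def slow_dot(n=20000):
    """Placeholder workload standing in for an estimator's fit call."""
    values = [float(i) for i in range(n)]
    return sum(x * x for x in values)


best_time = benchmark(slow_dot)
hotspots = top_k_hotspots(slow_dot, k=3)
```

Taking the minimum over repeats is a common way to reduce noise from other processes; storing `best_time` per commit is what a vbench- or buildbot-driven setup would then plot over time.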