Need for speed project discussion, summer 2012 - maniteja123/scikit-learn GitHub Wiki

The initial iteration will focus on the linear models.

Choose some datasets for benchmarking regression problems. Our own dataset generators (sklearn.datasets) may be a good source. The datasets should expose as many potential gotchas as possible:

** Shape / simple structure: wide X, tall X, sparse X, etc.
** Mathematical structure: condition number, local optima, spectrum shape
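A minimal sketch of such a dataset suite, assuming the sklearn.datasets generators are used. `make_regression`'s `effective_rank` and `tail_strength` parameters shape the singular-value spectrum of X and therefore its condition number; the sparse case is built by zeroing most entries of a dense Gaussian matrix. The helper name `make_benchmark_datasets`, the dataset names and the sizes are hypothetical choices for illustration, not agreed benchmarks.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_regression


def make_benchmark_datasets(random_state=0):
    """Return a dict name -> (X, y) of hypothetical benchmark datasets."""
    rng = np.random.RandomState(random_state)
    datasets = {}

    # Tall X: many more samples than features.
    datasets["tall"] = make_regression(
        n_samples=2000, n_features=20, noise=1.0, random_state=random_state)

    # Wide X: many more features than samples.
    datasets["wide"] = make_regression(
        n_samples=50, n_features=1000, n_informative=10, noise=1.0,
        random_state=random_state)

    # Ill-conditioned X: a low effective rank gives a fast-decaying spectrum.
    datasets["ill_conditioned"] = make_regression(
        n_samples=500, n_features=100, effective_rank=5, tail_strength=0.01,
        noise=1.0, random_state=random_state)

    # Sparse X: keep roughly 1% of the entries, store as CSR.
    X_dense = rng.randn(1000, 500)
    X_dense[rng.rand(1000, 500) > 0.01] = 0.0
    X_sparse = sp.csr_matrix(X_dense)
    coef = rng.randn(500)
    y_sparse = X_sparse @ coef + 0.1 * rng.randn(1000)
    datasets["sparse"] = (X_sparse, y_sparse)

    return datasets


if __name__ == "__main__":
    for name, (X, y) in make_benchmark_datasets().items():
        if sp.issparse(X):
            print(name, X.shape, "nnz =", X.nnz)
        else:
            print(name, X.shape, "cond =", np.linalg.cond(X))
```

Printing the condition number next to each shape makes it easy to check that the suite really spans well- and ill-conditioned problems.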

Set up a (pilot) benchmark runner using these datasets. Over time this should grow into a speed.pypy-like (but hopefully cleaner) interface for monitoring the overall performance of scikit-learn.
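A pilot runner could start as simply as timing `fit` for each estimator on each dataset and keeping the best of a few repeats, which damps noise from other processes (the same idea `timeit` uses). The `run_benchmarks` helper, the estimator list and the repeat count are hypothetical; results are returned as plain records so that they could later be stored and plotted over time.

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge


def run_benchmarks(estimators, datasets, n_repeats=3):
    """Time estimator.fit on each dataset; return one record per pair.

    estimators: dict name -> zero-argument factory returning a fresh estimator
    datasets:   dict name -> (X, y)
    """
    results = []
    for est_name, make_estimator in estimators.items():
        for data_name, (X, y) in datasets.items():
            timings = []
            for _ in range(n_repeats):
                est = make_estimator()  # fresh estimator, no state reused
                start = time.perf_counter()
                est.fit(X, y)
                timings.append(time.perf_counter() - start)
            results.append({
                "estimator": est_name,
                "dataset": data_name,
                "best_time": min(timings),  # best-of-N wall-clock seconds
                "n_repeats": n_repeats,
            })
    return results


if __name__ == "__main__":
    datasets = {"tall": make_regression(n_samples=2000, n_features=20,
                                        random_state=0)}
    estimators = {
        "Ridge": lambda: Ridge(alpha=1.0),
        "Lasso": lambda: Lasso(alpha=0.1),
        "ElasticNet": lambda: ElasticNet(alpha=0.1),
    }
    for rec in run_benchmarks(estimators, datasets):
        print(f"{rec['estimator']:>10} on {rec['dataset']}: "
              f"{rec['best_time'] * 1e3:.2f} ms")
```

Passing factories rather than estimator instances guarantees that every repeat starts from an unfitted model.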

** vbench
** cProfile / line_profiler output (or automatically condensed output, i.e. the top-k worst lines)
** buildbot
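For the condensed profiling output, a top-k report can be produced with the standard library's cProfile and pstats modules alone. Note that cProfile works at function granularity; a true per-line "worst lines" report would need line_profiler instead. The `profile_top_k` helper and the choice of sorting by cumulative time are assumptions for illustration.

```python
import cProfile
import io
import pstats

import numpy as np
from sklearn.linear_model import Lasso


def profile_top_k(func, *args, k=10, sort_key=pstats.SortKey.CUMULATIVE,
                  **kwargs):
    """Run func(*args, **kwargs) under cProfile and return a text report
    restricted to the k most expensive functions."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args, **kwargs)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Drop directory prefixes, sort, and keep only the top k entries.
    stats.strip_dirs().sort_stats(sort_key).print_stats(k)
    return stream.getvalue()


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.randn(1000, 200)
    y = X @ rng.randn(200)
    print(profile_top_k(Lasso(alpha=0.1).fit, X, y, k=5))
```

Returning the report as a string, rather than printing it, would let a buildbot job archive it alongside the timing results.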

** Explore the possibility of using Cython + OpenMP on all platforms.