Statistical Programming - HarlanH/julia GitHub Wiki
One goal of Julia is to be a platform for Statistical Programming that challenges R, Matlab's statistics toolbox, Clojure's Incanter, Python's Pandas, and a variety of commercial platforms. This is a tall order, but the flexibility, performance, and simplicity of the core Julia language give it a reasonable chance of succeeding in this ecosystem.
Here is a likely-incomplete list of early requirements to get to a stage where basic linear models could be easily built in Julia. Some are specific to statistical programming, while others are language-general.
- New data types that support `NA`. They might be called `IntData`, `NumData`, `BoolData`, `StrData`, etc. Issue #470.
- An updated testing framework to better allow test-driven development. Issue #8.
- A `FactorData` type, supporting optionally ordered enumerations with `NA`s.
- Either named arguments with defaults (e.g., `f(a, b, q=7, x="hi")`) or some alternative approach to options to functions. Issue #485.
- A `DataFrame` (or maybe `DataTable` is a better name) type, of heterogeneous `*Data` columns, complete with rownames and colnames. We should find out more about what John Chambers thinks about `data.frame`s in S/R and how they should be done better. We should also look at the `data.table` implementation and also at what Pandas is doing.
- A deep dive into the core libraries of R and Pandas and maybe other languages to learn from previous mistakes and develop a clean, modern, orthogonal set of methods for data manipulation. For the love of god, please let Julia not have a broken `sample()` function like R's...
- Formulas will probably be explicitly quoted expressions in Julia, a la `lm(:(y ~ x), dat)`. So we just need a set of conventions (and maybe an extra operator or two).
- `csvread()` and `dlmread()` only generate matrices. There should be similar functions that read into `DataFrame`s, as well as output them.
- `model.matrix` and related equivalent methods on formulas.
- A pure-Julia implementation of `lm()`.
- Packages/Libraries/Gems/whatever.
Please add to or edit this list as thinking evolves!
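To make the formula, `model.matrix`, and `lm()` items more concrete, here is a minimal sketch of how a quoted formula could be turned into a least-squares fit. Everything in it is an assumption for illustration, not a proposed API: the helper names `parse_formula`, `collect_terms`, and `model_matrix` are hypothetical, the data is a plain named tuple of vectors standing in for a future `DataFrame`, only `+` is understood on the right-hand side, `NA` handling is ignored, and the code is written against Julia 1.x, where `y ~ x` parses as a call to `~`.

```julia
# Hypothetical sketch: none of these names come from an existing package.

# Flatten the right-hand side of a formula into predictor names.
# A bare symbol is a single term; a `+` call is a list of terms.
collect_terms(s::Symbol) = [s]
function collect_terms(ex::Expr)
    (ex.head == :call && ex.args[1] == :+) ||
        error("only `+` is supported in this sketch")
    return reduce(vcat, [collect_terms(a) for a in ex.args[2:end]])
end

# Split a quoted formula such as :(y ~ x1 + x2) into the response name
# and the predictor names. Assumes `~` parses as an ordinary call.
function parse_formula(f::Expr)
    (f.head == :call && f.args[1] == :~ && length(f.args) == 3) ||
        error("expected a formula of the form :(y ~ x1 + x2)")
    return f.args[2], collect_terms(f.args[3])
end

# Build the design matrix: an intercept column, then one column per predictor.
function model_matrix(terms::Vector{Symbol}, dat)
    n = length(dat[first(terms)])
    return hcat(ones(n), (Float64.(dat[t]) for t in terms)...)
end

# Ordinary least squares: returns the coefficient vector (intercept first).
function lm(f::Expr, dat)
    response, terms = parse_formula(f)
    X = model_matrix(terms, dat)
    y = Float64.(dat[response])
    return X \ y   # backslash solves the least-squares problem for tall X
end

# Toy data generated from y = 1 + 2x, so the fit should recover
# an intercept close to 1 and a slope close to 2.
dat = (x = [1.0, 2.0, 3.0, 4.0], y = [3.0, 5.0, 7.0, 9.0])
coef = lm(:(y ~ x), dat)
```

Quoting the formula keeps it as plain data that ordinary functions can inspect, which is why only a set of conventions for the expression's shape is needed. A real implementation would also have to cover interactions, factors, and `NA`s.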
Packages for inspiration: