Statistical Programming - HarlanH/julia GitHub Wiki
One goal of Julia is to be a platform for Statistical Programming that challenges R, Matlab's statistics toolbox, Clojure's Incanter, Python's Pandas, and a variety of commercial platforms. This is a tall order, but the flexibility, performance, and simplicity of the core Julia language give it a reasonable chance of succeeding in this ecosystem.
Here is a likely-incomplete list of early requirements to get to a stage where basic linear models could be easily built in Julia. Some are specific to statistical programming, while others are language-general.
- New data types that support `NA`. They might be called `IntData`, `NumData`, `BoolData`, `StrData`, etc. Issue #470.
- An updated testing framework to better allow test-driven development. Issue #8.
- A `FactorData` type, supporting optionally ordered enumerations with `NA`s.
- Either named arguments with defaults (e.g., `f(a, b, q=7, x="hi")`) or some alternative approach to passing options to functions. Issue #485.
- A `DataFrame` (or maybe `DataTable` is a better name) type, of heterogeneous `*Data` columns, complete with rownames and colnames. We should find out more about what John Chambers thinks about `data.frame`s in S/R and how they should be done better. We should also look at the `data.table` implementation and also at what Pandas is doing.
- A deep dive into the core libraries of R and Pandas and maybe other languages to learn from previous mistakes and develop a clean, modern, orthogonal set of methods for data manipulation. For the love of god, please let Julia not have a broken `sample()` function like R's...
- Formulas will probably be explicitly quoted expressions in Julia, à la `lm(:(y ~ x), dat)`. So we just need a set of conventions (and maybe an extra operator or two).
- `csvread()` and `dlmread()` only generate matrices. There should be similar functions that read into `DataFrame`s, as well as output them.
- `model.matrix` and related equivalent methods on formulas.
- A pure-Julia implementation of `lm()`.
- Packages/Libraries/Gems/whatever.
Please add or edit this list as thinking evolves!
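To make the formula, `model.matrix`, and `lm()` items a little more concrete, here is a rough sketch of how they might fit together. It is only an illustration under several assumptions, not a proposed design: the helper names `parse_formula`, `collect_terms`, and `model_matrix` are hypothetical, a plain `Dict` of columns stands in for the `DataFrame` type that does not exist yet, only additive terms are handled, and it assumes a current Julia in which `:(y ~ x)` parses as an ordinary call to the `~` operator.

```julia
using LinearAlgebra  # standard library; provides the matrix `\` solver

# Hypothetical helper: flatten the right-hand side of a formula
# (a symbol, or a sum of symbols) into a vector of predictor names.
collect_terms(t::Symbol) = [t]
function collect_terms(t::Expr)
    if !(t.head == :call && t.args[1] == :+)
        error("only additive terms are supported in this sketch")
    end
    return reduce(vcat, [collect_terms(a) for a in t.args[2:end]])
end

# Hypothetical helper: split a quoted formula such as :(y ~ x1 + x2)
# into the response name and the predictor names.
function parse_formula(f::Expr)
    if !(f.head == :call && f.args[1] == :~ && length(f.args) == 3)
        error("expected a formula like :(y ~ x)")
    end
    return f.args[2], collect_terms(f.args[3])
end

# Hypothetical stand-in for model.matrix: an intercept column of ones
# followed by one column per predictor.
function model_matrix(predictors, dat)
    n = length(dat[first(predictors)])
    return hcat(ones(n), [dat[p] for p in predictors]...)
end

# A bare-bones lm(): ordinary least squares via the backslash operator.
# Returns a Dict mapping each term name to its estimated coefficient.
function lm(f::Expr, dat)
    response, predictors = parse_formula(f)
    X = model_matrix(predictors, dat)
    y = dat[response]
    coefs = X \ y  # least-squares solution for a non-square X
    names = pushfirst!(copy(predictors), :intercept)
    return Dict(zip(names, coefs))
end

# Toy data generated from y = 1 + 2x, stored as a Dict of columns.
dat = Dict(:x => [1.0, 2.0, 3.0, 4.0], :y => [3.0, 5.0, 7.0, 9.0])
fit = lm(:(y ~ x), dat)
```

Because the toy data lie exactly on a line, `fit[:intercept]` and `fit[:x]` should come out close to 1 and 2. A real implementation would also need `NA` handling, factor expansion, interaction terms, and standard errors, none of which this sketch attempts.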
Packages for inspiration: