Statistical Programming - HarlanH/julia GitHub Wiki

One goal of Julia is to be a platform for Statistical Programming that challenges R, Matlab's statistics toolbox, Clojure's Incanter, Python's Pandas, and a variety of commercial platforms. This is a tall order, but the flexibility, performance, and simplicity of the core Julia language give it a reasonable chance of succeeding in this ecosystem.

Here is a likely-incomplete list of early requirements to get to a stage where basic linear models could be easily built in Julia. Some are specific to statistical programming, while others are language-general.

  • New data types that support NA. They might be called IntData, NumData, BoolData, StrData, etc. Issue #470.
  • An updated testing framework to better allow test-driven development. Issue #8.
  • A FactorData type, supporting optionally ordered enumerations with NAs.
  • Either named arguments with defaults (e.g., f(a, b, q=7, x="hi")) or some alternative approach to options to functions. Issue #485.
  • A DataFrame (or maybe DataTable is a better name) type, of heterogeneous *Data columns, complete with rownames and colnames. We should find out more about what John Chambers thinks about data.frames in S/R and how they should be done better. We should also look at the data.table implementation and also at what Pandas is doing.
  • A deep dive into the core libraries of R and Pandas and maybe other languages to learn from previous mistakes and develop a clean, modern, orthogonal set of methods for data manipulation. For the love of god, please let Julia not have a broken sample() function like R's...
  • Formulas will probably be explicitly quoted expressions in Julia, à la lm(:(y ~ x), dat). So we just need a set of conventions (and maybe an extra operator or two).
  • csvread() and dlmread() only generate matrices. There should be similar functions that read into DataFrames, as well as functions that write DataFrames back out.
  • Equivalents of R's model.matrix and related methods that operate on formulas.
  • A pure-Julia implementation of lm().
  • Packages/Libraries/Gems/whatever.

Please add or edit this list as thinking evolves!
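
To make the formula, model-matrix, and lm() items above a little more concrete, here is a minimal sketch of how a quoted formula could be turned into a design matrix and fitted by least squares. It is only an illustration under several assumptions: it is written in current Julia syntax, which may differ from the language at the time this list was drafted; the names parse_formula, model_matrix, and lm_sketch are hypothetical; a plain named tuple of vectors stands in for the DataFrame type that does not exist yet; and it handles only additive numeric predictors with an implicit intercept, with no NA or factor support.

```julia
using LinearAlgebra  # standard library; matrix methods for the `\` solve

# Hypothetical helper: split a quoted formula such as :(y ~ x1 + x2)
# into a response symbol and a vector of predictor symbols.
function parse_formula(f::Expr)
    # :(y ~ x) parses as a call to the `~` operator with two arguments.
    (f.head === :call && f.args[1] === :~) ||
        error("expected a formula like :(y ~ x)")
    response = f.args[2]
    rhs = f.args[3]
    if rhs isa Symbol
        predictors = [rhs]
    elseif rhs isa Expr && rhs.head === :call && rhs.args[1] === :+
        predictors = Symbol[a for a in rhs.args[2:end]]
    else
        error("this sketch only supports sums of plain variable names")
    end
    return response, predictors
end

# Hypothetical helper: an intercept column of ones followed by one
# column per predictor. `dat` is anything indexable by Symbol.
function model_matrix(predictors, dat)
    n = length(dat[first(predictors)])
    return hcat(ones(n), (Float64.(dat[p]) for p in predictors)...)
end

# Hypothetical lm(): ordinary least squares via the `\` operator.
# Returns the coefficient vector (intercept first).
function lm_sketch(f::Expr, dat)
    response, predictors = parse_formula(f)
    X = model_matrix(predictors, dat)
    y = Float64.(dat[response])
    return X \ y
end

# Toy data generated from y = 1 + 2x, so the fitted coefficients
# should be close to an intercept of 1 and a slope of 2.
dat = (x = [1.0, 2.0, 3.0, 4.0], y = [3.0, 5.0, 7.0, 9.0])
coef = lm_sketch(:(y ~ x), dat)
```

The point of the sketch is that the quoted expression carries all the structure a formula needs, so the conventions mentioned in the list mostly come down to deciding how operators such as + (and any extras) inside the expression map to columns of the model matrix.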

Packages for inspiration: