Whiteboard Notes - Josh-Joseph/data_science GitHub Wiki

The number in parentheses indicates priority, and a star marks the items we're starting with.

  • DS Process (0)
    • what distinguishes a "wish" from a point on the spectrum running from "nebulous problem statement" to "well-defined problem statement"?
      • (?) wish -> potential problem -> potentially solvable problem -> try to solve -> how well did it meet the wish -> wish -> (cycle back)
    • examples
      • "make money" -> pairs trading -> timescale? features? universe of stocks?
      • "make money" -> news event trading ->
      • "make money" -> momentum trading ->
      • "utilize our assets better" -> what do people want? recent successes? -> change in tech needed? custom? realtime?
      • "utilize our assets better" -> what do we have?
      • "utilize our assets better" -> why is the problem a wish? what is the current state? -> performance? "coolness"?
      • robust detection -> outlier rejection
    • first pass at ordering
      • problem statement
      • metric
      • what metric value is "good enough"?
      • is the baseline already "good enough"?
      • validation
      • modeling
        • assumptions
        • features
      • computation
  • Real-world Public-facing (1)
    • GitHub (start now)
      • wiki hosted here
    • Slack (start now)
    • website (not yet)
      • hosted
      • self-admin
      • domain: skynet.institute
  • Visualization (2)
  • Methods (2)
    • Neural Networks (convolutional) *
    • Causality
    • Bayesian (nonparametric)
    • scikit-learn
    • Azure / Amazon / Google ML
    • ensembles *
  • System, distributed computation, databases (3)
    • Aerospike (DB) *
    • TensorFlow *
    • Spark/Hadoop
    • Streaming
      • Storm
      • Kafka
    • GPU/Co-processor
      • OpenCL
    • PostgreSQL
    • Couchbase
    • Greenplum/Vertica
    • vectorized instances/linear algebra
    • probabilistic programming
      • PyMC
  • Language (3)
    • Python
    • Julia
    • R
    • Scala
    • C++
    • OpenCL
    • Church
  • Data sets (4)
    • Kaggle
    • HeroX
    • academic data sets
  • Validation (5)
  • Business Problems Consulting Taxonomy (6)
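
The "first pass at ordering" under DS Process puts the metric, the "good enough" value, and the baseline check ahead of any modeling. A minimal sketch of that check follows, assuming a regression-style problem, mean absolute error as the metric, and a predict-the-training-mean baseline. The function names, the toy numbers, and the thresholds are all hypothetical illustrations and do not come from the whiteboard.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Metric (assumed for illustration): average absolute difference."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate_baseline(train_y, valid_y, good_enough):
    """Score a predict-the-training-mean baseline on held-out data.

    Returns (error, is_good_enough). Lower error is better, so the
    baseline passes when its error is at or below the threshold.
    """
    baseline_prediction = mean(train_y)
    predictions = [baseline_prediction] * len(valid_y)
    error = mean_absolute_error(valid_y, predictions)
    return error, error <= good_enough


# Hypothetical toy data: a training split and a held-out validation split.
train_y = [10.0, 12.0, 11.0, 9.0, 13.0]
valid_y = [10.0, 12.0, 11.0]

# Loose threshold: the baseline may already be "good enough".
loose_error, loose_ok = evaluate_baseline(train_y, valid_y, good_enough=1.0)

# Tight threshold: the baseline falls short, so modeling is worth trying.
tight_error, tight_ok = evaluate_baseline(train_y, valid_y, good_enough=0.5)

print(f"baseline MAE = {loose_error:.3f}")
print("good enough at 1.0?", loose_ok)
print("good enough at 0.5?", tight_ok)
```

If the baseline already clears the threshold, the later steps (modeling, computation) may not be needed at all, which is presumably why the ordering asks that question first.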