Whiteboard Notes - Josh-Joseph/data_science GitHub Wiki
The number in parentheses indicates priority, and items with a star indicate where we're starting.
- DS Process (0)
- what is the distinction between a "wish" and the spectrum from "nebulous problem statement" to "well-defined problem statement"?
- (?) wish -> potential problem -> potentially solvable problem -> try to solve -> how well did it meet the wish -> wish -> (cycle back)
- examples
- "make money" -> pairs trading -> timescale? features? universe of stocks?
- "make money" -> news event trading ->
- "make money" -> momentum trading ->
- "utilize our assets better" -> what do people want? recent successes? -> change in tech needed? custom? realtime?
- "utilize our assets better" -> what do we have?
- "utilize our assets better" -> why is the problem a wish? what is the current state? -> performance? "coolness"?
- robust detection -> outlier rejection
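As one possible reading of "robust detection -> outlier rejection", here is a minimal sketch that drops points far from the median, using the median absolute deviation (MAD) as the scale. The function name `reject_outliers`, the cutoff `k=3.0`, and the sample readings are illustrative assumptions, not something specified in these notes.

```python
from statistics import median


def reject_outliers(values, k=3.0):
    """Keep values within k robust standard deviations of the median.

    Hypothetical helper: the MAD-based rule and k=3.0 are assumptions.
    """
    center = median(values)
    # Median absolute deviation: a scale estimate that outliers barely move.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the points sit on the median; nothing to scale by.
        return list(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [v for v in values if abs(v - center) <= k * scale]


# Made-up readings with one obvious outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 42.0]
print(reject_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value can inflate the latter enough to hide itself.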
- first pass at ordering
- problem statement
- metric
- what metric value is "good enough"?
- is the baseline already "good enough"?
- validation
- modeling
- assumptions
- features
- computation
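The "metric" and "is the baseline already good enough?" steps in the ordering above can be sketched as a small check run before any modeling. This is only an illustration: the choice of mean absolute error, the predict-the-training-mean baseline, the threshold of 1.0, and the toy numbers are all assumptions.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Metric (assumed for illustration): average absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def baseline_is_good_enough(y_train, y_valid, threshold):
    """Score a trivial baseline on held-out data and compare to the target.

    The baseline predicts the training mean for every validation point.
    Returns the baseline error and whether it already meets the threshold.
    """
    baseline_prediction = mean(y_train)
    predictions = [baseline_prediction] * len(y_valid)
    error = mean_absolute_error(y_valid, predictions)
    return error, error <= threshold


# Toy data, invented for this example.
y_train = [10.0, 12.0, 11.0, 13.0]
y_valid = [11.0, 12.0, 14.0]

error, good_enough = baseline_is_good_enough(y_train, y_valid, threshold=1.0)
print(f"baseline MAE = {error:.2f}, good enough: {good_enough}")
```

If the baseline already meets the threshold, the later steps (modeling, features, computation) may not be worth the effort; if not, the baseline error is the number any model has to beat on the same validation data.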
- Real-world Public-facing (1)
- GitHub (start now)
- wiki hosted here
- Slack (start now)
- website (not yet)
- hosted
- self-admin
- domain: skynet.institute
- Visualization (2)
- Methods (2)
- Neural Networks (convolutional) *
- Causality
- Bayesian (nonparametric)
- scikit-learn
- Azure / Amazon / Google ML
- ensembles *
- System, distributed computation, databases (3)
- Aerospike (DB) *
- TensorFlow *
- Spark/Hadoop
- Streaming
- Storm
- Kafka
- GPU/Co-processor
- OpenCL
- PostgreSQL
- Couchbase
- Greenplum/Vertica
- vectorized instances/linear algebra
- probabilistic programming
- PyMC
- Language (3)
- Python
- Julia
- R
- Scala
- C++
- OpenCL
- Church
- Data sets (4)
- Kaggle
- HeroX
- academic data sets
- Validation (5)
- Business Problems Consulting Taxonomy (6)