Machine Learning Workflow - ignacio-alorre/Data-Science GitHub Wiki

Workflow to follow in Applied Machine Learning

  • 1 - Problem Definition

    • Problem Description
      • Informal description
      • Formal description
      • Assumptions
    • Provided Data
      • Constraints imposed on data
      • Attribute definition
    • Motivation
      • Motivation
      • Benefits
      • Use
    • Manual Solution
  • 2 - Analyze Data

    • Summarize Data
      • Data Structure
      • Data Distribution
    • Visualize Data
      • Attribute Histograms
      • Pairwise scatterplots of attributes
  • 3 - Prepare Data

    • Select Data
    • Preprocess Data
      • Formatting
      • Cleaning
      • Sampling
    • Transform Data
      • Scaling
      • Decomposition
      • Aggregation
  • 4 - Evaluate Algorithms

    • Test Harness and Options
    • Explore and select algorithms
    • Interpret and report results
  • 5 - Improve Results

    • Algorithm Tuning
    • Ensemble methods
      • Bagging
      • Boosting
      • Blending
    • Extreme Feature Engineering
  • 6 - Present Results

    • Present Results
      • Context
      • Problem
      • Solution
      • Findings
      • Limitations
      • Conclusions
    • Operationalize Algorithm
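
As a rough illustration of how a few of the steps above could look in code, here is a minimal sketch using only the Python standard library. It touches "Summarize Data" (step 2), "Scaling" (step 3) and "Test Harness and Options" (step 4). All function names, the toy values and the majority-class baseline are assumptions made up for this example; they are not part of the workflow itself, and a real project would more likely rely on a dedicated library.

```python
import random
import statistics


def summarize(column):
    """Step 2 (Summarize Data): basic distribution statistics for one attribute."""
    return {
        "min": min(column),
        "max": max(column),
        "mean": statistics.mean(column),
        "stdev": statistics.stdev(column),
    }


def min_max_scale(column):
    """Step 3 (Transform Data, Scaling): rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def k_fold_indices(n_rows, k, seed=0):
    """Step 4 (Test Harness): shuffle row indices and split them into k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class_baseline(train_labels):
    """Hypothetical baseline 'algorithm': always predict the most common label."""
    return statistics.mode(train_labels)


def cross_validate(labels, k=3, seed=0):
    """Step 4: mean accuracy of the baseline over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held]
        prediction = majority_class_baseline(train)
        correct = sum(1 for i in held_out if labels[i] == prediction)
        scores.append(correct / len(held_out))
    return statistics.mean(scores)


if __name__ == "__main__":
    # Toy values invented for illustration only.
    attribute = [2.0, 4.0, 6.0, 8.0]
    labels = ["yes", "yes", "no", "yes", "no", "yes"]
    print(summarize(attribute))
    print(min_max_scale(attribute))
    print(cross_validate(labels, k=3))
```

The baseline gives a reference score that any algorithm explored in step 4 should beat before tuning or ensembling (step 5) is worth the effort.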