Machine Learning Pipelines And Stacks - BKJackson/BKJackson_Wiki GitHub Wiki

ML Pipeline Orchestration

What does Orchestration mean?

  • Managing dependencies among tasks
  • Scheduling workflows
  • Monitoring their execution
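The first and third ideas, dependency management and monitoring, can be sketched with Python's standard library alone (graphlib, Python 3.9+) rather than any orchestrator's API. The task names and dependency graph below are hypothetical. Each task runs only after all of its upstream tasks, and the recorded run history is a crude stand-in for monitoring. Scheduling, meaning running the graph on a timer or trigger, is left to a real tool such as Airflow.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ML pipeline: each task name maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "validate": {"extract"},
    "featurize": {"validate"},
    "train": {"featurize"},
    "evaluate": {"train"},
    "report": {"evaluate", "validate"},
}

def run_pipeline(deps, tasks):
    """Run every task after all of its dependencies, recording order and duration."""
    history = []
    for name in TopologicalSorter(deps).static_order():
        start = time.perf_counter()
        tasks[name]()                        # execute the task
        elapsed = time.perf_counter() - start
        history.append((name, elapsed))      # minimal "monitoring": what ran, how long
    return history

# Placeholder task bodies; a real pipeline would load data, train models, etc.
tasks = {name: (lambda n=name: print(f"running {n}")) for name in DEPENDENCIES}
history = run_pipeline(DEPENDENCIES, tasks)
```

A dedicated orchestrator adds what this sketch omits: retries, parallel execution of independent tasks, persistent state, and alerting.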

Airflow - Your new ML pipeline orchestration workflow management platform

Sklearn pipelines

Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - KDnuggets
A Deep Dive Into Sklearn Pipelines - Kaggle
A Simple Guide to Scikit-learn Pipelines
6.1. Pipelines and composite estimators - Sklearn docs
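A minimal scikit-learn Pipeline, assuming the bundled iris dataset and a scaler plus logistic regression as illustrative steps (the step names "scale" and "clf" are arbitrary). Because the scaler is a pipeline step, it is fit only on the training split, which avoids leaking test-set statistics into preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)          # fits the scaler, transforms, then fits the classifier
accuracy = pipe.score(X_test, y_test)
```

Step parameters are addressed as `<step>__<param>`, e.g. `pipe.set_params(clf__C=0.5)`, which is also how grid searches reach inside a pipeline.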

Pandas pipelines

Build pipelines with Pandas using “pdpipe” - Nov. 30, 2019
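To avoid guessing at pdpipe's API, this sketch shows the same chaining idea with plain pandas `DataFrame.pipe`, which passes the frame as the first argument to each function. The step functions and column names are hypothetical.

```python
import pandas as pd

def drop_columns(df, columns):
    """Return a copy of df without the given columns."""
    return df.drop(columns=columns)

def fill_missing(df, value):
    """Replace NaN values with a constant."""
    return df.fillna(value)

def add_ratio(df, num, den, name):
    """Add a derived column equal to num / den."""
    return df.assign(**{name: df[num] / df[den]})

raw = pd.DataFrame({
    "id": [1, 2, 3],
    "clicks": [10.0, None, 4.0],
    "views": [100.0, 50.0, 8.0],
})

# Each .pipe call receives the output of the previous step.
clean = (
    raw.pipe(drop_columns, columns=["id"])
       .pipe(fill_missing, value=0.0)
       .pipe(add_ratio, num="clicks", den="views", name="ctr")
)
```

Keeping each step a pure function of its input frame makes steps easy to reorder, test in isolation, and reuse across datasets.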

PyTorch pipelines

The 4 steps necessary before fitting a machine learning model - Mar. 6, 2020
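A sketch of a PyTorch input pipeline built from `torch.utils.data.Dataset` and `DataLoader` on synthetic data. It does not claim to reproduce the four steps from the linked article; the `TabularDataset` class and its standardization step are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class TabularDataset(Dataset):
    """Hypothetical dataset: standardizes features using given or computed statistics."""

    def __init__(self, features, labels, mean=None, std=None):
        # Pass the training set's mean/std when building validation/test datasets.
        self.mean = features.mean(dim=0) if mean is None else mean
        self.std = features.std(dim=0) if std is None else std
        self.features = (features - self.mean) / self.std
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

torch.manual_seed(0)
X = torch.randn(100, 4) * 3 + 5      # synthetic features
y = (X[:, 0] > 5).long()              # synthetic binary labels

dataset = TabularDataset(X, y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch_X, batch_y = next(iter(loader))
```

The `DataLoader` handles batching and shuffling, so the training loop only iterates over `loader`.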

Easy data labeling with Snorkel

No labels? No problem! Machine learning without labels using Snorkel - Mar. 4, 2020
Introducing Snorkel
Snorkel
Snorkel Intro Tutorial: Data Slicing
Data Augmentation with Snorkel
Snorkel for Image Data
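The core idea behind Snorkel, labeling data with several noisy heuristic "labeling functions" and combining their votes, can be sketched in plain Python without Snorkel itself. The labeling functions below are hypothetical spam heuristics, and -1 marks an abstention. This uses a simple majority vote; Snorkel's label model goes further by estimating how accurate each labeling function is.

```python
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical labeling functions for spam detection; each returns a label or ABSTAIN.
def lf_contains_link(text):
    return POSITIVE if "http" in text.lower() else ABSTAIN

def lf_mentions_free(text):
    return POSITIVE if "free" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return NEGATIVE if len(text.split()) <= 3 else ABSTAIN

LFS = [lf_contains_link, lf_mentions_free, lf_short_reply]

def majority_vote(text, lfs=LFS):
    """Combine labeling-function votes; ties and all-abstain cases return ABSTAIN."""
    votes = [lf(text) for lf in lfs]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN                  # tie between classes: leave unlabeled
    return ranked[0][0]
```

For example, `majority_vote("thanks a lot")` returns `NEGATIVE`, while `"free now"` gets one vote for each class and stays unlabeled.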

Putting it all Together

Advanced Spark PANCAKE STACK
Grappa - Grappa makes an entire cluster look like a single, powerful, shared-memory machine.

Create a complete, end-to-end streaming data analytics pipeline

Pipeline.io
pipeline.io - From Notebook Training to Production Serving in 1-Click. Works with Jupyter, Spark, TensorFlow, Zeppelin, Docker, NVIDIA CUDA GPUs, and more.
Fork on Github: https://github.com/fluxcapacitor/pipeline/
Pull from Docker hub: https://hub.docker.com/r/fluxcapacitor/pipeline/
Pipeline Environment Setup Wiki
Pipeline TensorFlow Examples
Pipeline ML Prediction Layer Example

Docker
Docker - Build, ship, and run any app, anywhere. Dev-test pipeline automation.
Docker Docs
What's so special about Docker and containers in general? Reddit discussion.

Interactively analyze, approximate, and visualize streaming data

See NiFi, Kafka, Spark Streaming, Flink
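One classic way to approximate a stream whose length is unknown in advance is reservoir sampling (Algorithm R), which keeps a uniform random sample of k items in fixed memory. This is an illustrative technique, not a description of how NiFi, Kafka, Spark Streaming, or Flink work internally.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)        # inclusive bounds: i + 1 possible values
            if j < k:
                reservoir[j] = item      # keep item i with probability k / (i + 1)
    return reservoir

sample = reservoir_sample(range(1_000_000), k=10, seed=42)
```

Summary statistics computed on the reservoir, such as a mean, then serve as cheap approximations of the full stream's statistics.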

Generate machine learning, graph & NLP recommendation models on streaming data

  • Cluster-based Recommendation: Spark ML, scikit-learn
  • Graph-based Recommendation: Spark ML, Spark Graph
  • Collaborative-filtering Recommendation: Spark ML
  • NLP-based Recommendation: CoreNLP, NLTK
  • Geo-based Recommendation: Elasticsearch
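A small item-based collaborative-filtering sketch in NumPy, assuming a hypothetical user-by-item rating matrix where 0 means "not rated". Spark ML's collaborative filtering uses a different method (matrix factorization via ALS); this version scores each unrated item by its cosine similarity to the items the user already rated, weighted by those ratings.

```python
import numpy as np

# Hypothetical user x item rating matrix (rows = users, columns = items, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 4, 1, 0],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)

def item_similarity(r):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0              # avoid division by zero for unrated items
    normalized = r / norms
    return normalized.T @ normalized

def recommend(r, user, top_n=1):
    """Rank the user's unrated items by similarity-weighted ratings."""
    sim = item_similarity(r)
    scores = sim @ r[user]               # score[i] = sum_j sim[i, j] * rating[user, j]
    scores[r[user] > 0] = -np.inf        # never recommend items already rated
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]

print(recommend(ratings, user=0, top_n=2))  # [2, 3]
```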

Productionize ML models to serve real-time recommendations
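A common pattern for serving predictions in real time is to fit and serialize the whole pipeline offline, then load it once in the serving process and answer each request with a single `predict` call. The sketch below assumes scikit-learn and Python's `pickle`; the `Predictor` class is hypothetical. Only unpickle artifacts from trusted sources, since loading a pickle can execute arbitrary code.

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Training side: fit once, then serialize preprocessing and model together.
X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
artifact = pickle.dumps(model)   # in practice written to disk or an artifact store

# Serving side (hypothetical wrapper): load once at startup, then answer requests.
class Predictor:
    def __init__(self, blob):
        self.model = pickle.loads(blob)

    def predict_one(self, features):
        """Return the predicted class for a single feature vector."""
        return int(self.model.predict([features])[0])

predictor = Predictor(artifact)
label = predictor.predict_one([5.1, 3.5, 1.4, 0.2])
```

Serializing the scaler with the model guarantees that requests are preprocessed exactly as the training data was.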

Perform a hybrid on-premise and cloud deployment

  • Hybrid On-Premise+Cloud Auto-scale Deploy: Docker

Videos

Advanced Apache Spark Meetups
Advanced Apache Spark Meetup 01-12-2016 Spark Stanford CoreNLP Succinct Word2Vec - Four talks (3 hours). NLP/Text Analytics: Spark ML & Pipelines, Stanford CoreNLP, Succinct, KeystoneML. March 19, 2016

People

Chris Fregly