Machine Learning Pipelines And Stacks - BKJackson/BKJackson_Wiki GitHub Wiki
ML Pipeline Orchestration
What does Orchestration mean?
- Managing dependencies among tasks
- Scheduling workflows
- Monitoring their execution
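The three responsibilities above can be illustrated with a minimal sketch that uses only the Python standard library (`graphlib` needs Python 3.9+). This is not how Airflow or any other orchestrator is implemented. The task names, the `run_pipeline` helper, and the log format are all hypothetical, chosen only to show dependency ordering and basic execution monitoring. Scheduling is reduced here to running the tasks once, in order.

```python
import time
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order and record status and duration for each.

    tasks: dict mapping task name -> zero-argument callable (hypothetical tasks)
    dependencies: dict mapping task name -> set of task names it depends on
    """
    # Dependency management: predecessors always come before their dependents.
    order = list(TopologicalSorter(dependencies).static_order())

    log = []
    for name in order:
        start = time.perf_counter()
        try:
            tasks[name]()
            status = "success"
        except Exception as exc:
            status = f"failed: {exc}"
        # Monitoring: keep a record of what ran, how it ended, and how long it took.
        log.append((name, status, time.perf_counter() - start))
        if status != "success":
            break  # Do not run downstream tasks after a failure.
    return log


# Hypothetical three-step workflow: extract -> transform -> train.
executed = []
tasks = {
    "extract": lambda: executed.append("extract"),
    "transform": lambda: executed.append("transform"),
    "train": lambda: executed.append("train"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "train": {"transform"},
}

log = run_pipeline(tasks, dependencies)
for name, status, seconds in log:
    print(f"{name}: {status} ({seconds:.4f}s)")
```

A real orchestrator adds what this sketch leaves out: time-based scheduling, retries, parallel execution of independent tasks, and persistent run history.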
Airflow - Your new ML pipeline orchestration workflow management platform
Sklearn pipelines
Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - KDnuggets
A Deep Dive Into Sklearn Pipelines - Kaggle
A Simple Guide to Scikit-learn Pipelines
6.1. Pipelines and composite estimators - Sklearn docs
Pandas pipelines
Build pipelines with Pandas using “pdpipe” - Nov. 30, 2019
PyTorch pipelines
The 4 steps necessary before fitting a machine learning model - Mar. 6, 2020
Easy data labeling with Snorkel
No labels? No problem! Machine learning without labels using Snorkel - Mar. 4, 2020
Introducing Snorkel
Snorkel
Snorkel Intro Tutorial: Data Slicing
Data Augmentation with Snorkel
Snorkel for Image Data
Putting it all Together
Advanced Spark PANCAKE STACK
Grappa - Makes an entire cluster look like a single, powerful, shared-memory machine.
Create a complete, end-to-end streaming data analytics pipeline
Pipeline.io
pipeline.io - From notebook training to production serving in 1-click. Works with Jupyter, Spark, TensorFlow, Zeppelin, Docker, NVIDIA CUDA GPU, and more.
Fork on GitHub: https://github.com/fluxcapacitor/pipeline/
Pull from Docker hub: https://hub.docker.com/r/fluxcapacitor/pipeline/
Pipeline Environment Setup Wiki
Pipeline TensorFlow Examples
Pipeline ML Prediction Layer Example
Docker
Docker - Build, ship, & run any app, anywhere. Dev-test pipeline automation.
Docker Docs
What's so special about Docker and containers in general? Reddit discussion.
Interactively analyze, approximate, and visualize streaming data
See NiFi, Kafka, Spark Streaming, Flink
Generate machine learning, graph & NLP recommendation models on streaming data
- Cluster-based Recommendation: Spark ML, Scikit-Learn
- Graph-based Recommendation: Spark ML, Spark Graph
- Collaborative-based Recommendation: Spark ML
- NLP-based Recommendation: CoreNLP, NLTK
- Geo-based Recommendation: ElasticSearch
Productionize ML models to serve real-time recommendations
Perform a hybrid on-premise and cloud deployment
- Hybrid On-Premise+Cloud Auto-scale Deploy: Docker
Videos
Advanced Apache Spark Meetups
Advanced Apache Spark Meetup 01-12-2016 Spark Stanford CoreNLP Succinct Word2Vec - Four talks (3 hours). NLP/Text Analytics: Spark ML & Pipelines, Stanford CoreNLP, Succinct, KeystoneML. March 19, 2016
People
Chris Fregly