MLOps - bobbae/gcp GitHub Wiki

Machine learning (ML) workflows include steps to prepare and analyze data, train and evaluate models, deploy trained models to production, track ML artifacts and understand their dependencies, etc. Managing these steps in an ad-hoc manner can be difficult and time-consuming.

MLOps applies DevOps practices to automate, manage, and audit ML workflows. AI Platform Pipelines helps you implement MLOps by providing a platform where you can orchestrate the steps in your workflow as a pipeline. ML pipelines are portable and reproducible definitions of ML workflows. Kubeflow is one example of an MLOps tool.
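To make the idea of a pipeline as a "definition of a workflow" concrete, here is a minimal sketch in plain Python. It is not the AI Platform Pipelines or Kubeflow API; the step names, the `PIPELINE` and `STEPS` dictionaries, and the toy "model" (a mean) are all assumptions made for illustration. The point is only that a pipeline can be described as data (steps plus dependencies) and then executed in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline definition: each step maps to the steps it depends on.
PIPELINE = {
    "prepare_data": set(),
    "train_model": {"prepare_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}

# Hypothetical step implementations. Each receives the outputs of earlier steps.
STEPS = {
    "prepare_data": lambda ctx: [1.0, 2.0, 3.0, 4.0],
    # The "model" here is just the mean of the data.
    "train_model": lambda ctx: sum(ctx["prepare_data"]) / len(ctx["prepare_data"]),
    # The "evaluation" is the largest absolute error of that mean.
    "evaluate_model": lambda ctx: max(
        abs(x - ctx["train_model"]) for x in ctx["prepare_data"]
    ),
    # Deploy only if the error is below an arbitrary threshold.
    "deploy_model": lambda ctx: ctx["evaluate_model"] < 2.0,
}


def execution_order(pipeline):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, steps):
    """Run each step in dependency order, collecting outputs by step name."""
    context = {}
    for name in execution_order(pipeline):
        context[name] = steps[name](context)
    return context


result = run(PIPELINE, STEPS)
print(execution_order(PIPELINE))
print(result)
```

Because the definition is separate from the runner, the same `PIPELINE` description could in principle be handed to a different executor, which is the sense in which pipeline definitions are portable.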

AI Platform Pipelines

AI Platform Pipelines makes it easier to get started with MLOps by saving you the difficulty of setting up Kubeflow Pipelines with TensorFlow Extended (TFX). Kubeflow Pipelines is an open-source platform for running, monitoring, auditing, and managing ML pipelines on Kubernetes. TFX is an open-source project for building ML pipelines that orchestrate end-to-end ML workflows.

Reference architecture for MLOps

https://cloud.google.com/architecture/mlops-intelligent-products-essentials

Kubeflow

Kubeflow is the ML toolkit for Kubernetes.

Using the Kubeflow configuration interfaces, you can specify the ML tools required for your workflow and then deploy the workflow to various clouds.

MLflow

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

MLflow Tracking

MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when you run your machine learning code, so that you can visualize the results later.
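The kind of record a tracking tool keeps for each run can be sketched with the standard library alone. The `RunLog` class below is a toy stand-in written for this page and is not the MLflow API; the file layout, field names, and the `code_version` value are assumptions. It only shows the shape of the data: parameters, a code version, and a metric history per run.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


class RunLog:
    """Toy record of one training run (illustrative only, not MLflow)."""

    def __init__(self, root, code_version):
        self.run_id = uuid.uuid4().hex
        self.path = Path(root) / f"{self.run_id}.json"
        self.record = {
            "run_id": self.run_id,
            "code_version": code_version,
            "start_time": time.time(),
            "params": {},
            "metrics": {},
        }

    def log_param(self, key, value):
        """Store a single hyperparameter or setting."""
        self.record["params"][key] = value

    def log_metric(self, key, value, step=0):
        """Append to the metric's history so it can be plotted later."""
        self.record["metrics"].setdefault(key, []).append(
            {"step": step, "value": value}
        )

    def save(self):
        """Write the run record to disk as JSON and return its path."""
        self.path.write_text(json.dumps(self.record, indent=2))
        return self.path


root = tempfile.mkdtemp()
run = RunLog(root, code_version="abc123")  # hypothetical commit id
run.log_param("learning_rate", 0.01)
for step, loss in enumerate([0.9, 0.5, 0.3]):
    run.log_metric("loss", loss, step=step)

saved = json.loads(run.save().read_text())
print(saved["params"], saved["metrics"]["loss"])
```

A real tracking server adds a UI, artifact storage, and querying across runs on top of records like this one.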

MLflow Projects

MLflow Projects provide a standard format for packaging reusable data science code. Each project is a directory with code or a Git repository.

MLflow Models

MLflow Models is a convention for packaging machine learning models in multiple formats called “flavors”. MLflow offers a variety of tools to help you deploy different flavors of models.

MLflow Model Registry

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.
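The registry concepts named above (lineage, versioning, stage transitions, annotations) can be illustrated with a small in-memory sketch. This is not the MLflow client API. The class names, the stage list, and the rule that promoting a version archives the previous holder of that stage are assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Stage names assumed for this sketch.
STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    version: int
    run_id: str            # lineage: which run produced this version
    stage: str = "None"
    description: str = ""  # free-text annotation


@dataclass
class RegisteredModel:
    name: str
    versions: list = field(default_factory=list)

    def register(self, run_id, description=""):
        """Create the next version number for a model produced by run_id."""
        mv = ModelVersion(len(self.versions) + 1, run_id, description=description)
        self.versions.append(mv)
        return mv

    def transition(self, version, stage):
        """Move a version to a stage, archiving the current holder if any."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage in ("Staging", "Production"):
            for mv in self.versions:
                if mv.stage == stage:
                    mv.stage = "Archived"
        target = self.versions[version - 1]
        target.stage = stage
        return target

    def latest(self, stage):
        """Return the newest version in the given stage, or None."""
        matches = [mv for mv in self.versions if mv.stage == stage]
        return matches[-1] if matches else None


model = RegisteredModel("churn-classifier")  # hypothetical model name
model.register(run_id="run-a", description="baseline")
model.register(run_id="run-b", description="more features")
model.transition(1, "Production")
model.transition(2, "Staging")
model.transition(2, "Production")  # version 1 is archived by this call

print(model.latest("Production"))
```

Keeping `run_id` on every version is what lets you trace a production model back to the experiment that produced it.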

Vertex Pipelines

https://cloud.google.com/vertex-ai/docs/pipelines/introduction

Spark ML pipeline using Vertex AI Pipelines

https://ivannardini.medium.com/sparkling-vertex-ai-pipeline-cfe6e19334f7

TFX

TFX is an open source project that you can use to define your ML workflow as a pipeline. Currently, TFX components can only train TensorFlow-based models. TFX provides components that you can use to ingest and transform data, train and evaluate a model, and deploy a trained model for inference, among other tasks. By using the TFX SDK, you can compose a pipeline for your ML process from TFX components.

https://medium.com/@robertf99/mlops-with-tensorflow-extended-tfx-and-tensorflow-decision-forest-tf-df-part-1-bfa2f61580dc

Neptune

Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments.

It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.

https://neptune.ai/blog/mlflow-vs-kubeflow-vs-neptune-differences

Pachyderm

Pachyderm is an enterprise-oriented data science platform that combines data lineage with end-to-end pipelines on Kubernetes. It is one of several tools for managing an end-to-end machine learning life cycle.

DVC

DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.
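One common way to make large files shareable and reproducible alongside code is to commit a small pointer file that records a content hash, while the data itself lives elsewhere. The sketch below illustrates that general idea with the standard library. It does not reproduce DVC's actual file format, cache layout, or commands; the `.pointer.json` suffix, the field names, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path):
    """Write a small JSON pointer next to the data file and return its path."""
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": file_digest(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def is_unchanged(data_path, pointer_path):
    """Check whether the data file still matches the recorded hash."""
    recorded = json.loads(Path(pointer_path).read_text())
    return recorded["sha256"] == file_digest(data_path)


workdir = Path(tempfile.mkdtemp())
data = workdir / "train.csv"          # hypothetical data set
data.write_text("x,y\n1,2\n")
pointer = write_pointer(data)
unchanged_before = is_unchanged(data, pointer)

data.write_text("x,y\n1,2\n3,4\n")    # simulate a change to the data
unchanged_after = is_unchanged(data, pointer)
print(unchanged_before, unchanged_after)
```

The pointer file is small enough to version with the code, so a given commit pins both the code and the exact data it was run against.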

MLRun

MLRun is an open-source MLOps framework that offers an integrative approach to managing your machine-learning pipelines from early development through model development to full pipeline deployment in production. MLRun offers a convenient abstraction layer over a wide variety of technology stacks while empowering data engineers and data scientists to define features and models.

ML CI/CD

ML CI/CD automates continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems.
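As a rough illustration of what gets automated, the sketch below shows two decisions such a system might make: whether to retrain (CT) and whether to promote a newly trained candidate (CD). The function names, the drift score, and the thresholds are hypothetical and not taken from any of the linked tutorials; a real system would derive them from monitoring data and test results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A newly trained model and the outcome of its CI checks."""
    accuracy: float
    tests_passed: bool


def should_retrain(drift_score, drift_threshold=0.2):
    """Continuous training: retrain when input data has drifted enough."""
    return drift_score > drift_threshold


def should_deploy(candidate, production_accuracy, min_gain=0.0):
    """Continuous delivery: promote only if CI passed and quality did not regress."""
    return (
        candidate.tests_passed
        and candidate.accuracy >= production_accuracy + min_gain
    )


candidate = Candidate(accuracy=0.91, tests_passed=True)
retrain = should_retrain(drift_score=0.35)
deploy = should_deploy(candidate, production_accuracy=0.89, min_gain=0.01)
print(retrain, deploy)
```

In a pipeline, checks like these typically sit between the evaluation step and the deployment step so that promotion needs no manual sign-off when the criteria are met.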

Tutorials

MLOps pipeline in Vertex AI

https://blog.searce.com/implementing-mlops-pipeline-in-vertex-ai-to-adapt-to-the-changes-in-data-4cc3c07d5661

Model training CI/CD system

https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-i

Vertex AI training in GitLab

https://medium.com/google-cloud/how-to-run-vertexai-custom-jobs-in-gitlab-ci-b986e6ebed89

CI/CD for your Vertex AI Machine Learning Pipeline

https://medium.com/google-cloud/how-to-implement-ci-cd-for-your-vertex-ai-pipeline-27963bead8bd

Vertex AI using Python

https://towardsdatascience.com/developing-and-deploying-a-machine-learning-model-on-vertex-ai-using-python-865b535814f8

MLOps pipeline for hyperparameter tuning in Vertex AI

https://towardsdatascience.com/how-to-build-an-mlops-pipeline-for-hyperparameter-tuning-in-vertex-ai-45cc2faf4ff5