MLOps Deploying and Life Cycling
We need to think about deployment from the earliest stages of development.
Imagine we must deploy and maintain another colleague's machine-learning app and model. Immediately, a number of concerns should come to mind:
Infrastructure compatibility
- Can this even run on the target infrastructure?
- For example, a model built with high memory requirements may not run on a smartphone.
Transparency
Transparency means there is no mystery about how we go from code + raw data to the final model:
- Who trained this model?
- When?
- Using which script?
- What hyper-parameters?
- Versioned datasets + fully transparent pipelines (How to version datasets?)
A model without transparency & reproducibility is a big NO GO. Return it to the sender.
Reproducibility
Reproducibility means recreating the exact model is simple and straightforward.
Bonus points: log experiments in a metadata store, so future maintainers don't re-invent the wheel. Reproducibility increases trust because it proves that we can control our model production process to the finest level of detail. The keys to ensuring reproducibility are:
- code versioning
- data versioning
- recording both in the model metadata
- monitoring: constantly checking whether the model is behaving as expected. An important part of creating these expectations is data profiling.
Reproducibility checklist
- A pointer to the exact version of the model build pipeline code.
- A pointer to the exact versions of the datasets used during training, including the train/test splits used for performance evaluation.
- The record of the performance achieved on the test set.
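A minimal sketch of what recording these checklist pointers could look like; the file names, the `git rev-parse` call for the code version, and the JSON layout are assumptions, not a prescribed format.

```python
import hashlib
import json
import subprocess


def dataset_fingerprint(path: str) -> str:
    """Hash the dataset file so the exact data version can be pinned."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def record_build(test_metrics: dict, train_path: str, test_path: str) -> None:
    record = {
        # pointer to the exact version of the model build pipeline code
        "code_version": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip(),
        # pointers to the exact dataset versions, including the train/test split
        "train_data_sha256": dataset_fingerprint(train_path),
        "test_data_sha256": dataset_fingerprint(test_path),
        # performance achieved on the test set
        "test_metrics": test_metrics,
    }
    with open("model_metadata.json", "w") as f:
        json.dump(record, f, indent=2)


# record_build({"accuracy": 0.93}, "data/train.csv", "data/test.csv")
```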
Input Data validation
Data profiles define expectations -> validate user input data against them, e.g. with great_expectations (sketch below).
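A minimal sketch of validating incoming data against a data profile, assuming the classic great_expectations pandas API (newer releases of the library use a different interface); the columns and expected ranges are illustrative:

```python
import great_expectations as ge
import pandas as pd

# Incoming user data (hypothetical columns)
incoming = pd.DataFrame({"age": [34, 51, -2], "country": ["PT", "BR", "DE"]})

# Wrap it so expectation methods become available (classic, pre-1.0 API)
df = ge.from_pandas(incoming)

# Expectations derived from the training data profile (values are assumptions)
df.expect_column_values_to_not_be_null("age")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
df.expect_column_values_to_be_in_set("country", ["PT", "BR", "DE", "US"])

results = df.validate()
if not results["success"]:
    # Reject or quarantine the request instead of feeding bad data to the model
    print("Input data failed validation:", results)
```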
Monitoring
One concern is performance deterioration. The minimum requirement is to log the inputs and outputs of the model; in production we should monitor both the input data profile and the output data profile.
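A minimal sketch of that "log inputs and outputs" minimum requirement, using the standard library logger and assuming an sklearn-style `model.predict`; the JSON-lines format and field names are assumptions:

```python
import json
import logging
import time

logging.basicConfig(filename="predictions.log", level=logging.INFO)
logger = logging.getLogger("model_io")


def predict_and_log(model, features: dict) -> float:
    """Run a prediction and log the input/output pair for later profiling."""
    prediction = model.predict([list(features.values())])[0]
    logger.info(json.dumps({
        "ts": time.time(),
        "input": features,             # feeds the input data profile
        "output": float(prediction),   # feeds the output data profile
    }))
    return prediction
```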
Debugging
Lots of logs. There is no debugging without a detailed logging system in place.
Tests
Do I feel comfortable making changes to this code? We should build tests (a unit-test sketch follows this list):
- unit tests
- integration tests
- load tests
- stress tests
- deployment tests
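A minimal sketch of a unit test for a feature-engineering step, using pytest; the `build_features` helper and its columns are hypothetical:

```python
import pandas as pd
import pytest


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature-engineering step: drop nulls, add a derived column."""
    out = df.dropna().copy()
    out["income_per_year_of_age"] = out["income"] / out["age"]
    return out


def test_build_features_drops_nulls_and_adds_column():
    raw = pd.DataFrame({"age": [30, None], "income": [30000.0, 10000.0]})
    features = build_features(raw)
    assert features.isna().sum().sum() == 0
    assert "income_per_year_of_age" in features.columns


def test_build_features_computes_expected_value():
    raw = pd.DataFrame({"age": [40.0], "income": [80000.0]})
    features = build_features(raw)
    assert features["income_per_year_of_age"].iloc[0] == pytest.approx(2000.0)
```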
ML Pipeline Checklist
- Is the code versioned?
- Is the data versioned? (DVC is a popular tool for data versioning)
- Train Model
- Save Model
- Create Data Profile
- Record the exact version of the training data [Data versioning for reproducibility] ...
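A minimal sketch of a build pipeline that walks through the checklist above: load versioned data, train, save, profile, and record the exact training-data version. The dataset path, model choice, and file names are assumptions.

```python
import hashlib
import json
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

DATA_PATH = "data/training.csv"  # assumed to be under data version control

# Load versioned data and split it
df = pd.read_csv(DATA_PATH)
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Save model
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Create a (very small) data profile of the training inputs
profile = json.loads(X_train.describe().to_json())

# Record the exact version of the training data for reproducibility
with open(DATA_PATH, "rb") as f:
    data_sha256 = hashlib.sha256(f.read()).hexdigest()

with open("build_record.json", "w") as f:
    json.dump({
        "data_sha256": data_sha256,
        "test_accuracy": float(model.score(X_test, y_test)),
        "data_profile": profile,
    }, f, indent=2)
```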
Feature Stores
It's a type of database which stores data prepared specifically for ML models. Advanced feature stores are implemented as dual databases: one for serving the training data and the other for making predictions. They help us improve efficiency and consistency by allowing us to build a feature once, then reuse it for different models.
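A toy sketch of the dual-database idea, not a real feature-store product: an offline store keeps history for training, an online store keeps the latest value per entity for predictions, and the feature is written once and reused for both.

```python
from datetime import datetime


class ToyFeatureStore:
    """Illustrative only: offline store for training data, online store for serving."""

    def __init__(self):
        self.offline = []   # list of (entity_id, feature_name, value, timestamp)
        self.online = {}    # (entity_id, feature_name) -> latest value

    def write(self, entity_id, feature_name, value):
        # The feature is built once and lands in both stores
        self.offline.append((entity_id, feature_name, value, datetime.utcnow()))
        self.online[(entity_id, feature_name)] = value

    def get_training_rows(self, feature_name):
        return [row for row in self.offline if row[1] == feature_name]

    def get_online_feature(self, entity_id, feature_name):
        return self.online[(entity_id, feature_name)]


store = ToyFeatureStore()
store.write("user_42", "avg_basket_value", 37.5)                       # built once...
training_rows = store.get_training_rows("avg_basket_value")            # ...used for training
live_value = store.get_online_feature("user_42", "avg_basket_value")   # ...and for predictions
```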
Model build Pipelines
A model pipeline is a term we use for a machine learning model which executes a sequence of data processing steps, like cleaning and feature extraction.
The model BUILD pipeline is an automated workflow that, at the very least, loads a model or a model pipeline and the training dataset then trains the model and saves it for further use.
So, we can easily set up SOME model build pipeline, but to create one deserving of an MLOps seal of approval, it also needs to enable and facilitate deployment, reproducibility, monitoring, and CI/CD integration.
The model build pipeline should not produce only the model but a complete model package, containing a variety of MLOps-critical artifacts.
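A minimal sketch of a model pipeline in the sense above, using scikit-learn's Pipeline (an assumption about tooling; the steps shown are illustrative):

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Cleaning and feature-preparation steps chained with the model itself,
# so one object goes through training, packaging, and serving.
model_pipeline = Pipeline(steps=[
    ("impute_missing", SimpleImputer(strategy="median")),  # cleaning
    ("scale", StandardScaler()),                           # feature preparation
    ("classifier", LogisticRegression(max_iter=1000)),     # the actual model
])

# model_pipeline.fit(X_train, y_train)
# model_pipeline.predict(X_new)
```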
Model packaging
- smooth deployment
- reproducibility
- monitoring
Formats: PMML and pickle; pickle is the most used/recommended.
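A minimal sketch of pickling a model package that carries the MLOps-critical artifacts next to the model itself; the dictionary layout, versions, and metrics are assumptions, not a standard format (and only unpickle files you trust, since pickle can execute arbitrary code):

```python
import pickle

from sklearn.dummy import DummyClassifier

# A stand-in model just to keep the example self-contained
model = DummyClassifier(strategy="most_frequent").fit([[0], [1]], [0, 1])

# Bundle the model with what deployment, reproducibility and monitoring need,
# so the artifacts travel together.
model_package = {
    "model": model,
    "code_version": "git-sha-goes-here",
    "data_version": "dataset-sha-goes-here",
    "data_profile": {"age": {"min": 0, "max": 120}},  # illustrative profile
    "test_metrics": {"accuracy": 0.93},               # illustrative metrics
}

with open("model_package.pkl", "wb") as f:
    pickle.dump(model_package, f)

with open("model_package.pkl", "rb") as f:
    loaded = pickle.load(f)
```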
Serving Models
Batch/offline/static prediction is when prediction is scheduled and runs on a large amount of data. It's the simplest form of prediction.
Online/on-demand/dynamic prediction is used when an event happens or a user makes a request.
As with every on-demand service, response time becomes important.
The technical term for the time between the user request and the service response is latency.
Real time prediction
- Ex. credit card fraud
Building API
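A minimal sketch of an online prediction endpoint, assuming Flask and a pickled model package like the one above; the paths, field names, and port are assumptions:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model package once at startup
with open("model_package.pkl", "rb") as f:
    package = pickle.load(f)
model = package["model"]


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [0.3, 1.2, 5.0]}
    prediction = model.predict([payload["features"]])[0]
    return jsonify({"prediction": float(prediction)})


@app.route("/health", methods=["GET"])
def health():
    # Used by smoke tests and monitoring to check the service is up
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    app.run(port=8000)
```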
Deployment progression and testing
- Unit tests: functions working as expected - test environment
- Integration tests: how our code communicates with other APIs and databases - staging environment
- Smoke tests: check that the application can be deployed without crashing and burning (sketch after this list)
- Load test: check that the application can run normally with an unexpected user volume
- Stress test: a much higher user volume than a load test
- User acceptance testing (UAT): Final seal of approval before deploying to production.
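A minimal smoke-test sketch against the deployed service, assuming the /health and /predict endpoints from the API sketch, the requests library, and a hypothetical staging address:

```python
import requests

BASE_URL = "http://staging.example.com:8000"  # assumed staging address


def test_service_is_up():
    response = requests.get(f"{BASE_URL}/health", timeout=5)
    assert response.status_code == 200


def test_predict_endpoint_responds():
    response = requests.post(
        f"{BASE_URL}/predict",
        json={"features": [0.3, 1.2, 5.0]},
        timeout=5,
    )
    assert response.status_code == 200
    assert "prediction" in response.json()
```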
Model deployment strategies
Monitoring ML Services
Maintain quality
- monitoring
Performance indicator
Fundamental health indicators:
- Is the service up and running?
- Number of requests over time
- Latency distribution
Ultimate quality metric:
- Predictive performance
How do ML models deteriorate?
How to detect concept drift?
- monitor the input features to detect covariate shift (see the KS-test sketch below)
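A minimal sketch of monitoring one input feature for covariate shift with a two-sample Kolmogorov-Smirnov test from scipy; the feature, the stand-in data, and the 0.01 alert threshold are assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

# Feature values seen at training time vs. values arriving in production
training_age = np.random.normal(40, 10, size=5000)    # stand-in for logged training data
production_age = np.random.normal(48, 10, size=500)   # stand-in for recent requests

statistic, p_value = ks_2samp(training_age, production_age)

if p_value < 0.01:  # assumed alert threshold
    print(f"Possible covariate shift in 'age': KS={statistic:.3f}, p={p_value:.4f}")
```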
Metadata store: MLflow Tracking is immensely helpful (sketch below)
- document the model journey
- avoid repeating same experiments
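A minimal sketch of logging a training run to MLflow Tracking so the model journey is documented and experiments aren't repeated; the experiment name, parameters, tags, and metric values are illustrative:

```python
import mlflow

mlflow.set_experiment("churn-model")  # assumed experiment name

with mlflow.start_run():
    # hyper-parameters used for this run
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("max_iter", 1000)

    # pointer that helps make the run reproducible
    mlflow.set_tag("data_version", "dataset-sha-goes-here")

    # performance achieved on the test set
    mlflow.log_metric("test_accuracy", 0.93)
```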
MLOps helps us maintain our model in the fastest, most efficient way
Model governance
Who will perform them? How? How will we document it?
Governance means extra steps
These new models should generate business value
Reckless ML: more damage than benefit
In the scope of model governance:
- roles and responsibilities
- model result traceability
- access control
- change logs
- test and validation procedures