MLOps Deploying and Life Cycling

We need to think about deployment from the earliest stages of development.

Imagine we must deploy and maintain another colleague's machine-learning app and model. A number of concerns should immediately come to mind:

Infrastructure compatibility

  • Can this even run on the target infrastructure?
  • A model built with high memory requirements, for example, may not run on a smartphone.

Transparency

Transparency means there is no ambiguity about how we go from code + raw data to the final model.

  • Who trained this model?
  • When?
  • Using which script?
  • What hyper-parameters?
  • Versioned datasets + fully transparent pipelines (how do we version datasets? see the DVC sketch below)

Without transparency & reproducibility, it's a big NO GO. Return it to the sender.
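As a hedged illustration of dataset versioning, the sketch below reads a specific revision of a DVC-tracked dataset through DVC's Python API; the file path, repository URL, and revision tag are hypothetical placeholders.

```python
# Minimal sketch: read an exact version of a dataset tracked with DVC.
# The path, repo URL and rev tag below are hypothetical placeholders.
import pandas as pd
import dvc.api

with dvc.api.open(
    "data/train.csv",                       # file tracked with `dvc add`
    repo="https://github.com/org/ml-repo",  # repo holding the .dvc metadata
    rev="v1.0",                             # git tag/commit = dataset version
) as f:
    train_df = pd.read_csv(f)
```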

Reproducibility

Reproducibility means that recreating the exact model is simple and straightforward.

Bonus points: log experiments in a metadata store so that future maintainers don't reinvent the wheel. Reproducibility increases trust because it proves that we can control our model production process to the finest level of detail. The keys to ensuring reproducibility are:

  • code versioning
  • data versioning
  • record these in the model metadata
  • monitoring: constantly checking that the model is behaving as expected. An important part of creating these expectations is called data profiling.

Reproducibility checklist (see the metadata record sketch after this list):
  • A pointer to the exact version of the model build pipeline code.
  • A pointer to the exact versions of the datasets used during training,
    • including the train/test splits used during performance evaluation.
  • The record of the performance achieved on the test set.
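A minimal sketch of such a reproducibility record, saved as plain JSON next to the model artifact; the field names and example values are illustrative, not a standard schema.

```python
# Sketch: record the pipeline code version, dataset version and test-set
# performance alongside the saved model. Field names are illustrative.
import json
import subprocess
from datetime import datetime, timezone

def build_metadata(dataset_version: str, test_metrics: dict) -> dict:
    code_commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    return {
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "pipeline_code_commit": code_commit,  # pointer to the pipeline code version
        "dataset_version": dataset_version,   # e.g. a DVC/git tag or a data hash
        "test_metrics": test_metrics,         # performance achieved on the test set
    }

with open("model_metadata.json", "w") as f:
    json.dump(build_metadata("v1.0", {"test_accuracy": 0.87}), f, indent=2)
```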

Input Data validation

Data profile expectations -> used to validate user input data, for example with great_expectations.
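A minimal sketch of turning those expectations into input validation with the classic pandas-flavoured great_expectations API; the exact API differs between library versions, and the column names and bounds below are hypothetical.

```python
# Sketch: validate incoming data against expectations derived from the
# training data profile. Column names and bounds are hypothetical, and the
# classic pandas-flavoured great_expectations API is assumed here.
import great_expectations as ge
import pandas as pd

def validate_input(df: pd.DataFrame) -> bool:
    gdf = ge.from_pandas(df)
    gdf.expect_column_values_to_not_be_null("age")
    gdf.expect_column_values_to_be_between("age", min_value=0, max_value=120)
    gdf.expect_column_values_to_be_in_set("country", ["BR", "PT", "US"])
    return gdf.validate().success
```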

Monitoring

One concern is performance deterioration. The minimum requirement is to log the inputs and outputs of the model; in production we should monitor both the input data profile and the output data profile.
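A minimal sketch of that minimum requirement: logging each prediction's inputs and outputs as structured JSON so the input and output data profiles can be monitored later. The model object and feature names are placeholders.

```python
# Sketch: log every prediction's inputs and outputs as one JSON line so the
# input/output data profiles can be rebuilt and monitored from the logs.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("prediction_audit")

def predict_and_log(model, features: dict) -> float:
    start = time.perf_counter()
    prediction = float(model.predict([list(features.values())])[0])
    logger.info(json.dumps({
        "request_id": str(uuid.uuid4()),
        "inputs": features,        # source for the input data profile
        "prediction": prediction,  # source for the output data profile
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
    }))
    return prediction
```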

Debugging

Lots of logs. There is no debugging without a detailed logging system in place.

Tests

Do I feel comfortable making changes to this code? We should build tests (a pytest sketch follows this list):

  • unit tests
  • integration tests
  • load tests
  • stress tests
  • deployment tests
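As an example of the first item, a small pytest sketch for a hypothetical `clean_text` preprocessing function; the module and its expected behaviour are assumptions, not taken from this repo.

```python
# Sketch: unit tests for a hypothetical preprocessing function, run with pytest.
import pytest

from preprocessing import clean_text  # hypothetical module under test

def test_clean_text_lowercases_and_strips():
    assert clean_text("  Hello World  ") == "hello world"

def test_clean_text_rejects_non_string_input():
    with pytest.raises(TypeError):
        clean_text(None)
```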

ML Pipeline Checklist

  • Is the code versioned?
  • Is the data versioned? (DVC is a popular tool for data versioning)
  • Train Model
  • Save Model
  • Create Data Profile
  • Record the exact version of the training data [Data versioning for reproducibility] ...

Feature Stores

It's a type of database that stores data prepared specifically for ML models. Advanced feature stores are implemented as dual databases: one for grabbing the training data and the other for making predictions. They help us improve efficiency and consistency by allowing us to build a feature once, then reuse it for different models (see the sketch below).
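The "build a feature once, then reuse it" idea can be illustrated without any particular product: a single feature-engineering function shared by the training pipeline and the prediction service. The column names are illustrative.

```python
# Sketch: one shared feature definition used by both training and serving,
# which is the consistency guarantee that a feature store formalises.
import pandas as pd

def build_customer_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for the 'customer' feature set."""
    features = pd.DataFrame(index=raw.index)
    features["orders_last_30d"] = raw["order_count_30d"]
    features["avg_basket_value"] = (
        raw["total_spent_30d"] / raw["order_count_30d"].clip(lower=1)
    )
    return features

# Both the training pipeline and the prediction service call
# build_customer_features, so features are never computed two different ways.
```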

Model build Pipelines

A model pipeline is the term we use for a machine learning model that executes a sequence of data processing steps, like cleaning and feature extraction.

The model BUILD pipeline is an automated workflow that, at the very least, loads a model or a model pipeline and the training dataset, then trains the model and saves it for further use.

So, we can easily set up SOME model build pipeline, but to create one deserving of an MLOps seal of approval, it also needs to enable and facilitate deployment, reproducibility, monitoring, and CI/CD integration.
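A minimal sketch of such a build pipeline with scikit-learn, assuming a hypothetical CSV dataset with a `target` column:

```python
# Sketch: a minimal model build pipeline - load the training data, fit a
# model pipeline (preprocessing + estimator), evaluate it and save it.
import json
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("data/train.csv")              # hypothetical versioned dataset
X, y = data.drop(columns=["target"]), data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),                  # cleaning / feature processing
    ("clf", LogisticRegression(max_iter=1000)),   # estimator
])
model.fit(X_train, y_train)

joblib.dump(model, "model.joblib")                # save the model for further use
with open("metrics.json", "w") as f:
    json.dump({"test_accuracy": model.score(X_test, y_test)}, f)
```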

The model build pipeline should produce not only the model but a complete model package, containing a variety of MLOps-critical artifacts.

Model packaging

  • smooth deployment
  • reproducibility
  • monitoring

Formats: PMML and pickle. Pickle is the most used/recommended.
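A hedged sketch of pickle-based packaging that bundles the trained model with its metadata; note that a pickle file must be loaded with compatible Python and library versions, so those are recorded as well.

```python
# Sketch: package the trained model together with its metadata using pickle.
# Pickle files are Python/library-version specific, so record versions too.
import pickle
import sklearn

def package_model(model, metadata: dict, path: str = "model_package.pkl") -> None:
    metadata = {**metadata, "sklearn_version": sklearn.__version__}
    with open(path, "wb") as f:
        pickle.dump({"model": model, "metadata": metadata}, f)

def load_model_package(path: str = "model_package.pkl"):
    with open(path, "rb") as f:
        package = pickle.load(f)
    return package["model"], package["metadata"]
```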

Serving Models

Batch/offline/static prediction is when scoring is scheduled and runs over a large amount of data at once. It's the simplest form of prediction.
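A minimal sketch of a scheduled batch prediction job; the paths and columns are hypothetical, and the scheduling itself would live in cron or an orchestrator.

```python
# Sketch: a batch scoring job - load the saved model, score a large input
# file in one go and write the predictions out. Paths are placeholders.
import joblib
import pandas as pd

def run_batch_scoring(input_path: str, output_path: str) -> None:
    model = joblib.load("model.joblib")
    batch = pd.read_csv(input_path)  # assumed to hold the model's feature columns
    batch["prediction"] = model.predict(batch)
    batch.to_csv(output_path, index=False)

if __name__ == "__main__":
    run_batch_scoring("data/daily_customers.csv", "data/daily_predictions.csv")
```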

Online/on-demand/dynamic prediction is used when an event happens or a user makes a request. As with every on-demand service, response time becomes important.

The technical term for the time that passes between the user request and the service response is latency.

Real time prediction

  • Ex. credit card fraud

Building an API

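A hedged sketch of an online prediction API using FastAPI; the feature schema and the model file are hypothetical.

```python
# Sketch: an on-demand prediction endpoint. The feature schema is hypothetical.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # package produced by the build pipeline

class PredictionRequest(BaseModel):
    age: float
    orders_last_30d: float

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    features = [[request.age, request.orders_last_30d]]
    return {"prediction": float(model.predict(features)[0])}

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}
```

Served with, for example, `uvicorn app:app` (assuming the file is saved as app.py); latency can then be measured per request.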

Deployment progression and testing

  • Unit tests: functions working as expected - test environment
  • Integration tests: how our code communicates with other APIs and databases - staging environment
  • Smoke tests: check that the application can be deployed without crashing and burning (see the sketch after this list)
  • Load tests: check that the application runs normally under the expected user volume
  • Stress tests: like a load test, but with many more users
  • User acceptance testing (UAT): the final seal of approval before deploying to production
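A minimal smoke-test sketch that a deployment step could run against the freshly deployed service; the URL and payload match the hypothetical API sketched above.

```python
# Sketch: smoke test - check that the deployed service answers at all before
# sending it real traffic. The base URL is a placeholder.
import requests

def smoke_test(base_url: str = "https://staging.example.com") -> None:
    health = requests.get(f"{base_url}/health", timeout=5)
    assert health.status_code == 200

    response = requests.post(
        f"{base_url}/predict",
        json={"age": 35, "orders_last_30d": 2},
        timeout=5,
    )
    assert response.status_code == 200
    assert "prediction" in response.json()
```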

Model deployment strategies


Monitoring ML Services

Maintain quality

  • monitoring

Performance indicator

Fundamental health indicators:

  • Is the service up and running?
  • How many requests over time?
  • What is the latency distribution?

Ultimate quality metric:

  • Predictive performance

How do ML models deteriorate?

How do we detect concept drift?

  • monitor the input features: detect covariate shift (see the sketch below)
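One hedged way to do that is to compare the live distribution of each input feature against its training distribution, for example with a two-sample Kolmogorov-Smirnov test; the threshold below is an arbitrary illustration, not a recommendation.

```python
# Sketch: flag possible covariate shift by comparing a window of live feature
# values against the training distribution with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(train_values: np.ndarray,
                        live_values: np.ndarray,
                        alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha  # low p-value: the distributions likely differ
```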

Metadata store: MLflow Tracking is immensely helpful here, letting us:

  • document the model journey
  • avoid repeating same experiments
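A minimal MLflow Tracking sketch that documents one training run; the parameter names and metric values are illustrative.

```python
# Sketch: log a training run to MLflow Tracking so future maintainers can see
# what was already tried. Names and values are illustrative.
import mlflow

with mlflow.start_run(run_name="baseline-logreg"):
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_param("dataset_version", "v1.0")
    mlflow.log_metric("test_accuracy", 0.87)
    mlflow.log_artifact("model_metadata.json")  # reproducibility record from above
```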

MLOps helps us maintain our model in the fastest, most efficient way

Model governance

Governance means extra steps: who will perform them? How? How will we document them? These new models should generate business value. Reckless ML: more damage than benefit.

In the scope of model governance:

  • roles and responsibilities
  • model result traceability
  • access control
  • change logs
  • test and validation procedures