4C. Model Deployment Alternate Notes
Now that the model is registered in the MLflow Model Registry and is production ready, we need to deploy it so that we can get predictions for new data and realize its value.
Model Deployment
The kind of deployment depends on how we want to consume the prediction result. If we can wait an hour or a day for predictions, we use offline batch deployment, where the model runs periodically. On the other hand, if we need predictions in real time, it has to be online deployment, where the model is always up and running on a compute resource to serve requests. For online deployment, depending on the use case, we can deploy the model as a web service or as a streaming service.
Webservice: We wrap the model in a web service, where the model is loaded and served to make predictions via a REST API call. For each set of data received in an API call, the model's output is sent back in the response in a one-to-one fashion.
Streaming: Runs in a producer-consumer model, where producers push information to an event stream and consumers listen to the stream for updates. For example, the predicted taxi trip duration is published to the event stream, and consumers such as downstream models listen to the stream, fetch the prediction, and do further processing.
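As an illustrative sketch only (the stream name and event fields are assumptions, and the actual streaming setup is covered in a later section), the producer side with AWS Kinesis could look roughly like this:

```python
import json
import boto3

# Hypothetical stream name, created beforehand in AWS Kinesis
STREAM_NAME = "ride-duration-predictions"

kinesis = boto3.client("kinesis")

def publish_prediction(ride_id, predicted_duration):
    """Producer side: push a prediction event for downstream consumers to read."""
    event = {"ride_id": ride_id, "predicted_duration": predicted_duration}
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(ride_id),
    )

publish_prediction(ride_id=256, predicted_duration=21.3)
```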
Deploying model as a web-service
Steps
- Save the trained model
- Create a virtual environment
- Create a script for prediction
- Put the script into a Flask app
- Package the app to Docker
Save the trained model
Here we take the same model, saved as a binary file, that was trained in the previous module, and put it in the newly created web-service directory for this week.
One can also train a fresh model and save it.
Create a virtual environment
We need the exact same version of the scikit-learn library that was used to create the model, as well as the same Python version, in order to avoid compatibility issues.
Go to the virtual environment on the EC2 server where the model was trained and run the following commands.
pip freeze | grep scikit-learn
python --version
With pipenv
Create a new virtual environment.
pipenv install scikit-learn==1.0.2 flask --python=3.9
Activate the environment.
pipenv shell
If the prompt is too long on the screen, we can set it to something short like '> '.
PS1="> "
Create script for prediction
Step 1
First we create a script that loads the saved model, preprocesses the input data, and generates predictions.
Test script to test the predict script
Note: Rename the files from predict_without_flask.py to predict.py and test_without_flask.py to test.py if you want to test the scripts without Flask.
The idea is to create a working script that can take input in the original format and generate the prediction result.
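Below is a minimal sketch of such a predict script, assuming the DictVectorizer and the model were pickled together into a file named lin_reg.bin (hypothetical filename) and that the input is a single ride record (the feature names are assumptions, not necessarily the repo's actual ones):

```python
import pickle

# Hypothetical filename; use the binary file saved during training.
with open("lin_reg.bin", "rb") as f_in:
    dv, model = pickle.load(f_in)

def prepare_features(ride):
    """Turn a raw ride record into the feature dict used at training time."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }

def predict(features):
    """Vectorize the features and return the predicted duration in minutes."""
    X = dv.transform([features])
    preds = model.predict(X)
    return float(preds[0])

if __name__ == "__main__":
    ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}
    print(predict(prepare_features(ride)))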
Step 2
Now that we have the working predict.py script ready, we can build a web service around it and expose it via an HTTP endpoint.
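As a rough sketch (the repo's actual predict.py may differ, and lin_reg.bin is the hypothetical filename from the previous step), wrapping the prediction logic in a Flask app could look like this:

```python
import pickle
from flask import Flask, request, jsonify

# Hypothetical filename; the same binary used in the script above.
with open("lin_reg.bin", "rb") as f_in:
    dv, model = pickle.load(f_in)

app = Flask("duration-prediction")

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Accept a JSON ride record, featurize it, and return the prediction.
    ride = request.get_json()
    features = {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }
    X = dv.transform([features])
    pred = float(model.predict(X)[0])
    return jsonify({"duration": pred})

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=9696)
```

Port 9696 matches the gunicorn and docker run commands used below.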
Note: The current Flask setup is for the development environment. Install gunicorn and use it to serve the app in order to resolve the following production-environment warning.
pipenv install gunicorn
gunicorn --bind=0.0.0.0:9696 predict:app
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Note: We tried to run the test.py script from the base virtual environment, which has the requests library installed. Ideally, the development environment should have the requests library installed since we need it for testing, whereas in the production environment we do not need it.
If it is needed, we can still install it as a dev dependency as follows. That way we can use it during development, but it will not be included in the deployment.
pipenv install --dev requests
Package the app to Docker
- Create a Dockerfile with the necessary content.
- Run the following command to build the Docker image
docker build -t ride-duration-prediction-service:v1 .
- Run the following command to run the image
docker run -it --rm -p 9696:9696 ride-duration-prediction-service:v1
This will deploy the web service on localhost, and we can run the test.py script again to test it.
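For reference, a minimal test.py along these lines (the repo's actual script may differ) simply posts a ride record to the endpoint and prints the response:

```python
import requests

ride = {"PULocationID": 10, "DOLocationID": 50, "trip_distance": 40}

# The service listens on port 9696, as configured in the docker run command above.
url = "http://localhost:9696/predict"
response = requests.post(url, json=ride)
print(response.json())
```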
So far we have packaged the model in a Docker image that can run on any Docker-compatible compute to serve predictions. However, the model we used was loaded directly from the local path where it was stored, and we learned in previous sessions that the candidate models are stored in the model registry, which is what we are supposed to use. Hence, in the next section we will learn how to fetch the model from the model registry to serve it.
Get the models from MLflow Model Registry
This time we are going to run a fresh experiment to train a new model (Random Forest) on the same dataset and register the model in MLflow Model Registry.
Then we will explore various ways to fetch the registered model for the webservice.
Train the model
- We are using a locally hosted SQLite database for the tracking server and an S3 bucket for artifact storage. As a prerequisite, create an S3 bucket and make sure the EC2 instance being used has access to it. To confirm, try the following command and check that you are able to get the list of buckets.
aws s3 ls
- Run the following command to start the MLflow tracking server
mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root=s3://bhagabat-mlflow-rf-greentaxi/
where bhagabat-mlflow-rf-greentaxi is the name of the S3 bucket.
Open the MLflow UI at http://127.0.0.1:5000/
- Create a Jupyter notebook to train a model.
Here is the notebook where we trained a random forest regressor model and tracked and saved it in MLflow:
Jupyter notebook for training and saving the Random Forest Regressor.
You can check the experiment details and the logged model artifact in the MLflow UI.
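As a rough sketch of what that notebook does (the features, hyperparameters, and experiment name are assumptions, and toy data stands in for the real parquet-derived features), training and logging could look like this:

```python
import mlflow
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

# Point the client at the tracking server started above.
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("green-taxi-duration")

# Toy stand-ins for the real feature dicts and durations built from the taxi data.
dict_train = [{"PU_DO": "10_50", "trip_distance": 4.0}, {"PU_DO": "22_3", "trip_distance": 1.1}]
y_train = [14.0, 6.0]
dict_val, y_val = dict_train, y_train

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 20}  # example hyperparameters
    mlflow.log_params(params)

    # Bundle the DictVectorizer and the model into one pipeline.
    pipeline = make_pipeline(DictVectorizer(), RandomForestRegressor(**params, n_jobs=-1))
    pipeline.fit(dict_train, y_train)

    rmse = mean_squared_error(y_val, pipeline.predict(dict_val), squared=False)
    mlflow.log_metric("rmse", rmse)

    # Log the whole pipeline as a model artifact (stored in the S3 bucket).
    mlflow.sklearn.log_model(pipeline, artifact_path="model")
```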
Note: There are multiple ways to use the logged model. If we use
runs:/RUN_ID/model
then we run the risk of unavailability if the tracking server goes down. However, if we fetch the artifact directly from S3, then we are not dependent on the tracking server. Please check the predict.py script to see the changes made.
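For illustration (the run ID is a placeholder, and the experiment ID and bucket layout must match what the MLflow UI shows), the two loading options look roughly like this:

```python
import mlflow

RUN_ID = "your-run-id"  # placeholder: copy the real run ID from the MLflow UI

# Option 1: via the tracking server (breaks if the tracking server is down)
mlflow.set_tracking_uri("http://127.0.0.1:5000")
model = mlflow.pyfunc.load_model(f"runs:/{RUN_ID}/model")

# Option 2: straight from the S3 artifact store (no dependency on the tracking server);
# experiment ID 1 is assumed here -- check the artifact path in the MLflow UI.
model = mlflow.pyfunc.load_model(f"s3://bhagabat-mlflow-rf-greentaxi/1/{RUN_ID}/artifacts/model")

# Either way, the loaded model replaces the pickle file in predict.py, e.g.:
# pred = model.predict(features)
```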
Inference script to fetch model
- We can take note of the RUN ID from the experiment tracker and use it in the predict script so as to deploy the web service.
Here is the link for the predict.py script.
Remember to install any missing packages in the virtual environment.
- In another terminal, run test.py to see if we are getting the predicted result.
References
4.3 Web-services: Getting the models from the model registry (MLflow)
MLflow backend and artifact store
Deploying model as a streaming service
Reference:
Video by DataTalksClub
[Pending] Will share the notes upon completion.
Batch deployment of model
Typical approach for deploying a model in batch mode:
- Create a notebook/training script to train a model and save it
- Create a notebook to load the trained model and make prediction on the new data
- Convert the notebook to an inference script
- Clean and parameterize the script
- Schedule the inference script if required
We will use the same taxi duration prediction example here.
Step 1: Train the model and save the artifacts
- If the model is not trained yet, we need to train it first. Since, as per the homework, we need to train the model on the FHV datasets, I am training the model from scratch here.
Connect to the EC2 server and create a new virtual environment:
mkdir batch-train
cd batch-train
pipenv shell --python=3.9
pipenv install scikit-learn==1.0.2 flask gunicorn mlflow boto3
- Train a random forest regressor model for the FHV taxi dataset.
Jupyter notebook for training the model
Take a note of the full path from the artifact section in the MLflow UI.
Step 2: Notebook for fetching the model and predicting
- Connect to the EC2 server and create a new virtual environment:
mkdir batch-inference
cd batch-inference
pipenv shell --python=3.9
pipenv install scikit-learn==1.0.2 prefect mlflow pandas boto3 pyarrow s3fs
- Copy the training notebook and change it for prediction, say, score.ipynb.
Once the notebook code runs successfully, convert it to a Python script.
jupyter nbconvert --to script score.ipynb
- Clean and parameterize the prediction script score.py (a rough sketch follows).
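As an illustration of what a cleaned-up, parameterized score.py might look like (the feature columns, S3 model path, and experiment ID are placeholders rather than the repo's actual values):

```python
import sys

import mlflow
import pandas as pd


def load_model(run_id):
    """Load the model straight from S3 so the tracking server is not needed."""
    # Placeholder bucket/experiment layout; adjust to the path noted in the MLflow UI.
    return mlflow.pyfunc.load_model(
        f"s3://bhagabat-mlflow-rf-greentaxi/1/{run_id}/artifacts/model"
    )


def apply_model(input_file, run_id, output_file):
    df = pd.read_parquet(input_file)

    # Featurize the same way as at training time (feature names are assumptions).
    dicts = df[["PUlocationID", "DOlocationID"]].astype(str).to_dict(orient="records")

    model = load_model(run_id)
    df["predicted_duration"] = model.predict(dicts)
    df["model_run_id"] = run_id

    df.to_parquet(output_file, index=False)


if __name__ == "__main__":
    # Example usage: python score.py input.parquet <RUN_ID> output.parquet
    apply_model(sys.argv[1], sys.argv[2], sys.argv[3])
```

Parameterizing the input file, run ID, and output file this way also makes the script easy to schedule (for example with Prefect, which is already in the environment) or to wrap in Docker for Step 3.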
Step 3: Dockerise the script [Homework]
[Pending] This is yet to be completed.