Getting to Know MLflow (认识MLflow.md) - liuxiang/liuxiang.github.io GitHub Wiki
- 1. MLflow
- 2. Client
- 3. Docker
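- Tracking (model training): recording model parameters and metrics by hand is tedious. Tracking records a model's configuration and visualizes it.
- Projects (model project management): model results are hard to reproduce. Projects uses conda to recreate the environment and dependencies a model needs, so results can be reproduced.
- Models (model package management): deploying a developed model is hard. Models packages and wraps a model and provides deployment tooling.

Problem addressed: deploying ML is hard. Because there are so many deployment tools and environments to target (for example REST serving, batch inference, or mobile applications), moving a model to production can be challenging. There is no standard way to move a model from any library into these tools, so every new deployment introduces new risk.
Source: "MLflow: An Open-Source Machine Learning Platform" - Jianshu (简书)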
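quickstart/mlflow_tracking.py is a basic example that introduces MLflow concepts.
pytorch uses a CNN on the MNIST dataset for character recognition. The example logs TensorBoard events and stores them as MLflow artifacts.
remote_store shows how to use a REST-based backing store for tracking.
sklearn_elasticnet_diabetes uses the sklearn diabetes dataset to predict diabetes progression with ElasticNet.
sklearn_elasticnet_wine_quality is an example of an MLflow project. It uses the Wine Quality dataset and Elastic Net to predict quality. The example uses MLproject to set up the Conda environment and to define parameter types, default values, the training entry point, and so on.
docker demonstrates how to create and run an MLflow project that manages its dependencies with docker (instead of conda).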
Anaconda3 : download
mlflow : pip install mlflow
examples: https://github.com/mlflow/mlflow/tree/master/examples
python examples/quickstart/mlflow_tracking.py
import os
from random import random, randint

from mlflow import log_metric, log_param, log_artifacts

if __name__ == "__main__":
    print("Running mlflow_tracking.py")

    # Input parameter
    log_param("param1", randint(0, 100))

    # Metrics (logging the same key several times records a history)
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")

    log_artifacts("outputs")  # Training output, stored as artifacts
.
├── mlruns
│   └── 0
│       ├── c866b38f23b441fe9d23c42db04ff42d
│       │   ├── artifacts
│       │   │   └── test.txt
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   │   └── foo
│       │   ├── params
│       │   │   └── param1
│       │   └── tags
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
├── outputs
│   └── test.txt
└── quickstart
    └── mlflow_tracking.py
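The tree above is plain files, so a run can be inspected without the UI. The following is a minimal sketch, not MLflow's own reader: `read_run` is a hypothetical helper, and it assumes each file under `params` holds the raw value and each line of a file under `metrics` looks like `<timestamp> <value> [<step>]`. That layout may differ between MLflow versions.

```python
import tempfile
from pathlib import Path


def read_run(run_dir):
    """Collect params and metric histories from one run directory (hypothetical helper)."""
    run_dir = Path(run_dir)

    params = {}
    params_dir = run_dir / "params"
    if params_dir.is_dir():
        for param_file in sorted(params_dir.iterdir()):
            params[param_file.name] = param_file.read_text().strip()

    metrics = {}
    metrics_dir = run_dir / "metrics"
    if metrics_dir.is_dir():
        for metric_file in sorted(metrics_dir.iterdir()):
            history = []
            for line in metric_file.read_text().splitlines():
                fields = line.split()
                if len(fields) >= 2:
                    # Assumed layout: "<timestamp> <value> [<step>]"
                    history.append(float(fields[1]))
            metrics[metric_file.name] = history

    return {"params": params, "metrics": metrics}


# Demo on a throwaway directory that mimics the tree above.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "0" / "c866b38f23b441fe9d23c42db04ff42d"
    (run / "params").mkdir(parents=True)
    (run / "metrics").mkdir()
    (run / "params" / "param1").write_text("42")
    (run / "metrics" / "foo").write_text("1592806685000 0.5 0\n1592806685001 1.5 0\n")
    result = read_run(run)

print(result)
```

For real use, point `read_run` at a directory such as `mlruns/0/<run-id>`.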
- UI service (http://localhost:5000)
Use the MLflow UI to compare the models you have produced. Run it from the working directory that contains the mlruns directory:
$ mlflow ui
[2020-06-22 14:18:05 +0800] [26220] [INFO] Starting gunicorn 20.0.4
[2020-06-22 14:18:05 +0800] [26220] [INFO] Listening at: http://127.0.0.1:5000 (26220)
[2020-06-22 14:18:05 +0800] [26220] [INFO] Using worker: sync
[2020-06-22 14:18:05 +0800] [26224] [INFO] Booting worker with pid: 26224
└── mlflow-example
    ├── LICENSE.txt
    ├── MLproject          # project metadata ★
    ├── README.md
    ├── conda.yaml         # dependency description
    ├── train.py           # training code ★
    └── wine-quality.csv   # input data
https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine
- MLproject (project metadata)
name: tutorial
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
- conda.yaml
name: tutorial
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow>=1.0
- train.py (log_model produces the model package, including *.pkl)
...
if tracking_url_type_store != "file":
    # Register the model
    # There are other ways to use the Model Registry, which depends on the use case,
    # please refer to the doc for more information:
    # https://mlflow.org/docs/latest/model-registry.html#api-workflow
    mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
else:
    mlflow.sklearn.log_model(lr, "model")  # Produces the model package
- mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4
$ mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4
# Create the conda environment
2020/06/22 12:51:00 INFO mlflow.projects: === Creating conda environment mlflow-6284a367a61b51ccdf445333a216776597fb4efc ===
WARNING: The conda.compat module is deprecated and will be removed in a future release.
WARNING: The conda.compat module is deprecated and will be removed in a future release.
Collecting package metadata: done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.6.11
latest version: 4.8.3
Please update conda by running
$ conda update -n base -c defaults conda
# Download dependencies
Downloading and Extracting Packages
tbb4py-2020.0 | 64 KB | ###################################### | 100%
mkl_fft-1.0.6 | 139 KB | ###################################### | 100%
scikit-learn-0.19.1 | 4.8 MB | ###################################### | 100%
numpy-1.15.4 | 35 KB | ###################################### | 100%
mkl_random-1.0.1 | 349 KB | ###################################### | 100%
tbb-2020.0 | 167 KB | ###################################### | 100%
numpy-base-1.15.4 | 4.1 MB | ###################################### | 100%
python-3.6.10 | 20.5 MB | ###################################### | 100%
pip-20.1.1 | 2.0 MB | ###################################### | 100%
mkl-2018.0.3 | 149.2 MB | ###################################### | 100%
certifi-2020.4.5.2 | 160 KB | ###################################### | 100%
scipy-1.1.0 | 15.4 MB | ###################################### | 100%
setuptools-47.3.0 | 643 KB | ###################################### | 100%
wheel-0.34.2 | 49 KB | ###################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
# Install the pip dependencies into the mlflow-* conda virtual environment
Ran pip subprocess with arguments:
['/Users/liuxiang/anaconda3/envs/mlflow-6284a367a61b51ccdf445333a216776597fb4efc/bin/python', '-m', 'pip', 'install', '-r', '/Users/liuxiang/PycharmProjects/mlflow_demo/examples/sklearn_elasticnet_wine/condaenv.jio31sys.requirements.txt']
Pip subprocess output:
Collecting mlflow>=1.0
Using cached mlflow-1.9.0-py3-none-any.whl (11.9 MB)
...
Successfully built querystring-parser pyyaml prometheus-flask-exporter databricks-cli alembic sqlalchemy
# The following set of dependencies gets installed
Installing collected packages: six, protobuf, smmap, gitdb, gitpython, pycparser, cffi, cryptography, urllib3, idna, chardet, requests, oauthlib, requests-oauthlib, isodate, msrest, azure-core, azure-storage-blob, gunicorn, websocket-client, docker, querystring-parser, pyyaml, entrypoints, python-dateutil, prometheus-client, itsdangerous, click, Werkzeug, MarkupSafe, Jinja2, Flask, prometheus-flask-exporter, tabulate, databricks-cli, gorilla, cloudpickle, Mako, sqlalchemy, python-editor, alembic, sqlparse, pytz, pandas, mlflow
Successfully installed Flask-1.1.2 Jinja2-2.11.2 Mako-1.1.3 MarkupSafe-1.1.1 Werkzeug-1.0.1 alembic-1.4.2 azure-core-1.6.0 azure-storage-blob-12.3.2 cffi-1.14.0 chardet-3.0.4 click-7.1.2 cloudpickle-1.4.1 cryptography-2.9.2 databricks-cli-0.11.0 docker-4.2.1 entrypoints-0.3 gitdb-4.0.5 gitpython-3.1.3 gorilla-0.3.0 gunicorn-20.0.4 idna-2.9 isodate-0.6.0 itsdangerous-1.1.0 mlflow-1.9.0 msrest-0.6.16 oauthlib-3.1.0 pandas-1.0.5 prometheus-client-0.8.0 prometheus-flask-exporter-0.14.1 protobuf-3.12.2 pycparser-2.20 python-dateutil-2.8.1 python-editor-1.0.4 pytz-2020.1 pyyaml-5.3.1 querystring-parser-1.2.4 requests-2.24.0 requests-oauthlib-1.3.0 six-1.15.0 smmap-3.0.4 sqlalchemy-1.3.13 sqlparse-0.3.1 tabulate-0.8.7 urllib3-1.25.9 websocket-client-0.57.0
#
# To activate this environment, use
#
# $ conda activate mlflow-6284a367a61b51ccdf445333a216776597fb4efc
#
# To deactivate an active environment, use
#
# $ conda deactivate
2020/06/22 13:08:26 INFO mlflow.projects: === Created directory /var/folders/0q/89pc1xbn38bcg4zff6b3qnpc0000gn/T/tmp226r4wv1 for downloading remote URIs passed to arguments of type 'path' ===
2020/06/22 13:08:26 INFO mlflow.projects: === Running command 'source /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-6284a367a61b51ccdf445333a216776597fb4efc 1>&2 && python train.py 0.4 0.1' in run with ID 'eaacfe12eee3436399f6125baeda013f' ===
Elasticnet model (alpha=0.400000, l1_ratio=0.100000):
RMSE: 0.7410782793160982
MAE: 0.5712718681984226
R2: 0.22185255063708886
2020/06/22 13:09:22 INFO mlflow.projects: === Run (ID 'eaacfe12eee3436399f6125baeda013f') succeeded ===
1. Create the virtual environment and install dependencies: Creating conda environment mlflow-6284a367a61b51ccdf445333a216776597fb4efc
2. Running command (activate the virtual environment, then run the training script): /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-6284a367a61b51ccdf445333a216776597fb4efc 1>&2 && python train.py 0.4 0.1
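The `python train.py 0.4 0.1` at the end of the running command comes from filling the `{alpha}` and `{l1_ratio}` placeholders of the MLproject `command` with the `-P` overrides and the declared defaults. A rough sketch of that substitution follows. It is an illustration only: `render_command` is a hypothetical helper, not MLflow's implementation, and it handles only the `float` type used in this example.

```python
def render_command(template, parameters, overrides):
    """Fill a command template from declared parameters (hypothetical helper)."""
    values = {}
    for name, spec in parameters.items():
        if name in overrides:
            raw = overrides[name]
        elif "default" in spec:
            raw = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
        # Only the float type from the example is converted here.
        values[name] = float(raw) if spec.get("type") == "float" else raw
    return template.format(**values)


# Parameters copied from the MLproject file of the example.
parameters = {
    "alpha": {"type": "float", "default": 0.5},
    "l1_ratio": {"type": "float", "default": 0.1},
}

# Equivalent of: mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4
command = render_command("python train.py {alpha} {l1_ratio}", parameters, {"alpha": "0.4"})
print(command)  # python train.py 0.4 0.1
```

`l1_ratio` was not passed on the command line, so its default of 0.1 is used, which matches the log.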
- On a second run, mlflow run examples/sklearn_elasticnet_wine reuses the existing virtual environment
$ mlflow run examples/sklearn_elasticnet_wine -P alpha=0.4
2020/06/22 13:59:49 INFO mlflow.projects: === Created directory /var/folders/0q/89pc1xbn38bcg4zff6b3qnpc0000gn/T/tmp_f0csjn6 for downloading remote URIs passed to arguments of type 'path' ===
2020/06/22 13:59:49 INFO mlflow.projects: === Running command 'source /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-6284a367a61b51ccdf445333a216776597fb4efc 1>&2 && python train.py 0.4 0.1' in run with ID '261a7ce97e7b4dffbff3a74391ca3865' ===
Elasticnet model (alpha=0.400000, l1_ratio=0.100000):
RMSE: 0.7410782793160982
MAE: 0.5712718681984226
R2: 0.22185255063708886
2020/06/22 13:59:56 INFO mlflow.projects: === Run (ID '261a7ce97e7b4dffbff3a74391ca3865') succeeded ===
- After log_model, the model package is generated at /Users/.../mlruns/0/261a7ce97e7b4dffbff3a74391ca3865/artifacts/model
├── artifacts
│   └── model
│       ├── MLmodel
│       ├── conda.yaml
│       └── model.pkl
├── meta.yaml
├── metrics
│   ├── mae
│   ├── r2
│   └── rmse
├── params
│   ├── alpha
│   └── l1_ratio
└── tags
    ├── mlflow.log-model.history
    ├── mlflow.project.backend
    ├── mlflow.project.entryPoint
    ├── mlflow.project.env
    ├── mlflow.source.name
    ├── mlflow.source.type
    └── mlflow.user
# MLmodel (model metadata)
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.19.1
run_id: 23ff99150d52426ba7d083647ebf76fc
utc_time_created: '2020-06-22 09:47:49.837802'
# conda.yaml (environment dependencies)
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6.10
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - cloudpickle==1.4.1
name: mlflow-env
# model.pkl (the model artifact; how it is loaded is determined by flavors.python_function.loader_module in the MLmodel file)
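To make the role of `loader_module` concrete: the traceback further down this page shows MLflow importing the module named in the MLmodel file and calling its `_load_pyfunc` on the data path. The sketch below imitates that dispatch with a stand-in module. `load_pyfunc` and `fake_sklearn_loader` are hypothetical names for illustration; the real `mlflow.sklearn` loader unpickles `model.pkl` instead of returning a string.

```python
import importlib
import sys
import types


def load_pyfunc(pyfunc_conf, model_dir):
    """Import the configured loader module and let it load the model data (sketch)."""
    loader = importlib.import_module(pyfunc_conf["loader_module"])
    if "data" in pyfunc_conf:
        data_path = f"{model_dir}/{pyfunc_conf['data']}"
    else:
        data_path = model_dir
    return loader._load_pyfunc(data_path)


# Hypothetical stand-in for a flavor module such as mlflow.sklearn.
fake_loader = types.ModuleType("fake_sklearn_loader")
fake_loader._load_pyfunc = lambda path: f"model loaded from {path}"
sys.modules["fake_sklearn_loader"] = fake_loader

# Mirrors the python_function section of the MLmodel file above.
conf = {"loader_module": "fake_sklearn_loader", "data": "model.pkl"}
loaded = load_pyfunc(conf, "artifacts/model")
print(loaded)
```

The point of the indirection is that serving code only needs the MLmodel file to know which library-specific loader to call.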
2.3 mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.4
- conda.yaml
name: tutorial
channels:
  - defaults
dependencies:
  - numpy>=1.14.3
  - pandas>=1.0.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow
- The result is similar to the run above
1. Create the virtual environment and install dependencies: Creating conda environment mlflow-****
2. Running command (activate the virtual environment, then run the training script): /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow--**** 1>&2 && python train.py 0.4 0.1
# Run the training code; it prints the MLflow run ID
$ python examples/sklearn_logistic_regression/train.py
Score: 0.666
Model saved in run <run-id>
# Start the MLflow REST service for the given model URI
$ mlflow models serve --model-uri runs:/<run-id>/model
// 1. Create the virtual environment and install dependencies: Creating conda environment mlflow-****
// 2. Running command (activate the virtual environment, then start the REST service with gunicorn)
// INFO mlflow.pyfunc.backend: === Running command 'source /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-255627e9113d091dfd14deabc838d1a982b4883f 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
# Model prediction REST API
$ curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{
"columns": ["a", "b", "c"],
"data": [[1, 2, 3], [4, 5, 6]]
}'
[1, 0] // result
1. Save the model (artifact)
# Reference: https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_wine/train.py
mlflow.sklearn.log_model(lr, "model")
# Running `mlflow run examples/sklearn_elasticnet_wine` calls `mlflow.sklearn.log_model`, which produces the model package.
**/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model (contains the model metadata file MLmodel and the model file model.pkl)
- The `MLmodel` metadata file tells MLflow how to load the model
- The `model.pkl` file is the trained, serialized linear regression model
# MLmodel
artifact_path: model
flavors:
  python_function:
    data: model.pkl    # model artifact
    env: conda.yaml    # dependency description
    loader_module: mlflow.sklearn
    python_version: 3.6.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.19.1
run_id: 23ff99150d52426ba7d083647ebf76fc
utc_time_created: '2020-06-22 09:47:49.837802'
# conda.yaml
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6.10
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - cloudpickle==1.4.1
name: mlflow-env
2. Deploy as a REST service (one serve process per model)
$ mlflow models serve -m /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/9806fbb1d1944a8f9761356b76de42ad/artifacts/model -p 1234
2020/06/22 15:47:00 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
# Create the conda virtual environment
2020/06/22 15:47:05 INFO mlflow.projects: === Creating conda environment mlflow-e60aad19f9c07786c0e7c02eae8e55269edb64dd ===
WARNING: The conda.compat module is deprecated and will be removed in a future release.
WARNING: The conda.compat module is deprecated and will be removed in a future release.
Collecting package metadata: done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.6.11
latest version: 4.8.3
Please update conda by running
$ conda update -n base -c defaults conda
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
# Install the pip dependencies ("Using cached" means the pip cache is used)
Ran pip subprocess with arguments:
['/Users/liuxiang/anaconda3/envs/mlflow-e60aad19f9c07786c0e7c02eae8e55269edb64dd/bin/python', '-m', 'pip', 'install', '-r', '/Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/9806fbb1d1944a8f9761356b76de42ad/artifacts/model/condaenv.orjbg04k.requirements.txt']
Pip subprocess output:
Collecting mlflow
Using cached mlflow-1.9.0-py3-none-any.whl (11.9 MB)
Processing /Users/liuxiang/Library/Caches/pip/wheels/69/38/7a/072b5863ca334d012821a287fd1d066cea33abdcda3ef2f878/querystring_parser-1.2.4-py3-none-any.whl
...
Installing collected packages: pytz, six, python-dateutil, pandas, sqlparse, gorilla, querystring-parser, protobuf, chardet, urllib3, idna, requests, azure-core, oauthlib, requests-oauthlib, isodate, msrest, pycparser, cffi, cryptography, azure-storage-blob, click, smmap, gitdb, gitpython, websocket-client, docker, tabulate, databricks-cli, Werkzeug, MarkupSafe, Jinja2, itsdangerous, Flask, prometheus-client, prometheus-flask-exporter, pyyaml, gunicorn, sqlalchemy, cloudpickle, python-editor, Mako, alembic, entrypoints, mlflow
Successfully installed Flask-1.1.2 Jinja2-2.11.2 Mako-1.1.3 MarkupSafe-1.1.1 Werkzeug-1.0.1 alembic-1.4.2 azure-core-1.6.0 azure-storage-blob-12.3.2 cffi-1.14.0 chardet-3.0.4 click-7.1.2 cloudpickle-1.4.1 cryptography-2.9.2 databricks-cli-0.11.0 docker-4.2.1 entrypoints-0.3 gitdb-4.0.5 gitpython-3.1.3 gorilla-0.3.0 gunicorn-20.0.4 idna-2.9 isodate-0.6.0 itsdangerous-1.1.0 mlflow-1.9.0 msrest-0.6.16 oauthlib-3.1.0 pandas-1.0.5 prometheus-client-0.8.0 prometheus-flask-exporter-0.14.1 protobuf-3.12.2 pycparser-2.20 python-dateutil-2.8.1 python-editor-1.0.4 pytz-2020.1 pyyaml-5.3.1 querystring-parser-1.2.4 requests-2.24.0 requests-oauthlib-1.3.0 six-1.15.0 smmap-3.0.4 sqlalchemy-1.3.13 sqlparse-0.3.1 tabulate-0.8.7 urllib3-1.25.9 websocket-client-0.57.0
#
# To activate this environment, use
#
# $ conda activate mlflow-e60aad19f9c07786c0e7c02eae8e55269edb64dd
#
# To deactivate an active environment, use
#
# $ conda deactivate
# Activate the conda virtual environment and start the REST service with gunicorn
2020/06/22 15:49:20 INFO mlflow.pyfunc.backend: === Running command 'source /Users/liuxiang/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-e60aad19f9c07786c0e7c02eae8e55269edb64dd 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:1234 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2020-06-22 15:49:22 +0800] [34419] [INFO] Starting gunicorn 20.0.4
[2020-06-22 15:49:22 +0800] [34419] [INFO] Listening at: http://127.0.0.1:1234 (34419)
[2020-06-22 15:49:22 +0800] [34419] [INFO] Using worker: sync
[2020-06-22 15:49:22 +0800] [34433] [INFO] Booting worker with pid: 34433
- Prediction over REST (HTTP)
curl -X POST -H "Content-Type:application/json; format=pandas-split" \
--data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' \
http://127.0.0.1:1234/invocations
Response: [3.943440394399964]
Source: "MLflow Series 1: MLflow Getting-Started Tutorial (Python)" - ZH奶酪 - cnblogs (博客园)
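The same prediction request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: a `pandas-split` JSON body posted to `/invocations`. `build_invocation_request` and `predict` are hypothetical helper names, and the content type is the one this page uses with MLflow 1.9; newer MLflow versions may expect a different payload format.

```python
import json
import urllib.request


def build_invocation_request(url, columns, rows):
    """Build a POST request carrying a pandas-split style JSON body."""
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


def predict(request, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_invocation_request(
    "http://127.0.0.1:1234/invocations",
    ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
     "free sulfur dioxide", "pH", "residual sugar", "sulphates",
     "total sulfur dioxide", "volatile acidity"],
    [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]],
)
print(request.get_method(), request.full_url)
# With the model server from this section running, predict(request)
# returns the decoded list of predictions.
```

Building the request separately from sending it keeps the payload easy to inspect before a server is available.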
If you do not want conda, make sure the required dependencies are already installed in the runtime environment, then add --no-conda to the command.
$ mlflow models serve -m /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/9806fbb1d1944a8f9761356b76de42ad/artifacts/model -p 1234 --no-conda
2020/06/22 16:06:09 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
# With --no-conda, no virtual environment is used; the command relies on the current host environment
# The REST service starts directly
2020/06/22 16:06:09 INFO mlflow.pyfunc.backend: === Running command 'gunicorn --timeout=60 -b 127.0.0.1:1234 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2020-06-22 16:06:09 +0800] [36346] [INFO] Starting gunicorn 20.0.4
[2020-06-22 16:06:09 +0800] [36346] [INFO] Listening at: http://127.0.0.1:1234 (36346)
[2020-06-22 16:06:09 +0800] [36346] [INFO] Using worker: sync
[2020-06-22 16:06:09 +0800] [36349] [INFO] Booting worker with pid: 36349
/Users/liuxiang/anaconda3/envs/mlflow_demo/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.linear_model.coordinate_descent module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
warnings.warn(message, FutureWarning)
/Users/liuxiang/anaconda3/envs/mlflow_demo/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator ElasticNet from version 0.19.1 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
- Missing-dependency case
$ pip freeze | grep scikit-learn
scikit-learn==0.23.1
$ pip uninstall scikit-learn
- Uninstall the dependency from the host, then check whether --no-conda installs it into the host environment (result: it does not; the command simply fails)
$ mlflow models serve -m /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/9806fbb1d1944a8f9761356b76de42ad/artifacts/model -p 1234 --no-conda
2020/06/22 16:12:03 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2020/06/22 16:12:03 INFO mlflow.pyfunc.backend: === Running command 'gunicorn --timeout=60 -b 127.0.0.1:1234 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2020-06-22 16:12:03 +0800] [36790] [INFO] Starting gunicorn 20.0.4
[2020-06-22 16:12:03 +0800] [36790] [INFO] Listening at: http://127.0.0.1:1234 (36790)
[2020-06-22 16:12:03 +0800] [36790] [INFO] Using worker: sync
[2020-06-22 16:12:03 +0800] [36794] [INFO] Booting worker with pid: 36794
[2020-06-22 16:12:05 +0800] [36794] [ERROR] Exception in worker process
Traceback (most recent call last):
...
File "/Users/liuxiang/anaconda3/envs/mlflow_demo/lib/python3.7/site-packages/mlflow/pyfunc/__init__.py", line 466, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/Users/liuxiang/anaconda3/envs/mlflow_demo/lib/python3.7/site-packages/mlflow/sklearn.py", line 280, in _load_pyfunc
return _load_model_from_local_file(path)
File "/Users/liuxiang/anaconda3/envs/mlflow_demo/lib/python3.7/site-packages/mlflow/sklearn.py", line 271, in _load_model_from_local_file
return pickle.load(f)
ModuleNotFoundError: No module named 'sklearn'
[2020-06-22 16:12:05 +0800] [36794] [INFO] Worker exiting (pid: 36794)
[2020-06-22 16:12:05 +0800] [36790] [INFO] Shutting down: Master
[2020-06-22 16:12:05 +0800] [36790] [INFO] Reason: Worker failed to boot.
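Since --no-conda neither installs nor verifies anything, it can help to compare the model's conda.yaml against the host before serving. The following is a sketch under stated assumptions, not an MLflow feature: `parse_requirement` and `check_environment` are hypothetical helpers, only exact pins (`=` or `==`) are compared, and the requirement strings are passed in by hand rather than parsed from the YAML file.

```python
import re
from importlib import metadata


def parse_requirement(spec):
    """Split 'name==1.0', 'name=1.0' or 'name' into (name, pinned version or None)."""
    match = re.match(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)", spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    name, rest = match.group(1), match.group(2).strip()
    pinned = re.fullmatch(r"==?\s*([^=<>!\s]+)", rest)
    return name, (pinned.group(1) if pinned else None)


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(requirements, lookup=installed_version):
    """List requirements that are missing or pinned to a different version."""
    problems = []
    for spec in requirements:
        name, wanted = parse_requirement(spec)
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found != wanted:
            problems.append(f"{name}: installed {found}, model expects {wanted}")
    return problems


# Demo with a fake host that reproduces the version mismatch warned about above.
fake_host = {"mlflow": "1.9.0", "cloudpickle": "1.4.1", "scikit-learn": "0.23.1"}
problems = check_environment(
    ["scikit-learn=0.19.1", "mlflow", "cloudpickle==1.4.1"], fake_host.get
)
print(problems)
```

Calling `check_environment(requirements)` without the second argument checks the real interpreter through `importlib.metadata`.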
See: https://www.mlflow.org/docs/latest/models.html#fields-in-the-mlmodel-format
(Modelers should follow this document when writing the training file train.py, in which mlflow.pyfunc.save_model outputs the model package in MLflow format.)
Python Function (python_function)
R Function (crate)
H2O (h2o)
Keras (keras)
MLeap (mleap)
PyTorch (pytorch)
Scikit-learn (sklearn)
Spark MLlib (spark)
TensorFlow (tensorflow)
ONNX (onnx)
MXNet Gluon (gluon)
XGBoost (xgboost)
LightGBM (lightgbm)
https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-filesystem-format
# Format
./dst-path/
    ./MLmodel: configuration
    <code>: code packaged with the model (specified in the MLmodel file)
    <data>: data packaged with the model (specified in the MLmodel file)
    <env>: Conda environment definition (specified in the MLmodel file)
# Example
├── MLmodel
├── code
│   ├── sklearn_iris.py
│
├── data
│   └── model.pkl
└── mlflow_env.yml
### cat MLmodel
python_function:
  code: code
  data: data/model.pkl
  loader_module: mlflow.sklearn
  env: mlflow_env.yml
  main: sklearn_iris
See: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-predict
mlflow models predict [OPTIONS]
https://mlflow.org/docs/latest/models.html#custom-python-models
- pyfunc.py: save the model
cat > pyfunc.py << 'EOF'
import mlflow.pyfunc
import pandas as pd

# Define the model class
class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

# Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)  # Writes the model directory `add_n_model`

# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)

# Evaluate the model
model_input = pd.DataFrame([range(10)])
model_output = loaded_model.predict(model_input)
assert model_output.equals(pd.DataFrame([range(5, 15)]))
EOF
# Run the script to produce add_n_model
python pyfunc.py
# Deploy
mlflow models serve -m add_n_model -p 5000 --no-conda
- Output model (add_n_model)
├── add_n_model
│ ├── MLmodel
│ ├── conda.yaml
│ └── python_model.pkl
# MLmodel
flavors:
  python_function:
    cloudpickle_version: 1.4.1
    env: conda.yaml
    loader_module: mlflow.pyfunc.model
    python_model: python_model.pkl
    python_version: 3.7.7
utc_time_created: '2020-06-24 09:19:20.559331'
# conda.yaml
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.7.7
  - pip
  - pip:
    - mlflow
    - cloudpickle==1.4.1
name: mlflow-env
- Test
# split-oriented
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{
"columns": ["a", "b", "c"],
"data": [[1, 2, 3], [4, 5, 6]]
}'
# record-oriented (fine for vector rows, loses ordering for JSON records)
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
{"a": 1,"b": 2,"c": 3},
{"a": 4,"b": 5,"c": 6}
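]'
Question: does a custom Python model always have to be exported as *.pkl through mlflow.pyfunc.save_model?
> That is my current understanding. For a customer's existing models to run in an MLflow service, the model's Python code has to be saved as *.pkl with mlflow.pyfunc.save_model.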
https://mlflow.org/docs/latest/models.html#deployment-to-custom-targets
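Besides the built-in deployment tools, MLflow provides a pluggable mlflow.deployments Python API and the mlflow deployments CLI for deploying models to custom targets and environments. To deploy to a custom target, you must first install the appropriate third-party Python plugin. The page linked above keeps a community-maintained list of known plugins.
Can a model REST service be deployed through an API?
> Deploying a model depends on a conda virtual environment (or a fully prepared host environment) and takes a long time (1 to 3+ minutes). In addition, receiving requests itself requires a serving environment, and the environment that gets created is usually not the current service but a new model service. When model services are created repeatedly, keeping resources under control becomes a problem. MLflow currently does not support deploying a model service through its REST API.
> The MLflow service is essentially a service for training and comparing models (port 5000). It additionally supports deploying and running model services (port: <custom>).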
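How is a model service initialized?
> Container environment: declare `ENTRYPOINT ["./bin/start.sh"]` in the Dockerfile. The specific model to start is passed into the container through an environment variable, and the model files are shared through a mounted volume.
> General cluster: MLflow does not support serving multiple models on one port. A port-per-model setup works normally. For now I consider it unsuitable for clustered deployment. (If the modelers provide the inference code, clustered deployment becomes possible.)
> Wrapping with Seldon: Seldon is similar to MLflow and adds model deployment features. The deployment strategy still has to choose between `container environment` and `general cluster`.
> Model serving command: mlflow models serve -m /Users/.../mlruns/0/9806fbb1d1944a8f9761356b76de42ad/artifacts/model -p 1234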
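For the container case, the entry point only has to turn the environment variable into the serving command. The sketch below shows one way to do that in Python. `build_serve_command` is a hypothetical helper, the variable name `modelPath` is taken from the docker run example later on this page, and the flags are the ones already used on this page (`-m`, `-p`, `--host`, `--no-conda`).

```python
import os
import shlex


def build_serve_command(model_path, port, host="0.0.0.0", use_conda=False):
    """Assemble the `mlflow models serve` argument list for one model."""
    command = ["mlflow", "models", "serve", "-m", model_path,
               "-p", str(port), "--host", host]
    if not use_conda:
        # Rely on the dependencies baked into the container image.
        command.append("--no-conda")
    return command


# The model to serve is chosen through an environment variable;
# the default value here is only for illustration.
model_path = os.environ.get("modelPath", "sklearn_elasticnet_wine")
command = build_serve_command(model_path, 8088)
print(shlex.join(command))
# A real entry point would end by replacing the process:
# os.execvp(command[0], command)
```

Passing the arguments as a list avoids shell quoting problems when the model path contains spaces.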
What API capabilities does MLflow provide?
> experiments; runs (model training); artifacts; registered models and model versions; tags; logging and history
### Start the MLflow service
mlflow ui (run it next to the mlruns folder, usually the project root)
### Experiments
- Create experiment 2.0/mlflow/experiments/create
- List experiments 2.0/mlflow/experiments/list
- Get experiment 2.0/mlflow/experiments/get
- Get experiment by name 2.0/mlflow/experiments/get-by-name
- Delete experiment 2.0/mlflow/experiments/delete
- Restore experiment 2.0/mlflow/experiments/restore
- Update experiment 2.0/mlflow/experiments/update
### Runs (model training)
- Create run 2.0/mlflow/runs/create
- Delete run 2.0/mlflow/runs/delete
- Restore run 2.0/mlflow/runs/restore
- Get run 2.0/mlflow/runs/get
- Log metric 2.0/mlflow/runs/log-metric
- Log batch 2.0/mlflow/runs/log-batch
- Log model 2.0/mlflow/runs/log-model
- Search runs 2.0/mlflow/runs/search
- Update run 2.0/mlflow/runs/update
### Artifacts
- List artifacts 2.0/mlflow/artifacts/list
### Registered models
- Create RegisteredModel 2.0/preview/mlflow/registered-models/create
- Get RegisteredModel 2.0/preview/mlflow/registered-models/get
- Rename RegisteredModel 2.0/preview/mlflow/registered-models/rename
- Update RegisteredModel 2.0/preview/mlflow/registered-models/update
- Delete RegisteredModel 2.0/preview/mlflow/registered-models/delete
- List RegisteredModels 2.0/preview/mlflow/registered-models/list
- Search RegisteredModels 2.0/preview/mlflow/registered-models/search
- Set RegisteredModel tag 2.0/preview/mlflow/registered-models/set-tag
- Get latest ModelVersions 2.0/preview/mlflow/registered-models/get-latest-versions
### Model versions
- Create ModelVersion 2.0/preview/mlflow/model-versions/create
- Get ModelVersion 2.0/preview/mlflow/model-versions/get
- Update ModelVersion 2.0/preview/mlflow/model-versions/update
- Delete ModelVersion 2.0/preview/mlflow/model-versions/delete
- Search ModelVersions 2.0/preview/mlflow/model-versions/search
- Get download URI for ModelVersion artifacts 2.0/preview/mlflow/model-versions/get-download-uri
- Transition ModelVersion stage 2.0/preview/mlflow/model-versions/transition-stage
### Tags
- Set experiment tag 2.0/mlflow/experiments/set-experiment-tag
- Set run tag 2.0/mlflow/runs/set-tag
- Delete run tag 2.0/mlflow/runs/delete-tag
- Delete RegisteredModel tag 2.0/preview/mlflow/registered-models/delete-tag
- Set ModelVersion tag 2.0/preview/mlflow/model-versions/set-tag
- Delete ModelVersion tag 2.0/preview/mlflow/model-versions/delete-tag
### Logging and history
- Log param 2.0/mlflow/runs/log-parameter
- Get metric history 2.0/mlflow/metrics/get-history
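The endpoint paths above are relative. As a rough sketch of how they can be called from Python with the standard library, the helper below assumes the tracking server started by `mlflow ui` listens on http://localhost:5000 and serves the API under an `/api/` prefix, that the list endpoint takes a GET, and that the search endpoint takes a POST with a JSON body. `api_request` is a hypothetical helper name, and those details should be checked against the REST API documentation for the MLflow version in use.

```python
import json
import urllib.request


def api_request(base_url, endpoint, payload=None):
    """Build a GET (no payload) or JSON POST request for an MLflow REST endpoint."""
    url = f"{base_url.rstrip('/')}/api/{endpoint.lstrip('/')}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# List experiments, then search the runs of the default experiment "0".
list_experiments = api_request("http://localhost:5000", "2.0/mlflow/experiments/list")
search_runs = api_request(
    "http://localhost:5000", "2.0/mlflow/runs/search", {"experiment_ids": ["0"]}
)
print(list_experiments.get_method(), list_experiments.full_url)
print(search_runs.get_method(), search_runs.full_url)
# With the server running, urllib.request.urlopen(list_experiments) sends the call.
```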
See:
https://github.com/mlflow/mlflow/tree/master/examples/docker
https://mlflow.org/docs/latest/projects.html#run-an-mlflow-project-on-kubernetes-experimental
- build.sh: use a container to prepare the model's runtime environment (equivalent to conda.yaml)
docker build -t mlflow-docker-example -f Dockerfile .
- Dockerfile
FROM continuumio/miniconda:4.5.4
RUN pip install "mlflow>=1.0" \
    && pip install azure-storage-blob==12.3.0 \
    && pip install numpy==1.14.3 \
    && pip install scipy \
    && pip install pandas==0.22.0 \
    && pip install scikit-learn==0.19.1 \
    && pip install cloudpickle
- MLproject
name: docker-example
docker_env:
  image: mlflow-docker-example  # The environment is declared with image: instead of conda.yaml
entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
- kubernetes_config.json
{
  "kube-context": "docker-for-desktop",
  "kube-job-template-path": "examples/docker/kubernetes_job_template.yaml",
  "repository-uri": "username/mlflow-kubernetes-example"
}
- kubernetes_job_template.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{replaced with MLflow Project name}"
  namespace: mlflow
spec:
  ttlSecondsAfterFinished: 100
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: "{replaced with MLflow Project name}"
        image: "{replaced with URI of Docker image created during Project execution}"
        command: ["{replaced with MLflow Project entry point command}"]
        resources:
          limits:
            memory: 512Mi
          requests:
            memory: 256Mi
      restartPolicy: Never
- Train the model (docker is run dynamically) and output the model package
$ mlflow run examples/docker -P alpha=0.5
# Build the docker image
2020/06/22 16:45:00 INFO mlflow.projects: === Building docker image docker-example ===
2020/06/22 16:45:24 INFO mlflow.projects: === Created directory /var/folders/0q/89pc1xbn38bcg4zff6b3qnpc0000gn/T/tmptuxdfz2j for downloading remote URIs passed to arguments of type 'path' ===
# Run docker
2020/06/22 16:45:24 INFO mlflow.projects: === Running command 'docker run --rm -v /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns:/mlflow/tmp/mlruns -v /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/7b2a60fb34fc4244b72200f53fd1ed22/artifacts:/Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/7b2a60fb34fc4244b72200f53fd1ed22/artifacts -e MLFLOW_RUN_ID=7b2a60fb34fc4244b72200f53fd1ed22 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 docker-example:latest python train.py --alpha 0.5 --l1-ratio 0.1' in run with ID '7b2a60fb34fc4244b72200f53fd1ed22' ===
/opt/conda/lib/python2.7/site-packages/mlflow/__init__.py:55: DeprecationWarning: MLflow support for Python 2 is deprecated and will be dropped in a future release. At that point, existing Python 2 workflows that use MLflow will continue to work without modification, but Python 2 users will no longer get access to the latest MLflow features and bugfixes. We recommend that you upgrade to Python 3 - see https://docs.python.org/3/howto/pyporting.html for a migration guide.
"for a migration guide.", DeprecationWarning)
Elasticnet model (alpha=0.500000, l1_ratio=0.100000):
RMSE: 0.794793101903653
MAE: 0.6189130834228139
R2: 0.18411668718221796
2020/06/22 16:45:56 INFO mlflow.projects: === Run (ID '7b2a60fb34fc4244b72200f53fd1ed22') succeeded ===
Running mlflow run examples/docker builds a new Docker image based on mlflow-docker-example that also contains our project code. The resulting image is tagged mlflow-docker-example-<git-version>, where <git-version> is the git commit ID. After the image is built, MLflow uses docker run to execute the default (main) project entry point inside the container.
- Generated model package
/Users/***/mlruns/0/7b2a60fb34fc4244b72200f53fd1ed22
├── artifacts
│   └── model    # model package
│       ├── MLmodel
│       ├── conda.yaml
│       └── model.pkl
├── meta.yaml
├── metrics
│   ├── mae
│   ├── r2
│   └── rmse
├── params
│   ├── alpha
│   └── l1_ratio
└── tags
    ├── mlflow.docker.image.id
    ├── mlflow.docker.image.uri
    ├── mlflow.log-model.history
    ├── mlflow.project.backend
    ├── mlflow.project.entryPoint
    ├── mlflow.project.env
    ├── mlflow.source.name
    ├── mlflow.source.type
    └── mlflow.user
- Model service deployment (no container is started dynamically) (there should be an option to run on a k8s environment based on kubernetes_config.json)
$ mlflow models serve -m /Users/liuxiang/PycharmProjects/mlflow_demo/mlruns/0/7b2a60fb34fc4244b72200f53fd1ed22/artifacts/model -p 1234
> Unlike mlflow run, serve does not create a container environment dynamically; it builds the environment from conda.yaml instead. With --no-conda it likewise uses the host environment. Neither path builds a container environment.
Log:
2020/06/22 16:56:17 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2020/06/22 16:56:21 INFO mlflow.projects: === Creating conda environment mlflow-e7388c79eb8fda88c4d7418cb68daab65b2e0839 ===
WARNING: The conda.compat module is deprecated and will be removed in a future release.
WARNING: The conda.compat module is deprecated and will be removed in a future release.
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.
Collecting package metadata: done
Solving environment: done
- Prediction
curl -X POST -H "Content-Type:application/json; format=pandas-split" \
  --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' \
  http://127.0.0.1:1234/invocations
Response: [4.203048595214442]
- Recommendation: integrate directly with the k8s API.
Start the container with the model files mounted as a volume.
# Deploy
docker run \
  -dp 8088:8088 \
  -v /Users/liuxiang/models:/home/admin/src \
  -w="/home/admin/src" \
  -e modelPath="sklearn_elasticnet_wine" \
  registry.tongdun.me/ml/centos7.2-common-anaconda3-mlflow:latest \
  /bin/bash -c "mlflow models serve -m \$modelPath -p 8088 --no-conda --host 0.0.0.0"
# Test
curl -X POST -H "Content-Type:application/json; format=pandas-split" \
  --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' \
  http://0.0.0.0:8088/invocations
https://mlflow.org/docs/latest/cli.html#mlflow-models-build-docker
# Build the model (some-run-uuid) into an image
mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"
# Start the container (serves the model)
docker run -p 5001:8080 "my-image-name"
docker run -p 5001:8080 -e DISABLE_NGINX=true "my-image-name"
Reference: https://hub.docker.com/r/dzinsouhpe/mlflow/dockerfile
Purpose: a container environment for MLflow, usable for model training (MLProject)
- Dockerfile
FROM bluedata/centos7:latest
LABEL maintainer="[email protected]"
RUN yum install -y epel-release
RUN yum install -y htop vim wget net-tools git unzip zip python3 libgfortran libgomp
RUN yum group install -y "Development Tools"
RUN pip3 install mlflow
RUN pip3 install boto3
RUN useradd mlflow
RUN mkdir /opt/mlflow
RUN mkdir /opt/mlflow/backend-store
RUN mkdir /opt/mlflow/log
RUN mkdir /opt/mlflow/bin
COPY start.sh /opt/mlflow/bin/start.sh
RUN chmod +x /opt/mlflow/bin/start.sh
RUN chown -R mlflow:mlflow /opt/mlflow
WORKDIR /opt/mlflow/
USER mlflow
EXPOSE 5000
ENTRYPOINT ["./bin/start.sh"]
- start.sh (mlflow server is similar to mlflow ui)
#!/bin/bash
export LC_ALL="en_US.UTF-8"
mlflow server --backend-store-uri $MLFLOW_BACKEND_STORE --default-artifact-root $MLFLOW_ARTIFACT_ROOT --host 0.0.0.0