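The following is mainly a translation of a tutorial from the official MLflow documentation.
4. Tutorials and Examples
4.1 Train, Serve, and Score a Linear Regression Model
Source: the "Tutorial" page of the MLflow 2.4.1 documentation
This tutorial shows how to use MLflow end to end to:
(1) Train a linear regression model
(2) Package the code that trains the model in a reusable and reproducible model format
(3) Deploy the model into a simple HTTP server that lets you score predictions
The tutorial uses a dataset to predict the quality of wine from quantitative features such as "fixed acidity", "pH", and "residual sugar". The dataset comes from the UCI Machine Learning Repository.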
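4.1.1 Prerequisites
(1) Install MLflow and scikit-learn.
There are two ways to install them:
- Install MLflow with its extra dependencies (extras), which include scikit-learn.
pip install mlflow[extras]
- Install MLflow and scikit-learn separately.
pip install mlflow
pip install scikit-learn
(2) Install conda.
(3) Clone or download the MLflow repository.
git clone https://github.com/mlflow/mlflow
(4) cd into the examples directory of your MLflow clone; the tutorial is run from this working directory. Avoid running directly from the root of the clone, because doing so makes the tutorial use MLflow from source rather than the MLflow you installed from PyPI.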
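Before moving on, it can help to confirm that the packages from step (1) are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of MLflow.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used on PyPI.
    for name in ("mlflow", "scikit-learn"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If either package is reported as not installed, rerun the pip commands above in the same environment.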
4.1.2 Training the Model
First, train a linear regression model that takes two hyperparameters: alpha and l1_ratio.
The code is located at examples/sklearn_elasticnet_wine/train.py and is shown below:
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
import os
import warnings
import sys
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
from mlflow.models import infer_signature
import mlflow.sklearn
import logging
logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    # csv_url = (
    #     "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-red.csv"
    #     # "./wine-quality.csv"
    # )
    try:
        # data = pd.read_csv(csv_url, sep=";")
        data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)
    print(train)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
        print(" RMSE: %s" % rmse)
        print(" MAE: %s" % mae)
        print(" R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        # Infer the model signature from the training inputs and the model's predictions
        predictions = lr.predict(train_x)
        signature = infer_signature(train_x, predictions)

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Model registry does not work with file store
        if tracking_url_type_store != "file":
            # Register the model
            # There are other ways to use the Model Registry, which depends on the use case,
            # please refer to the doc for more information:
            # https://mlflow.org/docs/latest/model-registry.html#api-workflow
            mlflow.sklearn.log_model(
                lr, "model", registered_model_name="ElasticnetWineModel", signature=signature
            )
        else:
            mlflow.sklearn.log_model(lr, "model", signature=signature)
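This example uses the familiar pandas, numpy, and sklearn APIs to create a simple machine learning model. The MLflow tracking APIs log information about each training run, such as the hyperparameters alpha and l1_ratio used to train the model and the metrics used to evaluate it, such as the root mean squared error. The example also serializes the model in a format that MLflow knows how to deploy.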
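To make the three logged metrics concrete, here is a minimal sketch that computes RMSE, MAE, and R2 in plain Python. It is only an illustration of the formulas; train.py itself uses the scikit-learn functions, and the name eval_metrics_plain and the sample numbers are made up for this example.

```python
import math


def eval_metrics_plain(actual, predicted):
    """Compute RMSE, MAE and R2 for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # Sum of squared residuals, shared by RMSE and R2.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    # Total sum of squares around the mean; zero if all targets are equal,
    # in which case this simple formula for R2 is undefined.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Made-up quality scores, purely for illustration.
    rmse, mae, r2 = eval_metrics_plain([5, 6, 7], [5, 6, 8])
    print(f"RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```

A lower RMSE or MAE and a higher R2 indicate a better fit on the test set, which is what the comparison in the MLflow UI relies on.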
You can run the example with the default hyperparameters as follows:
# Make sure the current working directory is 'examples'
python sklearn_elasticnet_wine/train.py
Try out some other values for alpha and l1_ratio by passing them as arguments to train.py:
# Make sure the current working directory is 'examples'
python sklearn_elasticnet_wine/train.py <alpha> <l1_ratio>
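The way train.py picks up these two positional arguments can be isolated into a small function. This is a sketch that mirrors the two sys.argv lines in the script; the function name parse_hyperparameters is hypothetical.

```python
import sys


def parse_hyperparameters(argv, default_alpha=0.5, default_l1_ratio=0.5):
    """Read alpha and l1_ratio from positional arguments, falling back to defaults."""
    # argv[0] is the script name, so the first hyperparameter is argv[1].
    alpha = float(argv[1]) if len(argv) > 1 else default_alpha
    l1_ratio = float(argv[2]) if len(argv) > 2 else default_l1_ratio
    return alpha, l1_ratio


if __name__ == "__main__":
    print(parse_hyperparameters(sys.argv))
```

Because the arguments are positional, l1_ratio can only be set when alpha is given as well.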
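Each time you run the example, MLflow logs information about your experiment runs in the mlruns directory.
Note: If you would like to use the Jupyter notebook version of train.py, try out the tutorial notebook at examples/sklearn_elasticnet_wine/train.ipynb.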
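4.1.3 Comparing the Models
Next, use the MLflow UI to compare the models you have produced. In the same working directory as the one that contains the mlruns directory, run: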
mlflow ui
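Then view it at http://localhost:5000.
On this page, you can see a list of experiment runs with metrics you can use to compare the models.
You can use the search feature to quickly filter out many models. For example, the query metrics.rmse < 0.8 returns all the models with a root mean squared error below 0.8. For more complex manipulations, you can download this table as a CSV and analyze it with your favorite data munging software.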
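As a rough illustration of working with the downloaded table, the sketch below applies the same rmse < 0.8 filter to CSV text with the standard library. The column names and all values in SAMPLE_CSV are invented for this example; the real export's columns depend on your MLflow version and on what your runs logged.

```python
import csv
import io

# Hypothetical export: column names and numbers are made up for illustration.
SAMPLE_CSV = """run_id,alpha,l1_ratio,rmse
run_a,0.5,0.5,0.82
run_b,0.1,0.9,0.74
run_c,0.3,0.2,0.78
"""


def runs_below_rmse(csv_text, threshold):
    """Return the rows whose rmse is below the threshold, best run first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = [row for row in reader if float(row["rmse"]) < threshold]
    return sorted(selected, key=lambda row: float(row["rmse"]))


if __name__ == "__main__":
    for row in runs_below_rmse(SAMPLE_CSV, 0.8):
        print(row["run_id"], row["rmse"])
```

For a real export, replace SAMPLE_CSV with the contents of the downloaded file and adjust the column name to match its header.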
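4.1.4 Packaging Training Code in a Conda Environment
Now that you have your training code, you can package it so that other data scientists can easily reuse the model, or so that you can run the training remotely, for example on Databricks.
You do this by using MLflow Projects conventions to specify the dependencies and entry points of your code. The sklearn_elasticnet_wine/MLproject file specifies that the project's dependencies are located in a Conda environment file called conda.yaml, and that it has one entry point that takes two parameters: alpha and l1_ratio. (Note that the copy of the file shown below declares its environment with python_env: python_env.yaml.)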
name: tutorial

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
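The relationship between the declared parameters and the command template can be sketched in a few lines of Python. This is only a simplified illustration of the idea (defaults, overrides, then substitution into the template); it is not MLflow's implementation, and the names ENTRY_POINT and render_command are hypothetical.

```python
# The "main" entry point from the MLproject file, written as a Python dict.
ENTRY_POINT = {
    "parameters": {
        "alpha": {"type": "float", "default": 0.5},
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {alpha} {l1_ratio}",
}


def render_command(entry_point, overrides=None):
    """Fill the command template with overrides, falling back to declared defaults."""
    overrides = overrides or {}
    values = {}
    for name, spec in entry_point["parameters"].items():
        raw = overrides.get(name, spec["default"])
        # Only the float type is handled in this sketch.
        values[name] = float(raw) if spec["type"] == "float" else raw
    return entry_point["command"].format(**values)


if __name__ == "__main__":
    print(render_command(ENTRY_POINT))
    print(render_command(ENTRY_POINT, {"alpha": "0.42"}))
```

Note that the entry point's default for l1_ratio (0.1) differs from the fallback inside train.py (0.5), so the value used depends on whether the script is launched directly or through the project.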
The sklearn_elasticnet_wine/conda.yaml file lists the dependencies:
name: tutorial
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - scikit-learn==1.2.0
      - mlflow>=1.0
      - pandas
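To run this project, invoke mlflow run sklearn_elasticnet_wine -P alpha=0.42. (To run it in your current local environment instead, append --env-manager=local. Also make sure the CSV file can actually be read when you run this command: in my case, reading it from the URL failed, so I downloaded the file and read it from a local path instead.) After you run this command, MLflow runs your training code in a new Conda environment with the dependencies specified in conda.yaml.
If the repository has an MLproject file in its root, you can also run a project directly from GitHub. This tutorial is duplicated in the https://github.com/mlflow/mlflow-example repository, which you can run with mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0.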
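If you want to launch several such runs from a script, one option is to assemble the command line programmatically. The sketch below only builds the argument list from the flags already mentioned above (-P and --env-manager); the helper name build_mlflow_run_command is hypothetical, and nothing is executed.

```python
import shlex


def build_mlflow_run_command(project_uri, params=None, env_manager=None):
    """Build the argument list for an `mlflow run` invocation."""
    command = ["mlflow", "run", project_uri]
    for key, value in (params or {}).items():
        command += ["-P", f"{key}={value}"]
    if env_manager is not None:
        command += ["--env-manager", env_manager]
    return command


if __name__ == "__main__":
    for alpha in (0.1, 0.42, 1.0):
        command = build_mlflow_run_command(
            "sklearn_elasticnet_wine", {"alpha": alpha}, env_manager="local"
        )
        # Print a shell-safe version of the command instead of running it.
        print(" ".join(shlex.quote(part) for part in command))
```

To actually start a run, the returned list could be passed to subprocess.run(command, check=True) from the examples directory.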
4.1.5 Specifying pip Requirements with pip_requirements and extra_pip_requirements
"""
This example demonstrates how to specify pip requirements using `pip_requirements` and
`extra_pip_requirements` when logging a model via `mlflow.*.log_model`.
"""
import tempfile
from packaging.version import Version
import sklearn
from sklearn.datasets import load_iris
import xgboost as xgb
import mlflow
from mlflow.artifacts import download_artifacts
from mlflow.models.signature import infer_signature


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


def get_pip_requirements(run_id, artifact_path, return_constraints=False):
    req_path = download_artifacts(run_id=run_id, artifact_path=f"{artifact_path}/requirements.txt")
    reqs = read_lines(req_path)

    if return_constraints:
        con_path = download_artifacts(
            run_id=run_id, artifact_path=f"{artifact_path}/constraints.txt"
        )
        cons = read_lines(con_path)
        return set(reqs), set(cons)

    return set(reqs)


def main():
    iris = load_iris()
    dtrain = xgb.DMatrix(iris.data, iris.target)
    model = xgb.train({}, dtrain)
    predictions = model.predict(dtrain)
    signature = infer_signature(dtrain.get_data(), predictions)

    xgb_req = f"xgboost=={xgb.__version__}"
    sklearn_req = f"scikit-learn=={sklearn.__version__}"

    with mlflow.start_run() as run:
        run_id = run.info.run_id

        # Get the expected mlflow version
        mlflow_version_raw = Version(mlflow.__version__)
        mlflow_version = f"mlflow=={mlflow_version_raw.major}.{mlflow_version_raw.minor}"

        # Default (both `pip_requirements` and `extra_pip_requirements` are unspecified)
        artifact_path = "default"
        mlflow.xgboost.log_model(model, artifact_path, signature=signature)
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs.issuperset([mlflow_version, xgb_req]), pip_reqs

        # Overwrite the default set of pip requirements using `pip_requirements`
        artifact_path = "pip_requirements"
        mlflow.xgboost.log_model(
            model, artifact_path, pip_requirements=[sklearn_req], signature=signature
        )
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs == {mlflow_version, sklearn_req}, pip_reqs

        # Add extra pip requirements on top of the default set of pip requirements
        # using `extra_pip_requirements`
        artifact_path = "extra_pip_requirements"
        mlflow.xgboost.log_model(
            model, artifact_path, extra_pip_requirements=[sklearn_req], signature=signature
        )
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs.issuperset([mlflow_version, xgb_req, sklearn_req]), pip_reqs

        # Specify pip requirements using a requirements file
        with tempfile.NamedTemporaryFile("w", suffix=".requirements.txt") as f:
            f.write(sklearn_req)
            f.flush()

            # Path to a pip requirements file
            artifact_path = "requirements_file_path"
            mlflow.xgboost.log_model(
                model, artifact_path, pip_requirements=f.name, signature=signature
            )
            pip_reqs = get_pip_requirements(run_id, artifact_path)
            assert pip_reqs == {mlflow_version, sklearn_req}, pip_reqs

            # List of pip requirement strings
            artifact_path = "requirements_file_list"
            mlflow.xgboost.log_model(
                model,
                artifact_path,
                pip_requirements=[xgb_req, f"-r {f.name}"],
                signature=signature,
            )
            pip_reqs = get_pip_requirements(run_id, artifact_path)
            assert pip_reqs == {mlflow_version, xgb_req, sklearn_req}, pip_reqs

        # Using a constraints file
        with tempfile.NamedTemporaryFile("w", suffix=".constraints.txt") as f:
            f.write(sklearn_req)
            f.flush()

            artifact_path = "constraints_file"
            mlflow.xgboost.log_model(
                model,
                artifact_path,
                pip_requirements=[xgb_req, f"-c {f.name}"],
                signature=signature,
            )
            pip_reqs, pip_cons = get_pip_requirements(
                run_id, artifact_path, return_constraints=True
            )
            assert pip_reqs == {mlflow_version, xgb_req, "-c constraints.txt"}, pip_reqs
            assert pip_cons == {sklearn_req}, pip_cons


if __name__ == "__main__":
    main()
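The assertions in the example above suggest a simple mental model: pip_requirements replaces the default requirement set, extra_pip_requirements adds to it, and the mlflow requirement is present either way. The sketch below encodes just that model with plain sets. It is not MLflow's implementation; the function name resolve_requirements, the version strings, and the choice to reject both arguments at once are assumptions of this sketch.

```python
def resolve_requirements(mlflow_req, default_reqs, pip_requirements=None,
                         extra_pip_requirements=None):
    """Model how the two options change the final requirement set."""
    if pip_requirements is not None and extra_pip_requirements is not None:
        # This sketch only models passing one of the two options.
        raise ValueError("pass either pip_requirements or extra_pip_requirements")
    if pip_requirements is not None:
        # Overwrite: the defaults are dropped, the mlflow requirement stays.
        return {mlflow_req} | set(pip_requirements)
    # Default or extend: keep the defaults and add any extras on top.
    return {mlflow_req} | set(default_reqs) | set(extra_pip_requirements or [])


if __name__ == "__main__":
    # Placeholder version strings, for illustration only.
    mlflow_req = "mlflow==2.4"
    defaults = {"xgboost==1.7.6"}
    sklearn_req = "scikit-learn==1.2.0"
    print(sorted(resolve_requirements(mlflow_req, defaults)))
    print(sorted(resolve_requirements(mlflow_req, defaults, pip_requirements=[sklearn_req])))
    print(sorted(resolve_requirements(mlflow_req, defaults, extra_pip_requirements=[sklearn_req])))
```

The three printed sets correspond to the "default", "pip_requirements", and "extra_pip_requirements" cases asserted in the example.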
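4.1.6 Serving the Model
Now that you have packaged your model using the MLproject convention and have identified the best model, it is time to deploy it using MLflow Models. An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools, for example real-time serving through a REST API or batch inference on Apache Spark.
In the example training code, after training the linear regression model, a function in MLflow saves the model as an artifact within the run.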
mlflow.sklearn.log_model(lr, "model")
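To view this artifact, you can use the UI again. When you click a date in the list of experiment runs, you will see the detail page for that run.
At the bottom, you can see that the call to mlflow.sklearn.log_model produced two files in /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model. The first file, MLmodel, is a metadata file that tells MLflow how to load the model. The second file, model.pkl, is a serialized version of the linear regression model that you trained.
In this example, you can use this MLmodel format with MLflow to deploy a local REST server that can serve predictions.
To deploy the server, run the following command (replace the path with your model's actual path; when serving from your local environment, remember to add --env-manager local):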
mlflow models serve -m /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model -p 1234
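Note: The version of Python used to create the model must be the same as the one running mlflow models serve; otherwise you may get an encoding error.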
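A quick way to catch such a mismatch before serving is to compare the Python version recorded at training time with the interpreter you are about to use. The sketch below is hedged in two ways: comparing only the major and minor components is an assumption of this example, and the recorded version has to come from wherever you noted it (the MLmodel file typically stores a python_version entry under its python_function flavor, which is worth verifying on your own model). The function name same_python_minor is hypothetical.

```python
import sys


def same_python_minor(recorded_version, running_version=None):
    """Return True if two version strings share the same major.minor prefix."""
    if running_version is None:
        # Default to the interpreter executing this script.
        running_version = "{}.{}.{}".format(*sys.version_info[:3])
    return recorded_version.split(".")[:2] == running_version.split(".")[:2]


if __name__ == "__main__":
    recorded = "3.8.10"  # placeholder: use the version recorded for your model
    if not same_python_minor(recorded):
        print(f"Warning: model was created with Python {recorded}, "
              f"but this interpreter is {sys.version.split()[0]}")
```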
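Once you have deployed the server, you can pass it some sample data and see the predictions. The following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server. For more information about the input data formats accepted by the model server, see the MLflow deployment tools documentation (the "MLflow Models" page of the MLflow 2.4.1 documentation).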
# On Linux and macOS
curl -X POST -H "Content-Type:application/json" --data '{"dataframe_split": {"columns":["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"],"data":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}' http://127.0.0.1:1234/invocations
# On Windows
curl -X POST -H "Content-Type:application/json" --data "{\"dataframe_split\": {\"columns\":[\"fixed acidity\", \"volatile acidity\", \"citric acid\", \"residual sugar\", \"chlorides\", \"free sulfur dioxide\", \"total sulfur dioxide\", \"density\", \"pH\", \"sulphates\", \"alcohol\"],\"data\":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}" http://127.0.0.1:1234/invocations
The server should respond with output similar to:
[6.379428821398614]
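The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; it reuses the column names, sample row, and URL from the curl commands above, while the helper names build_payload, build_request, and score are hypothetical. The actual call is left commented out because it needs the model server to be running.

```python
import json
import urllib.request

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_payload(rows):
    """Wrap rows in the split-oriented structure used by the curl example."""
    return {"dataframe_split": {"columns": COLUMNS, "data": rows}}


def build_request(url, rows):
    """Create a JSON POST request for the model server."""
    body = json.dumps(build_payload(rows)).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def score(url, rows):
    """Send the rows to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, rows)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    sample = [[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]
    print(json.dumps(build_payload(sample)))
    # Requires the server started with `mlflow models serve ... -p 1234`:
    # print(score("http://127.0.0.1:1234/invocations", sample))
```

Each inner list in rows must follow the order of COLUMNS, since the split orientation matches values to columns by position.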
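4.1.7 Deploying the Model to Seldon Core or KServe
After training and testing our model, we are now ready to deploy it to production. MLflow lets you serve your model using MLServer, which is already used as the core Python inference server in Kubernetes-native frameworks, including Seldon Core and KServe (formerly known as KFServing). Therefore, we can leverage this support to build a Docker image compatible with these frameworks.
Note: This is an optional step, and it is currently only available for Python models. This step also requires some basic Kubernetes knowledge, including familiarity with kubectl.
To build a Docker image containing our model, we can use the mlflow models build-docker subcommand together with the --enable-mlserver flag. For example, to build an image named my-docker-image, we can run: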
mlflow models build-docker \
-m /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model \
-n my-docker-image \
--enable-mlserver
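Once we have built our image, the next step is to deploy it to our cluster. One way to do this is to apply the corresponding Kubernetes manifest through the kubectl CLI: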
kubectl apply -f my-manifest.yaml