MLflow Example

一个老丁头

Last modified: 2023-06-26 17:21:02

Tags: machine learning, artificial intelligence, python

First published: 2023-06-25 16:10:32

Article link: https://blog.csdn.net/weixin_43881931/article/details/131377744

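The following is mainly a translation of one tutorial from the official MLflow documentation.

4. Tutorials and Examples

4.1 Train, Serve, and Score a Linear Regression Model

Source: Tutorial — MLflow 2.4.1 documentation

This tutorial shows how to use MLflow end to end to:

(1) Train a linear regression model

(2) Package the code that trains the model in a reusable and reproducible model format

(3) Deploy the model into a simple HTTP server that lets you score predictions

The tutorial uses a dataset to predict the quality of wine from quantitative features such as the wine's "fixed acidity", "pH", and "residual sugar". The dataset comes from UCI's machine learning repository (dataset source: UCI Machine Learning Repository).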

4.1.1 Prerequisites

(1) Install MLflow and scikit-learn

There are two ways to install them:

Install MLflow together with its extras, which pull in scikit-learn:

pip install mlflow[extras]

Install MLflow and scikit-learn separately:

pip install mlflow
pip install scikit-learn

(2) Install conda

(3) Clone (or download) the MLflow repository

git clone https://github.com/mlflow/mlflow

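(4) cd into the examples directory of your MLflow clone; we will use it as the working directory for the tutorial. We avoid running directly from the root of the clone, because doing so would make the tutorial use MLflow from source rather than the MLflow you installed from PyPI.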
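If you are unsure which copy of MLflow Python will pick up from your current directory, a quick check along the following lines can help. This is only a sketch using the standard library; the helper name describe_install is made up for illustration and is not part of MLflow.

```python
import importlib.metadata
import importlib.util


def describe_install(package):
    """Return (version, origin) for an importable package, or None if it is absent."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return None
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable (for example from a source tree) but without installed metadata
        version = None
    return version, spec.origin


if __name__ == "__main__":
    # An origin under site-packages suggests the PyPI install; an origin inside
    # the git clone suggests MLflow is being imported from source.
    print(describe_install("mlflow"))
```

Running this from the examples directory and again from the root of the clone should make the difference visible, assuming MLflow is installed in the active environment.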

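4.1.2 Training the Model

First, train a linear regression model that takes two hyperparameters: alpha and l1_ratio.

The code is located at examples/sklearn_elasticnet_wine/train.py and is reproduced below: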

# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow
from mlflow.models import infer_signature
import mlflow.sklearn

import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    # csv_url = (
    #     "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-red.csv"
    #     # "./wine-quality.csv"
    # )
    try:
        # data = pd.read_csv(csv_url, sep=";")
        data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv' , sep=';')
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )
        # Re-raise: without the data the script cannot continue (data would be undefined below)
        raise

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)
    print(train)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        # Infer the model signature from the training inputs and the model's predictions
        predictions = lr.predict(train_x)
        signature = infer_signature(train_x, predictions)

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Model registry does not work with file store
        if tracking_url_type_store != "file":
            # Register the model
            # There are other ways to use the Model Registry, which depends on the use case,
            # please refer to the doc for more information:
            # https://mlflow.org/docs/latest/model-registry.html#api-workflow
            mlflow.sklearn.log_model(
                lr, "model", registered_model_name="ElasticnetWineModel", signature=signature
            )
        else:
            mlflow.sklearn.log_model(lr, "model", signature=signature)

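This example uses the familiar pandas, numpy, and sklearn APIs to create a simple machine learning model. The MLflow tracking APIs log information about each training run, such as the hyperparameters alpha and l1_ratio used to train the model and the metrics used to evaluate it, such as the root mean square error. The example also serializes the model in a format that MLflow knows how to deploy.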
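To make the three logged metrics concrete, here is a dependency-free sketch of what eval_metrics computes. The script itself relies on sklearn.metrics; this version only spells out the formulas, and the sample values in the usage lines are toy numbers, not results from the wine dataset.

```python
import math


def eval_metrics(actual, pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)                     # root mean square error
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return rmse, mae, r2


if __name__ == "__main__":
    # Toy values: always predicting the mean gives an R2 of 0
    print(eval_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Lower RMSE and MAE are better, while R2 approaches 1 as the predictions explain more of the variance in the target.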

You can run the example with the default hyperparameters as follows:

# Make sure the current working directory is 'examples'
python sklearn_elasticnet_wine/train.py

Try out some other values for alpha and l1_ratio by passing them as arguments to train.py:

# Make sure the current working directory is 'examples'
python sklearn_elasticnet_wine/train.py <alpha> <l1_ratio>
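If you want to try several combinations in one go, a small driver script is one option. The sketch below is not part of the MLflow examples; build_commands and run_sweep are illustrative names, and the script path assumes the current working directory is examples.

```python
import itertools
import subprocess
import sys

SCRIPT = "sklearn_elasticnet_wine/train.py"  # relative to the examples directory


def build_commands(alphas, l1_ratios, script=SCRIPT):
    """Return one argv list per (alpha, l1_ratio) combination."""
    return [
        [sys.executable, script, str(alpha), str(l1_ratio)]
        for alpha, l1_ratio in itertools.product(alphas, l1_ratios)
    ]


def run_sweep(alphas, l1_ratios):
    """Run train.py once per combination; each invocation is a separate process."""
    for command in build_commands(alphas, l1_ratios):
        subprocess.run(command, check=True)


# Example, to be run from the examples directory:
# run_sweep([0.1, 0.5, 1.0], [0.1, 0.5, 0.9])
```

Because train.py opens its own mlflow.start_run() block, each invocation should show up as its own run.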

Each time you run the example, MLflow logs information about your experiment runs in the mlruns directory.
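The mlruns directory can be inspected with plain Python as well. The following sketch assumes the local file store layout mlruns/<experiment_id>/<run_id>/ with one file per parameter under params and one file per metric under metrics, where each metric line reads "<timestamp> <value> <step>". Treat that layout as an assumption of this sketch rather than a stable interface; the helper names are made up for illustration.

```python
from pathlib import Path


def read_run(run_dir):
    """Read params and the last logged value of each metric from one run directory."""
    run_dir = Path(run_dir)
    params = {p.name: p.read_text().strip() for p in (run_dir / "params").iterdir()}
    metrics = {}
    for metric_file in (run_dir / "metrics").iterdir():
        lines = metric_file.read_text().splitlines()
        if lines:
            # Assumed line format: "<timestamp> <value> <step>"; keep the last entry
            metrics[metric_file.name] = float(lines[-1].split()[1])
    return params, metrics


def list_runs(root="mlruns"):
    """Yield (run_id, params, metrics) for every run directory found under root."""
    for run_dir in sorted(Path(root).glob("*/*")):
        if (run_dir / "params").is_dir() and (run_dir / "metrics").is_dir():
            params, metrics = read_run(run_dir)
            yield run_dir.name, params, metrics


if __name__ == "__main__":
    for run_id, params, metrics in list_runs():
        print(run_id, params, metrics)
```

For anything beyond a quick look, the MLflow UI or the tracking API is the safer route, since the on-disk format may differ between versions.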

Note: If you would like to use the Jupyter notebook version of train.py, try out the tutorial notebook at examples/sklearn_elasticnet_wine/train.ipynb.

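4.1.3 Comparing the Models

Next, use the MLflow UI to compare the models that you have produced. In the same working directory as the one that contains the mlruns directory, run: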

mlflow ui

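Then view the UI at http://localhost:5000.

On this page, you can see a list of experiment runs with metrics you can use to compare the models.

You can use the search feature to quickly filter out many models. For example, the query metrics.rmse < 0.8 returns all the models with a root mean square error below 0.8. For more complex manipulations, you can download this table as a CSV and analyze it with your favorite data munging software.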
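Once the table is exported, the same kind of filter takes only a few lines of Python. This is a minimal sketch using the standard library; the column names run_id and rmse and the sample rows are hypothetical, so adjust them to whatever headers your exported CSV actually contains.

```python
import csv
import io


def filter_runs(csv_text, metric="rmse", threshold=0.8):
    """Return the rows whose metric column is below threshold, best first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [row for row in rows if row.get(metric) and float(row[metric]) < threshold]
    return sorted(kept, key=lambda row: float(row[metric]))


if __name__ == "__main__":
    # Hypothetical export with made-up values, for illustration only
    sample = "run_id,alpha,l1_ratio,rmse\na,0.5,0.5,0.82\nb,0.1,0.1,0.74\nc,0.3,0.5,0.79\n"
    for row in filter_runs(sample):
        print(row["run_id"], row["rmse"])
```

If you would rather stay inside MLflow, the same filter string can be passed to mlflow.search_runs through its filter_string argument, which returns the matching runs as a pandas DataFrame.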

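4.1.4 Packaging Training Code in a Conda Environment

Now that you have your training code, you can package it so that other data scientists can easily reuse the model, or so that you can run the training remotely, for example on Databricks.

You do this by using MLflow Projects conventions to specify the dependencies and entry points of your code. The sklearn_elasticnet_wine/MLproject file specifies that the project's dependencies are located in a Conda environment file called conda.yaml and that the project has one entry point, which takes two parameters: alpha and l1_ratio. (Note that the listing reproduced here declares python_env: python_env.yaml rather than a Conda environment.)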

name: tutorial

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"

The sklearn_elasticnet_wine/conda.yaml file lists the dependencies:

name: tutorial
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - scikit-learn==1.2.0
      - mlflow>=1.0
      - pandas

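To run this project, invoke mlflow run sklearn_elasticnet_wine -P alpha=0.42. (To run it in your current local environment, append --env-manager=local. Also make sure the CSV file can actually be read when you run this command: in my case the download from the URL failed, so I downloaded the file and pointed the script at a local path instead.) After you run this command, MLflow runs your training code in a new Conda environment with the dependencies specified in conda.yaml.

If the repository has an MLproject file in its root, you can also run a project directly from GitHub. This tutorial is duplicated in the https://github.com/mlflow/mlflow-example repository, which you can run with mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0.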
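To see how the -P values relate to the entry point's command template, the following sketch fills in the template from the MLproject listing by hand. It is a simplified illustration, not MLflow's actual implementation: the ENTRY_POINT dictionary mirrors the YAML above, and render_command is a made-up helper.

```python
ENTRY_POINT = {
    "parameters": {
        "alpha": {"type": "float", "default": 0.5},
        "l1_ratio": {"type": "float", "default": 0.1},
    },
    "command": "python train.py {alpha} {l1_ratio}",
}


def render_command(entry_point, overrides=None):
    """Fill the command template with defaults, then with -P style overrides."""
    values = {name: spec["default"] for name, spec in entry_point["parameters"].items()}
    values.update(overrides or {})
    # Every parameter in this entry point is declared as float, so coerce first
    values = {name: float(value) for name, value in values.items()}
    return entry_point["command"].format(**values)


if __name__ == "__main__":
    # Corresponds to: mlflow run sklearn_elasticnet_wine -P alpha=0.42
    print(render_command(ENTRY_POINT, {"alpha": 0.42}))
```

Parameters that are not overridden fall back to the defaults declared in the MLproject file, which is why l1_ratio ends up as 0.1 here rather than the 0.5 default hard-coded in train.py.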

4.1.5 Specifying pip Requirements Using pip_requirements and extra_pip_requirements

"""
This example demonstrates how to specify pip requirements using `pip_requirements` and
`extra_pip_requirements` when logging a model via `mlflow.*.log_model`.
"""

import tempfile
from packaging.version import Version

import sklearn
from sklearn.datasets import load_iris
import xgboost as xgb
import mlflow
from mlflow.artifacts import download_artifacts
from mlflow.models.signature import infer_signature


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


def get_pip_requirements(run_id, artifact_path, return_constraints=False):
    req_path = download_artifacts(run_id=run_id, artifact_path=f"{artifact_path}/requirements.txt")
    reqs = read_lines(req_path)

    if return_constraints:
        con_path = download_artifacts(
            run_id=run_id, artifact_path=f"{artifact_path}/constraints.txt"
        )
        cons = read_lines(con_path)
        return set(reqs), set(cons)

    return set(reqs)


def main():
    iris = load_iris()
    dtrain = xgb.DMatrix(iris.data, iris.target)
    model = xgb.train({}, dtrain)
    predictions = model.predict(dtrain)
    signature = infer_signature(dtrain.get_data(), predictions)

    xgb_req = f"xgboost=={xgb.__version__}"
    sklearn_req = f"scikit-learn=={sklearn.__version__}"

    with mlflow.start_run() as run:
        run_id = run.info.run_id

        # Get the expected mlflow version
        mlflow_version_raw = Version(mlflow.__version__)
        mlflow_version = f"mlflow=={mlflow_version_raw.major}.{mlflow_version_raw.minor}"

        # Default (both `pip_requirements` and `extra_pip_requirements` are unspecified)
        artifact_path = "default"
        mlflow.xgboost.log_model(model, artifact_path, signature=signature)
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs.issuperset([mlflow_version, xgb_req]), pip_reqs

        # Overwrite the default set of pip requirements using `pip_requirements`
        artifact_path = "pip_requirements"
        mlflow.xgboost.log_model(
            model, artifact_path, pip_requirements=[sklearn_req], signature=signature
        )
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs == {mlflow_version, sklearn_req}, pip_reqs

        # Add extra pip requirements on top of the default set of pip requirements
        # using `extra_pip_requirements`
        artifact_path = "extra_pip_requirements"
        mlflow.xgboost.log_model(
            model, artifact_path, extra_pip_requirements=[sklearn_req], signature=signature
        )
        pip_reqs = get_pip_requirements(run_id, artifact_path)
        assert pip_reqs.issuperset([mlflow_version, xgb_req, sklearn_req]), pip_reqs

        # Specify pip requirements using a requirements file
        with tempfile.NamedTemporaryFile("w", suffix=".requirements.txt") as f:
            f.write(sklearn_req)
            f.flush()

            # Path to a pip requirements file
            artifact_path = "requirements_file_path"
            mlflow.xgboost.log_model(
                model, artifact_path, pip_requirements=f.name, signature=signature
            )
            pip_reqs = get_pip_requirements(run_id, artifact_path)
            assert pip_reqs == {mlflow_version, sklearn_req}, pip_reqs

            # List of pip requirement strings
            artifact_path = "requirements_file_list"
            mlflow.xgboost.log_model(
                model,
                artifact_path,
                pip_requirements=[xgb_req, f"-r {f.name}"],
                signature=signature,
            )
            pip_reqs = get_pip_requirements(run_id, artifact_path)
            assert pip_reqs == {mlflow_version, xgb_req, sklearn_req}, pip_reqs

        # Using a constraints file
        with tempfile.NamedTemporaryFile("w", suffix=".constraints.txt") as f:
            f.write(sklearn_req)
            f.flush()

            artifact_path = "constraints_file"
            mlflow.xgboost.log_model(
                model,
                artifact_path,
                pip_requirements=[xgb_req, f"-c {f.name}"],
                signature=signature,
            )
            pip_reqs, pip_cons = get_pip_requirements(
                run_id, artifact_path, return_constraints=True
            )
            assert pip_reqs == {mlflow_version, xgb_req, "-c constraints.txt"}, pip_reqs
            assert pip_cons == {sklearn_req}, pip_cons


if __name__ == "__main__":
    main()
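The assertions in the script above boil down to a simple rule: pip_requirements replaces the inferred defaults, extra_pip_requirements adds to them, and the mlflow requirement is present either way. The toy function below restates that rule with sets. It is only a model of the behaviour the assertions describe, not MLflow code, and both the version strings in the usage lines and the decision to reject passing both arguments are assumptions of this sketch.

```python
def resolve_requirements(inferred, mlflow_req, pip_requirements=None,
                         extra_pip_requirements=None):
    """Toy model of how the two arguments combine, following the script's assertions."""
    if pip_requirements is not None and extra_pip_requirements is not None:
        # Assumption of this sketch: the two options are treated as mutually exclusive
        raise ValueError("specify only one of pip_requirements and extra_pip_requirements")
    if pip_requirements is not None:
        base = set(pip_requirements)  # replaces the inferred defaults
    else:
        base = set(inferred) | set(extra_pip_requirements or [])  # extends the defaults
    return {mlflow_req} | base


if __name__ == "__main__":
    # Hypothetical version pins, for illustration only
    inferred = {"xgboost==1.7.5"}
    sklearn_req = "scikit-learn==1.2.0"
    print(resolve_requirements(inferred, "mlflow==2.4"))
    print(resolve_requirements(inferred, "mlflow==2.4", pip_requirements=[sklearn_req]))
    print(resolve_requirements(inferred, "mlflow==2.4", extra_pip_requirements=[sklearn_req]))
```

In the real script the "-r" and "-c" entries additionally pull requirements or constraints in from files, which this sketch leaves out.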

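4.1.6 Serving the Model

Now that you have packaged your model using the MLproject convention and have identified the best model, it is time to deploy it using MLflow Models. An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools, for example real-time serving through a REST API or batch inference on Apache Spark.

In the example training code, after training the linear regression model, a function in MLflow saved the model as an artifact within the run.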

mlflow.sklearn.log_model(lr, "model")

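To view this artifact, you can use the UI again. When you click a date in the list of experiment runs, you see the detail page for that run.

At the bottom, you can see that the call to mlflow.sklearn.log_model produced two files in /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model. The first file, MLmodel, is a metadata file that tells MLflow how to load the model. The second file, model.pkl, is a serialized version of the linear regression model that you trained.

In this example, you can use this MLmodel format with MLflow to deploy a local REST server that can serve predictions.

To deploy the server, run the following (replace the path with your model's actual location; when serving from your local environment, remember to add --env-manager local):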

mlflow models serve -m /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model -p 1234

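Note: The version of Python used to create the model must be the same as the one running mlflow models serve; otherwise you will get an encoding error.

Once you have deployed the server, you can pass it some sample data and see the predictions. The following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server. For more information about the input data formats accepted by the model server, see the MLflow deployment tools documentation (MLflow Models — MLflow 2.4.1 documentation).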

# On Linux and macOS
curl -X POST -H "Content-Type:application/json" --data '{"dataframe_split": {"columns":["fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol"],"data":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}' http://127.0.0.1:1234/invocations

# On Windows
curl -X POST -H "Content-Type:application/json" --data "{\"dataframe_split\": {\"columns\":[\"fixed acidity\", \"volatile acidity\", \"citric acid\", \"residual sugar\", \"chlorides\", \"free sulfur dioxide\", \"total sulfur dioxide\", \"density\", \"pH\", \"sulphates\", \"alcohol\"],\"data\":[[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]}}" http://127.0.0.1:1234/invocations

The server should respond with output similar to:

[6.379428821398614]
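The same request can be sent from Python instead of curl. The sketch below uses only the standard library and assumes the server from the previous step is listening on 127.0.0.1:1234; build_payload and score are illustrative helper names, not MLflow functions.

```python
import json
import urllib.request

COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_payload(rows, columns=COLUMNS):
    """Serialize rows in the pandas split orientation under the dataframe_split key."""
    body = {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}
    return json.dumps(body)


def score(rows, url="http://127.0.0.1:1234/invocations"):
    """POST the rows to a running model server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, with the model server running:
# print(score([[6.2, 0.66, 0.48, 1.2, 0.029, 29, 75, 0.98, 3.33, 0.39, 12.8]]))
```

Each inner list is one wine sample, with values in the same order as COLUMNS, so several samples can be scored in a single request.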

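4.1.7 Deploying the Model to Seldon Core or KServe

After training and testing our model, we are now ready to deploy it to production. MLflow allows you to serve your model using MLServer, which is already used as the core Python inference server in Kubernetes-native frameworks including Seldon Core and KServe (formerly known as KFServing). We can therefore leverage this support to build a Docker image compatible with these frameworks.

Note: This is an optional step, and it is currently only available for Python models. It also requires some basic Kubernetes knowledge, including familiarity with kubectl.

To build a Docker image containing our model, we can use the mlflow models build-docker subcommand together with the --enable-mlserver flag. For example, to build an image named my-docker-image, we can do: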

mlflow models build-docker \
  -m /Users/mlflow/mlflow-prototype/mlruns/0/7c1a0d5c42844dcdb8f5191146925174/artifacts/model \
  -n my-docker-image \
  --enable-mlserver

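Once we have built our image, the next step is to deploy it to our cluster. One way to do this is to apply the corresponding Kubernetes manifest through the kubectl CLI: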