MLflow

Introduction

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Components
  • MLflow Tracking: Record and query experiments: code, data, config,
    and results
  • MLflow Projects: Package data science code in a format to reproduce
    runs on any platform
  • MLflow Models: Deploy machine learning models in diverse serving
    environments
  • Model Registry: Store, annotate, discover, and manage models in a
    central repository
Features
  • Works with any ML library, language, and existing code
  • Runs the same way in any cloud
  • Designed to scale from one user to large organizations
  • Scales to big data with Apache Spark™
MLflow Projects

An MLflow Project is a format for packaging data science code in a reusable and reproducible way, based primarily on conventions. In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain together projects into workflows.

At the core, MLflow Projects are just a convention for organizing and describing your code to let other data scientists (or automated tools) run it. Each project is simply a directory of files, or a Git repository, containing your code.

You can run any project from a Git URI or from a local directory using the mlflow run command-line tool, or the mlflow.projects.run() Python API. These APIs also allow submitting the project for remote execution on Databricks and Kubernetes.
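The same thing is available programmatically; a minimal sketch using the Python API (the URI points at MLflow's public example project, and the alpha parameter belongs to that project's entry point):

import mlflow

# Launch a project straight from a Git repository; parameters must match
# the entry-point definition in the project's MLproject file
submitted_run = mlflow.projects.run(
    uri="https://github.com/mlflow/mlflow-example",
    parameters={"alpha": 0.5},
)
print(submitted_run.run_id)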

MLflow Models

An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.
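For instance, a model logged with the sklearn flavor can later be loaded back either as a native scikit-learn object or through the generic python_function flavor. A minimal sketch (the model and artifact path are illustrative):

import mlflow
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor

# Log a toy model with the sklearn flavor
with mlflow.start_run() as run:
    model = RandomForestRegressor().fit([[0], [1]], [0, 1])
    mlflow.sklearn.log_model(model, artifact_path="model")

# The same saved artifact can be read through two different flavors
model_uri = f"runs:/{run.info.run_id}/model"
sk_model = mlflow.sklearn.load_model(model_uri)  # native sklearn object
py_model = mlflow.pyfunc.load_model(model_uri)   # generic pyfunc wrapper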

MLflow Tracking Servers
mlflow server \
    --backend-store-uri /mnt/persistent-disk \
    --default-artifact-root s3://my-mlflow-bucket/ \
    --host 0.0.0.0

The backend store is where MLflow Tracking Server stores experiment and run metadata as well as params, metrics, and tags for runs. MLflow supports two types of backend stores: file store and database-backed store.

By default --backend-store-uri is set to the local ./mlruns directory (the same as when running mlflow run locally), but when running a server, make sure that this points to a persistent (that is, non-ephemeral) file system location.

The artifact store is a location suitable for large data (such as an S3 bucket or shared NFS file system) and is where clients log their artifact output (for example, models). artifact_location is a property recorded on mlflow.entities.Experiment that gives the default location for storing the artifacts of all runs in that experiment.

Use --default-artifact-root (which defaults to the local ./mlruns directory) to configure the default location for the server's artifact store. It is used as the artifact location for newly created experiments that do not specify one. Once an experiment has been created, --default-artifact-root is no longer relevant to that experiment.

Note:

To use Model Registry functionality, you must run your server with a database-backed store.
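For example, a server backed by a local SQLite file satisfies this (any SQLAlchemy-compatible database URI, such as PostgreSQL or MySQL, works as well):

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://my-mlflow-bucket/ \
    --host 0.0.0.0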
Logging to a Tracking Server
  • To log to a tracking server, set the MLFLOW_TRACKING_URI environment variable to the server’s URI, along with its scheme and port (for example, http://10.0.0.1:5000) or call mlflow.set_tracking_uri().
  • The mlflow.start_run(), mlflow.log_param(), and mlflow.log_metric() calls then make API requests to your remote tracking server.
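Putting these together, a minimal client script might look like this (the server address is the example one above):

import mlflow

# Point the client at the remote tracking server
mlflow.set_tracking_uri("http://10.0.0.1:5000")

# These calls now issue API requests to the server
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.78)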
MLflow Tracking
Automatic Logging

Automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements.

There are two ways to use autologging:

  • Call mlflow.autolog() before your training code. This will enable autologging for each supported library you have installed as soon as you import it.
  • Use library-specific autolog calls for each library you use in your code. See below for examples.

The following libraries support autologging:

Scikit-learn (experimental)
TensorFlow and Keras (experimental)
Gluon (experimental)
XGBoost (experimental)
LightGBM (experimental)
Statsmodels (experimental)
Spark (experimental)
Fastai (experimental)
PyTorch (experimental)
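As a sketch with scikit-learn (the generic mlflow.autolog() call is used here; the library-specific mlflow.sklearn.autolog() behaves the same way for this case):

import mlflow
from sklearn.linear_model import LinearRegression

# Enable autologging before any training code runs
mlflow.autolog()

# The fit call below is captured automatically: parameters, training
# metrics, and the fitted model are recorded without explicit log calls
model = LinearRegression()
model.fit([[1], [2], [3]], [2, 4, 6])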
MLflow Model Registry

The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.

Serving an MLflow Model from Model Registry

After you have registered an MLflow model, you can serve the model as a service on your host.

#!/usr/bin/env sh

# Set environment variable for the tracking URL where the Model Registry resides
export MLFLOW_TRACKING_URI=http://localhost:5000

# Serve the production model from the model registry
mlflow models serve -m "models:/sk-learn-random-forest-reg-model/Production"
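mlflow models serve listens on port 5000 by default, which collides with the tracking server in this example, so pass --port to pick another one (say 5001). Once the model server is up, you can score data over HTTP; a sketch using the pandas-split JSON format accepted by MLflow 1.x scoring servers (the feature columns are illustrative and must match the model's training schema):

import requests

payload = {"columns": ["feature_1", "feature_2"], "data": [[1.0, 2.0]]}

# Assumes the model was served with: mlflow models serve ... --port 5001
resp = requests.post(
    "http://127.0.0.1:5001/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(resp.json())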
Concepts
Model

An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.

Registered Model

An MLflow Model can be registered with the Model Registry. A registered model has a unique name and contains versions, transitional stages, model lineage, and other metadata.

Model Version

Each registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered to the same model name increments the version number.

Model Stage

Each distinct model version can be assigned one stage at any given time. MLflow provides predefined stages for common use cases: Staging, Production, and Archived. You can transition a model version from one stage to another.
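A stage transition can be performed through the client API; a minimal sketch (the model name matches the examples below, the version number is illustrative):

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="sk-learn-random-forest-reg-model",
    version=1,
    stage="Production",
)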

Model Registry Workflows

Before you can add a model to the Model Registry, you must log it using the log_model method of the corresponding model flavor. Once a model has been logged, you can add, modify, update, transition, or delete it in the Model Registry through the UI or the API.

Adding an MLflow Model to the Model Registry

There are three programmatic ways to add a model to the registry. First, you can use the mlflow.<model_flavor>.log_model() method.

import mlflow
import mlflow.sklearn

# Log the sklearn model and register it as version 1
# (sk_learn_rfr is assumed to be an already-fitted scikit-learn model)
with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=sk_learn_rfr,
        artifact_path="sklearn-model",
        registered_model_name="sk-learn-random-forest-reg-model",
    )

The second way is to use the mlflow.register_model() method, after all your experiment runs complete and you have decided which model is most suitable to add to the registry. For this method, you will need the run_id as part of the runs:/ URI argument.

import mlflow

result = mlflow.register_model(
    "runs:/d16076a3ec534311817565e6527539c0/sklearn-model",
    "sk-learn-random-forest-reg"
)

If a registered model with the given name doesn't exist, the method registers a new model, creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name already exists, the method creates a new model version and returns the version object.

And finally, you can use the MlflowClient.create_registered_model() method to create a new registered model. If the model name already exists, this method throws an MlflowException, because creating a new registered model requires a unique name.
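A sketch of this client-based flow, reusing the run from the previous example (the source path assumes the default local ./mlruns layout):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Fails with MlflowException if the name is already taken
client.create_registered_model("sk-learn-random-forest-reg-model")

# Attach a concrete version to the registered name
client.create_model_version(
    name="sk-learn-random-forest-reg-model",
    source="mlruns/0/d16076a3ec534311817565e6527539c0/artifacts/sklearn-model",
    run_id="d16076a3ec534311817565e6527539c0",
)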

Scalability and Big Data

Data is the key to obtaining good results in machine learning, so MLflow is designed to scale to large data sets, large output files (for example, models), and large numbers of experiments. Specifically, MLflow supports scaling in four dimensions:

  • An individual MLflow run can execute on a distributed cluster, for example, using Apache Spark. You can launch runs on the distributed infrastructure of your choice and report results to a Tracking Server to compare them. MLflow includes a built-in API to launch runs on Databricks.
  • MLflow supports launching multiple runs in parallel with different parameters, for example, for hyperparameter tuning. You can simply use the Projects API to start multiple runs and the Tracking API to track them.
  • MLflow Projects can take input from, and write output to, distributed storage systems such as AWS S3 and DBFS. MLflow can automatically download such files locally for projects that can only run on local files, or give the project a distributed storage URI if it supports that. This means that you can write projects that build large datasets, such as featurizing a 100 TB file.
  • MLflow Model Registry offers large organizations a central hub to collaboratively manage a complete model lifecycle. Many data science teams within an organization develop hundreds of models, each with its own experiments, runs, versions, artifacts, and stage transitions. A central registry facilitates model discovery and communicates each model's purpose across multiple teams in a large organization.

API Workflow

Fetch the latest model version in a specific stage

To fetch a model version by stage, simply provide the model stage as part of the model URI, and it will fetch the most recent version of the model in that stage.

import mlflow.pyfunc

model_name = "sk-learn-random-forest-reg-model"
stage = "Staging"

model = mlflow.pyfunc.load_model(
    model_uri=f"models:/{model_name}/{stage}"
)

# 'data' is your input features (for example, a pandas DataFrame)
model.predict(data)

Command-Line Interface
$ mlflow experiments list
  Experiment Id  Name     Artifact Location
---------------  -------  -------------------
              0  Default  hdfs:/mlflow/0

$ mlflow runs list --experiment-id 0
Date                     Name           ID
-----------------------  -------------  --------------------------------
2021-03-09 13:41:51 CST  YOUR_RUN_NAME  a9324d66b324437dac9ba38ec64d0de6
2021-03-08 11:36:30 CST                 1a4cc508c73d4aefa50a3a40bb4a34c1
2021-03-08 11:19:26 CST                 6e3e7acd9882422891fa10c888d71ff5
2021-03-08 11:18:07 CST                 603b79069c4c45a4985a3d776274d3a1
UI

Launch the MLflow tracking UI for local viewing of run results. To launch a production server, use the “mlflow server” command instead.

The UI will be visible at http://localhost:5000 by default, and only accept connections from the local machine. To let the UI server accept connections from other machines, you will need to pass --host 0.0.0.0 to listen on all network interfaces (or a specific interface address).

mlflow ui [OPTIONS]
Set NO_PROXY

If an http_proxy is configured on your intranet and you access the MLflow service from inside that intranet, set NO_PROXY and no_proxy before calling mlflow.set_tracking_uri().
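For example (the tracking server address is illustrative):

import os

# Exempt the tracking server's host from the proxy; set both spellings,
# since different libraries check different variable names
os.environ["NO_PROXY"] = "10.0.0.1"
os.environ["no_proxy"] = "10.0.0.1"

import mlflow
mlflow.set_tracking_uri("http://10.0.0.1:5000")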
