MLFLOW

1. Concepts (components)

MLflow has four components: Tracking, Projects, Models, and Model Registry.

2. Problems

  • It’s difficult to keep track of experiments. When you are just working with files on your laptop, or with an interactive notebook, how do you tell which data, code, and parameters went into getting a particular result?
  • It’s difficult to reproduce code. Even if you have meticulously tracked the code versions and parameters, you need to capture the whole environment (for example, library dependencies) to get the same result again. This is especially challenging if you want another data scientist to use your code, or if you want to run the same code at scale on another platform (for example, in the cloud).
  • There’s no standard way to package and deploy models. Every data science team comes up with its own approach for each ML library that it uses, and the link between a model and the code and parameters that produced it is often lost.
  • There’s no central store to manage models (their versions and stage transitions). A data science team creates many models. In the absence of a central place to collaborate and manage the model lifecycle, data science teams face challenges in how they manage model stages: from development to staging, and finally to archiving or production, with respective versions, annotations, and history.

3. MLflow's solutions

MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and artifacts when running your machine learning code and for later visualizing the results. You can use MLflow Tracking in any environment (for example, a standalone script or a notebook) to log results to local files or to a server, then compare multiple runs. Teams can also use it to compare results from different users.
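As a minimal sketch of the logging flow just described, the snippet below records one parameter and one metric inside a run. `train_and_score` is a hypothetical stand-in for real training code, and `mlflow` is imported inside `tracked_run` only so that the helper can be exercised without MLflow installed. `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metrics` are the Tracking API calls involved.

```python
from typing import Dict


def train_and_score(alpha: float) -> Dict[str, float]:
    """Hypothetical stand-in for real training: returns a fake RMSE."""
    return {"rmse": round(1.0 / (1.0 + alpha), 4)}


def tracked_run(alpha: float) -> Dict[str, float]:
    """Log one parameter and the resulting metrics to MLflow Tracking."""
    import mlflow  # requires `pip install mlflow`

    # Everything logged inside the `with` block belongs to the same run.
    with mlflow.start_run():
        mlflow.log_param("alpha", alpha)
        metrics = train_and_score(alpha)
        mlflow.log_metrics(metrics)
    return metrics


if __name__ == "__main__":
    # Runs are written to wherever the current tracking URI points.
    print(tracked_run(alpha=0.5))
```

Calling `tracked_run` several times with different `alpha` values produces several runs that can then be compared side by side in the MLflow UI.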

MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file or simply convention to specify its dependencies and how to run the code. For example, projects can contain a conda.yaml file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows.
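The descriptor file is a YAML file named `MLproject` at the project root. The sketch below renders a small one from Python so that each piece is explicit. The project name `demo`, the script `train.py`, and the parameter `alpha` are hypothetical; only the overall layout (`name`, `conda_env`, `entry_points`) follows the MLproject format.

```python
from typing import Dict, Tuple


def render_mlproject(
    name: str,
    conda_env: str,
    command: str,
    parameters: Dict[str, Tuple[str, object]],
) -> str:
    """Render a minimal MLproject descriptor with a single `main` entry point.

    `parameters` maps a parameter name to a (type, default) pair.
    """
    lines = [
        f"name: {name}",
        f"conda_env: {conda_env}",
        "entry_points:",
        "  main:",
        "    parameters:",
    ]
    for key, (type_name, default) in parameters.items():
        lines.append(f"      {key}: {{type: {type_name}, default: {default}}}")
    # The command may reference parameters as {param_name} placeholders.
    lines.append(f'    command: "{command}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_mlproject(
            name="demo",
            conda_env="conda.yaml",
            command="python train.py --alpha {alpha}",
            parameters={"alpha": ("float", 0.5)},
        )
    )
```

A directory containing such a file can be launched with the `mlflow run` command, which fills in the `{alpha}` placeholder from the parameters passed on the command line.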

MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several “flavors” the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the “Python function” flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.

MLflow Registry offers a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production or archiving), and annotations.
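To make the registry concepts concrete, here is a toy in-memory model of versions, lineage, and stage transitions. It is not the MLflow API: `RegistrySketch` and `ModelVersion` are hypothetical names used for illustration, and only the stage names mirror the ones the registry uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    version: int
    run_id: str            # lineage: which run produced this version
    stage: str = "None"
    description: str = ""  # free-form annotation


@dataclass
class RegistrySketch:
    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, run_id: str) -> ModelVersion:
        """Add a new version under `name`; versions count up from 1."""
        versions = self.models.setdefault(name, [])
        model_version = ModelVersion(version=len(versions) + 1, run_id=run_id)
        versions.append(model_version)
        return model_version

    def transition(self, name: str, version: int, stage: str) -> ModelVersion:
        """Move one version of a registered model to another stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        model_version = self.models[name][version - 1]
        model_version.stage = stage
        return model_version


if __name__ == "__main__":
    registry = RegistrySketch()
    registry.register("demo-model", run_id="run-a")
    registry.register("demo-model", run_id="run-b")
    print(registry.transition("demo-model", 2, "Production"))
```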

4. MLflow Tracking

4.1. Concepts

MLflow Tracking is organized around runs. Each run records the following information:

  1. Code Version
  2. Start & End Time
  3. Source
  4. Parameters
  5. Metrics
  6. Artifacts
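The six fields above can be pictured as a single record. The dataclass below is an illustrative sketch, not MLflow's own data model: `RunRecord` and its method names are hypothetical. It reflects two properties of the Tracking API: parameters are stored as strings, and a metric can be logged repeatedly during a run, which yields a history of values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    code_version: str                    # e.g. a Git commit hash
    source: str                          # e.g. the script that started the run
    start_time: datetime
    end_time: Optional[datetime] = None
    parameters: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    artifacts: List[str] = field(default_factory=list)  # paths of output files

    def log_param(self, key: str, value: object) -> None:
        # Parameters are key-value pairs; values are kept as strings.
        self.parameters[key] = str(value)

    def log_metric(self, key: str, value: float) -> None:
        # Each call appends to the metric's history instead of overwriting it.
        self.metrics.setdefault(key, []).append(float(value))

    def finish(self) -> None:
        self.end_time = datetime.now(timezone.utc)


if __name__ == "__main__":
    run = RunRecord("abc1234", "train.py", datetime.now(timezone.utc))
    run.log_param("alpha", 0.5)
    run.log_metric("rmse", 0.8)
    run.log_metric("rmse", 0.7)
    run.finish()
    print(run)
```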

4.2. How runs are recorded

4.2.1. Scenario 1: MLflow on localhost

4.2.2. Scenario 2: MLflow on localhost with SQLite

4.2.3. Scenario 3: MLflow on localhost with Tracking Server

4.2.4. Scenario 4: MLflow with remote Tracking Server, backend and artifact stores

4.2.5. Scenario 5: MLflow Tracking Server enabled with proxied artifact storage access

Important!

4.2.6. Scenario 6: MLflow Tracking Server used exclusively as proxied access host for artifact storage access

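From the client's side, the scenarios differ mainly in where the tracking URI points: a local directory, a database, or a tracking server over HTTP. The helper below is a rough sketch of that distinction using only the URI scheme. `classify_tracking_uri` is a hypothetical name, and the example URIs (`./mlruns`, `sqlite:///mlruns.db`, `http://localhost:5000`) are illustrative assumptions, not values taken from this article.

```python
from urllib.parse import urlparse

# Database dialects that can appear as the URI scheme, optionally with a
# driver suffix such as "mysql+pymysql".
DATABASE_DIALECTS = {"sqlite", "mysql", "postgresql", "mssql"}


def classify_tracking_uri(uri: str) -> str:
    """Guess where runs are recorded, based only on the URI scheme.

    Windows drive paths such as "C:\\mlruns" are not handled by this sketch.
    """
    scheme = urlparse(uri).scheme.split("+")[0]
    if scheme in ("http", "https"):
        return "tracking server"
    if scheme in DATABASE_DIALECTS:
        return "database"
    if scheme in ("", "file"):
        return "local files"
    return "other"


if __name__ == "__main__":
    for uri in ("./mlruns", "sqlite:///mlruns.db", "http://localhost:5000"):
        print(uri, "->", classify_tracking_uri(uri))
```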

4.3. MLflow Tracking Servers

  • Storage: An MLflow tracking server has two components for storage: a backend store and an artifact store.
    • Backend Stores: store experiment and run metadata as well as params, metrics, and tags for runs. MLflow supports two types of backend stores: file store and database-backed store.
    • Artifact Stores: store large data such as models and images.
    • File store performance
    • Deletion Behavior
    • SQLAlchemy Options
  • Networking
  • Using the Tracking Server for proxied artifact access
    • Optionally using a Tracking Server instance exclusively for artifact handling
	mlflow server \
		--host 0.0.0.0 \
		--port 8885 \
		--artifacts-destination hdfs://myhost:8887/mlprojects/models \
		--serve-artifacts
  • Logging to a Tracking Server
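To log to a tracking server, the client only needs the server's URI. The sketch below builds that URI and logs a run through it. Host `localhost` and port `8885` simply mirror the `mlflow server` command above and are assumptions about where the server is reachable; `tracking_uri` and `log_to_server` are hypothetical helper names. `mlflow` is imported inside the function so the URI helper can be used without MLflow installed.

```python
from typing import Dict


def tracking_uri(host: str, port: int, scheme: str = "http") -> str:
    """Build the URI that clients use to reach a tracking server."""
    return f"{scheme}://{host}:{port}"


def log_to_server(uri: str, params: Dict[str, object], metrics: Dict[str, float]) -> str:
    """Log one run to the tracking server at `uri` and return its run ID."""
    import mlflow  # requires `pip install mlflow` and a reachable server

    mlflow.set_tracking_uri(uri)
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
    return run.info.run_id


if __name__ == "__main__":
    uri = tracking_uri("localhost", 8885)
    print(log_to_server(uri, {"alpha": 0.5}, {"rmse": 0.7}))
```

Setting the `MLFLOW_TRACKING_URI` environment variable to the same URI is an alternative to calling `mlflow.set_tracking_uri` in code.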