Managing Your Machine Learning Experiments with MLflow

Original article: https://towardsdatascience.com/managing-your-machine-learning-experiments-with-mlflow-1cd6ee21996e

I still remember a painful period when my teammate and I were working on a machine learning (ML) project.

Tediously and studiously, we were manually transferring the results of our countless experiments to a Google Sheet and organizing our saved models in folders. Of course, we did try to automate the process as much as possible, but managing our ML experiments was still a messy affair.

If the situation above sounds like the one you are in, hopefully this article will help you out and reduce your pain.

Being one of the best open-source solutions (see other tools here) for managing ML experiments, MLflow will greatly improve your well-being (as a data scientist, machine learning specialist, etc.) and let you and your team remain sane while keeping track of your models 💪.

How can MLflow help me?

With just a few lines of code integrated into your script, you can auto-log your model parameters and metrics into an organized dashboard as shown below.

Clicking into each of the table rows will show you more details, including the path of the model saved for that run (one run is basically one model training).

MLflow dashboard showing path to saved model

And as mentioned earlier, the important thing is that all these can be automated with just a few additional lines of code in your script.

In our example code snippet below, we have placed comments above all the lines of code relating to MLflow.

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# data_processing() is assumed to be defined elsewhere (not shown)
# and to return the train/test split
X_train, X_test, y_train, y_test = data_processing()

#################### 1. Setup Experiment ###########################
# set path to log data, e.g., mlruns local folder
# (set this first so the experiment is created in this store)
mlflow.set_tracking_uri('./mlruns')

# set experiment name to organize runs
mlflow.set_experiment('New Experiment Name')
experiment = mlflow.get_experiment_by_name('New Experiment Name')

# launch new run under the experiment name
with mlflow.start_run(experiment_id=experiment.experiment_id):

    #################### 2. Normal Model Training ##################
    hyperparams = {'max_depth': 10,
                   'max_samples': 0.8,
                   'max_features': 'sqrt'}
    clf = RandomForestClassifier(**hyperparams,
                                 random_state=0)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)

    ################ 3. Log params, metrics and model ##############
    # log model params
    mlflow.log_params(hyperparams)

    # log model metric
    mlflow.log_metric('accuracy', accuracy)

    # log model
    mlflow.sklearn.log_model(clf, "model")

In general, there are three main sections in our example:

1. Set up experiment: Here we set an experiment name (mlflow.set_experiment()) and path (mlflow.set_tracking_uri()) to log our run, before starting our run with mlflow.start_run().

2. Train model: Nothing special here, just normal model training.

3. Logging: Log parameters (mlflow.log_params()), metrics (mlflow.log_metric()) and model (mlflow.sklearn.log_model()).
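Since one run corresponds to one model training, the same three sections extend naturally to a small hyperparameter sweep with one run per combination. The following is only a sketch under assumptions: hyperparameter_grid, log_grid and the train_and_score callable are hypothetical names that are not part of the original example, and only the mlflow calls already shown above are used.

```python
from itertools import product


def hyperparameter_grid(options):
    """Yield one dict per combination of the given hyperparameter options."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def log_grid(options, train_and_score, experiment_name="New Experiment Name"):
    """Train once per combination and record each as its own MLflow run.

    train_and_score is a hypothetical callable that you supply: it takes a
    dict of hyperparameters and returns an accuracy as a float.
    """
    # Imported here so the grid helper above works without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment_name)
    for params in hyperparameter_grid(options):
        # Each combination gets its own row in the dashboard.
        with mlflow.start_run():
            mlflow.log_params(params)
            mlflow.log_metric("accuracy", train_and_score(params))


grid = list(hyperparameter_grid({"max_depth": [5, 10],
                                 "max_features": ["sqrt", "log2"]}))
print(len(grid))  # 4 combinations
```

Calling log_grid with your own training function would then produce four runs under the same experiment, which can be compared side by side in the dashboard.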

After running the code, you can execute mlflow ui in your terminal and there will be a link to your MLflow dashboard.

Simple and neat right? 😎

However, everything we have shown you so far runs in a local environment. What if we would like to collaborate with other teammates? This is where a remote server comes into play, and the next section of the article shows you the steps to set one up.

Steps to Deploy MLflow on Google Cloud

We first list the general steps to take before detailing each of them:

1. Set up a virtual machine (VM)
2. Create a Cloud Storage bucket
3. Launch the MLflow server
4. Add user authentication
5. Modify code to access the server

Having a Google Cloud account is the only prerequisite for following the steps. Do note that Google Cloud has a free trial for new signups, so you can experiment at no cost.

1. Set Up Virtual Machine (VM)

Our first step is to set up a Compute Engine VM instance through Google Cloud console.

a) Enable the Compute Engine API after logging in to your Google Cloud console

b) Start Google Cloud Shell

You should see a button similar to the one in red box below in the top right corner of your console page. Click on it and a terminal will pop out. We shall be using this terminal to launch our VM.

Click on button in red box to start Google Cloud Shell

c) Create a Compute Engine VM instance

Key in the following into Google Cloud Shell to create a VM instance named mlflow-server.

gcloud compute instances create mlflow-server \
--machine-type n1-standard-1 \
--zone us-central1-a \
--tags mlflow-server \
--metadata startup-script='#! /bin/bash
sudo apt update
sudo apt-get -y install tmux
echo Installing python3-pip
sudo apt install -y python3-pip
export PATH="$HOME/.local/bin:$PATH"
echo Installing mlflow and google_cloud_storage
pip3 install mlflow google-cloud-storage'

A brief description of the parameters in the code above:

machine-type specifies the amount of CPU and RAM for our VM. You can choose other types from Google Cloud's list of machine types.
zone refers to the data center zone that your VM resides in. You can choose one that is not too far away from your users.
tags allow us to identify the instance when adding network firewall rules later.
metadata startup-script provides a bash script that will be executed when our instance boots up, installing the various packages required.
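If you would rather keep the VM settings in a script than retype the command, the same flags can be assembled programmatically. This is only a sketch: build_create_command is a hypothetical helper, the startup script is an abbreviated stand-in for the one shown above, and nothing here actually calls gcloud; it just prints the command line.

```python
import shlex

# Abbreviated stand-in for the startup script shown above.
STARTUP_SCRIPT = (
    "#! /bin/bash\n"
    "sudo apt update\n"
    "sudo apt install -y python3-pip\n"
    "pip3 install mlflow google-cloud-storage"
)


def build_create_command(name, machine_type, zone, startup_script):
    """Return the gcloud argument list for creating one VM instance."""
    return [
        "gcloud", "compute", "instances", "create", name,
        "--machine-type", machine_type,
        "--zone", zone,
        # Reuse the instance name as its network tag, as in the article.
        "--tags", name,
        "--metadata", f"startup-script={startup_script}",
    ]


command = build_create_command(
    "mlflow-server", "n1-standard-1", "us-central1-a", STARTUP_SCRIPT
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the arguments in a list means the multi-line startup script needs no manual quoting, and the list could be handed to subprocess.run if you have the gcloud CLI installed locally.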

d) Create firewall rule

This is to allow access on port 5000 to our MLflow server.

gcloud compute firewall-rules create mlflow-server \
--direction=INGRESS --priority=999 --network=default \
--action=ALLOW --rules=tcp:5000 --source-ranges=0.0.0.0/0 \
--target-tags=mlflow-server

2. Create Cloud Storage Bucket

Run the code below in the Google Cloud Shell, replacing <BUCKET_NAME> with a unique name of your choice. This bucket will be where we will store our models later.

gsutil mb gs://<BUCKET_NAME>

3. Launch MLflow Server

We shall now SSH into our mlflow-server instance.

Go to the Compute Engine page and click on the SSH button for your instance. A terminal for your VM instance should pop out.

While the terminal gets ready, take note of the internal and external IPs for your mlflow-server instance that are shown on the Compute Engine page. We will need them later.

Before launching our MLflow server, let’s do a quick check to ensure that everything has been installed. As our startup script will take a few minutes to finish execution, the packages may not have all been installed if you SSH in too quickly. To check that MLflow has been installed, key in the terminal:

mlflow --version

You should see the version of MLflow if it has been installed. If not, no worries: either wait a little longer or manually install the packages by executing the commands from the bash script in step 1c.

Check that MLflow has been installed with version showing

If MLflow has been installed, we can now bring up a new window using tmux by executing:

tmux

Then launch our MLflow server by running the code below, replacing <BUCKET_NAME> and <INTERNAL_IP> with the bucket name from step 2 and the internal IP address noted earlier, respectively.

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root gs://<BUCKET_NAME> \
--host <INTERNAL_IP>

If you see something similar to the screenshot below, congratulations, your MLflow server is up and running 😄. You can now visit <External_IP>:5000 in your browser to view your MLflow dashboard.
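If the dashboard does not load, it helps to know whether the port is reachable at all, which separates a firewall problem from a server problem. The sketch below is a generic TCP check using only the standard library; port_is_open is a hypothetical helper name and is not part of MLflow.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or bad address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("<External_IP>", 5000) run from your own machine should return True once both the firewall rule from step 1d and the server are in place. A False result only tells you that the connection failed, not which of the two is responsible.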

4. Add User Authentication

If you don’t mind letting anyone who has your external IP address view your MLflow dashboard, then you can skip this step. But I am guessing you are not such an exhibitionist, right? Or are you? 😱

To add user authentication, let’s first stop our MLflow server by pressing Ctrl+c. Then say the Terminator’s famous line, “I’ll be back”, before detaching from our tmux window with Ctrl+b d.

a) Install Nginx and Apache Utilities

In our terminal’s main window, execute:

sudo apt-get install nginx apache2-utils

Nginx will serve as our web server, while Apache Utilities gives us access to the htpasswd command, which we will use next to create a password file.

b) Add password file

Run the following, replacing <USERNAME> with a cool name.

sudo htpasswd -c /etc/nginx/.htpasswd <USERNAME>

Then set your nobody-can-decipher password.

If you need a party, just leave out the -c argument to add additional users:

sudo htpasswd /etc/nginx/.htpasswd <ANOTHER_USER>

c) Enable password and reverse-proxy

We need to configure Nginx to let our password file take effect and set up reverse-proxy to our MLflow server. We do this by modifying the default server block file:

sudo nano /etc/nginx/sites-enabled/default

Modify the file by replacing the content of the location block with the three lines shown inside it below:

server {
  location / {
    proxy_pass http://localhost:5000;
    auth_basic "Restricted Content";
    auth_basic_user_file /etc/nginx/.htpasswd;
  }
}

Press Ctrl+x y Enter to save changes and exit the editor.

Restart Nginx for the changes to take effect:

sudo service nginx restart

Create a new session with tmux or re-attach to our earlier tmux session:

tmux attach-session -t 0

Launch our MLflow server again but this time around, our host is set to localhost:

mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root gs://<BUCKET_NAME> \
--host localhost

d) Enable HTTP traffic

Lastly, we enable HTTP traffic for our instance to allow access to our Nginx web server. Essentially, when you click on the mlflow-server instance on the Compute Engine page, you can edit it and select Allow HTTP traffic and Allow HTTPS traffic under the Firewall section.

Now if you visit your external IP (leave out :5000, just external IP), you should be prompted for credentials. Key in the username and password that you set earlier and “Open Sesame”, your MLflow dashboard is back before your eyes again.
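The prompt you see is HTTP Basic authentication, enabled by the auth_basic lines in the Nginx configuration. As a rough illustration of what the browser sends after you key in your credentials, the sketch below builds the corresponding Authorization header; basic_auth_header is a hypothetical helper name, and "user" and "pass" are placeholder credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Note that base64 is an encoding and not encryption, so over plain HTTP these credentials can be read by anyone who can observe the traffic. Serving the dashboard over HTTPS would address that, which is beyond the steps covered here.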

5. Modify Code to Access Server

In order for our scripts to log to the server, we need to modify our code by providing some credentials as environment variables.

a) Create and download the service account json

Follow the steps in the Google Cloud documentation to create a new service account key.

b) Pip install google-cloud-storage locally

The google-cloud-storage package must be installed on both the client and the server in order to access Google Cloud Storage. We already installed the package on the server through our startup script, so you just need to install it locally.

c) Set up credentials as environment variables

In your code, add the following in order for your script to access the server, replacing each of them accordingly:

<GOOGLE_APPLICATION_CREDENTIALS> : Path of downloaded service account key
<MLFLOW_TRACKING_USERNAME> : Username
<MLFLOW_TRACKING_PASSWORD> : Password

import os

# Set path to service account json file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<GOOGLE_APPLICATION_CREDENTIALS>'

# Set username and password if authentication was added
os.environ['MLFLOW_TRACKING_USERNAME'] = '<MLFLOW_TRACKING_USERNAME>'
os.environ['MLFLOW_TRACKING_PASSWORD'] = '<MLFLOW_TRACKING_PASSWORD>'
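An alternative to hardcoding these values in the script is to export them in your shell and have the script only check that they are present, which keeps the password out of your code. The following is a sketch of that check; REQUIRED_VARS and missing_credentials are hypothetical names, while the three variable names are the ones used above.

```python
import os

REQUIRED_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Set these before logging to the server:", ", ".join(missing))
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a training run.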

d) Set external IP as MLflow tracking URI

Earlier in our example code, mlflow.set_tracking_uri() was set to a local folder path. Now set it to http://<EXTERNAL_IP>:80, e.g. “http://35.225.50.9:80”.

mlflow.set_tracking_uri('http://<EXTERNAL_IP>:80')
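The http:// prefix matters: as far as we understand, it is what distinguishes a remote server from a local folder path such as ./mlruns. A small sketch of a guard that adds the scheme when it is missing is shown below; normalize_tracking_uri is a hypothetical helper, not an MLflow function.

```python
from urllib.parse import urlparse


def normalize_tracking_uri(value):
    """Return value as an http(s) URI, adding http:// if no scheme is given."""
    uri = value if "://" in value else f"http://{value}"
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not a valid server address: {value!r}")
    return uri


print(normalize_tracking_uri("35.225.50.9:80"))  # http://35.225.50.9:80
```

The result can then be passed to mlflow.set_tracking_uri() in place of the local folder path.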

You can now easily collaborate with your teammate and log your models to the server. 👏 👏 👏

Our full example code can be found here for your testing convenience.

Through the guide above, we hope that you are now able to deploy MLflow both locally and on Google Cloud to manage your ML experiments. In addition, after your experimentation, MLflow will remain useful for monitoring your model once you have deployed it into production.

Thanks for reading and I hope the article was useful :) Please also feel free to comment with any questions or suggestions that you may have.
