How to Measure Machine Learning Execution Speed

This article explores the speed trade-offs of running machine learning projects on a personal computer's CPU or GPU versus cloud services such as Google Colab and Kaggle. In Python, using XGBoost with a GPU (such as an NVIDIA TITAN RTX) can speed up training significantly. GPU cloud services (such as Google Colab's TESLA T4 GPUs) and the RAPIDS libraries (NVIDIA's Python data science libraries) can further improve efficiency, especially when working with large-scale data.

Introduction

Thanks to recent advances in storage capacity and memory management, it has become much easier to create machine learning and deep learning projects from the comfort of your own home.

In this article, I will introduce you to different possible approaches to machine learning projects in Python and give you an idea of the trade-offs in execution speed. Some of the different approaches are:

  • Using a personal computer/laptop CPU (central processing unit)/GPU (graphics processing unit).
  • Using cloud services (Kaggle, Google Colab).

First of all, we need to import all the necessary dependencies:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from xgboost import XGBClassifier
import xgboost as xgb
from sklearn.metrics import accuracy_score

For this example, I decided to fabricate a simple dataset using Gaussian Distributions consisting of four features and two labels (0/1):

# Creating a linearly separable dataset using Gaussian distributions.
# The first half of the labels in Y is 0 and the other half is 1.
# The first half of each feature is therefore drawn from a different
# Gaussian than the second half, so that telling the two classes
# apart is quite simple (the data is linearly separable).
dataset_len = 40000000
dlen = int(dataset_len/2)
X_11 = pd.Series(np.random.normal(2,2,dlen))
X_12 = pd.Series(np.random.normal(9,2,dlen))
X_1 = pd.concat([X_11, X_12]).reset_index(drop=True)
X_21 = pd.Series(np.random.normal(1,3,dlen))
X_22 = pd.Series(np.random.normal(7,3,dlen))
X_2 = pd.concat([X_21, X_22]).reset_index(drop=True)
X_31 = pd.Series(np.random.normal(3,1,dlen))
X_32 = pd.Series(np.random.normal(3,4,dlen))
X_3 = pd.concat([X_31, X_32]).reset_index(drop=True)
X_41 = pd.Series(np.random.normal(1,1,dlen))
X_42 = pd.Series(np.random.normal(5,2,dlen))
X_4 = pd.concat([X_41, X_42]).reset_index(drop=True)
Y = pd.Series(np.repeat([0,1],dlen))
df = pd.concat([X_1, X_2, X_3, X_4, Y], axis=1)
df.columns = ['X1', 'X2', 'X3', 'X4', 'Y']
df.head()

Finally, we just have to prepare our dataset to be fed into a machine learning model (dividing it into features and labels, and into training and test sets):

train_size = 0.80
X = df.drop(['Y'], axis = 1).values
y = df['Y']

# LabelEncoder converts categorical labels into integers
# (redundant here, since Y is already 0/1, but kept for generality).
label_encoder = preprocessing.LabelEncoder()

# Encode labels
y = label_encoder.fit_transform(y) 

# identify shape and indices
num_rows, num_columns = df.shape
delim_index = int(num_rows * train_size)

# Splitting the dataset into training and test sets
# (note: the split is sequential, with no shuffling)
X_train, y_train = X[:delim_index, :], y[:delim_index]
X_test, y_test = X[delim_index:, :], y[delim_index:]

# Checking sets dimensions
print('X_train dimensions: ', X_train.shape, 'y_train: ', y_train.shape)
print('X_test dimensions:', X_test.shape, 'y_test: ', y_test.shape)

# Checking dimensions in percentages
total = X_train.shape[0] + X_test.shape[0]
print('X_train Percentage:', (X_train.shape[0]/total)*100, '%')
print('X_test Percentage:', (X_test.shape[0]/total)*100, '%')

The resulting train/test split is shown below:

X_train dimensions:  (32000000, 4) y_train:  (32000000,)
X_test dimensions: (8000000, 4) y_test:  (8000000,)
X_train Percentage: 80.0 %
X_test Percentage: 20.0 %

We are now ready to start benchmarking the different approaches. In all the following examples, we will be using XGBoost (gradient boosted decision trees) as our classifier.

1) CPU

Training an XGBClassifier on my personal machine (without using a GPU), led to the following results:

%%time

model = XGBClassifier(tree_method='hist')
model.fit(X_train, y_train)
CPU times: user 8min 1s, sys: 5.94 s, total: 8min 7s
Wall time: 8min 6s
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, tree_method='hist', verbosity=1)

Once we've trained our model, we can now check its prediction accuracy:

sk_pred = model.predict(X_test)
sk_pred = np.round(sk_pred)
sk_acc = round(accuracy_score(y_test, sk_pred), 2)
print("XGB accuracy using Sklearn:", sk_acc*100, '%')
XGB accuracy using Sklearn: 99.0 %

In summary, using a standard CPU machine, it took about 8 minutes to train our classifier to achieve 99% accuracy.

2) GPU

I will now instead make use of an NVIDIA TITAN RTX GPU on my personal machine to speed up training. In this case, in order to activate XGBoost's GPU mode, we need to set the tree_method parameter to gpu_hist instead of hist.

%%time

# gpu_hist runs XGBoost's histogram tree-building algorithm on the GPU
model = XGBClassifier(tree_method='gpu_hist')
model.fit(X_train, y_train)

In this example, using the TITAN RTX brought the execution time down to just 8.85 seconds (about 50 times faster than using the CPU alone!).

sk_pred = model.predict(X_test)
sk_pred = np.round(sk_pred)
sk_acc = round(accuracy_score(y_test, sk_pred), 2)
print("XGB accuracy using Sklearn:", sk_acc*100, '%')
XGB accuracy using Sklearn: 99.0 %

This considerable improvement in speed was possible thanks to the GPU's ability to take the load off the CPU, freeing up RAM and parallelizing the execution of many tasks at once.
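
If you run these benchmarks as a plain Python script rather than in a notebook (where the %%time magic is not available), a minimal timing sketch of my own, using only the standard library and the X_train/y_train arrays prepared above, could look like this:

import time
from xgboost import XGBClassifier

# Time the GPU-accelerated training run outside of a notebook
start = time.perf_counter()
model = XGBClassifier(tree_method='gpu_hist')
model.fit(X_train, y_train)
print('Training took', round(time.perf_counter() - start, 2), 'seconds')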

3) GPU Cloud Services

I will now go over two examples of free GPU cloud services (Google Colab and Kaggle) and show you what benchmark times they are able to achieve. In both cases, we need to explicitly turn on the GPUs in the respective notebooks and set the XGBoost tree_method to gpu_hist.
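
As a quick sanity check (my own addition, not part of the original workflow), you can verify that the notebook actually sees a GPU before training by querying the NVIDIA driver:

import subprocess

# Lists the GPUs visible to this notebook; if this errors out,
# the GPU accelerator has not been enabled in the notebook settings.
print(subprocess.run(['nvidia-smi'], capture_output=True, text=True).stdout)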

Google Colab

Using Google Colab's NVIDIA TESLA T4 GPUs, the following times were recorded:

CPU times: user 5.43 s, sys: 1.88 s, total: 7.31 s
Wall time: 7.59 s

Kaggle

Using Kaggle instead led to a slightly higher execution time:

CPU times: user 5.37 s, sys: 5.42 s, total: 10.8 s
Wall time: 11.2 s
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, tree_method='gpu_hist', verbosity=1)

Using either Google Colab or Kaggle led to a remarkable decrease in execution time.

One downside of using these services is the limited amount of CPU and RAM available. In fact, slightly increasing the size of the example dataset caused Google Colab to run out of RAM (which wasn't an issue when using the TITAN RTX).

One possible way to work around this type of problem on memory-constrained devices is to optimize the code to consume the least amount of memory possible (for example, by using fixed-point precision and more efficient data structures), as sketched below.
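
As an illustration (my own sketch, not from the original article), the default 64-bit column types of our generated dataset can be downcast before training; the four Gaussian features comfortably fit in float32, and the 0/1 labels fit in int8:

# Downcast 64-bit columns to smaller types to roughly halve memory usage
def reduce_memory(frame):
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype == np.float64:
            out[col] = out[col].astype(np.float32)
        elif out[col].dtype == np.int64:
            out[col] = pd.to_numeric(out[col], downcast='integer')
    return out

print('Before:', df.memory_usage(deep=True).sum() / 1e9, 'GB')
df_small = reduce_memory(df)
print('After: ', df_small.memory_usage(deep=True).sum() / 1e9, 'GB')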

4) Bonus Point: RAPIDS

As an additional point, I will now introduce you to RAPIDS, an open-source collection of Python libraries by NVIDIA. In this example, we will make use of its integration with the XGBoost library to speed up our workflow in Google Colab. The full notebook for this example (with instructions on how to set up RAPIDS in Google Colab) is available here or on my GitHub Account.

RAPIDS is designed to be the next evolutionary step in data processing. Thanks to its Apache Arrow in-memory format, RAPIDS can deliver speed improvements of up to around 50x compared to Spark in-memory processing. Additionally, it is able to scale from a single GPU to multiple GPUs.
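
As a rough sketch of what multi-GPU scaling can look like (my own illustration, not from the article; it assumes a recent XGBoost build with Dask support and the dask-cuda package installed):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask.array as da
import xgboost as xgb

# One Dask worker is started per local GPU
cluster = LocalCUDACluster()
client = Client(cluster)

# Chunked arrays let each GPU process a slice of the data
dX = da.from_array(X_train, chunks=(4000000, 4))
dy = da.from_array(y_train, chunks=(4000000,))

dtrain = xgb.dask.DaskDMatrix(client, dX, dy)
output = xgb.dask.train(client,
                        {'tree_method': 'gpu_hist',
                         'objective': 'binary:logistic'},
                        dtrain)
booster = output['booster']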

All RAPIDS libraries are based on Python and are designed to have Pandas- and Sklearn-like interfaces to facilitate adoption.

The structure of RAPIDS is based on different libraries in order to accelerate data science from end to end. Its main components are:

  • cuDF: used to perform data processing tasks (Pandas-like); a minimal cuDF sketch follows this list.
  • cuML: used to create machine learning models (Sklearn-like).
  • cuGraph: used to perform graph analytics (NetworkX-like).
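
As promised, here is a minimal cuDF sketch (my own illustration, assuming a working RAPIDS installation) showing how closely its interface mirrors Pandas:

import cudf

# Build a DataFrame directly on the GPU, just as you would with pandas
gdf = cudf.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [0, 1, 0]})
gdf['c'] = gdf['a'] * 10          # column arithmetic runs on the GPU
print(gdf.groupby('b').mean())    # familiar groupby/aggregation API

# An existing pandas DataFrame can also be moved onto the GPU:
# gdf2 = cudf.from_pandas(df)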

In this example, we will make use of its XGBoost integration:

# Convert the training and test sets into XGBoost's optimized DMatrix format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

%%time

# Train with the native XGBoost API, again using the GPU tree method
params = {}
booster_params = {}
booster_params['tree_method'] = 'gpu_hist'
params.update(booster_params)

clf = xgb.train(params, dtrain)
CPU times: user 1.42 s, sys: 719 ms, total: 2.14 s
Wall time: 2.51 s

As we can see above, using RAPIDS it took just about 2.5 seconds to train our model, reducing execution time by almost 200 times! (Note that xgb.train defaults to just 10 boosting rounds, compared with the 100 estimators used in the earlier examples, so part of this speedup comes from training a smaller model.)
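
For a closer like-for-like comparison with the 100-estimator classifiers used above (my own adjustment, not part of the original benchmark), the number of boosting rounds can be set explicitly:

clf = xgb.train(params, dtrain, num_boost_round=100)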

Finally, we can check that RAPIDS gives us exactly the same prediction accuracy we recorded in the other cases:

rapids_pred = clf.predict(dtest)

rapids_pred = np.round(rapids_pred)
rapids_acc = round(accuracy_score(y_test, rapids_pred), 2)
print("XGB accuracy using RAPIDS:", rapids_acc*100, '%')
XGB accuracy using RAPIDS: 99.0 %

If you are interested in finding out more about RAPIDS, more information is available here.

Conclusion

Finally, we can now compare the execution time of the different methods used. As shown in Figure 2, using GPU optimization can substantially decrease execution time, especially if integrated with the use of RAPIDS libraries.

Figure 3 shows how many times faster the GPU models are compared to our baseline CPU results.

Contacts

If you want to keep updated with my latest articles and projects, follow me on Medium and subscribe to my mailing list. These are some of my contact details:

Translated from: https://www.freecodecamp.org/news/benchmarking-machine-learning-execution-speeds/
