Improving Performance of ML Model (Contd…)

Performance Improvement with Algorithm Tuning

As we know, ML models are parameterized so that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the best combination of these parameters so that the performance of the ML model improves. This process is sometimes called hyperparameter optimization: the parameters of the algorithm itself are called hyperparameters, while the coefficients found by the ML algorithm are called parameters.
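
To make the distinction concrete, here is a minimal sketch using scikit-learn's Ridge on small made-up data. The data and the two alpha values are assumptions for illustration, not part of the original recipe. alpha is set by us before training, while coef_ and intercept_ are learned by fit().

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up regression data: y depends linearly on three features plus a little noise
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: chosen by us before training
model = Ridge(alpha=1.0).fit(X, y)

# coef_ and intercept_ are parameters: learned by the algorithm from the data
print("alpha (hyperparameter):", model.alpha)
print("coef_ (parameters):", model.coef_)
print("intercept_ (parameter):", model.intercept_)

# Changing the hyperparameter changes what is learned: a larger alpha shrinks coef_
strong_model = Ridge(alpha=1000.0).fit(X, y)
print("coef_ with alpha=1000:", strong_model.coef_)
```

Tuning is the search for the alpha that makes the learned coefficients generalize best.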

Here, we will discuss some methods for algorithm parameter tuning provided by Python's Scikit-learn.

Grid Search Parameter Tuning

Grid search is a parameter tuning approach that methodically builds and evaluates a model for every possible combination of the algorithm parameters specified in a grid. In other words, it is an exhaustive search over that grid.
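
The exhaustive loop behind grid search can be sketched by hand with cross_val_score. This is only an illustration of the idea, not GridSearchCV's actual implementation. Synthetic data from make_regression stands in for the Pima CSV so the sketch runs on its own, and cv=5 matches GridSearchCV's default.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Pima data (assumption, so the sketch is self-contained)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# The "grid": every candidate value of the hyperparameter alpha
alpha_grid = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.0]

scores = {}
for alpha in alpha_grid:
    # Build and evaluate one model per grid point with 5-fold cross-validation
    cv_scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5)
    scores[alpha] = cv_scores.mean()

# Keep the grid point with the highest mean score (R^2, Ridge's default score)
best_alpha = max(scores, key=scores.get)
print("best alpha:", best_alpha, "mean R^2:", scores[best_alpha])
```

The cost grows with the size of the grid, because every grid point is trained once per fold.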

Example

In the following Python recipe, we will perform a grid search using sklearn's GridSearchCV class to evaluate various alpha values for the Ridge Regression algorithm on the Pima Indians diabetes dataset.

First, import the required packages as follows:


import numpy
from pandas import read_csv
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

Next, load the Pima diabetes dataset as in the previous examples:


path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

Next, define the grid of alpha values to evaluate:


alphas = numpy.array([1,0.1,0.01,0.001,0.0001,0])
param_grid = dict(alpha=alphas)
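
This grid has only one hyperparameter, so GridSearchCV evaluates six candidates. As a sketch of how the count grows, scikit-learn's ParameterGrid can expand a grid into every combination. The fit_intercept key below is a hypothetical addition for illustration only and is not part of this recipe.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical two-hyperparameter grid: six alpha values x two fit_intercept settings
param_grid = {
    "alpha": [1, 0.1, 0.01, 0.001, 0.0001, 0],
    "fit_intercept": [True, False],
}

# ParameterGrid expands the dict into every combination a grid search would try
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 12 = 6 * 2
for candidate in candidates[:3]:
    print(candidate)
```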

Now, apply grid search to the model:


model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid.fit(X, Y)

Print the result with the following lines:


print(grid.best_score_)
print(grid.best_estimator_.alpha)

Output


0.2796175593129722
1.0

The output above gives the best mean cross-validated score (the R² score, which is Ridge's default scoring metric) and the parameter value from the grid that achieved it. In this case, the best alpha value is 1.0.
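
best_score_ is the mean cross-validated score of the best candidate, and the per-candidate results are stored in cv_results_. The sketch below assumes synthetic make_regression data in place of the Pima CSV. It passes cv=5 and scoring="r2" explicitly (these match the defaults for Ridge) and prints every alpha's score.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Pima data (assumption, so the sketch runs on its own)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

param_grid = {"alpha": np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])}
# cv=5 and scoring="r2" spell out what the defaults already do for Ridge
grid = GridSearchCV(Ridge(), param_grid=param_grid, cv=5, scoring="r2")
grid.fit(X, y)

# One row per candidate: the alpha tried and its mean/std R^2 across the 5 folds
for alpha, mean, std in zip(grid.cv_results_["param_alpha"],
                            grid.cv_results_["mean_test_score"],
                            grid.cv_results_["std_test_score"]):
    print(f"alpha={float(alpha):<8} mean R^2={mean:.4f} (+/- {std:.4f})")

print("best:", grid.best_params_, grid.best_score_)
```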

Random Search Parameter Tuning

Random search is a parameter tuning approach that, instead of trying every combination, samples algorithm parameter values from a random distribution for a fixed number of iterations.
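
The sampling loop can be sketched by hand as follows. This illustrates the idea rather than how RandomizedSearchCV works internally. The synthetic data, the budget n_iter=20 and the seed 7 are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Pima data (assumption, so the sketch is self-contained)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

n_iter = 20                              # fixed budget of iterations
alpha_dist = uniform(loc=0, scale=1)     # uniform distribution on [0, 1]
rng = np.random.RandomState(7)           # seeded so the draws are reproducible

tried = []
best_alpha, best_score = None, -np.inf
for _ in range(n_iter):
    alpha = float(alpha_dist.rvs(random_state=rng))  # draw one candidate
    tried.append(alpha)
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    if score > best_score:                           # keep the best so far
        best_alpha, best_score = alpha, score

print("best alpha:", best_alpha, "mean R^2:", best_score)
```

Unlike grid search, the cost is fixed by n_iter no matter how wide the sampled range is.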

Example

In the following Python recipe, we will perform a random search using sklearn's RandomizedSearchCV class to evaluate different alpha values between 0 and 1 for the Ridge Regression algorithm on the Pima Indians diabetes dataset.

First, import the required packages as follows:


import numpy
from pandas import read_csv
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

Next, load the Pima diabetes dataset as in the previous examples:


path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

Next, evaluate alpha values sampled uniformly between 0 and 1 for the Ridge regression algorithm:


param_grid = {'alpha': uniform()}
model = Ridge()
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=50,
                                   random_state=7)
random_search.fit(X, Y)
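
scipy's uniform() with no arguments is the uniform distribution on [0, 1], which is why the sampled alpha values stay in that range. The sketch below assumes synthetic make_regression data in place of the Pima CSV and inspects the 50 candidates RandomizedSearchCV actually drew via cv_results_.

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the Pima data (assumption, so the sketch is self-contained)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# uniform() is uniform on [0, 1]; uniform(loc=a, scale=b) would cover [a, a + b]
param_distributions = {"alpha": uniform()}
random_search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions=param_distributions,
    n_iter=50,        # number of candidates sampled and evaluated
    random_state=7,   # makes the sampled alphas reproducible
)
random_search.fit(X, y)

# Every sampled candidate is recorded in cv_results_["params"]
sampled = [params["alpha"] for params in random_search.cv_results_["params"]]
print(len(sampled), min(sampled), max(sampled))
print(random_search.best_params_)
```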

Print the result with the following lines:


print(random_search.best_score_)
print(random_search.best_estimator_.alpha)

Output


0.27961712703051084
0.9779895119966027

The output above gives a best score very close to that of the grid search, here reached with an alpha of about 0.978.

Translated from: https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_improving_performance_of_ml_model.htm
