The main hyperparameters of a random forest are the number of decision trees in the forest (n_estimators) and the number of features each tree considers when splitting a node (max_features). The standard procedure for hyperparameter optimization uses cross-validation to guard against overfitting.
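As a baseline illustration, cross-validation scores one hyperparameter setting at a time; the minimal sketch below (assuming a prepared feature matrix X and label vector y, which are not defined in this article) shows the building block that both search strategies automate:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Score a single candidate setting with 3-fold cross-validation
rf = RandomForestRegressor(n_estimators=300, max_features='sqrt', random_state=42)
print(cross_val_score(rf, X, y, cv=3).mean())  # mean R^2 across the 3 folds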
1) Random Search with Cross Validation
Typically, the best range for each hyperparameter is unclear, so the best way to narrow the search is to evaluate a wide range of values for each one. With Scikit-Learn's RandomizedSearchCV, we can define a grid of hyperparameter ranges, randomly sample combinations from that grid, and perform K-Fold CV with each sampled combination.
Step 1: Before tuning, inspect the parameters of the current model
from sklearn.ensemble import RandomForestRegressor
from pprint import pprint

rf = RandomForestRegressor(random_state=42)

# Look at parameters used by our current forest
print('Parameters currently in use:\n')
pprint(rf.get_params())
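With recent scikit-learn versions this prints a dictionary of roughly twenty settings, including defaults such as 'n_estimators': 100, 'max_depth': None, and 'min_samples_split': 2; these defaults are the starting point for deciding which parameters to tune.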
Step 2: To use RandomizedSearchCV, we first need to create a parameter grid to sample from during fitting:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
# Number of features to consider at every split
# (the original 'auto' option was removed in scikit-learn 1.3; 1.0 means all features)
max_features = [1.0, 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num=11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
pprint(random_grid)
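This grid defines 10 × 2 × 12 × 3 × 3 × 2 = 4320 candidate combinations, far too many to evaluate exhaustively, which is exactly why we sample from it instead. A quick sanity check (math.prod requires Python 3.8+):
import math
# Total number of settings in the grid; random search samples only n_iter of them
print(math.prod(len(values) for values in random_grid.values()))  # 4320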
Step 3: Train
# Use the random grid to search for the best hyperparameters
# First create the base model to tune
rf = RandomForestRegressor()
# Random search of parameters, using 3-fold cross-validation;
# search across 100 different combinations and use all available cores
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=3, verbose=2, random_state=42, n_jobs=-1)
# Fit the model
rf_random.fit(train_features, train_labels)
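Note the cost: 100 sampled combinations × 3 folds means 300 model fits in total; n_jobs=-1 spreads them across all available cores.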
Step 4: Retrieve the best parameters
rf_random.best_params_
Step 5: Retrain the model with the optimized parameters and compare it against the baseline, as sketched below.
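The article does not show the comparison code for this step, so the sketch below is one minimal way to do it. The helper name evaluate(), its metric (accuracy defined as 100% minus the mean absolute percentage error), and the train/test splits train_features, train_labels, test_features, and test_labels are all assumptions, not part of the original:
import numpy as np

def evaluate(model, test_features, test_labels):
    # Hypothetical helper: accuracy = 100% minus mean absolute percentage error
    predictions = model.predict(test_features)
    errors = np.abs(predictions - test_labels)
    accuracy = 100 - 100 * np.mean(errors / test_labels)
    print('Model Performance')
    print('Average Error: {:0.4f}'.format(np.mean(errors)))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    return accuracy

# Baseline: an untuned forest trained on the same data
base_model = RandomForestRegressor(random_state=42)
base_model.fit(train_features, train_labels)
base_accuracy = evaluate(base_model, test_features, test_labels)

# Best model found by random search
best_random = rf_random.best_estimator_
random_accuracy = evaluate(best_random, test_features, test_labels)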
2) Grid Search with Cross Validation
Random search allowed us to narrow the range for each hyperparameter. Now that we know where to concentrate the search, we can explicitly specify every combination of settings to try, and GridSearchCV will evaluate all of the combinations we define.
Step 1: To use grid search, we create another grid based on the best values found by random search:
from sklearn.model_selection import GridSearchCV

# Create the parameter grid based on the results of random search
param_grid = {'bootstrap': [True],
              'max_depth': [80, 90, 100, 110],
              'max_features': [2, 3],
              'min_samples_leaf': [3, 4, 5],
              'min_samples_split': [8, 10, 12],
              'n_estimators': [100, 200, 300, 1000]}

# Create a base model
rf = RandomForestRegressor()

# Instantiate the grid search model
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)
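Unlike the random grid, this focused grid is small enough to evaluate exhaustively: 1 × 4 × 2 × 3 × 3 × 4 = 288 combinations, or 864 model fits with 3-fold cross-validation.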
Step 2: Fit the model, then evaluate the best estimator and compare it with the earlier results
grid_search.fit(train_features, train_labels)
grid_search.best_params_

# Score the best grid-search model with the same evaluate() helper as before
best_grid = grid_search.best_estimator_
grid_accuracy = evaluate(best_grid, test_features, test_labels)
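To see whether each round of tuning actually paid off, we can compare all three accuracies (this assumes base_accuracy and random_accuracy from the earlier sketch):
print('Random search improvement: {:0.2f}%.'.format(
    100 * (random_accuracy - base_accuracy) / base_accuracy))
print('Grid search improvement: {:0.2f}%.'.format(
    100 * (grid_accuracy - base_accuracy) / base_accuracy))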
A small drop in performance at this stage indicates that we have reached the point of diminishing returns for hyperparameter tuning.
The full code for this section can be found at: