Machine Learning: Hyperparameter Tuning with Grid Search in sklearn

Grid Search

There are two ways to choose hyperparameters: (1) rely on experience; (2) plug different parameter values into the model and pick whichever performs best. Doing route 2 by hand costs far too much attention to be worthwhile, and for loops (or for-loop-like code) are rigidly nested, neither concise nor flexible, and easy to get wrong. GridSearchCV, grid search with cross-validation, automates this: it enumerates every combination of the supplied parameter values and, via cross-validation, returns the evaluation score for each combination.

GridSearchCV sounds fancy, but it is really just brute-force search. Note that the method works well on small datasets; once the dataset grows large it becomes impractical.
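Conceptually, the brute force is nothing more than a loop over every combination in the grid, each scored with cross-validation. A minimal hand-rolled sketch (not using GridSearchCV, and with a smaller grid than the example below; the variable names are my own):

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

param_grid = {'n_estimators': [20, 50], 'max_depth': [1, 2]}

best_score, best_params = -1.0, None
# Enumerate the Cartesian product of all parameter values,
# score each combination with 3-fold cross-validation, keep the best.
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = cross_val_score(RandomForestClassifier(**params), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

GridSearchCV does exactly this loop for you, plus parallelism, logging, and refitting the best model.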

from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import load_iris  # built-in sample dataset


iris = load_iris()

X = iris.data    # 150 samples, 4 features
y = iris.target  # 150 class labels
# A random forest is used below to demonstrate the basic usage.

# Exhaustive grid search
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # data splitting
# Split the data: 80% for training, 20% for validation
train_data, test_data, train_target, test_target = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier()
parameters = {'n_estimators': [20, 50, 100], 'max_depth': [1, 2, 3]}

clf = GridSearchCV(model, parameters, cv=3, verbose=2)
clf.fit(train_data, train_target)

print("Best parameters:")
print(clf.best_params_)
print("Best score:")
print(clf.best_score_)
sorted(clf.cv_results_.keys())

score_test = roc_auc_score(test_target, clf.predict_proba(test_data), multi_class='ovr')

print("RandomForestClassifier GridSearchCV test AUC:   ", score_test)
D:\anaconda\python.exe C:/Users/Administrator/Desktop/数据挖掘项目/代码包测试集/网格搜索调参.py
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV] max_depth=1, n_estimators=20 ....................................
[CV] ..................... max_depth=1, n_estimators=20, total=   0.0s
(... similar [CV] lines for the remaining parameter combinations omitted ...)
[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed:    1.7s finished
Best parameters:
{'max_depth': 2, 'n_estimators': 50}
Best score:
0.9583333333333334
RandomForestClassifier GridSearchCV test AUC:    1.0

Process finished with exit code 0
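Beyond best_params_ and best_score_, clf.cv_results_ holds the full per-combination scores. A convenient way to inspect them (an optional aside, assuming pandas is available; the column selection is my own choice) is to load the dict into a DataFrame:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
clf = GridSearchCV(RandomForestClassifier(),
                   {'n_estimators': [20, 50], 'max_depth': [1, 2]},
                   cv=3)
clf.fit(X, y)

# One row per parameter combination, ranked by mean cross-validation score.
results = pd.DataFrame(clf.cv_results_)
cols = ['param_n_estimators', 'param_max_depth',
        'mean_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```

This makes it easy to see not just the winner but how close the runners-up were.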

Random Search

When searching over hyperparameters, if there are only a few of them (three or four, or fewer), we can use grid search, the exhaustive method above.

But with more hyperparameters, the time grid search needs grows exponentially. Random search was proposed for this case: sample a few dozen or a few hundred points at random from the hyperparameter space, and there is a good chance some of them score well. This is faster than thinning out the grid, and experiments show random search tends to do slightly better than a sparse grid.

RandomizedSearchCV is used much like GridSearchCV, but instead of trying every possible combination it evaluates a fixed number of combinations, each built by drawing a random value for every hyperparameter. This has two advantages: relative to the full parameter space, far fewer combinations need to be evaluated, and the compute budget can be controlled simply by setting the number of search iterations, so adding parameter values does not blow up the cost or hurt efficiency. Its usage is otherwise the same as GridSearchCV's, except that it replaces the exhaustive grid with random sampling of the parameter space; for a continuous parameter, RandomizedSearchCV can treat it as a distribution and sample from it, which grid search cannot do. Its search power is governed by the n_iter parameter.
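The distribution-sampling feature can be sketched as follows (this uses scipy.stats.randint rather than the fixed lists of the example below; the ranges and n_iter value are arbitrary choices of mine):

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Instead of fixed lists, hand RandomizedSearchCV distributions to draw from:
param_dist = {'n_estimators': randint(10, 100),  # uniform integers in [10, 100)
              'max_depth': randint(1, 5)}        # uniform integers in [1, 5)

# n_iter caps the budget: exactly 8 sampled combinations are evaluated.
clf = RandomizedSearchCV(RandomForestClassifier(), param_dist,
                         n_iter=8, cv=3, random_state=0)
clf.fit(X, y)
print(clf.best_params_)
```

Any object with an rvs() method works as a distribution, so continuous parameters (e.g. scipy.stats.uniform for a learning rate) can be handled the same way.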

from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import load_iris  # built-in sample dataset


iris = load_iris()

X = iris.data    # 150 samples, 4 features
y = iris.target  # 150 class labels
# A random forest is used below to demonstrate the basic usage.


# Randomized parameter search
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # data splitting
# Split the data: 80% for training, 20% for validation
train_data, test_data, train_target, test_target = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier()
parameters = {'n_estimators': [10, 20, 30, 50], 'max_depth': [1, 2, 3]}

clf = RandomizedSearchCV(model, parameters, cv=3, verbose=2)
clf.fit(train_data, train_target)

score_test = roc_auc_score(test_target, clf.predict_proba(test_data), multi_class='ovr')

print("RandomForestClassifier RandomizedSearchCV test AUC:   ", score_test)
print("Best parameters:")
print(clf.best_params_)
sorted(clf.cv_results_.keys())
D:\anaconda\python.exe C:/Users/Administrator/Desktop/数据挖掘项目/代码包测试集/随机参数优化调参.py
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] n_estimators=10, max_depth=3 ....................................
[CV] ..................... n_estimators=10, max_depth=3, total=   0.0s
(... similar [CV] lines for the remaining sampled combinations omitted ...)
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    0.9s finished
RandomForestClassifier RandomizedSearchCV test AUC:    1.0
Best parameters:
{'n_estimators': 30, 'max_depth': 3}

Process finished with exit code 0
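One detail worth knowing about both examples: with the default refit=True, the search object retrains the best combination on the whole training split, and predict/predict_proba on the search object (as used for the AUC above) delegate to that refit model, exposed as best_estimator_. A small sketch (variable names are my own):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomizedSearchCV(RandomForestClassifier(),
                         {'n_estimators': [10, 20, 30, 50],
                          'max_depth': [1, 2, 3]},
                         cv=3, random_state=0)
clf.fit(X_tr, y_tr)

# The refit model carries the winning hyperparameters...
assert clf.best_estimator_.get_params()['max_depth'] == clf.best_params_['max_depth']
# ...and can be used (directly or via clf) to score held-out data.
print(clf.best_estimator_.score(X_te, y_te))
```

If you set refit=False to save the final retraining, predict and best_estimator_ are unavailable and you must refit the best parameters yourself.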


 
