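sklearn's GridSearchCV selects a model's best hyperparameters by trying every combination of candidate values, and it is usually combined with cross-validation. The candidate combinations for two hyperparameters look like this: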
     0.01        0.1        1.0      10.0      100.0
1    (0.01, 1)   (0.1, 1)   (1, 1)   (10, 1)   (100, 1)
2    (0.01, 2)   (0.1, 2)   (1, 2)   (10, 2)   (100, 2)
3    (0.01, 3)   (0.1, 3)   (1, 3)   (10, 3)   (100, 3)
4    (0.01, 4)   (0.1, 4)   (1, 4)   (10, 4)   (100, 4)
5    (0.01, 5)   (0.1, 5)   (1, 5)   (10, 5)   (100, 5)
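As the table shows, grid search is computationally demanding: even this small 5 x 5 grid yields 25 combinations, and with k-fold cross-validation each combination is trained and validated k times. Because a single run is expensive, it pays to get the code right the first time. The main parameters and functions of grid search are introduced next.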
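To make the cost concrete, here is a minimal sketch that enumerates a grid like the one in the table and counts the model fits it implies. The names `param_a` and `param_b` are hypothetical placeholders, since the table does not label its axes, and the fold count of 5 is an assumption for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical parameter names: the table does not say which
# hyperparameters its columns and rows stand for.
grid = ParameterGrid({
    "param_a": [0.01, 0.1, 1.0, 10.0, 100.0],  # column values
    "param_b": [1, 2, 3, 4, 5],                # row values
})

n_candidates = len(grid)           # number of (param_a, param_b) pairs
cv_folds = 5                       # assumed number of CV folds
n_fits = n_candidates * cv_folds   # fits during the search itself

print(n_candidates, n_fits)
```

GridSearchCV refits the best combination on the full data by default, so the total is one fit more than `n_fits`.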
First, import the libraries and instantiate the model. A random forest is used as the example here.
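import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# Instantiate the model with its default settings
model = RandomForestClassifier()
# Candidates: n_estimators in 10, 20, ..., 100 and max_depth in 4, 5, ..., 10
param_grid = {'n_estimators': np.linspace(10, 100, 10).astype(int), 'max_depth': np.arange(4, 11)}
# 5-fold cross-validated search over the grid, scored by accuracy
gs = GridSearchCV(model, param_grid=param_grid, scoring='accuracy', cv=5)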
gs_res =
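The snippet stops before the search is fitted. As a hedged sketch of how such a search is commonly run and inspected, the example below assumes synthetic data from `make_classification`, a much smaller grid, and 3 folds so that it finishes quickly; none of these choices come from the original. Assigning the return value of `fit` to `gs_res` is likewise an assumption about how the line continues.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only for illustration (not from the original).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = RandomForestClassifier(random_state=0)

# A reduced grid (assumption) so the example runs in a few seconds.
param_grid = {"n_estimators": [10, 50], "max_depth": [4, 6]}

gs = GridSearchCV(model, param_grid=param_grid, scoring="accuracy", cv=3)

# fit() returns the fitted GridSearchCV object itself,
# so gs_res and gs refer to the same object.
gs_res = gs.fit(X, y)

print(gs_res.best_params_)          # best combination found
print(gs_res.best_score_)           # its mean cross-validated accuracy
best_model = gs_res.best_estimator_ # refitted on all of X, y (refit=True)
```

Per-candidate scores are available in `gs_res.cv_results_`, a dict that can be passed to `pandas.DataFrame` for easier inspection.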