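Purpose
Grid search finds good hyperparameter values by trying every combination in a predefined grid and keeping the one that scores best under cross-validation.
Procedure
First, import the libraries and load the data: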
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 666)
Next, define the hyperparameter values to search:
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': list(range(1, 11))
    },
    {
        'weights': ['distance'],
        'n_neighbors': list(range(1, 11)),
        'p': list(range(1, 6))
    }
]
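To make clear how many candidates this grid produces, here is a minimal pure-Python sketch that expands it by hand. The helper `expand_grid` is hypothetical and only illustrates the idea; it is not how scikit-learn implements the search internally.

```python
from itertools import product

# Same grid as above, repeated here so the sketch is self-contained.
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': list(range(1, 11)),
    },
    {
        'weights': ['distance'],
        'n_neighbors': list(range(1, 11)),
        'p': list(range(1, 6)),
    },
]


def expand_grid(grid):
    """Hypothetical helper: list every parameter combination in the grid."""
    combos = []
    for sub_grid in grid:
        keys = sorted(sub_grid)
        # Cartesian product of the value lists within one sub-grid.
        for values in product(*(sub_grid[k] for k in keys)):
            combos.append(dict(zip(keys, values)))
    return combos


candidates = expand_grid(param_grid)
print(len(candidates))  # 10 + 10 * 5 = 60
```

Each sub-grid is expanded separately, so 'p' is only varied together with 'distance' weighting. Every one of the 60 candidates is then fitted once per cross-validation fold, which is why the search can be slow.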
Finally, create the classifier and run the search on the training data:
knn_clf = KNeighborsClassifier()
grid_search = GridSearchCV(knn_clf, param_grid)
grid_search.fit(X_train, y_train)
Show the best classifier found, along with its parameters:
grid_search.best_estimator_
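Show the accuracy of this classifier (the mean cross-validated score on the training data, not the test-set accuracy):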
grid_search.best_score_
Show the best parameter combination from the param_grid we defined at the start:
grid_search.best_params_
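Since best_score_ comes from cross-validation on the training set, it may help to also score the chosen classifier on the held-out test set. The sketch below assumes the same digits split as above; `small_grid` is a reduced grid chosen only so the example runs quickly, not the grid used in this note.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666
)

# Assumption: a deliberately small grid so the sketch finishes quickly.
small_grid = {'weights': ['uniform', 'distance'], 'n_neighbors': [3, 5]}

grid_search = GridSearchCV(KNeighborsClassifier(), small_grid)
grid_search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ has already been
# refitted on the whole training set using the best parameters.
best_knn = grid_search.best_estimator_
test_accuracy = best_knn.score(X_test, y_test)

print(grid_search.best_params_)
print(grid_search.best_score_)  # mean cross-validated accuracy
print(test_accuracy)            # accuracy on the held-out test set
```

The two numbers usually differ slightly, because they are measured on different data.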
Speed up the search we just ran:
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs = -1, verbose = 2)
grid_search.fit(X_train, y_train)
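Here n_jobs sets how many CPU cores work in parallel; with -1, all cores take part in the search. verbose controls how much progress detail is printed while the search runs.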
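As a rough illustration of how an n_jobs value maps to a worker count, the sketch below follows the convention documented for scikit-learn and joblib, where negative values count back from the number of CPUs. The helper `resolve_n_jobs` is hypothetical and written only for this note; it is not a library function.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: turn an n_jobs value into a worker count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # unset normally means a single worker
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


# Assuming a machine with 8 cores:
print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
print(resolve_n_jobs(4, n_cpus=8))   # 4

# On the current machine:
print(resolve_n_jobs(-1))
```

On a shared machine, a value such as -2 leaves one core free for other work.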