Pipeline:
A pipeline stores the processing steps to run; the program executes them one after another, in the order they are listed.
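A minimal sketch of this ordering, assuming scikit-learn's bundled iris data as a stand-in dataset (the names pipe, X, and y are illustrative and not part of the original notes):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; any feature matrix and label vector would do.
X, y = load_iris(return_X_y=True)

# Steps run in list order: scale first, then classify.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("KNN", KNeighborsClassifier()),
])

# fit() fits and applies the scaler, then fits the classifier on the scaled data.
pipe.fit(X, y)

# predict() applies the fitted scaler before the classifier predicts.
pred = pipe.predict(X)

step_names = [name for name, _ in pipe.steps]
print(step_names)
```

Each step is a (name, estimator) pair; the name is what later identifies the step when setting its parameters.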
GridSearchCV:
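1. Grid search: tries the parameter combinations in the grid with cross-validation and finds the best parameters for the model.
2. param_grid, here a list of dicts, holds the parameter settings to try. To see which parameters a model accepts: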
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
NB = GaussianNB()
KNN = KNeighborsClassifier()
print(KNN.get_params())
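3. The key names in param_grid also follow a convention: step name__parameter name (note that it is two '_' characters), where the step name is the name given to the model in the Pipeline.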
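The naming rule can be checked directly on a pipeline. The sketch below assumes a two-step pipeline with the step names 'scaler' and 'KNN'; the variable names are illustrative:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("KNN", KNeighborsClassifier()),
])

# Nested parameters are listed as <step name>__<parameter name>.
nested_keys = sorted(k for k in pipe.get_params() if "__" in k)
print(nested_keys)

# The same keys are accepted by set_params.
pipe.set_params(KNN__n_neighbors=3)
n_neighbors = pipe.named_steps["KNN"].n_neighbors
print(n_neighbors)
```

Any key printed here can be used as a key in param_grid.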
Example:
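from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# mat_descriptor (features) and label (targets) are assumed to be defined earlier
X_train, X_test, y_train, y_test = train_test_split(mat_descriptor, label, test_size=0.25, random_state=7)
pipeline7 = Pipeline([
    ('scaler', StandardScaler()),
    ('KNN', KNeighborsClassifier()),
])
param_grid = [
    {
        'KNN__weights': ['uniform'],
        'KNN__n_neighbors': list(range(1, 11)),
    },
    {
        'KNN__weights': ['distance'],
        'KNN__n_neighbors': list(range(1, 11)),
        'KNN__p': list(range(1, 6)),
    },
]
clf = GridSearchCV(pipeline7, param_grid, cv=4, n_jobs=2, verbose=1)
clf.fit(X_train, y_train)
clf_pred = clf.predict(X_test)  # predicts with the best pipeline, refit on all of X_train
clf_pred_score = accuracy_score(y_test, clf_pred)
print(clf_pred_score)       # accuracy on the held-out test set
print(clf.best_score_)      # mean cross-validated score of the best candidate
print(clf.best_estimator_)  # pipeline refit with the best parameters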
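To get a feel for how much work this search does, the grid can be expanded without fitting anything. This is a rough sketch using sklearn.model_selection.ParameterGrid; the grid is copied from the example, and n_candidates and n_fits are illustrative names:

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {
        "KNN__weights": ["uniform"],
        "KNN__n_neighbors": list(range(1, 11)),
    },
    {
        "KNN__weights": ["distance"],
        "KNN__n_neighbors": list(range(1, 11)),
        "KNN__p": list(range(1, 6)),
    },
]

# Each dict is expanded on its own and the results are concatenated,
# so the first dict gives 10 candidates and the second 10 * 5 = 50.
n_candidates = len(ParameterGrid(param_grid))

# With cv=4 every candidate is fitted once per fold (the final refit is extra).
n_fits = n_candidates * 4
print(n_candidates, n_fits)
```

Splitting the grid into two dicts keeps KNN__p from being multiplied into the 'uniform' settings as well; a single combined dict would give 2 * 10 * 5 = 100 candidates instead of 60.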