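Hyperparameters and model parameters
Hyperparameter: a parameter that must be set before the algorithm runs
Model parameter: a parameter learned while the algorithm runs
The kNN algorithm has no model parameters
The k in kNN is a typical hyperparameter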
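To make the distinction concrete, here is a minimal sketch of a kNN classifier. `TinyKNN` and `predict_one` are hypothetical names made up for this illustration; they are not part of scikit-learn, and the toy data is invented. The point is only that `k` is supplied up front, while `fit` stores the training data without learning anything.

```python
from collections import Counter
import math


class TinyKNN:
    """Hypothetical minimal kNN classifier, for illustration only."""

    def __init__(self, k):
        # k is a hyperparameter: we choose it before the algorithm runs.
        self.k = k

    def fit(self, X_train, y_train):
        # No model parameters are learned; the training data is only stored.
        self.X_train = list(X_train)
        self.y_train = list(y_train)
        return self

    def predict_one(self, x):
        # Rank training points by Euclidean distance to x, then vote among the k nearest.
        order = sorted(
            range(len(self.X_train)),
            key=lambda i: math.dist(self.X_train[i], x),
        )
        votes = Counter(self.y_train[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy data: two points near the origin (label 0), two near (1, 1) (label 1).
X_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y_train = [0, 0, 1, 1]

clf = TinyKNN(k=3).fit(X_train, y_train)
prediction = clf.predict_one((0.2, 0.1))
print(prediction)
```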
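Random seed: the random seed controls how the data is split into a training set and a test set. As long as its value stays the same, the split is exactly the same every time; when the value changes, the split changes. If the parameter is not set, the function picks a random state by itself, so the result differs from run to run.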
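The effect of the seed can be checked directly. The sketch below uses a small made-up array rather than the digits data, and the second seed value (42) is an arbitrary choice for comparison.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Small made-up dataset: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same seed twice: the two splits should be identical.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.2, random_state=666)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.2, random_state=666)
same_seed_matches = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)

# A different seed (42 is arbitrary) will usually select different test samples.
_, X_test_c, _, y_test_c = train_test_split(X, y, test_size=0.2, random_state=42)

print(same_seed_matches)
print(y_test_a, y_test_c)
```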
### Hyperparameters
import numpy as np
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
from sklearn.model_selection import train_test_split
# Set the random seed to 666 so that later experiments give the same results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)
knn_clf.score(X_test, y_test)
# Search for the best k.
# If the best k found is 10, the upper edge of the range, it is worth searching values above 10.
# Accuracy usually changes continuously as a hyperparameter changes, so a best value that sits
# on the boundary of the search range suggests that a better value may lie outside it.
best_score = 0.0  # best accuracy so far
best_k = -1  # best k so far
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k =", best_k)
print("best_score =", best_score)
Output:
best_k = 4
best_score = 0.9916666666666667
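The boundary rule for the k search (keep searching when the best k sits on the upper edge of the range) can be sketched as a small helper. Everything here is an assumption made for illustration: `search_k`, `score_fn`, `step`, and `max_stop` are hypothetical names, and `toy_score` is an invented function that stands in for fitting and scoring a classifier.

```python
def search_k(score_fn, start=1, stop=11, step=10, max_stop=51):
    """Search k in [start, stop) and widen the range while the best k is on the upper edge.

    score_fn, step, and max_stop are hypothetical names introduced for this sketch.
    """
    best_k, best_score = -1, 0.0
    lo = start
    while True:
        for k in range(lo, stop):
            score = score_fn(k)
            if score > best_score:
                best_k, best_score = k, score
        # Stop once the best k is strictly inside the range, or the cap is reached.
        if best_k != stop - 1 or stop >= max_stop:
            return best_k, best_score
        lo, stop = stop, min(stop + step, max_stop)


def toy_score(k):
    # Invented score that peaks at k = 14; it stands in for fit + score.
    return 1.0 - abs(k - 14) / 100


best_k, best_score = search_k(toy_score)
print("best_k =", best_k)
print("best_score =", best_score)
```

With the digits data, `score_fn` could be something like `lambda k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)`.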
Weight by distance, or not?
best_method = ""
best_score = 0.0
best_k = -1
# Try both voting schemes: "uniform" ignores distance, "distance" weights neighbors by it
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method
print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)
Output:
best_method = uniform
best_k = 4
best_score = 0.9916666666666667
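Why the two `weights` settings can disagree is easiest to see on a hand-made case. The sketch below is not scikit-learn's implementation; `vote` is a hypothetical helper and the neighbor list is invented. It assumes the "distance" option weights each neighbor by the inverse of its distance, and that all distances are greater than zero.

```python
from collections import defaultdict


def vote(neighbors, weights="uniform"):
    """Pick a label from (label, distance) pairs of the k nearest neighbors.

    Hypothetical helper for illustration; assumes every distance is > 0.
    """
    totals = defaultdict(float)
    for label, distance in neighbors:
        # "uniform": one vote per neighbor; "distance": vote weighted by 1 / distance.
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)


# Invented example: one close "red" neighbor and two farther "blue" neighbors.
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]

uniform_winner = vote(neighbors, "uniform")    # blue wins with 2 votes against 1
distance_winner = vote(neighbors, "distance")  # red wins: 1 against 1/3 + 1/4
print(uniform_winner, distance_winner)
```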
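Searching for the p of the Minkowski distance
Manhattan distance: the sum of the absolute differences in each dimension, p=1
Euclidean distance: the straight-line distance, p=2
This gives us a new hyperparameter, p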
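Both distances are special cases of the Minkowski distance, (Σ |a_i - b_i|^p)^(1/p). A minimal sketch follows, with `minkowski_distance` as a hypothetical helper written for this note and two invented points:

```python
import numpy as np


def minkowski_distance(a, b, p):
    """Minkowski distance between two points; hypothetical helper for illustration."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


# Invented points: the differences are 3 and 4 along the two dimensions.
a, b = [0, 0], [3, 4]

manhattan = minkowski_distance(a, b, 1)  # p=1: 3 + 4
euclidean = minkowski_distance(a, b, 2)  # p=2: sqrt(3**2 + 4**2)
print(manhattan, euclidean)
```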
%%time
best_p = -1
best_score = 0.0
best_k = -1
# Search k and p together, with distance weighting fixed
for k in range(1, 11):
    for p in range(1, 6):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_p = p
print("best_p =", best_p)
print("best_k =", best_k)
print("best_score =", best_score)
Output:
best_p = 2
best_k = 3
best_score = 0.9888888888888889
Wall time: 32.8 s