Hyperparameters
Hyperparameter: a parameter that must be chosen before the algorithm runs, such as the k value from the previous sections
Model parameter: a parameter learned during training; the KNN algorithm has no model parameters
1. How to find the best k
In [345]: best_score = 0.0

In [346]: best_k = -1

In [347]: for k in range(1, 11):
     ...:     knn_clf = KNeighborsClassifier(n_neighbors=k)
     ...:     knn_clf.fit(X_train, y_train)
     ...:     score = knn_clf.score(X_test, y_test)
     ...:     if score > best_score:
     ...:         best_k = k
     ...:         best_score = score

In [348]: print("best_k =", best_k)
     ...: print("best_score =", best_score)
best_k = 4
best_score = 0.9916666666666667
If the best k found lies on the boundary of the search range, extend the range a bit in that direction and search again
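That advice can be sketched as a small helper. The notes don't show how X_train was built, so the digits dataset and the widened range below are assumptions for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: these notes appear to use the digits dataset; any dataset works.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

def search_k(k_range):
    """Return (best_k, best_score) over the given candidate k values."""
    best_k, best_score = -1, 0.0
    for k in k_range:
        knn_clf = KNeighborsClassifier(n_neighbors=k)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

best_k, best_score = search_k(range(1, 11))
# If the winner sits on a boundary of the searched range, widen it and retry
# (the widened bounds here are illustrative, not from the notes).
if best_k in (1, 10):
    best_k, best_score = search_k(range(1, 21))
print(best_k, best_score)
```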
2. Should distance be taken into account?
KNeighborsClassifier() has a weights parameter that controls whether distance is considered: "uniform" gives every neighbor an equal vote, while "distance" weights neighbors by the inverse of their distance
In [349]: best_method = ""

In [350]: best_score = 0.0

In [351]: best_k = -1

In [352]: for method in ["uniform", "distance"]:
     ...:     for k in range(1, 11):
     ...:         knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
     ...:         knn_clf.fit(X_train, y_train)
     ...:         score = knn_clf.score(X_test, y_test)
     ...:         if score > best_score:
     ...:             best_k = k
     ...:             best_score = score
     ...:             best_method = method

In [354]: print("best_k =", best_k)
     ...: print("best_score =", best_score)
     ...: print("best_method =", best_method)
best_k = 4
best_score = 0.9916666666666667
best_method = uniform
3. The Minkowski distance
- Euclidean distance: straight-line distance, p = 2
- Manhattan distance: sum of the absolute differences along each dimension, p = 1
This gives us a new hyperparameter, p
KNeighborsClassifier() exposes it as the parameter p (the power of the Minkowski metric)
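A tiny worked example of the Minkowski formula (the helper function below is my own, not from the notes): with p = 1 it reduces to Manhattan distance, and with p = 2 to Euclidean distance.

```python
# Minkowski distance: (sum_i |a_i - b_i|^p) ** (1/p)
def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan: |3 - 0| + |4 - 0| = 7.0
print(minkowski(a, b, 2))  # Euclidean: sqrt(3**2 + 4**2) = 5.0
```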
In [357]: best_method = ""
     ...: best_score = 0.0
     ...: best_k = -1
     ...: best_p = -1

In [358]: for k in range(1, 11):
     ...:     for p in range(1, 6):
     ...:         knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
     ...:         knn_clf.fit(X_train, y_train)
     ...:         score = knn_clf.score(X_test, y_test)
     ...:         if score > best_score:
     ...:             best_k = k
     ...:             best_score = score
     ...:             best_p = p

In [359]: print("best_k =", best_k)
     ...: print("best_score =", best_score)
     ...: print("best_p =", best_p)
best_k = 3
best_score = 0.9888888888888889
best_p = 2
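The nested loops above are a hand-rolled grid search. As an aside beyond the original notes, scikit-learn's GridSearchCV automates the same search; the digits dataset here is again an assumption about what the notes use:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# p only matters when weights="distance", so search it in a separate sub-grid.
param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)),
     "p": list(range(1, 6))},
]
grid = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.best_score_)                            # cross-validated score
print(grid.best_estimator_.score(X_test, y_test))  # held-out test score
```

Note that GridSearchCV scores by cross-validation on the training set rather than on the held-out test set, so its best_score_ will differ slightly from the scores above.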