Chapter 4: The Most Basic Classification Algorithm, k-Nearest Neighbors (kNN): Study Notes (Middle Part)

天人合一peng

Category: Machine Learning / Deep Learning / Artificial Intelligence / Affective Computing. Tags: algorithms, sklearn, machine learning

Article link: https://blog.csdn.net/moonlightpeng/article/details/106504525

4-5 Hyperparameters (05-Hyper-Parameters)

4-6 Grid Search and More Hyperparameters in the kNN Algorithm

4-5 Hyperparameters (05-Hyper-Parameters)

random_state=666 sets the random seed, which guarantees that every run produces the same result.
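To illustrate the effect of the seed, here is a minimal sketch that splits a dataset twice with the same random_state and checks that the two splits are identical. The digits dataset and test_size=0.2 are assumptions chosen for illustration; the notes themselves only state random_state=666.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Assumed dataset for illustration; any (X, y) pair behaves the same way.
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Two independent calls that share the same seed.
split_a = train_test_split(X, y, test_size=0.2, random_state=666)
split_b = train_test_split(X, y, test_size=0.2, random_state=666)

# Compare X_train, X_test, y_train, y_test pairwise.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("identical splits:", same_split)
```

Leaving random_state unset would normally give a different split on each run, which makes scores hard to compare between runs.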

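from sklearn.neighbors import KNeighborsClassifier

# Assumes X_train, X_test, y_train, y_test come from an earlier train_test_split.
best_score = 0.0
best_k = -1
for k in range(1, 11):  # try k = 1..10
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)  # accuracy on the test set
    if score > best_score:
        best_k = k
        best_score = score

print("best_k =", best_k)
print("best_score =", best_score)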

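If the best value lies on the boundary of the search range, an even better value may exist outside the range. For example, if the best k is 10, the search should be repeated over some values above 10.

Plain kNN only counts votes and ignores distance. Giving closer neighbors a larger weight is more reasonable.

The weight is the reciprocal of the distance.

When several classes each receive one vote, the vote is tied. Distance weighting also resolves such ties.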
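The boundary check can be sketched as follows. This is only an illustration under assumptions: the helper name search_k, the digits dataset, and the choice to extend the range by ten more values are all hypothetical and do not come from the notes.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def search_k(X_train, y_train, X_test, y_test, k_values):
    """Return (best_k, best_score) over the candidate k values (hypothetical helper)."""
    best_k, best_score = -1, 0.0
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        score = clf.score(X_test, y_test)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Assumed data for illustration.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)

low, high = 1, 10
best_k, best_score = search_k(X_train, y_train, X_test, y_test, range(low, high + 1))

# If the winner sits on the upper edge, widen the range and search again.
# The new range starts at `high`, so the previous winner is still a candidate.
if best_k == high:
    best_k, best_score = search_k(
        X_train, y_train, X_test, y_test, range(high, high + 11))

print("best_k =", best_k)
print("best_score =", best_score)
```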
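As a small worked example of reciprocal-distance weighting, the sketch below tallies weighted votes by hand. The three neighbors and their distances are made-up values, and weighted_vote is a hypothetical helper, not part of scikit-learn. It assumes every distance is strictly positive.

```python
from collections import defaultdict


def weighted_vote(neighbors):
    """Tally votes weighted by 1 / distance.

    neighbors: list of (label, distance) pairs, each distance > 0.
    Returns the winning label and the per-label totals.
    """
    totals = defaultdict(float)
    for label, distance in neighbors:
        totals[label] += 1.0 / distance
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Made-up neighbors: one close "red" point and two farther "blue" points.
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]
winner, totals = weighted_vote(neighbors)
print(winner, totals)
```

With plain vote counting, blue would win 2 to 1. With reciprocal weights, red scores 1 and blue scores 1/3 + 1/4 = 7/12, so red wins. The same mechanism breaks a tie between classes with equal vote counts, because equal weighted totals are unlikely when the distances differ.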

For details, see the official documentation: sklearn.neighbors.KNeighborsClassifier (scikit-learn 1.0 documentation).

best_score = 0.0
best_k = -1
best_method = ""
# Search both weighting schemes together with k = 1..10.
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method

print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)

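Writing the squared difference in the Euclidean distance with absolute values, $(x_i^{(a)} - x_i^{(b)})^2 \rightarrow |x_i^{(a)} - x_i^{(b)}|^2$, gives it the same form as the Manhattan distance:

$$\left(\sum_{i=1}^{n} |x_i^{(a)} - x_i^{(b)}|^{1}\right)^{1/1} \quad \text{(Manhattan)}, \qquad \left(\sum_{i=1}^{n} |x_i^{(a)} - x_i^{(b)}|^{2}\right)^{1/2} \quad \text{(Euclidean)}$$

The two are mathematically consistent in form, so they can be generalized to the Minkowski distance:

$$\left(\sum_{i=1}^{n} |x_i^{(a)} - x_i^{(b)}|^{p}\right)^{1/p}$$

p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance, so p is one more hyperparameter.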
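The generalized distance with exponent p (the Minkowski distance) can be sketched in a few lines of NumPy. The function name minkowski_distance and the sample points are assumptions made for this illustration.

```python
import numpy as np


def minkowski_distance(a, b, p):
    """Minkowski distance (sum_i |a_i - b_i|^p)^(1/p) between two points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


a, b = [0, 0], [3, 4]
d1 = minkowski_distance(a, b, 1)  # Manhattan: |3| + |4|
d2 = minkowski_distance(a, b, 2)  # Euclidean: sqrt(3^2 + 4^2)
print(d1, d2)
```

For these two points, p = 1 gives 7 and p = 2 gives 5, matching the Manhattan and Euclidean distances respectively.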

best_score = 0.0
best_k = -1
best_p = -1

# Search k = 1..10 together with the Minkowski exponent p = 1..5.
for k in range(1, 11):
    for p in range(1, 6):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_p = p
            best_score = score

print("best_k =", best_k)
print("best_p =", best_p)
print("best_score =", best_score)

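In this search, p is tuned only together with weights="distance", and the weights="uniform" case is searched without p. Strictly speaking, p also decides which neighbors count as nearest under uniform weighting, but the search here leaves p at its default value of 2 in that case.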

4-6 Grid Search and More Hyperparameters in the kNN Algorithm

param_grid = [
    {
        'weights': ['uniform'], 
        'n_neighbors': [i for i in range(1, 11)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 11)], 
        'p': [i for i in range(1, 6)]
    }
]

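weights="uniform": 10 combinations

weights="distance": 10 * 5 = 50 combinations

param_grid is a list whose elements are dictionaries. Each dictionary defines one set of parameters to explore.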

knn_clf = KNeighborsClassifier()

10 + 50 = 60 different parameter combinations in total.
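The count can be checked with scikit-learn's ParameterGrid, which enumerates the combinations described by a grid. This is just a sanity check added for illustration; the notes do not use ParameterGrid themselves.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {
        "weights": ["uniform"],
        "n_neighbors": list(range(1, 11)),
    },
    {
        "weights": ["distance"],
        "n_neighbors": list(range(1, 11)),
        "p": list(range(1, 6)),
    },
]

# Number of combinations in each dictionary, then in the whole grid.
counts = [len(ParameterGrid(grid)) for grid in param_grid]
total = len(ParameterGrid(param_grid))
print(counts, total)
```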

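The best parameters (for example weights) found by the two approaches can differ, because grid search evaluates each combination with cross-validation (CV) rather than on a single test set. The difference comes from the evaluation method.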
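The difference between the two scoring methods can be seen in a short sketch. The digits dataset, the split settings, and the fixed hyperparameters (k = 3, distance weighting) are assumptions for illustration. cv=5 is passed explicitly and is also the default in current scikit-learn releases.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed data for illustration.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")

# Score used by the manual loops: accuracy on the single held-out test set.
test_score = clf.fit(X_train, y_train).score(X_test, y_test)

# Score used by grid search: mean accuracy over folds of the training set only.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
cv_score = cv_scores.mean()

print("single test-set score:", test_score)
print("5-fold CV mean score:", cv_score)
```

Because the two numbers are computed on different data, the combination that ranks first under one score need not rank first under the other.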

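n_jobs sets the number of CPU cores used for parallel computation. A value of -1 uses all cores.

By default the search prints nothing while it runs. verbose controls progress output: the larger the value, the more detailed the messages.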
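Putting the pieces together, a grid search with n_jobs and verbose might look like the sketch below. The digits dataset and the split settings are assumptions for illustration; the grid is the one defined in this section.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed data for illustration.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=666)

param_grid = [
    {"weights": ["uniform"], "n_neighbors": list(range(1, 11))},
    {"weights": ["distance"], "n_neighbors": list(range(1, 11)),
     "p": list(range(1, 6))},
]

# n_jobs=-1 uses all available cores; verbose=1 prints a short progress summary.
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

print("best params:", grid_search.best_params_)
print("best CV score:", grid_search.best_score_)

# best_estimator_ is refit on the whole training set with the best parameters.
print("test score:", grid_search.best_estimator_.score(X_test, y_test))
```

Raising verbose to 2 or 3 prints one line per fit, which helps estimate how long a large search will take.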

Iris classification example

import seaborn as sns
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X = iris.data[:,:2]
# X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.15, random_state = 6)

# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
h = .02  # step size in the mesh

def drawBoundary(knn_clf,n_neighbors,weights):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = knn_clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=cmap_light)
    # plt.contour(xx, yy, Z, cmap=cmap_light)


    # Also plot the training points
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=iris.target_names[y],
                    palette=cmap_bold, alpha=1.0, edgecolor="black")

    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])

    plt.show()  # With several figures, each window must be closed before the next one is shown


# Hand-written grid search
best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform", "distance"]:
    for k in range(1, 18):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)

        if score > best_score:
            best_k = k
            best_score = score
            best_method = method

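            # drawBoundary(knn_clf, best_k, best_method)
    # If plt.show() is called inside drawBoundary and several figures are drawn,
    # each window must be closed before the next one appears.
    # Calling it here instead avoids that.
    # plt.show()  # Required after drawBoundary, otherwise nothing is drawn; when single-stepping only part of the figure shows, and it disappears once the program ends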

print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)

# Grid search with scikit-learn's built-in GridSearchCV
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 18)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 18)],
        'p': [i for i in range(1, 6)]
    }
]
from sklearn.model_selection import GridSearchCV

clf = KNeighborsClassifier()
# GridSearchCV clones and fits the estimator itself, so no prior clf.fit() is needed.
# verbose should be a non-negative integer; larger values print more detail.
grid_search = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)
print(10 * "-------------")
print("best: %f using %s" % (grid_search.best_score_, grid_search.best_params_))
# print(grid_search.best_params_['n_neighbors'])
# print(grid_search.best_params_['weights'])
# print(grid_search.best_estimator_)
# means = grid_search.cv_results_['mean_test_score']
# params = grid_search.cv_results_['params']
#
# for mean, param in zip(means, params):
#     print("%f with: %r" % (mean, param))

# Draw the decision boundary of the best model found by the search.
drawBoundary(grid_search.best_estimator_, grid_search.best_params_['n_neighbors'], grid_search.best_params_['weights'])