Chapter 4: The Most Basic Classification Algorithm - k-Nearest Neighbors (kNN), Study Notes (Part 2)

Contents

4-5 Hyperparameters (05-Hyper-Parameters)

4-6 Grid Search and More Hyperparameters for kNN


4-5 Hyperparameters (05-Hyper-Parameters)

random_state=666 fixes the random seed, so every run produces the same result.
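As a reminder of the setup the loops below assume, here is a minimal sketch of producing a reproducible split. The dataset choice is an assumption for illustration; any (X, y) pair works the same way:

from sklearn import datasets
from sklearn.model_selection import train_test_split

# Dataset is illustrative only; the point is that random_state pins the
# shuffle, so every rerun yields the identical split.
digits = datasets.load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)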

from sklearn.neighbors import KNeighborsClassifier

# Try each k on the fixed train/test split and keep the best
best_score = 0.0
best_k = -1
for k in range(1, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)
    if score > best_score:
        best_k = k
        best_score = score

print("best_k =", best_k)
print("best_score =", best_score)

If the best value lands on the boundary of the search range, an even better value may lie outside it. For example, if best_k comes out as 10, you should also try some values above 10, as in the sketch below.
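A sketch of widening the search when best_k hits the upper boundary, continuing the loop above:

if best_k == 10:
    # best_k landed on the boundary, so probe beyond it
    for k in range(10, 21):
        knn_clf = KNeighborsClassifier(n_neighbors=k)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score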

The search above only counted votes; the distance to each neighbor was ignored. It is more reasonable for closer neighbors to carry more weight.

The weight used is the reciprocal of the distance.

This also handles ties: with plain voting, three different classes can each receive one vote, and distance weighting resolves such ties.
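A tiny worked example of why reciprocal weighting helps (the distances are made up for illustration): suppose the 3 nearest neighbors are one red point at distance 1 and two blue points at distances 3 and 4. Plain voting picks blue 2:1, but weighted voting picks red, since 1/1 = 1 beats 1/3 + 1/4 = 7/12:

# Hypothetical 3-NN result as (label, distance) pairs; numbers are made up
neighbors = [("red", 1.0), ("blue", 3.0), ("blue", 4.0)]

votes, weighted = {}, {}
for label, d in neighbors:
    votes[label] = votes.get(label, 0) + 1            # uniform: one vote each
    weighted[label] = weighted.get(label, 0) + 1 / d  # distance: vote worth 1/d

print(votes)     # {'red': 1, 'blue': 2}        -> blue wins the plain vote
print(weighted)  # {'red': 1.0, 'blue': 0.583…} -> red wins the weighted vote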

sklearn.neighbors.KNeighborsClassifier — scikit-learn 1.0 documentation

See the official documentation for the weights parameter.

# Search the weighting scheme and k together
best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_score = score
            best_method = method

print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)


Manhattan distance and Euclidean distance have a consistent mathematical form, which can be generalized (to the Minkowski distance).

p = 1 gives the Manhattan distance, p = 2 the Euclidean distance; p is itself yet another hyperparameter.
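For reference, the standard textbook forms (spelled out here, not in the original notes): for samples $a$ and $b$ with $n$ features,

$$\text{Manhattan } (p=1):\ \sum_{i=1}^{n}\left|X_i^{(a)} - X_i^{(b)}\right| \qquad \text{Euclidean } (p=2):\ \sqrt{\sum_{i=1}^{n}\left(X_i^{(a)} - X_i^{(b)}\right)^2}$$

$$\text{Minkowski:}\ \left(\sum_{i=1}^{n}\left|X_i^{(a)} - X_i^{(b)}\right|^{p}\right)^{1/p}$$

with p = 1 and p = 2 recovering the two special cases.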

# Search k and the Minkowski exponent p jointly, with distance weighting
best_score = 0.0
best_k = -1
best_p = -1

for k in range(1, 11):
    for p in range(1, 6):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights="distance", p=p)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)
        if score > best_score:
            best_k = k
            best_p = p
            best_score = score

print("best_k =", best_k)
print("best_p =", best_p)
print("best_score =", best_score)

Note that p is tied to weights='distance'; with weights='uniform' there is no p to search, which is why the grid below is split into two groups.

4-6 Grid Search and More Hyperparameters for kNN

param_grid = [
    {
        'weights': ['uniform'], 
        'n_neighbors': [i for i in range(1, 11)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 11)], 
        'p': [i for i in range(1, 6)]
    }
]

The 'uniform' group yields 10 candidates (n_neighbors = 1..10).

The 'distance' group yields 10 * 5 = 50 candidates (n_neighbors times p).

param_grid is a list of dicts; each dict defines one group of parameters to explore.

knn_clf = KNeighborsClassifier()

10 + 50 = 60 different parameter combinations in total.
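A quick way to confirm that count, using scikit-learn's ParameterGrid on the same param_grid:

from sklearn.model_selection import ParameterGrid

# Enumerates every combination defined by the list of dicts above
print(len(ParameterGrid(param_grid)))  # 60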

The best weights can come out differently between the two approaches, because GridSearchCV evaluates each candidate with cross-validation (CV) rather than a single train/test split; which combination wins depends on that evaluation scheme.

n_jobs sets how many CPU cores to use for the parallel search; -1 means use all available cores.

By default the search runs with no progress output; the larger verbose is, the more detailed the information printed during the search, which is the whole point of the parameter.
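Putting the pieces together, a minimal sketch (assuming the param_grid and the train/test split defined above):

from sklearn.model_selection import GridSearchCV

knn_clf = KNeighborsClassifier()
# n_jobs=-1: use every core; verbose=2: print per-candidate progress
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
print(grid_search.best_score_)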

A classification example on the iris dataset

import seaborn as sns
from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]  # keep only the first two features so the decision boundary can be plotted in 2-D
# X = iris.data
y = iris.target

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=6)

# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ['darkorange', 'c', 'darkblue']
h = .02  # step size in the mesh

def drawBoundary(knn_clf,n_neighbors,weights):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = knn_clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, cmap=cmap_light)
    # plt.contour(xx, yy, Z, cmap=cmap_light)


    # Plot also the training points
    sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=iris.target_names[y],
                    palette=cmap_bold, alpha=1.0, edgecolor="black")

    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))
    plt.xlabel(iris.feature_names[0])
    plt.ylabel(iris.feature_names[1])

    plt.show() # with several figures, each one must be closed before the next can appear


# Grid search implemented by hand
best_score = 0.0
best_k = -1
best_method = ""
for method in ["uniform", "distance"]:
    for k in range(1, 18):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn_clf.fit(X_train, y_train)
        score = knn_clf.score(X_test, y_test)

        if score > best_score:
            best_k = k
            best_score = score
            best_method = method

            # drawBoundary(knn_clf, best_k, best_method)
    # If plt.show() is placed inside drawBoundary, then with several figures
    # each one must be closed before the next can appear; placing plt.show()
    # down here instead avoids that.
    # plt.show() # Must come after drawBoundary or nothing is drawn; when
    # single-stepping in a debugger only part of the figure renders, and once
    # the program finishes it no longer shows.

print("best_method =", best_method)
print("best_k =", best_k)
print("best_score =", best_score)

# Grid search using scikit-learn's built-in GridSearchCV
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 18)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 18)],
        'p': [i for i in range(1, 6)]
    }
]
from sklearn.model_selection import GridSearchCV
clf = KNeighborsClassifier()
# No separate clf.fit() is needed here: GridSearchCV fits every candidate itself
grid_search = GridSearchCV(clf, param_grid, n_jobs=-1, verbose=2)  # verbose=2 prints progress
grid_search.fit(X_train, y_train)
print(10 * "-------------")
print("best: %f using %s" % (grid_search.best_score_, grid_search.best_params_))
# print(grid_search.best_params_['n_neighbors'])
# print(grid_search.best_params_['weights'])
# print(grid_search.best_estimator_)
# means = grid_search.cv_results_['mean_test_score']
# params = grid_search.cv_results_['params']
#
# for mean, param in zip(means, params):
#     print("%f with: %r" % (mean, param))

drawBoundary(grid_search.best_estimator_,
             grid_search.best_params_['n_neighbors'],
             grid_search.best_params_['weights'])



Reading data with pandas

The data pandas reads from an Excel file has a different type from numpy's: pandas gives a DataFrame, while numpy works with an array.

[Screenshot: the Excel table data]
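When a numpy array is needed, a DataFrame converts directly. The file name below is hypothetical, and reading .xlsx files also requires the openpyxl package:

import pandas as pd

df = pd.read_excel("data.xlsx")  # hypothetical file; returns a DataFrame
X = df.to_numpy()                # convert to a plain numpy array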

Other hyperparameters

sklearn.neighbors.DistanceMetric — scikit-learn 1.0 documentation
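For example, KNeighborsClassifier also exposes a metric parameter that selects the distance function directly; a minimal sketch using one of the metrics listed on that page:

# 'chebyshev' is one of the metrics in the DistanceMetric documentation
knn_clf = KNeighborsClassifier(n_neighbors=5, metric="chebyshev")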
