KNN + Data Visualization + Detailed Code Walkthrough

Latest recommended article published on 2024-05-07 07:41:13

蟹堡王不卖汉堡

Tags: python sklearn

Original link: https://blog.csdn.net/qq_44846324/article/details/114270003?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522166546726116800192226423%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=166546726116800192226423&biz_id=0&utm_me

The KNN algorithm explained:

Reference for a detailed explanation of the KNN algorithm
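
To make the idea concrete before turning to sklearn, here is a minimal from-scratch sketch of KNN classification: measure the distance from a query point to every training sample, keep the k closest, and take a majority vote. The function name `knn_predict` and the toy data are illustrative assumptions, not part of the original post, and the sketch ignores tie-breaking and performance concerns that sklearn handles.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training sample
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two points per class
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=3)
pred_b = knn_predict(X_train, y_train, np.array([1.0, 0.9]), k=3)
print(pred_a, pred_b)
```

The rest of the post relies on `KNeighborsClassifier` from sklearn, which implements the same idea with efficient neighbor search.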

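Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

Generating a random dataset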

Detailed explanations of the functions used:
make_classification

plt.scatter

np.meshgrid

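def create_data():
    # Generate 200 samples with 2 informative features split into 3 classes
    X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, n_classes=3)
    # plt.scatter() draws a scatter plot; here it shows the generated dataset,
    # colored by class label
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)
    plt.show()
    # Step size of the grid
    h = .02
    # X[:, n] selects column n, i.e. the n-th feature of every sample
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    # np.arange(start, stop, step) works like range, but accepts float steps
    # and returns an array
    # np.meshgrid turns the two axes into 2-D coordinate arrays covering the plane
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    return X, y, xx, yy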
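
To see what `np.meshgrid` returns, the following small sketch builds a 3 x 2 grid and flattens it into a list of (x, y) points with `np.c_`, the same pattern that `plot_rek` uses to feed the grid to the classifier. The axis values are arbitrary and chosen only for illustration.

```python
import numpy as np

xs = np.arange(0.0, 3.0, 1.0)  # x axis: 0, 1, 2
ys = np.arange(0.0, 2.0, 1.0)  # y axis: 0, 1

# xx and yy both have shape (len(ys), len(xs));
# xx repeats the x axis along rows, yy repeats the y axis along columns
xx, yy = np.meshgrid(xs, ys)

# Flatten both arrays and pair them column-wise: one row per grid point
grid_points = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape, grid_points.shape)
print(grid_points)
```

Each row of `grid_points` is one coordinate pair, so a classifier can predict a label for every grid cell in a single call.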

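Training

def training(X, y):
    # Candidate values of k to evaluate
    fs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    # K-fold cross-validation with KFold:
    #   1. n_splits: number of folds (at least 2). The data is split into
    #      n_splits parts; each part serves as the validation set exactly
    #      once while the remaining n_splits - 1 parts form the training set.
    #   2. shuffle: whether to shuffle the samples before splitting
    #      (default False).
    #   3. random_state: seed that makes the shuffle reproducible; only
    #      used when shuffle=True.
    fk = KFold(n_splits=4, random_state=2001, shuffle=True)
    # Useful KFold methods:
    #   get_n_splits([X, y, groups]) returns the number of folds
    #   split(X[, y, groups]) yields (train, validation) index arrays per fold
    best_k = fs[0]
    # Best mean accuracy seen so far
    best_score = 0
    # Maps each k to its mean accuracy (formatted to two decimals)
    dict_k_key = {}
    # Loop over all candidate values
    for k in fs:
        # Sum of the accuracies over all folds
        curr_score = 0
        # Loop over the folds
        for train_index, valid_index in fk.split(X):
            # train_index: indices of the training samples (indices, not data)
            # valid_index: indices of the validation samples
            # Instantiate the KNN model
            clf = KNeighborsClassifier(n_neighbors=k)
            # Fit the model on the training folds
            clf.fit(X[train_index], y[train_index])
            # Add the accuracy on the validation fold
            curr_score = curr_score + clf.score(X[valid_index], y[valid_index])
        # Mean accuracy of the KNN model for this value of k
        avg_score = curr_score / fk.get_n_splits()
        print('k = %d, mean accuracy: %.2f' % (k, avg_score))
        dict_k_key[k] = "%.2f" % avg_score
        if avg_score > best_score:
            # Record the new best mean accuracy
            best_score = avg_score
            # Record the k that achieved it
            best_k = k
        print('Best k so far: %d' % best_k, "best accuracy so far: %.2f" % best_score)
        print("*" * 50)
    print('Selected k: %d' % best_k, "with accuracy: %.2f" % best_score)
    return dict_k_key, best_k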
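
The manual fold loop above can also be expressed with sklearn's `cross_val_score`, which fits and scores the model on every fold and returns one accuracy per fold. The sketch below is an alternative formulation, not the original author's code; the `random_state=0` passed to `make_classification` is an assumption added so that the run is reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same dataset shape as in create_data; random_state=0 is an assumption
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, n_classes=3, random_state=0)
cv = KFold(n_splits=4, shuffle=True, random_state=2001)

mean_scores = {}
for k in range(1, 12):
    # One accuracy value per fold for this k
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=cv)
    mean_scores[k] = scores.mean()

# max() returns the first k with the highest mean accuracy,
# matching the strict ">" comparison in training()
best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 2))
```

Because the same `KFold` object with a fixed seed is reused for every k, all candidates are compared on identical splits.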

Training results for different k values, plotted as a line chart

def plot_k(key_values):
    x_l = []
    y_l = []

    # key_values maps k -> accuracy string; convert both to numbers
    for x, y in key_values.items():
        x_l.append(int(x))
        y_l.append(float(y))
    plt.plot(x_l, y_l, label="accuracy", color="red")
    plt.title("Accuracy for different k values", loc="center")

    # Annotate every point with its accuracy value
    for a, b in zip(x_l, y_l):
        plt.text(a, b, b, ha='center', va="bottom", fontsize=12)
    # One tick per candidate k
    plt.xticks(x_l)
    plt.xlabel('k')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()

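Rendering the decision regions

def plot_rek(best_k, X, y, xx, yy):
    # Note: training() scored each k with the default uniform weights, whereas
    # this final model weights neighbors by inverse distance
    clf = neighbors.KNeighborsClassifier(n_neighbors=best_k, weights='distance')
    clf.fit(X, y)
    # Predict a class for every grid point
    answer = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # Reshape the predictions to the grid so the class regions can be drawn
    answer = answer.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, answer)
    # Also draw all the training samples
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.show()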
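
`plot_rek` passes `weights='distance'`, which makes closer neighbors count more than distant ones (each vote is weighted by the inverse of its distance), while the default `weights='uniform'` gives every neighbor one equal vote. The toy data below is a hypothetical example constructed so that the two settings disagree.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# 1-D toy data: one class-0 point near the query, two class-1 points far away
X = np.array([[0.0], [2.0], [2.1]])
y = np.array([0, 1, 1])
query = np.array([[0.5]])

uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)

# Uniform: class 1 wins the vote 2 to 1.
# Distance: class 0 gets weight 1/0.5 = 2.0, class 1 gets 1/1.5 + 1/1.6, about 1.29.
pred_uniform = uniform.predict(query)[0]
pred_distance = distance.predict(query)[0]
print(pred_uniform, pred_distance)
```

Since the two weighting schemes can produce different predictions, a k chosen under one scheme is not guaranteed to be the best k under the other.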

run

def run():
    # Generate the data, pick the best k, then draw both plots
    X, y, xx, yy = create_data()
    key_values, best_k = training(X, y)
    plot_k(key_values)
    plot_rek(best_k, X, y, xx, yy)


if __name__ == "__main__":
    run()
