My Machine Learning Notes 1: KNN Classification

wwang314159

Tags: machine learning, notes, classification

Article link: https://blog.csdn.net/wwang314159/article/details/132647189

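The material here comes from this Zhihu article:

Machine Learning in Action: kNN Classification (机器学习实战之 kNN 分类): https://zhuanlan.zhihu.com/p/23191325

That article has quite a few problems. Drawing on the comments under it and some experiments of my own, I made a few small improvements. There may well be other optimizations that I am not aware of.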

Issue 1: First, there is no random.normal; it should be np.random.normal (with numpy imported as np).

My revised code is as follows:

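from sklearn import neighbors
from sklearn import preprocessing
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Class 1: 200 points, x ~ N(50, 6), y ~ N(5, 0.5)
x1 = np.random.normal(50,6,200)
y1 = np.random.normal(5,0.5,200)

# Class 2: 200 points, x ~ N(30, 6), y ~ N(4, 0.5)
x2 = np.random.normal(30,6,200)
y2 = np.random.normal(4,0.5,200)

# Class 3: 200 points, x ~ N(45, 6), y ~ N(2.5, 0.5)
x3 = np.random.normal(45,6,200)
y3 = np.random.normal(2.5,0.5,200)

# Join the three classes into one array per feature (600 values each)
x_val = np.concatenate((x1,x2,x3))
y_val = np.concatenate((y1,y2,y3))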

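Issue 2: Data normalization.

A comment from user yellow:

There is a problem with the min-max normalization: in x_normalized = [x/(x_diff) for x in x_val], the minimum should be subtracted from x first. sklearn.preprocessing.MinMaxScaler does this quickly.

However, MinMaxScaler requires 2-D input, so np.expand_dims is needed to add an axis. Afterwards I use flatten to bring the result back down to 1-D.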
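To make the point in that comment concrete, here is a minimal sketch on a toy array (the values and the names x_toy, x_wrong, x_manual and x_scaled are illustrative assumptions, not the article's data). It compares dividing by the range only, the subtract-the-minimum formula, and MinMaxScaler with the expand_dims/flatten round trip described above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 1-D feature (illustrative values only).
x_toy = np.array([20.0, 30.0, 50.0, 60.0])

# Flawed version: divides by the range without subtracting the minimum,
# so the result does not land in [0, 1].
x_wrong = x_toy / (x_toy.max() - x_toy.min())

# Min-max normalization: subtract the minimum, then divide by the range.
x_manual = (x_toy - x_toy.min()) / (x_toy.max() - x_toy.min())

# MinMaxScaler expects shape (n_samples, n_features), so add a column axis
# first and flatten the result back to 1-D afterwards.
x_scaled = MinMaxScaler().fit_transform(np.expand_dims(x_toy, axis=1)).flatten()

print(x_wrong)
print(x_manual)
print(x_scaled)
```

On this toy input the hand-written formula and MinMaxScaler should agree, while the flawed version starts above 0 and ends above 1.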

Issue 3: Using the zip function.

In Python 3, zip returns an iterator rather than a list, so it has to be wrapped in list().
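A small sketch of why the list() wrapper matters, using made-up values (xs, ys and the other names are illustrative): a zip object can be consumed only once, whereas a list can be reused and indexed.

```python
xs = [0.1, 0.2, 0.3]
ys = [0.9, 0.8, 0.7]

pairs = zip(xs, ys)         # lazy iterator in Python 3
first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # iterator is already exhausted, so this is empty

points = list(zip(xs, ys))  # materialized list of (x, y) tuples, safe to reuse

print(first_pass)
print(second_pass)
print(points[0])
```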

My revised code is as follows:

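# MinMaxScaler needs 2-D input: reshape (600,) to (600, 1)
x_val = np.expand_dims(x_val, axis=1)
y_val = np.expand_dims(y_val, axis=1)

scaler = preprocessing.MinMaxScaler()

# The same scaler object is refit for each feature, so after these two
# calls it only remembers the min/max of y_val
x_normalized = scaler.fit_transform(x_val)
y_normalized = scaler.fit_transform(y_val)
# Flatten back to 1-D and pair up the (x, y) coordinates
xy_normalized = list(zip(x_normalized.flatten(),y_normalized.flatten()))

# Class labels in the same order as the concatenated data
labels = [1]*200+[2]*200+[3]*200

# k-nearest-neighbors classifier with k = 10
clf=neighbors.KNeighborsClassifier(n_neighbors=10)
clf.fit(xy_normalized,labels)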
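As a possible simplification (a sketch, not what the code above does): MinMaxScaler rescales each column independently, so the two features could be stacked into one (n_samples, 2) array and handled by a single scaler, without expand_dims, flatten or zip. The names rng, x_demo, y_demo, xy_demo and xy_scaler, and the fixed seed, are assumptions made for this illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)   # fixed seed, only so the demo is repeatable
x_demo = rng.normal(50, 6, 20)
y_demo = rng.normal(5, 0.5, 20)

# One (n_samples, 2) matrix; each column is scaled with its own min and max.
xy_demo = np.column_stack((x_demo, y_demo))
xy_scaler = MinMaxScaler()
xy_scaled = xy_scaler.fit_transform(xy_demo)

# For comparison: scale each feature separately, as in the code above.
x_sep = MinMaxScaler().fit_transform(x_demo.reshape(-1, 1)).flatten()
y_sep = MinMaxScaler().fit_transform(y_demo.reshape(-1, 1)).flatten()

print(np.allclose(xy_scaled[:, 0], x_sep), np.allclose(xy_scaled[:, 1], y_sep))
```

With this layout a single fitted scaler keeps the statistics of both features, so it would not need to be refit before transforming new data.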

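Issue 4: Normalizing the test set

Be careful with MinMaxScaler here: the test set has to be normalized using the statistics of the training set. So the differences between fit, transform and fit_transform matter: fit learns the minimum and maximum from the data, transform applies the values already learned, and fit_transform does both on the same data.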
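The difference can be seen on a tiny made-up example (train_demo, test_demo, demo_scaler and the values are illustrative assumptions). Fitting on the training data and then calling transform on the test data reuses the training minimum and maximum, so test values may fall outside [0, 1]; calling fit_transform on the test data would silently rescale it with its own statistics.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_demo = np.array([[10.0], [20.0], [30.0]])
test_demo = np.array([[5.0], [20.0], [40.0]])

demo_scaler = MinMaxScaler()
demo_scaler.fit(train_demo)                      # learns min=10, max=30 from training data
train_scaled = demo_scaler.transform(train_demo)
test_scaled = demo_scaler.transform(test_demo)   # reuses the training min/max

# What to avoid: refitting on the test set uses the test min=5 and max=40 instead.
test_refit = MinMaxScaler().fit_transform(test_demo)

print(test_scaled.ravel())
print(test_refit.ravel())
```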

# Test data: 50 points per class, drawn from the same distributions as the training data
x1_test = np.random.normal(50, 6, 50)
y1_test = np.random.normal(5, 0.5, 50)

x2_test = np.random.normal(30,6,50)
y2_test = np.random.normal(4,0.5,50)

x3_test = np.random.normal(45,6,50)
y3_test = np.random.normal(2.5, 0.5, 50)

x_test_val = np.concatenate((x1_test,x2_test,x3_test))
y_test_val = np.concatenate((y1_test,y2_test,y3_test))

# MinMaxScaler needs 2-D input
x_test_val = np.expand_dims(x_test_val, axis=1)
y_test_val = np.expand_dims(y_test_val, axis=1)

# Fit on the training x values, then apply that scaling to the test x values
scaler.fit(x_val)
x_test_normalized = scaler.transform(x_test_val)
# Same for y: fit on the training y values, transform the test y values
scaler.fit(y_val)
y_test_normalized = scaler.transform(y_test_val)

xy_test_normalized = list(zip(x_test_normalized.flatten(),y_test_normalized.flatten()))

labels_test = [1]*50+[2]*50+[3]*50
# Mean accuracy on the test set
print(clf.score(xy_test_normalized, labels_test))
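For a classifier, score reports the mean accuracy, that is, the fraction of test points whose predicted label matches the given label. A minimal sketch on hand-made points (knn_demo, X_demo, X_new and the labels are illustrative assumptions, with one test label chosen to disagree on purpose):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Four training points in two well-separated groups.
X_demo = np.array([[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]])
y_demo = np.array([1, 1, 2, 2])
knn_demo = KNeighborsClassifier(n_neighbors=1).fit(X_demo, y_demo)

# Three new points; the last label deliberately disagrees with its nearest neighbor.
X_new = np.array([[0.05, 0.0], [0.95, 1.0], [0.2, 0.2]])
y_new = np.array([1, 2, 2])

pred = knn_demo.predict(X_new)
manual_acc = np.mean(pred == y_new)   # fraction of matching labels

print(pred, manual_acc, knn_demo.score(X_new, y_new))
```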

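Issue 5: Normalizing the data when drawing the plot

With MinMaxScaler, quite a few places in the original article need to change.

First, meshgrid cannot be called directly at the start; the grid coordinates have to be normalized first.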

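# Grid axes in the original (unnormalized) units, reshaped to 2-D for the scaler
xx = np.arange(1,70.1,0.1)
xx = np.expand_dims(xx, axis=1)
yy = np.arange(1,7.01,0.01)
yy = np.expand_dims(yy, axis=1)

# Scale each axis with the statistics of the corresponding training feature
scaler.fit(x_val)
xx_normalized = scaler.transform(xx)
scaler.fit(y_val)
yy_normalized = scaler.transform(yy)

# Build the grid in normalized coordinates and classify every grid point
xxx, yyy = np.meshgrid(xx_normalized, yy_normalized)
coords = np.c_[xxx.ravel(), yyy.ravel()]
Z = clf.predict(coords)
Z = Z.reshape(xxx.shape)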

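Then, for pcolormesh, the coordinate arrays have to be in the original units so that the background lines up with the scatter points. There are two ways to build them: use np.repeat, or call meshgrid once more on the unnormalized axes. The latter should be the more general of the two, because it does not hard-code the grid size.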

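# Background colors for classes 1, 2, 3 (light blue, light red, light green)
light_rgb = ListedColormap([ '#AAAAFF', '#FFAAAA','#AAFFAA'])

# Option 1: np.repeat, with the grid size (601 rows, 691 columns) hard-coded
#xx1 = np.repeat(np.transpose(xx), 601, axis=0)
#yy1 = np.repeat(yy, 691, axis=1)

# Option 2: a second meshgrid on the unnormalized axes
xx1, yy1 = np.meshgrid(xx, yy)

# shading='auto' is stated explicitly because xx1, yy1 and Z have the same shape
plt.pcolormesh(xx1, yy1, Z, cmap=light_rgb, shading='auto')
plt.scatter(x1,y1,c='b',marker='s',s=10,alpha=0.8)
plt.scatter(x2,y2,c='r', marker='^', s=10, alpha=0.8)
plt.scatter(x3,y3, c='g', s=10, alpha=0.8)
plt.axis((10, 70, 1, 7))
plt.show()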
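The np.repeat and meshgrid routes mentioned for pcolormesh can be checked against each other on a small grid. This is a sketch with made-up axes (xs_demo, ys_demo and the other names are illustrative); len() is used in place of the hard-coded 601 and 691.

```python
import numpy as np

xs_demo = np.expand_dims(np.arange(1, 4), axis=1)     # shape (3, 1): 1, 2, 3
ys_demo = np.expand_dims(np.arange(10, 12), axis=1)   # shape (2, 1): 10, 11

# Option 1: np.repeat, one row per y value and one column per x value
xx_rep = np.repeat(np.transpose(xs_demo), len(ys_demo), axis=0)   # shape (2, 3)
yy_rep = np.repeat(ys_demo, len(xs_demo), axis=1)                 # shape (2, 3)

# Option 2: meshgrid works out the shapes by itself
xx_mesh, yy_mesh = np.meshgrid(xs_demo, ys_demo)

print(np.array_equal(xx_rep, xx_mesh), np.array_equal(yy_rep, yy_mesh))
```

Both routes give the same coordinate arrays; meshgrid just avoids having to know the grid size in advance.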

The final output figure is shown below: