KNN code implementation
Approach
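After choosing K, compute the Euclidean distance between the target and every sample in the dataset. Sort the distances in ascending order (nearest first), then count the labels of the first K samples; the most frequent label among them is the predicted label of the target.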
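In formula form, the distance and the majority vote look like this. The notation is introduced here for illustration: x is the target, x_i the i-th sample with n features, y_i its label, and N_k(x) the indices of the k samples closest to x.

$$
d(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{n} \left(x_j - x_{ij}\right)^2}
$$

$$
\hat{y} = \arg\max_{c} \sum_{i \in N_k(\mathbf{x})} \mathbf{1}\left[y_i = c\right]
$$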
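```python
import numpy as np

def kn_classify(intx, dataset, labels, k):  # k-NN classifier
    datasetsize = dataset.shape[0]
    # Repeat the target once per sample, then subtract the dataset row by row
    diffmat = np.tile(intx, (datasetsize, 1)) - dataset
    sqdiffmat = diffmat ** 2
    sqdistance = sqdiffmat.sum(axis=1)
    distance = sqdistance ** 0.5  # Euclidean distance from the target to every sample
    sorteddistance = distance.argsort()  # sample indices, nearest first
    classcount = {}
    for i in range(k):  # count the labels of the k nearest samples
        numlabel = labels[sorteddistance[i]]
        classcount[numlabel] = classcount.get(numlabel, 0) + 1
    predecidelabels = classcount.items()
    # Sort labels by vote count, highest first; the top one is the prediction
    decidelabels = sorted(predecidelabels, key=lambda x: x[1], reverse=True)
    return decidelabels[0][0]
```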
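The same steps can be sketched more compactly with NumPy broadcasting instead of `np.tile` and `collections.Counter` instead of the hand-built vote dictionary. This is a sketch, not the author's code: the name `knn_classify` and the four-point toy dataset are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_classify(intx, dataset, labels, k):
    """Hypothetical vectorised variant of kn_classify."""
    dataset = np.asarray(dataset, dtype=float)
    # Broadcasting subtracts the target from every row, no np.tile needed
    distance = np.sqrt(((dataset - np.asarray(intx, dtype=float)) ** 2).sum(axis=1))
    nearest = distance.argsort()[:k]  # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)
    # most_common keeps first-seen order on ties, so the closer label wins a tie
    return votes.most_common(1)[0][0]

# Hypothetical toy data, not the dataset used in this post
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0.0, 0.0], group, labels, 3))  # B
```

Ties are broken the same way as in `kn_classify`: both `sorted(..., reverse=True)` and `Counter.most_common` are stable, so the label first reached among the nearest samples wins.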
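Scatter plot
First map each label to a color, then split the dataset into its x and y coordinates, draw a 2-D scatter plot, and annotate each point with its label.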
```python
import numpy as np
import matplotlib.pyplot as plt

def makematlab(intx, dataset, labels):  # scatter plot
    plt.rcParams['font.sans-serif'] = ['SimHei']  # show Chinese labels correctly (needs the SimHei font)
    plt.rcParams['axes.unicode_minus'] = False  # show the minus sign correctly with SimHei
    choosecolors = ['r', 'g', 'b', 'y', 'c', 'm', 'k']  # distinct colors, one per label
    labelcolors = {}
    # sorted() keeps the label-to-color mapping the same on every run
    for count, label in enumerate(sorted(set(labels))):
        labelcolors[label] = choosecolors[count % len(choosecolors)]
    plt.figure(1)
    x = [i[0] for i in dataset]
    y = [i[1] for i in dataset]
    for i in range(len(x)):
        plt.scatter(x[i], y[i], color=labelcolors[labels[i]], label=labels[i])
        plt.text(x[i], y[i], labels[i], fontsize=12)  # write the label next to the point
    plt.scatter(intx[0], intx[1], color='gray')  # the target to classify
    plt.xticks(np.arange(np.min(x) - 1, np.max(x) + 1, step=0.5))  # tick marks
    plt.yticks(np.arange(np.min(y) - 1, np.max(y) + 1, step=0.5))
    plt.title("Scatter distribution")
    plt.show()
```
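One possible alternative, sketched under the assumption that a legend is wanted: group the points by label first and call `plt.scatter` once per label, so Matplotlib's color cycle assigns the colors and each label appears only once in the legend. The helper names `group_by_label` and `plot_by_label` are hypothetical.

```python
from collections import defaultdict

def group_by_label(dataset, labels):
    """Hypothetical helper: {label: ([x, ...], [y, ...])}."""
    groups = defaultdict(lambda: ([], []))
    for (px, py), label in zip(dataset, labels):
        groups[label][0].append(px)
        groups[label][1].append(py)
    return dict(groups)

def plot_by_label(intx, dataset, labels):
    """Hypothetical plotting helper: one scatter call, and one legend entry, per label."""
    import matplotlib.pyplot as plt  # imported here so group_by_label works without matplotlib
    for label, (xs, ys) in sorted(group_by_label(dataset, labels).items()):
        plt.scatter(xs, ys, label=label)  # the default color cycle picks a new color per call
    plt.scatter(intx[0], intx[1], color='gray', marker='x', label='target')
    plt.legend()
    plt.show()
```

`plot_by_label` takes the same arguments as `makematlab`.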
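Normalization
When one feature's values are much larger than the others', that feature dominates the distance and therefore the result. Min-max normalization fixes this by rescaling every feature to the range 0 to 1.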
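For feature j, min-max normalization is the standard formula below, where min_j and max_j are the smallest and largest values of that feature. In `Guiyihua` they are taken over the dataset together with the target, so the target also lands in [0, 1].

$$
x'_{j} = \frac{x_{j} - \min_j}{\max_j - \min_j}
$$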
```python
import numpy as np

def Guiyihua(dataset, intx):  # min-max normalization
    # Work on float copies: writing fractions back into an integer array would truncate them
    dataset = np.array(dataset, dtype=float)
    intx = np.array(intx, dtype=float)
    # Include the target when computing min and max, so it is scaled to [0, 1] as well
    allpoints = np.vstack([dataset, intx])
    minvals = allpoints.min(axis=0)
    rangeminmax = allpoints.max(axis=0) - minvals
    rangeminmax[rangeminmax == 0] = 1  # constant feature: avoid division by zero
    dataset = (dataset - minvals) / rangeminmax
    intx = (intx - minvals) / rangeminmax
    return dataset, intx
```
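To illustrate the claim that a large-valued feature can dominate the distance, here is a small sketch on hypothetical toy data (not the post's dataset): with 1-nearest-neighbor, the raw distances are driven almost entirely by the feature in the hundreds, and normalization changes the prediction. The helpers `minmax` and `nearest_label` are hypothetical; `minmax` mirrors the normalization above.

```python
import numpy as np

def minmax(dataset, intx):
    """Hypothetical helper: min-max scale each column, min/max including the target."""
    allpoints = np.vstack([dataset, intx]).astype(float)
    lo = allpoints.min(axis=0)
    span = allpoints.max(axis=0) - lo
    span[span == 0] = 1  # constant column: avoid division by zero
    return (dataset - lo) / span, (np.asarray(intx, dtype=float) - lo) / span

def nearest_label(intx, dataset, labels):
    """Hypothetical helper: 1-nearest-neighbor prediction with Euclidean distance."""
    distance = np.sqrt(((dataset - intx) ** 2).sum(axis=1))
    return labels[int(distance.argmin())]

# Hypothetical toy data: feature 0 lies in [0, 1], feature 1 in the hundreds
dataset = np.array([[0.0, 500.0], [1.0, 520.0]])
labels = ['A', 'B']
intx = np.array([0.0, 515.0])

raw = nearest_label(intx, dataset, labels)
normdata, normx = minmax(dataset, intx)
scaled = nearest_label(normx, normdata, labels)
print(raw, scaled)  # B A
```

Before scaling, the distances are 15 to A and about 5.1 to B, so B wins because of feature 1 alone; after scaling, they are 0.75 to A and about 1.03 to B.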
Results
Before normalization:
The predicted label is B.
After normalization:
The predicted label is B.