The kNN Algorithm (1)

东方欲晓888

Article link: https://blog.csdn.net/qq_31842721/article/details/80804765

1. Basic idea of the algorithm:

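Start with a set of sample data in which every sample has a feature vector and a class label. For a test point, compute its distance to every sample (Euclidean distance is the simplest choice), take the k samples with the smallest distances, count how many of them belong to each class, sort the counts in descending order, and return the class in first place.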
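
The steps above can be sketched with the standard library alone. This is only a minimal illustration, not the implementation developed later in this post; the function name `knn_predict` and the toy data are assumptions made for the example, and `math.dist` requires Python 3.8 or newer.

```python
import math
from collections import Counter

def knn_predict(query, samples, labels, k):
    """Hypothetical helper: majority vote among the k nearest samples."""
    # Euclidean distance from the query to every sample, paired with that sample's label
    dists = [(math.dist(query, s), lab) for s, lab in zip(samples, labels)]
    # keep the k pairs with the smallest distance
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # count the labels among those neighbours and return the most frequent one
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]

samples = [(1.0, 1.1), (0.8, 0.8), (0.2, 0.2), (0.4, 0.4)]
labels = ['A', 'A', 'B', 'B']
print(knn_predict((0.9, 0.9), samples, labels, 3))  # A
```

With k=3 the three nearest samples to (0.9, 0.9) are two 'A' points and one 'B' point, so the vote goes to 'A'.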

2. Implementation (in Python):

Prerequisites:

the numpy and operator modules

numpy:

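1) shape

shape is an attribute (not a function) of <class 'numpy.ndarray'> that holds the dimensions of the array. For a matrix, shape[0] is the number of rows and shape[1] is the number of columns.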
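
A short check of `shape` on a 2x3 array; the array `a` is just an illustrative value. Note that `shape` is indexed with square brackets because it is a tuple.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)     # (2, 3)
print(a.shape[0])  # 2, the number of rows
print(a.shape[1])  # 3, the number of columns
```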

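2) tile function

This function lives in the numpy namespace. It builds a new array by repeating the input a given number of times along each axis.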
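
As a small illustration of `tile` (the values are made up for the example), repeating a 2-element point 4 times along the rows and once along the columns gives a 4x2 array, which is the same expansion kNN needs for its test point:

```python
import numpy as np

point = [0.9, 0.9]
# reps=(4, 1): 4 copies stacked vertically, 1 copy horizontally
tiled = np.tile(point, (4, 1))
print(tiled.shape)  # (4, 2)
print(tiled)

# a scalar reps repeats along the last axis
print(np.tile([1, 2], 3))  # [1 2 1 2 1 2]
```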

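3) sum function

Available both as a method of <class 'numpy.ndarray'> and as a function in the numpy namespace. It sums an array along its rows or its columns and returns a <class 'numpy.ndarray'>.

axis=0 sums down each column (one total per column); axis=1 sums across each row (one total per row).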
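
The effect of the axis argument is easiest to see on a small array (the values are illustrative only):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
col_totals = a.sum(axis=0)  # one total per column: 5, 7, 9
row_totals = a.sum(axis=1)  # one total per row: 6, 15
print(col_totals)
print(row_totals)
print(a.sum())              # no axis: total of all elements, 21
```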

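4) argsort function

Available both as a method of <class 'numpy.ndarray'> and as a function in the numpy namespace. It returns the indices that would sort the array, not the sorted values.

argsort(a, axis=-1, kind='quicksort', order=None): the result is always in ascending order; for descending order, sort the negated array with argsort(-a). For a 2-D array, axis=0 sorts down each column and axis=1 sorts along each row.

To get the sorted values, index the array with the result: a[a.argsort()] (for a 1-D array).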
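
A small example of `argsort` on a 1-D array of made-up distances, showing the ascending indices, the sorted values obtained by indexing, and the negation trick for descending order:

```python
import numpy as np

d = np.array([0.9, 0.1, 0.5])
order = d.argsort()          # indices for ascending order
print(order)                 # [1 2 0]
print(d[order])              # the sorted values: [0.1 0.5 0.9]

desc = (-d).argsort()        # negate to get descending order
print(desc)                  # [0 2 1]
```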

5) sorted function

A Python built-in function.

>>> L=[('a', 1), ('b', 3), ('c', 2)]

>>> sorted(L,key=lambda x:x[1])

[('a', 1), ('c', 2), ('b', 3)]

>>> sorted(L,key=lambda x:x[1],reverse=True)

[('b', 3), ('c', 2), ('a', 1)]
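
The operator module listed in the prerequisites is used for the same purpose: `operator.itemgetter(1)` builds a function that returns element 1 of its argument, so it can stand in for `lambda x: x[1]` as the sort key. A brief sketch with the same list:

```python
import operator

L = [('a', 1), ('b', 3), ('c', 2)]
get_count = operator.itemgetter(1)   # behaves like lambda x: x[1]
print(get_count(('b', 3)))           # 3

by_count = sorted(L, key=get_count, reverse=True)
print(by_count)                      # [('b', 3), ('c', 2), ('a', 1)]
```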

from numpy import array, tile
import operator
import matplotlib.pyplot as plt
def getDataSet():

    groups=array([[1.0,1.1],[0.8,0.8],[0.2,0.2],[0.4,0.4]])

    labels=['A','A','B','B']

    return groups,labels

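'''
Classify a test point with the kNN algorithm.

'''
def classify0(Intx,DataSet,Labels,k):
     # Compute the distance between every row of DataSet and the test point.
     # To use array arithmetic, first expand the test point from shape (1, n) to (rowSize, n).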
     rowSize=DataSet.shape[0]
     print('rowsize=',rowSize)
     extendIntx=tile(Intx,(rowSize,1))
     print(DataSet-extendIntx)
     # ((a1-b1)^2+(a2-b2)^2)^(1/2)

     distances=(DataSet-extendIntx)**2

     print(distances)

     print(distances.sum(1))
     distances=(distances.sum(1))**0.5
     print(distances)
     # Sort: indices of the samples in order of increasing distance
     distancesIndex=distances.argsort()
     print(distancesIndex)
     # Take the labels of the k nearest samples and count how often each one occurs
     counts={}
     for i in range(k):

         label=Labels[distancesIndex[i]]


         # Equivalent, more explicit form of the line below:
         # if  not label in counts:
         #     counts[label]=1
         # else:
         #     counts[label] = counts[label]+1
         counts[label]=counts.get(label,0)+1

     print(counts)
     afterSorted=sorted(counts.items(),key=operator.itemgetter(1),reverse=True)

     #print(afterSorted[0][0])

     return afterSorted[0][0]


groups,labels=getDataSet()

# Test point to classify (alternative test point: Indx=[0.3,0.5])
Indx=[0.9,0.9]

result=classify0(Indx,groups,labels,3)

print('knn=',result)

#fd=DataFrame(groups,columns=labels)

x=[record[0]  for record in groups]
y=[record[1]  for record in groups]



plt.figure('kNN classification')
ax=plt.gca()
ax.set_xlabel('x')
ax.set_ylabel('y')


colors=[ ord(record)+(ord(record)-65)*10 for record  in labels ]


area=[20 for record  in labels]


# Add the test point to the plot: Indx[0] is its x coordinate, Indx[1] its y coordinate
x.append(Indx[0])
y.append(Indx[1])


colors.append(ord(result)+(ord(result)-65)*10)
area.append(50)
print(x,y)
print(colors)

plt.scatter(x,y,s=area,c=colors)
plt.show()
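
As a possible variation on classify0, NumPy broadcasting can subtract the test point from every row directly, so the tile step is not strictly needed. The sketch below is an assumption-laden alternative, not the version used above: the name `classify_broadcast` is hypothetical, and `collections.Counter` replaces the dictionary plus sorted combination.

```python
import numpy as np
from collections import Counter

def classify_broadcast(x, data, labels, k):
    """Hypothetical variant of classify0 that relies on broadcasting instead of tile."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    # (m, n) - (n,) broadcasts x across every row; then the Euclidean distance per row
    distances = np.sqrt(((data - x) ** 2).sum(axis=1))
    # indices of the k nearest samples
    nearest = distances.argsort()[:k]
    # majority vote among their labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

data = np.array([[1.0, 1.1], [0.8, 0.8], [0.2, 0.2], [0.4, 0.4]])
data_labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0.9, 0.9], data, data_labels, 3))  # A
```

Both versions compute the same distances; broadcasting simply avoids materialising the repeated copy of the test point.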