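Advantage of kNN: it is insensitive to outliers, because the class of a point is decided by a majority vote among its neighbors.
Disadvantage: high computational and space complexity. Every training sample must be stored, and classifying a single point means computing its distance to all of them. A larger k only adds to the cost of the vote.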
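The advantage above can be illustrated with a small sketch. The data set, the query point and the helper name `knn_predict` are hypothetical and made up for this illustration. One training point right next to the query carries a wrong (noisy) label. With k = 1 that point alone decides the result. With k = 3 its two correctly labelled neighbors outvote it.

```python
import numpy as np
from collections import Counter

def knn_predict(x, data, labels, k):
    """Plain kNN: majority vote among the k training points closest to x (Euclidean)."""
    dists = np.sqrt(((data - np.asarray(x, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical data: three points around (1, 1) should be class 'A',
# but the one closest to the query was mislabelled 'B' (a noisy label / outlier).
data = np.array([[1.0, 1.0], [1.1, 1.0], [1.0, 1.2], [0.0, 0.0], [0.1, 0.0]])
labels = ['B', 'A', 'A', 'B', 'B']
query = [1.0, 1.05]

print(knn_predict(query, data, labels, k=1))  # prints B: the single noisy neighbor decides
print(knn_predict(query, data, labels, k=3))  # prints A: the noisy label is outvoted
```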
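from numpy import array, tile
import operator

def createDataSet():
    # Four 2-D training points: two near (1, 1) labelled 'A', two near (0, 0) labelled 'B'
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

def classify0(inX, dataSet, labels, k):
    # inX: the point to classify, e.g. [a, b]; dataSet: training samples, one per row;
    # labels: the class label of each row; k: number of nearest neighbors that vote
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training sample
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistance = sqDiffMat.sum(axis=1)
    distance = sqDistance ** 0.5
    # Indices of the training samples, sorted by increasing distance
    sortedDistIndicies = distance.argsort()
    # Majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # In Python 3, dict.iteritems() was replaced by dict.items()
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]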
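In `classify0`, the `tile` call only copies `inX` once per row so that it can be subtracted from `dataSet`. NumPy broadcasting does the same subtraction without building the copy. The following is a sketch of an equivalent function. Its name, `classify_broadcast`, is hypothetical and not part of the original program. It also uses `collections.Counter` for the vote instead of a dict followed by `sorted`.

```python
import numpy as np
from collections import Counter

def classify_broadcast(inX, dataSet, labels, k):
    # Broadcasting subtracts the 1-D point inX from every row of dataSet,
    # which is what tile(inX, (dataSetSize, 1)) - dataSet does explicitly.
    diff = dataSet - np.asarray(inX, dtype=float)
    distance = np.sqrt((diff ** 2).sum(axis=1))
    # Labels of the k nearest training samples
    k_nearest = [labels[i] for i in np.argsort(distance)[:k]]
    # Majority vote; most_common(1) returns [(label, count)]
    return Counter(k_nearest).most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([1, 1], group, labels, 3))    # prints A
print(classify_broadcast([0, 0.2], group, labels, 3))  # prints B
```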
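A simple kNN run: generate the data ourselves and print the result.
Start Python from the command line (or open IDLE) and work in the interactive shell.
First type import kNN to load the program above (it must be saved as kNN.py in the current directory).
Then generate the data with
group,labels = kNN.createDataSet()
Next, call classify0() to classify a point, here [1, 1] with k = 3:
kNN.classify0([1,1],group,labels,3)
The whole session looks like this: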
>>> import kNN
>>> group,labels = kNN.createDataSet()
>>> group
array([[1. , 1.1],
       [1. , 1. ],
       [0. , 0. ],
       [0. , 0.1]])
>>> labels
['A', 'A', 'B', 'B']
>>> kNN.classify0([1,1],group,labels,3)
'A'
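Calling `classify0` once per point is fine for this toy data. To classify several points at once, the distances between all query points and all training samples can be computed in one broadcasting step. The function name `classify_many` and the extra query points are hypothetical, chosen for illustration. In this sketch, ties in the vote are broken alphabetically by `np.unique`, which is not necessarily how `classify0` breaks them.

```python
import numpy as np

def classify_many(queries, dataSet, labels, k):
    queries = np.asarray(queries, dtype=float)
    # (n_queries, 1, d) - (1, n_train, d) -> (n_queries, n_train, d)
    diff = queries[:, None, :] - dataSet[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # shape (n_queries, n_train)
    nearest = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest, per query
    label_arr = np.asarray(labels)
    results = []
    for row in label_arr[nearest]:                 # labels of the k neighbors of one query
        values, counts = np.unique(row, return_counts=True)
        results.append(str(values[np.argmax(counts)]))  # majority label
    return results

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_many([[1, 1], [0, 0], [0.9, 0.8], [0.2, 0.1]], group, labels, 3))
# prints ['A', 'B', 'A', 'B']
```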