A Simple Analysis of a kNN Implementation, Part 1

小君不忧

Categories: numpy, kNN. Tags: numpy, kNN algorithm

Article link: https://blog.csdn.net/wuchuankang/article/details/80301236


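NumPy (Numerical Python) gives Python a multidimensional array object, ndarray, with vectorized arithmetic. It supports multidimensional array and matrix operations and also provides a large library of mathematical functions.

Python has the built-in list, but NumPy works with arrays (vectors, matrices); the array class is called ndarray.

Note: below, "method" means a function defined on the ndarray object, and "function" means a function defined at the numpy module level.

Now let's dissect a simple kNN program: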

import numpy as np
import operator

def createDataset():
    group = np.array([[1.0,1.1], [1.0,1.0], [0,0], [0,0.1]])
    labels = ["A", 'A', 'B', 'B']
    # group and labels are assumed to correspond one to one: [1.0,1.1] is labeled A, and so on

    return group, labels

def classify(inX, dataset, labels, k):
    datasetSize = dataset.shape[0]
    diffMat = np.tile(inX, (datasetSize,1)) - dataset
    sqDiffMat = diffMat **2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    # equivalent to np.argsort(distances)
    classCount = {}

    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        print(voteIlabel)
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        print(classCount[voteIlabel])
    sortedClassCount = sorted(classCount.items(),key=operator.itemgetter(1), reverse=True)              
    print(sortedClassCount)
    return sortedClassCount[0][0]

if __name__ == '__main__':
    group, labels = createDataset()
    print(classify([0,0], group, labels, 3))

Running it prints:

B
1
B
2
A
1
[('B', 2), ('A', 1)]
B

Now let's go through the pieces used in the code above:

1. Creating an array

>>> group = np.array([[1.0,1.1], [1.0,1.0], [0,0], [0,0.1]])
>>> group
array([[1. , 1.1],
       [1. , 1. ],
       [0. , 0. ],
       [0. , 0.1]])

2. The shape attribute:

>>> group.shape
(4, 2)

It returns a tuple, here 4 rows and 2 columns. To get just the number of rows:

>>> group.shape[0]
4

3. The tile(A, reps) function

tile is a function defined in numpy; it builds a larger array by repeating A.

>>> import numpy as np
>>> aa = np.array([[1,2],[3,4]])
>>> aa
array([[1, 2],
       [3, 4]])
>>> bb = np.tile(aa, (2,1))
>>> bb
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])

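reps is a tuple (or other array-like) of repeat counts. reps=(2,1) repeats aa twice along the rows and once along the columns, so the row count doubles and the column count is unchanged. This is very useful in kNN: to compute the distance from a target point to every training point, the target is first tiled into an array with the same shape as the training data, and the rest is done with whole-array arithmetic.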
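The tiling step can also be left to NumPy broadcasting, which stretches a 1-D array across the rows of a 2-D array during subtraction. The sketch below checks that both routes give the same distances. The names train, query, dist_tile and dist_broadcast are made up for this illustration and do not appear in the original program.

```python
import numpy as np

train = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
query = np.array([0.0, 0.0])

# Explicit tiling, as in classify(): copy the query once per training row.
diff_tile = np.tile(query, (train.shape[0], 1)) - train
dist_tile = ((diff_tile ** 2).sum(axis=1)) ** 0.5

# Broadcasting: shape (2,) is stretched against shape (4, 2) automatically.
dist_broadcast = (((query - train) ** 2).sum(axis=1)) ** 0.5

print(np.allclose(dist_tile, dist_broadcast))  # True
```

Tiling makes the shapes explicit, which is helpful when learning; broadcasting avoids materializing the repeated copy.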

4. The sum and mean methods

>>> aa
array([[1, 2],
       [3, 4]])
>>> aa.sum(axis=1)
array([3, 7])
>>> aa.mean(axis=0)
array([2., 3.])

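https://blog.csdn.net/A1518643337/article/details/78266310 discusses axis=0 and axis=1 in more detail:

axis=0 applies the method downward, across the row indices, producing one result per column.
axis=1 applies the method horizontally, across the column indices, producing one result per row.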
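One way to remember the rule is that the axis you name is the one that disappears from the result's shape. A small sketch on a non-square array, where the two directions cannot be confused (the array m and the variable names are chosen only for this example):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

col_sums = m.sum(axis=0)     # collapse the rows: one value per column
row_sums = m.sum(axis=1)     # collapse the columns: one value per row

print(col_sums.tolist(), col_sums.shape)  # [5, 7, 9] (3,)
print(row_sums.tolist(), row_sums.shape)  # [6, 15] (2,)
```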

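5. argsort, available both as a method and as a function

argsort returns the indices (as an ndarray) that would sort the values in ascending order; it returns indices, not the sorted values themselves. The default is axis=-1, the last axis, so a 2-D array is sorted within each row. The dtype=int32 shown in the outputs depends on the platform and NumPy build, and may differ or be omitted elsewhere.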

>>> aa
array([[1, 2],
       [3, 4]])
>>> aa.argsort()
array([[0, 1],
       [0, 1]], dtype=int32)
>>> aa.argsort(axis=0)
array([[0, 0],
       [1, 1]], dtype=int32)
>>> aa.argsort(axis=1)
array([[0, 1],
       [0, 1]], dtype=int32)
>>> np.argsort(aa)
array([[0, 1],
       [0, 1]], dtype=int32)
>>> np.argsort(aa,axis=1)
array([[0, 1],
       [0, 1]], dtype=int32)

The one-dimensional case:

>>> x = np.array([3, 1, 2, 7, 6])
>>> x
array([3, 1, 2, 7, 6])
>>> x.argsort()
array([1, 2, 0, 4, 3], dtype=int32)

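To see where this comes from, write down the index of each element of x; these indices form an array too, call it y. The element 3 sits in the first position, so its index is 0, and so on, giving y = [0, 1, 2, 3, 4]. Sorting the elements of x in ascending order gives [1, 2, 3, 6, 7], and carrying each element's index along with it turns y into [1, 2, 0, 4, 3].

Now consider this problem:

x = np.array([3, 1, 2, 7, 6])

How do we take the three smallest items of x, in ascending order?

Call x.argsort(), keep the first three indices, and use a for loop to read out the values: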

>>> x = np.array([3, 1, 2, 7, 6])
>>> x
array([3, 1, 2, 7, 6])
>>> y = x.argsort()
>>> y
array([1, 2, 0, 4, 3], dtype=int32)
>>> z = y[:3]
>>> z
array([1, 2, 0], dtype=int32)
>>> for i in z:
...     print(x[i])
...
1
2
3
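The loop can also be replaced by indexing x with the index array directly, which NumPy supports for integer arrays. A minimal sketch (k, order and smallest are names introduced here for illustration):

```python
import numpy as np

x = np.array([3, 1, 2, 7, 6])
k = 3

order = x.argsort()        # indices that would sort x in ascending order
smallest = x[order[:k]]    # index x with an array of indices

print(smallest.tolist())   # [1, 2, 3]
```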

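6. The dict get method

dict.get(key, default=None)

get() returns the value stored under the given key, or the default value if the key is not present. In the example below the variable is named dict, which shadows the built-in type; that is harmless in a short interactive session but best avoided in real code.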

>>> dict = {'Name': 'Runoob', 'Age': 27}
>>> print('age=',dict.get('Age'))
age= 27
>>> print('sex=',dict.get('Sex','Na'))
sex= Na

There is no "Sex" key here and the default given is "Na", so the return value is "Na".

The kNN code above puts this to clever use:

    classCount = {}

    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        print(voteIlabel)
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

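First an empty dictionary classCount is created, then the for loop starts. In the first iteration, following the code above,

voteIlabel = labels[sortedDistIndicies[0]]

evaluates to "B".

classCount.get(voteIlabel, 0)

Since classCount has no "B" key yet, this returns the default value 0, so

classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

stores 1.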

Second iteration:

voteIlabel = labels[sortedDistIndicies[1]]

evaluates to "B".

classCount.get(voteIlabel, 0)

classCount now has a "B" key, so this returns its value, 1, and

classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

stores 2.

Third iteration:

voteIlabel = labels[sortedDistIndicies[2]]

evaluates to "A".

classCount.get(voteIlabel, 0)

Since classCount has no "A" key, this returns the default value 0, so

classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

stores 1.

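When the loop ends, classCount = {"B": 2, "A": 1}

This gives the number of times each label ("B", "A") occurs among the k nearest points visited by the loop.

Quite handy, isn't it? Hehe.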
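The standard library's collections.Counter packages the same counting idea. The sketch below is an alternative to the get-based loop, not what the original program does; nearest is a hand-written list holding the positions of the three points closest to [0,0] in group (rows 2, 3 and 1).

```python
from collections import Counter

labels = ["A", "A", "B", "B"]
nearest = [2, 3, 1]   # assumed: indices of the 3 nearest points to [0,0]

# Count how often each label occurs among the nearest points.
votes = Counter(labels[i] for i in nearest)

print(votes)                        # Counter({'B': 2, 'A': 1})
print(votes.most_common(1)[0][0])   # B
```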

7. The dict items method

items() returns an iterable view (dict_items in Python 3) of the dictionary's (key, value) tuples:

>>> dict
{'Name': 'Runoob', 'Age': 27}
>>> dict.items()
dict_items([('Name', 'Runoob'), ('Age', 27)])
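Because each item is a (key, value) tuple, it can be unpacked directly in a loop or handed to functions that take a key argument. A small sketch using the vote counts from the kNN run; picking the winner with max is shown here only as an illustration, the original program sorts instead.

```python
classCount = {'B': 2, 'A': 1}

# Each item unpacks into a label and its count.
for label, count in classCount.items():
    print(label, count)

# max with a key function picks the item with the largest count.
best_label, best_count = max(classCount.items(), key=lambda kv: kv[1])
print(best_label)  # B
```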

8. sorted and operator.itemgetter()

The sort method exists only on lists and sorts in place. sorted accepts any iterable, leaves the original untouched, and returns a new list.

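sorted syntax (Python 3):

sorted(iterable, *, key=None, reverse=False)

Parameters:

iterable -- an iterable object.
key -- a function of one argument that is called on each element of the iterable; the elements are ordered by the values it returns.
reverse -- sort order: reverse=True is descending, reverse=False is ascending (the default).

Python 2 also accepted a cmp argument, a two-argument comparison function returning 1, -1 or 0 for greater, less or equal. Python 3, which the code in this article uses, removed it.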
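In Python 3, sorted takes no cmp argument, but an old-style two-argument comparison function can still be adapted with functools.cmp_to_key. The sketch below shows that it orders the data the same way as a plain key function; by_count and pairs are hypothetical names used only here.

```python
from functools import cmp_to_key

def by_count(a, b):
    # Old-style comparison: negative, zero or positive result.
    return a[1] - b[1]

pairs = [('b', 2), ('a', 1), ('c', 3), ('d', 4)]

with_cmp = sorted(pairs, key=cmp_to_key(by_count))
with_key = sorted(pairs, key=lambda p: p[1])

print(with_cmp == with_key)  # True
```

A key function is usually the simpler choice; cmp_to_key is mainly useful when porting older code.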

>>> L=[('b',2),('a',1),('c',3),('d',4)]
>>> sorted(L, key=lambda x:x[1])
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]

In the kNN code,

sortedClassCount = sorted(classCount.items(),key=operator.itemgetter(1), reverse=True)

can be replaced with

sortedClassCount = sorted(classCount.items(),key=lambda x:x[1], reverse=True)

The same approach also works when the input is a 2-D array; each row is then treated as one element:

>>> a = np.array([[1,2],[3,1],[4,7],[8,5]])
>>> a
array([[1, 2],
       [3, 1],
       [4, 7],
       [8, 5]])
>>> sorted(a, key=operator.itemgetter(1))
[array([3, 1]), array([1, 2]), array([8, 5]), array([4, 7])]
>>> sorted(a, key=lambda x:x[1])
[array([3, 1]), array([1, 2]), array([8, 5]), array([4, 7])]

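Note: key= takes a function. It is applied to each element in turn, and the elements are then arranged in ascending or descending order of the results.

So what is operator.itemgetter()?

operator is a module in Python's standard library. itemgetter() builds a function that fetches the items at the given positions of an object, for example: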

>>> import operator
>>> a = [1,2,3]
>>> b=operator.itemgetter(1)    # define b, which fetches item 1 of an object
>>> b(a)
2
>>> b=operator.itemgetter(1,0)  # define b, which fetches items 1 and 0 of an object
>>> b(a)
(2, 1)

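Note that operator.itemgetter does not fetch a value itself; it returns a function, and the value is only obtained by calling that function on an object.

operator.itemgetter() is commonly used together with sorted: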

>>> L=[('b',2),('a',1),('c',3),('d',4)]
>>> sorted(L, key=operator.itemgetter(1), reverse=True)
[('d', 4), ('c', 3), ('b', 2), ('a', 1)]
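One consequence for the kNN vote is worth spelling out. sorted is stable, even with reverse=True, and dictionaries keep insertion order (guaranteed from Python 3.7), so when two labels tie, the label that entered classCount first, that is, the one belonging to the nearer neighbour, comes out on top. The tie below is a constructed example, not output from the original program.

```python
import operator

# Hypothetical tie: the nearest neighbour voted 'B', the next one 'A'.
classCount = {}
for label in ['B', 'A']:
    classCount[label] = classCount.get(label, 0) + 1

ranked = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
print(ranked)  # [('B', 1), ('A', 1)]
```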
