[Machine Learning in Action] k-Nearest Neighbors Code Walkthrough

Sherbet_Lemon

Article link: https://blog.csdn.net/sq1652827791/article/details/78905898

A walkthrough of Listing 2.1, the k-nearest neighbors algorithm, from Machine Learning in Action.

The source code is as follows:

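    from numpy import *
    import operator
    from os import listdir  # not used by classify0; kept from the original listing

    def classify0(inX, dataSet, labels, k):
        # Number of training samples (rows of dataSet)
        dataSetSize = dataSet.shape[0]
        # Repeat inX once per sample, then subtract to get per-feature differences
        diffMat = tile(inX, (dataSetSize,1)) - dataSet
        sqDiffMat = diffMat**2
        # Sum the squared differences across each row, then take the square root
        sqDistances = sqDiffMat.sum(axis=1)
        distances = sqDistances**0.5
        # Indices of the samples ordered from nearest to farthest
        sortedDistIndicies = distances.argsort()
        classCount={}
        # Tally the labels of the k nearest samples
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
        # Python 3: dict.items() replaces Python 2's dict.iteritems()
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]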
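
To see the function at work, here is a minimal usage sketch. It restates the listing with `dict.items()` so that it runs on Python 3, and the four-point data set (`group`, `labels`) is hypothetical toy data made up for illustration, not something taken from the post.

```python
import operator

import numpy as np


def classify0(inX, dataSet, labels, k):
    """Python 3 port of the listing: majority vote among the k nearest samples."""
    diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(
        classCount.items(), key=operator.itemgetter(1), reverse=True
    )
    return sortedClassCount[0][0]


# Hypothetical toy data: four 2-D points, two per class.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

near_origin = classify0(np.array([0.0, 0.0]), group, labels, 3)
near_ones = classify0(np.array([0.9, 0.9]), group, labels, 3)
print(near_origin, near_ones)  # B A
```

With k = 3, the point at the origin has two "B" neighbours and one "A", so it is labelled "B"; the point near (1, 1) gets "A" by the same reasoning.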

Code walkthrough

numpy.ndarray

An ndarray is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its shape, which is a tuple of N positive integers that specify the sizes of each dimension. The type of items in the array is specified by a separate dtype, one of which is associated with each ndarray.

e.g.:

    import numpy

    x = numpy.array([[1,2,3],[4,5,6]], numpy.int32)
    type(x)  # <class 'numpy.ndarray'>
    x.shape  # (2, 3)
    x.dtype  # dtype('int32')

See the official documentation for details.

numpy.shape

The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

In this example, dataSet.shape[0] gives the number of rows in the training data, i.e. the number of samples.
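
As a quick illustration of how `shape` relates to samples and features, the sketch below uses a hypothetical 4-sample, 2-feature training set (the values and the name `numFeatures` are made up for this example).

```python
import numpy as np

# Hypothetical training set: 4 samples (rows), each with 2 features (columns).
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])

dataSetSize = dataSet.shape[0]  # number of rows, i.e. samples
numFeatures = dataSet.shape[1]  # number of columns, i.e. features

print(dataSet.shape, dataSetSize, numFeatures, dataSet.ndim)  # (4, 2) 4 2 2
```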

numpy.tile

numpy.tile(A,reps)
Construct an array by repeating A the number of times given by reps.

If reps has length d, the result will have dimension of max(d, A.ndim).

If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.

If A.ndim > d, reps is promoted to A.ndim by pre-pending 1’s to it. Thus for an A of shape (2, 3, 4, 5), a reps of (2, 2) is treated as (1, 1, 2, 2).
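
The promotion rules are easier to follow with concrete shapes. The sketch below uses hypothetical inputs to show the `(dataSetSize, 1)` pattern from `classify0` alongside the two other cases described in the documentation.

```python
import numpy as np

inX = np.array([0.5, 0.2])  # shape (2,)

# reps=(3, 1): inX is promoted to shape (1, 2), then repeated 3 times
# along axis 0 and once along axis 1. This is the pattern classify0 uses.
tiled = np.tile(inX, (3, 1))
print(tiled.shape)  # (3, 2)

# A scalar reps repeats along the last axis instead of stacking rows.
flat = np.tile(inX, 2)
print(flat.shape)  # (4,)

# A.ndim > d: for a 2-D input, reps=(2,) is treated as (1, 2).
wide = np.tile(np.array([[1, 2], [3, 4]]), (2,))
print(wide.shape)  # (2, 4)
```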

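Let the input vector inX be (Ax, Ay) and let one training sample be (Bx, By). In this example, inX is repeated dataSetSize times and dataSet is subtracted from the result, which computes (Ax-Bx) and (Ay-By) for every sample. The following diffMat**2 then squares each of the two terms.

sum(axis=1) sums each row; sum(axis=0) sums each column. So here sum(axis=1) yields $(Ax-Bx)^2+(Ay-By)^2$ for each sample, and sqDistances**0.5 then takes the square root, giving the Euclidean distance from inX to that sample.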
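
The distance steps can be traced on a small example. The sketch below uses hypothetical points chosen so that the distances come out as whole numbers, following $d = \sqrt{(Ax-Bx)^2+(Ay-By)^2}$. The last lines show NumPy broadcasting as an alternative to `tile`; that variant is not part of the original listing and is included only for comparison.

```python
import numpy as np

inX = np.array([0.0, 0.0])
dataSet = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])  # hypothetical samples

# One row per sample holding (Ax-Bx, Ay-By).
diffMat = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
sqDiffMat = diffMat ** 2

rowSums = sqDiffMat.sum(axis=1)  # one value per sample: 25, 100, 1
colSums = sqDiffMat.sum(axis=0)  # one value per feature: 45, 81

distances = rowSums ** 0.5       # Euclidean distances: 5, 10, 1
print(distances)

# Alternative (not in the listing): broadcasting stretches inX across
# the rows of dataSet, so tile is not needed.
distances_bc = np.sqrt(((inX - dataSet) ** 2).sum(axis=1))
print(np.allclose(distances, distances_bc))  # True
```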

numpy.argsort

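Return the indices that would sort an array. Given an array, it returns the sequence of indices that would put the array in sorted order.

numpy.argsort(a, axis=-1, kind='quicksort', order=None)
a: the array to sort. axis: the axis along which to sort. kind: the sorting algorithm, one of {'quicksort', 'mergesort', 'heapsort', 'stable'}. order: when the array has named fields, order specifies which field is compared first, which second, and so on.
See the official documentation.
e.g.:

    import numpy

    x = numpy.array([3,1,2])
    numpy.argsort(x)  # array([1, 2, 0])

    x = numpy.array([(1,0),(0,1)], dtype=[('x','<i4'),('y','<i4')])
    numpy.argsort(x,order=('x','y'))  # array([1, 0])
    numpy.argsort(x,order=('y','x'))  # array([0, 1])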
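
In `classify0`, the indices returned by `argsort` are used to look up the labels of the nearest samples. A minimal sketch of that step, with hypothetical distances and labels:

```python
import numpy as np

distances = np.array([1.5, 0.2, 3.0, 0.7])  # hypothetical distances to 4 samples
labels = ["A", "B", "A", "B"]
k = 3

# Sample indices ordered from nearest to farthest.
sortedDistIndicies = distances.argsort()
print(sortedDistIndicies.tolist())  # [1, 3, 0, 2]

# Labels of the k nearest samples, nearest first.
nearest_labels = [labels[i] for i in sortedDistIndicies[:k]]
print(nearest_labels)  # ['B', 'B', 'A']

# Indexing the array with the argsort result yields the sorted array.
print(distances[sortedDistIndicies].tolist())  # [0.2, 0.7, 1.5, 3.0]
```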

operator.itemgetter

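Returns a callable object that fetches the given item(s) from its operand. If multiple items are specified, it returns a tuple of the fetched values.
e.g.:

    from operator import itemgetter

    itemgetter(1)('ABCDEFG')  # 'B'
    itemgetter(1,3,5)('ABCDEFG')  # ('B','D','F')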
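
In the listing, `itemgetter(1)` serves as the sort key: each item of the vote dictionary is a `(label, count)` pair, and index 1 selects the count. A small sketch with a hypothetical tally:

```python
from operator import itemgetter

classCount = {"A": 1, "B": 2}     # hypothetical vote tally
pairs = list(classCount.items())  # [('A', 1), ('B', 2)]

get_count = itemgetter(1)
counts = [get_count(p) for p in pairs]
print(counts)  # [1, 2]

# Sorting by count, largest first, as classify0 does.
ranked = sorted(pairs, key=itemgetter(1), reverse=True)
print(ranked)  # [('B', 2), ('A', 1)]

# itemgetter(1) behaves like this lambda.
same = sorted(pairs, key=lambda p: p[1], reverse=True)
print(ranked == same)  # True
```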

sorted()

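sorted() sorts any iterable and returns a new list.
In Python 2 the signature is sorted(iterable[,cmp[,key[,reverse]]]). iterable is the iterable to sort; cmp is a comparison function of two arguments, both taken from the iterable, which returns 1 if the first is greater, -1 if it is smaller, and 0 if they are equal; key is a function that extracts the value to compare from each element; reverse sets the order: reverse=True sorts in descending order, reverse=False (the default) in ascending order. Python 3 removed the cmp parameter, leaving sorted(iterable, *, key=None, reverse=False); an old-style comparison function can be wrapped with functools.cmp_to_key and passed as key.

e.g.:

    a = [5,7,6,3,4,1,2]
    b = sorted(a)  # b = [1,2,3,4,5,6,7]

    from functools import cmp_to_key
    L=[('b',2),('a',1),('c',3),('d',4)]
    # Python 3 has no cmp argument or cmp() builtin, so wrap the comparison with cmp_to_key
    sorted(L, key=cmp_to_key(lambda x,y: (x[1]>y[1]) - (x[1]<y[1])))
    # [('a',1),('b',2),('c',3),('d',4)]

    students=[('john','A',15),('dave','B',10),('jane','B',12)]
    sorted(students, key=lambda s:s[2],reverse=True)  # sort by age, descending
    # [('john','A',15),('jane','B',12),('dave','B',10)]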
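
One consequence of how `sorted()` works is worth spelling out for `classify0`: the sort is stable, even with `reverse=True`, so items with equal counts keep their original order. Since a dict preserves insertion order (Python 3.7+), a tied vote goes to the tied label that was seen first, i.e. the one with the nearer neighbour. The sketch below uses hypothetical neighbour labels to show this.

```python
import operator

# Hypothetical labels of the k = 4 nearest neighbours, nearest first.
neighbour_labels = ["B", "A", "A", "B"]

classCount = {}
for label in neighbour_labels:
    classCount[label] = classCount.get(label, 0) + 1
# Insertion order is kept: {'B': 2, 'A': 2}

ranked = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
print(ranked)        # [('B', 2), ('A', 2)]
print(ranked[0][0])  # B
```

If this tie-breaking rule is not what you want, choosing an odd k for two-class problems avoids most ties.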
