Python Learning - Machine Learning in Action - ch02 KNN, part 1

从兮

Article link: https://blog.csdn.net/dai_fun/article/details/50960978

I've started working through the book Machine Learning in Action. It seems like a very good book, and it suits me well.

Chapter 2: KNN (k-nearest neighbors), part 1

======================================================================================

A simple implementation of KNN:

KNN.py

from numpy import array, tile
import operator

def creatDataSet():
    group=array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels=['A','A','B','B']
    return group,labels

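def classify(inX,dataSet,labels,k):
    # inX is the point to classify
    # dataSet is the training set
    dataSetSize=dataSet.shape[0]
    # shape is an attribute of a NumPy array: a tuple holding the length of each dimension.
    # shape[0] is the length of the first dimension, i.e. the number of rows.
    diffMat=tile(inX,(dataSetSize,1))-dataSet
    # Difference between the input vector and every training sample
    # tile is a NumPy function (numpy.tile) that repeats an array: tile(A,n) repeats A n times to build a new array.
    # Here tile(inX,(dataSetSize,1)) stacks inX into dataSetSize rows.
    sqDiffMat=diffMat**2
    sqDistances=sqDiffMat.sum(axis=1)
    # With no axis argument, sum() adds up every element; sum(axis=0) sums down each column
    # sum(axis=1) sums across each row vector
    distances=sqDistances**0.5
    # Euclidean distance
    sortedDistIndicies=distances.argsort()
    # argsort returns the indices that would sort the array in ascending order
    classCount={}
    for i in range(k):
        voteILabel=labels[sortedDistIndicies[i]]
        classCount[voteILabel]=classCount.get(voteILabel,0)+1
        # get: if voteILabel is a key in classCount, return classCount[voteILabel]; otherwise return 0
    sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    # sorted(iterable, key=None, reverse=False) in Python 3 (Python 2 also accepted a cmp argument)
    # iterable: any iterable; key: a function that extracts the sort key from each element
    # items() yields the dictionary's (key, value) pairs (iteritems() was the Python 2 equivalent)
    # operator.itemgetter(1) picks element 1 of each pair, i.e. the vote count
    # reverse: False sorts ascending, True sorts descending
    return sortedClassCount[0][0]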
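
To make the NumPy calls inside classify more concrete, here is a small sketch that prints the intermediate values for the sample data and the query point [0,0]. The variable names mirror those in classify and the data is the same as in creatDataSet; the block is only an illustration and is not code from the book.

```python
import numpy as np

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
inX = [0, 0]

# tile stacks inX into one row per training sample, giving shape (4, 2)
tiled = np.tile(inX, (group.shape[0], 1))
diffMat = tiled - group
sqDiffMat = diffMat ** 2

total = sqDiffMat.sum()           # no axis: a single number, the sum of every element
per_column = sqDiffMat.sum(axis=0)  # axis=0: one sum per column
per_row = sqDiffMat.sum(axis=1)     # axis=1: one sum per row (the squared distances)
print(total, per_column, per_row)

distances = per_row ** 0.5
order = distances.argsort()       # indices of the training samples, nearest first
print(order)
```

The last line lists the training samples from nearest to farthest; classify then looks up the labels of the first k of them and counts the votes.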

Statements entered in the IDE to run it:

>>> import os
>>> os.chdir(r"D:\learnPY\MachineLearningPY")
>>> import KNN
>>> group,labels=KNN.creatDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
['A', 'A', 'B', 'B']
>>> KNN.classify([0,0],group,labels,3)
'B'
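
For comparison, the same nearest-neighbor vote can be written more compactly. The sketch below is not the book's code: classify_broadcast is a hypothetical name, and it swaps tile for NumPy broadcasting and the dictionary-plus-sorted step for collections.Counter. It assumes the same inputs as classify.

```python
from collections import Counter

import numpy as np


def classify_broadcast(inX, dataSet, labels, k):
    # Hypothetical helper for illustration, not from the book.
    # Broadcasting subtracts inX from every row, so tile is not needed.
    diff = np.asarray(dataSet) - np.asarray(inX)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k nearest training samples
    nearest = distances.argsort()[:k]
    # Count the labels of those neighbors and return the most common one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0, 0], group, labels, 3))
```

With the sample data this should agree with the session above, returning 'B' for the point [0,0] with k=3.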

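This is a small, simple KNN example. Compared with the book's version, I added a few comments. I recently decided to pick Python back up, and I have forgotten a lot of the basic constructs and functions (not that I really knew them in the first place).

Keep going!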