[Reading Notes] Machine Learning in Action, p. 19, Section 2.1.2 (k-Nearest Neighbors)


AntioniaMao


Category: Machine Learning. Tags: Python, machine learning, algorithms, artificial intelligence

Article link: https://blog.csdn.net/AntioniaMao/article/details/79479121


from numpy import *
import operator


def createDataSet():
    group=array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels=['A','A','B','B']
    return group,labels
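The NumPy operations that classify0 relies on (tile, **, sum(axis=1), argsort) can be tried directly on the toy data from createDataSet. The sketch below is illustrative only: the query point inX = [0, 0] is an assumption chosen for the demo, and the last two lines show that tile with a plain integer keeps a 1-D array 1-D, while reps [1, x] adds a leading axis.

```python
import numpy as np

# Same four points as createDataSet(); inX is a hypothetical query point
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = np.array([0.0, 0.0])

# tile(inX, (4, 1)) stacks 4 copies of inX as rows: shape (4, 2)
tiled = np.tile(inX, (group.shape[0], 1))

# Element-wise difference, then ** 2 squares every entry
sq_diff = (tiled - group) ** 2

# sum(axis=1) adds across each row: one squared distance per sample
sq_dist = sq_diff.sum(axis=1)
distances = sq_dist ** 0.5

# argsort() gives the indices that would sort distances in ascending order
order = distances.argsort()

print(tiled.shape)  # (4, 2)
print(order)        # [2 3 1 0]

# With a plain integer reps a 1-D array stays 1-D; reps [1, x] makes it 2-D
print(np.tile(np.array([1, 2]), 2).shape)       # (4,)
print(np.tile(np.array([1, 2]), [1, 2]).shape)  # (1, 4)
```

The nearest point to [0, 0] is index 2 (itself, distance 0), then index 3, so the two "B" samples come first in the sorted order.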


def classify0(inX,dataSet,labels,k):
    dataSetSize=dataSet.shape[0]
    # shape gives the dimensions of a matrix or array; shape[0] is the number of rows, i.e. training samples


    diffMat=tile(inX,(dataSetSize,1))-dataSet
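    # Signature: numpy.tile(A, reps); builds a new array by repeating A as many times as reps specifies
    # tile(A,[x,y]) gives a 2-D result: A repeated x times down the rows and y times across the columns
    # tile(A,x) repeats a 1-D A x times and stays 1-D; tile(A,[1,x]) holds the same values but with shape (1, x*len(A))
    # here every row of the result is a copy of inX, one per training sample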


    sqDiffMat=diffMat**2
    # ** is the power operator; applied to an array it squares every element


    sqDistances=sqDiffMat.sum(axis=1)
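    # sum() with no axis adds up every element; sum(axis=0) sums each column
    # sum(axis=1) sums each row, giving one squared distance per training sample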


    distances=sqDistances**0.5
    sortedDistIndicies=distances.argsort()
    # argsort() returns the indices that would sort distances in ascending order, so the first index is the nearest sample
    classCount={}


    for i in range(k):
        voteIlabel=labels[sortedDistIndicies[i]]
        classCount[voteIlabel]=classCount.get(voteIlabel,0)+1
    # sort and return only after all k votes are counted; items() replaces the Python 2-only iteritems()
    sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    return sortedClassCount[0][0]
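classify0 computes Euclidean distances with tile, counts votes in a dict, and sorts the (label, count) pairs with operator.itemgetter(1). A more compact sketch of the same k-NN majority vote, which swaps tile for NumPy broadcasting and the dict-plus-sorted step for collections.Counter, might look like this; the function name knn_classify is hypothetical.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Majority vote among the k training rows closest to in_x (Euclidean distance)."""
    # Broadcasting subtracts in_x from every row, so tile() is not needed
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = distances.argsort()[:k]
    # Count one vote per neighbour label
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the top label
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.0, 0.0]), group, labels, 3))  # B
```

On ties, Counter.most_common keeps labels in the order first encountered, and Python's sorted stays stable even with reverse=True, so both approaches favour the label seen first among the nearest neighbours.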


def file2matrix(filename):
    with open(filename) as fr:
        arrayOLines=fr.readlines()
    # readlines() reads every remaining line of the file and returns them as a list of strings
    # "with" closes the file, and reading it once avoids opening it a second time


    numberOfLines=len(arrayOLines)
    returnMat=zeros((numberOfLines,3))
    # create a numberOfLines x 3 matrix of zeros, one row per sample


    classLabelVector=[]
    index=0
    for line in arrayOLines:
        line=line.strip()
        # strip the trailing newline and surrounding whitespace
        listFromLine=line.split('\t')
        # split the line on tab characters
        returnMat[index,:]=listFromLine[0:3]
        # store fields 0, 1 and 2 (the three features) in row index of returnMat
        classLabelVector.append(int(listFromLine[-1]))
        # append the last field of the line, converted to int, to classLabelVector
        index += 1
    return returnMat,classLabelVector
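file2matrix assumes each line holds three tab-separated numeric features followed by an integer class label. Under that same assumption, numpy.loadtxt can do the parsing in one call; the sketch below is an alternative, not the book's code, and it reads a made-up two-line sample from io.StringIO (a filename string such as 'datingTestSet2.txt' can be passed instead). The helper name load_features_and_labels is hypothetical.

```python
import io

import numpy as np


def load_features_and_labels(source):
    """Parse rows of three tab-separated numeric features followed by an integer label."""
    # ndmin=2 keeps the result 2-D even when the input has only one line
    data = np.loadtxt(source, delimiter='\t', ndmin=2)
    features = data[:, :3]
    labels = data[:, -1].astype(int).tolist()
    return features, labels


# Made-up sample in the layout file2matrix assumes (not real dating data)
sample = "10.0\t2.5\t0.5\t1\n20.0\t3.0\t1.5\t2\n"
features, labels = load_features_and_labels(io.StringIO(sample))
print(features.shape)  # (2, 3)
print(labels)          # [1, 2]
```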




# Normalize numeric values (min-max scaling of each feature to [0, 1])
#newValue=(oldValue-min)/(max-min)
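def autoNorm(dataSet):
    minVals=dataSet.min(0)
    # min(0) and max(0) return the column-wise minimum and maximum, one value per feature
    maxVals=dataSet.max(0)
    ranges=maxVals-minVals
    m=dataSet.shape[0]
    normDataSet=dataSet-tile(minVals,(m,1))
    # subtract each column's minimum from every row
    normDataSet=normDataSet/tile(ranges,(m,1))
    # element-wise division by each column's range (a constant column would divide by zero)
    return normDataSet,ranges,minVals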
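The formula newValue = (oldValue - min) / (max - min) can also be written with broadcasting instead of tile. The sketch below adds one guard that autoNorm does not have: a column whose max equals its min gets a divisor of 1, so it maps to all zeros instead of dividing by zero. The function name min_max_normalize and the sample matrix are hypothetical.

```python
import numpy as np


def min_max_normalize(data_set):
    """Scale each column to [0, 1] with (x - min) / (max - min)."""
    min_vals = data_set.min(axis=0)   # per-column minimum
    max_vals = data_set.max(axis=0)   # per-column maximum
    ranges = max_vals - min_vals
    # Guard (not in autoNorm): use 1 where the range is 0 to avoid dividing by zero
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    # Broadcasting replaces tile(minVals, (m, 1)) and tile(ranges, (m, 1))
    return (data_set - min_vals) / safe_ranges, safe_ranges, min_vals


data = np.array([[0.0, 10.0, 5.0], [50.0, 20.0, 5.0], [100.0, 30.0, 5.0]])
norm, ranges, min_vals = min_max_normalize(data)
print(norm)

# A new sample must be scaled with the same min and range before classification
scaled = (np.array([25.0, 15.0, 5.0]) - min_vals) / ranges
print(scaled)
```

Returning the ranges and minimums alongside the normalized matrix, as autoNorm does, is what makes it possible to scale a new input the same way as the training data.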




def datingClassTest():
    hoRatio=0.50
    # hold-out ratio: the fraction of samples used as the test set; the remaining samples form the training set
    datingDataMat,datingLabels=file2matrix('datingTestSet2.txt')
    normMat,ranges,minVals=autoNorm(datingDataMat)
    m=normMat.shape[0]
    numTestVecs=int(m*hoRatio)
    errorCount=0.0
    for i in range(numTestVecs):
        classifierResult=classify0(normMat[i,:],normMat[numTestVecs:m,:],
                                   datingLabels[numTestVecs:m],3)
        print("the classifier came back with: %d, the real answer is: %d" % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))
    print(errorCount)
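datingClassTest is a hold-out evaluation: the first hoRatio of the rows are test vectors, the rest are training data, and the error rate is the fraction of test vectors classified wrongly. A self-contained sketch of that loop on a tiny made-up two-cluster data set is below. The names knn_classify and hold_out_error_rate are hypothetical, and because nothing is shuffled, the split is only meaningful if the rows are not ordered by class.

```python
import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    votes = {}
    for i in distances.argsort()[:k]:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    # max() returns the first label reaching the top count (dict insertion order)
    return max(votes, key=votes.get)


def hold_out_error_rate(features, labels, classify, ho_ratio=0.5, k=3):
    """Test on the first ho_ratio of the rows, train on the remainder."""
    m = features.shape[0]
    num_test = int(m * ho_ratio)
    train_x, train_y = features[num_test:], labels[num_test:]
    errors = sum(
        classify(features[i], train_x, train_y, k) != labels[i]
        for i in range(num_test)
    )
    return errors / num_test


# Made-up data: first five rows are test vectors, last five are training rows
features = np.array([
    [0.05, 0.0], [1.0, 1.2], [0.2, 0.1], [1.2, 1.0], [0.9, 1.0],
    [0.0, 0.0], [0.1, 0.1], [0.0, 0.2], [1.0, 1.0], [1.1, 1.1],
])
labels = [1, 2, 1, 2, 2, 1, 1, 1, 2, 2]
print(hold_out_error_rate(features, labels, knn_classify))  # 0.0
```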

