kNN (2): Advanced

d12155214552
Article link: https://blog.csdn.net/d12155214552/article/details/94720240
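This post contains annotated code that the author wrote while studying the book Machine Learning in Action (《机器学习实战》).

Reading a file into an array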

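from numpy import zeros

def file2matrix(filename):
    # Read every line of the file; the with-block closes the file automatically
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)  # number of lines (samples) in the file
    returnMat = zeros((numberOfLines, 3))  # all-zero matrix: one row per line, 3 feature columns
    classLabelVector = []  # class labels are collected in a list
    index = 0
    for line in arrayOLines:
        line = line.strip()  # strip() removes leading and trailing whitespace, never characters in the middle
        listFromLine = line.split('\t')  # split() cuts the string at the separator, here a tab
        returnMat[index, :] = listFromLine[0:3]  # store the three feature values in the matrix
        classLabelVector.append(int(listFromLine[-1]))  # the last field is the class label
        index += 1
    return returnMat, classLabelVector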
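
To make the expected input format concrete, here is a minimal sketch that parses the same kind of tab-separated data with `numpy.loadtxt` instead of a manual loop. The function name `load_dating_data` and the two sample rows are made up for illustration; they are not taken from the book's data file.

```python
import io
import numpy as np

def load_dating_data(source):
    """Hypothetical alternative to file2matrix: parse tab-separated rows
    of three numeric features followed by an integer class label."""
    data = np.loadtxt(source, delimiter='\t', ndmin=2)
    features = data[:, 0:3]                      # first three columns are the features
    labels = data[:, -1].astype(int).tolist()    # last column is the class label
    return features, labels

# Made-up sample in the assumed format: three features, then a label.
sample = io.StringIO("40920\t8.3\t0.95\t3\n14488\t7.2\t1.67\t2\n")
features, labels = load_dating_data(sample)
print(features.shape)
print(labels)
```

`numpy.loadtxt` accepts a file name as well as a file-like object, so the same function should also work on a path such as 'datingTestSet2.txt' if that file follows this layout.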

Data visualization

import matplotlib.pyplot as plt
from numpy import array

# Load the data first (file2matrix is defined above)
datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')

fig = plt.figure()
# add_subplot(349) would split the canvas into 3 rows and 4 columns and draw in the 9th cell,
# counting left to right, top to bottom. For a single plot, use 111.
ax = fig.add_subplot(111)
# scatter(x, y, s=None, c=None, ...) draws the points:
# x and y are the data, s is the marker size, c is the color.

# Yellow: very attractive; blue: somewhat; purple: not liked
#ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15.0*array(datingLabels),15.0*array(datingLabels))
#plt.show()
ax.scatter(datingDataMat[:, 0], datingDataMat[:, 1], 15.0 * array(datingLabels), 15.0 * array(datingLabels))
plt.show()

Normalizing the data: rescale every feature to the range 0 to 1

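from numpy import tile

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # axis 0: minimum of each column (feature)
    maxVals = dataSet.max(0)  # maximum of each column
    ranges = maxVals - minVals
    m = dataSet.shape[0]  # number of rows (samples)
    normDataSet = dataSet - tile(minVals, (m, 1))  # subtract the column minimum from every row
    normDataSet = normDataSet / tile(ranges, (m, 1))  # element-wise division by the column range
    return normDataSet, ranges, minVals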
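
The function above applies min-max scaling column by column: newValue = (oldValue - min) / (max - min). As a sketch, the same computation can be written with NumPy broadcasting, which makes the `tile` calls unnecessary. The name `auto_norm_broadcast` and the sample matrix are illustrative assumptions, and the sketch assumes that no column is constant (a zero range would produce NaN values).

```python
import numpy as np

def auto_norm_broadcast(data_set):
    """Min-max scale each column of data_set to the range [0, 1]."""
    min_vals = data_set.min(axis=0)            # per-column minimum
    ranges = data_set.max(axis=0) - min_vals   # per-column range
    # Broadcasting stretches the 1-D min_vals and ranges across all rows.
    norm = (data_set - min_vals) / ranges
    return norm, ranges, min_vals

# Made-up data: two features on very different scales.
sample = np.array([[10.0, 1.0],
                   [20.0, 3.0],
                   [30.0, 5.0]])
norm, ranges, min_vals = auto_norm_broadcast(sample)
print(norm)
```

After scaling, both columns run from 0 to 1, so neither feature dominates the distance computation merely because its raw numbers are larger.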

Example: testing the classifier on the dating data

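def datingClassTest():
    hoRatio = 0.1  # fraction of the data held out for testing
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')  # read the file into a matrix
    normMat, ranges, minVals = autoNorm(datingDataMat)  # normalize so every feature lies in 0-1
    m = normMat.shape[0]  # m is the number of rows, i.e. the number of samples
    testNum = int(m * hoRatio)  # number of test samples
    errorNum = 0.0  # running count of misclassified samples
    for i in range(testNum):
        # classify0 is the kNN classifier from the book; it is not defined in this post.
        # The first testNum rows are the test set, the remaining rows are the training set.
        testResult = classify0(normMat[i, :], normMat[testNum:m, :], datingLabels[testNum:m], 3)
        print("the classifier came back with: %d, the real answer is: %d" % (testResult, datingLabels[i]))
        if testResult != datingLabels[i]:
            errorNum += 1
    # %f instead of %d: %d would truncate the error rate to an integer (usually 0)
    print("the total error rate is: %f" % (errorNum / float(testNum)))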

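
`classify0` is called above but is not defined in this post. For readers who do not have it at hand, here is a minimal sketch of a k-nearest-neighbors classifier with the same argument order (query vector, training matrix, labels, k). It assumes Euclidean distance and a simple majority vote; the name `knn_classify` is hypothetical, and the sketch is not necessarily identical to the book's implementation.

```python
from collections import Counter
import numpy as np

def knn_classify(in_x, data_set, labels, k):
    """Return the majority label among the k training rows closest to in_x."""
    # Euclidean distance from the query to every training row
    diffs = np.asarray(data_set, dtype=float) - np.asarray(in_x, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]            # indices of the k smallest distances
    votes = Counter(labels[i] for i in nearest)  # count the labels of those neighbors
    return votes.most_common(1)[0][0]

# Made-up training data: two points near (1, 1) and two near (0, 0).
train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_labels = ['A', 'A', 'B', 'B']
print(knn_classify([0.1, 0.2], train, train_labels, 3))
```

With k = 3 the query's three nearest neighbors are the two 'B' points and one 'A' point, so the vote goes to 'B'.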
Example: predicting how attractive a person is to the lady from his features

from numpy import array

def classifyPerson():
    result = ['not at all', 'a little', 'very much']
    # read the features of the person to classify
    ffMile = float(input("frequent flier miles earned per year?"))
    percentTats = float(input("percentage of time spent playing video games?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    testMat = array([ffMile, percentTats, iceCream])  # feature vector to classify
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    # normalize the input with the training minimum and range before classifying
    testResult = classify0((testMat - minVals) / ranges, normMat, datingLabels, 3)
    print("test person result:", result[testResult - 1])  # labels are 1-3, list indices are 0-2

Reading an image file into a vector

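from numpy import zeros

def img2vector(filename):  # read a 32*32 text image into a 1*1024 vector
    returnVect = zeros((1, 1024))
    with open(filename) as fr:  # the with-block closes the file automatically
        for i in range(32):
            linestr = fr.readline()
            for j in range(32):
                returnVect[0, j + i * 32] = int(linestr[j])  # row i, column j goes to position i*32+j
    return returnVect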
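
To see what `img2vector` expects, the sketch below writes a made-up 32x32 text image of 0/1 characters to a temporary file and flattens it into a 1x1024 vector. The helper name `text_image_to_vector` and the synthetic image are assumptions for illustration; the real digit files are not reproduced here.

```python
import os
import tempfile
import numpy as np

def text_image_to_vector(filename, size=32):
    """Flatten a size x size text image of digit characters into a 1 x size*size vector."""
    vect = np.zeros((1, size * size))
    with open(filename) as fr:
        for i in range(size):
            line = fr.readline()
            for j in range(size):
                vect[0, i * size + j] = int(line[j])
    return vect

# Synthetic image: the first row is all ones, every other row is all zeros.
rows = ['1' * 32] + ['0' * 32] * 31
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(rows) + '\n')
    path = tmp.name

vec = text_image_to_vector(path)
os.remove(path)  # clean up the temporary file
print(vec.shape, int(vec.sum()))
```

The first 32 entries of the vector correspond to the first row of the image, the next 32 to the second row, and so on.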

Example: handwritten digit recognition

from os import listdir
from numpy import zeros

def handwritingClassTest():
    hwLabels = []
    # os.listdir() returns the names of the entries in the given directory.
    # The order is arbitrary, and '.' and '..' are never included.
    trainingFileList = listdir('trainingDigits')  # the order may differ from the order shown in a file browser
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        name = fileName.split('.')[0]  # drop the extension
        label = int(name.split('_')[0])  # the part before '_' is the digit
        hwLabels.append(label)
        trainingMat[i, :] = img2vector('trainingDigits/%s' % fileName)

    error = 0.0
    testFileList = listdir('testDigits')
    n = len(testFileList)
    for j in range(n):
        testFileName = testFileList[j]
        testName = testFileName.split('.')[0]  # split on '.' and keep the first part
        testLabel = int(testName.split('_')[0])
        testMat = img2vector('testDigits/%s' % testFileName)
        result = int(classify0(testMat, trainingMat, hwLabels, 2))

        print("the predict is: %d, the real result is: %d" % (result, testLabel))
        if testLabel != result:
            error += 1
    # %f instead of %d: %d would truncate the error rate to an integer (usually 0)
    print("the error rate is: %f" % (error / float(n)))
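
The label extraction above relies on the file names having the form label_index.txt. As a small sketch of that step in isolation, the helper below uses `os.path.splitext` to drop the extension and then takes the part before the underscore. The name `label_from_filename` and the sample file names are made up for illustration.

```python
import os

def label_from_filename(file_name):
    """Extract the digit label from a name such as '9_45.txt' (assumed format)."""
    stem = os.path.splitext(file_name)[0]   # '9_45.txt' -> '9_45'
    return int(stem.split('_')[0])          # '9_45' -> 9

for name in ['9_45.txt', '0_0.txt']:
    print(name, label_from_filename(name))
```

Converting the label to `int` right away keeps the training labels and the test labels in the same type, so they can be compared directly.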
