Implementing the examples from Machine Learning in Action in Python 3

翰飞~

Article link: https://blog.csdn.net/qq_34774088/article/details/79977417

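This post is my implementation of the kNN algorithm from Machine Learning in Action, together with my understanding of the code.

The dataset is the example used in Machine Learning in Action, and everything is implemented in Python 3. All of the code and data files are on my GitHub.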

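The algorithm measures closeness with the Euclidean distance formula.
This is the distance we use in everyday geometry, and it also works for points with any number of dimensions.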
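
To make the formula concrete: the distance between two points is the square root of the sum of the squared coordinate differences. The sketch below is only an illustration; the helper name `euclidean_distance` and the sample points are made up here and are not part of the book's code.

```python
import numpy as np

def euclidean_distance(a, b):
    """Return the Euclidean distance between two points of equal dimension.

    Hypothetical helper for illustration; it is not part of the book's code.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Two dimensions: the familiar 3-4-5 right triangle.
print(euclidean_distance([0, 0], [3, 4]))              # 5.0
# Four dimensions: the same formula, just more terms in the sum.
print(euclidean_distance([1, 0, 2, 1], [1, 2, 2, 1]))  # 2.0
```

The same expression works unchanged for any number of dimensions, which is why kNN can use it on feature vectors of arbitrary length.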

First, import the packages we need

    from numpy import *
    import operator

Create a small test dataset

    def createDataSet():
        group = array([[1.0, 1.1],[1.0, 1.1],[0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return group, labels

Try this code out

    group, labels = createDataSet()
    print(group)
    print(labels)

Output

    [[1.  1.1]
     [1.  1.1]
     [0.  0. ]
     [0.  0.1]]
    ['A', 'A', 'B', 'B']

The kNN classification algorithm

    def classify0(inX, dataSet, labels, k):
        '''
        inX: the input vector to classify
        dataSet: the training sample set
        labels: the labels of the training samples
        k: the number of nearest neighbors that vote
        '''
        dataSetSize = dataSet.shape[0]  # number of rows (samples) in dataSet
        # tile() stacks inX into dataSetSize identical rows; the training set is then subtracted
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        # compute the distances
        sqDiffMat = diffMat ** 2
        sqDistance = sqDiffMat.sum(axis = 1)
        distances = sqDistance ** 0.5
        print(distances)
        # argsort() returns the indices that sort the distances in ascending order
        sortedDistIndicies = distances.argsort()
        print(sortedDistIndicies)
        classCount = {}

        # let the k samples with the smallest distances vote
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        # sort the labels by vote count, highest first
        sortedClassCount = sorted(classCount.items(), key = operator.itemgetter(1), reverse = True)
        return sortedClassCount[0][0]

Test it with the dataset created above

    print(classify0([0, 0], group, labels, 3))

The output is:

    [1.48660687 1.48660687 0.         0.1       ]
    [2 3 0 1]
    B
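
The steps in classify0 (subtract, square, sum, take the root, sort, vote) can also be written more compactly. The following is a minimal sketch, not the book's code: `knn_classify` is a hypothetical name, NumPy broadcasting is swapped in for `tile()`, and `collections.Counter` is swapped in for the hand-written vote dictionary.

```python
from collections import Counter

import numpy as np

def knn_classify(in_x, data_set, labels, k):
    """Classify in_x by majority vote among its k nearest training points.

    Hypothetical rewrite of classify0, for illustration only.
    """
    data_set = np.asarray(data_set, dtype=float)
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    diff = data_set - np.asarray(in_x, dtype=float)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # argsort() returns indices ordered from smallest to largest distance.
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) yields the label with the highest vote count.
    return votes.most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.1], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0, 0], group, labels, 3))  # B
print(knn_classify([1, 1], group, labels, 3))  # A
```

For the point [0, 0] the three nearest samples carry the labels B, B and A, so B wins the vote, which matches the result above.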

Read a file to build the training dataset

    def file2matrix(filename):
        with open(filename) as fr:  # open the file and close it when done
            arrayOLines = fr.readlines()  # read all lines of the file
        numberOfLines = len(arrayOLines)  # number of lines in the file
        returnMat = zeros((numberOfLines, 3))  # numberOfLines x 3 matrix filled with zeros
        classLabelVector = []
        index = 0
        for line in arrayOLines:
            line = line.strip()
            listFromLine = line.split('\t')
            returnMat[index, :] = listFromLine[0:3]  # the three feature values
            classLabelVector.append(listFromLine[-1])  # the class label of this sample
            index += 1
        return returnMat, classLabelVector
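
file2matrix expects each line to hold three tab-separated feature values followed by a label. The sketch below shows that layout on a throwaway file. It is an illustration under assumptions: `parse_tab_file` is a hypothetical stand-in for file2matrix, and the two sample rows are made up, not taken from datingTestSet.txt.

```python
import os
import tempfile

import numpy as np

def parse_tab_file(filename):
    """Parse lines of 'f1<TAB>f2<TAB>f3<TAB>label' into (features, labels).

    Hypothetical stand-in for file2matrix, for illustration only.
    """
    features, labels = [], []
    with open(filename) as fr:
        for line in fr:
            fields = line.strip().split('\t')
            features.append([float(x) for x in fields[0:3]])
            labels.append(fields[-1])
    return np.array(features), labels

# Made-up rows in the assumed layout.
sample = "100\t1.5\t0.2\tlabelA\n2000\t8.0\t1.1\tlabelB\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

mat, lab = parse_tab_file(path)
os.remove(path)  # clean up the temporary file
print(mat.shape)  # (2, 3)
print(lab)        # ['labelA', 'labelB']
```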

Normalize the data

    def autoNorm(dataSet):
        minVals = dataSet.min(0)  # minimum of each column
        maxVals = dataSet.max(0)  # maximum of each column
        ranges = maxVals - minVals  # range (max minus min) of each column
        normDataSet = zeros(shape(dataSet))  # matrix with the same shape as the dataset
        m = dataSet.shape[0]
        normDataSet = dataSet - tile(minVals, (m, 1))  # subtract the column minimum from every row
        normDataSet = normDataSet / tile(ranges, (m,1))  # divide by the column range
        return normDataSet, ranges, minVals
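
autoNorm rescales each feature with newValue = (oldValue - min) / (max - min), so that a feature with large raw values does not dominate the distance. A minimal sketch of the same calculation follows; the data are made up, and broadcasting is used in place of `tile()`. It assumes no column is constant, since a zero range would cause a division by zero.

```python
import numpy as np

# Made-up data: the two columns have very different scales.
data = np.array([[1000.0, 2.0],
                 [3000.0, 4.0],
                 [5000.0, 10.0]])

min_vals = data.min(axis=0)           # per-column minimum: 1000 and 2
ranges = data.max(axis=0) - min_vals  # per-column range: 4000 and 8
# Broadcasting applies the per-column values to every row.
norm = (data - min_vals) / ranges

# The rows become [0, 0], [0.5, 0.25] and [1, 1].
print(norm)
```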

After this step, every value in normDataSet lies between 0 and 1.

Finally, test the algorithm on the data

    def datingClassTest():
        hoRatio = 0.10  # hold out 10% of the data as test data
        datingDataMat, datingLabels = file2matrix('datingTestSet.txt')  # load the data
        normMat, ranges, minVals = autoNorm(datingDataMat)  # normalize
        m = normMat.shape[0]
        numTestVecs = int(m*hoRatio)  # number of test samples
        errorCount = 0.0
        for i in range(numTestVecs):
            # classify with kNN, using the remaining rows as training data
            classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                         datingLabels[numTestVecs:m], 3)
            print("the classfier came back with: %s, the real answer is: %s"
                  % (classifierResult, datingLabels[i]))
            # count the misclassified samples
            if classifierResult != datingLabels[i]:
                errorCount += 1.0
        print('the total error rate is: %f' % (errorCount/float(numTestVecs)))

    datingClassTest()

Result

    the classfier came back with: largeDoses, the real answer is: largeDoses
    the classfier came back with: smallDoses, the real answer is: smallDoses
    the classfier came back with: didntLike, the real answer is: didntLike
                                    .
                                    .
                                    .
    the classfier came back with: didntLike, the real answer is: didntLike
    the classfier came back with: largeDoses, the real answer is: didntLike
    the total error rate is: 0.050000

The error rate comes out to 5%, so the algorithm performs reasonably well.
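
The test above holds out the first 10% of the rows and trains on the rest. The slicing behind that split can be sketched on its own as follows. The helper name `holdout_split` and the data are made up for illustration, and the approach assumes the rows of the file are in no particular order, since otherwise the first rows would not be a representative test set.

```python
import numpy as np

def holdout_split(data, labels, ho_ratio):
    """Split off the first ho_ratio fraction of rows as the test set.

    Hypothetical helper for illustration; it mirrors the slicing in datingClassTest.
    """
    m = data.shape[0]
    num_test = int(m * ho_ratio)
    return (data[:num_test], labels[:num_test],
            data[num_test:], labels[num_test:])

data = np.arange(20, dtype=float).reshape(10, 2)  # 10 made-up rows
labels = ['x'] * 5 + ['y'] * 5
test_x, test_y, train_x, train_y = holdout_split(data, labels, 0.2)
print(test_x.shape, train_x.shape)  # (2, 2) (8, 2)
```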
