kNN code implementation

richard1230

Article link: https://blog.csdn.net/richard1230/article/details/90484282

Principle

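kNN principle: there is a sample dataset (the training set) in which every sample has a label. When a new, unlabeled sample is input, each of its features is compared with the corresponding features of every sample in the dataset by computing the Euclidean distance. The algorithm then takes the k samples whose features are most similar (closest) and chooses the label by a majority vote among them.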
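A minimal sketch of this principle in plain Python 3.8+ (for `math.dist`), without NumPy. The distance between x and a sample y is d(x, y) = sqrt(Σ_j (x_j − y_j)²). The function name `knn_classify` and its parameters are hypothetical, not the article's code:

```python
import math

def knn_classify(x, samples, labels, k):
    """Return the majority label among the k samples closest to x (Euclidean distance)."""
    # Distance from x to every labeled sample, paired with that sample's label
    dists = [(math.dist(x, s), lab) for s, lab in zip(samples, labels)]
    # Keep the k pairs with the smallest distance
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Count one vote per neighbour label
    votes = {}
    for _, lab in nearest:
        votes[lab] = votes.get(lab, 0) + 1
    # Label with the most votes (on a tie, the one counted first)
    return max(votes, key=votes.get)

samples = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
print(knn_classify([1, 1], samples, labels, 3))    # A
print(knn_classify([0, 0.2], samples, labels, 3))  # B
```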

Code


import numpy as np
import operator

def createDataSet():
    matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])  # feature matrix, one sample per row
    classVector = ['A', 'A', 'B', 'B']                                  # class label of each row
    return matrix, classVector

matrix, classVector = createDataSet()

# The algorithm wrapped in a function
def classify(inX, matrix, classVector, k):  # inX: the sample to classify; k: number of nearest samples that vote
    dataSetSize = matrix.shape[0]  # matrix.shape[0] is the number of rows of matrix; matrix.shape[1] is the number of columns
    diffMat = np.tile(inX, (dataSetSize, 1)) - matrix  # difference between inX and every training sample
    sqDiffMat = diffMat ** 2                           # square each difference
    sqDistances = sqDiffMat.sum(axis=1)                # sum of squares of each row
    distance = sqDistances ** 0.5                      # Euclidean distance from inX to each training sample
    sortDistIndicies = distance.argsort()              # argsort returns the indices that order distance from smallest to largest; note that ALL distances are computed and sorted here
    print(sortDistIndicies)
    classCount = {}                                    # vote dictionary, e.g. {'A': 2, 'B': 1}: keys are classes, values are vote counts
    for i in range(k):
        voteLabel = classVector[sortDistIndicies[i]]   # label of the i-th nearest sample
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1  # dict.get returns 0 if the key is missing, giving e.g. {'A': 2, 'B': 1}; this counting idiom is an important technique that comes up often in real code
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)  # itemgetter(1) sorts by the counts (2, 1 in {'A': 2, 'B': 1}); itemgetter(0) would sort by the keys 'A', 'B'
    # sortedClassCount is a list of (label, count) tuples, highest count first
    return sortedClassCount[0][0]

print(classify([1, 1], matrix, classVector, 3))  # prints [1 0 3 2], then A

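## Summary
# All distances have in fact been computed and sorted in ascending order (argsort's default); k only selects the k shortest distances.
# The votes are then counted in a dictionary and sorted in descending order, so the first entry is the label with the most votes.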
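The count-then-sort voting step can also be written with `collections.Counter` from the standard library. A sketch comparing the two; the helper names `vote_sorted` and `vote_counter` are hypothetical. Because `sorted` is stable (also with `reverse=True`) and `Counter.most_common` orders equal counts by first appearance, both break ties the same way:

```python
import operator
from collections import Counter

def vote_sorted(labels):
    """Article's approach: count with dict.get, then sort (label, count) pairs by count, descending."""
    classCount = {}
    for label in labels:
        classCount[label] = classCount.get(label, 0) + 1
    return sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)[0][0]

def vote_counter(labels):
    """Same vote with Counter; most_common(1) returns [(label, count)]."""
    return Counter(labels).most_common(1)[0][0]

# Labels of the 3 nearest neighbours of [1, 1] (indices 1, 0, 3 of classVector)
print(vote_sorted(['A', 'A', 'B']), vote_counter(['A', 'A', 'B']))  # A A
# A 1-1 tie: both return 'B', the label encountered first
print(vote_sorted(['B', 'A']), vote_counter(['B', 'A']))  # B B
```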

Related test code

import numpy as np
import operator

matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
dataSetSize = matrix.shape[0]    ## matrix.shape[0] is the number of rows of matrix; matrix.shape[1] is the number of columns
classVector = ['A', 'A', 'B', 'B']

print(matrix)
print(dataSetSize)

## This shows how np.tile is used and what it produces
diffMat = np.tile([1,1], (dataSetSize, 1))
diffMat1 = np.tile([1,1], (dataSetSize, 2))

print(diffMat)
# [[1 1]
#  [1 1]
#  [1 1]
#  [1 1]]
print(diffMat1)
# [[1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]
#  [1 1 1 1]]
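In the difference step, `np.tile` is not strictly required: NumPy broadcasting applies a 1-D array of shape `(2,)` to every row of a `(4, 2)` matrix without building the repeated copy. A minimal check, assuming the same `matrix` and the query point `[1, 1]`:

```python
import numpy as np

matrix = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = np.array([1, 1])

# Explicit copy: one row of inX per training sample, shape (4, 2)
tiled = np.tile(inX, (matrix.shape[0], 1)) - matrix
# Broadcasting: inX of shape (2,) is subtracted from each row directly
broadcast = inX - matrix

print(tiled.shape, broadcast.shape)      # (4, 2) (4, 2)
print(np.array_equal(tiled, broadcast))  # True
```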
## Difference
diffMat = np.tile([1, 1], (dataSetSize, 1)) - matrix
print(diffMat)
# [[ 0.  -0.1]
#  [ 0.   0. ]
#  [ 1.   1. ]
#  [ 1.   0.9]]

## Square
sqDiffMat = diffMat ** 2
print(sqDiffMat)
# [[0.   0.01]
#  [0.   0.  ]
#  [1.   1.  ]
#  [1.   0.81]]
## Sum along each row (axis=1) and along each column (axis=0)
sqDistances = sqDiffMat.sum(axis=1)
print(sqDistances)                    ##[0.01 0.   2.   1.81]
sqDistances1 = sqDiffMat.sum(axis=0)
print(sqDistances1)                   ##[2.   1.82]

distance = sqDistances ** 0.5
print(distance)                       ##[0.1        0.         1.41421356 1.3453624 ]

## Get the indices that sort the distances in ascending order

sortDistIndicies = distance.argsort()
print(sortDistIndicies)
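##[1 0 3 2]
##
# [0.1        0.         1.41421356 1.3453624 ]
#   A          A          B           B
#   0          1          2           3   ------> [1 0 3 2] lists these indices in ascending order of distance
## Another example:
# 1. First define an array:
#
# import numpy as np
# x = np.array([1, 4, 3, -1, 6, 9])
# 2. Now look at what argsort() does:
#
# x.argsort()
# The output, call it y, is array([3, 0, 2, 1, 4, 5]).
#
# argsort() orders the elements of x from smallest to largest and returns their indices (positions) in that order. For example, x[3] = -1 is the smallest, so y[0] = 3; x[5] = 9 is the largest, so y[5] = 5.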
##############
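`argsort` sorts all n distances even though only the k smallest are used. For large training sets, `np.argpartition` selects the k smallest in linear time instead of the O(n log n) of a full sort; the k indices it returns are in no particular order, so only those are sorted afterwards. A sketch, assuming the `distance` values printed above:

```python
import numpy as np

distance = np.array([0.1, 0.0, 1.41421356, 1.3453624])
k = 3

# Full sort: every index, in ascending order of distance
print(distance.argsort())  # [1 0 3 2]

# Partial selection: the first k positions hold the k smallest distances (any order)
part = np.argpartition(distance, k - 1)[:k]
# Order only those k indices by distance
nearest = part[np.argsort(distance[part])]
print(nearest)  # [1 0 3]
```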

voteLabel = classVector[sortDistIndicies[0]]
voteLabel1= classVector[sortDistIndicies[1]]
voteLabel2= classVector[sortDistIndicies[2]]

print(voteLabel)
print(voteLabel1)
print(voteLabel2)
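If the labels are stored in a NumPy array, the three lookups above collapse into one fancy-indexing step, and `np.unique(..., return_counts=True)` can count the votes. A sketch under that assumption; note that `np.unique` returns labels in sorted (alphabetical) order, so `argmax` breaks ties alphabetically rather than by distance:

```python
import numpy as np

classVector = np.array(['A', 'A', 'B', 'B'])  # labels as a NumPy array
sortDistIndicies = np.array([1, 0, 3, 2])     # as printed by argsort above
k = 3

# Fancy indexing: labels of the k nearest samples in one step
voteLabels = classVector[sortDistIndicies[:k]]
print(voteLabels)  # ['A' 'A' 'B']

# Count each distinct label and take the most frequent one
labels, counts = np.unique(voteLabels, return_counts=True)
print(labels[counts.argmax()])  # A
```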
classCount = {}
classCount[voteLabel] = classCount.get(voteLabel, 0)+1
## dict.get(key, default) returns the value for key, or default if the key is missing
# d = {'Name': 'Zara', 'Age': 27}
#
# print("Value : %s" % d.get('Age'))
# print("Value : %s" % d.get('Sex', "Never"))
# Output of the example above:
#
# Value : 27
# Value : Never
print(classCount[voteLabel])


print(classCount.items())
### Example of items(): https://www.runoob.com/python/att-dictionary-items.html
# #!/usr/bin/python3
#
# d = {'Google': 'www.google.com', 'Runoob': 'www.runoob.com', 'taobao': 'www.taobao.com'}
#
# print("Dict items : %s" % d.items())
#
# # Iterate over the (key, value) pairs
# for key, values in d.items():
#     print(key, values)
###### Output of the example above (Python 3.7+, where dicts keep insertion order):
#
# Dict items : dict_items([('Google', 'www.google.com'), ('Runoob', 'www.runoob.com'), ('taobao', 'www.taobao.com')])
# Google www.google.com
# Runoob www.runoob.com
# taobao www.taobao.com
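The `(key, value)` pairs from `items()` are what `sorted(..., key=operator.itemgetter(...))` in `classify` works on. A small sketch of the difference between `itemgetter(1)` and `itemgetter(0)`, using an illustrative vote dictionary (the counts are made up for the example):

```python
import operator

classCount = {'A': 1, 'B': 2}
pairs = list(classCount.items())  # [('A', 1), ('B', 2)]

# itemgetter(1): sort by the count, largest first, so the winner comes first
by_count = sorted(pairs, key=operator.itemgetter(1), reverse=True)
# itemgetter(0): sort by the label
by_label = sorted(pairs, key=operator.itemgetter(0))

print(by_count)  # [('B', 2), ('A', 1)]
print(by_label)  # [('A', 1), ('B', 2)]
```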

# Testing classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

dic = {}
dic1={}
dic['A']=dic.get("A",0)+1
print(dic) #{'A': 1}
dic['A']=dic.get("A",0)+1
print(dic)##{'A': 2}

dic['A']=dic.get("A",0)+1
print(dic)#{'A': 3}

dic1['A']=dic1.get("A",0)
print(dic1)      # {'A': 0}
print(dic1['A']) # 0
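The `dic.get(key, 0) + 1` pattern tested above is the usual way to count with a plain dict. The standard library's `collections.defaultdict(int)` is an equivalent that creates a missing key with `int()`, i.e. 0. A sketch comparing the two on the labels of the three nearest neighbours of `[1, 1]`:

```python
from collections import defaultdict

labels = ['A', 'A', 'B']  # labels of the 3 nearest neighbours of [1, 1]

# Plain dict: get() supplies 0 for a missing key
counts_get = {}
for label in labels:
    counts_get[label] = counts_get.get(label, 0) + 1

# defaultdict(int): a missing key is created with value 0 on first access
counts_dd = defaultdict(int)
for label in labels:
    counts_dd[label] += 1

print(counts_get)       # {'A': 2, 'B': 1}
print(dict(counts_dd))  # {'A': 2, 'B': 1}

# Difference: reading a missing key inserts it into a defaultdict, while get() never does
print(counts_get.get('C', 0), 'C' in counts_get)  # 0 False
print(counts_dd['C'], 'C' in counts_dd)           # 0 True
```

This side effect is why `get()` is the safer choice when you only want to read a count.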
