k邻近算法

概念:

思维导图:

相关资料链接:

Python实现聚类算法(三)之总结:http://www.colabug.com/72166.html

scikit-learn学习之K-means聚类算法与 Mini Batch K-Means算法:https://blog.csdn.net/gamer_gyt/article/details/51244850

KNN(三)--KD树详解及KD树最近邻算法:https://blog.csdn.net/app_12062011/article/details/51986805

机器学习:Python实现聚类算法(一)之K-Means(简单):https://www.cnblogs.com/lc1217/p/6893924.html

 

KNN算法和kd树:

  转载博客链接:http://www.cnblogs.com/21207-iHome/p/6084670.html

下面图片中只有三种豆,有三个豆是未知的种类,如何判定他们的种类?

提供一种思路,即:未知的豆离哪种豆最近就认为未知豆和该豆是同一种类。由此,我们引出最近邻算法的定义:为了判定未知样本的类别,以全部训练样本作为代表点,计算未知样本与所有训练样本的距离,并以最近邻者的类别作为决策未知样本类别的唯一依据。但是,最近邻算法明显是存在缺陷的,比如下面的例子:有一个未知形状(图中绿色的圆点),如何判断它是什么形状?

  显然,最近邻算法的缺陷——对噪声数据过于敏感,为了解决这个问题,我们可以可以把未知样本周边的多个最近样本计算在内,扩大参与决策的样本量,以避免个别数据直接决定决策结果。由此,我们引进K-最近邻算法。K-最近邻算法是最近邻算法的一个延伸。基本思路是:选择未知样本一定范围内确定个数的K个样本,该K个样本大多数属于某一类型,则未知样本判定为该类型。如何选择一个最佳的K值取决于数据。一般情况下,在分类时较大的K值能够减小噪声的影响,但会使类别之间的界限变得模糊。待测样本(绿色圆圈)既可能分到红色三角形类,也可能分到蓝色正方形类。如果k取3,从图可见,待测样本的3个邻居在实线的内圆里,按多数投票结果,它属于红色三角形类。但是如果k取5,那么待测样本的最邻近的5个样本在虚线的圆里,按表决法,它又属于蓝色正方形类。在实际应用中,K先取一个比较小的数值,再采用交叉验证法来逐步调整K值,最终选择适合该样本的最优的K值。

  • KNN算法实现 

算法基本步骤:

1)计算待分类点与已知类别的点之间的距离

2)按照距离递增次序排序

3)选取与待分类点距离最小的k个点

4)确定前k个点所在类别的出现次数

5)返回前k个点出现次数最高的类别作为待分类点的预测分类

  下面是一个按照算法基本步骤用python实现的简单例子,根据已分类的4个样本点来预测未知点(图中的灰点)的分类:

from numpy import *  

# create a dataset which contains 4 samples with 2 classes  
def createDataSet(): # create a matrix: each row as a sample group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]]) labels = ['A', 'A', 'B', 'B'] # four samples and two classes return group, labels # classify using kNN (k Nearest Neighbors ) # Input: newInput: 1 x N # dataSet: M x N (M samples N, features) # labels: 1 x M # k: number of neighbors to use for comparison # Output: the most popular class label def kNNClassify(newInput, dataSet, labels, k): numSamples = dataSet.shape[0] # shape[0] stands for the num of row ## step 1: calculate Euclidean distance # tile(A, reps): Construct an array by repeating A reps times # the following copy numSamples rows for dataSet diff = tile(newInput, (numSamples, 1)) - dataSet # Subtract element-wise squaredDiff = diff ** 2 # squared for the subtract squaredDist = sum(squaredDiff, axis = 1) # sum is performed by row distance = squaredDist ** 0.5 ## step 2: sort the distance # argsort() returns the indices that would sort an array in a ascending order sortedDistIndices = argsort(distance) classCount = {} # define a dictionary (can be append element) for i in xrange(k): ## step 3: choose the min k distance voteLabel = labels[sortedDistIndices[i]] ## step 4: count the times labels occur # when the key voteLabel is not in dictionary classCount, get() # will return 0 classCount[voteLabel] = classCount.get(voteLabel, 0) + 1 ## step 5: the max voted class will return maxCount = 0 for key, value in classCount.items(): if value > maxCount: maxCount = value maxIndex = key return maxIndex if __name__== "__main__": dataSet, labels = createDataSet() testX = array([1.2, 1.0]) k = 3 outputLabel = kNNClassify(testX, dataSet, labels, 3) print "Your input is:", testX, "and classified to class: ", outputLabel testX = array([0.1, 0.3]) outputLabel = kNNClassify(testX, dataSet, labels, 3) print "Your input is:", testX, "and classified to class: ", outputLabel

结果如下:
Your input is: [ 1.2 1. ] and classified to class: A
Your input is: [ 0.1 0.3] and classified to class: B

  OpenCV中也提供了机器学习的相关算法,其中KNN算法的最基本例子如下

import numpy as np
import matplotlib.pyplot as plt
import cv2 # Feature set containing (x,y) values of 25 known/training data trainData = np.random.randint(0,100,(25,2)).astype(np.float32) # Labels each one either Red or Blue with numbers 0 and 1 responses = np.random.randint(0,2,(25,1)).astype(np.float32) # Take Red families and plot them red = trainData[responses.ravel()==0] plt.scatter(red[:,0],red[:,1],80,'r','^') # Take Blue families and plot them blue = trainData[responses.ravel()==1] plt.scatter(blue[:,0],blue[:,1],80,'b',
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值