20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (2): The KNN Algorithm
1. The KNN Algorithm
KNN (K-Nearest Neighbor) is one of the most basic and simplest machine learning algorithms. It can be used for both classification and regression. KNN classifies a sample by measuring the distances between feature vectors.
The idea is as follows: if the majority of the K most similar samples to a given sample (i.e., its nearest neighbors in feature space) belong to a certain class, then that sample also belongs to this class. In other words, the method assigns a class to an unlabeled sample based solely on the classes of its one or several nearest neighbors.
In general, KNN classification proceeds in five steps:
1) Compute the distances between the point to be classified and all points with known class labels.
2) Sort the distances in ascending order.
3) Select the K points closest to the point to be classified.
4) Count how often each class occurs among these K points.
5) Return the most frequent class among the K points as the predicted class.
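The five steps above can be sketched in a few lines. The following is a minimal Python/NumPy version for illustration (it is not the MATLAB code used below, and the array names are made up for this sketch):

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_data, train_class, k=3):
    """Classify a single point x by majority vote among its k nearest neighbors."""
    # 1) distances from x to every labeled training point
    dists = np.linalg.norm(train_data - x, axis=1)
    # 2)-3) indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # 4) count the class labels among those k points
    votes = Counter(train_class[i] for i in nearest)
    # 5) return the most frequent class
    return votes.most_common(1)[0][0]

train_data = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_class = np.array([1, 1, 2, 2])
print(knn_predict(np.array([0.05, 0.1]), train_data, train_class, k=3))  # nearest cluster is class 1
```

Note that `np.argsort` sorts all distances, mirroring step 2; for large training sets a partial sort (`np.argpartition`) avoids the full ordering.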
2. MATLAB Simulation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Purpose: demonstrate the KNN algorithm in computer vision,
%          i.e., how to use KNN for classification
% Modi: C.S
% Environment: Win7, Matlab2018a
% Date: 2022-4-5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function main
trainData = [
0.6213 0.5226 0.9797 0.9568 0.8801 0.8757 0.1730 0.2714 0.2523
0.7373 0.8939 0.6614 0.0118 0.1991 0.0648 0.2987 0.2844 0.4692
];
trainClass = [
1 1 1 2 2 2 3 3 3
];
testData = [
0.9883 0.5828 0.4235 0.5155 0.3340
0.4329 0.2259 0.5798 0.7604 0.5298
];
% main
testClass = cvKnn(testData, trainData, trainClass);
% plot prototype vectors
classLabel = unique(trainClass);
nClass = length(classLabel);
plotLabel = {'r*', 'g*', 'b*'};
figure;
for i=1:nClass
A = trainData(:, trainClass == classLabel(i));
plot(A(1,:), A(2,:), plotLabel{i});
hold on;
end
% plot classifiee vectors
plotLabel = {'ro', 'go', 'bo'};
for i=1:nClass
A = testData(:, testClass == classLabel(i));
plot(A(1,:), A(2,:), plotLabel{i});
hold on;
end
legend('1: prototype','2: prototype', '3: prototype', '1: classifiee', '2: classifiee', '3: classifiee', 'Location', 'NorthWest');
title('K nearest neighbor');
hold off;
% cvEucdist - Euclidean distance
%
% Synopsis
% [d] = cvEucdist(X, Y)
%
% Description
% cvEucdist calculates a squared euclidean distance between X and Y.
%
% Inputs ([]s are optional)
% (matrix) X D x N matrix where D is the dimension of vectors
% and N is the number of vectors.
% (matrix) [Y] D x P matrix where D is the dimension of vectors
% and P is the number of vectors.
% If Y is not given, the L2 norm of X is computed and
% 1 x N matrix (not N x 1) is returned.
%
% Outputs ([]s are optional)
% (matrix) d N x P matrix where d(n,p) represents the squared
% euclidean distance between X(:,n) and Y(:,p).
%
% Examples
% X = [1 2
% 1 2];
% Y = [1 2 3
% 1 2 3];
% d = cvEucdist(X, Y)
% % 0 2 8
% % 2 0 2
%
% See also
% cvMahaldist
% Authors
% Naotoshi Seo <sonots(at)sonots.com>
%
% License
% The program is free to use for non-commercial academic purposes,
% but for course works, you must understand what is going inside to use.
% The program can be used, modified, or re-distributed for any purposes
% if you or one of your group understand codes (the one must come to
% court if court cases occur.) Please contact the authors if you are
% interested in using the program without meeting the above conditions.
%
% Changes
% 06/2006 First Edition
function d = cvEucdist(X, Y)
if ~exist('Y', 'var') || isempty(Y)
%% Y = zeros(size(X, 1), 1);
U = ones(size(X, 1), 1);
d = abs(X'.^2*U).'; return;
end
V = ~isnan(X); X(~V) = 0; % V = ones(D, N);
%clear V;
U = ~isnan(Y); Y(~U) = 0; % U = ones(D, P);
%clear U;
%d = abs(X'.^2*U - 2*X'*Y + V'*Y.^2);
d1 = X'.^2*U;
d3 = V'*Y.^2;
d2 = X'*Y;
d = abs(d1-2*d2+d3);
% X = X';
% Y = Y';
% for i=1:size(X,1)
% for j=1:size(Y,1)
% d(i,j)=(norm(X(i,:)-Y(j,:)))^2; % squared Euclidean distance between each test sample and every training sample
% end
% end
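The vectorized expression in cvEucdist (d1 - 2*d2 + d3) relies on the identity ||x - y||^2 = ||x||^2 - 2*x'*y + ||y||^2, which the commented-out double loop above computes directly. A quick NumPy check of the two forms against each other (illustrative, not part of the original source):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((2, 4))   # D x N, same layout as in cvEucdist
Y = rng.random((2, 3))   # D x P

# vectorized: d(n,p) = ||X[:,n]||^2 - 2 * X[:,n].Y[:,p] + ||Y[:,p]||^2
d_vec = (X**2).sum(0)[:, None] - 2 * X.T @ Y + (Y**2).sum(0)[None, :]

# naive double loop for comparison
d_loop = np.empty((4, 3))
for n in range(4):
    for p in range(3):
        d_loop[n, p] = np.sum((X[:, n] - Y[:, p])**2)

print(np.allclose(d_vec, d_loop))  # True
```

The vectorized form replaces the N*P loop with three matrix operations, which is why cvEucdist scales to large sample sets.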
% cvKnn - K-Nearest Neighbor classification
%
% Synopsis
% [Class] = cvKnn(X, Proto, ProtoClass, [K], [distFunc])
%
% Description
% K-Nearest Neighbor classification
%
% Inputs ([]s are optional)
% (matrix) X D x N matrix representing column classifiee vectors
% where D is the number of dimensions and N is the
% number of vectors.
% (matrix) Proto D x P matrix representing column prototype vectors
% where D is the number of dimensions and P is the
% number of vectors.
% (vector) ProtoClass
% 1 x P vector containing class labels for prototype
% vectors.
% (scalar) [K = 1] K-NN's K. Search K nearest neighbors.
% (func) [distFunc = @cvEucdist]
% A function handle for distance measure. The function
% must have two arguments for matrix X and Y. See
% cvEucdist.m (Euclidean distance) as a reference.
%
% Outputs ([]s are optional)
% (vector) Class 1 x N vector containing classified class labels
% for X. Class(n) is the class id for X(:,n).
% (matrix) [Rank] Available only for NN (K = 1) now.
% nClass x N vector containing ranking class labels
% for X. Rank(1,n) is the 1st candidate which is
% the same with Class(n), Rank(2,n) is the 2nd
% candidate, Rank(3,n) is the 3rd, and so on.
%
% See also
% cvEucdist, cvMahaldist
% Authors
% Naotoshi Seo <sonots(at)sonots.com>
%
% License
% The program is free to use for non-commercial academic purposes,
% but for course works, you must understand what is going inside to use.
% The program can be used, modified, or re-distributed for any purposes
% if you or one of your group understand codes (the one must come to
% court if court cases occur.) Please contact the authors if you are
% interested in using the program without meeting the above conditions.
%
% Changes
% 04/01/2005 First Edition
function [Class, Rank] = cvKnn(X, Proto, ProtoClass, K, distFunc)
if ~exist('K', 'var') || isempty(K)
K = 1; % default: K = 1
end
if ~exist('distFunc', 'var') || isempty(distFunc)
distFunc = @cvEucdist;
end
if size(X, 1) ~= size(Proto, 1)
error('Dimensions of classifiee vectors and prototype vectors do not match.');
end
[D, N] = size(X);
% Calculate euclidean distances between classifiees and prototypes
d = distFunc(X, Proto);
if K == 1, % distances are sorted only in the K > 1 branch below
[mini, IndexProto] = min(d, [], 2); % minimum along dim 2, i.e., over prototypes for each row
Class = ProtoClass(IndexProto);
if nargout == 2, % instance indices in similarity descending order
[sorted, ind] = sort(d'); % PxN
RankIndex = ProtoClass(ind); %,e.g., [2 1 2 3 1 5 4 1 2]'
% conv into, e.g., [2 1 3 5 4]'
for n = 1:N
[ClassLabel, ind] = unique(RankIndex(:,n),'first');
[sorted, ind] = sort(ind);
Rank(:,n) = ClassLabel(ind);
end
end
else
[sorted, IndexProto] = sort(d'); % PxN
clear d;
% K closest
IndexProto = IndexProto(1:K,:);
KnnClass = ProtoClass(IndexProto);
% Find all class labels
ClassLabel = unique(ProtoClass);
nClass = length(ClassLabel);
for i = 1:nClass
ClassCounter(i,:) = sum(KnnClass == ClassLabel(i));
end
[maxi, winnerLabelIndex] = max(ClassCounter, [], 1); % 1 == col
% Future Work: Handle ties somehow
Class = ClassLabel(winnerLabelIndex);
end
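In the K > 1 branch above, MATLAB's `max` breaks voting ties implicitly (the first class with the maximal count wins), which the author flags as future work. One common fix is to break ties in favor of the class whose member is closest; a sketch in Python, with illustrative names (not code from the book):

```python
import numpy as np

def vote_with_tiebreak(knn_class, knn_dist):
    """Majority vote over the K nearest neighbors; ties go to the class
    whose nearest member is closest."""
    labels, counts = np.unique(knn_class, return_counts=True)
    tied = labels[counts == counts.max()]
    if len(tied) == 1:
        return tied[0]
    # among tied classes, pick the one with the smallest neighbor distance
    return min(tied, key=lambda c: knn_dist[knn_class == c].min())

# 2-2 vote tie between classes 1 and 2; class 2 has the closest neighbor
print(vote_with_tiebreak(np.array([1, 2, 1, 2]), np.array([0.5, 0.1, 0.6, 0.7])))  # 2
```

Other reasonable tie-breakers include reducing K by one and re-voting, or choosing uniformly at random among the tied classes.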
3. Simulation Results
Running the script plots the three prototype classes as red, green, and blue asterisks and the classified test points as circles in the matching colors (the result figure is omitted here).
4. Summary
What are the advantages and disadvantages of KNN?
Advantages:
(1) The algorithm is simple and theoretically mature, and can be used for both classification and regression.
(2) It can be used for nonlinear classification.
(3) There is no explicit training phase: once the dataset is loaded into memory, predictions can be made directly without any training, so the training time complexity is essentially zero.
(4) Since KNN relies mainly on a limited number of nearby samples rather than on discriminating class regions, it is better suited than other methods for sample sets whose class regions intersect or overlap heavily.
(5) The algorithm works well for automatically classifying classes with large sample sizes; classes with small sample sizes are more prone to misclassification under this method.
Disadvantages:
(1) The distance from each test point to the entire training set must be computed. When the training set is large, the computational cost and time complexity are high, especially when the number of features is large.
(2) It requires a large amount of memory, i.e., the space complexity is high.
(3) It suffers from the class-imbalance problem (some classes have many samples while others have very few), giving low prediction accuracy for rare classes.
(4) It is a lazy learning method that does essentially no learning up front, so prediction is slower than with algorithms such as logistic regression.
Note that, to reduce the impact of class imbalance on prediction accuracy, the classes can be weighted: use smaller weights for classes with many samples and larger weights for classes with few samples. In addition, K, the only hyperparameter of KNN, also has a major influence on the algorithm. To reduce the sensitivity to the choice of K, the distances can be weighted: assign each neighbor a weight so that closer points receive larger weights.
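The distance weighting just described can be sketched as follows. A common choice is inverse-distance weights; the code below is an illustrative Python version (function and variable names are made up for this sketch), not code from the book:

```python
import numpy as np

def weighted_knn_predict(x, train_data, train_class, k=3, eps=1e-12):
    """Weighted KNN: each of the k nearest neighbors votes with weight 1/d,
    so closer neighbors count more than distant ones."""
    dists = np.linalg.norm(train_data - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels = np.unique(train_class)
    # accumulate inverse-distance weight per class (eps guards against d = 0)
    scores = {c: np.sum(1.0 / (dists[nearest][train_class[nearest] == c] + eps))
              for c in labels}
    return max(scores, key=scores.get)

train_data = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]])
train_class = np.array([1, 1, 2])
# class 1 outvotes class 2 two-to-one, but class 2's neighbor is far closer
print(weighted_knn_predict(np.array([0.9, 0.9]), train_data, train_class, k=3))
```

Because the weights decay with distance, this variant also softens the effect of choosing K slightly too large: far-away neighbors swept in by a big K contribute little to the vote.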
The articles in this series are listed below:
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (1): The K-means Clustering Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (2): The KNN Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (3): Regression Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (4): The Decision Tree Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (5): The Random Forest Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (6): Bayesian Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (7): The EM Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (8): The Adaboost Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (9): The SVM Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (10): Reinforcement Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (11): Manifold Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (12): The RBF Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (13): Sparse Representation Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (14): Dictionary Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (15): The BP Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (16): The CNN Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (17): The RBM Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (18): Deep Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (19): Genetic Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (20): Ant Colony Algorithms