20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (2): The KNN Algorithm
1. The KNN Algorithm
KNN (K-Nearest Neighbor) is one of the most basic and simplest machine learning algorithms. It can be used for both classification and regression. KNN classifies a sample by measuring the distances between feature vectors.
The idea is as follows: if the majority of the K most similar samples to a given sample (i.e., its nearest neighbors in feature space) belong to a certain class, then that sample also belongs to this class. In other words, the method assigns a class to an unlabeled sample based solely on the classes of its one or several nearest neighbors.
In general, KNN classification proceeds in five steps:
1) Compute the distances between the point to be classified and all points with known class labels.
2) Sort the distances in ascending order.
3) Select the K points closest to the point to be classified.
4) Count how often each class occurs among these K points.
5) Return the most frequent class among the K points as the predicted class.
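The five steps above can be sketched in a few lines. The following is a minimal Python/NumPy version for illustration (it is not the MATLAB code used below, and the array names are made up for this sketch):

```python
import numpy as np
from collections import Counter

def knn_predict(x, train_data, train_class, k=3):
    """Classify a single point x by majority vote among its k nearest neighbors."""
    # 1) distances from x to every labeled training point
    dists = np.linalg.norm(train_data - x, axis=1)
    # 2)-3) indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # 4) count the class labels among those k points
    votes = Counter(train_class[i] for i in nearest)
    # 5) return the most frequent class
    return votes.most_common(1)[0][0]

train_data = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_class = np.array([1, 1, 2, 2])
print(knn_predict(np.array([0.05, 0.1]), train_data, train_class, k=3))  # nearest cluster is class 1
```

Note that `np.argsort` sorts all distances, mirroring step 2; for large training sets a partial sort (`np.argpartition`) avoids the full ordering.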
2. MATLAB Simulation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Purpose: demonstrate the KNN algorithm in computer vision,
%          i.e., how to use KNN for classification
% Modi: C.S
% Environment: Win7, Matlab2018a
% Date: 2022-4-5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function main
trainData = [
0.6213 0.5226 0.9797 0.9568 0.8801 0.8757 0.1730 0.2714 0.2523
0.7373 0.8939 0.6614 0.0118 0.1991 0.0648 0.2987 0.2844 0.4692
];
trainClass = [
1 1 1 2 2 2 3 3 3
];
testData = [
0.9883 0.5828 0.4235 0.5155 0.3340
0.4329 0.2259 0.5798 0.7604 0.5298
];
% main
testClass = cvKnn(testData, trainData, trainClass);
% plot prototype vectors
classLabel = unique(trainClass);
nClass = length(classLabel);
plotLabel = {'r*', 'g*', 'b*'};
figure;
for i=1:nClass
A = trainData(:, trainClass == classLabel(i));
plot(A(1,:), A(2,:), plotLabel{i});
hold on;
end
% plot classifiee vectors
plotLabel = {'ro', 'go', 'bo'};
for i=1:nClass
A = testData(:, testClass == classLabel(i));
plot(A(1,:), A(2,:), plotLabel{i});
hold on;
end
legend('1: prototype','2: prototype', '3: prototype', '1: classifiee', '2: classifiee', '3: classifiee', 'Location', 'NorthWest');
title('K nearest neighbor');
hold off;
% cvEucdist - Euclidean distance
%
% Synopsis
% [d] = cvEucdist(X, Y)
%
% Description
% cvEucdist calculates a squared euclidean distance between X and Y.
%
% Inputs ([]s are optional)
% (matrix) X D x N matrix where D is the dimension of vectors
% and N is the number of vectors.
% (matrix) [Y] D x P matrix where D is the dimension of vectors
% and P is the number of vectors.
% If Y is not given, the L2 norm of X is computed and
% 1 x N matrix (not N x 1) is returned.
%
% Outputs ([]s are optional)
% (matrix) d N x P matrix where d(n,p) represents the squared
% euclidean distance between X(:,n) and Y(:,p).
%
% Examples
% X = [1 2
% 1 2];
% Y = [1 2 3
% 1 2 3];
% d = cvEucdist(X, Y)
% % 0 2 8
% % 2 0 2
%
% See also
% cvMahaldist
% Authors
% Naotoshi Seo <sonots(at)sonots.com>
%
% License
% The program is free to use for non-commercial academic purposes,
% but for course works, you must understand what is going inside to use.
% The program can be used, modified, or re-distributed for any purposes
% if you or one of your group understand codes (the one must come to
% court if court cases occur.) Please contact the authors if you are
% interested in using the program without meeting the above conditions.
%
% Changes
% 06/2006 First Edition
function d = cvEucdist(X, Y)
if ~exist('Y', 'var') || isempty(Y)
%% Y = zeros(size(X, 1), 1);
U = ones(size(X, 1), 1);
d = abs(X'.^2*U).'; return;
end
V = ~isnan(X); X(~V) = 0; % V = ones(D, N);
%clear V;
U = ~isnan(Y); Y(~U) = 0; % U = ones(D, P);
%clear U;
%d = abs(X'.^2*U - 2*X'*Y + V'*Y.^2);
d1 = X'.^2*U;
d3 = V'*Y.^2;
d2 = X'*Y;
d = abs(d1-2*d2+d3);
% X = X';
% Y = Y';
% for i=1:size(X,1)
% for j=1:size(Y,1)
% d(i,j)=(norm(X(i,:)-Y(j,:)))^2; % squared Euclidean distance between each test sample and every training sample
% end
% end
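The vectorized expression in cvEucdist (d1 - 2*d2 + d3) relies on the identity ||x - y||^2 = ||x||^2 - 2*x'*y + ||y||^2, which the commented-out double loop above computes directly. A quick NumPy check of the two forms against each other (illustrative, not part of the original source):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((2, 4))   # D x N, same layout as in cvEucdist
Y = rng.random((2, 3))   # D x P

# vectorized: d(n,p) = ||X[:,n]||^2 - 2 * X[:,n].Y[:,p] + ||Y[:,p]||^2
d_vec = (X**2).sum(0)[:, None] - 2 * X.T @ Y + (Y**2).sum(0)[None, :]

# naive double loop for comparison
d_loop = np.empty((4, 3))
for n in range(4):
    for p in range(3):
        d_loop[n, p] = np.sum((X[:, n] - Y[:, p])**2)

print(np.allclose(d_vec, d_loop))  # True
```

The vectorized form replaces the N*P loop with three matrix operations, which is why cvEucdist scales to large sample sets.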
% cvKnn - K-Nearest Neighbor classification
%
% Synopsis
% [Class] = cvKnn(X, Proto, ProtoClass, [K], [distFunc])
%
% Description
% K-Nearest Neighbor classification
%
% Inputs ([]s are optional)
% (matrix) X D x N matrix representing column classifiee vectors
% where D is the number of dimensions and N is the
% number of vectors.
% (matrix) Proto D x P matrix representing column prototype vectors
% where D is the number of dimensions and P is the
% number of vectors.
% (vector) ProtoClass
% 1 x P vector containing class labels for prototype
% vectors.
% (scalar) [K = 1] K-NN's K. Search K nearest neighbors.
% (func) [distFunc = @cvEucdist]
% A function handle for distance measure. The function
% must have two arguments for matrix X and Y. See
% cvEucdist.m (Euclidean distance) as a reference.
%
% Outputs ([]s are optional)
% (vector) Class 1 x N vector containing classified class labels
% for X. Class(n) is the class id for X(:,n).
% (matrix) [Rank] Available only for NN (K = 1) now.
% nClass x N vector containing ranking class labels
% for X. Rank(1,n) is the 1st candidate which is
% the same with Class(n), Rank(2,n) is the 2nd
% candidate, Rank(3,n) is the 3rd, and so on.
%
% See also
% cvEucdist, cvMahaldist
% Authors
% Naotoshi Seo <sonots(at)sonots.com>
%
% License
% The program is free to use for non-commercial academic purposes,
% but for course works, you must understand what is going inside to use.
% The program can be used, modified, or re-distributed for any purposes
% if you or one of your group understand codes (the one must come to
% court if court cases occur.) Please contact the authors if you are
% interested in using the program without meeting the above conditions.
%
% Changes
% 04/01/2005 First Edition
function [Class, Rank] = cvKnn(X, Proto, ProtoClass, K, distFunc)
if ~exist('K', 'var') || isempty(K)
K = 1; % default: K = 1
end
if ~exist('distFunc', 'var') || isempty(distFunc)
distFunc = @cvEucdist;
end
if size(X, 1) ~= size(Proto, 1)
error('Dimensions of classifiee vectors and prototype vectors do not match.');
end
[D, N] = size(X);
% Calculate euclidean distances between classifiees and prototypes
d = distFunc(X, Proto);
if K == 1, % distances are sorted only in the K > 1 branch below
[mini, IndexProto] = min(d, [], 2); % minimum along dim 2, i.e., over prototypes for each row
Class = ProtoClass(IndexProto);
if nargout == 2, % instance indices in similarity descending order
[sorted, ind] = sort(d'); % PxN
RankIndex = ProtoClass(ind); %,e.g., [2 1 2 3 1 5 4 1 2]'
% conv into, e.g., [2 1 3 5 4]'
for n = 1:N
[ClassLabel, ind] = unique(RankIndex(:,n),'first');
[sorted, ind] = sort(ind);
Rank(:,n) = ClassLabel(ind);
end
end
else
[sorted, IndexProto] = sort(d'); % PxN
clear d;
% K closest
IndexProto = IndexProto(1:K,:);
KnnClass = ProtoClass(IndexProto);
% Find all class labels
ClassLabel = unique(ProtoClass);
nClass = length(ClassLabel);
for i = 1:nClass
ClassCounter(i,:) = sum(KnnClass == ClassLabel(i));
end
[maxi, winnerLabelIndex] = max(ClassCounter, [], 1); % 1 == col
% Future Work: Handle ties somehow
Class = ClassLabel(winnerLabelIndex);
end
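In the K > 1 branch above, MATLAB's `max` breaks voting ties implicitly (the first class with the maximal count wins), which the author flags as future work. One common fix is to break ties in favor of the class whose member is closest; a sketch in Python, with illustrative names (not code from the book):

```python
import numpy as np

def vote_with_tiebreak(knn_class, knn_dist):
    """Majority vote over the K nearest neighbors; ties go to the class
    whose nearest member is closest."""
    labels, counts = np.unique(knn_class, return_counts=True)
    tied = labels[counts == counts.max()]
    if len(tied) == 1:
        return tied[0]
    # among tied classes, pick the one with the smallest neighbor distance
    return min(tied, key=lambda c: knn_dist[knn_class == c].min())

# 2-2 vote tie between classes 1 and 2; class 2 has the closest neighbor
print(vote_with_tiebreak(np.array([1, 2, 1, 2]), np.array([0.5, 0.1, 0.6, 0.7])))  # 2
```

Other reasonable tie-breakers include reducing K by one and re-voting, or choosing uniformly at random among the tied classes.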
3. Simulation Results
Running the script plots the three prototype classes as red, green, and blue asterisks and the classified test points as circles in the matching colors (the result figure is omitted here).
4. Summary
What are the advantages and disadvantages of KNN?
Advantages:
(1) The algorithm is simple and theoretically mature, and can be used for both classification and regression.
(2) It can be used for nonlinear classification.
(3) There is no explicit training phase: once the dataset is loaded into memory, predictions can be made directly without any training, so the training time complexity is essentially zero.
(4) Since KNN relies mainly on a limited number of nearby samples rather than on discriminating class regions, it is better suited than other methods for sample sets whose class regions intersect or overlap heavily.
(5) The algorithm works well for automatically classifying classes with large sample sizes; classes with small sample sizes are more prone to misclassification under this method.
Disadvantages:
(1) The distance from each test point to the entire training set must be computed. When the training set is large, the computational cost and time complexity are high, especially when the number of features is large.
(2) It requires a large amount of memory, i.e., the space complexity is high.
(3) It suffers from the class-imbalance problem (some classes have many samples while others have very few), giving low prediction accuracy for rare classes.
(4) It is a lazy learning method that does essentially no learning up front, so prediction is slower than with algorithms such as logistic regression.
Note that, to reduce the impact of class imbalance on prediction accuracy, the classes can be weighted: use smaller weights for classes with many samples and larger weights for classes with few samples. In addition, K, the only hyperparameter of KNN, also has a major influence on the algorithm. To reduce the sensitivity to the choice of K, the distances can be weighted: assign each neighbor a weight so that closer points receive larger weights.
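The distance weighting just described can be sketched as follows. A common choice is inverse-distance weights; the code below is an illustrative Python version (function and variable names are made up for this sketch), not code from the book:

```python
import numpy as np

def weighted_knn_predict(x, train_data, train_class, k=3, eps=1e-12):
    """Weighted KNN: each of the k nearest neighbors votes with weight 1/d,
    so closer neighbors count more than distant ones."""
    dists = np.linalg.norm(train_data - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels = np.unique(train_class)
    # accumulate inverse-distance weight per class (eps guards against d = 0)
    scores = {c: np.sum(1.0 / (dists[nearest][train_class[nearest] == c] + eps))
              for c in labels}
    return max(scores, key=scores.get)

train_data = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]])
train_class = np.array([1, 1, 2])
# class 1 outvotes class 2 two-to-one, but class 2's neighbor is far closer
print(weighted_knn_predict(np.array([0.9, 0.9]), train_data, train_class, k=3))
```

Because the weights decay with distance, this variant also softens the effect of choosing K slightly too large: far-away neighbors swept in by a big K contribute little to the vote.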
The articles in this series are listed below:
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (1): The K-means Clustering Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (2): The KNN Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (3): Regression Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (4): The Decision Tree Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (5): The Random Forest Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (6): Bayesian Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (7): The EM Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (8): The Adaboost Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (9): The SVM Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (10): Reinforcement Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (11): Manifold Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (12): The RBF Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (13): Sparse Representation Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (14): Dictionary Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (15): The BP Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (16): The CNN Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (17): The RBM Learning Algorithm
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (18): Deep Learning Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (19): Genetic Algorithms
20 Lectures on Visual Machine Learning - MATLAB Source Code Examples (20): Ant Colony Algorithms