MATLAB classification algorithm: K-nearest-neighbor classifier (MATLAB code of k-nearest neighbors)

function rate = KNN(Train_data, Train_label, Test_data, Test_label, k, Distance_mark)
% K-Nearest-Neighbor classifier (K-NN classifier)
%
% Input:
%     Train_data, Test_data   : training data set and test data set,
%                               respectively (each row is a data point).
%     Train_label, Test_label : column vectors holding the labels of the
%                               training and test data sets, respectively.
%     k                       : number of nearest neighbors.
%     Distance_mark           : ['Euclidean', 'L2' | 'L1' | 'Cos']
%                               'Cos' denotes the cosine distance.
% Output:
%     rate : accuracy of the K-NN classifier.
%
% Example:
%     % Classification problem with three classes
%     A = rand(50,300);
%     B = rand(50,300) + 2;
%     C = rand(50,300) + 3;
%     % label vector for the three classes
%     gnd = [ones(300,1); 2*ones(300,1); 3*ones(300,1)];
%     fea = [A B C]';
%     trainIdx = [1:150, 301:450, 601:750]';
%     testIdx  = [151:300, 451:600, 751:900]';
%     fea_Train = fea(trainIdx,:);
%     gnd_Train = gnd(trainIdx);
%     fea_Test  = fea(testIdx,:);
%     gnd_Test  = gnd(testIdx);
%     rate = KNN(fea_Train, gnd_Train, fea_Test, gnd_Test, 1)
%
% Reference:
%     If you use this MATLAB code, we would appreciate it very much if you
%     could cite the following paper:
%     Jie Gui, Tongliang Liu, Dacheng Tao, Zhenan Sun, Tieniu Tan,
%     "Representative Vector Machines: A Unified Framework for Classical
%     Classifiers", IEEE Transactions on Cybernetics.
%
% This code was written by Gui Jie on the evening of 2009/03/11.
% If you find any bugs in the code, feel free to contact me.

if nargin < 5
    error('Not enough arguments!');
elseif nargin < 6
    Distance_mark = 'L2';
end

n         = size(Test_data, 1);   % number of test samples
train_num = size(Train_data, 1);  % number of training samples

% Normalize each feature to have zero mean and unit variance.
% If you need the following four rows, you can uncomment them.
% M          = mean(Train_data);  % mean & std of the training data set
% S          = std(Train_data);
% Train_data = (Train_data - ones(train_num,1)*M) ./ (ones(train_num,1)*S);
% Test_data  = (Test_data  - ones(n,1)*M) ./ (ones(n,1)*S);

U        = unique(Train_label);   % class labels
nclasses = length(U);             % number of classes
Result   = zeros(n, 1);
Count    = zeros(nclasses, 1);
dist     = zeros(train_num, 1);

for i = 1:n
    % Compute the distances between this test point and all training
    % points, then sort them.
    test = Test_data(i,:);
    for j = 1:train_num
        train = Train_data(j,:);
        V = test - train;
        switch Distance_mark
            case {'Euclidean', 'L2'}
                dist(j,1) = norm(V, 2);  % Euclidean (L2) distance
            case 'L1'
                dist(j,1) = norm(V, 1);  % L1 distance
            case 'Cos'
                % angular (cosine) distance, an angle in [0, pi]
                dist(j,1) = acos(test*train' / (norm(test,2)*norm(train,2)));
            otherwise
                dist(j,1) = norm(V, 2);  % default: L2 distance
        end
    end
    [~, Inds] = sort(dist);

    % Count the class memberships of the k nearest samples.
    Count(:) = 0;
    for j = 1:k
        ind        = find(Train_label(Inds(j)) == U); % class index of the j-th nearest neighbor
        Count(ind) = Count(ind) + 1;
    end
    % Count now holds how many of the k nearest neighbors fall in each
    % class; assign the test point to the majority class.
    [~, ind]  = max(Count);
    Result(i) = U(ind);
end

correctnumbers = length(find(Result == Test_label));
rate = correctnumbers / n;

-------------------------------------------------- End of code --------------------------------------------------
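As a quick usage check, the classifier can be run on the synthetic three-class data from the header comment, with either the default L2 distance or the cosine distance discussed below (a minimal sketch; k = 3 is an arbitrary choice):

A = rand(50,300); B = rand(50,300) + 2; C = rand(50,300) + 3;
gnd = [ones(300,1); 2*ones(300,1); 3*ones(300,1)];
fea = [A B C]';
trainIdx = [1:150, 301:450, 601:750]';
testIdx  = [151:300, 451:600, 751:900]';
rate_L2  = KNN(fea(trainIdx,:), gnd(trainIdx), fea(testIdx,:), gnd(testIdx), 3)         % default 'L2'
rate_Cos = KNN(fea(trainIdx,:), gnd(trainIdx), fea(testIdx,:), gnd(testIdx), 3, 'Cos')  % angular distance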

The difference between cosine distance and cosine similarity

Cosine similarity is the basic measure most commonly used in traditional document classification to gauge the distance between documents. It measures the distance between two d-dimensional vectors by their angular difference, and the resulting value lies between 0 and 1: the smaller the angle between the two vectors, the closer the cosine similarity is to 1; otherwise, it approaches 0. Suppose a = [a1, a2, …, ad] and b = [b1, b2, …, bd] are two points in d-dimensional space; their cosine similarity can then be written as:

cosineSimilarity(a,b) = dot(a,b) / (norm(a)*norm(b))   [I think this should be called cosineSimilarity rather than cosineDistance: the larger the similarity, the smaller the distance should be. For example, when the angle between a and b is 0, they are most similar, so the similarity is largest and the distance is smallest.]

Here dot(a,b) denotes the inner product of a and b. Since the inner product is defined as a·b = |a| × |b| × cosθ (in general θ ∈ [0, π], http://baike.baidu.com/view/1485493.htm), this quantity is not confined to 0 ~ 1 but ranges from -1 to 1. There are two ways to turn it into a distance:

(1) The KNN code above is correct: it uses acos to convert the cosine into an angle in [0, π]. The distance need not be restricted to 0 ~ 1; the code maps it to [0, π], where a larger value means a larger distance.

(2) cosineDistance(a,b) = 1 - cosineSimilarity(a,b) = 1 - dot(a,b) / (norm(a)*norm(b)). The range of cosineDistance is then [0, 2].

Example:

a = [1 1 1]; b = [1 0 0];
cosineSimilarity = dot(a,b) / (norm(a)*norm(b))

cosineSimilarity =

    0.5774

[http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/02%E8%B7%9D%E9%9B%A2%E8%88%87%E7%9B%B8%E4%BC%BC%E5%BA%A6.pdf, saved locally as 距离与相似度.pdf (Distance and Similarity.pdf)]
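Continuing the example, conversions (1) and (2) above give different but consistent distances (a small sketch; acos of the similarity is exactly the angular distance used in the 'Cos' branch of the KNN code):

a = [1 1 1]; b = [1 0 0];
s = dot(a,b) / (norm(a)*norm(b));   % cosine similarity, 0.5774
angularDistance = acos(s)           % conversion (1): angle in [0, pi], here 0.9553
cosineDistance  = 1 - s             % conversion (2): value in [0, 2], here 0.4226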

"KDD17_Linearized GMM Kernels"的公式1是cosine的等价表示形式;“Min-Max Kernels 1503.01737”的公式1是Min-Max Kernel的标准定义

See p. 3 of "Minimal Local Reconstruction Error Measure Based Discriminant Feature Extraction and Classification":

"For robustness, before classification we generally need to normalize the feature vectors so that each one has length 1, that is, x -> x / ||x||." On such normalized features, the Euclidean distance is equivalent to the cosine distance: for unit vectors x and y, ||x - y||^2 = 2 - 2*(x*y'), so ranking neighbors by Euclidean distance gives the same ordering as ranking by cosine similarity.
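A quick numerical check of this equivalence (a minimal sketch with random row vectors):

x = randn(1, 5); y = randn(1, 5);
xn = x / norm(x); yn = y / norm(y);      % normalize to unit length
cosSim = xn * yn';                       % cosine similarity of x and y
sqEuclid = norm(xn - yn)^2;              % squared Euclidean distance of the normalized vectors
abs(sqEuclid - (2 - 2*cosSim))           % ~0 up to floating-point error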

Two further implementation notes:

(1) My junior labmate Lin Zhu suggested that replacing the loops with a distance-matrix computation would save time, since MATLAB loops are slow; for large sample sets, however, the loop version is still necessary, otherwise the distance matrix runs out of memory. I recall that Jinsong, in a course he taught, also provided a KNN code, and his was likewise implemented with loops. MATLAB also has the built-in function knnclassify; SPP_1NN.m, the code accompanying the paper "Sparsity preserving projections", uses that function. On ASLAN, my KNN and knnclassify achieve exactly the same recognition rate. A vectorized sketch follows this list.

(2) An extremely important point: the assignment near the end of the program must be Result(i) = U(ind), not Result(i) = ind. The latter happens to work for datasets such as Yale whose labels are simply 1, 2, 3, ..., but it fails for binary classification with labels 1 and -1: U = unique(Train_label) returns the sorted labels [-1; 1], so ind is only an index into U, and the predicted label must be recovered as U(ind). SRC_QC and SRC_QC2 have the analogous problem: the third line from the end must be Result(i) = classLabel(index), not Result(i) = index. Originally only that one place was fixed; line 50 of SRC_QC2 and line 42 of SRC_QC also need ii changed to classLabel(ii). It was precisely this bug that produced the erroneous result that SRC has a 50% error rate with zero variance on ASLAN. The corrected SRC_QC2 and SRC_QC programs are in the ASLAN directory.
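A minimal sketch of the distance-matrix idea from note (1), for the 1-NN case with L2 distances; it assumes the Statistics Toolbox function pdist2 is available (the bsxfun line in the comment computes the same squared distances without the toolbox):

% D(i,j) = distance between Test_data(i,:) and Train_data(j,:)
D = pdist2(Test_data, Train_data);   % n-by-train_num; may exhaust memory for large sets
% Without pdist2:
% D = bsxfun(@plus, sum(Test_data.^2,2), sum(Train_data.^2,2)') - 2*(Test_data*Train_data');
[~, nearest] = min(D, [], 2);        % nearest training index for every test point at once
Result = Train_label(nearest);       % map indices back to labels (see note (2))
rate = mean(Result == Test_label)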
