Main idea of Bag of Words: cluster the training-sample features with k-means; then, for each feature of a test sample, find its nearest cluster centre and increment that centre's count. Each test sample thus yields an ncenter-dimensional histogram.
For example, given the training features a, b, c, a, d, f, e, b, e, d, c, f and ncenter = 6, they can be clustered into the 6 classes [a, b, c, d, e, f]. Note that in practice the cluster centres need not coincide with any training feature, because k-means recomputes the centres at every update.
If a test sample has features a, b, c, d, then BoW produces the 6-dimensional histogram [1, 1, 1, 1, 0, 0].
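The toy example above can be sketched in Python (a minimal illustration; the 1-D numeric values stand in for the symbolic features a..f, and `bow_histogram` is a made-up name, not part of the original code):

```python
# Toy Bag-of-Words histogram: assign each test feature to its
# nearest cluster centre and count assignments per centre.
centres = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 6 cluster centres, standing in for a..f
test_features = [1.1, 2.2, 2.9, 4.0]       # roughly a, b, c, d

def bow_histogram(features, centres):
    hist = [0] * len(centres)
    for x in features:
        # index of the nearest centre (1-D Euclidean distance)
        nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
        hist[nearest] += 1
    return hist

print(bow_histogram(test_features, centres))  # [1, 1, 1, 1, 0, 0]
```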
So the pipeline is simply k-means followed by hard voting. K-means needs no elaboration here: it repeatedly updates the cluster centres until they change by less than a tolerance. The clustering below is initialized with ncenter samples drawn at random from the training data and uses Euclidean distance, computed in a vectorized way for speed.
This implementation is adapted from someone else's code; you may need small modifications for your own research. It is a good starting point for beginners.
function dic = CalDic(data, dicsize)
fprintf('Building dictionary using training data\n\n');
dictionarySize = dicsize;
niters = 100;                          % maximum number of iterations
[ndata, data_dim] = size(data);
centres = zeros(dictionarySize, data_dim);
[ncentres, dim] = size(centres);
%% initialization: pick ncentres distinct training samples as initial centres
perm = randperm(ndata);
perm = perm(1:ncentres);
centres = data(perm, :);
old_centres = centres;
ThrError = 0.009;                      % convergence threshold
display('Run k-means');
for n = 1:niters
    % Save old centres to check for termination
    e2 = max(max(abs(centres - old_centres)));
    inError(n) = e2;
    old_centres = centres;
    tempc = zeros(ncentres, dim);
    num_points = zeros(1, ncentres);
    id = eye(ncentres);
    d2 = EuclideanDistance(data, centres);
    % Assign each point to its nearest centre
    [minvals, index] = min(d2', [], 1);
    post = id(index, :);               % post(i,j) = 1 if sample i belongs to cluster j, else 0
    num_points = num_points + sum(post, 1);
    for j = 1:ncentres
        tempc(j,:) = tempc(j,:) + sum(data(find(post(:,j)), :), 1);
    end
    for j = 1:ncentres
        if num_points(j) > 0           % skip empty clusters to avoid division by zero
            centres(j,:) = tempc(j,:) / num_points(j);
        end
    end
    if n > 1
        % Test for termination against the threshold
        if max(max(abs(centres - old_centres))) < ThrError
            fprintf('Saving texton dictionary\n');
            mkdir('data');             % create the data folder
            dictionary = centres;
            save('data\dictionary', 'dictionary');  % save dictionary under data\
            break;
        end
        fprintf('The %dth iteration finished\n', n);
    end
end
dic = centres;                         % return the learned dictionary
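For readers without MATLAB, the same loop structure can be sketched in Python (a simplified 1-D version under illustrative names and data; `cal_dic` and the threshold are stand-ins, not the original code):

```python
import random

def cal_dic(data, dicsize, niters=100, thr=0.009):
    """Mini k-means on 1-D data, mirroring CalDic's structure:
    random initialization from the data, assign-to-nearest,
    recompute means, stop when no centre moves more than thr."""
    centres = random.sample(data, dicsize)
    for _ in range(niters):
        old = centres[:]
        # assignment step: nearest centre for each point
        sums = [0.0] * dicsize
        counts = [0] * dicsize
        for x in data:
            j = min(range(dicsize), key=lambda k: abs(x - centres[k]))
            sums[j] += x
            counts[j] += 1
        # update step: new centre = mean of its points (keep empty clusters fixed)
        centres = [sums[j] / counts[j] if counts[j] > 0 else old[j]
                   for j in range(dicsize)]
        if max(abs(c - o) for c, o in zip(centres, old)) < thr:
            break
    return centres

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
print(sorted(cal_dic(data, 2)))  # converges to approximately [1.0, 5.0]
```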
The Euclidean distance function:
function d = EuclideanDistance(a,b)
% DISTANCE - computes Euclidean distance matrix
%
% E = EuclideanDistance(A,B)
%
% A - (MxD) matrix
% B - (NxD) matrix
%
% Returns:
% E - (MxN) Euclidean distances between vectors in A and B
%
%
% Description :
% This fully vectorized (VERY FAST!) m-file computes the
% Euclidean distance between two vectors by:
%
% ||A-B|| = sqrt ( ||A||^2 + ||B||^2 - 2*A.B )
%
% Example :
% A = rand(100,400); B = rand(200,400);
% d = EuclideanDistance(A,B);
% Author : Roland Bunschoten
% University of Amsterdam
% Intelligent Autonomous Systems (IAS) group
% Kruislaan 403 1098 SJ Amsterdam
% tel.(+31)20-5257524
% bunschot@wins.uva.nl
% Last Rev : Oct 29 16:35:48 MET DST 1999
% Tested : PC Matlab v5.2 and Solaris Matlab v5.3
% Thanx : Nikos Vlassis
% Copyright notice: You are free to modify, extend and distribute
% this code granted that the author of the original code is
% mentioned as the original author of the code.
if (nargin ~= 2)
b=a;
end
if (size(a,2) ~= size(b,2))
error('A and B should be of same dimensionality');
end
aa=sum(a.*a,2); bb=sum(b.*b,2); ab=a*b';
d = sqrt(abs(repmat(aa,[1 size(bb,1)]) + repmat(bb',[size(aa,1) 1]) - 2*ab));
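The identity the vectorized code relies on can be checked with a small Python sketch (pure Python, illustrative function names):

```python
import math

def dist_direct(a, b):
    """Euclidean distance computed straight from the definition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dist_expanded(a, b):
    """Same distance via ||a||^2 + ||b||^2 - 2*a.b, the expansion that
    lets the MATLAB code compute all pairwise distances with one
    matrix product instead of an explicit double loop."""
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    # abs() guards against tiny negative values from floating-point rounding
    return math.sqrt(abs(aa + bb - 2 * ab))

a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(dist_direct(a, b), dist_expanded(a, b))  # both print 5.0
```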
The hard voting function:
function His=HardVoting(data,dic)
ncentres=size(dic,1);
id = eye(ncentres);
d2 = EuclideanDistance(data,dic);% Assign each point to nearest centre
[minvals, index] = min(d2', [], 1);
post = id(index,:); % matrix, if word i is in cluster j, post(i,j)=1, else 0
His=sum(post, 1);
end
For classification problems, consider trying LLC (CVPR 2010), which generally performs better than hard voting.