A MATLAB implementation of the example on page 140 of 《统计机器学习》. AdaBoost is an ensemble algorithm with non-uniform weighting: it increases the weights of misclassified samples and decreases the weights of correctly classified ones, then trains the next base classifier on the reweighted data.
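For reference, the update rules implemented below are the standard AdaBoost ones: in round m, the weighted error e_m of the chosen weak classifier G_m determines its vote weight, and the sample weights are renormalized:

\alpha_m = \frac{1}{2}\ln\frac{1-e_m}{e_m}, \qquad
w_{m+1,i} = \frac{w_{m,i}}{Z_m}\exp\bigl(-\alpha_m\, y_i\, G_m(x_i)\bigr)

where Z_m normalizes the new weights to sum to 1. In the code below, `amount` plays the role of \alpha_m, and `bestResult(i)` equals y_i G_m(x_i): +1 if sample i is classified correctly, -1 otherwise.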
The findBest function locates the optimal decision stump threshold.
%% findBest(X,Y,W) returns the best decision stump
% parameter contains all the information of the base classifier
% parameter:
% ----index--------content-----
%   1       correctRate
%   2       decision stump (threshold)
%   3       direction
% -----------------------------
function [parameter, bestResult] = findBest(X, Y, W)
decisionStump = X + 0.5;                 % candidate thresholds halfway between sample points
bestCorrectRate = 0;
bestDecisionStump = 0;
bestSignal = 0;
bestResult = zeros(size(X));
for i = 1:size(decisionStump, 2)
    A = zeros(size(X));                  % predictions of this stump
    A(X <  decisionStump(1, i)) = 1;     % +1 to the left of the threshold
    A(X >= decisionStump(1, i)) = -1;    % -1 to the right
    correctRate = sum((A == Y) .* W);    % weighted accuracy under current weights
    signal = 1;
    result = double(A == Y);
    if correctRate < 0.5                 % worse than chance: flip the stump's direction
        signal = -1;
        correctRate = 1 - correctRate;
        result = 1 - double(A == Y);
    end
    if correctRate > bestCorrectRate     % keep the stump with the highest weighted accuracy
        bestCorrectRate = correctRate;
        bestDecisionStump = decisionStump(1, i);
        bestSignal = signal;
        result(result == 0) = -1;        % encode correct = +1, wrong = -1 for the weight update
        bestResult = result;
    end
end
parameter = [bestCorrectRate; bestDecisionStump; bestSignal];
end
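A quick sanity check (a minimal usage sketch; the expected values follow the book's first boosting round):

% first round with uniform weights
X = [0,1,2,3,4,5,6,7,8,9];
Y = [1,1,1,-1,-1,-1,1,1,1,-1];
W = 0.1 * ones(1, 10);
[parameter, bestResult] = findBest(X, Y, W);
% parameter(1) = 0.7  (weighted accuracy, i.e. error 0.3)
% parameter(2) = 2.5  (threshold of G1 in the book's example)
% parameter(3) = 1    (direction: +1 for x < 2.5, -1 otherwise)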
Main script
%% AdaBoost main script
%
% parameter contains all the information of the base classifier
% parameter:
% ----index--------content-----
%   1       correctRate
%   2       decision stump (threshold)
%   3       direction
% -----------------------------
% original data
X=[0,1,2,3,4,5,6,7,8,9];
Y=[1,1,1,-1,-1,-1,1,1,1,-1];
% initialize sample weights uniformly
W = 0.1 * ones(1, 10);
% train one decision stump on the reweighted data in each boosting round
for i = 1:4                                      % maximum number of boosting rounds
    [parameter, bestResult] = findBest(X, Y, W);
    errorRate = 1 - parameter(1, 1);             % weighted error e_m of the best stump
    amount = 1/2 * log((1 - errorRate) / errorRate);  % classifier weight alpha_m
    % update W: shrink weights of correct samples, grow weights of misclassified ones
    W = W .* exp(-(bestResult .* amount));
    W = W / sum(W, 2);                           % normalize so the weights sum to 1
end
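The loop above overwrites each round's stump after updating the weights, so nothing combines them at the end. A minimal sketch of the final strong classifier sign(sum_m alpha_m * G_m(x)), assuming a hypothetical `models` matrix whose row for round m is `[parameter(2,1), parameter(3,1), amount]`, saved inside the training loop:

% Hypothetical helper (not in the original): evaluates the combined classifier.
% models(m,:) = [stump, signal, alpha] collected per boosting round.
function pred = strongClassify(x, models)
score = zeros(size(x));
for m = 1:size(models, 1)
    g = ones(size(x));
    g(x >= models(m, 1)) = -1;                        % base stump: +1 if x < threshold
    score = score + models(m, 3) * models(m, 2) * g;  % alpha-weighted, direction-corrected vote
end
pred = sign(score);                                   % final label in {-1, +1}
end

In the book's example, three rounds already suffice: the combined classifier labels all ten training samples correctly.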