Stratified Sampling of Data by Class in MATLAB
function [train_x,train_y,test_x,test_y] = getdata()
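[data,~] = xlsread("XXX.xlsx"); % read the numeric data from the spreadsheet
data(isnan(data)) = 0; % replace NaN entries with 0 (the rows are kept, not removed)
labels = data(:,end); % the last column holds the class labels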
train_x=[];
train_y=[];
test_x=[];
test_y=[];
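%% Stratified sampling: within each class, draw a fixed fraction of the rows for the training set; the remaining rows form the test set
ratio = 0.7; % fraction of each class assigned to the training set
classes = unique(labels); % actual label values; they need not be 1..K
for k = 1:numel(classes)
cate = find(labels == classes(k)); % row indices of the current class
nTrain = round(length(cate)*ratio); % number of training rows for this class
train = cate(randperm(length(cate),nTrain)); % rows drawn for the training set within the current class
test = setdiff(cate,train); % remaining rows of the current class, i.e. the test set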
train_x = [train_x;data(train,1:end-1)];
train_y = [train_y;labels(train)];
test_x = [test_x;data(test,1:end-1)];
test_y = [test_y;labels(test)];
end
end
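To check that a split of this kind keeps the class proportions, the same per-class logic can be run on synthetic labels and the class counts compared. The following is only a minimal sketch: the label vector, the seed, and the names `ratio`, `trainIdx`, `testIdx`, `trainCounts` and `testCounts` are illustrative assumptions, not part of the original function.

```matlab
% Hypothetical sanity check on synthetic labels (not the original data).
rng(0);                                              % fix the seed so the run is repeatable
labels = [ones(50,1); 2*ones(30,1); 3*ones(20,1)];   % made-up labels: 50/30/20 rows per class
ratio  = 0.7;                                        % training fraction per class (assumed)

classes  = unique(labels);
trainIdx = [];
for k = 1:numel(classes)
    rows   = find(labels == classes(k));             % rows belonging to the current class
    nTrain = round(numel(rows) * ratio);             % how many of them go to training
    picked = rows(randperm(numel(rows), nTrain));    % random draw without replacement
    trainIdx = [trainIdx; picked(:)];
end
testIdx = setdiff((1:numel(labels))', trainIdx);     % every row not drawn for training

% Per-class counts in each subset
trainCounts = arrayfun(@(c) sum(labels(trainIdx) == c), classes);
testCounts  = arrayfun(@(c) sum(labels(testIdx)  == c), classes);
disp([classes, trainCounts, testCounts]);
```

With a ratio of 0.7, each class should contribute about 70% of its rows to the training set, so the two subsets have the same class balance as the full data.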
Mini appendix for this post:
JMLR, a top journal in machine learning and artificial intelligence