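I've discovered that I'm a lazy person. No, wait: I've always been a lazy person.
But! In a flash! Somehow! I decided! I'll write a blog.
=====================This divider marks the end of the rambling===================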
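When I first came across Faster R-CNN, I actually tried the Python version first, but the build failed. I had never touched Python, my brief attempt at fixing it went nowhere, so I gave up and ran the MATLAB version instead... See? More proof of how lazy I am! In the end I couldn't dodge Python anyway because I needed it later; I'll cover that in detail in another post.
Right, first comes the usual mess of environment setup, Caffe and so on. I already had all of that configured, so I won't go over it here. Oh, and MATLAB needs to be installed too.
Then just follow the steps at https://github.com/ShaoqingRen/faster_rcnn one by one. Running the demo shows the localization and recognition results.
Next, the main point: how to train a model on your own data.
First, convert your data to the VOC2007 format.
1. Annotate the images. That just means drawing boxes and saving the results to a TXT file. The MATLAB code is as follows: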
- % For each image, draw boxes with imcrop and write one line to the label file:
- % <image name> <label> <xmin> <ymin> <xmax> <ymax> [<label> <xmin> ...]
- close all;
- clc
- label = 'XXX';
- folder_path = 'pics/';
- img_path_list = dir(folder_path);
- img_path_list = img_path_list(~[img_path_list.isdir]); % drop '.', '..' and any subfolders
- img_name_all = {img_path_list.name};
- img_num = length(img_name_all); % number of image files
- im_names = cell(1, img_num);
- for j = 1:img_num
- im_names{j} = [folder_path, img_name_all{j}];
- end
- % 'a+t' appends in text mode; delete the file before re-running to avoid duplicate lines
- f = fopen('XXX_label.txt','a+t');
- for j = 1:length(im_names)
- im = imread(im_names{j});
- figure, imshow(im);
- fprintf(f,'%s',img_name_all{j});
- n = input('input the number of XXX:'); % how many objects to box in this image
- for i = 1:n
- [~,RECT] = imcrop(); % RECT: [XMIN YMIN WIDTH HEIGHT]
- fprintf(f,' %s %f %f %f %f',label,RECT(1),RECT(2),RECT(1)+RECT(3),RECT(2)+RECT(4)); % [XMIN YMIN XMAX YMAX]
- end
- fprintf(f,'\n');
- close;
- end
- fclose(f);
The generated txt file looks something like this:
- 000001.jpg xxx 2.510000 4.510000 256.490000 252.490000
- 000002.jpg xxx 302.510000 23.510000 346.490000 215.490000
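To make the line format concrete, here is a minimal sketch that parses one such line back into a file name, a list of labels and a matrix of boxes. The variable names (`tokens`, `fields`, `boxes`) are my own for illustration and are not part of the scripts in this post; it assumes each box is written as a label followed by exactly four numbers.

```matlab
% Parse one label line: <image> <label> <xmin> <ymin> <xmax> <ymax> [...]
line = '000001.jpg xxx 2.510000 4.510000 256.490000 252.490000';

tokens = strsplit(strtrim(line));   % split on whitespace
filename = tokens{1};
fields = tokens(2:end);             % groups of 5: label + 4 coordinates
assert(mod(numel(fields), 5) == 0, 'each box needs a label and 4 coordinates');

num_boxes = numel(fields) / 5;
labels = cell(num_boxes, 1);
boxes = zeros(num_boxes, 4);        % rows of [xmin ymin xmax ymax]
for k = 1:num_boxes
    base = (k - 1) * 5;
    labels{k} = fields{base + 1};
    boxes(k, :) = str2double(fields(base + 2 : base + 5));
end

fprintf('%s: %d box(es)\n', filename, num_boxes);
```

A quick pass like this over the whole file is a cheap way to catch a malformed line before generating the XML.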
2. Generate the XML files and put them in a VOCxxx/Annotations folder that you create yourself. The MATLAB code is as follows:
- % Convert each line of the label file into one VOC-style XML annotation.
- path_image = '/home/vision/jinglihua/data_lable/xxx/';
- path_label = 'xxx_label.txt';
- label = 'xxx '; % trailing space: match the label token, not a file name that contains it
- msg = strsplit(strtrim(fileread(path_label))); % every whitespace-separated token, in order
- fo = fopen(path_label);
- h = 1; % index of the next unread token in msg
- msg1 = fgetl(fo);
- while ischar(msg1) % one iteration per annotated image
- label_num = length(strfind(msg1,label)); % number of boxes on this line
- clear rec;
- clear rec1;
- xml_path = ['./VOCxxx/Annotations/' msg{h}(1:end-4) '.xml'];
- file = fopen(xml_path,'w');
- fprintf(file,'<annotation>\n');
- rec.folder = 'VOCxxx';
- rec.filename = msg{h};
- h = h + 1;
- rec.source.database = 'The XXX Database';
- rec.source.annotation = 'The XXX Database';
- rec.source.image = 'jinglihua';
- rec.source.flickrid = '20160504';
- rec.owner.flickrid = 'I do not know';
- rec.owner.name = 'I do not know';
- img = imread([path_image rec.filename]); % the image named on this line
- rec.size.width = int2str(size(img,2));
- rec.size.height = int2str(size(img,1));
- rec.size.depth = int2str(size(img,3));
- rec.segmented = '0';
- writexml(file,rec,1);
- for j = 1:label_num
- rec1.object.name = msg{h};
- h = h + 1;
- rec1.object.pose = 'Unspecified';
- rec1.object.truncated = '0';
- rec1.object.difficult = '0';
- rec1.object.bndbox.xmin = msg{h};
- h = h + 1;
- rec1.object.bndbox.ymin = msg{h};
- h = h + 1;
- rec1.object.bndbox.xmax = msg{h};
- h = h + 1;
- rec1.object.bndbox.ymax = msg{h};
- h = h + 1;
- writexml(file,rec1,1);
- end
- fprintf(file,'</annotation>\n');
- fclose(file);
- msg1 = fgetl(fo);
- end
- fclose(fo);
- %writexml.m (save this function in its own file)
- function writexml(fid,rec,depth)
- %WRITEXML Recursively write the fields of struct REC to FID as XML elements,
- % indenting each element with DEPTH tab characters.
- fn=fieldnames(rec);
- for i=1:length(fn)
- f=rec.(fn{i});
- if ~isempty(f)
- if isstruct(f)
- for j=1:length(f)
- fprintf(fid,'%s',repmat(char(9),1,depth));
- fprintf(fid,'<%s>\n',fn{i});
- writexml(fid,rec.(fn{i})(j),depth+1);
- fprintf(fid,'%s',repmat(char(9),1,depth));
- fprintf(fid,'</%s>\n',fn{i});
- end
- else
- if ~iscell(f)
- f={f};
- end
- for j=1:length(f)
- fprintf(fid,'%s',repmat(char(9),1,depth));
- fprintf(fid,'<%s>',fn{i});
- if ischar(f{j})
- fprintf(fid,'%s',f{j});
- elseif isnumeric(f{j})&&numel(f{j})==1
- fprintf(fid,'%s',num2str(f{j}));
- else
- error('unsupported type');
- end
- fprintf(fid,'</%s>\n',fn{i});
- end
- end
- end
- end
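Before training it is worth reading an annotation back to check that the boxes are sane. The sketch below is only an illustration: it writes a tiny hand-made XML file to a temporary location as a stand-in for a real file in VOCxxx/Annotations, then parses it with MATLAB's `xmlread` and checks that xmin < xmax and ymin < ymax. The sample values and variable names are assumptions of mine.

```matlab
% Write a tiny stand-in annotation (replace xml_path with a real file to check your data).
xml_path = [tempname '.xml'];
fid = fopen(xml_path, 'w');
fprintf(fid, '<annotation>\n<filename>000001.jpg</filename>\n');
fprintf(fid, '<object><name>xxx</name><bndbox><xmin>2.51</xmin><ymin>4.51</ymin>');
fprintf(fid, '<xmax>256.49</xmax><ymax>252.49</ymax></bndbox></object>\n</annotation>\n');
fclose(fid);

% Parse the file and collect every bounding box as [xmin ymin xmax ymax].
doc = xmlread(xml_path);
objects = doc.getElementsByTagName('object');
num_objects = objects.getLength();
tags = {'xmin', 'ymin', 'xmax', 'ymax'};
boxes = zeros(num_objects, 4);
for k = 1:num_objects
    obj = objects.item(k - 1);          % DOM node lists are 0-based
    for t = 1:4
        node = obj.getElementsByTagName(tags{t}).item(0);
        boxes(k, t) = str2double(char(node.getTextContent()));
    end
end
delete(xml_path);

% Every box must have positive width and height.
boxes_ok = all(boxes(:, 1) < boxes(:, 3)) && all(boxes(:, 2) < boxes(:, 4));
```

If `boxes_ok` comes out false for a real file, the matching line in the label txt is the first place to look.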
3. Generate the 4 txt files that define the training/validation/test sets and put them in VOCxxx/ImageSets/Main. The MATLAB code is as follows:
- %writetxt.m
- % Random split: 90% trainval (of which 5/6 train and 1/6 val), the remaining 10% test.
- clear
- close all
- clc
- file = dir('/home/vision/jinglihua/data_lable/VOCxxx/Annotations/*.xml'); % XML files only, no '.' or '..'
- len = length(file);
- num_trainval = sort(randperm(len, floor(9*len/10)));
- num_train = sort(num_trainval(randperm(length(num_trainval), floor(5*length(num_trainval)/6))));
- num_val = setdiff(num_trainval,num_train);
- num_test = setdiff(1:len,num_trainval);
- path = '/home/vision/jinglihua/data_lable/VOCxxx/ImageSets/Main/';
- set_names = {'trainval','train','val','test'};
- set_index = {num_trainval,num_train,num_val,num_test};
- for k = 1:length(set_names)
- fid = fopen([path set_names{k} '.txt'],'w'); % 'w' overwrites, so re-running does not duplicate entries
- for i = 1:length(set_index{k})
- s = file(set_index{k}(i)).name;
- fprintf(fid,'%s\n',s(1:end-4)); % image ID = file name without '.xml'
- end
- fclose(fid);
- end
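The split arithmetic is easy to check on its own. Here is a small sketch with a made-up dataset size of 100 (my assumption, not a number from this post): it uses the same `randperm`/`setdiff` logic and verifies that the sets are disjoint and cover every image exactly once.

```matlab
len = 100;  % hypothetical number of annotated images

% Same split logic as writetxt.m: 90% trainval, 5/6 of that for train.
num_trainval = sort(randperm(len, floor(9 * len / 10)));
num_train = sort(num_trainval(randperm(numel(num_trainval), ...
                                       floor(5 * numel(num_trainval) / 6))));
num_val = setdiff(num_trainval, num_train);
num_test = setdiff(1:len, num_trainval);

fprintf('trainval=%d train=%d val=%d test=%d\n', ...
        numel(num_trainval), numel(num_train), numel(num_val), numel(num_test));
% prints: trainval=90 train=75 val=15 test=10

% train, val and test must not overlap and must cover 1..len.
assert(isempty(intersect(num_train, num_val)));
assert(isempty(intersect(num_trainval, num_test)));
assert(isequal(sort([num_train, num_val, num_test]), 1:len));
```

So with these ratios the final proportions are 75% train, 15% val and 10% test.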
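4. Copy the annotated source images into the VOCxxx/JPEGImages directory. And then, ta-da! The data is ready! The VOCxxx directory is laid out as follows:
- VOCxxx/Annotations (the XML files from step 2)
- VOCxxx/ImageSets/Main (trainval.txt, train.txt, val.txt and test.txt from step 3)
- VOCxxx/JPEGImages (the annotated source images)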
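The three folders used in steps 2 to 4 can be created up front so the scripts have somewhere to write. This is just a small sketch: the root location under `tempdir` is a placeholder of mine, so point `root` at wherever your own VOCxxx folder should live.

```matlab
% Create the VOCxxx folder skeleton (root is a placeholder location).
root = fullfile(tempdir, 'VOCxxx');
subdirs = {'Annotations', fullfile('ImageSets', 'Main'), 'JPEGImages'};
for k = 1:numel(subdirs)
    d = fullfile(root, subdirs{k});
    if ~exist(d, 'dir')
        mkdir(d);   % mkdir also creates missing parent folders
    end
end
```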
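5. Training. A few files need to be modified before training. The details are explained very clearly in this blog post: http://blog.csdn.net/sinat_30071459/article/details/50546891, so just follow along~
Right, that's all I'll write about the MATLAB version for now, la la la.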