Original post: https://blog.csdn.net/wwww1244/article/details/81034045
Multi-label classification has many practical applications. For example, given an input image, predict a person's age, gender, and whether they wear glasses. In that case, the dataset's label file should have this format:
- 000001.jpg 22 1 0
- 000002.jpg 30 1 1
- 000003.jpg 44 0 1
- 000004.jpg 17 0 0
Assume the first number is the age, the second 0/1 means female/male, and the third 0/1 means without/with glasses.
Likewise, regression problems have many applications in CNNs. For example, in object detection (networks such as Faster R-CNN, SSD, and YOLO), the input is an image and the output includes the translation/scaling parameters of a bounding box (in practice it is a bit more complicated than this). In that case, the label file should have this format:
- 000001.jpg -0.01 -0.29 -0.05 0.02
- 000002.jpg 0.25 -0.04 0.02 -0.07
- 000003.jpg 0.23 0.05 -0.22 0.15
- 000004.jpg 0.20 -0.12 0.02 -0.20
The four numbers are the offsets of x, y, w, and h, respectively.
(These lines are actually the bounding-box regression portion extracted from MTCNN face-detection training data; the MTCNN training-data format will be covered in the next post.)
--------------------------------------------------------------------------------
Caffe supports single-label classification well, but its support for multi-label classification and regression is poor: build/tools/convert_imageset, the executable that generates lmdb datasets, only accepts a single integer label per image. Multi-label classification and regression clearly require the program to accept multiple floating-point labels, and that is the work to be done here.
Note that reading classification labels in as floats does not affect the classification result.
I want to keep using lmdb datasets, which means modifying some Caffe source code. The changes are somewhat involved, but they are essentially a one-time effort, so they are worth making.
Reference: the blog post 用 caffe 做回归 (上) ("Regression with Caffe, part 1"). That post solves the problem almost entirely; I have consolidated its modifications here. In total, the following files need to be changed:
- tools/convert_imageset.cpp: generates the lmdb dataset. Copy this file, rename the copy to tools/convert_imageset_multi.cpp, and edit the copy.
- include/caffe/util/io.hpp: declares the new function used by the tool above
- src/caffe/util/io.cpp: implements that function
- src/caffe/layers/data_layer.cpp: loads the lmdb dataset
- src/caffe/proto/caffe.proto: adds the label_num parameter
To be clear up front: I am using the BVLC version of Caffe (as of July 2018): https://github.com/BVLC/caffe
If your Caffe version differs, these files will likely not work as-is. The modified versions follow; //### marks the changed parts. The main idea is to turn the int label into a vector<float>:
1. tools/convert_imageset_multi.cpp:
```cpp
// This program converts a set of images to a lmdb/leveldb by storing them
// as Datum proto buffers.
// Usage:
//   convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME
//
// where ROOTFOLDER is the root folder that holds all the images, and LISTFILE
// should be a list of files as well as their labels, in the format as
//   subfolder1/file1.JPEG 7
//   ....

#include <algorithm>
#include <fstream>  // NOLINT(readability/streams)
#include <string>
#include <utility>
#include <vector>

#include "boost/scoped_ptr.hpp"
#include "gflags/gflags.h"
#include "glog/logging.h"

#include "caffe/proto/caffe.pb.h"
#include "caffe/util/db.hpp"
#include "caffe/util/format.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/rng.hpp"

#include <iostream>  //###
#include <boost/tokenizer.hpp>  //###

using namespace caffe;  // NOLINT(build/namespaces)
using std::pair;
using boost::scoped_ptr;

DEFINE_bool(gray, false,
    "When this option is on, treat images as grayscale ones");
DEFINE_bool(shuffle, false,
    "Randomly shuffle the order of images and their labels");
DEFINE_string(backend, "lmdb",
    "The backend {lmdb, leveldb} for storing the result");
DEFINE_int32(resize_width, 0, "Width images are resized to");
DEFINE_int32(resize_height, 0, "Height images are resized to");
DEFINE_bool(check_size, false,
    "When this option is on, check that all the datum have the same size");
DEFINE_bool(encoded, false,
    "When this option is on, the encoded image will be save in datum");
DEFINE_string(encode_type, "",
    "Optional: What type should we encode the image as ('png','jpg',...).");

int main(int argc, char** argv) {
#ifdef USE_OPENCV
  ::google::InitGoogleLogging(argv[0]);
  // Print output to stderr (while still logging)
  FLAGS_alsologtostderr = 1;

#ifndef GFLAGS_GFLAGS_H_
  namespace gflags = google;
#endif

  gflags::SetUsageMessage("Convert a set of images to the leveldb/lmdb\n"
        "format used as input for Caffe.\n"
        "Usage:\n"
        "    convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME\n"
        "The ImageNet dataset for the training demo is at\n"
        "    http://www.image-net.org/download-images\n");
  gflags::ParseCommandLineFlags(&argc, &argv, true);

  if (argc < 4) {
    gflags::ShowUsageWithFlagsRestrict(argv[0], "tools/convert_imageset");
    return 1;
  }

  const bool is_color = !FLAGS_gray;
  const bool check_size = FLAGS_check_size;
  const bool encoded = FLAGS_encoded;
  const string encode_type = FLAGS_encode_type;

  std::ifstream infile(argv[2]);
  // std::vector<std::pair<std::string, int> > lines;
  std::vector<std::pair<std::string, std::vector<float> > > lines;  //### int -> vector<float>
  std::string line;
  size_t pos;
  // int label; //###
  std::vector<float> labels;  //###
  while (std::getline(infile, line)) {
    //###
    // pos = line.find_last_of(' ');
    // label = atoi(line.substr(pos + 1).c_str());
    // lines.push_back(std::make_pair(line.substr(0, pos), label));

    //###
    std::vector<std::string> tokens;
    boost::char_separator<char> sep(" ");
    boost::tokenizer<boost::char_separator<char> > tok(line, sep);
    tokens.clear();
    std::copy(tok.begin(), tok.end(), std::back_inserter(tokens));

    for (int i = 1; i < tokens.size(); ++i)
      labels.push_back(atof(tokens.at(i).c_str()));
    lines.push_back(std::make_pair(tokens.at(0), labels));
    labels.clear();
  }
  if (FLAGS_shuffle) {
    // randomly shuffle data
    LOG(INFO) << "Shuffling data";
    shuffle(lines.begin(), lines.end());
  }
  LOG(INFO) << "A total of " << lines.size() << " images.";

  if (encode_type.size() && !encoded)
    LOG(INFO) << "encode_type specified, assuming encoded=true.";

  int resize_height = std::max<int>(0, FLAGS_resize_height);
  int resize_width = std::max<int>(0, FLAGS_resize_width);

  // Create new DB
  scoped_ptr<db::DB> db(db::GetDB(FLAGS_backend));
  db->Open(argv[3], db::NEW);
  scoped_ptr<db::Transaction> txn(db->NewTransaction());

  // Storing to db
  std::string root_folder(argv[1]);
  Datum datum;
  int count = 0;
  int data_size = 0;
  bool data_size_initialized = false;

  for (int line_id = 0; line_id < lines.size(); ++line_id) {
    bool status;

    std::string enc = encode_type;
    if (encoded && !enc.size()) {
      // Guess the encoding type from the file name
      string fn = lines[line_id].first;
      size_t p = fn.rfind('.');
      if ( p == fn.npos )
        LOG(WARNING) << "Failed to guess the encoding of '" << fn << "'";
      enc = fn.substr(p+1);
      std::transform(enc.begin(), enc.end(), enc.begin(), ::tolower);
    }
    status = ReadImageToDatum(root_folder + lines[line_id].first,  //### unchanged, but this now calls the new overload
        lines[line_id].second, resize_height, resize_width, is_color,
        enc, &datum);
    if (status == false) continue;
    if (check_size) {
      if (!data_size_initialized) {
        data_size = datum.channels() * datum.height() * datum.width();
        data_size_initialized = true;
      } else {
        const std::string& data = datum.data();
        CHECK_EQ(data.size(), data_size) << "Incorrect data field size "
            << data.size();
      }
    }
    // sequential
    string key_str = caffe::format_int(line_id, 8) + "_" + lines[line_id].first;

    // Put in db
    string out;
    CHECK(datum.SerializeToString(&out));
    txn->Put(key_str, out);

    if (++count % 1000 == 0) {
      // Commit db
      txn->Commit();
      txn.reset(db->NewTransaction());
      LOG(INFO) << "Processed " << count << " files.";
    }
  }
  // write the last batch
  if (count % 1000 != 0) {
    txn->Commit();
    LOG(INFO) << "Processed " << count << " files.";
  }
#else
  LOG(FATAL) << "This tool requires OpenCV; compile with USE_OPENCV.";
#endif  // USE_OPENCV
  return 0;
}
```
2. include/caffe/util/io.hpp (excerpt):
Find the ReadImageToDatum function and add a vector<float> overload below it (do not delete the original function):
```cpp
bool ReadImageToDatum(const string& filename, const int label,
    const int height, const int width, const bool is_color,
    const std::string & encoding, Datum* datum);

bool ReadImageToDatum(const string& filename, const vector<float> labels,  //###
    const int height, const int width, const bool is_color,
    const std::string & encoding, Datum* datum);
```
3. src/caffe/util/io.cpp (excerpt):
Add the implementation of the overload declared above. (The commented-out encoding block deals with image-encoding formats; since encoding defaults to '', it can simply be commented out.)
```cpp
bool ReadImageToDatum(const string& filename, const int label,
    const int height, const int width, const bool is_color,
    const std::string & encoding, Datum* datum) {
  cv::Mat cv_img = ReadImageToCVMat(filename, height, width, is_color);
  if (cv_img.data) {
    if (encoding.size()) {
      if ( (cv_img.channels() == 3) == is_color && !height && !width &&
          matchExt(filename, encoding) )
        return ReadFileToDatum(filename, label, datum);
      std::vector<uchar> buf;
      cv::imencode("."+encoding, cv_img, buf);
      datum->set_data(std::string(reinterpret_cast<char*>(&buf[0]),
                      buf.size()));
      datum->set_label(label);
      datum->set_encoded(true);
      return true;
    }
    CVMatToDatum(cv_img, datum);
    datum->set_label(label);
    return true;
  } else {
    return false;
  }
}

bool ReadImageToDatum(const string& filename, const vector<float> labels,
    const int height, const int width, const bool is_color,
    const std::string & encoding, Datum* datum) {
  cv::Mat cv_img = ReadImageToCVMat(filename, height, width, is_color);
  if (cv_img.data) {
    // if (encoding.size()) {
    //   if ( (cv_img.channels() == 3) == is_color && !height && !width &&
    //       matchExt(filename, encoding) )
    //     return ReadFileToDatum(filename, label, datum);
    //   std::vector<uchar> buf;
    //   cv::imencode("."+encoding, cv_img, buf);
    //   datum->set_data(std::string(reinterpret_cast<char*>(&buf[0]),
    //                   buf.size()));
    //   datum->set_label(label);
    //   datum->set_encoded(true);
    //   return true;
    // }
    CVMatToDatum(cv_img, datum);
    // datum->set_label(label);

    //###
    for (int i = 0; i < labels.size(); ++i)
      datum->add_float_data(labels.at(i));

    return true;
  } else {
    return false;
  }
}
```
4. src/caffe/layers/data_layer.cpp:
```cpp
#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
#endif  // USE_OPENCV
#include <stdint.h>

#include <vector>

#include "caffe/data_transformer.hpp"
#include "caffe/layers/data_layer.hpp"
#include "caffe/util/benchmark.hpp"

namespace caffe {

template <typename Dtype>
DataLayer<Dtype>::DataLayer(const LayerParameter& param)
  : BasePrefetchingDataLayer<Dtype>(param),
    offset_() {
  db_.reset(db::GetDB(param.data_param().backend()));
  db_->Open(param.data_param().source(), db::READ);
  cursor_.reset(db_->NewCursor());
}

template <typename Dtype>
DataLayer<Dtype>::~DataLayer() {
  this->StopInternalThread();
}

template <typename Dtype>
void DataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const int batch_size = this->layer_param_.data_param().batch_size();
  // Read a data point, and use it to initialize the top blob.
  Datum datum;
  datum.ParseFromString(cursor_->value());

  // Use data_transformer to infer the expected blob shape from datum.
  vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
  this->transformed_data_.Reshape(top_shape);
  // Reshape top[0] and prefetch_data according to the batch_size.
  top_shape[0] = batch_size;
  top[0]->Reshape(top_shape);
  for (int i = 0; i < this->prefetch_.size(); ++i) {
    this->prefetch_[i]->data_.Reshape(top_shape);
  }
  LOG_IF(INFO, Caffe::root_solver())
      << "output data size: " << top[0]->num() << ","
      << top[0]->channels() << "," << top[0]->height() << ","
      << top[0]->width();

  //###
  // label
  // if (this->output_labels_) {
  //   vector<int> label_shape(1, batch_size);
  //   top[1]->Reshape(label_shape);
  //   for (int i = 0; i < this->prefetch_.size(); ++i) {
  //     this->prefetch_[i]->label_.Reshape(label_shape);
  //   }
  // }

  //###
  const int label_num = this->layer_param_.data_param().label_num();
  if (this->output_labels_) {
    vector<int> label_shape;
    label_shape.push_back(batch_size);
    label_shape.push_back(label_num);
    label_shape.push_back(1);
    label_shape.push_back(1);
    top[1]->Reshape(label_shape);
    for (int i = 0; i < this->prefetch_.size(); ++i) {
      this->prefetch_[i]->label_.Reshape(label_shape);
    }
  }
}

template <typename Dtype>
bool DataLayer<Dtype>::Skip() {
  int size = Caffe::solver_count();
  int rank = Caffe::solver_rank();
  bool keep = (offset_ % size) == rank ||
              // In test mode, only rank 0 runs, so avoid skipping
              this->layer_param_.phase() == TEST;
  return !keep;
}

template<typename Dtype>
void DataLayer<Dtype>::Next() {
  cursor_->Next();
  if (!cursor_->valid()) {
    LOG_IF(INFO, Caffe::root_solver())
        << "Restarting data prefetching from start.";
    cursor_->SeekToFirst();
  }
  offset_++;
}

// This function is called on prefetch thread
template<typename Dtype>
void DataLayer<Dtype>::load_batch(Batch<Dtype>* batch) {
  CPUTimer batch_timer;
  batch_timer.Start();
  double read_time = 0;
  double trans_time = 0;
  CPUTimer timer;
  CHECK(batch->data_.count());
  CHECK(this->transformed_data_.count());
  const int batch_size = this->layer_param_.data_param().batch_size();

  Datum datum;
  for (int item_id = 0; item_id < batch_size; ++item_id) {
    timer.Start();
    while (Skip()) {
      Next();
    }
    datum.ParseFromString(cursor_->value());
    read_time += timer.MicroSeconds();

    if (item_id == 0) {
      // Reshape according to the first datum of each batch
      // on single input batches allows for inputs of varying dimension.
      // Use data_transformer to infer the expected blob shape from datum.
      vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
      this->transformed_data_.Reshape(top_shape);
      // Reshape batch according to the batch_size.
      top_shape[0] = batch_size;
      batch->data_.Reshape(top_shape);
    }

    // Apply data transformations (mirror, scale, crop...)
    timer.Start();
    int offset = batch->data_.offset(item_id);
    Dtype* top_data = batch->data_.mutable_cpu_data();
    this->transformed_data_.set_cpu_data(top_data + offset);
    this->data_transformer_->Transform(datum, &(this->transformed_data_));

    //###
    // Copy label.
    // if (this->output_labels_) {
    //   Dtype* top_label = batch->label_.mutable_cpu_data();
    //   top_label[item_id] = datum.label();
    // }

    //###
    const int label_num = this->layer_param_.data_param().label_num();
    if (this->output_labels_) {
      Dtype* top_label = batch->label_.mutable_cpu_data();
      for (int i = 0; i < label_num; i++){
        top_label[item_id * label_num + i] = datum.float_data(i);  // read float labels
      }
    }

    trans_time += timer.MicroSeconds();
    Next();
  }
  timer.Stop();
  batch_timer.Stop();
  DLOG(INFO) << "Prefetch batch: " << batch_timer.MilliSeconds() << " ms.";
  DLOG(INFO) << "     Read time: " << read_time / 1000 << " ms.";
  DLOG(INFO) << "Transform time: " << trans_time / 1000 << " ms.";
}

INSTANTIATE_CLASS(DataLayer);
REGISTER_LAYER_CLASS(Data);

}  // namespace caffe
```
5. src/caffe/proto/caffe.proto (excerpt):
Add a label_num field to DataParameter:
```protobuf
message DataParameter {
  enum DB {
    LEVELDB = 0;
    LMDB = 1;
  }
  // Specify the data source.
  optional string source = 1;
  // Specify the batch size.
  optional uint32 batch_size = 4;
  // The rand_skip variable is for the data layer to skip a few data points
  // to avoid all asynchronous sgd clients to start at the same point. The skip
  // point would be set as rand_skip * rand(0,1). Note that rand_skip should not
  // be larger than the number of keys in the database.
  // DEPRECATED. Each solver accesses a different subset of the database.
  optional uint32 rand_skip = 7 [default = 0];
  optional DB backend = 8 [default = LEVELDB];
  // DEPRECATED. See TransformationParameter. For data pre-processing, we can do
  // simple scaling and subtracting the data mean, if provided. Note that the
  // mean subtraction is always carried out before scaling.
  optional float scale = 2 [default = 1];
  optional string mean_file = 3;
  // DEPRECATED. See TransformationParameter. Specify if we would like to randomly
  // crop an image.
  optional uint32 crop_size = 5 [default = 0];
  // DEPRECATED. See TransformationParameter. Specify if we want to randomly mirror
  // data.
  optional bool mirror = 6 [default = false];
  // Force the encoded image to have 3 color channels
  optional bool force_encoded_color = 9 [default = false];
  // Prefetch queue (Increase if data feeding bandwidth varies, within the
  // limit of device memory for GPU training)
  optional uint32 prefetch = 10 [default = 4];
  //### For multi-task training
  optional uint32 label_num = 11 [default = 1];
}
```
That completes the modifications. Rebuild Caffe and a new executable, build/tools/convert_imageset_multi, will be produced; it can read multi-label list files of the kind shown at the beginning.
How to invoke it will be covered in the next post.
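For reference, here is a rough sketch of how the new label_num parameter could appear in a training prototxt's Data layer (the layer name, lmdb path, and batch size are placeholders, not values from this post):

```prototxt
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "train_multilabel_lmdb"  # placeholder path to the lmdb
    batch_size: 64                   # placeholder batch size
    backend: LMDB
    label_num: 3                     # number of labels per image, e.g. age/gender/glasses
  }
}
```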