CAFFE Source Code Study Notes - data_layer

hqtgyj

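I. Introduction

When Caffe builds a CNN, the first layer is the data layer, so this note walks through the DataLayer family, which is just as large as the rest of the code base.
First, the class structure:
[Figure: image not preserved]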

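Layer: the base class of all layers.

BaseDataLayer: the base class of the data layers.

BasePrefetchingDataLayer: the prefetching layer. It reads several batches ahead of time to balance CPU-to-GPU transfer bandwidth against GPU compute speed. As the inheritance shows, this is where the multithreading machinery does most of its work.
In fact, multithreading in Caffe exists mainly to serve the GPU by preparing data for it.

BaseDataLayer is responsible for processing the data. It does two things:

1. Decides the final outputs of the data layer (the label output is optional).
2. Sets up preprocessing for the data layer (usually simple normalization such as subtracting the mean and multiplying by a scale factor).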
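
The preprocessing in item 2 amounts to an element-wise affine transform. Here is a minimal sketch, assuming a single scalar mean and a single scale factor. `normalize` is a hypothetical helper, not a Caffe function, and the real DataTransformer does more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Caffe): subtract a mean from every raw
// byte value, then multiply the result by a scale factor.
std::vector<float> normalize(const std::vector<std::uint8_t>& pixels,
                             float mean, float scale) {
  std::vector<float> out(pixels.size());
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    out[i] = (static_cast<float>(pixels[i]) - mean) * scale;
  }
  return out;
}
```

With a mean of 128 and a scale of 0.5, for example, the byte value 255 maps to 63.5.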

 

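DataLayer: the data layer, the first layer of the network. Its main job is to read training data from one of two database backends (LevelDB or LMDB). Both store records by default in a structure called Datum, and this layer reads each Datum into a blob.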

message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}
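
To make the field layout concrete, here is a sketch that uses a plain struct in place of the protobuf-generated class. `SimpleDatum` and `datum_to_floats` are hypothetical names. The sketch assumes an unencoded datum whose `data` field holds exactly channels x height x width bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the protobuf-generated Datum class.
struct SimpleDatum {
  int channels = 0;
  int height = 0;
  int width = 0;
  std::string data;  // raw bytes, one per pixel value
  int label = 0;
};

// Copy the byte payload into floats after checking that its size matches the
// declared dimensions. Returns an empty vector on a size mismatch.
std::vector<float> datum_to_floats(const SimpleDatum& d) {
  const std::size_t expected =
      static_cast<std::size_t>(d.channels) * d.height * d.width;
  if (d.data.size() != expected) return {};
  std::vector<float> out(expected);
  for (std::size_t i = 0; i < expected; ++i) {
    // Go through unsigned char so that bytes above 127 stay positive.
    out[i] = static_cast<float>(static_cast<unsigned char>(d.data[i]));
  }
  return out;
}
```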
 

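The remaining layers handle other storage formats:
1. HDF5: covers both reading data from disk and writing data to disk.
2. ImageDataLayer: reads image files directly.
3. MemoryDataLayer: reads from memory, which intuitively should be fast.
4. WindowDataLayer: reads windows cropped from images; presumably OpenCV-based.
5. DummyDataLayer: produces data generated by a Filler.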

II. The base_data_layer files

base_data_layer.hpp and base_data_layer.cpp define three classes: BaseDataLayer, Batch, and BasePrefetchingDataLayer.
1. Batch class
A Batch is simply data plus labels, each stored as a Blob.

template <typename Dtype>
class Batch {
 public:
  Blob<Dtype> data_, label_;
};

 

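2. BaseDataLayer class
This is the base class of the data layers. It implements only two member functions itself:
a. Constructor
Because it inherits from Layer, it first constructs the Layer base,
then initializes its member transform_param_ from param.transform_param(), in preparation for reshaping or preprocessing the data.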

template <typename Dtype>
BaseDataLayer<Dtype>::BaseDataLayer(const LayerParameter& param)
    : Layer<Dtype>(param),
      transform_param_(param.transform_param()) {
}
 

b. LayerSetUp function
Initializes the data layer. The behavior depends on the number of top blobs: if there is only one, the layer outputs data only and no labels.

template <typename Dtype>
void BaseDataLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  if (top.size() == 1) {
    output_labels_ = false;
  } else {
    output_labels_ = true;
  }
  // Create the DataTransformer instance that performs the preprocessing.
  data_transformer_.reset(
      new DataTransformer<Dtype>(transform_param_, this->phase_));
  data_transformer_->InitRand();
  // The subclasses should setup the size of bottom and top.
  // The layer-specific initialization happens in DataLayerSetUp, a virtual
  // hook that each concrete data layer overrides (in upstream Caffe it has an
  // empty default body rather than being pure virtual).
  DataLayerSetUp(bottom, top);
}
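
The control flow here is the template-method pattern: the base class fixes the order of the setup steps and defers the layer-specific part to a virtual hook. Below is a stripped-down sketch. The class names are hypothetical and none of Caffe's real types are used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of BaseDataLayer: LayerSetUp runs the shared steps,
// then calls the virtual hook that concrete layers override.
class MiniBaseDataLayer {
 public:
  virtual ~MiniBaseDataLayer() {}

  void LayerSetUp(std::size_t num_top) {
    output_labels_ = (num_top != 1);  // a single top means data only
    log_.push_back("base setup");
    DataLayerSetUp();                 // layer-specific part
  }

  bool output_labels() const { return output_labels_; }
  const std::vector<std::string>& log() const { return log_; }

 protected:
  virtual void DataLayerSetUp() {}  // empty default in this sketch
  std::vector<std::string> log_;

 private:
  bool output_labels_ = false;
};

// Hypothetical concrete layer that supplies its own setup step.
class MiniDataLayer : public MiniBaseDataLayer {
 protected:
  void DataLayerSetUp() override { log_.push_back("open database"); }
};
```

Calling `LayerSetUp(2)` on a `MiniDataLayer` records the base step first and the derived step second, the same order as in the real LayerSetUp.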
 

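3. BasePrefetchingDataLayer class
This class inherits from both InternalThread and BaseDataLayer, so prefetching runs on a separate thread.

a. Purpose
GPU compute speed and bandwidth differ greatly from the CPU's, so several batches need to be fetched in advance while the GPU is busy computing. This class implements that.
b. Member variables
As the members show, batch-level prefetching uses two blocking queues.


vector<shared_ptr<Batch<Dtype> > > prefetch_: container holding the prefetched batches.

BlockingQueue<Batch<Dtype>*> prefetch_free_: the producer-side queue, holding empty batches waiting to be filled by the prefetch thread.

BlockingQueue<Batch<Dtype>*> prefetch_full_: the consumer-side queue, holding filled batches waiting to be consumed by Forward.

Batch<Dtype>* prefetch_current_: pointer to the batch currently in use.

Blob<Dtype> transformed_data_: note that the members above all work at batch level, whereas this one is a single Blob.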
 

template <typename Dtype>
class BasePrefetchingDataLayer :
    public BaseDataLayer<Dtype>, public InternalThread {
 public:
  explicit BasePrefetchingDataLayer(const LayerParameter& param);
  // LayerSetUp: implements common data layer setup functionality, and calls
  // DataLayerSetUp to do special data layer setup for individual layer types.
  // This method may not be overridden.
  void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

 protected:
  virtual void InternalThreadEntry();  // entry point of the prefetch thread
  virtual void load_batch(Batch<Dtype>* batch) = 0;  // pure virtual; each concrete data layer implements it

  vector<shared_ptr<Batch<Dtype> > > prefetch_;  // the prefetched batches
  BlockingQueue<Batch<Dtype>*> prefetch_free_;
  BlockingQueue<Batch<Dtype>*> prefetch_full_;
  Batch<Dtype>* prefetch_current_;  // pointer to the current batch

  Blob<Dtype> transformed_data_;  // holds the transformed data
};
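
Both queues rely on a blocking pop: a caller that finds the queue empty sleeps until another thread pushes an element. Caffe's BlockingQueue is built on boost. The sketch below uses the C++11 standard library instead and is only an illustration of the idea, not Caffe's code; `SimpleBlockingQueue` is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical minimal blocking queue built on std::mutex and
// std::condition_variable.
template <typename T>
class SimpleBlockingQueue {
 public:
  void push(const T& value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(value);
    }
    not_empty_.notify_one();  // wake one waiting consumer, if any
  }

  // Blocks until an element is available, then removes and returns it.
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    T value = queue_.front();
    queue_.pop();
    return value;
  }

 private:
  std::mutex mutex_;
  std::condition_variable not_empty_;
  std::queue<T> queue_;
};
```

Elements come out in the order they were pushed. The predicate passed to `wait` guards against spurious wakeups.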

 

c. Constructor

template <typename Dtype>
BasePrefetchingDataLayer<Dtype>::BasePrefetchingDataLayer(
    const LayerParameter& param)
    : BaseDataLayer<Dtype>(param),
      prefetch_(param.data_param().prefetch()),
      prefetch_free_(), prefetch_full_(), prefetch_current_() {  // default-initialize the blocking queues
  for (int i = 0; i < prefetch_.size(); ++i) {
    prefetch_[i].reset(new Batch<Dtype>());  // allocate one Batch per prefetch slot
    prefetch_free_.push(prefetch_[i].get());  // every new batch starts in the free (producer) queue
  }
}

 

d. LayerSetUp function
After the data structures are initialized, this starts the prefetch thread.

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
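  BaseDataLayer<Dtype>::LayerSetUp(bottom, top);  // call the parent class's setup

  // Upstream comment: before starting the prefetch thread, make the cpu_data
  // and gpu_data calls here so that the prefetch thread does not make
  // simultaneous cudaMalloc calls while the main thread is running. On some
  // GPUs this seems to cause failures otherwise.
  for (int i = 0; i < prefetch_.size(); ++i) {
    prefetch_[i]->data_.mutable_cpu_data();
    if (this->output_labels_) {
      prefetch_[i]->label_.mutable_cpu_data();
    }
  }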
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {
    for (int i = 0; i < prefetch_.size(); ++i) {
      prefetch_[i]->data_.mutable_gpu_data();
      if (this->output_labels_) {
        prefetch_[i]->label_.mutable_gpu_data();
      }
    }
  }
#endif
  DLOG(INFO) << "Initializing prefetch";
  this->data_transformer_->InitRand();  // seed the transformer's random number generator
  StartInternalThread();  // start the prefetch thread; startup carries over the global state and creates the boost::thread
  DLOG(INFO) << "Prefetch initialized.";
}

 

e. InternalThreadEntry()
As mentioned in the earlier notes on the InternalThread module, InternalThreadEntry() has no implementation there; each derived class provides its own.

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::InternalThreadEntry() {
#ifndef CPU_ONLY
  cudaStream_t stream;  // CUDA stream, created below in GPU mode
  if (Caffe::mode() == Caffe::GPU) {
    CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
  }
#endif

  try {
    while (!must_stop()) {
      Batch<Dtype>* batch = prefetch_free_.pop();  // take an empty batch from the free (producer) queue
      load_batch(batch);  // pure virtual; every derived class must implement it
#ifndef CPU_ONLY
      if (Caffe::mode() == Caffe::GPU) {
        // In GPU mode, push the data to the GPU asynchronously on the stream;
        // async_gpu_push was covered in the SyncedMemory notes.
        batch->data_.data().get()->async_gpu_push(stream);
        if (this->output_labels_) {
          batch->label_.data().get()->async_gpu_push(stream);
        }
        CUDA_CHECK(cudaStreamSynchronize(stream));
      }
#endif
      prefetch_full_.push(batch);  // hand the filled batch to the full (consumer) queue
    }
  } catch (boost::thread_interrupted&) {
    // Interrupted exception is expected on shutdown
  }
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {
    CUDA_CHECK(cudaStreamDestroy(stream));  // destroy the stream
  }
#endif
}
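
The loop above is one half of a producer/consumer rotation over a fixed set of buffers. Here is a CPU-only sketch of that rotation using std::thread. `MiniQueue`, `MiniBatch` and `run_prefetch` are hypothetical names, and assigning an integer stands in for load_batch. Unlike the real layer, the sketch stops after a fixed number of batches rather than on must_stop(), and the consumer recycles each buffer immediately.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue (hypothetical stand-in for Caffe's BlockingQueue).
template <typename T>
class MiniQueue {
 public:
  void push(T v) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push(v);
    }
    cv_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    T v = q_.front();
    q_.pop();
    return v;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};

struct MiniBatch { int value = 0; };

// Runs a producer thread that fills `total` batches while the calling thread
// consumes them, rotating through only `num_buffers` batch objects.
inline std::vector<int> run_prefetch(int num_buffers, int total) {
  std::vector<MiniBatch> buffers(num_buffers);
  MiniQueue<MiniBatch*> free_q, full_q;
  for (MiniBatch& b : buffers) free_q.push(&b);

  std::thread producer([&] {
    for (int i = 0; i < total; ++i) {
      MiniBatch* b = free_q.pop();  // wait for an empty buffer
      b->value = i;                 // stands in for load_batch(b)
      full_q.push(b);               // hand it to the consumer
    }
  });

  std::vector<int> seen;
  for (int i = 0; i < total; ++i) {
    MiniBatch* b = full_q.pop();    // wait for a filled buffer
    seen.push_back(b->value);
    free_q.push(b);                 // recycle it
  }
  producer.join();
  return seen;
}
```

Because a buffer is refilled only after the consumer has returned it to the free queue, the producer can never overwrite a batch that is still being read. Building the sketch may require linking with the platform's thread library (for example `-pthread` with GCC).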

 

f. Forward_cpu() in the data layer

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  if (prefetch_current_) {
    prefetch_free_.push(prefetch_current_);
  }
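  // Pop a filled batch from the consumer queue; a condition variable inside
  // the queue synchronizes this with the prefetch thread.
  prefetch_current_ = prefetch_full_.pop("Waiting for data");
  // Reshape top to match the batch.
  top[0]->ReshapeLike(prefetch_current_->data_);
  top[0]->set_cpu_data(prefetch_current_->data_.mutable_cpu_data());  // point top at the batch's data (no copy)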
  if (this->output_labels_) {
    // Reshape to loaded labels.
    top[1]->ReshapeLike(prefetch_current_->label_);
    top[1]->set_cpu_data(prefetch_current_->label_.mutable_cpu_data());
  }
}
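
One detail is easy to miss: the batch handed to the network stays checked out until the next forward pass, and only then goes back to the free queue. Below is a single-threaded model of that hand-off. `Buf` and `MiniConsumer` are hypothetical names, and plain std::queue objects replace the blocking queues, so the sketch asserts instead of waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

struct Buf { int id; };

// Hypothetical single-threaded model of the consumer side of Forward_cpu.
class MiniConsumer {
 public:
  MiniConsumer(std::queue<Buf*>* free_q, std::queue<Buf*>* full_q)
      : free_(free_q), full_(full_q), current_(nullptr) {}

  // Returns the batch now exposed to the network. Precondition: the full
  // queue is not empty (the real layer would block here instead).
  Buf* Forward() {
    assert(!full_->empty());
    if (current_ != nullptr) {
      free_->push(current_);  // the previous batch is no longer in use
    }
    current_ = full_->front();
    full_->pop();
    return current_;
  }

 private:
  std::queue<Buf*>* free_;
  std::queue<Buf*>* full_;
  Buf* current_;
};
```

After the first `Forward()` the free queue is still empty. The first batch reappears there only once the second `Forward()` has been called.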

 

III. data_layer
This layer inherits from the prefetching layer.
The header is as follows:

template <typename Dtype>
class DataLayer : public BasePrefetchingDataLayer<Dtype> {
 public:
  explicit DataLayer(const LayerParameter& param);
  virtual ~DataLayer();
  virtual void DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // DataLayer uses DataReader instead for sharing for parallelism
  // (The version I am reading has no DataReader.)
  virtual inline bool ShareInParallel() const { return false; }  // whether the layer is shared across solvers in parallel training
  virtual inline const char* type() const { return "Data"; }
  virtual inline int ExactNumBottomBlobs() const { return 0; }
  virtual inline int MinTopBlobs() const { return 1; }
  virtual inline int MaxTopBlobs() const { return 2; }

 protected:
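  void Next();  // advance the cursor
  bool Skip();  // skip records that belong to other solvers
  virtual void load_batch(Batch<Dtype>* batch);  // read image data from the database into a batch

  // Earlier versions wrapped the three members below in the DataReader class; it appears to be gone now.
  shared_ptr<db::DB> db_;  // the database
  shared_ptr<db::Cursor> cursor_;  // cursor used to walk the database
  uint64_t offset_;  // running count of records stepped over; Skip() uses it to split records among solvers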
};

 

The implementation:

template <typename Dtype>
DataLayer<Dtype>::DataLayer(const LayerParameter& param)
  : BasePrefetchingDataLayer<Dtype>(param),
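    offset_() {  // the base-class constructor sets up the prefetch queues; the thread itself is started in LayerSetUp
  db_.reset(db::GetDB(param.data_param().backend()));  // pick the database type from the protobuf parameter
  db_->Open(param.data_param().source(), db::READ);  // open the database
  cursor_.reset(db_->NewCursor());  // create the cursor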
}

template <typename Dtype>
DataLayer<Dtype>::~DataLayer() {
  this->StopInternalThread();  // the destructor stops the prefetch thread
}

template <typename Dtype>
void DataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const int batch_size = this->layer_param_.data_param().batch_size();  // batch size

  Datum datum;  // one image record
  datum.ParseFromString(cursor_->value());  // parse the record the cursor currently points at

  // Infer the top shape from the datum.
  vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
  this->transformed_data_.Reshape(top_shape);  // reshape to the inferred shape
  // Reshape top[0] and prefetch_data according to the batch_size.
  // The datum gives (channel, height, width); only batch_size is left to set:
  // top_shape[0] = batch_size
  // top_shape[1] = channel
  // top_shape[2] = height
  // top_shape[3] = width
  top_shape[0] = batch_size;
  top[0]->Reshape(top_shape);
  for (int i = 0; i < this->prefetch_.size(); ++i) {
    this->prefetch_[i]->data_.Reshape(top_shape);
  }  // set the shape of the prefetch buffers
  LOG_IF(INFO, Caffe::root_solver())
      << "output data size: " << top[0]->num() << ","
      << top[0]->channels() << "," << top[0]->height() << ","
      << top[0]->width();
  // label
  if (this->output_labels_) {
    vector<int> label_shape(1, batch_size);
    top[1]->Reshape(label_shape);
    for (int i = 0; i < this->prefetch_.size(); ++i) {
      this->prefetch_[i]->label_.Reshape(label_shape);
    }
  }
}

template <typename Dtype>
bool DataLayer<Dtype>::Skip() {
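  int size = Caffe::solver_count();  // number of solvers training in parallel
  int rank = Caffe::solver_rank();  // index of this solver
  bool keep = (offset_ % size) == rank ||
              // In test mode, only rank 0 runs, so avoid skipping
              this->layer_param_.phase() == TEST;
  return !keep;  // skip records assigned to other solvers, so each solver reads a disjoint subset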
}

template<typename Dtype>
void DataLayer<Dtype>::Next() {
  cursor_->Next();
  if (!cursor_->valid()) {
    LOG_IF(INFO, Caffe::root_solver())
        << "Restarting data prefetching from start.";
    cursor_->SeekToFirst();  // the cursor ran past the end, so wrap around
  }
  offset_++;  // advance the record counter
}

// This function is called on prefetch thread
template<typename Dtype>
void DataLayer<Dtype>::load_batch(Batch<Dtype>* batch) {  // load records from the database into the batch
  CPUTimer batch_timer;
  batch_timer.Start();
  double read_time = 0;
  double trans_time = 0;
  CPUTimer timer;
  CHECK(batch->data_.count());
  CHECK(this->transformed_data_.count());
  const int batch_size = this->layer_param_.data_param().batch_size();

  Datum datum;  // one image record
  for (int item_id = 0; item_id < batch_size; ++item_id) {
    timer.Start();
    while (Skip()) {
      Next();
    }
    datum.ParseFromString(cursor_->value());  // parse the image record read from the database
    read_time += timer.MicroSeconds();

    if (item_id == 0) {
      // Infer the shape from the first record of each batch.
      // A Blob has shape [batch_size, channels, height, width]; the last three
      // can all be inferred from the Datum.
      vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
      this->transformed_data_.Reshape(top_shape);
      top_shape[0] = batch_size;
      batch->data_.Reshape(top_shape);
    }  // the transformer is what stacks individual Datums into one Blob

    timer.Start();
    // Each Datum's position inside the Blob must be computed:
    // offset = Blob.offset(i), where i is the sample's index within the batch.
    int offset = batch->data_.offset(item_id);

    // The Blob's shape must be known beforehand, and mutable_cpu_data() must
    // be called so that SyncedMemory actually allocates the memory.
    Dtype* top_data = batch->data_.mutable_cpu_data();
    this->transformed_data_.set_cpu_data(top_data + offset);
    this->data_transformer_->Transform(datum, &(this->transformed_data_));

    if (this->output_labels_) {
      Dtype* top_label = batch->label_.mutable_cpu_data();
      top_label[item_id] = datum.label();
    }
    trans_time += timer.MicroSeconds();
    Next();
  }
  timer.Stop();
  batch_timer.Stop();
  DLOG(INFO) << "Prefetch batch: " << batch_timer.MilliSeconds() << " ms.";
  DLOG(INFO) << "     Read time: " << read_time / 1000 << " ms.";
  DLOG(INFO) << "Transform time: " << trans_time / 1000 << " ms.";
}
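
Skip() and Next() together give each parallel solver its own slice of the database: every solver walks all records, but keeps only those whose running offset modulo the solver count equals its rank. The sketch below models this with a vector in place of the database. `ShardedReader` is a hypothetical name, and the TEST-phase exception in Skip() is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of DataLayer's cursor handling: `records` plays the role
// of the database, `pos_` the cursor, `offset_` the running record counter.
class ShardedReader {
 public:
  ShardedReader(const std::vector<int>& records, int solver_count, int rank)
      : records_(records), size_(solver_count), rank_(rank),
        pos_(0), offset_(0) {
    assert(!records_.empty() && solver_count > 0);
  }

  // Returns the next record assigned to this solver.
  int Read() {
    while (Skip()) Next();
    int value = records_[pos_];
    Next();
    return value;
  }

 private:
  bool Skip() const {
    return offset_ % static_cast<std::uint64_t>(size_) !=
           static_cast<std::uint64_t>(rank_);
  }
  void Next() {
    ++pos_;
    if (pos_ == records_.size()) pos_ = 0;  // wrap around at the end
    ++offset_;
  }

  std::vector<int> records_;
  int size_;
  int rank_;
  std::size_t pos_;
  std::uint64_t offset_;
};
```

Because the offset keeps counting across wrap-arounds, a record count that is not a multiple of the solver count makes each solver see different records on successive passes.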
 

Don't forget to instantiate the class and register the layer:

INSTANTIATE_CLASS(DataLayer);  
REGISTER_LAYER_CLASS(Data);  
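
Registration works by having a static object insert a creator function into a name-to-creator map before main() runs, so the network can later build a layer from its type string. The sketch below shows only that idea. All names in it are hypothetical, and Caffe's real registry and macros differ in detail.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical layer interface for the sketch.
struct MiniLayer {
  virtual ~MiniLayer() {}
  virtual const char* type() const = 0;
};

typedef std::unique_ptr<MiniLayer> (*Creator)();

// A function-local static avoids static-initialization-order problems.
inline std::map<std::string, Creator>& Registry() {
  static std::map<std::string, Creator> registry;
  return registry;
}

// Registers a creator when an object of this type is constructed.
struct Registerer {
  Registerer(const std::string& name, Creator creator) {
    Registry()[name] = creator;
  }
};

struct MiniRegisteredDataLayer : MiniLayer {
  const char* type() const override { return "Data"; }
};

std::unique_ptr<MiniLayer> CreateMiniRegisteredDataLayer() {
  return std::unique_ptr<MiniLayer>(new MiniRegisteredDataLayer());
}

// Roughly what a registration macro would expand to in this sketch.
static Registerer g_register_data("Data", CreateMiniRegisteredDataLayer);

// Looks up a creator by name; returns a null pointer for unknown names.
inline std::unique_ptr<MiniLayer> CreateLayer(const std::string& name) {
  std::map<std::string, Creator>::const_iterator it = Registry().find(name);
  if (it == Registry().end()) return std::unique_ptr<MiniLayer>();
  return it->second();
}
```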

 

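IV. Summary
Most articles explaining this module include a DataReader component, and with it the whole data layer can be described as a two-level buffering system.
The first level reads individual image records from the database and stores batch_size of them in one batch.
The second level works in units of batches, using two blocking queues to rotate several batches through the prefetch_ container.
The following figure illustrates this:
[Figure: image not preserved]
However, I found that the current version no longer has the DataReader class and reads from the database directly. The overall flow is largely unchanged.

At the first level, each Datum from the database is placed into the batch in Blob layout; its position follows from the offset computation described in the Blob notes.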
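
The offset arithmetic can be sketched as follows, assuming a row-major buffer of shape [N, C, H, W]. `blob_offset` and `store_item` are hypothetical helpers that operate on a plain std::vector rather than on a Caffe Blob.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat index of element (n, c, h, w) in a buffer of shape [N, C, H, W]
// stored in row-major order. `shape` holds {N, C, H, W}.
inline std::size_t blob_offset(const std::vector<int>& shape,
                               int n, int c = 0, int h = 0, int w = 0) {
  assert(shape.size() == 4);
  return ((static_cast<std::size_t>(n) * shape[1] + c) * shape[2] + h) *
             shape[3] + w;
}

// Copies one item of C*H*W values into slot `item_id` of the batch buffer.
inline void store_item(std::vector<float>* batch,
                       const std::vector<int>& shape,
                       int item_id, const std::vector<float>& item) {
  const std::size_t start = blob_offset(shape, item_id);
  assert(start + item.size() <= batch->size());
  for (std::size_t i = 0; i < item.size(); ++i) {
    (*batch)[start + i] = item[i];
  }
}
```

For a shape of {2, 3, 2, 2}, item 1 starts at flat index 12, so the two items occupy disjoint halves of the 24-element buffer.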

Copyright notice: this is the blogger's original article and may not be reposted without the blogger's permission. https://blog.csdn.net/sinat_22336563/article/details/69524736