Layer
`Layer` is the base class of all layers. Five families of layers derive from it:
- data_layer
- neuron_layer
- loss_layer
- common_layer
- vision_layer
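Each family has corresponding .hpp and .cpp files that declare and implement the class interfaces. The sections below go through the five families one by one.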
data_layer
First, look at the headers included by data_layers.hpp:
```cpp
#include "boost/scoped_ptr.hpp"
#include "hdf5.h"
#include "leveldb/db.h"
#include "lmdb.h"
// The four headers above are third-party dependencies, mostly for data storage formats
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/data_transformer.hpp"
#include "caffe/filler.hpp"
#include "caffe/internal_thread.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
```
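Clearly, data_layer mostly pulls in data-related headers. The official documentation describes data layers as the entry point for data into Caffe: they sit at the bottom of the network and support several formats. Five LayerType values belong to this family: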
DATA
MEMORY_DATA
HDF5_DATA
HDF5_OUTPUT
IMAGE_DATA
There are actually two more, `WINDOW_DATA` and `DUMMY_DATA`, which are used for testing and as reserved interfaces. They are not covered here.
DATA
```cpp
template <typename Dtype>
class BaseDataLayer : public Layer<Dtype>
template <typename Dtype>
class BasePrefetchingDataLayer : public BaseDataLayer<Dtype>, public InternalThread
template <typename Dtype>
class DataLayer : public BasePrefetchingDataLayer<Dtype>
```
Input layer for data stored in LevelDB or LMDB. Its parameters are `source`, `batch_size`, and optionally `rand_skip` and `backend`.
MEMORY_DATA
```cpp
template <typename Dtype>
class MemoryDataLayer : public BaseDataLayer<Dtype>
```
This type reads data directly from memory. Before use, `MemoryDataLayer::Reset` must be called to supply the data. Its parameters are `batch_size`, `channels`, `height`, and `width`.
HDF5_DATA
```cpp
template <typename Dtype>
class HDF5DataLayer : public Layer<Dtype>
```
Input layer for data in HDF5 format. Its parameters are `source` and `batch_size`.
HDF5_OUTPUT
```cpp
template <typename Dtype>
class HDF5OutputLayer : public Layer<Dtype>
```
Output layer that writes data in HDF5 format. Its parameter is `file_name`.
IMAGE_DATA
```cpp
template <typename Dtype>
class ImageDataLayer : public BasePrefetchingDataLayer<Dtype>
```
Input layer for image files. Its parameters are `source`, `batch_size`, and optionally `rand_skip`, `shuffle`, `new_height`, and `new_width`.
neuron_layer
First, look at the headers included by neuron_layers.hpp:
```cpp
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
```
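These layers also operate on data. neuron_layer implements a large number of activation functions, which are mostly element-wise operations, so the `bottom` and `top` blobs have the same size.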
Many of these activation functions have both CPU and GPU implementations in Caffe. They all derive from `NeuronLayer`:
```cpp
template <typename Dtype>
class NeuronLayer : public Layer<Dtype>
```
There is not much to dig into here for now. What is worth noting is the general configuration format (using ReLU as an example):
```
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
```
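The element-wise, same-size behaviour described above can be illustrated outside of Caffe. The following is a minimal standalone sketch, not Caffe's `NeuronLayer` code; `relu_forward` is a hypothetical helper name. Because bottom and top have the same size, the operation can overwrite its input, which is what the configuration requests by giving `bottom` and `top` the same blob name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): applies ReLU element-wise.
// `top` is resized to match `bottom`. Both arguments may refer to the same
// vector, which corresponds to in-place computation.
template <typename Dtype>
void relu_forward(const std::vector<Dtype>& bottom, std::vector<Dtype>& top) {
  top.resize(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    top[i] = std::max(bottom[i], Dtype(0));
  }
}
```

Calling `relu_forward(x, x)` on a `std::vector<float>` mimics the in-place case, while passing a separate output vector mimics a layer with distinct bottom and top blobs.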
loss_layer
Loss layers compute the network's error. The headers included by loss_layers.hpp:
```cpp
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"
```
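Note that it includes neuron_layers.hpp, presumably because some losses reuse functions declared there. The loss is usually placed in the last layer of the network. Caffe implements many loss functions, and they all derive from `LossLayer`: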
```cpp
template <typename Dtype>
class LossLayer : public Layer<Dtype>
```
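As one concrete instance of a loss, here is a standalone sketch of a Euclidean (sum-of-squares) loss, written without Caffe. The function name `euclidean_loss` and the flat-vector layout are assumptions made for illustration. The 1/(2N) scaling follows the usual definition of this loss, where N is the number of samples in the batch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Caffe code):
// loss = 1/(2N) * sum_i (pred_i - label_i)^2, with N = num_samples.
template <typename Dtype>
Dtype euclidean_loss(const std::vector<Dtype>& pred,
                     const std::vector<Dtype>& label,
                     std::size_t num_samples) {
  if (pred.size() != label.size() || num_samples == 0) {
    throw std::invalid_argument("size mismatch or empty batch");
  }
  Dtype sum = 0;
  for (std::size_t i = 0; i < pred.size(); ++i) {
    const Dtype diff = pred[i] - label[i];
    sum += diff * diff;
  }
  return sum / (Dtype(2) * static_cast<Dtype>(num_samples));
}
```

A loss like this takes two bottoms (predictions and targets) and produces a single scalar, which is why it naturally sits at the end of the network.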
common_layer
First, look at the headers included by common_layers.hpp:
```cpp
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/data_layers.hpp"
#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"
```
It uses the previously mentioned data_layers.hpp, loss_layers.hpp, and neuron_layers.hpp, which suggests that this family starts to carry more complex operations.
This family mainly handles the connections between vision layers. It declares nine types of common layer, some of which have GPU implementations:
InnerProductLayer
SplitLayer
FlattenLayer
ConcatLayer
SilenceLayer
- (Elementwise Operations) This group is what is often loosely referred to as activation layers.
EltwiseLayer
SoftmaxLayer
ArgMaxLayer
MVNLayer
InnerProductLayer
Commonly used as a fully connected layer. Configuration format:
```
layers {
  name: "fc8"
  type: INNER_PRODUCT
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}
```
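Conceptually, a fully connected layer computes y = Wx + b, where W has `num_output` rows. The sketch below shows this for a single input sample without using Caffe. The name `inner_product_forward` and the row-major weight layout are assumptions for illustration, not Caffe's actual implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Caffe code): y = W * x + b for one input sample.
// `weight` is row-major with shape num_output x dim; `bias` has num_output
// entries, so num_output is taken from bias.size().
template <typename Dtype>
std::vector<Dtype> inner_product_forward(const std::vector<Dtype>& x,
                                         const std::vector<Dtype>& weight,
                                         const std::vector<Dtype>& bias) {
  const std::size_t dim = x.size();
  const std::size_t num_output = bias.size();
  if (weight.size() != num_output * dim) {
    throw std::invalid_argument("weight must have num_output * dim entries");
  }
  std::vector<Dtype> y(num_output);
  for (std::size_t o = 0; o < num_output; ++o) {
    Dtype acc = bias[o];
    for (std::size_t d = 0; d < dim; ++d) {
      acc += weight[o * dim + d] * x[d];
    }
    y[o] = acc;
  }
  return y;
}
```

In the configuration above, `num_output: 1000` would correspond to a weight matrix with 1000 rows and a bias vector with 1000 entries.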
SplitLayer
Used when one input blob has to feed multiple outputs (it operates on blobs).
FlattenLayer
Reshapes an n * c * h * w blob into vector form, n * (c * h * w) * 1 * 1.
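The shape arithmetic is simple enough to write down directly. This is a small standalone sketch; `flatten_shape` is a hypothetical helper, not a Caffe function.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not Caffe code): shape after flattening an
// n x c x h x w blob. Only the shape changes; the element order stays the same.
inline std::array<std::size_t, 4> flatten_shape(
    const std::array<std::size_t, 4>& s) {
  std::array<std::size_t, 4> out = {{s[0], s[1] * s[2] * s[3], 1, 1}};
  return out;
}
```

For example, a batch of 64 images with 3 channels of 28 x 28 pixels becomes 64 vectors of 3 * 28 * 28 = 2352 values each.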
ConcatLayer
Used when multiple inputs are merged into a single output.
```
layers {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: CONCAT
  concat_param {
    concat_dim: 1
  }
}
```
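With `concat_dim: 1` the blobs are joined along the channel axis. The following standalone sketch shows what that means for data stored in N x C x H x W order. `concat_channels` and its parameter names are hypothetical and only illustrate the memory layout. It is not Caffe's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Caffe code): concatenates two blobs stored in
// N x C x H x W order along the channel axis.
// `a` has shape n x ca x spatial and `b` has shape n x cb x spatial,
// where spatial = h * w.
template <typename Dtype>
std::vector<Dtype> concat_channels(const std::vector<Dtype>& a, std::size_t ca,
                                   const std::vector<Dtype>& b, std::size_t cb,
                                   std::size_t n, std::size_t spatial) {
  std::vector<Dtype> out;
  out.reserve(a.size() + b.size());
  for (std::size_t i = 0; i < n; ++i) {
    // For each sample, copy all channels of `a`, then all channels of `b`.
    out.insert(out.end(), a.begin() + i * ca * spatial,
               a.begin() + (i + 1) * ca * spatial);
    out.insert(out.end(), b.begin() + i * cb * spatial,
               b.begin() + (i + 1) * cb * spatial);
  }
  return out;
}
```

The result has shape n x (ca + cb) x h x w, so the inputs must agree in every dimension except the channel count.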
SilenceLayer
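Consumes its bottom blobs without producing any top blob. It is useful for suppressing outputs that would otherwise be left unused.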
(Elementwise Operations)
`EltwiseLayer`, `SoftmaxLayer`, `ArgMaxLayer`, `MVNLayer`
vision_layer
Its header includes all of the headers above, which suggests that it contains the most complex operations.
```cpp
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/common_layers.hpp"
#include "caffe/data_layers.hpp"
#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/neuron_layers.hpp"
#include "caffe/proto/caffe.pb.h"
```
It mainly implements the convolution and pooling operations. The main classes are:
```cpp
template <typename Dtype>
class ConvolutionLayer : public Layer<Dtype>
template <typename Dtype>
class Im2colLayer : public Layer<Dtype>
template <typename Dtype>
class LRNLayer : public Layer<Dtype>
template <typename Dtype>
class PoolingLayer : public Layer<Dtype>
```
ConvolutionLayer
The most commonly used convolution operation. Configuration format:
```
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1          # learning rate multiplier for the filters
  blobs_lr: 2          # learning rate multiplier for the biases
  weight_decay: 1      # weight decay multiplier for the filters
  weight_decay: 0      # weight decay multiplier for the biases
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
```
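The `kernel_size` and `stride` values determine the spatial size of the output. A minimal sketch of the standard formula follows; `conv_output_size` is a hypothetical helper, and the input size used in the example afterwards is an assumption, since the configuration does not state it.

```cpp
#include <cstddef>

// Hypothetical helper (not Caffe code): spatial output size of a convolution,
// out = floor((in + 2 * pad - kernel) / stride) + 1.
// Assumes kernel <= in + 2 * pad and stride > 0.
inline std::size_t conv_output_size(std::size_t in, std::size_t kernel,
                                    std::size_t stride, std::size_t pad = 0) {
  return (in + 2 * pad - kernel) / stride + 1;
}
```

Assuming a 227 x 227 input, `conv_output_size(227, 11, 4)` gives 55, so this layer would produce 96 feature maps of 55 x 55.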
Im2colLayer
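Similar to im2col in MATLAB, i.e. an image-to-column transformation. After the transformation, convolution can be computed conveniently as a matrix multiplication.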
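To make the transformation concrete, here is a deliberately simplified standalone sketch: one single-channel image, stride 1, no padding. `im2col_simple` is a hypothetical name, and the real Caffe routine handles channels, stride, and padding, which are left out here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Caffe code): im2col for one single-channel image,
// stride 1, no padding. Returns a row-major (k*k) x (out_h*out_w) matrix in
// which each column holds one k x k patch of the image.
// Assumes k <= height and k <= width.
template <typename Dtype>
std::vector<Dtype> im2col_simple(const std::vector<Dtype>& img,
                                 std::size_t height, std::size_t width,
                                 std::size_t k) {
  const std::size_t out_h = height - k + 1;
  const std::size_t out_w = width - k + 1;
  std::vector<Dtype> col(k * k * out_h * out_w);
  for (std::size_t ky = 0; ky < k; ++ky) {
    for (std::size_t kx = 0; kx < k; ++kx) {
      const std::size_t row = ky * k + kx;  // position inside the patch
      for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
          col[(row * out_h + y) * out_w + x] =
              img[(y + ky) * width + (x + kx)];
        }
      }
    }
  }
  return col;
}
```

Multiplying a filter, flattened into a 1 x (k*k) row, by this matrix yields all output positions of that filter at once, which is why the layout is convenient for convolution.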
LRNLayer
Short for local response normalization layer. It is described in detail in the paper ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever and Hinton).
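For reference, the normalization as defined in that paper can be written as follows, where $a^i_{x,y}$ is the activation of kernel $i$ at position $(x, y)$, $N$ is the number of kernels in the layer, and $k$, $n$, $\alpha$, $\beta$ are hyperparameters. Caffe's own parameterization may differ in details, such as how $\alpha$ is scaled, so treat this as the paper's definition rather than a description of the Caffe code.

```latex
b^{i}_{x,y} = \frac{a^{i}_{x,y}}
{\left( k + \alpha \sum_{j=\max(0,\, i - n/2)}^{\min(N-1,\, i + n/2)}
\left( a^{j}_{x,y} \right)^{2} \right)^{\beta}}
```

The sum runs over $n$ neighbouring kernel maps at the same spatial position, so each response is damped by the activity of its neighbours across channels.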
PoolingLayer
The pooling operation. Configuration format:
```
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
```
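The following standalone sketch shows max pooling over a single-channel image without padding. `max_pool` is a hypothetical helper rather than Caffe's implementation. It assumes the output size is computed with ceiling division, so the last window may be clipped at the image border, and it requires `stride <= kernel` so that every window is non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not Caffe code): max pooling over one single-channel
// image, no padding. Output size is ceil((in - kernel) / stride) + 1.
template <typename Dtype>
std::vector<Dtype> max_pool(const std::vector<Dtype>& img,
                            std::size_t height, std::size_t width,
                            std::size_t kernel, std::size_t stride,
                            std::size_t* out_h, std::size_t* out_w) {
  if (stride == 0 || stride > kernel || kernel > height || kernel > width) {
    throw std::invalid_argument("requires 0 < stride <= kernel <= image size");
  }
  *out_h = (height - kernel + stride - 1) / stride + 1;
  *out_w = (width - kernel + stride - 1) / stride + 1;
  std::vector<Dtype> out((*out_h) * (*out_w));
  for (std::size_t py = 0; py < *out_h; ++py) {
    for (std::size_t px = 0; px < *out_w; ++px) {
      const std::size_t y0 = py * stride;
      const std::size_t x0 = px * stride;
      const std::size_t y1 = std::min(y0 + kernel, height);  // clip at border
      const std::size_t x1 = std::min(x0 + kernel, width);
      Dtype best = img[y0 * width + x0];
      for (std::size_t y = y0; y < y1; ++y) {
        for (std::size_t x = x0; x < x1; ++x) {
          best = std::max(best, img[y * width + x]);
        }
      }
      out[py * (*out_w) + px] = best;
    }
  }
  return out;
}
```

With the values from the configuration above (`kernel_size: 3`, `stride: 2`) and an assumed 55 x 55 input, this formula gives a 27 x 27 output.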