1. Basic layer definitions and parameters
To define a network in Caffe you first need to understand Caffe's basic layer interfaces. The five categories of layers are introduced below.
Vision Layers
Vision layers are declared in the header ./include/caffe/vision_layers.hpp. Their input and output are generally images: these layers pay attention to the 2D geometric structure of the image and process the input accordingly. In particular, most vision layers apply an operation to local regions of the input and produce corresponding regions of the output. Other layers, by contrast, ignore the spatial structure and simply treat the input as one large one-dimensional vector.
Convolution:
Layer type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
Parameters (ConvolutionParameter convolution_param)
Required:
num_output (c_o): the number of filters
// the number of convolution filters
kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
// the height and width of each filter
Strongly Recommended
weight_filler [default type: 'constant' value: 0]
Optional
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
// whether to learn additive bias terms
pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
// pad expands the input: the number of pixels added to each border of the input image
stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
// the step size (interval) at which the filters are applied to the input
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
// restricts the connectivity of the input: the input and output channels are divided into g groups, and the i-th output group is connected only to the i-th input group
Each filter produces one feature map.
Input size:
$n \times c_i \times h_i \times w_i$ (batch size × channels × height × width)
Output size:
$n \times c_o \times h_o \times w_o$, where $h_o = (h_i + 2 \cdot pad_h - kernel_h) / stride_h + 1$ and $w_o$ likewise.
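A minimal prototxt sketch of a convolution layer; the blob names and the CaffeNet-style values (num_output: 96, kernel_size: 11, stride: 4) are only illustrative assumptions:
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate multipliers for the weights and the biases
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 96     # number of filters
    kernel_size: 11    # size of each filter
    stride: 4          # step between filter applications
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}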
Pooling:
Layer type: Pooling
The pooling layer reduces the spatial size of the features by summarizing each neighboring region with a single value. The currently supported pooling methods are MAX, AVE (average), and STOCHASTIC.
Parameters (PoolingParameter pooling_param):
kernel_size (or kernel_h and kernel_w): the size of each pooling filter
pool [default MAX]: the pooling method (MAX, AVE, or STOCHASTIC)
pad [default 0]: the number of pixels to (implicitly) add to each side of the input
stride [default 1]: the interval at which to apply the filters to the input
Input size:
$n \times c \times h_i \times w_i$
Output size:
$n \times c \times h_o \times w_o$, where $h_o$ and $w_o$ are computed in the same way as for convolution.
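For reference, a typical max-pooling layer definition (the blob names are assumed):
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX          # MAX, AVE, or STOCHASTIC
    kernel_size: 3     # pool over 3x3 regions
    stride: 2          # step two pixels between pooling regions
  }
}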
Local Response Normalization (LRN):
Layer type: LRN
CPU Implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU Implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (LRNParameter lrn_param)
Optional
local_size [default 5]: the number of channels to sum over (for cross channel LRN) or the side length of the square region to sum over (for within channel LRN)
alpha [default 1]: the scaling parameter (see below)
beta [default 5]: the exponent (see below)
norm_region [default ACROSS_CHANNELS]: whether to sum over adjacent channels (ACROSS_CHANNELS) or nearby spatial locations (WITHIN_CHANNEL)
The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions. In ACROSS_CHANNELS mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape local_size x 1 x 1). In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size). Each input value is divided by
$(1 + (\alpha/n) \sum_i x_i^2)^\beta$, where $n$ is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).
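An example LRN layer definition; the AlexNet-style values alpha: 0.0001 and beta: 0.75 and the blob names are used only for illustration:
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}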
im2col:
im2col rearranges image patches into column vectors; Caffe uses it internally to turn convolution into a matrix multiplication, and you normally do not need to use it directly.
Loss Layers
Loss layers drive learning: the network is trained by minimizing a loss function, whose value is computed in the forward pass and whose gradient is propagated in the backward pass.
Softmax:
Layer type: SoftmaxWithLoss
This layer computes the multinomial logistic loss of the softmax of its inputs. Note the difference from the plain Softmax layer: SoftmaxWithLoss essentially folds the computation of $-\log o_y$ (the softmax output for the true class $y$) into a single layer, which is numerically more stable than a separate Softmax layer followed by a multinomial logistic loss.
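The two variants are configured as follows (a minimal sketch; the bottom blob names are assumptions):
# softmax probabilities only (no loss)
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
# softmax + multinomial logistic loss in one layer
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}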
Sum of Squares / Euclidean:
Layer type: EuclideanLoss
The Euclidean loss layer computes the sum of squared differences between its two input blobs, $\frac{1}{2N}\sum_{i=1}^{N} \lVert x_i^1 - x_i^2 \rVert_2^2$.
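A minimal example; the blob names "pred" and "label" are assumptions, mirroring the hinge loss example below:
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
}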
Hinge / Margin:
Layer type: HingeLoss
Options: L1 or L2 norm (the norm parameter)
Input: an n × c × h × w blob of predictions and an n × 1 × 1 × 1 blob of labels
Output: a 1 × 1 × 1 × 1 blob holding the computed loss
Examples:
# L1 Norm
layer {
name: "loss"
type: "HingeLoss"
bottom: "pred"
bottom: "label"
}
# L2 Norm
layer {
name: "loss"
type: "HingeLoss"
bottom: "pred"
bottom: "label"
top: "loss"
hinge_loss_param {
norm: L2
}
}
The hinge loss layer computes a one-vs-all hinge loss (L1) or squared hinge loss (L2).
Sigmoid Cross-Entropy:
Layer type: SigmoidCrossEntropyLoss
The forward CPU implementation computes the loss in a numerically stable way:
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  const int count = bottom[0]->count();
  const int num = bottom[0]->num();
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  Dtype loss = 0;
  for (int i = 0; i < count; ++i) {
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}
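In a network definition this layer is used like the other loss layers (a sketch; the blob names are assumptions):
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "pred"      # raw scores; the sigmoid is applied inside the layer
  bottom: "target"    # per-element targets in [0, 1]
  top: "loss"
}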
Infogain:
Layer type: InfogainLoss
The forward and backward CPU implementations:
template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* bottom_label = bottom[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom.size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = bottom[2]->cpu_data();
    }
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

INSTANTIATE_CLASS(InfogainLossLayer);
REGISTER_LAYER_CLASS(InfogainLoss);
}  // namespace caffe
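As the code shows, the infogain matrix H can either be loaded from a file or passed in as a third bottom blob. A minimal usage sketch; the matrix file name is a hypothetical placeholder:
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "prob"
  bottom: "label"
  top: "loss"
  infogain_loss_param {
    source: "infogain_matrix.binaryproto"   # H matrix file (hypothetical name)
  }
}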
Accuracy and Top-k:
Layer type: Accuracy
This layer scores the output as the accuracy of the predictions with respect to the target labels; it is only used for evaluation and has no backward step.
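A typical test-phase accuracy layer, here also reporting top-5 accuracy via the top_k parameter (blob names assumed):
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }   # only evaluated during testing
  accuracy_param {
    top_k: 5                # count a hit if the label is among the top 5 predictions
  }
}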
Activation / Neuron Layers
In general, activation / neuron layers are element-wise operators: they take one bottom blob and produce one top blob of the same size. In the layers below we therefore omit the input and output dimensions, since they are identical:
Input:
$n \times c \times h \times w$
Output:
$n \times c \times h \times w$
ReLU / Rectified-Linear and Leaky-ReLU:
Parameters (ReLUParameter relu_param)
Optional
negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
The ReLU layer is defined as follows: given an input value x, it computes $f(x) = x$ if $x > 0$ and $f(x) = \text{negative\_slope} \cdot x$ if $x \le 0$. When negative_slope is not set (i.e. left at 0), this is simply $\max(0, x)$. For details see my other short post:
http://blog.csdn.net/swfa1/article/details/45601789
Sigmoid:
Layer type: Sigmoid
Example:
layer {
name: "encode1neuron"
bottom: "encode1"
top: "encode1neuron"
type: "Sigmoid"
}
Formula: $\text{sigmoid}(x) = 1 / (1 + e^{-x})$
TanH / Hyperbolic Tangent:
Layer type: TanH
Example:
layer {
name: "layer"
bottom: "in"
top: "out"
type: "TanH"
}
Absolute Value:
Layer type: AbsVal
layer {
name: "layer"
bottom: "in"
top: "out"
type: "AbsVal"
}
Formula: $y = |x|$
Power:
Layer type: Power
Parameters (PowerParameter power_param):
power [default 1]
scale [default 1]
shift [default 0]
Example:
layer {
name: "layer"
bottom: "in"
top: "out"
type: "Power"
power_param {
power: 1
scale: 1
shift: 0
}
}
Formula: $y = (\text{shift} + \text{scale} \cdot x)^{\text{power}}$
BNLL:
Layer type: BNLL
layer {
name: "layer"
bottom: "in"
top: "out"
type: "BNLL"
}
Formula: the BNLL (binomial normal log likelihood) layer computes the output as $y = \log(1 + e^{x})$.
Data Layers
Common Layers
InnerProduct
Layer type: InnerProduct
Parameters (InnerProductParameter inner_product_param):
Required:
num_output (c_o): the number of filters
Strongly recommended: weight_filler [default type: 'constant' value: 0]
Optional:
bias_filler [default type: 'constant' value: 0]
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
Example:
layer {
name: "fc8"
type: "InnerProduct"
# learning rate and decay multipliers for the weights
param { lr_mult: 1 decay_mult: 1 }
# learning rate and decay multipliers for the biases
param { lr_mult: 2 decay_mult: 0 }
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
bottom: "fc7"
top: "fc8"
}
What it does:
The inner product layer (also known as the fully connected layer) treats the input as a single one-dimensional vector and produces its output as a vector as well, i.e. the height and width of the output blob are both 1.
After studying Caffe for a while I found that some of the layers above were not described in enough detail, so the Slice, ArgMax, and Eltwise layers are explained in more detail below.
Slice layer
Splits the input blob into several pieces along a given axis, so that the remaining computation can be carried out on each piece separately.
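A minimal sketch, assuming a bottom blob "data" is split along the channel axis at channel 3:
layer {
  name: "slice"
  type: "Slice"
  bottom: "data"
  top: "part1"      # channels 0-2
  top: "part2"      # remaining channels
  slice_param {
    axis: 1         # slice along the channel dimension
    slice_point: 3
  }
}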
ArgMaxLayer
Computes the index of the $K$ max values for each datum across all dimensions ($C \times H \times W$).
Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to true, output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximise.
NOTE: does not implement Backwards operation.
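For example, to turn class probabilities into a predicted label (blob names assumed):
layer {
  name: "argmax"
  type: "ArgMax"
  bottom: "prob"
  top: "argmax"
  argmax_param {
    top_k: 1            # index of the single largest value
    out_max_val: false  # set to true to also output the max values as (max_ind, max_val) pairs
  }
}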
Eltwise (element-wise)
Computes element-wise operations, such as product and sum, over multiple input blobs.
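A sketch that sums two equally shaped blobs (names assumed); the operation may also be PROD or MAX:
layer {
  name: "eltwise"
  type: "Eltwise"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  eltwise_param {
    operation: SUM
  }
}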
2. AlexNet network definition
3. How to add a new layer
Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.
Implement your layer in layers/your_layer.cpp.
SetUp for initialization: reading parameters, allocating buffers, etc.
Forward_cpu for the function your layer computes
Backward_cpu for its gradient
(Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.
Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.
Make your layer createable by adding it to layer_factory.cpp.
Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
The above is the answer from an expert on GitHub, and the steps are clear. To make it concrete, suppose we want to add a new vision layer named Aaa_Layer:
1. Open the hpp file that corresponds to the layer's category; here that is vision_layers.hpp. Add the declaration of the new layer yourself, or simply copy the relevant code of the convolution layer and change the class name and constructor name to Aaa_Layer. If you will not use the GPU, remove all the *_gpu declarations.
2. Implement the layer: write Aaa_Layer.cpp and add it to src/caffe/layers, implementing mainly SetUp, Forward_cpu, and Backward_cpu.
3. If a GPU implementation is needed, implement Forward_gpu and Backward_gpu in Aaa_Layer.cu.
4. Modify src/caffe/proto/caffe.proto: find LayerType, add Aaa, and update the next available ID; if the layer has parameters, add an AaaParameter message.
5. Add the corresponding code to src/caffe/layer_factory.cpp.
6. Write test_Aaa_layer.cpp in src/caffe/test and use include/caffe/test/test_gradient_check_util.hpp to check that the forward and backward passes are numerically correct.