1. Basic layer definitions and parameters
To define a network in Caffe you first need to understand Caffe's basic layer interfaces. The five categories of layers are introduced below.
Vision Layers
Vision layers are declared in the header ./include/caffe/vision_layers.hpp. Their input and output are generally images: these layers pay attention to the 2D geometric structure of the image and process the input accordingly. In particular, most vision layers apply an operation to local regions of the input and produce corresponding regions of the output. Other layers, by contrast, ignore the spatial structure and simply treat the input as one large one-dimensional vector.
Convolution:
Layer type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
Parameters (ConvolutionParameter convolution_param)
Required:
num_output (c_o): the number of filters
// the number of convolution filters
kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
// the height and width of each filter
Strongly Recommended
weight_filler [default type: 'constant' value: 0]
Optional
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
// whether to learn additive bias terms
pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
// pad expands the input: the number of pixels added to each border of the input image
stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
// the step size (interval) at which the filters are applied to the input
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
// restricts the connectivity of the input: the input and output channels are divided into g groups, and the i-th output group is connected only to the i-th input group
Each filter produces one feature map.
Input size:
$n \times c_i \times h_i \times w_i$ (batch size × channels × height × width)
Output size:
$n \times c_o \times h_o \times w_o$, where $h_o = (h_i + 2 \cdot pad_h - kernel_h) / stride_h + 1$ and $w_o$ likewise.
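A minimal prototxt sketch of a convolution layer; the blob names and the CaffeNet-style values (num_output: 96, kernel_size: 11, stride: 4) are only illustrative assumptions:
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate multipliers for the weights and the biases
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 96     # number of filters
    kernel_size: 11    # size of each filter
    stride: 4          # step between filter applications
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}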
Pooling:
Layer type: Pooling
The pooling layer reduces the spatial size of the features by summarizing each neighboring region with a single value. The currently supported pooling methods are MAX, AVE (average), and STOCHASTIC.
Parameters (PoolingParameter pooling_param):
kernel_size (or kernel_h and kernel_w): the size of each pooling filter
pool [default MAX]: the pooling method (MAX, AVE, or STOCHASTIC)
pad [default 0]: the number of pixels to (implicitly) add to each side of the input
stride [default 1]: the interval at which to apply the filters to the input
Input size:
$n \times c \times h_i \times w_i$
Output size:
$n \times c \times h_o \times w_o$, where $h_o$ and $w_o$ are computed in the same way as for convolution.
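For reference, a typical max-pooling layer definition (the blob names are assumed):
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX          # MAX, AVE, or STOCHASTIC
    kernel_size: 3     # pool over 3x3 regions
    stride: 2          # step two pixels between pooling regions
  }
}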
Local Response Normalization (LRN):
Layer type: LRN
CPU Implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU Implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (LRNParameter lrn_param)
Optional
local_size [default 5]: the number of channels to sum over (for cross channel LRN) or the side length of the square region to sum over (for within channel LRN)
alpha [default 1]: the scaling parameter (see below)
beta [default 5]: the exponent (see below)
norm_region [default ACROSS_CHANNELS]: whether to sum over adjacent channels (ACROSS_CHANNELS) or nearby spatial locations (WITHIN_CHANNEL)
The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions. In ACROSS_CHANNELS mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape local_size x 1 x 1). In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size). Each input value is divided by
$(1 + (\alpha/n) \sum_i x_i^2)^\beta$, where $n$ is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).
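An example LRN layer definition; the AlexNet-style values alpha: 0.0001 and beta: 0.75 and the blob names are used only for illustration:
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}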
im2col:
im2col rearranges image patches into column vectors; Caffe uses it internally to turn convolution into a matrix multiplication, and you normally do not need to use it directly.
Loss Layers
Loss layers drive learning: the network is trained by minimizing a loss function, whose value is computed in the forward pass and whose gradient is propagated in the backward pass.
Softmax:
Layer type: SoftmaxWithLoss
This layer computes the multinomial logistic loss of the softmax of its inputs. Note the difference from the plain Softmax layer: SoftmaxWithLoss essentially folds the computation of $-\log o_y$ (the softmax output for the true class $y$) into a single layer, which is numerically more stable than a separate Softmax layer followed by a multinomial logistic loss.
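The two variants are configured as follows (a minimal sketch; the bottom blob names are assumptions):
# softmax probabilities only (no loss)
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8"
  top: "prob"
}
# softmax + multinomial logistic loss in one layer
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}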
Sum of Squares / Euclidean:
Layer type: EuclideanLoss
The Euclidean loss layer computes the sum of squared differences between its two input blobs, $\frac{1}{2N}\sum_{i=1}^{N} \lVert x_i^1 - x_i^2 \rVert_2^2$.
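A minimal example; the blob names "pred" and "label" are assumptions, mirroring the hinge loss example below:
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
}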
Hinge / Margin:
Layer type: HingeLoss
Options: L1 or L2 norm (the norm parameter)
Input: an n × c × h × w blob of predictions and an n × 1 × 1 × 1 blob of labels
Output: a 1 × 1 × 1 × 1 blob holding the computed loss
Examples:
# L1 Norm
layer {
name: "loss"
type: "HingeLoss"
bottom: "pred"
bottom: "label"
}
# L2 Norm
layer {
name: "loss"
type: "HingeLoss"
bottom: "pred"
bottom: "label"
top: "loss"
hinge_loss_param {
norm: L2
}
}
The hinge loss layer computes a one-vs-all hinge loss (L1) or squared hinge loss (L2).
Sigmoid Cross-Entropy:
Layer type: SigmoidCrossEntropyLoss
The forward CPU implementation computes the loss in a numerically stable way:
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  const int count = bottom[0]->count();
  const int num = bottom[0]->num();
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  Dtype loss = 0;
  for (int i = 0; i < count; ++i) {
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}
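In a network definition this layer is used like the other loss layers (a sketch; the blob names are assumptions):
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "pred"      # raw scores; the sigmoid is applied inside the layer
  bottom: "target"    # per-element targets in [0, 1]
  top: "loss"
}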
Infogain:
Layer type: InfogainLoss
The forward and backward CPU implementations:
template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* bottom_label = bottom[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom.size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = bottom[2]->cpu_data();
    }
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

INSTANTIATE_CLASS(InfogainLossLayer);
REGISTER_LAYER_CLASS(InfogainLoss);
}  // namespace caffe
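As the code shows, the infogain matrix H can either be loaded from a file or passed in as a third bottom blob. A minimal usage sketch; the matrix file name is a hypothetical placeholder:
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "prob"
  bottom: "label"
  top: "loss"
  infogain_loss_param {
    source: "infogain_matrix.binaryproto"   # H matrix file (hypothetical name)
  }
}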
Accuracy and Top-k:
Layer type: Accuracy
This layer scores the output as the accuracy of the predictions with respect to the target labels; it is only used for evaluation and has no backward step.
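A typical test-phase accuracy layer, here also reporting top-5 accuracy via the top_k parameter (blob names assumed):
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }   # only evaluated during testing
  accuracy_param {
    top_k: 5                # count a hit if the label is among the top 5 predictions
  }
}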
Activation / Neuron Layers
In general, activation / neuron layers are element-wise operators: they take one bottom blob and produce one top blob of the same size. In the layers below we therefore omit the input and output dimensions, since they are identical:
Input:
$n \times c \times h \times w$
Output:
$n \times c \times h \times w$
ReLU / Rectified-Linear and Leaky-ReLU:
Parameters (ReLUParameter relu_param)
Optional
negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
The ReLU layer is defined as follows: given an input value x, it computes $f(x) = x$ if $x > 0$ and $f(x) = \text{negative\_slope} \cdot x$ if $x \le 0$. When negative_slope is not set (i.e. left at 0), this is simply $\max(0, x)$. For details see my other short post:
http://blog.csdn.net/swfa1/article/details/45601789
Sigmoid:
Layer type: Sigmoid
Example:
layer {
name: "encode1neuron"
bottom: "encode1"
top: "encode1neuron"
type: "Sigmoid"
}
Formula: $\text{sigmoid}(x) = 1 / (1 + e^{-x})$
TanH / Hyperbolic Tangent:
Layer type: TanH
Example:
layer {
name: "layer"
bottom: "in"
top: "out"
type: "TanH"
}
Absolute Value:
Layer type: AbsVal
layer {
name: "layer"
bottom: "in"
top: "out"
type: "AbsVal"
}
Formula: $y = |x|$
Power:
Layer type: Power
Parameters (PowerParameter power_param):
power [default 1]
scale [default 1]
shift [default 0]
Example:
layer {
name: "layer"
bottom: "in"
top: "out"
type: "Power"
power_param {
power: 1
scale: 1
shift: 0
}
}
Formula: $y = (\text{shift} + \text{scale} \cdot x)^{\text{power}}$
BNLL:
Layer type: BNLL
layer {
name: "layer"
bottom: "in"
top: "out"
type: "BNLL"
}
Formula: the BNLL (binomial normal log likelihood) layer computes the output as $y = \log(1 + e^{x})$.
Data Layers
Common Layers
InnerProduct
Layer type: InnerProduct
Parameters (InnerProductParameter inner_product_param):
Required:
num_output (c_o): the number of filters
Strongly recommended: weight_filler [default type: 'constant' value: 0]
Optional:
bias_filler [default type: 'constant' value: 0]
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
Example:
layer {
name: "fc8"
type: "InnerProduct"
# learning rate and decay multipliers for the weights
param { lr_mult: 1 decay_mult: 1 }
# learning rate and decay multipliers for the biases
param { lr_mult: 2 decay_mult: 0 }
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
bottom: "fc7"
top: "fc8"
}
What it does:
The inner product layer (also known as the fully connected layer) treats the input as a single one-dimensional vector and produces its output as a vector as well, i.e. the height and width of the output blob are both 1.
After studying Caffe for a while I found that some of the layers above were not described in enough detail, so the Slice, ArgMax, and Eltwise layers are explained in more detail below.
Slice layer
Splits the input blob into several pieces along a given axis, so that the remaining computation can be carried out on each piece separately.
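A minimal sketch, assuming a bottom blob "data" is split along the channel axis at channel 3:
layer {
  name: "slice"
  type: "Slice"
  bottom: "data"
  top: "part1"      # channels 0-2
  top: "part2"      # remaining channels
  slice_param {
    axis: 1         # slice along the channel dimension
    slice_point: 3
  }
}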
ArgMaxLayer
Computes the index of the $K$ max values for each datum across all dimensions ($C \times H \times W$).
Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to true, output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximise.
NOTE: does not implement Backwards operation.
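For example, to turn class probabilities into a predicted label (blob names assumed):
layer {
  name: "argmax"
  type: "ArgMax"
  bottom: "prob"
  top: "argmax"
  argmax_param {
    top_k: 1            # index of the single largest value
    out_max_val: false  # set to true to also output the max values as (max_ind, max_val) pairs
  }
}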
Eltwise (element-wise)
Computes element-wise operations, such as product and sum, over multiple input blobs.
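A sketch that sums two equally shaped blobs (names assumed); the operation may also be PROD or MAX:
layer {
  name: "eltwise"
  type: "Eltwise"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  eltwise_param {
    operation: SUM
  }
}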
2. AlexNet network definition
3. How to add a new layer
Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.
Implement your layer in layers/your_layer.cpp.
SetUp for initialization: reading parameters, allocating buffers, etc.
Forward_cpu for the function your layer computes
Backward_cpu for its gradient
(Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.
Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.
Make your layer createable by adding it to layer_factory.cpp.
Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
The above is the answer from an expert on GitHub, and the steps are clear. To make it concrete, suppose we want to add a new vision layer named Aaa_Layer:
1. Open the hpp file that corresponds to the layer's category; here that is vision_layers.hpp. Add the declaration of the new layer yourself, or simply copy the relevant code of the convolution layer and change the class name and constructor name to Aaa_Layer. If you will not use the GPU, remove all the *_gpu declarations.
2. Implement the layer: write Aaa_Layer.cpp and add it to src/caffe/layers, implementing mainly SetUp, Forward_cpu, and Backward_cpu.
3. If a GPU implementation is needed, implement Forward_gpu and Backward_gpu in Aaa_Layer.cu.
4. Modify src/caffe/proto/caffe.proto: find LayerType, add Aaa, and update the next available ID; if the layer has parameters, add an AaaParameter message.
5. Add the corresponding code to src/caffe/layer_factory.cpp.
6. Write test_Aaa_layer.cpp in src/caffe/test and use include/caffe/test/test_gradient_check_util.hpp to check that the forward and backward passes are numerically correct.