A boon for those of us who just want something ready-made: there really are a lot of tools out there…
Those who come after will look upon today just as we today look upon the past.
Being there at the height of the boom meant watching all kinds of changes unfold firsthand, and getting a real taste of how fast things move. Back then, with object detection as my entry point, I worked through whatever material I could find online, installing and reinstalling my way out of every pit, and tried caffe, mxnet, tensorflow, torch, cntk and darknet in parallel.
- cntk saw too little use and was essentially dropped.
- darknet is tied exclusively to YOLO, hard to extend, and I never looked into it in depth.
- torch at the time only supported Lua. I bought a Lua book and chewed through it, but the ecosystem remained a real problem; under pressure from tensorflow it eventually spawned pytorch, which now looks poised to overtake tensorflow.
- tensorflow feels more and more sprawling, and the bazel build is… well.
- mxnet from Mu Li, Tianqi Chen and others was also a serious contender at the time; unfortunately, without a big company behind it, the ecosystem never quite took off. These days Mu Li himself even runs a hands-on tutorial series.
- caffe really is an heirloom tool: released early, famous early, with a wide user base, and excellent material for source-level debugging and study. It was for a while steamrolled by the Python-friendly tensorflow, but after Yangqing Jia joined Facebook and released caffe2, the caffe2 + pytorch pairing covers training plus deployment end to end, which is remarkably pleasant.
Caffe overview
The nice thing about an heirloom tool is that plenty of people before you have already chewed it up, digested it, and laid it out for you, which greatly speeds up a beginner's learning. My overall impression is that it looks roughly like this: [Caffe architecture overview diagram]
Deriving backpropagation in Caffe
Between my work assignments and my own interests, the part I focused on debugging at the source level was backpropagation.
Take the Softmax function as an example. Suppose the data $\mathbf{x}$ has label $y$, and the probability that the observed data $\mathbf{x}$ belongs to class $i$ is $o_i$. The Softmax function is
$$\sigma(\mathbf{z}) = (\sigma_1(\mathbf{z}), \ldots, \sigma_m(\mathbf{z}))$$
$$o_i = \sigma_i(\mathbf{z}) = \frac{\exp(z_i)}{\sum_{j=1}^{m}\exp(z_j)}, \qquad i = 1, \ldots, m$$
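A quick sanity check with a toy example of my own: for $\mathbf{z} = (1, 2, 3)$,
$$\sigma(\mathbf{z}) \approx \left(\tfrac{2.718}{30.193},\ \tfrac{7.389}{30.193},\ \tfrac{20.086}{30.193}\right) \approx (0.090,\ 0.245,\ 0.665).$$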
The driving force behind backpropagation: the Multinomial Logistic Loss
$$l(y, o) = -\log(o_y)$$
$$\frac{\partial l(y,o)}{\partial o_i} = -\frac{\delta_{iy}}{o_y}$$
$$\delta_{ky} = \begin{cases} 1 & k = y \\ 0 & k \neq y \end{cases}$$
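Continuing the toy example, if the true class is $y = 3$, then
$$l(y, o) = -\log(0.665) \approx 0.408, \qquad \frac{\partial l}{\partial o_3} = -\frac{1}{0.665} \approx -1.504,$$
while the partial derivatives with respect to $o_1$ and $o_2$ are zero.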
The derivative of Softmax
$$\frac{\partial o_i}{\partial z_k} = \frac{\delta_{ik}\,e^{z_i}\left(\sum_{j=1}^{m}e^{z_j}\right) - e^{z_i}e^{z_k}}{\left(\sum_{j=1}^{m}e^{z_j}\right)^2} = \delta_{ik}o_k - o_i o_k$$
Bringing in the chain rule gives the derivative of SoftmaxWithLoss:
$$\sum_{i=1}^{m}\frac{\partial o_i}{\partial z_k}\cdot \frac{\partial l(y,o)}{\partial o_i} = o_k - \delta_{yk}\frac{o_k}{o_y} = o_k - \delta_{yk}$$
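As a check on the combined result, here is a small standalone C++ sketch of my own (not Caffe code; softmax() and loss() are just throwaway helpers) that compares the analytic gradient $o_k - \delta_{yk}$ against a central-difference estimate of $l = -\log(o_y)$:
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<double> softmax(const std::vector<double>& z) {
  // Plain softmax, no stability tricks (see the next section for those).
  std::vector<double> o(z.size());
  double sum = 0.0;
  for (size_t j = 0; j < z.size(); ++j) { o[j] = std::exp(z[j]); sum += o[j]; }
  for (size_t j = 0; j < z.size(); ++j) o[j] /= sum;
  return o;
}

static double loss(const std::vector<double>& z, size_t y) {
  // Multinomial logistic loss l = -log(o_y).
  return -std::log(softmax(z)[y]);
}

int main() {
  const std::vector<double> z = {1.0, 2.0, 3.0};
  const size_t y = 2;  // true class is the third one, as in the toy example
  const std::vector<double> o = softmax(z);
  const double eps = 1e-6;
  for (size_t k = 0; k < z.size(); ++k) {
    // Analytic gradient from the chain-rule result: o_k - delta_{yk}.
    double analytic = o[k] - (k == y ? 1.0 : 0.0);
    // Central-difference estimate of dl/dz_k.
    std::vector<double> zp = z, zm = z;
    zp[k] += eps;
    zm[k] -= eps;
    double numeric = (loss(zp, y) - loss(zm, y)) / (2.0 * eps);
    std::printf("k=%zu analytic=%+.6f numeric=%+.6f\n", k, analytic, numeric);
  }
  return 0;
}
The two columns agree, giving roughly (0.090, 0.245, -0.335) for the toy logits.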
Numerical Stability:
If the computation is split into two separate layers (Softmax, then the loss), the extra compute is the least of it: numerical stability also suffers. Floating-point numbers have limited precision, so every additional operation accumulates a bit more error, and in the two-step version we have to evaluate $\frac{\delta_{iy}}{o_y}$. If the prediction happens to be badly wrong, the probability assigned to the correct class is tiny and the division overflows.
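To make the overflow concrete, another standalone sketch of my own (not Caffe code): with an extremely confident wrong prediction, $o_y$ underflows to zero, so the two-step gradient $\delta_{iy}/o_y$ blows up, while the fused log-sum-exp form of the loss stays finite:
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  const double z[2] = {1000.0, 0.0};  // extreme logits: the model is very sure of class 1
  const int y = 1;                    // but the ground truth is the other class
  // Fused SoftmaxWithLoss: loss = log(sum_j exp(z_j - max)) + max - z_y.
  // Subtracting the max keeps every exponent <= 0, so nothing blows up.
  const double m = std::max(z[0], z[1]);
  const double lse = std::log(std::exp(z[0] - m) + std::exp(z[1] - m)) + m;
  const double loss = lse - z[y];           // = -log(o_y), computed stably
  const double o_y = std::exp(z[y] - lse);  // underflows to exactly 0 here
  std::printf("loss (fused)     = %f\n", loss);       // ~1000, still finite
  std::printf("o_y              = %g\n", o_y);        // 0
  std::printf("1/o_y (two-step) = %g\n", 1.0 / o_y);  // inf: the overflow
  return 0;
}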
Caffe's backpropagation code
The backward pass of SoftmaxWithLoss below mirrors the formula above: it copies the predicted probabilities into bottom_diff, subtracts 1 at the ground-truth index to get $o_k - \delta_{yk}$, and finally scales by the loss weight and normalizer.
template <typename Dtype>
void KLNSOFTMAXLossLayer<Dtype>::Backward_cpu(
    const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const Dtype* prob_data = prob_.cpu_data();
    // Start from the predicted probabilities o_k ...
    caffe_copy(prob_.count(), prob_data, bottom_diff);
    const Dtype* label = bottom[1]->cpu_data();
    int dim = prob_.count() / outer_num_;
    int count = 0;
    for (int i = 0; i < outer_num_; ++i) {
      for (int j = 0; j < inner_num_; ++j) {
        const int label_value =
            static_cast<int>(label[i * inner_num_ + j]);
        if (has_ignore_label_ && label_value == ignore_label_) {
          // Ignored samples contribute no gradient.
          for (int c = 0; c < bottom[0]->shape(kln_softmax_axis_); ++c) {
            bottom_diff[i * dim + c * inner_num_ + j] = 0;
          }
        } else {
          // ... then subtract 1 at the ground-truth class: o_k - delta_{yk}.
          bottom_diff[i * dim + label_value * inner_num_ + j] -= 1;
          ++count;
        }
      }
    }
    // Scale the gradient by the loss weight and the normalizer.
    Dtype loss_weight = top[0]->cpu_diff()[0] /
                        get_normalizer(normalization_, count);
    caffe_scal(prob_.count(), loss_weight, bottom_diff);
  }
}
Adding a layer of your own to Caffe by hand
- Step 1: add the message definition for the new ReLU layer in caffe.proto
message KLNReLUParameter {
optional float negative_slope = 1 [default = 0];
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 2 [default = DEFAULT];
}
- Step 2: add a field with a fresh, unused ID to LayerParameter in caffe.proto (147 here; it must not collide with any existing field number)
message LayerParameter {
optional string name = 1;
optional string type = 2;
repeated string bottom = 3;
repeated string top = 4;
......
optional AccuracyParameter accuracy_param = 102;
optional ArgMaxParameter argmax_param = 103;
optional BatchNormParameter batch_norm_param = 139;
optional BiasParameter bias_param = 141;
optional ConcatParameter concat_param = 104;
......
optional KLNReLUParameter kln_relu_param = 147;
}
- Step 3: add a header file kln_relu_layer.hpp under include/caffe/layers/
#ifndef CAFFE_KLN_RELU_LAYER_HPP_
#define CAFFE_KLN_RELU_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/kln_neuron_layer.hpp"

namespace caffe {

template <typename Dtype>
class KLNReLULayer : public KLNNeuronLayer<Dtype> {
 public:
  explicit KLNReLULayer(const LayerParameter& param)
      : KLNNeuronLayer<Dtype>(param) {}
  virtual inline const char* type() const { return "KLNReLU"; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // The GPU paths are stubbed out in this walkthrough; only the CPU
  // implementation is provided.
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    NOT_IMPLEMENTED;
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    NOT_IMPLEMENTED;
  }
};

}  // namespace caffe

#endif  // CAFFE_KLN_RELU_LAYER_HPP_
- Step 4: add the corresponding implementation file kln_relu_layer.cpp under src/caffe/layers/
#include <algorithm>
#include <vector>

#include "caffe/layers/kln_relu_layer.hpp"

namespace caffe {

template <typename Dtype>
void KLNReLULayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope =
      this->layer_param_.kln_relu_param().negative_slope();
  // Leaky-ReLU style forward: f(x) = max(x, 0) + negative_slope * min(x, 0).
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}

template <typename Dtype>
void KLNReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope =
        this->layer_param_.kln_relu_param().negative_slope();
    // Elementwise chain rule: multiply the incoming gradient by
    // f'(x) = 1 if x > 0, negative_slope otherwise.
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}

INSTANTIATE_CLASS(KLNReLULayer);

}  // namespace caffe
- Step 5: register the corresponding creator in src/caffe/layer_factory.cpp
...
#include "caffe/layers/kln_relu_layer.hpp"
...

template <typename Dtype>
shared_ptr<Layer<Dtype> > GetKLNReLULayer(const LayerParameter& param) {
  KLNReLUParameter_Engine engine = param.kln_relu_param().engine();
  if (engine == KLNReLUParameter_Engine_DEFAULT) {
    engine = KLNReLUParameter_Engine_CAFFE;
  }
  if (engine == KLNReLUParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new KLNReLULayer<Dtype>(param));
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
    throw;  // Avoids a missing-return warning; LOG(FATAL) aborts first.
  }
}

REGISTER_LAYER_CREATOR(KLNReLU, GetKLNReLULayer);
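With the creator registered, the new layer can be referenced from a model prototxt by its type string. A minimal sketch (the blob name conv1 is just a placeholder; the type string must match what type() returns and what REGISTER_LAYER_CREATOR uses):
layer {
  name: "relu1"
  type: "KLNReLU"
  bottom: "conv1"
  top: "conv1"
  kln_relu_param {
    negative_slope: 0.1
  }
}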
Afterword
Reading this far, it is clear why caffe, once in full blaze, was later overtaken by the "newly emerged" tensorflow, and why tensorflow now seems poised to repeat the same story. "Whenever I look at what moved the people of the past, it matches my own feelings as if two halves of a tally; I never read their writings without sighing, though I cannot quite explain why. I know that to treat life and death as the same is an empty notion, and to equate a long life with an early death a fabrication. Those who come after will look upon today just as we today look upon the past." (Wang Xizhi, Preface to the Orchid Pavilion Collection)
References
- [1] 赵永科. 《深度学习:21天实战Caffe》. 电子工业出版社.
- [2] http://freemind.pluskid.org/machine-learning/softmax-vs-softmax-loss-numerical-stability/