Caffe source code analysis: softmax_layer.cpp && softmax_loss_layer.cpp

小胖蹄儿

Article link: https://blog.csdn.net/Cheese_pop/article/details/51122398

This article only analyzes the forward functions in softmax_layer.cpp and softmax_loss_layer.cpp; the backward functions remain to be added.

1. softmax_layer.cpp

softmax function

Suppose there are m classes. $\sigma(z)=(\sigma_1(z),\sigma_2(z),\ldots,\sigma_m(z))$ is defined as:

$\sigma_i(z)=\frac{\exp(z_i)}{\sum_{j=1}^m\exp(z_j)},\quad i=1,\ldots,m$
where

$\sigma_i(z)$ is the input to the loss layer;

$z_i=W_i^Tx+b_i$ is the linear prediction for class i,

$W_i^T$ is the weight vector,

$b_i$ is the bias.
Evaluating the softmax therefore amounts to exponentiating each $z_i$, which makes it strictly positive, and then dividing by the sum of all terms to normalize.
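The two steps (exponentiate, then normalize) can be sketched for a single score vector as follows. This is a minimal illustration, not Caffe code: the function name `softmax` and the use of `std::vector<double>` are assumptions made here. It also subtracts the largest score before calling `exp`, which does not change the result because the common factor cancels in the fraction, and which mirrors the "subtract the max" comment in Caffe's forward pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Caffe): softmax over one score vector z.
std::vector<double> softmax(const std::vector<double>& z) {
  assert(!z.empty());
  // Shifting by the maximum leaves the result unchanged but keeps exp()
  // from overflowing for large scores.
  const double z_max = *std::max_element(z.begin(), z.end());
  std::vector<double> sigma(z.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < z.size(); ++i) {
    sigma[i] = std::exp(z[i] - z_max);  // strictly positive
    sum += sigma[i];
  }
  for (double& s : sigma) {
    s /= sum;  // normalize so that the entries sum to 1
  }
  return sigma;
}
```

Calling `softmax({1.0, 2.0, 3.0})` gives three probabilities that increase with the score and sum to 1.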
In softmax_layer.cpp, the forward function can be written more intuitively in the following form:

$h_\theta\left(x^{(i)}\right)=\left[\begin{matrix}p(y^{(i)}=1|x^{(i)};\theta)\\p(y^{(i)}=2|x^{(i)};\theta)\\\vdots\\p(y^{(i)}=k|x^{(i)};\theta)\end{matrix}\right]=\frac{1}{\sum_{l=1}^ke^{\theta_l^Tx^{(i)}}}\left[\begin{matrix}e^{\theta_1^Tx^{(i)}}\\e^{\theta_2^Tx^{(i)}}\\\vdots\\e^{\theta_k^Tx^{(i)}}\end{matrix}\right]$
The forward function code in softmax_layer.cpp:

template <typename Dtype>
void SoftmaxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  Dtype* scale_data = scale_.mutable_cpu_data();
  int channels = bottom[0]->shape(softmax_axis_);
  int dim = bottom[0]->count() / outer_num_;
  caffe_copy(bottom[0]->count(), bottom_data, top_data);
  // We need to subtract the max to avoid numerical issues, compute the exp,
  // and then normalize.
  for (int i = 0; i < outer_num_; ++i) {
    // initialize scale_data to the first plane
    caffe_copy(inner_num_, bottom_data + i * dim, scale_data);
    for (int j = 0; j < channels; j++) {
      for (int k = 0; k < inner_num_; k++) {
        scale_data[k] = std::max(scale_data[k],
            bottom_data[i * dim + j * inner_num_ + k]);
      }
    }
    // subtraction
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, inner_num_,
        1, -1., sum_multiplier_.cpu_data(), scale_data, 1., top_data);
    // exponentiation
    caffe_exp<Dtype>(dim, top_data, top_data);
    // sum after exp
    caffe_cpu_gemv<Dtype>(CblasTrans, channels, inner_num_, 1.,
        top_data, sum_multiplier_.cpu_data(), 0., scale_data);
    // division
    for (int j = 0; j < channels; j++) {
      caffe_div(inner_num_, top_data, scale_data, top_data);
      top_data += inner_num_;
    }
  }
}

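The code is short. Lines 21 to 32 of the listing (from the `// subtraction` comment through the division loop) are analyzed below, starting from the last step and working back to the first:
1. // division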

top_data=top_data/scale_data;
top_data=top_data+inner_num_;

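2. // sum after exp

scale_data=transpose(top_data)*sum_multiplier_.cpu_data()

Analysis: summation. top_data is treated as a channels × inner_num_ matrix and, because of CblasTrans, is multiplied in transposed form by the all-ones vector sum_multiplier_. For each of the inner_num_ positions, the exponentials are therefore summed over all channels and the result is stored in scale_data.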
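3. // exponentiation

top_data=exp(top_data)

Analysis: this step is straightforward. The first argument of caffe_exp() is the element count dim = channels × inner_num_, so exp is applied element-wise to every value of the current sample.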
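4. // subtraction

This step is computed as a matrix product: top_data (channels × inner_num_) becomes top_data - sum_multiplier_ * scale_data, where sum_multiplier_ is a channels × 1 column of ones and scale_data is the 1 × inner_num_ row of maxima. Every channel thus has the maximum over all channels at the same position subtracted from it.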
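The four steps can also be written with plain loops, which makes the indexing easier to follow than the BLAS calls. The sketch below is an illustration under stated assumptions, not Caffe's implementation: `softmax_forward` is a hypothetical free function, the data are held in `std::vector<float>`, and the layout is assumed to be [outer_num][channels][inner_num] as in the listing above. The comments say which call of the original each loop stands in for.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain-loop version of the forward pass (not Caffe's code).
// bottom is laid out as [outer_num][channels][inner_num].
std::vector<float> softmax_forward(const std::vector<float>& bottom,
                                   int outer_num, int channels, int inner_num) {
  assert(channels > 0 && inner_num > 0);
  const int dim = channels * inner_num;  // values per sample
  assert(bottom.size() == static_cast<std::size_t>(outer_num) * dim);
  std::vector<float> top(bottom);
  std::vector<float> scale(inner_num);
  for (int i = 0; i < outer_num; ++i) {
    float* t = top.data() + static_cast<std::size_t>(i) * dim;
    // Maximum over channels at each position k, starting from the first plane.
    std::copy(t, t + inner_num, scale.begin());
    for (int j = 1; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        scale[k] = std::max(scale[k], t[j * inner_num + k]);
    // Subtraction and exponentiation (the gemm and caffe_exp steps).
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        t[j * inner_num + k] = std::exp(t[j * inner_num + k] - scale[k]);
    // Sum after exp (the transposed gemv with a vector of ones).
    std::fill(scale.begin(), scale.end(), 0.0f);
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        scale[k] += t[j * inner_num + k];
    // Division: each channel plane is divided by the per-position sum.
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        t[j * inner_num + k] /= scale[k];
  }
  return top;
}
```

With one sample, three channels and two positions, the three values at each position are normalized independently of the other position, so very large scores at one position do not affect the other.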

2. softmax_loss_layer.cpp

softmax loss function

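Based on the softmax function described above, suppose x belongs to class i. We want to maximize the likelihood $\sigma_i(z)$, which is usually done through the negative log-likelihood, that is, by minimizing $-\log\sigma_i(z)$.
Loss function: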

$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m\sum_{j=1}^k1\{y^{(i)}=j\}\log\frac{e^{\theta_j^Tx^{(i)}}}{\sum_{l=1}^ke^{\theta_l^Tx^{(i)}}}\right]$
where $1\{y^{(i)}=j\}$ is the indicator function. In this formula m is the number of training samples and k is the number of classes.
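Because the indicator $1\{y^{(i)}=j\}$ is non-zero for exactly one j per sample, the inner sum collapses to a single term. Assuming each sample carries exactly one label, and writing $p_j^{(i)}$ for the softmax probability of class j on sample i (a shorthand introduced here for illustration), the loss reduces to:

$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\log p_{y^{(i)}}^{(i)},\qquad p_j^{(i)}=\frac{e^{\theta_j^Tx^{(i)}}}{\sum_{l=1}^ke^{\theta_l^Tx^{(i)}}}$

This is the quantity a forward pass has to accumulate: for every sample, the negative logarithm of the probability stored at the index of its label, averaged at the end.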
The forward function code in softmax_loss_layer.cpp:

template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the softmax prob values.
  softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
  // prob_data points to the class probabilities computed by the softmax layer
  const Dtype* prob_data = prob_.cpu_data();
  // the ground-truth labels
  const Dtype* label = bottom[1]->cpu_data();
  // values per sample: number of classes * inner_num_
  int dim = prob_.count() / outer_num_;
  int count = 0;
  Dtype loss = 0;
  // outer_num_ is the batch size when the softmax axis is 1
  for (int i = 0; i < outer_num_; ++i) {
    // inner_num_ is the number of labelled positions per sample (for example
    // one per pixel); with a single label per image, inner_num_ == 1
    for (int j = 0; j < inner_num_; j++) {
      // with inner_num_ == 1 this is the label of the i-th input image; it
      // must lie in [0, number of classes - 1]
      const int label_value = static_cast<int>(label[i * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        continue;
      }
      DCHECK_GE(label_value, 0);
      DCHECK_LT(label_value, prob_.shape(softmax_axis_));
      // single-label case: each image yields a dim x 1 column vector whose
      // k-th entry is the probability of class k, so the indexed value is the
      // probability that image i belongs to class label_value
      loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                           Dtype(FLT_MIN)));
      ++count;
    }
  }
  if (normalize_) {
    top[0]->mutable_cpu_data()[0] = loss / count;
  } else {
    top[0]->mutable_cpu_data()[0] = loss / outer_num_;
  }
  if (top.size() == 2) {
    top[1]->ShareData(prob_);
  }
}
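To make the indexing `i * dim + label_value * inner_num_ + j` and the two normalization modes concrete, here is a stand-alone sketch of the same loop. It is only an illustration: `softmax_loss` is a hypothetical function, not Caffe's API, the probabilities are assumed to be already computed and laid out as [outer_num][channels][inner_num], and the guard against a zero count is an addition made here that the listing above does not have.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the loss loop (not Caffe's API).
// prob:  [outer_num][channels][inner_num], label: [outer_num][inner_num].
float softmax_loss(const std::vector<float>& prob,
                   const std::vector<int>& label,
                   int outer_num, int channels, int inner_num,
                   bool normalize, int ignore_label = -1) {
  const int dim = channels * inner_num;  // values per sample
  assert(prob.size() == static_cast<std::size_t>(outer_num) * dim);
  assert(label.size() == static_cast<std::size_t>(outer_num) * inner_num);
  float loss = 0.0f;
  int count = 0;
  for (int i = 0; i < outer_num; ++i) {
    for (int j = 0; j < inner_num; ++j) {
      const int y = label[i * inner_num + j];
      if (y == ignore_label) continue;  // skipped positions add nothing
      assert(y >= 0 && y < channels);
      // Probability of the true class y at position j of sample i.
      const float p = prob[i * dim + y * inner_num + j];
      loss -= std::log(std::max(p, FLT_MIN));  // clamp to avoid log(0)
      ++count;
    }
  }
  // normalize: average over counted positions; otherwise over samples.
  // The max(count, 1) guard is an assumption added for this sketch.
  return normalize ? loss / std::max(count, 1) : loss / outer_num;
}
```

With two images, two classes, probabilities {0.5, 0.5} and {0.25, 0.75} and labels 0 and 1, the normalized loss is (-log 0.5 - log 0.75) / 2. When one of the two labels is ignored, the normalized variant divides by 1 while the other variant still divides by the batch size 2.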
