Caffe Source Code Walkthrough: Forward and Backward Propagation in SoftmaxLayer

Latest recommended article published on 2024-08-06 14:07:22

faithenXX

Category: caffe

Article link: https://blog.csdn.net/zyf19930610/article/details/71194160

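This article walks through the forward and backward passes of SoftmaxLayer in the Caffe framework. The forward pass follows the softmax formula. The backward pass computes gradients: its inputs are the softmax output and the gradient of the loss with respect to that output, and its output is the gradient of the loss with respect to the softmax input. The computation is worked out through a formula derivation, case by case.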

Summary generated automatically by CSDN.

1. Forward propagation

This part follows the softmax formula directly:
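For reference, the formula can be written out as below. This is the standard max-shifted softmax rather than something copied from the original post. The symbols z (the input, bottom_data), m (the per-position maximum, held in scale_data) and the indices i, c, k (outer index, channel, inner position) are notation assumed here; a (the output, top_data) follows the post.

```latex
m_{i,k} = \max_{c} z_{i,c,k}, \qquad
a_{i,c,k} = \frac{\exp\left(z_{i,c,k} - m_{i,k}\right)}{\sum_{c'} \exp\left(z_{i,c',k} - m_{i,k}\right)}
```

Subtracting m does not change the result, because the common factor exp(-m) cancels between numerator and denominator. It only keeps the argument of exp at or below zero.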

template <typename Dtype>
void SoftmaxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();     // input data
  Dtype* top_data = top[0]->mutable_cpu_data();         // output data
  Dtype* scale_data = scale_.mutable_cpu_data();        // scratch buffer for intermediate results
  int channels = bottom[0]->shape(softmax_axis_);       // size of the softmax axis (number of classes)
  int dim = bottom[0]->count() / outer_num_;            // elements per outer index (channels * inner_num_)
  caffe_copy(bottom[0]->count(), bottom_data, top_data);
  // We need to subtract the max to avoid numerical issues (the exponent must
  // not get too large), compute the exp, and then normalize.
  for (int i = 0; i < outer_num_; ++i) {                        // i indexes the i-th sample; each sample has dim elements
    // initialize scale_data to the first plane
    caffe_copy(inner_num_, bottom_data + i * dim, scale_data);   
    for (int j = 0; j < channels; j++) {
      for (int k = 0; k < inner_num_; k++) {
        scale_data[k] = std::max(scale_data[k],                 // max over channels at each inner position k
            bottom_data[i * dim + j * inner_num_ + k]);
      }
    }
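    // subtraction: done as a matrix product. sum_multiplier_ (channels x 1, all
    // ones) times scale_data (1 x inner_num_) replicates the per-position max
    // across channels, so each channel plane has that max subtracted.
    // gemm computes C = alpha * op(A) * op(B) + beta * C
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, channels, inner_num_,
        1, -1., sum_multiplier_.cpu_data(), scale_data, 1., top_data);

    // exponentiation: element-wise exp (the natural exponential, not the logarithm)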
    caffe_exp<Dtype>(dim, top_data, top_data);

    // sum after exp: sum over channels at each inner position, stored in scale_data
    caffe_cpu_gemv<Dtype>(CblasTrans, channels, inner_num_, 1.,
        top_data, sum_multiplier_.cpu_data(), 0., scale_data);

    // division: divide each channel plane by the per-position sum
    for (int j = 0; j < channels; j++) {
      caffe_div(inner_num_, top_data, scale_data, top_data);
      top_data += inner_num_;
    }
  }
}
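To make the steps above easier to follow without the BLAS calls, here is a minimal standalone sketch of the same computation using plain loops. It is not Caffe code: the function name softmax_forward and the use of std::vector<float> are assumptions for illustration, and the buffer is assumed to be laid out row-major as [outer_num][channels][inner_num], matching the indexing i * dim + j * inner_num_ + k in the listing.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Caffe): softmax over the channel axis of a
// buffer laid out row-major as [outer_num][channels][inner_num].
std::vector<float> softmax_forward(const std::vector<float>& bottom,
                                   int outer_num, int channels, int inner_num) {
  std::vector<float> top(bottom);          // same role as caffe_copy(bottom -> top)
  const int dim = channels * inner_num;    // elements per outer index
  std::vector<float> scale(inner_num);     // plays the role of scale_data

  for (int i = 0; i < outer_num; ++i) {
    float* t = top.data() + i * dim;

    // Per-position maximum over channels, starting from the first plane.
    for (int k = 0; k < inner_num; ++k) scale[k] = t[k];
    for (int j = 1; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        scale[k] = std::max(scale[k], t[j * inner_num + k]);

    // Subtract the max and exponentiate (the gemm + caffe_exp steps).
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        t[j * inner_num + k] = std::exp(t[j * inner_num + k] - scale[k]);

    // Per-position sum over channels (the gemv step).
    std::fill(scale.begin(), scale.end(), 0.0f);
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        scale[k] += t[j * inner_num + k];

    // Normalize each channel plane by the sum (the caffe_div loop).
    for (int j = 0; j < channels; ++j)
      for (int k = 0; k < inner_num; ++k)
        t[j * inner_num + k] /= scale[k];
  }
  return top;
}
```

Calling softmax_forward(data, 2, 3, 1) treats data as two samples of three class scores each. The loops trade the speed of gemm and gemv for readability; the arithmetic per element is the same.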

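2. Backward propagation

How it works:

Inputs to the backward pass: 1. The output a of the softmax forward pass, i.e. top_data.

2. The gradient of the loss with respect to a, i.e. top_diff.

Output of the backward pass: the gradient of the loss with respect to the input of the softmax forward pass.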