caffe源码阅读9-loss_layer.hpp+各cpp

最新推荐文章于 2020-11-21 05:26:47 发布

thy_2014

最新推荐文章于 2020-11-21 05:26:47 发布

阅读量2.8k

点赞数

分类专栏：深度学习文章标签： caffe loss_layer SoftmaxWithLoss

本文链接：https://blog.csdn.net/thy_2014/article/details/52163947

版权

深度学习专栏收录该内容

21 篇文章 0 订阅

订阅专栏

loss_layer：

AccuracyLayer类；

LossLayer类；

ContrastiveLossLayer类；

EuclideanLossLayer类；

HingeLossLayer类；

InfogainLossLayer类；

MultinomialLogisticLossLayer类；

SigmoidCrossEntropyLossLayer类；

SoftmaxWithLossLayer类；

终于看到最后一层了。

首先定义了一个常数：

const float kLOG_THRESHOLD = 1e-20;

1 AccuracyLayer：

Computes the classification accuracy for a one-of-many classification task.

1.1 原理介绍：

首先需要弄清楚，这个层是什么作用，上面的粉红色字体告诉我们，该层是计算分类精度的。所以我们常常在论文中会看到top-1, top5就是这里计算出来的。

假设批量训练的时候，一个批量的大小是N，那么对该批量的训练精度可以表示为：

其中：

分类标签的预测值为：

分类标签的实际值为：

那么top-1也就是将评分最高的那个预测分类标签与实际标签做比较，而top-5是将评分最高的前5个预测分类标签与实际标签做比较(只要有一个比较成功就算是成功)。

由于只是计算精度，所以没有反馈。

另外还可以推出，这个层通常处于最末端。

1.2 属性变量：

int top_k_;

这个变量在前面已经介绍过了。

1.3 构造函数：

template <typename Dtype>
void AccuracyLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  top_k_ = this->layer_param_.accuracy_param().top_k();
}

template <typename Dtype>
void AccuracyLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  CHECK_EQ(bottom[0]->num(), bottom[1]->num())
      << "The data and label should have the same number.";
  CHECK_LE(top_k_, bottom[0]->count() / bottom[0]->num())
      << "top_k must be less than or equal to the number of classes.";
  CHECK_EQ(bottom[1]->channels(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  (*top)[0]->Reshape(1, 1, 1, 1);
}

从 LayerSetUp()中可以看到，top_k是可以在网络配置文件中设置的。

另外，因为该层的目的只是计算精度，所以top层只有1个元素。

1.4 前馈函数：

template <typename Dtype>
void AccuracyLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  Dtype accuracy = 0;
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  vector<Dtype> maxval(top_k_+1);
  vector<int> max_id(top_k_+1);
  for (int i = 0; i < num; ++i) {
    // Top-k accuracy
    std::vector<std::pair<Dtype, int> > bottom_data_vector;
    for (int j = 0; j < dim; ++j) {
      bottom_data_vector.push_back(
          std::make_pair(bottom_data[i * dim + j], j));
    }
    std::partial_sort(
        bottom_data_vector.begin(), bottom_data_vector.begin() + top_k_,
        bottom_data_vector.end(), std::greater<std::pair<Dtype, int> >());
    // check if true label is in top k predictions
    for (int k = 0; k < top_k_; k++) {
      if (bottom_data_vector[k].second == static_cast<int>(bottom_label[i])) {
        ++accuracy;
        break;
      }
    }
  }

  // LOG(INFO) << "Accuracy: " << accuracy;
  (*top)[0]->mutable_cpu_data()[0] = accuracy / num;
  // Accuracy layer should not be used as a loss function.
}

整个过程不难理解，其中用到了比较有意思的部分排序——partial_sort，然后其中定义的max_id似乎没有用的样子欸。

2 LossLayer：

An interface for Layer%s that take two Blob%s as input -- usually (1) predictions and (2) ground-truth labels -- and output a singleton Blob representing the loss.

LossLayers are typically only capable of backpropagating to their first input -- the predictions.

这里就先不介绍了，因为从类中可以看到，这还只是一个抽象类，说明在caffe中实现了各种损失层，都继承了这个抽象类：

template <typename Dtype>
class LossLayer : public Layer<Dtype> {
 public:
  explicit LossLayer(const LayerParameter& param)
     : Layer<Dtype>(param) {}
  virtual void LayerSetUp(
      const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
  virtual void Reshape(
      const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);

  virtual inline int ExactNumBottomBlobs() const { return 2; }

  /**
   * @brief For convenience and backwards compatibility, instruct the Net to
   *        automatically allocate a single top Blob for LossLayers, into which
   *        they output their singleton loss, (even if the user didn't specify
   *        one in the prototxt, etc.).
   */
  virtual inline bool AutoTopBlobs() const { return true; }
  virtual inline int ExactNumTopBlobs() const { return 1; }
  /**
   * We usually cannot backpropagate to the labels; ignore force_backward for
   * these inputs.
   */
  virtual inline bool AllowForceBackward(const int bottom_index) const {
    return bottom_index != 1;
  }
};

3 ContrastiveLossLayer：

Computes the contrastive loss

where

This can be used to train siamese networks.

@param bottom input Blob vector (length 3)
-# @f$ (N \times C \times 1 \times 1) @f$
the features:

-# @f$ (N \times C \times 1 \times 1) @f$
the features:

-# @f$ (N \times 1 \times 1 \times 1) @f$
the binary similarity:

3.1 原理介绍：

这是一种损失值计算方式，上面也提到了，这种损失值对于siamese network很有用，这种网络是05年Yann Lecun提出来的，它的特点是它接收两个图片作为输入，而不是一张图片作为输入。

摘抄自caffe github的issue697
Siamese nets are supervised models for metric learning [1].
[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 539–546. IEEE, 2005. http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf

Speaking of metric learning, I remember that @norouzi had proposed and open sourced a method that learned a Hamming distance metric to distinguish similar and dissimilar images [2].
[2] Mohammad Norouzi, David J. Fleet, Ruslan Salakhutdinov, Hamming Distance Metric Learning, Neural Information Processing Systems (NIPS), 2012.

根据前面的介绍，其实该层的前馈计算过程已经算是描述清楚了，那么反馈过程是怎样的呢？

根据链式法则有：

当两图相似的时候（也就是y=1）：

对b的求导也就是多了一个负号而已。

当两图不相似时（也就是y=0）：

因为margin是常量，所以求导过程与上面类似，只是还要再多一个负号。当然这里面还要分margin-d是否大于0，如果小于0，导数还是0。

3.2 属性变量：

  Blob<Dtype> diff_;  // cached for backward pass
  Blob<Dtype> dist_sq_;  // cached for backward pass
  Blob<Dtype> diff_sq_;  // tmp storage for gpu forward pass
  Blob<Dtype> summer_vec_;  // tmp storage for gpu forward pass

3.3 构造函数：

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), 1);
  CHECK_EQ(bottom[0]->width(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  CHECK_EQ(bottom[2]->channels(), 1);
  CHECK_EQ(bottom[2]->height(), 1);
  CHECK_EQ(bottom[2]->width(), 1);
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  diff_sq_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  dist_sq_.Reshape(bottom[0]->num(), 1, 1, 1);
  // vector of ones used to sum along channels
  summer_vec_.Reshape(bottom[0]->channels(), 1, 1, 1);
  for (int i = 0; i < bottom[0]->channels(); ++i)
    summer_vec_.mutable_cpu_data()[i] = Dtype(1);
}

前面说到这种siamese network的输入是两个图片，那么为啥这里在检测底层Blob的时候有三个呢？从后面的前馈函数可以看到，其实bottom[2]是标签！也就是两个图片是否相似。

3.4 前馈反馈函数：

前馈：

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  int count = bottom[0]->count();
  caffe_sub(
      count,
      bottom[0]->cpu_data(),  // a
      bottom[1]->cpu_data(),  // b
      diff_.mutable_cpu_data());  // a_i-b_i
  const int channels = bottom[0]->channels();
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  Dtype loss(0.0);
  for (int i = 0; i < bottom[0]->num(); ++i) {
    dist_sq_.mutable_cpu_data()[i] = caffe_cpu_dot(channels,
        diff_.cpu_data() + (i*channels), diff_.cpu_data() + (i*channels));
    if (static_cast<int>(bottom[2]->cpu_data()[i])) {  // similar pairs
      loss += dist_sq_.cpu_data()[i];
    } else {  // dissimilar pairs
      loss += std::max(margin-dist_sq_.cpu_data()[i], Dtype(0.0));
    }
  }
  loss = loss / static_cast<Dtype>(bottom[0]->num()) / Dtype(2);
  (*top)[0]->mutable_cpu_data()[0] = loss;
}

前馈的过程很简单，按照公式进行计算就好了。另外从代码中还可以看到，在前馈的公式中y表示是否相似，是一个0，1变量。

反馈：

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] /
          static_cast<Dtype>((*bottom)[i]->num());
      int num = (*bottom)[i]->num();
      int channels = (*bottom)[i]->channels();
      for (int j = 0; j < num; ++j) {
        Dtype* bout = (*bottom)[i]->mutable_cpu_diff();
        if (static_cast<int>((*bottom)[2]->cpu_data()[j])) {  // similar pairs
          caffe_cpu_axpby(
              channels,
              alpha,
              diff_.cpu_data() + (j*channels),
              Dtype(0.0),
              bout + (j*channels));
        } else {  // dissimilar pairs
          if ((margin-dist_sq_.cpu_data()[j]) > Dtype(0.0)) {
            caffe_cpu_axpby(
                channels,
                -alpha,
                diff_.cpu_data() + (j*channels),
                Dtype(0.0),
                bout + (j*channels));
          } else {
            caffe_set(channels, Dtype(0), bout + (j*channels));
          }
        }
      }
    }
  }
}

这里的计算过程也很简单，按照前面原理介绍中提到的公式来计算就好了。不过 有一个疑问，没有弄懂：

应该来说，这个层是整个网络中最后一层了，那么误差也就是在这里面计算的，也就是loss嘛，但是前馈的时候，将loss存入top.cpu_data中的。而反馈的时候，误差传递用的却是top.cpu_dff，那么top.cpu_diff是哪里来的呢？

4 EuclideanLossLayer：

@brief Computes the Euclidean (L2) loss:

for real-valued regression tasks.

@param bottom input Blob vector (length 2)

-# @f$ (N \times C \times H \times W) @f$
      the predictions:

   -# @f$ (N \times C \times H \times W) @f$
      the targets

This can be used for least-squares regression tasks. An InnerProductLayer input to a EuclideanLossLayer exactly formulates a linear least squares regression problem. With non-zero weight decay the problem becomes one of ridge regression -- see src/caffe/test/test_sgd_solver.cpp for a concrete example wherein we check that the gradients computed for a Net with exactly this structure match hand-computed gradient formulas for ridge regression.

(Note: Caffe, and SGD in general, is certainly \b not the best way to solve linear least squares problems! We use it only as an instructive example.)

4.1 原理介绍：

前馈的过程就直接按照上面的公式计算即可；反馈的过程其实更简单，因为上面的导数：

4.2 前馈反馈函数：

template <typename Dtype>
void EuclideanLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  int count = bottom[0]->count();
  caffe_sub(
      count,
      bottom[0]->cpu_data(),
      bottom[1]->cpu_data(),
      diff_.mutable_cpu_data());
  Dtype dot = caffe_cpu_dot(count, diff_.cpu_data(), diff_.cpu_data());
  Dtype loss = dot / bottom[0]->num() / Dtype(2);
  (*top)[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] / (*bottom)[i]->num();
      caffe_cpu_axpby(
          (*bottom)[i]->count(),              // count
          alpha,                              // alpha
          diff_.cpu_data(),                   // a
          Dtype(0),                           // beta
          (*bottom)[i]->mutable_cpu_diff());  // b
    }
  }
}

因为这里比较简单，就不啰嗦了，只是比较奇怪的是， 为什么反馈的时候会对标签y做偏导计算呢？好奇怪！！当然如果不要将y理解成标签，将这种欧氏loss理解层和上面介绍的loss同样的效果，那么对y求偏导也是可以理解的。

5 HingeLossLayer：

@brief Computes the hinge loss for a one-of-many classification task.

@param bottom input Blob vector (length 2)
-# @f$ (N \times C \times H \times W) @f$
the predictions @f$ t @f$, a Blob with values in

indicating the predicted score for each of the

classes. In an SVM, @f$ t @f$ is the result of taking the inner product

of the D-dimensional features

and the learned hyperplane parameters

      so a Net with just an InnerProductLayer (with num_output = D) providing predictions to a HingeLossLayer and no other learnable parameters or losses is equivalent to an SVM.
   -# @f$ (N \times 1 \times 1 \times 1) @f$
      the labels @f$ l @f$, an integer-valued Blob with values

      indicating the correct class label among the @f$ K @f$ classes
@param top output Blob vector (length 1)
   -# @f$ (1 \times 1 \times 1 \times 1) @f$
      the computed hinge loss:

for the @f$ L^p @f$ norm
(defaults to @f$ p = 1 @f$, the L1 norm; L2 norm, as in L2-SVM, is also available), and

In an SVM,

is the result of taking the inner product

of the features

and the learned hyperplane parameters

So, a Net with just an InnerProductLayer (with num_output = @f$k@f$) providing predictions to a HingeLossLayer is equivalent to an SVM (assuming it has no other learned outside the InnerProductLayer and no other losses outside the HingeLossLayer).

5.1 原理介绍：

前馈的原理已经在上面说明了。

那么反馈的时候是怎么做的呢？

Gradients cannot be computed with respect to the label inputs (bottom[1]), so this method ignores bottom[1] and requires !propagate_down[1], crashing if propagate_down[1] is set.

@param top output Blob vector (length 1), providing the error gradient with respect to the outputs
-# @f$ (1 \times 1 \times 1 \times 1) @f$
This Blob's diff will simply contain the

as is the coefficient of this layer's output
in the overall Net loss

hence

     (*Assuming that this top Blob is not used as a bottom (input) by any other layer of the Net.)
@param propagate_down see Layer::Backward.
     propagate_down[1] must be false as we can't compute gradients with respect to the labels.
@param bottom input Blob vector (length 2)
-# @f$ (N \times C \times H \times W) @f$
     the predictions @f$t@f$; Backward computes diff

-# @f$ (N \times 1 \times 1 \times 1) @f$
the labels -- ignored as we can't compute their error gradients

5.2 前馈和反馈函数：

前馈：

template <typename Dtype>
void HingeLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const Dtype* label = bottom[1]->cpu_data();
  int num = bottom[0]->num();
  int count = bottom[0]->count();
  int dim = count / num;

  caffe_copy(count, bottom_data, bottom_diff);
  for (int i = 0; i < num; ++i) {
    bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
  }
  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < dim; ++j) {
      bottom_diff[i * dim + j] = std::max(
        Dtype(0), 1 + bottom_diff[i * dim + j]);
    }
  }
  Dtype* loss = (*top)[0]->mutable_cpu_data();
  switch (this->layer_param_.hinge_loss_param().norm()) {
  case HingeLossParameter_Norm_L1:
    loss[0] = caffe_cpu_asum(count, bottom_diff) / num;
    break;
  case HingeLossParameter_Norm_L2:
    loss[0] = caffe_cpu_dot(count, bottom_diff, bottom_diff) / num;
    break;
  default:
    LOG(FATAL) << "Unknown Norm";
  }
}

这里，其实不是很难理解。从上面的计算过程来看，相当于是先将bottom_data拷贝到bottom_diff中，然后对bottom_diff中的数据一部分乘以了-1，注意，乘以-1的只是该图片对应的标签位置（可以参考手写数字识别的时候，那个标签是怎么做的）。后面开始做+1的操作，按照公式来看，其实是max(0, 1-bottom_diff)，但由于bottom_diff中只是很小的一部分数据乘以了-1，所以在代码中在做+1操作时，其实对应的公式应该是

那么在前馈的公式中 t 也就是bottom_data（拷贝到了bottom_diff中）。

前馈的最后就没什么好说的，做1阶范数或者2阶范数。

反馈：

template <typename Dtype>
void HingeLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
    const Dtype* label = (*bottom)[1]->cpu_data();
    int num = (*bottom)[0]->num();
    int count = (*bottom)[0]->count();
    int dim = count / num;

    for (int i = 0; i < num; ++i) {
      bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
    }

    const Dtype loss_weight = top[0]->cpu_diff()[0];
    switch (this->layer_param_.hinge_loss_param().norm()) {
    case HingeLossParameter_Norm_L1:
      caffe_cpu_sign(count, bottom_diff, bottom_diff);
      caffe_scal(count, loss_weight / num, bottom_diff);
      break;
    case HingeLossParameter_Norm_L2:
      caffe_scal(count, loss_weight * 2 / num, bottom_diff);
      break;
    default:
      LOG(FATAL) << "Unknown Norm";
    }
  }
}

前馈中的max(0, 1 - {l_n = k} t_nk)令为f，那么对t_nk的偏导可以写为：(2-范数为例)

在别的地方看到：

其实不太理解这是为什么！！！因为 f>0 的地方为啥只能是 l_n=k 的时候呢？

6 InfogainLossLayer：

A generalization of MultinomialLogisticLossLayer that takes an "information gain" (infogain) matrix specifying the "value" of all label pairs.

6.1 原理介绍：

输入：

形状：预测值内，表示这预测每一类的概率，共个类，每一个预测概率的和为1: .
形状：标签值： , 是一个整数值，其范围是表示着在个类中的索引。
形状： (可选) 信息增益矩阵 .作为第三个输入参数，. 如果 , 则它等价于多项式逻辑损失函数

输出：

形状：

计算公式: , 其中表示行 of .

有了上面的说明，前馈相当于就知道了，反馈的过程如下：

6.2 属性变量：

Blob<Dtype> infogain_;

这个变量也就是前面介绍的H。

6.3 构造函数：

template <typename Dtype>
void InfogainLossLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  if (bottom.size() < 3) {
    CHECK(this->layer_param_.infogain_loss_param().has_source())
        << "Infogain matrix source must be specified.";
    BlobProto blob_proto;
    ReadProtoFromBinaryFile(
      this->layer_param_.infogain_loss_param().source(), &blob_proto);
    infogain_.FromProto(blob_proto);
  }
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  Blob<Dtype>* infogain = NULL;
  if (bottom.size() < 3) {
    infogain = &infogain_;
  } else {
    infogain = bottom[2];
  }
  CHECK_EQ(bottom[1]->channels(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  const int num = bottom[0]->num();
  const int dim = bottom[0]->count() / num;
  CHECK_EQ(infogain->num(), 1);
  CHECK_EQ(infogain->channels(), 1);
  CHECK_EQ(infogain->height(), dim);
  CHECK_EQ(infogain->width(), dim);
}

从这里可以看到，当输入的bottom只有两个时，H矩阵是需要被指定的。如果输入bottom有三个时，第三个bottom就作为H矩阵。

6.4 前馈和反馈函数：

前馈：

template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  (*top)[0]->mutable_cpu_data()[0] = loss / num;
}

反馈：

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    vector<Blob<Dtype>*>* bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = (*bottom)[0]->cpu_data();
    const Dtype* bottom_label = (*bottom)[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom->size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = (*bottom)[2]->cpu_data();
    }
    Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
    int num = (*bottom)[0]->num();
    int dim = (*bottom)[0]->count() / (*bottom)[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

thy_2014

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
2
评论
caffe源码阅读9-loss_layer.hpp+各cpp

loss_layer：AccuracyLayer类；LossLayer类；ContrastiveLossLayer类；EuclideanLossLayer类；HingeLossLayer类；InfogainLossLayer类；MultinomialLogisticLossLayer类；SigmoidCrossEntropyLossLayer类；Softma
复制链接

扫一扫