Caffe source code reading 9: loss_layer.hpp and the corresponding .cpp files

loss_layer:

AccuracyLayer class;

LossLayer class;

ContrastiveLossLayer class;

EuclideanLossLayer class;

HingeLossLayer class;

InfogainLossLayer class;

MultinomialLogisticLossLayer class;

SigmoidCrossEntropyLossLayer class;

SoftmaxWithLossLayer class;

We have finally reached the last group of layers.


First, a constant is defined:

const float kLOG_THRESHOLD = 1e-20;

1 AccuracyLayer:

Computes the classification accuracy for a one-of-many classification task.

1.1 How it works:

First we need to understand what this layer does. The quoted comment above tells us that it computes classification accuracy, so the top-1 and top-5 numbers we often see reported in papers are computed here.

Suppose the batch size during training is N. The accuracy over that batch can be written as:

$\text{accuracy} = \frac{1}{N}\sum_{n=1}^{N} \mathbb{1}\{\hat{y}_n = y_n\}$

where:

the predicted class label is $\hat{y}_n = \arg\max_k p_{nk}$ (the class with the highest predicted score), and

the ground-truth class label is $y_n$.

So top-1 compares the single highest-scoring predicted label with the ground-truth label, while top-5 compares the five highest-scoring predicted labels with the ground-truth label (as long as one of them matches, the sample counts as correct).

Since it only computes accuracy, there is no backward pass.

It also follows that this layer usually sits at the very end of the network.
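A quick worked example with made-up scores: suppose there are K = 4 classes and one sample with scores (0.1, 0.5, 0.3, 0.1) whose true label is 2. The top-1 prediction is class 1 (score 0.5), so top-1 counts this sample as wrong; the top-2 set is {1, 2}, which contains the true label, so top-2 (and any larger k) counts it as correct.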

1.2 Member variables:

int top_k_;

This variable has already been introduced earlier.

1.3 Setup functions (LayerSetUp / Reshape):

template <typename Dtype>
void AccuracyLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  top_k_ = this->layer_param_.accuracy_param().top_k();
}

template <typename Dtype>
void AccuracyLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  CHECK_EQ(bottom[0]->num(), bottom[1]->num())
      << "The data and label should have the same number.";
  CHECK_LE(top_k_, bottom[0]->count() / bottom[0]->num())
      << "top_k must be less than or equal to the number of classes.";
  CHECK_EQ(bottom[1]->channels(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  (*top)[0]->Reshape(1, 1, 1, 1);
}

From LayerSetUp() we can see that top_k can be set in the network configuration file.

Also, since the only purpose of this layer is to compute accuracy, the top blob has just a single element.
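As a minimal sketch (the layer and blob names here are made up, and the exact prototxt syntax differs between Caffe versions; newer versions write layer { type: "Accuracy" }), an accuracy layer with top_k set in the network configuration might look like this:

layers {
  name: "accuracy_top5"
  type: ACCURACY
  bottom: "fc8"        # predicted scores, hypothetical blob name
  bottom: "label"      # ground-truth labels
  top: "accuracy"
  accuracy_param {
    top_k: 5           # report top-5 instead of top-1 accuracy
  }
}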

1.4 Forward function:

template <typename Dtype>
void AccuracyLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  Dtype accuracy = 0;
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  vector<Dtype> maxval(top_k_+1);
  vector<int> max_id(top_k_+1);
  for (int i = 0; i < num; ++i) {
    // Top-k accuracy
    std::vector<std::pair<Dtype, int> > bottom_data_vector;
    for (int j = 0; j < dim; ++j) {
      bottom_data_vector.push_back(
          std::make_pair(bottom_data[i * dim + j], j));
    }
    std::partial_sort(
        bottom_data_vector.begin(), bottom_data_vector.begin() + top_k_,
        bottom_data_vector.end(), std::greater<std::pair<Dtype, int> >());
    // check if true label is in top k predictions
    for (int k = 0; k < top_k_; k++) {
      if (bottom_data_vector[k].second == static_cast<int>(bottom_label[i])) {
        ++accuracy;
        break;
      }
    }
  }

  // LOG(INFO) << "Accuracy: " << accuracy;
  (*top)[0]->mutable_cpu_data()[0] = accuracy / num;
  // Accuracy layer should not be used as a loss function.
}

The whole process is not hard to follow. It uses the rather interesting partial sort, std::partial_sort. Also, the maxval and max_id vectors defined at the top do not actually seem to be used anywhere.
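For reference, here is a small standalone sketch (plain C++, not part of Caffe) showing how std::partial_sort with std::greater moves the top_k largest (score, class index) pairs to the front, which is exactly how the forward pass above ranks the predictions:

#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
  // (score, class index) pairs for one sample; the values are made up.
  std::vector<std::pair<float, int> > v;
  v.push_back(std::make_pair(0.1f, 0));
  v.push_back(std::make_pair(0.5f, 1));
  v.push_back(std::make_pair(0.3f, 2));
  v.push_back(std::make_pair(0.1f, 3));

  const int top_k = 2;
  // After this call, v[0..top_k) holds the top_k largest pairs in descending
  // order; the order of the remaining elements is unspecified.
  std::partial_sort(v.begin(), v.begin() + top_k, v.end(),
                    std::greater<std::pair<float, int> >());

  for (int k = 0; k < top_k; ++k)
    std::printf("rank %d: class %d (score %.1f)\n", k, v[k].second, v[k].first);
  // Prints: rank 0: class 1 (score 0.5)
  //         rank 1: class 2 (score 0.3)
  return 0;
}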


2 LossLayer:

An interface for Layer%s that take two Blob%s as input -- usually  (1) predictions and (2) ground-truth labels -- and output a singleton Blob representing the loss.

LossLayers are typically only capable of backpropagating to their first input  -- the predictions.

I will not go into detail here, because as the class definition shows, this is just an abstract base class: all of the concrete loss layers implemented in Caffe inherit from it:

template <typename Dtype>
class LossLayer : public Layer<Dtype> {
 public:
  explicit LossLayer(const LayerParameter& param)
     : Layer<Dtype>(param) {}
  virtual void LayerSetUp(
      const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
  virtual void Reshape(
      const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);

  virtual inline int ExactNumBottomBlobs() const { return 2; }

  /**
   * @brief For convenience and backwards compatibility, instruct the Net to
   *        automatically allocate a single top Blob for LossLayers, into which
   *        they output their singleton loss, (even if the user didn't specify
   *        one in the prototxt, etc.).
   */
  virtual inline bool AutoTopBlobs() const { return true; }
  virtual inline int ExactNumTopBlobs() const { return 1; }
  /**
   * We usually cannot backpropagate to the labels; ignore force_backward for
   * these inputs.
   */
  virtual inline bool AllowForceBackward(const int bottom_index) const {
    return bottom_index != 1;
  }
};

3 ContrastiveLossLayer:

Computes the contrastive loss

@f$ E = \frac{1}{2N} \sum\limits_{n=1}^N \left[ y_n\, d_n + \left(1 - y_n\right) \max\left(margin - d_n,\, 0\right) \right] @f$

where

@f$ d_n = \left\| a_n - b_n \right\|_2^2 @f$.

This can be used to train siamese networks.

@param bottom input Blob vector (length 3)
    -# @f$ (N \times C \times 1 \times 1) @f$
        the features @f$ a \in [-\infty, +\infty] @f$
    -# @f$ (N \times C \times 1 \times 1) @f$
        the features @f$ b \in [-\infty, +\infty] @f$
    -# @f$ (N \times 1 \times 1 \times 1) @f$
        the binary similarity @f$ s \in [0, 1] @f$


3.1 How it works:

This is one way of computing a loss. As mentioned above, it is useful for training siamese networks, which were proposed by Yann LeCun in 2005; their distinguishing feature is that they take two images as input rather than a single image.

Excerpt from Caffe GitHub issue 697:
Siamese nets are supervised models for metric learning [1].
[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 539–546. IEEE, 2005. http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf

Speaking of metric learning, I remember that @norouzi had proposed and open sourced a method that learned a Hamming distance metric to distinguish similar and dissimilar images [2].
[2] Mohammad Norouzi, David J. Fleet, Ruslan Salakhutdinov, Hamming Distance Metric Learning, Neural Information Processing Systems (NIPS), 2012.

Given the description above, the forward computation of this layer is already clear. So what does the backward pass look like?

By the chain rule, with $d_n = \|a_n - b_n\|_2^2$:

$\frac{\partial E}{\partial a_n} = \frac{\partial E}{\partial d_n} \cdot \frac{\partial d_n}{\partial a_n}$

When the two images are similar (i.e. y = 1):

$\frac{\partial E}{\partial a_n} = \frac{1}{N}\left(a_n - b_n\right)$

The derivative with respect to b just picks up an extra minus sign: $\frac{\partial E}{\partial b_n} = -\frac{1}{N}\left(a_n - b_n\right)$.

When the two images are dissimilar (i.e. y = 0):

$\frac{\partial E}{\partial a_n} = -\frac{1}{N}\left(a_n - b_n\right)$ if $margin - d_n > 0$, and $0$ otherwise.

Since margin is a constant, the derivation is similar to the case above, just with one more sign flip. We also have to distinguish whether margin - d_n is greater than 0; if it is not, the derivative is 0.
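To make the formulas concrete, here is a self-contained numerical sketch (plain C++, independent of Caffe; the feature values, the margin, and the batch size of 1 are all made up) of the forward loss and the gradient with respect to a for a single pair:

#include <algorithm>
#include <cstdio>

int main() {
  const int C = 2;                      // feature dimension
  const float a[C] = {1.0f, 2.0f};      // feature vector a (made up)
  const float b[C] = {0.5f, 1.0f};      // feature vector b (made up)
  const float margin = 2.0f;            // margin (made up)
  const int   y = 0;                    // 1 = similar pair, 0 = dissimilar pair

  // d = ||a - b||^2 (the squared Euclidean distance, dist_sq_ in the code)
  float d = 0.0f;
  for (int c = 0; c < C; ++c) d += (a[c] - b[c]) * (a[c] - b[c]);

  // E = 1/(2N) * [ y*d + (1-y)*max(margin - d, 0) ] with N = 1
  const float E = 0.5f * (y ? d : std::max(margin - d, 0.0f));

  // dE/da: +(a-b) if similar, -(a-b) if dissimilar and margin - d > 0, else 0
  float grad_a[C];
  for (int c = 0; c < C; ++c) {
    if (y)                      grad_a[c] =  (a[c] - b[c]);
    else if (margin - d > 0.0f) grad_a[c] = -(a[c] - b[c]);
    else                        grad_a[c] = 0.0f;
  }
  // Here d = 1.25, E = 0.375, grad_a = (-0.5, -1.0).
  std::printf("d = %.2f, E = %.3f, grad_a = (%.2f, %.2f)\n",
              d, E, grad_a[0], grad_a[1]);
  return 0;
}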

3.2 Member variables:

  Blob<Dtype> diff_;  // cached for backward pass
  Blob<Dtype> dist_sq_;  // cached for backward pass
  Blob<Dtype> diff_sq_;  // tmp storage for gpu forward pass
  Blob<Dtype> summer_vec_;  // tmp storage for gpu forward pass

3.3 Setup function (LayerSetUp):

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), 1);
  CHECK_EQ(bottom[0]->width(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  CHECK_EQ(bottom[2]->channels(), 1);
  CHECK_EQ(bottom[2]->height(), 1);
  CHECK_EQ(bottom[2]->width(), 1);
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  diff_sq_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  dist_sq_.Reshape(bottom[0]->num(), 1, 1, 1);
  // vector of ones used to sum along channels
  summer_vec_.Reshape(bottom[0]->channels(), 1, 1, 1);
  for (int i = 0; i < bottom[0]->channels(); ++i)
    summer_vec_.mutable_cpu_data()[i] = Dtype(1);
}

Earlier we said that a siamese network takes two images as input, so why are three bottom Blobs checked here? As the forward function below shows, bottom[2] is in fact the label, i.e. whether the two images are similar.

3.4 Forward and backward functions:

Forward:

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  int count = bottom[0]->count();
  caffe_sub(
      count,
      bottom[0]->cpu_data(),  // a
      bottom[1]->cpu_data(),  // b
      diff_.mutable_cpu_data());  // a_i-b_i
  const int channels = bottom[0]->channels();
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  Dtype loss(0.0);
  for (int i = 0; i < bottom[0]->num(); ++i) {
    dist_sq_.mutable_cpu_data()[i] = caffe_cpu_dot(channels,
        diff_.cpu_data() + (i*channels), diff_.cpu_data() + (i*channels));
    if (static_cast<int>(bottom[2]->cpu_data()[i])) {  // similar pairs
      loss += dist_sq_.cpu_data()[i];
    } else {  // dissimilar pairs
      loss += std::max(margin-dist_sq_.cpu_data()[i], Dtype(0.0));
    }
  }
  loss = loss / static_cast<Dtype>(bottom[0]->num()) / Dtype(2);
  (*top)[0]->mutable_cpu_data()[0] = loss;
}

The forward pass is simple: just follow the formula. The code also shows that y in the forward formula indicates whether the pair is similar, and is a 0/1 variable.

Backward:

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] /
          static_cast<Dtype>((*bottom)[i]->num());
      int num = (*bottom)[i]->num();
      int channels = (*bottom)[i]->channels();
      for (int j = 0; j < num; ++j) {
        Dtype* bout = (*bottom)[i]->mutable_cpu_diff();
        if (static_cast<int>((*bottom)[2]->cpu_data()[j])) {  // similar pairs
          caffe_cpu_axpby(
              channels,
              alpha,
              diff_.cpu_data() + (j*channels),
              Dtype(0.0),
              bout + (j*channels));
        } else {  // dissimilar pairs
          if ((margin-dist_sq_.cpu_data()[j]) > Dtype(0.0)) {
            caffe_cpu_axpby(
                channels,
                -alpha,
                diff_.cpu_data() + (j*channels),
                Dtype(0.0),
                bout + (j*channels));
          } else {
            caffe_set(channels, Dtype(0), bout + (j*channels));
          }
        }
      }
    }
  }
}

The computation here is also simple: just apply the formulas derived in the section above. There is one thing, however, that puzzled me:

This layer should be the last layer of the whole network, so the error, i.e. the loss, is computed here. During the forward pass the loss is written into top.cpu_data, yet during the backward pass the propagated error uses top.cpu_diff. So where does top.cpu_diff come from? (In Caffe, the loss weight of each loss layer is written into this diff during setup, 1.0 by default, so top[0]->cpu_diff()[0] simply scales the gradients.)

4 EuclideanLossLayer:

@brief Computes the Euclidean (L2) loss:

@f$ E = \frac{1}{2N} \sum\limits_{n=1}^N \left\| \hat{y}_n - y_n \right\|_2^2 @f$

for real-valued regression tasks.

@param bottom input Blob vector (length 2)

    -# @f$ (N \times C \times H \times W) @f$
      the predictions @f$ \hat{y} \in [-\infty, +\infty] @f$

    -# @f$ (N \times C \times H \times W) @f$
      the targets @f$ y \in [-\infty, +\infty] @f$

This can be used for least-squares regression tasks.  An InnerProductLayer input to a EuclideanLossLayer exactly formulates a linear least squares regression problem. With non-zero weight decay the problem becomes one of ridge regression -- see src/caffe/test/test_sgd_solver.cpp for a concrete example wherein we check that the gradients computed for a Net with exactly this structure match hand-computed gradient formulas for ridge regression.

(Note: Caffe, and SGD in general, is certainly \b not the best way to solve linear least squares problems! We use it only as an instructive example.)

4.1 How it works:

The forward pass simply evaluates the formula above; the backward pass is even simpler, because the derivative of the loss above is:

$\frac{\partial E}{\partial \hat{y}_n} = \frac{1}{N}\left(\hat{y}_n - y_n\right), \qquad \frac{\partial E}{\partial y_n} = -\frac{1}{N}\left(\hat{y}_n - y_n\right)$
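A tiny worked example with made-up numbers (N = 1, a two-dimensional output): for a prediction $\hat{y} = (1, 2)$ and target $y = (0, 0)$,

$E = \frac{1}{2 \cdot 1}\left((1-0)^2 + (2-0)^2\right) = 2.5, \qquad \frac{\partial E}{\partial \hat{y}} = (1, 2),$

which is exactly what Forward_cpu and Backward_cpu below produce (assuming the loss weight stored in top[0]->cpu_diff() is 1).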

4.2 Forward and backward functions:

template <typename Dtype>
void EuclideanLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  int count = bottom[0]->count();
  caffe_sub(
      count,
      bottom[0]->cpu_data(),
      bottom[1]->cpu_data(),
      diff_.mutable_cpu_data());
  Dtype dot = caffe_cpu_dot(count, diff_.cpu_data(), diff_.cpu_data());
  Dtype loss = dot / bottom[0]->num() / Dtype(2);
  (*top)[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void EuclideanLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] / (*bottom)[i]->num();
      caffe_cpu_axpby(
          (*bottom)[i]->count(),              // count
          alpha,                              // alpha
          diff_.cpu_data(),                   // a
          Dtype(0),                           // beta
          (*bottom)[i]->mutable_cpu_diff());  // b
    }
  }
}

Since this is straightforward, I will not belabor it. The only odd thing is: why does the backward pass also compute a partial derivative with respect to the label y? Strange!! Of course, if we do not interpret y as a label, but treat this Euclidean loss the same way as the loss introduced above (a distance between two inputs), then taking the partial derivative with respect to y makes sense as well.

5 HingeLossLayer:

@brief Computes the hinge loss for a one-of-many classification task.

@param bottom input Blob vector (length 2)
   -# @f$ (N \times C \times H \times W) @f$
      the predictions @f$ t @f$, a Blob with values in
      @f$ [-\infty, +\infty] @f$
      indicating the predicted score for each of the
      @f$ K = CHW @f$
      classes. In an SVM, @f$ t @f$ is the result of taking the inner product
      @f$ X^T W @f$
      of the D-dimensional features
      @f$ X \in \mathcal{R}^{D \times N} @f$
      and the learned hyperplane parameters
      @f$ W \in \mathcal{R}^{D \times K} @f$,
      so a Net with just an InnerProductLayer (with num_output = D) providing predictions to a HingeLossLayer and no other learnable parameters or losses is equivalent to an SVM.
   -# @f$ (N \times 1 \times 1 \times 1) @f$
      the labels @f$ l @f$, an integer-valued Blob with values
      @f$ l_n \in [0, 1, 2, ..., K - 1] @f$
      indicating the correct class label among the @f$ K @f$ classes
@param top output Blob vector (length 1)
   -# @f$ (1 \times 1 \times 1 \times 1) @f$
      the computed hinge loss:
      @f$ E = \frac{1}{N} \sum\limits_{n=1}^N \sum\limits_{k=1}^K \left[ \max\left(0,\, 1 - \delta\{l_n = k\}\, t_{nk}\right) \right]^p @f$
      for the @f$ L^p @f$ norm
      (defaults to @f$ p = 1 @f$, the L1 norm; L2 norm, as in L2-SVM, is also available), and
      @f$ \delta\{\mathrm{condition}\} = \left\{ \begin{array}{lr} 1 & \mbox{if condition} \\ -1 & \mbox{otherwise} \end{array} \right. @f$

In an SVM,

@f$ t \in \mathcal{R}^{N \times K} @f$

is the result of taking the inner product

@f$ X^T W @f$

of the features

@f$ X \in \mathcal{R}^{D \times N} @f$

and the learned hyperplane parameters

@f$ W \in \mathcal{R}^{D \times K} @f$

So, a Net with just an InnerProductLayer (with num_output = @f$k@f$) providing predictions to a HingeLossLayer is equivalent to an SVM (assuming it has no other learned parameters outside the InnerProductLayer and no other losses outside the HingeLossLayer).

5.1 How it works:

The forward computation has already been described above.

So how is the backward pass done?

Gradients cannot be computed with respect to the label inputs (bottom[1]), so this method ignores bottom[1] and requires !propagate_down[1], crashing if propagate_down[1] is set.

@param top output Blob vector (length 1), providing the error gradient with respect to the outputs
  -# @f$ (1 \times 1 \times 1 \times 1) @f$
     This Blob's diff will simply contain the loss_weight @f$ \lambda @f$,
     as @f$ \lambda @f$ is the coefficient of this layer's output
     @f$ \ell_i @f$ in the overall Net loss
     @f$ E = \lambda_i \ell_i + \mbox{other loss terms} @f$;
     hence
     @f$ \frac{\partial E}{\partial \ell_i} = \lambda_i @f$.
     (*Assuming that this top Blob is not used as a bottom (input) by any other layer of the Net.)
@param propagate_down see Layer::Backward.
     propagate_down[1] must be false as we can't compute gradients with respect to the labels.
@param bottom input Blob vector (length 2)
  -# @f$ (N \times C \times H \times W) @f$
     the predictions @f$ t @f$; Backward computes diff
     @f$ \frac{\partial E}{\partial t} @f$
  -# @f$ (N \times 1 \times 1 \times 1) @f$
     the labels -- ignored as we can't compute their error gradients

5.2 Forward and backward functions:

Forward:

template <typename Dtype>
void HingeLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const Dtype* label = bottom[1]->cpu_data();
  int num = bottom[0]->num();
  int count = bottom[0]->count();
  int dim = count / num;

  caffe_copy(count, bottom_data, bottom_diff);
  for (int i = 0; i < num; ++i) {
    bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
  }
  for (int i = 0; i < num; ++i) {
    for (int j = 0; j < dim; ++j) {
      bottom_diff[i * dim + j] = std::max(
        Dtype(0), 1 + bottom_diff[i * dim + j]);
    }
  }
  Dtype* loss = (*top)[0]->mutable_cpu_data();
  switch (this->layer_param_.hinge_loss_param().norm()) {
  case HingeLossParameter_Norm_L1:
    loss[0] = caffe_cpu_asum(count, bottom_diff) / num;
    break;
  case HingeLossParameter_Norm_L2:
    loss[0] = caffe_cpu_dot(count, bottom_diff, bottom_diff) / num;
    break;
  default:
    LOG(FATAL) << "Unknown Norm";
  }
}

This is actually not hard to understand. Looking at the computation above, bottom_data is first copied into bottom_diff, and then part of bottom_diff is multiplied by -1; note that only the entry at the ground-truth label position of each image is negated (compare how the label is encoded in the handwritten-digit example). Then the +1 operation follows. According to the formula this is max(0, 1 - bottom_diff), but since only a small part of bottom_diff was multiplied by -1, the +1 operation in the code actually corresponds to the formula

$\max\left(0,\, 1 - \delta\{l_n = k\}\, t_{nk}\right)$

where t in the forward formula is just bottom_data (copied into bottom_diff).

The rest of the forward pass needs no further comment: it just takes the L1 or L2 norm.
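A quick worked example with made-up numbers (N = 1, K = 3, L1 norm): suppose the scores are t = (0.5, -1, 2) and the true label is l = 2 (the last class). After negating the entry at the label position, bottom_diff is (0.5, -1, -2); after the max(0, 1 + .) step it becomes (1.5, 0, 0), so the reported L1 hinge loss is 1.5 / 1 = 1.5.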

Backward:

template <typename Dtype>
void HingeLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down[0]) {
    Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
    const Dtype* label = (*bottom)[1]->cpu_data();
    int num = (*bottom)[0]->num();
    int count = (*bottom)[0]->count();
    int dim = count / num;

    for (int i = 0; i < num; ++i) {
      bottom_diff[i * dim + static_cast<int>(label[i])] *= -1;
    }

    const Dtype loss_weight = top[0]->cpu_diff()[0];
    switch (this->layer_param_.hinge_loss_param().norm()) {
    case HingeLossParameter_Norm_L1:
      caffe_cpu_sign(count, bottom_diff, bottom_diff);
      caffe_scal(count, loss_weight / num, bottom_diff);
      break;
    case HingeLossParameter_Norm_L2:
      caffe_scal(count, loss_weight * 2 / num, bottom_diff);
      break;
    default:
      LOG(FATAL) << "Unknown Norm";
    }
  }
}

Let $f = \max(0,\, 1 - \delta\{l_n = k\}\, t_{nk})$ from the forward pass. Then the partial derivative with respect to $t_{nk}$ (taking the L2 norm as the example) can be written as:

$\frac{\partial E}{\partial t_{nk}} = -\frac{2}{N}\, \delta\{l_n = k\}\, \max\left(0,\, 1 - \delta\{l_n = k\}\, t_{nk}\right)$

Elsewhere I have seen it written as:

I do not really understand why that is!!! Why would $f > 0$ only happen when $l_n = k$?
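For reference, here is a short derivation of the L2-norm gradient (my own sketch, written to match the Backward_cpu code above rather than any official documentation):

$E = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} m_{nk}^2, \qquad m_{nk} = \max\left(0,\ 1 - \delta\{l_n = k\}\, t_{nk}\right)$

$\frac{\partial E}{\partial t_{nk}} = \frac{2}{N}\, m_{nk}\, \frac{\partial m_{nk}}{\partial t_{nk}} = -\frac{2}{N}\, \delta\{l_n = k\}\, m_{nk}$

This is nonzero wherever $m_{nk} > 0$, i.e. wherever the margin is violated, not only at $k = l_n$. The code reproduces exactly this: after the forward pass bottom_diff already holds $m_{nk}$, the sign flip at the label position supplies the factor $-\delta\{l_n = k\}$, and caffe_scal multiplies by $2 \cdot \text{loss\_weight} / N$.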

6 InfogainLossLayer:

A generalization of MultinomialLogisticLossLayer that takes an "information gain" (infogain) matrix specifying the "value" of all label pairs.

6.1 How it works:

Inputs:

  1. Shape @f$ (N \times C \times H \times W) @f$: the predictions @f$ \hat{p} \in [0, 1] @f$, giving the predicted probability of each of the @f$ K = CHW @f$ classes; the probabilities of each prediction @f$ \hat{p}_n @f$ sum to 1: @f$ \forall n,\ \sum\limits_{k=1}^{K} \hat{p}_{nk} = 1 @f$.

  2. Shape @f$ (N \times 1 \times 1 \times 1) @f$: the labels @f$ l @f$, an integer-valued Blob with @f$ l_n \in [0, 1, 2, ..., K - 1] @f$, the index of the correct class among the @f$ K @f$ classes.

  3. Shape @f$ (1 \times 1 \times K \times K) @f$ (optional): the infogain matrix @f$ H @f$, given as the third input. If @f$ H = I @f$ (the identity matrix), this layer is equivalent to the MultinomialLogisticLossLayer.

Output:

Shape @f$ (1 \times 1 \times 1 \times 1) @f$

Computed as @f$ E = \frac{-1}{N} \sum\limits_{n=1}^{N} H_{l_n} \log(\hat{p}_n) = \frac{-1}{N} \sum\limits_{n=1}^{N} \sum\limits_{k=1}^{K} H_{l_n, k} \log(\hat{p}_{n, k}) @f$, where @f$ H_{l_n} @f$ denotes row @f$ l_n @f$ of @f$ H @f$.

With the description above, the forward pass is clear; the backward pass computes:

$\frac{\partial E}{\partial \hat{p}_{nk}} = -\frac{1}{N} \cdot \frac{H_{l_n, k}}{\hat{p}_{nk}}$
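As a sanity check on this formula (my own note, following directly from the definition above): when H is the identity matrix, only the k = l_n term survives in the inner sum, and the loss reduces to the multinomial logistic loss

$E = -\frac{1}{N} \sum_{n=1}^{N} \log\left(\hat{p}_{n, l_n}\right)$

which is why the documentation says the two layers are then equivalent.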


6.2 Member variables:

Blob<Dtype> infogain_;

This variable is the matrix H introduced above.

6.3 Setup functions (LayerSetUp / Reshape):

template <typename Dtype>
void InfogainLossLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  if (bottom.size() < 3) {
    CHECK(this->layer_param_.infogain_loss_param().has_source())
        << "Infogain matrix source must be specified.";
    BlobProto blob_proto;
    ReadProtoFromBinaryFile(
      this->layer_param_.infogain_loss_param().source(), &blob_proto);
    infogain_.FromProto(blob_proto);
  }
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  Blob<Dtype>* infogain = NULL;
  if (bottom.size() < 3) {
    infogain = &infogain_;
  } else {
    infogain = bottom[2];
  }
  CHECK_EQ(bottom[1]->channels(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  const int num = bottom[0]->num();
  const int dim = bottom[0]->count() / num;
  CHECK_EQ(infogain->num(), 1);
  CHECK_EQ(infogain->channels(), 1);
  CHECK_EQ(infogain->height(), dim);
  CHECK_EQ(infogain->width(), dim);
}


From this we can see that when only two bottom Blobs are given, the H matrix must be specified (loaded from a file); when there are three bottom Blobs, the third one is used as the H matrix.
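A minimal prototxt sketch of the two-bottom case (the layer and blob names and the file name are made up, and the exact syntax depends on the Caffe version), where H is loaded from the file given in infogain_loss_param:

layers {
  name: "infogain_loss"
  type: INFOGAIN_LOSS
  bottom: "prob"                       # predicted probabilities, hypothetical blob name
  bottom: "label"                      # ground-truth labels
  top: "loss"
  infogain_loss_param {
    source: "infogain_H.binaryproto"   # BlobProto file holding the K x K matrix H
  }
}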

6.4 Forward and backward functions:

Forward:

template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  (*top)[0]->mutable_cpu_data()[0] = loss / num;
}

Backward:

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    vector<Blob<Dtype>*>* bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type_name()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = (*bottom)[0]->cpu_data();
    const Dtype* bottom_label = (*bottom)[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom->size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = (*bottom)[2]->cpu_data();
    }
    Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
    int num = (*bottom)[0]->num();
    int dim = (*bottom)[0]->count() / (*bottom)[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

