Caffe CONTRASTIVE_LOSS: the contrastive loss function

deep_learninger

Article link: https://blog.csdn.net/u014114990/article/details/47974031

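Caffe's loss layers by now cover most of the loss functions you are likely to need. The choice of loss function usually goes together with the output activation, and a regularization term is sometimes added as well, so that the error propagates well during backpropagation (BP).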

I.

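contrastive_loss corresponds to contrastive_loss_layer. Reading the code, the input is a pair of samples used for verification, for example two face images that either show the same person (a positive pair) or two different people (a negative pair). The siamese example in Caffe's examples directory uses this type of loss.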

The exact mathematical form of this loss is given in section III.

II.

euclidean_loss corresponds to euclidean_loss_layer. The loss is the squared error, loss = (y - f(wx))^2, which Caffe sums over the batch and scales by 1/(2N).

hinge_loss corresponds to hinge_loss_layer. In its basic form the loss is loss = max(0, 1 - y·f(wx)).

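infogain_loss corresponds to infogain_loss_layer. Loss expression: $E = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} H_{l_n,k} \log(\hat{p}_{n,k})$, where $H$ is the infogain matrix and $l_n$ is the label of sample $n$.

multinomial_logistic_loss corresponds to multinomial_logistic_loss_layer. Loss expression: $E = -\frac{1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n,l_n})$, where $\hat{p}_{n,k}$ is the predicted probability of class $k$ for sample $n$.

sigmoid_cross_entropy corresponds to sigmoid_cross_entropy_loss_layer. Loss expression: $E = -\frac{1}{N} \sum_{n=1}^{N} \left[ p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n) \right]$, where $\hat{p}_n = \sigma(x_n)$ is the sigmoid of the input score.

softmax_loss corresponds to softmax_loss_layer. Loss expression: $E = -\frac{1}{N} \sum_{n=1}^{N} \log(\hat{p}_{n,l_n})$, where $\hat{p}_n$ is the softmax of the input scores of sample $n$.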
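
To make the relationship between softmax_loss and multinomial_logistic_loss concrete, here is a minimal standalone sketch in plain C++. It is not Caffe's implementation: the function names `Softmax`, `MultinomialLogisticLoss` and `SoftmaxLoss`, the use of `std::vector`, and the `1e-20` clamp are assumptions made for illustration. It assumes the usual definitions, where the softmax loss is a softmax followed by the mean negative log-probability of the correct class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over one sample's raw scores. Subtracting the maximum score first
// keeps exp() from overflowing and does not change the result.
std::vector<double> Softmax(const std::vector<double>& scores) {
  assert(!scores.empty());
  const double max_score = *std::max_element(scores.begin(), scores.end());
  std::vector<double> probs(scores.size());
  double sum = 0.0;
  for (std::size_t k = 0; k < scores.size(); ++k) {
    probs[k] = std::exp(scores[k] - max_score);
    sum += probs[k];
  }
  for (double& p : probs) p /= sum;
  return probs;
}

// Multinomial logistic loss: mean over the batch of -log(p[n][label[n]]).
double MultinomialLogisticLoss(const std::vector<std::vector<double>>& probs,
                               const std::vector<int>& labels) {
  assert(probs.size() == labels.size() && !probs.empty());
  double loss = 0.0;
  for (std::size_t n = 0; n < probs.size(); ++n) {
    // Clamp the probability so that log(0) never occurs (illustrative value).
    loss -= std::log(std::max(probs[n][labels[n]], 1e-20));
  }
  return loss / static_cast<double>(probs.size());
}

// Softmax loss: softmax on the raw scores, then multinomial logistic loss.
double SoftmaxLoss(const std::vector<std::vector<double>>& scores,
                   const std::vector<int>& labels) {
  std::vector<std::vector<double>> probs;
  probs.reserve(scores.size());
  for (const auto& s : scores) probs.push_back(Softmax(s));
  return MultinomialLogisticLoss(probs, labels);
}
```

With equal scores over K classes the softmax is uniform, so the loss for that sample is log(K); this is a convenient sanity check for the value of the loss at the start of training.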

III.

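Contrastive loss

Input:

Shape: (N×C×1×1), features a ∈ [−∞, +∞]

Shape: (N×C×1×1), features b ∈ [−∞, +∞]

Shape: (N×1×1×1), similarity y ∈ [0, 1] (the code treats it as binary: nonzero for a similar pair, 0 for a dissimilar pair)

Output:

Shape: (1×1×1×1)

The contrastive loss is:

$$E = \frac{1}{2N} \sum_{n=1}^{N} \left[ y\, d + (1 - y) \max(\mathrm{margin} - d,\ 0) \right]$$

where $d = \lVert a_n - b_n \rVert_2^2$.

Suitable for:

Training Siamese networks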
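
The formula above can be sketched as a small standalone function. This is an illustration in plain C++, not the Caffe layer: `SquaredDistance`, `ContrastiveLoss` and the `std::vector` containers are names and types assumed here, and the code follows the formula exactly as written, with d being the squared distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance d = ||a - b||_2^2 between two feature vectors.
double SquaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b) {
  assert(a.size() == b.size());
  double d = 0.0;
  for (std::size_t c = 0; c < a.size(); ++c) {
    const double diff = a[c] - b[c];
    d += diff * diff;
  }
  return d;
}

// E = 1/(2N) * sum_n [ y_n * d_n + (1 - y_n) * max(margin - d_n, 0) ]
// a[n] and b[n] are the two feature vectors of pair n; y[n] is 1 for a
// similar pair and 0 for a dissimilar pair.
double ContrastiveLoss(const std::vector<std::vector<double>>& a,
                       const std::vector<std::vector<double>>& b,
                       const std::vector<int>& y,
                       double margin) {
  assert(a.size() == b.size() && a.size() == y.size() && !a.empty());
  double loss = 0.0;
  for (std::size_t n = 0; n < a.size(); ++n) {
    const double d = SquaredDistance(a[n], b[n]);
    // Similar pairs are penalized by their distance; dissimilar pairs are
    // penalized only while they are closer than the margin.
    loss += y[n] ? d : std::max(margin - d, 0.0);
  }
  return loss / (2.0 * static_cast<double>(a.size()));
}
```

For example, with margin 1, one similar pair at squared distance 1 and one dissimilar pair at squared distance 0.25, the result is (1 + 0.75) / (2 · 2) = 0.4375.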

#include <algorithm>
#include <vector>

#include "caffe/layer.hpp"
#include "caffe/loss_layers.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"

// Computes the contrastive loss:
//   loss = sum_n [ y_n * d_n + (1 - y_n) * max(margin - d_n, 0) ] / (2 * N)
// where d_n = ||a_n - b_n||_2^2 (this is the legacy_version form; see
// Forward_cpu for the default form). Official documentation:
// http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ContrastiveLossLayer.html

namespace caffe {

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::LayerSetUp(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), 1);
  CHECK_EQ(bottom[0]->width(), 1);
  CHECK_EQ(bottom[1]->height(), 1);
  CHECK_EQ(bottom[1]->width(), 1);
  CHECK_EQ(bottom[2]->channels(), 1);
  CHECK_EQ(bottom[2]->height(), 1);
  CHECK_EQ(bottom[2]->width(), 1);
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  diff_sq_.Reshape(bottom[0]->num(), bottom[0]->channels(), 1, 1);
  dist_sq_.Reshape(bottom[0]->num(), 1, 1, 1);
  // vector of ones used to sum along channels
  summer_vec_.Reshape(bottom[0]->channels(), 1, 1, 1);
  for (int i = 0; i < bottom[0]->channels(); ++i)
    summer_vec_.mutable_cpu_data()[i] = Dtype(1);
}

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  int count = bottom[0]->count();
  // diff_ = a - b, element-wise over all N * C values
  caffe_sub(
      count,
      bottom[0]->cpu_data(),  // a
      bottom[1]->cpu_data(),  // b
      diff_.mutable_cpu_data());  // a_i-b_i
  const int channels = bottom[0]->channels();
  // margin for dissimilar pair, default = 1.0
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  // legacy_version = false (the default) uses max(margin - d, 0)^2,
  // legacy_version = true uses max(margin - d^2, 0), where d is the
  // Euclidean distance. The legacy form is kept so that existing models
  // and results can be reproduced.
  bool legacy_version =
      this->layer_param_.contrastive_loss_param().legacy_version();
  Dtype loss(0.0);
  for (int i = 0; i < bottom[0]->num(); ++i) {

    // squared distance of pair i: dot product of its difference with itself
    dist_sq_.mutable_cpu_data()[i] = caffe_cpu_dot(channels,
        diff_.cpu_data() + (i*channels), diff_.cpu_data() + (i*channels));
    if (static_cast<int>(bottom[2]->cpu_data()[i])) {  // similar pairs
      loss += dist_sq_.cpu_data()[i];
    } else {  // dissimilar pairs
      // legacy: max(margin - d^2, 0); default: max(margin - d, 0)^2
      if (legacy_version) {
        loss += std::max(margin - dist_sq_.cpu_data()[i], Dtype(0.0));
      } else {
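        // Both arguments must have type Dtype, otherwise std::max cannot
        // deduce its template argument when Dtype is float.
        Dtype dist = std::max<Dtype>(margin - sqrt(dist_sq_.cpu_data()[i]),
                                     Dtype(0.0));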
        loss += dist*dist;
      }
    }
  }
  loss = loss / static_cast<Dtype>(bottom[0]->num()) / Dtype(2);
  top[0]->mutable_cpu_data()[0] = loss;
}

template <typename Dtype>
void ContrastiveLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  Dtype margin = this->layer_param_.contrastive_loss_param().margin();
  bool legacy_version =
      this->layer_param_.contrastive_loss_param().legacy_version();
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] /
          static_cast<Dtype>(bottom[i]->num());
      int num = bottom[i]->num();
      int channels = bottom[i]->channels();
      for (int j = 0; j < num; ++j) {
        Dtype* bout = bottom[i]->mutable_cpu_diff();
        if (static_cast<int>(bottom[2]->cpu_data()[j])) {  // similar pairs
          caffe_cpu_axpby(
              channels,
              alpha,
              diff_.cpu_data() + (j*channels),
              Dtype(0.0),
              bout + (j*channels));
        } else {  // dissimilar pairs
          Dtype mdist(0.0);
          Dtype beta(0.0);
          if (legacy_version) {
            mdist = margin - dist_sq_.cpu_data()[j];
            beta = -alpha;
          } else {
            Dtype dist = sqrt(dist_sq_.cpu_data()[j]);
            mdist = margin - dist;
            beta = -alpha * mdist / (dist + Dtype(1e-4));
          }
          if (mdist > Dtype(0.0)) {
            caffe_cpu_axpby(
                channels,
                beta,
                diff_.cpu_data() + (j*channels),
                Dtype(0.0),
                bout + (j*channels));
          } else {
            caffe_set(channels, Dtype(0), bout + (j*channels));
          }
        }
      }
    }
  }
}

#ifdef CPU_ONLY
STUB_GPU(ContrastiveLossLayer);
#endif

INSTANTIATE_CLASS(ContrastiveLossLayer);
REGISTER_LAYER_CLASS(ContrastiveLoss);

}  // namespace caffe

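Looking at the program, the code follows the cost function above only when legacy_version is true. With the default setting (legacy_version = false) the dissimilar-pair term is built from the Euclidean distance itself rather than its square. This is probably an optimization on Caffe's side, and this form of the cost function may simply work better.

From the code, with dist = ||a_n - b_n||_2, the per-pair term is:

similar pair (y = 1): loss = dist^2
dissimilar pair (y = 0), default: loss = max(margin - dist, 0)^2
dissimilar pair (y = 0), legacy_version = true: loss = max(margin - dist^2, 0)

The terms are summed over the batch and divided by 2N.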
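
The difference between the two variants, and the gradient that Backward_cpu applies for a dissimilar pair, can be sketched outside of Caffe as follows. The helper names `DissimilarTerm` and `DissimilarGradA` are hypothetical; the sketch leaves out the batch scaling (the 1/N factor and the incoming top diff) and keeps only the per-pair logic, including the small 1e-4 term that the layer adds to the denominator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Loss term of one dissimilar pair (y = 0), before the 1/(2N) scaling.
// dist_sq is the squared Euclidean distance ||a - b||^2.
double DissimilarTerm(double dist_sq, double margin, bool legacy_version) {
  if (legacy_version) {
    return std::max(margin - dist_sq, 0.0);  // max(margin - dist^2, 0)
  }
  const double m = std::max(margin - std::sqrt(dist_sq), 0.0);
  return m * m;                              // max(margin - dist, 0)^2
}

// Gradient of (1/2) * DissimilarTerm with respect to a, for one pair.
// The gradient with respect to b is the same vector with the opposite sign.
std::vector<double> DissimilarGradA(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    double margin, bool legacy_version) {
  assert(a.size() == b.size());
  double dist_sq = 0.0;
  for (std::size_t c = 0; c < a.size(); ++c) {
    dist_sq += (a[c] - b[c]) * (a[c] - b[c]);
  }
  double mdist = 0.0;  // how far the pair is still inside the margin
  double beta = 0.0;   // scale factor applied to (a - b)
  if (legacy_version) {
    mdist = margin - dist_sq;
    beta = -1.0;
  } else {
    const double dist = std::sqrt(dist_sq);
    mdist = margin - dist;
    beta = -mdist / (dist + 1e-4);  // epsilon guards against dist == 0
  }
  std::vector<double> grad(a.size(), 0.0);
  if (mdist > 0.0) {  // pairs already beyond the margin get zero gradient
    for (std::size_t c = 0; c < a.size(); ++c) grad[c] = beta * (a[c] - b[c]);
  }
  return grad;
}
```

In both variants the gradient points from a away from b, so a gradient-descent step pushes a dissimilar pair apart until it reaches the margin. In the default variant the push shrinks smoothly to zero as the pair approaches the margin, while in the legacy variant it grows with the distance and then drops to zero at the margin.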

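The Blob member functions you will see most often are:
const Dtype* cpu_data() const; // data used on the CPU (read-only)
void set_cpu_data(Dtype* data); // makes the blob use the given data buffer as its data
const Dtype* gpu_data() const; // returns a pointer that cannot be used to modify the data; same below
const Dtype* cpu_diff() const;
const Dtype* gpu_diff() const;
Dtype* mutable_cpu_data(); // returns a pointer that can be used to modify the data; same below
Dtype* mutable_gpu_data();
Dtype* mutable_cpu_diff();
Dtype* mutable_gpu_diff();
In short, the accessors that start with mutable_ return a pointer through which the contents can be changed, while those without the mutable_ prefix return a pointer to const, through which the contents cannot be changed.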

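Many programs use both mutable_cpu_data(), which returns a modifiable pointer, and cpu_data() const, which returns the data used on the CPU.
Both return the same CPU data: one as a pointer through which the data can be modified, the other as a pointer through which it cannot.
This is a place where the author's programming technique shows.
Some data, such as input data that has just been read, must not change while it is being consumed, so it is exposed through a const accessor, and the pointer that accessor returns therefore has to be a pointer to const. Often, however, later steps do need to change the values after the data has been read. For that case there is a second, mutable_ accessor that points to the same underlying data. In Caffe the mutable_ accessors also mark that copy (CPU or GPU) as the most recent one, so the other side is re-synchronized the next time it is read.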
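
The const / mutable_ accessor pair described above can be illustrated with a small class. `MiniBlob` is a hypothetical, much simplified stand-in written for this sketch and is not Caffe's Blob; in particular the `dirty_` flag only hints at why a separate writable accessor is useful, it does not reproduce Caffe's CPU/GPU synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a blob (not Caffe's Blob class).
template <typename Dtype>
class MiniBlob {
 public:
  explicit MiniBlob(std::size_t count)
      : data_(count, Dtype(0)), dirty_(false) {}

  // Read-only view: the caller cannot write through a pointer to const.
  const Dtype* cpu_data() const { return data_.data(); }

  // Writable view of the same buffer. Because every write has to go through
  // this accessor, the class can record that the contents may have changed.
  Dtype* mutable_cpu_data() {
    dirty_ = true;
    return data_.data();
  }

  bool dirty() const { return dirty_; }
  std::size_t count() const { return data_.size(); }

 private:
  std::vector<Dtype> data_;
  bool dirty_;
};

// A function that only reads takes the blob by const reference, so the
// compiler allows cpu_data() here and rejects mutable_cpu_data().
template <typename Dtype>
Dtype Sum(const MiniBlob<Dtype>& blob) {
  Dtype total = Dtype(0);
  for (std::size_t i = 0; i < blob.count(); ++i) total += blob.cpu_data()[i];
  return total;
}
```

The same split appears in the layer code above: Forward_cpu reads its inputs with bottom[0]->cpu_data() and writes its result through top[0]->mutable_cpu_data().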