Deep Hashing: DSH

Latest recommended article published on 2022-11-18 20:56:41

maocaisheng

Categories: Image Retrieval, Paper Reading. Tags: deep hashing

Article link: https://blog.csdn.net/u012938704/article/details/60776220

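Paper: Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016
Source code: https://github.com/lhmRyan/deep-supervised-hashing-DSH
The network structure in the paper appears to combine the CIFAR-10 and Siamese networks:
(Figure: network structure)
In my view there are two main contributions:
1. The loss function is designed so that the outputs of the last layer are binary-like.
The paper modifies the contrastive loss by adding a regularization term.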

This regularization term pushes the values of the final output features b1 and b2 toward -1 and +1.
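
For reference, the modified loss for a single pair can be written out as below. This is a reconstruction based on the paper's formulation and on the Caffe loss layer discussed later in this post, not a copy of the original figure. Here $y = 0$ marks a similar pair and $y = 1$ a dissimilar pair, $m$ is the margin (`bi_margin` in the code) and $\alpha$ is the weight of the regularizer (`tradeoff` in the code); mapping the symbols to these parameter names is my assumption.

$$
L_r(b_1, b_2, y) = \tfrac{1}{2}(1-y)\,\lVert b_1 - b_2 \rVert_2^2
+ \tfrac{1}{2}\,y\,\max\!\left(m - \lVert b_1 - b_2 \rVert_2^2,\ 0\right)
+ \alpha\left(\lVert\,|b_1| - \mathbf{1}\,\rVert_1 + \lVert\,|b_2| - \mathbf{1}\,\rVert_1\right)
$$

The first two terms are the contrastive loss on the squared Euclidean distance. The last term is zero exactly when every component of $b_1$ and $b_2$ equals $-1$ or $+1$, which is why it drives the outputs toward binary-like values.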

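2. Generating image pairs online
At first I assumed this used a Siamese-style two-branch network. It does not: the network is the single-branch one shown in the figure, and the key lies in the loss function the authors designed. Training proceeds batch by batch. Each iteration feeds n images with their n labels, and the loss layer uses a two-level loop to enumerate every possible image pair (i, j) without repetition, $\frac{n(n-1)}{2}$ pairs in total. A pair is treated as similar when i and j have the same label (in the multi-label case, when they share at least one label). This saves a great deal of storage and computation, since pairs never have to be built or stored ahead of time. The paper says: To cover those image pairs across batches, in each iteration the training images are randomly selected from the whole training set.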
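
To make the pairing scheme concrete, here is a minimal standalone sketch of the forward loss only, with no Caffe dependency. The function name `batch_pair_loss` and its signature are hypothetical and are not part of the DSH repository. It handles single-label batches only, and it normalizes the same way as the Caffe layer shown below: the pair term is divided by n(n-1) and the regularizer by n.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the DSH repository): computes the batch loss
// by enumerating every unordered pair (i, j) with i < j inside one batch.
double batch_pair_loss(const std::vector<std::vector<double>>& codes,
                       const std::vector<int>& labels,
                       double margin, double tradeoff,
                       std::size_t* num_pairs = nullptr) {
  assert(codes.size() == labels.size());
  assert(codes.size() >= 2);  // at least one pair is required
  const std::size_t n = codes.size();
  double pair_loss = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      // squared Euclidean distance between the two outputs
      double dist_sq = 0.0;
      for (std::size_t k = 0; k < codes[i].size(); ++k) {
        const double d = codes[i][k] - codes[j][k];
        dist_sq += d * d;
      }
      // same label: pull together; different label: push apart up to the margin
      pair_loss += (labels[i] == labels[j])
                       ? dist_sq
                       : std::max(margin - dist_sq, 0.0);
      ++pairs;
    }
  }
  // regularizer: distance of every component's absolute value from 1
  double reg = 0.0;
  for (const auto& code : codes) {
    for (double v : code) reg += std::abs(std::abs(v) - 1.0);
  }
  if (num_pairs != nullptr) *num_pairs = pairs;
  return pair_loss / static_cast<double>(n * (n - 1)) +
         tradeoff * reg / static_cast<double>(n);
}
```

With three images the loops visit the pairs (0,1), (0,2) and (1,2), that is 3 = 3·2/2 pairs, and no pair list is ever stored.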

template <typename Dtype>
void HashingLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // initialize parameters
  // The gradient is computed during the forward pass, together with the loss.
  Dtype* bout = bottom[0]->mutable_cpu_diff();
  const int num = bottom[0]->num();
  // top[0]->cpu_diff()[0] holds this loss layer's weight (1.0 by default).
  const Dtype alpha = top[0]->cpu_diff()[0] / static_cast<Dtype>(num * (num - 1));
  const Dtype beta = top[0]->cpu_diff()[0] / static_cast<Dtype>(num);
  const int channels = bottom[0]->channels();
  Dtype margin = this->layer_param_.hashing_loss_param().bi_margin();
  // Trade-off coefficient between the contrastive loss and the regularization loss.
  Dtype tradeoff = this->layer_param_.hashing_loss_param().tradeoff();
  const int label_num = bottom[1]->count() / num;
  bool sim;
  Dtype loss(0.0);     // total loss
  Dtype reg(0.0);      // regularization loss
  Dtype data(0.0);     // value of one dimension of an input vector
  Dtype dist_sq(0.0);  // squared distance between two vectors
  caffe_set(channels*num, Dtype(0), bout);
  // calculate loss and gradient
  for (int i = 0; i < num; ++i) {
    for (int j = i+1; j < num; ++j) {
      caffe_sub(
        channels,
        bottom[0]->cpu_data()+(i*channels),  // a
        bottom[0]->cpu_data()+(j*channels),  // b
        diff_.mutable_cpu_data());  // a_i-b_j
      dist_sq = caffe_cpu_dot(channels, diff_.cpu_data(), diff_.cpu_data());  // D_w^2
      if (label_num > 1) {  // multi-label: similar if the label vectors overlap
        sim = caffe_cpu_dot(label_num, bottom[1]->cpu_data() + (i * label_num), bottom[1]->cpu_data() + (j * label_num)) > 0;
      }
      else {  // single label: similar if the labels are equal
        sim = ((static_cast<int>(bottom[1]->cpu_data()[i])) == (static_cast<int>(bottom[1]->cpu_data()[j])));
      }
      if (sim) {  // similar pairs
        loss += dist_sq;
        // gradient with respect to the first sample (input vector i)
        caffe_cpu_axpby(
          channels,
          alpha,
          diff_.cpu_data(),
          Dtype(1.0),
          bout + (i*channels));
        // gradient with respect to the second sample (input vector j)
        caffe_cpu_axpby(
          channels,
          -alpha,
          diff_.cpu_data(),
          Dtype(1.0),
          bout + (j*channels));
      }
      else {  // dissimilar pairs
        loss += std::max(margin - dist_sq, Dtype(0.0));
        if ((margin-dist_sq) > Dtype(0.0)) {
          // gradient with respect to the first sample (input vector i)
          caffe_cpu_axpby(
            channels,
            -alpha,
            diff_.cpu_data(),
            Dtype(1.0),
            bout + (i*channels));
          // gradient with respect to the second sample (input vector j)
          caffe_cpu_axpby(
            channels,
            alpha,
            diff_.cpu_data(),
            Dtype(1.0),
            bout + (j*channels));
        }
      }
    }  // end of inner loop
    // The regularizer involves only the single input vector i.
    for (int k = 0; k < channels; k++) {
      data = *(bottom[0]->cpu_data()+(i*channels)+k);
      // gradient corresponding to the regularizer
      *(bout + (i*channels) + k) += beta * tradeoff * (((data>=Dtype(1.0))||(data<=Dtype(0.0)&&data>=Dtype(-1.0)))?Dtype(1.0):Dtype(-1.0));
      data = std::abs(data)-1;
      reg += std::abs(data);  // loss contributed by the regularizer
    }
  }  // end of outer loop
  // Average each of the two loss terms separately, then add them.
  loss = loss / static_cast<Dtype>(bottom[0]->num()*(bottom[0]->num()-1));
  loss += tradeoff * reg / static_cast<Dtype>(bottom[0]->num());
  top[0]->mutable_cpu_data()[0] = loss;
}
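
The sign pattern in the regularizer gradient above is easier to follow when written out. The following is my own derivation rather than something stated in the post's source. For one output component $x$, the regularizer is $\bigl|\,|x| - 1\,\bigr|$, and away from the kinks at $x \in \{-1, 0, 1\}$ its derivative is

$$
\frac{\partial}{\partial x}\,\bigl|\,|x| - 1\,\bigr|
= \operatorname{sign}(x)\,\operatorname{sign}(|x| - 1)
= \begin{cases}
+1, & x > 1 \ \text{or}\ -1 < x < 0 \\
-1, & 0 < x < 1 \ \text{or}\ x < -1
\end{cases}
$$

The code uses the same cases and picks $+1$ at the three kink points, which is a valid subgradient choice. For the pair terms, taking the $\tfrac{1}{2}$-scaled contrastive loss on two outputs $b_i$ and $b_j$ gives

$$
\frac{\partial}{\partial b_i}\,\tfrac{1}{2}\lVert b_i - b_j \rVert_2^2 = b_i - b_j,
\qquad
\frac{\partial}{\partial b_i}\,\tfrac{1}{2}\max\!\left(m - \lVert b_i - b_j \rVert_2^2,\ 0\right) = -(b_i - b_j)\ \ \text{if}\ \lVert b_i - b_j \rVert_2^2 < m,
$$

with the opposite signs for $b_j$. Up to the normalization constant `alpha`, these are the four `caffe_cpu_axpby` updates in the loop.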

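(Figure: convergence of online versus offline pair generation)
As the figure shows, the online scheme converges faster than the traditional one. With the same number of images fed to the network per iteration, say 2n images, the online scheme provides similarity information for $\frac{2n(2n-1)}{2}$ image pairs, whereas the offline scheme (with inputs given as $(i, j, s_{ij})$ triples) provides information for only n pairs.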
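
The gap between the two schemes is easy to check numerically. The helpers below are a small illustrative sketch; the names `online_pairs` and `offline_pairs` are hypothetical and do not come from the paper or the repository.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only.
// Online: every unordered pair inside a batch of `images` images is used.
constexpr std::size_t online_pairs(std::size_t images) {
  return images * (images - 1) / 2;
}

// Offline: images arrive as pre-built (i, j, s_ij) pairs, two images per pair.
constexpr std::size_t offline_pairs(std::size_t images) {
  return images / 2;
}
```

For a batch of 2n = 200 images, `online_pairs(200)` is 19900 while `offline_pairs(200)` is 100, so each iteration sees 199 times as many pairwise constraints for the same number of forward passes.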