Eltwise_layer简介

Latest recommended article published on 2019-10-19 16:21:39

知海无涯学无止境

http://www.voidcn.com/blog/thy_2014/article/p-6117416.html

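common_layer:

ArgMaxLayer class;

ConcatLayer class;

EltwiseLayer class;

FlattenLayer class;

InnerProductLayer class;

MVNLayer class;

SilenceLayer class;

SoftmaxLayer class, CuDNNSoftmaxLayer class;

SplitLayer class;

SliceLayer class.

Hmm, it seems the only one I actually know is the fully connected layer! Let's go through them one by one and see where each can be used.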

1 ArgMaxLayer:

Compute the index of the K max values for each datum across all dimensions (C × H × W).

Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to true, output is a vector of pairs (max_ind, max_val) for each image.

NOTE: does not implement Backwards operation.

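1.1 How it works:

After classification, that is, after the fully connected layer, the layer finds the K largest values for each datum.

It feels similar to this: when we use CaffeNet for prediction, we usually output the 5 classes with the highest probability, and it feels like this layer is what does that. (This is just a guess; I have not confirmed it!)

So there is nothing to propagate backward.

1.2 Member variables: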

  bool out_max_val_;
  size_t top_k_;

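As the setup functions below show, when out_max_val_ is true the output contains both the indices and the values; when it is false, only the indices are output.

top_k_ specifies how many of the largest values to find (the top top_k_).

1.3 Setup functions (LayerSetUp and Reshape):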

template <typename Dtype>
void ArgMaxLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  out_max_val_ = this->layer_param_.argmax_param().out_max_val();
  top_k_ = this->layer_param_.argmax_param().top_k();
  CHECK_GE(top_k_, 1) << " top k must not be less than 1.";
  CHECK_LE(top_k_, bottom[0]->count() / bottom[0]->num())
      << "top_k must be less than or equal to the number of classes.";
}

template <typename Dtype>
void ArgMaxLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  if (out_max_val_) {
    // Produces max_ind and max_val
    (*top)[0]->Reshape(bottom[0]->num(), 2, top_k_, 1);
  } else {
    // Produces only max_ind
    (*top)[0]->Reshape(bottom[0]->num(), 1, top_k_, 1);
  }
}

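There is not much to say about these two functions; they are easy to follow. It is just that when I first started with Caffe, training a few models and writing model configuration files, I don't think I ever used this layer.

1.4 Forward function: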

template <typename Dtype>
void ArgMaxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  for (int i = 0; i < num; ++i) {
    std::vector<std::pair<Dtype, int> > bottom_data_vector;
    for (int j = 0; j < dim; ++j) {
      bottom_data_vector.push_back(
          std::make_pair(bottom_data[i * dim + j], j));
    }
    std::partial_sort(
        bottom_data_vector.begin(), bottom_data_vector.begin() + top_k_,
        bottom_data_vector.end(), std::greater<std::pair<Dtype, int> >());
    for (int j = 0; j < top_k_; ++j) {
      top_data[(*top)[0]->offset(i, 0, j)] = bottom_data_vector[j].second;
    }
    if (out_max_val_) {
      for (int j = 0; j < top_k_; ++j) {
        top_data[(*top)[0]->offset(i, 1, j)] = bottom_data_vector[j].first;
      }
    }
  }
}

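I think the following figure illustrates what ArgMaxLayer does:

The rightmost part of the figure also shows the computation, so the forward function above should be easy to follow after another read.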
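To make the computation concrete without the rest of Caffe, here is a minimal standalone sketch of the same top-K selection for a single datum. The function name TopK and the use of plain std::vector<float> instead of a Blob are my own assumptions for illustration; only the std::partial_sort idea comes from the forward function above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Caffe: returns the top_k (value, index)
// pairs of one datum, largest value first.
std::vector<std::pair<float, int>> TopK(const std::vector<float>& scores,
                                        std::size_t top_k) {
  // Same preconditions as the CHECK_GE / CHECK_LE calls in LayerSetUp.
  assert(top_k >= 1 && top_k <= scores.size());
  std::vector<std::pair<float, int>> pairs;
  pairs.reserve(scores.size());
  for (std::size_t j = 0; j < scores.size(); ++j) {
    pairs.push_back(std::make_pair(scores[j], static_cast<int>(j)));
  }
  // Only the first top_k entries end up sorted (descending by value);
  // the rest of the vector is left in unspecified order.
  std::partial_sort(pairs.begin(),
                    pairs.begin() + static_cast<std::ptrdiff_t>(top_k),
                    pairs.end(), std::greater<std::pair<float, int>>());
  pairs.resize(top_k);
  return pairs;
}
```

For scores {0.1, 0.7, 0.05, 0.15} and top_k = 2, this returns index 1 first and index 3 second. The .second members correspond to what the layer writes as max_ind, and the .first members to max_val.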

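2 ConcatLayer:

Takes at least two Blobs and concatenates them along either the num or channel dimension, outputting the result.

2.1 How it works:

Forward: (matrix concatenation)

Backward: (matrix splitting)

Does it seem odd? Where would such a layer be used? At least one Google paper does use it: the "Inception" structure.

2.2 Member variables: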

  Blob<Dtype> col_bob_;
  int count_;
  int num_;
  int channels_;
  int height_;
  int width_;
  int concat_dim_;

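Two of these variables are unfamiliar:

col_bob_:

concat_dim_: the dimension along which the Blobs are concatenated; for example, concat_dim_ = 1 means concatenating along the second dimension (channels).

The remaining variables are familiar by now, but note that they are all used to set the size of the top Blob.

2.3 Setup functions (LayerSetUp and Reshape):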

template <typename Dtype>
void ConcatLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  concat_dim_ = this->layer_param_.concat_param().concat_dim();
  CHECK_GE(concat_dim_, 0) <<
    "concat_dim should be >= 0";
  CHECK_LE(concat_dim_, 1) <<
    "For now concat_dim <=1, it can only concat num and channels";
}

template <typename Dtype>
void ConcatLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  // Initialize with the first blob.
  count_ = bottom[0]->count();
  num_ = bottom[0]->num();
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  for (int i = 1; i < bottom.size(); ++i) {
    count_ += bottom[i]->count();
    if (concat_dim_== 0) {
      num_ += bottom[i]->num();
    } else if (concat_dim_ == 1) {
      channels_ += bottom[i]->channels();
    } else if (concat_dim_ == 2) {
      height_ += bottom[i]->height();
    } else if (concat_dim_ == 3) {
      width_ += bottom[i]->width();
    }
  }
  (*top)[0]->Reshape(num_, channels_, height_, width_);
  CHECK_EQ(count_, (*top)[0]->count());
}

Note the for loop in Reshape(). Suppose bottom contains K Blobs and the concatenation dimension is 1; then channels_ of the top Blob is naturally the sum of the channels of the K bottom Blobs.
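The shape bookkeeping can be sketched on its own as follows. Shape and ConcatShape are hypothetical names introduced only for this illustration; the explicit check that the other dimensions match is my addition (the Caffe code above relies on the final count check instead).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// (num, channels, height, width): the 4-D blob shape used in this article.
using Shape = std::array<int, 4>;

// Hypothetical helper: output shape when concatenating `bottoms` along
// concat_dim (0 = num, 1 = channels).
Shape ConcatShape(const std::vector<Shape>& bottoms, int concat_dim) {
  assert(!bottoms.empty());
  assert(concat_dim == 0 || concat_dim == 1);
  const std::size_t axis = static_cast<std::size_t>(concat_dim);
  Shape top = bottoms[0];
  for (std::size_t i = 1; i < bottoms.size(); ++i) {
    for (std::size_t d = 0; d < 4; ++d) {
      // Every dimension except the concatenation axis must agree.
      if (d != axis) {
        assert(bottoms[i][d] == top[d]);
      }
    }
    // The concatenation axis accumulates the sizes of all bottoms.
    top[axis] += bottoms[i][axis];
  }
  return top;
}
```

For example, concatenating shapes (2, 3, 4, 4) and (2, 5, 4, 4) along dimension 1 gives (2, 8, 4, 4).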

2.4 Forward and backward functions:

Forward:

template <typename Dtype>
void ConcatLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  if (concat_dim_== 0) {
    int offset_num = 0;
    for (int i = 0; i < bottom.size(); ++i) {
      const Dtype* bottom_data = bottom[i]->cpu_data();
      int num_elem = bottom[i]->count();
      caffe_copy(num_elem, bottom_data, top_data+(*top)[0]->offset(offset_num));
      offset_num += bottom[i]->num();
    }
  } else if (concat_dim_ == 1) {
    int offset_channel = 0;
    for (int i = 0; i < bottom.size(); ++i) {
      const Dtype* bottom_data = bottom[i]->cpu_data();
      int num_elem =
        bottom[i]->channels()*bottom[i]->height()*bottom[i]->width();
      for (int n = 0; n < num_; ++n) {
        caffe_copy(num_elem, bottom_data+bottom[i]->offset(n),
          top_data+(*top)[0]->offset(n, offset_channel));
      }
      offset_channel += bottom[i]->channels();
    }  // concat_dim_ is guaranteed to be 0 or 1 by LayerSetUp.
  }
}

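This implementation assumes that the concatenation dimension can only be 0 or 1. In that case, the branches for dimensions 2 and 3 in Reshape() are unnecessary.

Everything else amounts to matrix concatenation.

Backward: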

template <typename Dtype>
void ConcatLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  if (concat_dim_ == 0) {
    int offset_num = 0;
    for (int i = 0; i < bottom->size(); ++i) {
      Blob<Dtype>* blob = (*bottom)[i];
      if (propagate_down[i]) {
        Dtype* bottom_diff = blob->mutable_cpu_diff();
        caffe_copy(blob->count(), top_diff + top[0]->offset(offset_num),
                   bottom_diff);
      }
      offset_num += blob->num();
    }
  } else if (concat_dim_ == 1) {
    int offset_channel = 0;
    for (int i = 0; i < bottom->size(); ++i) {
      Blob<Dtype>* blob = (*bottom)[i];
      if (propagate_down[i]) {
        Dtype* bottom_diff = blob->mutable_cpu_diff();
        int num_elem = blob->channels()*blob->height()*blob->width();
        for (int n = 0; n < num_; ++n) {
          caffe_copy(num_elem, top_diff + top[0]->offset(n, offset_channel),
                     bottom_diff + blob->offset(n));
        }
      }
      offset_channel += blob->channels();
    }
  }  // concat_dim_ is guaranteed to be 0 or 1 by LayerSetUp.
}

Likewise, the backward pass is just a matter of splitting the matrix.
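The concatenate/split symmetry is easier to see on a toy data structure. The sketch below is not Caffe code: Mat, ConcatChannels and SplitChannels are hypothetical names, and each item is flattened to dim = channels*height*width values, which mirrors the num_elem computation in the concat_dim_ == 1 branch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Blob: `num` items, each holding `dim`
// (= channels*height*width) contiguous values.
struct Mat {
  int num;
  int dim;
  std::vector<float> data;
};

// Forward: for every item n, place the rows of all bottoms side by side.
Mat ConcatChannels(const std::vector<Mat>& bottoms) {
  assert(!bottoms.empty());
  Mat top{bottoms[0].num, 0, {}};
  for (const Mat& b : bottoms) {
    assert(b.num == top.num);
    assert(b.data.size() == static_cast<std::size_t>(b.num) * b.dim);
    top.dim += b.dim;
  }
  top.data.resize(static_cast<std::size_t>(top.num) * top.dim);
  int offset = 0;  // plays the role of offset_channel
  for (const Mat& b : bottoms) {
    for (int n = 0; n < top.num; ++n) {
      std::copy(b.data.begin() + n * b.dim, b.data.begin() + (n + 1) * b.dim,
                top.data.begin() + n * top.dim + offset);
    }
    offset += b.dim;
  }
  return top;
}

// Backward: slice top_diff back into one gradient per bottom.
std::vector<Mat> SplitChannels(const Mat& top_diff,
                               const std::vector<int>& dims) {
  std::vector<Mat> diffs;
  int offset = 0;
  for (int dim : dims) {
    Mat d{top_diff.num, dim,
          std::vector<float>(static_cast<std::size_t>(top_diff.num) * dim)};
    for (int n = 0; n < top_diff.num; ++n) {
      const auto src = top_diff.data.begin() + n * top_diff.dim + offset;
      std::copy(src, src + dim, d.data.begin() + n * dim);
    }
    diffs.push_back(d);
    offset += dim;
  }
  assert(offset == top_diff.dim);
  return diffs;
}
```

Splitting the result of ConcatChannels with the original per-bottom sizes returns the original data, which is exactly why the backward pass needs no arithmetic, only copies.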

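3 EltwiseLayer:

Compute elementwise operations, such as product and sum, along multiple input Blobs.

3.1 How it works:

Applies an operation element by element across several matrices. The source code shows that three operations are provided: product, sum, and max.

After all the forward and backward principles covered above, this should be easy to understand. The three operations are simply handled separately.

3.2 Member variables: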

  EltwiseParameter_EltwiseOp op_;
  vector<Dtype> coeffs_;
  Blob<int> max_idx_;

  bool stable_prod_grad_;

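Since the layer implements some operation across multiple Blobs, it naturally has to record which operation, hence the variable op_. But where is the EltwiseParameter_EltwiseOp type defined? (It is the C++ name that protobuf generates for the EltwiseOp enum nested in the EltwiseParameter message of caffe.proto.)

coeffs_: its size should equal the number of bottom Blobs; in other words, the sum operation is a weighted sum:

$$ y = \sum_{i=0}^{K-1} \mathrm{coeffs}_i \, x_i $$

Here y and x_i are matrices, while coeffs_i is a scalar.

max_idx_: for the max operation, the layer must record which Blob each maximum came from so that the gradient can be routed back in the backward pass. As shown later, the top Blob has the same size as each bottom Blob, but there is only one top Blob and several bottom Blobs.

stable_prod_grad_: selects how the gradient is computed in the backward pass of the product operation.

3.3 Setup functions (LayerSetUp and Reshape):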

template <typename Dtype>
void EltwiseLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  CHECK(this->layer_param().eltwise_param().coeff_size() == 0
      || this->layer_param().eltwise_param().coeff_size() == bottom.size()) <<
      "Eltwise Layer takes one coefficient per bottom blob.";
  CHECK(!(this->layer_param().eltwise_param().operation()
      == EltwiseParameter_EltwiseOp_PROD
      && this->layer_param().eltwise_param().coeff_size())) <<
      "Eltwise layer only takes coefficients for summation.";
  op_ = this->layer_param_.eltwise_param().operation();
  // Blob-wise coefficients for the elementwise operation.
  coeffs_ = vector<Dtype>(bottom.size(), 1);
  if (this->layer_param().eltwise_param().coeff_size()) {
    for (int i = 0; i < bottom.size(); ++i) {
      coeffs_[i] = this->layer_param().eltwise_param().coeff(i);
    }
  }
  stable_prod_grad_ = this->layer_param_.eltwise_param().stable_prod_grad();
}

template <typename Dtype>
void EltwiseLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  const int num = bottom[0]->num();
  const int channels = bottom[0]->channels();
  const int height = bottom[0]->height();
  const int width = bottom[0]->width();
  for (int i = 1; i < bottom.size(); ++i) {
    CHECK_EQ(num, bottom[i]->num());
    CHECK_EQ(channels, bottom[i]->channels());
    CHECK_EQ(height, bottom[i]->height());
    CHECK_EQ(width, bottom[i]->width());
  }
  (*top)[0]->Reshape(num, channels, height, width);
  // If max operation, we will initialize the vector index part.
  if (this->layer_param_.eltwise_param().operation() ==
      EltwiseParameter_EltwiseOp_MAX && top->size() == 1) {
    max_idx_.Reshape(bottom[0]->num(), channels, height, width);
  }
}

Reshape() shows that all input Blobs of this layer must have the same size.

3.4 Forward and backward functions:

Forward:

template <typename Dtype>
void EltwiseLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  int* mask = NULL;
  const Dtype* bottom_data_a = NULL;
  const Dtype* bottom_data_b = NULL;
  const int count = (*top)[0]->count();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  switch (op_) {
  case EltwiseParameter_EltwiseOp_PROD:
    caffe_mul(count, bottom[0]->cpu_data(), bottom[1]->cpu_data(), top_data);
    for (int i = 2; i < bottom.size(); ++i) {
      caffe_mul(count, top_data, bottom[i]->cpu_data(), top_data);
    }
    break;
  case EltwiseParameter_EltwiseOp_SUM:
    caffe_set(count, Dtype(0), top_data);
    // TODO(shelhamer) does BLAS optimize to sum for coeff = 1?
    for (int i = 0; i < bottom.size(); ++i) {
      caffe_axpy(count, coeffs_[i], bottom[i]->cpu_data(), top_data);
    }
    break;
  case EltwiseParameter_EltwiseOp_MAX:
    // Initialize
    mask = max_idx_.mutable_cpu_data();
    caffe_set(count, -1, mask);
    caffe_set(count, Dtype(-FLT_MAX), top_data);
    // bottom 0 & 1
    bottom_data_a = bottom[0]->cpu_data();
    bottom_data_b = bottom[1]->cpu_data();
    for (int idx = 0; idx < count; ++idx) {
      if (bottom_data_a[idx] > bottom_data_b[idx]) {
        top_data[idx] = bottom_data_a[idx];  // maxval
        mask[idx] = 0;  // maxid
      } else {
        top_data[idx] = bottom_data_b[idx];  // maxval
        mask[idx] = 1;  // maxid
      }
    }
    // bottom 2++
    for (int blob_idx = 2; blob_idx < bottom.size(); ++blob_idx) {
      bottom_data_b = bottom[blob_idx]->cpu_data();
      for (int idx = 0; idx < count; ++idx) {
        if (bottom_data_b[idx] > top_data[idx]) {
          top_data[idx] = bottom_data_b[idx];  // maxval
          mask[idx] = blob_idx;  // maxid
        }
      }
    }
    break;
  default:
    LOG(FATAL) << "Unknown elementwise operation.";
  }
}

The code here can be read directly and is easy to understand.
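For readers who want to try the SUM and MAX cases outside Caffe, here is a small sketch on flat vectors. EltwiseSum and EltwiseMax are hypothetical names; also note one deliberate simplification: on ties this sketch keeps the lowest blob index, whereas the Caffe code above assigns a tie between bottom 0 and bottom 1 to bottom 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// SUM: y = sum_i coeffs[i] * x_i, accumulated one bottom at a time
// (the same pattern as the caffe_axpy loop).
Vec EltwiseSum(const std::vector<Vec>& bottoms, const Vec& coeffs) {
  assert(!bottoms.empty() && coeffs.size() == bottoms.size());
  Vec top(bottoms[0].size(), 0.0f);
  for (std::size_t i = 0; i < bottoms.size(); ++i) {
    assert(bottoms[i].size() == top.size());
    for (std::size_t idx = 0; idx < top.size(); ++idx) {
      top[idx] += coeffs[i] * bottoms[i][idx];
    }
  }
  return top;
}

// MAX: element-wise maximum; (*mask)[idx] records which bottom supplied it.
// Assumption: ties keep the lowest blob index (differs slightly from Caffe).
Vec EltwiseMax(const std::vector<Vec>& bottoms, std::vector<int>* mask) {
  assert(!bottoms.empty() && mask != nullptr);
  Vec top = bottoms[0];
  mask->assign(top.size(), 0);
  for (std::size_t i = 1; i < bottoms.size(); ++i) {
    assert(bottoms[i].size() == top.size());
    for (std::size_t idx = 0; idx < top.size(); ++idx) {
      if (bottoms[i][idx] > top[idx]) {
        top[idx] = bottoms[i][idx];
        (*mask)[idx] = static_cast<int>(i);
      }
    }
  }
  return top;
}
```

With bottoms {1, 5} and {4, 2}, EltwiseSum with coefficients {1, -1} gives {-3, 3}, and EltwiseMax gives {4, 5} with mask {1, 0}.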

Backward:

template <typename Dtype>
void EltwiseLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  const int* mask = NULL;
  const int count = top[0]->count();
  const Dtype* top_data = top[0]->cpu_data();
  const Dtype* top_diff = top[0]->cpu_diff();
  for (int i = 0; i < bottom->size(); ++i) {
    if (propagate_down[i]) {
      const Dtype* bottom_data = (*bottom)[i]->cpu_data();
      Dtype* bottom_diff = (*bottom)[i]->mutable_cpu_diff();
      switch (op_) {
      case EltwiseParameter_EltwiseOp_PROD:
        if (stable_prod_grad_) {
          bool initialized = false;
          for (int j = 0; j < bottom->size(); ++j) {
            if (i == j) { continue; }
            if (!initialized) {
              caffe_copy(count, (*bottom)[j]->cpu_data(), bottom_diff);
              initialized = true;
            } else {
              caffe_mul(count, (*bottom)[j]->cpu_data(), bottom_diff,
                        bottom_diff);
            }
          }
        } else {
          caffe_div(count, top_data, bottom_data, bottom_diff);
        }
        caffe_mul(count, bottom_diff, top_diff, bottom_diff);
        break;
      case EltwiseParameter_EltwiseOp_SUM:
        if (coeffs_[i] == Dtype(1)) {
          caffe_copy(count, top_diff, bottom_diff);
        } else {
          caffe_cpu_scale(count, coeffs_[i], top_diff, bottom_diff);
        }
        break;
      case EltwiseParameter_EltwiseOp_MAX:
        mask = max_idx_.cpu_data();
        for (int index = 0; index < count; ++index) {
          Dtype gradient = 0;
          if (mask[index] == i) {
            gradient += top_diff[index];
          }
          bottom_diff[index] = gradient;
        }
        break;
      default:
        LOG(FATAL) << "Unknown elementwise operation.";
      }
    }
  }
}

In the backward pass, the gradients for sum and for max are both easy to understand.
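As a rough illustration of those two cases, the sketch below computes the gradient for one bottom on flat vectors. SumBackward and MaxBackward are hypothetical names, and mask is assumed to be filled by the forward pass with the index of the bottom that supplied each maximum (the role of max_idx_).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// SUM: every bottom receives top_diff scaled by its own coefficient.
Vec SumBackward(const Vec& top_diff, float coeff) {
  Vec bottom_diff(top_diff.size());
  for (std::size_t idx = 0; idx < top_diff.size(); ++idx) {
    bottom_diff[idx] = coeff * top_diff[idx];
  }
  return bottom_diff;
}

// MAX: the gradient at each position flows only to the bottom recorded in
// mask; all other bottoms get zero there.
Vec MaxBackward(const Vec& top_diff, const std::vector<int>& mask, int i) {
  assert(mask.size() == top_diff.size());
  Vec bottom_diff(top_diff.size(), 0.0f);
  for (std::size_t idx = 0; idx < top_diff.size(); ++idx) {
    if (mask[idx] == i) {
      bottom_diff[idx] = top_diff[idx];
    }
  }
  return bottom_diff;
}
```

With top_diff {0.5, 2} and mask {1, 0}, bottom 0 receives {0, 2} and bottom 1 receives {0.5, 0}, so the per-bottom gradients add up to top_diff.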

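The backward pass for the product looks a bit odd. First, the basic principle of the product gradient:

$$ y = \prod_{j=0}^{K-1} x_j, \qquad \frac{\partial y}{\partial x_i} = \prod_{j \neq i} x_j = \frac{y}{x_i} $$

So computing top_data/bottom_data and then multiplying by top_diff is easy to understand.

But the source code provides two ways:

Way 1: compute the product of x_j for j = 0 ... K-1 with j != i (used when stable_prod_grad_ is true)

Way 2: top_data/bottom_data

As long as the data does not contain many zeros, the results should be about the same, so why are there two ways? I don't understand.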
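One plausible reason, which I have not confirmed against the Caffe authors' intent, is numerical: the division form evaluates 0/0 wherever the bottom being differentiated contains a zero, while the product over the other bottoms stays well defined at the cost of more multiplications. The sketch below compares the two on flat vectors; ProdGradStable and ProdGradByDivision are hypothetical names, and the NaN behaviour assumes IEEE 754 floating point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Way 1 (stable): multiply top_diff by every bottom except bottom i.
Vec ProdGradStable(const std::vector<Vec>& bottoms, std::size_t i,
                   const Vec& top_diff) {
  assert(i < bottoms.size());
  Vec grad = top_diff;
  for (std::size_t j = 0; j < bottoms.size(); ++j) {
    if (j == i) continue;
    for (std::size_t idx = 0; idx < grad.size(); ++idx) {
      grad[idx] *= bottoms[j][idx];
    }
  }
  return grad;
}

// Way 2 (division): (top_data / bottom_i) * top_diff.
// Wherever bottom i is zero this divides by zero.
Vec ProdGradByDivision(const std::vector<Vec>& bottoms, std::size_t i,
                       const Vec& top_diff) {
  assert(i < bottoms.size());
  Vec grad(top_diff.size());
  for (std::size_t idx = 0; idx < grad.size(); ++idx) {
    float top = 1.0f;  // recompute top_data = product of all bottoms
    for (const Vec& b : bottoms) top *= b[idx];
    grad[idx] = top / bottoms[i][idx] * top_diff[idx];
  }
  return grad;
}
```

With bottoms {2, 0} and {3, 5}, top_diff {1, 1} and i = 0, the stable way gives {3, 5}. The division way agrees on the first element but produces NaN on the second, because both top_data and bottom 0 are zero there.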

4 FlattenLayer:
