[Video Semantic Segmentation] Semantic Video CNNs through Representation Warping, ICCV 2017

农夫山泉2号

Category: Semantic Segmentation. Tags: deep learning, neural networks, semantic segmentation, warp

Article link: https://blog.csdn.net/u011622208/article/details/120182607

Video semantic segmentation
code

1. Principle

Reposted from: https://zhuanlan.zhihu.com/p/52014957
Figure 3: Model architecture

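This paper proposes a module called NetWarp. Its main job is to use optical flow to move the features of the previous frame to the current frame, which enhances the current frame's features to some degree. Optical flow here means the field of displacement vectors between corresponding pixels in two images. The module can be inserted between consecutive frames of a video (see Figure 3).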

Figure 4: The NetWarp module

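Figure 4 shows how the module works. The input is two consecutive frames, where (t-1) denotes the previous frame and t the current frame. The first step is to compute the optical flow F(t) between the two frames. The flow is computed offline, i.e. every flow field is precomputed, using DIS-Flow. The flow and the two frames are then fed into a module called Transform Flow, a small fully convolutional network designed to supplement the flow with image information (Figure 5 of the source post shows that the transformed flow carries object detail in addition to motion). The transformed flow is then used to warp the features of the previous frame to the current frame. The warp uses bilinear interpolation: for each feature location in the current frame, the flow gives the corresponding location in the previous frame, and the feature at that location is fetched. Many of the works introduced later use the same operation. Finally, the features of the current frame and the warped features of the previous frame are combined into the final feature representation. In the experiments, adding the module to a PSPNet base network yields some improvement in performance.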
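The text above does not say how the current-frame features and the warped previous-frame features are combined. As a minimal sketch, assuming the combination is a weighted sum with one weight per channel (the function name `CombineFeatures`, the weight vectors, and the `[channels][spatial]` layout are all assumptions made for illustration, not taken from the released code), the last step could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the paper's code: fuses current-frame
// features with warped previous-frame features using one weight per channel.
// Both feature tensors are laid out as [channels][spatial], row-major.
std::vector<float> CombineFeatures(const std::vector<float>& current,
                                   const std::vector<float>& warped_prev,
                                   const std::vector<float>& w_current,
                                   const std::vector<float>& w_prev) {
  const std::size_t channels = w_current.size();
  assert(channels > 0 && w_prev.size() == channels);
  assert(current.size() == warped_prev.size());
  assert(current.size() % channels == 0);
  const std::size_t spatial = current.size() / channels;

  std::vector<float> out(current.size());
  for (std::size_t c = 0; c < channels; ++c) {
    for (std::size_t i = 0; i < spatial; ++i) {
      const std::size_t idx = c * spatial + i;
      // Per-channel weighted sum of the two feature sources.
      out[idx] = w_current[c] * current[idx] + w_prev[c] * warped_prev[idx];
    }
  }
  return out;
}
```

In a trained network the two weight vectors would be learned together with the rest of the model; here they are plain inputs so the arithmetic is easy to check by hand.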

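2. Warp code

Here we look at the warp part of the code. Its core is bilinear interpolation with the optical-flow offset added to the sampling coordinates:
h + bottom_1_data_[ index_x ]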
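Written as equations, the layer's computation for an in-bounds sample at output position (h, w) is roughly the following. The symbols x', y', θ_x, θ_y and F_0, F_1 (flow channels 0 and 1) are notation introduced here to mirror `x_w_data`, `theta_data` and `bottom_1_data_` in the code:

$$
\begin{aligned}
x' &= h + F_1(h, w), \qquad y' = w + F_0(h, w),\\
\theta_x &= x' - \lfloor x' \rfloor, \qquad \theta_y = y' - \lfloor y' \rfloor,\\
\mathrm{out}(h, w) &= (1-\theta_x)(1-\theta_y)\, I_0 + \theta_x (1-\theta_y)\, I_1 + (1-\theta_x)\,\theta_y\, I_2 + \theta_x \theta_y\, I_3,
\end{aligned}
$$

where $I_0, I_1, I_2, I_3$ are the previous-frame feature values at $(\lfloor x' \rfloor, \lfloor y' \rfloor)$, $(\lceil x' \rceil, \lfloor y' \rfloor)$, $(\lfloor x' \rfloor, \lceil y' \rceil)$ and $(\lceil x' \rceil, \lceil y' \rceil)$. Note that x' runs along rows and y' along columns, following the variable names in the code.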

template <typename Dtype>
void WarpLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_0_data_ = bottom[0]->cpu_data();    // features at time t-1
  const Dtype* bottom_1_data_ = bottom[1]->cpu_data();    // optical flow
  Dtype* top_data = top[0]->mutable_cpu_data();
  Dtype* theta_data = theta.mutable_cpu_data();
  Dtype* theta_data_ = theta_.mutable_cpu_data();
  Dtype* x_w_data = x_w.mutable_cpu_data();
  caffe_set(bottom[0]->count(), (Dtype)0., top_data);
  caffe_set(bottom[1]->count(), (Dtype)0., theta_data);
  caffe_set(bottom[1]->count(), (Dtype)0., theta_data_);
  caffe_set(bottom[1]->count(), (Dtype)0., x_w_data);
  for (int n=0; n<num_; n++) {
    for (int c=0; c<channels_; c++) {
      for (int h=0; h<height_; h++) {
        for (int w=0; w<width_; w++) {
          int index_x = ((n * 2 + 1) * height_ + h) * width_ + w;     // flow channel 1: offset along rows (h)
          int index_y = ((n * 2 + 0) * height_ + h) * width_ + w;     // flow channel 0: offset along columns (w)
          x_w_data[ index_x ] = h + bottom_1_data_[ index_x ];// + 0.00000001;    // sampling row: grid position plus flow offset
          x_w_data[ index_y ] = w + bottom_1_data_[ index_y ];// + 0.00000001;    // sampling column: grid position plus flow offset
          int xw_floor = (int)floor(x_w_data[ index_x ]);
          int yw_floor = (int)floor(x_w_data[ index_y ]);
          int xw_ceil = (int)ceil(x_w_data[ index_x ]);
          int yw_ceil = (int)ceil(x_w_data[ index_y ]);
          theta_data[ index_x ] = x_w_data[ index_x ] - floor(x_w_data[ index_x ]);
          theta_data[ index_y ] = x_w_data[ index_y ] - floor(x_w_data[ index_y ]);
          if (outliers_ == WarpParameter_WarpType_NEAREST) {
            if (x_w_data[ index_x ] < 0) {
              theta_data[ index_x ] = x_w_data[ index_x ];
              xw_floor = 0; xw_ceil = 0;
            } 
            if (x_w_data[ index_x ] >= height_-1) {
              theta_data[ index_x ] = x_w_data[ index_x ] - height_;
              xw_floor = height_-1; xw_ceil = height_-1;
            }
            if (x_w_data[ index_y ] < 0) {
              theta_data[ index_y ] = x_w_data[ index_y ];
              yw_floor = 0; yw_ceil = 0;
            }
            if (x_w_data[ index_y ] >= width_-1) {
              theta_data[ index_y ] = x_w_data[ index_y ] - width_;
              yw_floor = width_-1; yw_ceil = width_-1;
            }
          }
          theta_data_[ index_x ] = 1 - theta_data[ index_x ];
          theta_data_[ index_y ] = 1 - theta_data[ index_y ];
          int offset = (n * channels_ + c) * height_;

          if (!(outliers_ == WarpParameter_WarpType_TRUNCATE && 
                (x_w_data[ index_x ] < 0 || 
                 x_w_data[ index_x ] > height_-1 || 
                 x_w_data[ index_y ] < 0 || 
                 x_w_data[ index_y ] > width_-1))) {
            Dtype I0 = bottom_0_data_[ (offset + xw_floor) * width_ + yw_floor ];     // values at the 4 neighboring points
            Dtype I1 = bottom_0_data_[ (offset + xw_ceil ) * width_ + yw_floor ]; 
            Dtype I2 = bottom_0_data_[ (offset + xw_floor) * width_ + yw_ceil ]; 
            Dtype I3 = bottom_0_data_[ (offset + xw_ceil ) * width_ + yw_ceil ];
            top_data[ (offset +  h) * width_ +  w ] = (theta_data_[index_x] * theta_data_[index_y] * I0) + 
                                                      (theta_data[index_x]  * theta_data_[index_y] * I1) + 
                                                      (theta_data_[index_x] * theta_data[index_y]  * I2) + 
                                                      (theta_data[index_x]  * theta_data[index_y]  * I3);     // bilinear interpolation
          }
        }
      }
    }
  }
}
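To experiment with the same logic outside Caffe, the layer can be reduced to a single-channel function over plain vectors. The following is a sketch under that simplification, not the author's implementation: `WarpBilinear` and `OutlierMode` are hypothetical names, the two modes are meant to mirror the NEAREST and TRUNCATE branches above, and out-of-range samples are clamped to the border instead of being special-cased (which should select the same pixels).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names for this sketch; they mirror the two outlier modes above.
enum class OutlierMode { kNearest, kTruncate };

// Warps one H x W feature map (row-major) with a flow field.
// flow_col[i] is added to the column index w (flow channel 0 in the layer),
// flow_row[i] is added to the row index h (flow channel 1 in the layer).
std::vector<float> WarpBilinear(const std::vector<float>& feat,
                                const std::vector<float>& flow_col,
                                const std::vector<float>& flow_row,
                                int height, int width, OutlierMode mode) {
  const std::size_t count = static_cast<std::size_t>(height) * width;
  assert(feat.size() == count);
  assert(flow_col.size() == count && flow_row.size() == count);

  auto at = [&](int row, int col) {
    return feat[static_cast<std::size_t>(row) * width + col];
  };

  std::vector<float> out(count, 0.0f);
  for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
      const std::size_t i = static_cast<std::size_t>(h) * width + w;
      float r = h + flow_row[i];  // sampling row in the previous frame
      float c = w + flow_col[i];  // sampling column in the previous frame
      const bool outside = r < 0 || r > height - 1 || c < 0 || c > width - 1;
      if (outside) {
        if (mode == OutlierMode::kTruncate) continue;  // output stays zero
        // Nearest mode: clamp the sample position to the border.
        r = std::fmin(std::fmax(r, 0.0f), static_cast<float>(height - 1));
        c = std::fmin(std::fmax(c, 0.0f), static_cast<float>(width - 1));
      }
      const int r0 = static_cast<int>(std::floor(r));
      const int c0 = static_cast<int>(std::floor(c));
      const int r1 = std::min(r0 + 1, height - 1);
      const int c1 = std::min(c0 + 1, width - 1);
      const float tr = r - r0;  // fractional part along rows
      const float tc = c - c0;  // fractional part along columns
      out[i] = (1 - tr) * (1 - tc) * at(r0, c0) + tr * (1 - tc) * at(r1, c0) +
               (1 - tr) * tc * at(r0, c1) + tr * tc * at(r1, c1);
    }
  }
  return out;
}
```

For a 2 x 2 map `{1, 2, 3, 4}`, a column offset of 0.5 at the top-left pixel samples halfway between 1 and 2. A row offset of 0.5 at the bottom-left pixel points outside the map, so the result is the border value in nearest mode and zero in truncate mode.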

The idea here is similar to that of SFNet. In PyTorch, this warp can be implemented with grid_sample.
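One detail when moving to grid_sample is that it expects sampling positions normalized to [-1, 1], with x indexing the width and y the height, whereas the Caffe layer works in pixel coordinates. The sketch below, kept in C++ for consistency with the layer above, shows that conversion for one output position. `NormalizeCoord`, `GridPoint` and `FlowToGrid` are hypothetical helper names, and the two formulas are the usual `align_corners=True` and `align_corners=False` conventions as I understand them.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: maps a pixel coordinate in [0, size-1] to the
// normalized [-1, 1] range that grid_sample expects.
float NormalizeCoord(float pixel, int size, bool align_corners) {
  assert(size > 1);
  if (align_corners) {
    // -1 and +1 refer to the centers of the first and last pixels.
    return 2.0f * pixel / (size - 1) - 1.0f;
  }
  // -1 and +1 refer to the outer edges of the first and last pixels.
  return (2.0f * pixel + 1.0f) / size - 1.0f;
}

// Normalized sampling position; x indexes the width, y indexes the height.
struct GridPoint {
  float x;
  float y;
};

// Converts output position (h, w) plus a pixel-unit flow offset into a
// normalized grid entry. flow_col is the offset along columns, flow_row the
// offset along rows, as in the Caffe layer above.
GridPoint FlowToGrid(int h, int w, float flow_col, float flow_row,
                     int height, int width, bool align_corners) {
  GridPoint p;
  p.x = NormalizeCoord(w + flow_col, width, align_corners);
  p.y = NormalizeCoord(h + flow_row, height, align_corners);
  return p;
}
```

With positions built this way, `padding_mode='border'` behaves much like the NEAREST branch of the layer. `padding_mode='zeros'` is close to TRUNCATE but not identical, because grid_sample still blends in-bounds neighbors for samples that fall just outside the map.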

3. Reference

Derivation of bilinear interpolation
