RecoNet: A New Approach to 3D Attention

A霸天下

Article link: https://blog.csdn.net/qq_43534932/article/details/107993814

Introduction

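In recent years attention has been used more and more in image recognition, segmentation and related fields. From last year's GCNet and CCNet to Dual Attention, most work introduces attention and then builds some variant of it. What these methods have in common is that a tensor of shape [batch_size, H, W, C] is reshaped to [batch_size, HW, C] before the various operations are applied, as shown in the figure below. Does this reshaping lose information along the channel dimension? The paper "Tensor Low-Rank Reconstruction for Semantic Segmentation" proposes a new scheme, RecoNet, which performs 3D attention directly. The method needs less computation and preserves the information along the channel dimension.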
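
To make the reshaping concrete, here is a minimal NumPy sketch. It is an illustration only, not code from the paper or from this post: the sizes and the rank are hypothetical, and the affinity matrix stands in for the position-to-position map that flattened attention variants typically build.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, height, width, channels = 2, 8, 8, 16
rank = 4  # hypothetical rank r

x = np.random.rand(batch, height, width, channels)

# Common 2D attention: merge H and W into a single axis of length H*W.
x_flat = x.reshape(batch, height * width, channels)

# A spatial affinity matrix compares every position with every other position.
affinity = x_flat @ x_flat.transpose(0, 2, 1)

# A rank-r 3D attention map is described by r vectors per axis instead.
low_rank_entries = rank * (height + width + channels)

print(x_flat.shape)      # (2, 64, 16)
print(affinity.shape)    # (2, 64, 64)
print(low_rank_entries)  # 128
```

The affinity matrix grows with (HW)^2, while the vectors needed for a rank-r 3D map grow only with r(H + W + C).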

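The network is shown in the figure. The main idea is simple: split the feature into one-dimensional components and process each of them separately. The paper proposes two main modules, the Tensor Generation Module (TGM) and the Tensor Reconstruction Module (TRM).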

The core computation is summarized by a single formula, referred to below as Equation 1.
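
Written out from the description in this post (for each rank index the three vectors are multiplied together, scaled by a factor, and the results are summed), the formula can be sketched as follows. The symbol names $v_{c,i}$, $v_{h,i}$, $v_{w,i}$ and $\lambda_i$, and the element-wise application of the attention map $A$ to the feature $X$, are assumptions made here for illustration.

```latex
A = \sum_{i=1}^{r} \lambda_i \, v_{c,i} \otimes v_{h,i} \otimes v_{w,i},
\qquad
Y = A \odot X
```

Here $\otimes$ is the outer product, so each term is a rank-1 tensor with the same shape as $X$, and $r$ is the rank.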

Let's first break down TGM. Its main structure is shown in the figure below:

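Global pooling (GP) is applied along each of the three dimensions of the input tensor, followed by a 1×1 convolution and a sigmoid activation. This yields r vectors for each dimension.
The TRM diagram is harder to read, but it is simply the operation in Equation 1: the three corresponding vectors are multiplied together and the results are summed.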
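
The TGM steps above can be sketched in NumPy as follows. This is a simplified sketch under stated assumptions, not the author's TensorFlow code: `tgm_sketch` and `sigmoid` are hypothetical helpers, random weights stand in for learned ones, and one dense map per rank is used on all three axes, which differs in detail from the code analysed later in this post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tgm_sketch(x, rank, rng):
    """Return `rank` vectors for each of H, W and C from x of shape [B, H, W, C]."""
    _, h, w, c = x.shape
    # Global average pooling: keep one axis, average over the other two.
    pooled = {
        "h": x.mean(axis=(2, 3)),  # [B, H]
        "w": x.mean(axis=(1, 3)),  # [B, W]
        "c": x.mean(axis=(1, 2)),  # [B, C]
    }
    vectors = {}
    for name, size in (("h", h), ("w", w), ("c", c)):
        # Assumption: on a pooled vector, the 1x1 convolution is modelled
        # as one matrix multiply per rank index, with random weights.
        weights = rng.standard_normal((rank, size, size)) * 0.1
        projected = np.einsum("bs,rst->brt", pooled[name], weights)
        vectors[name] = sigmoid(projected)  # [B, rank, size], values in (0, 1)
    return vectors

rng = np.random.default_rng(0)
features = rng.random((2, 5, 6, 7))
vectors = tgm_sketch(features, rank=3, rng=rng)
print({name: v.shape for name, v in vectors.items()})
# {'h': (2, 3, 5), 'w': (2, 3, 6), 'c': (2, 3, 7)}
```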

Code analysis

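import tensorflow as tf  # the snippets below use the TF 1.x API (tf.layers)

def TGM_TRM(x, Rank):
    # x: feature map of shape [batch, H, W, C]; Rank: number of rank-1 terms
    # Move H to the last axis so that 2D global pooling keeps it: [batch, C, W, H]
    x_height = tf.transpose(x, (0, 3, 2, 1))
    print(x_height)
    # Move W to the last axis: [batch, H, C, W]
    x_width = tf.transpose(x, (0, 1, 3, 2))
    print(x_width)
    # C is already the last axis
    x_channel = x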

The function takes two inputs: the feature x and the rank Rank. To apply global pooling (GP) along each dimension, the input is transposed first.
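
The effect of the two transposes can be checked with a short NumPy sketch. `global_avg_pool_2d` is a hypothetical stand-in that averages over axes 1 and 2 of a channels-last tensor, which is the reduction `GlobalAvgPool2D` performs by default. The sizes are arbitrary.

```python
import numpy as np

def global_avg_pool_2d(t):
    # Average over the two middle axes of a [B, d1, d2, last] tensor.
    return t.mean(axis=(1, 2))

x = np.random.rand(2, 5, 6, 7)      # [B, H, W, C]
x_height = x.transpose(0, 3, 2, 1)  # [B, C, W, H]
x_width = x.transpose(0, 1, 3, 2)   # [B, H, C, W]

height_pooling = global_avg_pool_2d(x_height)  # [B, H]
width_pooling = global_avg_pool_2d(x_width)    # [B, W]
channel_pooling = global_avg_pool_2d(x)        # [B, C]

# Each result equals a plain mean over the two axes that were not kept.
print(np.allclose(height_pooling, x.mean(axis=(2, 3))))  # True
print(np.allclose(width_pooling, x.mean(axis=(1, 3))))   # True
print(height_pooling.shape, width_pooling.shape, channel_pooling.shape)
```

In other words, the transposes exist only so that a pooling layer that always keeps the last axis can be reused for H and W.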
TGM

    ###################### TGM
    # Global average pooling keeps only the last axis: [batch, H], [batch, W], [batch, C]
    height_pooling = tf.keras.layers.GlobalAvgPool2D()(x_height)
    width_pooling = tf.keras.layers.GlobalAvgPool2D()(x_width)
    channel_pooling = tf.keras.layers.GlobalAvgPool2D()(x_channel)

    # Reshape so that each pooled vector lies along its own axis of a 4D tensor
    height_pooling = tf.reshape(height_pooling, [-1, height_pooling.get_shape().as_list()[1], 1, 1])
    width_pooling = tf.reshape(width_pooling, [-1, 1, width_pooling.get_shape().as_list()[1], 1])
    channel_pooling = tf.reshape(channel_pooling, [-1, 1, 1, channel_pooling.get_shape().as_list()[1]])

    # 1x1 convolution + sigmoid; the channel branch uses Rank * C filters
    height_feature = tf.sigmoid(tf.layers.conv2d(height_pooling, Rank, 1, strides=1, padding='same'))
    width_feature = tf.sigmoid(tf.layers.conv2d(width_pooling, Rank, 1, strides=1, padding='same'))
    channel_feature = tf.sigmoid(tf.layers.conv2d(channel_pooling, Rank * channel_pooling.get_shape().as_list()[-1], 1, strides=1, padding='same'))

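After GP the tensors have fewer dimensions, so they are reshaped accordingly and then passed through the convolutions that produce the rank vectors. There is a small trick here: for height and width we directly use one 1×1 convolution with rank output channels, while for channel we use a 1×1 convolution with channel.shape × rank output channels.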
TRM

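The TRM step is easy to follow: for each rank index the three vectors are multiplied together, and the product is then multiplied by a factor.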
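
A minimal NumPy sketch of this reconstruction step is given below. It is not the author's TensorFlow code: `trm_sketch` is a hypothetical helper, the vectors and factors are random placeholders, summing the rank-1 terms follows the description of Equation 1 earlier in this post, and applying the result to the feature by element-wise multiplication is an assumption.

```python
import numpy as np

def trm_sketch(v_h, v_w, v_c, factors):
    """Combine per-axis vectors into a [B, H, W, C] attention map.

    v_h: [B, R, H], v_w: [B, R, W], v_c: [B, R, C], factors: [R].
    """
    rank = v_h.shape[1]
    attention = 0.0
    for r in range(rank):
        # Broadcasting builds the outer product of the three vectors.
        rank_one = (
            v_h[:, r, :, None, None]    # [B, H, 1, 1]
            * v_w[:, r, None, :, None]  # [B, 1, W, 1]
            * v_c[:, r, None, None, :]  # [B, 1, 1, C]
        )
        # Scale by the factor for this rank index and accumulate.
        attention = attention + factors[r] * rank_one
    return attention

rng = np.random.default_rng(0)
B, R, H, W, C = 2, 3, 4, 5, 6
v_h, v_w, v_c = rng.random((B, R, H)), rng.random((B, R, W)), rng.random((B, R, C))
factors = rng.random(R)   # placeholder for the per-rank factor
x = rng.random((B, H, W, C))

attention = trm_sketch(v_h, v_w, v_c, factors)
y = attention * x         # assumption: element-wise re-weighting of the feature
print(attention.shape, y.shape)  # (2, 4, 5, 6) (2, 4, 5, 6)
```

The attention map has the same shape as the input feature, so no axis is merged or flattened at any point.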
