Improving U-Net for Deep-Learning Image Segmentation: Attention U-Net

无情滴怪蜀黍

Last modified on 2023-06-10 16:45:42

Category: Image Segmentation | Tags: deep learning, artificial intelligence, python

First published on 2023-06-10 16:45:32

Article link: https://blog.csdn.net/weixin_63694345/article/details/131143279

1. Introduction

2. Attention U-Net

2.1 Structure

2.2 Attention module

3. References

1. Introduction

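An attention mechanism is a weight-based component that lets a deep learning model concentrate on the most representative and discriminative parts of the current input. It assigns a learned weight to each part, such as a word, an image region, or a pixel, and emphasizes the highly weighted parts. This can improve the model's accuracy and generalization.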
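The sketch below illustrates "weight-based" attention in its simplest, generic form. It is not code from the original post, and the scores are made up. Each input element gets a relevance score. The scores are normalized into non-negative weights that sum to 1 (here with a softmax), and the output is the weighted sum of the inputs. The Attention U-Net gate described later instead applies a per-position sigmoid rather than a softmax.

```python
import numpy as np

def soft_attention(features, scores):
    """Weight each feature vector by softmax(scores) and sum them.

    features: (n, d) array, one row per input element (token, pixel, ...)
    scores:   (n,) array of relevance scores (hypothetical values here)
    """
    # Subtract the max before exponentiating for numerical stability
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()      # non-negative, sums to 1
    context = weights @ features   # (d,) weighted sum of the rows
    return weights, context

features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
scores = np.array([0.1, 2.0, 0.1])  # the second element is the most "relevant"
weights, context = soft_attention(features, scores)
print(weights.round(3), context.round(3))
```

The element with the highest score receives most of the weight, so the output is dominated by its feature vector.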

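Attention mechanisms are widely used in deep learning, for example in natural language processing and computer vision. In NLP tasks such as machine translation or text summarization, attention helps the model focus on the parts of the input sequence most relevant to the current prediction, which improves translation or summary quality. In visual question answering (VQA), attention can weight the regions of the input image so that the model focuses on the areas most relevant to the question and answers it correctly.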

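Overall, attention mechanisms can substantially improve the performance of deep learning models. They have produced good results on many natural language processing and computer vision tasks.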

2. Attention U-Net

2.1 Structure

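Attention U-Net is a variant of U-Net that adds an attention mechanism to improve performance on image segmentation. Like the standard U-Net, it consists of an encoder, a decoder, and skip connections. The difference is that attention gates are inserted in the decoder, on the skip connections, so that the encoder features are reweighted before they are merged into the decoder.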

Figure: U-Net architecture

Figure: Attention U-Net architecture

2.2 Attention module

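Specifically, Attention U-Net adds an attention gate at each decoder level to help the model distinguish foreground from background more accurately. For every spatial position, the gate learns how much weight the corresponding skip-connection feature should receive. The model then concentrates on the region of interest and suppresses irrelevant background, which improves segmentation quality.

The code below borrows query/key/value terminology in its docstring. In those terms, the decoder feature acts as the query (the gating signal g), and the encoder feature delivered by the skip connection acts as both key and value (X).

The gate works as follows:
1. X and g are each projected to an intermediate number of channels by a 1×1 convolution.
2. The two projections are combined; Oktay et al. (2018) add them (additive attention).
3. The result passes through a ReLU, then a 1×1 convolution down to a single channel, then a sigmoid. This gives an attention-coefficient map with values between 0 and 1.
4. X is multiplied by this map to obtain the weighted encoder features.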
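In formula form, the additive attention gate of Oktay et al. (2018) reads as follows, with the symbols mapped to the code further down. For pixel $i$:
- $x_i$ is the skip-connection feature vector (X) and $g_i$ the gating vector (g).
- $W_x$ and $W_g$ are the two 1×1 convolution weights (theta_x, phi_g).
- $\psi$ is the single-channel 1×1 convolution (psi_f).
- $\sigma_1$ is the ReLU and $\sigma_2$ the sigmoid.

As a simplification in this sketch, the bias of the theta convolution is folded into $b_g$.

```latex
\begin{aligned}
q_i &= \psi^{\top}\,\sigma_1\!\left(W_x^{\top} x_i + W_g^{\top} g_i + b_g\right) + b_\psi \\
\alpha_i &= \sigma_2(q_i) = \frac{1}{1 + e^{-q_i}} \in (0, 1) \\
\hat{x}_i &= \alpha_i \, x_i
\end{aligned}
```

Because $\alpha_i$ is a single scalar per pixel, every channel of $x_i$ at that pixel is scaled by the same factor.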

Figure: Attention gate structure

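In the attention gate diagram, pay particular attention to which tensor is which. In the code, X is the encoder feature map arriving through the skip connection. g is the feature map from the next-deeper decoder level after it has been upsampled. Only after upsampling does g have the same spatial size as X, so the two can be combined element-wise and X can then be weighted.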
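The shape requirement can be checked with a small NumPy sketch. For illustration it assumes:
- a channels-last layout,
- nearest-neighbour 2× upsampling via `np.repeat`,
- random 1×1-convolution weights.

A real model may upsample differently, for example with a transposed convolution. Here the deeper decoder feature has half the spatial size of the skip feature X, so the two projections can be added element-wise only after g has been upsampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(t, w, b):
    """A 1x1 convolution on a channels-last (H, W, C_in) map is a per-pixel matmul."""
    return t @ w + b  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)

def upsample2x(t):
    """Nearest-neighbour 2x upsampling of an (H, W, C) map."""
    return t.repeat(2, axis=0).repeat(2, axis=1)

H, W = 8, 8
X = rng.standard_normal((H, W, 32))                   # skip-connection (encoder) feature
g_coarse = rng.standard_normal((H // 2, W // 2, 64))  # deeper decoder feature

F_int = 16                                            # intermediate channels ("channel")
w_x, b_x = rng.standard_normal((32, F_int)), np.zeros(F_int)
w_g, b_g = rng.standard_normal((64, F_int)), np.zeros(F_int)

theta_x = conv1x1(X, w_x, b_x)             # (8, 8, 16)
phi_coarse = conv1x1(g_coarse, w_g, b_g)   # (4, 4, 16)

# Without upsampling, element-wise addition fails: (8, 8, 16) vs (4, 4, 16)
try:
    theta_x + phi_coarse
    shapes_match = True
except ValueError:
    shapes_match = False
print("addition possible before upsampling:", shapes_match)

g = upsample2x(g_coarse)                   # (8, 8, 64): same spatial size as X
summed = theta_x + conv1x1(g, w_g, b_g)    # (8, 8, 16)
print("after upsampling:", summed.shape)
```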

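The output of the attention gate (the weighted skip features) is then concatenated with the upsampled decoder feature, as in the standard U-Net, and passed through convolutions. A final convolution at the top level produces the segmentation map. Oktay et al. (2018) report that adding the attention gates improves segmentation accuracy over the standard U-Net in their experiments. To make this concrete, here is the code: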

import tensorflow as tf
from tensorflow.keras.layers import Activation, Conv2D, multiply

def attention_gate(X, g, channel,
                   activation='ReLU',
                   attention='add', name='att'):
    '''
    Self-attention gate modified from Oktay et al. 2018.

    attention_gate(X, g, channel, activation='ReLU', attention='add', name='att')

    Input
    ----------
        X: input tensor, i.e., key and value (the skip-connection feature map).
        g: gated tensor, i.e., query (the upsampled decoder feature map;
           it must have the same spatial size as X).
        channel: number of intermediate channel.
                 Oktay et al. (2018) did not specify (denoted as F_int).
                 intermediate channel is expected to be smaller than the input channel.
        activation: a nonlinear attention activation, given as the name of a
                    tf.keras.layers class (e.g. 'ReLU').
                    The `sigma_1` in Oktay et al. 2018. Default is 'ReLU'.
        attention: 'add' for additive attention; 'multiply' for multiplicative attention.
                   Oktay et al. 2018 applied additive attention.
        name: prefix of the created keras layers.

    Output
    ----------
        X_att: output tensor.

    Steps
    ----------
    1. Map the input tensor X to the intermediate channels with a 1x1 convolution (theta_att).
    2. Map the gating tensor g to the intermediate channels with a 1x1 convolution (phi_g).
    3. Combine theta_att and phi_g with the chosen attention ('add' or 'multiply')
       to obtain the query.
    4. Apply the nonlinear activation (e.g., ReLU) to the query to obtain f.
    5. Project f to a single channel with a 1x1 convolution (psi_f).
    6. Apply a sigmoid to psi_f to obtain the attention coefficients coef_att.
    7. Multiply X by coef_att to obtain the weighted tensor X_att, and return it.
    '''

    # Look up the layer class / merge function by name instead of calling eval():
    # e.g. 'ReLU' -> tf.keras.layers.ReLU, 'add' -> tf.keras.layers.add
    activation_func = getattr(tf.keras.layers, activation)
    attention_func = getattr(tf.keras.layers, attention)

    # Step 1: map the input tensor X to the intermediate channels (1x1 conv)
    theta_att = Conv2D(channel, 1, use_bias=True, name='{}_theta_x'.format(name))(X)

    # Step 2: map the gating tensor g to the intermediate channels (1x1 conv)
    phi_g = Conv2D(channel, 1, use_bias=True, name='{}_phi_g'.format(name))(g)

    # ----- attention learning ----- #
    # Step 3: combine [theta_att, phi_g] with 'add' or 'multiply'
    query = attention_func([theta_att, phi_g], name='{}_{}'.format(name, attention))

    # Step 4: nonlinear activation (sigma_1)
    f = activation_func(name='{}_activation'.format(name))(query)

    # Step 5: linear transformation to a single-channel map
    psi_f = Conv2D(1, 1, use_bias=True, name='{}_psi_f'.format(name))(f)
    # ------------------------------ #

    # Step 6: sigmoid activation as attention coefficients (sigma_2), values in (0, 1)
    coef_att = Activation('sigmoid', name='{}_sigmoid'.format(name))(psi_f)

    # Step 7: multiplicative attention masking; the 1-channel coef_att
    # is broadcast over all channels of X
    X_att = multiply([X, coef_att], name='{}_masking'.format(name))

    return X_att
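To check what the gate computes without building a Keras model, here is a NumPy re-implementation of the same seven steps, followed by the concatenation with the decoder feature. It is a sketch under these assumptions:
- channels-last arrays,
- 1×1 convolutions written as per-pixel matrix products,
- small random weights,
- hypothetical helper names (`attention_gate_np`, `init_params`).

It mirrors the logic of the Keras function above but is not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(c_x, c_g, c_int, rng):
    """Random weights for the three 1x1 convolutions (theta_x, phi_g, psi_f)."""
    return {
        "w_theta": rng.standard_normal((c_x, c_int)) * 0.1, "b_theta": np.zeros(c_int),
        "w_phi": rng.standard_normal((c_g, c_int)) * 0.1, "b_phi": np.zeros(c_int),
        "w_psi": rng.standard_normal((c_int, 1)) * 0.1, "b_psi": np.zeros(1),
    }

def attention_gate_np(X, g, p, attention="add"):
    """X: (H, W, C_x) skip feature; g: (H, W, C_g) upsampled decoder feature."""
    theta_att = X @ p["w_theta"] + p["b_theta"]   # step 1: (H, W, C_int)
    phi_g = g @ p["w_phi"] + p["b_phi"]           # step 2: (H, W, C_int)
    if attention == "add":                        # step 3: additive attention
        query = theta_att + phi_g
    elif attention == "multiply":                 #         or multiplicative attention
        query = theta_att * phi_g
    else:
        raise ValueError(f"unknown attention: {attention}")
    f = np.maximum(query, 0.0)                    # step 4: ReLU
    psi_f = f @ p["w_psi"] + p["b_psi"]           # step 5: (H, W, 1)
    coef_att = sigmoid(psi_f)                     # step 6: coefficients in (0, 1)
    X_att = X * coef_att                          # step 7: broadcast over channels
    return X_att, coef_att

rng = np.random.default_rng(42)
X = rng.standard_normal((8, 8, 32))   # skip-connection feature
g = rng.standard_normal((8, 8, 64))   # decoder feature, already upsampled
params = init_params(32, 64, 16, rng)

X_att, coef = attention_gate_np(X, g, params)
# As in U-Net, the gated skip feature is concatenated with the decoder feature
H_cat = np.concatenate([g, X_att], axis=-1)
print(X_att.shape, coef.shape, H_cat.shape)
```

A handy sanity check is the all-zero case. With every weight and bias set to zero, each coefficient is sigmoid(0) = 0.5, so the gate simply halves X.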