The Attention in BiLSTM_Attention

Latest recommended article published on 2024-06-06 17:14:26

BeKnown

Article link: https://blog.csdn.net/m0_37953759/article/details/110652856

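Nearly every write-up online shows the model architecture diagram below.
[Figure: model architecture diagram]
The diagram only shows that the Attention layer sits after the BiLSTM; the formulas make the computation clearer: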

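Obtain a hidden representation through a single-layer MLP
$u_{it}=\tanh(W_w h_{it}+b_w)$
where $h_{it}$ is the output of the LSTM layer. The fully connected layer $W_w h_{it}+b_w$ does not change its dimension.
In matrix form, $u = \tanh(W h + b)$, with dimensions $u \in \mathbb{R}^{d \times T}$, $W \in \mathbb{R}^{d \times d}$, $h \in \mathbb{R}^{d \times T}$.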
Compute softmax-normalized weights
$\alpha_{it} = \frac{\exp(u_{it}^\mathrm{T}u_w)}{\sum_{t'}\exp(u_{it'}^\mathrm{T}u_w)}$
In matrix form, $\alpha = \mathrm{softmax}(u^\mathrm{T}u_w)$. The product $u_{it}^\mathrm{T}u_w$ is the core of Attention, where $u_w \in \mathbb{R}^{d\times 1}$ is a learned parameter, so $\alpha \in \mathbb{R}^{T\times 1}$. The softmax here normalizes over the whole sequence, i.e. over $t$.
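Compute the Attention score, i.e. the weighted sum of the LSTM outputs
$s_i=\sum_{t}\alpha_{it}h_{it}$
In matrix form, $s=\sum_{t} \alpha_t \cdot h_t$, where $s \in \mathbb{R}^{d\times 1}$. Note: first multiply each column $h_t$ by its weight $\alpha_t$, then sum over the $T$ dimension!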
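The three steps above can be traced for a single sequence with NumPy. This is only a minimal sketch, not the post's TensorFlow code: the function name `attention_pool`, the random parameters, and the sizes `d = 4`, `T = 3` are assumptions made for illustration.

```python
import numpy as np

def attention_pool(h, W, b, u_w):
    """Attention pooling for one sequence, following the three formulas.

    h:   (d, T) LSTM outputs, one column per time step
    W:   (d, d) MLP weight, b: (d, 1) MLP bias
    u_w: (d, 1) learned context vector
    """
    u = np.tanh(W @ h + b)                          # step 1: (d, T), dimension unchanged
    scores = (u.T @ u_w).ravel()                    # u_t^T u_w for every t: (T,)
    scores = scores - scores.max()                  # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # step 2: softmax over T
    s = (h * alpha).sum(axis=1, keepdims=True)      # step 3: multiply, then sum over T
    return s, alpha

# Hypothetical toy sizes and random parameters, for illustration only
rng = np.random.default_rng(0)
d, T = 4, 3
h = rng.normal(size=(d, T))
W = rng.normal(size=(d, d))
b = rng.normal(size=(d, 1))
u_w = rng.normal(size=(d, 1))

s, alpha = attention_pool(h, W, b, u_w)
print(s.shape, alpha.shape, alpha.sum())
```

Multiplying first and then summing over $T$ gives the same result as the matrix product `h @ alpha.reshape(-1, 1)`; the broadcast form is written out here because it matches the order the note above insists on.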

The computation in the formulas above can be expressed in code in the following two ways:
Implementation 1:

import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable, Dimension.value)

def attention(inputs):
    # inputs has shape [B, T, D]
    hidden_size = inputs.shape[2].value
    # Trainable context vector u_w, shape [D]
    u_omega = tf.get_variable("u_omega", [hidden_size], initializer=tf.keras.initializers.glorot_normal())

    with tf.name_scope('v'):
        # Only tanh is applied here; the W_w, b_w layer from the formula is omitted
        v = tf.tanh(inputs)  # [B, T, D]

    vu = tf.tensordot(v, u_omega, axes=1, name='vu')  # [B, T, D] . [D] -> [B, T]
    alphas = tf.nn.softmax(vu, name='alphas')  # [B, T], softmax over the T axis

    # Broadcast multiply: each element of alphas scales one row (time step) of inputs;
    # reduce_sum over axis 1 changes the shape [B, T, D] -> [B, D]
    output = tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1)

    # Final output with tanh
    output = tf.tanh(output)

    return output, alphas
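To check the shape comments in implementation 1 without building a TensorFlow graph, the same operations can be mirrored in NumPy. This is a sketch under assumptions: `attention_np` is a hypothetical name, `u_omega` is passed in as a plain random array rather than a trained variable, and, like the TensorFlow code, it applies `tanh` directly to the inputs without the $W_w$, $b_w$ layer from the formula.

```python
import numpy as np

def attention_np(inputs, u_omega):
    """NumPy mirror of implementation 1. inputs: [B, T, D], u_omega: [D]."""
    v = np.tanh(inputs)                                   # [B, T, D]
    vu = np.tensordot(v, u_omega, axes=1)                 # [B, T, D] . [D] -> [B, T]
    vu = vu - vu.max(axis=-1, keepdims=True)              # stability shift, softmax unchanged
    alphas = np.exp(vu) / np.exp(vu).sum(axis=-1, keepdims=True)  # softmax over T
    output = (inputs * alphas[..., None]).sum(axis=1)     # [B, T, D] -> [B, D]
    return np.tanh(output), alphas

# Hypothetical toy sizes, for illustration only
B, T, D = 2, 5, 3
rng = np.random.default_rng(1)
inputs = rng.normal(size=(B, T, D))
u_omega = rng.normal(size=(D,))

output, alphas = attention_np(inputs, u_omega)
print(output.shape, alphas.shape)  # shapes follow [B, D] and [B, T]
```

Each row of `alphas` sums to 1, which confirms that the normalization runs over the time axis of each sequence rather than across the batch.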

Implementation 2:

def attention(self, H):
    """
    Use the Attention mechanism to obtain the sentence vector representation.
    """
    # H has shape [B, T, D]
    # Number of units in the last LSTM layer
    hiddenSize = config.model.hiddenSizes[-1]

    # Initialize a trainable weight vector of shape [D]; it is reshaped to [D, 1] below
    W = tf.Variable(tf.random_normal([hiddenSize], stddev=0.1))

    # Apply a non-linear activation to the Bi-LSTM output, [B, T, D]
    M = tf.tanh(H)

    # Shape change: [B*T, D] x [D, 1] -> [B*T, 1]
    newM = tf.matmul(tf.reshape(M, [-1, hiddenSize]), tf.reshape(W, [-1, 1]))

    # Reshape newM to [B, T]
    restoreM = tf.reshape(newM, [-1, config.sequenceLength])

    # Normalize with softmax, [B, T]
    self.alpha = tf.nn.softmax(restoreM)

    # [B, D, T] x [B, T, 1] -> [B, D, 1]
    r = tf.matmul(tf.transpose(H, [0, 2, 1]), tf.reshape(self.alpha, [-1, config.sequenceLength, 1]))

    # Squeeze the 3-D result to 2-D, squeezeR = [B, D]
    squeezeR = tf.reshape(r, [-1, hiddenSize])

    sentenceRepren = tf.tanh(squeezeR)

    # Dropout can be applied to the Attention output
    output = tf.nn.dropout(sentenceRepren, self.dropoutKeepProb)

    return output
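The two implementations differ mainly in how they express the same two tensor operations: the scores come from `tensordot` in one and from reshape plus `matmul` in the other, and the weighted sum comes from a broadcast multiply with `reduce_sum` in one and from a batched `matmul` in the other. Implementation 2 additionally applies dropout at the end. The NumPy sketch below checks that equivalence on random data. The sizes and variable names are assumptions for illustration, and dropout is left out.

```python
import numpy as np

# Hypothetical toy sizes, for illustration only
B, T, D = 2, 4, 3
rng = np.random.default_rng(2)
H = rng.normal(size=(B, T, D))
w = rng.normal(size=(D,))
M = np.tanh(H)

# Scores: tensordot (implementation 1) vs reshape + matmul (implementation 2)
scores_1 = np.tensordot(M, w, axes=1)                             # [B, T]
scores_2 = (M.reshape(-1, D) @ w.reshape(-1, 1)).reshape(-1, T)   # [B*T, 1] -> [B, T]

def softmax(x):
    """Softmax over the last axis, shifted for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

alpha = softmax(scores_1)                                         # [B, T]

# Weighted sum: broadcast multiply + sum vs batched matmul
pooled_1 = (H * alpha[..., None]).sum(axis=1)                     # [B, D]
pooled_2 = (np.transpose(H, (0, 2, 1)) @ alpha.reshape(-1, T, 1)).reshape(-1, D)

print(np.allclose(scores_1, scores_2), np.allclose(pooled_1, pooled_2))
```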