A core part of text classification is obtaining an accurate semantic representation of the text. Previously, I simply ran an LSTM or GRU over the text to get its vector representation. While reading papers and GitHub projects, I noticed that top-conference papers often apply an attention mechanism when computing a text's semantic vector. Below I introduce several uses of attention for building text-level semantic representations in text classification (this post will be updated as I read more papers).
The attention mechanism in adversarialLSTM
def _attention(self, H):
    """
    Use an attention mechanism to obtain the sentence-level vector representation.
    """
    # Number of hidden units in the last LSTM layer
    hiddenSize = config.model.hiddenSizes[-1]
    # Initialize a weight vector as a trainable parameter
    W = tf.Variable(tf.random_normal([hiddenSize], stddev=0.1))
    # Apply a nonlinear activation to the Bi-LSTM output
    M = tf.tanh(H)
    # Multiply W and M. M is [batch_size, time_step, hidden_size]; reshape it to
    # [batch_size * time_step, hidden_size] before the matmul.
    # newM = [batch_size * time_step, 1]: each time step's output vector is reduced to a scalar
    newM = tf.matmul(tf.reshape(M, [-1, hiddenSize]), tf.reshape(W, [-1, 1]))
    # Reshape newM to [batch_size, time_step]
    restoreM = tf.reshape(newM, [-1, config.sequenceLength])
    # Normalize with softmax: [batch_size, time_step]
    self.alpha = tf.nn.softmax(restoreM)
    # Weighted sum of H with the learned alpha, as one batched matmul
    r = tf.matmul(tf.transpose(H, [0, 2, 1]),
                  tf.reshape(self.alpha, [-1, config.sequenceLength, 1]))
    # Squeeze the 3-D tensor down to 2-D: squeezeR = [batch_size, hidden_size]
    squeezeR = tf.squeeze(r, axis=-1)
    sentenceRepren = tf.tanh(squeezeR)
    # Optionally apply dropout to the attention output
    output = tf.nn.dropout(sentenceRepren, self.dropoutKeepProb)
    return output
- H is the Bi-LSTM output. From H to M the shape is unchanged: [batch_size, time_step, hiddenSize]. W has shape [hiddenSize].
- newM = tf.matmul(tf.reshape(M, [-1, hiddenSize]), tf.reshape(W, [-1, 1])) has shape [batch_size * time_step, 1]; restoreM then has shape [batch_size, time_step], where time_step equals config.sequenceLength.
- After the softmax layer the shape of restoreM is unchanged. In r = tf.matmul(tf.transpose(H, [0, 2, 1]), tf.reshape(self.alpha, [-1, config.sequenceLength, 1])), a [batch_size, hiddenSize, time_step] tensor is multiplied by a [batch_size, time_step, 1] tensor, giving shape [batch_size, hiddenSize, 1].
- Finally, [batch_size, hiddenSize, 1] is squeezed down to two dimensions; after tanh and dropout we obtain the attended sentence vector of shape [batch_size, hiddenSize].
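The shape bookkeeping above can be checked end-to-end with a small NumPy sketch (toy sizes batch_size=2, time_step=4, hiddenSize=3 are assumed for illustration; softmax and the batched matmul are written out by hand):

```python
import numpy as np

batch_size, time_step, hidden_size = 2, 4, 3
rng = np.random.default_rng(0)
H = rng.standard_normal((batch_size, time_step, hidden_size))
W = rng.standard_normal(hidden_size)

M = np.tanh(H)                                        # [batch_size, time_step, hidden_size]
newM = M.reshape(-1, hidden_size) @ W.reshape(-1, 1)  # [batch_size * time_step, 1]
restoreM = newM.reshape(batch_size, time_step)        # [batch_size, time_step]
# Softmax over time steps
expM = np.exp(restoreM - restoreM.max(axis=1, keepdims=True))
alpha = expM / expM.sum(axis=1, keepdims=True)
# Weighted sum: [batch, hidden, time] x [batch, time] -> [batch, hidden]
r = np.einsum('bht,bt->bh', H.transpose(0, 2, 1), alpha)
sentence = np.tanh(r)                                 # [batch_size, hidden_size]
```

Each row of `alpha` sums to 1, and `sentence` comes out with shape `[batch_size, hidden_size]`, matching the shapes derived above.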
The attention mechanism in HiGRU
# Dot-product attention
def get_attention(q, k, v, attn_mask=None):
    """
    :param q: (batch, len_q, d)
    :param k: (batch, len_k, d)
    :param v: (batch, len_k, d)
    :param attn_mask: (batch, len_q, len_k), True at positions to mask out
    :return: output (batch, len_q, d), attn (batch, len_q, len_k)
    """
    attn = torch.matmul(q, k.transpose(1, 2))
    if attn_mask is not None:
        # Fill masked positions with a large negative value so softmax zeroes them out
        attn.masked_fill_(attn_mask, -1e10)
    attn = F.softmax(attn, dim=-1)
    output = torch.matmul(attn, v)
    return output, attn
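As a quick illustration, the same dot-product attention can be written out in NumPy (a sketch, not the HiGRU code; the `-1e10` fill reproduces the masking step above, and `np.where` stands in for `masked_fill_`):

```python
import numpy as np

def np_attention(q, k, v, attn_mask=None):
    # Scores: (batch, len_q, len_k)
    attn = q @ k.transpose(0, 2, 1)
    if attn_mask is not None:
        attn = np.where(attn_mask, -1e10, attn)
    # Softmax over the key axis
    expA = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = expA / expA.sum(axis=-1, keepdims=True)
    return attn @ v, attn

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 5, 8))
k = rng.standard_normal((2, 5, 8))
v = rng.standard_normal((2, 5, 8))
out, attn = np_attention(q, k, v)  # out: (2, 5, 8), attn: (2, 5, 5)
```

Each row of `attn` is a distribution over the keys, so `out` is a convex combination of the value vectors.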
# Get padding mask for attention
def get_attn_pad_mask(seq_q, seq_k):
    assert seq_q.dim() == 2 and seq_k.dim() == 2
    # Outer product of the two id sequences: any entry involving a padding
    # id (0) is 0, so eq(0) marks the positions to mask
    pad_attn_mask = torch.matmul(seq_q.unsqueeze(2).float(), seq_k.unsqueeze(1).float())
    pad_attn_mask = pad_attn_mask.eq(0)  # (batch, len_q, len_k)
    return pad_attn_mask
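Assuming id 0 marks padding and the mask is the outer product of the two id sequences (as the torch.matmul above suggests), the construction can be sketched in NumPy; any (i, j) entry where either token is padding becomes True, i.e. masked:

```python
import numpy as np

# Toy id sequences; 0 marks padding (an assumption for illustration)
seq = np.array([[5, 7, 0, 0],
                [3, 0, 0, 0]])
# Outer product: entry (b, i, j) is seq[b, i] * seq[b, j]
outer = seq[:, :, None].astype(float) @ seq[:, None, :].astype(float)
mask = (outer == 0)  # True wherever the query or key position is padding
```

Positions where both ids are nonzero stay False and therefore survive the softmax; the mask is then passed as `attn_mask` to the attention function.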