Transformer source code notes

彩色电暖

Article link: https://blog.csdn.net/eruiwen1624/article/details/89438278

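EncoderStack takes three inputs: encoder_inputs, attention_bias and inputs_padding. attention_bias is passed to the self-attention sublayer and inputs_padding to the feed-forward sublayer: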

          encoder_inputs = self_attention_layer(encoder_inputs, attention_bias)
          encoder_inputs = feed_forward_network(encoder_inputs, inputs_padding)

The final output of EncoderStack is the normalized result:

    self.output_normalization(encoder_inputs)
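The data flow described above can be sketched in plain Python. This is only an illustration: the `encoder_stack` function, the `layers` list of (self-attention, feed-forward) pairs, and the toy stand-in layers are assumptions made for the example, and any extra processing wrapped around the real sublayers is left out.

```python
def encoder_stack(encoder_inputs, attention_bias, inputs_padding,
                  layers, output_normalization):
    """Apply each (self_attention, feed_forward) pair in turn, then normalize."""
    for self_attention_layer, feed_forward_network in layers:
        # attention_bias is consumed only by the self-attention sublayer.
        encoder_inputs = self_attention_layer(encoder_inputs, attention_bias)
        # inputs_padding is consumed only by the feed-forward sublayer.
        encoder_inputs = feed_forward_network(encoder_inputs, inputs_padding)
    return output_normalization(encoder_inputs)


# Hypothetical stand-in layers that only record the order of the calls.
def make_attention(tag):
    return lambda x, bias: x + ["attn%d(bias=%s)" % (tag, bias)]


def make_ffn(tag):
    return lambda x, padding: x + ["ffn%d(padding=%s)" % (tag, padding)]


layers = [(make_attention(i), make_ffn(i)) for i in range(2)]
trace = encoder_stack([], "B", "P", layers, lambda x: x + ["norm"])
print(trace)
```

The printed trace makes the order visible: every layer runs self-attention and then the feed-forward network, and the normalization is applied once at the very end.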

SelfAttention is a special case of the Attention class in which the first two arguments (x and y) are the same tensor:

    class SelfAttention(Attention):
      """Multiheaded self-attention layer."""

      def call(self, x, bias, cache=None):
        return super(SelfAttention, self).call(x, x, bias, cache)

The Attention class implements attention of x over y: x supplies the queries, y supplies the keys and values.

Generate q, k and v:

    q = self.q_dense_layer(x)
    k = self.k_dense_layer(y)
    v = self.v_dense_layer(y)

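Multi-head: split_heads takes an input x of shape [batch_size, length, hidden_size] and returns a tensor of shape [batch_size, num_heads, length, hidden_size/num_heads]. Note that after the split, q, k and v are 4-D tensors. The tf.matmul calls used later still work, because tf.matmul multiplies the last two dimensions and treats all leading dimensions as batch dimensions.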
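The shape change performed by split_heads can be reproduced with a reshape followed by a transpose. The NumPy function below is a minimal sketch written for these notes, assuming hidden_size is divisible by num_heads. It is not the library's implementation.

```python
import numpy as np


def split_heads(x, num_heads):
    """[batch_size, length, hidden_size] -> [batch_size, num_heads, length, depth]."""
    batch_size, length, hidden_size = x.shape
    depth = hidden_size // num_heads
    # Cut the last axis into num_heads chunks of size depth.
    x = x.reshape(batch_size, length, num_heads, depth)
    # Move the head axis in front of the length axis.
    return x.transpose(0, 2, 1, 3)


x = np.arange(2 * 5 * 8, dtype=np.float32).reshape(2, 5, 8)
heads = split_heads(x, num_heads=4)
print(heads.shape)  # (2, 4, 5, 2)
```

Each head sees a contiguous slice of the hidden dimension: head 1 at position 3 holds features 2 and 3 of the original vector at position 3.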

    # Split q, k, v into heads.
    q = self.split_heads(q)
    k = self.split_heads(k)
    v = self.split_heads(v)

Scale: multiply q by $1/\sqrt{d_k}$, where $d_k$ = hidden_size / num_heads is the per-head depth:

    # Scale q to prevent the dot product between q and k from growing too large.
    depth = (self.hidden_size // self.num_heads)
    q *= depth ** -0.5

Multiply q by the transpose of k. The result has shape [batch_size, num_heads, x_length, y_length]:

    logits = tf.matmul(q, k, transpose_b=True)
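To check the resulting shape, here is a small NumPy sketch with made-up sizes. np.matmul follows the same batching rule as tf.matmul, multiplying the last two axes and treating the leading axes as batch axes.

```python
import numpy as np

# Made-up sizes, chosen only so that every axis is distinguishable.
batch_size, num_heads, x_length, y_length, depth = 2, 4, 5, 7, 16
rng = np.random.default_rng(0)
q = rng.standard_normal((batch_size, num_heads, x_length, depth))
k = rng.standard_normal((batch_size, num_heads, y_length, depth))

q = q * depth ** -0.5  # scale q by 1/sqrt(depth)
# Swapping the last two axes of k plays the role of transpose_b=True.
logits = np.matmul(q, np.swapaxes(k, -1, -2))
print(logits.shape)  # (2, 4, 5, 7)
```

Entry [b, h, i, j] of logits is the scaled dot product between query position i and key position j for head h of batch item b.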

Then add the bias:

    logits += bias

Apply softmax so that the weights along the last dimension (y_length) sum to 1:

    weights = tf.nn.softmax(logits, name="attention_weights")
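The interplay of the bias and the softmax can be seen on a tiny example. The sketch assumes the bias holds a large negative number (here -1e9) at key positions that should be ignored, such as padding, and zero elsewhere. The `padding` mask and the bias shape [batch_size, 1, 1, y_length] are assumptions made for this illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# One batch item, one head, 2 query positions, 4 key positions.
logits = np.array([[[[1.0, 2.0, 3.0, 4.0],
                     [0.5, 0.5, 0.5, 0.5]]]])
# Hypothetical padding mask: the last two key positions are padding.
padding = np.array([[0.0, 0.0, 1.0, 1.0]])
# Shape [batch_size, 1, 1, y_length] so it broadcasts over heads and queries.
bias = (padding * -1e9)[:, np.newaxis, np.newaxis, :]

weights = softmax(logits + bias)
print(weights.round(3))
```

After the softmax, the masked positions receive essentially zero weight, and the remaining weights in each row still sum to 1.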

Combine the values, weighted by the attention weights:

    attention_output = tf.matmul(weights, v)
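Taken together, the scaling, matmul, bias, softmax and weighted-sum steps compute, for each head, the expression below. The symbols $Q$, $K$, $V$ (the per-head q, k, v), $B$ (the bias) and the weight matrices $W_Q$, $W_K$, $W_V$ are labels introduced here for readability and do not appear in the code.

$$
\mathrm{Attention}(x, y) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + B\right) V,
\qquad Q = x W_Q,\quad K = y W_K,\quad V = y W_V,
\qquad d_k = \frac{\text{hidden\_size}}{\text{num\_heads}}
$$

The code applies the factor $1/\sqrt{d_k}$ to q before the matmul, which gives the same result as scaling the product $Q K^{\top}$.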

Concatenate the results of the different heads, restoring the layout from before the split:

    # Recombine heads --> [batch_size, length, hidden_size]
    attention_output = self.combine_heads(attention_output)
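Recombining the heads is the inverse of the split: transpose the head axis back behind the length axis, then merge the last two axes. The NumPy function below is a sketch written for these notes, not the library's code.

```python
import numpy as np


def combine_heads(x):
    """[batch_size, num_heads, length, depth] -> [batch_size, length, hidden_size]."""
    batch_size, num_heads, length, depth = x.shape
    # Put the length axis back in front of the head axis.
    x = x.transpose(0, 2, 1, 3)
    # Concatenate the heads along the last axis.
    return x.reshape(batch_size, length, num_heads * depth)


heads = np.arange(2 * 4 * 5 * 2).reshape(2, 4, 5, 2)
combined = combine_heads(heads)
print(combined.shape)  # (2, 5, 8)
```

In the combined tensor, the output of head h occupies features h * depth to (h + 1) * depth - 1 of each position.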

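Finally, project the result through an output weight matrix of shape [hidden_size, hidden_size]. The output has shape [batch_size, length, hidden_size], since the layer has hidden_size units and the input and output sizes therefore match: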

    # Run the combined outputs through another linear projection layer.
    attention_output = self.output_dense_layer(attention_output)

Note that output_dense_layer and the layers that produce q, k and v are all tf.layers.Dense, i.e. fully connected layers, each with hidden_size units.
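For reference, the whole computation can be written end to end in a few lines of NumPy. This is a sketch under simplifying assumptions: the four dense layers are modeled as plain [hidden_size, hidden_size] matrices without bias terms, and the names `attention`, `w_q`, `w_k`, `w_v` and `w_o` are hypothetical.

```python
import numpy as np


def attention(x, y, bias, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head attention of x over y; every weight is [hidden_size, hidden_size]."""
    hidden_size = w_q.shape[1]
    depth = hidden_size // num_heads

    def split_heads(t):
        b, n, _ = t.shape
        return t.reshape(b, n, num_heads, depth).transpose(0, 2, 1, 3)

    q = split_heads(x @ w_q) * depth ** -0.5   # scaled queries
    k = split_heads(y @ w_k)
    v = split_heads(y @ w_v)

    logits = q @ np.swapaxes(k, -1, -2) + bias  # [b, heads, x_len, y_len]
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ v                           # [b, heads, x_len, depth]
    b, _, x_len, _ = out.shape
    out = out.transpose(0, 2, 1, 3).reshape(b, x_len, hidden_size)
    return out @ w_o                            # final linear projection


rng = np.random.default_rng(0)
hidden_size, num_heads = 8, 2
x = rng.standard_normal((3, 5, hidden_size))
y = rng.standard_normal((3, 7, hidden_size))
w_q, w_k, w_v, w_o = (rng.standard_normal((hidden_size, hidden_size))
                      for _ in range(4))
bias = np.zeros((3, 1, 1, 7))  # no positions masked
out = attention(x, y, bias, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (3, 5, 8)
```

Passing the same tensor for both x and y, with a bias whose last axis matches the length of x, gives the self-attention case.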
