The AttentionMechanism class

jiaqi71

Article link: https://blog.csdn.net/jiaqi71/article/details/87869485

Source code
 TensorFlow function: tf.layers.Layer
There are two types of Luong-style attention:
① standard Luong attention
Effective Approaches to Attention-based Neural Machine Translation, 2015
② a scaled form inspired partly by the normalized form of Bahdanau attention

1. $\color{purple}\text{Base class } AttentionMechanism(object):$
It defines two property methods:
$\color{green}\begin{array}l@property&\\ def \,\,\,\,\,\,alignments\_size(self):\end{array}$
$\color{green}\begin{array}l@property&\\ def \,\,\,\,\,\,state\_size(self):\end{array}$
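A minimal sketch of what such a base class might look like, assuming both properties simply raise NotImplementedError until a subclass overrides them. `ToyAttention` and its sizes are hypothetical and only illustrate the contract; they are not part of TensorFlow.

```python
class AttentionMechanism(object):
    """Interface sketch: subclasses report the sizes a wrapper needs."""

    @property
    def alignments_size(self):
        raise NotImplementedError

    @property
    def state_size(self):
        raise NotImplementedError


class ToyAttention(AttentionMechanism):
    """Hypothetical subclass, for illustration only."""

    def __init__(self, max_time):
        self._max_time = max_time

    @property
    def alignments_size(self):
        # One alignment weight per memory time step.
        return self._max_time

    @property
    def state_size(self):
        # Assumed equal to alignments_size in this toy example.
        return self._max_time
```

Declaring both as properties lets a caller read `mechanism.alignments_size` without knowing how the concrete mechanism stores its memory.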
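2. $\color{purple}\text{The } \_prepare\_memory \text{ function:}$
Converts memory to a tensor, masks the parts of memory (the encoder output) that lie beyond each sequence's length, and puts memory into a suitable format. ( $\color{pink}Convert\, to\, tensor\, and \,possibly\, mask \,memory.$ )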

def _prepare_memory(memory, memory_sequence_length, check_inner_dims_defined):

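Parameters:
memory: a tensor of shape [batch_size, max_time, ...]
memory_sequence_length: an int32 tensor of shape [batch_size]. Each value is the length of the sequence in the corresponding row of memory.
check_inner_dims_defined: a boolean. If True, the shape of memory is checked to make sure that every dimension other than the two outermost ones (batch_size, max_time) is fully defined.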
Returns:
A (possibly masked), checked, new memory.
Raises:
ValueError: If check_inner_dims_defined is True and not memory.shape[2:].is_fully_defined().
This function contains a nested helper function, $\color{green}\_maybe\_mask(m, seq\_len\_mask)$

  memory = nest.map_structure(lambda m: ops.convert_to_tensor(m, name="memory"), memory)

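The line above converts every element of memory into a tensor.
① The nest.map_structure() function:
tf.contrib.framework.nest.map_structure(func, *structure, **check_types_dict)
Effect: applies func to each element of a nested structure in turn and returns a new structure with the same layout as structure.
② The ops.convert_to_tensor() function:
Effect: converts various kinds of data into a tensor; for example, both an array and a list can be turned into a tensor.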
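To make the "apply a function to every element, keep the layout" idea concrete, here is a plain-Python sketch. `map_structure_sketch` is a hypothetical stand-in written for this post; it handles only lists, tuples and dicts and is not the TensorFlow implementation.

```python
def map_structure_sketch(func, structure):
    """Apply func to every leaf of a nested list/tuple/dict, keeping the layout."""
    if isinstance(structure, dict):
        return {k: map_structure_sketch(func, v) for k, v in structure.items()}
    if isinstance(structure, (list, tuple)):
        # Rebuild the same container type around the mapped children.
        return type(structure)(map_structure_sketch(func, s) for s in structure)
    return func(structure)  # a leaf: apply the function directly


nested = {"a": [1, 2], "b": (3, [4, 5])}
doubled = map_structure_sketch(lambda x: x * 2, nested)
print(doubled)  # {'a': [2, 4], 'b': (6, [8, 10])}
```

In _prepare_memory the mapped function is the tensor conversion, so a memory given as a single tensor or as a nested structure of tensors is handled by the same line.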

  if memory_sequence_length is not None:
    memory_sequence_length = ops.convert_to_tensor(memory_sequence_length, name="memory_sequence_length")

The code above converts memory_sequence_length into a tensor.

  if check_inner_dims_defined:
    def _check_dims(m):
      if not m.get_shape()[2:].is_fully_defined():
        raise ValueError("Expected memory %s to have fully defined inner dims, "
                         "but saw shape: %s" % (m.name, m.get_shape()))
    nest.map_structure(_check_dims, memory)

If check_inner_dims_defined is True, the shape of memory is checked, and a ValueError is raised if the inner dimensions of any element are not fully defined.

  if memory_sequence_length is None:
    seq_len_mask = None
  else:
    seq_len_mask = array_ops.sequence_mask(
        memory_sequence_length,
        maxlen=array_ops.shape(nest.flatten(memory)[0])[1],
        dtype=nest.flatten(memory)[0].dtype)
    seq_len_batch_size = (
        memory_sequence_length.shape[0].value
        or array_ops.shape(memory_sequence_length)[0])

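In the else branch above, seq_len_mask is a tensor of size seq_len_batch_size $\times$ array_ops.shape(nest.flatten(memory)[0])[1], i.e. [batch_size, max_time].
① sequence_mask(lengths, maxlen=None, dtype=tf.bool, name=None)
Parameters:
lengths: an integer tensor, all of whose values are less than or equal to maxlen.
maxlen: a scalar integer tensor, the size of the last dimension of the returned tensor; defaults to the maximum value in lengths.
dtype: the output type of the resulting tensor.
name: the name of the operation.
Effect: turns a lengths tensor of shape [batch_size] into a mask of shape [batch_size, maxlen].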
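The behaviour described above can be sketched in NumPy. `sequence_mask_sketch` is a hypothetical helper that mimics the description, not the TensorFlow op itself: position j in row i is kept when j is smaller than lengths[i].

```python
import numpy as np


def sequence_mask_sketch(lengths, maxlen=None, dtype=bool):
    """Build a [batch_size, maxlen] mask from a [batch_size] vector of lengths."""
    lengths = np.asarray(lengths)
    if maxlen is None:
        maxlen = int(lengths.max())  # default: the longest sequence
    positions = np.arange(maxlen)    # shape [maxlen]
    # Broadcasting compares every position with every length.
    mask = positions[None, :] < lengths[:, None]
    return mask.astype(dtype)


mask = sequence_mask_sketch([1, 3, 2], maxlen=4, dtype=np.float32)
print(mask)
# [[1. 0. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 0. 0.]]
```

Using the dtype of memory for the mask, as _prepare_memory does, means the mask can later be multiplied with memory directly.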

  def _maybe_mask(m, seq_len_mask):
    rank = m.get_shape().ndims
    rank = rank if rank is not None else array_ops.rank(m)
    extra_ones = array_ops.ones(rank - 2, dtype=dtypes.int32)
    m_batch_size = m.shape[0].value or array_ops.shape(m)[0]
    if memory_sequence_length is not None:
      message = ("memory_sequence_length and memory tensor batch sizes do not "
                 "match.")
      with ops.control_dependencies([
          check_ops.assert_equal(
              seq_len_batch_size, m_batch_size, message=message)]):
        seq_len_mask = array_ops.reshape(
            seq_len_mask,
            array_ops.concat((array_ops.shape(seq_len_mask), extra_ones), 0))
        return m * seq_len_mask
    else:
      return m

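This helper first asserts that seq_len_batch_size, obtained together with seq_len_mask in the previous step, equals m_batch_size, the batch size of memory. It then reshapes seq_len_mask to [batch_size, max_time, 1, ...] so that it broadcasts over the inner dimensions, and multiplies it with m.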
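A NumPy sketch of the masking step, under the assumption that the mask is a float array of shape [batch_size, max_time]. `maybe_mask_sketch` is a hypothetical name, and the batch-size check is done eagerly with a plain exception instead of a graph assertion.

```python
import numpy as np


def maybe_mask_sketch(m, seq_len_mask):
    """Zero out the time steps of m that lie beyond each sequence's length."""
    if seq_len_mask is None:
        return m
    if seq_len_mask.shape[0] != m.shape[0]:
        raise ValueError(
            "memory_sequence_length and memory tensor batch sizes do not match.")
    # Append one size-1 axis per inner dimension so the mask broadcasts.
    extra_ones = (1,) * (m.ndim - 2)
    mask = seq_len_mask.reshape(seq_len_mask.shape + extra_ones)
    return m * mask


memory = np.ones((2, 3, 4), dtype=np.float32)          # [batch, max_time, depth]
seq_len_mask = np.array([[1, 1, 0], [1, 0, 0]], dtype=np.float32)
masked = maybe_mask_sketch(memory, seq_len_mask)
```

After the call, the first sequence keeps two time steps and the second keeps one; every other time step is all zeros.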
3. A _maybe_mask_score function is also defined:
$\color{green}\_maybe\_mask\_score(score, memory\_sequence\_length, score\_mask\_value)$
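Only the signature is shown here. Going by the description of score_mask_value given for the constructors in this post (default -inf, used only when memory_sequence_length is given), the function presumably replaces the scores of padded positions with that value. The NumPy sketch below works under that assumption; `maybe_mask_score_sketch` is a hypothetical name.

```python
import numpy as np


def maybe_mask_score_sketch(score, memory_sequence_length,
                            score_mask_value=-np.inf):
    """Replace scores beyond each sequence's length with score_mask_value."""
    if memory_sequence_length is None:
        return score
    lengths = np.asarray(memory_sequence_length)
    positions = np.arange(score.shape[1])
    score_mask = positions[None, :] < lengths[:, None]   # [batch, max_time]
    return np.where(score_mask, score, score_mask_value)


score = np.array([[0.5, 1.0, 2.0],
                  [0.1, 0.2, 0.3]])
masked_score = maybe_mask_score_sketch(score, [2, 3])
```

With -inf as the mask value, a subsequent softmax assigns exactly zero probability to the padded positions.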

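4. $\color{red}{\text{Class } \_BaseAttentionMechanism \text{ inherits from class } AttentionMechanism}$
A base AttentionMechanism class that provides common functionality.
The common functionality includes:

Storing the query and memory layers.
Preprocessing and storing the memory.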

  def __init__(self,
               query_layer,
               memory,
               probability_fn,
               memory_sequence_length=None,
               memory_layer=None,
               check_inner_dims_defined=True,
               score_mask_value=None,
               name=None):

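Parameters:
$\color{green}query\_layer$ : callable. An instance of tf.layers.Layer. Its depth must match the depth of memory_layer. If query_layer is not provided, the shape of query must match that of memory_layer. The reason is that the $W_a[s_t;h_i]$ part of the formula $score(s_t, h_i)=\,v_a^Ttanh(W_a[s_t;h_i])$ is implemented in two pieces: query_layer handles the $s_t$ part and memory_layer handles the $h_i$ part, so the two layers need the same depth. As the later code shows, the two results are combined by addition.
$\color{green}memory$ : the memory to query (here, the output of the RNN encoder); shape [batch_size, max_time, ...]. $\color{red}\text{What does } max\_time \text{ stand for here? Is it the number of encoder steps?}$
$\color{green}probability\_fn$ : callable. Converts the score and the previous alignments into probabilities ( $\color{red}{\text{the score here should be how well the encoder outputs align with the decoder output}}$ ). It is applied as probabilities = probability_fn(score, state), where state is the previous state (by default, the previous alignments).
$\color{green}memory\_sequence\_length(optional)$ : the sequence lengths of the batch entries in memory (the encoder output). If provided, the values in each row of the encoder output tensor that lie beyond the corresponding sequence length are masked with 0.
$\color{green}memory\_layer$ : an instance of tf.layers.Layer (may be None). $\color{red}\text{What is the relationship between } memory\_layer \text{ and } query\_layer \text{?}$ The depth of memory_layer must match the depth of query_layer. If memory_layer is not provided, the shape of memory must match that of query_layer.
$\color{green}check\_inner\_dims\_defined$ : a boolean. If True, the shape of memory is checked to make sure that every dimension other than the two outermost ones (batch_size, max_time) is fully defined.
$\color{green}score\_mask\_value$ : (optional) the mask value for score before passing it into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
$\color{green}name$ : the name of the operation.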
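One way to see why the two layers must have the same depth is to split $W_a$ into two blocks. The block names $W_q$ and $W_m$ are hypothetical labels introduced here, not notation from the source:

$W_a[s_t;h_i]=\begin{bmatrix}W_q & W_m\end{bmatrix}\begin{bmatrix}s_t\\ h_i\end{bmatrix}=W_q s_t+W_m h_i$

Under this reading, query_layer plays the role of $W_q$ and memory_layer the role of $W_m$. The sum is only defined when $W_q s_t$ and $W_m h_i$ have the same number of entries, which is the depth constraint stated above.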

def initial_alignments(self, batch_size, dtype)

  def initial_alignments(self, batch_size, dtype):
    max_time = self._alignments_size
    return _zero_state_tensors(max_time, batch_size, dtype)

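Creates the initial alignment values (weights) for the AttentionWrapper class. This matters for attention mechanisms that use the previous alignments to compute the alignments at the next time step (e.g. monotonic attention). By default it returns an all-zero tensor.
Parameters:
batch_size: int32 scalar, the batch_size.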
dtype: The dtype.
Returns:
A dtype tensor shaped [batch_size, alignments_size]
(alignments_size is the values’ max_time).

def initial_state(self, batch_size, dtype)
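Creates the initial state values for the AttentionWrapper class. This matters for attention mechanisms that use the previous alignments to compute the alignments at the next time step (e.g. monotonic attention). The default behavior is to return the same output as initial_alignments.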

  def initial_state(self, batch_size, dtype):
    return self.initial_alignments(batch_size, dtype)

Parameters:
batch_size: int32 scalar, the batch_size.
dtype: The dtype.
Returns:
A structure of all-zero tensors with shapes as described by state_size.
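As a rough illustration of the two defaults, the NumPy sketch below uses a hypothetical `ToyMechanism` class whose alignments_size is passed in directly. np.zeros stands in for the zero-state helper called in the source.

```python
import numpy as np


class ToyMechanism(object):
    """Hypothetical stand-in that only models the two initial_* methods."""

    def __init__(self, alignments_size):
        self._alignments_size = alignments_size

    def initial_alignments(self, batch_size, dtype):
        max_time = self._alignments_size
        # One zero weight per memory time step, for every batch entry.
        return np.zeros((batch_size, max_time), dtype=dtype)

    def initial_state(self, batch_size, dtype):
        # Default behavior: the state is just the initial alignments.
        return self.initial_alignments(batch_size, dtype)


mech = ToyMechanism(alignments_size=5)
state = mech.initial_state(batch_size=2, dtype=np.float32)
```

A mechanism that needs a richer state would override initial_state while leaving initial_alignments as it is.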

5. $\color{red}\text{Class } LuongAttention$

6. $\color{red}\text{Class } BahdanauAttention$
Base class: _BaseAttentionMechanism
There are two forms of Bahdanau attention:
① Bahdanau attention: Neural Machine Translation by Jointly Learning to Align and Translate, 2015
② normalized form: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.

Before this class, a function $\color{purple}\_bahdanau\_score(processed\_query, keys, normalize):$ is defined. When normalize is False, it computes the following formula:
$score(s_t, h_i)=\,v_a^Ttanh(W_a[s_t;h_i])$

return math_ops.reduce_sum(v * math_ops.tanh(keys + processed_query), [2])

Here keys corresponds to $h_i$, the encoder output (memory), with shape [batch_size, max_time, num_units];
processed_query corresponds to $s_t$, the decoder hidden state, with shape [batch_size, num_units].
The return value, the score, has shape [batch_size, max_time].
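The shapes above can be checked with a small NumPy sketch. It assumes processed_query gets a time axis of size 1 before the addition so that it broadcasts against keys, and that v is a learned vector of length num_units. `bahdanau_score_sketch` and the sample values are hypothetical.

```python
import numpy as np


def bahdanau_score_sketch(processed_query, keys, v):
    """Additive score: sum over units of v * tanh(keys + query)."""
    # [batch_size, num_units] -> [batch_size, 1, num_units]
    expanded = processed_query[:, None, :]
    # Reduce over the num_units axis, leaving [batch_size, max_time].
    return np.sum(v * np.tanh(keys + expanded), axis=2)


batch_size, max_time, num_units = 2, 4, 3
keys = np.arange(24, dtype=float).reshape(batch_size, max_time, num_units) / 10
processed_query = np.array([[0.1, 0.2, 0.3],
                            [0.4, 0.5, 0.6]])
v = np.array([1.0, -1.0, 0.5])

score = bahdanau_score_sketch(processed_query, keys, v)
print(score.shape)  # (2, 4)
```

Each entry score[b, i] equals the dot product of v with tanh(keys[b, i] + processed_query[b]), which is the formula evaluated for one query and one memory position.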

The constructor of the Bahdanau attention mechanism:

  def __init__(self,
               num_units,
               memory,
               memory_sequence_length=None,
               normalize=False,
               probability_fn=None,
               score_mask_value=None,
               dtype=None,
               name="BahdanauAttention"):

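Parameters:
num_units: the depth of the $\color{green}query\,\, mechanism$ (the attention mechanism).
memory: the input to the attention mechanism, usually the output of an RNN encoder. This tensor has shape [batch_size, max_time, ...].
memory_sequence_length (optional): the sequence lengths of the batch entries in the attention input (memory). If given, the values in each row of the memory tensor that lie beyond the corresponding sequence length are masked with 0.
normalize: a boolean. If normalize=True, the second (normalized) form of the attention mechanism is used.
probability_fn: (optional) a callable that converts scores into probabilities. It corresponds to the following formula from the paper, where $align(y_t, x_i)$ measures how well $y_t$ aligns with $x_i$ and is computed as a softmax over predefined alignment scores: $\alpha_{t,i} = align(y_t, x_i) = \frac{\exp(score(s_{t-1}, h_i))}{\sum_{i'=1}^{n}\exp(score(s_{t-1}, h_{i'}))}$ The default is tf.nn.softmax; other options include tf.contrib.seq2seq.hardmax and tf.contrib.sparsemax.sparsemax. It is applied as probabilities = probability_fn(score), where probabilities is the weight vector $\alpha_t$.
score_mask_value: (optional) the mask value for the score before it is passed to probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
dtype: the data type of the attention mechanism's query and memory layers. Defaults to float32.
name: the name to use when creating ops.
$\color{green}\text{Function body:}$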

    if probability_fn is None:
      probability_fn = nn_ops.softmax
    if dtype is None:
      dtype = dtypes.float32
    wrapped_probability_fn = lambda score, _: probability_fn(score)
    super(BahdanauAttention, self).__init__(
        query_layer=layers_core.Dense(
            num_units, name="query_layer", use_bias=False, dtype=dtype),
        memory_layer=layers_core.Dense(
            num_units, name="memory_layer", use_bias=False, dtype=dtype),
        memory=memory,
        probability_fn=wrapped_probability_fn,
        memory_sequence_length=memory_sequence_length,
        score_mask_value=score_mask_value,
        name=name)
    self._num_units = num_units
    self._normalize = normalize
    self._name = name
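The wrapping of probability_fn in the body above can be tried out in NumPy. The `softmax` function below is a hand-written stand-in for nn_ops.softmax, and the sample scores are hypothetical; the last score is set to -inf to mimic a masked (padded) position.

```python
import numpy as np


def softmax(score):
    """Numerically stable softmax over the last axis."""
    shifted = score - score.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


probability_fn = softmax
# The base class calls probability_fn(score, state); the wrapper drops
# the second argument so that a one-argument function can be used.
wrapped_probability_fn = lambda score, _: probability_fn(score)

score = np.array([[1.0, 2.0, -np.inf]])
previous_alignments = np.zeros_like(score)
alignments = wrapped_probability_fn(score, previous_alignments)
```

The resulting row sums to 1 and the masked position receives weight 0, which is why -inf is a natural default for score_mask_value.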

The $def\,\,\, \_\_call\_\_$ function computes the weights $\alpha_t$.

  def __call__(self, query, state):
    with variable_scope.variable_scope(None, "bahdanau_attention", [query]):
      processed_query = self.query_layer(query) if self.query_layer else query
      score = _bahdanau_score(processed_query, self._keys, self._normalize)
    alignments = self._probability_fn(score, state)
    next_state = alignments
    return alignments, next_state

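Parameters:
query: a tensor whose dtype matches self.values, with shape [batch_size, query_depth].
state: a tensor whose dtype matches self.values, with shape [batch_size, alignments_size] (alignments_size is the max_time of the input memory).
Returns:
alignments: a tensor whose dtype matches self.values, with shape [batch_size, alignments_size] (alignments_size is the max_time of the input memory).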
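Putting the pieces together, one attention step can be sketched in NumPy as below. This assumes the unnormalized score, a bias-free query layer represented by a plain matrix `w_query`, and a softmax probability function. All names and sample values are hypothetical, and keys is taken to be the memory after it has already passed through the memory layer.

```python
import numpy as np


def attention_step_sketch(query, keys, w_query, v):
    """One __call__-like step: query -> score -> alignments."""
    processed_query = query @ w_query            # [batch, num_units]
    score = np.sum(
        v * np.tanh(keys + processed_query[:, None, :]), axis=2)  # [batch, max_time]
    e = np.exp(score - score.max(axis=1, keepdims=True))
    alignments = e / e.sum(axis=1, keepdims=True)  # softmax over max_time
    next_state = alignments                        # state is the alignments
    return alignments, next_state


batch_size, max_time, query_depth, num_units = 2, 5, 4, 3
query = np.ones((batch_size, query_depth))
keys = np.linspace(-1.0, 1.0, batch_size * max_time * num_units).reshape(
    batch_size, max_time, num_units)
w_query = np.full((query_depth, num_units), 0.1)
v = np.ones(num_units)

alignments, next_state = attention_step_sketch(query, keys, w_query, v)
print(alignments.shape)  # (2, 5)
```

Each row of alignments is a probability distribution over the max_time memory positions, matching the [batch_size, alignments_size] shape stated above.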