tfa.seq2seq.LuongAttention Explained


Source code

```python
class LuongAttention(AttentionMechanism):
    """Implements Luong-style (multiplicative) attention scoring.
    This attention has two forms.  The first is standard Luong attention,
    as described in:
    Minh-Thang Luong, Hieu Pham, Christopher D. Manning.
    [Effective Approaches to Attention-based Neural Machine Translation.
    EMNLP 2015.](https://arxiv.org/abs/1508.04025)
    The second is the scaled form inspired partly by the normalized form of
    Bahdanau attention.
    To enable the second form, construct the object with parameter
    `scale=True`.
    """

    @typechecked
    def __init__(
        self,
        units: TensorLike,
        memory: Optional[TensorLike] = None,
        memory_sequence_length: Optional[TensorLike] = None,
        scale: bool = False,
        probability_fn: str = "softmax",
        dtype: AcceptableDTypes = None,
        name: str = "LuongAttention",
        **kwargs,
    ):
        """Construct the AttentionMechanism mechanism.
        Args:
          units: The depth of the attention mechanism.
          memory: The memory to query; usually the output of an RNN encoder.
            This tensor should be shaped `[batch_size, max_time, ...]`.
          memory_sequence_length: (optional): Sequence lengths for the batch
            entries in memory.  If provided, the memory tensor rows are masked
            with zeros for values past the respective sequence lengths.
          scale: Python boolean. Whether to scale the energy term.
          probability_fn: (optional) string, the name of the function used to
            convert the attention score to probabilities. The default is
            `softmax`, which is `tf.nn.softmax`. The other option is
            `hardmax`, which is the hardmax() within this module. Any other
            value will result in a validation error.
          dtype: The data type for the memory layer of the attention mechanism.
          name: Name to use when creating ops.
          **kwargs: Dictionary that contains other common arguments for layer
            creation.
        """
        # For LuongAttention, we only transform the memory layer; thus
        # `units` must match the expected query depth.
        self.probability_fn_name = probability_fn
        probability_fn = self._process_probability_fn(self.probability_fn_name)

        def wrapped_probability_fn(score, _):
            return probability_fn(score)

        memory_layer = kwargs.pop("memory_layer", None)
        if not memory_layer:
            memory_layer = tf.keras.layers.Dense(
                units, name="memory_layer", use_bias=False, dtype=dtype
            )
        self.units = units
        self.scale = scale
        self.scale_weight = None
        super().__init__(
            memory=memory,
            memory_sequence_length=memory_sequence_length,
            query_layer=None,
            memory_layer=memory_layer,
            probability_fn=wrapped_probability_fn,
            name=name,
            dtype=dtype,
            **kwargs,
        )

    def build(self, input_shape):
        super().build(input_shape)
        if self.scale and self.scale_weight is None:
            self.scale_weight = self.add_weight(
                "attention_g", initializer=tf.ones_initializer, shape=()
            )
        self.built = True

    def _calculate_attention(self, query, state):
        """Score the query based on the keys and values.
        Args:
          query: Tensor of dtype matching `self.values` and shape
            `[batch_size, query_depth]`.
          state: Tensor of dtype matching `self.values` and shape
            `[batch_size, alignments_size]`
            (`alignments_size` is memory's `max_time`).
        Returns:
          alignments: Tensor of dtype matching `self.values` and shape
            `[batch_size, alignments_size]` (`alignments_size` is memory's
            `max_time`).
          next_state: Same as the alignments.
        """
        score = _luong_score(query, self.keys, self.scale_weight)
        alignments = self.probability_fn(score, state)
        next_state = alignments
        return alignments, next_state

    def get_config(self):
        config = {
            "units": self.units,
            "scale": self.scale,
            "probability_fn": self.probability_fn_name,
        }
        base_config = super().get_config()
        return {**base_config, **config}

    @classmethod
    def from_config(cls, config, custom_objects=None):
        config = AttentionMechanism.deserialize_inner_layer_from_config(
            config, custom_objects=custom_objects
        )
        return cls(**config)
```
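`_calculate_attention` above delegates the scoring to the module-private `_luong_score` helper, which is not shown in the excerpt. Roughly, it computes one dot product between the query and each memory position, where `keys` is the memory already projected by `memory_layer`. A minimal sketch of that behavior (not the library's exact code):

```python
import tensorflow as tf

def _luong_score(query, keys, scale):
    """Multiplicative (Luong) score: a sketch of the private tfa helper.

    query: [batch_size, depth]           -- decoder state h_t.
    keys:  [batch_size, max_time, depth] -- memory after `memory_layer`.
    scale: scalar weight `attention_g` for the scaled form, or None.
    """
    # One dot product per memory position:
    # [batch, 1, depth] x [batch, depth, max_time] -> [batch, 1, max_time]
    score = tf.matmul(tf.expand_dims(query, 1), keys, transpose_b=True)
    score = tf.squeeze(score, [1])  # [batch_size, max_time]
    if scale is not None:
        score = scale * score
    return score
```

Because only the memory is transformed, the query depth must equal `units`; this is exactly the constraint noted in the constructor's comment.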

How it works

The main idea of this method comes from the paper https://arxiv.org/pdf/1508.04025.pdf; the network structure is shown below:
(Figure: the global attention model from the paper.)
The blue blocks are the encoder and the red blocks are the decoder. $\bar{h}_s$ denotes an encoder hidden state and $h_t$ the decoder's current hidden state. The computation proceeds as follows:

1. Compute an alignment weight $a_t(s)$ between $h_t$ and each encoder state $\bar{h}_s$, using the formulas below. The math is not complicated, and readers who want to skip it can: this step simply assigns a weight to each source word's encoder state (see the code sketch after this list).

$$a_t(s) = \mathrm{align}(h_t, \bar{h}_s) = \frac{\exp\left(\mathrm{score}(h_t, \bar{h}_s)\right)}{\sum_{s'} \exp\left(\mathrm{score}(h_t, \bar{h}_{s'})\right)}$$

$$\mathrm{score}(h_t, \bar{h}_s) = \begin{cases} h_t^\top \bar{h}_s & \text{(dot)} \\ h_t^\top W_a \bar{h}_s & \text{(general)} \end{cases}$$

2. Multiply each $\bar{h}_s$ by its weight $a_t(s)$ and sum them to obtain the context vector $c_t$:

$$c_t = \sum_{s} a_t(s)\, \bar{h}_s$$

3. Concatenate $c_t$ with $h_t$ and pass the result through a fully connected layer to obtain the new attentional state $\tilde{h}_t$:

$$\tilde{h}_t = \tanh\left(W_c [c_t; h_t]\right)$$

4. Pass $\tilde{h}_t$ through the output_layer to obtain the corresponding word classification (a distribution over the output vocabulary).
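
To make the four steps concrete, here is a minimal end-to-end sketch in plain TensorFlow using the paper's "general" (multiplicative) scoring form. All shapes, layer names, and the random inputs are illustrative assumptions:

```python
import tensorflow as tf

batch, max_time, depth, vocab = 2, 5, 8, 100

h_s = tf.random.normal([batch, max_time, depth])    # encoder states
h_t = tf.random.normal([batch, depth])              # current decoder state

W_a = tf.keras.layers.Dense(depth, use_bias=False)  # memory projection
W_c = tf.keras.layers.Dense(depth, activation="tanh", use_bias=False)
output_layer = tf.keras.layers.Dense(vocab)

# Step 1: scores and alignment weights a_t(s).
keys = W_a(h_s)                                     # [batch, max_time, depth]
score = tf.squeeze(
    tf.matmul(tf.expand_dims(h_t, 1), keys, transpose_b=True), [1]
)                                                   # [batch, max_time]
a_t = tf.nn.softmax(score)

# Step 2: context vector c_t, the weighted sum of encoder states.
c_t = tf.reduce_sum(tf.expand_dims(a_t, -1) * h_s, axis=1)  # [batch, depth]

# Step 3: attentional state h~_t = tanh(W_c [c_t; h_t]).
h_tilde = W_c(tf.concat([c_t, h_t], axis=-1))       # [batch, depth]

# Step 4: project to the vocabulary for the word prediction.
logits = output_layer(h_tilde)                      # [batch, vocab]
```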

Summary

In summary, tfa.seq2seq.LuongAttention computes weights over the individual source words, so that at each decoding step the model can attend to different words of the original sentence, not just its current hidden state, when producing the translation.
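
Finally, a minimal usage sketch showing how a LuongAttention instance is typically wired into a decoder cell via `tfa.seq2seq.AttentionWrapper`; the batch size, depths, and LSTM cell here are illustrative:

```python
import tensorflow as tf
import tensorflow_addons as tfa

batch, max_time, units = 4, 7, 16

# Encoder outputs act as the attention memory.
memory = tf.random.normal([batch, max_time, units])
memory_lengths = tf.constant([7, 5, 6, 7])

attention = tfa.seq2seq.LuongAttention(
    units=units,                  # must equal the decoder query depth
    memory=memory,
    memory_sequence_length=memory_lengths,
    scale=True,                   # enable the scaled (second) form
)

# Wrap a decoder cell so attention is applied at every decoding step.
cell = tfa.seq2seq.AttentionWrapper(tf.keras.layers.LSTMCell(units), attention)
state = cell.get_initial_state(batch_size=batch, dtype=tf.float32)
output, state = cell(tf.random.normal([batch, units]), state)
```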
