tfa.seq2seq.LuongAttention Explained


Source code

```python
class LuongAttention(AttentionMechanism):
    """Implements Luong-style (multiplicative) attention scoring.
    This attention has two forms.  The first is standard Luong attention,
    as described in:
    Minh-Thang Luong, Hieu Pham, Christopher D. Manning.
    [Effective Approaches to Attention-based Neural Machine Translation.
    EMNLP 2015.](https://arxiv.org/abs/1508.04025)
    The second is the scaled form inspired partly by the normalized form of
    Bahdanau attention.
    To enable the second form, construct the object with parameter
    `scale=True`.
    """

    @typechecked
    def __init__(
        self,
        units: TensorLike,
        memory: Optional[TensorLike] = None,
        memory_sequence_length: Optional[TensorLike] = None,
        scale: bool = False,
        probability_fn: str = "softmax",
        dtype: AcceptableDTypes = None,
        name: str = "LuongAttention",
        **kwargs,
    ):
        """Construct the AttentionMechanism mechanism.
        Args:
          units: The depth of the attention mechanism.
          memory: The memory to query; usually the output of an RNN encoder.
            This tensor should be shaped `[batch_size, max_time, ...]`.
          memory_sequence_length: (optional): Sequence lengths for the batch
            entries in memory.  If provided, the memory tensor rows are masked
            with zeros for values past the respective sequence lengths.
          scale: Python boolean. Whether to scale the energy term.
          probability_fn: (optional) string, the name of the function used to
            convert the attention score to probabilities. The default is
            `softmax`, which is `tf.nn.softmax`. The other option is
            `hardmax`, which is the hardmax() within this module. Any other
            value will result in a validation error.
          dtype: The data type for the memory layer of the attention mechanism.
          name: Name to use when creating ops.
          **kwargs: Dictionary that contains other common arguments for layer
            creation.
        """
        # For LuongAttention, we only transform the memory layer; thus
        # `units` must match the expected query depth.
        self.probability_fn_name = probability_fn
        probability_fn = self._process_probability_fn(self.probability_fn_name)

        def wrapped_probability_fn(score, _):
            return probability_fn(score)

        memory_layer = kwargs.pop("memory_layer", None)
        if not memory_layer:
            memory_layer = tf.keras.layers.Dense(
                units, name="memory_layer", use_bias=False, dtype=dtype
            )
        self.units = units
        self.scale = scale
        self.scale_weight = None
        super().__init__(
            memory=memory,
            memory_sequence_length=memory_sequence_length,
            query_layer=None,
            memory_layer=memory_layer,
            probability_fn=wrapped_probability_fn,
            name=name,
            dtype=dtype,
            **kwargs,
        )

    def build(self, input_shape):
        super().build(input_shape)
        if self.scale and self.scale_weight is None:
            self.scale_weight = self.add_weight(
                "attention_g", initializer=tf.ones_initializer, shape=()
            )
        self.built = True

    def _calculate_attention(self, query, state):
        """Score the query based on the keys and values.
        Args:
          query: Tensor of dtype matching `self.values` and shape
            `[batch_size, query_depth]`.
          state: Tensor of dtype matching `self.values` and shape
            `[batch_size, alignments_size]`
            (`alignments_size` is memory's `max_time`).
        Returns:
          alignments: Tensor of dtype matching `self.values` and shape
            `[batch_size, alignments_size]` (`alignments_size` is memory's
            `max_time`).
          next_state: Same as the alignments.
        """
        score = _luong_score(query, self.keys, self.scale_weight)
        alignments = self.probability_fn(score, state)
        next_state = alignments
        return alignments, next_state

    def get_config(self):
        config = {
            "units": self.units,
            "scale": self.scale,
            "probability_fn": self.probability_fn_name,
        }
        base_config = super().get_config()
        return {**base_config, **config}

    @classmethod
    def from_config(cls, config, custom_objects=None):
        config = AttentionMechanism.deserialize_inner_layer_from_config(
            config, custom_objects=custom_objects
        )
        return cls(**config)
```
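`_calculate_attention` above delegates the scoring to the module-private `_luong_score` helper, which is not shown in the excerpt. Roughly, it computes one dot product between the query and each memory position, where `keys` is the memory already projected by `memory_layer`. A minimal sketch of that behavior (not the library's exact code):

```python
import tensorflow as tf

def _luong_score(query, keys, scale):
    """Multiplicative (Luong) score: a sketch of the private tfa helper.

    query: [batch_size, depth]           -- decoder state h_t.
    keys:  [batch_size, max_time, depth] -- memory after `memory_layer`.
    scale: scalar weight `attention_g` for the scaled form, or None.
    """
    # One dot product per memory position:
    # [batch, 1, depth] x [batch, depth, max_time] -> [batch, 1, max_time]
    score = tf.matmul(tf.expand_dims(query, 1), keys, transpose_b=True)
    score = tf.squeeze(score, [1])  # [batch_size, max_time]
    if scale is not None:
        score = scale * score
    return score
```

Because only the memory is transformed, the query depth must equal `units`; this is exactly the constraint noted in the constructor's comment.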

How it works

The main idea of this method comes from the paper https://arxiv.org/pdf/1508.04025.pdf; the network structure is shown below:
(Figure: the global attention model from the paper.)
The blue blocks are the encoder and the red blocks are the decoder. $\bar{h}_s$ denotes an encoder hidden state and $h_t$ the decoder's current hidden state. The computation proceeds as follows:

1. Compute an alignment weight $a_t(s)$ between $h_t$ and each encoder state $\bar{h}_s$, using the formulas below. The math is not complicated, and readers who want to skip it can: this step simply assigns a weight to each source word's encoder state (see the code sketch after this list).

$$a_t(s) = \mathrm{align}(h_t, \bar{h}_s) = \frac{\exp\left(\mathrm{score}(h_t, \bar{h}_s)\right)}{\sum_{s'} \exp\left(\mathrm{score}(h_t, \bar{h}_{s'})\right)}$$

$$\mathrm{score}(h_t, \bar{h}_s) = \begin{cases} h_t^\top \bar{h}_s & \text{(dot)} \\ h_t^\top W_a \bar{h}_s & \text{(general)} \end{cases}$$

2. Multiply each $\bar{h}_s$ by its weight $a_t(s)$ and sum them to obtain the context vector $c_t$:

$$c_t = \sum_{s} a_t(s)\, \bar{h}_s$$

3. Concatenate $c_t$ with $h_t$ and pass the result through a fully connected layer to obtain the new attentional state $\tilde{h}_t$:

$$\tilde{h}_t = \tanh\left(W_c [c_t; h_t]\right)$$

4. Pass $\tilde{h}_t$ through the output_layer to obtain the corresponding word classification (a distribution over the output vocabulary).
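
To make the four steps concrete, here is a minimal end-to-end sketch in plain TensorFlow using the paper's "general" (multiplicative) scoring form. All shapes, layer names, and the random inputs are illustrative assumptions:

```python
import tensorflow as tf

batch, max_time, depth, vocab = 2, 5, 8, 100

h_s = tf.random.normal([batch, max_time, depth])    # encoder states
h_t = tf.random.normal([batch, depth])              # current decoder state

W_a = tf.keras.layers.Dense(depth, use_bias=False)  # memory projection
W_c = tf.keras.layers.Dense(depth, activation="tanh", use_bias=False)
output_layer = tf.keras.layers.Dense(vocab)

# Step 1: scores and alignment weights a_t(s).
keys = W_a(h_s)                                     # [batch, max_time, depth]
score = tf.squeeze(
    tf.matmul(tf.expand_dims(h_t, 1), keys, transpose_b=True), [1]
)                                                   # [batch, max_time]
a_t = tf.nn.softmax(score)

# Step 2: context vector c_t, the weighted sum of encoder states.
c_t = tf.reduce_sum(tf.expand_dims(a_t, -1) * h_s, axis=1)  # [batch, depth]

# Step 3: attentional state h~_t = tanh(W_c [c_t; h_t]).
h_tilde = W_c(tf.concat([c_t, h_t], axis=-1))       # [batch, depth]

# Step 4: project to the vocabulary for the word prediction.
logits = output_layer(h_tilde)                      # [batch, vocab]
```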

Summary

In summary, tfa.seq2seq.LuongAttention computes weights over the individual source words, so that at each decoding step the model can attend to different words of the original sentence, not just its current hidden state, when producing the translation.
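
Finally, a minimal usage sketch showing how a LuongAttention instance is typically wired into a decoder cell via `tfa.seq2seq.AttentionWrapper`; the batch size, depths, and LSTM cell here are illustrative:

```python
import tensorflow as tf
import tensorflow_addons as tfa

batch, max_time, units = 4, 7, 16

# Encoder outputs act as the attention memory.
memory = tf.random.normal([batch, max_time, units])
memory_lengths = tf.constant([7, 5, 6, 7])

attention = tfa.seq2seq.LuongAttention(
    units=units,                  # must equal the decoder query depth
    memory=memory,
    memory_sequence_length=memory_lengths,
    scale=True,                   # enable the scaled (second) form
)

# Wrap a decoder cell so attention is applied at every decoding step.
cell = tfa.seq2seq.AttentionWrapper(tf.keras.layers.LSTMCell(units), attention)
state = cell.get_initial_state(batch_size=batch, dtype=tf.float32)
output, state = cell(tf.random.normal([batch, units]), state)
```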
