[Deep Learning Series (3)]: Implementing a Formula Recognition System with CNN+seq2seq (2)

wxplol

Last modified 2022-06-14 09:12:58


First published 2019-08-23 10:44:39

Original article: https://blog.csdn.net/wxplol/article/details/100008016


2. The Formula Decoder

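We now have the encoding of the formula image, $[e_{1},e_{2},\dots,e_{n}]$, a tensor of shape N×H×W×C (with n = H×W once the spatial grid is flattened). How do we use it in the seq2seq decoder? A transformation step is needed to turn it into the decoder's input.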
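The flattening that turns the feature map into the sequence $[e_{1},\dots,e_{n}]$ can be sketched in NumPy as follows. This assumes, as the shape comments in the code below indicate ([N, H*W, C]), a row-major reshape so that each of the n = H×W grid positions becomes one vector $e_{i}$; the sizes are illustrative only.

```python
import numpy as np

# Illustrative sizes: batch N, feature-map height H and width W, channels C
N, H, W, C = 2, 4, 6, 512
feature_map = np.random.rand(N, H, W, C).astype(np.float32)

# Flatten the spatial grid: each of the n = H*W positions becomes one vector e_i
n = H * W
e = feature_map.reshape(N, n, C)

# Row-major order: region index i corresponds to grid cell (i // W, i % W)
i = 9
assert np.array_equal(e[:, i], feature_map[:, i // W, i % W])
print(e.shape)  # (2, 24, 512)
```

Keeping the row-major order means every attention weight computed later can be mapped back to a specific cell of the image grid.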

2.1 Initializing the Cell State

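The output of the encoder's last layer must be converted into a hidden vector that serves as the initial value of the decoder's cell state. This is done with a fully connected layer: the encoder features are averaged over all n regions and then transformed with the learned weights W and bias b: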

$h_{0}=\tanh\left(W\cdot\left(\frac{1}{n}\sum_{i=1}^{n}e_{i}\right)+b\right)$

The code is in /model/componets/attention_mechanism.py, lines 146-155.

def initial_state(self, name, dim):
    """Returns initial state of dimension specified by dim"""
    with tf.variable_scope(self._scope_name):
        img_mean = tf.reduce_mean(self._img, axis=1)  # [N, C]: self._img was reshaped to [N, H*W, C]; average over the H*W regions
        W = tf.get_variable("W_{}_0".format(name), shape=[self._n_channels, dim])  # (C, dim)
        b = tf.get_variable("b_{}_0".format(name), shape=[dim])
        h = tf.tanh(tf.matmul(img_mean, W) + b)  # [N, dim]
    return h
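For readers without TensorFlow 1.x at hand, the same computation of $h_{0}$ can be sketched in NumPy. The function and its random inputs are illustrative assumptions, not the repository's code; note that, as in the TF version, averaging over axis 1 without keepdims yields [N, C], so $h_{0}$ has shape [N, dim].

```python
import numpy as np

def initial_state(e, W, b):
    """h0 = tanh(W . mean_i(e_i) + b) for a batch of encoder features.

    e: [N, n, C] flattened encoder features
    W: [C, dim], b: [dim]
    returns: [N, dim]
    """
    e_mean = e.mean(axis=1)          # [N, C]: average over the n regions
    return np.tanh(e_mean @ W + b)   # [N, dim]

rng = np.random.default_rng(0)
N, n, C, dim = 2, 24, 512, 256
e = rng.standard_normal((N, n, C))
W = rng.standard_normal((C, dim)) * 0.01   # small illustrative weights
b = np.zeros(dim)
h0 = initial_state(e, W, b)
print(h0.shape)  # (2, 256)
```

Averaging before the projection gives the decoder a summary of the whole image independent of its size, which is why a single (C, dim) matrix suffices.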

2.2 Attention Mechanism

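The attention mechanism computes the context vector. Attention was introduced to address the information loss that occurs when a long sequence is compressed into a single vector: all of the encoder's hidden states are kept, a relevance score is computed between each of them and the decoder's hidden state, and the weighted sum of the encoder states gives the context vector. The mechanism operates on the decoder side: at every step the decoder looks over the entire input and decides which parts (here, which image regions) matter most. The computation is: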

$\alpha_{t'}=\beta^{T}\tanh\left(W_{1}\cdot e_{t'}+W_{2}\cdot h_{t}\right)$: a score computed from the decoder hidden state and each encoder feature, from which the weights are derived;
$\tilde{\alpha}_{t'}=\mathrm{softmax}(\alpha)_{t'}=\frac{\exp(\alpha_{t'})}{\sum_{k=1}^{n}\exp(\alpha_{k})}$: the softmax turns the scores into one weight per encoder feature;
$c_{t}=\sum_{t'=1}^{n}\tilde{\alpha}_{t'}e_{t'}$: the context vector is the weighted average of the encoder outputs.
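The three equations can be traced end to end for a single example with a small NumPy sketch. This is an illustration under assumptions: the names and sizes are hypothetical, and vectors are treated as rows, so $W_{1}\cdot e_{t'}$ is written `e @ W1` with W1 stored as [C, dim_e].

```python
import numpy as np

def additive_attention(e, h, W1, W2, beta):
    """One attention step for a single example.

    e:  [n, C] encoder features e_1..e_n
    h:  [d]    decoder hidden state h_t
    W1: [C, dim_e], W2: [d, dim_e], beta: [dim_e]
    returns (weights [n], context [C])
    """
    scores = np.tanh(e @ W1 + h @ W2) @ beta          # alpha_{t'} for every region: [n]
    scores = scores - scores.max()                    # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the n regions
    context = weights @ e                             # c_t = sum_{t'} weight_{t'} * e_{t'}: [C]
    return weights, context

rng = np.random.default_rng(0)
n, C, d, dim_e = 24, 512, 512, 256
e = rng.standard_normal((n, C))
h = rng.standard_normal(d)
W1 = rng.standard_normal((C, dim_e)) * 0.01
W2 = rng.standard_normal((d, dim_e)) * 0.01
beta = rng.standard_normal(dim_e)
weights, context = additive_attention(e, h, W1, W2, beta)
```

Subtracting the maximum score before exponentiating does not change the softmax result but avoids overflow for large scores.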

The code is in /model/componets/attention_mechanism.py; the steps are implemented as follows:

Step 1: project the input feature map with a fully connected layer (the $W_{1}\cdot e_{t'}$ term), line 43:

# Fully connected layer without bias (the W_1 e term): [N, H*W, C] -> [N, H*W, dim_e], dim_e = 256 here
self._att_img = tf.layers.dense(inputs=self._img, units=self._dim_e, use_bias=False, name="att_img")

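Step 2: project the decoder hidden state (the $W_{2}\cdot h_{t}$ term), add the two terms, and convert the resulting per-region scores into attention weights: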

def compute_attention(self, h, att_img):
    # computes attention over the hidden vector
    att_h = tf.layers.dense(inputs=h, units=self._dim_e, use_bias=False) #[tiles*batch,256]
    att_h = tf.expand_dims(att_h, axis=1) #[tiles*batch,1,256]

    # sums the two contributions
    att = tf.tanh(att_img + att_h) #[tiles*batch,H*W,256]

    # computes scalar product with beta vector
    # works faster with a matmul than with a * and a tf.reduce_sum
    att_beta = tf.get_variable("att_beta", shape=[self._dim_e, 1], dtype=tf.float32)
    att_flat = tf.reshape(att, shape=[-1, self._dim_e])  # flatten: [tiles*batch*H*W, 256]

    e = tf.matmul(att_flat, att_beta)   #[tiles*batch*H*W,1]
    e = tf.reshape(e, shape=[-1, self._n_regions])  # (tiles*batch, H*W)

    # compute weights
    return tf.nn.softmax(e)
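The comment in `compute_attention` says the product with $\beta$ is done as reshape plus matmul rather than an elementwise multiply followed by `tf.reduce_sum`. The two are mathematically identical; a small NumPy check with illustrative sizes makes that concrete. The speed claim is the original author's and is not measured here.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_regions, dim_e = 3, 24, 256
att = np.tanh(rng.standard_normal((batch, n_regions, dim_e)))  # [batch, H*W, dim_e]
att_beta = rng.standard_normal((dim_e, 1))                      # [dim_e, 1]

# Variant used in the code: flatten, matmul with beta, reshape back
e_matmul = (att.reshape(-1, dim_e) @ att_beta).reshape(-1, n_regions)  # [batch, H*W]

# Alternative: broadcast multiply with beta, then sum over the last axis
e_sum = (att * att_beta[:, 0]).sum(axis=-1)                             # [batch, H*W]

assert np.allclose(e_matmul, e_sum)

# Softmax over the regions (last axis), as tf.nn.softmax does by default
weights = np.exp(e_matmul - e_matmul.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
```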

Step 3: compute the context vector:

def context(self, h):
    """Computes attention.
    This is the core of the attention mechanism.

    Args:
        h: (batch_size, num_units) hidden state
    Returns:
        c: (batch_size, channels) context vector

    """
    with tf.variable_scope(self._scope_name):
        # 1. Prepare img and att_img; for beam search, replicate them once per beam
        if self._tiles > 1: # self._tiles == config.beam_size
            att_img = tf.expand_dims(self._att_img, axis=1)
            att_img = tf.tile(att_img, multiples=[1, self._tiles, 1, 1])
            att_img = tf.reshape(att_img, shape=[-1, self._n_regions, self._dim_e]) # (tiles*batch, H*W, 256)
            img = tf.expand_dims(self._img, axis=1)  # add an axis for beam search
            img = tf.tile(img, multiples=[1, self._tiles, 1, 1])  # copy beam_size identical replicas along the new axis
            img = tf.reshape(img, shape=[-1, self._n_regions, self._n_channels]) # (tiles*batch, H*W, 512)
        else:
            att_img = self._att_img  # (tiles*batch, H*W, 256)
            img     = self._img      # (tiles*batch, H*W, 512)

        a = self.compute_attention(h, att_img) #[tiles*batch, H*W]
        a = self.insert_visualize_op(a)

        a = tf.expand_dims(a, axis=-1) #[tiles*batch, H*W, 1]
        c = tf.reduce_sum(a * img, axis=1)  # weight the original img features by attention; attended regions contribute more: [tiles*batch, 512]
        return c
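The expand_dims / tile / reshape sequence in `context` copies each image's features `tiles` (= beam size) times so that they line up with decoder states laid out as [batch*tiles, ...]. The NumPy sketch below, with illustrative sizes, shows the resulting order: all beams of the first image, then all beams of the second, and so on, which is the same layout `np.repeat` produces along axis 0.

```python
import numpy as np

batch, tiles, n_regions, channels = 2, 3, 4, 5
img = np.arange(batch * n_regions * channels).reshape(batch, n_regions, channels)

# Same steps as the TF code: add an axis, tile along it, merge it into the batch axis
tiled = np.expand_dims(img, axis=1)              # [batch, 1, H*W, C]
tiled = np.tile(tiled, (1, tiles, 1, 1))         # [batch, tiles, H*W, C]
tiled = tiled.reshape(-1, n_regions, channels)   # [batch*tiles, H*W, C]

# Row k belongs to image k // tiles
for k in range(batch * tiles):
    assert np.array_equal(tiled[k], img[k // tiles])

assert np.array_equal(tiled, np.repeat(img, tiles, axis=0))
```

Tiling along a new axis 1 (rather than tiling the batch axis directly) is what keeps the beams of one image adjacent; tiling axis 0 would instead interleave images.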

2.3 Decoding

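With the pieces above in place, we reach the central question: what does decoding actually look like? As in a standard seq2seq network, at step $t$ the decoder receives the embedding of the previous token together with the previous output $o_{t-1}$, plus the previous hidden state $h_{t-1}$, and produces the current hidden state $h_{t}$. From $h_{t}$ and the encoder outputs $[e_{1},\dots,e_{n}]$ we compute the context vector $c_{t}$; $c_{t}$ and $h_{t}$ are then concatenated and passed through a linear layer with tanh to give $o_{t}$. Finally $o_{t}$ is scored to obtain the output probabilities $p_{t}$. The equations are: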

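$h_{t}=\mathrm{LSTM}\left(h_{t-1},\left[w_{t-1},o_{t-1}\right]\right)$: the decoder input is the word embedding $w_{t-1}$ of the previous target token concatenated with the previous output $o_{t-1}$, together with the hidden state from the previous time step;
$c_{t}=\mathrm{Attention}\left([e_{1},\dots,e_{n}],h_{t}\right)$: compute the context vector;
$o_{t}=\tanh\left(W_{3}\cdot\left[h_{t},c_{t}\right]\right)$: concatenate the context vector with the decoder hidden state and project it;
$p_{t}=\mathrm{softmax}\left(W_{4}\cdot o_{t}\right)$: compute the final output probabilities.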

The implementation is in /model/componets/attention_cell.py, lines 59-90.

def step(self, embedding, attn_cell_state):
    """
    Args:
        embedding: shape = (batch_size, dim_embeddings) embeddings from previous time step
        attn_cell_state: (AttentionState) state from previous time step

    """
    prev_cell_state, o = attn_cell_state  # state from the previous step: LSTM state and previous output o_{t-1}

    scope = tf.get_variable_scope()
    with tf.variable_scope(scope):
        # compute new h
        x = tf.concat([embedding, o], axis=-1)  # [batch, dim_embeddings + dim_o]
        new_h, new_cell_state = self._cell(x, prev_cell_state)
        new_h = tf.nn.dropout(new_h, self._dropout) #[batch,512]

        # compute attention
        c = self._attention_mechanism.context(new_h) #[batch,512]

        # compute o
        o_W_c = tf.get_variable("o_W_c", dtype=tf.float32, shape=(self._n_channels, self._dim_o)) #[512,256]
        o_W_h = tf.get_variable("o_W_h", dtype=tf.float32, shape=(self._num_units, self._dim_o)) #[512,256]
        y_W_o = tf.get_variable("y_W_o", dtype=tf.float32, shape=(self._dim_o, self._num_proj)) #[256,301]

        new_o = tf.tanh(tf.matmul(new_h, o_W_h) + tf.matmul(c, o_W_c)) #[batch,256]
        new_o = tf.nn.dropout(new_o, self._dropout)
        logits = tf.matmul(new_o, y_W_o) #[N,301]

        # new Attn cell state
        new_state = AttentionState(new_cell_state, new_o)

        return logits, new_state
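In `step`, the matrix $W_{3}$ of $o_{t}=\tanh(W_{3}\cdot[h_{t},c_{t}])$ is stored as two blocks, `o_W_h` and `o_W_c`, with no bias. The NumPy check below shows that `h @ o_W_h + c @ o_W_c` equals one matmul of the concatenation $[h_{t},c_{t}]$ with the stacked matrix. Sizes follow the shape comments in the code, dropout is omitted, and the random values are illustrative. `step` returns only logits, so the softmax for $p_{t}$ presumably happens later (for example in the loss); it is added here only to complete the equation.

```python
import numpy as np

rng = np.random.default_rng(2)
batch, num_units, n_channels, dim_o, vocab = 4, 512, 512, 256, 301
h = rng.standard_normal((batch, num_units))   # new_h
c = rng.standard_normal((batch, n_channels))  # context vector
o_W_h = rng.standard_normal((num_units, dim_o)) * 0.01
o_W_c = rng.standard_normal((n_channels, dim_o)) * 0.01
y_W_o = rng.standard_normal((dim_o, vocab)) * 0.01

# Two separate matmuls, as in step()
o_split = np.tanh(h @ o_W_h + c @ o_W_c)

# One matmul on the concatenation [h_t, c_t] with W3 = [o_W_h; o_W_c] stacked row-wise
W3 = np.concatenate([o_W_h, o_W_c], axis=0)               # [num_units + n_channels, dim_o]
o_concat = np.tanh(np.concatenate([h, c], axis=-1) @ W3)
assert np.allclose(o_split, o_concat)

# p_t = softmax(W4 . o_t), with y_W_o playing the role of W4
logits = o_split @ y_W_o                                   # [batch, vocab]
p = np.exp(logits - logits.max(axis=-1, keepdims=True))
p /= p.sum(axis=-1, keepdims=True)
```

Keeping the two blocks separate avoids materializing the concatenated vector at every step while computing exactly the same $o_{t}$.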

The training code that uses this cell is in /models/decoder.py, lines 49-58.

# training
with tf.variable_scope("AttentionCell", reuse=False):
    embeddings = get_embeddings(formula, embedding_table, dim_embeddings,
                                start_token, batch_size)  # (N, T, dim_embedding)
    attn_meca = AttentionMechanism(img, dim_e)
    recu_cell = LSTMCell(num_units)

    attn_cell = AttentionCell(recu_cell, attn_meca, dropout, self._config.attn_cell_config, self._n_tok)

    train_outputs, _ = tf.nn.dynamic_rnn(attn_cell, embeddings, initial_state=attn_cell.initial_state())
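`get_embeddings` is not shown in this section. Judging from its `start_token` argument, the (N, T, dim_embedding) shape comment, and the LSTM equation above (step $t$ consumes the embedding of token $t-1$), it presumably builds teacher-forcing inputs: the start-token embedding followed by the ground-truth token embeddings shifted right by one. The hypothetical `embed_inputs` below sketches that assumption in NumPy; it is not the repository's function.

```python
import numpy as np

def embed_inputs(formula, embedding_table, start_token):
    """Hypothetical teacher-forcing input builder (an assumption, not get_embeddings itself).

    formula:         [N, T] int token ids of the ground-truth formulas
    embedding_table: [vocab, dim] embedding matrix
    start_token:     [dim] embedding of the start symbol
    returns:         [N, T, dim]; step t sees the embedding of token t-1
    """
    N = formula.shape[0]
    tok_emb = embedding_table[formula[:, :-1]]                           # drop the last token: [N, T-1, dim]
    start = np.broadcast_to(start_token, (N, 1, start_token.shape[0]))  # [N, 1, dim]
    return np.concatenate([start, tok_emb], axis=1)                      # [N, T, dim]

vocab, dim = 10, 4
table = np.arange(vocab * dim, dtype=float).reshape(vocab, dim)
start = np.full(dim, -1.0)
formula = np.array([[3, 5, 7], [1, 2, 9]])
inputs = embed_inputs(formula, table, start)
```

Under this assumption the RNN output at step $t$ is trained to predict token $t$ of `formula`. At inference there is no ground truth, so the previous prediction is fed back instead; this sketch does not cover that path.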

