TextRNN and Its Combinations with Other Models

Author: 笑给我看

Category: nlp

Article link: https://blog.csdn.net/qq_41610436/article/details/86664301

First, a quick review of the basic RNN structure:
[Figure: basic RNN structure]
Now to the main topic.

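1. LSTM Model

LSTM (Long Short-Term Memory) is a type of recurrent neural network suited to processing and predicting important events that are separated by relatively long intervals and delays in a time series.
Put simply, information worth remembering keeps being passed along, while information not worth remembering is "forgotten".

The LSTM "memory cell" is somewhat more complex than the basic RNN unit:
[Figure: LSTM memory cell]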

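1.1 Cell State

The cell state works like a conveyor belt. It runs straight along the entire chain with only a few minor linear interactions, so information can flow along it largely unchanged.
[Figure: cell state]

How the LSTM controls the cell state:

Gates let information through selectively, removing information from or adding it to the cell state.
A gate consists of a sigmoid neural layer and a pointwise multiplication.
The sigmoid layer outputs values between 0 and 1 that describe how much of each component is let through: 0 means "let nothing through" and 1 means "let everything through".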

[Figure: gate structure]
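The gate described above (a sigmoid layer followed by a pointwise multiplication) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `gate`, the weight shapes, and the example values are assumptions, not something taken from the original figures.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(cell_state, gate_input, W, b):
    """Scale each component of cell_state by a sigmoid weight in (0, 1)."""
    g = sigmoid(gate_input @ W + b)   # one weight per cell-state component
    return g * cell_state             # pointwise multiplication

rng = np.random.default_rng(0)
cell_state = np.array([1.0, -2.0, 3.0])
gate_input = rng.normal(size=4)       # hypothetical input to the gate layer
W = rng.normal(size=(4, 3))
b = np.zeros(3)

gated = gate(cell_state, gate_input, W, b)
```

With a very large positive bias the gate output approaches 1 and the cell state passes through almost unchanged; with a very large negative bias it approaches 0 and almost nothing passes.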

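1.2 Forget Gate

The forget gate, as the name suggests, controls forgetting: in an LSTM it decides, with a weight between 0 and 1, how much of the cell state from the previous time step to forget. Its substructure is shown below:
[Figure: forget gate]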

1.3 Input Gate

The input gate handles the input at the current sequence position. Its substructure is shown below:
[Figure: input gate]

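1.4 Updating the Cell State

Before looking at the LSTM output gate, we first need to look at the cell state. The results of the forget gate and the input gate both act on the cell state C(t). Here is how the cell state goes from C(t−1) to C(t):
[Figure: cell state update]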

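1.5 Output Gate

With the new cell state C(t), we can turn to the output gate. Its substructure is shown below:
[Figure: output gate]

Putting it all together, the overall LSTM structure looks like this:
[Figure: overall LSTM structure]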
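The four steps above (forget gate, input gate, cell update, output gate) can be sketched as a single time step in NumPy. This is a minimal sketch of the commonly used LSTM formulation, not the code behind the figures; the names `lstm_step` and `init_params`, the parameter dictionary layout, and the sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step on a single example (no batch dimension)."""
    z = np.concatenate([h_prev, x_t])                        # [h(t-1), x(t)]
    f_t = sigmoid(z @ params["W_f"] + params["b_f"])         # forget gate
    i_t = sigmoid(z @ params["W_i"] + params["b_i"])         # input gate
    c_tilde = np.tanh(z @ params["W_c"] + params["b_c"])     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                       # C(t-1) -> C(t)
    o_t = sigmoid(z @ params["W_o"] + params["b_o"])         # output gate
    h_t = o_t * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params["W_" + name] = rng.normal(scale=0.1, size=(hidden_size + input_size, hidden_size))
        params["b_" + name] = np.zeros(hidden_size)
    return params

params = init_params(input_size=4, hidden_size=3)
h, c = np.zeros(3), np.zeros(3)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):    # a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, params)
```

Note that the cell state `c` is only touched by a pointwise multiplication and an addition, which is the "conveyor belt" behavior described in section 1.1.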

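1.6 LSTM Variants

Compared with the standard LSTM, one variant adds peephole connections, which let the gate layers also take the cell state as input.
Another variant uses coupled forget and input gates: instead of deciding separately what to forget and what to add, the two decisions are made together.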

[Figure: LSTM variants]
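For reference, the two variants are usually written as follows. The notation here is an assumption, since the original figures are not reproduced: \sigma is the sigmoid function, \odot is pointwise multiplication, \tilde{C}_t is the candidate cell state, and [\cdot,\cdot] denotes concatenation.

```latex
% Peephole connections: the gates also see the cell state
f_t = \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right) \\
i_t = \sigma\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right) \\
o_t = \sigma\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right)

% Coupled forget and input gates: new information is added only where old information is forgotten
C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t
```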

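2. GRU

Gated Recurrent Unit (GRU), proposed in 2014.

It merges the forget gate and the input gate into a single update gate.
It merges the cell state and the hidden state.
It is simpler than the standard LSTM.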
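The three points above can be made concrete with one GRU time step in NumPy. This is a hedged sketch of a common GRU formulation with biases omitted for brevity; the function name `gru_step` and the weight names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step on a single example (biases omitted)."""
    z_in = np.concatenate([h_prev, x_t])
    z_t = sigmoid(z_in @ W_z)                                     # update gate
    r_t = sigmoid(z_in @ W_r)                                     # reset gate
    h_tilde = np.tanh(np.concatenate([r_t * h_prev, x_t]) @ W_h)  # candidate state
    # A single gate decides both how much old state to keep and how much new state to add
    return (1.0 - z_t) * h_prev + z_t * h_tilde

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_z, W_r, W_h = (rng.normal(scale=0.1, size=(hidden_size + input_size, hidden_size)) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):   # a 5-step toy sequence
    h = gru_step(x_t, h, W_z, W_r, W_h)
```

There is no separate cell state here: `h` plays both roles, which is why the GRU has fewer parameters than an LSTM of the same hidden size.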

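3. TextRNN

Although TextCNN performs well on many tasks, the biggest limitation of a CNN is the fixed receptive field set by filter_size: it cannot model longer-range sequence information, and tuning the filter_size hyperparameter is tedious. A CNN essentially extracts feature representations of text, whereas the recurrent neural network (RNN) is more commonly used in natural language processing and captures context better. For text classification specifically, a bi-directional RNN (in practice a bi-directional LSTM) can in a sense be understood as capturing variable-length, bidirectional "n-gram" information.

The bi-directional LSTM is something of a standard network in natural language processing and is used in many settings such as sequence labeling, named entity recognition, and seq2seq models. The figure below shows how a Bi-LSTM is used for classification: the yellow nodes are the outputs of the forward and backward RNNs, and in this example the result at the last word is fed directly into a fully connected layer with softmax output.

[Figure: Bi-LSTM network for classification]
LSTMs can also be stacked into a multi-layer structure.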
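The bi-directional classification scheme described above (run the sequence in both directions, concatenate the two final states, then apply a fully connected softmax layer) can be sketched in NumPy. To keep it short, this sketch assumes plain tanh RNN cells as a stand-in for LSTM cells; `run_rnn`, the sizes, and the random weights are all hypothetical.

```python
import numpy as np

def run_rnn(inputs, W_x, W_h):
    """Run a plain tanh RNN over inputs [seq_len, input_size]; return all hidden states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states)                                # [seq_len, hidden_size]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
seq_len, embed_size, hidden_size, num_classes = 6, 8, 5, 3
embedded = rng.normal(size=(seq_len, embed_size))          # stand-in for word embeddings

def make_weights():
    return (rng.normal(scale=0.1, size=(embed_size, hidden_size)),
            rng.normal(scale=0.1, size=(hidden_size, hidden_size)))

fw_states = run_rnn(embedded, *make_weights())
bw_states = run_rnn(embedded[::-1], *make_weights())[::-1]  # re-aligned to word order

# The forward pass finishes at the last word, the backward pass at the first word
sentence_vec = np.concatenate([fw_states[-1], bw_states[0]])
W_out = rng.normal(scale=0.1, size=(2 * hidden_size, num_classes))
probs = softmax(sentence_vec @ W_out)
```

Each position in `fw_states` and `bw_states` summarizes a variable-length prefix or suffix of the sentence, which is the sense in which the model captures variable-length, bidirectional "n-gram" information.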

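3. Combining and Extending Neural Networks

3.1 TextRNN + Attention

Although CNNs and RNNs work well for text classification, both share a shortcoming: they are not very intuitive and are hard to interpret, especially when analyzing bad cases. The attention mechanism is a widely used way of modeling long-term memory in natural language processing; it shows directly how much each word contributes to the result and has become practically standard in Seq2Seq models. Text classification can in a sense be seen as a special kind of Seq2Seq, so it is natural to bring attention in, and a look at the literature shows that similar approaches do exist.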

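3.1.1 Introduction to the Attention Mechanism

A detailed introduction to attention would take a short article of its own; interested readers can refer to the 2014 paper "Neural Machine Translation by Jointly Learning to Align and Translate".
[Figure: attention mechanism]
The core point of attention is that a different context is used when translating each target word (or when predicting the category of a product title), which is clearly a more reasonable design.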
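For classification, one simple way to apply this idea is to score every RNN hidden state, normalize the scores with softmax, and use the weighted sum as the sentence vector. The sketch below is an assumption about how such a layer could look, not the mechanism from the cited paper (which attends from a decoder state); `attention_pool`, `W`, `b`, and the context vector `u` are hypothetical names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(hidden_states, W, b, u):
    """Weight each time step by a learned relevance score and sum."""
    projected = np.tanh(hidden_states @ W + b)   # [seq_len, attn_size]
    scores = projected @ u                       # [seq_len], one score per word
    weights = softmax(scores)                    # contribution of each word
    context = weights @ hidden_states            # [hidden_size] sentence vector
    return context, weights

rng = np.random.default_rng(0)
seq_len, hidden_size, attn_size = 6, 10, 4
hidden_states = rng.normal(size=(seq_len, hidden_size))   # stand-in for Bi-LSTM outputs
W = rng.normal(scale=0.1, size=(hidden_size, attn_size))
b = np.zeros(attn_size)
u = rng.normal(size=attn_size)

context, weights = attention_pool(hidden_states, W, b, u)
```

The `weights` vector is what makes the model easier to inspect: it can be printed next to the words of a bad case to see which words drove the prediction.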

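3.2 TextRCNN (TextRNN + CNN)

[Figure: TextRCNN structure]
A forward RNN and a backward RNN produce the forward and backward context representation of each word.

The representation of each word then becomes the concatenation of its word vector with its forward and backward context vectors.

Finally, the same convolution and pooling layers as in TextCNN are applied. The only difference is that the convolution layer needs only filter_size = 1; a larger filter_size is no longer required to obtain a wider receptive field. The word representation here can also use just the bi-directional RNN output.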
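The TextRCNN pipeline described above can be sketched in NumPy as follows. This is a simplified sketch under stated assumptions: the context of the first (or last) word is set to zeros rather than learned, a convolution with filter_size = 1 is written as a per-position dense layer, and all names (`context_scan`, `W_conv`, the sizes) are hypothetical.

```python
import numpy as np

def context_scan(embedded, W_c, W_e):
    """Context of word i built only from the words before it in scan order."""
    seq_len = embedded.shape[0]
    ctx = np.zeros((seq_len, W_c.shape[0]))      # the first word has an all-zero context
    for i in range(1, seq_len):
        ctx[i] = np.tanh(ctx[i - 1] @ W_c + embedded[i - 1] @ W_e)
    return ctx

rng = np.random.default_rng(0)
seq_len, embed_size, ctx_size, num_filters = 7, 8, 5, 6
embedded = rng.normal(size=(seq_len, embed_size))   # stand-in for word embeddings

def make_weights():
    return (rng.normal(scale=0.1, size=(ctx_size, ctx_size)),
            rng.normal(scale=0.1, size=(embed_size, ctx_size)))

left = context_scan(embedded, *make_weights())                # forward context
right = context_scan(embedded[::-1], *make_weights())[::-1]   # backward context

# Word representation: [left context; word vector; right context]
x = np.concatenate([left, embedded, right], axis=1)

# filter_size = 1 convolution == the same dense layer applied at every position
W_conv = rng.normal(scale=0.1, size=(x.shape[1], num_filters))
b_conv = np.zeros(num_filters)
feature_maps = np.tanh(x @ W_conv + b_conv)       # [seq_len, num_filters]
pooled = feature_maps.max(axis=0)                 # max pooling over time
```

Because each row of `x` already carries context from the whole sentence, a filter that looks at one position at a time is enough.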

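In fact, whatever the network type, in practice they are often mixed; for example, both CNNs and RNNs usually have a fully connected layer before the final output, so it is hard to say which category a given network belongs to. As deep learning remains popular, more flexible combinations and more network structures will likely be developed.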

4. Building the Network: Bi-LSTM -> LSTM -> FC

# Written for TensorFlow 1.x (tf.contrib is not available in TensorFlow 2)
import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

class TextRNN:
    def __init__(self,num_classes,learn_rate,batch_size,sequence_length,
                 vocab_size,embed_size,is_training,initializer=tf.random_normal_initializer(stddev=0.1)):   
        '''Initialize hyperparameters'''
        self.num_classes = num_classes
        self.batch_size = batch_size
        self.sequence_length = sequence_length
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.hidden_size =embed_size
        self.is_training = is_training
        self.learning_rate = learn_rate
        self.initializer = initializer
        self.num_sampled = 20

        # add placeholders
        self.input_x = tf.placeholder(tf.int32,[None,self.sequence_length],name='input_x')
        self.input_y =tf.placeholder(tf.int32,[None],name='input_y') # sparse_softmax_cross_entropy_with_logits takes integer class ids, so a 1-D array is enough here (no [None,num_classes] one-hot needed)
        self.dropout_keep_prob = tf.placeholder(tf.float32,name='dropout_keep_prob')

        self.global_step = tf.Variable(0,trainable=False,name='Global_Step')
        self.epoch_step = tf.Variable(0,trainable=False,name='Epoch_Step')
        self.epoch_increament = tf.assign(self.epoch_step,tf.add(self.epoch_step,tf.constant(1)))

        self.instantiate_weights()
        self.logits = self.inference() # [None, self.num_classes]  todo     main!
        self.predictions = tf.argmax(self.logits,1,name='predictions') # shape:[None,]
        correct_prediction = tf.equal(tf.cast(self.predictions,tf.int32),self.input_y)
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name='Accuracy') # shape = ()
        if not is_training:
            return  # predictions and accuracy stay available at inference time; loss and train_op are training-only
        self.loss_val = self.loss() # ==> self.loss_nec()
        self.train_op = self.train()

    def instantiate_weights(self):
        '''Define all the weights'''
        with tf.name_scope('embedding'):    #embedding matrix
            self.Embedding = tf.get_variable('Embedding',shape=[self.vocab_size,self.embed_size],initializer=self.initializer) # [vocab_size,embed_size] tf.random_uniform([self.vocab_size,self.embed_size],-1.0,1.0)
            self.W_projection = tf.get_variable('W_projection',shape=[self.hidden_size * 2, self.num_classes],initializer=self.initializer) # [hidden_size * 2,num_classes]
            self.b_projection = tf.get_variable('b_projection',shape=[self.num_classes])

    def inference(self):
        '''Main computation graph: 1. embedding layer 2. Bi-LSTM ==> dropout 3. LSTM layer ==> dropout 4. FC layer 5. softmax layer'''
        # 1. get embedding of words in the sentence
        self.embedded_words = tf.nn.embedding_lookup(self.Embedding,self.input_x) # shape:[None,sentence_length,embed_size]

        # 2. Bi-LSTM layer
        # define lstm cell :get lstm cell output
        lstm_fw_cell = rnn.BasicLSTMCell(self.hidden_size) # forward direction cell
        lstm_bw_cell = rnn.BasicLSTMCell(self.hidden_size) # backward direction cell

        if self.dropout_keep_prob is not None:
            lstm_fw_cell = rnn.DropoutWrapper(lstm_fw_cell,output_keep_prob=self.dropout_keep_prob)
            lstm_bw_cell = rnn.DropoutWrapper(lstm_bw_cell,output_keep_prob=self.dropout_keep_prob)
            # bidirectional_dynamic_rnn: input: [batch_size, max_time, input_size]
            #                            output: A tuple (outputs, output_states)
            #                                    where:outputs: A tuple (output_fw, output_bw) containing the forward and the backward rnn output `Tensor`.
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell,lstm_bw_cell,self.embedded_words,dtype=tf.float32) # build a bidirectional dynamic RNN; outputs is a tuple of two tensors, each [batch_size,sequence_length,hidden_size]
        print('outputs:===>',outputs) # outputs:(<tf.Tensor 'bidirectional_rnn/fw/fw/transpose:0' shape=(?, 5, 100) dtype=float32>, <tf.Tensor 'ReverseV2:0' shape=(?, 5, 100) dtype=float32>))
        output_rnn = tf.concat(outputs,axis=2) # [batch_size,sequence_length,hidden_size * 2] with time_major=False (time_major=True is slightly faster); concatenating along the last axis gives the * 2

        # 3. second LSTM layer
        rnn_cell = rnn.BasicLSTMCell(self.hidden_size * 2)
        if self.dropout_keep_prob is not None:
            rnn_cell = rnn.DropoutWrapper(rnn_cell,output_keep_prob=self.dropout_keep_prob)
        _, final_state_c_h = tf.nn.dynamic_rnn(rnn_cell,output_rnn,dtype=tf.float32)
        final_state = final_state_c_h[1]        # the first return value holds the per-step outputs, while the state is a (c, h) tuple; take h here ==>https://www.cnblogs.com/lovychen/p/9294624.html

        # 4. FC layer
        output = tf.layers.dense(final_state,self.hidden_size * 2 ,activation=tf.nn.tanh)

        # 5. logits(use linear layer)
        with tf.name_scope('output'):# inputs: a tensor of shape [batch_size,dim]
            logits = tf.matmul(output,self.W_projection) + self.b_projection # [batch_size,num_classes]
        return logits

    def loss(self,l2_lambda=1e-4):  # todo: look into how to set this parameter
        with tf.name_scope('loss'):
            #input: `logits` has shape `[batch_size, num_classes]`; `labels` has shape `[batch_size]` and holds integer class ids
            #output: A 1-D `Tensor` of length `batch_size` of the same type as `logits` with the softmax cross entropy loss.
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.input_y,logits=self.logits)
            # sparse_softmax_cross_entropy_with_logits takes integer class ids as labels,
            # e.g. 0, 1, 2, 3 (dtype must be int32 or int64),
            # whereas softmax_cross_entropy_with_logits takes one-hot (or probability) labels,
            # e.g. [1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1] (same float dtype as logits).
            # In effect the sparse version performs the one-hot step on the labels for you.
            loss = tf.reduce_mean(losses)
            l2_losses = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'bias' not in v.name]) * l2_lambda
            loss = loss + l2_losses
        return loss


    def train(self):
        '''Based on the loss, use the Adam optimizer to update the parameters'''
        train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss_val)
        return train_op


# test start
def test():
    num_classes = 10
    learning_rate = 1e-3
    batch_size = 8
    sequence_length = 5
    vocab_size = 10000
    embed_size = 100
    is_training = True
    dropout_keep_prob = 1.0
    # argument order must match TextRNN.__init__, which takes no decay_steps/decay_rate
    textRNN = TextRNN(num_classes,learning_rate,batch_size,sequence_length,vocab_size,embed_size,is_training)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(100):
            input_x = np.zeros((batch_size,sequence_length),dtype=np.int32)    # [batch_size,sequence_length] token ids
            input_y = np.array([3,0,5,1,7,2,3,0]) # [batch_size] integer class ids
            loss, acc, predict, _ = sess.run([textRNN.loss_val,textRNN.accuracy,textRNN.predictions,textRNN.train_op],feed_dict={textRNN.input_x:input_x,textRNN.input_y:input_y,textRNN.dropout_keep_prob:dropout_keep_prob})
            print('loss:{}  acc:{}  label:{} predict:{}'.format(loss,acc,input_y,predict))


if __name__ == '__main__':
    test()
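The relationship between the sparse and the one-hot form of softmax cross entropy mentioned in the loss function can be checked with a small NumPy sketch. This re-implements the math for illustration only and is not TensorFlow's code; the function names and the toy logits are assumptions.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def sparse_cross_entropy(logits, labels):
    """labels: integer class ids, shape [batch_size]."""
    return -log_softmax(logits)[np.arange(len(labels)), labels]

def dense_cross_entropy(logits, one_hot):
    """one_hot: float labels, shape [batch_size, num_classes]."""
    return -(one_hot * log_softmax(logits)).sum(axis=1)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]           # the one-hot step the sparse version does implicitly

sparse_losses = sparse_cross_entropy(logits, labels)
dense_losses = dense_cross_entropy(logits, one_hot)
```

Both calls return the same per-example losses, which is why `input_y` in the model above can stay a 1-D array of class ids.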
