Dynamic Memory Networks for Text Classification

weixin_43838622

Category: Natural Language Processing. Tags: nlp, text classification

Article link: https://blog.csdn.net/weixin_43838622/article/details/86577912

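Model

The model consists of four modules: question, answer, episodic memory, and input.
The input and question modules first compute vector representations of the input text and the question. Attention is then computed with respect to the question and used to select the inputs relevant to it. The episodic memory module iteratively updates its memory from the question and the input, and the final memory vector is passed to the answer module, which combines it with the question to produce the answer.

[Figure: example DMN run with 8 input sentences and the question "Where is the football?"]

Taking the figure above as an example, the input is 8 sentences and the question is "Where is the football?". The sections below walk through the computation module by module.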

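Input module

The input is encoded with a GRU. There are two cases:
1. When the input is a single sentence, it is fed directly into the GRU, which outputs one vector per word (as many vectors as the sentence has words); the attention that follows then selects the most relevant words.
2. When the input consists of several sentences, the sentences are joined with a special marker token and the hidden state at each marker position is output; the attention then selects the sentences most relevant to the question.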
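
The second case can be sketched in plain Python. This is only an illustration: the marker token `<eos>`, the helper names, and the stand-in hidden states are assumptions made here, not part of the original implementation, where the hidden states would come from the GRU encoder.

```python
from typing import List, Sequence

EOS = "<eos>"  # hypothetical end-of-sentence marker; the post does not name the token


def join_sentences(sentences: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate tokenized sentences, appending the marker after each one."""
    tokens: List[str] = []
    for sentence in sentences:
        tokens.extend(sentence)
        tokens.append(EOS)
    return tokens


def marker_positions(tokens: Sequence[str]) -> List[int]:
    """Indices of the marker tokens, one per sentence."""
    return [i for i, tok in enumerate(tokens) if tok == EOS]


def select_facts(hidden_states: Sequence[Sequence[float]],
                 positions: Sequence[int]) -> List[Sequence[float]]:
    """Keep only the encoder hidden states at the marker positions."""
    return [hidden_states[i] for i in positions]


sentences = [["John", "went", "home"], ["Mary", "took", "the", "ball"]]
tokens = join_sentences(sentences)
positions = marker_positions(tokens)
# Stand-in hidden states: one 2-d vector per token (a real model would use GRU outputs).
hidden_states = [[float(i), float(i) + 0.5] for i in range(len(tokens))]
facts = select_facts(hidden_states, positions)
```

Each sentence ends up represented by exactly one vector, which is what lets the later attention step score whole sentences instead of individual words.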

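Question module

The question is also encoded with a GRU. The difference is that only the final hidden state is needed here, whereas the input module needs all hidden states. Readers familiar with TensorFlow will know tf.nn.dynamic_rnn: it returns a pair (outputs, state), where outputs holds the hidden states at every time step and state is the final hidden state of the RNN.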
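
To make the "all hidden states versus final hidden state" distinction concrete, here is a minimal NumPy sketch of a GRU that returns the same kind of pair. It is not the TensorFlow implementation: the function names are made up for illustration, biases are omitted, and the story and question share one set of weights only to keep the example short.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_size, hidden_size, rng):
    """Random bias-free GRU weights; each matrix acts on the concatenation [x, h]."""
    shape = (input_size + hidden_size, hidden_size)
    return {name: rng.normal(scale=0.1, size=shape) for name in ("z", "r", "h")}


def gru_step(params, x, h):
    """One GRU step. x: [batch, input_size], h: [batch, hidden_size]."""
    xh = np.concatenate([x, h], axis=1)
    z = sigmoid(xh @ params["z"])                       # update gate
    r = sigmoid(xh @ params["r"])                       # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h], axis=1) @ params["h"])  # candidate
    return z * h_tilde + (1.0 - z) * h


def run_gru(params, inputs):
    """inputs: [batch, time, input_size] -> (outputs [batch, time, hidden], state [batch, hidden])."""
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, params["z"].shape[1]))
    outputs = []
    for t in range(steps):
        h = gru_step(params, inputs[:, t, :], h)
        outputs.append(h)
    return np.stack(outputs, axis=1), h


rng = np.random.default_rng(0)
params = init_gru(input_size=4, hidden_size=3, rng=rng)
story = rng.normal(size=(2, 5, 4))       # 2 examples, 5 time steps
question = rng.normal(size=(2, 6, 4))    # 2 examples, 6 time steps
facts, _ = run_gru(params, story)        # input module: keep every hidden state
_, q = run_gru(params, question)         # question module: keep only the last one
```

The last slice of the outputs is the same tensor as the returned state, which is why the question module can ignore the first return value.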

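Episodic memory module

This module has two parts: attention and memory update.
1. Attention: the attention gate is computed as in the figure below. At time step t of iteration i, its arguments are the input vector (the vector representations of the 8 sentences, which stay fixed across iterations), the memory from the previous iteration, and the question vector.
[Figure: attention gate formula]
Here the feature vector z(c, m, q) combines several vector similarity measures between the input, the memory, and the question; the features are concatenated and fed into a two-layer neural network that outputs the attention value.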
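
The formula image did not survive in this copy of the post. For reference, the gate in the original DMN paper has the form below, and the concatenation order matches the attention code later in this post. The symbols W^{(b)}, W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} follow the paper's notation and are not defined elsewhere in this post.

```latex
z(c_t, m^{i-1}, q) = \left[\, c_t,\; m^{i-1},\; q,\; c_t \circ q,\; c_t \circ m^{i-1},\; |c_t - q|,\; |c_t - m^{i-1}|,\; c_t^{\top} W^{(b)} q,\; c_t^{\top} W^{(b)} m^{i-1} \,\right]

g_t^i = G(c_t, m^{i-1}, q) = \sigma\!\left( W^{(2)} \tanh\!\left( W^{(1)} z(c_t, m^{i-1}, q) + b^{(1)} \right) + b^{(2)} \right)
```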

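2. Memory update: in iteration i the memory is computed as in the figure below. In iteration 0 the memory m is initialized to q. c_t, the input (8 sentence) vectors, stays the same in every iteration, and h_t is the hidden state. In each iteration the hidden states are computed step by step over the sentences, and the resulting hidden representation is fed into a GRU together with the memory vector from the previous iteration to update the memory vector m.
[Figure: memory update formulas]
In the example figure, the input is 8 sentences and the question is "Where is the football".
First iteration: attention is computed from q, the input, and m (initially q). The sentence most relevant to q is s₇, "John put down the football", which receives a high weight g, and the memory vector m is updated accordingly, for example by remembering "John".
Second iteration: attention is computed from q, the input, and m. Since m now holds "John" and is no longer equal to q, the most relevant sentences are s₂ and s₆, both of which contain "John". The memory vector is updated again and finally passed to the answer module.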
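
The two passes described above can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: `toy_gate` is a hypothetical stand-in for the real attention network, the GRU is bias-free, the hidden state is reset to zero at the start of each pass, and the weights are random, so the sketch shows the control flow and shapes only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_weights(rng, size):
    """Random bias-free GRU weights for inputs and states of the same size."""
    return {k: rng.normal(scale=0.1, size=(2 * size, size)) for k in ("z", "r", "h")}


def gru_step(W, x, h):
    """One bias-free GRU step. x, h: [batch, size]."""
    xh = np.concatenate([x, h], axis=1)
    z = sigmoid(xh @ W["z"])
    r = sigmoid(xh @ W["r"])
    h_tilde = np.tanh(np.concatenate([x, r * h], axis=1) @ W["h"])
    return z * h_tilde + (1.0 - z) * h


def toy_gate(c_t, m, q):
    """Hypothetical stand-in for the attention network: sigmoid of dot products."""
    score = np.sum(c_t * q, axis=1, keepdims=True) + np.sum(c_t * m, axis=1, keepdims=True)
    return sigmoid(score)                                   # [batch, 1]


def episodic_memory(facts, q, W_episode, W_memory, num_passes=2):
    """facts: [batch, T, size], q: [batch, size] -> final memory [batch, size]."""
    m = q.copy()                                            # m^0 = q
    for _ in range(num_passes):
        h = np.zeros_like(q)
        for t in range(facts.shape[1]):
            c_t = facts[:, t, :]
            g = toy_gate(c_t, m, q)                         # depends on the current memory
            h = g * gru_step(W_episode, c_t, h) + (1.0 - g) * h
        m = gru_step(W_memory, h, m)                        # m^i = GRU(e^i, m^{i-1}), e^i = last h
    return m


rng = np.random.default_rng(0)
size = 4
W_episode, W_memory = make_weights(rng, size), make_weights(rng, size)
facts = rng.normal(size=(3, 8, size))   # 3 examples, 8 sentence vectors each
q = rng.normal(size=(3, size))
m = episodic_memory(facts, q, W_episode, W_memory)
```

Because the gate is recomputed from the updated memory in every pass, the second pass can attend to different sentences than the first, which is the behaviour the "John" example describes.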

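Answer module

The answer is computed as shown below, with a₀ = m (m is the final memory produced by the episodic memory module).
[Figure: answer module formulas]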
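
The formula image is missing here as well. For reference, the answer module in the original DMN paper is a GRU decoder of the following form. The weight W^{(a)} and the bracket notation for concatenation come from the paper, and T_M (the number of memory passes) is notation assumed here.

```latex
a_0 = m^{T_M}

y_t = \mathrm{softmax}\!\left( W^{(a)} a_t \right)

a_t = \mathrm{GRU}\!\left( [\, y_{t-1},\; q \,],\; a_{t-1} \right)
```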

A simple DMN implementation

Gated unit for the memory update: this uses an attention-based GRU, which puts the gate g inside the GRU.

    def gated_gru(self,c_current,h_previous,g_current):
        """
        gated gru to get updated hidden state
        :param  c_current: [batch_size,embedding_size]
        :param  h_previous:[batch_size,hidden_size]
        :param  g_current: [batch_size,1]
        :return h_current: [batch_size,hidden_size]
        """
        # 1.compute candidate hidden state using GRU.
        h_candidate=self.gru_cell(c_current, h_previous,"gru_candidate_sentence") #[batch_size,hidden_size]
        # 2.combine candidate hidden state and previous hidden state using weight(a gate) to get updated hidden state.
        h_current=tf.multiply(g_current,h_candidate)+tf.multiply(1-g_current,h_previous) #[batch_size,hidden_size]
        return h_current

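This part follows the implementation in DMN+.

[Figure: attention-based GRU equations]
Only the last step differs from a standard GRU: the attention gate g takes the place of the update gate when the candidate state is combined with the previous state.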
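
For reference, the equations from the DMN+ paper are reproduced below in LaTeX, since the image is missing. The first four lines are the standard GRU; the last line is the attention-based variant, where the scalar gate g_i^t replaces the update gate u_i. How `self.gru_cell` in the code above is defined is not shown in this post, so whether it returns the candidate state or a full GRU output is left open here.

```latex
u_i = \sigma\!\left( W^{(u)} x_i + U^{(u)} h_{i-1} + b^{(u)} \right)

r_i = \sigma\!\left( W^{(r)} x_i + U^{(r)} h_{i-1} + b^{(r)} \right)

\tilde{h}_i = \tanh\!\left( W x_i + r_i \circ U h_{i-1} + b^{(h)} \right)

h_i = u_i \circ \tilde{h}_i + (1 - u_i) \circ h_{i-1}
\quad \text{(standard GRU)}

h_i = g_i^t \circ \tilde{h}_i + (1 - g_i^t) \circ h_{i-1}
\quad \text{(attention-based GRU)}
```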

Attention implementation


    def attention_mechanism_parallel(self,c_full,m,q,i):
        """ parallel implementation of the gate function given a list of candidate sentences, a query, and the previous memory.
        Input:
           c_full: candidate facts. shape:[batch_size,story_length,hidden_size]
           m: previous memory. shape:[batch_size,hidden_size]
           q: question. shape:[batch_size,hidden_size]
           i: index of the current pass; appended to the variable scope names of the bilinear terms
        Output: a scalar score per candidate (in batch). shape:[batch_size,story_length]
        """
        q=tf.expand_dims(q,axis=1) #[batch_size,1,hidden_size]
        m=tf.expand_dims(m,axis=1) #[batch_size,1,hidden_size]

        # 1.define a large feature vector that captures a variety of similarities between input,memory and question vector: z(c,m,q)
        c_q_elementwise=tf.multiply(c_full,q)          #[batch_size,story_length,hidden_size]
        c_m_elementwise=tf.multiply(c_full,m)          #[batch_size,story_length,hidden_size]
        c_q_minus=tf.abs(tf.subtract(c_full,q))        #[batch_size,story_length,hidden_size]
        c_m_minus=tf.abs(tf.subtract(c_full,m))        #[batch_size,story_length,hidden_size]
        # bilinear terms: c_transpose W q and c_transpose W m
        c_w_q=self.x1Wx2_parallel(c_full,q,"c_w_q"+str(i))   #[batch_size,story_length,hidden_size]
        c_w_m=self.x1Wx2_parallel(c_full,m,"c_w_m"+str(i))   #[batch_size,story_length,hidden_size]
        # tile q and m along the story axis so they can be concatenated with c_full
        q_tile=tf.tile(q,[1,self.story_length,1])     #[batch_size,story_length,hidden_size]
        m_tile=tf.tile(m,[1,self.story_length,1])     #[batch_size,story_length,hidden_size]
        z=tf.concat([c_full,m_tile,q_tile,c_q_elementwise,c_m_elementwise,c_q_minus,c_m_minus,c_w_q,c_w_m],2) #[batch_size,story_length,hidden_size*9]
        # 2. two-layer feed-forward network
        g=tf.layers.dense(z,self.hidden_size*3,activation=tf.nn.tanh)  #[batch_size,story_length,hidden_size*3]
        g=tf.layers.dense(g,1,activation=tf.nn.sigmoid)                #[batch_size,story_length,1]
        g=tf.squeeze(g,axis=2)                                         #[batch_size,story_length]
        return g

    def x1Wx2_parallel(self,x1,x2,scope):
        """
        :param x1: [batch_size,story_length,hidden_size]
        :param x2: [batch_size,1,hidden_size]
        :param scope: a string
        :return:  [batch_size,story_length,hidden_size]
        """
        with tf.variable_scope(scope):
            x1=tf.reshape(x1,shape=(self.batch_size,-1)) #[batch_size,story_length*hidden_size]
            x1_w=tf.layers.dense(x1,self.story_length*self.hidden_size,use_bias=False) #[batch_size,story_length*hidden_size]
            x1_w_expand=tf.expand_dims(x1_w,axis=2)     #[batch_size,story_length*self.hidden_size,1]
            x1_w_x2=tf.matmul(x1_w_expand,x2)           #[batch_size,story_length*self.hidden_size,hidden_size]
            x1_w_x2=tf.reshape(x1_w_x2,shape=(self.batch_size,self.story_length,self.hidden_size,self.hidden_size))
            x1_w_x2=tf.reduce_sum(x1_w_x2,axis=3)      #[batch_size,story_length,hidden_size]
            return x1_w_x2
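
In the DMN paper the bilinear terms c^T W q and c^T W m are scalars per sentence, whereas `x1Wx2_parallel` above returns a vector of size hidden_size per sentence. As a point of comparison, the scalar form can be sketched in NumPy with a single `einsum`. The function name `bilinear_scores` and the random inputs are illustrative assumptions, not part of the linked repository.

```python
import numpy as np


def bilinear_scores(c_full, W, x):
    """Scalar bilinear term c^T W x for every sentence.

    c_full: [batch, story_length, hidden], W: [hidden, hidden], x: [batch, hidden]
    returns: [batch, story_length]
    """
    return np.einsum("bth,hk,bk->bt", c_full, W, x)


rng = np.random.default_rng(0)
batch, story_length, hidden = 2, 5, 3
c_full = rng.normal(size=(batch, story_length, hidden))
W = rng.normal(size=(hidden, hidden))
q = rng.normal(size=(batch, hidden))
scores = bilinear_scores(c_full, W, q)

# Cross-check one entry against the explicit c^T W q product.
expected = c_full[1, 2] @ W @ q[1]
```

With scalar bilinear terms the feature vector z would have width hidden_size*7+2 instead of hidden_size*9, so the first dense layer would see a slightly narrower input; the two-layer network itself would stay the same.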

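The complete implementation is on GitHub: https://github.com/DLZWY/Machine-Learning-Checklist/tree/master/nlp/dmn
It implements DMN for multi-label document classification. In that setting the query and the input are the same, so the attention finds the words most relevant to the query.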
