TA-Seq2Seq model
The overall model architecture is shown below:
Figure 2 shows the topic-aware seq2seq model, which builds on standard seq2seq and incorporates topic information through a joint attention mechanism and a biased generation probability.
Obtaining topic words
We use the Twitter LDA model. Each input message x is assigned a topic z, and the n words in x's vocabulary most related to that topic (we take n = 100) form its topic word set K. The generation model is then trained on triples of the input message x, the topic words K, and the response y.
The LDA model parameters are estimated with the collapsed Gibbs sampling algorithm (Zhao et al. 2011). With the trained model we obtain the topic of x and select the n words with the highest probability as its topic words.
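As a sketch of this step (with a hypothetical count matrix and toy vocabulary, not the actual Twitter LDA implementation), the topic words of a message can be read off the word-topic assignment counts produced by Gibbs sampling:

```python
import numpy as np

# Hypothetical word-topic assignment counts from collapsed Gibbs sampling:
# C[w, z] = number of times word w was assigned to topic z.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(500)]          # toy vocabulary
C = rng.integers(0, 50, size=(len(vocab), 8))  # 500 words x 8 topics

def topic_words(z, n=100):
    """Top-n words for topic z, ranked by assignment count."""
    order = np.argsort(C[:, z])[::-1][:n]
    return [vocab[i] for i in order]

# Once the LDA model assigns an input message x its most probable topic z*,
# the topic word set K for x is the top-n word list of that topic.
K = topic_words(z=3, n=100)
print(len(K))  # 100
```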
During learning we need a vector representation for each topic word. We first compute a distribution over topics for topic word w:

p(z|w) = C_{wz} / Σ_{z'} C_{wz'}    (4)

where C_{wz} is the number of times w was assigned to topic z during training. This distribution over topics serves as the vector representation of the topic word.
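A minimal numpy illustration of this normalization, using toy counts rather than trained values:

```python
import numpy as np

# Toy assignment counts C[w, z]; rows are topic words, columns are topics.
C = np.array([[30.0, 5.0, 15.0],
              [2.0, 40.0, 8.0]])

# Normalizing each row over the topics turns the counts into the word's
# distribution over topics, which is used as its vector representation.
p = C / C.sum(axis=1, keepdims=True)
# p[0] == [0.6, 0.1, 0.3]; every row sums to 1
```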
In our experiments, we train the LDA model on a large-scale Sina Weibo corpus. The data provides both topic information and conversation pairs, allowing us to train the dialogue generation model.
Besides LDA, topic words could also be generated with tag recommendation or keyword extraction, or obtained from other resources such as Wikipedia and other web documents.
LDA model training:
We crawl 30 million posts from Sina Weibo to train the LDA model, setting the number of topics to T = 200 and the model parameters to α = 1/T and β = 0.01. For each topic we take the top 100 words as its topic words. To filter out overly common words, we compute word frequencies over the 30 million posts and remove the 2000 most frequent words, which yields the topic word dictionary; words outside this dictionary are treated as UNK.
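The frequency filtering can be sketched as follows (a toy corpus and a cutoff of k = 2 stand in for the 30 million posts and the top-2000 cutoff):

```python
from collections import Counter

# Toy corpus standing in for the 30M Weibo posts; the 2 most frequent
# words stand in for the post's top-2000 frequency cutoff.
corpus = ["the cat sat", "the dog sat", "the cat ran", "a dog ran"]
freq = Counter(w for line in corpus for w in line.split())
stop = {w for w, _ in freq.most_common(2)}   # most frequent words, filtered out
vocab = set(freq) - stop                     # remaining topic word dictionary

def lookup(w):
    """Map out-of-dictionary words to UNK, as described above."""
    return w if w in vocab else "UNK"
```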
seq2seq
(1) In the encoding stage, the input x is encoded with a bidirectional GRU, producing outputs {h_t}_{t=1}^{T}. The corresponding code is as follows:
class BidirectionalEncoder(Initializable):
    """Encoder of RNNsearch model."""
    def __init__(self, vocab_size, embedding_dim, state_dim, **kwargs):
        super(BidirectionalEncoder, self).__init__(**kwargs)
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.state_dim = state_dim

        self.lookup = LookupTable(name='embeddings')
        self.bidir = BidirectionalWMT15(
            GatedRecurrent(activation=Tanh(), dim=state_dim))
        self.fwd_fork = Fork(
            [name for name in self.bidir.prototype.apply.sequences
             if name != 'mask'], prototype=Linear(), name='fwd_fork')
        self.back_fork = Fork(
            [name for name in self.bidir.prototype.apply.sequences
             if name != 'mask'], prototype=Linear(), name='back_fork')
        self.children = [self.lookup, self.bidir,
                         self.fwd_fork, self.back_fork]

    def _push_allocation_config(self):
        self.lookup.length = self.vocab_size
        self.lookup.dim = self.embedding_dim

        self.fwd_fork.input_dim = self.embedding_dim
        self.fwd_fork.output_dims = [self.bidir.children[0].get_dim(name)
                                     for name in self.fwd_fork.output_names]
        self.back_fork.input_dim = self.embedding_dim
        self.back_fork.output_dims = [self.bidir.children[1].get_dim(name)
                                      for name in self.back_fork.output_names]

    @application(inputs=['source_sentence', 'source_sentence_mask'],
                 outputs=['representation'])
    def apply(self, source_sentence, source_sentence_mask):
        # Time as first dimension
        source_sentence = source_sentence.T
        source_sentence_mask = source_sentence_mask.T

        embeddings = self.lookup.apply(source_sentence)
        representation = self.bidir.apply(
            merge(self.fwd_fork.apply(embeddings, as_dict=True),
                  {'mask': source_sentence_mask}),
            merge(self.back_fork.apply(embeddings, as_dict=True),
                  {'mask': source_sentence_mask})
        )
        return representation  # [seq_len, batch_size, 2*state_dim]
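The shape of the encoder output can be illustrated with a small numpy sketch; a plain tanh RNN cell stands in for the GRU, and all dimensions are made up for illustration:

```python
import numpy as np

# Toy dimensions; a plain tanh RNN cell stands in for the GRU.
rng = np.random.default_rng(1)
seq_len, batch, emb_dim, state_dim = 5, 2, 4, 3

x = rng.normal(size=(seq_len, batch, emb_dim))       # embedded input, time-major
W = rng.normal(size=(emb_dim, state_dim)) * 0.1
U = rng.normal(size=(state_dim, state_dim)) * 0.1

def run(seq):
    """One recurrent pass over the sequence, returning all hidden states."""
    h, out = np.zeros((batch, state_dim)), []
    for t in range(seq.shape[0]):
        h = np.tanh(seq[t] @ W + h @ U)
        out.append(h)
    return np.stack(out)

h_fwd = run(x)                  # left-to-right pass
h_bwd = run(x[::-1])[::-1]      # right-to-left pass, re-reversed to align in time
H = np.concatenate([h_fwd, h_bwd], axis=-1)
print(H.shape)  # (5, 2, 6) == (seq_len, batch_size, 2*state_dim)
```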
Meanwhile, the vector representations of the topic words k of the current message x are looked up in the table of topic word vectors computed with Equation (4); in the figure, hydrate, skin, face, facemask, and moisturize are the topic words, and k1, k2, k3, …, k10 are the vector representations obtained from the lookup table.
In the implementation, the topic words are fed into an MLP to obtain the topic representation; the code is as follows:
class topicalq_transformer(Initializable):
    def __init__(self, vocab_size, topical_embedding_dim, state_dim, word_num,
                 batch_size, **kwargs):
        super(topicalq_transformer, self).__init__(**kwargs)
        self.vocab_size = vocab_size
        self.word_embedding_dim = topical_embedding_dim
        self.state_dim = state_dim
        self.word_num = word_num
        self.batch_size = batch_size

        self.look_up = LookupTable(name='topical_embeddings')
        self.transformer = MLP(activations=[Tanh()],
                               dims=[self.word_embedding_dim * self.word_num,
                                     self.state_dim],
                               name='topical_transformer')
        self.children = [self.look_up, self.transformer]

    def _push_allocation_config(self):
        self.look_up.length = self.vocab_size
        self.look_up.dim = self.word_embedding_dim
        # do we have to push_config? remain unsure

    @application(inputs=['source_topical_word_sequence'],
                 outputs=['topical_embedding'])
    def apply(self, source_topical_word_sequence):
        # Time as first dimension
        source_topical_word_sequence = source_topical_word_sequence.T  # [word_num, batch_size]
        word_topical_embeddings = self.look_up.apply(
            source_topical_word_sequence)  # [word_num, batch_size, embedding_dim]
        word_topical_embeddings
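Outside of Blocks, this lookup-then-MLP computation can be sketched in plain numpy (all dimensions are hypothetical; a single tanh layer matches the MLP above):

```python
import numpy as np

# Toy dimensions for illustration.
rng = np.random.default_rng(2)
vocab_size, emb_dim, word_num, state_dim, batch = 50, 8, 10, 6, 4

E = rng.normal(size=(vocab_size, emb_dim))                  # topical embedding table
W = rng.normal(size=(emb_dim * word_num, state_dim)) * 0.1  # MLP weights
ids = rng.integers(0, vocab_size, size=(batch, word_num))   # topic word ids per example

emb = E[ids]                      # embedding lookup: [batch, word_num, emb_dim]
flat = emb.reshape(batch, -1)     # concatenate the word_num topic word vectors
topic_repr = np.tanh(flat @ W)    # single tanh layer producing the topic representation
print(topic_repr.shape)  # (4, 6) == (batch_size, state_dim)
```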