How dialog models make use of context knowledge

zixufang

Category: DST study (DST学习)

Article link: https://blog.csdn.net/yagreenhand/article/details/103300394

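In dialog modeling, some models use the complete context while others use only the previous utterance (I have not done a full count).
1. Which of the two is actually better?
2. If we go with the full context, can the model really make good use of it?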

Papers read for this note:
1. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
2. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
3. PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
4. Pretraining Methods for Dialog Context Representation Learning
5. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
(Papers 3 and 4 both seem to use large-scale datasets to do BERT-style pre-training, which is of limited use to me: I want to modify the model, not discover relationships. Still, they are worth reading to learn from.)

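Paper 1 (ACL 2019), what it covers (it feels like it lifts the object of study in paper 2 up one level of abstraction):
On 4 datasets, it applies 10 kinds of perturbation to the dialog history (at the utterance level and at the word level within an utterance) and checks how much the model's response changes.
The 10 perturbations include: random shuffling, full reversal, and deletion; dropping verbs, dropping nouns, and so on.
Results:
1. The LSTM seq2seq model learns more from the context than the Transformer seq2seq model, since the results show that it is affected more by the perturbations.
2. The Transformer seq2seq model mainly learns at the word level.
3. The LSTM seq2seq model with attention degrades the most, because it integrates more information: information from early in the context takes up a larger share in this model.
Takeaway: temporal (ordering) information still matters, and sequential models are more helpful for dialog.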
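To make the perturbations concrete, here is a minimal sketch of a few of them in Python. It is my own illustration, not the paper's code: the function names are made up, tokenization is a plain whitespace split, and the verb or noun list is passed in by the caller instead of coming from a POS tagger.

```python
import random

def shuffle_utterances(history, seed=0):
    """Utterance level: randomly permute the order of the turns."""
    out = list(history)
    random.Random(seed).shuffle(out)
    return out

def reverse_utterances(history):
    """Utterance level: put the turns in reverse order."""
    return history[::-1]

def drop_utterance(history, index):
    """Utterance level: delete the turn at position `index`."""
    return [u for i, u in enumerate(history) if i != index]

def shuffle_words(history, seed=0):
    """Word level: permute the words inside every turn."""
    rng = random.Random(seed)
    out = []
    for utterance in history:
        words = utterance.split()
        rng.shuffle(words)
        out.append(" ".join(words))
    return out

def reverse_words(history):
    """Word level: reverse the word order inside every turn."""
    return [" ".join(reversed(u.split())) for u in history]

def drop_words(history, vocabulary):
    """Word level: drop every word found in `vocabulary`
    (assumed to be a caller-supplied verb or noun list)."""
    return [" ".join(w for w in u.split() if w.lower() not in vocabulary)
            for u in history]

history = ["i want to book a table", "for how many people", "four people at seven"]
print(reverse_utterances(history))
print(drop_words(history, {"want", "book"}))
```

The experiment then feeds both the original and the perturbed history to the same trained model and compares the outputs; a model that barely reacts is not really using that part of the history.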

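Paper 2, which is cited by paper 1.
References: https://zhuanlan.zhihu.com/p/38470368, https://juejin.im/post/5b177596f265da6e1e1adad7
This work targets LSTMs: when encoding long text, how much can an LSTM actually learn?
Key conclusions:
We know very little about how neural language models (LMs) use prior linguistic context. This paper investigates the role of context in an LSTM language model through controlled ablation studies. Specifically, it analyzes the increase in perplexity when words in the prior context are shuffled, replaced, or dropped. On two standard datasets (Penn Treebank and WikiText-2), the authors find that
1. the model is able to use about 200 tokens of context on average,
2. but it sharply distinguishes nearby context (the most recent 50 tokens) from the distant history. The model is highly sensitive to word order within the most recent sentence, while changes in word order in the long-range context (beyond 50 tokens) have a negligible effect, which suggests that distant past words are modeled only as a rough semantic field or topic.
3. The neural cache model (Grave et al., 2017b) is especially helpful for letting the LSTM copy words from this long-range context. Overall, the analysis gives a better understanding of how language models use their context, and also helps explain the recent success of cache-based models.
4. Infrequent words need more context than frequent words, and content words need more context than function words. (Note: this is measured as the loss on a given class of generated words.)
5. Content words in the context matter more than function words (prepositions, conjunctions). In this experiment, either the content words or the function words are dropped from the span of words nearest to the target word.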
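The distance-based ablations can be sketched as follows. This is only an illustration under my own assumptions: the function names are hypothetical, the context is a plain token list ordered oldest first, and the per-token negative log-likelihoods would come from whatever language model is being probed.

```python
import math
import random

def perturb_distant_context(context, near=50, mode="shuffle", seed=0):
    """Perturb only the tokens farther than `near` positions from the target.
    The nearest `near` tokens are left untouched."""
    split = max(len(context) - near, 0)
    far, recent = list(context[:split]), list(context[split:])
    if mode == "shuffle":
        random.Random(seed).shuffle(far)
    elif mode == "drop":
        far = []
    else:
        raise ValueError(f"unknown mode: {mode}")
    return far + recent

def drop_from_recent(context, words_to_drop, span=5):
    """Drop the given words (e.g. a function-word list) from the
    `span` tokens nearest to the target, keeping the rest as is."""
    split = max(len(context) - span, 0)
    recent = [w for w in context[split:] if w not in words_to_drop]
    return list(context[:split]) + recent

def perplexity(neg_log_likelihoods):
    """Perplexity = exp(mean negative log-likelihood in nats)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

context = [f"w{i}" for i in range(120)]
shuffled = perturb_distant_context(context, near=50)
print(shuffled[-50:] == context[-50:])  # the nearest 50 tokens keep their order
```

The quantity of interest is the gap between `perplexity` on the perturbed context and on the original one: a small gap for a far-away perturbation means the model does not rely on that detail of the distant context.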

Paper 3, PLATO. Quoting the paper: "In this framework, we adopt flexible attention mechanisms to fully leverage the bi-directional context and the unidirectional characteristic of language generation." It also introduces "discrete latent variables to tackle with the natural born one-to-many mapping problem in response generation".

First of all, to reduce the gap between data distributions, large-scale Reddit and Twitter conversations are utilized to further pre-train the generation model (upon the basis of language models pre-trained with general text). Secondly, to mitigate the difference of training modes, a flexible paradigm integrating uni- and bi-directional processing is employed in this work, which is inspired by the latest unified language modeling (Dong et al., 2019). Thirdly, a discrete latent variable is introduced to model the one-to-many relationship among utterances in conversations.
Each value of the latent variable corresponds to the particular conversational intent of one response, denoted as latent speech act.
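My impression is that z is something you choose: given the context you can still pick any z, rather than z being determined by the context (context -> z). The formula is $p(r \mid c, z)$.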
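One way to picture "z is chosen" is the loop below. It is a sketch under assumptions, not PLATO's code: I assume that at inference every latent value is tried in turn, one candidate is generated per value from $p(r \mid c, z)$, and a separate scorer picks the best candidate. `generate`, `score`, `toy_generate` and `toy_score` are hypothetical stand-ins for the real networks.

```python
def respond_by_enumerating_z(context, generate, score, num_latent):
    """Try every latent value z, generate one candidate per z,
    and return the (z, response) pair with the highest score."""
    candidates = []
    for z in range(num_latent):
        response = generate(context, z)  # stand-in for sampling from p(r | c, z)
        candidates.append((score(context, response), z, response))
    _, best_z, best_response = max(candidates, key=lambda item: item[0])
    return best_z, best_response

# Toy stand-ins so the sketch runs without any model.
TEMPLATES = ["yes", "no , sorry", "what time works for you ?"]

def toy_generate(context, z):
    """Pretend each latent value maps to one kind of reply."""
    return TEMPLATES[z]

def toy_score(context, response):
    """Pretend coherence = number of response words seen in the context."""
    seen = set(" ".join(context).split())
    return sum(1 for word in response.split() if word in seen)

context = ["can we meet at a time tomorrow ?"]
print(respond_by_enumerating_z(context, toy_generate, toy_score, len(TEMPLATES)))
```

The point of the sketch is only the control flow: the context does not predict a single z; each z yields a different response to the same context, which is how the one-to-many mapping shows up.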

Work on this scale is hard for us to use and hard to imitate, so it is enough just to read it.

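4. This one also trains at the utterance level. There is no released code, and I personally feel it has limited practical value. The structure is: each utterance goes through an LSTM, and then the resulting utterance-level hidden states h go through another LSTM.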
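The two-level structure can be sketched in a few lines. To stay dependency-free this uses a plain tanh recurrent cell in place of an LSTM, with random untrained weights and a made-up hash-style embedding, so it only shows the data flow (words -> utterance state h -> dialog state) and is not the paper's model.

```python
import math
import random

def make_rnn(input_size, hidden_size, seed):
    """Random weights for a simple tanh recurrent cell (stand-in for an LSTM)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    return w_in, w_h

def run_rnn(params, inputs):
    """Run the cell over a sequence of vectors and return the final hidden state."""
    w_in, w_h = params
    h = [0.0] * len(w_h)
    for x in inputs:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[j], x)) +
                       sum(a * b for a, b in zip(w_h[j], h)))
             for j in range(len(h))]
    return h

def embed(word, size=4):
    """Hypothetical embedding: a fixed pseudo-random vector per word."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]

def encode_dialog(utterances, word_rnn, utt_rnn):
    """Level 1: encode each utterance from its words.
    Level 2: encode the sequence of utterance states h."""
    utt_states = [run_rnn(word_rnn, [embed(w) for w in u.split()])
                  for u in utterances]
    return run_rnn(utt_rnn, utt_states)

word_rnn = make_rnn(input_size=4, hidden_size=3, seed=1)
utt_rnn = make_rnn(input_size=3, hidden_size=3, seed=2)
print(encode_dialog(["hello there", "hi how are you"], word_rnn, utt_rnn))
```

Because the second level is itself recurrent, swapping the order of the utterances changes the final dialog state, which is the kind of ordering signal the perturbation studies above try to measure.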

5. StructBERT
The name is what caught my attention.
It does not seem to help much; see https://posts.careerengine.us/p/5dcecedc9c5b8348be6ed4ee?nav=post_newest&p=5a11f3b4fa62953b4d8742a1
Argh, it looks like I also need to read up on retrieval-based dialog, since it involves context modeling too. So tiring.
