Deep Learning: Natural Language Generation - Beam Search and Random Search

-柚子皮-

Last modified: 2023-05-22 15:52:23

License: CC 4.0 BY-SA

First published: 2017-10-31 16:31:47

Article link: https://blog.csdn.net/pipisorry/article/details/78404964

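Once a natural language generation model has been trained, we need to use it to generate new language (sentences). The following methods can be used to generate them: sampling, beam search, and random search.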

Greedy Search / Sampling

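Greedy search picks the highest-scoring item at every step. This often works, but it is clearly not optimal. Sampling instead draws each word from the model's distribution: just sample the first word according to p1, then provide the corresponding embedding as input and sample p2, continuing like this until we sample the special end-of-sentence token or reach some maximum length.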
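To make the difference concrete, here is a minimal Python sketch of greedy search and sampling. It assumes a hypothetical bigram table `NEXT_PROBS` standing in for the trained model's next-word distribution (a real NLG model conditions on the whole prefix, not only the previous word); `greedy_decode` and `sample_decode` are illustrative names, not a library API.

```python
import random

# Hypothetical toy "language model": P(next token | previous token).
# "<s>" is the start symbol and "</s>" ends the sentence.
NEXT_PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.6, "</s>": 0.4},
    "sat": {"</s>": 1.0},
}


def greedy_decode(max_len=10):
    """Greedy search: take the highest-probability token at every step."""
    prev, out = "<s>", []
    for _ in range(max_len):
        dist = NEXT_PROBS[prev]
        prev = max(dist, key=dist.get)  # argmax over the next-token distribution
        if prev == "</s>":
            break
        out.append(prev)
    return out


def sample_decode(rng, max_len=10):
    """Sampling: draw each token from p(. | previous token) until </s>."""
    prev, out = "<s>", []
    for _ in range(max_len):
        tokens, weights = zip(*NEXT_PROBS[prev].items())
        prev = rng.choices(tokens, weights=weights, k=1)[0]
        if prev == "</s>":
            break
        out.append(prev)
    return out


print(greedy_decode())                   # ['the', 'cat', 'sat']
print(sample_decode(random.Random(0)))   # depends on the random seed
```

Greedy decoding is deterministic, while sampling returns a different sentence for each seed, which is what makes it useful when varied outputs are wanted.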

Beam Search

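In sequence-to-sequence models [Deep Learning: Seq2seq Models], beam search is used only at test time (when the decoder decodes). During training every decoder output has a ground-truth answer, so there is no need for beam search to improve output accuracy.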

[Figure: the decoder at the prediction stage]

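BeamSearch: iteratively consider the set of the k best sentences up to time t as candidates to generate sentences of size t + 1, and keep only the resulting best k of them.
Beam search is a good approximation. It is only a search strategy: given a trained language model, it can find more distinctive and more plausible results than picking one word at a time. Functionally it plays the same role as the simplest step-by-step maximum-probability (greedy) decoding or the Viterbi algorithm: each is a way of decoding an output from a given model.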
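A minimal sketch of the k-best loop described above. It assumes a hypothetical two-step bigram table `NEXT_PROBS` in which the greedy choice at step 1 leads to a worse complete sequence; scores are summed log-probabilities, `beam_search` is an illustrative name, and the toy table only defines enough entries for `steps=2`.

```python
import math

# Hypothetical bigram model chosen so that greedy search is suboptimal:
# "A" wins step 1, but "B" is followed by a much more confident token.
NEXT_PROBS = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.4, "y": 0.3, "z": 0.3},
    "B": {"x": 0.9, "y": 0.05, "z": 0.05},
}


def beam_search(steps, beam_size):
    """Keep the beam_size best prefixes, extend each by every token, re-prune."""
    beam = [(0.0, ["<s>"])]  # (summed log-probability, token sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beam:
            for tok, p in NEXT_PROBS[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        # Keep only the beam_size highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    # Report probabilities (rounded) and drop the start symbol.
    return [(round(math.exp(s), 4), seq[1:]) for s, seq in beam]


print(beam_search(steps=2, beam_size=1))  # [(0.24, ['A', 'x'])]
print(beam_search(steps=2, beam_size=2))  # [(0.36, ['B', 'x']), (0.24, ['A', 'x'])]
```

With beam_size=1 the loop is exactly greedy search; with beam_size=2 it keeps "B" alive after step 1 and finds "B x" (0.4 × 0.9 = 0.36), which beats the greedy result "A x" (0.6 × 0.4 = 0.24).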

Viterbi Algorithm vs. Beam Search

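Author's note: at one extreme, a beam size of 1 is greedy search; at the other, a beam size equal to the whole vocabulary corresponds to the Viterbi algorithm [CRF learning and prediction: the Viterbi algorithm for CRF prediction]. Strictly, this equivalence holds for a first-order Markov model (HMM, linear-chain CRF), where partial paths ending in the same symbol can be merged; a seq2seq decoder conditions on the whole prefix, so exact search there would have to keep every prefix. In practice the vocabulary is far too large for this to be solvable, so beam search is run with some moderate beam size instead.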
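Some rough arithmetic shows why exact search is out of reach. The sketch assumes a hypothetical vocabulary of 30,000 tokens and an output length of 20; beam search scores at most `beam_size × vocab_size` extensions per step, while exhaustive search faces `vocab_size ** length` complete sequences. Both function names are illustrative.

```python
def exhaustive_sequences(vocab_size, length):
    """Number of distinct token sequences of exactly `length` tokens."""
    return vocab_size ** length


def beam_scorings(vocab_size, length, beam_size):
    """Upper bound on extensions beam search scores: beam_size * vocab_size per step."""
    return length * beam_size * vocab_size


V, T = 30_000, 20  # hypothetical vocabulary size and output length
for k in (1, 5, V):
    print(f"beam_size={k}: {beam_scorings(V, T, k):,} scored extensions")
print(f"exhaustive: {exhaustive_sequences(V, T):.3e} complete sequences")
```

Even with beam_size = V the count stays around 1.8 × 10^10 scored extensions, versus roughly 3.5 × 10^89 complete sequences; in a neural decoder, however, each of the V kept hypotheses also needs its own forward pass at every step, which is why a moderate beam size is used.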

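Beam search follows a greedy idea and is not guaranteed to reach the global optimum. Because the search space at seq2seq inference time is so large that exhaustive search would be far too slow, a relatively good local optimum is acceptable in engineering practice. In seq2seq, beam search looks for a path through combinations of vocabulary tokens. The search space formed by the vocabulary covers every possible output token and is enormous; a vocabulary such as {'you', 'said', 'is', 'right', '…'}, for example, can be very long;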
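Viterbi follows the dynamic-programming idea and is guaranteed to find the optimal solution. It is a very attractive choice for finding the optimal path through a graph of small width: if the global optimum is reachable, why not take it? In an HMM, Viterbi finds the best path of hidden-state symbols. The search space formed by the hidden-state symbol set is comparatively narrow, e.g. the hidden-state symbols {'O', 'I', 'B'}.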
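For contrast, a minimal Viterbi sketch over the narrow state set {'O', 'B', 'I'}. All probabilities (`START`, `TRANS`, `EMIT`) and the three-word input are made up for illustration; `brute_force` enumerates all 3^T state paths only to check that the dynamic program returns the same optimum.

```python
import itertools
import math

# Hypothetical HMM over BIO-style hidden states; every number is made up.
STATES = ["O", "B", "I"]
START = {"O": 0.6, "B": 0.4, "I": 0.0}  # "I" cannot start a sequence
TRANS = {  # P(next state | current state)
    "O": {"O": 0.6, "B": 0.4, "I": 0.0},
    "B": {"O": 0.2, "B": 0.1, "I": 0.7},
    "I": {"O": 0.4, "B": 0.2, "I": 0.4},
}
EMIT = {  # P(observed word | hidden state)
    "O": {"the": 0.7, "new": 0.2, "york": 0.1},
    "B": {"the": 0.1, "new": 0.6, "york": 0.3},
    "I": {"the": 0.1, "new": 0.2, "york": 0.7},
}


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    # delta[s]: best log-prob of any state path over words[:t+1] that ends in s
    delta = {s: log(START[s]) + log(EMIT[s][words[0]]) for s in STATES}
    backptrs = []
    for w in words[1:]:
        best_prev, new_delta = {}, {}
        for s in STATES:
            # Markov property: only the best path into each predecessor matters.
            p = max(STATES, key=lambda r: delta[r] + log(TRANS[r][s]))
            best_prev[s] = p
            new_delta[s] = delta[p] + log(TRANS[p][s]) + log(EMIT[s][w])
        backptrs.append(best_prev)
        delta = new_delta
    last = max(STATES, key=delta.get)
    path = [last]
    for bp in reversed(backptrs):  # follow back-pointers to recover the path
        path.append(bp[path[-1]])
    return path[::-1], delta[last]


def brute_force(words):
    """Exhaustive check: score all len(STATES) ** len(words) state paths."""
    def score(path):
        total = log(START[path[0]]) + log(EMIT[path[0]][words[0]])
        for prev, cur, w in zip(path, path[1:], words[1:]):
            total += log(TRANS[prev][cur]) + log(EMIT[cur][w])
        return total
    best = max(itertools.product(STATES, repeat=len(words)), key=score)
    return list(best), score(best)


best_path, best_logp = viterbi(["the", "new", "york"])
print(best_path, round(math.exp(best_logp), 6))  # ['O', 'B', 'I'] 0.049392
```

Viterbi keeps exactly one best path per state, so its work grows as T × |states|², while the brute-force check grows as |states|^T; this is why it is practical for a small tag set but not for a full vocabulary.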

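[Conditional Random Fields (CRF): Learning and Prediction (the Viterbi algorithm for CRF prediction)]

[HMM: Hidden Markov Models - Prediction and Decoding]

[Viterbi Algorithm and Beam Search]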

Beam Search Example

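At test time, suppose the vocabulary size is 3 (vocabSize=3) and the vocabulary consists of a, b and </s>.

1: When generating the 1st word, select the 2 most probable words (beamSize=2), say a and b; the current sequences are then a and b.

2: When generating the 2nd word, extend each current sequence (a and b) with every word in the vocabulary, giving 6 new sequences: aa, ab, a</s>, ba, bb, b</s>. Then select the 2 highest-scoring ones as the new current sequences, say aa and bb.

3: Keep repeating this process; a sequence that ends with the end token </s> is complete and is no longer extended. When the search stops (or a maximum length is reached), output the 2 highest-scoring sequences.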
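The walk-through can be reproduced with a small Python sketch. The probabilities in `next_probs` are hypothetical, chosen so that step 2 keeps aa and bb as in the example. This is one simple variant: a hypothesis ending in </s> moves to a finished list, the search stops when no unfinished hypotheses remain (or at `max_len`), and scores are raw summed log-probabilities with no length normalization.

```python
import math

VOCAB = ["a", "b", "</s>"]  # vocabSize = 3, as in the example
EOS = "</s>"


def next_probs(prefix):
    """Hypothetical p(next token | prefix); a real decoder would compute this."""
    if len(prefix) == 0:
        return {"a": 0.5, "b": 0.4, "</s>": 0.1}
    if len(prefix) == 1:
        if prefix[-1] == "a":
            return {"a": 0.6, "b": 0.1, "</s>": 0.3}
        return {"a": 0.1, "b": 0.7, "</s>": 0.2}
    return {"a": 0.1, "b": 0.1, "</s>": 0.8}  # longer prefixes mostly stop


def beam_search(beam_size=2, max_len=10):
    alive = [(0.0, [])]  # unfinished hypotheses: (summed log-prob, tokens)
    finished = []        # hypotheses that have already emitted </s>
    for step in range(1, max_len + 1):
        # Extend every alive hypothesis with every vocabulary token.
        candidates = [
            (score + math.log(next_probs(seq)[tok]), seq + [tok])
            for score, seq in alive
            for tok in VOCAB
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        print(f"step {step}:", [("".join(s), round(math.exp(lp), 3)) for lp, s in candidates])
        # Keep the beam_size best; those ending in </s> are complete.
        alive = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == EOS else alive).append((score, seq))
        if not alive:
            break
    # If max_len was reached, unfinished hypotheses compete as well.
    best = sorted(finished + alive, key=lambda c: c[0], reverse=True)[:beam_size]
    return [("".join(seq), round(math.exp(score), 3)) for score, seq in best]


print(beam_search(beam_size=2))  # [('aa</s>', 0.24), ('bb</s>', 0.224)]
```

The step-2 printout lists the six candidates aa, bb, a</s>, b</s>, ab, ba sorted by probability; aa (0.30) and bb (0.28) survive, and at step 3 both end with </s>, so the search stops.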

[The beam search algorithm in seq2seq]

Choosing the beam size

In principle, a larger beam should give better results: if the model was well trained and the likelihood was aligned with human judgement, increasing the beam size should always yield better sentences.

If it does not, the model may have overfitted, or the training objective may not match the evaluation criterion: The fact that we obtained the best performance with a relatively small beam size is an indication that either the model has overfitted or the objective function used to train it (likelihood) is not aligned with human judgement.

Reducing the beam size makes the generated sentences more novel: reducing the beam size (i.e., with a shallower search over sentences), we increase the novelty of generated sentences. Indeed, instead of generating captions which repeat training captions 80 percent of the time, this gets reduced to 60 percent.

If a smaller beam size gives better results, the model has probably overfitted; it also suggests that reducing the beam size is a form of regularization. This hypothesis supports the fact that the model has overfitted to the training set, and we see this reduced beam size technique as another way to regularize (by adding some noise to the inference process).

[Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. TPAMI 2017]

Random Search

...

某小皮

Scheduled Sampling

Curriculum learning strategy: Bengio et al. proposed a curriculum learning strategy that gently changes the training process from a fully guided scheme, which feeds the true previous word, towards a less guided scheme that mostly feeds the word generated by the model instead.

[S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled sampling for sequence prediction with recurrent neural networks,” NIPS2015.]
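A sketch of the schedule, following the three decay functions the Scheduled Sampling paper uses for ε_i, the probability of feeding the true previous word at training step i: linear ε_i = max(ε, k − c·i), exponential ε_i = k^i (k < 1), and inverse sigmoid ε_i = k / (k + exp(i / k)) (k ≥ 1). The parameter values (`k`, `c`, `eps_min`) are hypothetical, and `pick_prev_token` is an illustrative helper, not a library API.

```python
import math
import random


# Decay schedules for eps_i, the probability of feeding the TRUE previous token
# at training step i (parameter defaults are hypothetical).
def linear_decay(i, k=1.0, c=1e-4, eps_min=0.1):
    return max(eps_min, k - c * i)


def exponential_decay(i, k=0.9999):  # requires k < 1
    return k ** i


def inverse_sigmoid_decay(i, k=1000.0):  # requires k >= 1
    x = i / k
    # Guard against math.exp overflow; for x > 700 the value is effectively 0.
    return 0.0 if x > 700 else k / (k + math.exp(x))


def pick_prev_token(true_prev, model_prev, eps, rng):
    """Per decoding step: feed the true token w.p. eps, else the model's own token."""
    return true_prev if rng.random() < eps else model_prev


rng = random.Random(0)
for i in (0, 2000, 8000):
    eps = inverse_sigmoid_decay(i)
    print(i, round(eps, 3), pick_prev_token("cat", "dog", eps, rng))
```

Early in training ε_i is close to 1 (almost pure teacher forcing); as i grows the decoder increasingly conditions on its own predictions, which is closer to the test-time situation where generated words are fed back in.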
