(1)Consider using this encoder-decoder model for machine translation.
This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.
[A]True
[B]False
Answer: B
Explanation: The encoder produces an encoding of the features of the input sentence x; it does not model the probability of x.
(2)In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.
[A]Beam search will run more slowly.
[B]Beam search will use up more memory.
[C]Beam search will generally find better solutions (i.e. do a better job maximizing P(y|x) )
[D]Beam search will converge after fewer steps.
Answer: A, B, C
Explanation: At each step, beam search keeps the B most probable candidates. The larger B is, the more candidate sentences are considered, so the search runs more slowly and uses more memory, but it generally finds better solutions. (It does not converge after fewer steps.)
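As a toy illustration of why a wider beam finds better solutions, here is a minimal beam-search sketch over a hypothetical conditional model (all tokens and probabilities are made up for the example): with B = 1 the search commits greedily to the wrong first token, while B = 2 recovers the globally most probable sequence.

```python
# Toy conditional model P(next_token | prefix). Hypothetical numbers chosen so
# that a greedy search (B = 1) commits to the wrong first token.
COND = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"c": 0.5, "d": 0.5},
    ("b",): {"c": 0.9, "d": 0.1},
}

def beam_search(B, length=2):
    beams = [((), 1.0)]  # (prefix, probability)
    for _ in range(length):
        candidates = []
        for prefix, p in beams:
            for tok, q in COND[prefix].items():
                candidates.append((prefix + (tok,), p * q))
        # Keep only the B most probable partial hypotheses.
        beams = sorted(candidates, key=lambda c: -c[1])[:B]
    return beams[0]

print(beam_search(B=1))  # greedy: commits to 'a', misses the best sequence
print(beam_search(B=2))  # wider beam: finds ('b', 'c'), the highest P(y|x)
```

The wider beam costs more time and memory per step (more candidates expanded and stored), which is exactly the trade-off in options A and B.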
(3)In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
[A]True
[B]False
Answer: A
Explanation: Beam search maximizes

\prod_{t=1}^{T_y}{P\left( y^{<t>} \mid x, y^{<1>}, \ldots, y^{<t-1>} \right)}

Each factor is less than 1, so the product shrinks as more factors are multiplied in. Without length normalization, shorter sentences therefore tend to receive higher probabilities, and the algorithm outputs overly short translations.
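A small numeric sketch of this effect, using made-up per-token probabilities: the raw product favors the shorter candidate simply because it has fewer factors, while a per-token (length-normalized) log score favors the longer, more confident one.

```python
import math

# Hypothetical per-token probabilities for a short and a longer candidate.
short_probs = [0.5, 0.4]            # 2 tokens
long_probs  = [0.7, 0.6, 0.7, 0.6]  # 4 tokens, each step fairly confident

def raw_score(probs):
    # Unnormalized objective: product of per-token probabilities.
    return math.prod(probs)

def normalized_score(probs, alpha=1.0):
    # Length-normalized log-likelihood: (1 / T_y**alpha) * sum of log P(...).
    return sum(math.log(p) for p in probs) / len(probs) ** alpha

print(raw_score(short_probs), raw_score(long_probs))        # short wins
print(normalized_score(short_probs), normalized_score(long_probs))  # long wins
```

Dividing by the (possibly damped) sentence length removes the bias toward short outputs, which is why the normalized objective is used in practice.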
(4)Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip x to a text transcript y. Your algorithm uses beam search to try to find the value of y that maximizes P(y|x).
On a dev set example, given an input audio clip, your algorithm outputs the transcript \hat{y} = "I'm building an A Eye system in Silly con Valley.", whereas a human gives a much superior transcript
y^{*} = "I'm building an AI system in Silicon Valley."
According to your model,

P(\hat{y}|x) = 1.09 \times 10^{-7}
P(y^{*}|x) = 7.21 \times 10^{-8}
Would you expect increasing the beam width B to help correct this example?
[A]No, because P(y^{*}|x) \leq P(\hat{y}|x) indicates the error should be attributed to the RNN rather than to the search algorithm.
[B]No, because P(y^{*}|x) \leq P(\hat{y}|x) indicates the error should be attributed to the search algorithm rather than to the RNN.
[C]Yes, because P(y^{*}|x) \leq P(\hat{y}|x) indicates the error should be attributed to the RNN rather than to the search algorithm.
[D]Yes, because P(y^{*}|x) \leq P(\hat{y}|x) indicates the error should be attributed to the search algorithm rather than to the RNN.
Answer: A
Explanation: see 3.5 Error analysis in beam search. Beam search returned \hat{y}, which the model scores higher than the human transcript y^{*}; since the RNN itself assigns the wrong transcript the larger probability, the model is at fault, and a wider beam would not fix this example.
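The attribution rule from the lecture can be sketched as a short error-analysis loop (the probability pairs below are hypothetical, except the first, which is the Q4 example):

```python
# Error analysis for beam search: for each dev-set mistake, compare the
# model's probability of the human transcript y* with the probability of
# the algorithm's output y-hat. Pairs are (P(y*|x), P(y_hat|x)).
errors = [
    (7.21e-8, 1.09e-7),  # the Q4 example: the model prefers the wrong y-hat
    (3.0e-7, 1.0e-7),    # hypothetical
    (5.0e-8, 2.0e-7),    # hypothetical
]

def attribute(p_star, p_hat):
    # If the model scores y* higher yet beam search still output y-hat,
    # the search failed; otherwise the RNN model itself is at fault.
    return "search" if p_star > p_hat else "RNN"

faults = [attribute(ps, ph) for ps, ph in errors]
search_fraction = faults.count("search") / len(faults)
print(faults, search_fraction)
```

The fraction of mistakes attributed to "search" tells you whether increasing B (versus improving the RNN) is the better use of your time, which is exactly the question Q5 asks.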
(5)Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, P(y^{*}|x) > P(\hat{y}|x). This suggests you should focus your attention on improving the search algorithm.
[A]True
[B]False
Answer: A
(6)Consider the attention model for machine translation.
Further, here is the formula for \alpha^{<t,t'>}:

\alpha^{<t,t'>} = \frac{\exp\left( e^{<t,t'>} \right)}{\sum_{t'=1}^{T_x}{\exp\left( e^{<t,t'>} \right)}}
Which of the following statements about \alpha^{<t,t'>} are true? Check all that apply.
[A]We expect \alpha^{<t,t'>} to be generally larger for values of a^{<t'>} that are highly relevant to the value the network should output for y^{<t>}. (Note the indices in the superscripts.)
[B]We expect \alpha^{<t,t'>} to be generally larger for values of a^{<t>} that are highly relevant to the value the network should output for y^{<t'>}. (Note the indices in the superscripts.)
[C]\sum_t{\alpha^{<t,t'>}} = 1 (Note the summation is over t.)
[D]\sum_{t'}{\alpha^{<t,t'>}} = 1 (Note the summation is over t'.)
Answer: A, D
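A quick numeric check of option D: computing \alpha^{<t,t'>} as a softmax over the input positions t', each row of attention weights sums to 1. The score values e^{<t,t'>} below are hypothetical.

```python
import math

# Toy attention scores e^{<t,t'>} for T_y = 2 output steps over T_x = 3
# input positions (hypothetical values).
e = [[0.1, 2.0, -1.0],
     [1.5, 0.3, 0.3]]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

# alpha^{<t,t'>}: for each output step t, a distribution over inputs t'.
alpha = [softmax(row) for row in e]

for t, row in enumerate(alpha):
    print(t, [round(a, 3) for a in row], "sum =", round(sum(row), 6))
```

Because each row is a softmax, the weights over t' always form a probability distribution (option D); summing over t instead gives no such guarantee (option C).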
(7)The network learns where to “pay attention” by learning the values e^{<t,t'>}, which are computed using a small neural network:
We can’t replace s^{<t-1>} with s^{<t>} as an input to this neural network. This is because s^{<t>} depends on \alpha^{<t,t'>}, which in turn depends on e^{<t,t'>}; so at the time we need to evaluate this network, we haven’t computed s^{<t>} yet.
[A]True
[B]False
Answer: A
(8)Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:
[A]The input sequence length T_x is large.
[B]The input sequence length T_x is small.
Answer: A
Explanation: In the lecture's Bleu-score plot, the green curve is the model with the attention mechanism added; for long sentences, attention noticeably improves translation accuracy.
(9)Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. Under the CTC model, what does the following string collapse to?
__coo_o_kk___b_ooooo__oo_kkk
[A]cokbok
[B]cookbook
[C]cook book
[D]coookkboooooookkk
Answer: B
Explanation: A basic rule of CTC is to collapse runs of repeated characters that are not separated by a blank, then remove the blanks.
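This collapsing rule is easy to implement directly; a minimal sketch, assuming _ is the blank symbol as in the question:

```python
def ctc_collapse(s, blank="_"):
    # Collapse each run of identical characters to a single character,
    # then drop blanks. Identical characters separated by a blank belong
    # to different runs, so both survive.
    out = []
    prev = None
    for ch in s:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("__coo_o_kk___b_ooooo__oo_kkk"))  # cookbook
```

Note how "oo_oo" collapses to "oo" (the blank keeps the two o-runs distinct), while "ooooo" collapses to a single "o".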
(10)In trigger word detection, x^{<t>} is:
[A]Features of the audio (such as spectrogram features) at time t.
[B]The t-th input word, represented as either a one-hot vector or a word embedding.
[C]Whether the trigger word is being said at time t.
[D]Whether someone has just finished saying the trigger word at time t.
Answer: A
Explanation: see 3.10 Trigger Word Detection.