[Andrew Ng Deep Learning] 05_week3_quiz: Sequence Models & Attention Mechanism

(1)Consider using this encoder-decoder model for machine translation.
(Figure: an encoder-decoder model for machine translation)
This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.
[A]True
[B]False

Answer: B
Explanation: The encoder (green) computes a feature representation (an encoding) of the input sentence x; it does not model the probability of x. It is the decoder that acts as a language model, conditioned on that encoding.

(2)In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.
[A]Beam search will run more slowly.
[B]Beam search will use up more memory.
[C]Beam search will generally find better solutions (i.e. do a better job maximizing P(y|x) )
[D]Beam search will converge after fewer steps.

Answer: A, B, C
Explanation: Beam search keeps the B most probable partial hypotheses at each step. A larger B means more hypotheses to expand and store, so the search runs more slowly and uses more memory, but it generally finds better solutions. A larger B does not make the search converge in fewer steps. (See the sketch below.)
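To make the trade-off concrete, here is a minimal beam search sketch in Python. The `step_log_probs` and `toy_step` functions are hypothetical stand-ins for a trained decoder, not anything from the course code; the point is that the number of candidates kept per step, and hence time and memory, grows with B.

```python
import math

def beam_search(step_log_probs, B=3, max_len=10, eos="</s>"):
    """Minimal beam search sketch.

    step_log_probs(prefix) is a hypothetical stand-in for the decoder:
    it returns {token: log P(token | x, prefix)} for the next position.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:  # finished hypotheses pass through
                candidates.append((prefix, score))
                continue
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        # Keep only the B highest-scoring hypotheses: a larger B keeps
        # more candidates, hence more time and memory per step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams

def toy_step(prefix):
    # Fixed toy distribution, purely for illustration.
    return {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)}

print(beam_search(toy_step, B=2, max_len=3))
```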

(3)In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
[A]True
[B]False

Answer: A
Explanation: Beam search maximizes $\prod_{t=1}^{T_y} P\left( y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>} \right)$. Every factor is less than 1, so the product shrinks with each additional term; without length normalization, shorter sentences therefore tend to receive higher probability, and the algorithm favors overly short outputs.
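The fix used in the course is to divide the summed log-probabilities by $T_y^{\alpha}$ (with $\alpha$ around 0.7). The per-token probabilities below are made up purely to illustrate the effect: un-normalized, the shorter hypothesis wins; normalized, the longer one can win.

```python
import math

# Hypothetical per-token probabilities for a short and a longer hypothesis.
short_hyp = [0.5, 0.5]                  # product = 0.25
long_hyp  = [0.6, 0.6, 0.6, 0.6, 0.6]   # product ~= 0.078

def raw_score(probs):
    return sum(math.log(p) for p in probs)

def normalized_score(probs, alpha=0.7):
    # Divide by length^alpha so longer outputs are not unfairly penalized.
    return raw_score(probs) / (len(probs) ** alpha)

print(raw_score(short_hyp), raw_score(long_hyp))                 # short wins
print(normalized_score(short_hyp), normalized_score(long_hyp))   # long wins
```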

(4)Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $x$ to a text transcript $y$. Your algorithm uses beam search to try to find the value of $y$ that maximizes $P(y|x)$.
On a dev set example, given an input audio clip, your algorithm outputs the transcript $\hat{y} = $ "I'm building an A Eye system in Silly con Valley.", whereas a human gives a much superior transcript $y^{*} = $ "I'm building an AI system in Silicon Valley."
According to your model,
$P(\hat{y}|x) = 1.09 \times 10^{-7}$
$P(y^{*}|x) = 7.21 \times 10^{-8}$
Would you expect increasing the beam width B to help correct this example?
[A]No, because $P(y^{*}|x) \leq P(\hat{y}|x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
[B]No, because $P(y^{*}|x) \leq P(\hat{y}|x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
[C]Yes, because $P(y^{*}|x) \leq P(\hat{y}|x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
[D]Yes, because $P(y^{*}|x) \leq P(\hat{y}|x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.

Answer: A
Explanation: See 3.5 Error analysis in beam search. Here $P(y^{*}|x) \leq P(\hat{y}|x)$: the RNN itself assigns higher probability to the worse transcript, so the fault lies with the model, and increasing the beam width B would not help.

(5)Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $P(y^{*}|x) > P(\hat{y}|x)$. This suggests you should focus your attention on improving the search algorithm.
[A]True
[B]False

Answer: A
Explanation: When $P(y^{*}|x) > P(\hat{y}|x)$, beam search failed to find the hypothesis the model itself scores highest, so the search (e.g. the beam width) is at fault.
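The attribution rule from Q4 and Q5 amounts to a simple tally. The `examples` list of probability pairs is hypothetical; the decision logic is the one from the course's error analysis.

```python
def attribute_errors(examples):
    """Tally beam-search error analysis over dev-set mistakes.

    examples: hypothetical list of (p_ystar, p_yhat) pairs, where
    p_ystar = P(y*|x) for the human transcript and
    p_yhat  = P(y-hat|x) for the algorithm's output.
    """
    counts = {"search": 0, "rnn": 0}
    for p_ystar, p_yhat in examples:
        if p_ystar > p_yhat:
            counts["search"] += 1  # beam search missed a higher-probability y*
        else:
            counts["rnn"] += 1     # the model itself prefers the wrong output
    return counts

# Q4's single example: P(y-hat|x) = 1.09e-7 >= P(y*|x) = 7.21e-8 -> RNN at fault.
print(attribute_errors([(7.21e-8, 1.09e-7)]))  # {'search': 0, 'rnn': 1}
```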

(6)Consider the attention model for machine translation.
(Figure: attention model for machine translation)
Further, here is the formula for $\alpha^{<t,t'>}$:
$$\alpha^{<t,t'>} = \frac{\exp\left( e^{<t,t'>} \right)}{\sum_{t'=1}^{T_x} \exp\left( e^{<t,t'>} \right)}$$
Which of the following statements about $\alpha^{<t,t'>}$ are true? Check all that apply.
[A]We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t'>}$ that are highly relevant to the value the network should output for $y^{<t>}$. (Note the indices in the superscripts.)
[B]We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t>}$ that are highly relevant to the value the network should output for $y^{<t'>}$. (Note the indices in the superscripts.)
[C]$\sum_t \alpha^{<t,t'>} = 1$ (Note the summation is over $t$.)
[D]$\sum_{t'} \alpha^{<t,t'>} = 1$ (Note the summation is over $t'$.)

Answer: A, D
Explanation: The softmax in the formula normalizes over $t'$, so for each output step $t$ the weights $\alpha^{<t,t'>}$ sum to 1 over the input positions, and the weight is large on the encoder activations $a^{<t'>}$ most relevant to producing $y^{<t>}$.
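A quick numeric check of the normalization in option [D], using made-up energies $e^{<t,t'>}$ for $T_y = 2$ output steps and $T_x = 3$ input steps:

```python
import numpy as np

# Toy attention energies e[t, t'] (values invented for illustration).
e = np.array([[0.1, 2.0, -1.0],
              [1.5, 0.3,  0.7]])

# Softmax over t' (axis=1), matching the formula for alpha<t,t'>.
alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)

print(alpha.sum(axis=1))  # -> [1. 1.]: for each t, weights over t' sum to 1
print(alpha.sum(axis=0))  # columns need not sum to 1, so [C] is false
```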

(7)The network learns where to "pay attention" by learning the values $e^{<t,t'>}$, which are computed using a small neural network:
We can't replace $s^{<t-1>}$ with $s^{<t>}$ as an input to this neural network. This is because $s^{<t>}$ depends on $\alpha^{<t,t'>}$, which in turn depends on $e^{<t,t'>}$; so at the time we need to evaluate this network, we haven't computed $s^{<t>}$ yet.
[A]True
[B]False

Answer: A
(Figure: the small network computing $e^{<t,t'>}$ from $s^{<t-1>}$ and $a^{<t'>}$)
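A minimal sketch of that small scoring network, assuming a one-hidden-layer MLP over the concatenation of $s^{<t-1>}$ and $a^{<t'>}$; the parameter names `W` and `v` and the tanh nonlinearity are illustrative assumptions, not the course's exact parameterization.

```python
import numpy as np

def attention_energy(s_prev, a_tprime, W, v):
    """Score e<t,t'> from the previous decoder state and one encoder activation.

    s_prev:   decoder hidden state s<t-1>, shape (n_s,)
    a_tprime: encoder activation a<t'>,    shape (n_a,)
    W, v:     learned weights, shapes (h, n_s + n_a) and (h,)
    """
    # Only s<t-1> is available here: s<t> depends on the attention weights,
    # which depend on these very energies.
    concat = np.concatenate([s_prev, a_tprime])
    return v @ np.tanh(W @ concat)  # scalar energy e<t,t'>

# Tiny demo with random parameters.
rng = np.random.default_rng(0)
n_s, n_a, h = 4, 3, 5
W, v = rng.normal(size=(h, n_s + n_a)), rng.normal(size=h)
print(attention_energy(rng.normal(size=n_s), rng.normal(size=n_a), W, v))
```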

(8)Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:
[A]The input sequence length $T_x$ is large.
[B]The input sequence length $T_x$ is small.

Answer: A
Explanation:
(Figure: Bleu score versus sentence length, with and without attention)
The green curve is the Bleu score after adding the attention mechanism; for long sentences, adding attention effectively improves translation accuracy.

(9)Under the CTC model, identical repeated characters not separated by the "blank" character (_) are collapsed. Under the CTC model, what does the following string collapse to?
__coo_o_kk___b_ooooo__oo_kkk
[A]cokbok
[B]cookbook
[C]cook book
[D]coookkboooooookkk
Answer: B
Explanation: A basic rule of CTC is to collapse runs of identical characters that are not separated by a blank, and then remove the blanks.
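The collapsing rule is easy to verify mechanically. A minimal sketch (the function name is ours, not from any CTC library):

```python
from itertools import groupby

def ctc_collapse(path, blank="_"):
    """Apply CTC's collapsing rule: merge runs of identical characters,
    then remove the blank symbol."""
    merged = "".join(ch for ch, _ in groupby(path))  # collapse repeats
    return merged.replace(blank, "")                 # drop blanks

print(ctc_collapse("__coo_o_kk___b_ooooo__oo_kkk"))  # -> cookbook
```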

(10)In trigger word detection, $x^{<t>}$ is:
[A]Features of the audio (such as spectrogram features) at time t.
[B]The t-th input word, represented as either a one-hot vector or a word embedding.
[C]Whether the trigger word is being said at time t.
[D]Whether someone has just finished saying the trigger word at time t.

Answer: A
Explanation: See 3.10 Trigger Word Detection. The input at each time step is audio features such as spectrogram values, not words or labels.
