[coursera/SequenceModels/week3]Sequence models & Attention mechanism (summary&question)


Author: gdtop818


Article link: https://blog.csdn.net/weixin_37993251/article/details/79334583


3.1 Various sequence to sequence architectures

3.1.1 Basic Models

3.1.2 Picking the most likely sentence

conditional probability

pick most likely sentence
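The two headings above refer to treating translation as a conditional language model: the decoder factorizes $P(y^{<1>},\dots,y^{<T_y>} \mid x)$ by the chain rule, and decoding means picking the $y$ that maximizes it. A minimal sketch, assuming a hypothetical toy model `next_word_probs` over a two-word vocabulary in place of a trained decoder (the source sentence $x$ is fixed and implicit; all probabilities are made up):

```python
import math
from itertools import product

VOCAB = ["a", "b"]

def next_word_probs(prefix):
    """Hypothetical stand-in for P(y<t> | x, y<1>, ..., y<t-1>).

    A trained decoder would also read the encoding of x; here x is implicit.
    """
    if not prefix:
        return {"a": 0.6, "b": 0.4}
    if prefix[-1] == "a":
        return {"a": 0.3, "b": 0.7}
    return {"a": 0.9, "b": 0.1}

def sentence_log_prob(words):
    """log P(y | x) by the chain rule: sum of per-step log-probabilities."""
    return sum(math.log(next_word_probs(tuple(words[:t]))[w])
               for t, w in enumerate(words))

def most_likely_sentence(length):
    """Exhaustive argmax over all len(VOCAB) ** length sentences (toy sizes only)."""
    return max(product(VOCAB, repeat=length), key=sentence_log_prob)

best = most_likely_sentence(2)
print(best, round(math.exp(sentence_log_prob(best)), 2))  # ('a', 'b') 0.42
```

Exhaustive search grows as $|V|^{T_y}$, which is why an approximate search is needed for real vocabularies and sentence lengths.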

Greedy search (does not reliably find the most likely sentence)
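Greedy search commits to the single most probable word at each step, and that sequence of local choices need not give the most probable whole sentence. A small sketch with a hypothetical toy model (probabilities made up so that the greedy choice is beaten by exhaustive search):

```python
from itertools import product

VOCAB = ["a", "b"]

def next_word_probs(prefix):
    """Hypothetical toy model, chosen so that greedy decoding is misled."""
    if not prefix:
        return {"a": 0.55, "b": 0.45}
    if prefix[-1] == "a":
        return {"a": 0.4, "b": 0.6}
    return {"a": 0.9, "b": 0.1}

def sentence_prob(words):
    """P(y | x) as the product of per-step conditional probabilities."""
    p = 1.0
    for t, w in enumerate(words):
        p *= next_word_probs(tuple(words[:t]))[w]
    return p

def greedy_decode(length):
    """Take the argmax word at every step, never revisiting earlier choices."""
    words = []
    for _ in range(length):
        probs = next_word_probs(tuple(words))
        words.append(max(probs, key=probs.get))
    return tuple(words)

def exhaustive_decode(length):
    """True argmax over every sentence of the given length."""
    return max(product(VOCAB, repeat=length), key=sentence_prob)

for name, y in [("greedy", greedy_decode(2)), ("exhaustive", exhaustive_decode(2))]:
    print(name, y, round(sentence_prob(y), 3))
# greedy ('a', 'b') 0.33
# exhaustive ('b', 'a') 0.405
```

The greedy first word "a" looks best locally (0.55), but every continuation after "b" is far more likely, so the best full sentence starts with "b".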

3.1.3 Beam Search

example
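Beam search keeps the B most probable partial sentences at each step instead of just one. The following is a minimal sketch, assuming a hypothetical toy model and summing log-probabilities; with `beam_width=1` it reduces to greedy search. In this variant, a hypothesis that emits the end token `<eos>` is moved to a finished list, so the live beam can shrink; that is one of several possible bookkeeping choices, not the only one.

```python
import math

def toy_model(prefix):
    """Hypothetical P(next word | x, prefix); the numbers are made up."""
    if not prefix:
        return {"a": 0.55, "b": 0.45}
    if prefix[-1] == "a":
        return {"a": 0.4, "b": 0.6}
    return {"a": 0.9, "b": 0.1}

def beam_search(next_word_probs, beam_width, max_len, eos="<eos>"):
    """Return (sentence, log-prob) for the best hypothesis beam search finds.

    Scores are running sums of log P(y<t> | x, y<1..t-1>).
    """
    beams = [((), 0.0)]   # live hypotheses: (prefix, log-prob)
    finished = []         # hypotheses that already emitted `eos`
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates = [
            (prefix + (word,), score + math.log(p))
            for prefix, score in beams
            for word, p in next_word_probs(prefix).items()
        ]
        # Keep only the `beam_width` best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without `eos`
    return max(finished, key=lambda c: c[1])

print(beam_search(toy_model, beam_width=1, max_len=2)[0])  # ('a', 'b')
print(beam_search(toy_model, beam_width=2, max_len=2)[0])  # ('b', 'a')
```

Widening the beam here recovers the better-scoring sentence that greedy search misses, at the cost of scoring `beam_width` times as many expansions per step.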

3.1.4 Refinements to Beam Search
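The main refinement under this heading in the course is length normalization: every $\log P$ term is negative, so the raw sum favours short outputs, and the score is instead divided by $T_y^{\alpha}$:

$$\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>})$$

A minimal sketch, assuming the per-step probabilities are already available; $\alpha = 0.7$ is used here as a heuristic default and the probabilities are made up:

```python
import math

def normalized_score(step_probs, alpha=0.7):
    """(1 / Ty**alpha) * sum over t of log P(y<t> | x, y<1..t-1>).

    alpha = 0 gives the raw log-likelihood; alpha = 1 divides by the full length.
    """
    ty = len(step_probs)
    return sum(math.log(p) for p in step_probs) / ty ** alpha

# Made-up per-step probabilities for two candidate translations.
short_y = [0.5, 0.4]             # P(y|x) = 0.2
long_y = [0.5, 0.6, 0.7, 0.7]    # P(y|x) = 0.147

print(normalized_score(short_y, alpha=0) > normalized_score(long_y, alpha=0))  # True
print(normalized_score(long_y) > normalized_score(short_y))                    # True
```

Without normalization the shorter candidate wins; with $\alpha = 0.7$ the longer candidate, whose individual words are more probable on average, wins instead.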

3.1.5 Error analysis in beam search
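Error analysis compares the model's own probabilities for the human translation $y^*$ and the beam-search output $\hat{y}$. If $P(y^* \mid x) > P(\hat{y} \mid x)$, beam search failed to find a sentence the RNN already scores higher; otherwise the RNN itself is at fault. A minimal sketch of that rule and of tallying it over the mistakes on a dev set (the function names are hypothetical):

```python
def attribute_error(p_human, p_output):
    """Blame one mistaken dev-set example on beam search or on the RNN.

    p_human:  P(y* | x), the RNN's probability for the human translation
    p_output: P(y_hat | x), the RNN's probability for the beam-search output
    """
    if p_human > p_output:
        # The RNN prefers y*, but beam search returned y_hat: search is at fault.
        return "beam search"
    # The RNN scores y_hat at least as high as y*: the model is at fault.
    return "RNN"

def tally(examples):
    """Count how many (p_human, p_output) mistakes each component is blamed for."""
    counts = {"beam search": 0, "RNN": 0}
    for p_human, p_output in examples:
        counts[attribute_error(p_human, p_output)] += 1
    return counts

# The numbers from Question 4 below: P(y*|x) = 7.21e-8, P(y_hat|x) = 1.09e-7.
print(attribute_error(7.21e-8, 1.09e-7))  # RNN
```

Whichever component accounts for the larger share of the tallied errors is the one worth working on, which is the reasoning behind Question 5.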

3.1.6 Attention model
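In the attention model, each decoder step $t$ scores every encoder activation $a^{<t'>}$ with a small network fed $s^{<t-1>}$ and $a^{<t'>}$, normalizes the scores with a softmax over $t'$, and forms the context $c^{<t>} = \sum_{t'} \alpha^{<t,t'>} a^{<t'>}$. A minimal pure-Python sketch of one step; the single tanh hidden layer, the weights `W` and `v`, and all numbers are assumptions for illustration, not trained values:

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score(s_prev, a_tp, W, v):
    """e<t,t'> from a tiny network: v . tanh(W @ [s<t-1>; a<t'>])."""
    z = list(s_prev) + list(a_tp)  # concatenate decoder state and encoder activation
    hidden = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]
    return sum(vi * h for vi, h in zip(v, hidden))

def attention_step(s_prev, encoder_states, W, v):
    """Return (alphas, context) for one decoder step t."""
    e = [score(s_prev, a, W, v) for a in encoder_states]
    alphas = softmax(e)  # normalized over t' = 1..Tx
    dim = len(encoder_states[0])
    context = [sum(al * a[k] for al, a in zip(alphas, encoder_states)) for k in range(dim)]
    return alphas, context

# Made-up values: Tx = 3 encoder activations of size 2, decoder state of size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
s_prev = [0.1, -0.2]
W = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.4, -0.6, 0.9], [-0.7, 0.1, 0.3, 0.3]]  # 3 hidden x 4 inputs
v = [1.0, -1.0, 0.5]
alphas, context = attention_step(s_prev, encoder_states, W, v)
print([round(a, 3) for a in alphas], [round(c, 3) for c in context])
```

Note that the scores use $s^{<t-1>}$, the state from the previous step, because $s^{<t>}$ is only computed after the context $c^{<t>}$ is known.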

3.2 Speech recognition - Audio data

Q&A

Correct answers for questions 9 and 10, which scored 0 / 1 points below: 9. B, 10. A

Question 1 (1 / 1 points)

Consider using this encoder-decoder model for machine translation.

This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.

True

False

Correct

Question 2 (1 / 1 points)

In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.

Beam search will run more slowly.

Correct

Beam search will use up more memory.

Correct

Beam search will generally find better solutions (i.e. do a better job maximizing P(y∣x))

Correct

Beam search will converge after fewer steps.

Un-selected is correct

Question 3 (1 / 1 points)

In machine translation, if we carry out beam search without using sentence normalization (i.e., length normalization), the algorithm will tend to output overly short translations.

True

Correct

False

Question 4 (1 / 1 points)

Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $x$ to a text transcript $y$. Your algorithm uses beam search to try to find the value of $y$ that maximizes $P(y \mid x)$.

On a dev set example, given an input audio clip, your algorithm outputs the transcript $\hat{y}$ = “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript $y^*$ = “I’m building an AI system in Silicon Valley.”

According to your model,

$P(\hat{y} \mid x) = 1.09 \times 10^{-7}$

$P(y^* \mid x) = 7.21 \times 10^{-8}$

Would you expect increasing the beam width B to help correct this example?

No, because $P(y^* \mid x) \le P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.

Correct

No, because $P(y^* \mid x) \le P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.

Yes, because $P(y^* \mid x) \le P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.

Yes, because $P(y^* \mid x) \le P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.

Question 5 (1 / 1 points)

Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $P(y^* \mid x) > P(\hat{y} \mid x)$. This suggests you should focus your attention on improving the search algorithm.

True.

Correct

False.

Question 6 (1 / 1 points)

Consider the attention model for machine translation.

Further, here is the formula for $\alpha^{<t,t'>}$:

$$\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t'=1}^{T_x} \exp(e^{<t,t'>})}$$

Which of the following statements about $\alpha^{<t,t'>}$ are true? Check all that apply.

We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t'>}$ that are highly relevant to the value the network should output for $y^{<t>}$. (Note the indices in the superscripts.)

Correct

We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t>}$ that are highly relevant to the value the network should output for $y^{<t'>}$. (Note the indices in the superscripts.)

Un-selected is correct

$\sum_{t} \alpha^{<t,t'>} = 1$ (Note the summation is over $t$.)

Un-selected is correct

$\sum_{t'} \alpha^{<t,t'>} = 1$ (Note the summation is over $t'$.)

Correct
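Because $\alpha^{<t,t'>}$ is a softmax over $t'$, each row of the attention matrix (fixed $t$) sums to 1, while sums over $t$ carry no such constraint. A quick numerical check with made-up scores $e^{<t,t'>}$ ($T_y = 2$, $T_x = 3$):

```python
import math

def attention_matrix(e):
    """alpha[t][t'] = exp(e[t][t']) / sum over t'' of exp(e[t][t'']), i.e. softmax over t'."""
    alpha = []
    for e_t in e:
        m = max(e_t)
        exps = [math.exp(x - m) for x in e_t]
        z = sum(exps)
        alpha.append([x / z for x in exps])
    return alpha

e = [[2.0, 0.5, -1.0],
     [0.0, 3.0, 0.0]]          # made-up scores: Ty = 2 rows, Tx = 3 columns
alpha = attention_matrix(e)

row_sums = [sum(row) for row in alpha]                               # sum over t'
col_sums = [sum(row[j] for row in alpha) for j in range(len(e[0]))]  # sum over t
print([round(s, 3) for s in row_sums])  # [1.0, 1.0]
print([round(s, 3) for s in col_sums])  # not all equal to 1
```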

Question 7 (1 / 1 points)

The network learns where to “pay attention” by learning the values $e^{<t,t'>}$, which are computed using a small neural network:

We can't replace $s^{<t-1>}$ with $s^{<t>}$ as an input to this neural network. This is because $s^{<t>}$ depends on $\alpha^{<t,t'>}$, which in turn depends on $e^{<t,t'>}$; so at the time we need to evaluate this network, we haven’t computed $s^{<t>}$ yet.

True

Correct

False

Question 8 (1 / 1 points)

Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:

The input sequence length $T_x$ is large.

Correct

The input sequence length $T_x$ is small.

Question 9 (0 / 1 points)

Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. Under the CTC model, what does the following string collapse to?

__c_oo_o_kk___b_ooooo__oo__kkk

cokbok

This should not be selected

cookbook

cook book

coookkboooooookkk
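The collapse rule stated in the question (merge runs of identical characters not separated by a blank, then drop the blanks) can be sketched in a few lines; `ctc_collapse` is a hypothetical helper name:

```python
from itertools import groupby

def ctc_collapse(s, blank="_"):
    """Collapse a CTC output: merge runs of identical characters, then drop blanks."""
    return "".join(ch for ch, _ in groupby(s) if ch != blank)

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))  # cookbook
```

Because a blank separates the two `o` runs in "oo_o", they stay distinct and yield "oo", which is how CTC can emit doubled letters; "cook book" would require a space character, which the string does not contain.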

Question 10 (0 / 1 points)