[Andrew Ng Deep Learning] 05_week1_quiz: Recurrent Neural Networks

(1)Suppose your training examples are sentences (sequences of words). Which of the following refers to the $j$-th word in the $i$-th training example?
[A] $x^{(i)<j>}$
[B] $x^{<i>(j)}$
[C] $x^{(j)<i>}$
[D] $x^{<j>(i)}$

Answer: A
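A toy sketch of the indexing (the `sentences` list and the indices below are made up purely for illustration):

```python
# Toy illustration: superscript (i) picks the training example,
# superscript <j> picks the word inside that example.
sentences = [
    ["the", "cat", "sat"],        # training example i = 1
    ["dogs", "bark", "loudly"],   # training example i = 2
]

i, j = 2, 3                        # 1-indexed, matching the course notation
x_i_j = sentences[i - 1][j - 1]    # x^{(i)<j>}: the j-th word of the i-th example
print(x_i_j)                       # -> loudly
```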

(2)Consider this RNN:
(Figure: RNN architecture.)
This specific type of architecture is appropriate when:
[A] $T_x = T_y$
[B] $T_x < T_y$
[C] $T_x > T_y$
[D] $T_x = 1$
Answer: A
Explanation: As shown in the figure, the input and output sequences have the same length, i.e. a many-to-many architecture with $T_x = T_y$.
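A minimal NumPy sketch of such a many-to-many forward pass, with toy dimensions and random weights chosen purely for illustration; the point is that one $y^{<t>}$ is produced for every $x^{<t>}$, so $T_x = T_y$:

```python
import numpy as np

np.random.seed(0)
n_x, n_a, n_y, T = 4, 5, 3, 6                     # toy sizes (assumption)
Wax, Waa, ba = np.random.randn(n_a, n_x), np.random.randn(n_a, n_a), np.zeros((n_a, 1))
Wya, by = np.random.randn(n_y, n_a), np.zeros((n_y, 1))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

a = np.zeros((n_a, 1))
xs = [np.random.randn(n_x, 1) for _ in range(T)]   # T_x = 6 inputs
ys = []
for x_t in xs:
    a = np.tanh(Wax @ x_t + Waa @ a + ba)          # a<t> = tanh(Wax x<t> + Waa a<t-1> + ba)
    ys.append(softmax(Wya @ a + by))               # one y<t> per input x<t>
print(len(xs), len(ys))                            # 6 6  ->  T_x = T_y
```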

(3)To which of these tasks would you apply a many-to-one RNN architecture? (Check all that apply.)
[A] Speech recognition (input an audio clip and output a transcript)
[B] Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment)
[C] Image classification (input an image and output a label)
[D] Gender recognition from speech (input an audio clip and output a label indicating the speaker's gender)

Answer: B, D
Key point: many-to-one (the whole input sequence is read in, and a single label is output at the end).
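A minimal many-to-one sketch along the same lines (toy dimensions, random weights): the whole sequence is consumed, and a single sigmoid output, e.g. a sentiment label, is produced only after the last time step.

```python
import numpy as np

np.random.seed(1)
n_x, n_a, T = 4, 5, 8                              # toy sizes (assumption)
Wax, Waa, ba = np.random.randn(n_a, n_x), np.random.randn(n_a, n_a), np.zeros((n_a, 1))
Wya, by = np.random.randn(1, n_a), np.zeros((1, 1))

a = np.zeros((n_a, 1))
for t in range(T):                                 # read every x<t>, emit nothing yet
    x_t = np.random.randn(n_x, 1)
    a = np.tanh(Wax @ x_t + Waa @ a + ba)

y_hat = 1 / (1 + np.exp(-(Wya @ a + by)))          # single output after the last step
print(y_hat.shape)                                 # (1, 1)
```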

(4)You are training this RNN language model.
(Figure: RNN language model.)
At the $t$-th time step, what is the RNN doing? Choose the best answer.
[A] Estimating $P(y^{<1>}, y^{<2>}, ..., y^{<t-1>})$
[B] Estimating $P(y^{<t>})$
[C] Estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, ..., y^{<t-1>})$
[D] Estimating $P(y^{<t>} \mid y^{<1>}, y^{<2>}, ..., y^{<t>})$

Answer: C
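A toy sketch of what the $t$-th softmax output represents during training (5-word vocabulary, random weights; everything here is a made-up illustration, not the course code): at step $t$ the network has been fed the ground-truth words $y^{<1>}, ..., y^{<t-1>}$ and outputs a distribution over the next word, and the chain rule turns these conditionals into the probability of the whole sentence.

```python
import numpy as np

np.random.seed(2)
vocab_size, n_a = 5, 8                             # toy sizes (assumption)
Wax, Waa, ba = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.zeros((n_a, 1))
Wya, by = np.random.randn(vocab_size, n_a), np.zeros((vocab_size, 1))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_hot(idx):
    v = np.zeros((vocab_size, 1)); v[idx] = 1.0
    return v

sentence = [3, 1, 4, 0]                            # indices of the ground-truth words y<1..4>
a, prev, log_prob = np.zeros((n_a, 1)), np.zeros((vocab_size, 1)), 0.0  # x<1> is the zero vector
for word in sentence:
    a = np.tanh(Wax @ prev + Waa @ a + ba)
    probs = softmax(Wya @ a + by)                  # probs[k] = P(y<t> = k | y<1>, ..., y<t-1>)
    log_prob += np.log(probs[word, 0])             # chain rule: sum the log conditionals
    prev = one_hot(word)                           # feed the ground-truth word to step t+1
print(log_prob)                                    # log P(y<1>, ..., y<T>)
```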

(5)You have finished training a language model RNN and are using it to sample random sentences, as follows:
(Figure: sampling from the RNN language model.)
What are you doing at each time step t?
[A] (i) Use the probabilities output by the RNN to pick the highest-probability word for that time-step as $y^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time-step.
[B] (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $y^{<t>}$. (ii) Then pass the ground-truth word from the training set to the next time-step.
[C] (i) Use the probabilities output by the RNN to pick the highest-probability word for that time-step as $y^{<t>}$. (ii) Then pass the selected word to the next time-step.
[D] (i) Use the probabilities output by the RNN to randomly sample a chosen word for that time-step as $y^{<t>}$. (ii) Then pass the selected word to the next time-step.

Answer: D
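A matching toy sampling sketch (random weights, hypothetical 5-word vocabulary). It shows the two points option D makes: the word is drawn at random from the softmax distribution (not by argmax), and the sampled word itself is fed into the next time step, since there is no ground truth when generating.

```python
import numpy as np

np.random.seed(3)
vocab_size, n_a = 5, 8                             # toy sizes (assumption)
Wax, Waa, ba = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.zeros((n_a, 1))
Wya, by = np.random.randn(vocab_size, n_a), np.zeros((vocab_size, 1))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

a, prev, sampled = np.zeros((n_a, 1)), np.zeros((vocab_size, 1)), []
for _ in range(10):
    a = np.tanh(Wax @ prev + Waa @ a + ba)
    probs = softmax(Wya @ a + by).ravel()
    idx = np.random.choice(vocab_size, p=probs)    # sample randomly, do not take argmax
    sampled.append(int(idx))
    prev = np.zeros((vocab_size, 1)); prev[idx] = 1.0  # pass the *sampled* word forward
print(sampled)
```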

(6)You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?
[A] Vanishing gradient problem.
[B] Exploding gradient problem.
[C] ReLU activation function g(.) used to compute g(z), where z is too large.
[D] Sigmoid activation function g(.) used to compute g(z), where z is too large.

Answer: B
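The standard remedy taught in the course is gradient clipping; a minimal sketch, assuming `gradients` is a dict of NumPy arrays produced by backprop (the names here are hypothetical):

```python
import numpy as np

def clip_gradients(gradients, max_value=5.0):
    """Clip every gradient element into [-max_value, max_value] in place."""
    for key in gradients:
        np.clip(gradients[key], -max_value, max_value, out=gradients[key])
    return gradients

# Hypothetical exploded gradients; after clipping, the huge entries become +/-5.0.
grads = {"dWaa": np.array([[1e6, -2.0], [0.5, -1e7]])}
print(clip_gradients(grads)["dWaa"])
```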

(7)Suppose you are training an LSTM. You have a 10,000-word vocabulary, and are using an LSTM with 100-dimensional activations $a^{<t>}$. What is the dimension of $\Gamma_u$ at each time step?
[A] 1
[B] 100
[C] 300
[D] 10000

Answer: B
Explanation: $\Gamma_u$ has the same dimension as the hidden activation $a^{<t>}$, which is 100.
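A quick shape check, assuming the course's gate definition $\Gamma_u = \sigma(W_u[a^{<t-1>}, x^{<t>}] + b_u)$ with a 100-dimensional $a^{<t>}$ and a 10,000-dimensional one-hot $x^{<t>}$:

```python
import numpy as np

n_a, n_x = 100, 10000
Wu, bu = np.random.randn(n_a, n_a + n_x) * 0.01, np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))
x_t = np.zeros((n_x, 1))
concat = np.vstack([a_prev, x_t])                  # shape (10100, 1)
gamma_u = 1 / (1 + np.exp(-(Wu @ concat + bu)))    # sigmoid gate
print(gamma_u.shape)                               # (100, 1): same dimension as a<t>
```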

(8)Here’re the update equations for the GRU.
(Figure: GRU update equations.)
Alice proposes to simplify the GRU by always removing the $\Gamma_u$, i.e., setting $\Gamma_u = 1$. Betty proposes to simplify the GRU by removing the $\Gamma_r$, i.e., setting $\Gamma_r = 1$ always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?
[A] Alice's model (removing $\Gamma_u$), because if $\Gamma_r \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay.
[B] Alice's model (removing $\Gamma_u$), because if $\Gamma_r \approx 1$ for a timestep, the gradient can propagate back through that timestep without much decay.
[C] Betty's model (removing $\Gamma_r$), because if $\Gamma_u \approx 0$ for a timestep, the gradient can propagate back through that timestep without much decay.
[D] Betty's model (removing $\Gamma_r$), because if $\Gamma_u \approx 1$ for a timestep, the gradient can propagate back through that timestep without much decay.

Answer: C
Explanation: To keep gradients from vanishing, $c^{<t>}$ must be able to depend directly on $c^{<t-1>}$. When $\Gamma_u \approx 0$, $c^{<t>} \approx c^{<t-1>}$, much like the identity shortcut in a residual network.
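A toy sketch of Betty's simplified cell (assumed small dimensions, random weights, and an artificially shifted gate bias so that $\Gamma_u \approx 0$): because $c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$, a near-zero $\Gamma_u$ copies $c^{<t-1>}$ almost unchanged, so the state, and with it the gradient, survives many steps.

```python
import numpy as np

np.random.seed(4)
n_c, n_x = 3, 2                                    # toy sizes (assumption)
Wc, bc = np.random.randn(n_c, n_c + n_x), np.zeros((n_c, 1))
Wu, bu = np.random.randn(n_c, n_c + n_x), np.zeros((n_c, 1))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def gru_step(c_prev, x_t):
    concat = np.vstack([c_prev, x_t])              # Gamma_r fixed to 1: use c<t-1> directly
    c_tilde = np.tanh(Wc @ concat + bc)
    gamma_u = sigmoid(Wu @ concat + bu - 10.0)     # bias shifted so gamma_u ~ 0 (illustration)
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

c = np.ones((n_c, 1))
for _ in range(50):
    c = gru_step(c, np.random.randn(n_x, 1))
print(c.ravel())                                   # still close to the initial state of ones
```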

(9)Here are the equations for the GRU and the LSTM:
(Figure: GRU and LSTM update equations.)
From these, we can see that the Update Gate and Forget Gate in the LSTM play a role similar to _______ and _______ in the GRU. What should go in the blanks?
[A] $\Gamma_u$ and $1 - \Gamma_u$
[B] $\Gamma_u$ and $\Gamma_r$
[C] $1 - \Gamma_u$ and $\Gamma_u$
[D] $\Gamma_r$ and $\Gamma_u$

Answer: A
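Since the figure is not reproduced above, here is a reference sketch of the two cell-state updates in standard course notation (written from memory, so treat the exact symbols as an assumption):

$$\text{GRU:}\quad c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}$$
$$\text{LSTM:}\quad c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}$$

The LSTM's update gate scales the new candidate and its forget gate scales the old cell, exactly the roles played by $\Gamma_u$ and $1 - \Gamma_u$ in the GRU, hence answer A.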

(10)You have a pet dog whose mood is heavily dependent on the current and past few days' weather. You've collected data for the past 365 days on the weather, which you represent as a sequence $x^{<1>}, ..., x^{<365>}$. You've also collected data on your dog's mood, which you represent as $y^{<1>}, ..., y^{<365>}$. You'd like to build a model to map from $x \rightarrow y$. Should you use a Unidirectional RNN or Bidirectional RNN for this problem?
[A] Bidirectional RNN, because this allows the prediction of mood on day $t$ to take into account more information.
[B] Bidirectional RNN, because this allows backpropagation to compute more accurate gradients.
[C] Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<1>}, ..., x^{<t>}$, but not on $x^{<t+1>}, ..., x^{<365>}$.
[D] Unidirectional RNN, because the value of $y^{<t>}$ depends only on $x^{<t>}$, and not on other days' weather.

Answer: C
Explanation: The dog's mood is heavily dependent on the current and past few days' weather only, so the prediction never needs future inputs and a unidirectional RNN suffices.
