Andrew Ng Deep Learning: Week 3 Quiz (Multiple Choice)

1. Which of the following are true? (Check all that apply.)

A. $X$ is a matrix in which each row is one training example.

B. $a^{[2]}_4$ is the activation output by the 4th neuron of the 2nd layer.

C. $a^{[2](12)}$ denotes the activation vector of the 2nd layer for the 12th training example.

D. $a^{[2](12)}$ denotes the activation vector of the 12th layer on the 2nd training example.

E. $a^{[2]}$ denotes the activation vector of the 2nd layer.

F. $X$ is a matrix in which each column is one training example.

G. $a^{[2]}_4$ is the activation output of the 2nd layer for the 4th training example.

Answer: B, C, E, F

2. The tanh activation is not always better than the sigmoid activation function for hidden units, because the mean of its output is closer to zero, and so it centers the data, making learning complex for the next layer. True/False?

Answer: False
As seen in lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.
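
As a quick sanity check of the centering claim, here is a minimal NumPy sketch (the random pre-activations are purely illustrative) comparing the mean output of sigmoid and tanh on the same inputs:

```python
import numpy as np

np.random.seed(0)
z = np.random.randn(5, 1000)           # pre-activations for 5 hidden units

sigmoid = 1 / (1 + np.exp(-z))         # outputs in (0, 1), mean near 0.5
tanh = np.tanh(z)                      # outputs in (-1, 1), mean near 0

print("mean of sigmoid outputs:", sigmoid.mean())  # ~0.5
print("mean of tanh outputs:", tanh.mean())        # ~0.0
```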

3. Which of these is a correct vectorized implementation of forward propagation for layer $l$, where $1 \le l \le L$?

A. $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
   $A^{[l+1]} = g^{[l]}(Z^{[l]})$

B. $Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}$
   $A^{[l+1]} = g^{[l+1]}(Z^{[l]})$

C. $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$
   $A^{[l]} = g^{[l]}(Z^{[l]})$

D. $Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]}$
   $A^{[l]} = g^{[l]}(Z^{[l]})$

Answer: C
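
A minimal NumPy sketch of option C for a single layer; the layer sizes, the tanh activation, and the `layer_forward` helper name are illustrative assumptions, not part of the quiz:

```python
import numpy as np

def layer_forward(A_prev, W, b, g):
    """Vectorized forward propagation for one layer:
    Z[l] = W[l] A[l-1] + b[l],  A[l] = g[l](Z[l])."""
    Z = W @ A_prev + b        # shape: (n_l, m)
    A = g(Z)                  # elementwise activation
    return Z, A

# Illustrative example: 3 input features, 4 hidden units, m = 5 examples
np.random.seed(1)
A_prev = np.random.randn(3, 5)       # A[l-1], columns are training examples
W = np.random.randn(4, 3) * 0.01     # W[l]
b = np.zeros((4, 1))                 # b[l], broadcast across examples
Z, A = layer_forward(A_prev, W, b, np.tanh)
print(Z.shape, A.shape)              # (4, 5) (4, 5)
```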

4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
A. sigmoid

B. ReLU

C. tanh

D. Leaky ReLU
Answer: A

5. Consider the following code:

```python
A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
```

What will B.shape be? (If you're not sure, feel free to run this in Python to find out.)
A. (4, 1)

B. (4, )

C. (1, 3)

D. (, 3)
Answer: A
Yes, we use keepdims=True to make sure that B.shape is (4, 1) and not (4, ). It makes our code more robust.
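
To verify, a runnable version of the snippet (with the `import numpy as np` that the original code assumes), compared against the keepdims=False case:

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.sum(A, axis=1, keepdims=True)
print(B.shape)   # (4, 1) -- keepdims preserves the summed axis as size 1

C = np.sum(A, axis=1)
print(C.shape)   # (4,)   -- without keepdims, a rank-1 array
```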

6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
A. Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.

B. Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.

C. The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

D. Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
Answer: D
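
A small sketch illustrating why: with all-zero initialization, every hidden unit receives exactly the same gradient, so the rows of $W^{[1]}$ stay identical after a gradient step. The 2-4-1 architecture, tanh hidden activation, and random data below are illustrative assumptions:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(2, 10)                  # 2 features, 10 examples
Y = (np.random.rand(1, 10) > 0.5) * 1.0     # binary labels

W1, b1 = np.zeros((4, 2)), np.zeros((4, 1))   # zero initialization
W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

# Forward pass
A1 = np.tanh(W1 @ X + b1)
A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Backward pass (cross-entropy loss)
m = X.shape[1]
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
dW1 = dZ1 @ X.T / m

print(dW1)   # every row is identical (all zeros here), so all hidden units
             # get the same update and keep computing the same thing
```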

7. Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False?
Answer: False
Logistic regression doesn’t have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression will output zero, but the derivatives of logistic regression depend on the input x (because there’s no hidden layer), which is not zero. So at the second iteration, the weight values follow x’s distribution and are different from each other if x is not a constant vector.
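
By contrast, a minimal logistic-regression sketch (the feature count and data are illustrative): even with w = 0, the gradient for one example, dw = x(a - y), already differs across weights because it follows x.

```python
import numpy as np

np.random.seed(3)
x = np.random.randn(3, 1)              # one example with 3 features
y = 1.0

w, b = np.zeros((3, 1)), 0.0           # zero initialization
a = 1 / (1 + np.exp(-(w.T @ x + b)))   # sigmoid(0) = 0.5
dw = x * (a - y)                       # gradient w.r.t. w for this example

print(dw.ravel())  # entries differ (proportional to x), so the weights
                   # become different from each other after one update
```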

8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(…,…)*1000. What will happen?
A. This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

B. This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set $\alpha$ to be very small to prevent divergence; this will slow down learning.

C. This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.

D. It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.

Answer: C
Yes. tanh becomes flat for large values; this leads its gradient to be close to zero, which slows down the optimization algorithm.
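
A quick numeric check of the saturation effect (the sample pre-activation values are illustrative): the derivative of tanh, $1 - \tanh(z)^2$, is essentially zero for large $|z|$.

```python
import numpy as np

for z in [0.5, 5.0, 50.0]:
    grad = 1 - np.tanh(z) ** 2      # derivative of tanh at z
    print(f"z = {z:5.1f}  tanh'(z) = {grad:.2e}")
# z =   0.5  tanh'(z) = 7.86e-01
# z =   5.0  tanh'(z) = 1.82e-04
# z =  50.0  tanh'(z) = 0.00e+00
```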

9. Consider the following 1-hidden-layer neural network:
[Figure: a network with 2 input features, one hidden layer of 4 units, and a single output unit]
Which of the following statements are True? (Check all that apply).
A. $W^{[1]}$ will have shape (4, 2)

B. $W^{[2]}$ will have shape (1, 4)

C. $W^{[2]}$ will have shape (4, 1)

D. $b^{[2]}$ will have shape (4, 1)

E. $W^{[1]}$ will have shape (2, 4)

F. $b^{[1]}$ will have shape (2, 1)

G. $b^{[1]}$ will have shape (4, 1)

H. $b^{[2]}$ will have shape (1, 1)
Answer: A, B, G, H

10. In the same network as the previous question, what are the dimensions of $Z^{[1]}$ and $A^{[1]}$?
A. $Z^{[1]}$ and $A^{[1]}$ are (1, 4)

B. $Z^{[1]}$ and $A^{[1]}$ are (4, 1)

C. $Z^{[1]}$ and $A^{[1]}$ are (4, 2)

D. $Z^{[1]}$ and $A^{[1]}$ are (4, m)
Answer: D
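
A small shape-check sketch for this 2-4-1 network (m = 3 examples and the tanh activation are illustrative choices):

```python
import numpy as np

n_x, n_h, n_y, m = 2, 4, 1, 3        # 2 inputs, 4 hidden units, 1 output

W1, b1 = np.random.randn(n_h, n_x), np.zeros((n_h, 1))   # (4, 2), (4, 1)
W2, b2 = np.random.randn(n_y, n_h), np.zeros((n_y, 1))   # (1, 4), (1, 1)

X = np.random.randn(n_x, m)          # columns are training examples
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (4, 2) (4, 1) (1, 4) (1, 1)
print(Z1.shape, A1.shape)                      # (4, 3) (4, 3), i.e. (4, m)
```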
