(1)If you have 10,000,000 examples, how would you split the train/dev/test set?
[A] 98% train. 1% dev. 1% test
[B] 33% train. 33% dev. 33% test
[C] 60% train. 20% dev. 20% test
Answer: A
Explanation: See video 1.1, Train/dev/test sets. With 10,000,000 examples, 1% is still 100,000 examples, which is plenty for the dev and test sets; a minimal splitting sketch follows.
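A minimal NumPy sketch of the 98/1/1 split by shuffled indices, assuming the dataset is indexed by example; the variable names and RNG seed are illustrative, not part of the quiz solution.

```python
import numpy as np

# Minimal sketch of a 98% / 1% / 1% split by shuffled indices (illustrative only).
m = 10_000_000
rng = np.random.default_rng(0)          # fixed seed so the split is reproducible
idx = rng.permutation(m)

n_train = int(0.98 * m)                 # 9,800,000 examples
n_dev = int(0.01 * m)                   # 100,000 examples

train_idx = idx[:n_train]
dev_idx = idx[n_train:n_train + n_dev]
test_idx = idx[n_train + n_dev:]        # remaining 100,000 examples

print(len(train_idx), len(dev_idx), len(test_idx))  # 9800000 100000 100000
```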
(2)The dev and test set should:
[A]Come from the same distribution.
[B]Come from different distributions.
[C]Be identical to each other (same (x,y) pairs)
[D]Have the same number of examples.
Answer: A
(3)If your Neural Network model seems to have high variance, which of the following would be promising things to try?
[A]Add regularization.
[B]Make the Neural Network deeper.
[C]Get more test data.
[D]Get more training data.
Answer: A, D
Explanation: B is a way to reduce high bias; C (getting more test data) affects neither variance nor bias.
(4)You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply)
[A]Increase the regularization parameter lambda.
[B]Decrease the regularization parameter lambda.
[C]Get more training data.
[D]Use a bigger neural network.
Answer: A, C
Explanation: The training error (0.5%) is far below the dev error (7%), so the classifier suffers from high variance; stronger regularization or more training data are the promising fixes. A small diagnostic sketch follows.
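A rough illustration of that diagnosis in Python, assuming the quiz's error rates and, for simplicity, a Bayes error near 0%; the function name `diagnose` is made up for this example.

```python
def diagnose(train_err, dev_err, bayes_err=0.0):
    """Rough bias/variance read-out from error rates (illustrative heuristic)."""
    avoidable_bias = train_err - bayes_err   # gap to the best achievable error
    variance = dev_err - train_err           # generalization gap from train to dev
    return avoidable_bias, variance

# Quiz setting: 0.5% training error, 7% dev error.
bias, var = diagnose(0.005, 0.07)
print(f"avoidable bias ~ {bias:.1%}, variance ~ {var:.1%}")
# avoidable bias ~ 0.5%, variance ~ 6.5%  -> high variance: regularize or add training data
```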
(5)What is weight decay?
[A]A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights.
[B]A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration.
[C]The process of gradually decreasing the learning rate during training.
[D]Gradual corruption of the weights in the neural network if it is trained on noisy data.
Answer: B
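To see why L2 regularization is called weight decay, here is a minimal NumPy sketch of a single gradient-descent step; `W`, `dW`, `alpha`, `lambd`, and `m` are placeholder names for one layer's weights, its data-loss gradient, the learning rate, the regularization strength, and the number of training examples.

```python
import numpy as np

def l2_update(W, dW, alpha=0.01, lambd=0.7, m=1000):
    """One gradient-descent step with L2 regularization (illustrative)."""
    dW_total = dW + (lambd / m) * W     # add the gradient of (lambda/2m)*||W||^2
    return W - alpha * dW_total         # == (1 - alpha*lambd/m) * W - alpha * dW

W = np.array([[1.0, -2.0],
              [0.5,  3.0]])
dW = np.zeros_like(W)                   # even with a zero data gradient...
print(l2_update(W, dW))                 # ...each step multiplies W by (1 - alpha*lambd/m),
                                        # i.e. the weights "decay" a little every iteration
```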
(6)What happens when you increase the regularization hyperparameter lambda?
[A]Weights are pushed toward becoming smaller (closer to 0)
[B]Weights are pushed toward becoming bigger (further from 0)
[C]Doubling lambda should roughly result in doubling the weights.
[D]Gradient descent taking bigger steps with each iteration (proportional to lambda)
Answer: A
Explanation: Increasing λ increases the regularization term $\frac{\lambda}{2 m}\|\omega\|_{2}^{2}$ in the cost function
$$J(\omega, b)=\frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)+\frac{\lambda}{2 m}\|\omega\|_{2}^{2}.$$
Since gradient descent drives $J(\omega, b)$ as low as possible, the weights $\omega$ are pushed toward smaller values (closer to 0) as training proceeds.
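A minimal NumPy sketch of that regularized cost, assuming a binary cross-entropy data loss; the function names and toy numbers are made up for illustration.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Binary cross-entropy data loss, averaged over the m examples."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost_with_l2(y_hat, y, w, lambd):
    """J(w, b) = data loss + (lambda / 2m) * ||w||_2^2, matching the formula above."""
    m = y.shape[0]
    return cross_entropy(y_hat, y) + (lambd / (2 * m)) * np.sum(w ** 2)

# Same predictions and weights, two values of lambda: a larger lambda makes
# big weights more expensive, so training pushes them toward zero.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.8, 0.7])
w = np.array([1.5, -2.0, 0.5])
print(cost_with_l2(y_hat, y, w, lambd=0.1))
print(cost_with_l2(y_hat, y, w, lambd=10.0))
```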
(7)With the inverted dropout technique, at test time:
[A] You apply dropout (randomly eliminating units) and do not keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[B] You do not apply dropout (do not randomly eliminate units) and do not keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[C] You do not apply dropout (do not randomly eliminate units), but keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[D] You apply dropout (randomly eliminating units), but keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
Answer: B
Key phrase: "at test time"
Explanation: All units are used at test time; randomly dropping units there would make predictions noisy and unstable. Because the activations are already divided by keep_prob during training (which keeps their expected value unchanged), no extra scaling is needed at test time. A minimal sketch follows.
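A minimal NumPy sketch of inverted dropout for one layer's activations; the function name `dropout_forward` and the array shapes are illustrative, not taken from the course code.

```python
import numpy as np

def dropout_forward(a, keep_prob, train=True):
    """Inverted dropout on one layer's activations `a` (illustrative sketch)."""
    if not train:
        return a                                    # test time: all units, no scaling
    mask = np.random.rand(*a.shape) < keep_prob     # keep each unit with prob keep_prob
    return (a * mask) / keep_prob                   # scale up so E[a] stays unchanged

a = np.random.randn(4, 5)
a_train = dropout_forward(a, keep_prob=0.8, train=True)   # some units zeroed, rest scaled by 1/0.8
a_test = dropout_forward(a, keep_prob=0.8, train=False)   # identical to a
```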
(8)Increasing the parameter keep_prob from 0.5 to 0.6 will likely cause the following: (Check the two that apply)
[A]Increasing the regularization effect.
[B]Reducing the regularization effect.
[C]Causing the neural network to end up with a higher training set error.
[D]Causing the neural network to end up with a lower training set error.
Answer: B, D
Explanation: Raising keep_prob from 0.5 to 0.6 means fewer units are dropped, which weakens the regularization effect and lets the network fit the training set better, so the training error goes down.
(9)Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)
[A]Xavier initialization
[B]Gradient Checking
[C]Exploding gradient
[D]Vanishing gradient
[E]Dropout
[F]L2 regularization
[G]Data augmentation
Answer: E, F, G
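As one concrete example of option G, here is a minimal NumPy sketch of data augmentation by horizontal flipping; the array shapes and the three-class labels are made up for illustration.

```python
import numpy as np

def augment_with_flips(X, y):
    """Double the training set with horizontally flipped copies (illustrative)."""
    X_flipped = X[:, :, ::-1, :]                # flip every image along its width axis
    return np.concatenate([X, X_flipped]), np.concatenate([y, y])

# Toy batch of images shaped (m, height, width, channels) with 3 class labels.
X = np.random.rand(10, 32, 32, 3)
y = np.random.randint(0, 3, size=10)
X_aug, y_aug = augment_with_flips(X, y)
print(X_aug.shape, y_aug.shape)                 # (20, 32, 32, 3) (20,)
```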
(10)Why do we normalize the inputs x?
[A]Normalization is another word for regularization–It helps to reduce variance
[B]It makes it easier to visualize the data.
[C]It makes the parameter initialization faster.
[D]It makes the cost function faster to optimize.
Answer: D
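A minimal NumPy sketch of input normalization, assuming the mean and standard deviation are computed on the training set and then reused for dev/test data; the feature scales here are illustrative.

```python
import numpy as np

def fit_normalizer(X_train):
    """Compute per-feature mean and std on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8          # guard against zero variance
    return mu, sigma

# Toy features on wildly different scales.
X_train = np.random.rand(1000, 5) * np.array([1, 10, 100, 1000, 10000])
mu, sigma = fit_normalizer(X_train)
X_norm = (X_train - mu) / sigma                 # comparable scales -> rounder cost contours,
                                                # so gradient descent converges faster
```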