(1)If you have 10,000,000 examples, how would you split the train/dev/test set?
[A] 98% train. 1% dev. 1% test
[B] 33% train. 33% dev. 33% test
[C] 60% train. 20% dev. 20% test
Answer: A
Explanation: See video 1.1, Train/dev/test sets. With 10,000,000 examples, 1% is still 100,000 examples, which is plenty for the dev and test sets; a minimal splitting sketch follows.
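A minimal NumPy sketch of the 98/1/1 split by shuffled indices, assuming the dataset is indexed by example; the variable names and RNG seed are illustrative, not part of the quiz solution.

```python
import numpy as np

# Minimal sketch of a 98% / 1% / 1% split by shuffled indices (illustrative only).
m = 10_000_000
rng = np.random.default_rng(0)          # fixed seed so the split is reproducible
idx = rng.permutation(m)

n_train = int(0.98 * m)                 # 9,800,000 examples
n_dev = int(0.01 * m)                   # 100,000 examples

train_idx = idx[:n_train]
dev_idx = idx[n_train:n_train + n_dev]
test_idx = idx[n_train + n_dev:]        # remaining 100,000 examples

print(len(train_idx), len(dev_idx), len(test_idx))  # 9800000 100000 100000
```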
(2)The dev and test set should:
[A]Come from the same distribution.
[B]Come from different distributions.
[C]Be identical to each other (same (x,y) pairs)
[D]Have the same number of examples.
Answer: A
(3)If your Neural Network model seems to have high variance, which of the following would be promising things to try?
[A]Add regularization.
[B]Make the Neural Network deeper.
[C]Get more test data.
[D]Get more training data.
Answer: A, D
Explanation: B is a way to reduce high bias; C (getting more test data) affects neither variance nor bias.
(4)You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier? (Check all that apply)
[A]Increase the regularization parameter lambda.
[B]Decrease the regularization parameter lambda.
[C]Get more training data.
[D]Use a bigger neural network.
Answer: A, C
Explanation: The training error (0.5%) is far below the dev error (7%), so the classifier suffers from high variance; stronger regularization or more training data are the promising fixes. A small diagnostic sketch follows.
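A rough illustration of that diagnosis in Python, assuming the quiz's error rates and, for simplicity, a Bayes error near 0%; the function name `diagnose` is made up for this example.

```python
def diagnose(train_err, dev_err, bayes_err=0.0):
    """Rough bias/variance read-out from error rates (illustrative heuristic)."""
    avoidable_bias = train_err - bayes_err   # gap to the best achievable error
    variance = dev_err - train_err           # generalization gap from train to dev
    return avoidable_bias, variance

# Quiz setting: 0.5% training error, 7% dev error.
bias, var = diagnose(0.005, 0.07)
print(f"avoidable bias ~ {bias:.1%}, variance ~ {var:.1%}")
# avoidable bias ~ 0.5%, variance ~ 6.5%  -> high variance: regularize or add training data
```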
(5)What is weight decay?
[A]A technique to avoid vanishing gradient by imposing a ceiling on the values of the weights.
[B]A regularization technique (such as L2 regularization) that results in gradient descent shrinking the weights on every iteration.
[C]The process of gradually decreasing the learning rate during training.
[D]Gradual corruption of the weights in the neural network if it is trained on noisy data.
Answer: B
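To see why L2 regularization is called weight decay, here is a minimal NumPy sketch of a single gradient-descent step; `W`, `dW`, `alpha`, `lambd`, and `m` are placeholder names for one layer's weights, its data-loss gradient, the learning rate, the regularization strength, and the number of training examples.

```python
import numpy as np

def l2_update(W, dW, alpha=0.01, lambd=0.7, m=1000):
    """One gradient-descent step with L2 regularization (illustrative)."""
    dW_total = dW + (lambd / m) * W     # add the gradient of (lambda/2m)*||W||^2
    return W - alpha * dW_total         # == (1 - alpha*lambd/m) * W - alpha * dW

W = np.array([[1.0, -2.0],
              [0.5,  3.0]])
dW = np.zeros_like(W)                   # even with a zero data gradient...
print(l2_update(W, dW))                 # ...each step multiplies W by (1 - alpha*lambd/m),
                                        # i.e. the weights "decay" a little every iteration
```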
(6)What happens when you increase the regularization hyperparameter lambda?
[A]Weights are pushed toward becoming smaller (closer to 0)
[B]Weights are pushed toward becoming bigger (further from 0)
[C]Doubling lambda should roughly result in doubling the weights.
[D]Gradient descent taking bigger steps with each iteration (proportional to lambda)
Answer: A
Explanation: Increasing λ increases the regularization term $\frac{\lambda}{2 m}\|\omega\|_{2}^{2}$ in the cost function
$$J(\omega, b)=\frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right)+\frac{\lambda}{2 m}\|\omega\|_{2}^{2}.$$
Since gradient descent drives $J(\omega, b)$ as low as possible, the weights $\omega$ are pushed toward smaller values (closer to 0) as training proceeds.
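A minimal NumPy sketch of that regularized cost, assuming a binary cross-entropy data loss; the function names and toy numbers are made up for illustration.

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Binary cross-entropy data loss, averaged over the m examples."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost_with_l2(y_hat, y, w, lambd):
    """J(w, b) = data loss + (lambda / 2m) * ||w||_2^2, matching the formula above."""
    m = y.shape[0]
    return cross_entropy(y_hat, y) + (lambd / (2 * m)) * np.sum(w ** 2)

# Same predictions and weights, two values of lambda: a larger lambda makes
# big weights more expensive, so training pushes them toward zero.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.8, 0.7])
w = np.array([1.5, -2.0, 0.5])
print(cost_with_l2(y_hat, y, w, lambd=0.1))
print(cost_with_l2(y_hat, y, w, lambd=10.0))
```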
(7)With the inverted dropout technique, at test time:
[A] You apply dropout (randomly eliminating units) and do not keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[B] You do not apply dropout (do not randomly eliminate units) and do not keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[C] You do not apply dropout (do not randomly eliminate units), but keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
[D] You apply dropout (randomly eliminating units), but keep the $\frac{1}{keep\_prob}$ factor in the calculations used in training.
Answer: B
Key phrase: "at test time"
Explanation: All units are used at test time; randomly dropping units there would make predictions noisy and unstable. Because the activations are already divided by keep_prob during training (which keeps their expected value unchanged), no extra scaling is needed at test time. A minimal sketch follows.
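A minimal NumPy sketch of inverted dropout for one layer's activations; the function name `dropout_forward` and the array shapes are illustrative, not taken from the course code.

```python
import numpy as np

def dropout_forward(a, keep_prob, train=True):
    """Inverted dropout on one layer's activations `a` (illustrative sketch)."""
    if not train:
        return a                                    # test time: all units, no scaling
    mask = np.random.rand(*a.shape) < keep_prob     # keep each unit with prob keep_prob
    return (a * mask) / keep_prob                   # scale up so E[a] stays unchanged

a = np.random.randn(4, 5)
a_train = dropout_forward(a, keep_prob=0.8, train=True)   # some units zeroed, rest scaled by 1/0.8
a_test = dropout_forward(a, keep_prob=0.8, train=False)   # identical to a
```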
(8)Increasing the parameter keep_prob from 0.5 to 0.6 will likely cause the following: (Check the two that apply)
[A]Increasing the regularization effect.
[B]Reducing the regularization effect.
[C]Causing the neural network to end up with a higher training set error.
[D]Causing the neural network to end up with a lower training set error.
Answer: B, D
Explanation: Raising keep_prob from 0.5 to 0.6 means fewer units are dropped, which weakens the regularization effect and lets the network fit the training set better, so the training error goes down.
(9)Which of these techniques are useful for reducing variance (reducing overfitting)? (Check all that apply.)
[A]Xavier initialization
[B]Gradient Checking
[C]Exploding gradient
[D]Vanishing gradient
[E]Dropout
[F]L2 regularization
[G]Data augmentation
Answer: E, F, G
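As one concrete example of option G, here is a minimal NumPy sketch of data augmentation by horizontal flipping; the array shapes and the three-class labels are made up for illustration.

```python
import numpy as np

def augment_with_flips(X, y):
    """Double the training set with horizontally flipped copies (illustrative)."""
    X_flipped = X[:, :, ::-1, :]                # flip every image along its width axis
    return np.concatenate([X, X_flipped]), np.concatenate([y, y])

# Toy batch of images shaped (m, height, width, channels) with 3 class labels.
X = np.random.rand(10, 32, 32, 3)
y = np.random.randint(0, 3, size=10)
X_aug, y_aug = augment_with_flips(X, y)
print(X_aug.shape, y_aug.shape)                 # (20, 32, 32, 3) (20,)
```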
(10)Why do we normalize the inputs x?
[A]Normalization is another word for regularization–It helps to reduce variance
[B]It makes it easier to visualize the data.
[C]It makes the parameter initialization faster.
[D]It makes the cost function faster to optimize.
Answer: D
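A minimal NumPy sketch of input normalization, assuming the mean and standard deviation are computed on the training set and then reused for dev/test data; the feature scales here are illustrative.

```python
import numpy as np

def fit_normalizer(X_train):
    """Compute per-feature mean and std on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8          # guard against zero variance
    return mu, sigma

# Toy features on wildly different scales.
X_train = np.random.rand(1000, 5) * np.array([1, 10, 100, 1000, 10000])
mu, sigma = fit_normalizer(X_train)
X_norm = (X_train - mu) / sigma                 # comparable scales -> rounder cost contours,
                                                # so gradient descent converges faster
```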