Andrew Ng's Deep Learning Course: A Collection of Missed Quiz Questions

Latest recommended article published on 2022-03-12 06:40:59

js_sjtu

Category: Deep Learning. Tags: deep learning, Andrew Ng

Article link: https://blog.csdn.net/js_sjtu/article/details/78985108

1. Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”. True/False?

False. Logistic regression has no hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression gives a pre-activation z of zero (so the output is 0.5), but the derivatives depend on the input x (because there is no hidden layer), which is not zero. So by the second iteration, the weight values follow x's distribution and differ from each other unless x is a constant vector.
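A minimal sketch of this argument in plain Python. The two-example dataset, the learning rate, and the function names are made up for illustration and are not part of the quiz. Starting from all-zero weights, a single gradient step already yields weights that differ from each other, because each gradient component is scaled by the corresponding input feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w, b, X, y, lr=0.1):
    """One batch gradient-descent step for logistic regression (cross-entropy loss)."""
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for x_i, y_i in zip(X, y):
        a = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        for j, x_j in enumerate(x_i):
            dw[j] += (a - y_i) * x_j / m   # dL/dw_j is scaled by the input feature x_j
        db += (a - y_i) / m
    w = [w_j - lr * dw_j for w_j, dw_j in zip(w, dw)]
    return w, b - lr * db

# Toy data (illustrative assumption): two features, two examples.
X = [[1.0, 2.0], [3.0, -1.0]]
y = [1, 0]

# Start from all-zero parameters and take one step.
w, b = gradient_step([0.0, 0.0], 0.0, X, y)
print(w)  # the two weights are no longer equal
```

With zero weights every example gets a prediction of 0.5, yet the update to each weight is a different weighted sum of the inputs, so no symmetry needs to be broken.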

2. If you have 10,000,000 examples, how would you split the train/dev/test set?

98% train, 1% dev, 1% test
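As a quick arithmetic check, the sketch below (the helper name `split_sizes` is hypothetical) computes how many examples each split receives. Even 1% of 10,000,000 is 100,000 examples, which is why such small dev and test fractions are enough at this scale.

```python
def split_sizes(n_examples, percentages):
    """Return the number of examples per split, given whole-number percentages."""
    assert sum(percentages.values()) == 100
    return {name: n_examples * pct // 100 for name, pct in percentages.items()}

sizes = split_sizes(10_000_000, {"train": 98, "dev": 1, "test": 1})
print(sizes)
```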

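3. In an exponentially weighted average plot, decreasing beta shifts the curve to the left.

My own explanation: the higher beta is, the more each point's value gets averaged into the points that follow, so the curve lags behind the data. Decreasing beta lets later values "catch up" more quickly, which visually shifts the curve to the left.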
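The lag can be checked numerically. This sketch assumes the usual recurrence v_t = beta * v_{t-1} + (1 - beta) * x_t with v_0 = 0 and no bias correction. The step-shaped signal and the helper names are illustrative assumptions. It counts how many steps the average needs to reach 90% of a step input for two values of beta.

```python
def ewa(values, beta):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * x_t, v_0 = 0."""
    v, out = 0.0, []
    for x in values:
        v = beta * v + (1.0 - beta) * x
        out.append(v)
    return out

def steps_to_reach(values, beta, level):
    """First time step (1-indexed) at which the average reaches `level`, or None."""
    for t, v in enumerate(ewa(values, beta), start=1):
        if v >= level:
            return t
    return None

# Hypothetical signal: jumps from 0 to 1 and stays there.
signal = [1.0] * 200

fast = steps_to_reach(signal, beta=0.5, level=0.9)
slow = steps_to_reach(signal, beta=0.9, level=0.9)
print(fast, slow)  # the smaller beta reaches the level in fewer steps
```

A smaller beta responds sooner, so the averaged curve sits further to the left. The trade-off is that it also smooths less.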

4. After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm. You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?

False

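5. A member of the City Council who knows little about machine learning thinks you should add the 1,000,000 citizens’ data images to the test set. You object because: (B, C)

A. A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.

B. This would cause the dev and test set distributions to become different. This is a bad idea because you're not aiming where you want to hit.

C. The test set no longer reflects the distribution of data (security cameras) you most care about.

D. The 1,000,000 citizens’ data images do not have a consistent x → y mapping like the rest of the data (similar to the New York City/Detroit housing prices example from lecture).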

6. Because pooling layers do not have parameters, they do not affect the backpropagation (derivatives) calculation.

False
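Pooling layers have no parameters to update, but they still sit on the backpropagation path and determine where gradients flow. A minimal sketch for a single 2x2 max-pooling window follows. The function name and the numbers are illustrative assumptions. The upstream gradient is routed only to the input position that held the maximum.

```python
def maxpool2x2_backward(window, upstream_grad):
    """Gradient of one 2x2 max-pool window with respect to its four inputs."""
    flat = [v for row in window for v in row]
    winner = flat.index(max(flat))      # position that produced the forward output
    grad = [0.0] * 4
    grad[winner] = upstream_grad        # only the max element receives gradient
    return [grad[:2], grad[2:]]

window = [[1.0, 3.0],
          [2.0, 0.5]]
grad_in = maxpool2x2_backward(window, upstream_grad=2.0)
print(grad_in)
```

Layers that come before the pooling layer therefore receive gradients that depend on the pooling operation, even though the pooling layer itself learns nothing.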

7. You have an input volume that is 32x32x16, and apply max pooling with a stride of 2 and a filter size of 2. What is the output volume?

16x16x16

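Pooling is always applied over the two spatial dimensions (height and width), so the number of channels is unchanged.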
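The output size follows from the standard formula floor((n - f) / s) + 1 applied to height and width, with no padding. A short sketch (the helper name `pool_output_shape` is hypothetical):

```python
def pool_output_shape(height, width, channels, f, stride):
    """Output shape of a pooling layer with an f x f window and no padding."""
    out_h = (height - f) // stride + 1
    out_w = (width - f) // stride + 1
    return out_h, out_w, channels   # pooling acts per channel, so channels are kept

shape = pool_output_shape(32, 32, 16, f=2, stride=2)
print(shape)
```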

8. Which ones of the following statements on Residual Networks are true? (Check all that apply.)

Using a skip-connection helps the gradient to backpropagate and thus helps you to train deeper networks

A ResNet with L layers would have on the order of L^2 skip connections in total.

The skip-connections compute a complex non-linear function of the input to pass to a deeper layer in the network.

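The skip-connection makes it easy for the network to learn an identity mapping between the input and the output within the ResNet block.

Answer: the first and fourth statements are true. The number of skip connections grows on the order of L, not L^2, and a skip connection passes the input through unchanged rather than computing a complex non-linear function of it.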
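The identity-mapping statement can be illustrated with a simplified residual block, a_out = relu(W a + b + a). This is a one-layer sketch under assumed names and shapes, not the full two-layer block from the lectures. When the weights and biases are zero, the block returns its non-negative input unchanged.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(a, W, b):
    """a_out = relu(W a + b + a): one linear layer plus an identity skip connection."""
    z = [sum(w * x for w, x in zip(row, a)) + b_k for row, b_k in zip(W, b)]
    return relu([z_k + a_k for z_k, a_k in zip(z, a)])

a = [0.5, 2.0, 0.0]                        # non-negative, as after a previous ReLU
W_zero = [[0.0] * 3 for _ in range(3)]     # weights driven to zero
out = residual_block(a, W_zero, [0.0] * 3)
print(out)
```

Because the identity is reached simply by shrinking the weights toward zero, adding the block is unlikely to hurt performance.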

9. You are working on a factory automation task. Your system will see a can of soft-drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft drink can always appears as the same size in the image. There is at most one soft drink can in each image. Here are some typical images in your training set:

What is the most appropriate set of output units for your neural network?

Logistic unit, bx, by

10. Alice proposes to simplify the GRU by always removing the Γu, i.e., setting Γu = 1. Betty proposes to simplify the GRU by removing the Γr, i.e., setting Γr = 1 always. Which of these models is more likely to work without vanishing gradient problems even when trained on very long input sequences?

Betty’s model (removing Γr), because if Γu≈0 for a timestep, the gradient can propagate back through that timestep without much decay.

Yes. For the signal to backpropagate without vanishing, we need c<t> to be highly dependent on c<t−1>.
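The GRU memory update is c<t> = Γu * c̃<t> + (1 − Γu) * c<t−1>, so the direct contribution of c<t−1> to c<t> is scaled by (1 − Γu). The sketch below multiplies that factor over many time steps. As a simplifying assumption it uses a constant scalar gate and ignores the indirect paths through the candidate c̃<t> and through the gates themselves. The function name is hypothetical.

```python
def direct_carry(gamma_u, steps):
    """Product of the (1 - gamma_u) factors along the direct c<t-1> -> c<t> path."""
    factor = 1.0
    for _ in range(steps):
        factor *= (1.0 - gamma_u)
    return factor

betty = direct_carry(gamma_u=0.001, steps=100)  # update gate free to stay near 0
alice = direct_carry(gamma_u=1.0, steps=100)    # update gate fixed at 1
print(betty, alice)
```

With Γu near 0 most of the memory survives 100 steps. With Γu fixed at 1 the direct path is cut at every step, which is why Alice's simplification is the one more exposed to vanishing gradients.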

11. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$$\min \sum_{i=1}^{10,000} \sum_{j=1}^{10,000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b'_j - \log X_{ij} \right)^2$$

Which of these statements are correct? Check all that apply.

θ_i and e_j should be initialized randomly at the beginning of training.
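One way to see why random initialization matters here: in the objective, the gradient with respect to θ_i is proportional to e_j, and the gradient with respect to e_j is proportional to θ_i. If both start at zero, both gradients are zero and the vectors never move. Below is a minimal sketch for a single (i, j) term, using 3-dimensional vectors and a constant weight in place of f(X_ij). The dimensions, values, and function name are illustrative assumptions.

```python
import math

def glove_pair_grads(theta_i, e_j, b_i, b_j, x_ij, weight=1.0):
    """Gradients of weight * (theta_i . e_j + b_i + b_j - log x_ij)^2
    with respect to theta_i and e_j, for one (i, j) pair."""
    err = sum(t * e for t, e in zip(theta_i, e_j)) + b_i + b_j - math.log(x_ij)
    grad_theta = [2.0 * weight * err * e for e in e_j]      # proportional to e_j
    grad_e = [2.0 * weight * err * t for t in theta_i]      # proportional to theta_i
    return grad_theta, grad_e

zero = [0.0, 0.0, 0.0]
g_theta_zero, g_e_zero = glove_pair_grads(zero, zero, 0.0, 0.0, x_ij=5.0)

# Small non-zero starting values (stand-ins for a random initialization).
g_theta_rand, g_e_rand = glove_pair_grads([0.1, -0.2, 0.3], [0.05, 0.4, -0.1],
                                          0.0, 0.0, x_ij=5.0)
print(g_theta_zero, g_e_zero)
print(g_theta_rand, g_e_rand)
```

With zero initialization only the bias terms would receive a gradient, so the embeddings themselves would stay at zero.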
