Neural Networks and Deep Learning, Week 3: Shallow Neural Networks

This post covers the importance of vectorized representations in deep learning, explains why the tanh activation often works better than sigmoid, and walks through forward propagation. Topics include matrix operations, the effect of weight initialization, choosing a suitable activation function for binary classification, and the shapes of each layer's outputs. It also discusses why initializing all weights to zero prevents the model from learning and how symmetry is broken.

Reading other people's answers violates the Coursera honor code; understanding a solution and getting it right yourself are two different things.

Which of the following are true? (Check all that apply.)

  • a^{[2](12)} denotes the activation vector of the 12^{th} layer on the 2^{nd} training example.
  • a^{[2](12)} denotes the activation vector of the 2^{nd} layer for the 12^{th} training example.
  • X is a matrix in which each column is one training example.
  • a^{[2]}_4 is the activation output by the 4^{th} neuron of the 2^{nd} layer.
  • X is a matrix in which each row is one training example.
  • a^{[2]}_4 is the activation output of the 2^{nd} layer for the 4^{th} training example.
  • a^{[2]} denotes the activation vector of the 2^{nd} layer.

In a^{[x](y)}_z: a is the activation, x is the layer number, y is the index of the training example, and z is the index of the neuron within that layer.

The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

True

Clearly: tanh has range (-1, 1) while sigmoid has range (0, 1), so tanh outputs are roughly zero-centered; a quick numerical check is sketched below.
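A minimal sketch (not from the quiz): for zero-mean inputs, tanh outputs average near 0 while sigmoid outputs average near 0.5.

import numpy as np

z = np.random.randn(100000)
sigmoid = 1 / (1 + np.exp(-z))
print(np.tanh(z).mean())   # close to 0: tanh maps into (-1, 1), roughly zero-centered
print(sigmoid.mean())      # close to 0.5: sigmoid maps into (0, 1)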

Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 \leq l \leq L?

  • Z^{[l]} = W^{[l-1]} A^{[l]} + b^{[l-1]}, A^{[l]} = g^{[l]}(Z^{[l]})
  • Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}, A^{[l+1]} = g^{[l+1]}(Z^{[l]})
  • Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, A^{[l]} = g^{[l]}(Z^{[l]})
  • Z^{[l]} = W^{[l]} A^{[l]} + b^{[l]}, A^{[l+1]} = g^{[l]}(Z^{[l]})

Keep each layer's quantities straight: Z^{[l]}, W^{[l]}, and b^{[l]} all belong to the current layer, while the multiplication uses the previous layer's output A^{[l-1]}. In words: current layer's pre-activation = current layer's weights × previous layer's output + current layer's bias, i.e. Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, followed by A^{[l]} = g^{[l]}(Z^{[l]}). A small numpy sketch follows.
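A minimal sketch of the correct option, assuming a tanh activation just for illustration:

import numpy as np

def forward_layer(A_prev, W, b, g=np.tanh):
    # Z[l] = W[l] A[l-1] + b[l]; A[l] = g[l](Z[l])
    Z = W @ A_prev + b
    A = g(Z)
    return Z, A

A_prev = np.random.randn(3, 5)        # previous layer: 3 units, 5 examples
W = np.random.randn(4, 3) * 0.01      # current layer: 4 units
b = np.zeros((4, 1))
Z, A = forward_layer(A_prev, W, b)
print(Z.shape, A.shape)               # (4, 5) (4, 5)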

You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

  • ReLU
  • Leaky ReLU
  • sigmoid
  • tanh

For binary classification, use sigmoid on the output layer; a short sketch follows.
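A minimal sketch of a sigmoid output layer (the pre-activation values here are made up): the output lies in (0, 1), can be read as P(y = 1 | x), and is thresholded at 0.5.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z_out = np.array([[-2.0, 0.3, 4.1]])      # hypothetical final-layer pre-activations for 3 examples
y_hat = sigmoid(z_out)                    # probabilities of the positive class (cucumber)
predictions = (y_hat > 0.5).astype(int)   # 0 = watermelon, 1 = cucumber
print(y_hat, predictions)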

Consider the following code. What will B.shape be? (If you’re not sure, feel free to run this in Python to find out.)

import numpy as np

A = np.random.randn(4, 3)               # 4 x 3 matrix of standard-normal samples
B = np.sum(A, axis=1, keepdims=True)    # sum along axis=1 (across each row); keepdims keeps the reduced axis

  • (1, 3)
  • (4, 1)
  • (, 3)
  • (4, )

Rank-1 shapes like (3,) and (4,) crop up easily and their size is ambiguous, so check them carefully when you use them. Usually nothing goes wrong, but occasionally they cause errors: either a dimension mismatch, or code that runs but silently produces the wrong result.

For axis: axis=0 sums down each column, axis=1 sums across each row. A quick shape check is sketched below.
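A minimal sketch of the shapes involved:

import numpy as np

A = np.random.randn(4, 3)
print(np.sum(A, axis=1, keepdims=True).shape)   # (4, 1): a proper column vector
print(np.sum(A, axis=1).shape)                  # (4,):  rank-1 array, easy to misuse in broadcasting
print(np.sum(A, axis=0, keepdims=True).shape)   # (1, 3): one sum per column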

Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?

  • Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
  • Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
  • Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.
  • The first hidden layer’s neurons will perform different computations from each other even in the first iteration; their parameters will thus keep evolving in their own way.

This is the symmetry problem: with the same inputs and the same computation, there is nothing to make the neurons diverge, which is why the weights must be initialized randomly; see the sketch below.
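A minimal sketch of the symmetry problem (hypothetical 2-input, 4-unit hidden layer): with zero weights every hidden unit outputs the same thing; with small random weights they differ.

import numpy as np

np.random.seed(1)
X = np.random.randn(2, 10)                   # 2 features, 10 examples

A1_zero = np.tanh(np.zeros((4, 2)) @ X)      # all four rows are identical
print(np.allclose(A1_zero, A1_zero[0]))      # True: every hidden unit computes the same thing

W1 = np.random.randn(4, 2) * 0.01
A1_rand = np.tanh(W1 @ X)
print(np.allclose(A1_rand, A1_rand[0]))      # False: random init breaks the symmetry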

Logistic regression’s weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to “break symmetry”, True/False?

False

Logistic regression has no hidden layer. If the weights are initialized to zero, the first example x fed into logistic regression will produce an output of zero, but the derivatives of logistic regression depend on the input x (because there is no hidden layer), and x is not zero. So at the second iteration the weight values follow the distribution of x and differ from each other, as long as x is not a constant vector. A sketch of this argument follows.
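A minimal sketch with made-up data: even with w = 0, the gradient dw = (1/m) X (A - Y)^T depends on X, so the weights move and become distinct after a single step.

import numpy as np

np.random.seed(0)
m = 5
X = np.random.randn(2, m)                      # 2 features, m examples
Y = (np.random.rand(1, m) > 0.5).astype(float)

w = np.zeros((2, 1))
b = 0.0
A = 1 / (1 + np.exp(-(w.T @ X + b)))           # all 0.5 on the first iteration
dw = (1 / m) * X @ (A - Y).T                   # nonzero as long as X is not a constant vector
print(dw.ravel())                              # the two weights already differ after one update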

You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

  • This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set \alpha to be very small to prevent divergence; this will slow down learning.
  • It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
  • This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.

To finish training faster you of course want a larger slope, but for tanh the steep region is in the middle (near zero), so the pre-activations need to stay small. That is why the weights are scaled by a small value like 0.01, not 1000; see the sketch below.
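A minimal sketch of the saturation effect (hypothetical 2-input layer): with weights scaled by 1000 the tanh derivative 1 - tanh(z)^2 is essentially zero, while a 0.01 scale keeps it near 1.

import numpy as np

np.random.seed(2)
X = np.random.randn(2, 100)

Z_large = (np.random.randn(4, 2) * 1000) @ X
print((1 - np.tanh(Z_large) ** 2).mean())     # ~0: tanh is saturated, gradients vanish

Z_small = (np.random.randn(4, 2) * 0.01) @ X
print((1 - np.tanh(Z_small) ** 2).mean())     # ~1: pre-activations stay in tanh's steep region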

Consider the following 1 hidden layer neural network:

(Figure omitted: input features x_1 and x_2, a hidden layer of 4 units, and a single output unit.)

Which of the following statements are True? (Check all that apply).

  • W^{[1]} will have shape (2, 4)
  • b^{[1]} will have shape (4, 1)
  • W^{[1]} will have shape (4, 2)
  • b^{[1]} will have shape (2, 1)
  • W^{[2]} will have shape (1, 4)
  • b^{[2]} will have shape (4, 1)
  • W^{[2]} will have shape (4, 1)
  • b^{[2]} will have shape (1, 1)

Same as the first question: keep the meaning of each index straight (in a^{[x](y)}_z, x is the layer, y is the training example, z is the neuron within the layer), and remember Z = WX + b.

The shapes are determined by the number of units in the previous layer and the current layer: W^{[l]} is (n^{[l]}, n^{[l-1]}) and b^{[l]} is (n^{[l]}, 1). Note that to make the code run faster we vectorize the parameters and stack them; if the shape relationships are unclear, draw the network. A shape sketch for this network follows.
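A minimal sketch of the parameter shapes for this network (2 inputs, 4 hidden units, 1 output, as in the figure):

import numpy as np

n_x, n_h, n_y = 2, 4, 1
W1 = np.random.randn(n_h, n_x) * 0.01   # (4, 2)
b1 = np.zeros((n_h, 1))                 # (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01   # (1, 4)
b2 = np.zeros((n_y, 1))                 # (1, 1)
print(W1.shape, b1.shape, W2.shape, b2.shape)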

In the same network as the previous question, what are the dimensions of Z^{[1]} and A^{[1]}?

  • Z^{[1]} and A^{[1]} are (4,m)
  • Z^{[1]} and A^{[1]} are (4,1)
  • Z^{[1]} and A^{[1]} are (4,2)
  • Z^{[1]} and A^{[1]} are (1,4)

Z^{[1]} = W^{[1]} X + b^{[1]} and A^{[1]} = g^{[1]}(Z^{[1]}); with the m training examples stacked as the columns of X, both Z^{[1]} and A^{[1]} are (4, m). See the sketch below.
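A minimal sketch (m = 7 examples is made up): with the parameter shapes above, both Z^{[1]} and A^{[1]} come out as (4, m).

import numpy as np

m = 7
X = np.random.randn(2, m)               # 2 input features, m examples stacked as columns
W1 = np.random.randn(4, 2) * 0.01
b1 = np.zeros((4, 1))
Z1 = W1 @ X + b1                        # broadcasting adds b1 to every column
A1 = np.tanh(Z1)
print(Z1.shape, A1.shape)               # (4, 7) (4, 7)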

 
