[coursera/dl&nn/week3] Shallow Neural Network (summary & questions)

I recommend an app named "Grammarly".

3.1 Neural Network Overview

for iteration in range(ntimes):
    # forward pass: x has shape (n_x, m), one training example per column
    z = np.dot(w.T, x) + b
    a = 1/(1 + np.exp(-z))          # sigmoid activation
    # backward pass: gradients of the cross-entropy cost
    dz = a - y
    dw = 1/m * np.dot(x, dz.T)
    db = 1/m * np.sum(dz)
    # gradient descent update
    w = w - learning_rate * dw
    b = b - learning_rate * db

 

3.2 Neural Network Representation

Input layer: the input features x, also written a[0].

Hidden layer: its values are not observed in the training data; its activations are a[1].

Output layer: produces the prediction y-hat = a[2].

(The input layer is not counted, so this network is called a 2-layer neural network.)


3.3 Compute a Neural Network's Output

For the first hidden layer (a single example x = a[0]):

z[1] = W[1] x + b[1],   a[1] = g[1](z[1])

For each layer l in general:

z[l] = W[l] a[l-1] + b[l],   a[l] = g[l](z[l])
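
A minimal NumPy sketch of these two formulas for one example; the layer sizes (3 inputs, 4 hidden units, 1 output) and the sigmoid hidden activation are placeholder assumptions, not fixed by the lecture:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_1, n_2 = 3, 4, 1                     # assumed layer sizes
x = np.random.randn(n_x, 1)                 # one example, as a column vector

W1, b1 = np.random.randn(n_1, n_x) * 0.01, np.zeros((n_1, 1))
W2, b2 = np.random.randn(n_2, n_1) * 0.01, np.zeros((n_2, 1))

# layer 1: z[1] = W[1] x + b[1], a[1] = g(z[1])
z1 = np.dot(W1, x) + b1                     # shape (n_1, 1)
a1 = sigmoid(z1)
# layer 2 (output): z[2] = W[2] a[1] + b[2], a[2] = g(z[2])
z2 = np.dot(W2, a1) + b2                    # shape (n_2, 1)
a2 = sigmoid(z2)                            # the prediction y-hat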


3.4 Vectorizing across multiple examples

Stack the m training examples as columns of a matrix X = A0, so the whole hidden layer (and the output layer) is computed with one matrix product per layer.

# layer 1 (hidden layer); A0 = X holds one training example per column
Z1 = np.dot(W1, A0) + b1     # W1 has shape (n1, n_x), so Z1 has shape (n1, m)
A1 = sigmoid(Z1)
# layer 2 (output layer)
Z2 = np.dot(W2, A1) + b2     # W2 has shape (n2, n1)
A2 = sigmoid(Z2)


3.5 Explanation for Vectorized Implementation

Stacking the per-example column vectors z(i) and a(i) for i = 1, ..., m side by side gives exactly the matrices Z and A computed above, so the vectorized code does the same work as a loop over examples.

The superscript (i) refers to the ith training example (while the bracket [l] refers to layer l).
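
A small numerical check of this claim (the sizes below are arbitrary placeholders): computing z(i) = W1 x(i) + b1 example by example reproduces the columns of Z1 = W1 X + b1.

import numpy as np

np.random.seed(0)
n_x, n_1, m = 3, 4, 5                        # assumed sizes
X = np.random.randn(n_x, m)                  # columns are x(1), ..., x(m)
W1 = np.random.randn(n_1, n_x)
b1 = np.random.randn(n_1, 1)

Z1 = np.dot(W1, X) + b1                      # vectorized: all m examples at once

# loop over examples: column i of Z1 equals W1 x(i) + b1
Z1_loop = np.hstack([np.dot(W1, X[:, i:i+1]) + b1 for i in range(m)])
print(np.allclose(Z1, Z1_loop))              # prints True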


3.6 Activation functions

sigmoid: a = 1/(1 + exp(-z))
tanh: a = (exp(z) - exp(-z))/(exp(z) + exp(-z))
ReLU: a = max(0 , z)
Leaky ReLU: a = max(0.01*z , z)
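
The same four activations written as small NumPy functions (a minimal sketch; the function names are mine, and np.tanh is used directly for tanh):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)                # equals (exp(z) - exp(-z)) / (exp(z) + exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z)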



3.7 Why we need a non-linear activation function

If we use a linear (identity) activation function, the whole network still represents a linear model, no matter how many layers it has: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is just another linear function of x. So the hidden layer adds nothing, and it is the same as not using a neural network at all.

The one common exception is the output layer: when the output y is a real-valued number (a regression problem), a linear activation may be used at the last layer, while the hidden layers still need a non-linear activation.
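
A quick numerical check of the identity above (the sizes are placeholders): a two-layer network with identity activations equals a single linear layer with weights W2 W1 and bias W2 b1 + b2.

import numpy as np

np.random.seed(1)
n_x, n_1, n_2, m = 3, 4, 2, 5                # assumed sizes
X = np.random.randn(n_x, m)
W1, b1 = np.random.randn(n_1, n_x), np.random.randn(n_1, 1)
W2, b2 = np.random.randn(n_2, n_1), np.random.randn(n_2, 1)

# two "layers" with identity (linear) activation
A2 = np.dot(W2, np.dot(W1, X) + b1) + b2
# one equivalent linear layer
A2_single = np.dot(np.dot(W2, W1), X) + (np.dot(W2, b1) + b2)

print(np.allclose(A2, A2_single))            # prints True: no extra expressive power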


3.8 Derivatives of activation functions

derivative of sigmoid: g'(z) = a(1 - a)

derivative of tanh: g'(z) = 1 - a^2

derivative of ReLU: g'(z) = 0 if z < 0, 1 if z > 0 (undefined at z = 0; in practice either 0 or 1 is used there)

derivative of Leaky ReLU: g'(z) = 0.01 if z < 0, 1 if z > 0
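
The same derivatives as NumPy functions, written in terms of the activation value where possible (a minimal sketch; the choice at z = 0 is an arbitrary convention):

import numpy as np

def sigmoid_prime(z):
    a = 1 / (1 + np.exp(-z))
    return a * (1 - a)

def tanh_prime(z):
    a = np.tanh(z)
    return 1 - a**2

def relu_prime(z):
    return (z > 0).astype(float)

def leaky_relu_prime(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)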


3.9 Gradient descent for Neural Networks & 3.10 bp intuition

A few formulas are enough to understand what forward propagation and backpropagation do in a 2-layer network.

Logistic regression at the output layer: y-hat = a[2] = sigmoid(z[2]).

Loss function: L(a[2], y) = -(y*log(a[2]) + (1-y)*log(1-a[2])), and the cost J is the average of L over the m training examples.

Forward propagation: compute Z1, A1, Z2, A2 layer by layer, exactly as in section 3.4.

Backpropagation: starting from dZ2 = A2 - Y, apply the chain rule backwards through the network to get dW2, db2, dZ1, dW1, db1, then do the gradient descent update (see the sketch below).

Intuition: backpropagation is the chain rule applied layer by layer, walking the forward computation in reverse; each dZ, dW, db is the derivative of the cost with respect to that quantity.
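
A minimal NumPy sketch of the full gradient-descent loop for this 2-layer network, using a tanh hidden layer and a sigmoid output as in the lecture; the layer sizes, the random data and the hyperparameters are placeholders of mine:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(2)
n_x, n_1, n_2, m = 3, 4, 1, 200              # assumed layer sizes and number of examples
X = np.random.randn(n_x, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)
learning_rate, ntimes = 0.5, 1000

# random initialization (see the next section)
W1, b1 = np.random.randn(n_1, n_x) * 0.01, np.zeros((n_1, 1))
W2, b2 = np.random.randn(n_2, n_1) * 0.01, np.zeros((n_2, 1))

for iteration in range(ntimes):
    # forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    # backpropagation
    dZ2 = A2 - Y
    dW2 = 1/m * np.dot(dZ2, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1**2)    # (1 - A1**2) is tanh'(Z1)
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)

    # gradient descent update
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2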


3.11 Random Initialization

Initialize the weights to small random values (e.g. multiply np.random.randn by 0.01), so that tanh/sigmoid units do not start out in their flat, saturated regions, which would slow down learning.

Symmetry breaking: do not initialize W to all zeros, otherwise every hidden unit computes the same function and stays identical after every update; b may be initialized to zero.
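
For example, a minimal initialization sketch (the layer sizes n_x, n_1, n_2 are placeholders):

import numpy as np

n_x, n_1, n_2 = 3, 4, 1                      # assumed layer sizes

W1 = np.random.randn(n_1, n_x) * 0.01        # small random values break symmetry
b1 = np.zeros((n_1, 1))                      # biases can start at zero
W2 = np.random.randn(n_2, n_1) * 0.01
b2 = np.zeros((n_2, 1))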


Questions:

1. Which of the following are true? (Check all that apply.)

a[2](12) denotes the activation vector of the 2nd layer for the 12th training example.

X is a matrix in which each column is one training example.

a[2] denotes the activation vector of the 2nd layer.

a[2]_4 is the activation output by the 4th neuron of the 2nd layer.


2. The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?

True

Correct 

Yes. As seen in lecture, the output of tanh is between -1 and 1; it thus centers the data, which makes learning simpler for the next layer.


3. Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?

Z[l] = W[l] A[l-1] + b[l],   A[l] = g[l](Z[l])

Correct


4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?

ReLU

Leaky ReLU

tanh

sigmoid

Correct

Yes. Sigmoid outputs a value between 0 and 1, which is the probability needed for a binary (y = 0 or 1) classifier; the other three activations do not.


8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?

This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.

It doesn’t matter. So long as you initialize the weights randomly, gradient descent is not affected by whether the weights are large or small.

This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.

This will cause the inputs of the tanh to also be very large, thus causing the gradients to be close to zero. The optimization algorithm will thus become slow.

Correct

Yes. For large inputs tanh saturates (its output is close to -1 or 1), so its gradient is close to zero and gradient descent becomes very slow.
