3.1 Neural Network Overview
Review: one iteration loop of logistic regression (a single sigmoid neuron) in vectorized numpy, where x has shape (n_x, m) with one training example per column.
for iteration in range(ntimes):
    z = np.dot(w.T, x) + b            # linear part; w has shape (n_x, 1)
    a = 1 / (1 + np.exp(-z))          # sigmoid activation
    dz = a - y                        # dL/dz for the cross-entropy loss
    dw = 1/m * np.dot(x, dz.T)        # gradient w.r.t. the weights
    db = 1/m * np.sum(dz)             # gradient w.r.t. the bias
    # update the parameters
    w = w - learning_rate * dw
    b = b - learning_rate * db
3.2 Neural Network Representation
A network is drawn as an input layer, a hidden layer (whose values are not observed in the training set), and an output layer. By convention the input layer is not counted, so a network with one hidden layer and one output layer is called a 2-layer network.
3.3 Compute a Neural Network's Output
For the first hidden layer (with x = a[0]): z[1] = W[1]a[0] + b[1], a[1] = g[1](z[1]).
For each layer l: z[l] = W[l]a[l-1] + b[l], a[l] = g[l](z[l]), where g[l] is that layer's activation function.
3.4 Vectorizing across multiple examples
Stack the m training examples as columns of a matrix so each layer is computed for all examples at once.
# layer 1 (hidden layer): the input matrix X plays the role of A0, one example per column
Z1 = np.dot(W1, A0) + b1    # W1 has shape (n_1, n_x), so no transpose is needed (cf. W[1] in the quiz below)
A1 = sigmoid(Z1)
# layer 2 (the output layer in a network with one hidden layer)
Z2 = np.dot(W2, A1) + b2
A2 = sigmoid(Z2)
3.5 Explanation for Vectorized Implementation
Vectorizing z and a across all m examples: Z[1] = [z[1](1) z[1](2) ... z[1](m)] and A[1] = [a[1](1) a[1](2) ... a[1](m)], stacked column by column.
The superscript (i) refers to the i-th training example.
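A minimal numpy sketch of the same idea (the names X, W1, b1, W2, b2, sigmoid and the layer sizes below are illustrative assumptions): stacking the m examples as columns of X lets one matrix product replace the per-example loop.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
n_x, n_1, n_2, m = 3, 4, 1, 5          # 3 features, 4 hidden units, 1 output, 5 examples
X = np.random.randn(n_x, m)            # each column X[:, i] is one training example x(i)
W1 = np.random.randn(n_1, n_x) * 0.01
b1 = np.zeros((n_1, 1))
W2 = np.random.randn(n_2, n_1) * 0.01
b2 = np.zeros((n_2, 1))

# loop version: one column (example) at a time
A2_loop = np.zeros((n_2, m))
for i in range(m):
    z1 = np.dot(W1, X[:, i:i+1]) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(W2, a1) + b2
    A2_loop[:, i:i+1] = sigmoid(z2)

# vectorized version: all m examples at once; b1, b2 broadcast across columns
Z1 = np.dot(W1, X) + b1                # shape (n_1, m)
A1 = sigmoid(Z1)
Z2 = np.dot(W2, A1) + b2               # shape (n_2, m)
A2 = sigmoid(Z2)

print(np.allclose(A2, A2_loop))        # True: both give the same activations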
3.6 Activation functions
3.7 Why we need a non-linear activation function
If we use a linear (identity) function as the activation, the network still computes a linear function of its input no matter how many layers it has, so it is no more expressive than not using a neural network at all.
The one exception is a regression problem with a real-valued output: there a linear activation may be used in the output layer, while the hidden layers still need non-linear activations.
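A quick numerical check of this point (all names and sizes here are made up for illustration): with identity activations, two layers collapse into a single linear map.
import numpy as np

np.random.seed(1)
X = np.random.randn(3, 5)                       # 3 features, 5 examples
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)

# two "layers" with linear (identity) activation
A1 = np.dot(W1, X) + b1
A2 = np.dot(W2, A1) + b2

# the same function written as a single linear layer: W = W2 W1, b = W2 b1 + b2
W = np.dot(W2, W1)
b = np.dot(W2, b1) + b2
print(np.allclose(A2, np.dot(W, X) + b))        # True: the hidden layer added nothing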
3.8 Derivatives of activation functions
derivative of sigmoid: g'(z) = a(1 - a), where a = sigmoid(z)
derivative of tanh: g'(z) = 1 - a^2, where a = tanh(z)
derivative of ReLU: g'(z) = 0 for z < 0 and 1 for z > 0 (undefined at z = 0; in practice either value is used)
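The same derivatives in numpy, written in terms of the activation value a where possible (the function names below are my own):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1 - a)                 # g'(z) = a(1 - a)

def tanh_prime(z):
    a = np.tanh(z)
    return 1 - a ** 2                  # g'(z) = 1 - a^2

def relu_prime(z):
    return (z > 0).astype(float)       # 0 for z < 0, 1 for z > 0 (we take 0 at z = 0)

# rough numerical check of sigmoid_prime at a few points
z = np.array([-2.0, 0.0, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_prime(z)))   # True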
3.9 Gradient descent for Neural Networks & 3.10 Backpropagation intuition
A few formulas are enough to understand what forward propagation and backpropagation do.
Logistic regression loss (used at the output): L(a, y) = -(y log a + (1 - y) log(1 - a)); the cost J is the average of L over the m examples.
Forward propagation (one hidden layer): Z[1] = W[1]X + b[1], A[1] = g[1](Z[1]), Z[2] = W[2]A[1] + b[2], A[2] = sigmoid(Z[2]).
Backpropagation: dZ[2] = A[2] - Y, dW[2] = (1/m) dZ[2] A[1].T, db[2] = (1/m) sum(dZ[2]); dZ[1] = W[2].T dZ[2] * g[1]'(Z[1]), dW[1] = (1/m) dZ[1] X.T, db[1] = (1/m) sum(dZ[1]).
Intuition: each d-quantity is the derivative of the cost with respect to that quantity; the backward formulas are just the chain rule applied to the forward formulas, and the 1/m factors come from averaging the loss over the examples.
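Putting these formulas into one gradient-descent loop for a 2-layer network (tanh hidden layer, sigmoid output); the sizes, the random data, and the learning rate are illustrative assumptions, and the update rule mirrors the code in 3.1.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(2)
n_x, n_1, m = 2, 4, 50
X = np.random.randn(n_x, m)                        # dummy inputs, one example per column
Y = (np.random.rand(1, m) > 0.5).astype(float)     # dummy binary labels
W1 = np.random.randn(n_1, n_x) * 0.01
b1 = np.zeros((n_1, 1))
W2 = np.random.randn(1, n_1) * 0.01
b2 = np.zeros((1, 1))
learning_rate = 0.5

for iteration in range(1000):
    # forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    # backpropagation
    dZ2 = A2 - Y                                   # (1, m)
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)        # tanh'(Z1) = 1 - A1^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    # update
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2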
3.11 Random Initialization
Initialize the weights to small random values: if they are large, tanh/sigmoid saturate and the gradients become tiny, and you would also need a very small learning rate, so learning slows down.
Symmetry breaking: do not initialize all the weights to 0, otherwise every hidden unit computes the same function; the biases may start at 0.
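A small sketch of the initialization itself (the layer sizes are placeholders): small random weights break symmetry, while the biases can safely start at zero.
import numpy as np

n_x, n_1, n_2 = 2, 4, 1                   # example layer sizes
W1 = np.random.randn(n_1, n_x) * 0.01     # small random values keep tanh/sigmoid out of the flat regions
b1 = np.zeros((n_1, 1))                   # zeros are fine for biases: symmetry is already broken by W1
W2 = np.random.randn(n_2, n_1) * 0.01
b2 = np.zeros((n_2, 1))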
Quiz questions:
1. Which of the following are true? (Check all that apply.)
a[2](12) denotes the activation vector of the 2nd layer for the 12th training example.
X is a matrix in which each column is one training example.
a[2] denotes the activation vector of the 2nd layer.
a4[2] is the activation output by the 4th neuron of the 2nd layer
2. The tanh activation usually works better than sigmoid activation function for hidden units because the mean of its output is closer to zero, and so it centers the data better for the next layer. True/False?
True
Yes. As seen in lecture, the output of tanh is between -1 and 1, so it centers the data, which makes learning simpler for the next layer.
3. Which of these is a correct vectorized implementation of forward propagation for layer l, where 1 ≤ l ≤ L?
Answer: Z[l] = W[l]A[l-1] + b[l], A[l] = g[l](Z[l]).
4. You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?
Leaky ReLU
sigmoid
Yes. Sigmoid outputs a value between 0 and 1 which makes it a very good choice for binary classification. You can classify as 0 if the output is less than 0.5 and classify as 1 if the output is more than 0.5. It can be done with tanh as well but it is less convenient as the output is between -1 and 1.
tanh
5. Consider the following code:
A = np.random.randn(4,3)
B = np.sum(A, axis = 1, keepdims = True)
What will be B.shape? (If you’re not sure, feel free to run this in python to find out).
(, 3)
(4, 1)
Yes, we use keepdims = True to make sure that B.shape is (4, 1) and not (4,); it makes our code more rigorous (see the short check after the options).
(4, )
(1, 3)
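For reference, a quick check of the two variants (this is just the quiz snippet rerun with and without keepdims):
import numpy as np

A = np.random.randn(4, 3)
print(np.sum(A, axis=1, keepdims=True).shape)   # (4, 1): column vector, the summed axis is kept
print(np.sum(A, axis=1).shape)                  # (4,): rank-1 array, the dimension is dropped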
6. Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
Each neuron in the first hidden layer will perform the same computation. So even after multiple iterations of gradient descent each neuron in the layer will be computing the same thing as other neurons.
Each neuron in the first hidden layer will perform the same computation in the first iteration. But after one iteration of gradient descent they will learn to compute different things because we have “broken symmetry”.
Each neuron in the first hidden layer will compute the same thing, but neurons in different layers will compute different things, thus we have accomplished “symmetry breaking” as described in lecture.
7. Logistic regression's weights w should be initialized randomly rather than to all zeros, because if you initialize to all zeros, then logistic regression will fail to learn a useful decision boundary because it will fail to "break symmetry". True/False?
True
No. Logistic regression doesn't have a hidden layer. If you initialize the weights to zeros, the first example x fed into logistic regression will output zero, but the derivatives of logistic regression depend on the input x (because there is no hidden layer), which is not zero. So at the second iteration, the weight values follow x's distribution and are different from each other if x is not a constant vector.
False
8. You have built a network using the tanh activation for all the hidden units. You initialize the weights to relatively large values, using np.random.randn(..,..)*1000. What will happen?
This will cause the inputs of the tanh to also be very large, thus causing gradients to also become large. You therefore have to set α to be very small to prevent divergence; this will slow down learning.
It doesn’t matter. So long as you initialize the weights randomly gradient descent is not affected by whether the weights are large or small.
This will cause the inputs of the tanh to also be very large, causing the units to be “highly activated” and thus speed up learning compared to if the weights had to start from small values.
This will cause the inputs of the tanh to also be very large, thus causing gradients to be close to zero. The optimization algorithm will thus become slow.
Yes. tanh becomes flat for large values; this leads its gradient to be close to zero, which slows down the optimization algorithm.
9. Consider the following 1-hidden-layer neural network (2 input units, 4 hidden units, 1 output unit; the figure is omitted here):
Which of the following statements are True? (Check all that apply).
W[1] will have shape (2, 4)
b[1] will have shape (4, 1)
W[1] will have shape (4, 2)
b[1] will have shape (2, 1)
W[2] will have shape (1, 4)
b[2] will have shape (4, 1)
W[2] will have shape (4, 1)
b[2] will have shape (1, 1)
10. In the same network as the previous question, with an input matrix X of m examples, what are the dimensions of Z[1] and A[1]?
Z[1] and A[1] are (4, 1)
Z[1] and A[1] are (4, m)
Z[1] and A[1] are (1, 4)
Z[1] and A[1] are (4, 2)
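To double-check the shapes in questions 9 and 10, a throwaway sketch (m = 10 is an arbitrary batch size):
import numpy as np

n_x, n_1, n_2, m = 2, 4, 1, 10
X = np.random.randn(n_x, m)
W1, b1 = np.random.randn(n_1, n_x), np.random.randn(n_1, 1)   # (4, 2) and (4, 1)
W2, b2 = np.random.randn(n_2, n_1), np.random.randn(n_2, 1)   # (1, 4) and (1, 1)

Z1 = np.dot(W1, X) + b1
A1 = np.tanh(Z1)
print(Z1.shape, A1.shape)    # (4, 10) and (4, 10), i.e. (4, m)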