Andrew Ng's Deep Learning Specialization, Course 2, Week 1, Assignment 1: Simple Prediction with a Neural Network - CSDN Blog

Article link: https://blog.csdn.net/junchengberry/article/details/80827168
# coding: utf-8

# # Initialization
# Welcome to the first assignment of "Improving Deep Neural Networks".
#
# Training your neural network requires specifying an initial value of the weights.
# A well chosen initialization method will help learning.
# If you completed the previous course of this specialization, you probably followed
# our instructions for weight initialization, and it has worked out so far. But how
# do you choose the initialization for a new neural network? In this notebook, you
# will see how different initializations lead to different results.
#
# A well chosen initialization can:
# - Speed up the convergence of gradient descent
# - Increase the odds of gradient descent converging to a lower training (and generalization) error
#
# To get started, run the following cell to load the packages and the planar
# dataset you will try to classify.

# In[1]: Different initializations lead to different results

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

#get_ipython().magic('matplotlib inline')
plt.rcParams['figure.figsize'] = (7.0, 4.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()  # load the data

# You would like a classifier to separate the blue dots from the red dots.

# ## 1 - Neural Network model

# You will use a 3-layer neural network (already implemented for you). Here are
# the initialization methods you will experiment with:
# - *Zeros initialization* -- setting `initialization = "zeros"` in the input argument.
# - *Random initialization* -- setting `initialization = "random"` in the input argument.
#   This initializes the weights to large random values.
# - *He initialization* -- setting `initialization = "he"` in the input argument.
#   This initializes the weights to random values scaled according to a paper by He et al., 2015.
#
# **Instructions**: Please quickly read over the code below, and run it. In the next part
# you will implement the three initialization methods that this `model()` calls.

# In[2]: Build the model used for the tests
def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros","random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """

    grads = {}
    costs = []  # to keep track of the loss
    m = X.shape[1]  # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
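    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)
    else:
        # Fail early instead of hitting a NameError on `parameters` below.
        raise ValueError('initialization must be "zeros", "random" or "he"')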

    # Loop (gradient descent)

    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss.
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)
        # Output of the first run (zeros initialization); the cost stays at about 0.693 (ln 2):
        # Cost after iteration 0: 0.6931471805599453
        # Cost after iteration 1000: 0.6931471805599453
        # Cost after iteration 2000: 0.6931471805599453
        # Cost after iteration 3000: 0.6931471805599453
        # Cost after iteration 4000: 0.6931471805599453
        # Cost after iteration 5000: 0.6931471805599453
        # Cost after iteration 6000: 0.6931471805599453
        # Cost after iteration 7000: 0.6931471805599453
        # Cost after iteration 8000: 0.6931471805599453
        # Cost after iteration 9000: 0.6931471805599453
        # Cost after iteration 10000: 0.6931471805599455
        # Cost after iteration 11000: 0.6931471805599453
        # Cost after iteration 12000: 0.6931471805599453
        # Cost after iteration 13000: 0.6931471805599453
        # Cost after iteration 14000: 0.6931471805599453
    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')  # costs are recorded every 1000 iterations
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
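
# A note on the constant cost of about 0.693 in the log inside model(): assuming
# compute_loss (imported from init_utils, not shown in this post) averages the binary
# cross-entropy, a network that outputs 0.5 for every example has a cost of ln 2,
# whatever the labels are. The sketch below checks this with a hypothetical stand-in
# named cross_entropy_sketch; it is not the course's implementation.

import numpy as np


def cross_entropy_sketch(a, y):
    """Mean binary cross-entropy of predictions a against labels y (illustrative stand-in)."""
    return float(np.mean(-(y * np.log(a) + (1 - y) * np.log(1 - a))))


a_half = np.full((1, 4), 0.5)            # every prediction is 0.5
y_demo = np.array([[0., 1., 1., 0.]])    # arbitrary labels, made up for this check
print("cost with all predictions at 0.5:", cross_entropy_sketch(a_half, y_demo))
print("ln 2:", np.log(2))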


# ## 2 - Zero initialization
#
# There are two types of parameters to initialize in a neural network:
# - the weight matrices (W[1], W[2], W[3], ..., W[L-1], W[L])
# - the bias vectors (b[1], b[2], b[3], ..., b[L-1], b[L])
#
# **Exercise**: Implement the following function to initialize all parameters to zeros.
# You'll see later that this does not work well since it fails to "break symmetry", but
# let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

# In[3]: Initialize all parameters to zeros

# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)  # number of entries in layers_dims (the layers, input layer included)

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters

# In[4]: Print the parameters after initialization

parameters = initialize_parameters_zeros([3, 2, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

# **Expected Output** (values from the first run):
# W1 = [[ 0.  0.  0.]
#  [ 0.  0.  0.]]
# b1 = [[ 0.]
#  [ 0.]]
# W2 = [[ 0.  0.]]
# b2 = [[ 0.]]
#
# Run the following code to train your model on 15,000 iterations using zeros initialization.

# In[5]: Train the model and predict

parameters = model(train_X, train_Y, initialization="zeros")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

# The performance is really bad: the cost does not really decrease, and the algorithm
# performs no better than random guessing. Why? Let's look at the details of the predictions
# and the decision boundary:

# In[6]: Print the predictions
print("predictions_train = " + str(predictions_train))
print("predictions_test = " + str(predictions_test))

# In[7]: Plot the final result
plt.title("Model with Zeros initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()

# The model is predicting 0 for every example.
#
# In general, initializing all the weights to zero results in the network failing to break
# symmetry. This means that every neuron in each layer will learn the same thing, and you
# might as well be training a neural network with n[l]=1 for every layer, and the network
# is no more powerful than a linear classifier such as logistic regression.

# **What you should remember**:
# - The weights W[l] should be initialized randomly to break symmetry.
# - It is however okay to initialize the biases b[l] to zeros. Symmetry is still broken
#   so long as W[l] is initialized randomly.
#
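
# The symmetry argument can be checked numerically. What follows is a minimal sketch,
# not part of the original assignment: the tiny 2-3-1 network, the synthetic data and
# the name symmetry_demo are assumptions made for illustration. Starting from all-zero
# weights, every weight gradient is zero, so the weights never move. Starting from a
# constant nonzero value, the hidden units receive identical updates and stay copies
# of each other (up to floating-point rounding).

import numpy as np


def symmetry_demo(w_init, steps=100, lr=0.1):
    """Train a tiny 2-3-1 network (ReLU hidden layer, sigmoid output) from constant weights."""
    rng = np.random.default_rng(0)
    X_demo = rng.standard_normal((2, 50))
    Y_demo = (X_demo[0:1] + X_demo[1:2] > 0).astype(float)
    m = X_demo.shape[1]
    W1 = np.full((3, 2), float(w_init))
    b1 = np.zeros((3, 1))
    W2 = np.full((1, 3), float(w_init))
    b2 = np.zeros((1, 1))
    for _ in range(steps):
        # Forward pass
        Z1 = W1 @ X_demo + b1
        A1 = np.maximum(0, Z1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for the mean cross-entropy loss
        dZ2 = (A2 - Y_demo) / m
        dZ1 = (W2.T @ dZ2) * (Z1 > 0)
        # Gradient descent step
        W2 -= lr * (dZ2 @ A1.T)
        b2 -= lr * dZ2.sum(axis=1, keepdims=True)
        W1 -= lr * (dZ1 @ X_demo.T)
        b1 -= lr * dZ1.sum(axis=1, keepdims=True)
    return W1, W2


W1_zero, W2_zero = symmetry_demo(0.0)
W1_const, W2_const = symmetry_demo(0.5)
print("zeros init, W1 after training:", W1_zero.ravel())
print("constant init, rows of W1 after training:")
print(W1_const)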

# ## 3 - Random initialization
#
# To break symmetry, let's initialize the weights randomly. Following random initialization,
# each neuron can then proceed to learn a different function of its inputs. In this exercise,
# you will see what happens if the weights are initialized randomly, but to very large values.
#
# **Exercise**: Implement the following function to initialize your weights to large random
# values (scaled by *10) and your biases to zeros. Use `np.random.randn(..,..) * 10` for
# weights and `np.zeros((.., ..))` for biases. We are using a fixed `np.random.seed(..)` to make
# sure your "random" weights match ours, so don't worry if running your code several times
# always gives you the same initial values for the parameters.
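
# A quick check of the seeding behaviour described in the exercise text. This is an
# illustrative sketch; the names first_draw and second_draw are made up here.
# Re-seeding NumPy's global generator with the same value replays the same sequence
# of "random" numbers.

import numpy as np

np.random.seed(3)
first_draw = np.random.randn(2, 3) * 10
np.random.seed(3)
second_draw = np.random.randn(2, 3) * 10
print("same values after re-seeding:", np.array_equal(first_draw, second_draw))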

# In[8]: Random initialization
# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)  # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)  # number of entries in layers_dims (the layers, input layer included)

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

# In[9]: Print the randomly initialized parameters

parameters = initialize_parameters_random([3, 2, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

# Run the following code to train your model on 15,000 iterations using random initialization.

# In[10]: Train and predict with random initialization

parameters = model(train_X, train_Y, initialization="random")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

# If you see "inf" as the cost after iteration 0, this is because of numerical roundoff;
# a more numerically sophisticated implementation would fix this. But this isn't worth worrying
# about for our purposes.
#
# Anyway, it looks like you have broken symmetry, and this gives better results than before.
# The model is no longer outputting all 0s.

# In[11]: Print the predictions for random initialization
print("predictions_train:")
print(predictions_train)
print("predictions_test:")
print(predictions_test)

# In[12]: Plot the result for random initialization

plt.title("Model with large random initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()

# **Observations**:
# - The cost starts very high. This is because with large random-valued weights, the last
#   activation (sigmoid) outputs results that are very close to 0 or 1 for some examples,
#   and when it gets that example wrong it incurs a very high loss for that example. Indeed,
#   when log(a[3]) = log(0), the loss goes to infinity.
# - Poor initialization can lead to vanishing/exploding gradients, which also slows down
#   the optimization algorithm.
# - If you train this network longer you will see better results, but initializing with
#   overly large random numbers slows down the optimization.
#
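
# The "inf" cost can be reproduced on a single example. This is a sketch under
# assumptions: naive_loss and stable_loss are hypothetical helpers, not the code in
# init_utils. For a pre-activation z = 40, the sigmoid rounds to exactly 1.0 in
# float64, so a wrong label (y = 0) leads to log(0). Writing the same cross-entropy
# as log(1 + exp(z)) - y*z avoids the roundoff; this is one common way to get the
# "more numerically sophisticated implementation" mentioned above.

import numpy as np


def naive_loss(z, y):
    """Cross-entropy computed from the sigmoid output; overflows to inf when a rounds to 0 or 1."""
    a = 1.0 / (1.0 + np.exp(-z))
    with np.errstate(divide="ignore", invalid="ignore"):
        return -(y * np.log(a) + (1 - y) * np.log(1 - a))


def stable_loss(z, y):
    """Same loss written as log(1 + exp(z)) - y*z, evaluated with np.logaddexp."""
    return np.logaddexp(0.0, z) - y * z


print("naive loss, z=40, y=0:", naive_loss(40.0, 0.0))
print("stable loss, z=40, y=0:", stable_loss(40.0, 0.0))
print("both forms agree for moderate z:", naive_loss(2.0, 0.0), stable_loss(2.0, 0.0))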

# **In summary**:
# - Initializing weights to very large random values does not work well.
# - Hopefully initializing with small random values does better. The important question is: how
# small should these random values be? Let's find out in the next part!

# ## 4 - He initialization
#
# Finally, try "He Initialization"; this is named for the first author of He et al., 2015.
# (If you have heard of "Xavier initialization", this is similar except Xavier initialization
# uses a scaling factor for the weights W[l] of sqrt(1./layers_dims[l-1]) where He
# initialization would use sqrt(2./layers_dims[l-1]))
#
# **Exercise**: Implement the following function to initialize your parameters with He initialization.
#
# **Hint**: This function is similar to the previous `initialize_parameters_random(...)`. The only
# difference is that instead of multiplying `np.random.randn(..,..)` by 10, you will multiply it
# by sqrt(2/(dimension of the previous layer)), which is what He initialization recommends for
# layers with a ReLU activation.
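
# To see why the scaling factor matters, the sketch below pushes random inputs through
# a stack of ReLU layers and reports the mean squared activation at the end. The depth,
# the width and the helper name mean_square_after are assumptions chosen for
# illustration, and the stack is not the 3-layer model used in this notebook. In
# expectation, the factor sqrt(2/n) keeps the mean squared activation roughly constant,
# sqrt(1/n) halves it at every ReLU layer, and a factor of 10 makes it explode.

import numpy as np


def mean_square_after(scale_fn, depth=10, width=256, n_examples=200, seed=3):
    """Mean squared activation after `depth` ReLU layers with weights randn * scale_fn(width)."""
    rng = np.random.RandomState(seed)   # local generator, leaves np.random's global state alone
    a = rng.randn(width, n_examples)
    for _ in range(depth):
        W = rng.randn(width, width) * scale_fn(width)
        a = np.maximum(0, W @ a)
    return float(np.mean(a ** 2))


ms_large = mean_square_after(lambda n: 10.0)                 # "large random" scaling
ms_xavier = mean_square_after(lambda n: np.sqrt(1.0 / n))    # sqrt(1/n) scaling
ms_he = mean_square_after(lambda n: np.sqrt(2.0 / n))        # He scaling
print("scale 10:    ", ms_large)
print("sqrt(1/n):   ", ms_xavier)
print("sqrt(2/n):   ", ms_he)

# The three runs share one seed, so they draw the same random matrices and differ only
# by the scaling factor; because ReLU commutes with positive scaling, the printed values
# differ by fixed powers of the ratio between the factors.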

# In[13]: He initialization (initialize_parameters_he)

# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1  # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(
            2. / layers_dims[l - 1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

# In[14]: Print the parameters returned by initialize_parameters_he

parameters = initialize_parameters_he([2, 4, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

# Run the following code to train your model on 15,000 iterations using He initialization.

# In[15]: Train the model and print the predictions
parameters = model(train_X, train_Y, initialization="he")
print("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

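"""Output recorded from the first run. An accuracy of 0.5 is chance level, which is
consistent with the zeros-initialization run rather than with He initialization:
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5
"""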
# In[16]: Plot the result for initialize_parameters_he

plt.title("Model with He initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
plt.show()
# **Observations**:
# - The model with He initialization separates the blue and the red dots very well
# in a small number of iterations.
#

# ## 5 - Conclusions

# You have seen three different types of initialization, compared for the same number of
# iterations and the same hyperparameters.

# **What you should remember from this notebook**:
# - Different initializations lead to different results
# - Random initialization is used to break symmetry and make sure different hidden units
# can learn different things
# - Don't initialize to values that are too large
# - He initialization works well for networks with ReLU activations.

# In[17]: Final result: accuracy ranks zeros < random < He (initialize_parameters_he)