Course 2, Week 1 — Improving Deep Neural Networks: Initialization, Regularization, and Gradient Checking (1 & 2 & 3)


import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import scipy.io
from testCases import *
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec
from reg_utils import sigmoid, relu, plot_decision_boundary, initialize_parameters, load_2D_dataset, predict_dec
from reg_utils import compute_cost, predict, forward_propagation, backward_propagation, update_parameters
from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
# 1. Initialization
train_X, train_Y, test_X, test_Y = load_dataset()

[Figure output_2_0.png: the training dataset — blue and red dots arranged in circles]

1. Initialization

Initializing the weights

A well-chosen initialization can:
1. Speed up the convergence of gradient descent
2. Increase the odds of gradient descent converging to a lower training (and generalization) error

(1) The neural network model

Zeros initialization: set initialization = "zeros" in the input arguments.
Random initialization: set initialization = "random"; this initializes the weights to large random values.
He initialization: set initialization = "he"; this initializes the weights to random values scaled according to a paper by He et al. (2015).

model() below calls whichever of the three initialization methods you implement.

def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
    
    m = X.shape[1]
    grads = {}
    costs = []
    layers_dims = [X.shape[0], 10, 5, 1]
    
    if initialization == "zeros":
        parameters  = initialize_parameters_zeros(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
        
    for i in range(0,num_iterations):
        a3,cache = forward_propagation(X,parameters)
        cost = compute_loss(a3,Y)
        grads = backward_propagation(X,Y,cache)
        parameters = update_parameters(parameters,grads,learning_rate)
        
        if print_cost and i%1000==0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)
    
    plt.plot(costs)
    plt.ylabel("cost")
    plt.xlabel("iterations (per thousands)")
    plt.title("learning_rate="+str(learning_rate))
    plt.show()
    
    return parameters

(2) Zeros initialization

Exercise: implement the following function to initialize all parameters to zeros. You will see later that this does not work well, because it fails to "break symmetry".
Try it anyway and see what happens. Make sure to use np.zeros((..., ...)) with the correct shapes.

def initialize_parameters_zeros(layers_dims):
    
    parameters = {}
    L = len(layers_dims)
    
    for i in range(1,L):
        parameters["W"+str(i)] = np.zeros((layers_dims[i],layers_dims[i-1]))
        parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
        
    return parameters
parameters = initialize_parameters_zeros([3,2,1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[0. 0. 0.]
[0. 0. 0.]]
b1 = [[0.]
[0.]]
W2 = [[0. 0.]]
b2 = [[0.]]

parameters = model(train_X, train_Y, initialization = "zeros")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Cost after iteration 0: 0.6931471805599453
Cost after iteration 1000: 0.6931471805599453
Cost after iteration 2000: 0.6931471805599453
Cost after iteration 3000: 0.6931471805599453
Cost after iteration 4000: 0.6931471805599453
Cost after iteration 5000: 0.6931471805599453
Cost after iteration 6000: 0.6931471805599453
Cost after iteration 7000: 0.6931471805599453
Cost after iteration 8000: 0.6931471805599453
Cost after iteration 9000: 0.6931471805599453
Cost after iteration 10000: 0.6931471805599455
Cost after iteration 11000: 0.6931471805599453
Cost after iteration 12000: 0.6931471805599453
Cost after iteration 13000: 0.6931471805599453
Cost after iteration 14000: 0.6931471805599453

[Figure output_9_1.png: cost vs. iterations with zeros initialization — the cost stays flat at 0.693]

On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5

print ("predictions_train = " + str(predictions_train))
print ("predictions_test = " + str(predictions_test))
predictions_train = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0]]
predictions_test = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
# Note: change c=y to c=np.squeeze(y) in plot_decision_boundary inside init_utils.py before running this cell
plt.title("Model with Zeros initialization")
axes = plt.gca() # get the current axes
axes.set_xlim([-1.5, 1.5]) # set the x-axis limits
axes.set_ylim([-1.5, 1.5]) # set the y-axis limits
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure: decision boundary of the model with zeros initialization — the model predicts 0 for every example]

In general, initializing all the weights to zero fails to break symmetry: every neuron in a layer learns the same thing, so the network is no more powerful than a linear classifier such as logistic regression.

(3) Random initialization

Exercise: implement the following function to initialize the weights to large random values (scaled by *10) and the biases to zeros. Use np.random.randn(..., ...) * 10 for the weights and np.zeros((..., ...)) for the biases.
We use a fixed np.random.seed(...) so that your "random" weights match ours.
So don't be surprised if the initial parameters come out the same every time you run the code.

def initialize_parameters_random(layers_dims):
    
    np.random.seed(3)
    L = len(layers_dims)
    parameters = {}
    
    for i in range(1,L):
        parameters["W"+str(i)] = np.random.randn(layers_dims[i],layers_dims[i-1])*10
        parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
    
    return parameters
parameters = initialize_parameters_random([3, 2, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[ 17.88628473 4.36509851 0.96497468]
[-18.63492703 -2.77388203 -3.54758979]]
b1 = [[0.]
[0.]]
W2 = [[-0.82741481 -6.27000677]]
b2 = [[0.]]

parameters = model(train_X, train_Y, initialization = "random")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

init_utils.py:50: RuntimeWarning: divide by zero encountered in log
  logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)
init_utils.py:50: RuntimeWarning: invalid value encountered in multiply
  logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)

Cost after iteration 0: inf
Cost after iteration 1000: 0.6250982793959966
Cost after iteration 2000: 0.5981216596703697
Cost after iteration 3000: 0.5638417572298645
Cost after iteration 4000: 0.5501703049199763
Cost after iteration 5000: 0.5444632909664456
Cost after iteration 6000: 0.5374513807000807
Cost after iteration 7000: 0.4764042074074983
Cost after iteration 8000: 0.39781492295092263
Cost after iteration 9000: 0.3934764028765484
Cost after iteration 10000: 0.3920295461882659
Cost after iteration 11000: 0.38924598135108
Cost after iteration 12000: 0.3861547485712325
Cost after iteration 13000: 0.384984728909703
Cost after iteration 14000: 0.3827828308349524

[Figure output_15_2.png: cost vs. iterations with large random initialization]

On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86

print (predictions_train)
print (predictions_test)
[[1 0 1 1 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 0 0 1 0 1 1 1 1 1 1 0 1 1 0 0 1
  1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 1 1 0
  0 0 0 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 1 0 1 1 0
  1 0 1 1 0 0 1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0
  0 0 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 1 1 1
  1 0 1 0 1 0 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 0 1
  0 1 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 0 1 1
  1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1
  1 1 1 1 0 0 0 1 1 1 1 0]]
[[1 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0 1
  0 1 1 0 0 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0
  1 1 1 1 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0]]
# Initializing the weights to very large random values does not work well: the sigmoid output saturates near 0 or 1,
# which produces log(0) = inf in the loss (hence the warnings above), and optimization starts off slowly.
plt.title("Model with large random initialization")
axes = plt.gca()
axes.set_xlim([-1.5,1.5])
axes.set_ylim([-1.5,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure: decision boundary of the model with large random initialization]

(4) He initialization

Exercise: implement the following function to initialize your parameters with He initialization.
He initialization is the recommended choice for layers with ReLU activations.
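
For reference, He initialization scales a standard normal draw by sqrt(2 / n_prev), where n_prev is the number of units in the previous layer (this is exactly what the code below does):

$$W^{[l]} = \mathrm{randn}\big(n^{[l]}, n^{[l-1]}\big)\cdot\sqrt{\frac{2}{n^{[l-1]}}}, \qquad b^{[l]} = 0$$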

def initialize_parameters_he(layers_dims):
    np.random.seed(3)
    L = len(layers_dims)
    parameters = {}
    
    for i in range(1,L):
        parameters["W"+str(i)] = np.random.randn(layers_dims[i],layers_dims[i-1])*np.sqrt(2./layers_dims[i-1])
        parameters["b"+str(i)] = np.zeros((layers_dims[i],1))
    
    return parameters
parameters = initialize_parameters_he([2, 4, 1])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

W1 = [[ 1.78862847 0.43650985]
[ 0.09649747 -1.8634927 ]
[-0.2773882 -0.35475898]
[-0.08274148 -0.62700068]]
b1 = [[0.]
[0.]
[0.]
[0.]]
W2 = [[-0.03098412 -0.33744411 -0.92904268 0.62552248]]
b2 = [[0.]]

parameters = model(train_X, train_Y, initialization = "he")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Cost after iteration 0: 0.8830537463419761
Cost after iteration 1000: 0.6879825919728063
Cost after iteration 2000: 0.6751286264523371
Cost after iteration 3000: 0.6526117768893807
Cost after iteration 4000: 0.6082958970572937
Cost after iteration 5000: 0.5304944491717495
Cost after iteration 6000: 0.4138645817071793
Cost after iteration 7000: 0.3117803464844441
Cost after iteration 8000: 0.23696215330322556
Cost after iteration 9000: 0.18597287209206828
Cost after iteration 10000: 0.15015556280371808
Cost after iteration 11000: 0.12325079292273548
Cost after iteration 12000: 0.09917746546525937
Cost after iteration 13000: 0.08457055954024274
Cost after iteration 14000: 0.07357895962677366

[Figure: cost vs. iterations with He initialization]

On the train set:
Accuracy: 0.9933333333333333
On the test set:
Accuracy: 0.96

# The model with He initialization separates the blue and red dots well within a small number of iterations.
plt.title("Model with He initialization")
axes = plt.gca()
axes.set_xlim([-1.5,1.5])
axes.set_ylim([-1.5,1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure output_22_0.png: decision boundary of the model with He initialization]

What you should remember from this part:
1. Different initializations lead to different results.
2. Random initialization is used to break symmetry and make sure different hidden units can learn different things.
3. Don't initialize the weights to values that are too large.
4. He initialization works well for networks with ReLU activations.

2. Regularization

You will use the following neural network (already implemented for you). It can be used in two modes:

- regularization mode: pass a non-zero value for the lambd argument (we use lambd instead of lambda because lambda is a reserved keyword in Python)
- dropout mode: set keep_prob to a value smaller than 1

You will first try the model without any regularization. Then you will implement:

- L2 regularization, with the functions compute_cost_with_regularization() and backward_propagation_with_regularization()
- dropout, with the functions forward_propagation_with_dropout() and backward_propagation_with_dropout()

In each part, you will run this model with the appropriate inputs so that it calls the functions you have implemented. Take a look at the code below to familiarize yourself with the model.

(1) Non-regularized model

train_X, train_Y, test_X, test_Y = load_2D_dataset()

[Figure: the 2D training dataset — blue and red dots]

def model(X, Y, learning_rate = 0.3, num_iterations = 30000, print_cost = True, lambd = 0, keep_prob = 1):
    
    grads = {}
    costs = []
    m = X.shape[1]
    layers_dims = [X.shape[0], 20, 3, 1]
    
    parameters = initialize_parameters(layers_dims)
    
    for i in range(num_iterations):
        
        # 1. forward propagation
        if keep_prob==1:
            a3,cache = forward_propagation(X,parameters)
        elif keep_prob<1:
            a3,cache =forward_propagation_with_dropout(X,parameters,keep_prob)
        
        # 2. cost function
        if lambd == 0:
            cost = compute_cost(a3,Y)
        else:
            cost =compute_cost_with_regularization(a3,Y,parameters,lambd)
        
        assert(keep_prob==1 or lambd == 0)
        
        if lambd == 0 and keep_prob==1:
            grads = backward_propagation(X,Y,cache)
        elif keep_prob<1:
            grads = backward_propagation_with_dropout(X,Y,cache,keep_prob)
        elif lambd!=0:
            grads = backward_propagation_with_regularization(X,Y,cache,lambd)
        
        parameters = update_parameters(parameters,grads,learning_rate)
        
        if print_cost and i%10000==0:
            print("Cost after iteration {}:{}".format(i,cost))
        if print_cost and i%1000==0:
            costs.append(cost)
    
    # plot the cost
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (x1,000)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    
    return parameters
parameters = model(train_X,train_Y)
print ("On the training set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Cost after iteration 0:0.6557412523481002
Cost after iteration 10000:0.16329987525724196
Cost after iteration 20000:0.13851642423253843

[Figure: cost vs. iterations for the non-regularized model]

On the training set:
Accuracy: 0.9478672985781991
On the test set:
Accuracy: 0.915

plt.title("Model without regularization")
axes = plt.gca()
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure: decision boundary of the non-regularized model]

(2) L2 regularization

Exercise: implement compute_cost_with_regularization(), which computes the cost given by formula (2).

np.square computes the element-wise square.
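
Formula (2) itself is not reproduced in these notes; for reference, the regularized cost computed below is the usual cross-entropy cost plus an L2 penalty on all the weights:

$$J_{regularized} = \underbrace{-\frac{1}{m}\sum_{i=1}^{m}\Big(y^{(i)}\log a^{[3](i)} + (1-y^{(i)})\log\big(1-a^{[3](i)}\big)\Big)}_{\text{cross-entropy cost}} + \underbrace{\frac{\lambda}{2m}\sum_{l}\sum_{k}\sum_{j}\Big(W^{[l]}_{k,j}\Big)^{2}}_{\text{L2 regularization cost}}$$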

How L2 regularization works:

L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. By penalizing the squared values of the weights in the cost function, all the weights are driven towards smaller values; large weights become too costly. The result is a smoother model, in which the output changes more slowly as the input changes.

What you should remember about the effects of L2 regularization:

Cost computation:
- a regularization term is added to the cost
Backpropagation:
- there are extra terms in the gradients with respect to the weight matrices
Weights end up smaller ("weight decay"):
- the weights are pushed towards smaller values

def compute_cost_with_regularization(A3, Y, parameters, lambd):
    
    m = Y.shape[1]
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    W3 = parameters["W3"]
    
    cross_entropy_cost = compute_cost(A3,Y)
    # sum of the squared weights
    L2_regularization_cost  = (1./m*lambd/2)*(np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    
    cost = cross_entropy_cost + L2_regularization_cost
    return cost
A3, Y_assess, parameters = compute_cost_with_regularization_test_case()
cost = compute_cost_with_regularization(A3, Y_assess, parameters, lambd = 0.1)
print("cost="+str(cost))

cost=1.7864859451590758

Exercise: implement backpropagation with regularization. The changes only concern dW1, dW2 and dW3: each must have the gradient of the regularization term added to it, as given below.
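
The extra term is simply the derivative of the L2 penalty with respect to each weight matrix:

$$\frac{d}{dW}\left(\frac{\lambda}{2m}\sum_{k}\sum_{j} W_{k,j}^{2}\right) = \frac{\lambda}{m}W$$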

def backward_propagation_with_regularization(X, Y, cache, lambd):
    m = X.shape[1]
    (Z1,A1,W1,b1,Z2,A2,W2,b2,Z3,A3,W3,b3) = cache
    
    dZ3 = A3-Y
    
    dW3 = 1/m*np.dot(dZ3,A2.T) + lambd/m*W3
    db3 = 1/m*np.sum(dZ3,axis=1,keepdims=True)
    
    dA2 = np.dot(W3.T,dZ3)
    # derivative of ReLU: the gradient passes only where A2 > 0
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1/m*np.dot(dZ2,A1.T) + lambd/m*W2
    db2 = 1/m*np.sum(dZ2,axis=1,keepdims=True)
    
    dA1 = np.dot(W2.T,dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1/m*np.dot(dZ1,X.T)+lambd/m*W1
    db1 = 1/m*np.sum(dZ1,axis=1,keepdims=True)
    
    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1, 
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}

    return gradients
X_assess, Y_assess, cache = backward_propagation_with_regularization_test_case()
grads = backward_propagation_with_regularization(X_assess, Y_assess, cache, lambd=0.7)
print ("dW1 = "+ str(grads["dW1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("dW3 = "+ str(grads["dW3"]))

dW1 = [[-0.25604646 0.12298827 -0.28297129]
[-0.17706303 0.34536094 -0.4410571 ]]
dW2 = [[ 0.79276486 0.85133918]
[-0.0957219 -0.01720463]
[-0.13100772 -0.03750433]]
dW3 = [[-1.77691347 -0.11832879 -0.09397446]]

parameters = model(train_X, train_Y, lambd = 0.7)
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Cost after iteration 0:0.6974484493131264
Cost after iteration 10000:0.2684918873282239
Cost after iteration 20000:0.2680916337127301

[Figure: cost vs. iterations with L2 regularization (lambd = 0.7)]

On the train set:
Accuracy: 0.9383886255924171
On the test set:
Accuracy: 0.93

plt.title("Model with L2-regularization")
axes = plt.gca()
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure: decision boundary of the model with L2 regularization]

(3) Dropout

1) Forward propagation with dropout

def forward_propagation_with_dropout(X, parameters, keep_prob = 0.5):
    
    np.random.seed(1)
    
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    
    D1 = np.random.rand(A1.shape[0],A1.shape[1])    # Step 1: initialize D1 with the same shape as A1
    D1 = D1 < keep_prob                             # Step 2: threshold at keep_prob to get a 0/1 mask
    A1 = A1 * D1                                    # Step 3: shut down some neurons of A1
    A1 = A1 / keep_prob                             # Step 4: scale by keep_prob to keep the expected value unchanged (inverted dropout)
    
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)

    D2 = np.random.rand(A2.shape[0],A2.shape[1])                
    D2 = D2 < keep_prob                                   
    A2 = A2 * D2                                 
    A2 = A2 / keep_prob                                 

    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)

    return A3, cache
X_assess, parameters = forward_propagation_with_dropout_test_case()

A3, cache = forward_propagation_with_dropout(X_assess, parameters, keep_prob = 0.7)
print ("A3 = " + str(A3))

A3 = [[0.36974721 0.00305176 0.04565099 0.49683389 0.36974721]]

2) Backward propagation with dropout

Backpropagation with dropout is actually quite easy. You have to carry out two steps:
1. You shut down some neurons during forward propagation by applying the mask D1 to A1. In backpropagation, you have to shut down the same neurons by re-applying the same mask D1 to dA1.
2. During forward propagation you divided A1 by keep_prob. In backpropagation you therefore have to divide dA1 by keep_prob again (the calculus interpretation is that if A1 is scaled by keep_prob, then its derivative dA1 is scaled by the same keep_prob).

What you should remember about dropout:

- Dropout is a regularization technique.
- Only use dropout during training; do not use it at test time.
- Apply dropout during both forward and backward propagation.
- During training, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then on average half of the nodes are shut down, so the output would be scaled by 0.5 since only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output keeps the same expected value. This works even when keep_prob is a value other than 0.5; a quick check follows below.
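
As a sanity check of that scaling argument, with each mask entry drawn as $D_{ij} \sim \mathrm{Bernoulli}(keep\_prob)$:

$$\mathbb{E}\!\left[\frac{D \odot A}{keep\_prob}\right] = \frac{keep\_prob \cdot A}{keep\_prob} = A$$
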
def backward_propagation_with_dropout(X, Y, cache, keep_prob):
    
    m = X.shape[1]
    (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3) = cache
    
    dZ3 = A3 - Y
    dW3 = 1./m * np.dot(dZ3, A2.T)
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
    dA2 = np.dot(W3.T, dZ3)

    dA2 = dA2 * D2               # Step 1: apply the mask D2 to shut down the same neurons as in forward propagation
    dA2 = dA2 / keep_prob        # Step 2: scale by keep_prob, consistent with the forward pass

    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T)
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)

    dA1 = np.dot(W2.T, dZ2)

    dA1 = dA1 * D1            
    dA1 = dA1 / keep_prob         

    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)

    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,"dA2": dA2,
                 "dZ2": dZ2, "dW2": dW2, "db2": db2, "dA1": dA1, 
                 "dZ1": dZ1, "dW1": dW1, "db1": db1}
    
    return gradients
X_assess, Y_assess, cache = backward_propagation_with_dropout_test_case()

gradients = backward_propagation_with_dropout(X_assess, Y_assess, cache, keep_prob = 0.8)

print ("dA1 = " + str(gradients["dA1"]))
print ("dA2 = " + str(gradients["dA2"]))

dA1 = [[ 0.36544439 0. -0.00188233 0. -0.17408748]
[ 0.65515713 0. -0.00337459 0. -0. ]]
dA2 = [[ 0.58180856 0. -0.00299679 0. -0.27715731]
[ 0. 0.53159854 -0. 0.53159854 -0.34089673]
[ 0. 0. -0.00292733 0. -0. ]]

parameters = model(train_X, train_Y, keep_prob = 0.86, learning_rate = 0.3)

print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

Cost after iteration 0:0.6543912405149825

reg_utils.py:121: RuntimeWarning: divide by zero encountered in log
  logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)
reg_utils.py:121: RuntimeWarning: invalid value encountered in multiply
  logprobs = np.multiply(-np.log(a3),Y) + np.multiply(-np.log(1 - a3), 1 - Y)

Cost after iteration 10000:0.061016986574905605
Cost after iteration 20000:0.060582435798513114

[Figure: cost vs. iterations with dropout (keep_prob = 0.86)]

On the train set:
Accuracy: 0.9289099526066351
On the test set:
Accuracy: 0.95

plt.title("Model with dropout")
axes = plt.gca()
axes.set_xlim([-0.75,0.40])
axes.set_ylim([-0.75,0.65])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)

[Figure: decision boundary of the model with dropout]

3. Gradient checking

(1) How gradient checking works
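
In brief: backpropagation computes the gradient of the cost J with respect to each parameter theta analytically, and gradient checking verifies it against a numerical approximation obtained from the definition of the derivative:

$$\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon}$$

If the analytic and numerical gradients are close, the backpropagation implementation is very likely correct.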

(2) One-dimensional gradient checking

You need 3 steps to compute this difference (given explicitly below):

  1. compute the numerator using np.linalg.norm(...)
  2. compute the denominator by calling np.linalg.norm(...) twice
  3. divide them
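
Concretely, the quantity being computed (and what gradient_check() below returns) is the relative difference between the analytic gradient and its numerical approximation:

$$\text{difference} = \frac{\lVert grad - gradapprox \rVert_2}{\lVert grad \rVert_2 + \lVert gradapprox \rVert_2}$$
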
def forward_propagation(x, theta):
    J = theta * x
    return J
x, theta = 2, 4
J = forward_propagation(x, theta)
print ("J = " + str(J))

J = 8

def backward_propagation(x, theta):
     dtheta = x
     return dtheta
x, theta = 2, 4
dtheta = backward_propagation(x, theta)
print ("dtheta = " + str(dtheta))

dtheta = 2

def gradient_check(x, theta, epsilon = 1e-7):
    thetaplus = theta + epsilon                               # Step 1
    thetaminus = theta - epsilon                              # Step 2
    J_plus = forward_propagation(x, thetaplus)                                  # Step 3
    J_minus = forward_propagation(x, thetaminus)                                 # Step 4
    gradapprox = (J_plus - J_minus) / (2 * epsilon)                              # Step 5
    
    grad = backward_propagation(x, theta)
    
    numerator = np.linalg.norm(grad - gradapprox)                               # Step 1'  np.linalg.norm computes the Euclidean (L2) norm
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)                            # Step 2'
    difference = numerator / denominator                              # Step 3'
    
    if difference < 1e-7:
        print ("The gradient is correct!")
    else:
        print ("The gradient is wrong!")
    
    return difference
x, theta = 2, 4
difference = gradient_check(x, theta)
print("difference = " + str(difference))

The gradient is correct!
difference = 2.919335883291695e-10

(3) N-dimensional gradient checking

def forward_propagation_n(X, Y, parameters):
    
    m = X.shape[1]
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)
    
    logprobs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y)
    cost = 1./m * np.sum(logprobs)
    
    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    
    return cost, cache
def backward_propagation_n(X, Y, cache):
    
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
    
    dZ3 = A3 - Y
    dW3 = 1./m * np.dot(dZ3, A2.T)
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
    
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T) * 2   # deliberately wrong (extra *2), to test gradient checking
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
    
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 4./m * np.sum(dZ1, axis=1, keepdims = True)  # deliberately wrong (4./m instead of 1./m), to test the difference
    
    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
    
    return gradients
def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
    
    parameters_values, _ = dictionary_to_vector(parameters)
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_values.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))
    
    for i in range(num_parameters):
        
        thetaplus = np.copy(parameters_values)                                    # Step 1: copy the parameter vector and add epsilon to the i-th entry
        thetaplus[i][0] = thetaplus[i][0] + epsilon                                # Step 2
        J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))                                  # Step 3

        thetaminus = np.copy(parameters_values)                                   # Step 1: copy the parameter vector and subtract epsilon from the i-th entry
        thetaminus[i][0] = thetaminus[i][0] - epsilon                            # Step 2        
        J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))                                  # Step 3

        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2.* epsilon)

    numerator = np.linalg.norm(grad - gradapprox)                                           # Step 1'
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)                                         # Step 2'
    difference = numerator / denominator                                          # Step 3'

    if difference > 1e-7:
        print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
    
    return difference
X, Y, parameters = gradient_check_n_test_case()

cost, cache = forward_propagation_n(X, Y, parameters)
gradients = backward_propagation_n(X, Y, cache)
difference = gradient_check_n(parameters, gradients, X, Y)

There is a mistake in the backward propagation! difference = 0.2850931566540251
