Andrew Ng CS229 | Programming Assignment, Week 4 (Python)

Exercise 4: Neural Network Learning

Contents:

1. Files Included

2. Neural Network

3. Backpropagation

1. Files Included

File name                        Purpose
ex4.py                           main program
ex4data1.mat                     data
ex4weights.mat                   neural network weights
displayData.py                   data visualization
debugInitializeWeights.py        weight initialization (for debugging)
predict.py                       forward-propagation prediction
sigmoid.py                       sigmoid function
computeNumericalGradient.py      numerical gradient computation
checkNNGradients.py              gradient checking
nncostfunction.py                neural network cost function
sigmoidgradient.py               sigmoid gradient
randInitializeWeights.py         random weight initialization

The files shown in red in the original post are the ones you need to fill in yourself.
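Of the helpers listed above, sigmoid.py is imported by several of the scripts below but is not reproduced in this post. A minimal sketch of the conventional vectorized definition (an assumption, since the author's file is not shown) would be:

import numpy as np


def sigmoid(z):
    # Vectorized logistic function; z can be a scalar, vector or matrix.
    return 1.0 / (1.0 + np.exp(-z))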

2. Neural Network

  • Import the required packages and initialize:
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as scio
import scipy.optimize as opt
import displayData as dd
import nncostfunction as ncf
import sigmoidgradient as sg
import randInitializeWeights as rinit
import checkNNGradients as cng
import predict as pd

plt.ion()

# Setup the parameters you will use for this part of the exercise
input_layer_size = 400  # 20x20 input images of Digits
hidden_layer_size = 25  # 25 hidden units
num_labels = 10         # 10 labels, from 1 to 10
                        # Note that we have mapped "0" to label 10

2.1 Data Visualization

  • Data visualization code, displayData.py:
import matplotlib.pyplot as plt
import numpy as np


def display_data(x):
    (m, n) = x.shape

    # Set example_width automatically if not passed in
    example_width = np.round(np.sqrt(n)).astype(int)
    example_height = (n / example_width).astype(int)

    # Compute the number of items to display
    display_rows = np.floor(np.sqrt(m)).astype(int)
    display_cols = np.ceil(m / display_rows).astype(int)

    # Between images padding
    pad = 1

    # Setup blank display
    display_array = - np.ones((pad + display_rows * (example_height + pad),
                              pad + display_cols * (example_width + pad)))

    # Copy each example into a patch on the display array
    curr_ex = 0
    for j in range(display_rows):
        for i in range(display_cols):
            if curr_ex >= m:
                break

            # Copy the patch
            # Get the max value of the patch
            max_val = np.max(np.abs(x[curr_ex]))
            display_array[pad + j * (example_height + pad) + np.arange(example_height),
                          pad + i * (example_width + pad) + np.arange(example_width)[:, np.newaxis]] = \
                          x[curr_ex].reshape((example_height, example_width)) / max_val
            curr_ex += 1

        if curr_ex >= m:
            break

    # Display image
    plt.figure()
    plt.imshow(display_array, cmap='gray', extent=[-1, 1, -1, 1])
    plt.axis('off')
  • Test code:
# ===================== Part 1: Loading and Visualizing Data =====================
# We start the exercise by first loading and visualizing the dataset.
# You will be working with a dataset that contains handwritten digits.
#

# Load Training Data
print('Loading and Visualizing Data ...')

data = scio.loadmat('ex4data1.mat')
X = data['X']
y = data['y'].flatten()
m = y.size

# Randomly select 100 data points to display
rand_indices = np.random.permutation(range(m))  # shuffle the example indices
selected = X[rand_indices[0:100], :]

dd.display_data(selected)

input('Program paused. Press ENTER to continue')
  • Test result: a figure (in the original post) showing the 100 randomly selected handwritten digits in a 10 x 10 grid.

2.2 Feedforward and Cost Function

  • Neural network model: (figure in the original post: a three-layer network with 400 input units, 25 hidden units and 10 output units, matching the parameters set above)

  • Recall that the cost function of the neural network (without regularization) is

\large J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_{k}^{(i)}\log\left((h_{\theta}(x^{(i)}))_{k}\right)-(1-y_{k}^{(i)})\log\left(1-(h_{\theta}(x^{(i)}))_{k}\right)\right]

  • Also recall that the original labels (in the variable y) are 1, 2, ..., 10. To train the neural network, we need to recode each label as a vector containing only 0s and 1s (one-hot encoding).

  • The regularized cost function adds a penalty over all non-bias weights:

\large J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_{k}^{(i)}\log\left((h_{\theta}(x^{(i)}))_{k}\right)-(1-y_{k}^{(i)})\log\left(1-(h_{\theta}(x^{(i)}))_{k}\right)\right]+\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^{2}+\sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^{2}\right]

  • Write the neural network cost function (regularization already included here), nncostfunction.py:
import numpy as np
from sigmoid import *

def nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd):
    # Reshape nn_params back into the parameters theta1 and theta2, the weight 2-D arrays
    # for our two layer neural network
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, input_layer_size + 1)
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(num_labels, hidden_layer_size + 1)

    # Useful value
    m = y.size

    # You need to return the following variables correctly
    cost = 0
    theta1_grad = np.zeros(theta1.shape)  # 25 x 401
    theta2_grad = np.zeros(theta2.shape)  # 10 x 26

    # ===================== Your Code Here =====================
    # Instructions : You should complete the code by working thru the
    #                following parts
    #
    # Part 1 : Feedforward the neural network and return the cost in the
    #          variable cost. After implementing Part 1, you can verify that your
    #          cost function computation is correct by running ex4.py
    #
    # Part 2: Implement the backpropagation algorithm to compute the gradients
    #         theta1_grad and theta2_grad. You should return the partial derivatives of
    #         the cost function with respect to theta1 and theta2 in theta1_grad and
    #         theta2_grad, respectively. After implementing Part 2, you can check
    #         that your implementation is correct by running checkNNGradients
    #
    #         Note: The vector y passed into the function is a vector of labels
    #               containing values from 1..K. You need to map this vector into a 
    #               binary vector of 1's and 0's to be used with the neural network
    #               cost function.
    #
    #         Hint: We recommend implementing backpropagation using a for-loop
    #               over the training examples if you are implementing it for the 
    #               first time.
    #
    # Part 3: Implement regularization with the cost function and gradients.
    #
    #         Hint: You can implement this around the code for
    #               backpropagation. That is, you can compute the gradients for
    #               the regularization separately and then add them to theta1_grad
    #               and theta2_grad from Part 2.
    #
    # Add a column of ones (bias unit) to the input
    a = np.ones(m)
    X_in = np.c_[a, X]
    layer1 = np.dot(X_in, theta1.T)  # (5000x401)*(401x25) = 5000x25
    layer1_out = sigmoid(layer1)  # 5000x25
    b = np.ones(layer1_out.shape[0])
    layer2_in = np.c_[b, layer1_out]  # add bias unit, 5000x26
    layer2 = np.dot(layer2_in, theta2.T)  # (5000x26)*(26x10) = 5000x10
    layer2_out = sigmoid(layer2)  # 5000x10

    reg_theta1 = theta1[:, 1:]  # 25 x 400
    reg_theta2 = theta2[:, 1:]  # 10 x 25
    Y = np.zeros((m, num_labels))  # 5000 x 10, one-hot encoding of y
    for i in range(m):
        Y[i, y[i] - 1] = 1
    cost = (1 / m) * np.sum(-Y * np.log(layer2_out) - np.subtract(1, Y) * np.log(np.subtract(1, layer2_out))) \
        + lmd / (2 * m) * (np.sum(reg_theta1 * reg_theta1) + np.sum(reg_theta2 * reg_theta2))

    # Output-layer error: delta3 = a3 - y (this form follows from the
    # cross-entropy cost combined with sigmoid outputs)
    error3 = layer2_out - Y  # 5000x10
    # Hidden-layer error; layer2_in * (1 - layer2_in) is the sigmoid gradient
    # written in terms of the activations (the bias column is dropped below)
    error2 = np.dot(error3, theta2) * (layer2_in * np.subtract(1, layer2_in))  # 5000x26
    error2 = error2[:, 1:]  # drop the bias column, 5000x25

    delta1 = np.dot(error2.T, X_in)  # 25 x 401
    delta2 = np.dot(error3.T, layer2_in)  # 10 x 26

    # Regularization terms; the first column (bias) is not regularized
    p1 = (lmd / m) * np.c_[np.zeros(hidden_layer_size), reg_theta1]
    p2 = (lmd / m) * np.c_[np.zeros(num_labels), reg_theta2]

    theta1_grad = p1 + (delta1 / m)
    theta2_grad = p2 + (delta2 / m)
 
    # ====================================================================================
    # Unroll gradients
    grad = np.concatenate([theta1_grad.flatten(), theta2_grad.flatten()])

    return cost, grad
  • Related test and verification code:
# ===================== Part 2: Loading Parameters =====================
# In this part of the exercise, we load some pre-initiated
# neural network parameters

print('Loading Saved Neural Network Parameters ...')

data = scio.loadmat('ex4weights.mat')
theta1 = data['Theta1']
theta2 = data['Theta2']

nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])

# ===================== Part 3: Compute Cost (Feedforward) =====================
# To the neural network, you should first start by implementing the
# feedforward part of the neural network that returns the cost only. You
# should complete the code in nncostfunction.py to return cost. After
# implementing the feedforward to compute the cost, you can verify that
# your implementation is correct by verifying that you get the same cost
# as us for the fixed debugging parameters.
#
# We suggest implementing the feedforward cost *without* regularization
# first so that it will be easier for you to debug. Later, in part 4, you
# will get to implement the regularized cost.
#

print('Feedforward Using Neural Network ...')

# Weight regularization parameter (we set this to 0 here).
lmd = 0

cost, grad = ncf.nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)

print('Cost at parameters (loaded from ex4weights): {:0.6f}\n(This value should be about 0.287629)'.format(cost))

input('Program paused. Press ENTER to continue')

# ===================== Part 4: Implement Regularization =====================
# Once your cost function implementation is correct, you should now
# continue to implement the regularization with the cost.
#

print('Checking Cost Function (w/ Regularization) ...')

# Weight regularization parameter (we set this to 1 here).
lmd = 1

cost, grad = ncf.nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)

print('Cost at parameters (loaded from ex4weights): {:0.6f}\n(This value should be about 0.383770)'.format(cost))

input('Program paused. Press ENTER to continue')
  • Test result:

Program paused. Press ENTER to continue
Loading Saved Neural Network Parameters ...
Feedforward Using Neural Network ...
Cost at parameters (loaded from ex4weights): 0.287629
(This value should be about 0.287629)

Program paused. Press ENTER to continue
Checking Cost Function (w/ Regularization) ...
Cost at parameters (loaded from ex4weights): 0.383770
(This value should be about 0.383770)

3. Backpropagation

3.1 Sigmoid Gradient

  • The gradient of the sigmoid function can be computed as

\large {g}'(z)=\frac{d}{dz}g(z)=g(z)\left(1-g(z)\right), \qquad g(z)=\frac{1}{1+e^{-z}}
  • Code, sigmoidgradient.py:
import numpy as np
from sigmoid import *

def sigmoid_gradient(z):
    g = np.zeros(z.shape)

    # ===================== Your Code Here =====================
    # Instructions : Compute the gradient of the sigmoid function evaluated at
    #                each value of z (z can be a matrix, vector or scalar)
    #
    g = sigmoid(z) * (1 - sigmoid(z))


    # ===========================================================

    return g
  • Test code:
# ===================== Part 5: Sigmoid Gradient =====================
# Before you start implementing the neural network, you will first
# implement the gradient for the sigmoid function. You should complete the
# code in the sigmoidGradient.py file
#

print('Evaluating sigmoid gradient ...')

g = sg.sigmoid_gradient(np.array([-1, -0.5, 0, 0.5, 1]))

print('Sigmoid gradient evaluated at [-1  -0.5  0  0.5  1]:\n{}'.format(g))

input('Program paused. Press ENTER to continue')
  • Test result:

Program paused. Press ENTER to continue
Evaluating sigmoid gradient ...
Sigmoid gradient evaluated at [-1  -0.5  0  0.5  1]:
[ 0.19661193  0.23500371  0.25        0.23500371  0.19661193]

3.2 Initializing Weights

  • Random weight initialization code, randInitializeWeights.py:
import numpy as np

def rand_initialization(l_in, l_out):
    # You need to return the following variable correctly
    w = np.zeros((l_out, 1 + l_in))

    # ===================== Your Code Here =====================
    # Instructions : Initialize w randomly so that we break the symmetry while
    #                training the neural network
    #
    # Note : The first column of w corresponds to the parameters for the bias unit
    #
    ep_init = 0.08
    w = np.random.rand(l_out, 1 + l_in) * (2 * ep_init) - ep_init
    # ===========================================================

    return w
  • Test code:
# ===================== Part 6: Initializing Parameters =====================
# In this part of the exercise, you will be starting to implement a two
# layer neural network that classifies digits. You will start by
# implementing a function to initialize the weights of the neural network
# (randInitializeWeights.m)

print('Initializing Neural Network Parameters ...')

initial_theta1 = rinit.rand_initialization(input_layer_size, hidden_layer_size)
initial_theta2 = rinit.rand_initialization(hidden_layer_size, num_labels)

# Unroll parameters
initial_nn_params = np.concatenate([initial_theta1.flatten(), initial_theta2.flatten()])
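  • Side note: the original exercise text recommends choosing the initialization range from the layer sizes, \epsilon_{init}=\sqrt{6}/\sqrt{L_{in}+L_{out}} (about 0.12 for this network); the fixed ep_init = 0.08 in rand_initialization above also works in practice. A variant using that rule (shown only as an alternative sketch, not the author's code) would be:

import numpy as np


def rand_initialization_scaled(l_in, l_out):
    # Hypothetical variant: pick epsilon from the fan-in/fan-out of the layer.
    ep_init = np.sqrt(6) / np.sqrt(l_in + l_out)
    return np.random.rand(l_out, 1 + l_in) * (2 * ep_init) - ep_init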

3.3 Backpropagation

  • Backpropagation diagram: (figure in the original post illustrating the error terms being propagated backwards from the output layer)

  • The error of the output layer can be defined as \delta_{j}^{(4)}=a_{j}^{(4)}-y_{j} or, in vectorized form, \delta^{(4)}=a^{(4)}-y . The backpropagation computation then proceeds as follows:

\large \begin{align*} \delta^{(4)} &= a^{(4)}-y \\ \delta^{(3)} &= (\theta^{(3)})^{T}\delta^{(4)}.*{g(z^{(3)})}' \\ \delta^{(2)} &= (\theta^{(2)})^{T}\delta^{(3)}.*{g(z^{(2)})}' \end{align*}

  • There is no \delta^{(1)} , because the input layer has no error term. Ignoring the regularization term, the partial derivatives are

\large \frac{\partial }{\partial \theta_{ij}^{(l)}}J(\theta)=a_{j}^{(l)}*\delta_{i}^{(l+1)}

  • Forward propagation passes information from the input layer to the output layer and produces the final output; backpropagation then passes the error back from the output layer towards the input layer so that the parameters can be updated and the model optimized. The procedure can be defined by the following steps (a minimal per-example sketch is given after this list): set \Delta_{ij}^{(l)}=0 for all l, i, j; for each training example, run forward propagation to compute the activations a^{(l)}, compute \delta^{(L)}=a^{(L)}-y^{(i)} and back-propagate the errors \delta^{(l)}; accumulate \Delta^{(l)}:=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T}; finally set D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)} (plus the regularization term for j\geq 1).

  • Here \frac{\partial }{\partial \theta_{ij}^{(l)}}J(\theta)=D_{ij}^{(l)} , so gradient descent can be applied as before to optimize every parameter \theta between neural units.
  • More formally, the error term is defined as

\large \begin{align*} \delta_{j}^{(l)}&=\frac{\partial }{\partial z_{j}^{(l)}}cost(i) \\ cost(i)&=y^{(i)}log(h_{\theta}(x^{(i)}))+(1-y^{(i)})log(1-h_{\theta}(x^{(i)})) \end{align*}

  • Note: in a neural network J(\theta) is non-convex, so in theory the optimization converges only to a local minimum; in practice, however, it usually converges to a local minimum that is close to the global minimum.
  • Gradient checking
  • To check whether f_{i}(\theta) (the i-th component of the backpropagation gradient) returns correct derivative values, set

\large \theta^{(i+)}=\theta+\epsilon\, e_{i}, \qquad \theta^{(i-)}=\theta-\epsilon\, e_{i}

  • where e_{i} is the i-th unit vector. You can then numerically verify the correctness of f_{i}(\theta) by checking, for each i:

\large f_{i}(\theta)\approx \frac{J(\theta^{(i+)})-J(\theta^{(i-)})}{2\epsilon}

  • In other words, at a given point the derivative of J(\theta) can be approximated as \frac{\partial }{\partial \theta}J(\theta)\approx \frac{J(\theta+\epsilon)-J(\theta-\epsilon)}{2\epsilon} , where \epsilon is a very small number. This method is used to verify the correctness of the backpropagation implementation above.
  • Note: once backpropagation has been verified to be correct, gradient checking should be turned off, because it is computationally very expensive.

(This part is adapted from: https://blog.csdn.net/zhq9695/article/details/82864551#2.%C2%A0%E5%8F%8D%E5%90%91%E4%BC%A0%E6%92%AD%EF%BC%88back%20propagation%EF%BC%89 )
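  • As promised above, here is a minimal per-example backpropagation sketch in the "for-loop over training examples" style suggested in the exercise hints. It reuses the variable naming of nn_cost_function (theta1, theta2, the one-hot Y, lmd); the function name backprop_loop is hypothetical, and this is an illustration of the listed steps rather than the vectorized implementation used in this post:

import numpy as np
from sigmoid import *
from sigmoidgradient import *


def backprop_loop(theta1, theta2, X, Y, lmd):
    m = X.shape[0]
    Delta1 = np.zeros(theta1.shape)  # gradient accumulator, 25 x 401
    Delta2 = np.zeros(theta2.shape)  # gradient accumulator, 10 x 26

    for i in range(m):
        # Forward pass for one example
        a1 = np.hstack(([1], X[i]))            # input with bias, 401
        z2 = theta1.dot(a1)                    # hidden pre-activation, 25
        a2 = np.hstack(([1], sigmoid(z2)))     # hidden activation with bias, 26
        a3 = sigmoid(theta2.dot(a2))           # output activation, 10

        # Backward pass: output error, then hidden-layer error
        delta3 = a3 - Y[i]
        delta2 = theta2[:, 1:].T.dot(delta3) * sigmoid_gradient(z2)

        # Accumulate Delta^(l) += delta^(l+1) * (a^(l))^T
        Delta1 += np.outer(delta2, a1)
        Delta2 += np.outer(delta3, a2)

    # Average over the examples and regularize (bias column excluded)
    theta1_grad = Delta1 / m
    theta2_grad = Delta2 / m
    theta1_grad[:, 1:] += (lmd / m) * theta1[:, 1:]
    theta2_grad[:, 1:] += (lmd / m) * theta2[:, 1:]
    return theta1_grad, theta2_grad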

Gradient checking code, checkNNGradients.py:

import numpy as np
import debugInitializeWeights as diw
import nncostfunction as ncf
import computeNumericalGradient as cng


def check_nn_gradients(lmd):

    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5
    # We generate some 'random' test data
    theta1 = diw.debug_initialize_weights(hidden_layer_size, input_layer_size)
    theta2 = diw.debug_initialize_weights(num_labels, hidden_layer_size)

    # Reusing debugInitializeWeights to generate X
    X = diw.debug_initialize_weights(m, input_layer_size - 1)
    y = 1 + np.mod(np.arange(1, m + 1), num_labels)

    # Unroll parameters
    nn_params = np.concatenate([theta1.flatten(), theta2.flatten()])

    def cost_func(p):
        return ncf.nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)

    cost, grad = cost_func(nn_params)
    numgrad = cng.compute_numerial_gradient(cost_func, nn_params)

    print(np.c_[grad, numgrad])
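  • check_nn_gradients above calls two helpers that are not reproduced in this post: compute_numerial_gradient from computeNumericalGradient.py (the function name here simply mirrors the call above) and debug_initialize_weights from debugInitializeWeights.py. Minimal sketches, assuming the two-sided difference formula from section 3.3 and the deterministic sine-based initialization used in the original exercise, might look like this:

import numpy as np


def compute_numerial_gradient(cost_func, theta):
    # Two-sided finite differences: perturb one parameter at a time by +/- e
    # and approximate that partial derivative from the change in the cost.
    numgrad = np.zeros(theta.size)
    perturb = np.zeros(theta.size)
    e = 1e-4
    for i in range(theta.size):
        perturb[i] = e
        loss1, _ = cost_func(theta - perturb)
        loss2, _ = cost_func(theta + perturb)
        numgrad[i] = (loss2 - loss1) / (2 * e)
        perturb[i] = 0
    return numgrad


def debug_initialize_weights(fan_out, fan_in):
    # Deterministic "random" weights so the gradient check is reproducible:
    # fill the array with sin(1), sin(2), ... and scale the values down.
    return np.sin(np.arange(1, fan_out * (fan_in + 1) + 1)).reshape(fan_out, fan_in + 1) / 10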

3.4 Neural Network Regularization

Specifically, after the \Delta_{ij}^{(l)} terms have been computed with backpropagation, the regularized gradient should be used:

\large \frac{\partial }{\partial \Theta_{ij}^{(l)}}J(\Theta)=\frac{1}{m}\Delta_{ij}^{(l)} \;\;(j=0), \qquad \frac{\partial }{\partial \Theta_{ij}^{(l)}}J(\Theta)=\frac{1}{m}\Delta_{ij}^{(l)}+\frac{\lambda}{m}\Theta_{ij}^{(l)} \;\;(j\geq 1)

that is, the bias column (j = 0) is not regularized.

  • Some test code:
# ===================== Part 7: Implement Backpropagation =====================
# Once your cost matches up with ours, you should proceed to implement the
# backpropagation algorithm for the neural network. You should add to the
# code you've written in nncostfunction.py to return the partial
# derivatives of the parameters.
#

print('Checking Backpropagation ... ')

# Check gradients by running check_nn_gradients()

lmd = 0
cng.check_nn_gradients(lmd)

input('Program paused. Press ENTER to continue')

# ===================== Part 8: Implement Regularization =====================
# Once your backpropagation implementation is correct, you should now
# continue to implement the regularization with the cost and gradient.
#

print('Checking Backpropagation (w/ Regularization) ...')

lmd = 3
cng.check_nn_gradients(lmd)

# Also output the cost_function debugging values
debug_cost, _ = ncf.nn_cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)

print('Cost at (fixed) debugging parameters (w/ lambda = {}): {:0.6f}\n(for lambda = 3, this value should be about 0.576051)'.format(lmd, debug_cost))

input('Program paused. Press ENTER to continue')

# ===================== Part 9: Training NN =====================
# You have now implemented all the code necessary to train a neural
# network. To train your neural network, we will now use 'opt.fmin_cg'.
#

print('Training Neural Network ... ')

lmd = 1


def cost_func(p):
    return ncf.nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)[0]


def grad_func(p):
    return ncf.nn_cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lmd)[1]

nn_params, *unused = opt.fmin_cg(cost_func, fprime=grad_func, x0=nn_params, maxiter=400, disp=True, full_output=True)

# Obtain theta1 and theta2 back from nn_params
theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, input_layer_size + 1)
theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(num_labels, hidden_layer_size + 1)

input('Program paused. Press ENTER to continue')

# ===================== Part 10: Visualize Weights =====================
# You can now 'visualize' what the neural network is learning by
# displaying the hidden units to see what features they are capturing in
# the data

print('Visualizing Neural Network...')

dd.display_data(theta1[:, 1:])

input('Program paused. Press ENTER to continue')

# ===================== Part 11: Implement Predict =====================
# After training the neural network, we would like to use it to predict
# the labels. You will now implement the 'predict' function to use the
# neural network to predict the labels of the training set. This lets
# you compute the training set accuracy.

pred = pd.predict(theta1, theta2, X)

print('Training set accuracy: {}'.format(np.mean(pred == y)*100))

input('ex4 Finished. Press ENTER to exit')
  • Test result:

Program paused. Press ENTER to continue
Initializing Neural Network Parameters ...
Checking Backpropagation ... 
[[  9.01303866e-03   9.01303866e-03]
 [ -6.08047127e-05  -6.08047124e-05]
 [ -6.96665817e-06  -6.96665836e-06]
 [  5.32765097e-05   5.32765121e-05]
 [  1.17193332e-02   1.17193332e-02]
 [ -7.05495376e-05  -7.05495373e-05]
 [  1.66652194e-04   1.66652196e-04]
 [  2.50634667e-04   2.50634669e-04]
 [  3.66087511e-03   3.66087511e-03]
 [ -1.54510225e-05  -1.54510182e-05]
 [  1.86817175e-04   1.86817173e-04]
 [  2.17326523e-04   2.17326523e-04]
 [ -7.76550109e-03  -7.76550108e-03]
 [  5.38947948e-05   5.38947931e-05]
 [  3.53029178e-05   3.53029161e-05]
 [ -1.57462990e-05  -1.57462976e-05]
 [ -1.20637760e-02  -1.20637760e-02]
 [  7.36351996e-05   7.36352002e-05]
 [ -1.48712777e-04  -1.48712773e-04]
 [ -2.34334912e-04  -2.34334914e-04]
 [  3.02286353e-01   3.02286353e-01]
 [  1.51010770e-01   1.51010770e-01]
 [  1.45233242e-01   1.45233242e-01]
 [  1.58998192e-01   1.58998192e-01]
 [  1.46779086e-01   1.46779086e-01]
 [  1.48987769e-01   1.48987769e-01]
 [  9.95931723e-02   9.95931723e-02]
 [  4.96122519e-02   4.96122519e-02]
 [  4.83540132e-02   4.83540132e-02]
 [  5.18660079e-02   5.18660079e-02]
 [  4.85328991e-02   4.85328991e-02]
 [  4.93783641e-02   4.93783641e-02]
 [  9.69324215e-02   9.69324215e-02]
 [  4.89006564e-02   4.89006564e-02]
 [  4.65577354e-02   4.65577354e-02]
 [  5.05267299e-02   5.05267299e-02]
 [  4.76803471e-02   4.76803471e-02]
 [  4.74319072e-02   4.74319072e-02]]

Program paused. Press ENTER to continue
Checking Backpropagation (w/ Regularization) ...
[[ 0.00901304  0.00901304]
 [ 0.05042745  0.05042745]
 [ 0.05455088  0.05455088]
 [ 0.00852048  0.00852048]
 [ 0.01171933  0.01171933]
 [-0.05760601 -0.05760601]
 [-0.01659828 -0.01659828]
 [ 0.03966983  0.03966983]
 [ 0.00366088  0.00366088]
 [ 0.02471166  0.02471166]
 [-0.03245445 -0.03245445]
 [-0.05978209 -0.05978209]
 [-0.0077655  -0.0077655 ]
 [ 0.02526392  0.02526392]
 [ 0.05947174  0.05947174]
 [ 0.03900152  0.03900152]
 [-0.01206378 -0.01206378]
 [-0.05761021 -0.05761021]
 [-0.04520795 -0.04520795]
 [ 0.0087583   0.0087583 ]
 [ 0.30228635  0.30228635]
 [ 0.20149903  0.20149903]
 [ 0.19979109  0.19979109]
 [ 0.16746539  0.16746539]
 [ 0.10137094  0.10137094]
 [ 0.09145231  0.09145231]
 [ 0.09959317  0.09959317]
 [ 0.08903145  0.08903145]
 [ 0.10771551  0.10771551]
 [ 0.07659312  0.07659312]
 [ 0.01589163  0.01589163]
 [-0.01062105 -0.01062105]
 [ 0.09693242  0.09693242]
 [ 0.07411068  0.07411068]
 [ 0.10599418  0.10599418]
 [ 0.089544    0.089544  ]
 [ 0.03040615  0.03040615]
 [-0.01025194 -0.01025194]]
Cost at (fixed) debugging parameters (w/ lambda = 3): 0.576051
(for lambda = 3, this value should be about 0.576051)

Program paused. Press ENTER to continue
Training Neural Network ... 
Warning: Maximum number of iterations has been exceeded.
         Current function value: 0.308920
         Iterations: 400
         Function evaluations: 952
         Gradient evaluations: 952

Program paused. Press ENTER to continue
Visualizing Neural Network...

Program paused. Press ENTER to continue
Training set accuracy: 99.53999999999999

ex4 Finished. Press ENTER to exit
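predict.py, which Part 11 above calls through pd.predict, is also not reproduced in this post. A minimal forward-propagation sketch, assumed from how it is called rather than taken from the author's file:

import numpy as np
from sigmoid import *


def predict(theta1, theta2, x):
    # Forward-propagate through the two-layer network and return, for every
    # example, the label (1..10) of the output unit with the largest activation.
    m = x.shape[0]
    a1 = np.c_[np.ones(m), x]                            # add bias column, m x 401
    a2 = np.c_[np.ones(m), sigmoid(a1.dot(theta1.T))]    # hidden layer, m x 26
    a3 = sigmoid(a2.dot(theta2.T))                       # output layer, m x 10
    return np.argmax(a3, axis=1) + 1                     # labels are 1-indexed ("0" is label 10)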

 

 

Note: all of the code and the accompanying explanatory PDF will be uploaded together once everything has been updated.
