Machine Learning (Coursera), Exercise 4: Neural Networks

爱抠脚的coder

Article link: https://blog.csdn.net/m0_37393514/article/details/79356566

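I. Neural Networks: Handwritten Digit Recognition

(1) Visualizing the data

The visualization function is the same as in the previous exercise, so the code is not repeated here; see exercise3, visualizing the data.

As before, each example (a 20 pixel by 20 pixel grayscale image) is unrolled into a 400-dimensional vector, so the matrix X has m rows (m = 5000), one example per row.

y is a 5000-dimensional vector.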

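(2) Model representation

The network has three layers: an input layer, a hidden layer (25 units) and an output layer (10 classes, the digits 0-9). The input layer has 400 units (not counting the extra bias unit). The archive provides the parameters $\Theta^{(1)}$ and $\Theta^{(2)}$ in ex4weights.mat: $\Theta^{(1)}$ has size 25*401 and $\Theta^{(2)}$ has size 10*26.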

% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');

% Unroll parameters 
nn_params = [Theta1(:) ; Theta2(:)];
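Unrolling only works if the matrices can be recovered later with the matching reshape calls. The following is a minimal sketch of that round trip; the toy layer sizes and the filler values are assumptions chosen for illustration, not the exercise's 400/25/10 network.

```matlab
% Toy layer sizes (assumed for illustration only)
input_layer_size = 3; hidden_layer_size = 2; num_labels = 4;

% Filler weights with the shapes the network expects
Theta1 = reshape(1:hidden_layer_size*(input_layer_size+1), ...
                 hidden_layer_size, input_layer_size+1);
Theta2 = reshape(1:num_labels*(hidden_layer_size+1), ...
                 num_labels, hidden_layer_size+1);

% Unroll into one column vector (column-major order)
nn_params = [Theta1(:); Theta2(:)];

% Recover both matrices from the vector
n1 = hidden_layer_size * (input_layer_size + 1);
T1 = reshape(nn_params(1:n1), hidden_layer_size, input_layer_size + 1);
T2 = reshape(nn_params(n1+1:end), num_labels, hidden_layer_size + 1);

roundtrip_ok = isequal(T1, Theta1) && isequal(T2, Theta2);
```

Because both (:) and reshape use column-major order, the round trip is exact as long as the same sizes are used on both sides.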

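(3) Feedforward and cost function: nnCostFunction.m returns the cost

The unregularized cost function of the neural network:

$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(\left(h_\Theta(x^{(i)})\right)_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right]$$

Now complete the code for the cost function and gradient of the neural network. Note that the target used by the network is not the label 1-10 itself but a 10-dimensional vector representing it, containing only 0 and 1:

$$y = \begin{bmatrix}1\\0\\0\\\vdots\\0\end{bmatrix},\ \begin{bmatrix}0\\1\\0\\\vdots\\0\end{bmatrix},\ \ldots\ \text{or}\ \begin{bmatrix}0\\0\\0\\\vdots\\1\end{bmatrix}$$

For each example, use forward propagation to compute $h_\Theta(x^{(i)})$, and sum the cost over all examples (your code should work for a dataset of any size and with any number of labels, K >= 3);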

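Hints:

Each row of X is one example, so X(i,:)' is the i-th example, an n*1 vector;

You also need to add a column of ones to X;

You can use a for loop to compute the cost over all examples;

The parameters of each unit in the network are stored as one row of Theta1 or Theta2; in particular, the first row of Theta1 corresponds to the first hidden unit in the second layer;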

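Part of the code first:

X = [ones(m, 1) X]; % add a 1 to every example as the bias term

ylabel = zeros(num_labels, m); % each column is one example

% use each element of y as a row index into ylabel, which stores the labels in 0/1 vector form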
for i=1:m
    ylabel(y(i), i) = 1;
end

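% forward propagation
z2 = X*Theta1'; % X->m*(n+1), Theta1->hidden*(n+1), z2->m*hidden
a2 = sigmoid(z2);
a2 = [ones(m, 1) a2]; % a2->m*(hidden+1)
z2 = [ones(m, 1) z2]; % z2->m*(hidden+1), padded to match a2; the first entry is dropped in backpropagation
z3 = a2*Theta2'; % Theta2->num_labels*(hidden+1), z3->m*num_labels
a3 = sigmoid(z3); % a3->m*num_labels

%method1
% a3 and num_labels are small enough here for a for loop; otherwise use the element-wise multiply-and-sum approach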
%{
for i=1:m
    J = J - log(a3(i, :))*ylabel(:, i) - (log(1 - a3(i, :)) * (1 - ylabel(:, i)));
end
%}
%J = J/m;

%method2: element-wise multiplication followed by summation
J=-1/m*sum(sum(log(a3').*ylabel)+sum(log(1-a3').*(1-ylabel)));
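The label loop above can also be written without a loop by indexing into an identity matrix. This is a small sketch under assumed toy labels (the vector y and num_labels = 3 are made up); it builds the 0/1 matrix both ways and checks that they agree.

```matlab
% Toy labels (assumed for illustration)
y = [2; 1; 3; 2];
num_labels = 3;
m = numel(y);

% Loop version: one column per example, a single 1 in row y(i)
ylabel_loop = zeros(num_labels, m);
for i = 1:m
    ylabel_loop(y(i), i) = 1;
end

% Vectorized version: pick column y(i) of the identity matrix for each example
I = eye(num_labels);
ylabel_vec = I(:, y);

same = isequal(ylabel_loop, ylabel_vec);
```

Both forms produce a num_labels*m matrix, so either can be used with the cost expression above.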

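(4) Regularized cost function

The regularized cost function:

$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(\left(h_\Theta(x^{(i)})\right)_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{j,i}^{(l)}\right)^2$$

Since we only have three layers, the regularization term can also be written as:

$$\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^2\right]$$

Although it is written this way here, keep in mind that your code must work for networks with any number of units, that is, $\Theta^{(1)}$ and $\Theta^{(2)}$ may have any size. As before, the bias terms we added are not regularized (for the $\Theta$ matrices of the network, that is the first column).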

Building on the previous code:

% regularization
J = J + lambda/2/m * (sum(sum(Theta1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2)));
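To see that the bias column really is excluded, here is a small worked example with made-up weights (the matrices, lambda and m below are assumptions for illustration). The large values in the first columns do not contribute to the penalty.

```matlab
% Made-up weights; the first column of each matrix is the bias column
Theta1 = [10 1 2;
          20 3 4];
Theta2 = [30 1 2];
lambda = 1;
m = 5;

% Penalty over all columns except the first
reg = lambda/(2*m) * (sum(sum(Theta1(:, 2:end).^2)) + ...
                      sum(sum(Theta2(:, 2:end).^2)));
% Theta1 contributes 1 + 4 + 9 + 16 = 30, Theta2 contributes 1 + 4 = 5
```

With these numbers the penalty is 35 / 10 = 3.5, regardless of the bias values 10, 20 and 30.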

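II. Backpropagation

(1) Derivative of the sigmoid function

We derived the derivative of the sigmoid function in an earlier post; the result is:

$$g'(z) = \frac{d}{dz}g(z) = g(z)\left(1-g(z)\right), \qquad g(z) = \frac{1}{1+e^{-z}}$$

The code: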

g = sigmoid(z).*(1 - sigmoid(z));
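A quick sanity check for this formula: at z = 0 the gradient should be exactly 0.25, and elsewhere it should match a finite-difference estimate. The sketch below defines sigmoid as an anonymous function so that it runs on its own; the test points and step size are arbitrary choices.

```matlab
% Stand-alone definitions (the exercise provides sigmoid.m separately)
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Value at zero: g(0) = 0.5, so g'(0) = 0.5 * 0.5
g0 = sigmoidGradient(0);

% Compare against a central finite difference at a few arbitrary points
z = [-1 0.5 2];
h = 1e-5;
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2*h);
max_err = max(abs(numeric - sigmoidGradient(z)));
```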

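(2) Random initialization

The parameters must not be initialized to 0, nor all to the same value: with identical weights, every unit in the next layer computes the same thing. Initialize them between -eps and eps instead. Complete randInitializeWeights.m to initialize theta; eps can be chosen as follows,

$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in}+L_{out}}}$$

The code: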

epsilon = 0.12;
W = rand(L_out, 1+L_in)*2*epsilon - epsilon;
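The hard-coded 0.12 is consistent with the heuristic eps = sqrt(6)/sqrt(L_in + L_out) from the exercise handout when applied to the first layer. The sketch below computes the value for L_in = 400 and L_out = 25 and checks that every sampled weight stays inside the interval; computing eps from the layer sizes rather than fixing it is an illustrative variation, not the code used in the post.

```matlab
% Layer sizes of the first weight matrix in this exercise
L_in = 400;
L_out = 25;

% Heuristic range, about 0.1188 for these sizes
epsilon_init = sqrt(6) / sqrt(L_in + L_out);

% Uniform samples in [-epsilon_init, epsilon_init], including the bias column
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

in_range = all(abs(W(:)) <= epsilon_init);
```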

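(3) Backpropagation

Note: we first run a forward pass to compute all the activations, including the output hypothesis $h_\Theta(x)$ of the last layer. Then, for each node j in layer l, we compute an error term $\delta_j^{(l)}$ that measures how much that node was responsible for the error in the final output. For an output node we can directly measure the difference between the network's activation and the true target value, which gives $\delta_j^{(3)}$; for a hidden unit we compute $\delta_j^{(l)}$ from a weighted average of the error terms of the nodes in layer l+1. Use a for loop that processes one example at a time: write for t=1:m and put the first four steps below inside the loop. The fifth step divides the accumulated gradients by m to obtain the gradient of the neural network cost function.

step1: set the input layer $a^{(1)}$ to the t-th training example $x^{(t)}$, then run forward propagation to compute $z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)}$. In this step you need to add a +1 term (the bias unit) to $a^{(1)}$ and $a^{(2)}$. These values were already computed for the cost function, so there is no need to compute them again.

step2: compute the error of the third layer, $\delta_k^{(3)} = a_k^{(3)} - y_k$;

step3: for the second layer, $\delta^{(2)} = \left(\Theta^{(2)}\right)^T\delta^{(3)} \mathbin{.*} g'(z^{(2)})$;

step4: accumulate the gradient from this example using $\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)}\left(a^{(l)}\right)^T$. Note that you need to skip or remove $\delta_0^{(2)}$, which the code does by using delta2(2:end);

step5: divide the accumulated gradients by m to obtain the gradient of the unregularized neural network cost function, $\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}$;

A word of advice: implement backpropagation only after forward propagation and the cost function are working correctly.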

Code:

% backpropagation: compute the gradient of the neural network cost function
Delta1 = zeros(size(Theta1)); % gradient accumulators
Delta2 = zeros(size(Theta2));

for t = 1:m
    delta3 = a3(t, :)' - ylabel(:, t);
    % step2: a3->m*num_labels, ylabel->num_labels*m, delta3->num_labels*1
    delta2 = Theta2'*delta3 .* sigmoidGradient(z2(t, :)');
    % step3: Theta2->num_labels*(hidden+1), z2->m*(hidden+1)
    % Theta2'*delta3 is (hidden+1)*1, so delta2 is (hidden+1)*1 as well and
    % its first (bias) entry has to be dropped below

    % step4: Delta1 has the size of Theta1->hidden*(n+1), Delta2 has the size of Theta2
    % ->num_labels*(hidden+1)
    % X->m*(n+1)
    Delta1 = Delta1 + delta2(2:end) * X(t, :);
    Delta2 = Delta2 + delta3 * a2(t, :);
end
% step5: divide by m
Theta1_grad = Delta1 / m;
Theta2_grad = Delta2 / m;
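Once the loop version works, the same accumulation can be done with matrix products. The sketch below is one possible vectorized form, not the post's method: it uses a made-up toy network (sizes, weights and labels are assumptions), slices off the bias column of Theta2 instead of padding z2, and compares the result with the per-example loop.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy problem (all values assumed for illustration)
m = 4; n = 3; hidden = 2; num_labels = 3;
X = [ones(m, 1) reshape(sin(1:m*n), m, n)];          % bias column already added
Theta1 = reshape(cos(1:hidden*(n+1)), hidden, n+1) / 2;
Theta2 = reshape(sin(1:num_labels*(hidden+1)), num_labels, hidden+1) / 2;
y = [1; 2; 3; 1];
I = eye(num_labels);
ylabel = I(:, y);                                     % num_labels x m

% Forward pass
z2 = X * Theta1';                                     % m x hidden
a2 = [ones(m, 1) sigmoid(z2)];                        % m x (hidden+1)
a3 = sigmoid(a2 * Theta2');                           % m x num_labels

% Loop version, one example at a time
Delta1 = zeros(size(Theta1)); Delta2 = zeros(size(Theta2));
for t = 1:m
    delta3 = a3(t, :)' - ylabel(:, t);
    g = sigmoid(z2(t, :)') .* (1 - sigmoid(z2(t, :)'));
    delta2 = (Theta2(:, 2:end)' * delta3) .* g;       % bias entry never created
    Delta1 = Delta1 + delta2 * X(t, :);
    Delta2 = Delta2 + delta3 * a2(t, :);
end

% Vectorized version over all examples at once
d3 = a3 - ylabel';                                    % m x num_labels
d2 = (d3 * Theta2(:, 2:end)) .* (sigmoid(z2) .* (1 - sigmoid(z2)));  % m x hidden
Delta1_vec = d2' * X;
Delta2_vec = d3' * a2;

max_diff = max([max(abs(Delta1(:) - Delta1_vec(:))), ...
                max(abs(Delta2(:) - Delta2_vec(:)))]);
```

The two versions should agree up to floating-point rounding; the vectorized one avoids the loop over m examples.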

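(4) Gradient checking (turn it off when you actually run the learning algorithm)

When using gradient checking, use a small neural network with few input and hidden units, and therefore few parameters. Once you are sure the gradient computation is correct, turn gradient checking off. (Gradient checking works for any function for which you compute both the cost and the gradient, so the same computeNumericalGradient.m can be used to check, for example, the logistic regression gradients.)

computeNumericalGradient.m (nothing to fill in):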

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking, and 
%        returns the numerical gradient.It sets numgrad(i) to (a numerical 
%        approximation of) the partial derivative of J with respect to the 
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should 
%        be the (approximately) the partial derivative of J with respect 
%        to theta(i).)
%                

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta) % numel(A) returns the number of elements in array A
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0; % only one entry is perturbed by +/-e at a time, so reset it to 0 afterwards
end

end
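The same finite-difference idea can be tried on a function whose gradient is known in closed form. The sketch below uses J(theta) = sum(theta.^2), whose gradient is 2*theta; the function and the test point are assumptions chosen for illustration, and the loop is inlined so the snippet runs without the exercise files.

```matlab
% Cost with a known gradient (assumed example): J = sum(theta.^2), grad = 2*theta
J = @(theta) sum(theta .^ 2);
theta = [1; -2; 0.5];

% Central finite differences, one coordinate at a time
e = 1e-4;
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
for p = 1:numel(theta)
    perturb(p) = e;
    numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2*e);
    perturb(p) = 0;
end

% Analytical gradient and the relative difference between the two
grad = 2 * theta;
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
```

For a quadratic cost the central difference is exact up to rounding, so the relative difference should be far below the 1e-9 threshold used in the exercise.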

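checkNNGradients.m: builds a small neural network and compares the gradients produced by the backpropagation code with the numerical gradients from computeNumericalGradient.m; the two should be very close. It calls computeNumericalGradient.m (nothing to fill in, but it shows the construction of the network very clearly).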

function checkNNGradients(lambda)
%CHECKNNGRADIENTS Creates a small neural network to check the
%backpropagation gradients
%   CHECKNNGRADIENTS(lambda) Creates a small neural network to check the
%   backpropagation gradients, it will output the analytical gradients
%   produced by your backprop code and the numerical gradients (computed
%   using computeNumericalGradient). These two gradient computations should
%   result in very similar values.
%

if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

% step1: define the structure of the neural network
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;

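% step2: "randomly" initialize the parameters
% We generate some 'random' test data
% 'random' is in quotes because the values are deterministic: every call fills the array from the same sequence and only the array size differs; see debugInitializeWeights.m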
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);

% Reusing debugInitializeWeights to generate X
X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)'; % a column vector of positive integers no larger than num_labels

% next: forward propagation --> compute J --> backpropagation for the partial derivatives (all of this is inside nnCostFunction.m) --> numerical gradient check
% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];

% Short hand for cost function
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

[cost, grad] = costFunc(nn_params); % the cost and the backpropagation gradient, compared with the numerical gradient below
numgrad = computeNumericalGradient(costFunc, nn_params); % numerical gradient from the cost handle J and the parameter vector theta

% Visually examine the two gradient computations.  The two columns
% you get should be very similar. 
disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

% Evaluate the norm of the difference between two solutions.  
% If you have a correct implementation, and assuming you used EPSILON = 0.0001 
% in computeNumericalGradient.m, then diff below should be less than 1e-9
diff = norm(numgrad-grad)/norm(numgrad+grad); % for a vector A, norm(A) is the square root of the sum of its squared elements

fprintf(['If your backpropagation implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);

end

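(5) Regularized neural network

Once backpropagation works, add regularization to the model. The extra regularization term can be added after the gradients have been computed by backpropagation.

Specifically, after you have computed $\Delta_{ij}^{(l)}$ using backpropagation, add the regularization using:

$$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} \qquad \text{for } j = 0$$

$$\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \qquad \text{for } j \ge 1$$

Note that you should not regularize the first column of $\Theta^{(l)}$, which is used for the bias term. In the course notation, the index i in $\Theta_{ij}^{(l)}$ starts from 1 and j starts from 0:

$$\Theta^{(l)} = \begin{bmatrix}\Theta_{1,0}^{(l)} & \Theta_{1,1}^{(l)} & \cdots \\ \Theta_{2,0}^{(l)} & \Theta_{2,1}^{(l)} & \cdots \\ \vdots & & \ddots\end{bmatrix}$$

In MATLAB, however, all indices start from 1, so Theta1(2, 1) corresponds to $\Theta_{2,0}^{(1)}$. This gives:

% step5: divide by m and add the regularization term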
Theta1_grad = Delta1 / m;
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + lambda/m*Theta1(:, 2:end);
Theta2_grad = Delta2 / m;
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + lambda/m*Theta2(:, 2:end);

With this subsection done, nnCostFunction.m is complete. Here is the full code:

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices. 
% 
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a 
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the 
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%
X = [ones(m, 1) X]; % add a 1 to every example as the bias term

ylabel = zeros(num_labels, m); % each column is one example

% use each element of y as a row index into ylabel, which stores the labels in 0/1 vector form
for i=1:m
    ylabel(y(i), i) = 1;
end

% forward propagation
z2 = X*Theta1'; % X->m*(n+1), Theta1->hidden*(n+1), z2->m*hidden
a2 = sigmoid(z2);
a2 = [ones(m, 1) a2]; % a2->m*(hidden+1)
z2 = [ones(m, 1) z2]; % z2->m*(hidden+1), used later in backpropagation
z3 = a2*Theta2'; % Theta2->num_labels*(hidden+1), z3->m*num_labels
a3 = sigmoid(z3); % a3->m*num_labels

%method1
% a3 and num_labels are small enough here for a for loop; otherwise use the element-wise multiply-and-sum approach
%{
for i=1:m
    J = J - log(a3(i, :))*ylabel(:, i) - (log(1 - a3(i, :)) * (1 - ylabel(:, i)));
end
%}
%J = J/m;

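%method2: element-wise multiplication followed by summation
J=-1/m*sum(sum(log(a3').*ylabel)+sum(log(1-a3').*(1-ylabel)));

% regularization
J = J + lambda/2/m * (sum(sum(Theta1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2)));

% backpropagation: compute the gradient of the neural network cost function
Delta1 = zeros(size(Theta1)); % gradient accumulators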
Delta2 = zeros(size(Theta2));

for t = 1:m
    delta3 = a3(t, :)' - ylabel(:, t);
    % step2: a3->m*num_labels, ylabel->num_labels*m, delta3->num_labels*1
    delta2 = Theta2'*delta3 .* sigmoidGradient(z2(t, :)');
    % step3: Theta2->num_labels*(hidden+1), z2->m*(hidden+1)
    % Theta2'*delta3 is (hidden+1)*1, so delta2 is (hidden+1)*1 as well and
    % its first (bias) entry has to be dropped below

    % step4: Delta1 has the size of Theta1->hidden*(n+1), Delta2 has the size of Theta2
    % ->num_labels*(hidden+1)
    % X->m*(n+1)
    Delta1 = Delta1 + delta2(2:end) * X(t, :);
    Delta2 = Delta2 + delta3 * a2(t, :);
end

% step5: divide by m and add the regularization term
Theta1_grad = Delta1 / m;
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + lambda/m*Theta1(:, 2:end);
Theta2_grad = Delta2 / m;
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + lambda/m*Theta2(:, 2:end);
% -------------------------------------------------------------

% =========================================================================

% Unroll gradients

grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

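(6) Learning the parameters with fmincg

With everything above complete, ex4.m now uses fmincg (similar to the fminunc function introduced earlier) to optimize and find good parameters theta. When training finishes you should see an accuracy of about 95.3%, varying by roughly 1%. More iterations may give a higher accuracy: try 400 iterations, and also try different values of $\lambda$.

The optimization code: ex4.m (no changes needed)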

%% =================== Part 8: Training NN ===================
%  You have now implemented all the code necessary to train a neural 
%  network. To train your neural network, we will now use "fmincg", which
%  is a function which works similarly to "fminunc". Recall that these
%  advanced optimizers are able to train our cost functions efficiently as
%  long as we provide them with the gradient computations.
%
fprintf('\nTraining Neural Network... \n')
% set the optimization options
%  After you have completed the assignment, change the MaxIter to a larger
%  value to see how more training helps.
options = optimset('MaxIter', 50);

%  You should also try different values of lambda
lambda = 1;


% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters); run the optimization
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ================= Part 9: Visualize Weights =================
%  You can now "visualize" what the neural network is learning by 
%  displaying the hidden units to see what features they are capturing in 
%  the data.

fprintf('\nVisualizing Neural Network... \n')

displayData(Theta1(:, 2:end));

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

predict.m (this is just forward propagation; nothing to fill in):

function p = predict(Theta1, Theta2, X) % just forward propagation to compute the result
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1); % X->m*n
num_labels = size(Theta2, 1); % Theta2->num_labels*(hidden+1), Theta1->hidden*(n+1)

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);
% remember to add the column of ones to X
h1 = sigmoid([ones(m, 1) X] * Theta1'); % h1->m*hidden
h2 = sigmoid([ones(m, 1) h1] * Theta2'); % h2->m*num_labels, each row is one output vector
[dummy, p] = max(h2, [], 2); % row-wise maximum and its index; the index is stored in p

% =========================================================================
end
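Given the predictions p, the accuracy is the percentage of predictions that equal the labels. The sketch below uses a made-up prediction vector and label vector (both are assumptions for illustration) to show the computation.

```matlab
% Made-up predictions and labels (illustration only)
pred = [1; 2; 3; 3; 10];
y    = [1; 2; 3; 4; 10];

% Fraction of matching entries, expressed as a percentage
accuracy = mean(double(pred == y)) * 100;
```

Four of the five toy predictions match, so the accuracy here is 80.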

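III. Visualizing the hidden layer

(1) To understand what the neural network has learned, you can visualize the hidden layer. In this exercise, the i-th row of $\Theta^{(1)}$ is a 401-dimensional vector holding the parameters of the i-th hidden unit. After discarding the bias term, it is a 400-dimensional vector giving the weight from each input pixel to that hidden unit. To visualize the hidden layer, reshape this 400-dimensional vector into a 20*20 image and display it. The displayData.m function is used again to show the images (25 units): the resulting figure has 25 cells, one for each hidden unit.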
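For a single hidden unit, the reshape step can be sketched as follows. The weights here are stand-in values and the unit index is arbitrary (both are assumptions for illustration); displayData.m does the equivalent for all 25 units and tiles the results.

```matlab
% Stand-in weights with the exercise's shape (values are not trained weights)
Theta1 = reshape(1:25*401, 25, 401);

% Pick one hidden unit and drop its bias weight
unit = 3;
w = Theta1(unit, 2:end);          % 1 x 400

% Column-major reshape back to the 20 x 20 pixel grid
img = reshape(w, 20, 20);

% To view it, one could call: imagesc(img); colormap(gray); axis image off;
```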