Andrew Ng Machine Learning ex4 (MATLAB Version) Study Notes - BP Neural Network

Assignment task: BP neural network (backpropagation)

The code is as follows:

load('E:\研究生\机器学习\吴恩达机器学习python作业代码\code\ex4-NN back propagation\ex4weights.mat')
load('E:\研究生\机器学习\吴恩达机器学习python作业代码\code\ex4-NN back propagation\ex4data1.mat')
% Initialize the network size parameters
input_layer_size=400;
hidden_layer_size=25;
num_labels=10;
sel=randperm(size(X,1));
sel=sel(1:100);
% Randomly pick 100 of the 5000 examples and display them
displayData(X(sel,:));

The displayed digits are shown in Figure 1.

Figure 1: Digit display

The verification section is as follows:

First comes parameter initialization. Theta1 and Theta2 are the pre-trained weights already provided with the exercise (the same setup as in ex3), so they can be used directly for this check; nn_params unrolls Theta1 and Theta2 into a single column vector. Once the test results come out correct, this part should be commented out to speed up later runs.
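Before the actual check, here is a minimal sketch of the unroll/reshape round trip (the sizes are the ones used in this exercise; the _demo names are only illustrative):

% Unroll two weight matrices into one parameter vector, then recover them
Theta1_demo = rand(25, 401);                      % hidden_layer_size x (input_layer_size+1)
Theta2_demo = rand(10, 26);                       % num_labels x (hidden_layer_size+1)
params_demo = [Theta1_demo(:); Theta2_demo(:)];   % 10285x1 column vector
T1 = reshape(params_demo(1:25*401), 25, 401);
T2 = reshape(params_demo(25*401+1:end), 10, 26);
isequal(T1, Theta1_demo) && isequal(T2, Theta2_demo)   % returns true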

nn_params=[Theta1(:);Theta2(:)];
lambda=0;
[J,grad]=nnCostFunction(nn_params,input_layer_size,hidden_layer_size,...
                                num_labels,X,y,lambda);
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.287629)\n'], J);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
lambda=1;
[J,grad]=nnCostFunction(nn_params,input_layer_size,hidden_layer_size,...
                                num_labels,X,y,lambda) ;
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.383770)\n'], J);

fprintf('Program paused. Press enter to continue.\n');

nnCostFunction.m is as follows:

function [J,grad]=nnCostFunction(nn_params,input_layer_size,hidden_layer_size,...
                                num_labels,X,y,lambda)
Theta1=reshape(nn_params(1:hidden_layer_size*(input_layer_size+1)),hidden_layer_size,...
    input_layer_size+1);
Theta2=reshape(nn_params(hidden_layer_size*(input_layer_size+1)+1:end),num_labels,...
    hidden_layer_size+1);
% Reshape Theta1 and Theta2 from the unrolled vector back into matrices
% Initialize outputs
J=0;
Theta1_grad=zeros(size(Theta1));
Theta2_grad=zeros(size(Theta2)); % gradients have the same size as the Thetas
m=length(y);
% Convert y into a one-hot label matrix ylabel
ylabel=zeros(m,num_labels);
for i=1:m
    ylabel(i,y(i))=1;
end
% Dimensions: X 5000x400 (a1 5000x401 with bias), ylabel 5000x10
% Theta1 25x401, Theta2 10x26, a2 5000x26, a3 5000x10
a1 = [ones(m, 1) X];
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(size(a2, 1), 1) a2];
z3 = a2*Theta2';
a3 = sigmoid(z3);
J=-1/m*sum(sum(ylabel.*log(a3)+(1-ylabel).*log(1-a3)))+ ...
    lambda/(2*m) * (sum(sum(Theta1(:, 2:end).^2)) + sum(sum(Theta2(:, 2:end).^2)));
delta3=a3-ylabel;
%delta3 5000*10
delta2 =delta3*Theta2(:,2:end).*sigmoidGradient(z2);
%delta2 5000*25
Delta1=delta2'*a1;   % 25x401
Delta2=delta3'*a2;   % 10x26
Theta1_grad = Delta1 / m;
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + lambda/m*Theta1(:, 2:end);
Theta2_grad = Delta2 / m;
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + lambda/m*Theta2(:, 2:end);
grad=[Theta1_grad(:);Theta2_grad(:)];
end

The first step is to reshape Theta1 and Theta2 from their unrolled vector form back into matrices.
The second step initializes the outputs and converts each value in y into a one-hot row vector:

y = 1  -> [1 0 0 0 0 0 0 0 0 0]
y = 2  -> [0 1 0 0 0 0 0 0 0 0]
y = 3  -> [0 0 1 0 0 0 0 0 0 0]
  ...
y = 9  -> [0 0 0 0 0 0 0 0 1 0]
y = 10 -> [0 0 0 0 0 0 0 0 0 1]
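A vectorized alternative to the for loop in nnCostFunction (just a sketch, not part of the original code; it assumes y is an m x 1 column vector of labels 1..num_labels):

% One-hot encoding without a loop
ylabel = zeros(m, num_labels);
ylabel(sub2ind(size(ylabel), (1:m)', y)) = 1;
% or, with implicit expansion (R2016b+ / Octave): ylabel = double(y == 1:num_labels);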

The third step is forward propagation through the network:
$a^{(1)}=x$
$z^{(2)}=\theta^{(1)}a^{(1)}$
$a^{(2)}=g(z^{(2)})$ (add $a^{(2)}_0$)
$z^{(3)}=\theta^{(2)}a^{(2)}$
$a^{(3)}=g(z^{(3)})$ (add $a^{(3)}_0$)
$z^{(4)}=\theta^{(3)}a^{(3)}$
$a^{(4)}=h_\theta(x)=g(z^{(4)})$

This network model has only three layers, so the forward pass only needs to be carried as far as $a^{(3)}$, which is the output layer. The cost function is then computed from the output layer:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(a^{(3)})+(1-y^{(i)})\log(1-a^{(3)})\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
where the regularization sum runs over the non-bias weights of $\theta^{(1)}$ and $\theta^{(2)}$.
The fourth step computes the error terms, i.e. the backpropagation algorithm:
$\delta^{(3)}=a^{(3)}-y$
$\delta^{(2)}=(\theta^{(2)})^T\delta^{(3)}.*g'(z^{(2)})$

$\Delta^{(2)}=\delta^{(3)}(a^{(2)})^T$ (computed in vectorized form as delta3'*a2)
$\Delta^{(1)}=\delta^{(2)}(a^{(1)})^T$ (computed in vectorized form as delta2'*a1)
$D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}+\frac{\lambda}{m}\Theta_{ij}^{(l)}$ if $j\neq 0$
$D_{ij}^{(l)}=\frac{1}{m}\Delta_{ij}^{(l)}$ if $j=0$

sigmoidGradient.m is as follows:

The formula is: $g'(z)=\mathrm{sigmoid}(z)*(1-\mathrm{sigmoid}(z))$

function g=sigmoidGradient(z)
g=zeros(size(z));
g=sigmoid(z).*(1-sigmoid(z));
end
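A quick sanity check of the gradient (a sketch; $g'(0)$ should equal 0.25):

sigmoidGradient([-1 -0.5 0 0.5 1])
% ans ≈ 0.1966  0.2350  0.2500  0.2350  0.1966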

Gradient checking section:

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params=[initial_Theta1(:);initial_Theta2(:)];
%gradient test
lambda = 3;
checkNNGradients(lambda);
 
debug_J  = nnCostFunction(nn_params, input_layer_size, ...
                          hidden_layer_size, num_labels, X, y, lambda);

fprintf(['\n\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' ...
         '\n(for lambda = 3, this value should be about 0.576051)\n\n'], lambda, debug_J);

fprintf('Program paused. Press enter to continue.\n');

randInitializeWeights.m is as follows:

This subroutine generates a random matrix of the specified size. Since rand produces values between 0 and 1, the resulting W ends up roughly half positive and half negative, with all values in the range -epsilon to +epsilon.

function W = randInitializeWeights(L_in, L_out)
W=zeros(L_out,L_in+1);
epsilon=0.12;
W=rand(L_out,L_in+1)*(2*epsilon)-epsilon;
end
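A quick check of the output size and range (a sketch using the sizes from this exercise):

W = randInitializeWeights(400, 25);
size(W)                  % 25 x 401
[min(W(:)), max(W(:))]   % both inside (-0.12, 0.12)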

checkNNGradients.m is as follows:

function checkNNGradients(lambda)
 
if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end
 
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;
 
% We generate some 'random' test data
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);
% Reusing debugInitializeWeights to generate X
X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)';
 
% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];
 
% Short hand for cost function
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);
 
[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);
 
% Visually examine the two gradient computations.  The two columns
% you get should be very similar.
disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);
 
% Evaluate the norm of the difference between two solutions. 
% If you have a correct implementation, and assuming you used EPSILON = 0.0001
% in computeNumericalGradient.m, then diff below should be less than 1e-9
diff = norm(numgrad-grad)/norm(numgrad+grad);
 
fprintf(['If your backpropagation implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);
 
end

Here y is built with mod: mod(1:m, num_labels) gives values below 3, [1 2 0 1 2], which becomes [2 3 1 2 3] after adding 1.
Substituting nn_params into costFunc gives grad, while computeNumericalGradient gives numgrad; the two are compared using the relative difference diff, which should come out as an extremely small number.
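As a minimal numeric illustration of how diff behaves (the vectors here are made up):

numgrad_demo = [0.1; -0.2; 0.3];
grad_demo    = [0.1 + 1e-10; -0.2; 0.3];   % analytical gradient off by 1e-10 in one entry
norm(numgrad_demo - grad_demo) / norm(numgrad_demo + grad_demo)   % on the order of 1e-10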

debugInitializeWeights.m is as follows:

sin() is used so that the matrix values are identical on every run rather than changing, which makes verification easier.

function W = debugInitializeWeights(fan_out, fan_in)
% Set W to zeros
W = zeros(fan_out, 1 + fan_in);
% Initialize W using "sin", this ensures that W is always of the same
% values and will be useful for debugging
W = reshape(sin(1:numel(W)), size(W)) / 10;
end
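A quick check of the determinism this provides (a sketch):

A = debugInitializeWeights(3, 2);
B = debugInitializeWeights(3, 2);
isequal(A, B)   % true: the values depend only on the requested size, not on rand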

computeNumericalGradient.m is as follows:

Here J is actually the costFunc handle. Evaluating costFunc at theta plus and minus perturb gives the cost values loss1 and loss2 (the returned gradients grad1 and grad2 are not used); the resulting slope is stored in numgrad and compared with grad. (grad is the partial derivative of the cost function, which can be understood as its slope.) A scalar illustration of the same idea follows the listing below.

function numgrad = computeNumericalGradient(J, theta)             
 
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    [loss1, grad1] = J(theta - perturb);
    [loss2, grad2] = J(theta + perturb);
    % Compute Numerical Gradient (central difference, i.e. the approximate slope)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
end
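The scalar illustration mentioned above: the same central-difference idea applied to f(x) = x^2, whose derivative at x = 3 is 6 (purely illustrative, not part of the exercise code):

f = @(x) x.^2;
e = 1e-4;
(f(3 + e) - f(3 - e)) / (2*e)   % ≈ 6.0000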

Advanced optimization and prediction section:

%  After you have completed the assignment, change the MaxIter to a larger
%  value to see how more training helps.
options = optimset('MaxIter', 50);
 
%  You should also try different values of lambda
lambda = 1;
 
% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);
 
% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
 
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
 
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
%predict
pred = predict(Theta1, Theta2, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);

predict returns the predicted labels, and mean(double(pred == y)) * 100 gives the final training-set accuracy, 95.600000%.
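How the accuracy expression works on a made-up toy example:

pred_demo = [3; 1; 2; 2];
y_demo    = [3; 1; 2; 1];
mean(double(pred_demo == y_demo)) * 100   % 75, i.e. 3 of 4 predictions correct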

predict.m is as follows:

function p = predict(Theta1, Theta2, X)
 
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
 
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
 
h1 = sigmoid([ones(m, 1) X] * Theta1');
h2 = sigmoid([ones(m, 1) h1] * Theta2');
[dummy, p] = max(h2, [], 2);
 
end

Here max() extracts the maximum of each row: dummy records the row's maximum value, and p records the column index of that maximum, which is the predicted label.
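A small illustration of this use of max (the values are made up):

[val, idx] = max([0.1 0.7 0.2; 0.6 0.3 0.1], [], 2)
% val = [0.7; 0.6], idx = [2; 1]  -- idx plays the role of p, the predicted label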
