Machine Learning - Andrew Ng - ex4 - Backpropagation

Related files

ex4.m - Octave script that will help step you through the exercise
ex4data1.mat - Training set of hand-written digits
ex4weights.mat - Neural network parameters for exercise 4
submit.m - Submission script that sends your solutions to our servers
submitWeb.m - Alternative submission script
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
computeNumericalGradient.m - Numerically compute gradients
checkNNGradients.m - Function to help check your gradients
debugInitializeWeights.m - Function for initializing weights
predict.m - Neural network prediction function
[*] sigmoidGradient.m - Compute the gradient of the sigmoid function
[*] randInitializeWeights.m - Randomly initialize weights
[*] nnCostFunction.m - Neural network cost function
[*] indicates files you will need to complete

This assignment implements the backpropagation (BP) algorithm to learn the parameters of a neural network.

1 Data

The dataset contains 5000 examples; each example is a 20×20 pixel image, unrolled into a 400-dimensional vector, so

$$X=\left[\begin{matrix} -(x^{(1)})^T- \\ -(x^{(2)})^T- \\ \vdots \\ -(x^{(m)})^T- \end{matrix}\right]$$

is a 5000×400 matrix, $x^{(i)}$ is a 400×1 vector, and $y$ is a 5000×1 vector whose values lie in 1-10 (the digit 0 is mapped to label 10).
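As a quick sanity check (a minimal sketch, assuming ex4data1.mat stores the variables under the names X and y as in the handout), the data can be loaded and a random subset visualized with the provided displayData.m:

load('ex4data1.mat');            % loads X (5000x400) and y (5000x1)
m = size(X, 1);
sel = randperm(m);               % pick 100 random examples to display
displayData(X(sel(1:100), :));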

2 Model Representation

The model is a three-layer neural network: an input layer, a hidden layer, and an output layer. The input layer has 400 units (one per pixel of a 20×20 image), the hidden layer has 25 units, and the output layer has 10 units, one per digit class.
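The pre-trained parameters for this architecture come from ex4weights.mat (a short sketch, assuming the file stores the matrices as Theta1 and Theta2, per the handout):

load('ex4weights.mat');                % loads Theta1 (25x401) and Theta2 (10x26)
nn_params = [Theta1(:) ; Theta2(:)];   % unroll into a single vector for nnCostFunction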

3 Feedforward Pass, Cost Function, and Regularization

nnCostFunction.m returns the cost; the regularized cost function is:

$$J(\theta)=\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K\left[-y_k^{(i)}\log\left((h_\theta(x^{(i)}))_k\right)-(1-y_k^{(i)})\log\left(1-(h_\theta(x^{(i)}))_k\right)\right]+\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}(\Theta_{j,k}^{(1)})^2+\sum_{j=1}^{10}\sum_{k=1}^{25}(\Theta_{j,k}^{(2)})^2\right]$$

Note that the bias terms (the first column of each $\Theta$) are excluded from the regularization sums.
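As a check on the implementation (the expected values below are the ones quoted in the exercise handout, assuming the pre-trained weights loaded above), calling the finished cost function should give roughly these costs:

lambda = 0;    % unregularized: expect J of about 0.287629
J = nnCostFunction(nn_params, 400, 25, 10, X, y, lambda);

lambda = 1;    % regularized: expect J of about 0.383770
J = nnCostFunction(nn_params, 400, 25, 10, X, y, lambda);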

4 Backpropagation and the Sigmoid Gradient

The derivative (gradient) of the activation function is:

$$g'(z)=\frac{d}{dz}g(z)=g(z)(1-g(z)),\quad \text{where } \mathrm{sigmoid}(z)=g(z)=\frac{1}{1+e^{-z}}$$

g = sigmoid(z) .* (1 - sigmoid(z));
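Wrapped into the sigmoidGradient.m that the exercise asks you to complete (a minimal sketch; the element-wise .* means z may be a scalar, vector, or matrix):

function g = sigmoidGradient(z)
% Gradient of the sigmoid evaluated at z, computed element-wise
g = sigmoid(z) .* (1 - sigmoid(z));
end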

Random initialization (the weights are drawn uniformly from [-epsilon_init, epsilon_init] to break the symmetry between hidden units):

function W = randInitializeWeights(L_in, L_out)   % L_in / L_out: sizes of the adjacent layers
W = zeros(L_out, 1 + L_in);   % skeleton preallocation (overwritten below)
epsilon_init = 0.12;          % range suggested by the handout
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;   % uniform in [-epsilon_init, epsilon_init]
end
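A sketch of how ex4.m uses this function to build the initial parameter vector (the initial_* names follow the script's convention):

initial_Theta1 = randInitializeWeights(400, 25);   % input -> hidden
initial_Theta2 = randInitializeWeights(25, 10);    % hidden -> output
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];   % unrolled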

5 Implementation

Inputs: the unrolled weight vector nn_params; input layer size 400; hidden layer size 25; output layer size 10; the samples X with shape [5000, 400]; the labels y with shape [5000, 1]; and the regularization hyperparameter λ (0 for the unregularized cost check, 1 for the regularized one).


function [J grad] = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda)

% Reshape nn_params back into the weight matrices
% Theta1.shape=[25,401], Theta2.shape=[10,26]
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));


m = size(X, 1);   % m = 5000 training examples
         
% You need to return the following variables correctly 
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


% ====================== YOUR CODE HERE ======================

%% One-hot encode the labels: rows of Y where y==3 become [0 0 1 0 0 0 0 0 0 0]; used by the feedforward cost below
Y=[];
E = eye(num_labels);    % use num_labels, not a hard-coded eye(10), so K can be any number of classes
for i=1:num_labels
    Y0 = find(y==i);    % row indices of the examples whose label equals i; replace those rows with the i-th unit vector
    Y(Y0,:) = repmat(E(i,:),size(Y0,1),1);
end
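% Note: the loop above can also be written as a single vectorized
% line, indexing the identity matrix by the label vector
% (an equivalent alternative, not part of the original solution):
%   Y = E(y, :);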

%% Regularized feedforward cost (lambda=1 in the regularized check)
% Forward propagation
X = [ones(m, 1) X];  % X.shape=[5000,400]; becomes [5000,401] after adding the bias column
z2 = sigmoid(X * Theta1');    % hidden-layer activations, z2.shape=[5000,25]
a2 = [ones(m, 1) z2];        % add the bias unit to the hidden layer, a2.shape=[5000,26]
a3 = sigmoid(a2 * Theta2');  % output layer, a3.shape=[5000,10]

% Regularization term
temp1 = [zeros(size(Theta1,1),1) Theta1(:,2:end)];   % zero out the first column of Theta1 (bias, not regularized), temp1.shape=[25,401]
temp2 = [zeros(size(Theta2,1),1) Theta2(:,2:end)];   % temp2.shape=[10,26]
temp1 = sum(temp1 .^2);     % square each parameter, then sum
temp2 = sum(temp2 .^2);

% Compute the loss
cost = Y .* log(a3) + (1 - Y) .* log(1 - a3);  % cost is an m-by-K (5000x10) matrix; sum(cost(:)) sums all entries
J = -1 / m * sum(cost(:)) + lambda/(2*m) * ( sum(temp1(:)) + sum(temp2(:)) );  % second term is the regularization

%% Compute the gradients via backpropagation
delta_1 = zeros(size(Theta1));   % gradient accumulator for Theta1
delta_2 = zeros(size(Theta2));   % gradient accumulator for Theta2

for t = 1:m   % loop over every training example
  a_1 = X(t,:)';   % a_1.shape=[401,1] (bias column already added to X above)
  z_2 = Theta1 * a_1;   % z_2.shape=[25,1]
  a_2 = sigmoid(z_2);   % a_2.shape=[25,1]
  a_2 = [1 ; a_2];      % add the bias unit, a_2.shape=[26,1]
  a_3 = sigmoid(Theta2 * a_2);  % a_3.shape=[10,1]
  
  err_3 = zeros(num_labels,1);  % output-layer error: a_3 minus the one-hot label
  for k = 1:num_labels
    err_3(k) = a_3(k) - (y(t) == k);
  endfor
  
  err_2 = Theta2' * err_3;                       % propagate the error back to the hidden layer
  err_2 = err_2(2:end) .* sigmoidGradient(z_2);  % drop the bias term and multiply by g'(z_2)
  
  delta_2 = delta_2 + err_3 * a_2';   % accumulate the gradient for Theta2
  delta_1 = delta_1 + err_2 * a_1';   % accumulate the gradient for Theta1
endfor

% Regularize the gradients (the bias column is not regularized)
Theta1_temp = [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_temp = [zeros(size(Theta2,1),1) Theta2(:,2:end)];
Theta1_grad = 1 / m * delta_1 + lambda/m * Theta1_temp;
Theta2_grad = 1 / m * delta_2 + lambda/m * Theta2_temp;
      
% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
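With nnCostFunction complete, the gradients can be checked numerically and the network trained with the provided fmincg. The sketch below follows the structure of ex4.m; MaxIter = 50 and lambda = 1 are the handout's suggested settings, and checkNNGradients compares the analytic gradients against computeNumericalGradient:

% Gradient checking on a small debug network
checkNNGradients;       % lambda = 0
checkNNGradients(3);    % regularized check with lambda = 3

% Training
options = optimset('MaxIter', 50);
lambda = 1;
costFunction = @(p) nnCostFunction(p, 400, 25, 10, X, y, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the learned parameters and evaluate on the training set
Theta1 = reshape(nn_params(1:25 * 401), 25, 401);
Theta2 = reshape(nn_params(25 * 401 + 1:end), 10, 26);
pred = predict(Theta1, Theta2, X);
fprintf('Training accuracy: %f\n', mean(double(pred == y)) * 100);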