Related files
ex4.m - Octave script that will help step you through the exercise
ex4data1.mat - Training set of hand-written digits
ex4weights.mat - Neural network parameters for exercise 4
submit.m - Submission script that sends your solutions to our servers
submitWeb.m - Alternative submission script
displayData.m - Function to help visualize the dataset
fmincg.m - Function minimization routine (similar to fminunc)
sigmoid.m - Sigmoid function
computeNumericalGradient.m - Numerically compute gradients
checkNNGradients.m - Function to help check your gradients
debugInitializeWeights.m - Function for initializing weights
predict.m - Neural network prediction function
[*] sigmoidGradient.m - Compute the gradient of the sigmoid function
[*] randInitializeWeights.m - Randomly initialize weights
[*] nnCostFunction.m - Neural network cost function
[*] indicates files you will need to complete
In this exercise you will implement the backpropagation algorithm to learn the parameters of a neural network.
1 Data
The dataset contains 5000 examples, each a 20×20 pixel image, so
$$X=\begin{bmatrix} -(x^{(1)})^T- \\ -(x^{(2)})^T- \\ \vdots \\ -(x^{(m)})^T- \end{bmatrix}$$
is a 5000×400 matrix, each $x^{(i)}$ is a 400×1 vector, and $y$ is a 5000×1 vector of labels in 1–10 (the digit "0" is mapped to label 10, which is what the `find(y==i)` loop below relies on).
2 Model representation
The model is a three-layer neural network: an input layer, a hidden layer, and an output layer.
The input layer has 400 units (one per pixel of the 20×20 image), the hidden layer has 25 units, and the output layer has 10 units (one per digit class).
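As a quick illustration of these layer sizes, one forward pass through the 400-25-10 architecture can be sketched with random placeholder weights (the actual trained weights are loaded from ex4weights.mat; everything below is invented for the dimension check):

```octave
% Dimension check of one forward pass through the 400-25-10 network.
% The weights here are random placeholders, not trained parameters.
m = 5;                                    % a few hypothetical examples
X = rand(m, 400);
Theta1 = rand(25, 401);                   % hidden-layer weights (incl. bias column)
Theta2 = rand(10, 26);                    % output-layer weights (incl. bias column)
sigmoid = @(z) 1 ./ (1 + exp(-z));
a1 = [ones(m, 1) X];                      % [5, 401]
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];  % [5, 26]
a3 = sigmoid(a2 * Theta2');               % [5, 10], one score per class
```

Note how the bias column added at each layer turns 400 into 401 and 25 into 26, matching the shapes of Theta1 and Theta2.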
3 Feedforward, cost function, and regularization
Return the cost from nnCostFunction.m. With regularization, the cost function is:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left((h_\theta(x^{(i)}))_k\right)-(1-y_k^{(i)})\log\left(1-(h_\theta(x^{(i)}))_k\right)\right]+\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^2+\sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^2\right]$$
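To make the cross-entropy term concrete, here is a toy evaluation of the unregularized part of the formula with made-up predictions h and one-hot labels Y (m = 2 examples, K = 2 classes; all values are invented):

```octave
% Toy evaluation of the unregularized cross-entropy cost.
% h(i,k) stands for (h_theta(x^(i)))_k; the numbers are invented.
h = [0.9 0.1; 0.2 0.8];   % predicted probabilities
Y = [1 0; 0 1];           % one-hot labels
m = size(h, 1);
J = -(1 / m) * sum(sum(Y .* log(h) + (1 - Y) .* log(1 - h)));
% J = -(2*log(0.9) + 2*log(0.8)) / 2, roughly 0.3285
```

Confident predictions (0.9 for the true class) contribute little to J; each term is counted twice here because both the true-class and the false-class columns penalize the same miss.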
4 Backpropagation and the sigmoid gradient
The derivative (gradient) of the activation function is $g'(z)=\frac{d}{dz}g(z)=g(z)\,(1-g(z))$, where $\mathrm{sigmoid}(z)=g(z)=\frac{1}{1+e^{-z}}$
g = sigmoid(z) .* (1 - sigmoid(z));
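A quick way to sanity-check this expression is to compare it against a central finite difference; both functions are defined inline below so the snippet stands on its own:

```octave
% Compare the analytic sigmoid gradient with a numerical estimate.
g  = @(z) 1 ./ (1 + exp(-z));          % sigmoid
gp = @(z) g(z) .* (1 - g(z));          % analytic gradient
z = 0.7;  step = 1e-6;
numeric = (g(z + step) - g(z - step)) / (2 * step);
fprintf('analytic %.6f vs numeric %.6f\n', gp(z), numeric);
% At z = 0 the gradient attains its maximum: g'(0) = 0.25.
```

The same finite-difference idea is what computeNumericalGradient.m uses later to check the full backpropagation gradients.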
Random initialization:
function W = randInitializeWeights(L_in, L_out) % L_in: size of the previous layer, L_out: size of the next layer
W = zeros(L_out, 1 + L_in);
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end
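A usage sketch for the 400-25-10 network follows; epsilon_init = 0.12 matches the common heuristic of roughly sqrt(6)/sqrt(L_in + L_out) for these layer sizes, and the unrolled vector is the form that nnCostFunction and fmincg expect:

```octave
% Initialize both weight matrices and unroll them into one parameter vector.
epsilon_init = 0.12;
Theta1 = rand(25, 401) * 2 * epsilon_init - epsilon_init;  % [25, 401]
Theta2 = rand(10, 26)  * 2 * epsilon_init - epsilon_init;  % [10, 26]
initial_nn_params = [Theta1(:); Theta2(:)];                % 10285-by-1
```

Initializing symmetrically around zero in a small range breaks the symmetry between hidden units; if all weights started equal, every hidden unit would compute the same function.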
5 Implementation
Inputs: the unrolled weight vector $nn\_params$; the input layer size 400; the hidden layer size 25; the output layer size 10; the samples $X$ with shape $[5000, 400]$; the labels $y$ with shape $[5000, 1]$; and the regularization parameter $\lambda$ (start with $\lambda = 0$ for the unregularized check).
function [J grad] = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lambda)
% Reshape nn_params back into the weight matrices
% Theta1.shape=[25,401], Theta2.shape=[10,26]
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), num_labels, (hidden_layer_size + 1));
m = size(X, 1); % m = 5000 examples
% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
% ====================== YOUR CODE HERE ======================
%% One-hot encode y, e.g. y == 3 maps to the row [0 0 1 0 0 0 0 0 0 0];
%% used for the feedforward cost in parts 1 and 2
Y = [];
E = eye(num_labels); % use num_labels, not a hard-coded eye(10), so K stays general
for i = 1:num_labels
    Y0 = find(y == i); % indices of the examples whose label is i
    Y(Y0, :) = repmat(E(i, :), size(Y0, 1), 1); % set those rows to the i-th one-hot row
end
%% Regularized feedforward cost (set lambda = 1 for this part)
% Forward propagation
X = [ones(m, 1) X];         % add the bias column: X goes from [5000,400] to [5000,401]
a2 = sigmoid(X * Theta1');  % hidden-layer activations, a2.shape=[5000,25]
a2 = [ones(m, 1) a2];       % add the bias unit: a2.shape=[5000,26]
a3 = sigmoid(a2 * Theta2'); % output layer, a3.shape=[5000,10]
% Regularization term
temp1 = [zeros(size(Theta1,1),1) Theta1(:,2:end)]; % zero out the first column of Theta1 so the bias terms are not regularized; temp1.shape=[25,401]
temp2 = [zeros(size(Theta2,1),1) Theta2(:,2:end)]; % temp2.shape=[10,26]
temp1 = sum(temp1 .^ 2); % square every parameter, then sum each column
temp2 = sum(temp2 .^ 2);
% Compute the loss
cost = Y .* log(a3) + (1 - Y) .* log(1 - a3); % cost is an m-by-K (5000x10) matrix; sum(cost(:)) adds up every entry
J = -1 / m * sum(cost(:)) + lambda / (2 * m) * (sum(temp1(:)) + sum(temp2(:))); % the second term is the regularization
%% Backpropagation: compute the gradients
delta_1 = zeros(size(Theta1));
delta_2 = zeros(size(Theta2));
for t = 1:m % loop over every example
a_1 = X(t,:)'; % a_1.shape=[401,1] (bias column already included above)
z_2 = Theta1 * a_1; % z_2.shape=[25,1]
a_2 = sigmoid(z_2); % a_2.shape=[25,1]
a_2 = [1 ; a_2]; % add the bias unit, a_2.shape=[26,1]
a_3 = sigmoid(Theta2 * a_2); % a_3.shape=[10,1]
err_3 = zeros(num_labels,1); % output-layer error
for k = 1:num_labels
err_3(k) = a_3(k) - (y(t) == k);
endfor
err_2 = Theta2' * err_3; % back-propagate the error, err_2.shape=[26,1]
err_2 = err_2(2:end) .* sigmoidGradient(z_2); % drop the bias term, err_2.shape=[25,1]
delta_2 = delta_2 + err_3 * a_2'; % accumulate the output-layer gradient
delta_1 = delta_1 + err_2 * a_1'; % accumulate the hidden-layer gradient
endfor
% Step 5: add the regularization term (bias columns excluded)
Theta1_temp = [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_temp = [zeros(size(Theta2,1),1) Theta2(:,2:end)];
Theta1_grad = 1 / m * delta_1 + lambda/m * Theta1_temp;
Theta2_grad = 1 / m * delta_2 + lambda/m * Theta2_temp ;
% =========================================================================
% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];
end
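As a side note, the per-class loop used above to build the one-hot matrix Y can be replaced with a single indexing expression, sketched here under the assumption that the labels are integers in 1..num_labels:

```octave
% Vectorized one-hot encoding: index the identity matrix by the labels.
num_labels = 10;
y = [3; 1; 10];               % hypothetical label vector
Y = eye(num_labels)(y, :);    % Y(i,:) is the one-hot row for label y(i)
```

The chained indexing `eye(num_labels)(y, :)` is Octave-specific; in MATLAB, assign `E = eye(num_labels)` first and then take `E(y, :)`.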