1.3 Feedforward and cost function
Recall that the cost function for the neural network (without regularization) is
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left((h_\theta(x^{(i)}))_k\right)-\left(1-y_k^{(i)}\right)\log\left(1-(h_\theta(x^{(i)}))_k\right)\right]$$
1.4 Regularized cost function
The cost function for neural networks with regularization is given by
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left((h_\theta(x^{(i)}))_k\right)-\left(1-y_k^{(i)}\right)\log\left(1-(h_\theta(x^{(i)}))_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^2\right]$$
2 Backpropagation
In this part of the exercise, you will implement the backpropagation algorithm to compute the gradient for the neural network cost function. You will need to complete the nnCostFunction.m so that it returns an appropriate value for grad.
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
% following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
% variable J. After implementing Part 1, you can verify that your
% cost function computation is correct by verifying the cost
% computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
% Theta1_grad and Theta2_grad. You should return the partial derivatives of
% the cost function with respect to Theta1 and Theta2 in Theta1_grad and
% Theta2_grad, respectively. After implementing Part 2, you can check
% that your implementation is correct by running checkNNGradients
%
% Note: The vector y passed into the function is a vector of labels
% containing values from 1..K. You need to map this vector into a
% binary vector of 1's and 0's to be used with the neural network
% cost function.
%
% Hint: We recommend implementing backpropagation using a for-loop
% over the training examples if you are implementing it for the
% first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
% Hint: You can implement this around the code for
% backpropagation. That is, you can compute the gradients for
% the regularization separately and then add them to Theta1_grad
% and Theta2_grad from Part 2.
%
% Implement forward propagation
X = [ones(m, 1), X];                             % add bias column: a1 = [1, x]
Layer_Hidden1 = X * Theta1';                     % z2 = a1 * Theta1'
Layer_Hidden2 = sigmoid(Layer_Hidden1);          % a2 = g(z2)
Layer_Hidden3 = [ones(m, 1), Layer_Hidden2];     % add bias unit to a2
Layer_Output = sigmoid(Layer_Hidden3 * Theta2'); % a3 = h_theta(x)
for i = 1:m
    % Recode the label y(i) as a one-hot vector containing only 0s and 1s
    labels = zeros(num_labels, 1); % num_labels x 1 vector, initialized to 0
    labels(y(i)) = 1;              % set the entry for the correct class to 1
    % accumulate the cross-entropy cost for this example
    J = J + log(Layer_Output(i, :)) * (-labels) - log(1 - Layer_Output(i, :)) * (1 - labels);
    % output-layer error: delta3 = a3 - y
    diff_output = Layer_Output(i, :)' - labels;
    % gradient contribution for Theta2: delta3 * a2'
    delta2 = diff_output * Layer_Hidden3(i, :);
    % hidden-layer error via the chain rule: (Theta2 without bias)' * delta3 .* g'(z2)
    diff_hidden = Theta2(:, 2:end)' * diff_output .* sigmoidGradient(Layer_Hidden1(i, :)');
    % gradient contribution for Theta1: delta2 * a1'
    delta1 = diff_hidden * X(i, :);
    Theta2_grad = Theta2_grad + delta2;
    Theta1_grad = Theta1_grad + delta1;
end
J = J / m;
Theta2_grad = Theta2_grad / m;
Theta1_grad = Theta1_grad / m;
% Implement regularization with the cost function and gradients.
Theta1_regular = [zeros(hidden_layer_size,1), Theta1(:, 2:end)];
Theta2_regular = [zeros(num_labels,1), Theta2(:, 2:end)];
J = J + (sum(sum(Theta1_regular.^2)) + sum(sum(Theta2_regular.^2))) * lambda / (2 * m);
Theta1_grad = Theta1_grad + Theta1_regular * lambda / m;
Theta2_grad = Theta2_grad + Theta2_regular * lambda / m;
% -------------------------------------------------------------
% =========================================================================
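The `checkNNGradients` routine mentioned above compares the analytic gradients against a finite-difference estimate. The core idea can be sketched in NumPy on a simple function whose gradient is known in closed form; `numerical_gradient` and the quadratic `f` below are hypothetical names for this illustration, not part of the exercise code.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-4):
    """Two-sided finite-difference estimate of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        # perturb one component at a time in both directions
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

f = lambda t: np.sum(t ** 2)          # analytic gradient is 2 * t
theta = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, theta))   # should be close to [2, -4, 6]
```

In practice the same check is run against the backpropagation gradients: if the relative difference between the two estimates is tiny (around 1e-9 in the exercise), the implementation is almost certainly correct.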
2.1 Sigmoid gradient
To help you get started with this part of the exercise, you will first implement the sigmoid gradient function. The gradient for the sigmoid function can be computed as
$$g'(z) = \frac{d}{dz}g(z) = g(z)\left(1 - g(z)\right)$$
where
$$\mathrm{sigmoid}(z) = g(z) = \frac{1}{1 + e^{-z}}$$
When you are done, try testing a few values by calling sigmoidGradient(z) at the Octave/MATLAB command line.
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
% each value of z (z can be a matrix, vector or scalar).
g = sigmoid(z) .* (1 - sigmoid(z));
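A useful spot check: `sigmoidGradient(0)` should return exactly 0.25 (the sigmoid's slope is steepest at zero), and large positive or negative `z` should give values near zero. The same computation in a NumPy sketch (function names here are illustrative, not the course files):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    g = sigmoid(z)
    return g * (1 - g)   # g'(z) = g(z) * (1 - g(z))

print(sigmoid_gradient(np.array([-10.0, 0.0, 10.0])))
```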
2.2 Random initialization
Your job is to complete randInitializeWeights.m to initialize the weights for Θ; modify the file and fill in the following code:
% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
% training the neural network.
%
% Note: The first row of W corresponds to the parameters for the bias units
%
% Randomly initialize the weights to small values
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
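The value 0.12 is not arbitrary: the exercise suggests choosing the range from the layer sizes via $\epsilon_{init} = \sqrt{6}/\sqrt{L_{in} + L_{out}}$, which gives roughly 0.12 for the 400-unit input and 25-unit hidden layer. A NumPy sketch of the same initialization (the function name is made up for this illustration):

```python
import numpy as np

def rand_initialize_weights(L_in, L_out):
    """Uniform init in [-eps, eps], eps chosen from the layer sizes."""
    # heuristic from the exercise text: epsilon_init = sqrt(6) / sqrt(L_in + L_out)
    epsilon_init = np.sqrt(6) / np.sqrt(L_in + L_out)
    # one extra column for the bias-unit weights
    return np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init

W = rand_initialize_weights(400, 25)
print(W.shape)  # (25, 401)
```

Initializing symmetrically around zero with small magnitudes keeps the sigmoid units in their responsive range while still breaking symmetry between hidden units.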