randInitializeWeight
epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
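The hard-coded 0.12 is close to what the common heuristic sqrt(6) / sqrt(L_in + L_out) gives. The sketch below assumes a 400-unit input layer and a 25-unit hidden layer; those sizes are illustrative and are not stated in these notes.

```octave
% Sketch: derive epsilon_init from the layer sizes instead of hard-coding it.
% L_in = 400 and L_out = 25 are assumed values for illustration only.
L_in = 400;
L_out = 25;
epsilon_init = sqrt(6) / sqrt(L_in + L_out);   % roughly 0.12 for these sizes
% Uniform random weights in the range (-epsilon_init, epsilon_init)
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
```

With other layer sizes the range changes, so computing it may be preferable to keeping a constant.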
sigmoidGradient
g = sigmoid(z) .* (1 - sigmoid(z));
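A quick sanity check for this line, as a minimal sketch. The sigmoid definition here is an assumption, since the original relies on a separate sigmoid function that is not shown.

```octave
% Assumed definition of sigmoid (not shown in the notes above).
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

g0 = sigmoidGradient(0);            % the gradient peaks at z = 0 with value 0.25
gm = sigmoidGradient([-10 0 10]);   % element-wise, so vectors and matrices work
```

For large |z| the gradient approaches 0, which is a useful thing to check when the function is applied to a whole matrix such as z2.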
nnCostFunction
% Unroll nn_params back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
m = size(X, 1);   % number of training examples (one row of X per example)
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));
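% Forward propagation; each a includes the bias column, each z does not
a1 = [ones(m,1) X];        % m x (input_layer_size + 1)
z2 = a1 * Theta1';         % m x hidden_layer_size
a2 = sigmoid(z2);
a2 = [ones(m,1) a2];       % m x (hidden_layer_size + 1)
z3 = a2 * Theta2';         % m x num_labels
preOut = sigmoid(z3);      % predicted output, m x num_labels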
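% One-hot encode the labels: row i has a 1 in column y(i)
trueOut = zeros(size(preOut));
for i = 1:m
    trueOut(i,y(i)) = 1;
end
% Unregularized cross-entropy cost, summed over all examples and classes
tmp = trueOut .* log(preOut) + (1 - trueOut) .* log(1 - preOut);
J = -1.0/m * sum(tmp(:));
% Regularization term: squared weights, skipping the bias column
t1 = Theta1(:,2:end) .* Theta1(:,2:end);
t2 = Theta2(:,2:end) .* Theta2(:,2:end);
J = J + lambda / 2 / m * (sum(t1(:)) + sum(t2(:)));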
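% Backpropagation; the error terms are stored as units x m
epi3 = (preOut - trueOut)';                                    % num_labels x m
% Drop the bias column of Theta2, since epi2 has no bias row
epi2 = (Theta2(:,2:end)' * epi3) .* sigmoidGradient(z2)';      % hidden_layer_size x m
Theta2_grad = (Theta2_grad + epi3 * a2) / m;
Theta1_grad = (Theta1_grad + epi2 * a1) / m;
% Regularize every column except the bias column
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda / m * Theta2(:,2:end);
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda / m * Theta1(:,2:end);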
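The loop that fills trueOut is the only explicit loop left in the function. One possible way to vectorize it as well is to index rows of an identity matrix. This is a sketch with made-up labels, not part of the original solution.

```octave
% Sketch: build the one-hot label matrix without a loop.
% y and num_labels are small made-up values for illustration.
y = [2; 1; 3; 2];
num_labels = 3;
I = eye(num_labels);
trueOut = I(y, :);   % row i is the one-hot encoding of y(i)
```

Here trueOut comes out as a 4 x 3 matrix with a single 1 per row, in the column given by the corresponding label.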
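This code took a while to debug. Although the exercise suggests writing it with a loop, I went with a vectorized implementation.
A few key points:
1. The activations ai (a1, a2) include the bias unit; epi and zi do not.
2. epi is unitnum x m (one column per training example), not m x unitnum.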
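The two points above can be checked by tracing shapes through one pass on a tiny network. This is a minimal sketch: the sizes, the random data, and the inline sigmoid definition are all assumptions made for illustration.

```octave
% Sketch: trace the shapes through one forward/backward pass.
% All sizes and data below are made up; sigmoid is an assumed definition.
m = 5; input_layer_size = 4; hidden_layer_size = 3; num_labels = 2;
sigmoid = @(z) 1 ./ (1 + exp(-z));
X = rand(m, input_layer_size);
Theta1 = rand(hidden_layer_size, input_layer_size + 1);
Theta2 = rand(num_labels, hidden_layer_size + 1);
trueOut = [1 0; 0 1; 1 0; 0 1; 1 0];        % one-hot labels, m x num_labels

a1 = [ones(m, 1) X];                        % with bias: m x (input_layer_size + 1)
z2 = a1 * Theta1';                          % no bias:   m x hidden_layer_size
a2 = [ones(m, 1) sigmoid(z2)];              % with bias: m x (hidden_layer_size + 1)
z3 = a2 * Theta2';                          % no bias:   m x num_labels
epi3 = (sigmoid(z3) - trueOut)';            % num_labels x m
g2 = sigmoid(z2) .* (1 - sigmoid(z2));      % sigmoid gradient at z2
epi2 = (Theta2(:, 2:end)' * epi3) .* g2';   % hidden_layer_size x m

% The products used for the gradients must match the weight matrices.
assert(isequal(size(epi3 * a2), size(Theta2)));
assert(isequal(size(epi2 * a1), size(Theta1)));
```

If epi were stored as m x unitnum instead, the products epi3 * a2 and epi2 * a1 would fail with a dimension mismatch, which is a quick way to catch the mistake.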