1. Results
Gradient checking difference: 7.4115e-10
Accuracy: 92.640%
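The first number is presumably the relative difference between the numerical and analytic gradients, as computed in the UFLDL starter code. A minimal sketch of that check, assuming the helper computeNumericalGradient and the variable names (numClasses, inputSize, lambda, data, labels) from the starter code:

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels);
numGrad = computeNumericalGradient(@(t) softmaxCost(t, numClasses, ...
                                   inputSize, lambda, data, labels), theta);
% relative difference; should be tiny if the analytic gradient is correct
diff = norm(numGrad - grad) / norm(numGrad + grad);
disp(diff);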
2. Code
softmaxCost
% Hypothesis: column-wise softmax probabilities, subtracting each
% column's max before exponentiating for numerical stability
M = theta * data;
M = exp(bsxfun(@minus, M, max(M)));
M = bsxfun(@rdivide, M, sum(M));
% Cost: average cross-entropy over the examples, plus L2 weight decay
cost = -sum(sum(groundTruth .* log(M))) / size(data, 2);
cost = cost + 0.5 * lambda * sum(sum(theta .* theta));
% Gradient: -(groundTruth - M) * data', averaged over examples, plus weight decay
thetagrad = -(groundTruth - M) * data' / size(data, 2);
thetagrad = thetagrad + lambda * theta;
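For reference, these lines vectorize the standard softmax-regression cost and gradient with weight decay from the UFLDL notes (m examples, k classes), written out here for clarity:

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}
  1\{y^{(i)} = j\}\,\log\frac{e^{\theta_j^\top x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^\top x^{(i)}}}
  + \frac{\lambda}{2}\sum_{j,n}\theta_{jn}^{2}

\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
  x^{(i)}\bigl(1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)};\theta)\bigr)
  + \lambda\,\theta_j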
softmaxPredict
M = theta * data;    % raw class scores; softmax is monotonic, so no normalization is needed
[~, pred] = max(M);  % predicted label = row index of the largest score in each column
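A minimal usage sketch, assuming softmaxPredict receives theta directly as a numClasses-by-inputSize matrix (the actual starter code may wrap it in a model struct), with testData and testLabels as hypothetical test-set variables:

pred = softmaxPredict(theta, testData);      % one predicted label per test example
acc  = mean(testLabels(:) == pred(:));       % fraction of correct predictions
fprintf('Accuracy: %0.3f%%\n', acc * 100);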