Get the starter code: https://github.com/amaas/stanford_dl_ex
Note that this toolkit uses L-BFGS as its optimization algorithm. On a win32 platform, the common\minFunc_2012\minFunc\compiled directory is missing lbfgsAddC.mexw32 and lbfgsProdC.mexw32, so calls fail with: Undefined function or method 'lbfgsAddC' for input arguments of type 'int32'.
The fix is to mex the C source files for both, located in \common\minFunc_2012\minFunc\mex:
mex lbfgsAddC
When prompted to select a compiler, choose the default lcc compiler, then copy the resulting .mexw32 file into the compiled directory. Do the same for lbfgsProdC.
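Putting the steps together, a minimal command sketch run from the MATLAB prompt in the repository root (the .c extensions are assumed; adjust if the source files are named differently):

% compile the two missing MEX files and install them
cd common\minFunc_2012\minFunc\mex
mex lbfgsAddC.c
mex lbfgsProdC.c
% copy the generated lbfgsAddC.mexw32 and lbfgsProdC.mexw32
% into common\minFunc_2012\minFunc\compiled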
Get the data: http://ai.stanford.edu/~amaas/data/data.zip
1, Linear Regression
Modify the end of linear_regression.m:
%%% YOUR CODE HERE %%%
for i=1:m
    % accumulate the squared-error objective: 0.5 * sum_i (theta'*x_i - y_i)^2
    f = f + 0.5 * ( (theta' * X(:, i) - y(i))^2 );
end
for j=1:n
    for i=1:m
        % gradient: g_j = sum_i x_ij * (theta'*x_i - y_i)
        g(j) = g(j) + X(j, i)*(theta'*X(:, i) - y(i));
    end
end
The vectorized version:
y_hat = theta'*X;              % 1 x m row of predictions
f = sum( (y_hat - y).^2 )/2;   % squared-error objective
g = X*(y_hat' - y');           % n x 1 gradient
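These three lines drop into the starter template's [f,g] = linear_regression_vec(theta, X, y) function, with X stored n-by-m (one example per column). The driver script then hands the objective to minFunc roughly as in the following sketch; the exact options and initialization in ex1a_linreg.m may differ:

% hedged sketch of the minFunc call that consumes the objective above
options = struct('MaxIter', 200);    % illustrative iteration cap
theta0 = rand(size(train.X, 1), 1);  % illustrative initialization
theta = minFunc(@linear_regression_vec, theta0, options, train.X, train.y);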
Run ex1a_linreg.m to see the result.
2, Logistic Regression
Modify the end of logistic_regression.m:
%%% YOUR CODE HERE %%%
for i=1:m
    h = sigmoid(theta'*X(:, i));
    % negative log-likelihood (minFunc minimizes, so negate the log-likelihood)
    f = f - ( y(i) * log(h) + (1 - y(i))*log(1 - h) );
end
for j=1:n
    for i=1:m
        % gradient of the negative log-likelihood: g_j = sum_i x_ij * (h_i - y_i)
        g(j) = g(j) + X(j, i)*(sigmoid(theta'*X(:, i)) - y(i));
    end
end
The vectorized version:
y_hat = sigmoid(theta'*X);                      % 1 x m predicted probabilities
f = -( y*log(y_hat') + (1-y)*log(1-y_hat') );   % negative log-likelihood
g = X*(y_hat' - y');                            % n x 1 gradient
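The code assumes a sigmoid helper. If the starter code does not already provide one, a one-line definition works:

function s = sigmoid(z)
  % elementwise logistic function, 1 / (1 + exp(-z))
  s = 1 ./ (1 + exp(-z));
end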
Run ex1b_logreg.m; it fails with:
??? Error using ==> fread
Invalid precision.
Error in ==> loadMNISTImages at 15
images = fread(fp, inf, 'unsigned char');
Modify loadMNISTImages.m: change
images = fread(fp, inf, 'unsigned char');
to
images = fread(fp, inf, 'uchar');
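For context, that line sits at the end of the IDX-file parsing in loadMNISTImages.m. A rough sketch of the read sequence (the real file adds error checks, a transpose, and normalization, so treat this as illustration only):

fp = fopen(filename, 'rb');
magic     = fread(fp, 1, 'int32', 0, 'ieee-be');   % 2051 for MNIST image files
numImages = fread(fp, 1, 'int32', 0, 'ieee-be');
numRows   = fread(fp, 1, 'int32', 0, 'ieee-be');
numCols   = fread(fp, 1, 'int32', 0, 'ieee-be');
images = fread(fp, inf, 'uchar');                  % the line fixed above
images = reshape(images, numCols, numRows, numImages);
fclose(fp);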
Run ex1b_logreg.m again; another error appears:
??? Error using ==> permute
Out of memory. Type HELP MEMORY for your options.
This is caused by insufficient virtual memory on 32-bit Windows; see http://blog.csdn.net/abcjennifer/article/details/43193865 for a workaround.
The result:
......
Step Size below progTol
Optimization took 126.380402 seconds.
Training accuracy: 100.0%
Test accuracy: 100.0%
3, Gradient Check
a. In ex1a_linreg.m:
theta = rand(n,1);                                                  % random point for the check
%grad_check(@linear_regression, theta, 100, train.X, train.y);     % loop version
grad_check(@linear_regression_vec, theta, 100, train.X, train.y);  % vectorized version
b. In ex1b_logreg.m:
theta = rand(n,1)*0.001;                                             % small magnitude keeps sigmoid away from saturation
%grad_check(@logistic_regression, theta, 100, train.X, train.y);     % loop version
grad_check(@logistic_regression_vec, theta, 100, train.X, train.y);  % vectorized version
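grad_check compares the analytic gradient with a finite-difference estimate at randomly chosen coordinates. A minimal sketch of the idea (not the repository's actual grad_check.m, whose interface is assumed only from the calls above):

function avg_err = grad_check_sketch(fun, theta0, num_checks, varargin)
  % fun returns [f, g]; verify g against central differences
  epsilon = 1e-4;
  [~, g] = fun(theta0, varargin{:});        % analytic gradient
  sum_err = 0;
  for i = 1:num_checks
    j = randi(numel(theta0));               % random coordinate to test
    theta_p = theta0;  theta_p(j) = theta_p(j) + epsilon;
    theta_m = theta0;  theta_m(j) = theta_m(j) - epsilon;
    f_p = fun(theta_p, varargin{:});
    f_m = fun(theta_m, varargin{:});
    g_num = (f_p - f_m) / (2 * epsilon);    % numerical estimate of g(j)
    sum_err = sum_err + abs(g(j) - g_num);
  end
  avg_err = sum_err / num_checks;           % should be tiny (~1e-8 or less)
end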
4, Softmax
From the form of the cost function, it is clearly the log form of a maximum likelihood estimate.
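For reference, that cost is the negative log-likelihood over K classes:

J(\theta) = -\sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\} \log \frac{\exp(\theta_k^\top x^{(i)})}{\sum_{j=1}^{K} \exp(\theta_j^\top x^{(i)})}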
The following remark in the tutorial is worth noting:
But the Hessian is singular/non-invertible, which causes a straightforward implementation of Newton’s method to run into numerical problems.
This can be read as saying the optimal parameters are not unique: softmax is overparameterized, and subtracting the same vector from every theta_k leaves all class probabilities unchanged, so even when the cost reaches its minimum the parameters can still move in that direction. This often causes trouble during optimization; the remedy is to add an extra constraint on the parameters (such as a weight-decay term).
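A quick numerical check of that degeneracy (all names here are illustrative, not part of the exercise code):

% shifting every class's parameter vector by the same psi leaves the
% softmax probabilities, and hence the cost, unchanged
n = 5; K = 3;
x = rand(n, 1);
theta = rand(n, K);
psi = rand(n, 1);
p1 = exp(theta' * x);         p1 = p1 / sum(p1);   % original probabilities
theta_shift = theta - psi * ones(1, K);
p2 = exp(theta_shift' * x);   p2 = p2 / sum(p2);   % shifted probabilities
disp(max(abs(p1 - p2)))       % ~1e-16: the parameters are not identifiable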