1. Model Overview
For a binary classification problem with dataset
{(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})}, y^{(i)} ∈ {0, 1},
logistic regression can be applied. Its model output is:

$$h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)}$$
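The logistic hypothesis h_θ(x) = 1 / (1 + exp(−θᵀx)) can be sketched in NumPy as follows (an illustrative example with made-up values, not from the original tutorial):

```python
import numpy as np

def logistic_hypothesis(theta, x):
    """Binary logistic regression output: P(y = 1 | x; theta)."""
    return 1.0 / (1.0 + np.exp(-theta @ x))

# A point far on the positive side of the decision boundary
# gets a probability close to 1.
theta = np.array([2.0, -1.0])
x = np.array([3.0, 1.0])
p = logistic_hypothesis(theta, x)  # sigmoid(5.0) ≈ 0.9933
```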
In practice, however, the problems we want to solve often involve more than two classes; for example, MNIST handwritten digit classification has ten classes. For such multi-class problems, logistic regression can be extended to softmax regression.
Concretely, we have a dataset
{(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})}, y^{(i)} ∈ {1, 2, ..., k}.
Given a test input x, we want the hypothesis to estimate the probability p(y = j | x; θ) of each class j. The hypothesis takes the form:

$$h_\theta(x) = \begin{bmatrix} p(y = 1 \mid x; \theta) \\ p(y = 2 \mid x; \theta) \\ \vdots \\ p(y = k \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x}} \begin{bmatrix} e^{\theta_1^T x} \\ e^{\theta_2^T x} \\ \vdots \\ e^{\theta_k^T x} \end{bmatrix}$$

All of the hypothesis parameters are collected into θ, a k × n matrix obtained by stacking the per-class parameter vectors θ_j as rows:

$$\theta = \begin{bmatrix} \text{---}\; \theta_1^T \;\text{---} \\ \text{---}\; \theta_2^T \;\text{---} \\ \vdots \\ \text{---}\; \theta_k^T \;\text{---} \end{bmatrix}$$
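A minimal NumPy sketch of the hypothesis above, with θ as a k × n matrix (variable names and values are ours, purely for illustration):

```python
import numpy as np

def softmax_hypothesis(theta, x):
    """Class probabilities for softmax regression.

    theta: (k, n) parameter matrix, one row theta_j per class.
    x: (n,) input vector.
    Returns a length-k probability vector that sums to 1.
    """
    scores = theta @ x      # theta_j^T x for each class j
    scores -= scores.max()  # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
x = np.array([0.5, 0.5])
probs = softmax_hypothesis(theta, x)
```

Subtracting the per-example maximum before exponentiating does not change the result (the constant cancels in the ratio) but prevents overflow; the same trick appears in the MATLAB code later in this post.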
2. Model Training
The objective function, including a weight decay term, is:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{j=1}^{k} \sum_{i=1}^{n} \theta_{ji}^2$$

It can be shown that with the weight decay term (λ > 0) the objective becomes strictly convex and its Hessian is invertible, so the L-BFGS algorithm converges to the global optimum. The gradient of the objective is:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j$$
The MATLAB code below computes the objective value and its gradient with respect to the parameters θ:
function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)
% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to a single training example
% labels - an M x 1 matrix containing the labels corresponding to the input data
% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));
% Compute the cost and gradient for softmax regression.
M = theta*data;
M = bsxfun(@minus, M, max(M, [], 1)); % subtract each column's max to prevent overflow in exp
M = exp(M);
p = bsxfun(@rdivide, M, sum(M));
thetagrad = -(1./numCases)*(groundTruth-p)*data' + lambda*theta;
cost = -(1./numCases)*sum(sum(groundTruth.*log(p))) + lambda/2.0*sum(sum(theta.^2));
% Unroll the gradient matrices into a vector for minFunc
grad = thetagrad(:);
end
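Before handing the gradient to an optimizer, it is worth verifying the analytic gradient against finite differences. The sketch below re-implements the same cost/gradient in NumPy and checks it numerically (function and variable names are ours; labels are 0-based here, unlike MATLAB's 1-based labels):

```python
import numpy as np

def softmax_cost_grad(theta, data, labels, lam):
    """Cost and gradient of weight-decayed softmax regression.

    theta: (k, n), data: (n, m), labels: (m,) with values in 0..k-1.
    """
    k, n = theta.shape
    m = data.shape[1]
    ground_truth = np.zeros((k, m))
    ground_truth[labels, np.arange(m)] = 1.0  # one-hot indicator matrix

    scores = theta @ data
    scores -= scores.max(axis=0)              # numerical stability
    e = np.exp(scores)
    p = e / e.sum(axis=0)

    cost = -(ground_truth * np.log(p)).sum() / m + 0.5 * lam * (theta ** 2).sum()
    grad = -(ground_truth - p) @ data.T / m + lam * theta
    return cost, grad

# Central-difference check of the gradient on random data.
rng = np.random.default_rng(0)
theta = 0.005 * rng.standard_normal((3, 4))
data = rng.standard_normal((4, 10))
labels = rng.integers(0, 3, size=10)
lam = 1e-3

cost, grad = softmax_cost_grad(theta, data, labels, lam)
eps = 1e-6
num_grad = np.zeros_like(theta)
for idx in np.ndindex(*theta.shape):
    t_plus, t_minus = theta.copy(), theta.copy()
    t_plus[idx] += eps
    t_minus[idx] -= eps
    num_grad[idx] = (softmax_cost_grad(t_plus, data, labels, lam)[0]
                     - softmax_cost_grad(t_minus, data, labels, lam)[0]) / (2 * eps)
```

If the two gradients agree to several decimal places, the vectorized implementation is almost certainly correct.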
With this function for computing the objective value and gradient in place, we can run the L-BFGS optimizer to obtain the optimal parameters:
% initialize parameters
theta = 0.005 * randn(numClasses * inputSize, 1);
% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs';
options.display = 'on';
[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
numClasses, inputSize, lambda, ...
inputData, labels), ...
theta, options);
% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
For test data, the optimized parameters can be used to predict class labels:
theta = softmaxModel.optTheta; % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));
M = theta*data;
M = bsxfun(@minus, M, max(M, [], 1));
M = exp(M);
M = bsxfun(@rdivide, M, sum(M));
[p, pred] = max(M, [], 1);
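Prediction ultimately reduces to an argmax over the per-class scores; since exponentiation and normalization are monotone, they do not change which class attains the maximum. A NumPy sketch of this step (names are ours; classes are 0-based here, versus 1-based in MATLAB):

```python
import numpy as np

def softmax_predict(theta, data):
    """Predict a class label (0..k-1) for each column of data.

    theta: (k, n) optimized parameters, data: (n, m) inputs.
    The softmax normalization is omitted: it is monotone and
    does not change which class attains the maximum score.
    """
    return np.argmax(theta @ data, axis=0)

theta = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
data = np.array([[2.0, 0.1],
                 [0.5, 3.0]])
pred = softmax_predict(theta, data)  # column 0 -> class 0, column 1 -> class 1
```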
References:
1. http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial