Coursera Machine Learning: Programming Exercise 3

Multi-class Classification and Neural Networks

lrCostFunction

The assignment provides two data files: one with X and y, and one with pre-trained theta values. Each row of X is one training example, the bitmap of a handwritten digit. Each image is 20*20 pixels, so X has 400 columns, one per pixel, each holding that pixel's grayscale value.

The first step is to write out the vectorized form of the cost function:

Let X have dimension (m, n). From the hints in the comments we can infer that theta (one column per classifier) has dimension (n, k), where k is the number of classes, here the 10 kinds of handwritten digits 0 through 9.

Then X * theta has dimension (m, k). Since this is a classification problem, we pass the result through the sigmoid function to squash it into the range 0 to 1. The i-th column corresponds to the probability of belonging to class i: the closer a value is to 1, the more likely the example is digit i (the 10th class stands for the digit 0).

This is the idea from the earlier lectures: reduce a multi-class problem to multiple binary classification problems. For each digit we treat class i as the positive class and lump all other classes into the negative class; after looping 10 times we obtain, for each example, a prediction p (0 <= p <= 1) for every class, as sketched just below.
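As a tiny sketch of the label construction (the labels here are made up for illustration), the y == c comparison used later in oneVsAll builds exactly these binary targets:

y = [1; 2; 3; 1; 2];    % hypothetical multi-class labels
c = 1;
yc = (y == c);          % yc = [1; 0; 0; 1; 0]: class c vs. everything else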

For classification we need the cost function from logistic regression, because it has a nice property: when y = 0, the closer the hypothesis h is to 0, the closer J is to 0; conversely, when y = 1, the closer h is to 1, the closer J is to 0. Otherwise J grows toward infinity.
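For reference, writing $h = g(X\theta)$ with $g$ the sigmoid function, this cost is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]$$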

After writing out the expression for J, we regularize it by adding $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. Note that j starts at 1: we do not penalize $\theta_0$, and correspondingly the regularized gradient has no $\theta_0$ term either. Without loss of generality we can set $\theta_0 = 0$, which saves us a case analysis. So we need a temp vector, temp = [0; theta(2:end)];, and use temp in place of theta in the regularization terms that follow.

Note that temp is a column vector, so the sum of squares sum(temp.^2) can be computed simply as temp' * temp.

When computing grad, pay close attention to the dimensions of each variable: X is (m, n), h is (m, 1), y is (m, 1).
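Putting the pieces together, the regularized gradient computed below is:

$$\nabla J = \frac{1}{m}X^{T}(h - y) + \frac{\lambda}{m}\,\mathrm{temp}$$

where temp is theta with its first entry set to 0, so $\theta_0$ is left unpenalized.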

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations. 
%
% Hint: When computing the gradient of the regularized cost function, 
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta; 
%           temp(1) = 0;   % because we don't add anything for j = 0  
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

h = sigmoid(X * theta);

% unregularized cost for logistic regression
J = (1.0/m) * sum(-y.*log(h) - (1-y).*log(1-h));

% regularized cost
temp = [0; theta(2:end)];

J = J + (lambda/(2.0*m)) * temp' * temp;

% unregularized gradient for logistic regression
grad = (1.0/m) * X' * (h - y);

% regularized gradient

grad = grad + (1.0/m) * lambda * temp;


% =============================================================

grad = grad(:);

end
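A quick sanity check with small made-up inputs (all values below are illustrative, not from the assignment):

% hypothetical toy problem: 5 examples, 3 features plus a bias column
theta_t = [-2; -1; 1; 2];
X_t = [ones(5, 1) reshape(1:15, 5, 3) / 10];
y_t = [1; 0; 1; 0; 1];
lambda_t = 3;
[J_t, grad_t] = lrCostFunction(theta_t, X_t, y_t, lambda_t);  % J_t is a scalar, grad_t is 4 x 1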

oneVsAll

Note that for a binary classification problem, y must be 0 or 1, indicating which class an example belongs to. So we loop over every class c: training examples belonging to class c are labeled y = 1, all others y = 0, and the trained theta is stored in the c-th row of all_theta. That row, multiplied against an example's features, tells us whether the example belongs to class c; in matrix form, multiplying the data by all_theta gives one such score per class, with the value interpreted as a probability.

Also note that the function handed to fmincg takes a parameter t (it is an anonymous function); this t plays the role of theta in our lrCostFunction. Why must theta be the argument? Because we run with 'GradObj' set to on, meaning our function returns the gradient along with the cost: the optimizer updates theta on every iteration and must re-evaluate the cost and gradient at the new theta each time, so theta has to be exposed as the function argument for fmincg to call.
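A minimal sketch of how the anonymous function closes over its surroundings (here c, X, y, and lambda are assumed to already exist in the calling scope; only t is left free):

c = 3;                                            % hypothetical class label
costFn = @(t) lrCostFunction(t, X, (y == c), lambda);
[J0, g0] = costFn(zeros(size(X, 2), 1));          % evaluate cost and gradient at theta = 0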

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda. 
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

initial_theta = zeros(n + 1, 1);

options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
    [theta] = ...
        fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)),  ...
                initial_theta, options);
                
    all_theta(c, :) = theta';
end




% =========================================================================


end
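Typical usage mirrors the exercise script; the lambda value here is illustrative:

lambda = 0.1;                              % illustrative regularization strength
num_labels = 10;                           % digits 1..9 plus 10 for '0'
all_theta = oneVsAll(X, y, num_labels, lambda);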

predictOneVsAll

From the analysis above, X has dimension (m, n) and all_theta has dimension (class, n), i.e., each row of all_theta is the theta for one class (after the column of ones is prepended, n includes the bias term). Multiplying the data by one class's theta gives the probability that each example belongs to that class, and multiplying the data by all of all_theta gives each example's probability for every class.

The result of X * all_theta' has dimension (m, class): the c-th column holds the probability that each example belongs to class c. We find the largest probability in each row and take its column index as the prediction.

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the 
%       max element, for more information see 'help max'. If your examples 
%       are in rows, then, you can use max(A, [], 2) to obtain the max 
%       for each row.
%       

%  X(m, n), all_theta(class, n), where n is the pix num
% The result is (m, class), where in the ith row, every jth col is 
% the ith example's probability of being in the jth class
[~, p] = max(sigmoid(X * all_theta'), [], 2);




% =========================================================================


end
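Usage sketch (assuming y still holds the true labels, this reports training-set accuracy):

pred = predictOneVsAll(all_theta, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);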

predict

Finally, let's complete a simple neural network. There are 400 inputs x, one per pixel; a single hidden layer with 25 units; and 10 output units. For each example, the activation of each output unit is the probability that the example belongs to the corresponding class.

As the lectures showed, the theta for each layer has dimension $(R_{j+1}, R_j + 1)$, where $R_j$ is the number of units in layer j. With this, the formulas

$$z = \theta a, \qquad a = g(z)$$

let us write out the forward pass, as spelled out below.
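Concretely, for this network (assuming the weight shapes shipped with the exercise data, i.e., Theta1 is 25 x 401 and Theta2 is 10 x 26), the vectorized forward pass over all m examples is:

$$a_1 = [\mathbf{1}\;\,X] \in \mathbb{R}^{m \times 401}, \quad z_2 = a_1\Theta_1^{T} \in \mathbb{R}^{m \times 25}$$

$$a_2 = [\mathbf{1}\;\,g(z_2)] \in \mathbb{R}^{m \times 26}, \quad z_3 = a_2\Theta_2^{T} \in \mathbb{R}^{m \times 10}, \quad a_3 = g(z_3)$$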

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%


A = [ones(size(X, 1), 1) X];

% Theta1(len(j+1), len(j)+1) A(m, len(j)+1)
z = A * Theta1';
A = sigmoid(z);

A = [ones(size(A, 1), 1) A];
z = A * Theta2';
A = sigmoid(z);

[~, p] = max(A, [], 2);

% =========================================================================


end
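Usage sketch, mirroring the exercise script (Theta1 and Theta2 are loaded from the weights file provided with the assignment):

load('ex3weights.mat');   % provides Theta1 and Theta2
pred = predict(Theta1, Theta2, X);
fprintf('Training Set Accuracy: %f\n', mean(double(pred == y)) * 100);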