Machine Learning on Coursera: Chapter 3 Programming Assignment

Latest recommended article published on 2023-06-24 00:53:56

CCF小彤

Category: Machine Learning | Tags: machine learning, deep learning, neural networks

Article link: https://blog.csdn.net/qq_21555569/article/details/120689974

Multi-class Classification and Neural Networks

lrCostFunction

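The assignment provides two data sets: one contains X and y, the other contains theta. Each row of X is one training example, namely the bitmap of one handwritten digit. Every image is 20×20 pixels, so X has 400 columns, and each column holds the grayscale value of one pixel.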

The first step is to write the cost function in vectorized form:

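Let X have dimensions (m, n). Inside lrCostFunction, theta is a single column vector of dimensions (n, 1), as the hint in the comments (sigmoid(X * theta)) suggests. Stacking the parameter vectors of all k classifiers side by side gives an (n, k) matrix, where k is the number of classes, here the 10 handwritten digits 0 to 9.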

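With the stacked (n, k) matrix, X*theta has dimensions (m, k). Because this is a classification problem, we pass the result through the sigmoid function to squash it into the range 0 to 1. The value in column i of sigmoid(X*theta) is then the probability that the example belongs to class i: the closer it is to 1, the more likely the example is class i, which in this assignment means the digit i (class 10 stands for the digit 0).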
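To see the squashing behaviour concretely, here is a minimal sketch. The anonymous `sigmoid` below is only a local stand-in for the sigmoid.m helper that the assignment relies on, and the inputs are made up.

```matlab
% Local stand-in for the sigmoid.m helper used in the assignment
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-10 0 10];   % made-up inputs
s = sigmoid(z);   % element-wise, so it works on vectors and matrices

disp(s);          % large negative -> near 0, zero -> 0.5, large positive -> near 1
```

Because the function is applied element by element, the same call works unchanged on an (m, k) matrix such as X*theta.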

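This is the idea from the earlier lectures: turn a multi-class problem into several binary problems. For each class i, we treat class i as one class and lump all other classes into the other. After repeating this 10 times, we have, for every example, a predicted value p (0<=p<=1) for each class.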

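For classification we use the cost function of logistic regression, because it has a convenient property: when y is 0, the closer the prediction h is to 0, the closer J is to 0; conversely, when y is 1, the closer h is to 1, the closer J is to 0. In the opposite cases (h near 1 when y is 0, or h near 0 when y is 1), J grows toward infinity.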
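A quick numerical check makes this property tangible. The sketch below evaluates the per-example cost for a few hand-picked predictions; the helper name `cost` and the values 0.99 and 0.01 are chosen purely for illustration.

```matlab
% Per-example logistic cost for a prediction h and a label y
cost = @(h, y) -y .* log(h) - (1 - y) .* log(1 - h);

good_y1 = cost(0.99, 1);   % confident and correct
bad_y1  = cost(0.01, 1);   % confident and wrong
good_y0 = cost(0.01, 0);   % confident and correct
bad_y0  = cost(0.99, 0);   % confident and wrong

fprintf('y=1: h=0.99 -> %.4f, h=0.01 -> %.4f\n', good_y1, bad_y1);
fprintf('y=0: h=0.01 -> %.4f, h=0.99 -> %.4f\n', good_y0, bad_y0);
```

The correct predictions cost about 0.01 each, while the wrong ones cost about 4.6, and the penalty keeps growing as h moves further toward the wrong extreme.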

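Once J is written down, we regularize it by adding $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^2$. Note that j starts at 1: $\theta_0$ is not penalized, and the regularization term of the gradient likewise has no $\theta_0$ component. To avoid a case distinction, we can set $\theta_0=0$ in a copy of theta that is used only for the regularization terms. (MATLAB indices start at 1, so $\theta_0$ is theta(1).) That is the purpose of the temp vector: temp = [0; theta(2:end)];. We then use temp in place of theta in both regularization terms, while the hypothesis h is still computed with the original theta.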
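For reference, the regularized cost and its gradient can be written out as follows. This is a restatement in the course's usual notation, where $h^{(i)}=g(\theta^{T}x^{(i)})$ is the sigmoid prediction for example i; the superscript notation is an assumption on my part, since the post itself only gives the regularization term.

$$
J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h^{(i)}-(1-y^{(i)})\log\left(1-h^{(i)}\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^2
$$

$$
\frac{\partial J}{\partial\theta_0}=\frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)}-y^{(i)}\right)x_0^{(i)},\qquad
\frac{\partial J}{\partial\theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)}-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\quad(j\ge 1)
$$

Replacing $\theta_0$ by 0 in the regularization terms lets a single vectorized expression cover both gradient cases.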

Since temp is a column vector, the sum of temp.^2 is simply temp' * temp.
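The equivalence is easy to confirm on a small vector. The values of `theta` below are made up for the check.

```matlab
theta = [5; 1; -2; 3];          % made-up parameters; theta(1) plays the role of theta_0
temp  = [0; theta(2:end)];      % zero out the bias term

via_sum   = sum(temp .^ 2);     % 1 + 4 + 9
via_inner = temp' * temp;       % inner product of temp with itself

disp([via_sum, via_inner]);     % both are 14
```

Note that the 5 in theta(1) contributes nothing, which is exactly the effect of skipping $\theta_0$ in the penalty.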

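When computing grad, pay close attention to the dimensions of each variable: X is (m, n), h is (m, 1), and y is (m, 1), so X' * (h - y) is (n, 1), the same shape as theta.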
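A small shape check, with made-up sizes and random data, shows how the dimensions line up. As before, the anonymous `sigmoid` is a stand-in for the assignment's helper.

```matlab
m = 5; n = 3;                        % made-up sizes
X = rand(m, n);                      % (m, n)
y = [1; 0; 1; 0; 0];                 % (m, 1)
theta = zeros(n, 1);                 % (n, 1)

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the assignment's sigmoid.m
h = sigmoid(X * theta);              % (m, 1)

grad = (1.0 / m) * X' * (h - y);     % (n, m) * (m, 1) -> (n, 1)
disp(size(grad));                    % 3 1
```

If the transpose on X is forgotten, the product (m, n) * (m, 1) fails with a dimension error, which is a handy early warning.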

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations. 
%
% Hint: When computing the gradient of the regularized cost function, 
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta; 
%           temp(1) = 0;   % because we don't add anything for j = 0  
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

h = sigmoid(X * theta);

% unregularized cost for logistic regression
J = (1.0/m) * sum(-y.*log(h) - (1-y).*log(1-h));

% regularized cost
temp = [0; theta(2:end)];

J = J + (lambda/(2.0*m)) * temp' * temp;

% unregularized gradient for logistic regression
grad = (1.0/m) * X' * (h - y);

% regularized gradient

grad = grad + (1.0/m) * lambda * temp;


% =============================================================

grad = grad(:);

end

oneVsAll

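Note that in a binary classification problem y must be 0 or 1, indicating which of the two classes an example belongs to. So we loop over every class c: training examples that belong to c get y = 1, and all others get y = 0. The theta trained this way is stored in row c of all_theta; multiplying X by this row (and applying the sigmoid) tells us whether an example belongs to class c. Consequently, in X * all_theta', column c says whether each example belongs to class c, with the value after the sigmoid read as a probability.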
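The relabeling step is a one-liner. Here is a minimal sketch with a made-up label vector:

```matlab
y = [1; 2; 10; 2; 3];     % made-up labels; 10 stands for the digit 0
c = 2;

y_c = (y == c);           % logical column vector: 1 where the label is c
disp(double(y_c)');       % 0 1 0 1 0
```

The result is a logical vector, which works directly in the arithmetic inside lrCostFunction.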

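Notice also that the function passed to fmincg is an anonymous function with a parameter t. This t is the theta argument of our lrCostFunction. It has to be the free parameter because the optimizer is iterative: at every step it calls the function with its current estimate of theta to get the cost and the gradient, and uses them to compute the next estimate. The other arguments (X, y == c, lambda) are fixed when the anonymous function is created. Note that 'GradObj', 'on' does not select gradient descent; for an optimizer such as fminunc it states that the objective function returns the gradient as its second output, which lrCostFunction does.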
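The way an anonymous function fixes some arguments and leaves one free can be sketched without any optimizer. The names `sqDist`, `a`, and `f` below are hypothetical and have nothing to do with the assignment's files.

```matlab
% Hypothetical two-argument cost: squared distance between t and a target a
sqDist = @(t, a) sum((t - a) .^ 2);

a = [1; 2];                 % fixed data, captured when f is created
f = @(t) sqDist(t, a);      % one-argument function, as an optimizer expects

at_zero   = f([0; 0]);      % 1^2 + 2^2 = 5
at_target = f([1; 2]);      % 0 at the minimum
disp([at_zero, at_target]);
```

In the assignment, X, (y == c), and lambda play the role of `a`, and t plays the role of theta.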

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda. 
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

initial_theta = zeros(n + 1, 1);

options = optimset('GradObj', 'On', 'MaxIter', 50);
for c = 1: num_labels
    [theta] = ...
        fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)),  ...
                initial_theta, options);
                
    all_theta(c, :) = theta';
end




% =========================================================================


end

predictOneVsAll

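From the analysis above, X has dimensions (m, n) and all_theta has dimensions (class, n), where each row of all_theta is the theta of one class. Multiplying X by one such theta and applying the sigmoid gives the probability that each example belongs to that class; multiplying X by the whole of all_theta gives the probabilities for every class at once.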

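X * all_theta' has dimensions (m, class), and column c of a row is the probability that this example belongs to class c. We find the largest probability in each row and take its column index as the prediction.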
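The row-wise max with index can be tried on a tiny matrix first. The probabilities in `P` are made up.

```matlab
% Made-up probabilities: 2 examples (rows), 3 classes (columns)
P = [0.1 0.7 0.2;
     0.8 0.1 0.3];

[best, p] = max(P, [], 2);   % best: row maxima, p: their column indices
disp(p');                    % 2 1
```

The second output is what we want as the prediction, which is why the solution discards the first output with `~`.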

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the 
%       max element, for more information see 'help max'. If your examples 
%       are in rows, then, you can use max(A, [], 2) to obtain the max 
%       for each row.
%       

% X is (m, n+1) and all_theta is (class, n+1), where n is the number of pixels
% The result is (m, class): the entry in row i, column j is 
% the ith example's probability of being in the jth class
[~, p] = max(sigmoid(X * all_theta'), [], 2);




% =========================================================================


end

predict

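Finally, let's complete a simple neural network. There are 400 inputs x, one per pixel; a single hidden layer with 25 units; and 10 output units. Over the whole data set each output unit yields a vector, holding the probability that each example belongs to that unit's class.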

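As covered in the lectures, the theta of each layer has dimensions ( $R_{j+1}, R_{j}+1$ ), where $R_j$ is the number of units in layer j and the extra column multiplies the bias unit. With that, the formulas
$z^{(j+1)}=\Theta^{(j)} a^{(j)} \newline a^{(j+1)}=g(z^{(j+1)})$
give us the computation directly.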
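To make the dimensions concrete, here is a minimal forward pass on a toy network with 2 inputs, 2 hidden units, and 2 outputs. All weights and inputs are made up, and `sigmoid` is again a local stand-in. Because the examples sit in rows, the code uses the transposed form A * Theta' rather than Theta * a.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the assignment's sigmoid.m

X      = [1 0; 0 1];                 % 2 examples, 2 input units (made up)
Theta1 = [0 1 0; 0 0 1];             % (2 hidden units, 2 inputs + 1)
Theta2 = [0 1 0; 0 0 1];             % (2 output units, 2 hidden + 1)

a1 = [ones(size(X, 1), 1) X];        % add the bias unit
a2 = sigmoid(a1 * Theta1');          % hidden activations, (2, 2)
a2 = [ones(size(a2, 1), 1) a2];      % add the bias unit again
a3 = sigmoid(a2 * Theta2');          % output activations, (2, 2)

[~, p] = max(a3, [], 2);
disp(p');                            % 1 2
```

With these weights each layer simply passes its inputs through the sigmoid, so the first example activates output 1 most strongly and the second activates output 2.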

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%


A = [ones(size(X, 1), 1) X];

% Theta1(len(j+1), len(j)+1) A(m, len(j)+1)
z = A * Theta1';
A = sigmoid(z);

A = [ones(size(A, 1), 1) A];
z = A * Theta2';
A = sigmoid(z);

[~, p] = max(A, [], 2);

% =========================================================================


end