Andrew Ng's Machine Learning Course, Assignment 2: Logistic Regression


『 venus』


Category: Machine Learning. Tags: machine learning, Andrew Ng, MATLAB

Article link: https://blog.csdn.net/qimingxia/article/details/97419732


Logistic Regression

I. Logistic Regression
II. Regularized Logistic Regression

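I. Logistic Regression

1. Data visualization

In this part of the exercise, we build a logistic regression model to predict whether a student is admitted to a university.
Suppose you are the administrator of a university department and want to determine each applicant's chance of admission from their scores on two exams. Historical data from previous applicants serve as the training set for logistic regression: each training example contains the applicant's two exam scores and the admission decision.
Define the function plotData.m to visualize the dataset on a two-dimensional plane: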

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure 
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%
% Find Indices of Positive and Negative Examples
pos = find(y==1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2, ...
'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', ...
'MarkerSize', 7);
% =========================================================================
hold off;

end

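The data are university admission records: admitted applicants are drawn as plus signs and non-admitted applicants as circles.
[Figure: scatter plot of the two exam scores, Admitted (+) vs. Not Admitted (o)]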

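2. The sigmoid function and the logistic regression model

In linear regression, the hypothesis is the linear function
$h_\theta(x)=\theta^{T}x$
In logistic regression, we want
$0 \le h_\theta(x) \le 1$
so we define the logistic regression hypothesis as
$h_\theta(x)=g(\theta^{T}x)$
where
$g(z)=\frac{1}{1+e^{-z}}$
$g(z)$ is the sigmoid function; its curve is shown below.
[Figure: curve of the sigmoid function]
Interpretation of the hypothesis: for an input $x$, the output of the hypothesis is the estimated probability that $y = 1$.
$h_\theta(x)=p(y=1|x;\theta),\quad y\in\{0,1\}$
Clearly,
$p(y=1|x;\theta)+p(y=0|x;\theta)=1$

It is implemented in the custom function sigmoid.m: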

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly 
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ ( 1 + exp(-z) ) ;

% =============================================================

end
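
A quick sanity check of the sigmoid can be sketched as follows. The anonymous function below is a stand-in for sigmoid.m so that the snippet runs on its own, and the variable names (g0, gVec, gMat, symErr) are chosen here for illustration only.

```matlab
% Self-contained check; the anonymous function mirrors sigmoid.m.
sigmoid = @(z) 1 ./ (1 + exp(-z));

g0   = sigmoid(0);              % 1/(1+1) = 0.5
gVec = sigmoid([-10 0 10]);     % element-wise on a vector
gMat = sigmoid(zeros(2, 3));    % element-wise on a matrix, same size as the input

% Symmetry g(z) + g(-z) = 1, the counterpart of p(y=1) + p(y=0) = 1.
z = linspace(-5, 5, 11);
symErr = max(abs(sigmoid(z) + sigmoid(-z) - 1));

fprintf('g(0) = %.4f\n', g0);
fprintf('g([-10 0 10]) = %s\n', mat2str(gVec, 4));
fprintf('max |g(z) + g(-z) - 1| = %.2e\n', symErr);
```

Large negative inputs give values close to 0 and large positive inputs give values close to 1, which is what lets the output be read as a probability.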

3. Cost function and gradient

The cost of a single example in the logistic regression model is
$cost(h_\theta(x),y)=\left\{\begin{matrix} -\log(h_\theta(x)) & \text{if } y=1 \\ -\log(1-h_\theta(x)) & \text{if } y=0 \end{matrix}\right.$
The two cases combine into
$cost(h_\theta(x),y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))$
$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}cost(h_\theta(x^{(i)}),y^{(i)})$
$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$
Gradient descent, used earlier for linear regression, also applies here.
Repeat the following assignment, updating all $\theta_j$ simultaneously, until $\theta$ converges.
repeat{
$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$
}
where $\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$
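
The partial derivative follows from the identity $g'(z)=g(z)(1-g(z))$. A short derivation for a single example, writing $h$ for $h_\theta(x)$ (a shorthand introduced here), might look like this:

```latex
\begin{aligned}
\frac{\partial h}{\partial\theta_j} &= h(1-h)\,x_j \\
\frac{\partial}{\partial\theta_j}\,cost(h,y)
  &= -\frac{y}{h}\,h(1-h)\,x_j + \frac{1-y}{1-h}\,h(1-h)\,x_j \\
  &= \bigl(-y(1-h) + (1-y)\,h\bigr)\,x_j \\
  &= (h-y)\,x_j
\end{aligned}
```

Averaging $(h-y)\,x_j$ over the $m$ training examples gives $\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$, with a positive sign in front of the sum.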
In this exercise, however, the advanced optimization function fminunc is used to obtain the converged $\theta$.
Implementation:

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
J= -1 * sum( y .* log( sigmoid(X*theta) ) + (1 - y ) .* log( (1 - sigmoid(X*theta)) ) ) / m ; 
grad = ( X' * (sigmoid(X*theta) - y ) )/ m ;  
% =============================================================

end
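
One way to gain confidence in the analytic gradient is to compare it with a central finite-difference estimate. The sketch below is self-contained: costFn and gradFn are anonymous stand-ins for the two outputs of costFunction.m, and the tiny dataset and theta are made up for illustration, not taken from the course data.

```matlab
% Finite-difference check of the analytic gradient on a made-up dataset.
sigmoid = @(z) 1 ./ (1 + exp(-z));
costFn  = @(theta, X, y) -sum(y .* log(sigmoid(X*theta)) + ...
                              (1 - y) .* log(1 - sigmoid(X*theta))) / numel(y);
gradFn  = @(theta, X, y) (X' * (sigmoid(X*theta) - y)) / numel(y);

X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.1];   % first column is the intercept
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];

epsilon = 1e-5;
numGrad = zeros(size(theta));
for j = 1:numel(theta)
    step = zeros(size(theta));
    step(j) = epsilon;                                 % perturb one parameter at a time
    numGrad(j) = (costFn(theta + step, X, y) - costFn(theta - step, X, y)) / (2*epsilon);
end

maxDiff = max(abs(numGrad - gradFn(theta, X, y)));
fprintf('largest difference between numerical and analytic gradient: %.2e\n', maxDiff);
```

A difference many orders of magnitude below the gradient values themselves suggests that the cost and the gradient agree with each other.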

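Using the $\theta$ returned by the advanced optimization function fminunc, we plot the decision boundary of the logistic regression model.

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J, exit_flag] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

A brief explanation:

optimset('GradObj', 'on', 'MaxIter', 400): 'GradObj', 'on' tells fminunc that our function returns the gradient along with the cost, so fminunc uses that gradient instead of estimating it numerically; 'MaxIter', 400 sets the maximum number of iterations to 400.
fminunc(@(t)(costFunction(t, X, y)), initial_theta, options): @(t)(costFunction(t, X, y)) passes a function as an argument; @ creates a function handle, similar to a function pointer in C. initial_theta is the initial value of the parameter vector theta. options is the structure returned by optimset, which configures the behavior of fminunc.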
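
The calling pattern can be tried on a problem whose answer is known in advance. The sketch below is a hypothetical function file (fminuncDemo.m, with a made-up local function quadCost); it assumes fminunc is available, which in MATLAB requires the Optimization Toolbox.

```matlab
function theta = fminuncDemo()
% FMINUNCDEMO Minimize a simple quadratic with fminunc and a user-supplied gradient.
options = optimset('GradObj', 'on', 'MaxIter', 400);
initial_theta = zeros(2, 1);
% The anonymous function fixes the extra argument (target), in the same way
% that X and y are fixed in @(t)(costFunction(t, X, y)).
target = [3; -1];
theta = fminunc(@(t)(quadCost(t, target)), initial_theta, options);
end

function [J, grad] = quadCost(t, target)
% Hypothetical cost: squared distance to target, returned with its gradient.
J = sum((t - target).^2);
grad = 2 * (t - target);
end
```

Calling theta = fminuncDemo() should return a vector close to [3; -1], the minimizer of the quadratic.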

4. Evaluating logistic regression

In this exercise the dataset records university admission outcomes; when $h_\theta(x) \ge 0.5$, the prediction is "admitted".
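
Because $g$ is increasing and $g(0)=0.5$, the threshold on $h_\theta(x)$ is equivalent to a sign test on $\theta^{T}x$. For the two exam scores this makes the decision boundary a straight line. The last step below assumes $\theta_2 \neq 0$.

```latex
h_\theta(x) \ge 0.5
\iff g(\theta^{T}x) \ge g(0)
\iff \theta^{T}x \ge 0

% With features (1, x_1, x_2), the boundary \theta^{T}x = 0 is the line
\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0
\quad\Longrightarrow\quad
x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2}
```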
The prediction rule is implemented in the custom function predict.m:

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters. 
%               You should set p to a vector of 0's and 1's
%

k = find(sigmoid( X * theta) >= 0.5 );
p(k)= 1;                              % k holds the indices of the examples predicted as 1; set those entries of p to 1.

% =========================================================================


end
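
The same prediction rule can also be written as a single logical comparison, and the accuracy is then the share of predictions that match the labels. The sketch below uses made-up parameters and examples rather than the course data, with an anonymous stand-in for sigmoid.m.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up parameters and examples (intercept column first); not the course data.
theta = [-1; 2];
X = [1 0.0; 1 0.5; 1 1.0; 1 2.0];
y = [0; 0; 1; 1];

% Same rule as predict.m, written as one logical comparison.
p = double(sigmoid(X * theta) >= 0.5);

% Mathematically equivalent test without the sigmoid: g(z) >= 0.5 when z >= 0.
pLinear = double(X * theta >= 0);

accuracy = mean(double(p == y)) * 100;   % percentage of matching predictions
fprintf('Train accuracy: %.1f\n', accuracy);
```

Here X * theta is [-1; 0; 1; 3], so the second example sits exactly on the boundary and is predicted as 1, which is the one mismatch with y.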

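The accuracy of the model is measured on the training set.
[Figure: output of the accuracy check]
Comparing the predictions with the true labels of the dataset gives an accuracy of 89.0%. Logistic regression therefore separates the two classes reasonably well.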

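II. Regularized Logistic Regression

In this part of the exercise, we implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it functions correctly.
Suppose you are the product manager of the factory and have the results of two different tests for some microchips. From these two tests, you want to decide whether each microchip should be accepted or rejected. To help make the decision, you have a dataset of test results on past microchips, from which a logistic regression model can be built.

1. Data visualization

As in the previous part, the data are plotted with plotData.m. The horizontal and vertical axes are the scores of the two tests; accepted chips (y = 1) are drawn as plus signs and rejected chips (y = 0) as circles.
[Figure: scatter plot of Microchip Test 1 vs. Microchip Test 2, y = 1 (+) and y = 0 (o)]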

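2. Feature mapping

One way to fit the data better is to create more features from each data point. The function mapFeature.m maps the features $x_1,x_2$ to all polynomial terms of $x_1$ and $x_2$ up to the sixth power.
$\text{mapFeature}(x)=[1,\ x_1,\ x_2,\ x_1^2,\ x_1x_2,\ x_2^2,\ x_1^3,\ \dots,\ x_1x_2^5,\ x_2^6]^T$
After this mapping, the two features become a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector has a more complex decision boundary, which appears nonlinear when drawn in the two-dimensional plot.
The code of mapFeature.m: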

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising of 
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end
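
The dimension 28 can be checked by counting: degree i contributes i + 1 monomials, so degrees 0 through 6 give 1 + 2 + ... + 7 = 28 terms. The sketch below counts the terms and maps a single made-up point with the same loop order as mapFeature.m; the variable names are chosen here for illustration.

```matlab
degree = 6;

% Count the monomials x1^(i-j) * x2^j for i = 0..degree, j = 0..i.
numTerms = 0;
for i = 0:degree
    numTerms = numTerms + (i + 1);   % degree i contributes i + 1 terms
end
closedForm = (degree + 1) * (degree + 2) / 2;

% Map one made-up point with the same loop order as mapFeature.m.
x1 = 2; x2 = 3;
out = 1;
for i = 1:degree
    for j = 0:i
        out(end+1) = (x1^(i-j)) * (x2^j);
    end
end

fprintf('terms: %d (closed form %d), mapped length: %d\n', numTerms, closedForm, numel(out));
disp(out(1:6));   % 1, x1, x2, x1^2, x1*x2, x2^2
```

For the point (2, 3) the first six entries are 1, 2, 3, 4, 6, 9, matching the ordering 1, x1, x2, x1^2, x1*x2, x2^2.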

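Although feature mapping lets us build a more expressive classifier, it is also more prone to overfitting. The next part of the exercise implements regularized logistic regression, fits the data again, and shows how regularization helps to counter overfitting.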

3. Cost function and gradient

The regularized cost function is
$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
The penalty sum starts at $j=1$, so $\theta_0$ is not regularized. The gradient is
$\frac{\partial}{\partial\theta_0}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}$ for $j = 0$
$\frac{\partial}{\partial\theta_j}J(\theta)=\left(\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}\right)+\frac{\lambda}{m}\theta_j$ for $j \ge 1$

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
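h = sigmoid(X*theta);   % predictions, computed before theta is modified
theta(1) = 0;           % exclude the intercept term from the regularization
J = (-y'*log(h) - (1-y)'*log(1-h))/m + lambda/(2*m)*sum(theta.^2);
grad = (X'*(h-y))/m + (lambda/m)*theta;   % column vector, same size as theta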
% =============================================================

end
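
Two properties of the regularized cost are easy to check numerically: at theta = 0 every prediction is 0.5, so the cost equals log(2), about 0.693, for any data and any lambda; and the intercept theta(1) does not enter the penalty. The sketch below uses anonymous stand-ins (nll, regCost) and made-up data, all introduced here for illustration.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
% Unregularized cost, then the regularized cost with theta(1) left out of the penalty.
nll     = @(theta, X, y) (-y' * log(sigmoid(X*theta)) - (1 - y)' * log(1 - sigmoid(X*theta))) / numel(y);
regCost = @(theta, X, y, lambda) nll(theta, X, y) + lambda / (2*numel(y)) * sum(theta(2:end).^2);

% Made-up data; any X and y give the same cost at theta = 0.
X = [1 0.2 -0.4; 1 -1.5 0.9; 1 0.7 0.7];
y = [1; 0; 1];

J0    = regCost(zeros(3, 1), X, y, 1);
J0big = regCost(zeros(3, 1), X, y, 100);

% Changing only the intercept leaves the penalty term unchanged.
thetaA = [0; 1; 1];
thetaB = [5; 1; 1];
penaltyA = regCost(thetaA, X, y, 10) - regCost(thetaA, X, y, 0);
penaltyB = regCost(thetaB, X, y, 10) - regCost(thetaB, X, y, 0);

fprintf('J(0) = %.4f, log(2) = %.4f\n', J0, log(2));
fprintf('penalty with theta(1) = 0: %.4f, with theta(1) = 5: %.4f\n', penaltyA, penaltyB);
```

Both penalties equal lambda/(2m) times the sum of squares of theta(2:end), here 10/6 * 2.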

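With the cost function and gradient computed, the advanced optimization function fminunc, used in Part I, finds the best-fitting $\theta$.

The complete script for regularized logistic regression follows: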

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the second part
%  of the exercise which covers regularization with logistic regression.
%
%  You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contains the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;


%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);

fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
	fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');

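[Figure: decision boundary for lambda = 1]
Feature mapping turns the two features into a 28-dimensional vector and yields a nonlinear decision boundary. Because so many features make overfitting likely, regularization is applied: the regularization parameter $\lambda$ penalizes large values of $\theta$, which gives a well-fitting decision boundary. As the figure above shows, the boundary separates the two classes well.
[Figure: program output showing the training accuracy]
After running the code, the reported training accuracy is 83.1% (0.831).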
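
The effect of $\lambda$ can be explored with a small sweep. The sketch below is a hypothetical function file (lambdaSweepDemo.m with made-up local helpers polyFeatures and regCost) that runs on a synthetic, deterministic dataset instead of ex2data2.txt, so its numbers are not the ones reported above. It assumes fminunc is available. lambda = 0 is left out because this synthetic data is separable in the mapped feature space, so the unregularized problem has no finite minimizer.

```matlab
function [norms, accs] = lambdaSweepDemo()
% LAMBDASWEEPDEMO Fit regularized logistic regression for several lambda values
% on a synthetic, deterministic dataset (not ex2data2.txt).
[g1, g2] = meshgrid(linspace(-1, 1, 9));
x1 = g1(:); x2 = g2(:);
y = double(x1.^2 + x2.^2 < 0.5);          % label 1 inside a circle
X = polyFeatures(x1, x2, 6);

lambdas = [1 10 100];
norms = zeros(size(lambdas));
accs  = zeros(size(lambdas));
options = optimset('GradObj', 'on', 'MaxIter', 400);
for k = 1:numel(lambdas)
    lambda = lambdas(k);
    theta = fminunc(@(t)(regCost(t, X, y, lambda)), zeros(size(X, 2), 1), options);
    norms(k) = norm(theta(2:end));        % size of the penalized parameters
    accs(k)  = mean(double((X * theta >= 0) == y)) * 100;
    fprintf('lambda = %g: norm(theta(2:end)) = %.3f, train accuracy = %.1f\n', ...
            lambda, norms(k), accs(k));
end
end

function out = polyFeatures(x1, x2, degree)
% Same monomial ordering as mapFeature.m.
out = ones(size(x1));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (x1.^(i-j)) .* (x2.^j);
    end
end
end

function [J, grad] = regCost(theta, X, y, lambda)
% Regularized cost and gradient; the intercept theta(1) is not penalized.
m = length(y);
h = 1 ./ (1 + exp(-X * theta));
reg = theta; reg(1) = 0;
J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m + lambda / (2*m) * sum(reg.^2);
grad = (X' * (h - y)) / m + (lambda / m) * reg;
end
```

Larger values of lambda shrink the penalized parameters, which tends to smooth the decision boundary; how the training accuracy responds depends on the data.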