Machine Learning: A Small Logistic Regression Project

All of the code for this project is on my GitHub; anyone interested is welcome to discuss it with me~
Path: Machine-Learning/machine-learning-ex2/


1. Introduction

The Wikipedia definition of logistic regression reads as follows:

In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. This article covers the case of a binary dependent variable—that is, where it can take only two values, “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial logistic regression, or, if the multiple categories are ordered, in ordinal logistic regression.[2] In the terminology of economics, logistic regression is an example of a qualitative response/discrete choice model.

As I understand it, logistic regression is another data-fitting tool, one used for classification problems, i.e. problems whose outcome is discrete. Such problems are often handled poorly by linear regression, so for classification we should choose logistic regression rather than linear regression.

Logistic regression mainly comes in two kinds:

  • Logistic regression with two outcomes, usually just called logistic regression.
  • Logistic regression with more than two outcomes, usually called multinomial logistic regression.

Any discussion of logistic regression has to mention the decision boundary. A decision boundary separates data of different classes, and it also comes in two types:

  • Linear decision boundaries
  • Non-linear decision boundaries

Usually we obtain the final theta through gradient descent or some other optimization algorithm, and then use that theta to draw the decision boundary. Note that the decision boundary is a property of the hypothesis function; it is determined by the hypothesis's theta and has no direct relationship to the training set.
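For a linear boundary over two features, the boundary is the line where theta' * x = 0, i.e. x2 = -(theta(1) + theta(2) * x1) / theta(3). Here is a minimal Octave sketch of drawing it, assuming theta has already been fitted and X carries a leading intercept column of ones (so columns 2 and 3 hold the two features):

% Sketch: draw a linear decision boundary over the two raw features.
plot_x = [min(X(:,2)) - 2, max(X(:,2)) + 2];           % two x1 endpoints
plot_y = -(theta(1) + theta(2) .* plot_x) ./ theta(3); % solve theta'*x = 0 for x2
hold on;
plot(plot_x, plot_y, '-');
hold off;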

The hypothesis function of logistic regression is built on the sigmoid function, whose main properties are listed below (a code sketch follows the list):

  • When x = 0, the function value is 0.5;
  • When x < 0 the value is less than 0.5, and when x > 0 the value is greater than 0.5;
  • As x approaches negative infinity, the value approaches 0;
  • As x approaches positive infinity, the value approaches 1;
  • The curve is shaped like an "S", hence the name S-shaped (sigmoid) function.
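In formula form the hypothesis is h_theta(x) = g(theta' * x), where g(z) = 1 / (1 + e^(-z)). A minimal sigmoid.m, written the straightforward element-wise way:

function g = sigmoid(z)
  % Element-wise sigmoid, so z may be a scalar, a vector, or a matrix.
  g = 1 ./ (1 + exp(-z));
end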

The value of the hypothesis function represents the probability that the output is 1. We can pick a threshold such as 0.5: when the value is >= 0.5 the output is 1, and when it is < 0.5 the output is 0.
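In code this thresholding is a single vectorized comparison (a sketch that assumes X already carries the intercept column and theta has been fitted):

p = sigmoid(X * theta) >= 0.5;   % m x 1 vector of 0/1 predictions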

As for the cost function: if we simply substitute the hypothesis into the squared-error cost used for linear regression, the resulting function is non-convex, so gradient descent can get stuck in local minima. That is why the logarithmic form of the cost function is used. Because this is a classification problem the cost has two cases (y = 1 and y = 0), but a small trick merges them into a single expression.
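For reference, the merged logarithmic cost is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

When y^(i) = 1 only the first term survives, and when y^(i) = 0 only the second; that is the merging trick.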

When writing the code, it is also convenient to express these formulas in vectorized form.

The gradient is simply the vector of partial derivatives of the cost function J(theta) with respect to theta.
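Concretely, $\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$. Putting the vectorized cost and gradient together gives one possible costFunction.m for Part 2 below (a sketch that assumes the sigmoid helper above; not necessarily the exercise's reference solution):

function [J, grad] = costFunction(theta, X, y)
  % Vectorized cost and gradient for unregularized logistic regression.
  m = length(y);            % number of training examples
  h = sigmoid(X * theta);   % m x 1 vector of predictions
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  grad = (1 / m) * (X' * (h - y));
end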

To handle classification with more than two outcomes, we use the one-vs-all algorithm. For example, with three classes (A, B, C) we train 3 classifiers, each separating one class from the rest (A vs. not-A, B vs. not-B, C vs. not-C). We then feed x into every classifier and pick the class whose classifier outputs the highest value (see the sketch below).
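A sketch of the prediction step, using a hypothetical all_theta matrix whose k-th row holds the fitted parameters of the k-th one-vs-all classifier:

probs = sigmoid(X * all_theta');   % m x K matrix; column k holds P(class k)
[~, p] = max(probs, [], 2);        % per row, the index of the most probable class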

For fitting, there are three possible situations:

  • Underfitting: the model does not fit the data well, so prediction accuracy is low.
  • Overfitting: the model fits the training data too closely, so it generalizes poorly, which hurts prediction accuracy.
  • Just right: the model fits the data well.

Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.

There are two main ways to address overfitting:

  • Reduce the number of features;
    • Manually select which features to keep.
    • Use a model selection algorithm (studied later in the course).
  • Regularization.
    • Keep all the features, but reduce the magnitude of parameters theta
    • Regularization works well when we have a lot of slightly useful features

Now let's talk about regularization.

Regularization adds an extra penalty term to the cost function to shrink the magnitude of theta and thereby avoid overfitting. As noted above, having too many features can cause overfitting. To counter this we reduce the influence of the features, i.e. we shrink the corresponding theta values appropriately; in the extreme case a theta of 0 means the corresponding feature plays no role at all.
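Written out, the regularized cost simply adds a penalty on the magnitudes of theta (by convention the intercept theta_0 is not penalized):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$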

How do we control the strength of regularization? With the lambda parameter. A suitable lambda can turn an overfit model into a well-fit one, but a lambda that is too large can also push overfitting all the way to underfitting, so choosing an appropriate lambda matters. The project below touches on this.

To work through the project, you need to master:
1. The hypothesis function, cost function, and gradient-descent update rule of logistic regression;
2. The gradient computation for logistic regression;
3. The hypothesis function, cost function, and gradient-descent update rule of regularized logistic regression (see the sketch below).
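For point 3, here is a possible costFunctionReg.m sketch matching the regularized formula above (again only a sketch; it assumes the sigmoid helper from earlier and skips the intercept when penalizing):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  % Vectorized regularized cost and gradient; theta(1) is not penalized.
  m = length(y);
  h = sigmoid(X * theta);
  theta_reg = [0; theta(2:end)];   % zero out the intercept term
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
      + (lambda / (2 * m)) * (theta_reg' * theta_reg);
  grad = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
end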

2. Logistic Regression

Main script:

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the exam scores and the third column
%  contains the label.

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);

%% ==================== Part 1: Plotting ====================
%  We start the exercise by first plotting the data to understand the
%  problem we are working with.

fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
         'indicating (y = 0) examples.\n']);

plotData(X, y);

% Put some labels 
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;


%% ============ Part 2: Compute Cost and Gradient ============
%  In this part of the exercise, you will implement the cost and gradient
%  for logistic regression. You need to complete the code in
%  costFunction.m

%  Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to x and X_test
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');

% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];
[cost, grad] = costFunction(test_theta, X, y);

fprintf('\nCost at test theta: %f\n', cost);
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', grad);