All the code for this project is on my GitHub; anyone interested is welcome to discuss it with me~
Path: Machine-Learning/machine-learning-ex2/
1. Introduction
Logistic Regression is defined on Wikipedia as follows:
In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. This article covers the case of a binary dependent variable—that is, where it can take only two values, “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial logistic regression, or, if the multiple categories are ordered, in ordinal logistic regression.[2] In the terminology of economics, logistic regression is an example of a qualitative response/discrete choice model.
As I understand it, logistic regression is another data-fitting tool, one used to solve classification problems, i.e. problems where the output is discrete. Linear regression usually handles such problems poorly, so for classification we should choose logistic regression over linear regression.
Logistic regression comes in two main flavors:
- Logistic regression with two outcomes, usually just called logistic regression.
- Logistic regression with more than two outcomes, usually called multinomial logistic regression.
Any discussion of logistic regression has to mention the Decision Boundary, which separates data points of different classes. It, too, comes in two types:
- Linear decision boundaries
- Non-linear decision boundaries
Usually we obtain the final theta via gradient descent or some other optimization algorithm, and then use that theta to draw the decision boundary. Note: the decision boundary is a property of the hypothesis function, determined by its theta; it is not directly a property of the training set.
The hypothesis of logistic regression is built on the sigmoid function, which has the following characteristics:
- when x = 0, its value is 0.5;
- when x < 0, its value is < 0.5; when x > 0, its value is > 0.5;
- as x approaches negative infinity, its value approaches 0;
- as x approaches positive infinity, its value approaches 1;
- its shape resembles an S, hence the name S-shaped (sigmoid) function.
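The sigmoid can be sketched in a few lines (shown here in Python/NumPy purely for illustration; the exercise itself is written in Octave):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# The properties listed above:
print(sigmoid(0.0))          # 0.5 at z = 0
print(sigmoid(-10.0) < 0.5)  # True: values below 0.5 for z < 0
print(sigmoid(10.0) > 0.5)   # True: values above 0.5 for z > 0
```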
The value of the hypothesis represents the probability that the output is 1. We can set a threshold such as 0.5: when the value is >= 0.5, the output is 1; when the value is < 0.5, the output is 0.
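Since the sigmoid crosses 0.5 exactly at z = 0, with the hypothesis written as g applied to theta-transpose-x the threshold rule above is equivalent to predicting 1 whenever theta-transpose-x is non-negative, which is where the decision boundary comes from:

```latex
h_\theta(x) = g(\theta^T x) \ge 0.5 \iff \theta^T x \ge 0,
\qquad \text{decision boundary: } \theta^T x = 0
```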
As for the cost function: if we simply plug the hypothesis into the squared-error cost used for linear regression, the resulting function is non-convex, which severely hampers gradient descent. Hence the logarithmic form of the cost function. Because this is a classification problem, the cost has two cases (y = 1 and y = 0), but with a small trick the two can be merged into a single expression.
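The two cases and their merged form are:

```latex
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log\bigl(h_\theta(x)\bigr) & \text{if } y = 1 \\
-\log\bigl(1 - h_\theta(x)\bigr) & \text{if } y = 0
\end{cases}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\Bigl[ y^{(i)} \log h_\theta(x^{(i)})
     + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
```

The trick is that exactly one of the two terms survives for each example, since y is always 0 or 1.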
When writing the code, it is also convenient to express these formulas in vectorized form.
The gradient is simply the partial derivative of the cost function J(theta) with respect to theta.
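The vectorized cost and gradient can be sketched as follows (Python/NumPy for illustration; the exercise's costFunction.m implements the same formulas in Octave, and the toy data here is made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Vectorized logistic-regression cost J(theta) and gradient dJ/dtheta."""
    m = len(y)
    h = sigmoid(X @ theta)                              # hypothesis for all m examples at once
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m  # log-loss, both cases merged
    grad = X.T @ (h - y) / m                            # vector of partial derivatives
    return J, grad

# At theta = 0, h = 0.5 for every example, so J = -log(0.5) = 0.693 for any data set
X = np.array([[1.0, 34.6, 78.0], [1.0, 30.3, 43.9], [1.0, 35.8, 72.9]])
y = np.array([0.0, 0.0, 1.0])
J, grad = cost_function(np.zeros(3), X, y)
print(round(J, 3))  # 0.693
```

This matches the "Expected cost (approx): 0.693" check in the main script below.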
To handle classification with more than two outcomes, we use the one-vs-all (one-vs-rest) algorithm. For example, with three classes (A, B, C) we train three binary classifiers (A vs. rest, B vs. rest, C vs. rest). We then feed x into each classifier and pick the class whose classifier reports the highest probability.
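The prediction step of one-vs-all can be sketched like this (Python/NumPy for illustration; `all_theta` and the toy parameter values are my own assumptions, standing in for classifiers that would already have been trained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(all_theta, X):
    """all_theta: (num_classes, n+1) matrix, one row of parameters per binary classifier.
    X: (m, n+1) design matrix with the intercept column already added.
    Returns, for each example, the index of the most confident classifier."""
    probs = sigmoid(X @ all_theta.T)  # (m, num_classes) matrix of probabilities
    return np.argmax(probs, axis=1)   # pick the class with the highest probability

# Toy example: three hand-picked classifiers over a single feature
all_theta = np.array([[ 2.0, -3.0],   # class 0: likely when x is small
                      [-1.0,  0.5],   # class 1: middle range
                      [-6.0,  2.0]])  # class 2: likely when x is large
X = np.array([[1.0, 0.0], [1.0, 4.0]])
print(predict_one_vs_all(all_theta, X))  # [0 2]
```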
When fitting data there are three situations:
- Underfitting: the model does not fit the data well, so prediction accuracy is low.
- Overfitting: the model fits the training data too closely and generalizes poorly, which also hurts prediction accuracy.
- Ordinary: the model fits the data reasonably well.
Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
There are two main ways to address overfitting:
- Reduce the number of features:
  - Manually select which features to keep.
  - Use a model selection algorithm (studied later in the course).
- Regularization:
  - Keep all the features, but reduce the magnitude of the parameters theta.
  - Regularization works well when we have a lot of slightly useful features.
Now a few words about Regularization:
Regularization adds an extra penalty term to the cost to shrink the magnitude of theta and thereby avoid overfitting. As noted above, having too many features tends to cause overfitting. To counter this, we reduce the influence of the features, i.e. we shrink the corresponding theta values appropriately; in the extreme case a theta of 0 removes its feature from the model entirely.
How do we control the strength of regularization? With the parameter lambda. A suitable lambda can turn an overfit model into a well-fit one, but too large a lambda may turn overfitting into underfitting. Choosing an appropriate lambda therefore matters; the project below touches on this.
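With the penalty term added, the regularized cost becomes (note that by convention the intercept term is not penalized, so the penalty sum starts at j = 1):

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\Bigl[ y^{(i)} \log h_\theta(x^{(i)})
     + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
+ \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```

A larger lambda penalizes large theta values more heavily, which is exactly the over- vs. underfitting trade-off described above.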
What you need to master:
1. The hypothesis, cost function, and gradient-descent update rule of logistic regression;
2. The gradient computation for logistic regression;
3. The hypothesis, cost function, and gradient-descent update rule of regularized logistic regression.
2. Logistic Regression
Main script:
%% Initialization
clear ; close all; clc
%% Load Data
% The first two columns contains the exam scores and the third column
% contains the label.
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
%% ==================== Part 1: Plotting ====================
% We start the exercise by first plotting the data to understand the
% problem we are working with.
fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
'indicating (y = 0) examples.\n']);
plotData(X, y);
% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')
% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
%% ============ Part 2: Compute Cost and Gradient ============
% In this part of the exercise, you will implement the cost and gradient
% for logistic regression. You need to complete the code in
% costFunction.m
% Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);
% Add intercept term to x and X_test
X = [ones(m, 1) X];
% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);
% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);
fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');
% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];