Chapter 2 Logistic Regression
1 Logistic Regression
- Logistic regression is a classification algorithm, despite the word "regression" in its name
1.1 Differences between Logistic Regression and Linear Regression
| Logistic Regression | Linear Regression |
| --- | --- |
| classification algorithm | regression algorithm |
| $0 \le h_\theta(x) \le 1$ | $h_\theta(x)$ can be $>1$ or $<0$ |
1.2 Model
- Hypothesis:
$$
\begin{aligned}
h_\theta(x)&=P(y=1\mid x;\theta)=g(\theta^Tx)&&\text{【estimated probability that $y=1$, given $x$, parameterized by $\theta$】}\\
g(z)&=\frac{1}{1+e^{-z}}&&\text{【Sigmoid Function / Logistic Function】}
\end{aligned}
$$
Suppose we predict:
- "$y=1$" if $h_\theta(x)\ge 0.5$ (equivalently, $\theta^Tx\ge 0$)
- "$y=0$" if $h_\theta(x)<0.5$ (equivalently, $\theta^Tx<0$)
```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1 / (1 + np.exp(-z))
```
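A minimal companion sketch of the 0.5 threshold rule above; the helper name `predict` is an illustrative assumption, not from the original notes:

```python
def predict(theta, X):
    # Predict y = 1 exactly when h_theta(x) >= 0.5, i.e. theta^T x >= 0
    return (sigmoid(X @ theta) >= 0.5).astype(int)
```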
- Parameters: $\theta$
- Decision Boundary: the boundary between classes; it is a property of the hypothesis and its parameters, not of the training set. For example, with $\theta=(-3,1,1)^T$ the boundary is the line $x_1+x_2=3$: predict $y=1$ whenever $x_1+x_2\ge 3$.
- Cost Function:
The squared-error cost function used in linear regression would be non-convex with the sigmoid hypothesis, so logistic regression uses a log cost instead:
$$
\begin{aligned}
J(\theta)&=\frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})\\
\mathrm{Cost}(h_\theta(x),y)&=\begin{cases}-\log(h_\theta(x)),&\text{if $y=1$}\\-\log(1-h_\theta(x)),&\text{if $y=0$}\end{cases}\\
\text{equivalently: }\mathrm{Cost}(h_\theta(x),y)&=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))&&\text{【$y=1$ or $0$】}
\end{aligned}
$$
```python
import numpy as np

def cost(theta, X, y):
    # Cross-entropy cost J(theta), averaged over the m training examples
    h = sigmoid(X @ theta)                       # predicted probabilities, shape (m,)
    first = np.multiply(-y, np.log(h))           # contributes when y = 1
    second = np.multiply(1 - y, np.log(1 - h))   # contributes when y = 0
    return np.sum(first - second) / len(X)
```
- Goal (objective function): $\mathop{\text{minimize}}\limits_{\theta} J(\theta)$
1.3 Gradient Descent for $J(\theta)$
- Repeat {
  $$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)\cdot x_j^{(i)}$$
  (simultaneously update all $\theta_j$)
  }
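A minimal vectorized sketch of this update rule, reusing the `sigmoid` helper above; the function name `gradient_descent` and the default values for `alpha` and `num_iters` are illustrative assumptions:

```python
import numpy as np

def gradient_descent(theta, X, y, alpha=0.1, num_iters=1000):
    # Batch gradient descent: one vectorized step per iteration updates
    # all theta_j simultaneously, as the update rule above requires.
    m = len(X)
    for _ in range(num_iters):
        error = sigmoid(X @ theta) - y      # h_theta(x^(i)) - y^(i), shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)
    return theta
```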
1.4 Advanced Optimization (alternatives to Gradient Descent)
- Optimization Algorithms:
  - Conjugate gradient
  - BFGS (Broyden–Fletcher–Goldfarb–Shanno)
  - L-BFGS (limited-memory BFGS)
- Advantages:
  - No need to manually pick the learning rate $\alpha$
  - Often faster than gradient descent
- Disadvantages: more complex
Octave code:

```octave
function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
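For comparison, a Python sketch of the same idea using `scipy.optimize.minimize` with L-BFGS, reusing the `sigmoid` and `cost` helpers above; the `gradient` helper and the toy data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def gradient(theta, X, y):
    # Vectorized gradient of J(theta): (1/m) * X^T (h_theta(x) - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

# Toy data: a bias column of ones plus one feature
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 0., 1., 1., 1.])

# L-BFGS minimizes J(theta) without a manually chosen learning rate
result = minimize(cost, x0=np.zeros(X.shape[1]), args=(X, y),
                  jac=gradient, method='L-BFGS-B')
optTheta = result.x
```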
2 Multi-class Classification: One-vs-all
- One-versus-all Classification / One-versus-rest
- Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y=i$
- On a new input $x$, predict the class $i$ that maximizes $h_\theta^{(i)}(x)$, i.e. $\mathop{\text{max}}\limits_{i} h_\theta^{(i)}(x)$
- $n$ classes require $n$ classifiers
That is, when solving a multi-class problem, each classifier singles out one class A and treats all the remaining classes as a single class B, as in the sketch below.
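A minimal one-vs-all sketch built on the `sigmoid` and `gradient_descent` helpers above; the function names `one_vs_all` and `predict_one_vs_all` are illustrative assumptions:

```python
import numpy as np

def one_vs_all(X, y, num_labels, alpha=0.1, num_iters=1000):
    # Train one logistic-regression classifier per class i, relabeling
    # y as 1 for class i (class A) and 0 for everything else (class B).
    all_theta = np.zeros((num_labels, X.shape[1]))
    for i in range(num_labels):
        y_i = (y == i).astype(float)
        all_theta[i] = gradient_descent(np.zeros(X.shape[1]), X, y_i,
                                        alpha, num_iters)
    return all_theta

def predict_one_vs_all(all_theta, X):
    # For each input x, pick the class i whose classifier outputs
    # the largest h_theta^(i)(x)
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```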
3 References
- Andrew Ng (吴恩达), Machine Learning, Coursera
- Huang Haiguang (黄海广), Machine Learning Notes (机器学习笔记)