Coursera - Andrew Ng - Machine Learning Study Notes (6) [Week 3: Logistic Regression]

痞靥

Category: Machine Learning. Tags: logistic regression

Article link: https://blog.csdn.net/u012347642/article/details/80387022

Binary classification problem:

$y\in\{0,1\}\qquad \begin{cases} 0: & \text{Negative Class} \\ 1: & \text{Positive Class} \end{cases}$
Applying linear regression to the binary classification problem:
Hypothesis function: $h_\theta(x)=\theta^Tx$
Threshold the classifier output $h_\theta(x)$ at $0.5$:
if $h_\theta(x)\ge 0.5$, predict $y=1$;
if $h_\theta(x)\lt 0.5$, predict $y=0$.

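In a classification problem $y$ is either $0$ or $1$, but the linear hypothesis $h_\theta(x)$ can be $\gt 1$ or $\lt 0$.

Because we want $0\le h_\theta(x)\le 1$, we introduce the logistic regression algorithm.
(Note: a regression algorithm can be applied to classification only by ignoring the fact that $y$ is discrete-valued.)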
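To make the range problem concrete, here is a minimal Octave sketch that evaluates the thresholded linear hypothesis on a few inputs. The parameter vector theta and the inputs in X are made-up values chosen only for illustration; they do not come from the course.

```octave
% Linear hypothesis h = theta' * x on hypothetical data (x_0 = 1 is the intercept term).
theta = [-0.5; 0.25];          % assumed parameters, for illustration only
X = [1 0; 1 2; 1 4; 1 8];      % each row is one example [x_0, x_1]

h = X * theta;                 % linear outputs: these can fall outside [0, 1]
pred = h >= 0.5;               % threshold at 0.5 to get a 0/1 prediction

disp([h, pred]);               % first column: h, second column: predicted y
```

With these values the first output is negative and the last one exceeds 1, so $h_\theta(x)$ cannot be read as a probability even though the thresholded predictions are still 0 or 1.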

Hypothesis function:

$h_\theta(x)=g(\theta^Tx)$

The function $g$ is

$g(z) = \frac{1}{1+e^{-z}}$

and is called the sigmoid function or logistic function.
The hypothesis function of logistic regression is therefore:

$h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$
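The two formulas above translate almost directly into Octave. The sketch below is one possible way to write them; the names sigmoid and hypothesis, and the sample values of theta and x, are assumptions made for illustration.

```octave
% Sigmoid (logistic) function, applied element-wise so z may be a scalar, vector, or matrix.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Logistic regression hypothesis h_theta(x) = g(theta' * x).
hypothesis = @(theta, x) sigmoid(theta' * x);

theta = [-1; 2];               % assumed parameters, for illustration only
x = [1; 0.5];                  % x_0 = 1, x_1 = 0.5, so theta' * x = 0

disp(sigmoid([-10 0 10]));     % close to 0, exactly 0.5, close to 1
disp(hypothesis(theta, x));    % g(0) = 0.5
```

Whatever the value of $z$, the output stays strictly between 0 and 1, which is the property the linear hypothesis lacked.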

$h_\theta(x)$ estimates the probability that $y=1$ given the input features $x$.
Formally:

$h_\theta(x)=P(y=1|x;\theta)=1-P(y=0|x;\theta)$
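As a small worked example (the value $0.7$ is chosen purely for illustration), suppose the hypothesis outputs $0.7$ for some input $x$:

$h_\theta(x)=0.7\ \Rightarrow\ P(y=1|x;\theta)=0.7,\qquad P(y=0|x;\theta)=1-0.7=0.3$

The two probabilities always sum to $1$ because $y$ can only be $0$ or $1$.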
Decision boundary of logistic regression:

$h_\theta(x)=0.5$, or equivalently $z=0$, where $z=\theta^Tx$.

When $h_\theta(x)\ge 0.5$, i.e. $z\ge 0$, predict $y=1$;
when $h_\theta(x)\lt 0.5$, i.e. $z\lt 0$, predict $y=0$.
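The equivalence between thresholding $h_\theta(x)$ at $0.5$ and checking the sign of $z$ can be seen in a short Octave sketch. The parameters below are assumed values, picked so that the boundary is the line $x_1+x_2=3$.

```octave
% Decision boundary sketch: predict y = 1 exactly when z = theta' * x >= 0.
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-3; 1; 1];            % assumed parameters: boundary is the line x_1 + x_2 = 3
X = [1 1 1;                    % z = -1, below the boundary
     1 2 1;                    % z =  0, exactly on the boundary
     1 3 2];                   % z =  2, above the boundary

z = X * theta;
h = sigmoid(z);

predFromH = h >= 0.5;          % thresholding the probability ...
predFromZ = z >= 0;            % ... gives the same answer as checking the sign of z

disp([z, h, predFromH, predFromZ]);
```

The boundary is a property of $\theta$ alone: the training data is used to fit $\theta$, but once $\theta$ is fixed the boundary is fixed too.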

Given a training set: $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\cdots,(x^{(m)},y^{(m)})\}$
where

$x=\left[ \begin{matrix} x_0\\ x_1\\ \vdots\\ x_n \end{matrix} \right]\in\mathbb{R}^{n+1},\qquad x_0=1,\ y\in \{0,1\}$

and the hypothesis function is:

$h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$

the cost function of logistic regression is:

$J(\theta)={1\over m} \sum_{i=1}^m \text{Cost}(h_\theta(x^{(i)}),y^{(i)})$

where:

$\text{Cost}(h_\theta(x),y)=\begin{cases} -\log(h_\theta(x)) & \text{if } y=1\\ -\log(1-h_\theta(x)) & \text{if } y=0 \end{cases}$

When $y=1$: if $h_\theta(x)=1$ then $\text{Cost}=0$, and as $h_\theta(x)\to 0$, $\text{Cost}\to\infty$;
when $y=0$: if $h_\theta(x)=0$ then $\text{Cost}=0$, and as $h_\theta(x)\to 1$, $\text{Cost}\to\infty$.
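The behaviour of the two branches can be checked numerically. The following Octave sketch uses a few assumed hypothesis outputs (not taken from the course) and evaluates both branches of the cost.

```octave
% Per-example cost: -log(h) when y = 1, -log(1 - h) when y = 0.
h = [0.99 0.5 0.01];           % assumed hypothesis outputs, for illustration only

costIfY1 = -log(h);            % small when h is near 1, large when h is near 0
costIfY0 = -log(1 - h);        % small when h is near 0, large when h is near 1

disp([h; costIfY1; costIfY0]);

% At the extremes: zero cost for a perfect prediction,
% infinite cost for a fully confident wrong one.
perfectCost = -log(1);
worstCost   = -log(0);
disp([perfectCost, worstCost]);
```

A confident wrong prediction is penalized far more heavily than an uncertain one, which is what pushes the learned probabilities toward the correct labels.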

$\text{Cost}(h_\theta(x),y)$ can be written as a single expression:

$\text{Cost}(h_\theta(x),y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))$

The cost function then becomes:

$\begin{array}{rcl} J(\theta) &=&{1\over m} \sum_{i=1}^m \text{Cost}(h_\theta(x^{(i)}),y^{(i)})\\ &=&-{1\over m}\sum_{i=1}^m\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right] \end{array}$
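To see why the single expression is equivalent to the two-case definition, note that one of the factors $y$ and $(1-y)$ is always zero. A minimal Octave sketch, with assumed predictions and labels, checks this and then averages the per-example costs to get $J(\theta)$.

```octave
% Combined cost -y*log(h) - (1-y)*log(1-h) versus the two-case definition.
cost = @(h, y) -y .* log(h) - (1 - y) .* log(1 - h);

h = 0.8;                                          % assumed hypothesis output
assert(abs(cost(h, 1) - (-log(h)))     < 1e-12);  % y = 1 keeps only -log(h)
assert(abs(cost(h, 0) - (-log(1 - h))) < 1e-12);  % y = 0 keeps only -log(1 - h)

% J(theta) is the mean of the per-example costs.
hAll = [0.9; 0.2; 0.6];        % assumed predictions for m = 3 examples
yAll = [1; 0; 1];              % assumed labels
J = mean(cost(hAll, yAll));
disp(J);
```

One caveat of this literal formula: if a prediction is exactly 0 or 1, the term 0 * log(0) evaluates to NaN in floating point, so practical code usually keeps $h$ strictly inside $(0,1)$.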
In vectorized form:

$h=g(X\theta)$

$J(\theta)={1\over m}\left(-y^T\log(h)-(1-y)^T\log(1-h)\right)$

Gradient descent for finding $\theta$:
Repeat {

$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial \theta_j}J(\theta)\qquad (\text{update all } \theta_j \text{ simultaneously})$
}
That is:
Repeat {

$\theta_j:=\theta_j-\alpha{1\over m} \sum\limits_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}\qquad (\text{update all } \theta_j \text{ simultaneously})$
}
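The step from the generic rule to the explicit sum is not spelled out in these notes. One way to reconstruct it relies on the sigmoid identity $g'(z)=g(z)(1-g(z))$. Writing $h=h_\theta(x)=g(\theta^Tx)$, so that $\frac{\partial h}{\partial\theta_j}=h(1-h)x_j$:

$\begin{array}{rcl} \frac{\partial}{\partial\theta_j}\text{Cost}(h,y) &=& -\frac{y}{h}\,h(1-h)\,x_j+\frac{1-y}{1-h}\,h(1-h)\,x_j\\ &=& -y(1-h)\,x_j+(1-y)\,h\,x_j\\ &=& (h-y)\,x_j \end{array}$

Averaging over the $m$ training examples gives $\frac{\partial}{\partial\theta_j}J(\theta)={1\over m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$, which has the same form as the linear regression update but with a different $h_\theta(x)$.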
In vectorized form:

$\theta:=\theta-{\alpha\over m}X^T(g(X\theta)-y)$
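The vectorized cost and update rule can be put together in a few lines of Octave. This is a minimal sketch under stated assumptions: the data set, the learning rate alpha, and the iteration count are all made up for illustration, and the helper names sigmoid and costJ are hypothetical.

```octave
% Batch gradient descent for logistic regression, fully vectorized.
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) (1 / numel(y)) * (-y' * log(sigmoid(X * theta)) ...
                                         - (1 - y)' * log(1 - sigmoid(X * theta)));

% Tiny assumed data set: first column is x_0 = 1; the labels are not perfectly separable.
X = [1 0.5; 1 1.5; 1 2.0; 1 3.0; 1 3.5; 1 4.5];
y = [0; 0; 1; 0; 1; 1];
m = numel(y);

alpha = 0.1;                   % assumed learning rate
theta = zeros(2, 1);
Jstart = costJ(theta, X, y);   % equals log(2) when theta is all zeros

for iter = 1:2000
  % theta := theta - (alpha / m) * X' * (g(X * theta) - y), updating every theta_j at once
  theta = theta - (alpha / m) * X' * (sigmoid(X * theta) - y);
end

Jend = costJ(theta, X, y);
fprintf('J went from %.4f to %.4f\n', Jstart, Jend);
```

Because the whole vector theta is replaced in a single assignment, the simultaneous-update requirement is satisfied automatically.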
To minimize $J(\theta)$ with gradient descent, we need to compute $J(\theta)$ and $\frac{\partial}{\partial \theta_j}J(\theta)$.
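Besides gradient descent, there are other methods for finding $\theta$:

Conjugate gradient;
$BFGS$ (a variable-metric, quasi-Newton method);
$L$-$BFGS$ (limited-memory $BFGS$).

Advantages of these three methods:

No need to pick the learning rate $\alpha$ manually;
They often converge faster than gradient descent.

Disadvantage: they are more complex.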

Example: $\theta=\left[ \begin{matrix} \theta_1\\ \theta_2 \end{matrix} \right]$
$J(\theta)=(\theta_1-5)^2+(\theta_2-5)^2$
$\frac{\partial}{\partial \theta_1}J(\theta)=2(\theta_1-5)$
$\frac{\partial}{\partial \theta_2}J(\theta)=2(\theta_2-5)$
This can be implemented as follows:

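% costFunction.m: returns the cost jVal and its gradient at theta
function [jVal, gradient] = costFunction(theta)

jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
gradient = zeros(2,1);
gradient(1) = 2*(theta(1)-5);
gradient(2) = 2*(theta(2)-5);

% Run from the Octave prompt or a separate script.
% 'GradObj','on' tells fminunc that costFunction also returns the gradient;
% 'MaxIter' is a number (at most 100 iterations), not a string.
options = optimset('GradObj','on','MaxIter',100);
initialTheta = zeros(2,1);
[OptTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);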
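As a sanity check on what the optimizer should find: this $J(\theta)$ is zero at $\theta_1=\theta_2=5$ and positive everywhere else. The sketch below reaches the same point with plain gradient descent rather than fminunc; the learning rate and iteration count are assumptions chosen for illustration.

```octave
% Plain gradient descent on J(theta) = (theta_1 - 5)^2 + (theta_2 - 5)^2.
J    = @(theta) (theta(1) - 5)^2 + (theta(2) - 5)^2;
grad = @(theta) [2 * (theta(1) - 5); 2 * (theta(2) - 5)];

alpha = 0.1;                   % assumed learning rate
theta = zeros(2, 1);           % start from the origin
for iter = 1:100
  theta = theta - alpha * grad(theta);
end

disp(theta');                  % close to [5 5]
disp(J(theta));                % close to 0
```

With this step size each iteration shrinks the distance to the minimum by a factor of 0.8, so 100 iterations are more than enough here. An optimizer such as fminunc chooses its own steps and needs no alpha at all.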

General program template for computing $J(\theta)$ and $\frac{\partial}{\partial \theta_j}J(\theta)$, as needed by gradient descent and by the other optimization methods:
$\text{theta}=\left[ \begin{matrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{matrix} \right]$

function [jVal, gradient] = costFunction(theta)

jVal = [code to compute $J(\theta)$];
gradient(1) = [code to compute $\frac{\partial}{\partial \theta_0}J(\theta)$];
gradient(2) = [code to compute $\frac{\partial}{\partial \theta_1}J(\theta)$];
$\qquad \vdots$
gradient(n+1) = [code to compute $\frac{\partial}{\partial \theta_n}J(\theta)$];

Because Octave indices start at 1, gradient(j+1) holds the partial derivative with respect to $\theta_j$.
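One way to fill in this template for logistic regression is sketched below. The function name logisticCost, the extra arguments X and y, and the sample data are assumptions made for illustration, not part of the original notes. The loop over j is written out explicitly to show the index shift between $\theta_j$ and gradient(j+1).

```octave
1;  % Octave idiom: a leading statement makes this file a script, so a function can be defined inline

function [jVal, gradient] = logisticCost(theta, X, y)
  % Hypothetical helper following the template, specialized to logistic regression.
  m = numel(y);
  n = numel(theta) - 1;
  h = 1 ./ (1 + exp(-X * theta));                 % h_theta(x^(i)) for every example i
  jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
  gradient = zeros(n + 1, 1);
  for j = 0:n
    % indices start at 1, so gradient(j + 1) holds dJ/dtheta_j
    gradient(j + 1) = (1 / m) * sum((h - y) .* X(:, j + 1));
  end
end

X = [1 1; 1 2; 1 3];           % assumed data, first column is x_0 = 1
y = [0; 0; 1];
[jVal, gradient] = logisticCost(zeros(2, 1), X, y);
disp(jVal);                    % log(2) at theta = 0, since every h is 0.5
disp(gradient');
```

Since an optimizer expects a function of theta only, the data can be bound with an anonymous function such as @(t) logisticCost(t, X, y) before passing it on.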

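A multiclass classification problem can be decomposed into several binary classification problems (one-vs-all).
That is:

$h_\theta^{(i)}(x)=P(y=i|x;\theta)\qquad (i=1,2,\cdots,n)$

$\text{prediction}=\max_i\ h_\theta^{(i)}(x)$

In other words, train one binary classifier per class and predict the class $i$ whose classifier outputs the highest probability.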
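The prediction step can be sketched in Octave as follows. The matrix allTheta, holding one assumed parameter vector per class, and the input x are hypothetical values used only to show the mechanics; in practice each row would come from training a separate binary classifier.

```octave
% One-vs-all prediction: evaluate every binary classifier and keep the most confident one.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Assumed parameters for 3 classes; row i holds the theta of classifier i.
allTheta = [ 2 -1  0;
            -1  1 -1;
            -3  0  1];
x = [1; 2; 4];                 % one input, with x_0 = 1

probs = sigmoid(allTheta * x); % probs(i) estimates P(y = i | x; theta)
[pMax, prediction] = max(probs);

disp(probs');
disp(prediction);              % index of the class with the highest probability
```

The second output of max is the index of the largest entry, which is the predicted class label. The probabilities from the separate classifiers need not sum to 1.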
