[深度之眼 Machine Learning Training Camp, Session 4] Logistic Regression

Basic Concepts

Logistic regression (also known as logit regression) can be used to solve both binary and multi-class classification problems. In classification, the outputs are no longer continuous values but discrete ones, i.e. the label space is $\mathcal{Y} = \{0, 1, 2, \cdots\}$. Taking binary classification as an example, the label space is typically $\mathcal{Y} = \{0, 1\}$.

To handle binary classification, logistic regression builds on linear regression by introducing the Sigmoid function (logistic function), where $\exp(\cdot)$ is the natural exponential:

$$g(z) = \dfrac{1}{1 + \exp(-z)}$$
The range of this function is $(0, 1)$, as shown in the figure below:

[Figure: the Sigmoid function curve]
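As a quick sanity check, the function is a one-liner in NumPy (a minimal sketch; the name `sigmoid` is my own):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# The output approaches 0 for large negative z, approaches 1 for large
# positive z, and equals exactly 0.5 at z = 0.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [4.54e-05, 0.5, 0.99995]
```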
The hypothesis of logistic regression is therefore defined as:

$$h_\theta(x) = g(\theta^T x)$$

In fact, $h_\theta(x)$ gives the probability that the label $y = 1$, given the parameters $\theta$ and a sample $x$:

$$\begin{aligned} & h_\theta(x) = P(y=1 \mid x; \theta) = 1 - P(y=0 \mid x; \theta) \\ & P(y=0 \mid x; \theta) + P(y=1 \mid x; \theta) = 1 \end{aligned}$$
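In code, the hypothesis is just the Sigmoid applied to a linear score. A minimal sketch, assuming a conventional 0.5 threshold for turning probabilities into hard labels (all names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """h_theta(x) = g(theta^T x) for each row x of X: P(y=1 | x; theta)."""
    return sigmoid(X @ theta)

def predict(theta, X, threshold=0.5):
    """Predict label 1 when P(y=1 | x; theta) >= threshold."""
    return (h(theta, X) >= threshold).astype(int)

theta = np.array([0.5, -0.25])
X = np.array([[1.0, 2.0], [3.0, 1.0]])  # two samples, two features
print(h(theta, X))        # class-1 probabilities, ≈ [0.5, 0.777]
print(predict(theta, X))  # hard 0/1 labels
```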

Loss Function

The loss function of logistic regression is:

$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$

$$\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \begin{cases} -\log(h_\theta(x^{(i)})) & \text{if } y^{(i)} = 1 \\ -\log(1 - h_\theta(x^{(i)})) & \text{if } y^{(i)} = 0 \end{cases}$$
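The two branches can be evaluated directly; a small illustrative sketch (the helper name is my own):

```python
import numpy as np

def cost_per_sample(h_xi, y_i):
    """Per-sample cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h_xi) if y_i == 1 else -np.log(1.0 - h_xi)

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(cost_per_sample(0.9, 1))  # ≈ 0.105
print(cost_per_sample(0.9, 0))  # ≈ 2.303
```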
This loss function is derived by maximum likelihood. For a given input set $\mathcal{X}$ and output set $\mathcal{Y}$, the likelihood is:

$$\prod_{i=1}^{n} \left[ h_\theta(x^{(i)}) \right]^{y^{(i)}} \left[ 1 - h_\theta(x^{(i)}) \right]^{1 - y^{(i)}}$$

Because a product is hard to optimize, we take the logarithm to turn it into a sum; averaging over the $n$ samples (which does not change the maximizer) gives the log-likelihood:

$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
Maximizing this log-likelihood yields the optimal parameters $\theta$. Since maximizing $L(\theta)$ is equivalent to minimizing $-L(\theta)$, we obtain the loss function:

$$J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
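Vectorized over all samples, $J(\theta)$ is a single expression. A minimal sketch, with a small epsilon clip added as a numerical safeguard against $\log(0)$ (the clip is not part of the formula itself):

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """J(theta) = -(1/n) * sum[y*log(p) + (1-y)*log(1-p)].

    y: 0/1 labels; p: predicted probabilities h_theta(x^(i)).
    eps keeps p away from 0 and 1 so the logs stay finite.
    """
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([1, 0, 1])
p = np.array([0.8, 0.3, 0.6])
print(log_loss(y, p))  # ≈ 0.3635
```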

Parameter Learning

With the loss function in hand, we use gradient descent to find its minimum. First, simplify the loss, substituting $h_\theta(x^{(i)}) = \frac{\exp(\theta \cdot x^{(i)})}{1 + \exp(\theta \cdot x^{(i)})}$:

$$\begin{aligned} J(\theta) &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] \\ &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log \frac{h_\theta(x^{(i)})}{1 - h_\theta(x^{(i)})} + \log(1 - h_\theta(x^{(i)})) \right] \\ &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log \frac{\exp(\theta \cdot x^{(i)}) / (1 + \exp(\theta \cdot x^{(i)}))}{1 / (1 + \exp(\theta \cdot x^{(i)}))} + \log \frac{1}{1 + \exp(\theta \cdot x^{(i)})} \right] \\ &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} (\theta \cdot x^{(i)}) - \log(1 + \exp(\theta \cdot x^{(i)})) \right] \end{aligned}$$
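The last line can be checked numerically against the original form of $J(\theta)$; a quick sketch on random data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5)
theta = rng.normal(size=3)

z = X @ theta                 # theta . x^(i) for every sample
h = 1.0 / (1.0 + np.exp(-z))  # h_theta(x^(i))

# Original form of J(theta) ...
J_orig = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
# ... and the simplified form derived above (note the minus sign).
J_simplified = -np.mean(y * z - np.log(1.0 + np.exp(z)))

print(np.isclose(J_orig, J_simplified))  # True
```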

Take the partial derivative of the loss $J(\theta)$ with respect to the parameters $\theta$:

$$\begin{aligned} \frac{\partial}{\partial \theta} J(\theta) &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \cdot x^{(i)} - \frac{1}{1 + \exp(\theta \cdot x^{(i)})} \cdot \exp(\theta \cdot x^{(i)}) \cdot x^{(i)} \right] \\ &= -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \cdot x^{(i)} - \frac{\exp(\theta \cdot x^{(i)})}{1 + \exp(\theta \cdot x^{(i)})} \cdot x^{(i)} \right] \\ &= -\frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - \frac{\exp(\theta \cdot x^{(i)})}{1 + \exp(\theta \cdot x^{(i)})} \right) x^{(i)} \\ &= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \end{aligned}$$
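In vectorized form the gradient is $\frac{1}{n} X^\top (h_\theta(X) - y)$. A sketch that also verifies the derivation against a finite-difference approximation (names are my own):

```python
import numpy as np

def gradient(theta, X, y):
    """Gradient of J(theta): (1/n) * X^T (h_theta(X) - y)."""
    n = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (h - y) / n

# Finite-difference check of the analytic gradient on random data.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
y = rng.integers(0, 2, size=6).astype(float)
theta = rng.normal(size=3)

def J(t):
    h = 1.0 / (1.0 + np.exp(-(X @ t)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

eps = 1e-6
num_grad = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.allclose(gradient(theta, X, y), num_grad))  # True
```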

Finally, update each parameter by gradient descent (with learning rate $\alpha$):

$$\theta_j := \theta_j - \frac{\alpha}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
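Putting the pieces together gives a complete trainer. A minimal sketch of batch gradient descent on a toy 1-D dataset; the learning rate, iteration count, and data are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent on J(theta); X should include a bias column."""
    theta = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / n  # the gradient derived above
        theta -= alpha * grad                      # the update rule above
    return theta

# Toy 1-D problem: label 1 when the feature exceeds roughly 2.
x_raw = np.array([0.5, 1.0, 1.5, 2.5, 3.0, 3.5])
X = np.column_stack([np.ones_like(x_raw), x_raw])  # prepend a bias term
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = fit_logistic(X, y)
print(sigmoid(X @ theta).round(2))  # probabilities increase with the feature
```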
