Machine Learning from Scratch, Study Notes (39): Softmax Regression

1. Logistic Regression

  In logistic regression, the predicted probability is obtained with the sigmoid function:

$$h_{\theta}(x^{(i)}) = \frac{1}{1+e^{-\theta^{T}x^{(i)}}}$$

The probabilities of the positive and negative class are then set as:

$$P(y^{(i)}=1\mid x^{(i)};\theta) = h_{\theta}(x^{(i)}), \qquad P(y^{(i)}=0\mid x^{(i)};\theta) = 1 - h_{\theta}(x^{(i)})$$

The loss function of LR can be written as:

$$\begin{aligned}
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log\big(1-h_{\theta}(x^{(i)})\big)\Big] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=0}^{1} \mathbb{1}\{y^{(i)}=j\}\log P(y^{(i)}=j\mid x^{(i)};\theta)
\end{aligned}$$

where $\mathbb{1}\{\cdot\}$ is the indicator function: it equals 1 when the statement inside the braces is true, and 0 otherwise.
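As a minimal sketch of the loss above (NumPy; the toy data and values here are purely illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + exp(-theta^T x))
    return 1.0 / (1.0 + np.exp(-z))

def lr_loss(theta, X, y):
    # Binary cross-entropy: J(theta) = -1/m * sum[y*log(h) + (1-y)*log(1-h)]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: 4 samples, 2 features (illustrative values only)
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
print(lr_loss(np.zeros(2), X, y))  # log(2) ≈ 0.6931, since h = 0.5 everywhere at theta = 0
```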

  That covers the core of logistic regression, which handles binary classification. For a multi-class task, logistic regression can train one classifier per class (one-vs-rest): the dataset is split into two parts, where samples of the target class get label 1 and all remaining samples get label 0. When there are many classes, this approach becomes unwieldy. Is there a model that handles multi-class classification directly? Besides building a neural network, the simplest choice is Softmax regression.
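The one-vs-rest label construction described above can be sketched as follows (NumPy; the label values are hypothetical):

```python
import numpy as np

# Multi-class labels for 6 samples over classes 0..2 (illustrative values only)
y = np.array([0, 1, 2, 1, 0, 2])

# One-vs-rest: for each class c, build a binary label vector where
# samples of class c get 1 and every other sample gets 0.
binary_targets = {c: (y == c).astype(int) for c in np.unique(y)}
print(binary_targets[1])  # [0 1 0 1 0 0]
```

Each of the k binary vectors would then train its own logistic regression classifier.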

2. Softmax Regression

  In softmax regression, the predicted probability is written as:

$$P(y^{(i)}=j\mid x^{(i)};\theta) = h_{\theta}(x^{(i)})_{j} = \frac{e^{\theta_{j}^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}}$$

where $\theta_{1},\dots,\theta_{k}$ are per-class parameter vectors and $h_{\theta}(x^{(i)})$ now outputs a $k$-dimensional vector, one probability per class.
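A sketch of the softmax probability above (NumPy; the toy parameters are illustrative, and subtracting the maximum score is a standard numerical-stability trick, not part of the formula):

```python
import numpy as np

def softmax_probs(Theta, x):
    # P(y=j|x) = exp(theta_j^T x) / sum_l exp(theta_l^T x)
    # Theta has shape (k, n): one parameter row theta_j per class.
    scores = Theta @ x                # (k,) vector of theta_j^T x
    scores -= scores.max()           # shift scores for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Toy example: k = 3 classes, n = 2 features (illustrative values only)
Theta = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
x = np.array([2.0, 1.0])
p = softmax_probs(Theta, x)
print(p, p.sum())  # the k probabilities sum to 1
```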
Following the form of the LR loss, the softmax loss function can be written as:

$$\begin{aligned}
J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbb{1}\{y^{(i)}=j\}\log P(y^{(i)}=j\mid x^{(i)};\theta) \\
&= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbb{1}\{y^{(i)}=j\}\log \frac{e^{\theta_{j}^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}} \\
&= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbb{1}\{y^{(i)}=j\}\Big[\log e^{\theta_{j}^{T}x^{(i)}}-\log \sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}\Big] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \mathbb{1}\{y^{(i)}=j\}\Big[\theta_{j}^{T}x^{(i)}-\log \sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}\Big] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}\Big[\mathbb{1}\{y^{(i)}=j\}\,\theta_{j}^{T}x^{(i)}-\mathbb{1}\{y^{(i)}=j\}\log \sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}\Big]
\end{aligned}$$
Taking the partial derivative of $J(\theta)$ with respect to $\theta_j$:

$$\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^{m}\Big[\mathbb{1}\{y^{(i)}=j\}x^{(i)}-\frac{x^{(i)}e^{\theta_{j}^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_{l}^{T}x^{(i)}}}\sum_{j'=1}^{k} \mathbb{1}\{y^{(i)}=j'\}\Big] \\
&= -\frac{1}{m}\sum_{i=1}^{m}\Big[\mathbb{1}\{y^{(i)}=j\}x^{(i)}-x^{(i)}P(y^{(i)}=j\mid x^{(i)};\theta)\Big] \\
&= -\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\Big[\mathbb{1}\{y^{(i)}=j\}-P(y^{(i)}=j\mid x^{(i)};\theta)\Big]
\end{aligned}$$

The second step uses $\sum_{j'=1}^{k}\mathbb{1}\{y^{(i)}=j'\}=1$, since each sample belongs to exactly one class.
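The gradient formula above can be verified numerically against finite differences; a sketch (NumPy, with made-up toy data):

```python
import numpy as np

def softmax_matrix(Theta, X):
    scores = X @ Theta.T                          # (m, k) matrix of theta_j^T x_i
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def loss(Theta, X, y):
    # J(theta) = -1/m * sum_i log P(y_i | x_i)
    P = softmax_matrix(Theta, X)
    m = X.shape[0]
    return -np.mean(np.log(P[np.arange(m), y]))

def grad(Theta, X, y):
    # dJ/dtheta_j = -1/m * sum_i x_i [1{y_i=j} - P(y_i=j|x_i)]
    m, k = X.shape[0], Theta.shape[0]
    P = softmax_matrix(Theta, X)
    Y = np.zeros((m, k))
    Y[np.arange(m), y] = 1.0                      # indicator 1{y_i = j}
    return -(Y - P).T @ X / m                     # (k, n); row j is dJ/dtheta_j

# Finite-difference check on one entry (illustrative data only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)); y = np.array([0, 1, 2, 1, 0])
Theta = rng.normal(size=(3, 3))
G = grad(Theta, X, y)
eps = 1e-6
Tp = Theta.copy(); Tp[1, 2] += eps
Tm = Theta.copy(); Tm[1, 2] -= eps
numeric = (loss(Tp, X, y) - loss(Tm, X, y)) / (2 * eps)
print(abs(G[1, 2] - numeric))  # difference should be tiny
```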
The parameters $\theta$ can then be learned by gradient descent:

$$\theta_j := \theta_j - \eta \frac{\partial J(\theta)}{\partial \theta_j}$$
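Putting the pieces together, a minimal gradient-descent loop for softmax regression (NumPy; the learning rate, step count, and synthetic clusters are illustrative choices, not from the notes):

```python
import numpy as np

def softmax(S):
    S = S - S.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(S)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, k, eta=0.5, steps=200):
    m, n = X.shape
    Theta = np.zeros((k, n))
    Y = np.zeros((m, k)); Y[np.arange(m), y] = 1.0  # one-hot 1{y_i = j}
    for _ in range(steps):
        P = softmax(X @ Theta.T)
        grad = -(Y - P).T @ X / m                   # dJ/dtheta_j from the derivation
        Theta -= eta * grad                         # theta_j := theta_j - eta * dJ/dtheta_j
    return Theta

# Three well-separated synthetic clusters (illustrative data)
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in centers])
y = np.repeat([0, 1, 2], 20)
Theta = train_softmax(X, y, k=3)
pred = softmax(X @ Theta.T).argmax(axis=1)
print((pred == y).mean())  # training accuracy, close to 1.0 on separable data
```

A bias term could be added by appending a constant-1 feature to each $x^{(i)}$, which is why none appears explicitly here.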
