Logistic Regression

1. Logistic Regression Formulas

Hypothesis function: $\hat{y} = h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$, where $g(z) = \frac{1}{1+e^{-z}}$ is the sigmoid function.

Cross-entropy loss function (single sample): $L(\hat{y}, y) = -y \log \hat{y} - (1-y) \log (1-\hat{y})$

  • $y$: the sample's label, 1 for the positive class and 0 for the negative class
  • $\hat{y}$: the probability that the sample is predicted positive; $1-\hat{y}$: the probability that it is predicted negative (loss function - reference link)

Cost function (the average loss over all $m$ samples):
$$
\begin{aligned}
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right) \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \hat{y}^{(i)} - \left(1-y^{(i)}\right) \log \left(1-\hat{y}^{(i)}\right)\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \frac{1}{1+e^{-\theta^T x^{(i)}}} - \left(1-y^{(i)}\right) \log \left(1-\frac{1}{1+e^{-\theta^T x^{(i)}}}\right)\right]
\end{aligned}
$$
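A direct NumPy translation of this cost, as a minimal sketch (the `eps` clipping is my addition to avoid $\log(0)$; all names are illustrative, not from the original post):

```python
import numpy as np

def cost(theta, X, y, eps=1e-12):
    """Average cross-entropy cost J(theta); X is (m, n), y is (m,) of 0/1 labels."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # hypothesis h_theta(x) per sample
    h = np.clip(h, eps, 1.0 - eps)           # guard against log(0)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))
```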

Its partial derivative with respect to $\theta_j$:
$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \frac{1}{1+e^{-\theta^T x^{(i)}}} - \left(1-y^{(i)}\right) \log \left(1-\frac{1}{1+e^{-\theta^T x^{(i)}}}\right)\right] \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(1+e^{-\theta^T x^{(i)}}\right) + \left(1-y^{(i)}\right) \log \left(1+e^{\theta^T x^{(i)}}\right)\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \frac{-x_j^{(i)} e^{-\theta^T x^{(i)}}}{1+e^{-\theta^T x^{(i)}}} + \left(1-y^{(i)}\right) \frac{x_j^{(i)} e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}}\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \frac{-x_j^{(i)}}{1+e^{\theta^T x^{(i)}}} + \left(1-y^{(i)}\right) \frac{x_j^{(i)} e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}}\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[\frac{-x_j^{(i)} y^{(i)} + x_j^{(i)} e^{\theta^T x^{(i)}} - y^{(i)} x_j^{(i)} e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}}\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[\frac{-y^{(i)}\left(1+e^{\theta^T x^{(i)}}\right) + e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}} x_j^{(i)}\right]
 = \frac{1}{m} \sum_{i=1}^{m}\left[\left(-y^{(i)} + \frac{e^{\theta^T x^{(i)}}}{1+e^{\theta^T x^{(i)}}}\right) x_j^{(i)}\right] \\
&= \frac{1}{m} \sum_{i=1}^{m}\left[\left(-y^{(i)} + \frac{1}{1+e^{-\theta^T x^{(i)}}}\right) x_j^{(i)}\right]
 = \frac{1}{m} \sum_{i=1}^{m}\left[\left(h_\theta\left(x^{(i)}\right) - y^{(i)}\right) x_j^{(i)}\right]
\end{aligned}
$$

  • Note: $h_\theta\left(x^{(i)}\right) = \frac{1}{1+e^{-\theta^T x^{(i)}}}$, which is not the same hypothesis as in linear regression
  • Update rule: $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left[\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}\right]$, the same form as the linear regression update; the minus sign appears because we step against the gradient (a runnable sketch follows this list)
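Here is a minimal NumPy sketch of this training loop, assuming batch gradient descent over the whole dataset; `gradient_descent`, `sigmoid`, and the parameter defaults are my own illustrative choices, not from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (m, n) design matrix whose first column is the bias feature x0 = 1.
    y: (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)        # h_theta(x^(i)) for all m samples at once
        grad = (X.T @ (h - y)) / m    # (1/m) * sum_i (h - y) * x_j, for every j
        theta -= alpha * grad         # theta_j := theta_j - alpha * gradient_j
    return theta
```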

2. A Worked Gradient Descent Example

  Plug the logistic regression update rule above, $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left[\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}\right]$, into a concrete computation. The table below has two features $x_1, x_2$ and one output value $y$; following the hypothesis function, we introduce a feature $x_0$ whose value is always 1, so the feature count is $n=3$. The table has 2 rows of data, so the sample count is $m=2$.

| Added feature $x_0$ | House area $x_1$ | House orientation $x_2$ | Class $y$ |
|---|---|---|---|
| 1 | 200 | 1 | 1 |
| 1 | 120 | 2 | 0 |

    Hypothesis: $h_\theta\left(x^{(i)}\right) = \frac{1}{1+e^{-\theta^T x^{(i)}}} = \frac{1}{1+e^{-(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)})}}$. Randomly initialize $\theta_0 = 0.01$, $\theta_1 = 0.03$, $\theta_2 = 0.06$, fix the learning rate $\alpha = 0.01$, and iterate to update $\theta$:

$$
\theta_0 := \theta_0 - \frac{\alpha}{m}\left[\left(\frac{1}{1+e^{-(\theta_0 x_0^{(1)} + \theta_1 x_1^{(1)} + \theta_2 x_2^{(1)})}} - y^{(1)}\right) x_0^{(1)} + \left(\frac{1}{1+e^{-(\theta_0 x_0^{(2)} + \theta_1 x_1^{(2)} + \theta_2 x_2^{(2)})}} - y^{(2)}\right) x_0^{(2)}\right]
$$

$\theta_1$ and $\theta_2$ are updated the same way as $\theta_0$; for the step-by-step parameter updates, see the linear regression post (link). A one-step numeric sketch follows below.
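To make one step concrete, this sketch plugs the table's two samples and the given initial values into a single simultaneous update (a sketch only; it assumes the $\frac{1}{m}$ factor from the general rule and the table values as reconstructed above):

```python
import numpy as np

# The two samples from the table: columns are x0, x1 (area), x2 (orientation).
X = np.array([[1.0, 200.0, 1.0],
              [1.0, 120.0, 2.0]])
y = np.array([1.0, 0.0])
theta = np.array([0.01, 0.03, 0.06])   # theta_0, theta_1, theta_2
alpha, m = 0.01, len(y)

h = 1.0 / (1.0 + np.exp(-(X @ theta)))       # predictions for both samples
theta = theta - alpha / m * (X.T @ (h - y))  # update all theta_j at once
print(theta)
```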

3. Regularization for Logistic Regression

  L2-norm regularization, used to combat overfitting:
$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \frac{1}{1+e^{-\theta^T x^{(i)}}} - \left(1-y^{(i)}\right) \log \left(1-\frac{1}{1+e^{-\theta^T x^{(i)}}}\right)\right] + \frac{\lambda}{2m}\|\theta\|_2^2
$$

Update rule: $\theta_j := \theta_j - \frac{\alpha}{m}\left(\sum_{i=1}^{m}\left[\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}\right] + \lambda \theta_j\right)$
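A sketch of one regularized step; note that leaving the bias $\theta_0$ out of the penalty is a common convention I am assuming here, not something stated above:

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    """One L2-regularized gradient step for logistic regression."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    grad = X.T @ (h - y)      # sum_i (h - y) * x_j for every j
    reg = lam * theta
    reg[0] = 0.0              # assumed convention: don't penalize the bias term
    return theta - alpha / m * (grad + reg)
```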

4. Multiclass Classification with Logistic Regression

Three ways to implement multiclass logistic regression (reference link)

One-Vs-All (One-Vs-Rest)
  For a three-class problem, one-vs-all trains 3 binary classifiers, one per class. At prediction time, each classifier scores the test sample with the probability of its own positive class, i.e. $P(y=i \mid x; \theta),\ i=1,2,3$; the classifier with the highest score wins, and its positive class is the predicted label (see the sketch below).
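A minimal one-vs-all sketch, reusing the hypothetical `sigmoid` and `gradient_descent` from the earlier snippet (function names and defaults are illustrative):

```python
import numpy as np

def one_vs_all_train(X, y, n_classes, alpha=0.1, n_iters=2000):
    """Train one binary logistic regression per class (class i vs. the rest)."""
    thetas = []
    for i in range(n_classes):
        y_bin = (y == i).astype(float)   # class i becomes the positive class
        thetas.append(gradient_descent(X, y_bin, alpha, n_iters))
    return np.array(thetas)              # shape (n_classes, n_features)

def one_vs_all_predict(X, thetas):
    """Pick the class whose classifier reports the highest P(y=i | x; theta)."""
    probs = sigmoid(X @ thetas.T)        # (m, n_classes) probability matrix
    return np.argmax(probs, axis=1)
```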

One-Vs-One

  One-vs-one trains a binary classifier for every pair of classes ($N(N-1)/2$ classifiers for $N$ classes) and predicts by majority vote among their outputs.
Many-Vs-Many

    MVM (reference link)

  • Encoding: split the $N$ classes $M$ times; each split marks some classes as positive and the rest as negative, yielding one binary training set. The $M$ splits give $M$ training sets, from which $M$ classifiers are trained.
  • Decoding: the $M$ classifiers' predictions for a sample form a codeword; compare it against each class's own codeword and return the class with the smallest distance (see the sketch after this list).
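A small sketch of the encode/decode idea with a hypothetical $\pm 1$ coding matrix and Hamming-distance decoding (the matrix and all names are illustrative, not from the post):

```python
import numpy as np

# Hypothetical coding matrix: N=4 classes, M=5 binary splits.
# Row k is class k's codeword; column j defines the j-th split
# (+1 = positive class in that split, -1 = negative class).
CODE = np.array([[+1, -1, +1, -1, +1],
                 [-1, +1, +1, -1, -1],
                 [+1, +1, -1, +1, -1],
                 [-1, -1, -1, +1, +1]])

def mvm_decode(predictions, code=CODE):
    """predictions: length-M vector of +/-1 outputs from the M classifiers.
    Returns the class whose codeword is closest in Hamming distance."""
    dists = (code != predictions).sum(axis=1)  # mismatches against each row
    return int(np.argmin(dists))

print(mvm_decode(np.array([+1, -1, +1, +1, +1])))  # closest to row 0 -> 0
```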

5. Code Examples

| Name | Code link |
|---|---|
| Logistic regression implementation | code |
| Iris classification with logistic regression | code |
| Handwritten digit recognition with logistic regression | code |

Reference link - 深度之眼 Machine Learning Winter Training Camp
