Deep Learning Beginner Notes: Hung-yi Lee's Deep Learning Course, Logistic Regression

Deep Learning 2023/07/11

Logistic Regression

Step 1: Function Set

  • We need to model a probability, because logistic regression is a machine-learning algorithm for binary classification. This gives the following decision rule:

\text{if } P_{w,b}(C_1|x)\geq 0.5,\ \text{output } C_1;\quad \text{otherwise, output } C_2

  • Deriving the posterior from Gaussian class-conditional distributions gives the sigmoid form:

P_{w,b}(C_1|x)=\sigma(z),\qquad z=w\cdot x+b=\sum_i w_i x_i+b

  • Putting these together, the function set is:

f_{w,b}(x)=P_{w,b}(C_1|x)

\Rightarrow f_{w,b}(x)=\sigma\left(\sum_i w_i x_i+b\right)
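As an illustrative sketch (not from the lecture itself), the function set above translates directly into NumPy; the feature vector, weights, and bias below are made-up values:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def f(x, w, b):
    """Posterior P_{w,b}(C1 | x) = sigma(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical example values for illustration only.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1

p_c1 = f(x, w, b)                  # probability of class C1
label = 1 if p_c1 >= 0.5 else 2    # output C1 if P >= 0.5, else C2
```

Here `z = 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1`, so the model outputs a probability slightly above 0.5 and predicts class C1.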

Step 2: Goodness of a Function

  • Assume the training data were generated by the function set above, i.e. it satisfies:

f_{w,b}(x)=P_{w,b}(C_1|x)

  • From this equation, a given pair (w, b) determines the probability of generating the whole data set, so the likelihood is (here x³ is an example that belongs to class C₂, hence the 1 − f term):

L(w,b)=f_{w,b}(x^1)\,f_{w,b}(x^2)\,\left(1-f_{w,b}(x^3)\right)\cdots f_{w,b}(x^N)

  • The pair (w, b) that maximizes L(w, b) is denoted (w*, b*):

w^*,b^*=\arg\max_{w,b} L(w,b)

  • Maximizing *L(w,b)* can be converted into the equivalent minimization:

w^*,b^*=\arg\min_{w,b} -\ln L(w,b)
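One practical reason for the −ln transform (a quick sketch with made-up probabilities, not from the lecture): a product of many probabilities underflows floating point to zero, while the sum of negative logs stays finite:

```python
import numpy as np

rng = np.random.default_rng(0)
# 2000 hypothetical per-example likelihood terms f(x^n) or 1 - f(x^n)
probs = rng.uniform(0.1, 0.9, size=2000)

L = np.prod(probs)                   # direct product: underflows to 0.0
neg_log_L = -np.sum(np.log(probs))   # sum of -ln terms: finite and positive
```

Since the log is monotonic, minimizing −ln L(w, b) finds the same (w*, b*) as maximizing L(w, b), without the numerical underflow.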

  • The derivation for *−ln L(w,b)* is as follows:

L(w,b)=f_{w,b}(x^1)\,f_{w,b}(x^2)\,\left(1-f_{w,b}(x^3)\right)\cdots f_{w,b}(x^N)

\Rightarrow -\ln L(w,b)=-\ln f_{w,b}(x^1)-\ln f_{w,b}(x^2)-\ln\left(1-f_{w,b}(x^3)\right)-\cdots-\ln f_{w,b}(x^N)

\hat{y}^n: 1 \text{ for class } 1,\ 0 \text{ for class } 2

\Rightarrow -\ln L(w,b)=\sum_n -\left[\hat{y}^n\ln f_{w,b}(x^n)+(1-\hat{y}^n)\ln\left(1-f_{w,b}(x^n)\right)\right]

\text{Distribution } p:\quad p(x=1)=\hat{y}^n,\qquad p(x=0)=1-\hat{y}^n

\text{Distribution } q:\quad q(x=1)=f(x^n),\qquad q(x=0)=1-f(x^n)

  • Substituting these two distributions into the cross-entropy formula below:

H(p,q)=-\sum_x p(x)\ln q(x)

  • Cross entropy:

C\left(f(x^n),\hat{y}^n\right)=-\left[\hat{y}^n\ln f(x^n)+(1-\hat{y}^n)\ln\left(1-f(x^n)\right)\right]
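The cross-entropy formula can be coded directly; a minimal sketch (the example probabilities are arbitrary):

```python
import numpy as np

def binary_cross_entropy(f_x, y_hat):
    """C(f(x^n), y_hat^n) = -[y ln f + (1 - y) ln(1 - f)]."""
    return -(y_hat * np.log(f_x) + (1 - y_hat) * np.log(1 - f_x))

# Confident correct prediction -> small loss; confident wrong -> large loss.
loss_good = binary_cross_entropy(0.9, 1)  # -ln(0.9)
loss_bad = binary_cross_entropy(0.1, 1)   # -ln(0.1)
```

For a class-1 example, predicting 0.9 costs −ln 0.9 ≈ 0.105, while predicting 0.1 costs −ln 0.1 ≈ 2.303, so the loss punishes confident mistakes heavily.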

Step 3: Find the best function

-\ln L(w,b)=\sum_n -\left[\hat{y}^n\ln f_{w,b}(x^n)+(1-\hat{y}^n)\ln\left(1-f_{w,b}(x^n)\right)\right]

\Rightarrow \frac{\partial \ln f_{w,b}(x)}{\partial w_i}=\frac{\partial \ln f_{w,b}(x)}{\partial z}\frac{\partial z}{\partial w_i}

\Rightarrow \frac{\partial z}{\partial w_i}=x_i

\Rightarrow \frac{\partial \ln\sigma(z)}{\partial z}=\frac{1}{\sigma(z)}\frac{\partial \sigma(z)}{\partial z}=\frac{1}{\sigma(z)}\sigma(z)\left(1-\sigma(z)\right)=1-\sigma(z)

\Rightarrow \frac{\partial \ln\left(1-f_{w,b}(x)\right)}{\partial w_i}=\frac{\partial \ln\left(1-f_{w,b}(x)\right)}{\partial z}\frac{\partial z}{\partial w_i}

\Rightarrow \frac{\partial \ln\left(1-\sigma(z)\right)}{\partial z}=-\frac{1}{1-\sigma(z)}\frac{\partial \sigma(z)}{\partial z}=-\frac{1}{1-\sigma(z)}\sigma(z)\left(1-\sigma(z)\right)=-\sigma(z)
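The identity σ′(z) = σ(z)(1 − σ(z)) used in the two derivatives above can be checked numerically; a quick sketch with an arbitrary test point z = 0.7:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7     # arbitrary test point
eps = 1e-6

# Central finite difference approximates the true derivative.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Analytic form used in the derivation above.
analytic = sigmoid(z) * (1 - sigmoid(z))
```

The two values agree to within the finite-difference error, confirming the identity.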

\frac{\partial\left(-\ln L(w,b)\right)}{\partial w_i}=\sum_n -\left[\hat{y}^n\left(1-f_{w,b}(x^n)\right)x_i^n-(1-\hat{y}^n)f_{w,b}(x^n)x_i^n\right]

=\sum_n -\left[\hat{y}^n-\hat{y}^n f_{w,b}(x^n)-f_{w,b}(x^n)+\hat{y}^n f_{w,b}(x^n)\right]x_i^n

=\sum_n -\left(\hat{y}^n-f_{w,b}(x^n)\right)x_i^n

  • This yields the parameter-update rule:

w_i \leftarrow w_i-\eta\sum_n -\left(\hat{y}^n-f_{w,b}(x^n)\right)x_i^n

  • From this update rule, each step depends on three factors:

    • the learning rate η

    • the input x from the data set

    • the gap between the target and the prediction,
      \hat{y}^n-f_{w,b}(x^n)
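Putting the three factors together, a minimal gradient-descent loop might look like the sketch below; the toy data set (two features, labels from the sign of x₁ + x₂) is invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 100 examples, 2 features, labels y_hat in {0, 1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
eta = 0.1  # learning rate

for _ in range(500):
    f_x = sigmoid(X @ w + b)       # predictions f_{w,b}(x^n)
    err = y - f_x                  # (y_hat^n - f_{w,b}(x^n))
    # w_i <- w_i - eta * sum_n -(y_hat^n - f(x^n)) x_i^n  (averaged over n)
    w += eta * (X.T @ err) / len(y)
    b += eta * err.mean()

acc = ((sigmoid(X @ w + b) >= 0.5) == y).mean()  # training accuracy
```

Since the toy labels are linearly separable, the loop should fit them almost perfectly; the averaging over n only rescales the learning rate.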

  • From this analysis, logistic regression and linear regression share the same gradient-descent update equation; the difference lies in the range of the target and the prediction. In logistic regression the target \hat{y}^n is 0 or 1 and the output f_{w,b}(x^n) lies between 0 and 1, whereas in linear regression both can be any real number.
