Classification and Logistic Regression

I watched the Stanford open course Machine Learning (taught by Professor Andrew Ng) and took some notes, which I am writing up here for future reference. If you spot any mistakes in these notes, please let me know.
Other notes in this series:
Linear Regression
Classification and Logistic Regression
Generalized Linear Models
Generative Learning Algorithms

Classification and Logistic Regression

1 Logistic regression

$h_{\theta}(x) = g(\theta^T x) = \dfrac{1}{1+e^{-\theta^T x}}$, where $g(z) = \dfrac{1}{1+e^{-z}}$ is the logistic function (also called the sigmoid function).
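As a minimal sketch (not part of the original notes), the hypothesis can be evaluated with NumPy; the function names `sigmoid` and `h` below are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

# x[0] = 1 plays the role of the intercept term
print(h(np.array([0.0, 1.0]), np.array([1.0, 2.0])))  # ~0.88
```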

$p(y=1 \mid x;\theta) = h_\theta(x)$

$p(y=0 \mid x;\theta) = 1 - h_\theta(x)$

$p(y \mid x;\theta) = (h_\theta(x))^y \, (1 - h_\theta(x))^{1-y}$
$$
\begin{aligned}
L(\theta) &= p(\vec y \mid X;\theta) \\
&= \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) \\
&= \prod_{i=1}^{m} (h_\theta(x^{(i)}))^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1-y^{(i)}}
\end{aligned}
$$

$$
\begin{aligned}
\ell(\theta) &= \log L(\theta) \\
&= \log \prod_{i=1}^{m} (h_\theta(x^{(i)}))^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1-y^{(i)}} \\
&= \sum_{i=1}^{m} \log\!\left( (h_\theta(x^{(i)}))^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1-y^{(i)}} \right) \\
&= \sum_{i=1}^{m} \left( \log (h_\theta(x^{(i)}))^{y^{(i)}} + \log (1 - h_\theta(x^{(i)}))^{1-y^{(i)}} \right) \\
&= \sum_{i=1}^{m} \left( y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right)
\end{aligned}
$$
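A hedged sketch of the final expression for $\ell(\theta)$, assuming a design matrix `X` of shape (m, n) and a label vector `y` with entries in {0, 1} (the names are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """ell(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]."""
    p = sigmoid(X @ theta)  # h_theta(x^(i)) for every training example
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```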
To maximize $L(\theta)$, we maximize the log likelihood $\ell(\theta)$ with gradient ascent: $\theta := \theta + \alpha \, \nabla_\theta \ell(\theta)$ (note the $+$ here, in contrast to the $-$ in the gradient descent algorithm studied earlier, because we are now maximizing a function rather than minimizing one).
$$
\begin{aligned}
\frac{\partial}{\partial\theta_j}\ell(\theta)
&= \frac{\partial}{\partial\theta_j} \sum_{i=1}^{m} \left( y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right) \\
&= \sum_{i=1}^{m} \frac{\partial}{\partial\theta_j} \left( y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right) \\
&= \sum_{i=1}^{m} \left( \frac{y^{(i)}}{h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) + \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j} (1 - h_\theta(x^{(i)})) \right) \\
&= \sum_{i=1}^{m} \left( \frac{y^{(i)}}{h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) \right) \\
&= \sum_{i=1}^{m} \frac{y^{(i)} - h_\theta(x^{(i)})}{h_\theta(x^{(i)})(1 - h_\theta(x^{(i)}))} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) \\
&= \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}
\end{aligned}
$$

Note 1 (used in the last step, since $g'(z) = g(z)(1-g(z))$): $\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = h_\theta(x^{(i)})(1 - h_\theta(x^{(i)})) \frac{\partial}{\partial\theta_j} \theta^T x^{(i)} = h_\theta(x^{(i)})(1 - h_\theta(x^{(i)})) \, x_j^{(i)}$
$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}$
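Below is a minimal sketch of batch gradient ascent built on this update rule, written in vectorized form; the learning rate `alpha`, the iteration count `n_iters`, and the function name are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gradient_ascent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient ascent: theta_j := theta_j + alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        error = y - sigmoid(X @ theta)   # (y^(i) - h_theta(x^(i))) for all i
        theta += alpha * (X.T @ error)   # one ascent step on ell(theta)
    return theta
```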

2 Digression: The perceptron learning algorithm

Define the function $g(z)$ as a hard threshold:
$$g(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$
If we then let $h_\theta(x) = g(\theta^T x)$ with this modified $g$ and use the update rule $\theta_j := \theta_j + \alpha (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}$, we obtain the perceptron learning algorithm.
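As a rough illustration (not from the notes), a single perceptron step might look like the following, with `alpha` and the function name chosen for illustration:

```python
import numpy as np

def perceptron_update(theta, x, y, alpha=1.0):
    """One perceptron step: predict with the threshold g, then apply
    theta_j := theta_j + alpha * (y - h_theta(x)) * x_j."""
    prediction = 1.0 if np.dot(theta, x) >= 0 else 0.0
    return theta + alpha * (y - prediction) * x
```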

3 Newton's method for maximizing $\ell(\theta)$ (Another algorithm for maximizing $\ell(\theta)$)

Given a function $f(\theta)$, to find a $\theta$ such that $f(\theta) = 0$, Newton's method repeatedly performs the following update:
$\theta := \theta - \dfrac{f(\theta)}{f'(\theta)}.$
So how do we find a $\theta$ that maximizes $\ell(\theta)$? We need $\ell'(\theta) = 0$ (whether $\ell(\theta)$ is at a maximum or a minimum, its first derivative is zero there), so applying Newton's method to $f(\theta) = \ell'(\theta)$ gives the update:
$\theta := \theta - \dfrac{\ell'(\theta)}{\ell''(\theta)}.$
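A minimal sketch of the scalar Newton iteration, applied here to a toy root-finding problem purely for illustration:

```python
def newton_root(f, f_prime, theta, n_iters=20):
    """Newton's method for f(theta) = 0: theta := theta - f(theta) / f'(theta)."""
    for _ in range(n_iters):
        theta = theta - f(theta) / f_prime(theta)
    return theta

# Toy example: find the positive root of theta^2 - 2 = 0 (i.e. sqrt(2))
print(newton_root(lambda t: t**2 - 2, lambda t: 2 * t, theta=1.0))  # ~1.41421356
```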
In our logistic regression setting, $\theta$ is a vector, so Newton's method must be generalized to the vector-valued case:
$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta),$
where $H$ is the Hessian matrix whose entries are
$H_{ij} = \dfrac{\partial^2 \ell(\theta)}{\partial\theta_i \partial\theta_j}.$
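A hedged sketch of this update for logistic regression, using the gradient $\nabla_\theta \ell(\theta) = X^T(y - p)$ derived above and the Hessian $H = -X^T \operatorname{diag}(p(1-p))\, X$ with $p = g(X\theta)$; the explicit Hessian formula and all names here are my own additions, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iters=10):
    """Newton's method for logistic regression: theta := theta - H^{-1} grad ell(theta)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)                      # nabla_theta ell(theta)
        H = -(X.T * (p * (1 - p))) @ X            # Hessian: -X^T diag(p(1-p)) X
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta
```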
