[Machine Learning Notes] Andrew Ng's Open Course · Chapter 7: Logistic Regression

Hypothesis Representation

  • $h_\theta(x)$: estimated probability that $y = 1$ on input $x$
    $\large h_\theta(x) = P(y=1 \mid x;\theta) = g(\theta^T x)$
  • sigmoid function (a code sketch follows this list)
    $\large g(z) = \frac{1}{1+e^{-z}}$
    [Figure: graph of the sigmoid function, an S-shaped curve rising from 0 to 1]

Decision Boundary

$\large y = \begin{cases} 1, & h_\theta(x) \geq 0.5, \text{ i.e. } \theta^T x \geq 0 \\ 0, & h_\theta(x) < 0.5, \text{ i.e. } \theta^T x < 0 \end{cases}$
[Figure: example of a linear decision boundary $\theta^T x = 0$ separating two classes]

  • Non-Linear Decision Boundary (see the code sketch after this list)
    [Figure: example of a non-linear (circular) decision boundary]

Logistic regression cost function

  • cost
    $\large \mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1-h_\theta(x)) & \text{if } y = 0 \end{cases}$
  • simplified version

$\large \mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$

  • cost function $J(\theta)$

$\large J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$

  • to fit parameters $\theta$:

$\large \min_\theta J(\theta)$

Advanced Optimization

Optimization algorithms:

  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS

Advantages (of conjugate gradient, BFGS, and L-BFGS over gradient descent)

  • No need to manually pick $\alpha$
  • Often faster than gradient descent

Disadvantage

  • More complex

Multiclass classification

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the class $i$ that maximizes
$\large \max_i h_\theta^{(i)}(x)$
[Figure: one-vs-all on a three-class dataset, one binary classifier per class]
$h_\theta^{(i)}(x) = P(y=i \mid x;\theta) \quad (i = 1, 2, 3)$
