Notes on Machine Learning By Andrew Ng (3)

Click here for the previous note.

Logistic Regression

—A classification algorithm.

Classification

Binary classification: $y \in \{0, 1\}$

0: “Negative Class”

1: “Positive Class”

If we want to use a line to fit or predict something (like whether a tumor is malignant), we could set a threshold on the classifier output $h_\theta(x)$ at 0.5.

If $h_\theta(x) \geq 0.5$, predict $y = 1$;

if $h_\theta(x) < 0.5$, predict $y = 0$.

You may think linear regression works for classification, but if the data changes even a little, it can give a really bad prediction.

Also, classification requires $y = 0$ or $1$, but the $h_\theta(x)$ from linear regression can be $> 1$ or $< 0$.

So, by using logistic regression, we will generate an $h_\theta(x)$ that lies in $[0, 1]$.

Hypothesis Representation

Find an $h_\theta(x)$ that fits our need, which is to map $\mathbb{R}$ to $[0, 1]$.
$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1+e^{-z}} \;\Rightarrow\; h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$
We call $g$ the sigmoid function, or logistic function.

(Figure: plot of the sigmoid function $g(z)$.)
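As a minimal sketch (the function and variable names below are my own, not from the course), the hypothesis can be written with NumPy:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); x is assumed to include the bias term x0 = 1
    return sigmoid(np.dot(theta, x))

# Example: theta^T x = -3 + 2 + 2 = 1, so h_theta(x) = g(1) ≈ 0.73
print(hypothesis(np.array([-3.0, 1.0, 1.0]), np.array([1.0, 2.0, 2.0])))
```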

$h_\theta(x) = P(y=1 \mid x; \theta)$

The probability that $y = 1$, given $x$, parameterized by $\theta$.

$P(y=1 \mid x; \theta) + P(y=0 \mid x; \theta) = 1$

Decision Boundary

A property of the hypothesis.

It separates two different classes by a line or curve.

If $h_\theta(x) \geq 0.5$, predict $y = 1$;

if $h_\theta(x) < 0.5$, predict $y = 0$.

When $z \geq 0$, $g(z) \geq 0.5$. So $h_\theta(x) = g(\theta^T x) \geq 0.5$ when $\theta^T x \geq 0$.
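For example (a standard illustration, not taken from the note's data): with $\theta = [-3,\ 1,\ 1]^T$ and features $x_1, x_2$, we predict $y = 1$ whenever $-3 + x_1 + x_2 \geq 0$, i.e. $x_1 + x_2 \geq 3$, so the straight line $x_1 + x_2 = 3$ is the decision boundary.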

Non-linear decision boundaries

USE POLYNOMIAL TERMS
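For example (again an illustrative choice of parameters): adding the polynomial features $x_1^2$ and $x_2^2$ with $\theta = [-1,\ 0,\ 0,\ 1,\ 1]^T$ predicts $y = 1$ when $-1 + x_1^2 + x_2^2 \geq 0$, so the decision boundary is the circle $x_1^2 + x_2^2 = 1$.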

Cost Function

How do we choose the parameters $\theta$?

For linear regression, $J(\theta) = \frac{1}{m}\sum_{i=1}^m \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. We can rewrite it as $J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{cost}\left(h_\theta(x^{(i)}), y^{(i)}\right)$, where $\mathrm{cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$.

But plugging the sigmoid hypothesis into this squared-error cost makes $J(\theta)$ non-convex, so we have to find a new cost function that is convex and has a single global minimum.

Logistic regression cost function

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1-h_\theta(x)) & \text{if } y = 0 \end{cases}$$

For the $y = 1$ case: $\mathrm{Cost} = 0$ if $y = 1$ and $h_\theta(x) = 1$.

But as $h_\theta(x) \rightarrow 0$, $\mathrm{Cost} \rightarrow \infty$.

This captures the intuition that if $h_\theta(x) = 0$ (i.e. we predict $P(y=1 \mid x;\theta) = 0$) but in fact $y = 1$, we penalize the learning algorithm with a very large cost.

Simplified cost function and gradient descent

$$\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$$

$$J(\theta) = \frac{1}{m}\sum_{i=1}^m \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = -\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$
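A minimal vectorized sketch of this cost (the names and the data layout are assumptions, not from the note):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # X: (m, n) design matrix with a leading column of ones; y: (m,) labels in {0, 1}
    h = sigmoid(X @ theta)
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```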
To find the parameters $\theta$:

$\min_\theta J(\theta)$

To make a prediction given a new $x$:

Output $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$

Gradient Descent

Repeat{

$\theta_j := \theta_j - \alpha \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

Just like linear regression!

}
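A minimal sketch of this loop (the names, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    # X: (m, n) with a leading column of ones; y: (m,) labels in {0, 1}
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        # Simultaneous update of every theta_j, matching the rule above.
        # (In practice the sum is often divided by m; that factor can be folded into alpha.)
        theta -= alpha * (X.T @ (h - y))
    return theta
```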

Advanced optimization
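The note has no details under this heading; as a hedged sketch, an off-the-shelf optimizer such as `scipy.optimize.minimize` can minimize $J(\theta)$ directly once the cost and its gradient are available (the helper below is illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # Returns J(theta) and its gradient in one call.
    m = len(y)
    h = sigmoid(X @ theta)
    J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = (X.T @ (h - y)) / m
    return J, grad

# Usage (X and y are the training data):
# result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
#                   jac=True, method='BFGS')
# theta_opt = result.x
```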

Multi-class classification (One-vs-all)

Multiclass classification

Email tagging: Work, Friends, Family, Hobby

Weather: Sunny, Cloudy, Rain, Snow

One-vs-all

By separating the problem into many two-class classification problems.

(Figure: splitting a multi-class dataset into several binary, one-vs-all problems.)

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.

On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
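A minimal one-vs-all sketch built on the pieces above (all names and hyper-parameters are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, num_iters=1000):
    # Train one binary logistic regression classifier per class i,
    # treating class i as positive and every other class as negative.
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for i in range(num_classes):
        y_i = (y == i).astype(float)
        theta = np.zeros(n)
        for _ in range(num_iters):
            h = sigmoid(X @ theta)
            theta -= alpha * (X.T @ (h - y_i)) / m
        all_theta[i] = theta
    return all_theta

def predict_one_vs_all(all_theta, x):
    # Pick the class i whose classifier h_theta^{(i)}(x) is largest.
    return int(np.argmax(sigmoid(all_theta @ x)))
```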

Please click here to see the next note.
