[Machine Learning] Andrew Ng Machine Learning Course Notes - Week 3

Machine Learning by Andrew Ng

💡 Andrew Ng Machine Learning course notes - Week 3
🐠 My collected study notes: compilation volume
✓ Course page: Stanford Machine Learning
🍭 References

Outline

Classification and Representation

1.Classification

Label 0 denotes the negative class (the absence of something).
Label 1 denotes the positive class (the presence of something).
Which label is called negative and which positive is a rather arbitrary choice.



Linear regression is not a good choice here: the values to predict take on a small number of discrete values, while a linear hypothesis can output values well outside that range.


Logistic Regression is a classification algorithm
(not a regression algorithm, despite what its name may suggest).


2.Hypothesis Representation
We want our classifier to output values in the range $0 \le h_\theta(x) \le 1$.

We turn the linear regression hypothesis
$$h_\theta(x) = \theta^T x$$
into
$$h_\theta(x) = g(\theta^T x)$$
where
$$g(z) = \frac{1}{1 + e^{-z}}$$
then we get
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
g is called the sigmoid function or the logistic function.

Properties of the sigmoid function:

g(z) asymptotes to 0 as z goes to minus infinity and asymptotes to 1 as z goes to plus infinity; g(0) = 0.5.

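A minimal sketch of the sigmoid and the logistic hypothesis in Python/NumPy (my own choice of language and names; the course itself uses Octave):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); always lies in (0, 1)."""
    return sigmoid(theta @ x)

# g asymptotes to 0 and 1 at the extremes, and g(0) = 0.5
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.5e-05, 0.5, 0.99995]
```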


The sigmoid curve is S-shaped, rising monotonically from 0 to 1 and crossing 0.5 at z = 0.


Example: classifying a tumor as malignant vs. benign.

Interpretation of the hypothesis output:
$h_\theta(x)$ is the probability that y = 1, given x, parameterized by $\theta$, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$.


3.Decision Boundary


The decision boundary: we predict y = 1 whenever $h_\theta(x) \ge 0.5$, which (since $g(z) \ge 0.5$ exactly when $z \ge 0$) is whenever $\theta^T x \ge 0$; the boundary itself is the set of points where $\theta^T x = 0$.


The decision boundary is a property of the hypothesis and its parameters $\theta$, not of the training set (the training set is only used to fit the parameters).


With higher-order polynomial feature terms we can obtain more complex, non-linear decision boundaries, as sketched below.

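As an illustration (a toy example of my own, not from the notes), prediction reduces to checking the sign of $\theta^T$ applied to the feature vector; with squared features the boundary can be a circle:

```python
import numpy as np

def predict(theta, features):
    """Predict y = 1 when theta^T * features >= 0, i.e. when h_theta >= 0.5."""
    return int(features @ theta >= 0)

# Hypothetical parameters theta = [-1, 0, 0, 1, 1] over the feature map
# [1, x1, x2, x1^2, x2^2] give the circular decision boundary x1^2 + x2^2 = 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

x1, x2 = 0.5, 0.5  # a point inside the circle
feats = np.array([1.0, x1, x2, x1**2, x2**2])
print(predict(theta, feats))  # 0 (inside the circle -> negative class)
```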

Logistic Regression Model

1.Cost Function
= the optimization objective

Given a training set, how do we choose $\theta$?



Recall the cost function of linear regression:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$


If we directly reuse the squared-error cost of linear regression with the sigmoid hypothesis, J(θ) turns out to be non-convex, so gradient descent is not guaranteed to find the global minimum.


So we have to find a new cost function that makes J convex:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}),\, y^{(i)}\big)$$

where, for a single training example,
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Properties of this cost: when y = 1, the cost is 0 if $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$ (and symmetrically for y = 0), so confident wrong predictions are penalized very heavily.


2.Simplified Cost Function and Gradient Descent

We can compress the cost function's two conditional cases into a single expression:
$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

The full cost function:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right]$$

A vectorized implementation is
$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \cdot \big( -y^T \log(h) - (1 - y)^T \log(1 - h) \big)$$

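A sketch of this vectorized cost in Python/NumPy (my translation of the formula above; the course itself uses Octave):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = (1/m) * (-y' log(h) - (1 - y)' log(1 - h)), with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return float(-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
```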


Apply the gradient descent template and take the derivative of J; the resulting update rule is
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
(updated simultaneously for all j).


The update rule looks identical to the one for linear regression, except that the definition of $h_\theta(x)$ has changed: it is now $g(\theta^T x)$ rather than $\theta^T x$.


A vectorized implementation: $\theta := \theta - \frac{\alpha}{m} X^T \big( g(X\theta) - \vec{y} \big)$.

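A minimal batch gradient descent sketch in NumPy (assuming X already contains a leading column of ones for the bias term; α and the iteration count are placeholders):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
    """Repeat: theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta = theta - alpha * grad
    return theta
```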

3.Advanced Optimization
There are other optimization algorithms besides gradient descent, such as Conjugate Gradient, BFGS, and L-BFGS; they typically converge faster and do not require manually choosing a learning rate α, at the cost of being more complex.


Octave part of optimization, skip
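The course demonstrates this with Octave's fminunc; a rough Python analogue (an assumption on my part, using scipy.optimize.minimize with the cost and gradient from above) could look like:

```python
import numpy as np
from scipy.optimize import minimize

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient for (unregularized) logistic regression."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = float(-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# theta0, X, y stand for an initial guess and a training set (not shown here):
# res = minimize(cost_and_grad, theta0, args=(X, y), jac=True, method="BFGS")
# theta_opt = res.x
```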

Multi-class Classification

1.Multi-class Classification: One-vs-all



Summary
Train k separate binary classifiers to solve a k-class problem; to classify a new input x, run all k classifiers and pick the class i whose classifier gives the highest $h_\theta^{(i)}(x)$, as sketched below.

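A small prediction sketch (assuming we have already trained one parameter vector per class, stacked as the rows of Theta; the numbers are made up):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, x):
    """Each row of Theta parameterizes one binary classifier;
    return the class whose classifier outputs the highest probability."""
    return int(np.argmax(sigmoid(Theta @ x)))

Theta = np.array([[ 1.0, -2.0],   # classifier for class 0
                  [ 0.5,  1.0],   # classifier for class 1
                  [-1.0,  0.3]])  # classifier for class 2
x = np.array([1.0, 2.0])          # [bias, feature]
print(predict_one_vs_all(Theta, x))  # -> 1 for this toy input
```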

Solving the Problem of Overfitting

1.The Problem of Overfitting

The problem of overfitting: the model fits the training examples (almost) perfectly but does not generalize well to new examples.
In the course's example figure, the left plot underfits (high bias) and the right plot overfits (high variance), while the middle one fits just right.



Addressing overfitting:

  • Reduce the number of features (manually select which ones to keep, or use a model-selection algorithm)
  • Regularization (covered in the next section): keep all the features, but reduce the magnitude of the parameters θ_j

2.Cost Function
$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$$

Suppose we penalize $\theta_3$ and $\theta_4$ and make them small; then we effectively force the model to 'simplify' itself, since the corresponding terms contribute very little.


Regularization
The regularized cost function for linear regression:
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
By convention the regularization sum starts from $\theta_1$, i.e. $\theta_0$ is not penalized (in practice it makes little difference whether $\theta_0$ is included).


$\lambda$ is called the regularization parameter.
If $\lambda$ is set to an extremely large value, then all the $\theta_j$ (for j ≥ 1) are pushed close to 0 and the hypothesis reduces to roughly $h_\theta(x) = \theta_0$, i.e. the model underfits.



3.Regularized Linear Regression



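In the lecture, gradient descent for regularized linear regression leaves $\theta_0$ unregularized and adds a $\frac{\lambda}{m}\theta_j$ term to the gradient for $j \ge 1$. A minimal NumPy sketch of one update step (my own translation; names are illustrative):

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One gradient descent step for regularized linear regression.
    The bias theta_0 is not regularized."""
    m = len(y)
    h = X @ theta                      # linear hypothesis h_theta(x) = theta^T x
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]  # regularize theta_1 ... theta_n only
    return theta - alpha * grad
```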
Adding the regularization term to the Normal Equation:
$$\theta = \big( X^T X + \lambda \cdot L \big)^{-1} X^T y, \qquad L = \mathrm{diag}(0, 1, 1, \dots, 1)$$
The matrix L has dimension (n + 1) × (n + 1).

When m < n, $X^T X$ is not invertible. But after adding the term $\lambda \cdot L$ (with $\lambda > 0$), the matrix becomes invertible.

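A NumPy sketch of this regularized normal equation (my own translation of the formula; using a linear solve rather than an explicit inverse):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * L)^{-1} X^T y, with L = diag(0, 1, ..., 1)
    so that the bias theta_0 is not penalized."""
    L = np.eye(X.shape[1])   # (n + 1) x (n + 1) identity
    L[0, 0] = 0.0            # do not regularize the bias term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```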
4.Regularized Logistic Regression
We can regularize logistic regression in a similar way that we regularize linear regression.



Recall that our cost function for logistic regression was:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right]$$

We regularize it by adding a penalty term:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

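A NumPy sketch of this regularized cost (my own translation; as before, $\theta_0$ is excluded from the penalty):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Regularized logistic regression cost; theta_0 is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = float(-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    return J + (lam / (2 * m)) * float(np.sum(theta[1:] ** 2))
```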


Advanced optimization algorithms (such as those mentioned earlier) can be applied to this regularized cost function in the same way.
