Hello everyone, I'm Tony:
Welcome to my personal homepage, Tony's blog, where we stand on the shoulders of giants together.
These notes accompany Andrew Ng's Machine Learning course, lectures 7-1 through 7-4.
Suggestions for addressing overfitting
1. Reduce the number of features
One point worth noting here: for data with a single feature, the fitted hypothesis takes the form
y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \dots + \theta_n x_1^n
This fitted curve involves only one feature, x1. With three features the hypothesis becomes
y = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \dots + \theta_n x_1^n \\
\quad + \theta_0' + \theta_1' x_2 + \theta_2' x_2^2 + \theta_3' x_2^3 + \dots + \theta_n' x_2^n \\
\quad + \theta_0'' + \theta_1'' x_3 + \theta_2'' x_3^2 + \theta_3'' x_3^3 + \dots + \theta_n'' x_3^n
This fitted curve involves three features, x1, x2, and x3, so the number of parameters grows quickly with the number of features; a small sketch of building such polynomial terms follows.
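As a minimal illustration (my own sketch, not course code; the names x1, n, and X are assumptions), expanding one raw feature into polynomial terms in Octave could look like this:

m = length(x1);        % x1: m-by-1 vector of raw feature values
n = 3;                 % highest polynomial degree to generate
X = ones(m, 1);        % intercept column x0 = 1
for p = 1:n
    X = [X, x1.^p];    % append x1^p as a new feature column
end

With three raw features, the same loop would run once per feature, which is exactly how the longer hypothesis above multiplies the parameter count.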
- Manually select which features to keep
- Use a model selection algorithm
2. Regularization
- Keep all the features, but reduce the magnitudes of the parameters θj
- This works well when we have many features, each of which contributes a little to predicting y
Concretely, regularization adds a penalty term to the cost function:
\lambda \sum_{j=1}^{n} \theta_{j}^{2}
Adding this penalty to the squared-error cost gives the regularized linear-regression objective:
J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^{2} + \lambda \sum_{j=1}^{n} \theta_{j}^{2} \right]
Here λ is the regularization parameter. Note that the penalty sum starts at j = 1: by convention the intercept θ0 is not regularized. If λ is set too large, every θj is driven toward zero and the hypothesis underfits.
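As a quick vectorized Octave sketch of this cost (my own illustration; X, y, theta, and lambda are assumed to already be defined, with X containing the intercept column):

m = length(y);                        % number of training examples
h = X * theta;                        % h_theta(x) for all examples at once
sqErr = sum((h - y).^2);              % sum of squared errors
reg = lambda * sum(theta(2:end).^2);  % penalty; theta(1), i.e. theta_0, is skipped
J = (sqErr + reg) / (2*m);            % regularized cost J(theta)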
1. Regularized linear regression
i. Gradient descent
Repeat {
First, the update for \theta_0, which is not regularized:
\theta_{0} := \theta_{0} - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_{0}^{(i)}
Then, for j = 1, 2, \dots, n, the regularized update:
\theta_{j} := \theta_{j} \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_{j}^{(i)}
}
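In vectorized Octave, one iteration of this loop might look like the following sketch (my own illustration; X, y, theta, alpha, and lambda are assumed to exist):

m = length(y);                     % number of training examples
h = X * theta;                     % predictions for all m examples
grad = (X' * (h - y)) / m;         % unregularized gradient, all components
theta_new = theta - alpha * grad;                                         % plain gradient step
theta_new(2:end) = theta_new(2:end) - alpha * (lambda/m) * theta(2:end);  % shrink theta_1..theta_n only
theta = theta_new;

Equivalently, each θj (j ≥ 1) is first multiplied by the shrink factor (1 − αλ/m) and then takes the usual gradient step, which is why this penalty is sometimes described as weight decay.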
ii. Normal equation (with regularization)
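The lecture also gives a closed-form solution; as I recall it (treat this as a reconstruction rather than a quote), with L the (n+1)×(n+1) identity matrix whose top-left entry is set to 0 so that θ0 is not penalized:

\theta = \left( X^{T} X + \lambda L \right)^{-1} X^{T} y, \qquad L = \mathrm{diag}(0, 1, 1, \dots, 1)

In Octave this is a few lines (variable names are mine):

nPlus1 = size(X, 2);              % n + 1 columns, including the intercept column
L = eye(nPlus1); L(1, 1) = 0;     % identity with a 0 for theta_0
theta = pinv(X' * X + lambda * L) * X' * y;

The lecture also notes that for λ > 0 the matrix X'X + λL is invertible even when m ≤ n, so regularization cures the non-invertibility problem of the plain normal equation as well.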
2. Regularized logistic regression
Before adding regularization, the cost function is:
J(\theta) = -\left[ \frac{1}{m} \sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_{\theta}(x^{(i)}) \right) \right]
After adding the regularization penalty, this becomes:
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)}) + \left( 1 - y^{(i)} \right) \log \left( 1 - h_{\theta}(x^{(i)}) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_{j}^{2}
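A vectorized Octave sketch of this regularized cost (again my own illustration; X, y, theta, and lambda are assumed to be defined):

g = @(z) 1 ./ (1 + exp(-z));   % sigmoid function
m = length(y);
h = g(X * theta);              % predicted probabilities for all examples
J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
    + (lambda / (2*m)) * sum(theta(2:end).^2);   % theta_0 not penalized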
i. Gradient descent
Repeat {
First, the update for \theta_0, which is not regularized:
\theta_{0} := \theta_{0} - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_{0}^{(i)}
Then, for j = 1, 2, \dots, n, the regularized update:
\theta_{j} := \theta_{j} - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_{j}^{(i)} + \frac{\lambda}{m} \theta_{j} \right]
}
Although these updates look identical to the linear-regression ones, here h_\theta(x) = 1 / (1 + e^{-\theta^T x}), so the algorithm is in fact different.
ii. Some advanced optimization algorithms (Advanced Algorithm)
These only require a user-supplied cost function that returns both J(θ) and its gradient, following the template:
function [jVal,gradient]=costFunction(theta)
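Completing that template for regularized logistic regression might look like the sketch below (only the one-line signature above is from the notes; the body, the extra X, y, and lambda arguments, and the fminunc call are my own reconstruction of the usual course pattern):

% costFunction.m -- sketch; assumes X includes the intercept column
function [jVal, gradient] = costFunction(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));                  % sigmoid hypothesis
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
         + (lambda / (2*m)) * sum(theta(2:end).^2);  % regularized cost
  gradient = (1/m) * (X' * (h - y));                 % gradient, all components
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end);  % skip theta_0
end

% Usage: hand the cost function to fminunc with the gradient enabled
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);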