Regularization
The problem of overfitting
- Underfitting -> high bias
- Overfitting -> high variance
- Overfitting: if we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples (e.g. predicting prices on new examples); that is, the model generalizes poorly.
Addressing overfitting
Options:
1) Reduce the number of features
- Manually select which features to keep
- Model selection algorithm
2) Regularization
- Keep all the features, but reduce the magnitude/values of the parameters.
- Works well when we have a lot of features, each of which contributes a bit to predicting y.
Cost function (regularized cost function)
Penalizing two of the parameter values for being large drives those parameters toward zero and effectively simplifies the hypothesis.
Adding a penalty term on $\theta_j$: in regularized linear regression, we choose $\theta$ to minimize the regularized cost.
The regularized linear regression cost function:

J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]

Note that the penalty sum runs over $j=1,\dots,n$ (the features); by convention the intercept $\theta_0$ is not penalized.
Goal:

\underset{\theta}{\min}\,J(\theta)
$\lambda$: the regularization parameter.
- What happens if $\lambda$ is very large? All the parameters $\theta_1,\dots,\theta_n$ are penalized heavily toward zero, so $h_\theta(x)\approx\theta_0$ (a flat line) and the model underfits (high bias).
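As a concrete illustration, here is a minimal NumPy sketch of this cost function. The function name `regularized_cost` and the layout of `X` (an m×(n+1) design matrix with a leading column of ones) are my own assumptions; the formula is the one above, with $\theta_0$ excluded from the penalty.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is m x (n+1) with a leading column of ones; theta has n+1 entries.
    theta[0] (the intercept) is not regularized.
    """
    m = len(y)
    residual = X @ theta - y                            # h_theta(x^(i)) - y^(i)
    fit_term = (residual @ residual) / (2 * m)          # (1/2m) * sum of squared errors
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)  # penalty skips theta_0
    return fit_term + reg_term
```

With `lam = 0` this reduces to the ordinary least-squares cost; increasing `lam` raises the cost of large parameter values.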
Regularized linear regression
Gradient descent
The gradient descent algorithm:
repeat:
\theta_0 := \theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}
\theta_j := \theta_j-\alpha\frac{1}{m}\left[\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right]
Equivalently:
\theta_0 := \theta_0-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}
θ
j
:
=
θ
j
(
1
−
α
1
m
)
−
α
1
m
∑
i
=
1
m
(
h
θ
(
x
(
i
)
)
−
y
(
i
)
)
x
j
(
i
)
\theta_j:= \theta_j(1-\alpha\frac{1}{m})-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
θj:=θj(1−αm1)−αm1i=1∑m(hθ(x(i))−y(i))xj(i)
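A minimal NumPy sketch of one such update step (the function name `gradient_descent_step` is my own; it implements the shrink-then-subtract form, leaving the intercept unshrunk):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step:
    theta_j := theta_j * (1 - alpha*lam/m) - alpha * (1/m) * sum(...) * x_j^(i),
    with theta_0 excluded from the shrink factor."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m                 # unregularized gradient, all j
    shrink = np.full_like(theta, 1 - alpha * lam / m)
    shrink[0] = 1.0                                  # theta_0 is not regularized
    return theta * shrink - alpha * grad
```

Iterating this step from `theta = 0` converges to the minimizer of the regularized cost for a suitably small learning rate `alpha`.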
Normal equation
The normal equation. Suppose:

m\leq n\ (\text{examples}\leq\text{features})
\theta=(X^TX)^{-1}X^Ty
If $\lambda>0$:

\theta=\left(X^TX+\lambda\underbrace{\begin{bmatrix} 0 \\ & 1 \\ && 1 \\ &&& \ddots \\ &&&& 1 \end{bmatrix}}_{(n+1)\times(n+1)}\right)^{-1}X^Ty
As long as $\lambda>0$, the matrix in parentheses is guaranteed to be non-singular, i.e., invertible (even when $m\leq n$).
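A minimal NumPy sketch of this closed-form solution (the function name `regularized_normal_equation` is my own; the matrix `L` is the identity with its (0,0) entry zeroed, matching the formula above):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form theta = (X'X + lam*L)^(-1) X'y, where L is the
    (n+1)x(n+1) identity with the (0,0) entry zeroed so theta_0
    is not penalized."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0                                  # do not regularize the intercept
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard numerically preferable choice for this kind of linear system.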
Regularized logistic regression
The regularized logistic regression cost function:
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right)+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

As before, the penalty sums over $j=1,\dots,n$ and excludes $\theta_0$.
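A minimal NumPy sketch of this cost (the names `sigmoid` and `regularized_logistic_cost` are my own; `X` again carries a leading column of ones, and $\theta_0$ is excluded from the penalty):

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost from the formula above."""
    m = len(y)
    h = sigmoid(X @ theta)                                   # h_theta(x^(i)) for all i
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam * (theta[1:] @ theta[1:]) / (2 * m)        # skips theta_0
    return cross_entropy + penalty
```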