With too few features, the model underfits: it cannot fit even the training data well.
With too many features, the model overfits: it fits the training data very well but fails to fit the test data.
To counter overfitting, a penalty term is added to the cost function: the more complex the model, the larger the penalty.
1. Regularized linear regression
Cost function:
$$
J(\vec{\theta}) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right)
$$
where:
$$
\vec{y}=\left[y^{(1)},y^{(2)},\dots,y^{(m)}\right]^T\in\mathbb{R}^{m\times1} \quad (m \text{ is the number of training samples})
$$
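As a minimal NumPy sketch of this cost (the function name is mine; it assumes $X$ is the $m\times(n+1)$ design matrix whose first column is all ones, so $\theta_0$ is the unpenalized bias):

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * (sum of squared residuals + lam * sum_{j>=1} theta_j^2).

    X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    """
    m = X.shape[0]
    residual = X @ theta - y                  # h(x^(i)) - y^(i) for all i
    penalty = lam * np.sum(theta[1:] ** 2)    # skip the bias theta_0
    return (residual @ residual + penalty) / (2 * m)
```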
Then:
$$
\frac{\partial J(\vec{\theta})}{\partial\theta_j}=
\begin{cases}
\frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0\\
\frac{1}{m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right) & j=1,2,\dots,n
\end{cases}
$$
Setting $\frac{\partial J(\vec{\theta})}{\partial\theta_j}=0$ gives:
$$
\begin{aligned}
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} & h(\vec{x}^{(2)})-y^{(2)} & \cdots & h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
=-\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
=-\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
=-\lambda\begin{pmatrix} 0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix} \\
&\begin{pmatrix} {\vec{x}^{(1)}}^T\vec{\theta}-y^{(1)} \\ {\vec{x}^{(2)}}^T\vec{\theta}-y^{(2)} \\ \vdots \\ {\vec{x}^{(m)}}^T\vec{\theta}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
=-\lambda\begin{pmatrix} \theta_0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix}
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix}
\end{aligned}
$$
The last step uses $h(\vec{x}^{(i)})={\vec{x}^{(i)}}^T\vec{\theta}$, so with the design matrix $X$ (whose $i$-th row is ${\vec{x}^{(i)}}^T$) the system can be written and solved in closed form:
$$
\begin{aligned}
(X\vec{\theta}-\vec{y})^TX &= -\lambda\vec{\theta}^T
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix} \\
\vec{\theta} &= \left(X^TX+\lambda
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix}\right)^{-1}X^T\vec{y}
\end{aligned}
$$
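A minimal NumPy sketch of this closed-form solution (the function name is mine; it assumes, as above, that $X$ carries a leading column of ones):

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve theta = (X^T X + lam * D)^(-1) X^T y, where D is the
    identity with D[0, 0] = 0 so the bias theta_0 is not penalized."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0            # the top-left 0 in the diagonal matrix above
    # solve() is preferred over an explicit inverse for numerical stability
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

A side benefit of the penalty: for $\lambda>0$, $X^TX+\lambda D$ is typically invertible even when $X^TX$ itself is singular.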
Gradient descent gives the update rules:
$$
\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}
$$
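The same updates in vectorized NumPy (a sketch; the function name, learning-rate default, and iteration count are mine). The $(1-\alpha\lambda/m)\theta_j$ form above is just the penalty gradient $\frac{\lambda}{m}\theta_j$ folded into the update:

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.01, n_iters=1000):
    """Batch gradient descent for L2-regularized linear regression.

    X is m x (n+1) with a leading column of ones; the penalty gradient
    (lam/m) * theta_j is applied only for j = 1..n, never to theta_0.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = X @ theta - y              # theta^T x^(i) - y^(i), all i
        grad = (X.T @ residual) / m           # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]     # shrink only j = 1..n
        theta -= alpha * grad
    return theta
```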
2. Regularized logistic regression
Cost function:
$$
J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\ln\left(h(\vec{x}^{(i)})\right)+\left(1-y^{(i)}\right)\ln\left(1-h(\vec{x}^{(i)})\right)\right)+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$$

where $h(\vec{x})=\dfrac{1}{1+e^{-\vec{\theta}^T\vec{x}}}$ is the sigmoid hypothesis.
Gradient descent gives update rules identical in form to the linear case, but with the sigmoid $h$:
$$
\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}
$$
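A corresponding NumPy sketch (again the function name and defaults are mine); the only change from the linear case is the sigmoid inside the residual:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, lam, alpha=0.1, n_iters=1000):
    """Batch gradient descent for L2-regularized logistic regression.
    X is m x (n+1) with a leading column of ones; theta_0 is not penalized."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = sigmoid(X @ theta) - y     # h(x^(i)) - y^(i)
        grad = (X.T @ residual) / m
        grad[1:] += (lam / m) * theta[1:]     # skip the bias term
        theta -= alpha * grad
    return theta
```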
3. Lasso regression
That is, an L1 regularization term is added to the cost function.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\alpha\sum_{j=1}^{n}|\theta_j|
$$
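Because $|\theta_j|$ is not differentiable at 0, Lasso is usually solved with coordinate descent rather than plain gradient descent; in practice one typically calls a library such as scikit-learn (a usage sketch with toy data; note scikit-learn scales the squared-error term by $1/(2m)$, so its `alpha` is not numerically identical to the $\alpha$ above):

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.random.randn(100, 5)                 # toy data: 100 samples, 5 features
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)        # L1 drives some coefficients to exactly 0
```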
4. Ridge regression
That is, an L2 regularization term is added to the cost function. The regularized linear regression and logistic regression of Sections 1 and 2 are exactly ridge regression.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\alpha\sum_{j=1}^{n}\theta_j^2
$$
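The closed-form and gradient-descent sketches in Section 1 already implement this; with scikit-learn it is one call (a usage sketch; scikit-learn's Ridge minimizes $\|y-Xw\|_2^2+\alpha\|w\|_2^2$ and fits the unpenalized intercept separately):

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.random.randn(100, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)        # coefficients are shrunk toward 0 but rarely exactly 0
```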
5. ElasticNet regression
That is, both L1 and L2 regularization terms are added to the cost function.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\sum_{j=1}^{n}\left(\alpha\rho|\theta_j|+\frac{\alpha(1-\rho)}{2}\theta_j^2\right)
$$
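Here $\rho\in[0,1]$ trades off the two penalties: $\rho=1$ recovers Lasso and $\rho=0$ recovers ridge. In scikit-learn this $\rho$ corresponds to the `l1_ratio` parameter (a usage sketch):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

X = np.random.randn(100, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

# alpha sets the overall penalty strength, l1_ratio plays the role of rho
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)
```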