Regularization for Linear Regression
Hypothesis
$$h_{\theta}\left( \boldsymbol{x} \right) =\boldsymbol{\theta }^T\boldsymbol{x}=\theta _0x_0+\theta _1x_1+\theta _2x_2+\cdots +\theta _nx_n$$
Cost function
$$J\left( \boldsymbol{\theta } \right) =\frac{1}{2m}\left[ \sum_{i=1}^m{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) ^2}+\lambda \sum_{j=1}^n{\theta _j^2} \right]$$
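The regularized cost can be computed directly from this formula. A minimal NumPy sketch (function and variable names are illustrative, not from the notes); note that `theta[0]` is excluded from the penalty, matching the sum starting at $j=1$:

```python
import numpy as np

def linear_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    """
    m = X.shape[0]
    residuals = X @ theta - y                # h_theta(x^(i)) - y^(i)
    penalty = lam * np.sum(theta[1:] ** 2)   # lambda * sum_{j>=1} theta_j^2
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Toy data on the line y = 2x: the fit is perfect, so only the
# penalty term lam * theta_1^2 / (2m) = 1 * 4 / 6 remains.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])
print(linear_cost(theta, X, y, lam=1.0))  # → 0.666...
```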
The added penalty term constrains the magnitude of each $\theta_j$, keeping the parameters from growing too large, which helps prevent overfitting. Note that the penalty sum starts at $j=1$, so the bias term $\theta_0$ is not penalized.
Gradient descent
$$\theta _j:=\theta _j-\alpha \frac{\partial}{\partial \theta _j}J\left( \boldsymbol{\theta } \right) \quad \left( j=0,1,2,\dots ,n \right)$$
$$\begin{cases} \theta _0:=\theta _0-\alpha \frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) x_{0}^{\left( i \right)}}& j=0\\ \theta _j:=\theta _j-\alpha \frac{1}{m}\left[ \sum_{i=1}^m{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) x_{j}^{\left( i \right)}}+\lambda \theta _j \right]& j>0\\ \end{cases}$$
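One update step of this case split can be sketched as follows (NumPy; the learning rate, λ, and iteration count are illustrative choices, not from the notes):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression.

    Implements the case split above: theta_0 receives no lambda term.
    """
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m      # unregularized gradient, all j
    grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j > 0
    return theta - alpha * grad

# Repeated steps on y = 2x data drive theta toward (0, 2) when lam is small.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1, lam=0.01)
print(theta)  # close to [0, 2]
```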
Supplement: regularizing the normal equation
$$\boldsymbol{X}_{m\times \left( n+1 \right)}=\left[ \begin{array}{c} \left( \boldsymbol{x}^{\left( 1 \right)} \right) ^T\\ \vdots\\ \left( \boldsymbol{x}^{\left( m \right)} \right) ^T\\ \end{array} \right] ,\quad \boldsymbol{\theta }=\left[ \begin{array}{c} \theta _0\\ \vdots\\ \theta _n\\ \end{array} \right] ,\quad \boldsymbol{y}=\left[ \begin{array}{c} y^{\left( 1 \right)}\\ \vdots\\ y^{\left( m \right)}\\ \end{array} \right]$$
$$\boldsymbol{\theta }=\left( \boldsymbol{X}^T\boldsymbol{X}+\lambda \underset{\left( n+1 \right) \times \left( n+1 \right)}{\underbrace{\left[ \begin{matrix} 0& & & & \\ & 1& & & \\ & & 1& & \\ & & & \ddots& \\ & & & & 1\\ \end{matrix} \right] }} \right) ^{-1}\boldsymbol{X}^T\boldsymbol{y}$$
Adding this special matrix (scaled by $\lambda > 0$) makes the term being inverted invertible, so the equation always has a solution, even when $\boldsymbol{X}^T\boldsymbol{X}$ alone is singular.
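The closed-form solution translates directly into code. A sketch assuming NumPy (the function name is illustrative); the matrix `L` is the identity with its top-left entry zeroed, so $\theta_0$ is not penalized:

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y for theta.

    L = identity with L[0, 0] = 0, matching the matrix in the formula above.
    """
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0
    # np.linalg.solve is preferred over explicitly forming the inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(regularized_normal_equation(X, y, lam=0.0))  # exact fit: [0, 2]
```

Increasing `lam` shrinks the slope estimate toward zero while leaving the intercept unpenalized.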
Regularization for Logistic Regression
Hypothesis
$$h_{\theta}\left( \boldsymbol{x} \right) =\left( 1+e^{-\boldsymbol{\theta }^T\boldsymbol{x}} \right) ^{-1}$$
Cost function
$$J\left( \boldsymbol{\theta } \right) =\frac{1}{m}\sum_{i=1}^m{\mathrm{Cost}\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) ,y^{\left( i \right)} \right)}+\frac{\lambda}{2m}\sum_{j=1}^n{\theta _j^2}$$
That is,
$$J\left( \boldsymbol{\theta } \right) =\frac{1}{m}\sum_{i=1}^m{\left[ -y^{\left( i \right)}\log \left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) \right) +\left( y^{\left( i \right)}-1 \right) \log \left( 1-h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) \right) \right]}+\frac{\lambda}{2m}\sum_{j=1}^n{\theta _j^2}$$
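This expanded cost can be evaluated as written. A NumPy sketch (names are illustrative); the term $(y^{(i)}-1)\log(1-h_\theta(\boldsymbol{x}^{(i)}))$ is the same as the more common $-(1-y^{(i)})\log(1-h_\theta(\boldsymbol{x}^{(i)}))$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized cross-entropy cost; theta_0 is excluded from the penalty."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    # -y*log(h) + (y - 1)*log(1 - h)  ==  -y*log(h) - (1 - y)*log(1 - h)
    cross_entropy = -y * np.log(h) + (y - 1) * np.log(1 - h)
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return np.mean(cross_entropy) + penalty

# At theta = 0 the hypothesis is 0.5 everywhere, so each example
# contributes log(2) and the penalty vanishes.
X = np.array([[1.0, -2.0], [1.0, 2.0]])
y = np.array([0.0, 1.0])
print(logistic_cost(np.zeros(2), X, y, lam=1.0))  # → log(2) ≈ 0.6931
```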
Gradient descent
$$\theta _j:=\theta _j-\alpha \frac{\partial}{\partial \theta _j}J\left( \boldsymbol{\theta } \right) \quad \left( j=0,1,2,\dots ,n \right)$$
which expands to
$$\begin{cases} \theta _0:=\theta _0-\alpha \frac{1}{m}\sum_{i=1}^m{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) x_{0}^{\left( i \right)}}& j=0\\ \theta _j:=\theta _j-\alpha \frac{1}{m}\left[ \sum_{i=1}^m{\left( h_{\theta}\left( \boldsymbol{x}^{\left( i \right)} \right) -y^{\left( i \right)} \right) x_{j}^{\left( i \right)}}+\lambda \theta _j \right]& j>0\\ \end{cases}$$
Although this update rule looks identical to the linear-regression case, here $h_{\theta}$ is the sigmoid hypothesis, so the two algorithms are not the same.
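The full regularized logistic update can be sketched on a tiny separable dataset (NumPy; the learning rate, λ, data, and names are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One step of the case-split update above; the lambda term skips theta_0."""
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / m   # same form as linear case,
    grad[1:] += (lam / m) * theta[1:]           # but h_theta is the sigmoid
    return theta - alpha * grad

# Separable toy data: the label is 1 exactly when the feature is positive.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
for _ in range(1000):
    theta = logistic_gradient_step(theta, X, y, alpha=0.5, lam=0.1)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # matches y
```

With λ > 0 the weights stay bounded even on separable data, where unregularized logistic regression would drive them toward infinity.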