Deriving the Loss Function
Starting from the initial hypothesis:
$$h_{\theta}(x)=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\dots$$
In vector form this simplifies to:
$$h_{\theta}(x)=\sum_{i=0}^{n}\theta_{i}x_{i}=\begin{pmatrix}\theta_{0}&\theta_{1}&\cdots&\theta_{n}\end{pmatrix}\begin{pmatrix}x_{0}\\x_{1}\\\vdots\\x_{n}\end{pmatrix}=\Theta^{T}x$$
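The vectorized hypothesis can be sketched in Python. This is a minimal illustration with hypothetical numbers (not from the text), showing that the elementwise sum and the inner product $\Theta^{T}x$ agree; by convention $x_{0}=1$ so that $\theta_{0}$ acts as the bias.

```python
import numpy as np

# Hypothetical parameter vector and one sample; x_0 = 1 is the bias term
theta = np.array([2.0, 0.5, -1.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 3.0, 4.0])        # x_0 = 1, x_1, x_2

# Elementwise form: sum over theta_i * x_i
h_sum = np.sum(theta * x)

# Vectorized form: the inner product Theta^T x
h_dot = theta @ x

print(h_sum, h_dot)  # both give 2.0*1 + 0.5*3 - 1.0*4 = -0.5
```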
We also assume independent, identically distributed error terms $\varepsilon^{(i)}$, which yields:
$$y^{(i)}=\Theta^{T}x^{(i)}+\varepsilon^{(i)}$$
The error terms follow a normal (Gaussian) distribution with mean $0$ and variance $\sigma^{2}$ (recall the Gaussian density formula if needed):
$$P\left(\varepsilon^{(i)}\right)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(\varepsilon^{(i)}\right)^{2}}{2\sigma^{2}}\right)$$
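As a quick sanity check of the density above, here is a minimal sketch (hypothetical values) confirming that the peak at $\varepsilon=0$ equals $\frac{1}{\sqrt{2\pi}\sigma}$ and that the density is symmetric:

```python
import math

def gaussian_pdf(eps, sigma):
    """Density of N(0, sigma^2) at eps, exactly as in the formula above."""
    return math.exp(-eps**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# The maximum is at eps = 0, where the density is 1 / (sqrt(2*pi) * sigma)
p0 = gaussian_pdf(0.0, 1.0)
print(p0)  # ~0.3989, i.e. 1/sqrt(2*pi)

# Symmetry: P(eps) = P(-eps)
print(gaussian_pdf(1.0, 2.0) == gaussian_pdf(-1.0, 2.0))  # True
```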
Substituting $\varepsilon^{(i)}=y^{(i)}-\Theta^{T}x^{(i)}$ into the density above:
$$P\left(y^{(i)}\mid x^{(i)};\theta\right)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right)$$
Here we introduce the likelihood function:
$$L\left(\theta\right)=\prod_{i=1}^{m}p\left(y^{(i)}\mid x^{(i)};\theta\right)$$
We want the $\theta$ that maximizes this likelihood. To simplify, take the logarithm of both sides:
$$\ell\left(\theta\right)=\ln L\left(\theta\right)=\ln\prod_{i=1}^{m}p\left(y^{(i)}\mid x^{(i)};\theta\right)=\ln\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right)$$
$$=\sum_{i=1}^{m}\ln\left(\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right)\right)$$
$$=\sum_{i=1}^{m}\left(\ln\frac{1}{\sqrt{2\pi}\sigma}+\ln\exp\left(-\frac{\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}\right)\right)$$
$$=m\ln\frac{1}{\sqrt{2\pi}\sigma}-\sum_{i=1}^{m}\frac{\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}}{2\sigma^{2}}$$
$$=m\ln\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}$$
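The algebra above can be checked numerically. The sketch below (hypothetical residuals, not from the text) compares the log of the product of Gaussian densities against the simplified closed form; the two agree to floating-point precision:

```python
import math
import random

random.seed(0)
sigma = 1.5
m = 20
# Hypothetical residuals r_i = y^(i) - Theta^T x^(i)
residuals = [random.gauss(0.0, sigma) for _ in range(m)]

def pdf(r):
    return math.exp(-r**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Left-hand side: ln of the product of the m densities
lhs = math.log(math.prod(pdf(r) for r in residuals))

# Right-hand side: the simplified expression derived above
rhs = (m * math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))
       - (1.0 / sigma**2) * 0.5 * sum(r**2 for r in residuals))

print(abs(lhs - rhs) < 1e-9)  # True: both forms agree
```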
The simplification above uses only the basic logarithm identities $\ln(ab)=\ln a+\ln b$ and $\ln e^{x}=x$.
We need the maximum of this log-likelihood. The first term $m\ln\frac{1}{\sqrt{2\pi}\sigma}$ is a constant, and the second term enters with a negative sign, so maximizing $\ell(\theta)$ is equivalent to minimizing the remaining sum. This motivates the loss function $J(\theta)$: the likelihood is maximal exactly when the loss is minimal:
$$J\left(\theta\right)=\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)}-\Theta^{T}x^{(i)}\right)^{2}$$
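The loss can be written directly as code. A minimal sketch, assuming a design matrix `X` whose rows are the samples $x^{(i)}$ (with a leading bias column of ones); the numbers are hypothetical:

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) = 1/2 * sum of squared residuals, as in the formula above."""
    residuals = y - X @ theta
    return 0.5 * np.sum(residuals ** 2)

# Hypothetical data: 3 samples, bias column x_0 = 1 plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

print(loss(np.array([1.0, 1.0]), X, y))  # 0.0: theta = (1, 1) fits exactly
```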
Minimizing the Loss Function
- Matrix (normal equation) method
  Step 1: expand $J(\theta)$ in matrix form; step 2: take the partial derivative and set it to zero.
1. $$J\left(\theta\right)=\frac{1}{2}\left(X\Theta-Y\right)^{T}\left(X\Theta-Y\right)=\frac{1}{2}\left(\Theta^{T}X^{T}-Y^{T}\right)\left(X\Theta-Y\right)$$
$$=\frac{1}{2}\left(\Theta^{T}X^{T}X\Theta-\Theta^{T}X^{T}Y-Y^{T}X\Theta+Y^{T}Y\right)$$
2. $$\frac{\partial J\left(\theta\right)}{\partial\theta}=\frac{1}{2}\left(2X^{T}X\Theta-X^{T}Y-\left(Y^{T}X\right)^{T}+0\right)=X^{T}X\Theta-X^{T}Y=0$$
$$\Theta=\left(X^{T}X\right)^{-1}X^{T}Y$$
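The normal equation can be sketched as follows (hypothetical data; in practice one solves the linear system rather than explicitly inverting $X^{T}X$):

```python
import numpy as np

def normal_equation(X, y):
    """Solve Theta = (X^T X)^{-1} X^T y; assumes X^T X is invertible."""
    # Solving the system X^T X Theta = X^T y is more stable than inverting
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta = normal_equation(X, y)
print(theta)  # [1. 1.], recovering y = 1 + 1*x exactly
```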
This yields $\Theta$ in closed form, but it requires $X^{T}X$ to be invertible.
- Gradient descent method
The basic update rule of gradient descent is
$$\theta_{j}=\theta_{j}-\alpha\frac{\partial J\left(\theta\right)}{\partial\theta_{j}}$$
Each $\theta_{j}$ is repeatedly moved in the direction opposite the gradient, i.e. the partial derivative $\frac{\partial J\left(\theta\right)}{\partial\theta_{j}}$, scaled by the learning rate (step size) $\alpha$, in order to reach the optimum.
Taking the loss averaged over the $m$ samples, the partial derivative is:
$$\frac{\partial J\left(\theta\right)}{\partial\theta_{j}}=\frac{1}{2m}\sum_{i=1}^{m}\left(2\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right)\frac{\partial\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right)}{\partial\theta_{j}}\right)$$
$$=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right)x_{j}^{(i)}$$
Substituting into the basic update rule gives the batch gradient descent formula:
$$\theta_{j}=\theta_{j}-\frac{\alpha}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{j}^{(i)}$$
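The batch update can be sketched in vectorized form, which applies the formula above to all $\theta_{j}$ at once. The data, learning rate, and iteration count below are hypothetical choices:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta_j -= alpha/m * sum_i (h(x_i) - y_i) * x_ij for all j."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        gradient = (X.T @ (X @ theta - y)) / m  # all partial derivatives at once
        theta -= alpha * gradient
    return theta

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])

theta = batch_gradient_descent(X, y)
print(theta)  # converges toward [1. 1.]
```

On this small example the iterates approach the same solution the normal equation gives in closed form.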
By repeatedly applying this update, the parameters converge to the optimal $\theta$.