Derivation of the Linear Regression Model
Linear fitting model:
$$
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x_1 + \theta_2 x_2 \\
h_\theta(x) &= \sum_{i=0}^{n} \theta_i x_i = \theta^T x \cdots ①
\end{aligned}
$$
Here $x_0 = 1$, so the intercept $\theta_0$ is absorbed into the sum.
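As a small illustration of the vector form in ①, the following sketch checks that the expanded sum and the dot product agree. The parameter and feature values are made up for the example, and the use of NumPy is an assumption, not something the original derivation prescribes.

```python
import numpy as np

# Hypothetical parameters: theta_0 (intercept), theta_1, theta_2.
theta = np.array([1.0, 2.0, 3.0])
# Hypothetical sample with two features: x_1, x_2.
features = np.array([0.5, -1.0])

# Prepend x_0 = 1 so that theta_0 is absorbed into theta^T x.
x = np.concatenate(([1.0], features))

# Expanded form: theta_0 + theta_1 * x_1 + theta_2 * x_2.
h_expanded = theta[0] + theta[1] * features[0] + theta[2] * features[1]
# Vector form: theta^T x.
h_vector = theta @ x

print(h_expanded, h_vector)
```

Both expressions evaluate the same hypothesis; the vector form is what the later matrix derivation relies on.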
Error: the difference between the true value and the predicted value, denoted $\varepsilon$.
For each sample:
$$
y_i = \theta^T x_i + \varepsilon_i \cdots ②
$$
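Assumption: the errors $\varepsilon_i$ are independent and identically distributed, following a Gaussian distribution with mean 0 and variance $\sigma^2$.
Since the error follows a Gaussian distribution, its density is: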
$$
p(\varepsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\varepsilon_i)^2}{2\sigma^2}\right)
$$
From ②, $\varepsilon_i = y_i - \theta^T x_i$; substituting this into the density above gives:
$$
p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)
$$
Use the likelihood function to solve for the optimal parameters.
Likelihood function:
$$
L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)
$$
Taking the logarithm turns the product into a sum, giving the log-likelihood:
$$
\begin{aligned}
\log L(\theta) &= \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right) \\
&= \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} (y_i - \theta^T x_i)^2
\end{aligned}
$$
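The decomposition of the log-likelihood into a constant term minus a scaled half sum of squared residuals can be checked numerically. The sketch below uses synthetic data; the sample size, noise level, parameter values, and helper names (`log_likelihood`, `half_squared_error`) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 0.5

# Synthetic design matrix with a leading column of ones (x_0 = 1).
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
true_theta = np.array([1.0, 2.0, -1.0])
y = X @ true_theta + rng.normal(scale=sigma, size=m)

def log_likelihood(theta):
    """Sum of the Gaussian log-densities of the residuals y_i - theta^T x_i."""
    residuals = y - X @ theta
    log_density = -np.log(np.sqrt(2 * np.pi) * sigma) - residuals**2 / (2 * sigma**2)
    return log_density.sum()

def half_squared_error(theta):
    """Half the sum of squared residuals."""
    residuals = y - X @ theta
    return 0.5 * np.sum(residuals**2)

# The theta-independent first term of the log-likelihood.
constant = m * np.log(1.0 / (np.sqrt(2 * np.pi) * sigma))

for theta in (np.array([1.0, 2.0, -1.0]), np.zeros(3)):
    lhs = log_likelihood(theta)
    rhs = constant - half_squared_error(theta) / sigma**2
    print(np.isclose(lhs, rhs))
```

Because the constant does not depend on the parameters, a parameter vector with a smaller squared error always has a larger log-likelihood.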
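Goal: maximize the likelihood. The first term of the log-likelihood does not depend on $\theta$, so this is equivalent to minimizing the sum of squared residuals, which is the least-squares method.
Let: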
$$
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y_i - \theta^T x_i)^2
$$
Simplifying into matrix form:
$$
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right)^2 = \frac{1}{2} (X\theta - y)^T (X\theta - y)
$$
Here the $i$-th row of $X$ is $x_i^T$ and $y$ is the column vector of the $y_i$.
Taking the gradient with respect to $\theta$:
$$
\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \left( \frac{1}{2} (X\theta - y)^T (X\theta - y) \right) \\
&= \nabla_\theta \left( \frac{1}{2} (\theta^T X^T - y^T)(X\theta - y) \right) \\
&= \nabla_\theta \left( \frac{1}{2} (\theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y) \right) \\
&= \frac{1}{2} \left( 2 X^T X \theta - X^T y - (y^T X)^T \right) \\
&= X^T X \theta - X^T y
\end{aligned}
$$
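One way to sanity-check the closed-form gradient is to compare it with a central finite-difference approximation of the cost. This is only a sketch on random data; the shapes, the step size `eps`, and the helper name `cost` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
theta = rng.normal(size=3)

def cost(t):
    """J(theta) = 1/2 (X theta - y)^T (X theta - y)."""
    r = X @ t - y
    return 0.5 * r @ r

# Closed-form gradient from the derivation: X^T X theta - X^T y.
analytic = X.T @ X @ theta - X.T @ y

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e) - cost(theta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))
```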
Because $J(\theta)$ is convex, it attains its minimum where the gradient is zero. Assuming $X^T X$ is invertible, this gives $\theta = (X^T X)^{-1} X^T y$.
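A minimal sketch of this closed-form solution on synthetic data follows. The data, the noise level, and the choice to call `np.linalg.solve` on the normal equations (rather than forming the inverse explicitly) are assumptions made for illustration; `np.linalg.lstsq` is included only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 200

# Synthetic data: a column of ones for the intercept plus two features.
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
true_theta = np.array([0.5, -2.0, 3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=m)

# Solve the normal equations X^T X theta = X^T y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient X^T X theta - X^T y should vanish at the solution.
gradient_at_optimum = X.T @ X @ theta_hat - X.T @ y

print(theta_hat)
print(np.allclose(theta_hat, theta_lstsq))
```

Solving the linear system is usually preferred over computing the inverse because it is cheaper and numerically more stable; when $X^T X$ is singular or badly conditioned, a least-squares solver is the safer choice.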
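Evaluation method
The most commonly used evaluation metric is $R^2$:
$$
R^2 = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}}
$$
It measures how much of the variation in the dependent variable the model explains. Written out: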
$$
R^2 = 1 - \frac{\sum_{i=1}^{m} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}
$$
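The formula translates directly into a few lines of code. The function name `r_squared` and the sample values below are hypothetical, chosen so that the two boundary cases are easy to verify by hand.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_pred - y_true) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - rss / tss

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions leave no residual, so R^2 is 1.
perfect = r_squared(y_true, y_true)
# Always predicting the mean explains nothing, so R^2 is 0.
mean_only = r_squared(y_true, np.full(4, y_true.mean()))

print(perfect, mean_only)
```

A model that predicts worse than the mean yields a negative value, so $R^2$ is bounded above by 1 but not below by 0.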