Simple Linear Regression
Let the model be a univariate linear function:

$$
y = w_1 x + w_0
$$
Given samples $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, we fit this linear function to obtain $w_1^*, w_0^*$.
Let $\hat y_i$ be the fitted prediction for $x_i$. Take the sum of squared errors as the loss function $L(w_1^*, w_0^*)$; minimizing it yields $w_1^*, w_0^*$, so the objective is:

$$
\argmin_{w_1^*, w_0^*} L(w_1^*, w_0^*)
$$
$$
L(w_1^*, w_0^*) = \sum_{i=1}^n (\hat y_i - y_i)^2 \tag{1}
$$

$$
\hat y_i = w_1^* x_i + w_0^* \tag{2}
$$
Combining (1) and (2):
$$
\begin{aligned}
L &= \sum_{i=1}^n (w_1^* x_i + w_0^* - y_i)^2\\
&= \sum_{i=1}^n \left((w_1^* x_i)^2 + (w_0^*)^2 + y_i^2 + 2 w_1^* w_0^* x_i - 2 w_1^* x_i y_i - 2 w_0^* y_i\right)
\end{aligned} \tag{3}
$$
Taking the partial derivatives of (3) with respect to $w_1^*, w_0^*$:
$$
\begin{aligned}
\frac{\partial L}{\partial w_0^*} &= \sum_{i=1}^n (2 w_0^* + 2 w_1^* x_i - 2 y_i) = 2 n w_0^* + 2 w_1^* \sum_{i=1}^n x_i - 2 \sum_{i=1}^n y_i\\
\frac{\partial L}{\partial w_1^*} &= \sum_{i=1}^n (2 x_i^2 w_1^* + 2 x_i w_0^* - 2 x_i y_i) = 2 w_1^* \sum_{i=1}^n x_i^2 + 2 w_0^* \sum_{i=1}^n x_i - 2 \sum_{i=1}^n x_i y_i
\end{aligned}
$$
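These analytic partial derivatives can be checked numerically with central finite differences. A quick sketch, assuming NumPy; the data and evaluation point are illustrative:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])

def loss(w1, w0):
    # L = sum of squared errors, Eq. (1)-(2)
    return np.sum((w1 * x + w0 - y) ** 2)

def grad(w1, w0):
    """Analytic partials from the derivation above."""
    dw0 = 2 * len(x) * w0 + 2 * w1 * x.sum() - 2 * y.sum()
    dw1 = 2 * w1 * (x * x).sum() + 2 * w0 * x.sum() - 2 * (x * y).sum()
    return dw1, dw0

# compare against central finite differences at an arbitrary point
w1, w0, h = 1.5, 0.5, 1e-6
num_dw1 = (loss(w1 + h, w0) - loss(w1 - h, w0)) / (2 * h)
num_dw0 = (loss(w1, w0 + h) - loss(w1, w0 - h)) / (2 * h)
```

Since the loss is quadratic in $w_1, w_0$, the central difference agrees with the analytic gradient up to floating-point rounding.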
Setting both partial derivatives to 0:
$$
\left\{
\begin{aligned}
n w_0^* + w_1^* \sum_{i=1}^n x_i - \sum_{i=1}^n y_i &= 0\\
w_1^* \sum_{i=1}^n x_i^2 + w_0^* \sum_{i=1}^n x_i - \sum_{i=1}^n x_i y_i &= 0
\end{aligned}
\right. \tag{4}
$$
Solving system (4) gives:
$$
\left\{
\begin{aligned}
w_0^* &= \frac{\sum_{i=1}^n y_i - w_1^* \sum_{i=1}^n x_i}{n}\\
w_1^* &= \frac{\sum_{i=1}^n x_i y_i - w_0^* \sum_{i=1}^n x_i}{\sum_{i=1}^n x_i^2}
\end{aligned}
\right. \tag{5}
$$
Substituting the two equations of (5) into each other gives:
$$
\left\{
\begin{aligned}
w_0^* &= \frac{\sum_{i=1}^n y_i - \frac{\sum_{i=1}^n x_i \sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}}{n - \frac{\left(\sum_{i=1}^n x_i\right)^2}{\sum_{i=1}^n x_i^2}}\\
w_1^* &= \frac{\sum_{i=1}^n x_i y_i - \frac{\sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n}}{\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}}
\end{aligned}
\right. \tag{6}
$$
Once $w_1^*$ is computed from (6), it can be substituted into (5) to compute $w_0^*$ directly.
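The closed-form solution can be sketched in a few lines of Python (NumPy assumed; the function name and data are illustrative):

```python
import numpy as np

def fit_simple_linear(x, y):
    """Closed-form least squares for y = w1*x + w0, following Eqs. (5)-(6)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sx, sy = x.sum(), y.sum()
    sxx, sxy = (x * x).sum(), (x * y).sum()
    # w1 from Eq. (6): (Σ x_i y_i − Σx_i Σy_i / n) / (Σ x_i² − (Σ x_i)² / n)
    w1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    # w0 from Eq. (5): (Σ y_i − w1 Σ x_i) / n
    w0 = (sy - w1 * sx) / n
    return w1, w0

# points lying exactly on y = 2x + 1, so the fit should recover w1 = 2, w0 = 1
w1, w0 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

Note that the denominator of $w_1^*$ vanishes when all $x_i$ are identical, in which case the slope is not identifiable.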
Multiple Linear Regression (Least Squares)
Extending the method above to $w_2, w_3, \cdots$ gives the solution for multiple linear regression, but deriving each coefficient one by one is tedious.
Instead, write the model directly as a multivariate function:

$$
y = \mathbf x \mathbf w
$$
where $\mathbf w = [w_1, w_2, \dots, w_n]^\mathrm T$, $\mathbf x = [x_1, x_2, \dots, x_n]$, and $x_1 = 1$. The constant term is treated as the coefficient of a dummy feature, which is why $x_1 = 1$.
Now for the multiple linear regression derivation. Let $\hat{\mathbf w}$ be the coefficient vector to be solved for, $\mathbf y = [y_1, y_2, \dots, y_n]^\mathrm T$, and $\mathbf X = [\mathbf x_1, \mathbf x_2, \dots, \mathbf x_n]^\mathrm T$, where each row $\mathbf x_i$ of $\mathbf X$ is one sample. Define the loss function:
$$
L(\hat{\mathbf w}) = \Vert \mathbf y - \mathbf X \hat{\mathbf w} \Vert_2^2
$$
The objective is:

$$
\argmin_{\hat{\mathbf w}} \Vert \mathbf y - \mathbf X \hat{\mathbf w} \Vert_2^2
$$
$$
\begin{aligned}
\Vert \mathbf y - \mathbf X \hat{\mathbf w} \Vert_2^2 &= (\mathbf y - \mathbf X \hat{\mathbf w})^\mathrm T (\mathbf y - \mathbf X \hat{\mathbf w})\\
&= (\mathbf y^\mathrm T - \hat{\mathbf w}^\mathrm T \mathbf X^\mathrm T)(\mathbf y - \mathbf X \hat{\mathbf w})\\
&= \mathbf y^\mathrm T \mathbf y + \hat{\mathbf w}^\mathrm T \mathbf X^\mathrm T \mathbf X \hat{\mathbf w} - \hat{\mathbf w}^\mathrm T \mathbf X^\mathrm T \mathbf y - \mathbf y^\mathrm T \mathbf X \hat{\mathbf w}
\end{aligned}
$$
Using the matrix-calculus identities (in denominator layout):

$$
\begin{aligned}
\frac{\mathrm d\, \mathbf x^\mathrm T \mathbf A \mathbf x}{\mathrm d \mathbf x} &= (\mathbf A + \mathbf A^\mathrm T)\mathbf x\\
\frac{\mathrm d\, \mathbf x^\mathrm T \mathbf A}{\mathrm d \mathbf x} &= \mathbf A\\
\frac{\mathrm d\, \mathbf A \mathbf x}{\mathrm d \mathbf x} &= \mathbf A^\mathrm T
\end{aligned}
$$
differentiate $L(\hat{\mathbf w})$:

$$
\frac{\partial L(\hat{\mathbf w})}{\partial \hat{\mathbf w}} = 2\mathbf X^\mathrm T \mathbf X \hat{\mathbf w} - \mathbf X^\mathrm T \mathbf y - \mathbf X^\mathrm T \mathbf y = 2\mathbf X^\mathrm T \mathbf X \hat{\mathbf w} - 2\mathbf X^\mathrm T \mathbf y
$$
To find the minimum, set the derivative to 0, which gives (assuming $\mathbf X^\mathrm T \mathbf X$ is invertible):

$$
\hat{\mathbf w} = (\mathbf X^\mathrm T \mathbf X)^{-1} \mathbf X^\mathrm T \mathbf y
$$
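A minimal sketch of this normal-equation solution, assuming NumPy; in practice `np.linalg.solve` on $\mathbf X^\mathrm T \mathbf X \hat{\mathbf w} = \mathbf X^\mathrm T \mathbf y$ is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

def fit_ols(X, y):
    """Solve the normal equations X^T X w = X^T y for the least squares fit.

    X must already contain a column of ones for the intercept
    (the x_1 = 1 convention above).
    """
    XtX = X.T @ X
    Xty = X.T @ y
    # solve() avoids explicitly computing (X^T X)^{-1}
    return np.linalg.solve(XtX, Xty)

# design matrix for y = 1 + 2*x over x = 0..3; first column is the bias term
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([1.0, 3.0, 5.0, 7.0])
w_hat = fit_ols(X, y)
```

For this exactly linear data, `w_hat` recovers $[w_0, w_1] = [1, 2]$. When $\mathbf X^\mathrm T \mathbf X$ is singular or ill-conditioned (e.g. collinear features), a pseudo-inverse based solver such as `np.linalg.lstsq` is the more robust choice.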