Ordinary Least Squares Linear Regression
If the dataset $D$ is described by $n$ attributes, the hypothesis function of linear regression is:

$$h_{\boldsymbol{w}, b}(\boldsymbol{x})=\sum_{i=1}^{n} w_{i} x_{i}+b=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$
where $\boldsymbol{w}\in \mathbb{R}^n$ and $b\in \mathbb{R}$ are the model parameters.
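As a quick sanity check of the hypothesis function, here is a minimal sketch with made-up values (the particular $\boldsymbol{w}$, $b$, and $\boldsymbol{x}$ are arbitrary, chosen only for illustration):

import numpy as np

w = np.array([2.0, -1.0])  # weights for n = 2 attributes (arbitrary values)
b = 0.5                    # bias term
x = np.array([3.0, 4.0])   # one input instance

h = np.dot(w, x) + b       # w^T x + b = 2*3 + (-1)*4 + 0.5
print(h)                   # 2.5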
For convenience, $b$ is usually absorbed into the weight vector $\boldsymbol{w}$ as $w_0$, and a constant 1 is prepended to the input vector $\boldsymbol{x}$ as $x_0$:
$$\begin{array}{c}\boldsymbol{w}=\left(b, w_{1}, w_{2}, \ldots, w_{n}\right)^{\mathrm{T}} \\ \boldsymbol{x}=\left(1, x_{1}, x_{2}, \ldots, x_{n}\right)^{\mathrm{T}}\end{array}$$
The hypothesis function then becomes:
$$h_{\boldsymbol{w}}(\boldsymbol{x})=\sum_{i=0}^{n} w_{i} x_{i}=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}$$
where $\boldsymbol{w}\in \mathbb{R}^{n+1}$. Once the model parameters $\boldsymbol{w}$ have been determined by training, the model can be used to make predictions on new input instances.
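A minimal sketch of this augmentation in NumPy (the array contents are arbitrary; `X` is assumed to hold one sample per row):

import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 2.0]])                        # m = 2 samples, n = 2 attributes
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1 to every row
print(X_aug)                                      # [[1. 3. 4.]
                                                  #  [1. 1. 2.]]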
We use the mean squared error (MSE) as the loss function. Assuming the training set $D$ contains $m$ samples, the MSE loss is defined as:
$$\begin{aligned}J(\boldsymbol{w}) &=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\boldsymbol{w}}\left(\boldsymbol{x}_{i}\right)-y_{i}\right)^{2} \\&=\frac{1}{2 m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2}\end{aligned}$$
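Evaluating this loss in vectorized form is straightforward; a sketch (here `X` is the augmented $m \times (n+1)$ matrix, and `w` and `y` hold arbitrary illustrative values):

import numpy as np

X = np.array([[1.0, 3.0, 4.0],
              [1.0, 1.0, 2.0]])   # augmented inputs, m = 2
y = np.array([2.0, 1.0])          # targets
w = np.array([0.5, 2.0, -1.0])    # (b, w1, w2)

m = X.shape[0]
errors = X @ w - y                # w^T x_i - y_i for every sample
J = np.sum(errors ** 2) / (2 * m)
print(J)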
Because $J(\boldsymbol{w})$ is a convex quadratic in $\boldsymbol{w}$, its minimum is attained at a stationary point: we can take the gradient of $J(\boldsymbol{w})$ with respect to $\boldsymbol{w}$, set it to zero, and solve the resulting equation.
Computing the gradient of $J(\boldsymbol{w})$:
$$\begin{aligned}\nabla J(\boldsymbol{w}) &=\frac{1}{2 m} \sum_{i=1}^{m} \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2} \\&=\frac{1}{2 m} \sum_{i=1}^{m} 2\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \\&=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i}\end{aligned}$$
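One way to sanity-check this derivation is to compare the analytic gradient against a finite-difference approximation; a minimal sketch with random data (the shapes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3                            # 5 samples, 3 columns (x0 included)
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
w = rng.normal(size=d)

def J(w):
    '''MSE loss as defined above.'''
    e = X @ w - y
    return np.sum(e ** 2) / (2 * m)

# analytic gradient: (1/m) * sum_i (w^T x_i - y_i) x_i
grad = sum((X[i] @ w - y[i]) * X[i] for i in range(m)) / m

# central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(J(w + eps * np.eye(d)[j]) - J(w - eps * np.eye(d)[j])) / (2 * eps)
                    for j in range(d)])
print(np.allclose(grad, grad_fd))      # True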
The formulas above are more concise when written with matrix operations. Let:
$$\boldsymbol{X}=\left[\begin{array}{ccccc}1 & x_{11} & x_{12} & \ldots & x_{1 n} \\ 1 & x_{21} & x_{22} & \ldots & x_{2 n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m 1} & x_{m 2} & \ldots & x_{m n}\end{array}\right]=\left[\begin{array}{c}\boldsymbol{x}_{1}^{\mathrm{T}} \\ \boldsymbol{x}_{2}^{\mathrm{T}} \\ \vdots \\ \boldsymbol{x}_{m}^{\mathrm{T}}\end{array}\right]$$
$$\boldsymbol{y}=\left[\begin{array}{c}y_{1} \\ y_{2} \\ \vdots \\ y_{m}\end{array}\right], \qquad \boldsymbol{w}=\left[\begin{array}{c}b \\ w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{array}\right]$$
Then the gradient can be written as:
$$\begin{aligned}\nabla J(\boldsymbol{w}) &=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i} \\&=\frac{1}{m}\left[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right]\left[\begin{array}{c}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1}-y_{1} \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2}-y_{2} \\ \vdots \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}-y_{m}\end{array}\right] \\&=\frac{1}{m}\left[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right]\left(\left[\begin{array}{c}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1} \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2} \\ \vdots \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}\end{array}\right]-\left[\begin{array}{c}y_{1} \\ y_{2} \\ \vdots \\ y_{m}\end{array}\right]\right) \\&=\frac{1}{m} \boldsymbol{X}^{\mathrm{T}}(\boldsymbol{X} \boldsymbol{w}-\boldsymbol{y})\end{aligned}$$
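A quick numerical confirmation that the summation form and the matrix form agree (random data; the shapes are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
m, d = 6, 4
X = rng.normal(size=(m, d))
y = rng.normal(size=m)
w = rng.normal(size=d)

grad_sum = sum((X[i] @ w - y[i]) * X[i] for i in range(m)) / m  # summation form
grad_mat = X.T @ (X @ w - y) / m                                # matrix form
print(np.allclose(grad_sum, grad_mat))                          # True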
Setting the gradient to zero and solving for $\boldsymbol{w}$ yields:
$$\hat{\boldsymbol{w}}=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$$
$\hat{\boldsymbol{w}}$ is the $\boldsymbol{w}$ that minimizes the loss function (the mean squared error). This method of solving for the optimal $\boldsymbol{w}$ is called ordinary least squares (OLS).
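One practical caveat, not part of the derivation above: the normal equation requires $\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}$ to be invertible, and explicitly inverting it can be numerically unstable when the matrix is ill-conditioned. A least-squares solver such as `np.linalg.lstsq` sidesteps the explicit inverse; a sketch with random data:

import numpy as np

rng = np.random.default_rng(2)
m, d = 20, 3
X = rng.normal(size=(m, d))
y = rng.normal(size=m)

w_normal = np.linalg.inv(X.T @ X) @ X.T @ y        # normal equation, as derived
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)    # SVD-based least-squares solver
print(np.allclose(w_normal, w_lstsq))              # True (X^T X well-conditioned here)

The implementation below follows the normal-equation form derived above: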
import numpy as np

class OLSLinearRegression:
    def _ols(self, X, y):
        '''Estimate w by ordinary least squares: w = (X^T X)^{-1} X^T y.'''
        tmp = np.linalg.inv(np.matmul(X.T, X))  # (X^T X)^{-1}
        tmp = np.matmul(tmp, X.T)               # (X^T X)^{-1} X^T
        w = np.matmul(tmp, y)                   # (X^T X)^{-1} X^T y
        return w

    def _preprocess_data(self, X):
        '''Preprocess the data: prepend x0 = 1 to every sample.'''
        m, n = X.shape
        X_ = np.ones((m, n + 1))
        X_[:, 1:] = X
        return X_

    def train(self, X, y):
        '''Train the model.'''
        X = self._preprocess_data(X)  # add the constant column
        self.w = self._ols(X, y)      # solve the normal equation

    def predict(self, X):
        '''Predict targets for new inputs.'''
        X = self._preprocess_data(X)
        y = np.matmul(X, self.w)      # h(x) = w^T x for every row
        return y
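A usage sketch on synthetic data with known coefficients (all values here are made up for illustration):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.01 * rng.normal(size=100)

model = OLSLinearRegression()
model.train(X, y)
print(model.w)               # approximately [3.  2. -1.], i.e. (b, w1, w2)
print(model.predict(X[:3]))  # predictions for the first three samples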