Solving Single-Variable Linear Regression with Least Squares
Given a dataset $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$.
Assume the single-variable linear regression model is $\hat y = bx + a$. We now solve for $a$ and $b$ by least squares.
Loss function: $\displaystyle \mathcal{L}(a, b) = \sum_{i=1}^N (\hat y_i - y_i)^2 = \sum_{i=1}^N (bx_i + a - y_i)^2$
$\displaystyle \frac{\partial \mathcal{L}}{\partial a} = \sum_{i=1}^N 2(bx_i + a - y_i) = 2b\sum_{i=1}^N x_i + 2aN - 2\sum_{i=1}^N y_i = 2bN\overline{x} + 2aN - 2N\overline{y} = 2N(b\overline{x} + a - \overline{y})$
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial a} = 0$ gives $a = \overline{y} - b\overline{x}$. Substituting this back into $\mathcal{L}(a, b)$:
$\displaystyle \mathcal{L}(a, b) = \sum_{i=1}^N (bx_i + \overline{y} - b\overline{x} - y_i)^2 = \sum_{i=1}^N \left[b(x_i - \overline{x}) - (y_i - \overline{y})\right]^2$
$\displaystyle \frac{\partial \mathcal{L}}{\partial b} = \sum_{i=1}^N 2(x_i - \overline{x})\left[b(x_i - \overline{x}) - (y_i - \overline{y})\right] = \sum_{i=1}^N \left[2b(x_i - \overline{x})^2 - 2(x_i - \overline{x})(y_i - \overline{y})\right] = 2b\sum_{i=1}^N (x_i - \overline{x})^2 - 2\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y}) = 2b\,\mathrm{Var}(x) - 2\,\mathrm{Cov}(x, y)$
Here $\mathrm{Var}(x)$ and $\mathrm{Cov}(x, y)$ denote the unnormalized sums $\sum_{i=1}^N (x_i - \overline{x})^2$ and $\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})$; any common normalization factor such as $1/N$ cancels in the ratio below.
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial b} = 0$ gives $\displaystyle b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$.
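As a sanity check, here is a minimal NumPy sketch of the closed-form solution derived above. The synthetic data, the seed, and the variable names are illustrative assumptions, not part of the original derivation:

```python
import numpy as np

# Synthetic data (assumed for illustration): true b = 3, true a = 2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)

x_bar, y_bar = x.mean(), y.mean()

# b = Cov(x, y) / Var(x); the normalization factor cancels,
# so the raw centered sums work just as well as np.cov / np.var.
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a = y_bar - b * x_bar

print(a, b)  # should be close to 2 and 3
```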
Solving Multivariate Linear Regression with Least Squares
The above handles the case $x_i, y_i \in \mathbb{R}$; we now turn to multivariate linear regression. Assume $\boldsymbol{x}_i \in \mathbb{R}^{1 \times D}$ (row vectors), $y_i \in \mathbb{R}$, $\boldsymbol{x} \in \mathbb{R}^{N \times D}$, and $\boldsymbol{y} \in \mathbb{R}^N$, where $N$ is the number of samples and $D$ is the feature dimension.
Assume the linear regression model is $\hat{\boldsymbol{y}} = \boldsymbol{x} \boldsymbol{\theta}$. We now solve for $\boldsymbol{\theta} \in \mathbb{R}^D$ by least squares.
Loss function: $\mathcal{L}(\boldsymbol{\theta}) = \lVert \boldsymbol{x}\boldsymbol{\theta} - \boldsymbol{y} \rVert^2 = \lVert \boldsymbol{e} \rVert^2 = \boldsymbol{e}^\mathrm{T}\boldsymbol{e}$, where $\boldsymbol{e} = \boldsymbol{x}\boldsymbol{\theta} - \boldsymbol{y}$.
By the chain rule (in the numerator-layout convention, so the gradient is a row vector):
$\displaystyle \frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}} = \frac{\partial \mathcal{L}}{\partial \boldsymbol{e}} \frac{\partial \boldsymbol{e}}{\partial \boldsymbol{\theta}} = 2\boldsymbol{e}^\mathrm{T}\boldsymbol{x} = 2(\boldsymbol{x}\boldsymbol{\theta} - \boldsymbol{y})^\mathrm{T}\boldsymbol{x} = 2\boldsymbol{\theta}^\mathrm{T}\boldsymbol{x}^\mathrm{T}\boldsymbol{x} - 2\boldsymbol{y}^\mathrm{T}\boldsymbol{x}$
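To make the matrix-calculus step concrete, here is a small sketch that checks the analytic gradient $2(\boldsymbol{x}\boldsymbol{\theta} - \boldsymbol{y})^\mathrm{T}\boldsymbol{x}$ against central finite differences; all data and names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 20, 3
x = rng.normal(size=(N, D))
y = rng.normal(size=N)
theta = rng.normal(size=D)

def loss(t):
    # L(theta) = ||x @ theta - y||^2 = e^T e
    e = x @ t - y
    return e @ e

# Analytic gradient 2 e^T x, a length-D vector.
analytic = 2 * (x @ theta - y) @ x

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (loss(theta + eps * np.eye(D)[j]) - loss(theta - eps * np.eye(D)[j])) / (2 * eps)
    for j in range(D)
])

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```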
Setting $\displaystyle \frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}} = 0$ gives $\boldsymbol{\theta}^\mathrm{T}\boldsymbol{x}^\mathrm{T}\boldsymbol{x} = \boldsymbol{y}^\mathrm{T}\boldsymbol{x}$; transposing both sides yields the normal equations $\boldsymbol{x}^\mathrm{T}\boldsymbol{x}\,\boldsymbol{\theta} = \boldsymbol{x}^\mathrm{T}\boldsymbol{y}$.
Note that $\boldsymbol{x}^\mathrm{T}\boldsymbol{x} \in \mathbb{R}^{D \times D}$ is a symmetric positive semidefinite matrix; when $\boldsymbol{x}$ has full column rank it is positive definite and therefore invertible. In that case the final solution is
$\boldsymbol{\theta} = (\boldsymbol{x}^\mathrm{T}\boldsymbol{x})^{-1}\boldsymbol{x}^\mathrm{T}\boldsymbol{y}$
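A minimal sketch of this closed-form solution, again on made-up data. In practice one solves the normal equations with `np.linalg.solve` (or `np.linalg.lstsq`, which also handles the rank-deficient case) rather than forming the inverse explicitly:

```python
import numpy as np

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(2)
N, D = 100, 4
x = rng.normal(size=(N, D))
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_theta + rng.normal(scale=0.1, size=N)

# Solve x^T x theta = x^T y instead of inverting x^T x.
theta = np.linalg.solve(x.T @ x, x.T @ y)

print(theta)  # close to true_theta
print(np.allclose(theta, np.linalg.lstsq(x, y, rcond=None)[0]))  # True
```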