1. Simple (univariate) linear regression
Given a data set $D=\{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{m}, y_{m})\}$, consider the univariate regression problem: we look for a straight line $f(x) = \omega x + b$ such that the sum of the absolute deviations between the points of $D$ and the line $f(x)$ is as small as possible (the $L_{1}$ norm of the residuals), i.e.

$$\min_{\omega, b} \sum_{i=1}^{m} |f(x_{i}) - y_{i}|$$
Because $|f(x_{i}) - y_{i}|$ has a corner at $f(x_{i}) = y_{i}$ and is not differentiable there, we use $|f(x_{i}) - y_{i}|^{2}$ in place of the original distance. For the data set $D$ we therefore want to minimise the distance between the fitted line and the data points, and construct the following loss function, which is to be minimised over $\omega$ and $b$:

$$L = \sum_{i=1}^{m} |f(x_{i}) - y_{i}|^{2} = \sum_{i=1}^{m} (f(x_{i}) - y_{i})^{2} = \sum_{i=1}^{m} (\omega x_{i} + b - y_{i})^{2}$$
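As a small illustration of the two objectives, the sketch below evaluates the sum of absolute residuals and the sum of squared residuals for a candidate line. The function names and the toy data are hypothetical and chosen only for this example.

```python
def l1_loss(w, b, xs, ys):
    """Sum of absolute residuals |f(x_i) - y_i| for the line f(x) = w*x + b."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys))


def squared_loss(w, b, xs, ys):
    """Sum of squared residuals (w*x_i + b - y_i)^2 for the same line."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, not taken from the text.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]

# Candidate f(x) = 2x + 1: residuals are 0, 0, 0, -1.
print(l1_loss(2.0, 1.0, xs, ys), squared_loss(2.0, 1.0, xs, ys))  # 1.0 1.0

# Candidate f(x) = 2x: residuals are -1, -1, -1, -2.
print(l1_loss(2.0, 0.0, xs, ys), squared_loss(2.0, 0.0, xs, ys))  # 5.0 7.0
```

Both objectives rank the first candidate as better here, but the squared loss penalises the largest residual more heavily.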
To simplify the algebra, the loss $L$ is customarily written with a factor $\frac{1}{2}$, which cancels the 2 produced by differentiation and does not change the minimiser:

$$L = \frac{1}{2} \sum_{i=1}^{m} (\omega x_{i} + b - y_{i})^{2}$$

$$(\omega^{*}, b^{*}) = \underset{\omega, b}{\arg\min}\ L = \underset{\omega, b}{\arg\min}\ \frac{1}{2} \sum_{i=1}^{m} (\omega x_{i} + b - y_{i})^{2}$$
Take a single term $(\omega x_{i} + b - y_{i})^{2}$ from the loss $L$ and differentiate it with respect to $\omega$ and $b$:

$$\frac{\partial (\omega x_{i} + b - y_{i})^{2}}{\partial \omega} = 2x_{i}(\omega x_{i} + b - y_{i})$$

$$\frac{\partial (\omega x_{i} + b - y_{i})^{2}}{\partial b} = 2(\omega x_{i} + b - y_{i})$$
Summing over all terms therefore gives:

$$\frac{\partial L}{\partial \omega} = \frac{1}{2} \left[ 2x_{1}(\omega x_{1} + b - y_{1}) + 2x_{2}(\omega x_{2} + b - y_{2}) + \cdots + 2x_{m}(\omega x_{m} + b - y_{m}) \right]$$

Rearranging:

$$\frac{\partial L}{\partial \omega} = \omega(x_{1}^{2} + x_{2}^{2} + \cdots + x_{m}^{2}) + b(x_{1} + x_{2} + \cdots + x_{m}) - (x_{1}y_{1} + x_{2}y_{2} + \cdots + x_{m}y_{m})$$

$$\frac{\partial L}{\partial \omega} = \omega\sum_{i=1}^{m}x_{i}^{2} + b\sum_{i=1}^{m}x_{i} - \sum_{i=1}^{m}x_{i}y_{i}$$
Similarly, differentiating with respect to $b$ gives:

$$\frac{\partial L}{\partial b} = \frac{1}{2} \left[ 2(\omega x_{1} + b - y_{1}) + 2(\omega x_{2} + b - y_{2}) + \cdots + 2(\omega x_{m} + b - y_{m}) \right]$$

Rearranging:

$$\frac{\partial L}{\partial b} = \omega(x_{1} + x_{2} + \cdots + x_{m}) + b(1 + 1 + \cdots + 1) - (y_{1} + y_{2} + \cdots + y_{m})$$

$$\frac{\partial L}{\partial b} = \omega\sum_{i=1}^{m}x_{i} + b\sum_{i=1}^{m}1 - \sum_{i=1}^{m}y_{i}$$
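The summation forms of the two partial derivatives can be checked numerically. The following is a minimal sketch that compares them with central finite differences; the helper names, the evaluation point and the toy data are assumptions made for illustration.

```python
def loss(w, b, xs, ys):
    """L = 1/2 * sum_i (w*x_i + b - y_i)^2."""
    return 0.5 * sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))


def grad(w, b, xs, ys):
    """Analytic partial derivatives (dL/dw, dL/db) in summation form."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    dL_dw = w * sum_xx + b * sum_x - sum_xy
    dL_db = w * sum_x + b * m - sum_y
    return dL_dw, dL_db


def numeric_grad(w, b, xs, ys, h=1e-6):
    """Central finite differences, used only as a sanity check."""
    dw = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
    db = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
    return dw, db


# Hypothetical toy data and an arbitrary evaluation point (w, b) = (0.5, -1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
analytic = grad(0.5, -1.0, xs, ys)
numeric = numeric_grad(0.5, -1.0, xs, ys)
print(analytic)  # (-36.0, -18.0)
print(numeric)   # approximately the same values
```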
Setting both partial derivatives to zero gives a system of two equations to be solved simultaneously:

$$\begin{cases} \dfrac{\partial L}{\partial \omega} = \omega\sum_{i=1}^{m}x_{i}^{2} + b\sum_{i=1}^{m}x_{i} - \sum_{i=1}^{m}x_{i}y_{i} = 0 & \text{(1)} \\[2ex] \dfrac{\partial L}{\partial b} = \omega\sum_{i=1}^{m}x_{i} + b\sum_{i=1}^{m}1 - \sum_{i=1}^{m}y_{i} = 0 & \text{(2)} \end{cases}$$
From equation (2) we obtain:

$$b = \frac{\sum_{i=1}^{m}y_{i} - \omega\sum_{i=1}^{m}x_{i}}{\sum_{i=1}^{m} 1} = \frac{\sum_{i=1}^{m} (y_{i} - \omega x_{i})}{m} \qquad \text{(3)}$$
Substituting (3) into (1) gives:

$$\omega\sum_{i=1}^{m}x_{i}^{2} + \frac{\sum_{i=1}^{m} (y_{i} - \omega x_{i})}{m} \sum_{i=1}^{m}x_{i} - \sum_{i=1}^{m}x_{i}y_{i} = 0$$
Collecting the terms in $\omega$ gives:

$$\omega\left(\sum_{i=1}^{m}x_{i}^{2} - \frac{1}{m}\Big(\sum_{i=1}^{m}x_{i}\Big)\Big(\sum_{i=1}^{m}x_{i}\Big)\right) = \sum_{i=1}^{m}x_{i}y_{i} - \frac{1}{m}\Big(\sum_{i=1}^{m} y_{i}\Big)\Big(\sum_{i=1}^{m}x_{i}\Big)$$
Solving for $\omega$:

$$\omega = \frac{\sum_{i=1}^{m}x_{i}y_{i} - \frac{1}{m}\big(\sum_{i=1}^{m} y_{i}\big)\big(\sum_{i=1}^{m}x_{i}\big)}{\sum_{i=1}^{m}x_{i}^{2} - \frac{1}{m}\big(\sum_{i=1}^{m}x_{i}\big)\big(\sum_{i=1}^{m}x_{i}\big)}$$
Writing $\bar{x} = \frac{1}{m} \sum_{i=1}^{m}x_{i}$ for the sample mean of the $x_{i}$, this becomes:

$$\omega = \frac{\sum_{i=1}^{m} y_{i}(x_{i} - \bar{x})}{\sum_{i=1}^{m}x_{i}^{2} - \sum_{i=1}^{m}x_{i} \bar{x}}$$
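The closed-form expressions for $\omega$ and $b$ translate directly into code. The sketch below is one possible implementation; the function name `fit_line`, the guard against a zero denominator and the sample data are assumptions added for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope w and intercept b for f(x) = w*x + b.

    Uses w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - x_bar * sum(x_i))
    and  b = (1/m) * sum(y_i - w * x_i).
    """
    m = len(xs)
    x_bar = sum(xs) / m
    numerator = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) - x_bar * sum(xs)
    if denominator == 0:
        # All x_i are equal, so the slope is not determined by the data.
        raise ValueError("all x_i are identical; the slope is undefined")
    w = numerator / denominator
    b = sum(y - w * x for x, y in zip(xs, ys)) / m
    return w, b


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```

When the data lie exactly on a line, the fit recovers that line; with noisy data the same formulas return the least-squares compromise.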
2. Multiple linear regression
Given a data set $D=\{(\boldsymbol{x_{1}}, y_{1}), (\boldsymbol{x_{2}}, y_{2}), \ldots, (\boldsymbol{x_{m}}, y_{m})\}$. Unlike the univariate case, in multiple linear regression every $\boldsymbol{x_{i}}$ is an $n$-dimensional feature vector, $\boldsymbol{x_{i}} = (x_{i}^{1}, x_{i}^{2}, x_{i}^{3}, \ldots, x_{i}^{j}, \ldots, x_{i}^{n})$, where $x_{i}^{j}$ denotes the $j$-th feature of the $i$-th sample (the superscript is an index, not a power). Stacking the features of all samples into one matrix gives:

$$\begin{bmatrix} \boldsymbol{x_{1}} \\ \boldsymbol{x_{2}} \\ \vdots \\ \boldsymbol{x_{m}} \end{bmatrix}_{m \times 1} = \begin{bmatrix} x_{1}^{1} & x_{1}^{2} & \cdots & x_{1}^{n} \\ x_{2}^{1} & x_{2}^{2} & \cdots & x_{2}^{n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m}^{1} & x_{m}^{2} & \cdots & x_{m}^{n} \end{bmatrix}_{m \times n}$$
We therefore look for a linear function (a hyperplane rather than a line once $n > 1$) that fits, i.e. comes close to, $y$:

$$f(\boldsymbol{x}) = b + \omega_{1} x^{1} + \omega_{2} x^{2} + \cdots + \omega_{n} x^{n} \simeq y$$
To make the matrix notation convenient, let $b = \omega_{0}$ and $x_{i}^{0} = 1$, so that:

$$f(\boldsymbol{x_{i}}) = \omega_{0} x_{i}^{0} + \omega_{1} x_{i}^{1} + \omega_{2} x_{i}^{2} + \cdots + \omega_{n} x_{i}^{n} \simeq y_{i}$$
To simplify the algebra, we collect all $m$ samples and write this in matrix form:

$$\begin{bmatrix} f(\boldsymbol{x_{1}}) \\ f(\boldsymbol{x_{2}}) \\ \vdots \\ f(\boldsymbol{x_{m}}) \end{bmatrix} = \begin{bmatrix} x_{1}^{0} & x_{1}^{1} & x_{1}^{2} & \cdots & x_{1}^{n} \\ x_{2}^{0} & x_{2}^{1} & x_{2}^{2} & \cdots & x_{2}^{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m}^{0} & x_{m}^{1} & x_{m}^{2} & \cdots & x_{m}^{n} \end{bmatrix}_{m \times (n+1)} \begin{bmatrix} \omega_{0} \\ \omega_{1} \\ \vdots \\ \omega_{n} \end{bmatrix}_{(n+1) \times 1} \simeq \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{bmatrix}_{m \times 1}$$
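where $x_{i}^{0} = 1$. Denoting the three matrices by $X$, $\boldsymbol{\omega}$ and $Y$ respectively, and writing $f(X)$ for the vector of predictions $(f(\boldsymbol{x_{1}}), \ldots, f(\boldsymbol{x_{m}}))^{T}$, the expression above becomes:

$$f(X) = X \boldsymbol{\omega} \simeq Y$$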
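To make the matrix form concrete, the sketch below builds $X$ by prepending the constant feature $x_{i}^{0} = 1$ to every sample and evaluates $X\boldsymbol{\omega}$ row by row. The helper names, the samples and the coefficients are hypothetical; plain Python lists are used so that the example has no dependencies.

```python
def design_matrix(samples):
    """Prepend the constant feature x_i^0 = 1 to every sample row."""
    return [[1.0] + list(row) for row in samples]


def predict(X, w):
    """Compute the product X w for a list-of-rows matrix X and coefficient list w."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


# Hypothetical data: m = 3 samples with n = 2 features each.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [0.5, 2.0, -1.0]  # (w_0 = b, w_1, w_2)

X = design_matrix(samples)
print(X[0])           # [1.0, 1.0, 2.0]
print(predict(X, w))  # [0.5, 2.5, 4.5]
```

Folding $b$ into $\boldsymbol{\omega}$ this way means the intercept needs no special handling in the later derivation.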
As in the univariate case, we construct a loss function $L$ and minimise it to obtain $\boldsymbol{\omega}$:

$$\boldsymbol{\omega^{*}} = \underset{\boldsymbol{\omega}}{\arg\min}\ L$$
By analogy with the univariate case (the factor $\frac{1}{2}$ is dropped here, which does not change the minimiser):

$$L = \|X \boldsymbol{\omega} - Y\|^{2} = (X \boldsymbol{\omega} - Y)^{T}(X \boldsymbol{\omega} - Y)$$
Expanding $(X \boldsymbol{\omega} - Y)^{T}(X \boldsymbol{\omega} - Y)$:

$$L = (\boldsymbol{\omega}^{T} X^{T} - Y^{T})(X \boldsymbol{\omega} - Y)$$

$$L = \boldsymbol{\omega}^{T} X^{T} X \boldsymbol{\omega} - \boldsymbol{\omega}^{T} X^{T} Y - Y^{T} X \boldsymbol{\omega} + Y^{T}Y$$
A note on the dimensions in these matrix products:

$X$ is $m \times (n+1)$, so $X^{T}$ is $(n+1) \times m$.

$\boldsymbol{\omega}$ is $(n+1) \times 1$, so $\boldsymbol{\omega}^{T}$ is $1 \times (n+1)$.

$Y$ is $m \times 1$, so $Y^{T}$ is $1 \times m$.

Therefore, for $\boldsymbol{\omega}^{T} X^{T} Y$ and $Y^{T} X \boldsymbol{\omega}$ we have:

$\boldsymbol{\omega}_{1 \times (n+1)}^{T} X_{(n+1) \times m}^{T} Y_{m \times 1} = (\boldsymbol{\omega}^{T} X^{T} Y)_{1 \times 1}$, which is a scalar (a single number).

$Y_{1 \times m}^{T} X_{m \times (n+1)} \boldsymbol{\omega}_{(n+1) \times 1} = (Y^{T} X \boldsymbol{\omega})_{1 \times 1}$, which is also a scalar.

Likewise, $Y_{1 \times m}^{T}Y_{m \times 1} = (Y^{T}Y)_{1 \times 1}$ is a scalar.
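A scalar equals its own transpose, so $\boldsymbol{\omega}^{T} X^{T} Y = (\boldsymbol{\omega}^{T} X^{T} Y)^{T} = Y^{T} X \boldsymbol{\omega}$ and the two cross terms can be combined. $L$ therefore simplifies to:

$$L = \boldsymbol{\omega}^{T} X^{T} X \boldsymbol{\omega} - 2\boldsymbol{\omega}^{T} X^{T} Y + Y^{T}Y$$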
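The simplification can be sanity-checked numerically. The sketch below compares $\|X\boldsymbol{\omega} - Y\|^{2}$ computed directly with the expanded three-term form, and confirms that the two cross terms are the same number. The helper functions and the toy values of `X`, `Y` and `w` are hypothetical.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product u^T v of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


# Hypothetical toy values: m = 3 samples, constant column plus n = 2 features.
X = [[1.0, 1.0, 2.0],
     [1.0, 3.0, 4.0],
     [1.0, 5.0, 7.0]]
Y = [1.0, 2.0, 3.0]
w = [0.5, 2.0, -1.0]

Xw = matvec(X, w)
residual = [p - y for p, y in zip(Xw, Y)]
direct = dot(residual, residual)             # ||Xw - Y||^2

cross_1 = dot(w, matvec(transpose(X), Y))    # w^T X^T Y
cross_2 = dot(Y, Xw)                         # Y^T X w
expanded = dot(Xw, Xw) - 2 * cross_1 + dot(Y, Y)

print(direct, expanded)   # 0.75 0.75
print(cross_1, cross_2)   # 16.0 16.0
```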
Differentiating with respect to $\boldsymbol{\omega}$ (the term $Y^{T}Y$ does not depend on $\boldsymbol{\omega}$, and the derivative of $\boldsymbol{\omega}^{T} X^{T} Y$ is $X^{T}Y$):

$$\frac{\partial L}{\partial \boldsymbol{\omega}} = \frac{\partial (\boldsymbol{\omega}^{T} X^{T} X \boldsymbol{\omega})}{\partial \boldsymbol{\omega}} - 2X^{T}Y$$
By the product rule for vector derivatives,

$$\frac{d(U^{T}V)}{dx} = \frac{dU^{T}}{dx} V + \frac{dV^{T}}{dx} U \;\Rightarrow\; \frac{d(\boldsymbol{\omega}^{T}\boldsymbol{\omega})}{d\boldsymbol{\omega}} = \frac{d\boldsymbol{\omega}^{T}}{d\boldsymbol{\omega}} \boldsymbol{\omega} + \frac{d\boldsymbol{\omega}^{T}}{d\boldsymbol{\omega}} \boldsymbol{\omega} = 2\boldsymbol{\omega}$$
Consequently, for a given square matrix $A$ we have:

$$\frac{d(\boldsymbol{\omega}^{T} A \boldsymbol{\omega})}{d\boldsymbol{\omega}} = \frac{d\boldsymbol{\omega}^{T}}{d\boldsymbol{\omega}} A \boldsymbol{\omega} + \frac{d(\boldsymbol{\omega}^{T} A^{T})}{d\boldsymbol{\omega}} \boldsymbol{\omega} = (A + A^{T})\boldsymbol{\omega}$$
Since $X_{(n+1) \times m}^{T} X_{m \times (n+1)} = (X^{T}X)_{(n+1) \times (n+1)}$ is a square matrix, and moreover symmetric because $(X^{T}X)^{T} = X^{T}X$, we may take $A = X^{T}X$, which gives:

$$\frac{\partial (\boldsymbol{\omega}^{T} X^{T} X \boldsymbol{\omega})}{\partial \boldsymbol{\omega}} = (X^{T}X + (X^{T}X)^{T}) \boldsymbol{\omega} = 2X^{T}X\boldsymbol{\omega}$$
Therefore:

$$\frac{\partial L}{\partial \boldsymbol{\omega}} = 2X^{T}X\boldsymbol{\omega} - 2X^{T}Y$$
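As a check on the matrix derivative, the sketch below evaluates $2X^{T}X\boldsymbol{\omega} - 2X^{T}Y$ and compares it with a central finite-difference approximation of the gradient of $L = \|X\boldsymbol{\omega} - Y\|^{2}$. All helper names and the toy values are assumptions for illustration.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def loss(X, Y, w):
    """L = ||Xw - Y||^2."""
    return sum((p - y) ** 2 for p, y in zip(matvec(X, w), Y))


def grad(X, Y, w):
    """Analytic gradient 2 X^T X w - 2 X^T Y."""
    Xt = transpose(X)
    XtXw = matvec(Xt, matvec(X, w))
    XtY = matvec(Xt, Y)
    return [2 * a - 2 * b for a, b in zip(XtXw, XtY)]


def numeric_grad(X, Y, w, h=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    g = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += h
        down[j] -= h
        g.append((loss(X, Y, up) - loss(X, Y, down)) / (2 * h))
    return g


# Hypothetical toy values.
X = [[1.0, 1.0, 2.0],
     [1.0, 3.0, 4.0],
     [1.0, 5.0, 7.0]]
Y = [1.0, 2.0, 3.0]
w = [0.5, 2.0, -1.0]

analytic = grad(X, Y, w)
numeric = numeric_grad(X, Y, w)
print(analytic)  # [1.0, 7.0, 9.0]
print(numeric)   # approximately the same values
```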
Setting $\frac{\partial L}{\partial \boldsymbol{\omega}} = 0$ and solving for $\boldsymbol{\omega^{*}}$, assuming $X^{T}X$ is invertible, gives:

$$\boldsymbol{\omega^{*}} = (X^{T} X)^{-1} X^{T} Y$$
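A minimal dependency-free sketch of this result follows. Instead of forming $(X^{T}X)^{-1}$ explicitly, it solves the equivalent linear system $X^{T}X\boldsymbol{\omega} = X^{T}Y$ by Gaussian elimination, which gives the same $\boldsymbol{\omega^{*}}$ whenever $X^{T}X$ is invertible. The function names, the singularity tolerance and the data are assumptions; in practice a library routine such as `numpy.linalg.lstsq` is a common alternative.

```python
def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # tolerance is an arbitrary choice
            raise ValueError("X^T X is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(samples, ys):
    """Return w* = (w_0, ..., w_n) solving the normal equations X^T X w = X^T Y."""
    X = [[1.0] + list(row) for row in samples]  # constant feature x_i^0 = 1
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * y for a, y in zip(row, ys)) for row in Xt]
    return solve(XtX, XtY)


# Hypothetical data generated exactly from y = 1 + 2*x^1 - x^2.
samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 0.0]]
ys = [1.0, 3.0, 4.0, 5.0]
w_star = fit(samples, ys)
print(w_star)  # close to [1.0, 2.0, -1.0], up to floating-point rounding
```

Solving the system avoids the extra rounding error of an explicit inverse; if the columns of $X$ are linearly dependent, $X^{T}X$ is singular and the sketch raises an error rather than returning a solution.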