Multiple Linear Regression
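Similarly, we absorb $\boldsymbol{w}$ and $b$ into a single vector $\hat{\boldsymbol{w}}=(\boldsymbol{w} ; b)$ and represent the dataset as an $m \times(d+1)$ matrix $\mathbf{X}$. In each row, the first $d$ elements are the $d$ attribute values of one example, and the last element is always set to 1: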
$$
\mathbf{X}=\left(\begin{array}{ccccc}{x_{11}} & {x_{12}} & {\dots} & {x_{1 d}} & {1} \\ {x_{21}} & {x_{22}} & {\dots} & {x_{2 d}} & {1} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & {\vdots} \\ {x_{m 1}} & {x_{m 2}} & {\dots} & {x_{m d}} & {1}\end{array}\right)=\left(\begin{array}{cc}{\boldsymbol{x}_{1}^{\mathrm{T}}} & {1} \\ {\boldsymbol{x}_{2}^{\mathrm{T}}} & {1} \\ {\vdots} & {\vdots} \\ {\boldsymbol{x}_{m}^{\mathrm{T}}} & {1}\end{array}\right)=\left(\begin{array}{c}{\hat{\boldsymbol{x}}_{1}^{\mathrm{T}}} \\ {\hat{\boldsymbol{x}}_{2}^{\mathrm{T}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{\mathrm{T}}}\end{array}\right)
$$
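As a minimal sketch of this construction, the augmented matrix can be built in NumPy by appending a column of ones to the raw attribute matrix. The toy data and the names `X_raw` and `X` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, d = 2 attributes each.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
m, d = X_raw.shape

# Append a constant column of ones so that b can be absorbed into w_hat.
X = np.hstack([X_raw, np.ones((m, 1))])

print(X.shape)  # m rows, d + 1 columns
```

Each row of `X` is then $\hat{\boldsymbol{x}}_i^{\mathrm{T}}$, the original attributes followed by a trailing 1.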
Absorbing $\boldsymbol{w}$ and $b$:

$$
f\left(\boldsymbol{x}_{i}\right)=w_{1} x_{i 1}+w_{2} x_{i 2}+\ldots+w_{d} x_{i d}+b
$$

Treat $b$ as $w_{d+1} \cdot 1$. Write $(w_1; w_2; \ldots; w_d; w_{d+1})$ as $\hat{\boldsymbol{w}}$ and $(x_{i1}; x_{i2}; \ldots; x_{id}; 1)$ as $\hat{\boldsymbol{x}}_i$. Then

$$
f(\hat{\boldsymbol{x}}_i) = \hat{\boldsymbol{w}}^{\mathrm{T}}\hat{\boldsymbol{x}}_{i}
$$
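A small numerical check may make the absorption concrete: the explicit form $\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b$ and the absorbed form $\hat{\boldsymbol{w}}^{\mathrm{T}}\hat{\boldsymbol{x}}_i$ should give the same value. The weights, bias, and example below are made-up numbers chosen only for illustration.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])    # hypothetical weights, d = 3
b = 0.25                          # hypothetical bias
x_i = np.array([1.0, 2.0, 3.0])   # one hypothetical example

w_hat = np.append(w, b)        # (w; b)
x_hat = np.append(x_i, 1.0)    # (x_i; 1)

f_explicit = w @ x_i + b       # w_1 x_i1 + ... + w_d x_id + b
f_absorbed = w_hat @ x_hat     # w_hat^T x_hat

print(f_explicit, f_absorbed)
```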
The least squares method gives the loss function:

$$
E_{\hat{\boldsymbol{w}}}=\sum_{i=1}^{m}\left(y_{i}-f(\hat{\boldsymbol{x}}_{i})\right)^{2}=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{i}\right)^{2}
$$
Expanding the sum term by term:

$$
\begin{aligned} E_{\hat{\boldsymbol{w}}} &=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{i}\right)^{2} \\ &=\left(y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}\right)^{2}+\left(y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}\right)^{2}+\ldots+\left(y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}\right)^{2} \end{aligned}
$$

This sum of squares is the product of a row vector and the corresponding column vector, each with $m$ entries:

$$
E_{\hat{\boldsymbol{w}}}=\left(\begin{array}{cccc}{y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}} & {y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}} & {\cdots} & {y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}}\end{array}\right)\left(\begin{array}{c}{y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}}\end{array}\right)
$$
Since $\hat{\boldsymbol{w}}^{\mathrm{T}}\hat{\boldsymbol{x}}_i$ is a scalar, transposing it changes nothing, so $\hat{\boldsymbol{w}}^{\mathrm{T}}\hat{\boldsymbol{x}}_i=\hat{\boldsymbol{x}}_i^{\mathrm{T}}\hat{\boldsymbol{w}}$ and

$$
\left(\begin{array}{c}{y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}}\end{array}\right)=\left(\begin{array}{c}{y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}}\end{array}\right)-\left(\begin{array}{c}{\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}} \\ {\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}}\end{array}\right)=\left(\begin{array}{c}{y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}}\end{array}\right)-\left(\begin{array}{c}{\hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \hat{\boldsymbol{w}}}\end{array}\right)
$$
Moreover,

$$
\left(\begin{array}{c}{\hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \hat{\boldsymbol{w}}}\end{array}\right)=\left(\begin{array}{c}{\hat{\boldsymbol{x}}_{1}^{\mathrm{T}}} \\ {\hat{\boldsymbol{x}}_{2}^{\mathrm{T}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{\mathrm{T}}}\end{array}\right) \cdot \hat{\boldsymbol{w}}=\mathbf{X} \cdot \hat{\boldsymbol{w}}
$$
Therefore

$$
\left(\begin{array}{c}{y_{1}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{1}} \\ {y_{2}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{2}} \\ {\vdots} \\ {y_{m}-\hat{\boldsymbol{w}}^{\mathrm{T}} \hat{\boldsymbol{x}}_{m}}\end{array}\right)=\left(\begin{array}{c}{y_{1}} \\ {y_{2}} \\ {\vdots} \\ {y_{m}}\end{array}\right)-\left(\begin{array}{c}{\hat{\boldsymbol{x}}_{1}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\hat{\boldsymbol{x}}_{2}^{\mathrm{T}} \hat{\boldsymbol{w}}} \\ {\vdots} \\ {\hat{\boldsymbol{x}}_{m}^{\mathrm{T}} \hat{\boldsymbol{w}}}\end{array}\right)=\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}}
$$
Here we let

$$
\boldsymbol{y}=\left(y_{1} ; y_{2} ; \ldots ; y_{m}\right)
$$
Objective:

$$
\hat{\boldsymbol{w}}^{*}=\underset{\hat{\boldsymbol{w}}}{\arg \min }(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})^{\mathrm{T}}(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})
$$
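To see that the matrix form and the element-wise sum really are the same loss, the sketch below evaluates both on random data. The seed, the sizes `m` and `d`, and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 3

# Random augmented design matrix, targets, and parameter vector (illustrative).
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w_hat = rng.normal(size=d + 1)

# Element-wise form: sum_i (y_i - w_hat^T x_hat_i)^2
loss_sum = sum((y[i] - w_hat @ X[i]) ** 2 for i in range(m))

# Matrix form: (y - X w_hat)^T (y - X w_hat)
r = y - X @ w_hat
loss_mat = r @ r

print(loss_sum, loss_mat)
```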
Differentiating $E_{\hat{\boldsymbol{w}}}$ with respect to $\hat{\boldsymbol{w}}$ (the term $\boldsymbol{y}^{\mathrm{T}}\boldsymbol{y}$ does not depend on $\hat{\boldsymbol{w}}$, so it drops out in the last line):

$$
\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})^{\mathrm{T}}(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\left(\boldsymbol{y}^{\mathrm{T}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}}\right)(\boldsymbol{y}-\mathbf{X} \hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\boldsymbol{y}^{\mathrm{T}} \boldsymbol{y}-\boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[-\boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}\right] \end{aligned}
$$
Matrix differentiation formulas:

Scalar-by-vector derivative: let $\boldsymbol{x}=(x_1, x_2, \ldots, x_n)^{\mathrm{T}}$ be an $n$-dimensional vector and let $y$ be a scalar function of the $n$ components of $\boldsymbol{x}$. In denominator layout (the default used here), the derivative is a column vector:

$$
\frac{\partial y}{\partial \boldsymbol{x}}=\left(\begin{array}{c}{\frac{\partial y}{\partial x_{1}}} \\ {\frac{\partial y}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial y}{\partial x_{n}}}\end{array}\right)
$$

In numerator layout, it is a row vector:

$$
\frac{\partial y}{\partial \boldsymbol{x}}=\left(\begin{array}{cccc}{\frac{\partial y}{\partial x_{1}}} & {\frac{\partial y}{\partial x_{2}}} & {\cdots} & {\frac{\partial y}{\partial x_{n}}}\end{array}\right)
$$

From the scalar-by-vector formula in denominator layout it follows that

$$
\frac{\partial \boldsymbol{x}^{\mathrm{T}} \boldsymbol{a}}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^{\mathrm{T}} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\begin{array}{c}{\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\ldots+a_{n} x_{n}\right)}{\partial x_{1}}} \\ {\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\ldots+a_{n} x_{n}\right)}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial\left(a_{1} x_{1}+a_{2} x_{2}+\ldots+a_{n} x_{n}\right)}{\partial x_{n}}}\end{array}\right)=\left(\begin{array}{c}{a_{1}} \\ {a_{2}} \\ {\vdots} \\ {a_{n}}\end{array}\right)=\boldsymbol{a}
$$

Similarly,

$$
\frac{\partial \boldsymbol{x}^{\mathrm{T}} \mathbf{B} \boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\mathbf{B}+\mathbf{B}^{\mathrm{T}}\right) \boldsymbol{x}
$$
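Both identities can be sanity-checked numerically with central finite differences. The helper `numerical_gradient`, the random test data, and the tolerance are assumptions of this sketch; it is a check on random inputs, not a proof.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x,
    returned as a vector with the same shape as x (denominator layout)."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
B = rng.normal(size=(n, n))   # deliberately not symmetric
x = rng.normal(size=n)

grad_linear = numerical_gradient(lambda v: a @ v, x)         # expect a
grad_quadratic = numerical_gradient(lambda v: v @ B @ v, x)  # expect (B + B^T) x

print(np.allclose(grad_linear, a, atol=1e-5))
print(np.allclose(grad_quadratic, (B + B.T) @ x, atol=1e-5))
```

Using a non-symmetric `B` matters here: for a symmetric matrix the result collapses to $2\mathbf{B}\boldsymbol{x}$, which would hide the role of $\mathbf{B}^{\mathrm{T}}$.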
Splitting the derivative term by term:

$$
\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[-\boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}\right] \\ &=-\frac{\partial \boldsymbol{y}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}}-\frac{\partial \hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}}{\partial \hat{\boldsymbol{w}}}+\frac{\partial \hat{\boldsymbol{w}}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}} \end{aligned}
$$

Applying the two matrix differentiation formulas with $\boldsymbol{a}=\mathbf{X}^{\mathrm{T}}\boldsymbol{y}$ (note that $\boldsymbol{y}^{\mathrm{T}}\mathbf{X}\hat{\boldsymbol{w}}=(\mathbf{X}^{\mathrm{T}}\boldsymbol{y})^{\mathrm{T}}\hat{\boldsymbol{w}}$) and $\mathbf{B}=\mathbf{X}^{\mathrm{T}}\mathbf{X}$:

$$
\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=-\mathbf{X}^{\mathrm{T}} \boldsymbol{y}-\mathbf{X}^{\mathrm{T}} \boldsymbol{y}+\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}+\mathbf{X}^{\mathrm{T}} \mathbf{X}\right) \hat{\boldsymbol{w}} \\ &=2 \mathbf{X}^{\mathrm{T}}(\mathbf{X} \hat{\boldsymbol{w}}-\boldsymbol{y}) \end{aligned}
$$
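One way to gain confidence in the gradient $2\mathbf{X}^{\mathrm{T}}(\mathbf{X}\hat{\boldsymbol{w}}-\boldsymbol{y})$ is a directional-derivative check: along a random direction $\boldsymbol{v}$, the finite-difference slope of the loss should match the inner product of the analytic gradient with $\boldsymbol{v}$. The data, the direction `v`, and the step size `eps` below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 8, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w_hat = rng.normal(size=d + 1)

def loss(w):
    """Squared-error loss (y - Xw)^T (y - Xw)."""
    r = y - X @ w
    return r @ r

# Analytic gradient from the derivation.
grad = 2 * X.T @ (X @ w_hat - y)

# Finite-difference slope of the loss along a random direction v.
v = rng.normal(size=d + 1)
eps = 1e-6
directional_fd = (loss(w_hat + eps * v) - loss(w_hat - eps * v)) / (2 * eps)
directional_analytic = grad @ v

print(directional_fd, directional_analytic)
```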
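Convex set: let $D \subseteq \mathbb{R}^{n}$. If for any $\boldsymbol{x}, \boldsymbol{y} \in D$ and any $a \in[0,1]$ we have $a \boldsymbol{x}+(1-a) \boldsymbol{y} \in D$, then $D$ is called a convex set.
Geometrically, a set is convex if, whenever two points belong to it, every point on the line segment joining them also belongs to it.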
First-order derivative of a multivariate real-valued function, the gradient:

$$
\nabla f(\boldsymbol{x})=\left(\begin{array}{c}{\frac{\partial f(\boldsymbol{x})}{\partial x_{1}}} \\ {\frac{\partial f(\boldsymbol{x})}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial f(\boldsymbol{x})}{\partial x_{n}}}\end{array}\right)
$$

Second-order derivative of a multivariate real-valued function, the Hessian matrix:

$$
\nabla^{2} f(\boldsymbol{x})=\left[\begin{array}{cccc}{\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1}^{2}}} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1} \partial x_{n}}} \\ {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2} \partial x_{1}}} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2}^{2}}} & {\cdots} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2} \partial x_{n}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n} \partial x_{1}}} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n}^{2}}}\end{array}\right]
$$

If all second-order partial derivatives of $f(\boldsymbol{x})$ with respect to the components of $\boldsymbol{x}$ are continuous, then $\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{i} \partial x_{j}}=\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{j} \partial x_{i}}$, and $\nabla^{2} f(\boldsymbol{x})$ is a symmetric matrix.
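Convexity criterion for multivariate real-valued functions:
Let $D \subset \mathbb{R}^{n}$ be a nonempty open convex set, and let $f: D \rightarrow \mathbb{R}$ (an $n$-variable real-valued function) be twice continuously differentiable on $D$. If the Hessian matrix $\nabla^{2} f(\boldsymbol{x})$ is positive definite on $D$, then $f(\boldsymbol{x})$ is a strictly convex function on $D$.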
Sufficiency theorem for convex functions:
If $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is convex and continuously differentiable, then $\boldsymbol{x}^{*}$ is a global minimizer if and only if $\nabla f\left(\boldsymbol{x}^{*}\right)=\mathbf{0}$.
For the loss $E_{\hat{\boldsymbol{w}}}$, the Hessian is

$$
\begin{aligned} \frac{\partial^{2} E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}} \partial \hat{\boldsymbol{w}}^{\mathrm{T}}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left(\frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}}\right) \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[2 \mathbf{X}^{\mathrm{T}}(\mathbf{X} \hat{\boldsymbol{w}}-\boldsymbol{y})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left(2 \mathbf{X}^{\mathrm{T}} \mathbf{X} \hat{\boldsymbol{w}}-2 \mathbf{X}^{\mathrm{T}} \boldsymbol{y}\right) \\ &=2 \mathbf{X}^{\mathrm{T}} \mathbf{X} \end{aligned}
$$
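Whether the Hessian $2\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is positive definite can be inspected numerically through its eigenvalues. The sketch below contrasts a full-column-rank design matrix with one containing a duplicated column; both matrices are random, illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, d = 10, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])

# Hessian of the squared-error loss.
H = 2 * X.T @ X
eigvals = np.linalg.eigvalsh(H)       # eigenvalues of a symmetric matrix
rank = np.linalg.matrix_rank(X)
is_positive_definite = bool(eigvals.min() > 0)

# Duplicating a column makes X rank-deficient, so X^T X becomes singular.
X_dup = np.hstack([X, X[:, [0]]])
rank_dup = np.linalg.matrix_rank(X_dup)
min_eig_dup = np.linalg.eigvalsh(2 * X_dup.T @ X_dup).min()

print(rank, is_positive_definite)
print(rank_dup, min_eig_dup)  # smallest eigenvalue is numerically zero
```

In the rank-deficient case the loss is still convex, but no longer strictly convex, and the minimizer is not unique.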
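When $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ is a full-rank matrix (equivalently, a positive definite matrix), the Hessian $2\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is positive definite, so $E_{\hat{\boldsymbol{w}}}$ is strictly convex. Setting the first derivative $2 \mathbf{X}^{\mathrm{T}}(\mathbf{X} \hat{\boldsymbol{w}}-\boldsymbol{y})$ to zero then gives the global minimizer: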
$$
\hat{\boldsymbol{w}}^{*}=\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}
$$
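A minimal NumPy sketch of this closed-form solution follows, assuming $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is invertible. The synthetic data and the "true" parameter vector `w_true` are made up for illustration. Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, and `np.linalg.lstsq` serves as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 50, 3
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])

# Hypothetical ground truth (w; b) plus a little noise.
w_true = np.array([1.5, -2.0, 0.5, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=m)

# Normal equations: (X^T X) w = X^T y
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient 2 X^T (X w - y) should vanish at the minimizer.
grad_at_star = 2 * X.T @ (X @ w_star - y)

print(w_star)
print(np.abs(grad_at_star).max())
```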
Let $\hat{\boldsymbol{x}}_{i}=\left(\boldsymbol{x}_{i} ; 1\right)$.
The resulting multiple linear regression model is:

$$
f\left(\hat{\boldsymbol{x}}_{i}\right)=\hat{\boldsymbol{x}}_{i}^{\mathrm{T}}\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \boldsymbol{y}
$$
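Putting the pieces together, prediction for a new example amounts to appending a 1 and taking the inner product with $\hat{\boldsymbol{w}}^{*}$. The sketch below uses a tiny noise-free dataset generated from $y = 2x_1 - x_2 + 3$; the data and the helper name `predict` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2*x1 - x2 + 3.
X_raw = np.array([[0.0, 1.0],
                  [1.0, 0.0],
                  [1.0, 1.0],
                  [2.0, 1.0],
                  [0.0, 2.0]])
y = 2 * X_raw[:, 0] - X_raw[:, 1] + 3

# Augment with a column of ones and solve the normal equations.
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])
w_star = np.linalg.solve(X.T @ X, X.T @ y)

def predict(x_new):
    """Evaluate f(x_hat) = x_hat^T w_star for one example (hypothetical helper)."""
    x_hat = np.append(x_new, 1.0)   # (x; 1)
    return x_hat @ w_star

print(w_star)                          # close to (2, -1, 3)
print(predict(np.array([3.0, 2.0])))   # close to 2*3 - 2 + 3 = 7
```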