I. Multiple Linear Regression Tips
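1. Convex sets
- Definition: let $D \subseteq R^{n}$ be a set. If for any $x, y \in D$ and any $a \in [0,1]$ we have $ax+(1-a)y \in D$, then $D$ is called a convex set.
- Geometric meaning: if two points belong to the set, then every point on the line segment joining them also belongs to the set.
- In the two figures above, the left one shows a convex set and the right one shows a non-convex set.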
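The definition can be probed numerically. The sketch below is only an illustration under assumed examples: the unit disk and the ring (annulus) are hypothetical stand-ins for the convex and non-convex pictures, and the helper names `in_disk`, `in_annulus`, and `find_convexity_violation` are not from the original text. A random search can disprove convexity by finding a counterexample, but finding none is only evidence, not a proof.

```python
import random

def in_disk(p, radius=1.0):
    """Membership test for the closed disk of the given radius (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= radius ** 2

def in_annulus(p, inner=0.5, outer=1.0):
    """Membership test for a ring between two circles (not a convex set)."""
    r2 = p[0] ** 2 + p[1] ** 2
    return inner ** 2 <= r2 <= outer ** 2

def find_convexity_violation(member, trials=5000, seed=0):
    """Look for x, y in the set and a in [0, 1] with a*x + (1-a)*y outside it.

    Returns a counterexample (x, y, a), or None if the search finds none.
    """
    rng = random.Random(seed)

    def sample():
        # Rejection sampling from the square [-1, 1]^2.
        while True:
            p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
            if member(p):
                return p

    for _ in range(trials):
        x, y, a = sample(), sample(), rng.random()
        z = (a * x[0] + (1 - a) * y[0], a * x[1] + (1 - a) * y[1])
        if not member(z):
            return x, y, a
    return None

print(find_convexity_violation(in_disk))      # no counterexample expected
print(find_convexity_violation(in_annulus))   # some (x, y, a) whose combination leaves the ring
```

For the ring, the points $(1,0)$ and $(-1,0)$ already give a counterexample by hand: their midpoint $(0,0)$ lies in the hole.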
2. Gradient
- Definition of the gradient: let $f(x)$ be a function of $n$ variables. If the partial derivatives $\frac{\partial f(x)}{\partial x_{i}}\ (i=1,2,\cdots,n)$ with respect to each component $x_{i}$ of the argument $x=(x_{1},x_{2},\cdots,x_{n})^T$ all exist, then $f(x)$ is said to be first-order differentiable at $x$, and the vector

$$\nabla f(x)=\begin{pmatrix} \frac{\partial f(x)}{\partial x_{1}}\\ \frac{\partial f(x)}{\partial x_{2}}\\ \vdots \\ \frac{\partial f(x)}{\partial x_{n}} \end{pmatrix}$$

is called the first derivative, or gradient, of $f(x)$ at $x$, denoted $\nabla f(x)$ (a column vector).
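As a rough illustration of this definition, the gradient can be approximated component by component with central differences. The function `f` below is a made-up example and `numerical_gradient` is a hypothetical helper, not something defined in the original notes.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # Approximates the partial derivative of f with respect to x_i.
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def f(x):
    # Assumed example: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]

g = numerical_gradient(f, [1.0, 2.0])
# The analytic gradient is (2*x1 + 3*x2, 3*x1), which equals (8, 3) at (1, 2).
print(g)
```

The printed values should agree with $(8, 3)$ up to a small floating-point error.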
3. Hessian matrix
- Definition of the Hessian matrix: let $f(x)$ be a function of $n$ variables. If the second-order partial derivatives $\frac{\partial^2 f(x)}{\partial x_{i}\partial x_{j}}\ (i=1,2,\cdots,n;\ j=1,2,\cdots,n)$ with respect to the components of the argument $x=(x_{1},x_{2},\cdots,x_{n})^T$ all exist, then $f(x)$ is said to be twice differentiable at $x$, and the matrix

$$\nabla^2 f(x)=\begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_{1}^2} & \frac{\partial^2 f(x)}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^2 f(x)}{\partial x_{1}\partial x_{n}}\\ \frac{\partial^2 f(x)}{\partial x_{2}\partial x_{1}} & \frac{\partial^2 f(x)}{\partial x_{2}^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_{2}\partial x_{n}}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_{n}\partial x_{1}} & \frac{\partial^2 f(x)}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^2 f(x)}{\partial x_{n}^2} \end{pmatrix}$$

is called the second derivative, or Hessian matrix, of $f(x)$ at $x$, denoted $\nabla^2 f(x)$. If all second-order partial derivatives of $f(x)$ with respect to the components of $x$ are continuous, then $\frac{\partial^2 f(x)}{\partial x_{i}\partial x_{j}}=\frac{\partial^2 f(x)}{\partial x_{j}\partial x_{i}}$, and in that case $\nabla^2 f(x)$ is a symmetric matrix.
- Supplement: for a function of two variables, if the second-order partial derivatives of $f(x,y)$ with respect to $x, y$ are all continuous, then $f''_{xy}=f''_{yx}$.
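The symmetry of the Hessian can be checked on a small example with finite differences. This is a minimal sketch: the polynomial `f` is an assumed example and `numerical_hessian` is a hypothetical helper introduced only for illustration.

```python
def numerical_hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at x (a list of floats)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with x_i moved by di and x_j moved by dj.
                z = list(x)
                z[i] += di
                z[j] += dj
                return f(z)
            # Central difference for the second partial derivative in x_i and x_j.
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def f(x):
    # Assumed example: f(x1, x2) = x1^3 + x1^2 * x2 + 2 * x2^2
    return x[0] ** 3 + x[0] ** 2 * x[1] + 2 * x[1] ** 2

H = numerical_hessian(f, [1.0, 2.0])
# Analytic Hessian: [[6*x1 + 2*x2, 2*x1], [2*x1, 4]] = [[10, 2], [2, 4]] at (1, 2).
for row in H:
    print(row)
```

Since this `f` is a polynomial, all of its second partial derivatives are continuous, so the two off-diagonal entries should coincide up to rounding.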
4. Criterion for the convexity of multivariate real-valued functions
- Let $D\subset R^n$ be a nonempty open convex set and let $f:D\subset R^n \rightarrow R$ be twice continuously differentiable on $D$. If the Hessian matrix $\nabla^2 f(x)$ of $f(x)$ is positive definite on $D$, then $f(x)$ is a strictly convex function on $D$.
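To apply this criterion in practice, one needs a test for positive definiteness. One common option, sketched below, is to attempt a Cholesky factorization, which succeeds with a strictly positive diagonal exactly when a symmetric matrix is positive definite. The function name `is_positive_definite` and the two example Hessians are assumptions made for illustration.

```python
import math

def is_positive_definite(A):
    """Return True if the symmetric matrix A (list of lists) is positive definite.

    Tries to build a lower-triangular L with A = L L^T and a positive diagonal.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # a non-positive pivot rules out positive definiteness
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# f(x1, x2) = x1^2 + x1*x2 + x2^2 has the constant Hessian [[2, 1], [1, 2]].
hessian_f = [[2.0, 1.0], [1.0, 2.0]]
# g(x1, x2) = x1^2 - x2^2 has the constant Hessian [[2, 0], [0, -2]].
hessian_g = [[2.0, 0.0], [0.0, -2.0]]

print(is_positive_definite(hessian_f))  # True
print(is_positive_definite(hessian_g))  # False
```

By the theorem, the first function is strictly convex on $R^2$. The second Hessian is indefinite, so the theorem does not apply to $g$ (and $g$ is in fact not convex).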
5. Sufficiency theorem for convex functions
- If $f:R^n \rightarrow R$ is a convex function and $f(x)$ is continuously differentiable, then $x^*$ is a global solution (a global minimizer of $f$) if and only if $\nabla f(x^*)=\vec{0}$, where $\nabla f(x)$ is the first derivative (also called the gradient) of $f(x)$ with respect to $x$.
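A small sanity check of this statement, under an assumed example: for the convex function below, the point where the gradient vanishes should have a function value no larger than that of any other point. The function `f`, its hand-derived gradient `grad_f`, and the sampling range are all hypothetical choices for illustration.

```python
import random

def f(x):
    # Assumed convex example: f(x1, x2) = (x1 - 1)^2 + 2 * (x2 + 3)^2
    return (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2

def grad_f(x):
    # Gradient derived by hand from the formula for f above.
    return [2 * (x[0] - 1), 4 * (x[1] + 3)]

x_star = [1.0, -3.0]  # the unique solution of grad_f(x) = 0

rng = random.Random(0)
samples = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(1000)]
no_better_point = all(f(x_star) <= f(p) for p in samples)

print(grad_f(x_star))     # [0.0, 0.0]
print(no_better_point)    # True
```

Random sampling cannot prove global optimality; here it only illustrates what the theorem guarantees for convex, continuously differentiable functions.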
6. Matrix differentiation formulas for [scalar-vector]

$$\frac{\partial y}{\partial x}=\begin{pmatrix} \frac{\partial y}{\partial x_{1}} \\ \frac{\partial y}{\partial x_{2}} \\ \vdots \\ \frac{\partial y}{\partial x_{n}} \end{pmatrix} \qquad \frac{\partial y}{\partial x}=\begin{pmatrix} \frac{\partial y}{\partial x_{1}} & \frac{\partial y}{\partial x_{2}} & \cdots & \frac{\partial y}{\partial x_{n}} \end{pmatrix}$$

- The left formula is the denominator layout (used by default) and the right formula is the numerator layout, where $x=(x_{1},x_{2},\cdots,x_{n})^T$ is an $n$-dimensional column vector and $y$ is a scalar function of the $n$ variables in $x$.
- From the [scalar-vector] matrix differentiation formula, for a constant vector $\vec{a}=(a_{1},a_{2},\cdots,a_{n})^T$ we can derive:

$$\frac{\partial x^T\vec{a}}{\partial x}=\frac{\partial \vec{a}^Tx}{\partial x}=\begin{pmatrix} \frac{\partial (a_{1}x_{1}+a_{2}x_{2}+\cdots+a_{n}x_{n})}{\partial x_{1}} \\ \frac{\partial (a_{1}x_{1}+a_{2}x_{2}+\cdots+a_{n}x_{n})}{\partial x_{2}} \\ \vdots \\ \frac{\partial (a_{1}x_{1}+a_{2}x_{2}+\cdots+a_{n}x_{n})}{\partial x_{n}} \end{pmatrix}=\begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{pmatrix}=\vec{a}$$

- Similarly, for a constant $n\times n$ matrix $B$: $\frac{\partial x^TBx}{\partial x}=(B+B^T)x$
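Both identities can be checked numerically in the denominator layout by comparing a finite-difference gradient with the closed-form result. The values of `a`, `B`, and `x` below are arbitrary assumed test data (with `B` chosen to be non-symmetric on purpose), and the helper functions are hypothetical.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

a = [1.0, -2.0, 0.5]
B = [[1.0, 2.0, 0.0],
     [0.0, 3.0, -1.0],
     [4.0, 0.0, 2.0]]   # deliberately not symmetric
x = [0.3, -1.2, 2.0]

# d(a^T x)/dx should equal a.
grad_linear = numerical_gradient(lambda v: dot(a, v), x)

# d(x^T B x)/dx should equal (B + B^T) x.
grad_quadratic = numerical_gradient(lambda v: dot(v, matvec(B, v)), x)
B_plus_Bt = [[B[i][j] + B[j][i] for j in range(3)] for i in range(3)]
expected_quadratic = matvec(B_plus_Bt, x)

print(grad_linear, a)
print(grad_quadratic, expected_quadratic)
```

When $B$ is symmetric, the second identity reduces to $2Bx$, which is the form that typically appears when differentiating a least-squares objective.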