# Common Matrix Identities in Robotics (Proofs of [Identity 1] ~ [Identity 6])
I. Common Matrix Identities in Robotics - I (Summary)
II. Detailed Statements and Proofs
[Identity 1] Matrix Trace
$$
{\rm tr}(\mathbf{A}\mathbf{B}) = {\rm tr}(\mathbf{B}\mathbf{A})
$$
where $\mathbf{A}$ is an $n\times m$ matrix, and $\mathbf{B}$ is an $m \times n$ matrix. [1]
Proof
$$
{\rm tr}(\mathbf{A}\mathbf{B}) = \sum_{i=1}^{n} \sum_{j=1}^{m} A_{ij} B_{ji} = \sum_{i=1}^{n} \sum_{j=1}^{m} \underline{B_{ji} A_{ij}}
$$
$$
{\rm tr}(\mathbf{B}\mathbf{A}) = \sum_{p=1}^{m} \sum_{q=1}^{n} B_{pq} A_{qp} = \underline{\sum_{q=1}^{n} \sum_{p=1}^{m}} B_{pq} A_{qp}
$$
Comparing the two expressions (rename $p \to j$ and $q \to i$ in the second one) gives
$$
{\rm tr}(\mathbf{A}\mathbf{B}) = {\rm tr}(\mathbf{B}\mathbf{A})
$$
Q.E.D.
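As a quick numerical sanity check of the identity, the following minimal sketch compares the two traces for random rectangular matrices. The sizes `n = 2`, `m = 3`, the seed, and the helper names `matmul` and `trace` are assumptions made for this illustration; matrices are stored as plain lists of rows so that no external library is needed.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

random.seed(0)
n, m = 2, 3
A = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]  # n x m
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]  # m x n

tr_AB = trace(matmul(A, B))  # AB is n x n
tr_BA = trace(matmul(B, A))  # BA is m x m
print(abs(tr_AB - tr_BA) < 1e-12)
```

Note that `AB` and `BA` have different sizes here (2×2 and 3×3), yet their traces agree up to floating-point rounding.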
[Identity 2] Derivative of Vector with Respect to Vector
Let $\mathbf{x}$ and $\mathbf{y}$ be vectors of orders $n$ and $m$ respectively,
$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots\\ x_n\end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots\\ y_m\end{bmatrix}
$$
where $\mathbf{y}$ is a function of $\mathbf{x}$, i.e., $\mathbf{y} = \mathbf{y}(\mathbf{x})$.
In denominator-layout notation, the derivative of the vector $\mathbf{y}$ with respect to the vector $\mathbf{x}$ is the $n \times m$ matrix [2]
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \triangleq \begin{bmatrix} \frac{\partial y_1}{\partial x_1} &\frac{\partial y_2}{\partial x_1} &\cdots &\frac{\partial y_m}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2} &\frac{\partial y_2}{\partial x_2} &\cdots &\frac{\partial y_m}{\partial x_2}\\ \vdots &\vdots & \ddots &\vdots\\ \frac{\partial y_1}{\partial x_n} &\frac{\partial y_2}{\partial x_n} &\cdots &\frac{\partial y_m}{\partial x_n} \end{bmatrix}
$$
In numerator-layout notation, the derivative of the vector $\mathbf{y}$ with respect to the vector $\mathbf{x}$ is the $m \times n$ matrix [6]
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \triangleq \begin{bmatrix} \frac{\partial y_1}{\partial x_1} &\frac{\partial y_1}{\partial x_2} &\cdots &\frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} &\frac{\partial y_2}{\partial x_2} &\cdots &\frac{\partial y_2}{\partial x_n}\\ \vdots &\vdots & \ddots &\vdots\\ \frac{\partial y_m}{\partial x_1} &\frac{\partial y_m}{\partial x_2} &\cdots &\frac{\partial y_m}{\partial x_n}\end{bmatrix}
$$
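The two layouts are transposes of each other, which a small finite-difference sketch can make concrete. The map `y_of_x` below (from $\mathbb{R}^2$ to $\mathbb{R}^3$), the step size `h`, and the function name `jacobian_numerator` are hypothetical choices for illustration only.

```python
def y_of_x(x):
    """Hypothetical map from R^2 to R^3, used only for illustration."""
    x1, x2 = x
    return [x1 * x2, x1 + 2.0 * x2, x1 ** 2]

def jacobian_numerator(f, x, h=1e-6):
    """m x n matrix with entry (i, j) ~ d y_i / d x_j, by central differences."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2.0 * h)
    return J

x0 = [1.0, 2.0]
J_num = jacobian_numerator(y_of_x, x0)       # 3 x 2: numerator layout
J_den = [list(col) for col in zip(*J_num)]   # 2 x 3: denominator layout (transpose)

for row in J_num:
    print([round(v, 6) for v in row])
```

At `x0 = [1, 2]` the analytic numerator-layout Jacobian is `[[2, 1], [1, 2], [2, 0]]`, and the denominator-layout matrix is its transpose.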
[Identity 3] Derivative of Scalar with Respect to Matrix
Let $\mathbf{X}$ be an $m\times n$ matrix. For a scalar-valued function $f(\mathbf X)$, the derivative $\partial f/\partial \mathbf X$ has the same size as $\mathbf X$. That is [1]
$$
\frac{\partial f}{\partial \mathbf X} \triangleq \begin{bmatrix} \frac{\partial f}{\partial X_{11}} & \frac{\partial f}{\partial X_{12}} &\cdots & \frac{\partial f}{\partial X_{1n}} \\ \frac{\partial f}{\partial X_{21}} & \frac{\partial f}{\partial X_{22}} &\cdots & \frac{\partial f}{\partial X_{2n}} \\ \vdots &\vdots &\ddots &\vdots\\ \frac{\partial f}{\partial X_{m1}} & \frac{\partial f}{\partial X_{m2}} &\cdots & \frac{\partial f}{\partial X_{mn}} \end{bmatrix}
$$
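The definition can be read as a recipe: perturb one entry of $\mathbf X$ at a time and record the change in $f$ at the same position. A minimal sketch under assumed names (`num_grad`, `sum_of_squares`) and an assumed test function $f(\mathbf X) = \sum_{i,j} X_{ij}^2$, whose derivative is $2\mathbf X$:

```python
def num_grad(f, X, h=1e-6):
    """Return G with G[i][j] ~ d f / d X[i][j], using central differences."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            old = X[i][j]
            X[i][j] = old + h
            f_plus = f(X)
            X[i][j] = old - h
            f_minus = f(X)
            X[i][j] = old  # restore the entry before moving on
            G[i][j] = (f_plus - f_minus) / (2.0 * h)
    return G

def sum_of_squares(X):
    """Assumed example function: the squared Frobenius norm of X."""
    return sum(v * v for row in X for v in row)

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 x 3
G = num_grad(sum_of_squares, X)          # also 2 x 3, close to 2 * X

for row in G:
    print([round(v, 6) for v in row])
```

The result `G` has the same shape as `X`, matching the layout in the definition above.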
[Identity 4] Partial Derivative of a Matrix Trace of the First Order (1)
$$
\frac{\partial {\rm tr}{\mathbf X}{\mathbf Y}}{\partial \mathbf X} = \frac{\partial {\rm tr}{\mathbf Y} {\mathbf X}}{\partial \mathbf X} = {\mathbf Y}^{\small \rm T}
$$
where $\mathbf{X}$ is $m\times n$ and $\mathbf{Y}$ is $n\times m$. [3]
Proof
A direct computation gives
$$
{\rm tr} \mathbf{X} \mathbf{Y} = \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} Y_{ji}
$$
Only the term with $(i, j) = (p, q)$ depends on $X_{pq}$, so
$$
\frac{\partial {\rm tr} \mathbf{X} \mathbf{Y}}{\partial X_{pq}} =\frac{\partial \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} Y_{ji}}{\partial X_{pq}} = Y_{qp}
$$
Hence, entry by entry,
$$
\frac{\partial {\rm tr}{\mathbf X}{\mathbf Y}}{\partial \mathbf X} = {\mathbf Y}^{\small \rm T}
$$
By “[Identity 1] Matrix Trace”, ${\rm tr}{\mathbf Y}{\mathbf X} = {\rm tr}{\mathbf X}{\mathbf Y}$, so
$$
\frac{\partial {\rm tr}{\mathbf Y} {\mathbf X}}{\partial \mathbf X} = {\mathbf Y}^{\small \rm T}
$$
which completes the proof.
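The identity can be checked numerically by differentiating ${\rm tr}(\mathbf X \mathbf Y)$ entry by entry and comparing with $\mathbf Y^{\rm T}$. This is only a sketch: the concrete matrices, the step size, and the helper names (`matmul`, `trace`, `num_grad`) are assumptions for illustration.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    """Return G with G[p][q] ~ d f / d X[p][q], using central differences."""
    G = [[0.0] * len(X[0]) for _ in X]
    for p in range(len(X)):
        for q in range(len(X[0])):
            old = X[p][q]
            X[p][q] = old + h
            f_plus = f(X)
            X[p][q] = old - h
            f_minus = f(X)
            X[p][q] = old
            G[p][q] = (f_plus - f_minus) / (2.0 * h)
    return G

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # m x n = 2 x 3
Y = [[0.5, -1.0], [2.0, 0.0], [1.5, 3.0]]   # n x m = 3 x 2

G = num_grad(lambda M: trace(matmul(M, Y)), X)
Y_T = [list(col) for col in zip(*Y)]          # transpose of Y, 2 x 3
max_err = max(abs(G[p][q] - Y_T[p][q]) for p in range(2) for q in range(3))
print(max_err < 1e-6)
```

Because ${\rm tr}(\mathbf X \mathbf Y)$ is linear in $\mathbf X$, the central difference is exact up to rounding, and the gradient does not depend on the particular `X` chosen.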
[Identity 5] Partial Derivative of a Matrix Trace of the First Order (2)
$$
\frac{\partial {\rm tr}{{\mathbf X}^{\small\rm T}} {\mathbf Y}}{\partial \mathbf X} = \frac{\partial {\rm tr}{\mathbf Y} {{\mathbf X}^{\small\rm T}} } {\partial \mathbf X} = {\mathbf Y}
$$
where $\mathbf{X}$ is $n \times m$ and $\mathbf{Y}$ is also $n\times m$.
Proof
The proof is similar to that of “[Identity 4] Partial Derivative of a Matrix Trace”.
$$
{\rm tr} {\mathbf{X}^{\small \rm T}} \mathbf{Y} = \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ji} Y_{ji}
$$
Only the term with $(j, i) = (p, q)$ depends on $X_{pq}$, so for $p = 1, \dots, n$ and $q = 1, \dots, m$
$$
\frac{\partial {\rm tr} \mathbf{X}^{\small \rm T} \mathbf{Y}}{\partial X_{pq}} =\frac{\partial \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ji} Y_{ji}}{\partial X_{pq}} = Y_{pq}
$$
According to “[Identity 3] Derivative of Scalar with Respect to Matrix”,
$$
\frac{\partial {\rm tr}{{\mathbf X}^{\small\rm T}} {\mathbf Y}}{\partial \mathbf X} = {\mathbf Y}
$$
Moreover, by “[Identity 1] Matrix Trace”,
$$
\frac{\partial {\rm tr}{{\mathbf X}^{\small\rm T}} {\mathbf Y}}{\partial \mathbf X} = \frac{\partial {\rm tr}{\mathbf Y} {{\mathbf X}^{\small\rm T}} } {\partial \mathbf X} = {\mathbf Y}
$$
which completes the proof.
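A small worked instance may help; the $2\times 2$ size is chosen here only for illustration. Writing out the diagonal of $\mathbf X^{\rm T}\mathbf Y$ gives

$$
{\rm tr}\,\mathbf X^{\rm T}\mathbf Y
= \underbrace{X_{11}Y_{11} + X_{21}Y_{21}}_{[\mathbf X^{\rm T}\mathbf Y]_{11}}
+ \underbrace{X_{12}Y_{12} + X_{22}Y_{22}}_{[\mathbf X^{\rm T}\mathbf Y]_{22}}
$$

so each entry $X_{pq}$ appears exactly once, multiplied by $Y_{pq}$. For example, $\partial\, {\rm tr}\,\mathbf X^{\rm T}\mathbf Y / \partial X_{21} = Y_{21}$. In other words, ${\rm tr}\,\mathbf X^{\rm T}\mathbf Y$ is the entrywise (Frobenius) inner product of $\mathbf X$ and $\mathbf Y$, and its derivative with respect to $\mathbf X$ is $\mathbf Y$ itself.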
[Identity 6] Partial Derivative of a Matrix Trace of the Second Order (1)
$$
\frac{\partial {\rm tr}{\mathbf X}{\mathbf Z}{\mathbf X}^{\small \rm T}}{\partial \mathbf X} = {\mathbf X}{\mathbf Z}^{\small\rm T} + {\mathbf X}{\mathbf Z}
$$
where $\mathbf{X}$ is $m\times n$, and ${\mathbf Z}$ is $n\times n$. [3]
If $\mathbf{Z}$ is symmetric, then
$$
\frac{\partial {\rm tr}{\mathbf X}{\mathbf Z}{\mathbf X}^{\small \rm T}}{\partial \mathbf X} = 2{\mathbf X}{\mathbf Z}
$$
Proof
Let ${\mathbf X}_{k}$ denote the $k$-th row of $\mathbf{X}$. The $(k, k)$ entry of $\mathbf{X} \mathbf{Z} \mathbf{X}^{\small \rm T}$ (on the diagonal) is
$$
\begin{aligned} \left[\mathbf{X} \mathbf{Z} \mathbf{X}^{\small \rm T}\right]_{kk} =\mathbf{X}_{k} \mathbf{Z} \mathbf{X}_{k}^{\small \rm T} &= \begin{bmatrix} X_{k1} &X_{k2} &\cdots & X_{kn} \end{bmatrix} \begin{bmatrix} Z_{11} & Z_{12} &\cdots &Z_{1n}\\ Z_{21} & Z_{22} &\cdots &Z_{2n}\\ \vdots & \vdots &\ddots &\vdots\\ Z_{n1} & Z_{n2} &\cdots &Z_{nn} \end{bmatrix} \begin{bmatrix} X_{k1} \\X_{k2} \\ \vdots \\ X_{kn} \end{bmatrix}\\ &= \begin{bmatrix} \sum_{i=1}^{n} X_{ki} Z_{i1} &\sum_{i=1}^{n} X_{ki} Z_{i2} &\cdots & \sum_{i=1}^{n} X_{ki} Z_{in} \end{bmatrix}\begin{bmatrix} X_{k1} \\X_{k2} \\ \vdots \\ X_{kn} \end{bmatrix}\\ &= \sum_{j=1}^{n}\sum_{i=1}^{n} X_{ki} Z_{ij} X_{kj} \end{aligned}
$$
By the definition of the trace of a square matrix,
$$
{\rm tr} \left( \mathbf{X} \mathbf{Z} \mathbf{X}^{\small \rm T}\right) = \sum_{k=1}^{m} \mathbf{X}_{k} \mathbf{Z} \mathbf{X}_{k}^{\small \rm T} = \sum_{k=1}^{m} \sum_{j=1}^{n}\sum_{i=1}^{n} X_{ki} Z_{ij} X_{kj}
$$
According to “[Identity 3] Derivative of Scalar with Respect to Matrix”, and noting that only the terms with $k = p$ involve $X_{pq}$ (through $i = q$ or $j = q$),
$$
\begin{aligned} \left[ \frac{\partial {\rm tr}{\mathbf X}{\mathbf Z}{\mathbf X}^{\small \rm T}}{\partial \mathbf X} \right]_{pq} &= \frac{\partial {\rm tr} \left( \mathbf{X} \mathbf{Z} \mathbf{X}^{\small \rm T}\right)}{\partial X_{pq}}\\ &= \frac{\partial{\sum_{k=1}^{m} \sum_{j=1}^{n}\sum_{i=1}^{n} X_{ki} Z_{ij} X_{kj}}}{\partial X_{pq}} \\ ({\rm product\ rule}\ [fg]'=f'g+fg') \qquad &= \sum_{j=1}^{n} Z_{qj}X_{pj} + \sum_{i=1}^{n} X_{pi} Z_{iq} \\ (Z_{qj}X_{pj} ={X_{pj} Z_{qj}}) \qquad &= \sum_{j=1}^{n}{X_{pj} Z_{qj}} + \sum_{i=1}^{n} X_{pi} Z_{iq} \end{aligned}
$$
The first sum is $[\mathbf{X}\mathbf{Z}^{\small\rm T}]_{pq}$ and the second is $[\mathbf{X}\mathbf{Z}]_{pq}$. Therefore
$$
\frac{\partial {\rm tr}{\mathbf X}{\mathbf Z}{\mathbf X}^{\small \rm T}}{\partial \mathbf X} = {\mathbf X}{\mathbf Z}^{\small\rm T}+ {\mathbf X}{\mathbf Z}
$$
Q.E.D.
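The second-order identity can be spot-checked in the same finite-difference style, this time with a deliberately non-symmetric $\mathbf Z$ so that both terms $\mathbf X \mathbf Z^{\rm T}$ and $\mathbf X \mathbf Z$ matter. The matrices and helper names below are assumptions for illustration, not part of the original derivation.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def num_grad(f, X, h=1e-6):
    """Return G with G[p][q] ~ d f / d X[p][q], using central differences."""
    G = [[0.0] * len(X[0]) for _ in X]
    for p in range(len(X)):
        for q in range(len(X[0])):
            old = X[p][q]
            X[p][q] = old + h
            f_plus = f(X)
            X[p][q] = old - h
            f_minus = f(X)
            X[p][q] = old
            G[p][q] = (f_plus - f_minus) / (2.0 * h)
    return G

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                    # m x n = 2 x 3
Z = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]]   # n x n, not symmetric

G = num_grad(lambda M: trace(matmul(matmul(M, Z), transpose(M))), X)

XZt = matmul(X, transpose(Z))
XZ = matmul(X, Z)
expected = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(XZt, XZ)]
max_err = max(abs(G[p][q] - expected[p][q]) for p in range(2) for q in range(3))
print(max_err < 1e-5)
```

Replacing `Z` by a symmetric matrix makes `XZt` and `XZ` coincide, which is the special case $2\mathbf X \mathbf Z$ stated above.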
Common Matrix Identities in Robotics - III (Proofs of Identities 7~10)
References
[1] Hu Pili, Matrix Calculus: Derivation and Simple Application, https://project.hupili.net/tutorial/hu2012-matrix-calculus/hu2012matrix-calculus.pdf
[2] J. A. Dobelman, D Matrix Calculus, https://www.stat.rice.edu/~dobelman/notes_papers/math/Matrix.Calculus.AppD.pdf
[3] Timothy D. Barfoot, State Estimation for Robotics, Cambridge University Press, 2017
[4] Wolfram MathWorld, Vector Triple Product, https://mathworld.wolfram.com/VectorTripleProduct.html
[5] Chris Yeh, Schur Complements and the Matrix Inversion Lemma, https://chrisyeh96.github.io/2021/05/19/schur-complement.html
[6] Wikipedia, Matrix calculus, https://en.wikipedia.org/wiki/Matrix_calculus