Table of Contents
- Definitions
- Properties
- Stack Operator
- Kronecker Product
- $a \otimes b = \mathrm{vec}(b a^T)$
- $(A \otimes B)^T = (A^T \otimes B^T)$
- Bilinearity and associativity
- $(A \otimes B)(C \otimes D) = (AC \otimes BD)$
- $(A \otimes B)^{-1} = (A^{-1} \otimes B^{-1})$
- $\mathrm{det}(A_{n\times n} \otimes B_{m \times m}) = \mathrm{det}(A)^m \cdot \mathrm{det}(B)^n$
- $\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(A) \cdot \mathrm{Tr}(B)$
- $\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$
Definitions
Stack Operator
For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\mathrm{vec}(A) := [A_{00}, A_{10}, \ldots, A_{m-1,n-1}]^T \in \mathbb{R}^{mn},$$
that is, the columns of $A$ are stacked on top of one another (column-major order), so that $[\mathrm{vec}(A)]_{jm+i} = A_{i,j}$.
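As a small illustration of the stacking order, here is a minimal NumPy sketch. The helper name `vec` is an assumption made for this example (NumPy has no function of that name); it relies on `reshape` with `order="F"`, which reads the array column by column.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return np.asarray(A).reshape(-1, order="F")

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # m = 2 rows, n = 3 columns
v = vec(A)
print(v)  # [1 4 2 5 3 6]

# Entry (i, j) of A lands at position j*m + i of vec(A).
m = A.shape[0]
print(v[1 * m + 0] == A[0, 1])  # True
```

Note that NumPy's default flattening is row-major, so omitting `order="F"` would give a different vector.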
Kronecker Product
For any matrices $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p \times q}$,
$$A \otimes B := \left[ \begin{array}{ccc} A_{00} \cdot B & \cdots & A_{0,n-1} \cdot B \\ \vdots & \ddots & \vdots \\ A_{m-1,0} \cdot B & \cdots & A_{m-1,n-1} \cdot B \end{array} \right] \in \mathbb{R}^{mp \times nq}.$$
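The block structure in this definition can be reproduced directly. The sketch below assembles $A \otimes B$ block by block and compares it with NumPy's built-in `np.kron`; the helper name `kron_blocks` is hypothetical and exists only for this illustration.

```python
import numpy as np

def kron_blocks(A, B):
    """Assemble A ⊗ B from its definition: block (i, j) equals A[i, j] * B."""
    m, n = A.shape
    return np.block([[A[i, j] * B for j in range(n)] for i in range(m)])

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = kron_blocks(A, B)
print(K)
# [[0 1 0 2]
#  [1 0 2 0]
#  [0 3 0 4]
#  [3 0 4 0]]
print(np.array_equal(K, np.kron(A, B)))  # True
```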
Properties
Stack Operator
For $A, B \in \mathbb{R}^{m \times n}$, both sides of the following identity equal $\sum_{i,j} A_{i,j} B_{i,j}$:
$$\mathrm{Tr}(A^TB) = \mathrm{vec}(A)^T \mathrm{vec}(B).$$
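A quick numerical check of this trace identity, as a sketch on random matrices (the shapes, the seed, and the helper `vec` are arbitrary choices for the example):

```python
import numpy as np

def vec(M):
    """Column-major stacking of M."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

lhs = np.trace(A.T @ B)        # Tr(A^T B)
rhs = vec(A) @ vec(B)          # vec(A)^T vec(B)
elementwise = np.sum(A * B)    # sum of entrywise products

print(np.isclose(lhs, rhs), np.isclose(lhs, elementwise))
```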
Kronecker Product
It follows directly from the definition that
$$[A \otimes B]_{ip+s,\,jq+t} = A_{i,j} \cdot B_{s,t}, \quad i\in [m],\ s\in[p],\ j \in [n],\ t \in [q],$$
where $[m] = \{0, 1, \ldots, m-1\}$.
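The entrywise formula can be checked exhaustively on a small example. This is only a sketch with integer matrices of arbitrarily chosen sizes, using `np.kron` as the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 4, 5
A = rng.integers(-5, 5, size=(m, n))
B = rng.integers(-5, 5, size=(p, q))
K = np.kron(A, B)

# Entry (i*p + s, j*q + t) of A ⊗ B should equal A[i, j] * B[s, t].
index_ok = all(
    K[i * p + s, j * q + t] == A[i, j] * B[s, t]
    for i in range(m) for s in range(p)
    for j in range(n) for t in range(q)
)
print(K.shape, index_ok)
```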
$a \otimes b = \mathrm{vec}(b a^T)$
- For $a \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$, we have $a \otimes b = \mathrm{vec}(b a^T)$.
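For vectors, the identity says that the Kronecker product is the column-major flattening of an outer product. A minimal sketch with concrete numbers (the values are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3])    # a in R^3
b = np.array([10, 20])     # b in R^2

lhs = np.kron(a, b)                           # a ⊗ b
rhs = np.outer(b, a).reshape(-1, order="F")   # vec(b a^T)

print(lhs)  # [10 20 20 40 30 60]
print(np.array_equal(lhs, rhs))  # True
```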
$(A \otimes B)^T = (A^T \otimes B^T)$
The identity $(A \otimes B)^T = (A^T \otimes B^T)$ is immediate from the definition.
Hence, writing $M_{*,j}$ for the $j$-th column and $M_{i,*}$ for the $i$-th row of a matrix $M$, the columns and rows of $A \otimes B$ are
$$[A \otimes B]_{*, jq+t} = A_{*,j} \otimes B_{*, t} = \mathrm{vec}(B_{*, t} A_{*,j}^T),$$
$$[A \otimes B]_{ip+s, *}^T = A_{i,*}^T \otimes B_{s,*}^T = \mathrm{vec}(B^T_{s,*} A_{i,*}).$$
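The transpose rule and the column and row formulas can be verified numerically. The sketch below uses random matrices and one arbitrarily chosen column index $(j, t)$ and row index $(i, s)$; `vec` is a helper defined for the example.

```python
import numpy as np

def vec(M):
    """Column-major stacking of M."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))
K = np.kron(A, B)

# (A ⊗ B)^T = A^T ⊗ B^T
transpose_ok = np.allclose(K.T, np.kron(A.T, B.T))

# Column j*q + t equals A[:, j] ⊗ B[:, t] = vec(B[:, t] A[:, j]^T).
j, t = 1, 3
col = K[:, j * q + t]
col_ok = (np.allclose(col, np.kron(A[:, j], B[:, t]))
          and np.allclose(col, vec(np.outer(B[:, t], A[:, j]))))

# Row i*p + s equals A[i, :] ⊗ B[s, :] = vec(B[s, :]^T A[i, :]).
i, s = 1, 2
row = K[i * p + s, :]
row_ok = (np.allclose(row, np.kron(A[i, :], B[s, :]))
          and np.allclose(row, vec(np.outer(B[s, :], A[i, :]))))

print(transpose_ok, col_ok, row_ok)
```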
Bilinearity and associativity
- Compatibility with scalar multiplication:
$$A \otimes \alpha B = \alpha A \otimes B = \alpha (A \otimes B).$$
- Distributivity over addition:
$$\begin{array}{l} (A+B) \otimes C = A \otimes C + B \otimes C, \\ A \otimes (B+C) = A\otimes B + A \otimes C. \end{array}$$
- Associativity, $(A \otimes B) \otimes C = A \otimes (B\otimes C)$:
$$\begin{array}{ll} (A \otimes B) \otimes C &= [A_{i,j} \cdot B_{s,t} \cdot C ]\\ &= A \otimes (B \otimes C). \end{array}$$
- In general, $(A \otimes B) \neq (B \otimes A)$.
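These algebraic rules are easy to confirm numerically. The following is a sketch on random matrices; the shapes, the scalar `alpha`, and the small integer matrices `X`, `Y` used for the non-commutativity example are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))   # same shape as A so that A + B is defined
C = rng.standard_normal((3, 2))
alpha = 2.5

scalar_ok = (np.allclose(np.kron(A, alpha * C), alpha * np.kron(A, C))
             and np.allclose(np.kron(alpha * A, C), alpha * np.kron(A, C)))
left_ok = np.allclose(np.kron(A + B, C), np.kron(A, C) + np.kron(B, C))
right_ok = np.allclose(np.kron(C, A + B), np.kron(C, A) + np.kron(C, B))
assoc_ok = np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))

# The Kronecker product is not commutative in general.
X = np.array([[1, 2], [3, 4]])
Y = np.array([[0, 1], [1, 0]])
commutes = np.array_equal(np.kron(X, Y), np.kron(Y, X))

print(scalar_ok, left_ok, right_ok, assoc_ok, commutes)
# True True True True False
```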
$(A \otimes B) (C\otimes D) = (AC \otimes BD)$
Assume the products $AC$ and $BD$ are defined, say $A \in \mathbb{R}^{m \times n}$, $C \in \mathbb{R}^{n \times k}$, $B \in \mathbb{R}^{p \times q}$, $D \in \mathbb{R}^{q \times r}$. Using the row and column formulas for the Kronecker product together with $\mathrm{vec}(X)^T \mathrm{vec}(Y) = \mathrm{Tr}(X^TY)$, for $i \in [m]$, $s \in [p]$, $j \in [k]$, $t \in [r]$:
$$\begin{array}{ll} [(A \otimes B) (C\otimes D)]_{ip+s, jr+t} &= [A \otimes B]_{ip+s, *} [C\otimes D]_{*,jr+t} \\ &= \mathrm{vec}(B_{s, *}^TA_{i,*})^T \mathrm{vec}(D_{*,t} C_{*,j}^T) \\ &= \mathrm{Tr}(A_{i,*}^TB_{s,*}D_{*,t} C_{*,j}^T) \\ &= \mathrm{Tr}(C_{*,j}^TA_{i,*}^TB_{s,*}D_{*,t}) \\ &= A_{i, *}C_{*,j} \cdot B_{s, *} D_{*,t} \\ &= [AC]_{ij} \cdot [BD]_{st} \\ &= [AC \otimes BD]_{ip+s,jr+t}. \end{array}$$
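A numerical check of this product rule, as a sketch. The shapes are assumptions chosen only so that $AC$ and $BD$ are defined.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))   # A @ C is 2 x 4
B = rng.standard_normal((4, 5))
D = rng.standard_normal((5, 2))   # B @ D is 4 x 2

lhs = np.kron(A, B) @ np.kron(C, D)   # (8 x 15) @ (15 x 8)
rhs = np.kron(A @ C, B @ D)           # (2 x 4) ⊗ (4 x 2) = 8 x 8

print(lhs.shape, rhs.shape, np.allclose(lhs, rhs))
```

The right-hand side multiplies the small factors first and forms a single Kronecker product, which avoids building the two large intermediate matrices.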
$(A \otimes B)^{-1} = (A^{-1} \otimes B^{-1})$
This requires $A$ and $B$ to be square matrices of full rank. By the product rule $(A \otimes B)(C \otimes D) = (AC \otimes BD)$,
$$(A \otimes B) (A^{-1} \otimes B^{-1}) = (AA^{-1} \otimes BB^{-1}) = I.$$
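The inverse rule can be checked as follows. This is a sketch: the matrices are random with a multiple of the identity added, an assumption made here only to keep them comfortably invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shifting by 3*I is just a convenient way to get well-conditioned matrices.
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((2, 2)) + 3 * np.eye(2)

K = np.kron(A, B)
inv_of_kron = np.linalg.inv(K)                                # invert the 6 x 6 matrix
kron_of_inv = np.kron(np.linalg.inv(A), np.linalg.inv(B))     # invert the small factors

print(np.allclose(inv_of_kron, kron_of_inv))
print(np.allclose(K @ kron_of_inv, np.eye(6)))
```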
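$\mathrm{det}(A_{n\times n} \otimes B_{m \times m}) = \mathrm{det}(A)^m \cdot \mathrm{det}(B)^n$
Intuitively, this works like ordinary Gaussian elimination: eliminating on $A_{n\times n} \otimes B_{m\times m}$ block by block uses exactly the operations that reduce $A$, and $B$ does not interfere with that process. More precisely, $A \otimes B = (A \otimes I_m)(I_n \otimes B)$. The factor $I_n \otimes B$ is block diagonal with $n$ copies of $B$, so $\mathrm{det}(I_n \otimes B) = \mathrm{det}(B)^n$. The factor $A \otimes I_m$ is obtained from $I_m \otimes A$ by applying the same permutation to rows and columns, so $\mathrm{det}(A \otimes I_m) = \mathrm{det}(A)^m$. Multiplying the two determinants gives the result.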
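The determinant formula, together with the factorization $A \otimes B = (A \otimes I_m)(I_n \otimes B)$, can be checked numerically. A sketch with arbitrarily chosen sizes $n = 3$, $m = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
K = np.kron(A, B)

# A ⊗ B = (A ⊗ I_m)(I_n ⊗ B)
factor_ok = np.allclose(K, np.kron(A, np.eye(m)) @ np.kron(np.eye(n), B))

lhs = np.linalg.det(K)
rhs = np.linalg.det(A) ** m * np.linalg.det(B) ** n   # note the swapped exponents

print(factor_ok, np.isclose(lhs, rhs))
```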
$\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(A) \cdot \mathrm{Tr}(B)$
For square $A \in \mathbb{R}^{m \times m}$ and $B \in \mathbb{R}^{n \times n}$, the diagonal entries of $A \otimes B$ are the products $A_{i,i} B_{j,j}$, so
$$\mathrm{Tr}(A \otimes B) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_{i,i}B_{j,j} = \mathrm{Tr}(A) \cdot \mathrm{Tr}(B).$$
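A short numerical check of the trace formula, sketched on random square matrices of arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

lhs = np.trace(np.kron(A, B))
rhs = np.trace(A) * np.trace(B)

# The diagonal of A ⊗ B is the Kronecker product of the two diagonals.
diag_ok = np.allclose(np.diag(np.kron(A, B)), np.kron(np.diag(A), np.diag(B)))

print(np.isclose(lhs, rhs), diag_ok)
```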
$\mathrm{vec}(ABC) = (C^T \otimes A)\,\mathrm{vec}(B)$
Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{p \times q}$. Then, for $i \in [m]$ and $j \in [q]$,
$$\begin{array}{ll} [\mathrm{vec}(ABC)]_{jm+i} &= [ABC]_{i,j} = \mathrm{Tr}(A_{i,*}BC_{*,j}) \\ &= \mathrm{Tr}(C_{*,j}A_{i,*}B) \\ &= \mathrm{vec}(A_{i,*}^TC_{*,j}^T)^T \mathrm{vec}(B) \\ &= [C^T \otimes A]_{jm+i,*}\, \mathrm{vec}(B). \end{array}$$
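This identity is the one most often used in practice, so a numerical check is worthwhile. The sketch below uses random matrices with arbitrarily chosen compatible shapes; `vec` is a helper defined for the example and must use column-major order for the identity to hold.

```python
import numpy as np

def vec(M):
    """Column-major stacking of M."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

lhs = vec(A @ B @ C)              # length m*q = 10
rhs = np.kron(C.T, A) @ vec(B)    # (10 x 12) @ (12,)

print(lhs.shape, np.allclose(lhs, rhs))
```

The matrix $C^T \otimes A$ is formed explicitly here only for illustration; computing `A @ B @ C` directly is much cheaper for large matrices.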
Special case:
$$Ax = IAx = \mathrm{vec}(IAx) = (x^T \otimes I)\,\mathrm{vec}(A).$$
This is useful when working with gradients. If
$$y = Ax,$$
then
$$\mathrm{d}y = (\mathrm{d}A)x + A\,\mathrm{d}x = (x^T \otimes I)\, \mathrm{vec}(\mathrm{d}A) + A\, \mathrm{d}x.$$
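To see the differential formula at work, the sketch below perturbs $A$ and $x$ by small random amounts and compares the exact change in $y = Ax$ with the first-order expression. The sizes, the perturbation scale `1e-3`, and the names `J_A`, `first_order`, `remainder` are assumptions made for this example. Since $y$ is bilinear in $(A, x)$, the only difference should be the second-order term $(\mathrm{d}A)(\mathrm{d}x)$.

```python
import numpy as np

def vec(M):
    """Column-major stacking of M."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
dA = 1e-3 * rng.standard_normal((m, n))   # small perturbation of A
dx = 1e-3 * rng.standard_normal(n)        # small perturbation of x

# Jacobian of y = A x with respect to vec(A): x^T ⊗ I_m, of shape (m, m*n).
J_A = np.kron(x[None, :], np.eye(m))

first_order = J_A @ vec(dA) + A @ dx
exact = (A + dA) @ (x + dx) - A @ x
remainder = exact - first_order           # should equal dA @ dx

print(np.allclose(J_A @ vec(dA), dA @ x))
print(np.allclose(remainder, dA @ dx, atol=1e-12))
```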