The Frobenius Norm of a Matrix and Its Partial-Derivative Rules
Definition: Let $A = [a_{ij}]_{m \times n}$ be an $m \times n$ matrix. The quantity $\left\|A\right\|_F = \sqrt{tr(A^TA)} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}$ is called the Frobenius norm of $A$.
Differentiation rule 1
Let $A = [a_{ij}]_{m \times n}$ and $X = [x_{ij}]_{m \times n}$ be two $m \times n$ matrices. Then
$$\frac{\partial tr(A^TX)}{\partial x_{ij}} = \frac{\partial tr(X^TA)}{\partial x_{ij}} = a_{ij} = [A]_{ij}$$
Proof: Since $A^TX = \left[\sum_{p=1}^{m}a_{pi}x_{pj}\right]_{n \times n}$, its trace is $tr(A^TX) = \sum_{t=1}^{n}\sum_{p=1}^{m}a_{pt}x_{pt}$.
For the term with $p = i$, $t = j$ we have $\frac{\partial (a_{pt}x_{pt})}{\partial x_{ij}} = \frac{\partial (a_{ij}x_{ij})}{\partial x_{ij}} = a_{ij} = [A]_{ij}$.
Every other term in the sum for $tr(A^TX)$ has partial derivative 0, so
$$\frac{\partial tr(A^TX)}{\partial x_{ij}} = \frac{\partial \sum_{t=1}^{n}\sum_{p=1}^{m}a_{pt}x_{pt}}{\partial x_{ij}} = a_{ij} = [A]_{ij}$$
The same result holds for $tr(X^TA)$, because $tr(X^TA) = tr((A^TX)^T) = tr(A^TX)$.
A concrete example
A concrete example makes this derivative easier to follow. Take a 2x2 matrix $A$ and another 2x2 matrix $X$:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$
We want $\frac{\partial tr(A^TX)}{\partial x_{ij}}$ for every $i, j$.
First, $A^T$ is the transpose of $A$:
$$A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}$$
Next, the product $A^TX$ is
$$A^TX = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} = \begin{bmatrix} a_{11}x_{11} + a_{21}x_{21} & a_{11}x_{12} + a_{21}x_{22} \\ a_{12}x_{11} + a_{22}x_{21} & a_{12}x_{12} + a_{22}x_{22} \end{bmatrix}$$
The trace of $A^TX$ (the sum of its diagonal elements) is
$$tr(A^TX) = (a_{11}x_{11} + a_{21}x_{21}) + (a_{12}x_{12} + a_{22}x_{22})$$
Now compute $\frac{\partial tr(A^TX)}{\partial x_{ij}}$ for each choice of $i, j$:
- For $i = 1, j = 1$:
$$\frac{\partial tr(A^TX)}{\partial x_{11}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{11}} = a_{11}$$
- For $i = 1, j = 2$:
$$\frac{\partial tr(A^TX)}{\partial x_{12}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{12}} = a_{12}$$
- For $i = 2, j = 1$:
$$\frac{\partial tr(A^TX)}{\partial x_{21}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{21}} = a_{21}$$
- For $i = 2, j = 2$:
$$\frac{\partial tr(A^TX)}{\partial x_{22}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{22}} = a_{22}$$
Collecting the four partial derivatives into a matrix gives
$$\frac{\partial tr(A^TX)}{\partial X} = \left[\frac{\partial tr(A^TX)}{\partial x_{ij}}\right]_{2 \times 2} = A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
The example confirms that $\frac{\partial tr(A^TX)}{\partial x_{ij}} = a_{ij}$ for every $i, j$. This is exactly what rule 1 states: the partial derivative of $tr(A^TX)$ with respect to an element of $X$ is the element of $A$ in the same position.
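Rule 1 can also be checked numerically. The sketch below is an illustration, not part of the original derivation: it assumes matrices stored as plain Python lists of rows, arbitrary test values for `A` and `X`, and hypothetical helper names (`transpose`, `matmul`, `trace`, `numerical_grad`). It estimates every partial derivative of $tr(A^TX)$ by central differences and compares the result with $A$.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product of P (r x k) and Q (k x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_grad(f, X, h=1e-4):
    """Central-difference estimate of the matrix of partials df/dx_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

# Arbitrary 2 x 3 test matrices (assumed values, same shape as rule 1 requires).
A = [[1.0, -2.0, 0.5], [3.0, 4.0, -1.0]]
X = [[0.3, 1.2, -0.7], [2.0, -1.5, 0.4]]

numeric = numerical_grad(lambda M: trace(matmul(transpose(A), M)), X)

# Rule 1 predicts that the matrix of partial derivatives equals A.
max_err = max(abs(numeric[i][j] - A[i][j])
              for i in range(len(A)) for j in range(len(A[0])))
print("largest deviation from A:", max_err)
```

Because $tr(A^TX)$ is linear in $X$, the central difference is exact up to rounding, so the printed deviation should be tiny.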
The definition $\left\|A\right\|_F = \sqrt{tr(A^TA)} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}$ is likewise easiest to understand through a concrete example.
Note in particular the trace form: $\left\|A\right\|_F = \sqrt{tr(A^TA)}$.
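A concrete example
The Frobenius norm is the square root of the sum of the squares of the matrix elements, so it is a single nonnegative real number. It is the Euclidean length of the matrix when all of its elements are read as one vector. A 2x2 matrix illustrates the idea.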
Suppose we have a 2x2 matrix $A$:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
Then $A^T$ (the transpose of $A$) is
$$A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}$$
Now compute the product $A^TA$:
$$A^TA = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{11}^2 + a_{21}^2 & a_{11}a_{12} + a_{21}a_{22} \\ a_{11}a_{12} + a_{21}a_{22} & a_{12}^2 + a_{22}^2 \end{bmatrix}$$
Next, take the trace of $A^TA$ (the sum of its diagonal elements):
$$tr(A^TA) = (a_{11}^2 + a_{21}^2) + (a_{12}^2 + a_{22}^2) = a_{11}^2 + a_{21}^2 + a_{12}^2 + a_{22}^2$$
Finally, the Frobenius norm $\|A\|_F$ is the square root of this trace:
$$\|A\|_F = \sqrt{tr(A^TA)} = \sqrt{a_{11}^2 + a_{21}^2 + a_{12}^2 + a_{22}^2}$$
In other words, it is the square root of the sum of the squares of all elements of $A$.
As a numerical example, if
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$
then
$$A^TA = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 10 & 14 \\ 14 & 20 \end{bmatrix}$$
and
$$tr(A^TA) = 10 + 20 = 30$$
Therefore
$$\|A\|_F = \sqrt{30}$$
which is the Frobenius norm of $A$.
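The numerical example above can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper names (`transpose`, `matmul`, `trace`) are introduced here for illustration only. It computes the norm both ways, through $tr(A^TA)$ and through the sum of squared elements, to show that the two forms of the definition agree.

```python
import math

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product of P (r x k) and Q (k x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]

AtA = matmul(transpose(A), A)             # [[10, 14], [14, 20]]
norm_via_trace = math.sqrt(trace(AtA))    # sqrt(10 + 20) = sqrt(30)
norm_via_entries = math.sqrt(sum(a * a for row in A for a in row))

print(AtA)
print(norm_via_trace, norm_via_entries)
```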
Differentiation rule 2
Let $A = [a_{ij}]_{m \times m}$ be an $m \times m$ matrix and $X = [x_{ij}]_{m \times n}$ an $m \times n$ matrix. Then
$$\frac{\partial tr(X^TAX)}{\partial x_{ij}}=\sum_{q=1}^ma_{iq}x_{qj}+\sum_{p=1}^ma_{pi}x_{pj}=\Big[ AX+A^TX\Big]_{ij}$$
Proof: Since $X^{T}AX=\left[\sum_{p=1}^{m}\sum_{q=1}^{m}x_{pi}a_{pq}x_{qj}\right]_{n\times n}$, its trace is $tr(X^{T}AX)=\sum_{t=1}^{n}\sum_{p=1}^{m}\sum_{q=1}^{m}x_{pt}a_{pq}x_{qt}$. Each term contains two elements of $X$, so by the product rule $x_{ij}$ can contribute through either factor.
Through the first factor ($p = i$, $t = j$), holding the second factor fixed: $\frac{\partial(\sum_{q=1}^{m}x_{ij}a_{iq}x_{qj})}{\partial x_{ij}}=\sum_{q=1}^{m}a_{iq}x_{qj}=[AX]_{ij}$
Through the second factor ($q = i$, $t = j$), holding the first factor fixed: $\frac{\partial(\sum_{p=1}^mx_{pj}a_{pi}x_{ij})}{\partial x_{ij}}=\sum_{p=1}^ma_{pi}x_{pj}=\left[A^TX\right]_{ij}$
Every other term in the sum for $tr(X^TAX)$ has partial derivative 0, so $\frac{\partial tr(X^TAX)}{\partial x_{ij}}=\sum_{q=1}^ma_{iq}x_{qj}+\sum_{p=1}^ma_{pi}x_{pj}=\Big[ AX+A^TX\Big]_{ij}$
A concrete example
Rule 2 can be illustrated with 2x2 matrices. Let $A$ be a 2x2 square matrix and $X$ a 2x2 matrix, and compute $\frac{\partial tr(X^TAX)}{\partial x_{ij}}$ explicitly.
Let
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$
First we need $X^TAX$:
$$X^T = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}$$
$$X^TAX = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$
The product is a 2x2 matrix whose elements are:
$$(X^TAX)_{11} = x_{11}(a_{11}x_{11} + a_{12}x_{21}) + x_{21}(a_{21}x_{11} + a_{22}x_{21})$$
$$(X^TAX)_{12} = x_{11}(a_{11}x_{12} + a_{12}x_{22}) + x_{21}(a_{21}x_{12} + a_{22}x_{22})$$
$$(X^TAX)_{21} = x_{12}(a_{11}x_{11} + a_{12}x_{21}) + x_{22}(a_{21}x_{11} + a_{22}x_{21})$$
$$(X^TAX)_{22} = x_{12}(a_{11}x_{12} + a_{12}x_{22}) + x_{22}(a_{21}x_{12} + a_{22}x_{22})$$
Next, the trace of $X^TAX$ is the sum of its diagonal elements:
$$tr(X^TAX) = (X^TAX)_{11} + (X^TAX)_{22}$$
Now we compute $\frac{\partial tr(X^TAX)}{\partial x_{ij}}$ for any $i, j$.
For example, consider $\frac{\partial tr(X^TAX)}{\partial x_{11}}$:
$$\frac{\partial tr(X^TAX)}{\partial x_{11}} = \frac{\partial}{\partial x_{11}} \Big( (X^TAX)_{11} + (X^TAX)_{22} \Big)$$
Expanding gives $(X^TAX)_{11} = a_{11}x_{11}^2 + (a_{12} + a_{21})x_{11}x_{21} + a_{22}x_{21}^2$, so $x_{11}$ appears once squared (with $a_{11}$) and once in a cross term with $x_{21}$ (with $a_{12} + a_{21}$). It does not appear in $(X^TAX)_{22}$. Therefore
$$\frac{\partial (X^TAX)_{11}}{\partial x_{11}} = 2a_{11}x_{11} + (a_{12} + a_{21})x_{21}$$
Since $x_{11}$ does not occur in $(X^TAX)_{22}$, that partial derivative is 0, and we obtain
$$\frac{\partial tr(X^TAX)}{\partial x_{11}} = 2a_{11}x_{11} + (a_{12} + a_{21})x_{21} = (a_{11}x_{11} + a_{12}x_{21}) + (a_{11}x_{11} + a_{21}x_{21}) = [AX]_{11} + [A^TX]_{11}$$
The same method applies to every other element $x_{ij}$. For $\frac{\partial tr(X^TAX)}{\partial x_{12}}$, note that $x_{12}$ appears in $(X^TAX)_{12}$, $(X^TAX)_{21}$ and $(X^TAX)_{22}$, but only $(X^TAX)_{22}$ lies on the diagonal and enters the trace, so:
$$\frac{\partial tr(X^TAX)}{\partial x_{12}} = \frac{\partial (X^TAX)_{22}}{\partial x_{12}} = 2a_{11}x_{12} + (a_{12} + a_{21})x_{22} = (a_{11}x_{12} + a_{12}x_{22}) + (a_{11}x_{12} + a_{21}x_{22})$$
which is exactly $[AX]_{12} + [A^TX]_{12}$.
Proceeding in this way gives the general result:
$$\frac{\partial tr(X^TAX)}{\partial x_{ij}} = \sum_{q=1}^m a_{iq}x_{qj} + \sum_{p=1}^m a_{pi}x_{pj} = [AX + A^TX]_{ij}$$
This is the content of rule 2: the partial derivative with respect to the element $x_{ij}$ of $X$ equals the sum of the elements of $AX$ and $A^TX$ at position $(i,j)$.
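As a sanity check on rule 2, the sketch below compares a finite-difference gradient of $tr(X^TAX)$ with $AX + A^TX$. It is an illustration under stated assumptions: matrices are plain Python lists, the values of `A` and `X` are arbitrary, and the helper names are hypothetical. `A` is chosen non-symmetric on purpose, so that the $A^TX$ term cannot be confused with $AX$.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product of P (r x k) and Q (k x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_grad(f, X, h=1e-4):
    """Central-difference estimate of the matrix of partials df/dx_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

A = [[1.0, 2.0], [-3.0, 0.5]]              # m x m = 2 x 2, non-symmetric
X = [[0.3, 1.2, -0.7], [2.0, -1.5, 0.4]]   # m x n = 2 x 3

def f(M):
    """The scalar function tr(M^T A M)."""
    return trace(matmul(matmul(transpose(M), A), M))

AX = matmul(A, X)
AtX = matmul(transpose(A), X)
analytic = [[AX[i][j] + AtX[i][j] for j in range(len(X[0]))]
            for i in range(len(X))]        # rule 2: AX + A^T X
numeric = numerical_grad(f, X)

max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(len(X)) for j in range(len(X[0])))
print("largest deviation from AX + A^T X:", max_err)
```

The function is quadratic in $X$, so central differences have no truncation error and the deviation reflects rounding only.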
Worked problem 1:
Let $Y=\left[ y_{ij} \right]_{l\times n}$, $A=[a_{ij}]_{l\times m}$, $X=[x_{ij} ]_{m\times n}$. Find $\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial x_{ij}}$.
Solution:
$$\begin{gathered} \frac{1}{2}\Big\|Y-AX\Big\|_{F}^{2}=\frac{1}{2} tr[(Y-AX)^{T}(Y-AX)]=\frac{1}{2} tr[(Y^{T}-X^{T}A^{T})(Y-AX)] \\ =\frac{1}{2}[tr(Y^{T}Y)-tr(X^{T}A^{T}Y)-tr(Y^{T}AX)+tr(X^{T}A^{T}AX)] \\ =\frac{1}{2}[tr(Y^{T}Y)-tr(X^{T}A^{T}Y)-tr(X^{T}A^{T}Y)+tr(X^{T}A^{T}AX)] \\ =\frac{1}{2}tr(Y^{T}Y)-tr(X^{T}A^{T}Y)+\frac{1}{2}tr(X^{T}A^{T}AX) \end{gathered}$$
The third line uses the fact that a matrix and its transpose have the same trace: $tr(Y^TAX) = tr([Y^TAX]^T) = tr(X^TA^TY)$.
Here $Y^TY$ does not contain $x_{ij}$, so $\frac{\partial tr(Y^TY)}{\partial x_{ij}}=0$.
Treating $A^TY$ in $tr(X^TA^TY)$ as the $A$ of rule 1 gives $\frac{\partial tr(X^TA^TY)}{\partial x_{ij}}=[A^TY]_{ij}$.
Treating $A^TA$ in $tr(X^TA^TAX)$ as the $A$ of rule 2 gives $\frac{\partial tr(X^TA^TAX)}{\partial x_{ij}}=[A^TAX+(A^TA)^TX]_{ij}=[A^TAX+A^TAX]_{ij}=[2A^TAX]_{ij}$.
Therefore
$$\begin{aligned}&\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial x_{ij}}=\frac{1}{2}\cdot\frac{\partial tr(Y^{T}Y)}{\partial x_{ij}}-\frac{\partial tr(X^{T}A^{T}Y)}{\partial x_{ij}}+\frac{1}{2}\cdot\frac{\partial tr(X^{T}A^{T}AX)}{\partial x_{ij}}\\&=\frac{1}{2}\cdot0-[A^{T}Y]_{ij}+\frac{1}{2}[2A^{T}AX]_{ij}=[-A^{T}Y+A^{T}AX]_{ij}\end{aligned}$$
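The result $[-A^TY + A^TAX]_{ij}$ can be checked against a finite-difference gradient of the objective. The following is a minimal sketch with assumed details: small arbitrary matrices with $l = 3$, $m = 2$, $n = 2$, plain Python lists, and hypothetical helper names. The analytic gradient is formed as $A^T(AX - Y)$, which is the same expression factored.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product of P (r x k) and Q (k x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def numerical_grad(f, X, h=1e-4):
    """Central-difference estimate of the matrix of partials df/dx_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]   # l x m = 3 x 2
X = [[0.3, -1.2], [2.0, 0.4]]                # m x n = 2 x 2
Y = [[1.0, 0.0], [-2.0, 1.5], [0.5, 3.0]]    # l x n = 3 x 2

def loss(M):
    """0.5 * ||Y - A M||_F^2, computed as half the sum of squared residuals."""
    AM = matmul(A, M)
    return 0.5 * sum((Y[i][j] - AM[i][j]) ** 2
                     for i in range(len(Y)) for j in range(len(Y[0])))

AX = matmul(A, X)
residual = [[AX[i][j] - Y[i][j] for j in range(len(Y[0]))]
            for i in range(len(Y))]
analytic = matmul(transpose(A), residual)    # A^T A X - A^T Y
numeric = numerical_grad(loss, X)

max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(len(X)) for j in range(len(X[0])))
print("largest deviation from A^T A X - A^T Y:", max_err)
```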
From the above we obtain the following patterns:
$$\frac{\partial tr(X^TA^T)}{\partial x_{ij}}=[A^T]_{ij}$$
$$\frac{\partial tr(A^TX)}{\partial x_{ij}}=[(A^T)^T]_{ij} = [A]_{ij}$$
$$\frac{\partial tr(XA^T)}{\partial x_{ij}}=[A]_{ij}$$
$$\frac{\partial tr(X^TAX)}{\partial x_{ij}}=[AX+(X^TA)^T]_{ij}= [AX+A^TX]_{ij} = [(A+A^T)X]_{ij}$$
Note that $tr(BC) = tr(CB)$, which is why $tr(XA^T) = tr(A^TX)$ has the same derivative as the line above it.
Worked problem 2:
Let $Y=\left[ y_{ij} \right]_{l\times n}$, $A=\left[ a_{ij} \right]_{l\times m}$, $X=\left[ x_{ij} \right]_{m\times n}$. Find $\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial a_{ij}}$.
Solution:
$$\begin{gathered} \frac{1}{2}\Big\|Y-AX\Big\|_{F}^{2}=\frac{1}{2}tr(Y^{T}Y)-tr(X^{T}A^{T}Y)+\frac{1}{2}tr(X^{T}A^{T}AX) \\ =\frac{1}{2}tr(Y^{T}Y)-tr(A^{T}YX^{T})+\frac{1}{2}tr(AXX^{T}A^{T}) \\ \frac{\partial(\frac{1}{2}\|Y-AX\|_{F}^{2})}{\partial a_{ij}}=\frac{1}{2}\cdot\frac{\partial tr(Y^{T}Y)}{\partial a_{ij}}-\frac{\partial tr(A^{T}YX^{T})}{\partial a_{ij}}+\frac{1}{2}\cdot\frac{\partial tr(AXX^{T}A^{T})}{\partial a_{ij}} \\ =\frac{1}{2}\cdot0-[YX^{T}]_{ij}+\frac{1}{2}[2(XX^{T}A^{T})^{T}]_{ij}=[-YX^{T}+AXX^{T}]_{ij} \end{gathered}$$
Note: the term $tr(AXX^TA^T)$ is matched to rule 2 by letting $A^T$ play the role of $X$ and $XX^T$ the role of $A$. Because $A^T$ is the transpose of the variable in the rule, the result of differentiating with respect to $a_{ij}$ must be transposed as well.
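The transposition step in the note can be written out explicitly. The lines below are a worked sketch of that one step; the symbols $Z$ and $B$ are names introduced here for illustration and do not appear in the original.

```latex
% Assumption: Z and B are auxiliary names introduced only for this derivation.
\begin{aligned}
&\text{Let } Z = A^T \ (m \times l), \quad B = XX^T \ (m \times m),
  \quad \text{so that } tr(AXX^TA^T) = tr(Z^TBZ). \\
&\text{Rule 2: } \frac{\partial tr(Z^TBZ)}{\partial z_{ji}}
  = [BZ + B^TZ]_{ji} = [2XX^TA^T]_{ji}
  \quad (\text{since } B^T = B). \\
&\text{Because } a_{ij} = z_{ji}: \quad
  \frac{\partial tr(AXX^TA^T)}{\partial a_{ij}}
  = [2XX^TA^T]_{ji} = [2(XX^TA^T)^T]_{ij} = [2AXX^T]_{ij}.
\end{aligned}
```

The symmetry of $XX^T$ is what collapses the two terms of rule 2 into a single factor of 2.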
Theorems:
$$tr(AB) = tr(BA)$$
$$tr(ABC) = tr(CAB) = tr(BCA)$$
$$tr(A)=tr(A^T)$$
$$\frac{\partial tr(XB)}{\partial X} = \frac{\partial tr(BX)}{\partial X} = B^T$$
$$\frac{\partial tr(X^TB)}{\partial X} = \frac{\partial tr(BX^T)}{\partial X} = B$$
$$\frac{\partial tr(X)}{\partial X} = I \quad \text{(the identity matrix)}$$
$$\frac{\partial tr(A^TXB^T)}{\partial X} = \frac{\partial tr(BX^TA)}{\partial X} = AB$$
$$\frac{\partial tr(AXBX^T)}{\partial X} = AXB + A^TXB^T$$
$$\frac{\partial tr(AXBX)}{\partial X} = A^TX^TB^T + B^TX^TA^T$$
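The last two identities are the easiest to get wrong by hand, so a numerical spot check may help. The sketch below is illustrative only: it assumes 2x2 matrices with arbitrary values, plain Python lists, and hypothetical helper names (`chain`, `add`, `max_abs_diff`, `numerical_grad`). It checks both gradients by central differences and also confirms the cyclic property $tr(ABC) = tr(CAB)$ with $X$ in the role of $C$.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix product of P (r x k) and Q (k x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def chain(*Ms):
    """Left-to-right product of several matrices."""
    out = Ms[0]
    for M in Ms[1:]:
        out = matmul(out, M)
    return out

def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    """Elementwise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def max_abs_diff(P, Q):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def numerical_grad(f, X, h=1e-4):
    """Central-difference estimate of the matrix of partials df/dx_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

# Arbitrary 2 x 2 test matrices (assumed values).
A = [[1.0, 2.0], [0.0, -1.0]]
B = [[0.5, -1.0], [3.0, 2.0]]
X = [[1.0, 0.2], [-0.7, 1.5]]
At, Bt, Xt = transpose(A), transpose(B), transpose(X)

# d tr(A X B X^T) / dX = A X B + A^T X B^T
num1 = numerical_grad(lambda M: trace(chain(A, M, B, transpose(M))), X)
err1 = max_abs_diff(num1, add(chain(A, X, B), chain(At, X, Bt)))

# d tr(A X B X) / dX = A^T X^T B^T + B^T X^T A^T
num2 = numerical_grad(lambda M: trace(chain(A, M, B, M)), X)
err2 = max_abs_diff(num2, add(chain(At, Xt, Bt), chain(Bt, Xt, At)))

# Cyclic property: tr(A B X) = tr(X A B)
cyclic_gap = abs(trace(chain(A, B, X)) - trace(chain(X, A, B)))

print(err1, err2, cyclic_gap)
```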