The Frobenius Norm of a Matrix and Its Partial-Derivative Rules

Definition: Let $A = [a_{ij}]_{m \times n}$ be an $m \times n$ matrix. The quantity $\left\|A\right\|_F = \sqrt{tr(A^TA)} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}$ is called the Frobenius norm of $A$.

  • Differentiation rule 1: Let $A = [a_{ij}]_{m \times n}$ and $X = [x_{ij}]_{m \times n}$ be two $m \times n$ matrices. Then

$$\frac{\partial tr(A^TX)}{\partial x_{ij}} = \frac{\partial tr(X^TA)}{\partial x_{ij}} = a_{ij} = [A]_{ij}$$

Proof: Since $A^TX = \begin{bmatrix}\sum_{p=1}^{m}a_{pi}x_{pj}\end{bmatrix}_{n \times n}$, its trace is $tr(A^TX) = \sum_{t=1}^{n}\sum_{p=1}^{m}a_{pt}x_{pt}$.

For the term with $p = i$, $t = j$ we have $\frac{\partial (a_{pt}x_{pt})}{\partial x_{ij}} = \frac{\partial (a_{ij}x_{ij})}{\partial x_{ij}} = a_{ij} = [A]_{ij}$.

Every other term in the sum for $tr(A^TX)$ has zero partial derivative, so $\frac{\partial tr(A^TX)}{\partial x_{ij}} = \frac{\partial \sum_{t=1}^{n}\sum_{p=1}^{m}a_{pt}x_{pt}}{\partial x_{ij}} = a_{ij} = [A]_{ij}$. The same result holds for $tr(X^TA)$, because $tr(X^TA) = tr((A^TX)^T) = tr(A^TX)$.

  • A concrete example

A concrete example makes this derivative easier to follow. Suppose $A$ and $X$ are both $2 \times 2$ matrices:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$

We want to compute $\frac{\partial tr(A^TX)}{\partial x_{ij}}$ for every $i, j$.

First, $A^T$ is the transpose of $A$, so

$$A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}$$

Next, the product $A^TX$ is:

$$A^TX = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} = \begin{bmatrix} a_{11}x_{11} + a_{21}x_{21} & a_{11}x_{12} + a_{21}x_{22} \\ a_{12}x_{11} + a_{22}x_{21} & a_{12}x_{12} + a_{22}x_{22} \end{bmatrix}$$

The trace of $A^TX$ (the sum of its diagonal entries) is:

$$tr(A^TX) = (a_{11}x_{11} + a_{21}x_{21}) + (a_{12}x_{12} + a_{22}x_{22})$$

Now we compute $\frac{\partial tr(A^TX)}{\partial x_{ij}}$ for each choice of $i, j$:

  • For $i = 1, j = 1$:

$$\frac{\partial tr(A^TX)}{\partial x_{11}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{11}} = a_{11}$$

  • For $i = 1, j = 2$:

$$\frac{\partial tr(A^TX)}{\partial x_{12}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{12}} = a_{12}$$

  • For $i = 2, j = 1$:

$$\frac{\partial tr(A^TX)}{\partial x_{21}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{21}} = a_{21}$$

  • For $i = 2, j = 2$:

$$\frac{\partial tr(A^TX)}{\partial x_{22}} = \frac{\partial (a_{11}x_{11} + a_{21}x_{21} + a_{12}x_{12} + a_{22}x_{22})}{\partial x_{22}} = a_{22}$$

Collecting these four partial derivatives into a matrix gives $\frac{\partial tr(A^TX)}{\partial X} = A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$.

The example shows that for every $i, j$ we have $\frac{\partial tr(A^TX)}{\partial x_{ij}} = a_{ij}$, which is the corresponding entry of $A$. This is exactly what rule 1 says: each entry of $A$ is the partial derivative of $tr(A^TX)$ with respect to the entry of $X$ in the same position.
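Rule 1 can also be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `trace_AtX` and `numerical_gradient` and the sample values of `A` and `X` are made up for this check and are not part of the original derivation. It estimates each partial derivative by a central finite difference and compares it with the matching entry of $A$.

```python
def trace_AtX(A, X):
    """tr(A^T X), written as the sum over all entries of a_pt * x_pt."""
    return sum(a * x for row_a, row_x in zip(A, X) for a, x in zip(row_a, row_x))


def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of d f(X) / d x_ij for every entry of X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            original = X[i][j]
            X[i][j] = original + h
            f_plus = f(X)
            X[i][j] = original - h
            f_minus = f(X)
            X[i][j] = original  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical sample values, chosen only for illustration.
A = [[1.0, 2.0], [3.0, 4.0]]
X = [[0.5, -1.0], [2.0, 0.25]]

grad = numerical_gradient(lambda M: trace_AtX(A, M), X)

# Rule 1 predicts grad[i][j] == A[i][j] up to rounding error.
max_error = max(abs(grad[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-6)
```

Because $tr(A^TX)$ is linear in $X$, the finite difference should agree with $A$ up to floating-point rounding, whatever values are chosen for $X$.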

We now return to the Frobenius norm and use a concrete example to see why $\left\|A\right\|_F = \sqrt{tr(A^TA)} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}$ holds.

  • A concrete example

The Frobenius norm is the square root of the sum of the squares of the matrix entries, so it is a single real number. It is the Euclidean length of the vector obtained by stacking all entries of the matrix. A $2 \times 2$ matrix illustrates the idea.

Suppose we have a $2 \times 2$ matrix $A$:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

Then $A^T$ (the transpose of $A$) is:

$$A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}$$

Now compute the product $A^TA$:

$$A^TA = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} a_{11}^2 + a_{21}^2 & a_{11}a_{12} + a_{21}a_{22} \\ a_{11}a_{12} + a_{21}a_{22} & a_{12}^2 + a_{22}^2 \end{bmatrix}$$

Next, compute the trace of $A^TA$ (the sum of its diagonal entries):

$$tr(A^TA) = (a_{11}^2 + a_{21}^2) + (a_{12}^2 + a_{22}^2) = a_{11}^2 + a_{21}^2 + a_{12}^2 + a_{22}^2$$

Finally, the Frobenius norm $\|A\|_F$ is the square root of this trace:

$$\|A\|_F = \sqrt{tr(A^TA)} = \sqrt{a_{11}^2 + a_{21}^2 + a_{12}^2 + a_{22}^2}$$

This is exactly the square root of the sum of the squares of all entries of $A$.

As a numerical example, if

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

then

$$A^TA = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 10 & 14 \\ 14 & 20 \end{bmatrix}$$

and

$$tr(A^TA) = 10 + 20 = 30$$

Therefore,

$$\|A\|_F = \sqrt{30}$$

This is the Frobenius norm of $A$.
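The numerical example can be reproduced in a few lines of code. This is a minimal sketch using only the standard library; the helper names `transpose`, `matmul`, and `trace` are illustrative choices, not from the original text. It computes the norm both through $tr(A^TA)$ and directly from the squared entries.

```python
import math


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]

AtA = matmul(transpose(A), A)
norm_from_trace = math.sqrt(trace(AtA))
norm_from_entries = math.sqrt(sum(a * a for row in A for a in row))

print(AtA)         # [[10, 14], [14, 20]]
print(trace(AtA))  # 30
print(math.isclose(norm_from_trace, norm_from_entries))  # True
```

Both routes give $\sqrt{30}$, matching the hand calculation.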

  • Differentiation rule 2: Let $A = [a_{ij}]_{m \times m}$ be an $m \times m$ matrix and $X = [x_{ij}]_{m \times n}$ an $m \times n$ matrix. Then

$$\frac{\partial tr(X^TAX)}{\partial x_{ij}}=\sum_{q=1}^m a_{iq}x_{qj}+\sum_{p=1}^m a_{pi}x_{pj}=\Big[AX+A^TX\Big]_{ij}$$

Proof: Since $X^{T}AX=\left[\sum_{p=1}^{m}\sum_{q=1}^{m}x_{pi}a_{pq}x_{qj}\right]_{n\times n}$, its trace is $tr(X^{T}AX)=\sum_{t=1}^{n}\sum_{p=1}^{m}\sum_{q=1}^{m}x_{pt}a_{pq}x_{qt}$. By the product rule, $x_{ij}$ contributes once through the first factor $x_{pt}$ and once through the last factor $x_{qt}$.

Taking $p = i$, $t = j$ and differentiating the first factor (with the last factor held fixed) gives $\frac{\partial(\sum_{q=1}^{m}x_{ij}a_{iq}x_{qj})}{\partial x_{ij}}=\sum_{q=1}^{m}a_{iq}x_{qj}=[AX]_{ij}$.

Taking $q = i$, $t = j$ and differentiating the last factor (with the first factor held fixed) gives $\frac{\partial(\sum_{p=1}^{m}x_{pj}a_{pi}x_{ij})}{\partial x_{ij}}=\sum_{p=1}^{m}a_{pi}x_{pj}=\left[A^TX\right]_{ij}$.

Every other term in the sum for $tr(X^TAX)$ has zero partial derivative, so $\frac{\partial tr(X^TAX)}{\partial x_{ij}}=\sum_{q=1}^m a_{iq}x_{qj}+\sum_{p=1}^m a_{pi}x_{pj}=\Big[AX+A^TX\Big]_{ij}$.

  • A concrete example

A $2 \times 2$ example illustrates rule 2. Suppose $A$ is a $2 \times 2$ square matrix and $X$ is also a $2 \times 2$ matrix; we can then compute $\frac{\partial tr(X^TAX)}{\partial x_{ij}}$ explicitly.

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$

First we need $X^TAX$:

$$X^T = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}$$

$$X^TAX = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$

The product is a $2 \times 2$ matrix with the following entries:

$$\begin{aligned}
(X^TAX)_{11} &= x_{11}(a_{11}x_{11} + a_{12}x_{21}) + x_{21}(a_{21}x_{11} + a_{22}x_{21}) \\
(X^TAX)_{12} &= x_{11}(a_{11}x_{12} + a_{12}x_{22}) + x_{21}(a_{21}x_{12} + a_{22}x_{22}) \\
(X^TAX)_{21} &= x_{12}(a_{11}x_{11} + a_{12}x_{21}) + x_{22}(a_{21}x_{11} + a_{22}x_{21}) \\
(X^TAX)_{22} &= x_{12}(a_{11}x_{12} + a_{12}x_{22}) + x_{22}(a_{21}x_{12} + a_{22}x_{22})
\end{aligned}$$

The trace of $X^TAX$ is the sum of its diagonal entries:

$$tr(X^TAX) = (X^TAX)_{11} + (X^TAX)_{22}$$

Now we compute $\frac{\partial tr(X^TAX)}{\partial x_{ij}}$ for a given $i, j$.

For example, consider $\frac{\partial tr(X^TAX)}{\partial x_{11}}$:

$$\frac{\partial tr(X^TAX)}{\partial x_{11}} = \frac{\partial}{\partial x_{11}} \Big( (X^TAX)_{11} + (X^TAX)_{22} \Big)$$

Expanding the diagonal entry gives $(X^TAX)_{11} = a_{11}x_{11}^2 + a_{12}x_{11}x_{21} + a_{21}x_{21}x_{11} + a_{22}x_{21}^2$. So $x_{11}$ appears squared in the $a_{11}$ term and linearly in the $a_{12}$ and $a_{21}$ terms, and it does not appear in $(X^TAX)_{22}$ at all. Therefore,

$$\frac{\partial (X^TAX)_{11}}{\partial x_{11}} = 2a_{11}x_{11} + a_{12}x_{21} + a_{21}x_{21}$$

Since $x_{11}$ does not occur in $(X^TAX)_{22}$, that partial derivative is 0. Hence:

$$\frac{\partial tr(X^TAX)}{\partial x_{11}} = (a_{11}x_{11} + a_{12}x_{21}) + (a_{11}x_{11} + a_{21}x_{21}) = [AX]_{11} + [A^TX]_{11}$$

The same method works for the other entries $x_{ij}$. For $\frac{\partial tr(X^TAX)}{\partial x_{12}}$, the only diagonal entry that contains $x_{12}$ is $(X^TAX)_{22} = a_{11}x_{12}^2 + a_{12}x_{12}x_{22} + a_{21}x_{22}x_{12} + a_{22}x_{22}^2$ (the off-diagonal entries also contain $x_{12}$, but they do not contribute to the trace), so:

$$\frac{\partial tr(X^TAX)}{\partial x_{12}} = 2a_{11}x_{12} + a_{12}x_{22} + a_{21}x_{22}$$

Regrouping, this is $(a_{11}x_{12} + a_{12}x_{22}) + (a_{11}x_{12} + a_{21}x_{22}) = [AX]_{12} + [A^TX]_{12}$.

Proceeding in this way, we arrive at the general form:

$$\frac{\partial tr(X^TAX)}{\partial x_{ij}} = \sum_{q=1}^m a_{iq}x_{qj} + \sum_{p=1}^m a_{pi}x_{pj} = [AX + A^TX]_{ij}$$

This is what rule 2 states: the partial derivative with respect to the entry $x_{ij}$ of $X$ equals the sum of the $(i,j)$ entries of $AX$ and $A^TX$.
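Rule 2 can be sanity-checked numerically as well. The following is a sketch under stated assumptions: the helper names and the sample matrices (a non-symmetric $2 \times 2$ matrix `A` and a $2 \times 3$ matrix `X`) are hypothetical choices for illustration. It compares a central finite-difference gradient of $tr(X^TAX)$ with $AX + A^TX$, and also checks the closed form $2a_{11}x_{11} + (a_{12}+a_{21})x_{21}$ for the $(1,1)$ entry.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def trace_XtAX(A, X):
    """tr(X^T A X) as the triple sum over t, p, q of x_pt * a_pq * x_qt."""
    m, n = len(X), len(X[0])
    return sum(X[p][t] * A[p][q] * X[q][t]
               for t in range(n) for p in range(m) for q in range(m))


def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of d f(X) / d x_ij for every entry of X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            original = X[i][j]
            X[i][j] = original + h
            f_plus = f(X)
            X[i][j] = original - h
            f_minus = f(X)
            X[i][j] = original  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical sample values; A is deliberately non-symmetric.
A = [[1.0, 2.0], [3.0, 4.0]]
X = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]

AX = matmul(A, X)
AtX = matmul(transpose(A), X)
analytic = [[AX[i][j] + AtX[i][j] for j in range(3)] for i in range(2)]
numeric = numerical_gradient(lambda M: trace_XtAX(A, M), X)

max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(3))
print(max_error < 1e-5)

# Closed form for the (1,1) entry: 2*a11*x11 + (a12 + a21)*x21.
entry_11 = 2 * A[0][0] * X[0][0] + (A[0][1] + A[1][0]) * X[1][0]
print(math.isclose(entry_11, analytic[0][0]))
```

A non-symmetric `A` is used on purpose: with a symmetric matrix, $AX$ and $A^TX$ coincide and a mistake in the transpose would go unnoticed.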

Example 1: Let $Y=[y_{ij}]_{l\times n}$, $A=[a_{ij}]_{l\times m}$, and $X=[x_{ij}]_{m\times n}$. Find $\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial x_{ij}}$.

Solution:

$$\begin{gathered}
\frac{1}{2}\Big\|Y-AX\Big\|_{F}^{2}=\frac{1}{2}tr[(Y-AX)^{T}(Y-AX)]=\frac{1}{2}tr[(Y^{T}-X^{T}A^{T})(Y-AX)] \\
=\frac{1}{2}[tr(Y^{T}Y)-tr(X^{T}A^{T}Y)-tr(Y^{T}AX)+tr(X^{T}A^{T}AX)] \\
=\frac{1}{2}[tr(Y^{T}Y)-tr(X^{T}A^{T}Y)-tr(X^{T}A^{T}Y)+tr(X^{T}A^{T}AX)] \\
=\frac{1}{2}tr(Y^{T}Y)-tr(X^{T}A^{T}Y)+\frac{1}{2}tr(X^{T}A^{T}AX).
\end{gathered}$$

The third line uses $tr(Y^TAX) = tr([Y^TAX]^T) = tr(X^TA^TY)$.

Here $Y^TY$ does not contain $x_{ij}$, so $\frac{\partial tr(Y^TY)}{\partial x_{ij}}=0$.

Treating $A^TY$ in $tr(X^TA^TY)$ as the matrix $A$ of rule 1 gives $\frac{\partial tr(X^TA^TY)}{\partial x_{ij}}=[A^TY]_{ij}$.

Treating $A^TA$ in $tr(X^TA^TAX)$ as the matrix $A$ of rule 2 gives $\frac{\partial tr(X^TA^TAX)}{\partial x_{ij}}=[A^TAX+(A^TA)^TX]_{ij}=[A^TAX+A^TAX]_{ij}=[2A^TAX]_{ij}$.

Therefore

$$\begin{aligned}
\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial x_{ij}}&=\frac{1}{2}\cdot\frac{\partial tr(Y^{T}Y)}{\partial x_{ij}}-\frac{\partial tr(X^{T}A^{T}Y)}{\partial x_{ij}}+\frac{1}{2}\cdot\frac{\partial tr(X^{T}A^{T}AX)}{\partial x_{ij}}\\
&=\frac{1}{2}\cdot0-[A^{T}Y]_{ij}+\frac{1}{2}[2A^{T}AX]_{ij}=[-A^{T}Y+A^{T}AX]_{ij}.
\end{aligned}$$
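The result of Example 1, $-A^TY + A^TAX$, can be compared with a finite-difference gradient. This is a minimal sketch, assuming small hypothetical matrices with $l = 3$, $m = 2$, $n = 2$; the function names are illustrative and not from the original text.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def half_sq_frobenius(Y, A, X):
    """0.5 * ||Y - A X||_F^2, computed entry by entry from the residual."""
    AX = matmul(A, X)
    return 0.5 * sum((y - ax) ** 2
                     for row_y, row_ax in zip(Y, AX)
                     for y, ax in zip(row_y, row_ax))


def numerical_gradient(f, M, h=1e-6):
    """Central-difference estimate of d f(M) / d m_ij for every entry of M."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            original = M[i][j]
            M[i][j] = original + h
            f_plus = f(M)
            M[i][j] = original - h
            f_minus = f(M)
            M[i][j] = original  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical sample values: Y is 3x2, A is 3x2, X is 2x2.
Y = [[1.0, 0.0], [2.0, -1.0], [0.5, 3.0]]
A = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
X = [[0.5, -1.0], [2.0, 0.25]]

At = transpose(A)
AtY = matmul(At, Y)
AtAX = matmul(matmul(At, A), X)
analytic = [[AtAX[i][j] - AtY[i][j] for j in range(2)] for i in range(2)]
numeric = numerical_gradient(lambda M: half_sq_frobenius(Y, A, M), X)

max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-5)
```

The objective is quadratic in $X$, so the central difference should match the formula up to rounding error.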

From the results above we obtain the following rules:

$$\begin{aligned}
\frac{\partial tr(X^TA^T)}{\partial x_{ij}}&=[A^T]_{ij}\\
\frac{\partial tr(A^TX)}{\partial x_{ij}}&=[(A^T)^T]_{ij} = [A]_{ij}\\
\frac{\partial tr(XA^T)}{\partial x_{ij}}&=[A]_{ij}\\
\frac{\partial tr(X^TAX)}{\partial x_{ij}}&=[AX+(X^TA)^T]_{ij}= [AX+A^TX]_{ij} = [(A+A^T)X]_{ij}
\end{aligned}$$

Note that $tr(BC) = tr(CB)$. In particular $tr(XA^T) = tr(A^TX)$, which is why the second and third rules give the same result.

Example 2: Let $Y=[y_{ij}]_{l\times n}$, $A=[a_{ij}]_{l\times m}$, and $X=[x_{ij}]_{m\times n}$. Find $\frac{\partial(\frac{1}{2}\left\|Y-AX\right\|_{F}^{2})}{\partial a_{ij}}$.

Solution:

$$\begin{gathered}
\frac{1}{2}\Big\|Y-AX\Big\|_{F}^{2}=\frac{1}{2}tr(Y^{T}Y)-tr(X^{T}A^{T}Y)+\frac{1}{2}tr(X^{T}A^{T}AX) \\
=\frac{1}{2}tr(Y^{T}Y)-tr(A^{T}YX^{T})+\frac{1}{2}tr(AXX^{T}A^{T}). \\
\frac{\partial(\frac{1}{2}\|Y-AX\|_{F}^{2})}{\partial a_{ij}}=\frac{1}{2}\cdot\frac{\partial tr(Y^{T}Y)}{\partial a_{ij}}-\frac{\partial tr(A^{T}YX^{T})}{\partial a_{ij}}+\frac{1}{2}\cdot\frac{\partial tr(AXX^{T}A^{T})}{\partial a_{ij}} \\
=\frac{1}{2}\cdot0-[YX^{T}]_{ij}+\frac{1}{2}[2(XX^{T}A^{T})^{T}]_{ij}=[-YX^{T}+AXX^{T}]_{ij}.
\end{gathered}$$

Note: in $tr(AXX^TA^T)$, the matrices $A^T$ and $XX^T$ play the roles of $X$ and $A$ in rule 2, and since $XX^T$ is symmetric the rule gives $2XX^TA^T$ for the derivative with respect to the entries of $A^T$. Because $A^T$ is the transpose of the variable $A$, the result must be transposed as well when differentiating with respect to $a_{ij}$.
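The gradient with respect to $A$ from Example 2, $-YX^T + AXX^T$, can be checked in the same spirit. The sketch below is an illustration only: the helper names and sample matrices are assumptions made for this check (here $l = 3$, $m = 2$, $n = 2$), and the finite difference now perturbs the entries of `A` instead of `X`.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def half_sq_frobenius(Y, A, X):
    """0.5 * ||Y - A X||_F^2, computed entry by entry from the residual."""
    AX = matmul(A, X)
    return 0.5 * sum((y - ax) ** 2
                     for row_y, row_ax in zip(Y, AX)
                     for y, ax in zip(row_y, row_ax))


def numerical_gradient(f, M, h=1e-6):
    """Central-difference estimate of d f(M) / d m_ij for every entry of M."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            original = M[i][j]
            M[i][j] = original + h
            f_plus = f(M)
            M[i][j] = original - h
            f_minus = f(M)
            M[i][j] = original  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical sample values: Y is 3x2, A is 3x2, X is 2x2.
Y = [[1.0, 0.0], [2.0, -1.0], [0.5, 3.0]]
A = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
X = [[0.5, -1.0], [2.0, 0.25]]

Xt = transpose(X)
YXt = matmul(Y, Xt)                # l x m
AXXt = matmul(matmul(A, X), Xt)    # l x m
analytic = [[AXXt[i][j] - YXt[i][j] for j in range(2)] for i in range(3)]
numeric = numerical_gradient(lambda M: half_sq_frobenius(Y, M, X), A)

max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(3) for j in range(2))
print(max_error < 1e-5)
```

Note that the gradient has the same shape as $A$ ($l \times m$), which is a quick way to catch a missing transpose.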

Theorems (for matrices of compatible dimensions):

$$tr(AB) = tr(BA)$$

$$tr(ABC) = tr(CAB) = tr(BCA)$$

$$tr(A)=tr(A^T)$$

$$\frac{\partial tr(XB)}{\partial X} = \frac{\partial tr(BX)}{\partial X} = B^T$$

$$\frac{\partial tr(X^TB)}{\partial X} = \frac{\partial tr(BX^T)}{\partial X} = B$$

$$\frac{\partial tr(X)}{\partial X} = I \quad \text{(the identity matrix)}$$

$$\frac{\partial tr(A^TXB^T)}{\partial X} = \frac{\partial tr(BX^TA)}{\partial X} = AB$$

$$\frac{\partial tr(AXBX^T)}{\partial X} = AXB + A^TXB^T$$

$$\frac{\partial tr(AXBX)}{\partial X} = A^TX^TB^T + B^TX^TA^T$$
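Two of these identities are spot-checked below as a sketch. The matrices `P`, `Q`, `A`, `B`, and `X` and the helper names are hypothetical and chosen only for illustration. The code checks $tr(PQ) = tr(QP)$ exactly with integer matrices, and compares $AXB + A^TXB^T$ with a finite-difference gradient of $tr(AXBX^T)$.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def numerical_gradient(f, M, h=1e-6):
    """Central-difference estimate of d f(M) / d m_ij for every entry of M."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            original = M[i][j]
            M[i][j] = original + h
            f_plus = f(M)
            M[i][j] = original - h
            f_minus = f(M)
            M[i][j] = original  # restore the entry before moving on
            grad[i][j] = (f_plus - f_minus) / (2 * h)
    return grad


# tr(PQ) = tr(QP) holds even though PQ is 2x2 and QP is 3x3.
P = [[1, 2, 3], [4, 5, 6]]
Q = [[1, 0], [2, -1], [0, 3]]
cyclic_ok = trace(matmul(P, Q)) == trace(matmul(Q, P))
print(cyclic_ok)

# d tr(A X B X^T) / dX = A X B + A^T X B^T, with A 2x2, X 2x3, B 3x3.
A = [[1.0, 2.0], [0.0, -1.0]]
B = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, -1.0, 1.0]]
X = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]


def f(M):
    return trace(matmul(matmul(matmul(A, M), B), transpose(M)))


AXB = matmul(matmul(A, X), B)
AtXBt = matmul(matmul(transpose(A), X), transpose(B))
analytic = [[AXB[i][j] + AtXBt[i][j] for j in range(3)] for i in range(2)]
numeric = numerical_gradient(f, X)

max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(3))
print(max_error < 1e-5)
```

Such a check does not prove an identity, but it is a cheap way to catch a misplaced transpose before using a formula in a longer derivation.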
