4. Forward/Backward Propagation: Batch Normalization

References

cs231n Course Materials: Backprop
Derivatives, Backpropagation, and Vectorization
cs231n Lecture 4: Neural Networks and Backpropagation
cs231n Assignment 2
Notes: Batch Normalization and its backpropagation

4. Batch Normalization

Forward Pass

"""
	Forward pass for batch normalization.

    Input:
    - X: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift paremeter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - Y: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
"""

$$\mu_j=\frac{1}{N}\sum_{i=1}^{N}X_{i,j}\tag{4.1}$$
$$\sigma_j^2=\frac{1}{N}\sum_{i=1}^{N}\left(X_{i,j}-\mu_j\right)^2\tag{4.2}$$
$$\hat{X}_{i,j}=\frac{X_{i,j}-\mu_j}{\sqrt{\sigma^2_{j}+\epsilon}}\tag{4.3}$$
In Eq. (4.3), $\epsilon$ is a very small positive constant that prevents the denominator from becoming zero.

On this basis, two learnable parameters $\beta$ and $\gamma$ are introduced:
$$Y_{i,j}=\gamma_j\hat{X}_{i,j}+\beta_j\tag{4.4}$$
Note that the mean and variance computed by Eqs. (4.1) and (4.2) are used only during training; at test time, the mean and variance are obtained as running (exponential moving) averages of the training-time statistics.
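The forward pass of Eqs. (4.1)-(4.4), together with the train/test split just described, can be sketched in NumPy following the cs231n-style interface from the docstring above. The default values `eps=1e-5` and `momentum=0.9` and the exact contents of `cache` are assumptions, not fixed by the text:

```python
import numpy as np

def batchnorm_forward(X, gamma, beta, bn_param):
    """Batch normalization forward pass, Eqs. (4.1)-(4.4). Sketch only."""
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)          # assumed default
    momentum = bn_param.get('momentum', 0.9)  # assumed default
    N, D = X.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=X.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=X.dtype))

    if mode == 'train':
        mu = X.mean(axis=0)                    # Eq. (4.1)
        var = X.var(axis=0)                    # Eq. (4.2), biased (divide by N)
        X_hat = (X - mu) / np.sqrt(var + eps)  # Eq. (4.3)
        Y = gamma * X_hat + beta               # Eq. (4.4)
        # Running averages of the batch statistics, used at test time
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
        cache = (X, X_hat, mu, var, gamma, eps)
    elif mode == 'test':
        X_hat = (X - running_mean) / np.sqrt(running_var + eps)
        Y = gamma * X_hat + beta
        cache = None
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var
    return Y, cache
```

In train mode each output column has (up to $\epsilon$) mean $\beta_j$ and standard deviation $\gamma_j$, which is a quick sanity check on the implementation.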

Backward Pass

Splitting the computation above into finer-grained steps yields the following computational graph:

(Figure: computational graph of the batch normalization forward pass)


We now derive the backward pass from back to front.
$$Y_{i,j}=\hat{X}^\gamma_{i,j}+\beta_j\tag{4.5}$$
From Eq. (4.5):
$$\begin{aligned}\frac{\partial{L}}{\partial{\beta_j}}&=\sum_{i}\frac{\partial{L}}{\partial{Y_{i,j}}}\frac{\partial{Y_{i,j}}}{\partial{\beta_j}}\\&=\sum_{i}\frac{\partial{L}}{\partial{Y_{i,j}}}\cdot1\\&=\sum_{i}\frac{\partial{L}}{\partial{Y_{i,j}}}\end{aligned}\tag{4.6}$$

$$\begin{aligned}\frac{\partial{L}}{\partial{\hat{X}^{\gamma}_{i,j}}}&=\frac{\partial{L}}{\partial{Y_{i,j}}}\frac{\partial{Y_{i,j}}}{\partial{\hat{X}^{\gamma}_{i,j}}}\\&=\frac{\partial{L}}{\partial{Y_{i,j}}}\cdot1\\&=\frac{\partial{L}}{\partial{Y_{i,j}}}\end{aligned}\tag{4.7}$$


$$\hat{X}^\gamma_{i,j}=\gamma_j\hat{X}_{i,j}\tag{4.8}$$
From Eq. (4.8):
$$\begin{aligned}\frac{\partial{L}}{\partial{\gamma_j}}&=\sum_{i}\frac{\partial{L}}{\partial{\hat{X}_{i,j}^{\gamma}}}\frac{\partial{\hat{X}_{i,j}^{\gamma}}}{\partial{\gamma_j}}\\&=\sum_{i}\frac{\partial{L}}{\partial{\hat{X}_{i,j}^{\gamma}}}\hat{X}_{i,j}\end{aligned}\tag{4.9}$$

$$\begin{aligned}\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}^{\gamma}}}\frac{\partial{\hat{X}_{i,j}^{\gamma}}}{\partial{\hat{X}_{i,j}}}\\&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}^{\gamma}}}\gamma_j\end{aligned}\tag{4.10}$$


$$\hat{X}_{i,j}=(1/\hat{\sigma}_j)X^m_{i,j}\tag{4.11}$$
From Eq. (4.11):
$$\begin{aligned}\frac{\partial{L}}{\partial{(1/\hat{\sigma}_j)}}&=\sum_{i}\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{(1/\hat{\sigma}_j)}}\\&=\sum_{i}\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}X^m_{i,j}\end{aligned}\tag{4.12}$$


$$1/\hat{\sigma}_j=\frac{1}{\hat{\sigma}_{j}}\tag{4.13}$$
From Eq. (4.13):
$$\begin{aligned}\frac{\partial{L}}{\partial{\hat{\sigma}_j}}&=\frac{\partial{L}}{\partial{(1/\hat{\sigma}_j)}}\frac{\partial{(1/\hat{\sigma}_j)}}{\partial{\hat{\sigma}_j}}\\&=-\frac{\partial{L}}{\partial{(1/\hat{\sigma}_j)}}\frac{1}{\hat{\sigma}_j^2}\\&=-\frac{\partial{L}}{\partial{(1/\hat{\sigma}_j)}}\frac{1}{\sigma_j^2+\epsilon}\end{aligned}\tag{4.14}$$


$$\hat{\sigma}_j=\sqrt{\sigma_j^2+\epsilon}\tag{4.15}$$
From Eq. (4.15):
$$\begin{aligned}\frac{\partial{L}}{\partial{\sigma_j^2}}&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\partial{\hat{\sigma}_j}}{\partial{\sigma_j^2}}\\&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{1}{2\sqrt{\sigma^2_j+\epsilon}}\end{aligned}\tag{4.16}$$


$$\sigma^2_j=\frac{1}{N}\sum_i{X^2_{i,j}}\tag{4.17}$$
From Eq. (4.17):
$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,j}^2}}&=\frac{\partial{L}}{\partial{\sigma_j^2}}\frac{\partial{\sigma_j^2}}{\partial{X_{i,j}^2}}\\&=\frac{\partial{L}}{\partial{\sigma_j^2}}\frac{1}{N}\end{aligned}\tag{4.18}$$


$$X^2_{i,j}=\left(X^m_{i,j}\right)^2\tag{4.19}$$
From Eqs. (4.11) and (4.19):
$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,j}^m}}&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{X^m_{i,j}}}+\frac{\partial{L}}{\partial{X^2_{i,j}}}\frac{\partial{X^2_{i,j}}}{\partial{X^m_{i,j}}}\\&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}(1/\hat{\sigma}_j)+\frac{\partial{L}}{\partial{X^2_{i,j}}}\cdot2X^m_{i,j}\end{aligned}\tag{4.20}$$


$$X^m_{i,j}=X_{i,j}-\mu_j\tag{4.21}$$
From Eq. (4.21):
$$\begin{aligned}\frac{\partial{L}}{\partial{\mu_j}}&=\sum_i\frac{\partial{L}}{\partial{X^m_{i,j}}}\frac{\partial{X^m_{i,j}}}{\partial{\mu_j}}\\&=\sum_i\frac{\partial{L}}{\partial{X^m_{i,j}}}\cdot(-1)\\&=-\sum_i\frac{\partial{L}}{\partial{X^m_{i,j}}}\end{aligned}\tag{4.22}$$


$$\mu_j=\frac{1}{N}\sum_i{X_{i,j}}\tag{4.23}$$
From Eqs. (4.21) and (4.23):
$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,j}}}&=\frac{\partial{L}}{\partial{X^m_{i,j}}}\frac{\partial{X^m_{i,j}}}{\partial{X_{i,j}}}+\frac{\partial{L}}{\partial{\mu_j}}\frac{\partial{\mu_j}}{\partial{X_{i,j}}}\\&=\frac{\partial{L}}{\partial{X^m_{i,j}}}\cdot1+\frac{\partial{L}}{\partial{\mu_j}}\frac{1}{N}\end{aligned}\tag{4.24}$$

This completes the backward-pass computation for Batch Normalization.
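The step-by-step derivation above translates almost line-for-line into NumPy. The `cache` layout `(X, X_hat, mu, var, gamma, eps)` is an assumption carried over from a hypothetical forward pass, not fixed by the text:

```python
import numpy as np

def batchnorm_backward_naive(dY, cache):
    """Staged backward pass following Eqs. (4.6)-(4.24), one graph node at a time."""
    X, X_hat, mu, var, gamma, eps = cache  # assumed cache layout
    N, D = X.shape
    sigma_hat = np.sqrt(var + eps)         # sigma-hat, Eq. (4.15)
    Xm = X - mu                            # X^m, Eq. (4.21)

    dbeta = dY.sum(axis=0)                         # Eq. (4.6)
    dX_hat_gamma = dY                              # Eq. (4.7)
    dgamma = (dX_hat_gamma * X_hat).sum(axis=0)    # Eq. (4.9)
    dX_hat = dX_hat_gamma * gamma                  # Eq. (4.10)
    dinv_sigma = (dX_hat * Xm).sum(axis=0)         # Eq. (4.12)
    dsigma_hat = -dinv_sigma / (var + eps)         # Eq. (4.14)
    dvar = dsigma_hat / (2 * sigma_hat)            # Eq. (4.16)
    dXm_sq = dvar / N * np.ones((N, D))            # Eq. (4.18)
    dXm = dX_hat / sigma_hat + 2 * Xm * dXm_sq     # Eq. (4.20)
    dmu = -dXm.sum(axis=0)                         # Eq. (4.22)
    dX = dXm + dmu / N                             # Eq. (4.24)
    return dX, dgamma, dbeta
```

Each intermediate gradient corresponds to one node of the computational graph, which makes the implementation easy to check against the equations but allocates several temporary arrays.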


Backward Pass (Simplified)

In fact, the computational graph above can be simplified by merging several nodes, which reduces the number of intermediate variables. Strictly speaking the result may no longer qualify as a computational graph; we are essentially differentiating the original formulas directly.

(Figure: simplified computational graph)

From Eqs. (4.1)-(4.4):
$$\left\{\begin{aligned}\mu_j&=\frac{1}{N}\sum_i{X_{i,j}}\\\hat{\sigma}_{j}&=\sqrt{\frac{1}{N}\sum_i{\left(X_{i,j}-\mu_j\right)^2}+\epsilon}\\\hat{X}_{i,j}&=\frac{X_{i,j}-\mu_j}{\hat{\sigma}_j}\\Y_{i,j}&=\gamma_j\hat{X}_{i,j}+\beta_j\end{aligned}\right.\tag{4.25}$$
$$\frac{\partial{L}}{\partial{\beta_j}}=\sum_i\frac{\partial{L}}{\partial{Y_{i,j}}}\frac{\partial{Y_{i,j}}}{\partial{\beta_j}}=\sum_i\frac{\partial{L}}{\partial{Y_{i,j}}}\tag{4.26}$$
$$\frac{\partial{L}}{\partial{\gamma_j}}=\sum_i\frac{\partial{L}}{\partial{Y_{i,j}}}\frac{\partial{Y_{i,j}}}{\partial{\gamma_j}}=\sum_i\frac{\partial{L}}{\partial{Y_{i,j}}}\hat{X}_{i,j}\tag{4.27}$$
$$\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}=\frac{\partial{L}}{\partial{Y_{i,j}}}\frac{\partial{Y_{i,j}}}{\partial{\hat{X}_{i,j}}}=\frac{\partial{L}}{\partial{Y_{i,j}}}\gamma_j\tag{4.28}$$


$$\begin{aligned}\frac{\partial{L}}{\partial{\hat{\sigma}_{j}}}&=\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{\hat{\sigma}_{j}}}\\&=-\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{X_{i,j}-\mu_j}{\hat{\sigma}_j^2}\\&=-\frac{1}{\hat{\sigma}_j}\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\hat{X}_{k,j}\end{aligned}\tag{4.29}$$
$$\begin{aligned}\frac{\partial{L}}{\partial{\mu_j}}&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\partial{\hat{\sigma}_j}}{\partial{\mu_j}}+\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{\mu_j}}\\&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\frac{1}{N}\sum_{i=1}^{N}{-2\left(X_{i,j}-\mu_j\right)}}{2\hat{\sigma}_j}+\sum_{i=1}^{N}\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{\mu_j}}\\&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\frac{1}{N}\sum_{i=1}^N{\left(\mu_j-X_{i,j}\right)}}{\hat{\sigma}_j}+\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{\mu_j}}\\&=\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\cdot0+\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{\mu_j}}\\&=\sum_{i=1}^N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\left(-\frac{1}{\hat{\sigma}_j}\right)\\&=\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\left(-\frac{1}{\hat{\sigma}_j}\right)\end{aligned}\tag{4.30}$$


$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,j}}}&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{\partial{\hat{X}_{i,j}}}{\partial{X_{i,j}}}+\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\partial{\hat{\sigma}_j}}{\partial{X_{i,j}}}+\frac{\partial{L}}{\partial{\mu_j}}\frac{\partial{\mu_j}}{\partial{X_{i,j}}}\\&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{1}{\hat{\sigma}_j}+\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{\frac{2}{N}\left(X_{i,j}-\mu_j\right)}{2\hat{\sigma}_j}+\frac{\partial{L}}{\partial{\mu_j}}\frac{1}{N}\\&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{1}{\hat{\sigma}_j}+\frac{\partial{L}}{\partial{\hat{\sigma}_j}}\frac{1}{N}\hat{X}_{i,j}+\frac{\partial{L}}{\partial{\mu_j}}\frac{1}{N}\\&=\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}\frac{1}{\hat{\sigma}_j}-\left(\frac{1}{\hat{\sigma}_j}\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\hat{X}_{k,j}\right)\frac{1}{N}\hat{X}_{i,j}-\left(\frac{1}{\hat{\sigma}_j}\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\right)\frac{1}{N}\\&=\frac{1}{N\hat{\sigma}_j}\left(N\frac{\partial{L}}{\partial{\hat{X}_{i,j}}}-\hat{X}_{i,j}\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\hat{X}_{k,j}-\sum_{k=1}^N\frac{\partial{L}}{\partial{\hat{X}_{k,j}}}\right)\\&=\frac{\gamma_j}{N\hat{\sigma}_j}\left(N\frac{\partial{L}}{\partial{Y_{i,j}}}-\hat{X}_{i,j}\sum_{k=1}^N\frac{\partial{L}}{\partial{Y_{k,j}}}\hat{X}_{k,j}-\sum_{k=1}^N\frac{\partial{L}}{\partial{Y_{k,j}}}\right)\end{aligned}\tag{4.31}$$
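Eq. (4.31) collapses the three gradient paths into a single expression, so the whole backward pass needs only a few vectorized operations. A sketch, again assuming the same hypothetical `cache` layout as before:

```python
import numpy as np

def batchnorm_backward_alt(dY, cache):
    """Fused backward pass implementing Eqs. (4.26), (4.27), and (4.31) directly."""
    X, X_hat, mu, var, gamma, eps = cache  # assumed cache layout
    N = X.shape[0]
    sigma_hat = np.sqrt(var + eps)

    dbeta = dY.sum(axis=0)             # Eq. (4.26)
    dgamma = (dY * X_hat).sum(axis=0)  # Eq. (4.27)
    # Eq. (4.31): all three chain-rule paths in one expression
    dX = (gamma / (N * sigma_hat)) * (
        N * dY
        - X_hat * (dY * X_hat).sum(axis=0)
        - dY.sum(axis=0)
    )
    return dX, dgamma, dbeta
```

Compared with the staged version, this form avoids materializing the intermediate gradients and is the one typically used in practice.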
