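The total gradient of the loss with respect to a parameter is the sum of the gradients contributed by all samples in the batch.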
Forward pass:
$$\hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2}}$$
$$y_i = \gamma \hat{x}_i + \beta$$
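The forward pass above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `bn_forward` and the toy inputs are hypothetical, the batch is a list of scalars, and, like the formulas here, no epsilon is added under the square root, so the batch must not be constant.

```python
import math

def bn_forward(x, gamma, beta):
    """Batch-normalize a list of scalars, following the two formulas above."""
    m = len(x)
    mu = sum(x) / m                                # batch mean
    var = sum((xi - mu) ** 2 for xi in x) / m      # biased batch variance, sigma^2
    x_hat = [(xi - mu) / math.sqrt(var) for xi in x]
    y = [gamma * xh + beta for xh in x_hat]        # scale and shift
    return x_hat, y, mu, var

# Hypothetical toy batch.
x_hat, y, mu, var = bn_forward([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
```

The normalized values have zero mean and unit variance over the batch, which is a quick sanity check for any implementation.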
Basic derivatives:
$$\frac{\partial \sigma^{2}}{\partial \mu} = \frac{1}{m}\sum_{i=1}^{m}\left[-2(x_i-\mu)\right]$$
$$\frac{\partial \hat{x}_i}{\partial \mu} = \frac{-1}{\sqrt{\sigma^{2}}} + \frac{\partial \hat{x}_i}{\partial \sigma^{2}}\frac{\partial \sigma^{2}}{\partial \mu}$$
$$\frac{\partial \hat{x}_i}{\partial \sigma^{2}} = (x_i-\mu)\left(-\frac{1}{2}\right)(\sigma^{2})^{-\frac{3}{2}}$$
Derivation:
$$\text{①}\quad \frac{\partial l}{\partial \sigma^{2}} = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[\frac{\partial \hat{x}_i}{\partial \sigma^{2}}\right] = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[(x_i-\mu)\left(-\frac{1}{2}\right)(\sigma^{2})^{-\frac{3}{2}}\right]$$
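Equation ① can be checked numerically. The sketch below is an illustration, not a reference implementation: `grad_wrt_variance`, `loss_of_v`, and the toy loss $l = \sum_i w_i \hat{x}_i$ (so that $\partial l / \partial \hat{x}_i = w_i$) are all assumptions introduced for the example. It compares the analytic sum with a central finite difference in which $\sigma^2$ is treated as an independent variable.

```python
import math

def grad_wrt_variance(x, dxhat):
    """Equation (1): dl/dsigma^2 from the upstream gradients dl/dx_hat_i."""
    m = len(x)
    mu = sum(x) / m
    var = sum((xi - mu) ** 2 for xi in x) / m
    return sum(g * (xi - mu) * (-0.5) * var ** (-1.5) for g, xi in zip(dxhat, x))

# Hypothetical toy data: l = sum_i w_i * x_hat_i, hence dl/dx_hat_i = w_i.
x = [1.0, 2.0, 3.0, 4.0]
w = [0.1, -0.2, 0.3, 0.4]
dvar = grad_wrt_variance(x, w)

# Finite-difference check with sigma^2 perturbed directly (mu held fixed).
mu = sum(x) / len(x)
var = sum((xi - mu) ** 2 for xi in x) / len(x)

def loss_of_v(v):
    return sum(wi * (xi - mu) / math.sqrt(v) for wi, xi in zip(w, x))

h = 1e-6
dvar_numeric = (loss_of_v(var + h) - loss_of_v(var - h)) / (2 * h)
```

The two values `dvar` and `dvar_numeric` should agree to several decimal places.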
$$\text{②}\quad \frac{\partial l}{\partial \mu} = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[\frac{\partial \hat{x}_i}{\partial \mu}\right] = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[\frac{\partial}{\partial \mu}\left(\frac{x_i-\mu}{\sqrt{\sigma^2}}\right)\right] = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[\frac{-1}{\sqrt{\sigma^2}} + \frac{\partial \hat{x}_i}{\partial \sigma^2}\frac{\partial \sigma^2}{\partial \mu}\right]$$
$$= \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\left[\frac{-1}{\sqrt{\sigma^2}} + (x_i-\mu)\left(-\frac{1}{2}\right)(\sigma^{2})^{-\frac{3}{2}} \cdot \frac{\partial \sigma^2}{\partial \mu}\right]$$
$$= \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\frac{-1}{\sqrt{\sigma^2}} + \left[\sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}(x_i-\mu)\left(-\frac{1}{2}\right)(\sigma^{2})^{-\frac{3}{2}}\right] \cdot \frac{\partial \sigma^2}{\partial \mu}$$
$$= \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\frac{-1}{\sqrt{\sigma^2}} + \frac{\partial l}{\partial \sigma^{2}} \cdot \frac{\partial \sigma^{2}}{\partial \mu} = \sum_{i=1}^{m} \frac{\partial l}{\partial \hat{x}_i}\frac{-1}{\sqrt{\sigma^2}} + \frac{\partial l}{\partial \sigma^{2}} \cdot \frac{1}{m}\sum_{i=1}^{m}\left[-2(x_i-\mu)\right]$$
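As a rough check of equation ②, the sketch below evaluates both terms separately. The names `grad_wrt_mean`, `direct`, and `dvar_dmu` and the toy data are hypothetical. Because $\mu$ is the batch mean, $\sum_i (x_i-\mu) = 0$, so the factor $\partial \sigma^2 / \partial \mu$ evaluates to zero (up to rounding) and the second term drops out; the code keeps it anyway to mirror the derivation.

```python
import math

def grad_wrt_mean(x, dxhat):
    """Equation (2): dl/dmu, returned together with its two ingredients."""
    m = len(x)
    mu = sum(x) / m
    var = sum((xi - mu) ** 2 for xi in x) / m
    # Equation (1): dl/dsigma^2.
    dvar = sum(g * (xi - mu) * (-0.5) * var ** (-1.5) for g, xi in zip(dxhat, x))
    # d sigma^2 / d mu = (1/m) * sum(-2 (x_i - mu)); zero when mu is the batch mean.
    dvar_dmu = sum(-2.0 * (xi - mu) for xi in x) / m
    # First term: sum_i dl/dx_hat_i * (-1 / sqrt(sigma^2)).
    direct = sum(-g / math.sqrt(var) for g in dxhat)
    return direct + dvar * dvar_dmu, direct, dvar_dmu

# Hypothetical toy data: upstream gradients dl/dx_hat_i = w_i.
x = [1.0, 2.0, 3.0, 4.0]
w = [0.1, -0.2, 0.3, 0.4]
dmu, direct, dvar_dmu = grad_wrt_mean(x, w)
```

In this sketch `dmu` and `direct` coincide, which is why many implementations compute only the first term.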
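Only after typing all of this out did I remember that I had once studied how to take partial derivatives of a multivariable function through intermediate variables... I looked it up on Baidu, and the screenshot is pasted below.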
So, according to this total-derivative rule:
$$\hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2}}$$
$$\text{③}\quad \frac{\partial l}{\partial x_i} = \frac{\partial l}{\partial \hat{x}_i}\frac{\partial \hat{x}_i}{\partial x_i} + \frac{\partial l}{\partial \mu}\frac{\partial \mu}{\partial x_i} + \frac{\partial l}{\partial \sigma^2}\frac{\partial \sigma^2}{\partial x_i}$$
$$= \frac{\partial l}{\partial \hat{x}_i}\frac{1}{\sqrt{\sigma^2}} + \frac{\partial l}{\partial \mu}\frac{1}{m} + \frac{\partial l}{\partial \sigma^2}\frac{2}{m}(x_i-\mu)$$
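Putting ①, ②, and ③ together gives the gradient with respect to each input. The following is a minimal sketch under the same assumptions as the formulas (scalar features, no epsilon under the square root); `bn_backward_input`, `loss`, and the toy loss $l = \sum_i w_i \hat{x}_i$ are hypothetical names chosen for the example. The analytic result is compared with central finite differences of the loss taken directly with respect to each $x_i$.

```python
import math

def bn_backward_input(x, dxhat):
    """Equation (3): dl/dx_i for every i, given dl/dx_hat_i."""
    m = len(x)
    mu = sum(x) / m
    var = sum((xi - mu) ** 2 for xi in x) / m
    std = math.sqrt(var)
    # (1) dl/dsigma^2
    dvar = sum(g * (xi - mu) * (-0.5) * var ** (-1.5) for g, xi in zip(dxhat, x))
    # (2) dl/dmu
    dmu = sum(-g / std for g in dxhat) + dvar * sum(-2.0 * (xi - mu) for xi in x) / m
    # (3) dl/dx_i = dl/dx_hat_i / std + dl/dmu / m + dl/dsigma^2 * 2 (x_i - mu) / m
    return [g / std + dmu / m + dvar * 2.0 * (xi - mu) / m for g, xi in zip(dxhat, x)]

def loss(x, w):
    """Toy loss l = sum_i w_i * x_hat_i, recomputing mu and sigma^2 from x."""
    m = len(x)
    mu = sum(x) / m
    var = sum((xi - mu) ** 2 for xi in x) / m
    return sum(wi * (xi - mu) / math.sqrt(var) for wi, xi in zip(w, x))

x = [1.0, 2.0, 3.0, 4.0]
w = [0.1, -0.2, 0.3, 0.4]          # plays the role of dl/dx_hat_i
analytic = bn_backward_input(x, w)

# Central finite differences with respect to each input x_i.
h = 1e-6
numeric = []
for i in range(len(x)):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    numeric.append((loss(xp, w) - loss(xm, w)) / (2 * h))
```

The entries of `analytic` and `numeric` should match closely, and the analytic gradients sum to roughly zero because shifting every input by the same constant leaves $\hat{x}_i$ unchanged.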
Derivation of the backpropagation formulas for the BN layer