Weight decay is also known as (L2) regularization. Below we show how to write the loss compactly in matrix notation, and how to take its gradients.

The loss expression
$$
\begin{aligned} L(\mathbf{w}, \mathbf{b}) = \frac{\eta}{2|\mathcal{B}|} \| \mathbf{X} \mathbf{w} + \mathbf{b} - \mathbf{y} \| ^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2 \end{aligned}
$$
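The expression above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation; the function name `l2_regularized_loss` and the default values of `eta` and `lam` are assumptions for the example.

```python
import numpy as np

def l2_regularized_loss(X, w, b, y, eta=1.0, lam=0.01):
    """L(w, b) = eta/(2|B|) * ||Xw + b - y||^2 + (lam/2) * ||w||^2."""
    batch_size = X.shape[0]                 # |B|, the minibatch size
    residual = X @ w + b - y                # Xw + b - y
    data_term = eta / (2 * batch_size) * np.dot(residual, residual)
    decay_term = lam / 2 * np.dot(w, w)     # penalizes only w, not b
    return data_term + decay_term
```

Note that the penalty term `lam/2 * ||w||^2` deliberately excludes the bias: shrinking `b` does not reduce model capacity, so weight decay is applied to the weights only.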
Gradients of the loss with respect to w and b
$$
\begin{aligned} \nabla_w L(\mathbf{w}, \mathbf{b}) = \frac{\eta}{|\mathcal{B}|} \mathbf{X}^T \left(\mathbf{X} \mathbf{w} + \mathbf{b} - \mathbf{y}\right) + \lambda \mathbf{w} \end{aligned}
$$
$$
\begin{aligned} \nabla_b L(\mathbf{w}, \mathbf{b}) = \frac{\eta}{|\mathcal{B}|} \left(\mathbf{X} \mathbf{w} + \mathbf{b} - \mathbf{y}\right) \end{aligned}
$$
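The two gradients can be checked with a small NumPy sketch. One assumption here: the bias is treated as a scalar broadcast over the batch, so its gradient sums the per-example residuals (the chain rule collapses the residual vector through the all-ones broadcast). The function name `gradients` is chosen for the example.

```python
import numpy as np

def gradients(X, w, b, y, eta=1.0, lam=0.01):
    batch_size = X.shape[0]                         # |B|
    residual = X @ w + b - y                        # Xw + b - y
    # ∇_w L = eta/|B| * X^T (Xw + b - y) + lam * w
    grad_w = eta / batch_size * X.T @ residual + lam * w
    # For a scalar bias, summing the residuals applies the broadcast's chain rule
    grad_b = eta / batch_size * residual.sum()
    return grad_w, grad_b
```

Note that only `grad_w` carries the extra `lam * w` term: at each SGD step it shrinks the weights toward zero before the data gradient is applied, which is where the name "weight decay" comes from.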