Writing Formulas in LaTeX / Markdown
Here’s the regularized cross-entropy:
$$
\begin{aligned}
C = -\frac{1}{n} \sum_{xj} \left[ y_j \ln a^L_j+(1-y_j) \ln (1-a^L_j)\right] + \frac{\lambda}{2n} \sum_w w^2. \tag{85}
\end{aligned}
$$
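As a rough numerical illustration of Equation (85), the sketch below evaluates the regularized cross-entropy with NumPy. The function name `regularized_cross_entropy`, the array layout (one row per training example) and the sample values are assumptions made for this example, not part of the original text.

```python
import numpy as np

def regularized_cross_entropy(a, y, weights, lam):
    """Equation (85). a, y: arrays of shape (n, outputs); weights: list of arrays."""
    n = a.shape[0]
    # Unregularized cross-entropy, summed over examples x and output neurons j.
    c0 = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / n
    # L2 penalty: lambda / (2n) times the sum of all squared weights (no biases).
    penalty = lam / (2 * n) * sum(np.sum(w ** 2) for w in weights)
    return c0 + penalty

# Hypothetical activations, targets and weights, chosen only for illustration.
a = np.array([[0.8, 0.2], [0.3, 0.7]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = [np.array([[1.0, -2.0], [0.5, 0.0]])]
cost = regularized_cross_entropy(a, y, weights, lam=0.1)
print(cost)
```

Setting `lam=0` recovers the plain cross-entropy, which is a quick way to check that the penalty term is the only difference.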
It’s possible to regularize other cost functions, such as the quadratic cost. This can be done in a similar way:
$$
\begin{aligned}
C = \frac{1}{2n} \sum_x \|y-a^L\|^2 + \frac{\lambda}{2n} \sum_w w^2. \tag{86}
\end{aligned}
$$
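Equation (86) can be evaluated the same way. This is a minimal sketch under the same assumptions as before: the function name `regularized_quadratic_cost`, the row-per-example layout and the sample numbers are illustrative only.

```python
import numpy as np

def regularized_quadratic_cost(a, y, weights, lam):
    """Equation (86). a, y: arrays of shape (n, outputs); weights: list of arrays."""
    n = a.shape[0]
    # Quadratic cost: squared Euclidean error summed over examples, divided by 2n.
    c0 = np.sum((y - a) ** 2) / (2 * n)
    # Same L2 penalty as in the cross-entropy case.
    penalty = lam / (2 * n) * sum(np.sum(w ** 2) for w in weights)
    return c0 + penalty

# Hypothetical values: two training examples, two output neurons.
a = np.array([[0.5, 0.5], [1.0, 0.0]])
y = np.array([[1.0, 0.0], [1.0, 0.0]])
weights = [np.array([2.0, -1.0])]
quadratic_cost = regularized_quadratic_cost(a, y, weights, lam=0.4)
print(quadratic_cost)
```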
In both cases we can write the regularized cost function as
$$
\begin{aligned}
C = C_0 + \frac{\lambda}{2n} \sum_w w^2. \tag{87}
\end{aligned}
$$
where $C_0$ is the original, unregularized cost function.
Taking the partial derivatives of Equation (87) gives
$$
\begin{aligned}
\frac{\partial C}{\partial w} & = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w \tag{88}\\
\end{aligned}
$$
$$
\begin{aligned}
\frac{\partial C}{\partial b} & = \frac{\partial C_0}{\partial b}. \tag{89}
\end{aligned}
$$
The $\partial C_0 / \partial w$ and $\partial C_0 / \partial b$ terms can be computed using backpropagation.
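Equations (88) and (89) say that regularization adds $\frac{\lambda}{n} w$ to the weight gradient and leaves the bias gradient unchanged. The sketch below applies that rule and checks the penalty's contribution against a central finite difference. All names (`regularized_gradients`, `l2_penalty`, `numerical_gradient`) and values are hypothetical, and the unregularized gradients are set to zero here instead of coming from a real backpropagation pass.

```python
import numpy as np

def regularized_gradients(grad_w0, grad_b0, w, lam, n):
    """Equations (88) and (89): only the weight gradient gains a term."""
    grad_w = grad_w0 + (lam / n) * w
    grad_b = grad_b0
    return grad_w, grad_b

def l2_penalty(w, lam, n):
    """The regularization term lambda / (2n) * sum of w^2 from Equation (87)."""
    return lam / (2 * n) * np.sum(w ** 2)

def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        grad.flat[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 0.5])
lam, n = 0.1, 10
# Placeholder unregularized gradients; in practice these come from backpropagation.
grad_w, grad_b = regularized_gradients(np.zeros_like(w), np.array([0.3]), w, lam, n)
numeric = numerical_gradient(lambda v: l2_penalty(v, lam, n), w)
print(grad_w, numeric, grad_b)
```

With zero unregularized gradients, `grad_w` and `numeric` should agree up to rounding, since both are the derivative of the penalty term alone.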
DenseNet.
$$
\begin{aligned}
x_{1} &= w_{1} * x_{0} \\
x_{2} &= w_{2} * [x_{0}, x_{1}] \\
\vdots \\
x_{k} &= w_{k} * [x_{0}, x_{1}, ..., x_{k-1}] \\
\tag{1}
\end{aligned}
$$
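The connectivity pattern in Equation (1) can be sketched in a few lines of NumPy. This is a simplification under stated assumptions: `*` is treated as a matrix product on concatenated feature vectors (in an actual DenseNet it would be a convolution, typically with normalization and a nonlinearity), and the function name `dense_block`, the sizes and the random weights are made up for illustration.

```python
import numpy as np

def dense_block(x0, weights):
    """x_k = w_k * [x_0, ..., x_{k-1}], with * taken as a matrix product (assumption)."""
    features = [x0]
    for w in weights:
        # Concatenate every earlier output: [x_0, x_1, ..., x_{k-1}].
        stacked = np.concatenate(features)
        features.append(w @ stacked)
    return features

rng = np.random.default_rng(0)
growth = 2                      # hypothetical number of features added per layer
x0 = rng.normal(size=3)         # hypothetical 3-dimensional input
# w_k must accept all earlier features: 3 inputs plus `growth` per previous layer.
weights = [rng.normal(size=(growth, 3 + growth * k)) for k in range(3)]
features = dense_block(x0, weights)
print([f.shape for f in features])
```

Each weight matrix is wider than the previous one because every layer receives the outputs of all layers before it.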
$$
\begin{aligned}
w_{1}^{,} &= f(w_{1}, \mathcal{g}_{0}) \\
w_{2}^{,} &= f(w_{2}, \mathcal{g}_{0}, \mathcal{g}_{1}) \\
w_{3}^{,} &= f(w_{3}, \mathcal{g}_{0}, \mathcal{g}_{1}, \mathcal{g}_{2}) \\
\vdots \\
w_{k}^{,} &= f(w_{k}, \mathcal{g}_{0}, \mathcal{g}_{1}, \mathcal{g}_{2}, ..., \mathcal{g}_{k-1}) \\
\tag{2}
\end{aligned}
$$
$$
\begin{aligned}
x_{k} &= w_{k} * [x_{0}^{,,}, x_{1}, ..., x_{k-1}] \\
x_{T} &= w_{T} * [x_{0}^{,,}, x_{1}, ..., x_{k}] \\
x_{U} &= w_{U} * [x_{0}^{,}, x_{T}] \\
\tag{3}
\end{aligned}
$$
$$
\begin{aligned}
w_{k}^{,} &= f(w_{k}, \mathcal{g}_{0}^{,,}, \mathcal{g}_{1}, \mathcal{g}_{2}, ..., \mathcal{g}_{k-1}) \\
w_{T}^{,} &= f(w_{T}, \mathcal{g}_{0}^{,,}, \mathcal{g}_{1}, \mathcal{g}_{2}, ..., \mathcal{g}_{k}) \\
w_{U}^{,} &= f(w_{U}, \mathcal{g}_{0}^{,}, \mathcal{g}_{T}) \\
\tag{4}
\end{aligned}
$$
References
[1] Yongqiang Cheng, https://yongqiang.blog.csdn.net/