I. Mind Map
II. Key Formulas
(1) Gradient descent with momentum
$$\begin{array}{l}{v_{dW}} = \beta {v_{dW}} + (1 - \beta )\,dW\\{v_{db}} = \beta {v_{db}} + (1 - \beta )\,db\\W = W - \alpha {v_{dW}},\quad b = b - \alpha {v_{db}}\end{array}$$
where $\alpha$ and $\beta$ are hyperparameters; $\beta$ is typically set to 0.9.
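The momentum update above can be sketched as a small NumPy function. This is a minimal illustration, not a production optimizer; the function name and the choice of `alpha=0.01` are assumptions for the example.

```python
import numpy as np

def momentum_update(W, b, dW, db, v_dW, v_db, alpha=0.01, beta=0.9):
    """One step of gradient descent with momentum (illustrative sketch).

    v_dW, v_db hold the exponentially weighted averages of past gradients;
    they should be initialized to zeros with the same shapes as W and b.
    """
    v_dW = beta * v_dW + (1 - beta) * dW  # running average of dW
    v_db = beta * v_db + (1 - beta) * db  # running average of db
    W = W - alpha * v_dW                  # move along the smoothed gradient
    b = b - alpha * v_db
    return W, b, v_dW, v_db
```

Because the update uses the smoothed gradient rather than the raw one, oscillations across steep directions are damped while progress along consistent directions accumulates.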
(2) RMSprop
$$\begin{array}{l}{s_{dW}} = {\beta _2}{s_{dW}} + (1 - {\beta _2})\,d{W^2}\\{s_{db}} = {\beta _2}{s_{db}} + (1 - {\beta _2})\,d{b^2}\\W = W - \alpha \frac{{dW}}{{\sqrt {{s_{dW}}} + \varepsilon }},\quad b = b - \alpha \frac{{db}}{{\sqrt {{s_{db}}} + \varepsilon }}\end{array}$$
where $\alpha$ and $\beta_2$ are hyperparameters; the squares $dW^2$ and $db^2$ are element-wise, and $\varepsilon$ is a small constant that prevents division by zero.
(3) Adam
$$\begin{array}{l}{v_{dW}} = {\beta _1}{v_{dW}} + (1 - {\beta _1})\,dW\\{v_{db}} = {\beta _1}{v_{db}} + (1 - {\beta _1})\,db\\{s_{dW}} = {\beta _2}{s_{dW}} + (1 - {\beta _2})\,d{W^2}\\{s_{db}} = {\beta _2}{s_{db}} + (1 - {\beta _2})\,d{b^2}\\v_{dW}^{corrected} = {v_{dW}}/(1 - \beta _1^t)\\v_{db}^{corrected} = {v_{db}}/(1 - \beta _1^t)\\s_{dW}^{corrected} = {s_{dW}}/(1 - \beta _2^t)\\s_{db}^{corrected} = {s_{db}}/(1 - \beta _2^t)\\W = W - \alpha \frac{{v_{dW}^{corrected}}}{{\sqrt {s_{dW}^{corrected}} + \varepsilon }},\quad b = b - \alpha \frac{{v_{db}^{corrected}}}{{\sqrt {s_{db}^{corrected}} + \varepsilon }}\end{array}$$
Common defaults are $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\varepsilon = 10^{-8}$; $\alpha$ is the learning rate and $t$ is the iteration count, used in the bias-correction terms.
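Adam combines the momentum and RMSprop steps, plus bias correction for the early iterations when the running averages are still biased toward zero. A sketch for a single parameter array (the function name is an illustrative assumption; `t` must start at 1):

```python
import numpy as np

def adam_step(theta, grad, v, s, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a parameter array theta (illustrative sketch).

    v and s are the first- and second-moment running averages,
    initialized to zeros; t is the 1-based iteration count.
    """
    v = beta1 * v + (1 - beta1) * grad       # momentum term
    s = beta2 * s + (1 - beta2) * grad ** 2  # RMSprop term
    v_hat = v / (1 - beta1 ** t)             # bias correction
    s_hat = s / (1 - beta2 ** t)
    theta = theta - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return theta, v, s
```

Note that at $t = 1$ the corrections divide $v$ by $1 - \beta_1 = 0.1$ and $s$ by $1 - \beta_2 = 0.001$, exactly cancelling the zero-initialization bias, so the very first step already has the right scale.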