# A Framework for Optimization Algorithms

Every first-order optimizer below fits one template: compute the gradient, aggregate a first-moment estimate $m_t$ and a second-moment estimate $V_t$ from the gradient history, then step by $\alpha/\sqrt{V_t}$ times $m_t$:

$$g_t = \nabla f(w_t)$$

$$m_t = M_1(g_1, g_2, \dots, g_t), \qquad V_t = M_2(g_1, g_2, \dots, g_t)$$

$$\eta_t = \cfrac{\alpha}{\sqrt{V_t}} \cdot m_t$$

$$w_{t+1} = w_t - \eta_t$$
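The template above can be sketched directly in Python. The function names (`optimize`, `sgd_moments`) and the scalar test function $f(w) = w^2$ are illustrative assumptions, not from the source; plain SGD is recovered by choosing $m_t = g_t$ and $V_t = 1$.

```python
import math

def optimize(grad, w, alpha, moments, steps):
    """Generic first-order update: w_{t+1} = w_t - (alpha / sqrt(V_t)) * m_t."""
    state = None
    for _ in range(steps):
        g = grad(w)                      # g_t = grad f(w_t)
        m, V, state = moments(state, g)  # m_t = M1(...), V_t = M2(...)
        w = w - alpha / math.sqrt(V) * m
    return w

# Plain SGD as the simplest instance: m_t = g_t, V_t = 1.
def sgd_moments(state, g):
    return g, 1.0, state

# Illustrative example: minimize f(w) = w^2, whose gradient is 2w.
w_star = optimize(lambda w: 2.0 * w, w=5.0, alpha=0.1, moments=sgd_moments, steps=100)
```

Each algorithm that follows only swaps in a different choice of `moments`.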

# Optimization Algorithms

## Fixed-Learning-Rate Optimization Algorithms

### SGD with Momentum

$$g_t = \nabla f(w_t)$$

$$m_t = \beta \cdot m_{t-1} + (1-\beta)\cdot g_t$$

$$\eta_t = \alpha \cdot m_t$$
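A minimal Python sketch of these three steps, applied with the same illustrative quadratic $f(w) = w^2$ as before (the default hyperparameters are assumptions for the demo, not from the source):

```python
def sgd_momentum(grad, w, alpha=0.1, beta=0.9, steps=200):
    m = 0.0
    for _ in range(steps):
        g = grad(w)                    # g_t = grad f(w_t)
        m = beta * m + (1 - beta) * g  # m_t = beta*m_{t-1} + (1-beta)*g_t
        w = w - alpha * m              # eta_t = alpha * m_t; w_{t+1} = w_t - eta_t
    return w

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.
w_star = sgd_momentum(lambda w: 2.0 * w, 5.0)
```

Note that $V_t$ plays no role here: the learning rate stays fixed, and only the descent direction is smoothed by the momentum average.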

### SGD with Nesterov Acceleration

Since the update is $w_{t+1} = w_t - \eta_t$, Nesterov acceleration evaluates the gradient not at $w_t$ but at the look-ahead point $w_t - \alpha \cdot m_{t-1}$, i.e. where the accumulated momentum is about to carry the iterate:

$$g_t = \nabla f(w_t - \alpha \cdot m_{t-1})$$

$$m_t = \beta \cdot m_{t-1} + (1-\beta)\cdot g_t$$

$$\eta_t = \alpha \cdot m_t$$
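The only change from plain momentum is where the gradient is taken. A sketch under the same illustrative setup ($f(w) = w^2$, assumed demo hyperparameters):

```python
def sgd_nesterov(grad, w, alpha=0.1, beta=0.9, steps=200):
    m = 0.0
    for _ in range(steps):
        g = grad(w - alpha * m)        # gradient at the look-ahead point
        m = beta * m + (1 - beta) * g  # same momentum average as before
        w = w - alpha * m              # same update as before
    return w

w_star = sgd_nesterov(lambda w: 2.0 * w, 5.0)
```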

## Adaptive-Learning-Rate Optimization Algorithms

### RMSProp / AdaDelta

Instead of a fixed rate, $V_t$ tracks an exponential moving average of squared gradients, so the effective step $\alpha/\sqrt{V_t}$ shrinks along directions with consistently large gradients:

$$V_1 = g_1^2, \qquad V_t = \beta \cdot V_{t-1} + (1-\beta) \cdot g_t^2$$
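A scalar sketch of this accumulator plugged into the framework's update (hyperparameters and the $\varepsilon$ guard against division by zero are common implementation conventions, assumed here rather than taken from the source):

```python
import math

def rmsprop(grad, w, alpha=0.1, beta=0.9, eps=1e-8, steps=300):
    V = None
    for _ in range(steps):
        g = grad(w)
        # V_1 = g_1^2, then V_t = beta*V_{t-1} + (1-beta)*g_t^2
        V = g * g if V is None else beta * V + (1 - beta) * g * g
        w = w - alpha / (math.sqrt(V) + eps) * g
    return w

# Illustrative example: minimize f(w) = w^2 from w = 5.
w_star = rmsprop(lambda w: 2.0 * w, 5.0)
```

With a constant $\alpha$ the scalar iterate settles into a small oscillation of order $\alpha$ around the minimum rather than converging exactly, which is why $\alpha$ is decayed in practice.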

### Adam

Adam combines the two ideas: a momentum average for $m_t$ and an exponential moving average of squared gradients for $V_t$:

$$g_t = \nabla f(w_t)$$

$$m_1 = g_1, \qquad V_1 = g_1^2$$

$$m_t = \beta_1 \cdot m_{t-1} + (1-\beta_1)\cdot g_t$$

$$V_t = \beta_2 \cdot V_{t-1} + (1-\beta_2) \cdot g_t^2$$
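A sketch following the recursion above literally, with the $m_1 = g_1,\ V_1 = g_1^2$ initialization standing in for the bias correction used in standard Adam implementations (default hyperparameters and the scalar demo are assumptions):

```python
import math

def adam(grad, w, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = V = None
    for _ in range(steps):
        g = grad(w)
        if m is None:
            m, V = g, g * g                       # m_1 = g_1, V_1 = g_1^2
        else:
            m = beta1 * m + (1 - beta1) * g       # first-moment EMA
            V = beta2 * V + (1 - beta2) * g * g   # second-moment EMA
        w = w - alpha * m / (math.sqrt(V) + eps)  # eta_t = alpha/sqrt(V_t) * m_t
    return w

w_star = adam(lambda w: 2.0 * w, 5.0)
```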

### Nadam

Nadam adds the Nesterov look-ahead to Adam: as with NAG, the update $w_{t+1} = w_t - \eta_t$ means the gradient is taken at $w_t - \cfrac{\alpha}{\sqrt{V_{t-1}}} \cdot m_{t-1}$:

$$g_t = \nabla f\left(w_t - \cfrac{\alpha}{\sqrt{V_{t-1}}} \cdot m_{t-1}\right)$$

$$m_1 = g_1, \qquad V_1 = g_1^2$$

$$m_t = \beta_1 \cdot m_{t-1} + (1-\beta_1)\cdot g_t$$

$$V_t = \beta_2 \cdot V_{t-1} + (1-\beta_2) \cdot g_t^2$$
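A sketch of the combined rule, again with assumed demo hyperparameters and the illustrative $f(w) = w^2$; the only difference from the Adam sketch is the look-ahead point at which the gradient is evaluated:

```python
import math

def nadam(grad, w, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = V = None
    for _ in range(steps):
        if m is None:
            g = grad(w)                   # first step: no momentum to look ahead with
            m, V = g, g * g               # m_1 = g_1, V_1 = g_1^2
        else:
            # gradient at the look-ahead point w_t - alpha/sqrt(V_{t-1}) * m_{t-1}
            g = grad(w - alpha / (math.sqrt(V) + eps) * m)
            m = beta1 * m + (1 - beta1) * g
            V = beta2 * V + (1 - beta2) * g * g
        w = w - alpha * m / (math.sqrt(V) + eps)
    return w

w_star = nadam(lambda w: 2.0 * w, 5.0)
```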
