# Background

The baseline SGD update used throughout is:

$\Delta\omega_t = -\eta g_t$

$\omega_{t+1} = \omega_t + \Delta\omega_t$

## TG (Truncated Gradient)

### Simply Adding the L1 Norm

$\Delta\omega_t = -g_t - \lambda\,\mathrm{sgn}(\omega_t)$

$\omega_{t+1} = \omega_t + \eta\,\Delta\omega_t$
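A minimal NumPy sketch of this update; the learning rate `eta` and regularization strength `lam` are illustrative values, not from the text:

```python
import numpy as np

def sgd_l1_step(w, g, eta=0.1, lam=0.01):
    """One SGD step with an L1 subgradient term:
    delta = -g - lam * sgn(w),  w <- w + eta * delta."""
    delta = -g - lam * np.sign(w)
    return w + eta * delta

w = np.array([0.5, -0.3, 0.0])
g = np.array([0.2, -0.1, 0.4])
w_next = sgd_l1_step(w, g)
```

Because the subgradient term only nudges each weight toward zero by a fixed amount, this update almost never lands exactly on zero, which is what motivates the truncation methods below.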

### Simple Truncation

$\omega_{t+1} = T_0\!\left(\omega_t + \eta\,\Delta\omega_t,\ \theta\right)$

$T_0(v_i, \theta) = \begin{cases} 0, & \text{if } |v_i| \le \theta \\ v_i, & \text{otherwise} \end{cases}$

where $\theta$ is a positive threshold and $v_i$ denotes the $i$-th component of the vector $v$.
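A NumPy sketch of the $T_0$ operator (the threshold value is illustrative; in the truncated-gradient literature this truncation is typically applied only every $K$ steps rather than at every iteration):

```python
import numpy as np

def truncate_0(v, theta):
    """T_0: zero out every component whose magnitude is at most theta."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= theta, 0.0, v)

w_trunc = truncate_0([0.05, -0.3, 0.002, 0.5], theta=0.1)
```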

### Truncated Gradient

$\omega_{t+1} = T_1\!\left(\omega_t + \eta\,\Delta\omega_t,\ \eta\lambda,\ \theta\right)$

$\text{(1)}\quad T_1(v_i, \alpha, \theta) = \begin{cases} \max(0,\, v_i - \alpha), & \text{if } v_i \in [0, \theta] \\ \min(0,\, v_i + \alpha), & \text{if } v_i \in [-\theta, 0] \\ v_i, & \text{otherwise} \end{cases}$
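A NumPy sketch of the element-wise $T_1$ operator from Eq. (1); the sample values are illustrative:

```python
import numpy as np

def truncate_1(v, alpha, theta):
    """T_1: shrink components inside [-theta, theta] toward zero by
    alpha (clipping at zero); leave larger components untouched."""
    v = np.asarray(v, dtype=float)
    out = v.copy()
    pos = (v >= 0) & (v <= theta)
    neg = (v < 0) & (v >= -theta)
    out[pos] = np.maximum(0.0, v[pos] - alpha)
    out[neg] = np.minimum(0.0, v[neg] + alpha)
    return out

v_out = truncate_1([0.05, -0.3, 0.8], alpha=0.1, theta=0.5)
```

Compared with $T_0$, small weights are shrunk gradually by $\alpha$ instead of being cut off abruptly.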

## FOBOS: Forward-Backward Splitting

FOBOS decomposes each per-sample iteration into a gradient-descent step on the empirical loss, Eq. (1), followed by an optimization problem, Eq. (2). The optimization problem in Eq. (2) has two terms: the first, an L2-norm term, keeps the solution from drifting too far from the result of the loss-gradient step; the second is a regularization term, used to limit model complexity, suppress overfitting, and induce sparsity.

$\text{(1)}\quad \omega_{t+\frac{1}{2}} = \omega_t - \eta_t g_t^f$

$\text{(2)}\quad \omega_{t+1} = \underset{\omega}{\arg\min}\left\{ \frac{1}{2}\left\|\omega - \omega_{t+\frac{1}{2}}\right\|^2 + \eta_{t+\frac{1}{2}}\, r(\omega) \right\}$

$\text{(3)}\quad 0 \in \left.\partial\left\{ \frac{1}{2}\left\|\omega - \omega_{t+\frac{1}{2}}\right\|^2 + \eta_{t+\frac{1}{2}}\, r(\omega) \right\}\right|_{\omega = \omega_{t+1}}$

$\text{(4)}\quad 0 \in \omega_{t+1} - \omega_t + \eta_t g_t^f + \eta_{t+\frac{1}{2}}\, \partial r(\omega_{t+1})$

Eq. (4) means that as long as we choose the $\omega_{t+1}$ that minimizes Eq. (2), we are guaranteed to obtain a vector $g_{t+1}^f \in \partial r(\omega_{t+1})$ such that:

$\text{(5)}\quad 0 = \omega_{t+1} - \omega_t + \eta_t g_t^f + \eta_{t+\frac{1}{2}}\, g_{t+1}^f$

$\text{(6)}\quad \omega_{t+1} = \omega_t - \eta_t g_t^f - \eta_{t+\frac{1}{2}}\, g_{t+1}^f$
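For the common case $r(\omega) = \lambda\|\omega\|_1$, the argmin in Eq. (2) has a well-known closed-form solution: coordinate-wise soft-thresholding. A minimal sketch, assuming $\eta_{t+\frac{1}{2}} = \eta_t$ for simplicity (the `eta` and `lam` values are illustrative):

```python
import numpy as np

def fobos_l1_step(w, g, eta=0.1, lam=0.5):
    """One FOBOS step with r(w) = lam * ||w||_1.
    Forward: the gradient step of Eq. (1).
    Backward: the argmin of Eq. (2), solved in closed form by
    soft-thresholding each coordinate by eta * lam."""
    w_half = w - eta * g  # Eq. (1)
    return np.sign(w_half) * np.maximum(0.0, np.abs(w_half) - eta * lam)

w_next = fobos_l1_step(np.array([0.5, -0.02, 0.1]),
                       np.array([0.2, 0.1, -0.3]))
```

Any coordinate whose magnitude after the forward step falls below $\eta\lambda$ is set exactly to zero, which is how FOBOS produces sparse models.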

## RDA: Regularized Dual Averaging

In its standard closed form, each dimension of the L1-RDA feature weights is updated as (with $\bar g_t = \frac{1}{t}\sum_{s=1}^{t} g_s$ the average of all past gradients and $\gamma > 0$ a tunable parameter):

$\omega_{t+1}^{(i)} = \begin{cases} 0, & \text{if } \left|\bar g_t^{(i)}\right| < \lambda \\ -\frac{\sqrt{t}}{\gamma}\left( \bar g_t^{(i)} - \lambda\, \mathrm{sgn}\!\left(\bar g_t^{(i)}\right) \right), & \text{otherwise} \end{cases}$
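A per-coordinate sketch of the L1-RDA closed-form update (following the standard formulation from Xiao's RDA paper; the `lam`, `gamma`, and sample values are illustrative):

```python
import numpy as np

def rda_l1_update(g_bar, t, lam=0.1, gamma=1.0):
    """L1-RDA update: a coordinate stays at exactly zero unless the
    magnitude of its running average gradient g_bar exceeds lam."""
    g_bar = np.asarray(g_bar, dtype=float)
    out = np.zeros_like(g_bar)
    active = np.abs(g_bar) > lam
    out[active] = -(np.sqrt(t) / gamma) * (
        g_bar[active] - lam * np.sign(g_bar[active]))
    return out

w = rda_l1_update([0.05, -0.3, 0.2], t=4)
```

Because the threshold is applied to the average gradient rather than to the current weight, RDA tends to produce sparser solutions than FOBOS.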

## FTRL

FTRL combines FOBOS's and RDA's treatment of the regularization term and of the constraint on $W$. In the common formulation, its feature weights are given by:

$\omega_{t+1} = \underset{\omega}{\arg\min}\left\{ g_{1:t} \cdot \omega + \lambda_1 \|\omega\|_1 + \frac{1}{2} \sum_{s=1}^{t} \sigma_s \left\|\omega - \omega_s\right\|^2 \right\}$

where $g_{1:t} = \sum_{s=1}^{t} g_s$ and the coefficients $\sigma_s$ satisfy $\sum_{s=1}^{t} \sigma_s = \frac{1}{\eta_t}$.
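A per-coordinate FTRL-Proximal sketch in the style of the common engineering formulation (McMahan et al.); the accumulator names `z`/`n` and the hyperparameters `alpha`, `beta`, `l1`, `l2` follow that convention and are not from this document:

```python
import numpy as np

class FTRLProximal:
    """Per-coordinate FTRL-Proximal: z accumulates shifted gradients,
    n accumulates squared gradients (for per-coordinate step sizes)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)
        self.n = np.zeros(dim)

    def weights(self):
        # Closed-form argmin: a coordinate is exactly zero while |z_i| <= l1.
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        w[active] = -(self.z[active] - self.l1 * np.sign(self.z[active])) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2)
        return w

    def update(self, g):
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * self.weights()
        self.n += g * g

opt = FTRLProximal(dim=3)
opt.update(np.array([0.5, -0.2, 0.0]))
w = opt.weights()  # all coordinates still inside the l1 dead zone
```

The L1 term keeps rarely-updated coordinates pinned at exactly zero, while frequently-updated coordinates escape the dead zone and receive RDA-like shrunken values.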

