# NeurIPS'18 | Evolutionary Stochastic Gradient Descent: A Population-Based Optimization Framework for Deep Neural Networks

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks. NeurIPS 2018

## Problem Formulation

A deep neural network with parameters $\theta$ maps inputs to outputs:

$$h(x;\theta): \mathcal{X} \rightarrow \mathcal{Y}$$

For a loss function $\ell$, the expected risk over the data distribution is

$$R(\theta) = \mathbb{E}_{(x, y)}[\ell(h(x;\theta), y)]$$

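The data distribution is unknown, so training minimizes the empirical risk over $n$ training samples $\{(x_i, y_i)\}_{i=1}^n$. Here $l_i(\theta)$ denotes the loss on sample $i$:

$$R_n(\theta) = \frac{1}{n}\sum_{i=1}^n \ell(h(x_i;\theta), y_i) \triangleq \frac{1}{n}\sum_{i=1}^n l_i(\theta)$$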

Equivalently, let $\omega$ be drawn uniformly from the index set $\{1, \dots, n\}$. Then

$$R_n(\theta) = \mathbb{E}_\omega[l_\omega(\theta)]$$
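The identity between the empirical risk and the expectation over a uniformly drawn index can be checked numerically. The sketch below assumes, purely for illustration, a toy model $h(x;\theta) = \theta x$ with squared loss. The names `DATA`, `per_sample_loss`, `empirical_risk` and `expectation_over_uniform_index` are hypothetical. Rational arithmetic keeps both sides exact.

```python
from fractions import Fraction

# Toy setup (assumption for illustration): h(x; theta) = theta * x with squared loss.
DATA = [(1, 1), (2, 3), (3, 4)]  # (x_i, y_i) pairs, n = 3


def per_sample_loss(i, theta, data):
    """l_i(theta) = ell(h(x_i; theta), y_i) with ell(p, y) = (p - y)^2, computed exactly."""
    x, y = data[i]
    return (Fraction(theta) * x - y) ** 2


def empirical_risk(theta, data):
    """R_n(theta) = (1/n) * sum_i l_i(theta)."""
    n = len(data)
    return sum(per_sample_loss(i, theta, data) for i in range(n)) / n


def expectation_over_uniform_index(theta, data):
    """E_omega[l_omega(theta)] with P(omega = i) = 1/n for every index i."""
    n = len(data)
    return sum(Fraction(1, n) * per_sample_loss(i, theta, data) for i in range(n))


print(empirical_risk(2, DATA))                  # 2
print(expectation_over_uniform_index(2, DATA))  # 2
```

This equivalence is what lets SGD use a single uniformly sampled $l_\omega$ as an unbiased estimate of $R_n$.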

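At iteration $k$, SGD samples an index $i_k$. It then steps with size $\alpha_k$ along the negative gradient of that single sample's loss:

$$\theta_{k+1} = \theta_k - \alpha_k \nabla l_{i_k}(\theta_k)$$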
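The following is a minimal sketch of this update on the same kind of toy problem. It assumes $h(x;\theta) = \theta x$, squared loss, and data generated by $y = 2x$, so the minimizer is $\theta^* = 2$. It also uses a constant step size $\alpha_k = \alpha$. The function names are hypothetical.

```python
import random

DATA = [(1, 2), (2, 4), (3, 6)]  # toy data with y = 2x, so theta* = 2 (assumption)


def grad_l(i, theta, data):
    """Gradient of l_i(theta) = (theta * x_i - y_i)^2, i.e. 2 * x_i * (theta * x_i - y_i)."""
    x, y = data[i]
    return 2 * x * (theta * x - y)


def empirical_risk(theta, data):
    """R_n(theta): mean squared error over the training set."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def sgd(theta0, data, steps, alpha, seed=0):
    """Plain SGD: theta_{k+1} = theta_k - alpha_k * grad l_{i_k}(theta_k).

    A constant step size alpha_k = alpha is used here for simplicity (assumption).
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        i_k = rng.randrange(len(data))  # i_k drawn uniformly from {0, ..., n-1}
        theta = theta - alpha * grad_l(i_k, theta, data)
    return theta


theta = sgd(theta0=0.0, data=DATA, steps=200, alpha=0.05)
print(theta, empirical_risk(theta, DATA))
```

With $\alpha = 0.05$, each step multiplies $\theta - 2$ by $1 - 0.1x_i^2 \in \{0.9, 0.6, 0.1\}$. The iterate therefore converges to 2 whichever indices are sampled.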

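ESGD optimizes a population of parameter vectors rather than a single one. It treats $\theta$ as a random variable over the population, so the objective becomes

$$J = \mathbb{E}_\theta[R_n(\theta)] = \mathbb{E}_\theta[\mathbb{E}_\omega[l_\omega(\theta)]]$$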

For a population of $\mu$ individuals $\theta_1, \dots, \theta_\mu$, this objective is estimated by the population average

$$J_\mu = \frac{1}{\mu}\sum_{j=1}^\mu R_n(\theta_j) = \frac{1}{\mu}\sum_{j=1}^\mu \left( \frac{1}{n}\sum_{i=1}^n l_i(\theta_j) \right)$$
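The population objective is a plain average of per-individual empirical risks. The sketch below assumes a toy scalar model $h(x;\theta) = \theta x$ with squared loss and data from $y = 2x$. The helper names are hypothetical. Each individual is evaluated on the full training set, and the results are averaged over the $\mu$ individuals.

```python
DATA = [(1, 2), (2, 4), (3, 6)]  # toy data with y = 2x (assumption)


def empirical_risk(theta, data):
    """R_n(theta) = (1/n) * sum_i l_i(theta) for h(x; theta) = theta * x, squared loss."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def population_objective(population, data):
    """J_mu = (1/mu) * sum_j R_n(theta_j), a sample-average estimate of E_theta[R_n(theta)]."""
    mu = len(population)
    return sum(empirical_risk(theta_j, data) for theta_j in population) / mu


population = [1.0, 2.0, 3.0]  # mu = 3 individuals
J = population_objective(population, DATA)
print(J)  # approximately 28/9, i.e. about 3.111
```

Here $R_n(1) = R_n(3) = 14/3$ and $R_n(2) = 0$, so $J_\mu = (14/3 + 0 + 14/3)/3 = 28/9$.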

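Let $f$ be a fitness function, where lower is better. Sort the population so that $f(\theta_{1:\mu}) \le f(\theta_{2:\mu}) \le \dots \le f(\theta_{\mu:\mu})$, so that $\theta_{k:\mu}$ is the $k$-th best individual. The $m$-elitist average fitness is the mean fitness of the $m$ best individuals:

$$J_{\bar{m}:\mu} = \frac{1}{m}\sum_{k=1}^m f(\theta_{k:\mu})$$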
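Computing $J_{\bar{m}:\mu}$ amounts to sorting fitness values in ascending order and averaging the first $m$. In the sketch below, the fitness is assumed to be the empirical risk of a toy scalar model $h(x;\theta) = \theta x$ on data from $y = 2x$. That choice of $f$ and the helper names are assumptions for illustration only.

```python
DATA = [(1, 2), (2, 4), (3, 6)]  # toy data with y = 2x (assumption)


def fitness(theta):
    """Fitness f(theta); here taken to be the empirical risk R_n on DATA (assumption)."""
    return sum((theta * x - y) ** 2 for x, y in DATA) / len(DATA)


def m_elitist_average_fitness(population, m):
    """J_{m-bar:mu} = (1/m) * sum_{k=1}^{m} f(theta_{k:mu}).

    Sorting fitness values in ascending order makes theta_{1:mu} the best
    individual (lowest f) and theta_{k:mu} the k-th best.
    """
    if not 1 <= m <= len(population):
        raise ValueError("m must satisfy 1 <= m <= mu")
    ranked = sorted(fitness(theta) for theta in population)
    return sum(ranked[:m]) / m


population = [1.0, 2.0, 3.5, 2.5]  # mu = 4
print(m_elitist_average_fitness(population, 1))  # 0.0, from the best individual theta = 2.0
print(m_elitist_average_fitness(population, 2))  # mean of f(2.0) = 0 and f(2.5) = 7/6
```

With $m = 1$ this is the fitness of the best individual. With $m = \mu$ and $f = R_n$ it reduces to the population average $J_\mu$.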

## Algorithm

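In generation $k$, each offspring is produced in two steps. First, $\rho$ parents are averaged (recombination). Then a random perturbation $\epsilon_i^{(k)}$ is added (mutation):

$$\theta_i^{(k)} = \frac{1}{\rho}\sum_{j=1}^\rho \theta_j^{(k)} + \epsilon_i^{(k)}$$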
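The offspring rule can be sketched on parameter vectors stored as plain lists. Two details here are assumptions, not taken from the paper. First, the $\rho$ parents are drawn uniformly without replacement. Second, $\epsilon_i^{(k)}$ is isotropic Gaussian noise with standard deviation `sigma`. The names `make_offspring` and `generate_offspring` are hypothetical.

```python
import random


def make_offspring(population, rho, sigma, rng):
    """One offspring: theta_i = (1/rho) * sum_{j=1}^{rho} theta_j + eps_i.

    Assumptions (not taken from the paper): the rho parents are drawn uniformly
    without replacement from the population, and eps_i ~ N(0, sigma^2 I).
    """
    parents = rng.sample(population, rho)
    dim = len(parents[0])
    mean = [sum(p[d] for p in parents) / rho for d in range(dim)]  # recombination
    return [mean[d] + rng.gauss(0.0, sigma) for d in range(dim)]  # mutation


def generate_offspring(population, num_offspring, rho, sigma, seed=0):
    """Produce num_offspring children from a population of parameter vectors."""
    rng = random.Random(seed)
    return [make_offspring(population, rho, sigma, rng) for _ in range(num_offspring)]


population = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]  # mu = 3 individuals in R^2
children = generate_offspring(population, num_offspring=4, rho=2, sigma=0.1)
print(children)
```

Setting `sigma=0.0` and `rho` equal to the population size makes every child the population mean. That special case makes the recombination part easy to verify in isolation.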

ESGD guarantees that the $m$-elitist average fitness of the population never increases from one generation to the next:

$$J_{\bar{m}:\mu}^{(k)} \leq J_{\bar{m}:\mu}^{(k-1)}, \quad k \geq 1$$

The same holds for any $m' \le m$:

$$J_{\bar{m}':\mu}^{(k)} \leq J_{\bar{m}':\mu}^{(k-1)}, \quad k \geq 1$$

In particular, taking $m' = 1$, the fitness of the best individual never gets worse:

$$f^{(k)}(\theta_{1:\mu}) \leq f^{(k-1)}(\theta_{1:\mu}), \quad k \geq 1$$
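These inequalities rest on elitism. If the $m$ best individuals always survive into the next generation, the $m$ best can only stay the same or be replaced by better ones.

The sketch below illustrates this on a 1-D toy problem and records $J_{\bar{m}:\mu}$ and the best fitness per generation. It uses a hypothetical selection rule, which may not be the paper's exact rule: keep the $m$ best of parents plus offspring, then fill the remaining $\mu - m$ slots at random. It also omits ESGD's SGD step and covers only recombination, mutation, and selection. The fitness function, the names, and all hyperparameter values are assumptions.

```python
import random

DATA = [(1, 2), (2, 4), (3, 6)]  # toy data with y = 2x (assumption)


def fitness(theta):
    """Fitness f(theta) = R_n(theta) for h(x; theta) = theta * x (assumption); lower is better."""
    return sum((theta * x - y) ** 2 for x, y in DATA) / len(DATA)


def m_elitist_average(population, m):
    """J_{m-bar:mu}: mean fitness of the m best individuals."""
    ranked = sorted(fitness(t) for t in population)
    return sum(ranked[:m]) / m


def select(parents, offspring, mu, m, rng):
    """Hypothetical m-elitist rule: keep the m best of parents + offspring,
    then fill the remaining mu - m slots uniformly at random from the rest."""
    pool = sorted(parents + offspring, key=fitness)
    elites, rest = pool[:m], pool[m:]
    return elites + rng.sample(rest, mu - m)


def run(mu=6, m=3, lam=6, rho=2, sigma=0.5, generations=20, seed=0):
    """Evolve a 1-D population; record J_{m-bar:mu} and the best fitness per generation."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(mu)]
    j_hist = [m_elitist_average(population, m)]
    best_hist = [m_elitist_average(population, 1)]
    for _ in range(generations):
        # Recombination of rho random parents plus Gaussian mutation.
        offspring = [
            sum(rng.sample(population, rho)) / rho + rng.gauss(0.0, sigma)
            for _ in range(lam)
        ]
        population = select(population, offspring, mu, m, rng)
        j_hist.append(m_elitist_average(population, m))
        best_hist.append(m_elitist_average(population, 1))
    return j_hist, best_hist


j_hist, best_hist = run()
print(j_hist[0], j_hist[-1])
```

The old population is a subset of the selection pool, so the $k$-th best fitness in the pool is no worse than the $k$-th best among the old parents. Keeping the pool's top $m$ therefore makes both recorded sequences non-increasing.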

## Experiments


