# NeurIPS'18 | Evolutionary Stochastic Gradient Descent: A Population-Based Framework for Optimizing Deep Neural Networks


#### Original paper ↓

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks. NeurIPS 2018 (paper link)

## Problem Formulation

A deep neural network defines a mapping from inputs to outputs:

$$h(x;\theta): \mathcal{X} \rightarrow \mathcal{Y}$$

Given a loss $\ell$, the expected risk is

$$R(\theta) = \mathbb{E}_{(x, y)}[\ell(h(x;\theta), y)]$$

and the empirical risk over $n$ training samples is

$$R_n(\theta) = \frac{1}{n}\sum_{i = 1}^n \ell(h(x_i;\theta), y_i) \triangleq \frac{1}{n}\sum_{i = 1}^n l_i(\theta)$$

Treating the sample index $\omega$ as uniformly random over $\{1, \dots, n\}$, the empirical risk can be written as an expectation,

$$R_n(\theta) = \mathbb{E}_\omega[l_\omega(\theta)]$$

which SGD minimizes with the stochastic update

$$\theta_{k + 1} = \theta_k - \alpha_k \nabla l_{i_k}(\theta_k)$$

The key step in ESGD is to additionally treat the parameters $\theta$ as a random variable (a population of candidate models), giving the population-level objective

$$J = \mathbb{E}_\theta[R_n(\theta)] = \mathbb{E}_\theta[\mathbb{E}_\omega[l_\omega(\theta)]]$$

which is estimated over a population of $\mu$ individuals $\theta_1, \dots, \theta_\mu$ by

$$J_\mu = \frac{1}{\mu}\sum_{j = 1}^\mu R_n(\theta_j) = \frac{1}{\mu}\sum_{j = 1}^\mu \Big(\frac{1}{n}\sum_{i = 1}^n l_i(\theta_j)\Big)$$

Let $\theta_{k:\mu}$ denote the individual with the $k$-th best fitness $f$ (here $f(\theta) = R_n(\theta)$). The $m$-elitist average fitness averages only the $m$ best individuals:

$$J_{\bar{m}:\mu} = \frac{1}{m}\sum_{k = 1}^m f(\theta_{k:\mu})$$
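The $m$-elitist average fitness $J_{\bar{m}:\mu}$ above is straightforward to compute: evaluate every individual's fitness, sort, and average the top $m$. A minimal sketch (the function name and toy fitness are illustrative, not from the paper):

```python
def m_elitist_average_fitness(population, fitness_fn, m):
    """J_{m̄:μ}: average fitness of the m best among μ individuals (lower is better)."""
    fitnesses = sorted(fitness_fn(theta) for theta in population)
    return sum(fitnesses[:m]) / m

# Toy example: each θ is a 2-vector, fitness is the squared norm.
pop = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
print(m_elitist_average_fitness(pop, lambda t: sum(x * x for x in t), m=2))  # → 2.5
```

Setting $m = \mu$ recovers the plain population average $J_\mu$; smaller $m$ focuses the objective on the elite individuals.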

## Algorithm

ESGD alternates SGD steps with an evolution step. In the evolution step of generation $k$, each offspring is generated by intermediate recombination of $\rho$ parents plus Gaussian noise:

$$\theta_i^{(k)} = \frac{1}{\rho} \sum_{j = 1}^\rho \theta_j^{(k)} + \epsilon_i^{(k)}$$

Under $m$-elitist selection, the elitist average fitness never increases across generations:

$$J_{\bar{m}:\mu}^{(k)} \leq J_{\bar{m}:\mu}^{(k - 1)}, \quad k \geq 1$$

and the same monotonicity holds for any $m' \leq m$:

$$J_{\bar{m}':\mu}^{(k)} \leq J_{\bar{m}':\mu}^{(k - 1)}, \quad k \geq 1$$

In particular, the best fitness in the population never deteriorates:

$$f^{(k)}(\theta_{1:\mu}) \leq f^{(k - 1)}(\theta_{1:\mu}), \quad k \geq 1$$
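The alternating scheme can be sketched as follows. This is a simplified toy version on a quadratic objective, not the paper's implementation: the hyperparameter values, the plain gradient step (the paper uses a family of SGD variants), and the selection details are illustrative assumptions. The offspring rule follows the recombination-plus-noise formula above, and selecting the $\mu$ fittest from parents ∪ offspring gives the elitist guarantee:

```python
import random

random.seed(0)
DIM = 10

def fitness(theta):
    # Stand-in for the empirical risk R_n(θ); a simple quadratic here.
    return sum(x * x for x in theta)

def sgd_step(theta, lr=0.05):
    # One gradient step for the quadratic above (∇f(θ) = 2θ).
    return [x - lr * 2.0 * x for x in theta]

def esgd(mu=8, lam=16, rho=4, generations=20, sgd_steps=5, sigma=0.1):
    population = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(mu)]
    for _ in range(generations):
        # SGD phase: every individual takes a few gradient steps.
        for _ in range(sgd_steps):
            population = [sgd_step(theta) for theta in population]
        # Evolution phase: each offspring averages ρ random parents
        # and adds Gaussian noise ε ~ N(0, σ²I).
        offspring = []
        for _ in range(lam):
            parents = random.sample(population, rho)
            mean = [sum(p[d] for p in parents) / rho for d in range(DIM)]
            offspring.append([v + random.gauss(0, sigma) for v in mean])
        # Keeping the μ fittest of parents ∪ offspring means the best
        # parents always survive, so the elitist fitness cannot rise.
        population = sorted(population + offspring, key=fitness)[:mu]
    return fitness(population[0])

print(esgd())  # best fitness after 20 generations
```

Because selection is over the union of parents and offspring, the $m$ best individuals of generation $k-1$ are always available at generation $k$, which is exactly why the monotonicity bounds above hold.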

## Experiments

![Experimental results](result2.png)

## References

1. Cui, Xiaodong, Wei Zhang, Zoltán Tüske, and Michael Picheny. “Evolutionary stochastic gradient descent for optimization of deep neural networks.” In Advances in neural information processing systems, pp. 6048-6058. 2018.
2. Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. “Optimization methods for large-scale machine learning.” SIAM Review 60, no. 2 (2018): 223-311.
3. Loshchilov, Ilya. “LM-CMA: An alternative to L-BFGS for large-scale black box optimization.” Evolutionary computation 25, no. 1 (2017): 143-171.
4. Real, Esteban, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V. Le, and Alexey Kurakin. “Large-scale evolution of image classifiers.” In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2902-2911. JMLR.org, 2017.
5. Such, Felipe Petroski, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning.” arXiv preprint arXiv:1712.06567 (2017).
6. Salimans, Tim, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. “Evolution strategies as a scalable alternative to reinforcement learning.” arXiv preprint arXiv:1703.03864 (2017).
7. Zhang, Xingwen, Jeff Clune, and Kenneth O. Stanley. “On the relationship between the OpenAI evolution strategy and stochastic gradient descent.” arXiv preprint arXiv:1712.06564 (2017).