1. Introduction
Approach: based on the idea of evolutionary computation [21], we propose a competitive random search (CRS), instead of a gradient-based method, to solve for the attention-layer weights.
Why introduce CRS: it changes the search direction, which helps avoid falling into a local optimum.
Operations on the GA:
- Crossover: in particular, the improved crossover operator integrates more stochastic mechanisms to maintain the differences between offspring individuals. Purpose: to avoid premature convergence of the algorithm and being trapped in a local optimum.
- Mutation: the basic bit mutation operator is used; it randomly inverts one or several gene values at loci of a single encoded string according to the mutation rate.
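The two operators above can be sketched on bit-string chromosomes as follows. This is a minimal illustration, not the paper's implementation: the crossover shown is a plain single-point crossover, because these notes do not specify the extra stochastic mechanisms of the improved operator. The function names `single_point_crossover` and `bit_mutation` are hypothetical.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Plain single-point crossover: swap the tails of two bit strings after a random cut."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must be equal-length strings of at least 2 genes")
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def bit_mutation(chromosome, mutation_rate, rng):
    """Basic bit mutation: flip each gene independently with probability mutation_rate."""
    return [1 - g if rng.random() < mutation_rate else g for g in chromosome]

rng = random.Random(42)
child_a, child_b = single_point_crossover([0] * 8, [1] * 8, rng)
mutant = bit_mutation(child_a, mutation_rate=0.1, rng=rng)
```

Applying the mutation rate per gene means a string may have zero, one, or several genes inverted in one call, matching the "one or several gene values" description.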
Key concepts from the introduction:
- Genetic algorithms
2. Preliminaries
Time-series data are used for both classification and regression.
The dataset is $X=(X_1, X_2, \dots, X_T)$, where each $X_t=(x_t^1, x_t^2, \dots, x_t^L)$ represents $L$ timestamps, and the output value at each time step is denoted $y$.
Whether the task is classification or regression depends on whether $y$ is discrete or continuous.
Goal: using the input and output data of historical time steps, find a mapping function that produces the prediction $\widetilde{y}_T$:
$$\widetilde{y}_T=f(X, y)$$
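To make the data layout concrete, here is a minimal sketch that cuts a univariate series into windows $X_t$ of $L$ consecutive values. It assumes a one-step-ahead setting in which the target is the value immediately after each window; the helper `make_windows` and this target alignment are illustrative assumptions, not taken from the paper.

```python
def make_windows(series, window_len):
    """Cut a univariate series into (X_t, y_t) pairs for one-step-ahead prediction.

    X_t holds window_len (= L) consecutive values; y_t is the value right after them.
    """
    if window_len < 1 or window_len >= len(series):
        raise ValueError("need 1 <= window_len < len(series)")
    n = len(series) - window_len
    X = [series[t:t + window_len] for t in range(n)]
    y = [series[t + window_len] for t in range(n)]
    return X, y

X, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], window_len=3)
# X == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], y == [4.0, 5.0]
```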
3. Methodology
3.1 Overview
Structure of this section:
- We first give an overview of the proposed model.
- We then detail the evolutionary attention-based LSTM.
- Finally, we present the competitive random search and a collaborative training mechanism.
The workflow is as follows:
3.2 Overall algorithm flow
- Define the attention-layer weights as
$W=(W^1, W^2, \dots, W^L)$,
where $L$ is the number of timestamps. The inputs of the LSTM layer are weighted (sampled) according to the attention-layer weights:
$\widetilde{X}_t=(x_t^1W^1, x_t^2W^2, \dots, x_t^LW^L)$
- Then feed $\widetilde{X}_t$ into the LSTM layer, which is computed as:
- The authors take $h^{t-1}$ as the output $\widetilde{y}^t$ and stack the outputs into a matrix:
$\widetilde{y}_T=(\widetilde{y}^1, \widetilde{y}^2, \dots, \widetilde{y}^T)$
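The following sketch chains the steps above: element-wise attention weighting of one input window, then an LSTM recurrence over the weighted values. It assumes a univariate input fed one value per LSTM step and a hidden size of 1, so every gate parameter is a scalar. `attention_weight`, `lstm_forward`, and the parameter names are hypothetical, and the gate equations are the common LSTM formulation rather than ones reproduced from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attention_weight(x_t, w):
    """Element-wise weighting: X~_t = (x_t^1 W^1, ..., x_t^L W^L)."""
    if len(x_t) != len(w):
        raise ValueError("x_t and W must both have L entries")
    return [x * wj for x, wj in zip(x_t, w)]

def lstm_forward(inputs, p):
    """Run a hidden-size-1 LSTM over a sequence of scalars and return every hidden state.

    p maps hypothetical names to scalars: input weight w_*, recurrent weight u_*,
    and bias b_* for the input (i), forget (f), output (o) gates and candidate (g).
    """
    h, c = 0.0, 0.0
    hs = []
    for x in inputs:
        i = sigmoid(p["w_i"] * x + p["u_i"] * h + p["b_i"])    # input gate
        f = sigmoid(p["w_f"] * x + p["u_f"] * h + p["b_f"])    # forget gate
        o = sigmoid(p["w_o"] * x + p["u_o"] * h + p["b_o"])    # output gate
        g = math.tanh(p["w_g"] * x + p["u_g"] * h + p["b_g"])  # candidate cell value
        c = f * c + i * g          # update the cell state
        h = o * math.tanh(c)       # new hidden state
        hs.append(h)
    return hs

params = {f"{k}_{gate}": 0.5 for k in ("w", "u", "b") for gate in ("i", "f", "o", "g")}
x_tilde = attention_weight([0.2, 0.4, 0.6], [1.0, 0.5, 0.0])
hidden = lstm_forward(x_tilde, params)
```

A weight of 0 removes its timestamp's value from the LSTM input entirely, which is how the attention weights decide which timestamps matter. Stacking the returned hidden states gives a vector analogous to $\widetilde{y}_T$.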
3.3 Competitive random search
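- Binary-encode the weights from part a; the weights of each individual $W_i$ are passed to part b, and a genetic algorithm selects the most suitable weight combination.
Note: not all weights are used; only the most suitable ones are selected. This differs from what I first expected: I thought the genetic algorithm trained the attention weights, but it only finds which weights are suitable, so it effectively performs a screening step. The selected weights are then fed into the LSTM network, which is trained according to the error.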
2. Then repeat step c.
3. Finally, construct a new population.
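The encode, select, crossover, mutate, and repeat cycle described above can be sketched as an elitist genetic algorithm over binary-encoded attention weights. This is a toy under stated assumptions, not the paper's CRS: each weight is quantized to `N_BITS = 4` bits in [0, 1], selection simply keeps the `elite` lowest-error individuals, and `toy_error` is a hypothetical stand-in for the LSTM's error (here, squared distance to a made-up target vector).

```python
import random

N_BITS = 4                   # bits per attention weight (assumption)
LEVELS = (1 << N_BITS) - 1   # largest integer one gene can encode

def encode(weights):
    """Quantize each weight in [0, 1] to N_BITS bits (most significant first) and concatenate."""
    bits = []
    for w in weights:
        q = round(min(max(w, 0.0), 1.0) * LEVELS)  # clip to [0, 1], then quantize
        bits.extend((q >> k) & 1 for k in reversed(range(N_BITS)))
    return bits

def decode(bits):
    """Inverse of encode: read N_BITS per gene and map the integer back to [0, 1]."""
    weights = []
    for s in range(0, len(bits), N_BITS):
        q = 0
        for b in bits[s:s + N_BITS]:
            q = (q << 1) | b
        weights.append(q / LEVELS)
    return weights

def toy_error(weights, target):
    """Hypothetical stand-in for the LSTM's error under these attention weights."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(target, pop_size=20, generations=40, mutation_rate=0.02, elite=4, seed=0):
    """Elitist GA over binary-encoded weights; returns the best decoded weight vector."""
    rng = random.Random(seed)
    length = N_BITS * len(target)
    pop = [encode([rng.random() for _ in target]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: toy_error(decode(ind), target))  # lower error = fitter
        parents = pop[:elite]                 # survivors of the competition
        children = [p[:] for p in parents]    # elites are copied unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)     # two distinct elite parents
            cut = rng.randint(1, length - 1)
            child = a[:cut] + b[cut:]         # single-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit mutation
            children.append(child)
        pop = children                        # construct the new population
    best = min(pop, key=lambda ind: toy_error(decode(ind), target))
    return decode(best)

print(evolve([0.2, 0.8, 0.4]))
```

Because the elites are copied unchanged into each new population, the best error never increases from one generation to the next. Replacing `toy_error` with a real validation loss would make each fitness evaluation an LSTM training run, which is where most of the cost would go.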