The Wash-out Technique in RNN Training

While recently studying Echo State Networks (ESN), I ran into the following fragment of source code:

// Sequence lengths
int W = 100; // Wash-out
int T = 2900; // Training
int F = 20; // Testing
int V = 20; // Validation
int J = (int)Math.floor((W + T) / (W + F)) - 1; // Testing pairs
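Plugging the numbers in: J = (100 + 2900) / (100 + 20) - 1 = 3000 / 120 - 1 = 24. Judging only from the variable names (the source does not say), the W + T span is presumably carved into windows of W + F steps, each yielding one wash-out/test pair, with one window dropped at the end. Note also that the `(int)Math.floor(...)` is redundant in Java, since `/` on two `int`s already truncates.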

As for what wash-out is for, https://zhuanlan.zhihu.com/p/47825594 explains it as follows:

A PyTorch-based RNN package I used yesterday states in its interface documentation that it supports the wash-out technique, and its input/output dimensions differ a bit from the usual ones:

input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See `torch.nn.utils.rnn.pack_padded_sequence`

washout (batch): number of initial timesteps during which output of the reservoir is not forwarded to the readout. One value per batch's sample.

h_0 (num_layers, batch, hidden_size): tensor containing the initial reservoir's hidden state for each element in the batch. Defaults to zero if not provided.

target (seq_len*batch - washout*batch, output_size): tensor containing the features of the batch's target sequences rolled out along one axis, minus the washouts and the padded values. It is only needed for readout's training in offline mode. Use `prepare_target` to compute it.
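The docstring above matches the pytorch-esn package (https://github.com/stefanonardo/pytorch-esn), so as a concrete illustration, here is a minimal sketch of how these tensors fit together, written from memory of that project's README; the sizes are invented for the example and the API may have changed, so treat it as a sketch rather than authoritative usage:

```python
# Sketch assuming the package is pytorch-esn; names follow its README.
import torch
from torchesn.nn import ESN
from torchesn import utils

seq_len, batch, input_size, hidden_size, output_size = 3000, 1, 1, 500, 1
washout = [100] * batch                        # wash-out steps, one value per sample

trX = torch.rand(seq_len, batch, input_size)   # (seq_len, batch, input_size)
trY = torch.rand(seq_len, batch, output_size)  # raw targets, same layout

# Roll targets out to (seq_len*batch - washout*batch, output_size),
# dropping each sequence's first 100 steps.
flat_trY = utils.prepare_target(trY.clone(), [trX.size(0)], washout)

model = ESN(input_size, hidden_size, output_size)
model(trX, washout, None, flat_trY)   # forward pass accumulates readout statistics
model.fit()                           # offline readout training

out, hidden = model(trX, washout)     # out: wash-out steps already removed
```

Note how `prepare_target` is what removes the first `washout` steps from the flattened targets, matching the `(seq_len*batch - washout*batch, output_size)` shape given above.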

The washout argument itself is the interesting part. The English-language materials I found only mention it in passing, and there is essentially nothing about it in Chinese; the only substantial discussion I found is in [1]:

For an RNN with hidden layers, the common approach is to initialize the hidden neuron outputs to zero (or random) values and run the network until the effect of the initial values washes out [16], [17]. Some of the drawbacks in the washout method are:
• The washout period is not fixed and hard to determine a priori, during which the network does not produce a reliable prediction.
• The network may become unstable during the washout period, leading to prolonged or even failed training sessions.
• In the training process, the input sequence used in washout does not contribute to the learning process.

Following up the references it cites leads to [2], which discusses the same issue:

"Trick 3. Handling the Uncertainty of the Initial State"

One of the difficulties with finite unfolding in time is to find a proper initialization for the first state vector of the RNN.

In other words, one practical difficulty with RNNs is initializing the hidden state. The usual approach is simply to set the initial state to zero; the refinement is to take a prefix of the input sequence as a wash-out, propagating those inputs forward only, without training on them. It is rather like weighing out a reagent in a high-school chemistry experiment, where you first rinse the container with the target reagent itself, hence the name "wash-out". Two caveats. First, as [1] points out, inputs consumed by the wash-out cannot be reused for training, which is why the target size above subtracts washout*batch, so choose the wash-out proportion according to your network size and how much data you have. Second, the network's outputs during the wash-out period are unstable, so discard them.
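As a concrete illustration of this "forward only, no training" idea, here is a minimal sketch in plain PyTorch, deliberately not tied to any ESN package; the model, data, and the wash-out length W = 100 are all made up for the example:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, W = 3000, 1, 1, 64, 100  # W: wash-out steps

rnn = nn.RNN(input_size, hidden_size)
readout = nn.Linear(hidden_size, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()))

x = torch.randn(seq_len, batch, input_size)  # toy input sequence
y = torch.randn(seq_len, batch, 1)           # toy targets

h0 = torch.zeros(1, batch, hidden_size)      # arbitrary (zero) initial state ...
with torch.no_grad():                        # ... washed out by a forward-only prefix
    _, h = rnn(x[:W], h0)

out, _ = rnn(x[W:], h)                       # gradients flow only through these steps
opt.zero_grad()
loss = nn.functional.mse_loss(readout(out), y[W:])
loss.backward()
opt.step()
```

Running the first W steps under `torch.no_grad()` means the wash-out prefix only settles the hidden state; the loss, and hence training, sees only the post-wash-out outputs.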

 

[1] Mohajerin, N., & Waslander, S. L. Multi-Step Prediction of Dynamic Systems with Recurrent Neural Networks, pp. 1–13.

[2] Neuneier, R., & Zimmermann, H. G. (1998). How to Train Neural Networks. In Neural Networks: Tricks of the Trade (Lecture Notes in Computer Science, Vol. 1524). Springer.
