The Wash-out Technique in RNN Training

While studying Echo State Networks (ESN) recently, I came across the following source code snippet:

// Sequence lengths
int W = 100; // Wash-out
int T = 2900; // Training
int F = 20; // Testing
int V = 20; // Validation
int J = (int)Math.floor((W + T) / (W + F)) - 1; // Testing pairs
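
To sanity-check the last line: (W + T) / (W + F) is already integer division in Java, so the Math.floor is redundant. The same bookkeeping in Python (a minimal sketch; variable names mirror the snippet):

# Segment lengths, as in the Java snippet above.
W = 100   # wash-out steps: forward pass only, no learning
T = 2900  # training steps
F = 20    # testing steps
V = 20    # validation steps

# How many (wash-out, test) windows of length W + F fit into the
# wash-out + training span, minus one.
J = (W + T) // (W + F) - 1
print(J)  # 3000 // 120 - 1 = 25 - 1 = 24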

As for what the wash-out is for, https://zhuanlan.zhihu.com/p/47825594 gives the following explanation:

Yesterday I was using a PyTorch-based RNN package whose interface documentation says it supports the wash-out technique during training, and whose input/output dimensions differ somewhat from the usual ones:

input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See `torch.nn.utils.rnn.pack_padded_sequence`

washout (batch): number of initial timesteps during which output of the reservoir is not forwarded to the readout. One value per batch's sample.

h_0 (num_layers, batch, hidden_size): tensor containing the initial reservoir's hidden state for each element in the batch. Defaults to zero if not provided.

target (seq_len*batch - washout*batch, output_size): tensor containing the features of the batch's target sequences rolled out along one axis, minus the washouts and the padded values. It is only needed for readout's training in offline mode. Use `prepare_target` to compute it.
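
That last shape is easier to see in code. Below is my own minimal re-implementation of what prepare_target must do according to the docstring (an illustrative sketch, not the library's actual code): drop each sample's first washout[i] target steps and concatenate what remains along one axis.

import torch

def prepare_target_sketch(target, washout):
    # target: (seq_len, batch, output_size); washout: one int per batch sample.
    # Result: (seq_len*batch - sum(washout), output_size), i.e. the targets
    # rolled out along one axis with every wash-out prefix removed.
    kept = [target[washout[i]:, i, :] for i in range(target.size(1))]
    return torch.cat(kept, dim=0)

target = torch.randn(50, 4, 1)               # seq_len=50, batch=4, output_size=1
flat = prepare_target_sketch(target, [10, 10, 10, 10])
print(flat.shape)                            # torch.Size([160, 1]) = (50 - 10) * 4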

This washout argument is the interesting part. The foreign-language materials I checked only mention it in passing, and Chinese-language sources hardly discuss it at all; the only substantial treatment I found is in [1]:

For an RNN with hidden layers, the common approach is to initialize the hidden neuron outputs to zero (or random) values and run the network until the effect of the initial values washes out [16], [17]. Some of the drawbacks in the washout method are:
• The washout period is not fixed and hard to determine a priori, during which the network does not produce a reliable prediction.
• The network may become unstable during the washout period, leading to prolonged or even failed training sessions.
• In the training process, the input sequence used in washout does not contribute to the learning process.

Following its citations leads to [2], which discusses the same issue under

"Trick 3. Handling the Uncertainty of the Initial State"

One of the difficulties with finite unfolding in time is to find a proper initialization for the first state vector of the RNN.

In other words, one difficulty with RNNs is initializing the hidden state. The usual choice is simply to set the initial state to zero; the refinement is to feed a prefix of the input sequence as a wash-out: these inputs are only forward-propagated and take no part in training. It is rather like the step in a high-school chemistry experiment where, before weighing a reagent, you first rinse the container with that very reagent, hence the name "wash-out". Two practical points follow. First, [1] notes that inputs consumed by the wash-out cannot be reused for training, which is why the target size in the interface above subtracts washout*batch; set the wash-out ratio according to the network size and the amount of data you have. Second, the network's outputs during the wash-out period are unreliable and should be discarded.
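
In plain PyTorch, the whole technique fits in a few lines: run the recurrence over the entire sequence, but leave the first W timesteps out of the loss, so the arbitrary initial state only affects steps that are never trained on. A minimal sketch, assuming a vanilla nn.RNN and one fixed wash-out length W for the whole batch (the interface quoted above allows one value per sample):

import torch
import torch.nn as nn

W = 20                                    # wash-out length (an assumed value)
rnn = nn.RNN(input_size=3, hidden_size=32)
readout = nn.Linear(32, 1)

x = torch.randn(100, 8, 3)                # (seq_len, batch, input_size)
y = torch.randn(100, 8, 1)                # targets, same layout

h0 = torch.zeros(1, 8, 32)                # arbitrary initial state (here: zeros)
states, _ = rnn(x, h0)                    # forward pass over the WHOLE sequence
pred = readout(states)

# The first W steps only serve to wash out the arbitrary h0: they are
# forward-propagated but contribute nothing to the loss or the gradients.
loss = nn.functional.mse_loss(pred[W:], y[W:])
loss.backward()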

 

[1] Mohajerin, N., & Waslander, S. L. (n.d.). Multi-Step Prediction of Dynamic Systems with Recurrent Neural Networks, 1–13.

[2] Neuneier, R., & Zimmermann, H. G. (1998). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, Vol. 1524. Springer.
