Accelerating Google's Temporal Fusion Transformer in TensorFlow 2.0

by Martin Holecek, Amp X Machine Learning

Deep learning has conclusively conquered many areas of machine learning, such as image recognition and natural language processing. Time series forecasting, however, is one of the last holdouts: despite deep neural networks having won several battles, the war is not yet decidedly won.

One of the most recent innovations in this area is the Temporal Fusion Transformer (TFT) neural network architecture introduced in Lim et al. 2019, accompanied by the implementation covered here.

TFT brings together several interesting ideas for time series modelling. We wanted to explore the architecture and benchmark it against well-established models such as the SeriesNet architecture by Shen et al. 2018, which is based on the popular WaveNet model by Google's DeepMind.

There are major differences between TFT and SeriesNet, both in architecture and in implementation.

SeriesNet has a convolutional architecture that can operate on whole sequences, producing results as the convolutional window slides over the data set.

The TFT architecture is derived from the Seq2Seq approach (with an encoder and a decoder), so when operating on a sequence it needs specifically prepared (windowed) input. Consequently, the training times of TFT are significantly longer than those of SeriesNet.

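To make this concrete, here is a minimal sketch (our own illustration, not code from either repository) of the kind of windowed input a Seq2Seq-style model consumes: every forecast origin gets its own fixed-length slice of encoder history plus decoder horizon.

    import numpy as np

    def make_windows(series, encoder_len, decoder_len):
        """Slice a (time, features) array into overlapping windows of
        encoder_len past steps plus decoder_len future steps, one
        window per forecast origin."""
        total_len = encoder_len + decoder_len
        windows = [
            series[start:start + total_len]
            for start in range(len(series) - total_len + 1)
        ]
        return np.stack(windows)  # shape: (num_windows, total_len, features)

    # Toy example: 1000 time steps, 3 features, 168-step encoder, 24-step horizon.
    series = np.random.rand(1000, 3).astype("float32")
    print(make_windows(series, encoder_len=168, decoder_len=24).shape)  # (809, 192, 3)

Materialising every window like this multiplies the stored data roughly by the window length, which is why the naive approach is slow and memory-hungry.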

Fortunately, there is a way to build the moving-window algorithm so that it runs in an accelerated manner on the GPU with TensorFlow.

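As an illustrative sketch (not the actual routine from our code), the whole moving window can be expressed as a single TensorFlow op, so it runs on the GPU instead of in a Python loop; tf.signal.frame is one built-in way to do it:

    import tensorflow as tf

    # Hypothetical toy input: 1000 time steps with 3 features each.
    series = tf.random.normal([1000, 3])

    # All overlapping 192-step windows (e.g. 168 encoder + 24 decoder steps),
    # produced by a single op the GPU can execute, instead of a Python loop.
    windows = tf.signal.frame(series, frame_length=192, frame_step=1, axis=0)
    print(windows.shape)  # (809, 192, 3)

Building the windows lazily with tf.data.Dataset.window achieves a similar effect while avoiding materialising all windows in memory at once.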

We have developed, and are sharing here, our solutions in TensorFlow model engineering. We have tried to keep the code as similar to the original as possible while adding the following updates and options to the model:

  • Converting the model from TensorFlow version 1 to 2
  • Creating the model with named inputs (and thus removing the part where it needed to decode all the information from the windows).
  • Creating the model wrapped in vectorizing routines.
  • Propagating the vectorization deeper in the model only where it is needed (the named-input and vectorization ideas are sketched in the example after this list).
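As a rough illustration of the named-input and vectorization points (the layer stack, names and shapes below are ours and deliberately simplified, not those of the actual TFT code):

    import tensorflow as tf

    # Named inputs: each feature group arrives under its own key, so the model
    # no longer has to decode everything out of one packed window tensor.
    inputs = {
        "observed": tf.keras.Input(shape=(192, 3), name="observed"),
        "known_future": tf.keras.Input(shape=(192, 2), name="known_future"),
        "static": tf.keras.Input(shape=(4,), name="static"),
    }

    x = tf.keras.layers.Concatenate(axis=-1)(
        [inputs["observed"], inputs["known_future"]])
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Concatenate()([x, inputs["static"]])
    outputs = tf.keras.layers.Dense(24, name="forecast")(x)

    core_model = tf.keras.Model(inputs=inputs, outputs=outputs)

    # Vectorizing wrapper: the inputs carry a leading "window" axis, and
    # tf.vectorized_map applies the per-window core model across that axis.
    def vectorized_forecast(windowed_inputs):
        return tf.vectorized_map(
            lambda one_window: core_model(
                {key: value[tf.newaxis] for key, value in one_window.items()})[0],
            windowed_inputs)

    windowed = {
        "observed": tf.random.normal([809, 192, 3]),
        "known_future": tf.random.normal([809, 192, 2]),
        "static": tf.random.normal([809, 4]),
    }
    print(vectorized_forecast(windowed).shape)  # (809, 24)

Propagating the vectorization deeper means that, instead of wrapping the whole model like this, only the sub-layers that actually need a per-window view are mapped, saving the overhead on parts that are already batch-agnostic.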

(All these steps have corresponding parameterized examples included in the tests provided; we will occasionally mention the routine names here to make it easier to look everything up in the code.)

The re
