Listen, Attend and Spell | Source Code Walkthrough (PyTorch)


weixin_39558317


Code for this article: https://github.com/AzizCode92/Listen-Attend-and-Spell-Pytorch

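Listen, Attend and Spell (LAS) consists of two main components: a listener that takes in the speech signal, and a speller that emits characters as output. The listener encodes the speech signal $\mathbf{x}$ into a high-level feature representation $\mathbf{h}$:

$$\mathbf{h} = \mathrm{Listen}(\mathbf{x})$$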

Listener

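Compared with text sequences, speech produces very long sequences even after it has been converted to spectrogram features. To speed up training, the paper proposes the pyramidal Bi-LSTM (pBLSTM) structure. In a conventional stacked BLSTM, the output at layer $j$ and time step $i$ is given by:

$$h_i^j = \mathrm{BLSTM}\left(h_{i-1}^j,\; h_i^{j-1}\right)$$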

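In the pBLSTM, the input coming from the layer below is no longer the single output at time step $i$, but the concatenation of the outputs at two consecutive time steps:

$$h_i^j = \mathrm{pBLSTM}\left(h_{i-1}^j,\; \left[h_{2i}^{j-1},\, h_{2i+1}^{j-1}\right]\right)$$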
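The concatenation of consecutive time steps can be expressed as a simple reshape. The following is a minimal sketch, not code from the repository: the function name `concat_adjacent_frames` is hypothetical, and dropping a trailing frame when the length is odd is an assumption made for this example.

```python
import torch


def concat_adjacent_frames(x: torch.Tensor) -> torch.Tensor:
    """Halve the time axis by concatenating each pair of consecutive frames.

    x: (batch, time, features) -> (batch, time // 2, features * 2)
    """
    batch, time, feat = x.shape
    # Assumption of this sketch: drop the last frame if the length is odd.
    x = x[:, : time - time % 2, :].contiguous()
    return x.view(batch, time // 2, feat * 2)


x = torch.arange(2 * 6 * 3, dtype=torch.float32).view(2, 6, 3)
y = concat_adjacent_frames(x)
print(y.shape)  # torch.Size([2, 3, 6])
```

Each output frame holds two input frames side by side, so the feature dimension doubles while the sequence length halves.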

The structure of the pBLSTM is illustrated below:

[Figure: structure of the pBLSTM]

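Three pBLSTM layers are used here. In the diagram above, the sequence length is halved only when passing from the first layer to the second and from the second layer to the third; in practice, the same reduction is also applied when the spectrogram features enter the first pBLSTM layer. The listener's final output sequence is therefore $1/2^3 = 1/8$ of the original input length.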

The PyTorch code implementing the pBLSTM listener is as follows:

class
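As a rough illustration of what such a listener can look like (this is not the repository's actual code), here is a minimal sketch. The class names `PyramidalBLSTM` and `Listener`, the dimensions, and the choice to drop an odd trailing frame are all assumptions made for this example.

```python
import torch
import torch.nn as nn


class PyramidalBLSTM(nn.Module):
    """One pBLSTM layer: concatenate frame pairs, then run a bidirectional LSTM."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # The input size doubles because two consecutive frames are concatenated.
        self.blstm = nn.LSTM(input_dim * 2, hidden_dim,
                             bidirectional=True, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, time, feat = x.shape
        x = x[:, : time - time % 2, :].contiguous()  # drop an odd trailing frame
        x = x.view(batch, time // 2, feat * 2)       # halve the time axis
        out, _ = self.blstm(x)                       # (batch, time // 2, 2 * hidden_dim)
        return out


class Listener(nn.Module):
    """Three stacked pBLSTM layers; the output length is the input length // 8."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.layers = nn.Sequential(
            PyramidalBLSTM(input_dim, hidden_dim),
            PyramidalBLSTM(hidden_dim * 2, hidden_dim),
            PyramidalBLSTM(hidden_dim * 2, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


listener = Listener(input_dim=40, hidden_dim=16)
h = listener(torch.randn(2, 64, 40))
print(h.shape)  # torch.Size([2, 8, 32])
```

With 64 input frames, the three halvings leave 8 frames, matching the 1/8 reduction described above. The feature size is `2 * hidden_dim` because the forward and backward LSTM outputs are concatenated.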

Speller

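The speller predicts the character $y_i$ at each time step in turn. At time step $i$, the decoder takes as input the previous state $s_{i-1}$, the previous character $y_{i-1}$, and the previous context $c_{i-1}$, and from these computes the current state $s_i$:

$$s_i = \mathrm{RNN}(s_{i-1},\, y_{i-1},\, c_{i-1})$$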

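The current state $s_i$ and the acoustic features $\mathbf{h}$ output by the encoder are then used to compute the weighted context:

$$c_i = \mathrm{AttentionContext}(s_i,\, \mathbf{h})$$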

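The computation proceeds as follows:

1. Compute the scalar energy $e_{i,u}$ between $s_i$ and each time step $h_u$ of $\mathbf{h}$.
2. Apply a softmax to $e_{i,u}$ to obtain $\alpha_{i,u}$, the distribution of similarity between $s_i$ and the different time steps of $\mathbf{h}$.
3. Take the weighted sum of $\mathbf{h}$ using $\alpha_{i,u}$ as the weights.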

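The corresponding formulas are:

$$e_{i,u} = \langle \phi(s_i),\, \psi(h_u) \rangle$$

$$\alpha_{i,u} = \frac{\exp(e_{i,u})}{\sum_{u'} \exp(e_{i,u'})}$$

$$c_i = \sum_{u} \alpha_{i,u}\, h_u$$

where $\phi$ and $\psi$ both denote MLPs.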

The PyTorch implementation of the attention is as follows:

class
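The attention computation described above (energy, softmax, weighted sum) can be sketched as below. This is an illustrative example rather than the repository's code: the class name `DotProductAttention`, the projection size `proj_dim`, and the use of a single linear layer with ReLU for each of $\phi$ and $\psi$ are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DotProductAttention(nn.Module):
    """Computes the context c_i from decoder state s_i and listener features h."""

    def __init__(self, state_dim: int, feature_dim: int, proj_dim: int):
        super().__init__()
        # phi and psi: small MLPs (one linear layer + ReLU, an assumption here).
        self.phi = nn.Sequential(nn.Linear(state_dim, proj_dim), nn.ReLU())
        self.psi = nn.Sequential(nn.Linear(feature_dim, proj_dim), nn.ReLU())

    def forward(self, state: torch.Tensor, features: torch.Tensor):
        # state: (batch, state_dim); features: (batch, U, feature_dim)
        query = self.phi(state).unsqueeze(1)                # (batch, 1, proj_dim)
        keys = self.psi(features)                           # (batch, U, proj_dim)
        # e_{i,u} = <phi(s_i), psi(h_u)>
        energy = torch.bmm(query, keys.transpose(1, 2)).squeeze(1)    # (batch, U)
        # alpha_{i,u} = softmax over the U listener time steps
        alpha = F.softmax(energy, dim=-1)
        # c_i = sum_u alpha_{i,u} * h_u
        context = torch.bmm(alpha.unsqueeze(1), features).squeeze(1)  # (batch, feature_dim)
        return context, alpha


attention = DotProductAttention(state_dim=64, feature_dim=32, proj_dim=48)
context, alpha = attention(torch.randn(2, 64), torch.randn(2, 8, 32))
print(context.shape, alpha.shape)  # torch.Size([2, 32]) torch.Size([2, 8])
```

The context has the same dimensionality as the listener features because it is a convex combination of them; each row of `alpha` sums to 1.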

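The character at this time step is then predicted from $s_i$ and $c_i$:

$$P(y_i \mid \mathbf{x}, y_{<i}) = \mathrm{CharacterDistribution}(s_i,\, c_i)$$

where $\mathrm{CharacterDistribution}$ consists of an MLP followed by a softmax.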

The PyTorch implementation of the speller:

class
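To tie the three speller equations together (state update, attention context, character distribution), here is a minimal greedy-decoding sketch. It is not the repository's implementation. The class name `SpellerSketch`, the single `nn.LSTMCell`, the linear layers standing in for $\phi$ and $\psi$, the zero initial state and context, and the `sos_id` start token are all assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpellerSketch(nn.Module):
    def __init__(self, vocab_size: int, feature_dim: int, state_dim: int):
        super().__init__()
        self.vocab_size = vocab_size
        self.state_dim = state_dim
        # s_i = RNN(s_{i-1}, y_{i-1}, c_{i-1}); the input is [one-hot y_{i-1}; c_{i-1}].
        self.rnn = nn.LSTMCell(vocab_size + feature_dim, state_dim)
        # Single linear layers stand in for the MLPs phi and psi (assumption).
        self.phi = nn.Linear(state_dim, state_dim)
        self.psi = nn.Linear(feature_dim, state_dim)
        # CharacterDistribution: an MLP followed by a (log-)softmax.
        self.char_mlp = nn.Sequential(
            nn.Linear(state_dim + feature_dim, state_dim), nn.ReLU(),
            nn.Linear(state_dim, vocab_size))

    def attend(self, s: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        energy = torch.bmm(self.psi(h), self.phi(s).unsqueeze(2)).squeeze(2)  # (B, U)
        alpha = F.softmax(energy, dim=-1)
        return torch.bmm(alpha.unsqueeze(1), h).squeeze(1)  # (B, feature_dim)

    def forward(self, h: torch.Tensor, max_steps: int, sos_id: int = 0) -> torch.Tensor:
        batch = h.size(0)
        s = h.new_zeros(batch, self.state_dim)       # hidden state s_0
        cell = h.new_zeros(batch, self.state_dim)    # LSTM cell state
        c_prev = h.new_zeros(batch, h.size(2))       # zero initial context (assumption)
        start = torch.full((batch,), sos_id, dtype=torch.long, device=h.device)
        y_prev = F.one_hot(start, self.vocab_size).float()
        log_probs = []
        for _ in range(max_steps):
            s, cell = self.rnn(torch.cat([y_prev, c_prev], dim=-1), (s, cell))
            c = self.attend(s, h)
            logp = F.log_softmax(self.char_mlp(torch.cat([s, c], dim=-1)), dim=-1)
            log_probs.append(logp)
            # Greedy decoding: feed the most likely character back in.
            y_prev = F.one_hot(logp.argmax(dim=-1), self.vocab_size).float()
            c_prev = c
        return torch.stack(log_probs, dim=1)  # (B, max_steps, vocab_size)


speller = SpellerSketch(vocab_size=30, feature_dim=32, state_dim=64)
out = speller(torch.randn(2, 8, 32), max_steps=5)
print(out.shape)  # torch.Size([2, 5, 30])
```

During training, the predicted character fed back at each step would typically be replaced (at least some of the time) by the ground-truth previous character; the greedy feedback here only keeps the sketch short.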

@inproceedings{chan2016listen,
  title={Listen, attend and spell: A neural network for large vocabulary conversational speech recognition},
  author={Chan, William and Jaitly, Navdeep and Le, Quoc and Vinyals, Oriol},
  booktitle={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={4960--4964},
  year={2016},
  organization={IEEE}
}
