RNN-T models the acoustic and language features jointly, which eliminates the output-independence drawback of the CTC model. Nevertheless, this appealing property comes at the cost of high memory and computation consumption during training. Specifically, the RNN-T loss is computed over a 4-D lattice of shape (N, T, U, V), where N is the batch size, T is the output length of the acoustic encoder, U is the output length of the prediction network, and V is the vocabulary size.
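To make the memory cost of the (N, T, U, V) lattice concrete, the following sketch estimates the size of the joiner's logit tensor for one hypothetical training batch. The specific values of N, T, U, and V are illustrative assumptions, not numbers from the paper.

```python
# Hypothetical batch dimensions (not from the paper):
N, T, U, V = 32, 500, 100, 5000  # batch, encoder frames, label length, vocab
bytes_per_float = 4              # float32 logits

# The RNN-T joiner emits logits over the full 4-D lattice (N, T, U, V).
lattice_elements = N * T * U * V
lattice_gib = lattice_elements * bytes_per_float / 2**30

print(f"logit lattice: {lattice_elements:,} floats "
      f"~ {lattice_gib:.1f} GiB (before gradients/activations)")
```

Even at these moderate sizes the logits alone approach 30 GiB, before counting gradients and intermediate activations, which is why memory-efficient variants of the transducer loss are of practical interest.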