Tsinghua University's LightGrad-TTS, with a streaming implementation


Paper:

https://arxiv.org/abs/2308.16569

Code:

https://github.com/thuhcsi/LightGrad

Dataset support:

Training scripts are provided for BZNSYP and LJSpeech.


The paper identifies two problems with Grad-TTS:

  1. DPMs are not lightweight enough for resource-constrained devices.

  2. DPMs require many denoising steps in inference, which increases latency.

Proposed solutions:

  1. To reduce model parameters, the regular convolutions in the diffusion decoder are replaced with depthwise separable convolutions.

  2. To accelerate inference, a training-free fast sampling technique for DPMs (DPM-Solver) is adopted.

  3. Streaming inference is also implemented in LightGrad to further reduce latency.
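The parameter savings from solution 1 can be sketched with a quick count. The channel and kernel sizes below are illustrative, not LightGrad's actual configuration:

```python
def conv1d_params(c_in, c_out, k, bias=True):
    """Parameter count of a regular 1-D convolution: every output
    channel has one k-tap filter per input channel."""
    return c_in * c_out * k + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Depthwise separable variant: a depthwise conv (one k-tap filter
    per input channel) followed by a 1x1 pointwise conv that mixes
    channels. Parameters grow as c_in*k + c_in*c_out instead of
    c_in*c_out*k."""
    depthwise = c_in * k + (c_in if bias else 0)
    pointwise = c_in * c_out * 1 + (c_out if bias else 0)
    return depthwise + pointwise

if __name__ == "__main__":
    # Hypothetical decoder-like layer: 256 channels in/out, kernel size 3.
    regular = conv1d_params(256, 256, 3)
    separable = depthwise_separable_params(256, 256, 3)
    print(regular, separable, f"{1 - separable / regular:.1%} fewer params")
```

Even for this toy layer the separable form cuts roughly two thirds of the parameters, which is consistent in spirit with the paper's overall reduction figure.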


Compared with Grad-TTS, LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency while preserving comparable speech quality on both Mandarin Chinese and English with 4 denoising steps.

LightGrad's streaming scheme (based on a Samsung paper):

Paper:

https://arxiv.org/abs/2111.09052

Implementation details:

  1. The decoder input is chopped into chunks at phoneme boundaries, so that each chunk covers several consecutive phonemes and chunk lengths stay within a predefined range.

  2. To give the decoder context information, the last phoneme of the previous chunk and the first phoneme of the following chunk are padded to the head and tail of the current chunk.

  3. The decoder then generates a mel-spectrogram for each padded chunk.

  4. Finally, the mel-spectrogram frames corresponding to the padded phonemes are removed, reversing the changes to each chunk.
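The four steps above can be sketched as follows. This is a simplified illustration: the chunking policy and the toy decoder (one mel frame per phoneme) are assumptions, since the real model tracks per-phoneme durations and emits many frames per phoneme:

```python
def chunk_phonemes(phonemes, max_len=4):
    """Step 1: chop the phoneme sequence into chunks at phoneme
    boundaries, each at most max_len phonemes long. (The paper also
    enforces a minimum chunk length; omitted here for brevity.)"""
    chunks, i = [], 0
    while i < len(phonemes):
        chunks.append(phonemes[i:i + max_len])
        i += max_len
    return chunks

def pad_with_context(chunks):
    """Step 2: prepend the last phoneme of the previous chunk and
    append the first phoneme of the next chunk, so the decoder sees
    cross-chunk context. Returns (n_head, padded_chunk, n_tail)."""
    padded = []
    for j, chunk in enumerate(chunks):
        head = [chunks[j - 1][-1]] if j > 0 else []
        tail = [chunks[j + 1][0]] if j + 1 < len(chunks) else []
        padded.append((len(head), head + chunk + tail, len(tail)))
    return padded

def decode_streaming(chunks, decoder):
    """Steps 3-4: decode each padded chunk, then drop the frames that
    belong to the padded context phonemes and concatenate the rest."""
    out = []
    for n_head, padded_chunk, n_tail in pad_with_context(chunks):
        frames = decoder(padded_chunk)  # toy: one frame per phoneme
        out.extend(frames[n_head:len(frames) - n_tail])
    return out

if __name__ == "__main__":
    toy_decoder = lambda ph: [f"mel({p})" for p in ph]
    phonemes = list("abcdefgh")
    print(decode_streaming(chunk_phonemes(phonemes), toy_decoder))
```

Because each padded chunk overlaps its neighbors by one phoneme on each side, the trimmed outputs concatenate back into exactly one frame sequence for the original phoneme string, with no duplicated or missing frames.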

