Reproducing the paper END-TO-END CODE-SWITCHED TTS WITH MIX OF MONOLINGUAL RECORDINGS: understanding the paper and the code, plus experimental results.

Could you show us the samples? By the way, you should change the mel loss function to MAE and check the alignment again.
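A quick sketch of the suggested change, assuming the loss is averaged over a predicted and a target mel spectrogram (shapes here are made up):

```python
import numpy as np

def mel_mse(pred, target):
    # L2 / MSE loss: penalizes large errors heavily, tends to oversmooth mels
    return float(np.mean((pred - target) ** 2))

def mel_mae(pred, target):
    # L1 / MAE loss: more robust to outlier frames, often gives sharper mels
    return float(np.mean(np.abs(pred - target)))

# toy mel spectrograms: (frames, mel_bins)
pred = np.zeros((4, 3))
target = np.ones((4, 3)) * 2.0
print(mel_mse(pred, target))  # 4.0
print(mel_mae(pred, target))  # 2.0
```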

These plots show that BahdanauMonotonicAttention performs better.

What are the advantages of Location Sensitive Attention?

Maybe it is better to let the network learn without any monotonic pressure. However, https://arxiv.org/abs/1803.09047 claims to use GMM attention on Tacotron and obtain better results, especially for longer sequences.


Do you have a change related to guided attention?

I am thinking of using phone duration information to generate the guided attention targets for training. Right, the durations only provide a "reference value", so the network should not trust them completely. Design the network accordingly.
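A minimal numpy sketch of the usual guided-attention penalty (the soft diagonal mask from DC-TTS). The duration-based variant above would replace the linear n/N position with cumulative phone durations; that part is left as an idea here:

```python
import numpy as np

def guided_attention_weights(n_text, n_frames, g=0.2):
    # Soft diagonal penalty matrix: W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).
    # Near-diagonal (roughly linear-in-time) alignments are penalized little;
    # attention far from the diagonal is penalized with weight close to 1.
    n = np.arange(n_text)[:, None] / n_text
    t = np.arange(n_frames)[None, :] / n_frames
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g * g))

def guided_attention_loss(alignment, weights):
    # alignment: (n_text, n_frames) attention matrix from the decoder
    return float(np.mean(alignment * weights))

W = guided_attention_weights(50, 200)
# the penalty vanishes on the diagonal and saturates far away from it
print(W[25, 100], W[0, 199])
```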


Can you provide the code for GMM attention? I cannot find a working version anywhere that gives good alignments.

I don't have it anymore either; I ditched it completely. You can pick it out of the "voice loop" repo.
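Since a working version is hard to find, here is a hedged numpy sketch of Graves-style GMM attention, the mechanism such repos implement; the parameter shapes and the use of exp (rather than softplus) for the transforms are assumptions:

```python
import numpy as np

def gmm_attention_step(params, kappa_prev, enc_len):
    # One decoder step of Graves (2013) GMM attention.
    # params: (K, 3) unnormalized [omega_hat, beta_hat, kappa_hat] per mixture,
    # normally produced by a small projection of the decoder state.
    omega = np.exp(params[:, 0])               # mixture weights
    beta = np.exp(params[:, 1])                # inverse widths
    kappa = kappa_prev + np.exp(params[:, 2])  # means only ever move forward
    j = np.arange(enc_len)[None, :]            # encoder positions
    # phi[j] = sum_k omega_k * exp(-beta_k * (kappa_k - j)^2)
    phi = (omega[:, None] * np.exp(-beta[:, None] * (kappa[:, None] - j) ** 2)).sum(0)
    return phi, kappa

# two decoder steps with a single mixture: kappa advances by exp(0) = 1 per step
kappa = np.zeros(1)
phi1, kappa = gmm_attention_step(np.zeros((1, 3)), kappa, enc_len=10)
phi2, kappa = gmm_attention_step(np.zeros((1, 3)), kappa, enc_len=10)
print(phi1.argmax(), phi2.argmax())  # attention peak moves forward: 1 2
```

Because kappa is a cumulative sum of positive increments, the attention window can only slide forward, which is why this mechanism tends to stay monotonic on long sequences.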

FORWARD ATTENTION IN SEQUENCE-TO-SEQUENCE ACOUSTIC MODELING FOR SPEECH SYNTHESIS

https://github.com/geneing/WaveRNN-Pytorch   Fast WaveRNN

https://github.com/mozilla/TTS/blob/master/notebooks/Benchmark.ipynb

“On-line and Linear-Time Attention by Enforcing Monotonic Alignments”

In machine learning, is there any work on adding priors to attention mechanisms, or any special initialization methods for them?

As the title says: in some problems the attention follows a fairly obvious pattern; for example, in machine translation some language pairs have nearly identical word order. In such cases, can we give the attention read/write head an appropriate prior so that the network converges faster?

Answering my own question, because today I happened to see a paper, already accepted at ICML 2017:

Online and Linear-Time Attention by Enforcing Monotonic Alignments

Search for this title and you can probably find the corresponding architecture. (1)

The gist: use a coin flip to decide whether to keep moving forward, and pick only a single encoder state as the context each time, so the attention makes exactly one left-to-right pass over the encoder.
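The sampling process just described can be sketched like this (a simplification of hard monotonic attention; stop_probs would come from the energy function in the real model):

```python
import numpy as np

def hard_monotonic_step(stop_probs, prev_pos, rng):
    # Scan forward from the previous position; at each encoder index,
    # flip a coin with probability stop_probs[i] to stop and attend there.
    for i in range(prev_pos, len(stop_probs)):
        if rng.random() < stop_probs[i]:
            return i  # context = this single encoder state
    return len(stop_probs) - 1  # never stopped: attend to the last state

rng = np.random.default_rng(0)
pos = 0
positions = []
for _ in range(5):
    pos = hard_monotonic_step(np.full(8, 0.5), pos, rng)
    positions.append(pos)
# positions never decrease: a single left-to-right pass over the encoder
print(positions)
```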

Use it as an approximation first:

Attention comes in two flavors, content-based and location-based; I think location-based attention is very close to the prior you describe.
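A tiny numpy sketch of how location-based (location-sensitive) features act as such a prior: the previous alignment is convolved with a kernel and fed into the attention energy, so the model knows where it just attended. The kernel here is a fixed average; in the real model it is learned:

```python
import numpy as np

def location_features(prev_alignment, kernel):
    # Convolve the previous alignment with a 1-D kernel; the result is added
    # as an input to the attention energy, giving the model a positional prior
    # around the positions it attended to at the last decoder step.
    return np.convolve(prev_alignment, kernel, mode="same")

prev = np.zeros(10)
prev[3] = 1.0                               # last step attended position 3
feats = location_features(prev, np.ones(3) / 3)
# nonzero only around position 3, so the energy "sees" where attention just was
print(np.round(feats, 3))
```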

Reference: http://papers.nips.cc/paper/58

Start writing code: LDE

determined by the language boundary information in the CS text. 

"performing discriminative code lookup": for the speaker id, implement an approximation first. Is there a method that allows differentiated initialization or differentiated lookup?


This design enables the generated speech in a single speaker’s voice. The language embedding and discriminative embedding are jointly learned with the model by back-propagation.  This is also a good entry point.


The discriminative embedding is obtained by performing discriminative code lookup, and is concatenated with the previous time-step decoder output and context information before being sent to the decoder RNN.  On this point, the original paper and the common reading differ; this version of the code runs the original Tacotron-2, not the Tacotron-2 as Microsoft understands it.
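A small numpy sketch of the concatenation described in that sentence; all dimensions are illustrative, and disc_table is a hypothetical stand-in for the learned code table:

```python
import numpy as np

n_codes, embed_dim = 4, 8                          # illustrative sizes
disc_table = np.random.randn(n_codes, embed_dim)   # learned code table (stand-in)

def decoder_rnn_input(prev_output, context, code_id):
    # "discriminative code lookup": index the table with the code id, then
    # concatenate with the previous decoder output and the attention context
    # before the result is fed to the decoder RNN.
    disc_embed = disc_table[code_id]
    return np.concatenate([prev_output, context, disc_embed])

# e.g. an 80-dim mel frame and a 128-dim context vector
x = decoder_rnn_input(np.zeros(80), np.zeros(128), code_id=1)
print(x.shape)  # (216,)
```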


https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn  The paper is not written clearly, so I concatenated it according to my own understanding; there was actually also a bug in the init. (2)

The difference between np.zeros() and a Python list kept causing errors:  return array(a, dtype, copy=False, order=order) ValueError: setting an array element with a sequence.
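A minimal reproduction of that error: it appears when np.array is asked to build a rectangular array from rows of unequal length. Preallocating with np.zeros and filling row by row avoids it:

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]            # rows of unequal length
try:
    np.array(ragged, dtype=np.float32)  # triggers the ValueError above
except ValueError as e:
    print("ValueError:", e)

# fix: preallocate a padded array with np.zeros and copy each row in
out = np.zeros((2, 3), dtype=np.float32)
for i, row in enumerate(ragged):
    out[i, :len(row)] = row
print(out)
```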


It feels like a decoder step is missing!!! Don't change it for now; wait for the results, then change it. I can't make sense of it; maybe there is no bug after all. (3)

In the file Architecture_wrappers.py:


https://github.com/begeekmyfriend?tab=repositories  Dig into their work.

https://github.com/fatchord?tab=repositories  And his as well.

https://github.com/r9y9/gantts  VAE, another possible path.


Tacotron: Advanced attention module (e.g. Monotonic attention) #13

https://github.com/mozilla/TTS/issues/13

https://github.com/mozilla/TTS



Guided Attention Loss #346

https://github.com/Rayhane-mamah/Tacotron-2/issues/346


http://itjcc.com/1172/html  Cracking UltraEdit 26. Once I get a salary I will definitely pay for it.

https://blog.csdn.net/xiliuhu/article/details/5757305   Setting up multiple windows in UltraEdit.

Collect statistics on how often the attention is monotonic vs. non-monotonic when no constraint is applied, then add the monotonicity requirement. These are really two separate routes, and both can be justified; meanwhile, use the si-monotonic signal to guide the attention without overriding it.
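A possible sketch of the counting step, treating an alignment as monotonic when its per-frame argmax never moves backward (one reasonable definition among several):

```python
import numpy as np

def is_monotonic(alignment):
    # alignment: (n_frames, n_text) attention matrix; call it monotonic
    # if the attended text position never moves backward over time
    peaks = alignment.argmax(axis=1)
    return bool(np.all(np.diff(peaks) >= 0))

def monotonicity_stats(alignments):
    # count (monotonic, non-monotonic) utterances in a batch of alignments
    flags = [is_monotonic(a) for a in alignments]
    return sum(flags), len(flags) - sum(flags)

good = np.eye(4)                 # perfect diagonal alignment
bad = np.eye(4)[[0, 2, 1, 3]]    # jumps back from text position 2 to 1
print(monotonicity_stats([good, bad]))  # (1, 1)
```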


Build the training datasets and scripts based on LJSpeech-1.1 and the Databaker (标贝) corpus

1. grapheme


Whether to rescale audio prior to preprocessing: I can't figure out this parameter.

rescale = False, #Whether to rescale audio prior to preprocessing
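As I understand it, in the Rayhane-mamah Tacotron-2 preprocessing this flag peak-normalizes each waveform before feature extraction; a sketch of that interpretation:

```python
import numpy as np

def rescale_wav(wav, rescaling_max=0.999):
    # Peak-normalize the waveform so its largest absolute sample equals
    # rescaling_max; this guards against clipping and evens out level
    # differences between recordings.
    return wav / np.abs(wav).max() * rescaling_max

wav = np.array([0.1, -0.5, 0.25])
out = rescale_wav(wav)
print(np.abs(out).max())  # 0.999
```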


#M-AILABS (and other datasets) trim params
    trim_fft_size = 512,
    trim_hop_size = 128,
    trim_top_db = 60,

I don't understand these parameters either.
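For reference, a rough numpy sketch of what librosa.effects.trim does with these parameters (the real implementation differs in framing and centering details): frame the signal, measure per-frame energy in dB relative to the loudest frame, and cut leading and trailing frames quieter than trim_top_db below it.

```python
import numpy as np

def trim_silence(wav, trim_top_db=60, trim_fft_size=512, trim_hop_size=128):
    # Per-frame RMS over windows of trim_fft_size samples, hopped by
    # trim_hop_size; frames more than trim_top_db below the peak are "silence".
    n_frames = max(1, 1 + (len(wav) - trim_fft_size) // trim_hop_size)
    rms = np.array([
        np.sqrt(np.mean(wav[i * trim_hop_size : i * trim_hop_size + trim_fft_size] ** 2))
        for i in range(n_frames)
    ])
    db = 20 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    keep = np.where(db > -trim_top_db)[0]
    if len(keep) == 0:
        return wav[:0]
    start = keep[0] * trim_hop_size
    end = min(len(wav), keep[-1] * trim_hop_size + trim_fft_size)
    return wav[start:end]

# silence - tone - silence: trimming removes most of the leading/trailing silence
wav = np.concatenate([np.zeros(2048), 0.5 * np.sin(np.arange(2048)), np.zeros(2048)])
print(len(trim_silence(wav)), "<", len(wav))
```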

How to use sox:

https://blog.csdn.net/centnetHY/article/details/88571352

Batch_Size = 32 => 16, due to insufficient GPU memory.

 watch -n 10 nvidia-smi

As for SPE, the code is easy to write:

All that's left is organizing the data and the experimental results, then publishing a demo web page.
