Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
- R. Skerry-Ryan, Eric Battenberg, +6 authors R. Saurous
International Conference on Machine Learning
- 24 March 2018
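Captures meaningful variation in speech through transfer in a prosodic latent space (i.e., uses a latent representation to make one utterance sound like another).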
Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis
- Younggun Lee, Taesu Kim
ICASSP - IEEE International Conference on…
- 6 November 2018
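Uses fine-grained temporal structure (variable-length prosody embeddings, adjusted by alignment and downsampling) to encode the prosody associated with each phoneme of the input sequence, taken from the aligned target spectrogram. This gives fine-grained prosody control.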
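The downsampling step in the note above, going from frame-level features to one prosody vector per phoneme, can be illustrated with a minimal sketch. It assumes hard per-phoneme frame counts (`durations`) and plain averaging; both are simplifications chosen for illustration and are not the paper's learned alignment. The function name `pool_frames_per_phoneme` is hypothetical.

```python
def pool_frames_per_phoneme(frames, durations):
    """Average frame-level feature vectors over each phoneme's frame span.

    frames:    list of equal-length feature vectors, one per spectrogram frame
    durations: number of frames assigned to each phoneme (assumed given;
               a real system would obtain this from an alignment)
    """
    if sum(durations) != len(frames):
        raise ValueError("durations must sum to the number of frames")
    pooled, start = [], 0
    for d in durations:
        if d <= 0:
            raise ValueError("each phoneme needs at least one frame")
        span = frames[start:start + d]
        dim = len(span[0])
        # Mean of every feature dimension across the phoneme's frames.
        pooled.append([sum(f[i] for f in span) / d for i in range(dim)])
        start += d
    return pooled


# Five frames of 2-dimensional features, split over two phonemes (2 + 3 frames).
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
durations = [2, 3]
phoneme_prosody = pool_frames_per_phoneme(frames, durations)
```

The output has one vector per phoneme, so it can be concatenated with or added to the phoneme encoder states regardless of the reference audio's length.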
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
- C. Chien, Hung-yi Lee
Spoken Language Technology Workshop
- 12 November 2020
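A hierarchical prosody modeling framework in which phoneme-level prosody prediction is conditioned on word-level prosody prediction, combining the advantages of phoneme-level and word-level prosody modeling. Through objective and subjective evaluations, the authors verify that the proposed hierarchical model outperforms the other prosody modeling paradigms they consider.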
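The conditioning idea in the note above can be sketched as a toy example: a word-level prediction is broadcast to the phonemes of each word and fed into the phoneme-level predictor. The linear predictor and its weights `w_feat` and `w_word` are hypothetical placeholders, not the networks used in the paper.

```python
def expand_to_phonemes(word_values, phonemes_per_word):
    """Repeat each word-level value once per phoneme in that word."""
    out = []
    for value, count in zip(word_values, phonemes_per_word):
        out.extend([value] * count)
    return out


def predict_phoneme_prosody(phoneme_feats, word_prosody, phonemes_per_word,
                            w_feat=0.5, w_word=1.0):
    """Toy phoneme-level predictor conditioned on the word-level prediction.

    A real model would use a neural network here; the weighted sum only
    shows where the word-level information enters.
    """
    cond = expand_to_phonemes(word_prosody, phonemes_per_word)
    if len(cond) != len(phoneme_feats):
        raise ValueError("phoneme counts do not match phoneme features")
    return [w_word * c + w_feat * f for f, c in zip(phoneme_feats, cond)]


# Two words with 2 and 3 phonemes; scalar prosody values for simplicity.
word_prosody = [1.0, -1.0]
phonemes_per_word = [2, 3]
phoneme_feats = [0.0, 2.0, 0.0, -2.0, 4.0]
phoneme_prosody = predict_phoneme_prosody(
    phoneme_feats, word_prosody, phonemes_per_word
)
```

Every phoneme in a word shares the same word-level context, while the phoneme features still allow variation within the word.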
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
- Yuxuan Wang, Daisy Stanton, +7 authors R. Saurous
International Conference on Machine Learning
- 23 March 2018
The paper proposes "global style tokens" (GST), which learn to model a wide range of acoustic expressiveness.
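As a rough illustration of the token idea, a style embedding can be formed as an attention-weighted sum over a small bank of token vectors. The sketch below uses single-head scaled dot-product attention with hand-picked tokens and a query of the same dimension. These are simplifying assumptions for illustration, and the names `softmax` and `style_embedding` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query, tokens):
    """Return (attention weights, weighted sum of tokens) for one query.

    query:  reference-encoder output (assumed to share the token dimension)
    tokens: bank of style token vectors (learned in a real model)
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    dim = len(tokens[0])
    embedding = [sum(w * tok[i] for w, tok in zip(weights, tokens))
                 for i in range(dim)]
    return weights, embedding


tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy token bank
query = [2.0, 0.0]                              # toy reference encoding
weights, embedding = style_embedding(query, tokens)
```

At synthesis time the weights can also be set by hand instead of computed from a reference, which is one way such a token bank gives direct style control.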
Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis
- Ya-Jie Zhang, Shifeng Pan, Lei He, Zhenhua Ling
ICASSP - IEEE International Conference on…
- 11 December 2018
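The paper introduces a variational autoencoder (VAE) into an end-to-end speech synthesis model to learn latent representations of speaking style in an unsupervised manner. The style representations learned by the VAE show good properties such as disentangling, scaling, and combination, which makes style control easy.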
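The VAE mechanics behind the note above can be sketched with the standard reparameterization trick and KL term, plus two simple latent manipulations that correspond to "scaling" and "combination". The helper names `scale_dim` and `interpolate` are hypothetical, and linear interpolation is only one assumed way of combining styles.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def scale_dim(z, dim, factor):
    """Scale a single latent dimension, leaving the others unchanged."""
    return [v * factor if i == dim else v for i, v in enumerate(z)]


def interpolate(z_a, z_b, alpha):
    """Linear blend of two style latents; alpha=0 gives z_a, alpha=1 gives z_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, 0.0], rng)   # one style sample
stronger = scale_dim(z, 1, 2.0)                   # exaggerate dimension 1
blended = interpolate([0.0, 2.0], [2.0, 0.0], 0.5)
```

If individual latent dimensions are disentangled, scaling one of them changes a single style attribute, and interpolation mixes two reference styles.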
Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation
- Guangyan Zhang, Ying Qin, Tan Lee
Interspeech
- 25 October 2020
Interactive Multi-Level Prosody Control for Expressive Speech Synthesis
- Tobias Cornille, Fengna Wang, Jessa Bekker
ICASSP - IEEE International Conference on…
- 23 May 2022
Fine-grained robust prosody transfer for single-speaker neural text-to-speech
- V. Klimkov, S. Ronanki, J. Rohnke, Thomas Drugman
Interspeech
- 4 July 2019
VAE+phoneme level
SUN G, ZHANG Y, WEISS R J, et al. Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis[C/OL]//ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain. 2020.
(Uses a conditional VAE, conditioned on the word level, to guide phoneme-level prosody synthesis.)
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis
- Noé Tits, Fengna Wang, +2 authors T. Dutoit
- Published in Interspeech 27 March 2019
(The relationship between latent-space information and acoustic features.)
Y. Lei, S. Yang, X. Wang and L. Xie, "MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 853-864, 2022, doi: 10.1109/TASLP.2022.3145293.