A survey on prosody modeling

liujiahui295

Last modified: 2023-04-26 10:45:33

Tags: speech recognition, artificial intelligence

First published: 2023-04-22 22:13:22

Article link: https://blog.csdn.net/qq_51589407/article/details/130311110

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

R. Skerry-Ryan, Eric Battenberg, +6 authors R. Saurous
International Conference on Machine Learning
24 March 2018

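1. Transfers prosody through a latent prosody space to capture meaningful variation in speech (that is, a latent representation is used to make one utterance sound like another).

2. Proposes the reference encoder architecture.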

Robust and Fine-grained Prosody Control of End-to-end Speech Synthesis

Younggun Lee, Taesu Kim
ICASSP - IEEE International Conference on…
6 November 2018

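Uses a fine-grained temporal structure (variable-length prosody embeddings, adjusted through alignment and downsampling) to encode, from the aligned target spectrogram, the prosody associated with each phoneme of the input sequence, which gives fine-grained prosody control.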
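As a rough illustration of what "one prosody value per phoneme" means, the sketch below averages a frame-level feature over the frames aligned to each phoneme. The function name `pool_by_phoneme`, the toy pitch track, and the per-phoneme frame counts are all assumptions made for this example; the frame counts are simply given here, and how the alignment is actually obtained is outside the scope of the sketch.

```python
def pool_by_phoneme(frames, durations):
    """Average frame-level values over the frames aligned to each phoneme.

    frames:    frame-level feature values (hypothetical pitch track here)
    durations: number of frames assigned to each phoneme (assumed given)
    """
    if sum(durations) != len(frames):
        raise ValueError("durations must sum to the number of frames")
    pooled, start = [], 0
    for n in durations:
        segment = frames[start:start + n]
        # A phoneme with zero aligned frames gets a neutral value of 0.0.
        pooled.append(sum(segment) / n if n else 0.0)
        start += n
    return pooled


# Toy data: six frames aligned to three phonemes (3, 2 and 1 frames).
pitch = [100.0, 110.0, 120.0, 200.0, 210.0, 150.0]
durations = [3, 2, 1]
phoneme_pitch = pool_by_phoneme(pitch, durations)
print(phoneme_pitch)
```

The output has one entry per phoneme, so it can be paired position by position with the text-side input sequence.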

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

C. Chien, Hung-yi Lee
Spoken Language Technology Workshop
12 November 2020

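A hierarchical prosody modeling framework in which phoneme-level prosody prediction is conditioned on word-level prosody prediction, combining the advantages of phoneme-level and word-level prosody modeling. Through objective and subjective evaluation, the authors verify that the proposed hierarchical model outperforms the other prosody modeling paradigms they compare against.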
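One plausible way to condition a phoneme-level predictor on word-level prosody is to repeat each word's prosody vector over that word's phonemes and concatenate it with the phoneme features. The sketch below shows only this data flow; the helper names `expand_to_phonemes` and `condition_on_words` and all values are hypothetical, and the paper's actual conditioning mechanism may differ.

```python
def expand_to_phonemes(word_prosody, phonemes_per_word):
    """Repeat each word-level vector once per phoneme in that word."""
    expanded = []
    for vec, count in zip(word_prosody, phonemes_per_word):
        expanded.extend([list(vec) for _ in range(count)])
    return expanded


def condition_on_words(phoneme_hidden, word_prosody, phonemes_per_word):
    """Concatenate each phoneme feature with its word's prosody vector."""
    context = expand_to_phonemes(word_prosody, phonemes_per_word)
    if len(context) != len(phoneme_hidden):
        raise ValueError("phoneme counts do not match the phoneme sequence")
    return [list(h) + c for h, c in zip(phoneme_hidden, context)]


# Toy input: two words with 2 and 1 phonemes respectively.
phoneme_hidden = [[0.1], [0.2], [0.3]]       # hypothetical phoneme features
word_prosody = [[1.0, 0.5], [2.0, -0.5]]     # hypothetical word-level prosody
conditioned = condition_on_words(phoneme_hidden, word_prosody, [2, 1])
print(conditioned)
```

A phoneme-level prosody predictor would then take each row of `conditioned` as input, so every phoneme sees the prosody predicted for its word.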

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

Yuxuan Wang, Daisy Stanton, +7 authors R. Saurous
International Conference on Machine Learning
23 March 2018

The paper proposes "global style tokens" (GST), which learn to model a wide range of acoustic expressiveness.
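The core idea can be sketched as attention over a small bank of style tokens: a reference-derived query produces weights over the tokens, and the weighted sum is the style embedding. The sketch below uses single-head scaled dot-product attention for brevity, which is an assumption of this example rather than the paper's exact attention module; the token values, the query, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, tokens):
    """Scaled dot-product similarity between the query and each token."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    return softmax(scores)


def mix_tokens(weights, tokens):
    """Style embedding as the weighted sum of the token vectors."""
    dim = len(tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy style token bank
query = [2.0, 0.0]                              # toy reference query

weights = attention_weights(query, tokens)
embedding = mix_tokens(weights, tokens)

# Manual control: skip the reference and choose the token weights directly.
manual = mix_tokens([0.0, 1.0, 0.0], tokens)
print(weights, embedding, manual)
```

Separating `attention_weights` from `mix_tokens` shows the two ways of using the tokens: transfer from a reference through the query, or direct control by setting the weights by hand.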

Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis

Ya-Jie Zhang, Shifeng Pan, Lei He, Zhenhua Ling
ICASSP - IEEE International Conference on…
11 December 2018

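The authors introduce a variational autoencoder (VAE) into an end-to-end speech synthesis model to learn latent representations of speaking style in an unsupervised manner. The style representations learned by the VAE show good properties such as disentangling, scaling, and combination, which makes style control easy.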
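The standard VAE pieces behind such a style latent can be sketched in a few lines: reparameterized sampling, the KL divergence to a standard normal prior, and linear interpolation between two latent vectors as a simple form of style combination. This is a generic VAE sketch, not the paper's model; the function names and toy values are assumptions.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def interpolate(z_a, z_b, alpha):
    """Linear blend of two latent style vectors (alpha=0 gives z_a)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


rng = random.Random(0)                      # fixed seed for repeatability
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
kl = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
blend = interpolate([0.0, 2.0], [2.0, 0.0], 0.5)
print(len(z), kl, blend)
```

Scaling a latent dimension or blending two latents, as in `interpolate`, is the kind of manipulation that a well-behaved latent space makes meaningful.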

Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation

Guangyan Zhang, Ying Qin, Tan Lee
Interspeech
25 October 2020

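A vector-quantized variational autoencoder (VQ-VAE) discretizes the continuous prosody representations learned from speech data, yielding syllable-level discrete prosodic representations. The results show that, compared with a conventional phoneme-level TTS system, the proposed syllable-level neural TTS system produces more natural speech, achieves prosody transfer, and has latent prosody codes that can be interpreted in terms of specific prosodic variations.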
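The quantization step of a VQ-VAE amounts to a nearest-neighbor lookup in a learned codebook. The sketch below shows that lookup on toy data, with one continuous prosody vector per syllable; the codebook entries, the input vectors, and the function name `quantize` are hypothetical, and training (codebook and commitment losses, straight-through gradients) is omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index and entry of the codebook vector nearest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]            # toy codebook
syllable_prosody = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]   # toy encoder outputs

codes = [quantize(v, codebook)[0] for v in syllable_prosody]
print(codes)
```

Each syllable is then represented by a single integer code, which is what makes the learned prosody codes easy to inspect and relate to specific prosodic variations.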

Interactive Multi-Level Prosody Control for Expressive Speech Synthesis

Tobias Cornille, Fengna Wang, Jessa Bekker
ICASSP - IEEE International Conference on…
23 May 2022

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

V. Klimkov, S. Ronanki, J. Rohnke, Thomas Drugman
Interspeech
4 July 2019

VAE+phoneme level

SUN G, ZHANG Y, WEISS R J, et al. Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis[C/OL]//ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain. 2020.

(A conditional VAE, conditioned on the word level, guides phoneme-level prosody synthesis.)

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

Noé Tits, Fengna Wang, +2 authors T. Dutoit
Published in Interspeech 27 March 2019

(Examines the relationship between latent-space information and acoustic features.)

Y. Lei, S. Yang, X. Wang and L. Xie, "MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 853-864, 2022, doi: 10.1109/TASLP.2022.3145293.
