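Training a basic speech synthesis model
An open-source practice of Tacotron-based Mandarin speech synthesis (covers the whole training pipeline)
The end-to-end TTS deep learning model Tacotron (Chinese speech synthesis) (describes the network structure well)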
Neural Speech Synthesis with Transformer Network
Open-source code
Speaker adaptation
If you have very limited data, consider fine-tuning a
pre-trained model. For example, using a model pre-trained
on LJSpeech, you can adapt it to data from VCTK
speaker p225 (30 mins) by the following command. From my experience, it
can get reasonable speech quality much more quickly than training
the model from scratch.
speedyspeech
FastSpeech: Fast, Robust and Controllable Text to Speech, source code
An open-source toolkit that integrates state-of-the-art models such as tacotron2, transformerv3, and fastspeechv3
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
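Voice transfer
Voice cloning:
Voice cloning means that, given speech from a new, unseen speaker, a small number of sentences from the user (even a single one) is enough to synthesize speech in that voice. Voice cloning covers both the speaker adaptation we commonly use and the speaker encoding newly proposed in this paper.
The most traditional approach is to add this data and fine-tune to obtain a new model; that is already cloning.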
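To make the speaker encoding idea more concrete, here is a minimal sketch, assuming a toy encoder: each reference clip is mapped to a fixed-size vector, and the vectors are averaged into one speaker embedding that a multi-speaker synthesizer could be conditioned on without updating its weights. The random projection `proj`, the 80-bin mel input, and the 16-dimensional embedding are all hypothetical stand-ins for a trained speaker encoder, not the method of any particular paper.

```python
import numpy as np

def embed_utterance(mel, proj):
    """Map one utterance (frames x mel bins) to a unit-length vector."""
    hidden = np.tanh(mel @ proj)   # per-frame projection (toy stand-in for a trained encoder)
    vec = hidden.mean(axis=0)      # average over time, so clip length does not matter
    return vec / (np.linalg.norm(vec) + 1e-8)

def speaker_embedding(utterances, proj):
    """Average the utterance embeddings of one speaker and renormalize."""
    vecs = np.stack([embed_utterance(m, proj) for m in utterances])
    mean = vecs.mean(axis=0)
    return mean / (np.linalg.norm(mean) + 1e-8)

def cosine_similarity(a, b):
    """Similarity between two embeddings, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
proj = rng.standard_normal((80, 16))  # hypothetical: 80 mel bins -> 16-dim embedding
clips = [rng.standard_normal((n, 80)) for n in (50, 70, 120)]  # variable-length clips
emb = speaker_embedding(clips, proj)
print(emb.shape)
```

In contrast, the adaptation route would keep no separate encoder and instead update the synthesizer's own parameters on the new speaker's clips.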
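Clone a voice in 5 seconds: now I can sing in Jay Chou's voice too
Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
GitHub source code
Voice conversion
A survey of voice conversion techniques
Voice conversion is the following task: given an input utterance, make it sound as if it were spoken by another person while keeping the spoken content unchanged. A typical use case is Conan's bow-tie voice changer.
The general voice conversion pipeline has three steps: 1. extract features; 2. convert the features; 3. resynthesize the speech.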
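The three steps can be sketched roughly as follows. This is only an illustration under simplifying assumptions: features are log-magnitude spectra of non-overlapping frames, the conversion is a per-bin mean and variance mapping between speakers (a simple statistical baseline swapped in here, not what modern systems use), and resynthesis reuses the source phase instead of a vocoder. The frame size and all function names are hypothetical.

```python
import numpy as np

FRAME = 256  # samples per frame (hypothetical choice)

def extract_features(wave):
    """Step 1: cut into non-overlapping frames; return log-magnitude and phase."""
    n_frames = len(wave) // FRAME
    frames = wave[: n_frames * FRAME].reshape(n_frames, FRAME)
    spec = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spec) + 1e-8), np.angle(spec)

def speaker_stats(log_mag):
    """Per-frequency-bin mean and standard deviation of a speaker's features."""
    return log_mag.mean(axis=0), log_mag.std(axis=0) + 1e-8

def convert_features(log_mag, src_stats, tgt_stats):
    """Step 2: move the source statistics onto the target speaker's statistics."""
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    return (log_mag - src_mean) / src_std * tgt_std + tgt_mean

def synthesize(log_mag, phase):
    """Step 3: rebuild the waveform from converted magnitude and source phase."""
    spec = np.exp(log_mag) * np.exp(1j * phase)
    return np.fft.irfft(spec, n=FRAME, axis=1).reshape(-1)

rng = np.random.default_rng(0)
source = rng.standard_normal(FRAME * 8)        # stand-in for source speech
target = 0.5 * rng.standard_normal(FRAME * 8)  # stand-in for target speech

src_feat, src_phase = extract_features(source)
tgt_feat, _ = extract_features(target)
converted = convert_features(src_feat, speaker_stats(src_feat), speaker_stats(tgt_feat))
output = synthesize(converted, src_phase)
```

Real systems replace each stage with something stronger (for example mel-spectrogram or phonetic features, a learned conversion model, and a neural vocoder), but the data flow stays the same.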