2021 Speech Synthesis Paper Statistics (January to March)

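These statistics are updated in the first week of every month and mainly track the progress of speech synthesis research. Many papers are only released after their conference, which does not affect the statistics. Some omissions are inevitable, so the results are for reference only. If you have suggestions, feel free to message me directly; I will keep revising these statistics. Statistics for previous years are available at http://yqli.tech/page/tts_paper.html. Please credit the source when reposting. You are welcome to follow the WeChat official account 低调奋进.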



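Speech synthesis papers by category and month (unit: papers)

Category            Jan  Feb  Mar  Scope
Front end            1    0    0   Polyphones, prosody, G2P, etc.
Acoustic model       1    7    5   Linguistic-to-acoustic feature mapping, attention, dual learning
Vocoder              1    3    3   Waveform generation
Personalization      1    1    5   Limited data, noisy data, and similar applications
Multilingual         0    0    0   Multilingual multi-speaker models
Singing synthesis    0    1    2   Singing and music synthesis
Emotion              2    2    0   Style and emotion
Multimodal           2    1    1   Talking heads, etc.
Voice conversion     4    2    4   GAN-based and feature-disentanglement approaches
Other                1    1    0   EEG-based synthesis, data, MOS evaluation, and applications of speech synthesis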


Paper list:

January

No. Title [type]
1. Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis [am]
2. Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning [frontend]
3. Generating coherent spontaneous speech and gesture from text [multimodality]
4. Creating Song From Lip and Tongue Videos With a Convolutional Vocoder [multimodality]
5. On Interfacing the Brain with Quantum Computers: An Approach to Listen to the Logic of the Mind [other]
6. Whispered and Lombard Neural Speech Synthesis [expression]
7. Expressive Neural Voice Cloning [expression/personalization]
8. High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion [vc]
9. EmoCat: Language-agnostic Emotional Voice Conversion [vc]
10. Hierarchical disentangled representation learning for singing voice conversion [vc]
11. Adversarially learning disentangled speech representations for robust multi-factor voice conversion [vc]
12. Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss [vocoder]
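The January column of the summary table can be reproduced from the type tags in the January list. The Python sketch below is a hypothetical helper, not the author's tooling. The tag-to-category mapping in `TAG_TO_CATEGORY` is an assumption, and a combined tag such as `expression/personalization` is counted once under each of its parts.

```python
from collections import Counter

# Type tags of the January 2021 papers, copied in list order.
JANUARY_TAGS = [
    "am", "frontend", "multimodality", "multimodality", "other",
    "expression", "expression/personalization",
    "vc", "vc", "vc", "vc", "vocoder",
]

# Assumed (hypothetical) mapping from list tags to summary-table categories.
TAG_TO_CATEGORY = {
    "frontend": "Front end",
    "am": "Acoustic model",
    "vocoder": "Vocoder",
    "personalization": "Personalization",
    "expression": "Emotion",
    "multimodality": "Multimodal",
    "vc": "Voice conversion",
    "other": "Other",
}


def tally(tags):
    """Count papers per category; a combined tag 'a/b' counts once for each part."""
    counts = Counter()
    for tag in tags:
        for part in tag.split("/"):
            counts[TAG_TO_CATEGORY[part]] += 1
    return counts


if __name__ == "__main__":
    # Print one line per category, sorted by category name.
    for category, n in sorted(tally(JANUARY_TAGS).items()):
        print(f"{category:<17}{n}")
```

Splitting combined tags matches the January column: 12 papers yield 13 category entries. In the March column, the `personal/am` entry appears to be counted under personalization only, so other months may need a different rule.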


February

1. Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet [am]
2. Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis [am]
3. VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention [am]
4. Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input [am]
5. Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech [am]
6. LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search [am]
7. Data-Efficient Training Strategies for Neural TTS Systems [am]
8. Model architectures to extrapolate emotional expressions in DNN-based text-to-speech [expression]
9. Model architectures to extrapolate emotional expressions in DNN-based text-to-speech [expression]
10. SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer [modal]
11. MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network [other]
12. Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning [personalization]
13. Anyone GAN Sing [sing]
14. Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram [vc]
15. Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion [vc]
16. Universal Neural Vocoding with Parallel WaveNet [vocoder]
17. LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation [vocoder]
18. High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion [vocoder]


March

1. Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners [am]
2. Text-to-speech for the hearing impaired [am]
3. Continual Speaker Adaptation for Text-to-Speech Synthesis [am]
4. Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling [am]
5. PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS [am]
6. Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system [expression]
7. STYLER: Style Modeling with Rapidity and Robustness via SpeechDecomposition for Expressive and Controllable Neural Text to Speech [expression]
8. What is Multimodality? [modal]
9. CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge [personal]
10. Real-time Timbre Transfer and Sound Synthesis using DDSP [personal]
11. AdaSpeech: Adaptive Text to Speech for Custom Voice [personal]
12. Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech [personal]
13. A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music [personal/am]
14. Latent Space Explorations of Singing Voice Synthesis using DDSP [sing]
15. Learning to Generate Music With Sentiment [sing]
16. crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder [vc]
17. MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames [vc]
18. Axial Residual Networks for CycleGAN-based Voice Conversion [vc]
19. IMPROVING ZERO-SHOT VOICE STYLE TRANSFER VIA DISENTANGLED REPRESENTATION LEARNING [vc]
20. GAN Vocoder: Multi-Resolution Discriminator Is All You Need [vocoder]
21. Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains [vocoder]
22. Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN [vocoder]