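A summary of the papers I have read so far. I thought I had read a lot, but it turns out to be only a handful; I am still far from my goal of reading a thousand papers.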
Front-end noise reduction
- DeLiang Wang 2018–Supervised Speech Separation Based on Deep Learning: An Overview
Vocoders
- WaveNet: A Generative Model for Raw Audio
- WaveGlow: A Flow-based Generative Network for Speech Synthesis
- FloWaveNet: A Generative Flow for Raw Audio
- LPCNET: IMPROVING NEURAL SPEECH SYNTHESIS THROUGH LINEAR PREDICTION
- WORLD vocoder: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications
- Harvest: A high-performance fundamental frequency estimator from speech signals
- 2018 ins : WaveNet Vocoder with Limited Training Data for Voice Conversion
Recognition
- x-vector:Deep Neural Network Embeddings for Text-Independent Speaker Verification
- [2019 ASRU] [fanzhiyun] SPEAKER-AWARE SPEECH-TRANSFORMER
- Language Identification with Deep Bottleneck Features
TTS
- Tacotron: Towards End-to-End Speech Synthesis
- Tacotron 2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- 2017NIPS----Deep Voice 2: Multi-Speaker Neural Text-to-Speech
- Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
- GST–Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
- 2018NIPS: Neural Voice Cloning with a Few Samples
- Uncovering Latent Style Factors for Expressive Speech Synthesis
- Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
voice conversion
- 2016ICME:Phonetic posteriorgrams for many-to-one voice conversion without parallel data training
- Non-parallel voice conversion using variational auto-encoders conditioned by phonetic PPGs
- 2019trans–Sequence-to-Sequence Acoustic Modeling for Voice Conversion
- 2019ins:A Vocoder-free WaveNet Voice Conversion with Non-Parallel Data
- 2018ins–Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion
- Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
- 2019icas–Cross-lingual Voice Conversion with Bilingual Phonetic PosteriorGram and Average Modeling
- Odyssey 2018: Average Modeling Approach to Voice Conversion with Non-Parallel Data
- trans:Voice conversion with SI-DNN and KL divergence based mapping without parallel training data
- Voice Conversion Across Arbitrary Speakers based on a Single Target-Speaker Utterance
- 2018trans,zhangjingxuan----Sequence-to-Sequence Acoustic Modeling for Voice Conversion
- 2018 icassp: Improving Sequence-to-Sequence Voice Conversion by Adding Text-Supervision [zhangjingxuan]
- 2019trans:Non-Parallel Seq2Seq Voice Conversion with Disentangled Linguistic and Speaker Representations[zhangjingxuan]
- [2019ins] One-shot Voice Conversion with Global Speaker Embeddings
- 2019ins—Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star-GAN
- Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network
- Mellotron:Multi-speaker expressive voice synthesis by conditioning on rhythm, pitch and global style
- [2019 interspeech]One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
- [2019 interspeech]One-shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams
- [2019 ASRU]Zhou Y , Tian X , Emre Yılmaz, et al. A Modularized Neural Network with Language-specific Output Layers for Cross-lingual Voice Conversion[C]// Accepted by ASRU 2019. 2019.
- [2020] Vocoder-free End-to-End Voice Conversion with Transformer Network
GAN
- [2019ASRU]-ON THE STUDY OF GENERATIVE ADVERSARIAL NETWORKS FOR CROSS-LINGUAL VOICE CONVERSION
- [2019 interspeech] Non-parallel Voice Conversion using Weighted Generative Adversarial Networks
- [2017][the original CycleGAN-VC paper] Parallel-data-free voice conversion using cycle-consistent adversarial networks
- [2018][IEEE SLT] StarGAN-VC: non-parallel many-to-many voice conversion with StarGAN
- [2019 interspeech]Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star-GAN
Transformer architecture
- Attention Is All You Need
- FastSpeech: Fast, Robust and Controllable Text to Speech
- Neural Speech Synthesis with Transformer Network
Papers I got nothing out of
- [2019 interspeech] Whether To Pretrain DNN or Not?: An Empirical Analysis for Voice Conversion
2020 icassp
- [VAE][one-shot] ONE-SHOT VOICE CONVERSION BY VECTOR QUANTIZATION
- [FHVAE] [emotional VC] MULTI-SPEAKER AND MULTI-DOMAIN EMOTIONAL VOICE CONVERSION USING FACTORIZED HIERARCHICAL VARIATIONAL AUTOENCODER
singing VC
- 2019 APSIPA —SINGAN: Singing Voice Conversion with Generative Adversarial Networks
- SINGING VOICE CONVERSION WITH NON-PARALLEL DATA