A Summary of Methods for Improving Speech Enhancement

1. More complex models

  1. Mel-frequency power spectrum (MFP) features used for speech enhancement, INTERSPEECH 2013: https://bio-asplab.citi.sinica.edu.tw/paper/conference/lu2013speech.pdf
  2. Convolutional maxout neural networks for speech separation: https://ieeexplore.ieee.org/document/7394335
  3. Voice conversion using deep bidirectional long short-term memory based recurrent neural networks: https://ieeexplore.ieee.org/document/7178896
  4. Convolutional-recurrent neural networks for speech enhancement: https://ieeexplore.ieee.org/document/8462155
  5. K. Tan, D. Wang. A convolutional recurrent neural network for real-time speech enhancement. Interspeech 2018: 3229-3233. https://www.researchgate.net/publication/325542192_A_Convolutional_Recurrent_Neural_Network_for_Real-Time_Speech_Enhancement
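
To make the "more complex model" direction concrete, here is a minimal convolutional-recurrent sketch in the spirit of references 4-5 above: a small convolutional encoder, an LSTM over frames, and a masking decoder. The class name, layer sizes, and mask-based output are illustrative assumptions, not the exact configurations of the cited papers.

```python
# Minimal convolutional-recurrent sketch for spectral speech enhancement.
# All sizes are illustrative assumptions, not the cited papers' configurations.
import torch
import torch.nn as nn

class TinyCRN(nn.Module):
    def __init__(self, n_freq=161, hidden=128):
        super().__init__()
        # Convolutional encoder over (batch, 1, time, freq); stride 2 along frequency only
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 3), stride=(1, 2), padding=(0, 1)),
            nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=(1, 3), stride=(1, 2), padding=(0, 1)),
            nn.ELU(),
        )
        feat = 32 * ((n_freq + 3) // 4)  # channels * reduced frequency bins after two stride-2 convs
        # Recurrent layer models temporal context frame by frame
        self.rnn = nn.LSTM(feat, hidden, batch_first=True)
        # Decoder predicts a [0, 1] mask per frequency bin
        self.decoder = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):                   # noisy_mag: (batch, time, freq)
        x = self.encoder(noisy_mag.unsqueeze(1))    # -> (batch, 32, time, reduced freq)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.rnn(x)
        mask = self.decoder(x)                      # (batch, time, freq)
        return mask * noisy_mag                     # enhanced magnitude spectrogram

mag = torch.rand(2, 100, 161)                       # fake noisy magnitude frames
print(TinyCRN()(mag).shape)                         # torch.Size([2, 100, 161])
```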

2. Training methods

  1. Y. Xu, J. Du, L. Dai, et al. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters, 2013, 21(1): 65-68. https://ieeexplore.ieee.org/document/6665000
  2. SNR-aware convolutional neural network modeling for speech enhancement. Interspeech 2016: https://www.researchgate.net/publication/307889660_SNR-Aware_Convolutional_Neural_Network_Modeling_for_Speech_Enhancement
  3. Y. Xu, J. Du, Z. Huang, et al. Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement. arXiv preprint arXiv:1703.07172, 2017. https://www.researchgate.net/publication/315489399_Multi-Objective_Learning_and_Mask-Based_Post-Processing_for_Deep_Neural_Network_Based_Speech_Enhancement
  4. L. Sun, J. Du, L.-R. Dai, C.-H. Lee. Multiple-target deep learning for LSTM-RNN based speech enhancement. Hands-free Speech Communications and Microphone Arrays (HSCMA), IEEE, 2017. https://ieeexplore.ieee.org/document/7895577
  5. Z. Wang, D. Wang. Recurrent deep stacking networks for supervised speech separation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017: 71-75. https://ieeexplore.ieee.org/document/7952120
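
The training-oriented papers above mostly vary the learning targets and objectives (e.g., multi-target or multi-objective losses). As a hedged illustration only, a combined loss over a log-power-spectrum (LPS) target and an ideal-ratio-mask (IRM) target might look like the following; the weighting and target choices are assumptions, not any single paper's recipe.

```python
# Illustrative multi-objective loss combining a log-power-spectrum (LPS) regression
# target with an ideal-ratio-mask (IRM) target. The weighting and target choices are
# assumptions for illustration, not the exact objective of any paper listed above.
import torch
import torch.nn.functional as F

def multi_objective_loss(pred_lps, target_lps, pred_irm, target_irm, alpha=0.5):
    """Weighted sum of two mean-squared-error terms."""
    lps_term = F.mse_loss(pred_lps, target_lps)   # spectrum regression
    irm_term = F.mse_loss(pred_irm, target_irm)   # mask estimation
    return alpha * lps_term + (1.0 - alpha) * irm_term

# Usage with dummy frame batches:
loss = multi_objective_loss(torch.rand(4, 257), torch.rand(4, 257),
                            torch.rand(4, 257), torch.rand(4, 257))
```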

3. Combining with other tasks

  1. (VAD) G. Tian, J. Du, Y. Xu, L. Cong, C.-H. Lee. Improving deep neural network based speech enhancement in low SNR environments. International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Springer, 2015. http://home.ustc.edu.cn/~gtian09/publications/LVA-ICA2015-015-Gao.pdf
  2. (MFCC) C. Liao, Y. Tsao, X. Lu, et al. Incorporating symbolic sequential modeling for speech enhancement. arXiv preprint arXiv:1904.13142, 2019. https://arxiv.org/pdf/1904.13142.pdf
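
For this "combine with other tasks" direction, a common pattern is a shared encoder with an extra per-frame task head (for example, VAD alongside enhancement). The sketch below is an assumed, simplified illustration of such multi-task wiring, not the architecture of the cited papers; all names and sizes are illustrative.

```python
# Assumed, simplified multi-task wiring: a shared LSTM encoder feeds an enhancement
# head and a per-frame voice-activity (VAD) head. Names and sizes are illustrative.
import torch
import torch.nn as nn

class JointEnhanceVAD(nn.Module):
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.shared = nn.LSTM(n_freq, hidden, batch_first=True)
        self.enhance_head = nn.Linear(hidden, n_freq)  # enhanced spectrum per frame
        self.vad_head = nn.Linear(hidden, 1)           # speech / non-speech logit per frame

    def forward(self, noisy_lps):                      # noisy_lps: (batch, time, freq)
        h, _ = self.shared(noisy_lps)
        return self.enhance_head(h), self.vad_head(h).squeeze(-1)

enhanced, vad_logits = JointEnhanceVAD()(torch.rand(2, 50, 257))
print(enhanced.shape, vad_logits.shape)                # (2, 50, 257) (2, 50)
```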

4. Phase

  1. D. S. Williamson, Y. Wang, D. Wang. Complex ratio masking for joint enhancement of magnitude and phase. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. https://ieeexplore.ieee.org/document/7472673
  2. S.-W. Fu, Y. Tsao, X. Lu, H. Kawai. Raw waveform-based speech enhancement by fully convolutional networks. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017. https://ieeexplore.ieee.org/document/8281993
  3. S. Pascual, A. Bonafonte, J. Serra. SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452, 2017. https://www.researchgate.net/publication/315682472_SEGAN_Speech_Enhancement_Generative_Adversarial_Network
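
For the phase-aware direction, the complex ideal ratio mask (cIRM) of reference 1 is defined in the complex STFT domain as M = S / Y, so that applying it to the noisy STFT recovers both magnitude and phase. A small sketch of computing it follows; the function name and the epsilon smoothing term are assumptions added here for illustration and numerical stability.

```python
# Complex ideal ratio mask (cIRM): the element-wise complex ratio S / Y of clean to
# noisy STFTs, so applying the mask restores both magnitude and phase.
# The epsilon term is an assumption added here for numerical stability.
import numpy as np

def complex_irm(clean_stft, noisy_stft, eps=1e-8):
    """Return M = S / Y; its real and imaginary parts serve as training targets."""
    denom = noisy_stft.real ** 2 + noisy_stft.imag ** 2 + eps
    mask_real = (noisy_stft.real * clean_stft.real + noisy_stft.imag * clean_stft.imag) / denom
    mask_imag = (noisy_stft.real * clean_stft.imag - noisy_stft.imag * clean_stft.real) / denom
    return mask_real + 1j * mask_imag

# Sanity check with random complex spectra: mask * noisy ~ clean
S = np.random.randn(4, 257) + 1j * np.random.randn(4, 257)
Y = S + 0.1 * (np.random.randn(4, 257) + 1j * np.random.randn(4, 257))
print(np.allclose(complex_irm(S, Y) * Y, S, atol=1e-4))   # True
```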

5. Multi-scale features

  1. C. Macartney, T. Weyde. Improved speech enhancement with the Wave-U-Net. 2018. https://www.researchgate.net/publication/329266468_Improved_Speech_Enhancement_with_the_Wave-U-Net
  2. S.-W. Fu, T.-W. Wang, Y. Tsao, et al. End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018. https://ieeexplore.ieee.org/document/8331910
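
For the multi-scale direction, the key idea in Wave-U-Net-style models is repeated downsampling and upsampling of the raw waveform with skip connections between matching scales. The toy sketch below shows a single such scale; the class name, depth, channel counts, and kernel sizes are assumptions and far smaller than the cited models.

```python
# Toy 1-D U-Net-style sketch on raw waveform: one downsampling/upsampling stage with a
# skip connection, illustrating the multi-scale idea. Depth, channel counts, and kernel
# sizes are assumptions and far smaller than the cited models.
import torch
import torch.nn as nn

class TinyWaveUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv1d(1, 16, kernel_size=15, stride=2, padding=7)   # coarse scale
        self.bottleneck = nn.Conv1d(16, 16, kernel_size=15, padding=7)
        self.up = nn.ConvTranspose1d(16, 16, kernel_size=16, stride=2, padding=7)
        self.out = nn.Conv1d(16 + 1, 1, kernel_size=1)                      # fuse with input skip

    def forward(self, wav):                     # wav: (batch, 1, samples), samples even
        d = torch.relu(self.down(wav))
        b = torch.relu(self.bottleneck(d))
        u = torch.relu(self.up(b))
        u = u[..., :wav.shape[-1]]              # crop to the input length
        return self.out(torch.cat([u, wav], dim=1))

print(TinyWaveUNet()(torch.randn(2, 1, 1024)).shape)   # torch.Size([2, 1, 1024])
```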