2019 Interspeech
1. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
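- University of Tokyo
- End-to-end multitask learning with self attention; the auxiliary task is gender classification.
Features are first extracted from the speech spectrogram rather than hand-crafted. They feed a CNN-BLSTM end-to-end network. A self attention mechanism then focuses on the emotionally salient periods. Finally, because the emotion and gender classification tasks have mutual features, gender classification is added as an auxiliary task that shares useful information with the main task, emotion classification.
- The abstract motivates the work from human-computer interaction applications ("SER has attracted great attention"), which makes it more vivid. The introduction covers, in turn: features and the advantages of spectrograms; traditional machine learning approaches such as HMM, GMM, and SVM; and deep learning approaches such as CNN and RNN.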
- multi-headed self attention
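
As a rough illustration of what multi-headed self attention computes over a sequence of frame-level features, here is a minimal NumPy sketch. It uses the generic scaled dot-product formulation; whether the paper uses exactly this variant is an assumption, and the function name, the weight matrices `w_q`, `w_k`, `w_v`, `w_o`, and all sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Generic scaled dot-product multi-head self attention (illustrative only).

    x:   (T, d_model) sequence of frame-level features, e.g. BLSTM outputs.
    w_*: (d_model, d_model) projection matrices (hypothetical parameters).
    Returns the attended sequence (T, d_model) and the per-head
    attention weights (num_heads, T, T).
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Similarity of every time step with every other time step, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back together.
    context = weights @ v                                  # (num_heads, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights
```

Each row of `weights` sums to 1, so a head can concentrate its mass on a few time steps, which is one way to read "focusing on emotionally salient periods".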
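- Spectrogram extraction: utterance length is normalized to 7.5 s; shorter utterances are zero-padded and longer ones are cut. Hanning window of length 800; sampling rate 16000 Hz.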
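
The length normalization and windowing described above can be sketched in NumPy as follows. The 7.5 s target, the window length of 800, and the 16000 Hz sampling rate come from the notes; the hop length of 400 samples, the use of the magnitude spectrum, and the function names are assumptions made only for illustration.

```python
import numpy as np

SAMPLE_RATE = 16000    # Hz, from the notes
TARGET_SECONDS = 7.5   # from the notes
WINDOW_LENGTH = 800    # samples (50 ms at 16 kHz), from the notes
HOP_LENGTH = 400       # assumption: the notes do not give a hop size


def fix_length(signal, sample_rate=SAMPLE_RATE, seconds=TARGET_SECONDS):
    """Zero-pad or cut a 1-D waveform to exactly `seconds` of audio."""
    target = int(round(sample_rate * seconds))
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) >= target:
        return signal[:target]                       # long utterance: cut
    return np.pad(signal, (0, target - len(signal)))  # short utterance: zero-pad


def spectrogram(signal, win_length=WINDOW_LENGTH, hop_length=HOP_LENGTH):
    """Magnitude spectrogram from a Hanning-windowed short-time Fourier transform."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame: win_length // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    # Three seconds of a synthetic 440 Hz tone stands in for a real utterance.
    t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
    wave = np.sin(2 * np.pi * 440.0 * t)
    spec = spectrogram(fix_length(wave))
    print(spec.shape)
```

With these settings every utterance becomes 120000 samples, so the spectrogram always has the same shape (299 frames by 401 frequency bins under the assumed hop), which is convenient for feeding a CNN.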
Short-time Fourier transform - $\alpha$ and