1. Basic Information
Paper title: A NEW AMHARIC SPEECH EMOTION DATASET AND CLASSIFICATION BENCHMARK
Authors: Ephrem Afele Retta, Eiad Almekhlafi, Richard Sutcliffe, Mustafa Mhamed, Haider Ali, Jun Feng
Affiliation: School of Information Science and Technology, Northwest University, Xi’an 710127, China
Corpus: Amharic Speech Emotion Dataset (ASED) (Ethiopia, Amharic language)
Features: ① MFCC; ② Mel-spectrograms
Models used: VGGb, ResNet50, AlexNet, and LSTM
Evaluation metrics: unweighted accuracy, weighted accuracy
Published in: ACM Transactions on Asian and Low-Resource Language Information Processing (accepted March 2022)
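The two evaluation metrics listed above can be illustrated with a small sketch. It assumes the convention common in the SER literature, which these notes do not spell out: weighted accuracy is the overall fraction of correct predictions, and unweighted accuracy is the mean of the per-class recalls. The function names and the toy labels are hypothetical.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Hypothetical, imbalanced toy labels (not taken from ASED).
y_true = ["neutral"] * 4 + ["angry"] * 2
y_pred = ["neutral", "neutral", "neutral", "neutral", "angry", "neutral"]

wa = weighted_accuracy(y_true, y_pred)    # 5 of 6 samples are correct
ua = unweighted_accuracy(y_true, y_pred)  # recalls: neutral 1.0, angry 0.5
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On imbalanced data the two numbers diverge, which is presumably why both are reported.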
2. Introduction
2.1 Significant role of emotion
Emotion is essential to rational decision making. It helps us match and understand others’ feelings by conveying our own feelings and giving feedback to others.
Emotion conveys considerable information about the mental state of an individual.
2.2 Methods for identifying emotional states
Facial expressions, speech, physiological signals, etc.
Several inherent advantages make speech signals a good source for affective computing. For example, compared to many other biological signals (e.g., electrocardiograms), speech signals can usually be acquired more readily and economically.
2.3 Applications of SER
Entertainment, computer games, audio monitoring, online learning, clinical research, polygraph tests, call centers, and so on.
2.4 Audio features
Time-domain features:
Time-domain features are simple to extract and allow easy analysis of audio signals.
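As an illustration of how simple time-domain extraction can be, the sketch below computes two common time-domain features, RMS energy and zero-crossing rate, per frame. The notes do not say which time-domain features the paper examined, so the choice of features, the frame length (25 ms), the hop (10 ms), and the synthetic test tone are all assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def rms_energy(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Synthetic 440 Hz tone standing in for a speech recording (assumption).
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

frames = frame_signal(tone)
rms = rms_energy(frames)
zcr = zero_crossing_rate(frames)
print(frames.shape, rms.mean(), zcr.mean())
```

Both features need only elementwise arithmetic on the raw samples, with no transform to the frequency domain.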
Frequency-domain features:
With small audio datasets, frequency-domain features can reveal deeper patterns, which may help distinguish the underlying emotion in the signal.
Frequency-domain features include spectrograms, Mel Frequency Cepstral Coefficients (MFCCs), spectral centroid, spectral roll-off, spectral entropy, and Chroma coefficients.
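Two of the features in this list, spectral centroid and spectral roll-off, can be sketched for a single frame as follows. This is a minimal illustration rather than the paper's pipeline: the Hann window, the 85% roll-off fraction, the helper names, and the 1 kHz test tone are assumptions.

```python
import numpy as np

def magnitude_spectrum(frame, sr):
    """Return FFT bin frequencies (Hz) and magnitudes of a windowed frame."""
    window = np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return freqs, mag

def spectral_centroid(freqs, mag):
    """Magnitude-weighted mean frequency of the spectrum."""
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

def spectral_rolloff(freqs, mag, fraction=0.85):
    """Lowest frequency below which `fraction` of the total magnitude lies."""
    cumulative = np.cumsum(mag)
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[idx])

# Synthetic 1 kHz tone standing in for one speech frame (assumption).
sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 1000 * t)

freqs, mag = magnitude_spectrum(frame, sr)
centroid = spectral_centroid(freqs, mag)
rolloff = spectral_rolloff(freqs, mag)
print(f"centroid = {centroid:.1f} Hz, roll-off = {rolloff:.1f} Hz")
```

For a pure tone both values sit close to the tone frequency. For speech they summarize where the spectral energy is concentrated.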
In this paper, a comprehensive analysis of each feature was performed during exploratory data analysis. However, the work itself is limited to two principal features: Mel-spectrograms and MFCCs.
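To make the two chosen features concrete, here is a NumPy-only sketch of a log Mel-spectrogram and of MFCCs derived from it. It is not the authors' extraction code: the FFT size, hop, number of Mel bands, number of coefficients, the natural-log compression, the unnormalized DCT-II, and the random noise used in place of a recording are all assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=160, n_mels=40):
    """Frame, window, take the power spectrum, pool into mel bands, take the log."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape (n_frames, n_mels)

def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II along the mel axis, keeping the first n_mfcc terms."""
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return log_mel @ basis.T

# One second of random noise standing in for a speech recording (assumption).
sr = 16000
rng = np.random.default_rng(0)
signal = rng.standard_normal(sr)

log_mel = log_mel_spectrogram(signal, sr)
coeffs = mfcc(log_mel)
print(log_mel.shape, coeffs.shape)
```

The Mel-spectrogram keeps a two-dimensional time-frequency image, which suits convolutional models, while the MFCCs compress each frame into a short decorrelated vector.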
2.5 Reasons for building the ASED corpus
(1) No SER dataset has been created for Amharic yet;
(2) Amharic is the second-largest Semiti