【Information Technology】【2013.09】Speech Emotion Recognition Based on an Auditory Model

This is a master's thesis from the Middle East Technical University, Turkey (author: ENES YÜNCÜ), 86 pages in total.

With the advent of computational technology, human-computer interaction (HCI) has gone beyond simple logical calculations. Affective computing aims to improve human-computer interaction at a mental-state level, allowing computers to adapt their responses to human needs. As such, affective computing seeks to recognize emotions by capturing cues from visual, auditory, tactile, and other biometric signals recorded from humans. Emotions play a crucial role in modulating how humans experience and interact with the outside world and have a strong effect on human decision making. They are an essential part of human social relations and take part in important life decisions, so detecting emotions is crucial in high-level interactions. Each emotion has unique properties that allow us to recognize it. The acoustic signal produced for the same utterance or sentence changes primarily because of biophysical changes (such as stress-induced constriction of the larynx) triggered by emotion. This relation between acoustic cues and emotions has made speech emotion recognition one of the trending topics in affective computing. The main purpose of a speech emotion recognition algorithm is to detect the emotional state of a speaker from recorded speech signals. The human auditory system is a non-linear, adaptive mechanism that involves frequency-dependent filtering as well as temporal and simultaneous masking. While emotion can be manifested in acoustic signals recorded with a high-quality microphone and extracted using high-resolution signal processing techniques, a human listener has access only to the cues that the auditory system delivers, and this limited access to emotion cues also lowers subjective emotion recognition accuracy.
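The frequency-dependent filtering described above is the kind of behavior an auditory filter bank front end models, typically with gammatone channels spaced on the ERB scale. The thesis does not reproduce its code here, so the sketch below is only a minimal NumPy illustration of that idea; the function names, band count, and bandwidth constants are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def erb_space(low_hz, high_hz, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale
    (Glasberg & Moore approximation)."""
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)        # Hz -> ERB-rate
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3  # ERB-rate -> Hz
    return erb_inv(np.linspace(erb(low_hz), erb(high_hz), n_bands))

def gammatone_filterbank(signal, fs, centers, order=4, dur=0.025):
    """Filter `signal` with a bank of gammatone impulse responses.
    Returns an array of shape (n_bands, len(signal))."""
    t = np.arange(0, dur, 1.0 / fs)
    outputs = []
    for fc in centers:
        erb_bw = 24.7 * (4.37e-3 * fc + 1.0)   # equivalent rectangular bandwidth
        b = 1.019 * erb_bw                      # gammatone bandwidth parameter
        g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        g /= np.sqrt(np.sum(g ** 2))            # unit-energy normalisation
        outputs.append(np.convolve(signal, g, mode="same"))
    return np.vstack(outputs)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.5, 1.0 / fs)
    x = np.sin(2 * np.pi * 440 * t)             # toy input instead of real speech
    bands = gammatone_filterbank(x, fs, erb_space(100, 6000, 24))
    print(bands.shape)                          # (24, 8000)
```

Running the script prints the shape of the 24-channel output, which is the kind of multi-band representation a subsequent feature-extraction stage would consume.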

This thesis develops a speech emotion recognition algorithm based on a model of the human auditory system and evaluates its accuracy. A state-of-the-art human auditory filter bank model is used to process clean speech signals; simple features are then extracted from the output signals and used to train binary classifiers for seven emotion classes (anger, fear, happiness, sadness, disgust, boredom, and neutral). The classifiers are then tested on a validation set to assess recognition performance. Three emotional speech databases, in German, English, and Polish, are used to test the proposed method, and recognition rates as high as 82% are achieved. A subjective experiment on the German emotional speech database with non-German-speaking subjects indicates that the performance of the proposed system is comparable to human emotion recognition.
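The abstract names the classifier stage only at a high level (simple features feeding binary classifiers, with support vector machines covered in Appendix A). As a rough illustration under stated assumptions, the scikit-learn sketch below trains one-vs-rest SVMs on made-up per-channel energy statistics standing in for the real features; the 82% figure above comes from the thesis and has nothing to do with this toy example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["anger", "fear", "happiness", "sadness", "disgust", "boredom", "neutral"]

def band_features(band_signals):
    """Illustrative 'simple features': mean and standard deviation of the
    log energy in each auditory channel. The abstract does not spell out
    the exact feature set, so this choice is an assumption."""
    log_energy = np.log(band_signals ** 2 + 1e-12)
    return np.concatenate([log_energy.mean(axis=1), log_energy.std(axis=1)])

rng = np.random.default_rng(0)

# Random 24-channel "filter bank outputs" stand in for processed utterances from
# an emotional speech corpus such as EMO-DB; the labels are likewise random here.
X = np.vstack([band_features(rng.normal(size=(24, 8000))) for _ in range(350)])
y = rng.integers(0, len(EMOTIONS), size=350)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# One binary SVM per emotion (one-vs-rest), mirroring the "binary classifiers
# for seven emotion classes" described in the abstract.
clf = OneVsRestClassifier(make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)))
clf.fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```

With real features and labels in place of the random stand-ins, the same pipeline gives per-utterance predictions and a validation accuracy comparable to the evaluation protocol described above.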


  1. Introduction
  2. Contributions from Cognitive Science
  3. Project Background
  4. Speech Emotion Recognition Based on an Auditory Model
  5. Results
  6. Discussion
  7. Conclusions
  8. Future Work
    Appendix A: Support Vector Machines
