✔️ Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-Based Model
ABSTRACT
Index Terms: Depression detection, Multi-modal fusion, EATD-Corpus
⭐️ If a translated passage reads oddly or is hard to follow, try a different translation tool.
✔️ 1. INTRODUCTION
Depression is a common mental disorder whose three main symptoms are persistent low mood, loss of interest and lack of energy [1, 2]. In the worst case, depression can lead to suicide. According to World Health Organization reports, about 264 million people worldwide suffer from depression [3]. However, the treatment rate for depression remains very low across the world [4]. Two main factors account for the low treatment rate. First, traditional treatments for depression are time-consuming, costly and sometimes ineffective [5]. The cost of diagnosis and treatment can be a heavy burden for individuals with financial difficulties, which makes them reluctant to seek help from physicians. Second, during clinical interviews for depression diagnosis, patients may hide their real mental states for fear of prejudice or discrimination against depressed people [6, 7]. In such cases, the clinician is unable to make a correct diagnosis. These factors motivate an automatic depression detection system, which can help individuals assess their depressive states privately and increase their willingness to consult psychologists. Furthermore, such a system would be of great help to psychologists in depression diagnosis when patients hide their real mental states.
✔️ 2. RELATED WORK AND OUR CONTRIBUTIONS
✔️ Automatic depression detection.
Early studies of automatic depression detection were dedicated to extracting effective features from questions that were highly correlated with depression. Sun et al. [8] conducted content analysis of the text transcripts of clinical interviews and manually selected questions related to certain topics (e.g. sleep quality or recent feelings). Based on the text features extracted from the selected questions, they used Random Forest to detect depression tendency. Similarly, Yang et al. [9] manually selected depression-related questions after analyzing interview transcripts. They constructed a decision tree with the selected questions to predict the participants' depression states. Gong and Poellabauer [10] performed topic modeling to split the interviews into topic-related segments, from which audio, video, and semantic features were extracted. They employed a feature selection algorithm to retain the most discriminating features. Williamson et al. [11] constructed semantic context indicators related to factors such as depression diagnosis, medical/psychological therapy or negative feelings. Using a Gaussian Staircase Model, they achieved good performance in depression detection.
summary: This paragraph surveys the methods and techniques used in early studies of automatic depression detection, covering feature extraction, decision trees, topic modeling and semantic context indicators.
With the emergence of deep learning techniques, integrating multi-modal features through deep learning models has become particularly promising for depression detection. Yang et al. [12] presented a depression detection model based on a deep Convolutional Neural Network (CNN). They additionally designed a set of audio and video descriptors to train their model. Tuka et al. [13] proposed a Long Short-Term Memory (LSTM) network to assess depression tendency. They calculated Pearson coefficients to select audio features and text features that were strongly related to depression severity. Combining CNN and LSTM, Ma et al. [14] encoded depressive audio characteristics to predict the presence of depression. Haque et al. [7] proposed a causal CNN model that summarized acoustic, visual and linguistic features into embeddings, which were then used to predict depressive states.
summary: This paragraph describes how different deep learning approaches, including CNN, LSTM and causal CNN models, integrate multi-modal features to improve depression detection.
✔️ Our motivations and contributions.
In the field of automatic depression detection, current research has several limitations. First of all, some methods rely heavily on manually selected questions, which requires psychologists' expertise. Besides, all these preset questions have to be answered during the interview, otherwise the analysis may fail. How to improve detection performance without preset questions remains a challenging task. In addition, publicly available depression datasets are scarce due to ethical issues. In this work, we make efforts to overcome the aforementioned drawbacks:
summary: This paragraph outlines the main problems in current automatic depression detection research and the challenges this work sets out to address.
(1) To facilitate the study of depression detection, we first establish EATD-Corpus, a publicly available Chinese depression dataset, which comprises audio recordings and text transcripts extracted from the interviews of 162 volunteers.
(2) We then propose a novel method for automatic depression detection. In this method, a Gated Recurrent Unit (GRU) model and a Bidirectional Long Short-Term Memory (BiLSTM) model with an attention layer are utilized to summarize representations from audio and text features. In addition, a multi-modal fusion network integrates the summarized features to detect depression.
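The idea of an attention layer that summarizes a sequence of hidden states, followed by fusion of the two modality summaries, can be sketched as below. This is only an illustration under stated assumptions: the text above does not specify the attention form or the fusion operation, so the dot-product scoring, the `query` vector (a stand-in for a learned parameter), the concatenation in `fuse`, and all function names are hypothetical. The hidden states would come from a BiLSTM or GRU in practice; here they are toy numbers.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Vector], query: Vector) -> Vector:
    """Summarize per-step hidden states as one vector.

    Assumption: weights are softmax(h_t . query); the source does not
    say which attention variant the authors use.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]


def fuse(audio_summary: Vector, text_summary: Vector) -> Vector:
    """Assumption: fusion starts from simple concatenation of the two summaries."""
    return list(audio_summary) + list(text_summary)


# Toy example: two text time steps with 2-dimensional hidden states.
text_states = [[1.0, 0.0], [0.0, 1.0]]
text_summary = attention_pool(text_states, query=[0.0, 0.0])  # zero query gives uniform weights
fused = fuse([0.1], text_summary)
```

A zero query gives every time step the same weight, so the pooled vector is just the mean of the hidden states. A learned query would instead emphasize the steps most relevant to the prediction.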
✔️ 3. EATD-CORPUS
Depression datasets are quite scarce [15–19]. To the best of our knowledge, only two publicly available datasets address depression detection. The first is DAIC-WoZ, which contains recordings and transcripts of 142 American participants who were clinically interviewed by a computer agent [16]. The second is AViD-Corpus [20], which contains audio and video recordings of German participants answering a set of queries or reciting fables. However, its authors do not provide transcripts.
In this work, we release a new Chinese depression dataset, namely EATD-Corpus, to facilitate research in depression detection. EATD-Corpus consists of audio recordings and text transcripts extracted from the interviews of 162 student volunteers recruited from Tongji University. All the volunteers have signed informed consent forms and guarantee the authenticity of all the information provided. Each volunteer is required to answer three randomly selected questions and complete an SDS questionnaire. The SDS questionnaire consists of 20 items which rate the four common characteristics of depression: the pervasive effect, the physiological equivalents, other disturbances, and psychomotor activities [21]. SDS is a questionnaire commonly used by psychologists to screen depressed individuals in practice. A raw SDS score can be summarized from the questionnaire. For Chinese people, an index SDS score (i.e. raw SDS score×1.25) greater than or equal to 53 implies that the respondent is depressed [22]. According to this criterion, there are 30 depressed volunteers and 132 non-depressed volunteers in EATD-Corpus. The overall duration of the response audio in the dataset is about 2.26 hours.
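The screening rule above is simple arithmetic, and a minimal sketch of it follows. The factor 1.25, the threshold 53 and the item count 20 come from the text. The 1-to-4 rating range per item is an assumption based on the standard SDS form and is not stated above, and the function names are illustrative only.

```python
from typing import Sequence

SDS_ITEM_COUNT = 20          # stated in the text
INDEX_FACTOR = 1.25          # stated in the text
DEPRESSION_THRESHOLD = 53.0  # stated in the text (criterion for Chinese respondents)


def raw_sds_score(item_ratings: Sequence[int]) -> int:
    """Sum the 20 item ratings.

    Assumption: each rating is an integer from 1 to 4 and any
    reverse-scored items have already been converted.
    """
    if len(item_ratings) != SDS_ITEM_COUNT:
        raise ValueError(f"expected {SDS_ITEM_COUNT} ratings, got {len(item_ratings)}")
    if any(r not in (1, 2, 3, 4) for r in item_ratings):
        raise ValueError("each rating must be 1, 2, 3 or 4")
    return sum(item_ratings)


def index_sds_score(raw_score: int) -> float:
    """Index SDS score = raw SDS score x 1.25."""
    return raw_score * INDEX_FACTOR


def is_depressed(raw_score: int) -> bool:
    """Apply the >= 53 criterion to the index score."""
    return index_sds_score(raw_score) >= DEPRESSION_THRESHOLD


example_raw = raw_sds_score([2] * SDS_ITEM_COUNT)  # every item rated 2
example_index = index_sds_score(example_raw)
```

Under this rule a raw score of 42 gives an index of 52.5 and falls below the threshold, while a raw score of 43 gives 53.75 and meets it.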
The process of constructing EATD-Corpus consists of two steps: data collection and data preprocessing.
✔️ Data collection.
An app, in which a virtual interviewer asks the interviewee three questions, was developed to conduct the interviews and collect audio responses. The interviewees can record their responses and upload the response audio online. Besides, each volunteer is required to complete an SDS questionnaire, the score of which indicates the depression severity. Currently, 162 volunteers have successfully finished online interviews. Based on their SDS scores, 30 volunteers are regarded as depressed and the other 132 volunteers as non-depressed.
✔️ Data preprocessing.
Several preprocessing operations have been perform