③ (北科)Semi-Structural Interview-Based Chinese Multimodal Depression Corpus Towards...

from:
Paper link
Dataset available
Journal: IEEE Transactions on Affective Computing
Full text: 16 pages

Abstract

Abstract—Depression is a common psychiatric disorder worldwide. However, in China, a considerable number of patients with depression are not diagnosed, and most of them are not aware of their depression. Despite increasing efforts, the goal of automatic depression screening from behavioral indicators has not been achieved. A major limitation is the lack of an available multimodal depression corpus in Chinese, since linguistic knowledge is crucial in clinical practice. Therefore, we first carried out a comprehensive survey with psychiatrists from a renowned psychiatric hospital to identify key interview topics that are highly related to the diagnosis of depression. Then, a semi-structural interview study was conducted over a year with subjects who had undergone clinical diagnosis and professional assessment. After that, visual, acoustic, and textual features were extracted and compared between the two groups, and statistically significant differences were observed in all three modalities. Benchmark evaluations of both single-modal and multimodal fusion methods for depression assessment were also performed, and a multimodal transformer-based fusion approach achieved the best performance. Finally, the proposed Chinese Multimodal Depression Corpus (CMDC) was made publicly available after de-identification and annotation. Hopefully, the release of this corpus will promote the research progress and practical applications of automatic depression screening.
Index Terms—Affective computing, depressive disorder, multimodal corpus, semi-structural interview


1 INTRODUCTION

DEPRESSION, otherwise known as major depressive disorder (MDD), is a common psychiatric disorder that negatively impacts a person's thinking, feelings, and behavior [1]. With the rapid development of society and the increasing pressure of people's work and life, depression has become one of the most common and serious mental diseases worldwide [2]. To date, the number of patients with depression in China has risen to 95 million, making China the country with the largest number of depressive patients in the world [3]. According to an epidemiological survey, the lifetime prevalence of depression in China is about 6.9%, less than 10% of patients with depression have received professional assistance and treatment, and a considerable number of patients are not aware of their depression [4]. On the other hand, among the few patients who seek treatment in time, the first hospital most of them visit is neither a psychiatric hospital nor the psychiatric department of a general hospital, which easily leads to misdiagnosis and ultimately delays treatment. Consequently, on September 11, 2020, the National Health Commission of China issued the first action plan for the prevention and control of depression, entitled "Action Plan for Explorations of Specialized Services for the Prevention and Treatment of Depression", which includes the routine screening of depression throughout the country [5].

Screening and prevention of depression are of great significance. However, traditional questionnaire-based screening of depression faces a shortage of well-trained healthcare personnel, since clinical interview-based screening is labor-intensive and self-evaluation questionnaires lack accuracy [6]. Many symptoms of depression are considered observable [7], [8], [9]. The Diagnostic and Statistical Manual of Mental Disorders (DSM), the standard of psychiatric diagnosis, describes a series of audiovisual behavioral indicators of depression [10]. However, these indicators are often not fully considered when screening, diagnosing, and evaluating depression [11]. The assessment of depression relies almost entirely on patients' orally reported symptoms described in particular questionnaires [12], such as the clinician-administered Hamilton Rating Scale for Depression (HAMD) [13] and the self-report Patient Health Questionnaire (PHQ-9) [14]. Although these tools are useful, they include neither visual, acoustic, nor textual indicators of depression. To overcome this limitation, recent advances in techniques for automatic analysis of human behavior, such as computer vision, speech signal processing, natural language understanding, and multimodal learning, could play an important role [15], [16].

note: current assessment tools

  • Hamilton Rating Scale for Depression (HAMD): rated by a clinician.
  • Patient Health Questionnaire (PHQ-9): self-reported by the patient.
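Because PHQ-9 totals are later used as labels in this corpus, a minimal scoring sketch may be useful for readers unfamiliar with the instrument. The 0-27 total and the severity bands follow the standard PHQ-9 convention rather than anything defined in this paper, and the example item ratings are hypothetical.

```python
# Minimal sketch of PHQ-9 scoring (standard convention, not defined by this paper):
# nine items, each rated 0-3, summed to a 0-27 total and mapped to a severity band.
from typing import Sequence

def phq9_score(item_ratings: Sequence[int]) -> tuple[int, str]:
    assert len(item_ratings) == 9 and all(0 <= r <= 3 for r in item_ratings)
    total = sum(item_ratings)
    bands = [(4, "minimal"), (9, "mild"), (14, "moderate"),
             (19, "moderately severe"), (27, "severe")]
    severity = next(label for upper, label in bands if total <= upper)
    return total, severity

print(phq9_score([1, 2, 1, 0, 2, 1, 1, 0, 1]))  # hypothetical ratings -> (9, 'mild')
```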

There is considerable research interest in developing tools to automatically analyze the video [17], audio [18], [19], [20], and text [21] content of clinical interviews as a means of aided medical diagnosis [19], [22], [23], [24], [25]. Despite increasing efforts, the goal of automatic, reliable, and objective screening of depression from behavioral indicators has not been achieved [26]. Because of the huge population and the prevalence of depressive disorder in China, the construction of a multimodal depression corpus with semi-structural interviews in Chinese would help promote technology-assisted screening of depression, which is expected to enable automatic primary screening and reduce the medical burden on society.
One challenge of automatic depression screening is the lack of available multimodal datasets that contain behavioral observations of patients with clinically validated depression [27], [28]. Therefore, in this paper, we propose the Chinese Multimodal Depression Corpus (CMDC), a publicly available multimodal Chinese depression dataset based on clinically validated semi-structural interviews for AI-enabled depression screening, diagnosis, and assessment. The contributions of this paper are as follows:

  • Key interview topics for the development of automatic depression screening tools are identified. We performed a comprehensive survey of clinicians specializing in MDD from a renowned psychiatric hospital in China to identify key interview questions that are highly related to the diagnosis of depression and can be used for AI-enabled automatic depression screening, as shown in Fig. 1.
  • We conducted semi-structural interviews based on the key interview questions with MDD patients and healthy control (HC) subjects who had undergone clinical diagnosis and professional assessment of symptom severity. During the interviews, the video and audio of participants were recorded simultaneously, and the audio was transcribed into text through automatic transcription tools and proofreading.
  • Statistically significant differences in visual, acoustic, and textual features between the MDD and HC groups are revealed through statistical analysis. These significant differences confirm the feasibility of automatic depression analysis with machine learning methods.
  • Comprehensive benchmark evaluations are conducted on the proposed dataset to provide a basis for follow-up depression assessment research. A multimodal transformer-based fusion approach is applied to depression analysis and achieved the best result in the evaluation.
  • The proposed Chinese multimodal dataset was made publicly available after annotation and de-identification. To distribute the dataset publicly, we extracted features from the recorded videos and audios of the key questions with open-source toolkits to eliminate personal information (a hedged extraction sketch follows this list).
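The contribution list above does not name the extraction toolkits; as one illustrative possibility (not necessarily the CMDC pipeline), the openSMILE Python wrapper can turn a recorded answer into eGeMAPS acoustic functionals so that raw, identifiable audio never needs to be released. The file names below are hypothetical.

```python
# Illustrative sketch only: extract eGeMAPS acoustic functionals from one recorded
# answer with the openSMILE Python wrapper, releasing features instead of raw audio.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # 88 acoustic functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("answer_topic01.wav")    # hypothetical file name
features.to_csv("answer_topic01_egemaps.csv")          # de-identified feature table
print(features.shape)                                   # (1, 88)
```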

We believe that the release of this dataset will promote the research progress and practical application of depression screening and assessment with core affective computing technologies, which could have significant scientific value and broad applicability to societal questions of mental health.

This paper is organized as follows. Section 2 reviews related work on depression-related behavioral patterns, assessment methods, and multimodal datasets. Sections 3 and 4 describe the experiment details and feature extraction procedures, respectively. Section 5 presents the results of the statistical analysis and benchmark evaluations. Finally, Section 6 concludes the paper.

2 RELATED WORK

2.1 Depression Related Behavioral Patterns

Previous studies have shown significant differences between MDD and HC subjects in several aspects of behavioral signals, such as visual, acoustic, and textual cues observed during interviews [29], [30]. Depression can be depicted in a patient's appearance (facial expression and body posture) [31]. Both global and local facial features, such as the eye and mouth areas, are of particular interest for depression assessment. For instance, the eye movements of depressed patients were shown to differ statistically significantly from those of the HC group [32]. It was reported that larger downward gaze angles, shorter average duration of smiles, and lower smile intensity are the most significant facial cues of depression [33]. Findings regarding psychomotor disturbance in bipolar disorders showed an increase in reaction time in saccadic tasks [34]. Acoustic features were also found to differ consistently between MDD and HC with large effect sizes [35], [36]. Decreased speech rate and longer reaction time were found in depressed subjects [37]. Pitch and loudness are widely used features in depression detection studies and have been shown to have a negative relationship with depression severity [38], [39], [40]. Other studies show that textual features also play an important role in depression detection, indicating the importance of semantic information [41], [42], [43]. These explorations facilitate the interpretation of depressive behavior, since they are obtained through multimodal behavior analysis of clinically matched depression-control datasets. Thus, it is important to identify the most meaningful patterns of depressive behavior, since behavior is associated with depression-related symptoms in psychiatric studies [27].
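As a concrete illustration of the kind of group-level comparison these studies perform, the sketch below tests whether a single per-subject feature differs between MDD and HC groups. The feature values are invented for illustration, and the specific tests used in the cited studies may differ.

```python
# Minimal sketch of a group-level statistical comparison: does one extracted feature
# (e.g., mean speech rate per subject; values below are made up) differ between groups?
import numpy as np
from scipy import stats

mdd_speech_rate = np.array([2.9, 3.1, 2.7, 3.0, 2.8, 2.6])   # hypothetical MDD values
hc_speech_rate = np.array([3.8, 4.0, 3.6, 4.1, 3.9, 3.7])    # hypothetical HC values

# Welch's t-test (no equal-variance assumption); Mann-Whitney U is a common
# non-parametric alternative when normality is doubtful.
t_stat, p_value = stats.ttest_ind(mdd_speech_rate, hc_speech_rate, equal_var=False)
u_stat, p_mwu = stats.mannwhitneyu(mdd_speech_rate, hc_speech_rate, alternative="two-sided")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}; Mann-Whitney U = {u_stat}, p = {p_mwu:.4f}")
```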

2.2 Depression Assessment With Behavioral Indicators

There is an increasing number of studies on behavioral-indicator-based depression detection, and the research trend extends from traditional manually designed features to more advanced deep learning methods [19], [26], [34], [43], [44], [45], [46]. For the visual modality, AUs, eye gaze, head poses, and facial landmarks are common features for depression-related analysis and can be combined with machine/deep learning tools for depression detection. Pampouchidou et al. [34] gave a systematic review of visual-cue-based methods. Recent studies have begun to pay more attention to the dynamics of visual information with spatial-temporal modeling architectures [26] and graph neural networks [47]. As for the audio modality, Mel filters, spectrograms, and emotion feature sets, e.g., eGeMAPS, are widely adopted [48], [49]. Audio models pretrained on large-scale datasets have been deployed as feature extractors to cope with the bottleneck of small sample sizes; for example, in [19], pretrained VGGish [50] was used to extract a sentence-level vector for each speech segment, and an LSTM network with a self-attention mechanism was then trained on the downstream classification task. A recent review of deep-learning-based depression analysis with audiovisual cues can be found in [51]. Semantic information is also of great importance in depression detection, and previous studies have suggested superior performance of textual features [52]. Contextual sentence embeddings, such as BERT [53], were shown to achieve better performance than word-level embeddings [41]. However, a recent study that used a graph neural network to form embeddings of specific word-vector nodes showed that their method outperforms previous state-of-the-art methods by a substantial margin [45].
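To make the pretrained-embedding plus sequence-model pattern from [19] concrete, here is a minimal sketch in PyTorch; the layer sizes and the attention pooling are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch (assumed architecture, not the model from [19]): pretrained
# sentence-level audio embeddings fed to an LSTM with simple self-attention pooling
# for binary depression classification.
import torch
import torch.nn as nn

class LSTMAttnClassifier(nn.Module):
    def __init__(self, embed_dim=128, hidden_dim=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # scores each time step
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                              # x: (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                            # (batch, seq_len, 2*hidden_dim)
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, seq_len, 1)
        pooled = (weights * h).sum(dim=1)              # attention-weighted pooling
        return self.fc(pooled)

# One interview = a sequence of segment embeddings (e.g., 128-d VGGish vectors).
model = LSTMAttnClassifier()
dummy = torch.randn(4, 20, 128)                        # 4 interviews, 20 segments each
print(model(dummy).shape)                              # torch.Size([4, 2])
```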

The results of several years of the Audio/Visual Emotion Challenge and Workshop (AVEC) showed that methods based on multimodal fusion usually achieve better results [48], [54], [55]. In terms of fusion paradigm, early fusion, feature-level fusion, and late fusion have all been explored in relevant research [46]. With the success of transformers in natural language understanding and computer vision tasks, transformer-based fusion methods have also been proposed and have demonstrated their advantages on temporal data by automatically aligning and capturing complementary features [56]. In summary, a wide range of studies has been conducted in the field of depression detection based on multimodal features, and considerable progress has been made in performance, which supports the practicability of preliminary screening for depression based on semi-structured interviews.
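The following sketch illustrates one transformer-style fusion design in the spirit described above: one modality attends over another, and the fused sequence is pooled for classification. It is an assumed, simplified layout, not the specific fusion model benchmarked in this paper.

```python
# Illustrative cross-modal fusion sketch (assumed design): audio features attend over
# text features, a transformer encoder layer refines the fused sequence, and the
# mean-pooled representation is classified.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=64, num_heads=4, num_classes=2):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, audio, text):            # (batch, seq_a, dim), (batch, seq_t, dim)
        fused, _ = self.cross_attn(query=audio, key=text, value=text)
        fused = self.encoder(fused + audio)    # residual connection, then self-attention
        return self.fc(fused.mean(dim=1))      # mean-pool over time

model = CrossModalFusion()
out = model(torch.randn(2, 30, 64), torch.randn(2, 12, 64))
print(out.shape)                               # torch.Size([2, 2])
```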

2.3 Multimodal Datasets for Depression Assessment

Well-labeled multimodal recordings of clinically relevant behavioral differences between depressive and healthy subjects are necessary for an automatic screening system to train classifiers [26]. Clinical datasets are hard to construct because of the difficulty of participant recruitment, and they are usually publicly unavailable due to the confidentiality of patient data. Table 1 summarizes interview-based datasets for depression assessment. We put these together to conduct a thorough analysis and to highlight the strengths of the proposed dataset.

Among them, the Distress Analysis Interview Corpus (DAIC-WOZ) [57], the University of Pittsburgh depression dataset (Pitt) [7], and the Black Dog Institute depression dataset [28] are three influential depression datasets. Specifically, the BlackDog dataset was collected in a depression-specialized clinical research facility. Its interviews were conducted with specific open-ended questions, such as portraying occasions in the subject's life that had elicited significant feelings. Until now, the dataset has not been made public. The Pitt dataset contains 49 participants in a clinical trial for the treatment of depression [7]. All participants met DSM-IV criteria for MDD, and the severity of depression was evaluated with the HAMD-17. This dataset is distributed upon request. The DAIC-WOZ dataset contains audio and facial features of depressed patients and control subjects. Expert-evaluated HAMD-17 and self-report PHQ-8 scores are provided for each patient. This archive was also created from semi-structural interviews in which research assistants or a computer agent asked a series of questions designed to identify depressive symptoms [57]. There are also depression detection datasets from the AVEC, a series of competitions held over several years [49]. AVEC 2013, 2014, 2016, 2017, and 2019 included sub-challenges on depression detection. The datasets in AVEC 2013 and 2014 consist of task-driven behavioral observations of depressive people and are not interview-based; the 2016, 2017, and 2019 editions all employed subsets of the DAIC-WOZ dataset.


note:

  1. Main datasets:
  • BlackDog: collected in a depression-specialized clinical research facility through interviews with specific open-ended questions; not yet publicly released.
  • Pitt: 49 participants in a clinical trial for the treatment of depression; distributed upon request; assessed against DSM-IV criteria with HAMD-17 severity ratings.
  • DAIC-WOZ: audio and facial-feature recordings collected through semi-structural interviews, with HAMD-17 and PHQ-8 scores provided.
  2. AVEC datasets:
  • Background: AVEC is a challenge series held over several years, with sub-challenges on depression detection.
  • Characteristics: the 2013 and 2014 datasets are task-driven behavioral observations rather than interview data; the 2016, 2017, and 2019 editions used subsets of DAIC-WOZ.

Besides the above three datasets, Lin et al. [58] introduced a new audio-visual dataset containing full-body videos for distress detection. Currently, only a few studies have attempted to include the body modality, which is worth exploring. In terms of dataset construction, however, their participants were recruited online without clinical diagnosis, and both depression and anxiety participants were included. The comorbidity of depression and anxiety makes it a big challenge to distinguish between the two. For our purpose of developing MDD prescreening, including patients with other psychotic disorders may introduce confounding factors. Video and audio were recorded in their dataset, but no text. Jiang et al.'s study [59] addresses a different research question with a cohort of 12 depressed patients, aiming to assess recovery as well as the response to deep brain stimulation treatment [63]. Their interview was unstructured and the data are not available. Earlier studies such as ORYGEN [64] and MHMC [65] are not included in Table 1 because they are no longer available; moreover, they are not multimodal datasets, containing only video or audio.
In recent years, Chinese depression datasets have also been developed by various studies. Guo et al. [60] proposed a large-scale dataset (208 subjects) with audio and video recordings of interviews covering three categories of questions grouped by emotion polarity. However, the authors stated that the data would not be disclosed due to privacy issues. MODMA [61] is a clinically validated and publicly available dataset, but it only contains audio and EEG signals, and EEG is not feasible for preliminary screening. The newly published dataset by Shen et al. [62] recruited student volunteers from a single university, which lacks diversity in demographic characteristics. Their ground truth was the Self-rating Depression Scale (SDS) [66]. A study with patients in China showed that the SDS is less sensitive than the PHQ-9, with statistically significant differences [67]. Moreover, the visual modality is not available in their dataset.

Considering the goal of preliminary screening of MDD in China, a clinically validated Chinese multimodal depression assessment dataset is certainly beneficial. As to inclusion/exclusion criteria, DAIC-WOZ covers a range of distress-related conditions (depression, post-traumatic stress disorder (PTSD), and anxiety). Whether its participants met DSM criteria was not considered, which we believe to be crucial, since different psychotic disorders may show different behavior patterns. For the BlackDog dataset, Melancholia and MDD patients were treated as one class because of the relatively small sample size. The datasets of [59] and [60] do not state the diagnostic criteria they used, and the datasets of [58] and [62] do not even have clinical diagnoses. Diagnostic criteria matter because depression is confusable with many nondepressive disorders. By using diagnostic criteria and paying attention to the behavioral changes of depression, other confounding factors can be excluded. Besides, most of the interview questions in previous studies were based on different questionnaires with varying numbers of questions and were somewhat arbitrary. The determination of the interview topics is key for prescreening, as it affects both the accuracy and the time cost of the tool. For example, the number of questions determines the screening time, and the robustness of the recognition algorithm benefits from a structured interview.
Therefore, we conducted a comparably large-scale Chinese multimodal depression study with semi-structural interviews under clinically validated diagnosis. As shown in Table 1, the strengths of the proposed dataset are highlighted in the following aspects: first, rigorous inclusion/exclusion criteria, with clinically diagnosed pure MDD patients as subjects; second, well-defined semi-structural interview questions derived from an extensive survey of clinicians specializing in MDD; third, a publicly available multimodal (video, audio, and text) depression dataset in Chinese.

3 EXPERIMENT

3.1 Participants

This was a cross-sectional study conducted from Nov. 2018 to Jan. 2020. The MDD patients were recruited from Beijing Anding Hospital and the HCs were recruited from Beijing Institute of Technology. In total, 78 subjects participated in this experiment, including 26 MDD patients (8 males, 18 females) with a mean age of 24.1 (SD = 5.04, age range 19-30 yr) and 52 HCs (17 males, 35 females) with a mean age of 30.5 (SD = 12.06, age range 20-60 yr). The research was approved by the Independent Medical Ethics Committee of Beijing Anding Hospital (2019 No. 53). Written informed consent was obtained from each subject. Before the experiment, subjects were asked whether they consented to video recording, and the consent form stated that recorded video may be published in research papers without any personal information. Among all subjects, as shown in Table 2, 45 consented to video recording (19 MDD (14 females, mean age = 23.6, SD = 3.45) and 26 HC (20 females, mean age = 30.5, SD = 11.92)); the rest were audio-recorded only. The distributions of PHQ-9 scores for all 78 subjects, as well as for the 45 subjects with video recording, are shown in Fig. 2. The Mini International Neuropsychiatric Interview (MINI) was employed to obtain the diagnosis [68]. All MDD participants met DSM criteria for major depression as assessed by professional psychiatrists. Both HAMD-17 and PHQ-9 were assessed to serve as ground-truth labels for the development of automatic AI tools.
