The Introduction and Related work sections are skipped.
Feature extraction
The current DAIC-WOZ dataset [10] includes audio recordings, audio features, interview transcripts, video features, pixel coordinates for 68 2D facial landmarks, world coordinates for 68 3D facial landmarks, gaze vectors, head-pose vectors, Histogram of Oriented Gradients (HOG) features, emotion labels, and Action Unit (AU) labels. (This paper computes its features from those provided with the DAIC-WOZ dataset.)
High-level features are those that can be translated into common-sense knowledge; for instance, head motion, blinking, facial expressions, AUs, and text-related features can be annotated with a high degree of certainty by a human expert.
Low-level features, on the other hand, are derived from image-processing algorithms that extract descriptors from an image but cannot be directly translated into human knowledge. The low-level features extracted here are the Landmark Motion History Images combined with LBP and HOG, the Landmark Motion Magnitude, and most of the audio-based features.
Visual features
Landmark motion history images
I didn't fully understand this part, so here is a rough summary.
The Landmark Motion History Image (LMHI) is computed from the provided 2D coordinate points, without the actual pixel intensities of the video frames. The LMHI encodes the motion of the facial landmarks into a grayscale image, where the most recent motion corresponds to white pixels and the earliest motion to the darkest gray.
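The idea above can be sketched as follows. This is a minimal illustration of the general motion-history-image principle applied to landmark coordinates, not the paper's exact implementation: each frame's 68 landmark points are rasterised into an image with a gray level that grows linearly with the frame index, so later motion overwrites earlier motion with brighter values. The function name, the linear gray ramp, and the single-pixel rasterisation are all assumptions for illustration.

```python
import numpy as np

def landmark_mhi(landmarks, height, width):
    """Sketch of a Landmark Motion History Image.

    landmarks: array of shape (T, 68, 2) holding (x, y) pixel
    coordinates of the facial landmarks for T frames.
    Returns a (height, width) uint8 image in which the earliest
    frame leaves the darkest gray marks and the last frame white.
    """
    mhi = np.zeros((height, width), dtype=np.uint8)
    T = len(landmarks)
    for t, frame in enumerate(landmarks):
        # Map frame index t to a gray level in (0, 255]:
        # earliest frame -> darkest gray, last frame -> 255 (white).
        gray = int(round(255 * (t + 1) / T))
        for x, y in frame:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < width and 0 <= yi < height:
                # Later frames overwrite earlier ones, so the
                # brightest pixels mark the most recent motion.
                mhi[yi, xi] = gray
    return mhi
```

A landmark that stays still therefore collapses to a single bright pixel, while a moving landmark leaves a trail whose brightness gradient encodes the direction of motion in time.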