The Introduction and Related work sections are skipped.
Feature extraction
The current DAIC-WOZ dataset [10] includes audio recordings, audio features, interview transcripts, video features, pixel coordinates for 68 2D facial landmarks, world coordinates for 68 3D facial landmarks, gaze vectors, head-pose vectors, Histogram of Oriented Gradients (HOG) features, emotion labels, and Action Unit (AU) labels. (This paper computes its features from those provided with the DAIC-WOZ dataset.)
High-level features are those that can be translated into common-sense knowledge; for instance, head motion, blinking, facial expressions, AUs, and text-related features can be annotated with a high degree of certainty by a human expert.
Low-level features, on the other hand, are derived from image-processing algorithms that extract descriptors from an image but cannot be directly translated into human knowledge. The low-level features extracted here are the Landmark Motion History Images combined with LBP and HOG, the Landmark Motion Magnitude, and most of the audio-based features.
Visual features
Landmark motion history images
I didn't fully understand this part, so here is a rough summary.
The Landmark Motion History Image (LMHI) is computed from the provided 2D coordinate points, without the actual pixel intensities of the video frames. The LMHI encodes the motion of the facial landmarks into a grayscale image, where the most recent motion corresponds to white pixels and the earliest motion to the darkest gray.
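The idea above can be sketched as follows. This is a minimal illustration of the general motion-history-image principle applied to landmark coordinates, not the paper's exact implementation: each frame's 68 landmark points are rasterised into an image with a gray level that grows linearly with the frame index, so later motion overwrites earlier motion with brighter values. The function name, the linear gray ramp, and the single-pixel rasterisation are all assumptions for illustration.

```python
import numpy as np

def landmark_mhi(landmarks, height, width):
    """Sketch of a Landmark Motion History Image.

    landmarks: array of shape (T, 68, 2) holding (x, y) pixel
    coordinates of the facial landmarks for T frames.
    Returns a (height, width) uint8 image in which the earliest
    frame leaves the darkest gray marks and the last frame white.
    """
    mhi = np.zeros((height, width), dtype=np.uint8)
    T = len(landmarks)
    for t, frame in enumerate(landmarks):
        # Map frame index t to a gray level in (0, 255]:
        # earliest frame -> darkest gray, last frame -> 255 (white).
        gray = int(round(255 * (t + 1) / T))
        for x, y in frame:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < width and 0 <= yi < height:
                # Later frames overwrite earlier ones, so the
                # brightest pixels mark the most recent motion.
                mhi[yi, xi] = gray
    return mhi
```

A landmark that stays still therefore collapses to a single bright pixel, while a moving landmark leaves a trail whose brightness gradient encodes the direction of motion in time.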