The FER (facial expression recognition) dynamic model described in this passage consists of several core components:
To leverage the most effective architectures and build a robust FER model, we employed the Transformer-based temporal aggregation method as well. The implemented FER dynamic model is schematically depicted in Figure 3. Thus, the dynamic model consists of a static feature extractor and the temporal part that consists of three consecutive Transformer-encoder-based layers inspired by [43]. Lastly, the classification or regression head completes the decision-making process. For the regression case, a Tanh activation function is used.
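The last stage in the passage, a head that is either a classifier or a Tanh-activated regressor, can be illustrated with a small NumPy sketch. It assumes the temporal part has already been pooled into one vector per clip and that the regression targets (for example valence and arousal) lie in [-1, 1], which is exactly the range Tanh produces. The function name `fer_head`, the 16-dimensional input, the two regression outputs, and the seven emotion classes are illustrative assumptions, not details from the source.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fer_head(seq_repr, w, b, task="regression"):
    """Hypothetical head on top of a pooled temporal representation.

    seq_repr: (batch, d) array, one vector per clip.
    regression     -> Tanh bounds every output to [-1, 1].
    classification -> softmax probabilities over emotion classes.
    """
    logits = seq_repr @ w + b  # single linear layer
    if task == "regression":
        return np.tanh(logits)
    if task == "classification":
        return softmax(logits)
    raise ValueError(f"unknown task: {task}")

rng = np.random.default_rng(0)
d = 16
clip = rng.normal(size=(4, d))  # 4 clips, 16-dim pooled features (illustrative)

# Regression: 2 outputs per clip, e.g. valence and arousal (assumed).
va = fer_head(clip, rng.normal(size=(d, 2)), np.zeros(2))
# Classification: 7 emotion classes (assumed).
probs = fer_head(clip, rng.normal(size=(d, 7)), np.zeros(7), task="classification")
print(va.shape, probs.shape)
```

Tanh keeps each regression output inside [-1, 1] regardless of the logit magnitude, while the classification branch returns a probability distribution over classes for each clip.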
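- Static feature extractor: this module extracts facial-expression features from each video frame independently. "Static" means that every feature vector is computed from a single frame, with no access to neighboring frames. It therefore captures what is visible at that instant (for example the overall facial configuration, contours, and positions of facial landmarks) but not how the face changes over time. These per-frame features carry the emotional information present in a single image and serve as the input to the subsequent temporal processing.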
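- Temporal part: this module models how facial expressions change over time. To capture temporal information, the model stacks three consecutive Transformer-encoder-based layers. The Transformer encoder is an architecture for processing sequential data: its self-attention mechanism lets the representation of each frame draw on every other frame in the sequence.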
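The two components above can be sketched in NumPy to show how they divide the work. This is a minimal illustration under stated assumptions, not the authors' implementation: `static_extractor` is a hypothetical random ReLU projection standing in for the real per-frame CNN, each `EncoderLayer` is a simplified single-head, post-norm encoder layer without learnable LayerNorm parameters, and the sinusoidal positional encoding, the layer sizes, and the final mean pooling are assumptions rather than details from the passage. Only the overall structure (per-frame features followed by three stacked encoder layers) follows the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def layer_norm(x, eps=1e-5):
    # Normalize each frame's vector; learnable gain/bias omitted for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def static_extractor(frames, w):
    """Hypothetical stand-in for a per-frame CNN: ReLU(frame @ w), one frame at a time."""
    return np.stack([np.maximum(f @ w, 0.0) for f in frames])

def sinusoidal_positions(T, d):
    """Standard sinusoidal positional encoding (d must be even)."""
    pos = np.arange(T)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

class EncoderLayer:
    """Simplified single-head, post-norm Transformer encoder layer."""
    def __init__(self, d, d_ff, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(scale=s, size=(d, d)) for _ in range(4))
        self.w1 = rng.normal(scale=s, size=(d, d_ff))
        self.w2 = rng.normal(scale=1.0 / np.sqrt(d_ff), size=(d_ff, d))

    def __call__(self, x):
        # Self-attention over time: each frame attends to every frame in the clip.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))  # (T, T) attention weights
        x = layer_norm(x + (attn @ v) @ self.wo)
        # Position-wise feed-forward block, also with a residual connection.
        ff = np.maximum(x @ self.w1, 0.0) @ self.w2
        return layer_norm(x + ff)

T, pixels, d = 10, 64, 32  # illustrative sizes
w_static = rng.normal(scale=1.0 / np.sqrt(pixels), size=(pixels, d))
encoder = [EncoderLayer(d, 4 * d, rng) for _ in range(3)]  # three consecutive layers

def encode(frames):
    """Per-frame static features, then three stacked encoder layers."""
    feats = static_extractor(frames, w_static)      # (T, d), no temporal context yet
    h = feats + sinusoidal_positions(*feats.shape)  # inject frame order
    for layer in encoder:
        h = layer(h)
    return feats, h

frames = rng.normal(size=(T, pixels))  # flattened frames of one clip
feats, h = encode(frames)
clip_repr = h.mean(axis=0)  # mean pooling over time (an assumption)
print(feats.shape, h.shape, clip_repr.shape)
```

Perturbing a single frame changes only that frame's static features, yet it changes the encoder output of every frame, because self-attention mixes information across the whole sequence. That is the division of labor between the two components.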