Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting (Reading Notes)

This paper proposes combining the VideoMAE pre-trained model with optical flow to improve the robustness of macro- and micro-expression spotting in long videos, optimizing detection performance through self-supervised learning, interval fusion, and post-processing strategies.
An ACM conference paper from the Institute of Automation, Chinese Academy of Sciences, on macro- and micro-expression spotting.
Abstract:
In this paper, we propose a pre-trained model combined with the optical flow method to improve the accuracy and robustness of macro- and micro-expression spotting.
1. Introduction
In general, tasks related to micro-expression include two main aspects: micro-expression spotting in long videos and emotion recognition in micro-expression clips.
At the same time, long videos inevitably contain interfering factors such as blinking and head shaking, which make the micro-expression spotting task challenging.
We propose to utilize the mutual integration of pre-trained models and the optical flow method [6,7] to spot macro-expression and micro-expression segments in long videos.
Given the small intensity and short duration of micro-expressions, the dense optical flow method can effectively capture optical flow features between frames, and dividing the face into ROIs (regions of interest) can effectively mitigate interference from regions of no interest.
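The paper gives no code for this step, but as a rough illustration, the sketch below computes dense optical flow with OpenCV's Farneback method and aggregates its magnitude inside a few ROI boxes. The ROI coordinates, the mean-magnitude aggregation, and all parameter values are placeholder assumptions, not the paper's settings.

```python
# Minimal sketch: dense optical flow aggregated over facial ROIs (illustrative only).
import cv2
import numpy as np

def roi_flow_magnitude(prev_gray, curr_gray, rois):
    """Mean optical-flow magnitude inside each ROI.

    rois: list of (x, y, w, h) boxes, e.g. around eyebrows and mouth corners.
          The boxes are placeholders; the paper derives ROIs from facial landmarks.
    """
    # Farneback dense optical flow between two consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Aggregate motion only inside the regions of interest, ignoring the rest of the face.
    return [float(mag[y:y + h, x:x + w].mean()) for (x, y, w, h) in rois]
```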
Contributions of this paper:
1. We pre-train a VideoMAE-based model and fine-tune it on micro-expression datasets to spot macro- and micro-expressions in long videos.
2. We explore the optimal combination method by training multiple macro- and micro-expression models at different granularities and generating expression clips of different lengths.
3. We propose a fusion strategy and post-processing method to complement the spatio-temporal information not captured by the model and to exclude interference from regions of no interest.
3. Proposed Method
As shown in the general framework diagram in Figure 1, our method is divided into three parts: dataset preprocessing, self-supervised training based on VideoMAE [20], and interval fusion and post-processing strategies.
3.1 Data Preprocessing
For the training and test sets, we use the Dlib toolkit [21] to detect the facial landmarks in each frame and crop out the faces based on the landmark points.
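As a hedged sketch of this preprocessing step, the snippet below uses Dlib's frontal face detector and the standard 68-point shape predictor to locate landmarks and crop a box around them. The predictor file name and the crop margin are assumptions for illustration, not details stated in the paper.

```python
# Minimal sketch of the preprocessing: detect landmarks with Dlib and crop the face.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def crop_face(frame_bgr, margin=10):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None  # no face found in this frame
    shape = predictor(gray, faces[0])
    pts = np.array([(p.x, p.y) for p in shape.parts()])
    # Crop a bounding box around the 68 landmark points, with a small margin.
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    h, w = gray.shape
    return frame_bgr[max(0, y0):min(h, y1), max(0, x0):min(w, x1)]
```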
3.2 Self-Supervised Learning Based on VideoMAE
VideoMAE [20] is a self-supervised video pre-training method based on a video masked autoencoder [22,23]. It treats the temporal dimension of video as the temporal evolution of a still image and addresses semantic redundancy and temporal correlation in video.
Specifically, VideoMAE consists of four main modules: temporal block (cube) embedding, tube masking, a high-capacity encoder, and a lightweight decoder.
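To make the tube-masking idea concrete, here is a minimal NumPy sketch: one spatial patch mask is drawn at random and repeated across all temporal blocks, so masked positions form tubes through time. The grid sizes and the 90% masking ratio are illustrative defaults, not necessarily the values used in this work.

```python
# Minimal sketch of VideoMAE-style tube masking: the same spatial patch mask is
# shared across the temporal axis so masked patches form "tubes" across frames.
import numpy as np

def tube_mask(num_temporal_blocks, num_spatial_patches, mask_ratio=0.9, rng=None):
    rng = rng or np.random.default_rng()
    num_masked = int(num_spatial_patches * mask_ratio)
    # Draw one spatial mask and reuse it for every temporal block.
    spatial_mask = np.zeros(num_spatial_patches, dtype=bool)
    spatial_mask[rng.choice(num_spatial_patches, num_masked, replace=False)] = True
    return np.tile(spatial_mask, (num_temporal_blocks, 1))  # shape: (T_blocks, patches)
```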
3.3 Interval Fusion and Post-Processing Strategies
A post-processing strategy is then applied: if the IoU between an expression interval output by the model and an expression interval output by the optical flow method [29] is greater than or equal to a certain threshold, the interval is retained.
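A minimal sketch of this filtering rule, assuming intervals are given as (onset, offset) frame pairs and using an illustrative IoU threshold of 0.5 (the threshold value is not fixed here in the notes):

```python
# Keep a model-predicted interval only if it overlaps some optical-flow interval
# with IoU >= a threshold (threshold value assumed for illustration).
def temporal_iou(a, b):
    """IoU of two frame intervals given as (onset, offset)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def filter_by_flow(model_intervals, flow_intervals, iou_thresh=0.5):
    return [m for m in model_intervals
            if any(temporal_iou(m, f) >= iou_thresh for f in flow_intervals)]
```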
5. Conclusion
We propose to utilize a pre-trained model combined with an optical flow method to spot macro-expression and micro-expression segments in long videos. The goal of the method is to spot micro- and macro-expressions automatically.
Therefore, in the future, we hope to investigate methods with greater robustness.