Three Stream Graph Attention Network using Dynamic Patch Selection for the classification of micro-expressions: Reading Notes

pzb19841116

Original post: https://blog.csdn.net/pzb19841116/article/details/136678011

A micro-expression recognition paper from CVPR 2022, by researchers at the University of California, Riverside. A few notes summarizing it.

Abstract:

This paper presents an end-to-end novel three-stream graph attention network model to capture the subtle changes on the face and recognize micro-expressions (MEs) by exploiting the relationship between optical flow magnitude, optical flow direction, and the node locations features.

In short: a new three-stream graph attention network, trained end-to-end, that recognizes micro-expressions from the relationship among three cues: optical flow magnitude, optical flow direction, and landmark (node) locations.

A facial graph representational structure is used to extract the spatial and temporal information using the three frames. The varying dynamic patch size of optical flow features is used to extract the local texture information across each landmark point.

The facial graph spans three frames, so it carries both spatial and temporal information; around each landmark, an optical-flow patch whose size is chosen dynamically captures the local texture.

1. Introduction

Researchers have been recognizing MEs by using hand-crafted approaches such as Bi-Weighted Oriented Optical Flow (Bi-WOOF) [4], Local Binary Pattern with Three Orthogonal Planes (LBP-TOP) [45], and 3D Histogram of Oriented Gradient (3DHOG) [21] to extract the textural spatio-temporal information.

Earlier work recognized micro-expressions with hand-crafted descriptors such as Bi-WOOF, LBP-TOP and 3DHOG, which extract textural spatio-temporal features.

(i) subtle and brief behavior, (ii) ephemeral and spontaneous change in the facial muscle movements, and (iii) short time duration.

Characteristics of micro-expressions:
1. Subtle and brief behavior.
2. Momentary, spontaneous changes in facial muscle movements.
3. Short duration.

end-to-end training of a graph structure that uses three-stream Graph Attention Network

The whole graph model, with all three streams, is trained end-to-end.

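a self-attention graph pooling layer by exploiting the relationship between the landmark points location, optical flow magnitude and the optical flow direction.

A self-attention graph pooling layer is introduced; the three quantities it relates (landmark location, flow magnitude, flow direction) are the inputs of the three streams.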

use three frames structure connections to exploit the spatio-temporal information.

Landmarks are connected across three frames, so the graph encodes temporal as well as spatial structure.
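A minimal sketch of how landmarks could be wired across three frames. It assumes, since the notes do not spell this out, that every frame reuses the same spatial edge list and that each landmark is linked to the same landmark in the next frame. The function name `spatio_temporal_adjacency` and the toy chain of edges are hypothetical.

```python
import numpy as np

def spatio_temporal_adjacency(spatial_edges, num_landmarks=51, num_frames=3):
    """Build a symmetric (num_frames*num_landmarks)^2 adjacency matrix.

    Hypothetical wiring: the same spatial edges inside every frame, plus a
    temporal edge from landmark i in frame t to landmark i in frame t+1.
    """
    n = num_frames * num_landmarks
    adj = np.zeros((n, n), dtype=np.float32)
    for t in range(num_frames):
        offset = t * num_landmarks
        # Spatial (undirected) edges inside frame t.
        for i, j in spatial_edges:
            adj[offset + i, offset + j] = 1.0
            adj[offset + j, offset + i] = 1.0
        # Temporal edges to the next frame, if there is one.
        if t + 1 < num_frames:
            for i in range(num_landmarks):
                a, b = offset + i, offset + num_landmarks + i
                adj[a, b] = adj[b, a] = 1.0
    return adj

# Toy spatial edges (hypothetical): a simple chain over the 51 landmarks.
edges = [(i, i + 1) for i in range(50)]
A = spatio_temporal_adjacency(edges)
print(A.shape)       # (153, 153)
print(int(A.sum()))  # 2 * (3 * 50) spatial + 2 * (2 * 51) temporal = 504
```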

The varying patch size across each landmark point is dynamically selected based on the optical flow information.

The size of the patch around each landmark is chosen from the optical flow itself rather than fixed in advance.

[Figure: illustration of the varying patch sizes]

To address the unbalanced data samples issue, we use videos from the other datasets of the same class to increase the number of samples. Along with the above data augmentation method, we use various values of magnification factors in EMM [38] techniques to increase the number of data samples for classes with smaller numbers, thus balancing the dataset.

Class imbalance is handled in two ways: borrowing same-class videos from other datasets, and generating extra samples for the smaller classes by running EMM [38] with several different magnification factors.
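The balancing arithmetic can be sketched as follows. This is a hypothetical rule, not the paper's: each smaller class receives enough extra magnified copies (one magnification factor per copy) to roughly reach the size of the largest class. The class counts and factor values are made up for illustration.

```python
import math

def copies_needed(class_counts):
    """Extra magnified copies per video so each class roughly matches the largest one.

    Hypothetical helper: the paper only says several EMM magnification
    factors are used for the smaller classes; the exact rule is not given.
    """
    target = max(class_counts.values())
    return {cls: math.ceil(target / n) - 1 for cls, n in class_counts.items()}

# Made-up class counts, for illustration only.
counts = {"negative": 120, "positive": 40, "surprise": 25}
extra = copies_needed(counts)
print(extra)  # {'negative': 0, 'positive': 2, 'surprise': 4}

# Hypothetical magnification factors: each extra copy uses a different one.
# Note: slicing silently caps a class at len(factors) copies.
factors = [2.0, 4.0, 6.0, 8.0]
plan = {cls: factors[:k] for cls, k in extra.items()}
print(plan)
```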

2. Related Work and Contributions

In MER, the pre-processing stage includes all processes such as image resizing, alignment, motion magnification, and frame selection approaches that must be completed before meaningful feature extraction can begin.

In micro-expression recognition (MER), preprocessing covers resizing, alignment, motion magnification and frame selection, all of which must be done before feature extraction.

We propose an end-to-end landmark-assisted three-stream Graph Attention Network with a self-attention graph pooling, which integrates optical flow magnitude, optical flow direction and the landmark points location features.

Contribution 1: an end-to-end, landmark-assisted three-stream graph attention network with self-attention graph pooling that fuses flow magnitude, flow direction and landmark locations.

We propose a dynamic selection of varying patch size across each landmark points to capture the change in optical flow magnitude and direction features.

Contribution 2: a per-landmark patch size, selected dynamically, to capture changes in optical flow magnitude and direction.

3. Proposed Model

[Figure: overall architecture of the model]

3.1 Landmark Detection and Dynamic Patch Size Selection

The graph is constructed using the 51 landmark points. The points are connected based on the human facial structure.

51 landmarks are used in total, and the edges follow the structure of the face.
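One way to connect 51 landmarks according to facial structure is sketched below. It assumes (the notes do not confirm this) the common 68-point dlib layout with the 17 jaw-line points removed, and it links consecutive points inside each facial region, closing the loops for the eyes and lips. `REGIONS` and `face_graph` are hypothetical names.

```python
import numpy as np

# Hypothetical grouping: 68-point layout minus the 17 jaw points, re-indexed to 0..50.
# Each entry is (first index, last index, closed loop?).
REGIONS = [
    (0, 4, False),    # eyebrow (68-pt indices 17-21)
    (5, 9, False),    # eyebrow (22-26)
    (10, 13, False),  # nose bridge (27-30)
    (14, 18, False),  # lower nose (31-35)
    (19, 24, True),   # eye (36-41)
    (25, 30, True),   # eye (42-47)
    (31, 42, True),   # outer lip (48-59)
    (43, 50, True),   # inner lip (60-67)
]

def face_graph(num_nodes=51, regions=REGIONS):
    """Return the edge list and a symmetric adjacency matrix with self-loops."""
    edges = []
    for start, end, closed in regions:
        idx = list(range(start, end + 1))
        edges += list(zip(idx[:-1], idx[1:]))  # chain consecutive points
        if closed:
            edges.append((end, start))         # close the contour
    adj = np.eye(num_nodes, dtype=np.float32)  # self-loops, as GAT layers usually expect
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return edges, adj

edges, A = face_graph()
print(len(edges), A.shape, int(A.sum()))  # 47 (51, 51) 145
```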

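At this point, we have 51 points with a (max-min) value calculated for an entire video. Next, we calculate the percentile score component of these 51 points of the optical flow magnitude.

For each of the 51 landmarks, the range (max - min) of its optical flow magnitude over the whole video is computed; the 51 ranges are then converted into percentile scores.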
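A sketch of the range-then-percentile step, assuming the optical flow magnitude has already been sampled at each landmark for every frame pair (array shape `(T, 51)`). The percentile uses the "weak" definition (share of landmarks whose range is less than or equal to a given one), which matches `scipy.stats.percentileofscore(..., kind="weak")`; whether the paper uses exactly this definition is an assumption, and `landmark_percentiles` is a hypothetical name.

```python
import numpy as np

def landmark_percentiles(mag_seq):
    """mag_seq: (T, num_landmarks) optical-flow magnitude per landmark and frame pair.

    Returns the (max - min) range of each landmark over the video and the
    percentile rank (0-100] of that range among all landmarks.
    """
    spread = mag_seq.max(axis=0) - mag_seq.min(axis=0)  # (num_landmarks,)
    # Entry [i, j] is True when landmark j's range <= landmark i's range.
    pct = (spread[None, :] <= spread[:, None]).mean(axis=1) * 100.0
    return spread, pct

# Toy input: landmark i moves more the larger i is.
t = np.arange(4, dtype=float)[:, None]          # 4 frame pairs
mags = t * np.arange(51, dtype=float)[None, :]  # shape (4, 51)
spread, pct = landmark_percentiles(mags)
# spread[i] == 3 * i, and pct[-1] == 100.0 because landmark 50 has the largest range.
```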

The patch size is set as follows: [Figure: patch-size selection algorithm]

We select a dynamic patch size in our approach based on the above algorithm to capture the subtle changes of micro-expressions across each landmark point.

With this algorithm every landmark gets its own patch size, so the subtle changes around it are captured at a suitable scale.
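The patch-size rule itself appears only as a figure, so the mapping below is purely hypothetical: landmarks with a larger motion range (higher percentile) get larger patches, with sizes banded by quartile and capped at 10 because patches are later zero-padded to 10×10. `BOUNDS`, `SIZES` and `patch_sizes` are placeholders to be replaced with the paper's actual thresholds.

```python
import numpy as np

# Hypothetical quartile bands (not the paper's thresholds).
BOUNDS = np.array([25.0, 50.0, 75.0])  # upper percentile bound of the first three bands
SIZES = np.array([4, 6, 8, 10])        # patch size N for each band (at most 10)

def patch_sizes(pct, bounds=BOUNDS, sizes=SIZES):
    """Map each landmark's percentile score (0-100] to a patch size N."""
    # pct <= 25 -> band 0, 25 < pct <= 50 -> band 1, ..., pct > 75 -> band 3
    band = np.searchsorted(bounds, pct, side="left")
    return sizes[band]

print(patch_sizes(np.array([10.0, 25.0, 60.0, 100.0])).tolist())  # [4, 4, 8, 10]
```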

[Figure: result of the dynamic patch-size selection]

The optical flow feature matrix of size (N×N) is computed, where N is the patch size selected. After computation of the optical flow feature matrix across each landmark, we zero pad the feature matrix to 10×10 patch size to make computation easier.

Each landmark thus yields an N×N optical-flow patch, which is zero-padded to a fixed 10×10 so that every node has a feature of the same size.
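A sketch of cropping an N×N window of an optical-flow feature map around one landmark and zero-padding it to 10×10. Where the padding goes (here: bottom and right) and how border landmarks are handled (here: parts outside the image become zeros) are assumptions; `landmark_patch` is a hypothetical helper.

```python
import numpy as np

MAX_PATCH = 10  # patches are zero-padded to 10x10, as stated in the paper

def landmark_patch(flow_map, center, n, max_patch=MAX_PATCH):
    """Crop an n x n window of a 2-D feature map around center=(row, col)
    and place it in the top-left corner of a zero max_patch x max_patch array."""
    h, w = flow_map.shape
    r, c = center
    top, left = r - n // 2, c - n // 2
    out = np.zeros((max_patch, max_patch), dtype=flow_map.dtype)
    # Clip the window to the image so landmarks near the border still work.
    r0, r1 = max(top, 0), min(top + n, h)
    c0, c1 = max(left, 0), min(left + n, w)
    out[r0 - top:r1 - top, c0 - left:c1 - left] = flow_map[r0:r1, c0:c1]
    return out

fmap = np.arange(64 * 64, dtype=float).reshape(64, 64)
p = landmark_patch(fmap, (32, 32), 6)  # 6x6 crop of rows/cols 29..34, rest zeros
q = landmark_patch(fmap, (0, 0), 4)    # corner landmark: only a 2x2 part is inside the image
```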

The feature matrix is flattened to a 1D vector of the feature vector as shown in Fig. 3. The optical flow magnitude feature vector is an input to the second stream, and the optical flow direction feature vector is an input to the third stream of the graph network.

The padded 10×10 matrix is flattened into a 100-dimensional vector (Fig. 3 of the paper). Magnitude vectors feed the second stream; direction vectors feed the third.
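Combining the crop, padding and flattening steps, the sketch below turns a dense optical flow field (horizontal component `u`, vertical component `v`) into the node-feature matrices for streams 2 and 3, one 100-dimensional row per landmark. Computing magnitude with `np.hypot` and direction with `np.arctan2` is standard; the crop placement and padding layout are assumptions, and `node_features` is a hypothetical name.

```python
import numpy as np

MAX_PATCH = 10

def node_features(u, v, landmarks, sizes, max_patch=MAX_PATCH):
    """Return (mag_feats, dir_feats), each of shape (num_landmarks, max_patch**2).

    u, v: horizontal / vertical optical-flow components, shape (H, W).
    landmarks: (num_landmarks, 2) integer (row, col) positions.
    sizes: per-landmark patch size N, with N <= max_patch.
    """
    mag = np.hypot(u, v)    # optical-flow magnitude
    ang = np.arctan2(v, u)  # optical-flow direction in radians, in (-pi, pi]
    # Zero-pad the maps so windows near the border stay in bounds.
    mag_p, ang_p = np.pad(mag, max_patch), np.pad(ang, max_patch)
    mag_feats, dir_feats = [], []
    for (r, c), n in zip(landmarks, sizes):
        r0, c0 = r + max_patch - n // 2, c + max_patch - n // 2
        for src, dst in ((mag_p, mag_feats), (ang_p, dir_feats)):
            patch = np.zeros((max_patch, max_patch))
            patch[:n, :n] = src[r0:r0 + n, c0:c0 + n]  # N x N crop, zero-padded to 10 x 10
            dst.append(patch.ravel())                  # flatten to a 100-D vector
    return np.stack(mag_feats), np.stack(dir_feats)

rng = np.random.default_rng(0)
u, v = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
landmarks = rng.integers(0, 64, size=(51, 2))
sizes = rng.choice([4, 6, 8, 10], size=51)
X_mag, X_dir = node_features(u, v, landmarks, sizes)  # inputs of streams 2 and 3
print(X_mag.shape, X_dir.shape)  # (51, 100) (51, 100)
```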

3.2 Graph Attention Network

The attention weights indicate the importance of node features of one node to another node.

Attention weights measure how much one node's features matter to another node, i.e. how strongly information is passed along each edge.
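The attention weights described above follow the standard graph attention (GAT) formulation of Veličković et al.: each edge is scored with a LeakyReLU of a learned linear function of the two projected node features, and the scores are softmax-normalized over each node's neighbours. Below is a single-head NumPy sketch; the head count, feature sizes and random weights are placeholders, not values from the paper, and the usual output nonlinearity (e.g. ELU) is left out.

```python
import numpy as np

def gat_attention(X, A, W, a, negative_slope=0.2):
    """One single-head GAT layer.

    X: (n, f_in) node features; A: (n, n) adjacency with self-loops;
    W: (f_in, f_out) shared projection; a: (2 * f_out,) attention vector.
    Returns (alpha, H): alpha[i, j] is the weight node i gives to node j.
    """
    Wh = X @ W                                  # (n, f_out)
    f_out = Wh.shape[1]
    # a^T [Wh_i || Wh_j] splits into a term for node i plus a term for node j.
    e = (Wh @ a[:f_out])[:, None] + (Wh @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    e = np.where(A > 0, e, -np.inf)             # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over each node's neighbours
    return alpha, alpha @ Wh

rng = np.random.default_rng(0)
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # path graph + self-loops
X = rng.normal(size=(4, 5))
alpha, H = gat_attention(X, A, rng.normal(size=(5, 8)), rng.normal(size=16))
```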

3.3 Self-Attention Pooling Layer

First, the self-attention graph pooling layer calculates the attention scores from the graph attention layer. Later, it selects the top-k nodes to remain in the graph based on the attention score determined from the graph attention layer for the nodes and also based on the ratio k selected.

The pooling layer scores every node with a graph attention layer and keeps the top-k nodes, where k is determined by the chosen pooling ratio.
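The top-k selection can be sketched as below, using the usual SAGPool convention k = ⌈ratio × n⌉. The scores are made-up numbers standing in for the per-node output of the scoring graph attention layer; `topk_nodes` is a hypothetical name.

```python
import math
import numpy as np

def topk_nodes(scores, ratio):
    """Return the ids of the ceil(ratio * n) highest-scoring nodes, in ascending id order."""
    k = math.ceil(ratio * len(scores))
    idx = np.argsort(-scores)[:k]  # largest scores first
    return np.sort(idx)            # keep the original node order

scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
print(topk_nodes(scores, ratio=0.5).tolist())  # k = ceil(2.5) = 3 -> [1, 2, 3]
```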

Finally, based on the ids of the nodes remaining and their connections between the nodes, a new feature matrix and the new adjacency matrix are created to form a new graph structure, respectively.

The surviving nodes and the edges among them define a smaller graph with its own feature matrix and adjacency matrix.
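Building the new feature and adjacency matrices from the kept ids amounts to selecting the feature rows and taking the induced sub-matrix of the adjacency. The optional `tanh(score)` gating comes from the original SAGPool method; whether this paper applies it is an assumption. `pooled_graph` is a hypothetical helper.

```python
import numpy as np

def pooled_graph(X, A, idx, scores=None):
    """Build the pooled graph from the kept node ids.

    X: (n, f) node features; A: (n, n) adjacency; idx: kept node ids.
    If scores are given, kept features are scaled by tanh(score) (SAGPool-style gating).
    """
    X_new = X[idx]
    if scores is not None:
        X_new = X_new * np.tanh(scores[idx])[:, None]
    A_new = A[np.ix_(idx, idx)]  # keep only edges between surviving nodes
    return X_new, A_new

# 4-node cycle 0-1-2-3-0; keep nodes 0, 2, 3.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
X = np.arange(8.0).reshape(4, 2)
idx = np.array([0, 2, 3])
X_new, A_new = pooled_graph(X, A, idx)  # remaining edges: 2-3 and 3-0
```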

3.4 Three-Stream Graph Attention Network

For the first stream of the graph network, the node feature vector is the x and y location coordinates of the landmark points. The node location features help in understanding the change in the movement of each landmark point w.r.t. its previous position. For the second and the third stream of our network, we use the varying patch size of the optical flow magnitude features and the optical flow direction features.

Stream 1 uses the (x, y) landmark coordinates, which track how each landmark moves relative to its previous position. Streams 2 and 3 use the variable-size patches of optical flow magnitude and optical flow direction, respectively.

At the end of the readout layer of the three-stream graph networks, the results are concatenated for the graph representation of the three streams. Finally, the output is passed through the fully connected layer and softmax layer for classification.

The readouts of the three streams are concatenated into one graph representation, which a fully connected layer and a softmax turn into class probabilities.
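A sketch of the fusion head. The notes only say "readout layer", so concatenating mean- and max-pooled node features per stream is an assumption (a common choice in SAGPool-style models); the fully connected weights and the three-class output are random placeholders, and `readout` and `classify` are hypothetical names.

```python
import numpy as np

def readout(H):
    """Graph-level readout: mean- and max-pooled node features, concatenated (assumed)."""
    return np.concatenate([H.mean(axis=0), H.max(axis=0)])

def softmax(z):
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def classify(stream_outputs, W_fc, b_fc):
    """stream_outputs: pooled node-feature matrices of the three streams
    (landmark location, flow magnitude, flow direction)."""
    g = np.concatenate([readout(H) for H in stream_outputs])  # fused graph representation
    return softmax(g @ W_fc + b_fc)                          # class probabilities

rng = np.random.default_rng(0)
streams = [rng.normal(size=(20, 16)) for _ in range(3)]  # placeholder pooled features
W_fc, b_fc = rng.normal(size=(3 * 2 * 16, 3)), np.zeros(3)
probs = classify(streams, W_fc, b_fc)
```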

4. Experiments

Not much to say here; as expected, the proposed method comes out best.

5. Conclusion and Future Work

In this paper, we proposed a Three-stream Graph Attention Network for the node location features, optical flow magnitude, and optical flow direction features with the help of three frames structures to extract the spatio-temporal information.

To summarize: one stream each for node location, flow magnitude and flow direction, built on a three-frame graph that captures spatio-temporal information.

We designed an algorithm to dynamically select the varying patch size across each landmark point for the optical flow features to be extracted.

The second piece is an algorithm that picks a patch size per landmark before the optical flow features are extracted.
