Paper Reading: [G2RL: Geometry-Guided Representation Learning for AU Intensity Estimation]
Based on the heatmap regression framework, a Graph CNN is utilized to encode the external geometric knowledge associated with facial geometric constraints and relationships among facial points.
An auxiliary loss is tailored to generate gradients that push the backbone model to learn the external knowledge.
- incorporates external geometric knowledge to guide the training of the heatmap regression network;
- captures the facial geometric constraints and relationships among facial points by constructing a graph convolutional network.
Backbone Model:
A heatmap regression-based network, where feature maps contain rich semantic information of AU intensities and locations.
- For each AU location $L_k=(i_k,j_k),\ k=1,\dots,K$,
the ground-truth heatmap is produced by applying a Gaussian function:
$S_k(i,j;X)=\frac{I_k}{2\pi\sigma^2}\exp\left(-\frac{\|(i,j)-(i_k,j_k)\|_2^2}{2\sigma^2}\right)$
- The optimization process with the MSE loss is formulated as
$L_S=\min_{\phi_\mathcal{I}}\sum_{X\in\mathcal{I}}\|\Phi_{\phi_\mathcal{I}}(X)-S(X)\|_2^2$,
// $\phi_\mathcal{I}$: the set of weight parameters of the network $\Phi$.
- During the inference stage,
the estimated AU locations are $\hat{L}=\arg\max\Phi_{\phi_\mathcal{I}}(X)$,
and the corresponding AU intensities are $\hat{I}=\max\Phi_{\phi_\mathcal{I}}(X)$.
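As a concrete illustration (not the authors' code), the heatmap encoding above and the argmax/max decoding at inference can be sketched with NumPy; the map size, peak location, intensity, and $\sigma$ below are made-up values:

```python
import numpy as np

def gaussian_heatmap(h, w, center, intensity, sigma=2.0):
    """Ground-truth heatmap S_k: a 2D Gaussian at the AU location, scaled by I_k."""
    ii, jj = np.mgrid[0:h, 0:w]
    d2 = (ii - center[0]) ** 2 + (jj - center[1]) ** 2
    return intensity / (2 * np.pi * sigma ** 2) * np.exp(-d2 / (2 * sigma ** 2))

# Example: one AU located at (12, 20) with intensity 3 on a 32x32 map.
S = gaussian_heatmap(32, 32, center=(12, 20), intensity=3.0)

# Inference-style decoding: location = argmax of the map, intensity = its max
# (up to the 1/(2*pi*sigma^2) normalization applied at encoding time).
loc = np.unravel_index(np.argmax(S), S.shape)
peak = S.max()
print(loc, peak)
```

The decoded location recovers the annotated point exactly, while the decoded intensity carries the Gaussian normalization factor, which is constant across AUs and so preserves the intensity ordering.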
External Geometric Knowledge Module
// The goal of the external geometric knowledge module is to summarize the face shape pattern and the interdependencies of facial points into a latent vector G.
- 3 GCN layers to extract geometric features:
a multi-resolution feature representation that integrates both low- and high-level geometric information is obtained via
$F'=\big\Vert_{l=1}^{3}\,\mathrm{ReLU}(A^{l-1}F^{l-1}W^{l-1})$,
// $\Vert$: the concatenation operation;
// $F^{l}$: the feature matrix produced by the $l$-th GCN layer;
// $W^{l-1}$: the trainable weight matrix of the corresponding layer;
// $A^{l-1}$: the adjacency matrix determined by the Euclidean distance between the nodes.
- 3 FC layers to obtain the latent vector:
$G=F_c(F_c(F_c(\mathrm{Conv}(F'))))$,
// Conv: 1×1 convolution operation to aggregate the concatenated features.
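A minimal NumPy sketch of the stacked-GCN step above (not the paper's implementation): the number of facial points, hidden widths, and the nearest-neighbor rule used to build the distance-based adjacency are all assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical setup: 5 facial points (graph nodes) with 2-D coordinates as
# input features F^0, and hidden width 4 in each of the 3 GCN layers.
rng = np.random.default_rng(0)
coords = rng.random((5, 2))

# Adjacency from Euclidean distances: connect each node to itself and its
# 2 nearest neighbors, then row-normalize (one plausible construction).
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
A = (d <= np.sort(d, axis=1)[:, [2]]).astype(float)
A /= A.sum(axis=1, keepdims=True)

# Three GCN layers: F^l = ReLU(A F^{l-1} W^{l-1}); F' concatenates all three
# outputs, giving the multi-resolution geometric feature.
Ws = [rng.standard_normal((2, 4)),
      rng.standard_normal((4, 4)),
      rng.standard_normal((4, 4))]
F, outs = coords, []
for W in Ws:
    F = relu(A @ F @ W)
    outs.append(F)
F_prime = np.concatenate(outs, axis=1)  # ||_{l=1}^{3}, shape (5, 12)
print(F_prime.shape)
```

Concatenating every layer's output rather than keeping only the last one is what lets $F'$ mix low-level (near-input) and high-level (deeply propagated) geometric information.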
Knowledge Projection Layer
- During the training process, a linear projection W is applied between the geometric knowledge representation G and the feature maps M that precede the predicted heatmaps.
// Inject the geometric knowledge into the feature maps M as they are highly correlated with the predicted heatmaps. Moreover, they share the same parameters on former layers and can influence the parameter learning of these layers.
- During the test stage, the external geometric knowledge module is removed, with only the backbone model retained.
// The heatmaps are directly produced by the backbone model that has learned external geometric knowledge.
- An auxiliary loss $L_G$ to generate gradients enforcing the backbone model to learn the external knowledge:
$L_G=\min_{\phi_{\mathcal{I},\mathcal{P}}}\|G-W\times M\|_2^2$,
// $\mathcal{I}$: the set of training images;
// $\mathcal{P}$: facial landmark annotations of the training set.
- With the geometric knowledge introduced, the MSE loss is then reformulated as
$L'_S=\min_{\phi_{\mathcal{I},\mathcal{P}}}\sum_{X\in\mathcal{I}}\|\Phi_{\phi_{\mathcal{I},\mathcal{P}}}(X,P)-S(X)\|_2^2$.
- The overall loss for joint training is a weighted combination of
$L_G$ and $L'_S$,
$L=\min_{\phi_{\mathcal{I},\mathcal{P}}}\big(\lambda\times L_G+(1-\lambda)\times L'_S\big)$.
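The joint objective can be sketched numerically as follows (an illustration, not the paper's code); the dimensions of $G$, the flattened feature map $M$, the projection $W$, the heatmap shapes, and $\lambda=0.5$ are all assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: latent geometric vector G (dim 8), feature map M
# flattened to a vector (dim 16), learnable projection W mapping M into
# G's space, and 3 AU heatmap channels of size 32x32.
G = rng.standard_normal(8)
M = rng.standard_normal(16)
W = rng.standard_normal((8, 16))
pred_heatmaps = rng.standard_normal((3, 32, 32))  # stands in for Phi(X, P)
gt_heatmaps = rng.standard_normal((3, 32, 32))    # stands in for S(X)

L_G = np.sum((G - W @ M) ** 2)                    # knowledge-projection loss
L_S = np.sum((pred_heatmaps - gt_heatmaps) ** 2)  # heatmap MSE loss L'_S
lam = 0.5                                         # trade-off weight lambda
L = lam * L_G + (1 - lam) * L_S                   # overall training objective
print(L)
```

Because $L_G$ only ties $G$ to the projected feature maps $W\times M$, gradients from it flow into the backbone's shared early layers during training, while at test time both the projection and the geometry module can be dropped entirely.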