论文阅读:【G2RL: Geometry-Guided Representation Learning for AU Intensity Estimation】

Built on a heatmap regression framework, G2RL uses a graph CNN to encode external geometric knowledge: the facial geometric constraints and the relationships among facial points.
An auxiliary loss is tailored to generate gradients that force the backbone model to learn this external knowledge.

  • incorporates external geometric knowledge to guide the training of the heatmap regression network;
  • captures the facial geometric constraints and relationships among facial points by constructing a graph convolutional network.

Backbone Model:

A heatmap regression-based network whose feature maps contain rich semantic information about AU intensities and locations.

  • For each AU location $L_k=(i_k,j_k)$, $k = 1, \dots, K$,
    the ground-truth heatmap is produced by applying a Gaussian function:
    $S_k(i,j;X)=\frac{I_k}{2\pi\sigma^2}\exp\left(-\frac{\|(i,j)-(i_k,j_k)\|_2^2}{2\sigma^2}\right)$.
  • The optimization process with the MSE loss is formulated as
    $L_S=\min_{\phi_\mathcal{I}} \sum_{X\in\mathcal{I}}\|\Phi_{\phi_\mathcal{I}}(X)-S(X)\|_2^2$,
    // $\phi_\mathcal{I}$: the set of weight parameters of the network $\Phi$.
  • During the inference stage,
    the estimated AU locations are $\hat{L}=\arg\max\Phi_{\phi_\mathcal{I}}(X)$,
    and the corresponding AU intensities are $\hat{I}=\max\Phi_{\phi_\mathcal{I}}(X)$.
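The heatmap encoding and argmax/max decoding above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the grid size, `sigma`, and function names are all made up here, and note that the decoded peak equals $I_k/(2\pi\sigma^2)$ rather than $I_k$ itself, so recovering the intensity requires undoing the Gaussian normalizer.

```python
import math

def gaussian_heatmap(h, w, center, intensity, sigma=1.5):
    """Ground-truth heatmap S_k: a 2-D Gaussian centered at the AU
    location (i_k, j_k), scaled by the annotated intensity I_k."""
    ci, cj = center
    norm = intensity / (2 * math.pi * sigma ** 2)
    return [[norm * math.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2 * sigma ** 2))
             for j in range(w)] for i in range(h)]

def decode(heatmap):
    """Inference: estimated location = argmax of the heatmap,
    peak value = max of the heatmap (I_k up to the Gaussian normalizer)."""
    v, i, j = max((v, i, j) for i, row in enumerate(heatmap)
                            for j, v in enumerate(row))
    return (i, j), v

hm = gaussian_heatmap(8, 8, center=(3, 5), intensity=4.0)
loc, peak = decode(hm)  # loc recovers (3, 5)
```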

External Geometric Knowledge Module

// The goal of the external geometric knowledge module is to summarize the face shape pattern and the interdependencies of facial points into a latent vector G.

  • 3 GCN layers to extract geometric features:
    a multi-resolution representation integrating both low- and high-level geometric information is obtained via
    $F'=\big\Vert_{l=1}^{3}\,\mathrm{ReLU}(A^{l-1}F^{l-1}W^{l-1})$,
    // $\Vert$: concatenation operation;
    // $F^{l}$: the output features of the $l$-th GCN layer;
    // $W^{l-1}$: a trainable weight matrix for the specific layer;
    // $A^{l-1}$: the adjacency matrix determined by the Euclidean distance between the nodes.
  • 3 FC layers to obtain the latent vector:
    $G=F_c(F_c(F_c(\mathrm{Conv}(F'))))$,
    // Conv: a 1×1 convolution that aggregates the concatenated features.
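The GCN propagation rule $\mathrm{ReLU}(AFW)$ and the layer-wise concatenation can be sketched with plain list-based matrices. A toy sketch only: the 3-node graph, the adjacency weights, and the identity $W$ are invented for illustration (the paper builds $A$ from Euclidean distances between facial landmarks and learns $W$).

```python
def matmul(a, b):
    """Naive matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(A, F, W):
    """One propagation step: ReLU(A @ F @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A, F), W)]

# Toy graph of 3 facial points; adjacency weighted by closeness,
# input features F0 are the 2-D landmark coordinates themselves.
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
F0 = [[0.1, 0.2], [0.4, 0.3], [0.9, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only

F1 = gcn_layer(A, F0, W)
F2 = gcn_layer(A, F1, W)
F3 = gcn_layer(A, F2, W)
# Multi-resolution F': per-node concatenation of all three layer outputs.
Fp = [f1 + f2 + f3 for f1, f2, f3 in zip(F1, F2, F3)]
```

Concatenating the three layers gives each node a 6-D feature mixing 1-, 2-, and 3-hop neighborhood information, which is what the $\Vert_{l=1}^{3}$ in the formula expresses.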

Knowledge Projection Layer

– During the training process, a linear projection W is applied to the feature maps M (the layer just before the predicted heatmaps) to align them with the geometric knowledge representation G.
// The geometric knowledge is injected into the feature maps M because they are highly correlated with the predicted heatmaps. Moreover, M shares parameters with the earlier layers, so the auxiliary gradient also influences the learning of those layers.

– During the test stage, the external geometric knowledge module is removed, with only the backbone model retained.
// The heatmaps are directly produced by the backbone model that has learned external geometric knowledge.

  • An auxiliary loss $L_G$ generates gradients that force the backbone model to learn the external knowledge:
    $L_G=\min_{\phi_{\mathcal{I},\mathcal{P}}}\|G-W\times M\|_2^2$,
    // $\mathcal{I}$: the set of training images;
    // $\mathcal{P}$: facial landmark annotations of the training set.
  • With the geometric knowledge introduced, the MSE loss is reformulated as
    $L'_S=\min_{\phi_{\mathcal{I},\mathcal{P}}}\sum_{X\in\mathcal{I}}\|\Phi_{\phi_{\mathcal{I},\mathcal{P}}}(X,P)-S(X)\|_2^2$.
  • The overall loss for joint training is a weighted combination of $L_G$ and $L'_S$:
    $L=\min_{\phi_{\mathcal{I},\mathcal{P}}}\big(\lambda\times L_G+(1-\lambda)\times L'_S\big)$.
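The two loss terms above reduce to simple arithmetic for a single sample, sketched below with plain lists. Everything here is a toy: in the paper G, W, and M are learned tensors optimized jointly by backpropagation, and the vectors and $\lambda$ value are chosen only to make the computation easy to follow.

```python
def projection_loss(G, W, M):
    """Auxiliary loss L_G = ||G - W @ M||_2^2 for one sample,
    with G a vector, M a vector, and W the linear projection matrix."""
    WM = [sum(w * m for w, m in zip(row, M)) for row in W]
    return sum((g - p) ** 2 for g, p in zip(G, WM))

def total_loss(l_g, l_s, lam=0.5):
    """Joint objective L = lambda * L_G + (1 - lambda) * L'_S."""
    return lam * l_g + (1 - lam) * l_s

G = [1.0, 2.0]                # latent geometric knowledge vector
M = [0.5, 1.0, 1.5]           # (flattened) backbone feature map
W = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]         # projection that happens to match G exactly
l_g = projection_loss(G, W, M)   # W @ M = [1.0, 2.0], so L_G = 0
```

Because $\lambda$ trades the two terms off, $\lambda=0$ recovers the plain heatmap loss and $\lambda=1$ trains purely on the knowledge alignment; at test time only the backbone and $L'_S$-trained weights remain.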