Paper Reading: [G2RL: Geometry-Guided Representation Learning for AU Intensity Estimation]
Based on the heatmap regression framework, a Graph CNN is utilized to encode the external geometric knowledge associated with facial geometric constraints and relationships among facial points.
An auxiliary loss is tailored to generate gradients that push the backbone model to learn the external knowledge.
- incorporates external geometric knowledge to guide the training of the heatmap regression network;
- captures the facial geometric constraints and relationships among facial points by constructing a graph convolutional network.
Backbone Model:
A heatmap regression-based network, where feature maps contain rich semantic information of AU intensities and locations.
- For each AU location $L_k=(i_k,j_k),\ k=1,\dots,K$,
the ground-truth heatmap is produced by applying a Gaussian function:
$S_k(i,j;X)=\frac{I_k}{2\pi\sigma^2}\exp\left(-\frac{\|(i,j)-(i_k,j_k)\|_2^2}{2\sigma^2}\right)$
- The optimization process with the MSE loss is formulated as
$L_S=\min_{\phi_\mathcal{I}}\sum_{X\in\mathcal{I}}\|\Phi_{\phi_\mathcal{I}}(X)-S(X)\|_2^2$,
// $\phi_\mathcal{I}$: the set of weight parameters of the network $\Phi$.
- During the inference stage,
the estimated AU locations are $\hat{L}=\arg\max\Phi_{\phi_\mathcal{I}}(X)$,
and the corresponding AU intensities are $\hat{I}=\max\Phi_{\phi_\mathcal{I}}(X)$.
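As a concrete illustration (not the authors' code), the heatmap encoding above and the argmax/max decoding at inference can be sketched with NumPy; the map size, peak location, intensity, and $\sigma$ below are made-up values:

```python
import numpy as np

def gaussian_heatmap(h, w, center, intensity, sigma=2.0):
    """Ground-truth heatmap S_k: a 2D Gaussian at the AU location, scaled by I_k."""
    ii, jj = np.mgrid[0:h, 0:w]
    d2 = (ii - center[0]) ** 2 + (jj - center[1]) ** 2
    return intensity / (2 * np.pi * sigma ** 2) * np.exp(-d2 / (2 * sigma ** 2))

# Example: one AU located at (12, 20) with intensity 3 on a 32x32 map.
S = gaussian_heatmap(32, 32, center=(12, 20), intensity=3.0)

# Inference-style decoding: location = argmax of the map, intensity = its max
# (up to the 1/(2*pi*sigma^2) normalization applied at encoding time).
loc = np.unravel_index(np.argmax(S), S.shape)
peak = S.max()
print(loc, peak)
```

The decoded location recovers the annotated point exactly, while the decoded intensity carries the Gaussian normalization factor, which is constant across AUs and so preserves the intensity ordering.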
External Geometric Knowledge Module
// The goal of the external geometric knowledge module is to summarize the face shape pattern and the interdependencies of facial points into a latent vector G.
- 3 GCN layers to extract geometric features:
a multi-resolution feature representation that integrates both low- and high-level geometric information is obtained via
$F'=\big\Vert_{l=1}^{3}\,\mathrm{ReLU}(A^{l-1}F^{l-1}W^{l-1})$,
// $\Vert$: the concatenation operation;
// $F^{l}$: the feature matrix produced by the $l$-th GCN layer;
// $W^{l-1}$: the trainable weight matrix of the corresponding layer;
// $A^{l-1}$: the adjacency matrix determined by the Euclidean distance between the nodes.
- 3 FC layers to obtain the latent vector:
$G=F_c(F_c(F_c(\mathrm{Conv}(F'))))$,
// Conv: 1×1 convolution operation to aggregate the concatenated features.
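A minimal NumPy sketch of the stacked-GCN step above (not the paper's implementation): the number of facial points, hidden widths, and the nearest-neighbor rule used to build the distance-based adjacency are all assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical setup: 5 facial points (graph nodes) with 2-D coordinates as
# input features F^0, and hidden width 4 in each of the 3 GCN layers.
rng = np.random.default_rng(0)
coords = rng.random((5, 2))

# Adjacency from Euclidean distances: connect each node to itself and its
# 2 nearest neighbors, then row-normalize (one plausible construction).
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
A = (d <= np.sort(d, axis=1)[:, [2]]).astype(float)
A /= A.sum(axis=1, keepdims=True)

# Three GCN layers: F^l = ReLU(A F^{l-1} W^{l-1}); F' concatenates all three
# outputs, giving the multi-resolution geometric feature.
Ws = [rng.standard_normal((2, 4)),
      rng.standard_normal((4, 4)),
      rng.standard_normal((4, 4))]
F, outs = coords, []
for W in Ws:
    F = relu(A @ F @ W)
    outs.append(F)
F_prime = np.concatenate(outs, axis=1)  # ||_{l=1}^{3}, shape (5, 12)
print(F_prime.shape)
```

Concatenating every layer's output rather than keeping only the last one is what lets $F'$ mix low-level (near-input) and high-level (deeply propagated) geometric information.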
Knowledge Projection Layer
- During the training process, a linear projection W is applied between the geometric knowledge representation G and the feature maps M that precede the predicted heatmaps.
// Inject the geometric knowledge into the feature maps M as they are highly correlated with the predicted heatmaps. Moreover, they share the same parameters on former layers and can influence the parameter learning of these layers.
- During the test stage, the external geometric knowledge module is removed, with only the backbone model retained.
// The heatmaps are directly produced by the backbone model that has learned external geometric knowledge.
- An auxiliary loss $L_G$ to generate gradients enforcing the backbone model to learn the external knowledge:
$L_G=\min_{\phi_{\mathcal{I},\mathcal{P}}}\|G-W\times M\|_2^2$,
// $\mathcal{I}$: the set of training images;
// $\mathcal{P}$: facial landmark annotations of the training set.
- With the geometric knowledge introduced, the MSE loss is then reformulated as
$L'_S=\min_{\phi_{\mathcal{I},\mathcal{P}}}\sum_{X\in\mathcal{I}}\|\Phi_{\phi_{\mathcal{I},\mathcal{P}}}(X,P)-S(X)\|_2^2$.
- The overall loss for joint training is a weighted combination of
$L_G$ and $L'_S$,
$L=\min_{\phi_{\mathcal{I},\mathcal{P}}}\big(\lambda\times L_G+(1-\lambda)\times L'_S\big)$.
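The joint objective can be sketched numerically as follows (an illustration, not the paper's code); the dimensions of $G$, the flattened feature map $M$, the projection $W$, the heatmap shapes, and $\lambda=0.5$ are all assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: latent geometric vector G (dim 8), feature map M
# flattened to a vector (dim 16), learnable projection W mapping M into
# G's space, and 3 AU heatmap channels of size 32x32.
G = rng.standard_normal(8)
M = rng.standard_normal(16)
W = rng.standard_normal((8, 16))
pred_heatmaps = rng.standard_normal((3, 32, 32))  # stands in for Phi(X, P)
gt_heatmaps = rng.standard_normal((3, 32, 32))    # stands in for S(X)

L_G = np.sum((G - W @ M) ** 2)                    # knowledge-projection loss
L_S = np.sum((pred_heatmaps - gt_heatmaps) ** 2)  # heatmap MSE loss L'_S
lam = 0.5                                         # trade-off weight lambda
L = lam * L_G + (1 - lam) * L_S                   # overall training objective
print(L)
```

Because $L_G$ only ties $G$ to the projected feature maps $W\times M$, gradients from it flow into the backbone's shared early layers during training, while at test time both the projection and the geometry module can be dropped entirely.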