Paper Reading: 【AU Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution】

Summary

A new learning framework that automatically learns the latent relationships among AUs by establishing semantic correspondences between feature maps.

  • Heatmap regression-based network: feature maps preserve rich semantic information associated with AU intensities and locations.
  • GCNN: describes the intrinsic relationships among the vertex nodes of the graph by learning an adjacency matrix, which is used to explore the relationships among multiple feature maps.
  • Semantic correspondence convolution module (SCC): automatically learns the semantic relationships among feature maps to discover the latent co-occurrence relationships of AU intensities.

Key Contributions:

  1. Leverage the semantic correspondence to model the implicit co-occurrence relations of AU intensity levels in a heatmap regression framework, where the feature maps encode rich semantic descriptions and spatial distributions of AUs.
  2. An SCC module that dynamically computes the correspondences among feature maps layer by layer.


Heatmap Regression

  • In Stream 1:
    Each deconvolutional layer is followed by an SCC module that models the relationships among multiple feature maps at that specific resolution level.
  • In Stream 2:
    The ground-truth probability heatmap $g_i(x)$ for a predefined AU location $L_i$ ($i = 1, \ldots, N$) is generated by applying a Gaussian function centered on its corresponding coordinate $\hat{x}_i$,
    $g_i(x)=\frac{I}{2\pi\sigma^2}\exp\!\left(-\frac{\|x-\hat{x}_i\|_2^2}{2\sigma^2}\right)$,
    // I: the labeled intensity of the specific AU;
    // $\sigma$: the standard deviation.
  • The L2 distance between the predicted heatmap $h_i(x;w,b)$ and the ground truth $g_i(x)$ is minimized as the MSE loss.
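
Below is a minimal PyTorch sketch (not the authors' code) of how such a ground-truth heatmap and the MSE regression loss could be computed; the heatmap size, σ, and the AU coordinate are illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def gaussian_heatmap(center, intensity, size=64, sigma=2.0):
    """Ground-truth heatmap g_i(x): a Gaussian centered at the AU
    coordinate x_hat_i, scaled by the labeled intensity I."""
    ys = torch.arange(size, dtype=torch.float32).view(-1, 1)  # row coordinates
    xs = torch.arange(size, dtype=torch.float32).view(1, -1)  # column coordinates
    cy, cx = center
    dist_sq = (ys - cy) ** 2 + (xs - cx) ** 2                 # ||x - x_hat_i||_2^2
    return intensity / (2 * math.pi * sigma ** 2) * torch.exp(-dist_sq / (2 * sigma ** 2))

# Illustrative values: AU located at (30, 22), labeled intensity 3 (on a 0-5 scale)
g_i = gaussian_heatmap(center=(30.0, 22.0), intensity=3.0)
h_i = torch.rand_like(g_i)          # stand-in for the predicted heatmap h_i(x; w, b)
loss = F.mse_loss(h_i, g_i)         # L2 / MSE regression loss
```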

SCC: Semantic Correspondence Convolution

  • The SCC module aims to model the correlation among feature channels, where each channel encodes a specific visual pattern of an AU. Feature channels with similar semantic patterns would be activated simultaneously when a specific co-occurrence pattern of AU intensities emerges.
  • In SCC module:
    – first construct the k-NN graph by grouping sets of closest feature maps to find different co-occurrence patterns;
    – then apply the convolution operations on the edges that connect feature maps sharing similar semantic patterns to further exploit the edge information of the graph;
    – afterwards, the aggregation function, i.e., MAX, is applied to summarize the most discriminative features for improving AU intensity estimation.

Graph Construction

  • The feature map set is denoted by $F=\{f_1,f_2,\ldots,f_n\}\subseteq\mathbb{R}^{M\times M}$, where the size of each feature map (channel) is M×M.
    Each M×M feature map is rearranged into a feature vector of length L = M×M.
    The graph G is constructed as the k-nearest neighbor (k-NN) graph of F, where each node represents a specific feature map.
  • The edge feature is defined by $e_{ij}=h_{\Theta}(f_i,f_j)$, where $h_{\Theta}:\mathbb{R}^L\times\mathbb{R}^L\rightarrow\mathbb{R}^{L'}$ is a nonlinear function with trainable parameters Θ.
    It combines the global information encoded by $f_i$ with its local neighborhood characteristics, captured by $f_j-f_i$.
    The edge feature function is formulated as $e'_{ijk}=\mathrm{ReLU}(\phi_k\cdot f_i+\omega_k\cdot(f_j-f_i))$,
    // Θ: $(\phi_1,\ldots,\phi_K,\omega_1,\ldots,\omega_K)$, where K is the number of filters.
  • For each $f_i$, the k-NN graph is built by computing a pairwise distance matrix (based on the Euclidean distance) and then taking the closest k feature maps.
    A channel-wise aggregation function, i.e., MAX, is adopted to summarize the edge features, as it captures the most salient features.
    The output of the SCC module at the i-th vertex is then produced by $f'_{ik}=\max\limits_{j:(i,j)\in E} e'_{ijk}$.
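
As a rough illustration of the computation described above (k-NN graph from pairwise Euclidean distances, edge features $\mathrm{ReLU}(\phi_k\cdot f_i+\omega_k\cdot(f_j-f_i))$, channel-wise MAX aggregation), here is a minimal PyTorch sketch; the class name, tensor shapes, and k are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SCCLayer(nn.Module):
    """Sketch of one SCC step: k-NN graph over flattened feature maps,
    edge features ReLU(phi·f_i + omega·(f_j - f_i)), MAX aggregation."""
    def __init__(self, length, out_length, k=4):
        super().__init__()
        self.k = k
        self.phi = nn.Linear(length, out_length, bias=False)    # phi_k · f_i
        self.omega = nn.Linear(length, out_length, bias=False)  # omega_k · (f_j - f_i)

    def forward(self, feats):                            # feats: (n, L) flattened maps
        # 1. pairwise Euclidean distances -> k nearest neighbors per node
        dist = torch.cdist(feats, feats)                                 # (n, n)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]        # drop self
        # 2. edge features e'_{ijk} = ReLU(phi·f_i + omega·(f_j - f_i))
        f_i = feats.unsqueeze(1)                                         # (n, 1, L)
        f_j = feats[knn]                                                 # (n, k, L)
        edges = torch.relu(self.phi(f_i) + self.omega(f_j - f_i))       # (n, k, L')
        # 3. channel-wise MAX aggregation over the k edges of each vertex
        return edges.max(dim=1).values                                   # (n, L')

# Example: n = 12 feature maps of size 8×8, flattened to L = 64
feats = torch.rand(12, 64)
scc = SCCLayer(length=64, out_length=64, k=4)
out = scc(feats)          # (12, 64) updated feature maps
```

In the full model this would operate on the n channels produced by a deconvolutional layer, each flattened to a vector of length L = M×M.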

Dynamic Graph Update

The dynamic graph convolutions are performed on both low- and high-resolution feature maps, aiming to capture high-order AU interactions.
The SCC module can be integrated into multiple convolutional layers and learns to semantically group similar feature channels that are activated together for a specific co-occurrence pattern of AU intensities.
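
A hedged sketch of the dynamic update, reusing the hypothetical SCCLayer from the previous section: because each layer recomputes pairwise distances on its input features, the k-NN neighborhoods (and hence the graph) change from layer to layer.

```python
import torch
import torch.nn as nn

# Stacking several SCC steps (SCCLayer is the sketch from the previous section).
# Each call recomputes the pairwise distances, so the k-NN graph is rebuilt on
# the updated features -- this is the "dynamic" part of the graph convolution.
scc_stack = nn.ModuleList([SCCLayer(length=64, out_length=64, k=4) for _ in range(3)])

x = torch.rand(12, 64)              # 12 flattened feature maps of length L = 64
for layer in scc_stack:
    x = layer(x)                    # new distances -> new neighbors at every layer
```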

Correspondence with AU Heatmaps

The predicted heatmap $h_i$ for AU-i is computed as $h_i=F^L\otimes W_i^L$,
// $\otimes$: the tensor product;
// $F^L=\{f_1^L,\ldots,f_C^L\}$: the feature map set generated from the last SCC layer;
// $W_i^L=\{w_{1i}^L,\ldots,w_{Ci}^L\}$ (i = 1, 2, …, N): the 1×1 filter bank for a specific AU-i.
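
Interpreting the tensor product with a 1×1 filter bank as a 1×1 convolution, a minimal sketch (with assumed channel count C, AU count N, and map size M) could look like:

```python
import torch
import torch.nn as nn

# h_i = F^L ⊗ W_i^L realized as a 1×1 convolution: the C feature maps from the
# last SCC layer are linearly combined into one predicted heatmap per AU.
C, N, M = 64, 12, 64                         # assumed channels, number of AUs, map size
f_last = torch.rand(1, C, M, M)              # F^L: output of the last SCC layer
filter_bank = nn.Conv2d(C, N, kernel_size=1, bias=False)   # W^L: N filter banks of 1×1 filters
heatmaps = filter_bank(f_last)               # shape (1, N, M, M), one heatmap h_i per AU
```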
