GmFace: A Mathematical Model for Face Image Representation Using Multi-Gaussian

Title: Continuous learning of face attribute synthesis

1. Summary

Supported by the theory that a finite number of multi-Gaussian functions can approximate any non-negative integrable function on the real domain with arbitrary accuracy, the authors use Gaussian functions as neural units, constructing a linear fully connected layer as the model to fit the 2D surface data of face images. Within the model, the face images are used to learn an "average face", and the parameters can be adjusted to control image scale, rotation, and translation. Finally, the best epoch of GmNet is selected.

2. Research Objective(s)

The authors propose GmFace, a mathematical representation of the human face, as a step toward understanding the objective world. The two-dimensional Gaussian function provides a symmetric bell surface whose shape can be controlled by its parameters.
Solving the GmFace model essentially amounts to optimizing GmNet (a network using Gaussian functions as neurons).

face modeling process:

  1. GmNet initialization
  2. feeding GmNet with face images
  3. training GmNet until convergence
  4. drawing out the parameters of GmNet
  5. recording the face model GmFace

Keyword: multi-Gaussian function

3. Problem Statement

3.1 Problem statement:

In the face attribute synthesis task, GANs have limited effect on the expansion to new attributes.

Moreover, the spatial coordinates, and the relations between pixel intensities and spatial coordinates, are often ignored.

4. Method(s)

4.1 What were the previous methods?

  • traditional methods
    Face images are described by a number of digital features extracted from local or holistic regions.
    Feature extraction methods (local binary pattern, Gabor wavelet kernel, SIFT, HOG) are used to extract features, after which PCA or LDA performs dimensionality reduction, mapping the feature vector from the high-dimensional pixel space to a subspace by optimally solving the defined objective function.

  • deep learning methods

4.2 What is the authors' method/algorithm?

4.2.1 Face Image Model: GmFace
4.2.1.1 Theoretical basis
  1. The two-dimensional Gaussian function provides a symmetric bell surface whose shape can be controlled by parameters (convenient parameter control).
  2. The Gaussian functions form a complete set on $L^2(R^n)$, which means that a finite number of multi-Gaussian functions can approximate any non-negative integrable function on the real domain with arbitrary accuracy (good fitting ability; in plain terms, a linearly weighted sum of multivariate Gaussian functions can approximate any integrable non-negative function on the real domain with arbitrary accuracy).

4.2.1.2 Practical formulation

The theoretical basis of GmFace can be written as:

$$\hat{f}(\mathbf{x})=\sum_{i=1}^{m}w_iG_i(\mathbf{x}, \theta_i) \tag{1}$$

where $w_i$ is the weight of the $i$-th multivariate Gaussian function, $\mathbf{x}$ is the input, $G_i$ is the $i$-th multivariate Gaussian function, $\theta_i$ denotes its parameters, and $m$ is the number of Gaussian functions.
For a face image (where the spatial coordinates of a pixel are $x_1, x_2$), this becomes:

$$\begin{aligned} GmFace(x_1, x_2)&=\sum_{i=1}^{m}w_iG_i(x_1, x_2\mid \boldsymbol{\mu}_i, \mathbf{A}_i) \\ GmFace(\mathbf{x})&=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{A}_i) \end{aligned} \tag{2}$$

Here the authors define $\mathbf{A}$ as a positive-definite symmetric matrix, called the precision matrix in GmFace; in other words, $\mathbf{A}$ is the inverse of the covariance matrix. $\mathbf{x}=[x_1\ x_2]^T$ denotes the pixel coordinates and $\boldsymbol{\mu}=[\mu_1\ \mu_2]^T$ denotes the Gaussian center.

$$GmFace(\mathbf{x})=\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{A}_i(\mathbf{x}-\boldsymbol{\mu}_i)\} \tag{3}$$
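As a sanity check on Eq. (3), here is a minimal NumPy sketch that evaluates the model at one pixel; the function name, array shapes, and toy values are my own assumptions rather than anything from the paper.

```python
import numpy as np

def gmface(x, w, mu, A):
    """Evaluate Eq. (3) at one pixel coordinate x.

    x  : (2,)      pixel coordinate [x1, x2]
    w  : (m,)      weights of the m Gaussian components
    mu : (m, 2)    centers of the Gaussian components
    A  : (m, 2, 2) positive-definite precision matrices
    """
    d = x - mu                                 # (m, 2) offsets x - mu_i
    quad = np.einsum('mi,mij,mj->m', d, A, d)  # quadratic forms (x - mu_i)^T A_i (x - mu_i)
    return np.sum(w * np.exp(-quad))           # weighted sum of Gaussian surfaces

# toy usage: two components evaluated at one (normalized) pixel coordinate
w = np.array([0.8, 0.5])
mu = np.array([[0.3, 0.4], [0.6, 0.7]])
A = np.stack([np.eye(2) * 50.0, np.eye(2) * 80.0])
print(gmface(np.array([0.35, 0.45]), w, mu, A))
```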

Q1: Why is $\mathbf{A}$ the inverse of the covariance matrix?
Because the standard multivariate Gaussian density takes the form [1]

$$\mathcal{N}(\mathbf{x}\mid \boldsymbol{\mu}, \boldsymbol{\Sigma})=\frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$$

so the matrix in the exponent of Eq. (3) plays the role of $\boldsymbol{\Sigma}^{-1}$ (the factor $\tfrac{1}{2}$ and the normalizing constant being absorbed into $\mathbf{A}$ and $w_i$).

4.2.2 Multi-Gaussian Network: GmNet
4.2.2.1 Theoretical basis
  1. The pixel points of a face image are huge in number, so parameter estimation for GmFace requires a large amount of computation; the best way to handle this is with a neural network.
4.2.2.2 Practical formulation
4.2.2.2.1 Neuron structure

Treating the Gaussian functions of GmFace as neurons, a simple three-layer neural network, called GmNet, is constructed.

  • input: 2D surface data of face images
  • hidden layer: a group of Gaussian modules truncated to a region bounded by the image size (What does being truncated by the boundary mean? Does it imply that the number of hidden units depends on the image size, i.e., a wide and shallow network?)
  • output layer: the value of GmFace

Because the authors define $\mathbf{A}$ as a positive-definite matrix, by the properties of positive-definite matrices it admits a triangular (Cholesky-type) decomposition $\mathbf{A}=\mathbf{L}\mathbf{L}^T$.

Derivation
For any $n\times n$ matrix $\mathbf{A}$ whose leading principal minors are nonzero (which holds for a positive-definite $\mathbf{A}$), there exist a unit lower triangular matrix $\mathbf{L}_1$ and an upper triangular matrix $\mathbf{U}$ such that $\mathbf{A}=\mathbf{L}_1\mathbf{U}$.
Therefore:

$$\begin{aligned} &\mathbf{U}=\mathbf{D}\mathbf{U}_1,\qquad \mathbf{D}=\mathrm{diag}(\mathbf{U}),\qquad \mathbf{U}_1=\mathbf{D}^{-1}\mathbf{U}\ \text{unit upper triangular}\\ &\mathbf{A}^T=\mathbf{A}\ \Rightarrow\ \mathbf{U}_1^T\mathbf{D}\mathbf{L}_1^T=\mathbf{L}_1\mathbf{D}\mathbf{U}_1\ \Rightarrow\ \mathbf{U}_1=\mathbf{L}_1^T\ \text{(by uniqueness of the LDU factorization)}\\ &\mathbf{A}=\mathbf{L}_1\mathbf{D}\mathbf{L}_1^T=\mathbf{L}_1\mathbf{D}^{1/2}\mathbf{D}^{1/2}\mathbf{L}_1^T=(\mathbf{L}_1\mathbf{D}^{1/2})(\mathbf{L}_1\mathbf{D}^{1/2})^T \end{aligned}$$

From the above derivation, $\mathbf{L}=\mathbf{L}_1\mathbf{D}^{1/2}$ is a lower triangular real matrix with strictly positive diagonal entries. Hence Eq. (3) can be written as:

$$\begin{aligned} GmFace(\mathbf{x})&=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{A}_i) =\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{A}_i(\mathbf{x}-\boldsymbol{\mu}_i)\} \\ &=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{L}_i) =\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i)\} \end{aligned} \tag{4}$$

Further, the input $\mathbf{x}$ is normalized as

$$\mathbf{x}= \begin{bmatrix} x_1\\ x_2 \end{bmatrix} =\begin{bmatrix} r/H\\ c/W \end{bmatrix}$$

where $r$ and $c$ are the row and column indices of a pixel, and $H$ and $W$ are the image height and width.

In summary, GmNet is composed of $m$ Gaussian groups, each group expressed by $\boldsymbol{\mu}_i$ and $\mathbf{L}_i$. Across the whole GmNet, $w_i$, $\boldsymbol{\mu}_i$, and $\mathbf{L}_i$ are the parameters to be optimized by backpropagation.
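To make this concrete, here is a minimal PyTorch sketch of GmNet as I read Eq. (4); the class name, initialization, and the trick of masking a raw matrix to its lower-triangular part are my own choices, not details from the paper.

```python
import torch
import torch.nn as nn

class GmNet(nn.Module):
    """Sum of m 2D Gaussian units: sum_i w_i * exp{-(x - mu_i)^T L_i L_i^T (x - mu_i)}."""
    def __init__(self, m=80):
        super().__init__()
        self.w = nn.Parameter(torch.rand(m))               # component weights w_i
        self.mu = nn.Parameter(torch.rand(m, 2))           # centers mu_i in [0, 1]^2
        self.l = nn.Parameter(torch.randn(m, 2, 2) * 0.1)  # raw matrices; lower-triangular part is L_i

    def forward(self, coords):
        # coords: (n, 2) normalized pixel coordinates [r/H, c/W]
        L = torch.tril(self.l)                        # keep only the lower-triangular part
        d = coords[:, None, :] - self.mu[None, :, :]  # (n, m, 2) offsets x - mu_i
        z = torch.einsum('nmi,mij->nmj', d, L)        # rows (x - mu_i)^T L_i
        quad = (z ** 2).sum(-1)                       # (x - mu_i)^T L_i L_i^T (x - mu_i)
        return (self.w * torch.exp(-quad)).sum(-1)    # (n,) predicted gray values
```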

4.2.2.2.2 Parameter optimization

Using the chain rule, the gradients with respect to $w_i$, $\boldsymbol{\mu}_i$, and $\mathbf{L}_i$ can be obtained (given only as an image in the original).
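Deriving them directly from Eq. (4), with $G_i(\mathbf{x})=\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i)\}$, gives the following (my own derivation; the paper's notation may differ):

$$\frac{\partial\, GmFace(\mathbf{x})}{\partial w_i}=G_i(\mathbf{x}),\qquad
\frac{\partial\, GmFace(\mathbf{x})}{\partial \boldsymbol{\mu}_i}=2\,w_i\,G_i(\mathbf{x})\,\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i),\qquad
\frac{\partial\, GmFace(\mathbf{x})}{\partial \mathbf{L}_i}=-2\,w_i\,G_i(\mathbf{x})\,(\mathbf{x}-\boldsymbol{\mu}_i)(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i$$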

4.2.2.2.3 Loss function

The $L_2$ norm (MSE) and the infinity norm (the proposed loss term, peak absolute error) are used here. The total loss combines the two with a balancing factor $\alpha$, and $N$ denotes the number of training samples. (The exact formulas are given only as images in the original note.)
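A plausible reconstruction based on the surrounding text (MSE plus the proposed peak absolute error, balanced by $\alpha$) is:

$$L_{MSE}=\frac{1}{N}\sum_{j=1}^{N}\bigl(GmFace(\mathbf{x}_j)-f(\mathbf{x}_j)\bigr)^2,\qquad
L_{PAE}=\max_{j}\bigl|GmFace(\mathbf{x}_j)-f(\mathbf{x}_j)\bigr|,\qquad
L=L_{MSE}+\alpha\,L_{PAE}$$

The exact weighting in the paper may differ.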

Q3: What does $f(\mathbf{x})$ refer to here? The ground-truth image?
The 2D surface of the specific face image is considered the learning target, and GmNet is established to optimize the parameters.
Q4: How can the infinity norm be differentiated? For the $L_1$ norm one can use subgradients, but isn't the derivative of the infinity norm zero almost everywhere?
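A possible answer to Q4 (my own note, not from the paper): in autograd frameworks the max is handled by a subgradient that routes the gradient entirely to the element attaining the maximum, so the PAE term still provides a training signal, just for one pixel per step. A tiny PyTorch check:

```python
import torch

pred = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
target = torch.zeros(3)
pae = (pred - target).abs().max()   # infinity norm of the error
pae.backward()
print(pred.grad)                    # tensor([0., 1., 0.]) -- only the argmax element receives gradient
```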

4.2.2.3 Personal Face Modeling

Each personal face image can be regarded as a point in the space that further expresses individual characteristics on top of common face features.
Roughly speaking, every face is a common face plus individual characteristics.

4.2.2.4 Face Image Transformation through GmFace
4.2.2.4.1 Image Translation

Image translation is achieved by adjusting the means of the Gaussian functions (the formula is given only as an image in the original).
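Presumably, translating the image by an offset $\mathbf{t}$ corresponds to shifting every Gaussian center while keeping $w_i$ and $\mathbf{L}_i$ unchanged:

$$\boldsymbol{\mu}_i'=\boldsymbol{\mu}_i+\mathbf{t},\qquad i=1,\dots,m$$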

4.2.2.4.2 Image Scaling

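Here too only an image is given; presumably, scaling the image by a factor $s$ about the origin corresponds to:

$$\boldsymbol{\mu}_i'=s\,\boldsymbol{\mu}_i,\qquad \mathbf{L}_i'=\mathbf{L}_i/s \quad (\text{so } \mathbf{A}_i'=\mathbf{A}_i/s^2)$$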

4.2.2.4.3 Image Rotation

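The rotation formulas are likewise only images; presumably, rotating the image by a rotation matrix $\mathbf{R}$ about the origin corresponds to:

$$\boldsymbol{\mu}_i'=\mathbf{R}\boldsymbol{\mu}_i,\qquad \mathbf{A}_i'=\mathbf{R}\mathbf{A}_i\mathbf{R}^T$$

Putting the three transformations together, a small Python sketch of how the GmFace parameters could be updated (my own reconstruction under the assumptions above, not the paper's code):

```python
import numpy as np

def transform_gmface(mu, L, t=np.zeros(2), s=1.0, angle=0.0):
    """Scale by s, rotate by `angle` (radians), then translate by t.

    mu : (m, 2)    Gaussian centers
    L  : (m, 2, 2) lower-triangular factors with A_i = L_i L_i^T
    The weights w_i stay unchanged.
    """
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    mu_new = (s * mu) @ R.T + t                 # mu_i' = R (s * mu_i) + t
    L_new = np.einsum('ij,mjk->mik', R, L) / s  # L_i' = R L_i / s, i.e. A_i' = R A_i R^T / s^2
    return mu_new, L_new
```

Note that $\mathbf{R}\mathbf{L}_i$ is in general no longer lower triangular, but $\mathbf{L}_i'\mathbf{L}_i'^T$ still gives the correct transformed precision matrix.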

4.3 Code

4.3.1 Dataset

Validation set: 1040 frontal normal images from the Chinese face database CAS-PEAL-R1.

Preprocessing: the face image samples were cropped and resized during preprocessing; the image size was 120×120 pixels, and a selection of data examples is provided in Fig. 4.
Parameter count: the total number of parameters is $6m$, where $m$ is the number of Gaussian components in the multi-Gaussian function and 6 is the parameter size of each 2D Gaussian component (2 parameters for $\boldsymbol{\mu}$, 3 for the positive-definite symmetric $\mathbf{A}$, plus the weight coefficient $w_i$).
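For example, in the best setting reported in Section 5 ($m = 80$ Gaussian components), this amounts to $80 \times 6 = 480$ parameters per face image.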

batch_size=256
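For completeness, a minimal training-loop sketch that fits one 120×120 gray image with the GmNet module sketched in Section 4.2.2.2.1, sampling pixel batches of 256; the optimizer, learning rate, epoch count, and MSE-only loss are my own placeholder choices, not the paper's settings.

```python
import torch

H, W = 120, 120
r, c = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
coords = torch.stack([r / H, c / W], dim=-1).reshape(-1, 2).float()  # normalized (x1, x2) per pixel
target = torch.rand(H * W)   # stand-in for one face image with gray values in [0, 1]

model = GmNet(m=80)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):
    for idx in torch.split(torch.randperm(H * W), 256):   # batch_size = 256 pixels
        opt.zero_grad()
        loss = ((model(coords[idx]) - target[idx]) ** 2).mean()
        loss.backward()
        opt.step()
```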

5. Evaluation

  1. The parametric solution of the GmFace model is not unique.

  2. The common face model fitted by GmFace can be observed to provide the same visual effect as the average face.

  3. Evaluation method: the MSE between two face images was calculated (gray values normalized to [0, 1]).

  4. Further dissection of the face: the peak value of each Gaussian unit is 1; selecting the $k$ Gaussian components with the largest weights yields the results shown in the original figure (omitted here; see the sketch after this list).
    That is, each Gaussian component corresponds to a component of the face. These components (eyes, nostrils, ...) are represented by Gaussian functions with parameters $\mathbf{L}_i$ and $\boldsymbol{\mu}_i$. Although the order in which GmNet learns them (eyes, nose, ...) is hard to discern, it is clear that the values of the weights $w_i$ are crucial for generating the portrait.

  5. In the best case (80 Gaussian components), the 40 components with the largest $w_i$ generate the rough overall appearance of the face, while the remaining 40 act as local regulators.

  6. The authors show these figures to support the conclusions of Section 4.2.2.4 (figures omitted in this note).
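A small sketch of the dissection described in points 4 and 5 above, using the GmNet parameterization assumed earlier (my own code, not the authors'): keep only the $k$ components with the largest weights and render the partial face.

```python
import torch

def render_top_k(model, coords, k, H=120, W=120):
    """Render a face using only the k Gaussian components of GmNet with the largest |w_i|."""
    with torch.no_grad():
        keep = torch.topk(model.w.abs(), k).indices      # indices of the k largest weights
        L = torch.tril(model.l[keep])
        d = coords[:, None, :] - model.mu[None, keep, :]
        z = torch.einsum('nmi,mij->nmj', d, L)
        img = (model.w[keep] * torch.exp(-(z ** 2).sum(-1))).sum(-1)
    return img.reshape(H, W)
```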

6. Conclusion

  1. strong conclusions
    GmFace is not the simplest model for face representation, but it has taken the first step towards this goal.
  2. weak conclusions
    Another line of work is to explore the simplest face model by replacing the multi-Gaussian function in GmFace with other elementary functions, such as exponential, trigonometric, logarithmic, or composite functions.

7. Notes (optional)

Notes that do not fit the sections above but are worth recording separately.

8. References (optional)


  1. 从0开始推到多元高斯分布 (Deriving the multivariate Gaussian distribution from scratch)
