GmFace: A Mathematical Model for Face Image Representation Using Multi-Gaussian

Title: Continuous learning of face attribute synthesis

1. Summary

Supported by the theory that a finite number of multi-Gaussian functions can approximate any non-negative integrable function on the real domain with arbitrary accuracy, the authors use Gaussian functions as neural units, constructing a linear fully connected layer as the model to fit the 2D surface data of face images. Within the model, the face images are used to learn an "average face", and the parameters can be adjusted to control image scale, rotation, and translation. Finally, the best epoch of GmNet is selected.

2. Research Objective(s)

The authors propose GmFace, a mathematical representation of the human face, as a step toward understanding the objective world. The two-dimensional Gaussian function provides a symmetric bell surface whose shape can be controlled by its parameters.
Solving the GmFace model essentially amounts to optimizing GmNet (a network using Gaussian functions as neurons).

face modeling process:

  1. GmNet initialization
  2. feeding GmNet with face images
  3. training GmNet until convergence
  4. drawing out the parameters of GmNet
  5. recording the face model GmFace

Keyword: multi-Gaussian function

3. Problem Statement

3.1 Problem statement:

In the face attribute synthesis task, GANs have limited effect on the expansion to new attributes.

Moreover, the spatial coordinates, and the relations between pixel intensities and spatial coordinates, are often ignored.

4. Method(s)

4.1 What were the previous methods?

  • traditional methods
    Face images are described by a number of digital features extracted from local or holistic regions.
    Feature extraction methods (local binary pattern, Gabor wavelet kernel, SIFT, HOG) are used to extract features, after which PCA or LDA performs dimensionality reduction, mapping the feature vector from the high-dimensional pixel space to a subspace by optimally solving the defined objective function.

  • deep learning methods

4.2 What is the authors' method/algorithm?

4.2.1 Face Image Model: GmFace
4.2.1.1 Theoretical basis
  1. The two-dimensional Gaussian function provides a symmetric bell surface whose shape can be controlled by parameters (convenient parameter control).
  2. The Gaussian functions form a complete set on $L^2(R^n)$, which means that a finite number of multi-Gaussian functions can approximate any non-negative integrable function on the real domain with arbitrary accuracy (good fitting ability; in plain terms, a linearly weighted sum of multivariate Gaussian functions can approximate any integrable non-negative function on the real domain with arbitrary accuracy).

4.2.1.2 Practical formulation

The theoretical basis of GmFace can be written as:

$$\hat{f}(\mathbf{x})=\sum_{i=1}^{m}w_iG_i(\mathbf{x}, \theta_i) \tag{1}$$

where $w_i$ is the weight of the $i$-th multivariate Gaussian function, $\mathbf{x}$ is the input, $G_i$ is the $i$-th multivariate Gaussian function, $\theta_i$ denotes its parameters, and $m$ is the number of Gaussian functions.
For a face image (where the spatial coordinates of a pixel are $x_1, x_2$), this becomes:

$$\begin{aligned} GmFace(x_1, x_2)&=\sum_{i=1}^{m}w_iG_i(x_1, x_2\mid \boldsymbol{\mu}_i, \mathbf{A}_i) \\ GmFace(\mathbf{x})&=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{A}_i) \end{aligned} \tag{2}$$

Here the authors define $\mathbf{A}$ as a positive-definite symmetric matrix, called the precision matrix in GmFace; in other words, $\mathbf{A}$ is the inverse of the covariance matrix. $\mathbf{x}=[x_1\ x_2]^T$ denotes the pixel coordinates and $\boldsymbol{\mu}=[\mu_1\ \mu_2]^T$ denotes the Gaussian center.

$$GmFace(\mathbf{x})=\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{A}_i(\mathbf{x}-\boldsymbol{\mu}_i)\} \tag{3}$$
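As a sanity check on Eq. (3), here is a minimal NumPy sketch that evaluates the model at one pixel; the function name, array shapes, and toy values are my own assumptions rather than anything from the paper.

```python
import numpy as np

def gmface(x, w, mu, A):
    """Evaluate Eq. (3) at one pixel coordinate x.

    x  : (2,)      pixel coordinate [x1, x2]
    w  : (m,)      weights of the m Gaussian components
    mu : (m, 2)    centers of the Gaussian components
    A  : (m, 2, 2) positive-definite precision matrices
    """
    d = x - mu                                 # (m, 2) offsets x - mu_i
    quad = np.einsum('mi,mij,mj->m', d, A, d)  # quadratic forms (x - mu_i)^T A_i (x - mu_i)
    return np.sum(w * np.exp(-quad))           # weighted sum of Gaussian surfaces

# toy usage: two components evaluated at one (normalized) pixel coordinate
w = np.array([0.8, 0.5])
mu = np.array([[0.3, 0.4], [0.6, 0.7]])
A = np.stack([np.eye(2) * 50.0, np.eye(2) * 80.0])
print(gmface(np.array([0.35, 0.45]), w, mu, A))
```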

Q1: Why is $\mathbf{A}$ the inverse of the covariance matrix?
Because the standard multivariate Gaussian density takes the form [1]

$$\mathcal{N}(\mathbf{x}\mid \boldsymbol{\mu}, \boldsymbol{\Sigma})=\frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$$

so the matrix in the exponent of Eq. (3) plays the role of $\boldsymbol{\Sigma}^{-1}$ (the factor $\tfrac{1}{2}$ and the normalizing constant being absorbed into $\mathbf{A}$ and $w_i$).

4.2.2 Multi-Gaussian Network: GmNet
4.2.2.1 Theoretical basis
  1. The pixel points of a face image are huge in number, so parameter estimation for GmFace requires a large amount of computation; the best way to handle this is with a neural network.
4.2.2.2 Practical formulation
4.2.2.2.1 Neuron structure

Treating the Gaussian functions of GmFace as neurons, a simple three-layer neural network, called GmNet, is constructed.

  • input: 2D surface data of face images
  • hidden layer: a group of Gaussian modules truncated to a region bounded by the image size (What does being truncated by the boundary mean? Does it imply that the number of hidden units depends on the image size, i.e., a wide and shallow network?)
  • output layer: the value of GmFace

Because the authors define $\mathbf{A}$ as a positive-definite matrix, by the properties of positive-definite matrices it admits a triangular (Cholesky-type) decomposition $\mathbf{A}=\mathbf{L}\mathbf{L}^T$.

Derivation
For any $n\times n$ matrix $\mathbf{A}$ whose leading principal minors are nonzero (which holds for a positive-definite $\mathbf{A}$), there exist a unit lower triangular matrix $\mathbf{L}_1$ and an upper triangular matrix $\mathbf{U}$ such that $\mathbf{A}=\mathbf{L}_1\mathbf{U}$.
Therefore:

$$\begin{aligned} &\mathbf{U}=\mathbf{D}\mathbf{U}_1,\qquad \mathbf{D}=\mathrm{diag}(\mathbf{U}),\qquad \mathbf{U}_1=\mathbf{D}^{-1}\mathbf{U}\ \text{unit upper triangular}\\ &\mathbf{A}^T=\mathbf{A}\ \Rightarrow\ \mathbf{U}_1^T\mathbf{D}\mathbf{L}_1^T=\mathbf{L}_1\mathbf{D}\mathbf{U}_1\ \Rightarrow\ \mathbf{U}_1=\mathbf{L}_1^T\ \text{(by uniqueness of the LDU factorization)}\\ &\mathbf{A}=\mathbf{L}_1\mathbf{D}\mathbf{L}_1^T=\mathbf{L}_1\mathbf{D}^{1/2}\mathbf{D}^{1/2}\mathbf{L}_1^T=(\mathbf{L}_1\mathbf{D}^{1/2})(\mathbf{L}_1\mathbf{D}^{1/2})^T \end{aligned}$$

From the above derivation, $\mathbf{L}=\mathbf{L}_1\mathbf{D}^{1/2}$ is a lower triangular real matrix with strictly positive diagonal entries. Hence Eq. (3) can be written as:

$$\begin{aligned} GmFace(\mathbf{x})&=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{A}_i) =\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{A}_i(\mathbf{x}-\boldsymbol{\mu}_i)\} \\ &=\sum_{i=1}^{m}w_iG_i(\mathbf{x}\mid \boldsymbol{\mu}_i, \mathbf{L}_i) =\sum_{i=1}^{m}w_i\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i)\} \end{aligned} \tag{4}$$

Further, the input $\mathbf{x}$ is normalized as

$$\mathbf{x}= \begin{bmatrix} x_1\\ x_2 \end{bmatrix} =\begin{bmatrix} r/H\\ c/W \end{bmatrix}$$

where $r$ and $c$ are the row and column indices of a pixel, and $H$ and $W$ are the image height and width.

In summary, GmNet is composed of $m$ Gaussian groups, each group expressed by $\boldsymbol{\mu}_i$ and $\mathbf{L}_i$. Across the whole GmNet, $w_i$, $\boldsymbol{\mu}_i$, and $\mathbf{L}_i$ are the parameters to be optimized by backpropagation.
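To make this concrete, here is a minimal PyTorch sketch of GmNet as I read Eq. (4); the class name, initialization, and the trick of masking a raw matrix to its lower-triangular part are my own choices, not details from the paper.

```python
import torch
import torch.nn as nn

class GmNet(nn.Module):
    """Sum of m 2D Gaussian units: sum_i w_i * exp{-(x - mu_i)^T L_i L_i^T (x - mu_i)}."""
    def __init__(self, m=80):
        super().__init__()
        self.w = nn.Parameter(torch.rand(m))               # component weights w_i
        self.mu = nn.Parameter(torch.rand(m, 2))           # centers mu_i in [0, 1]^2
        self.l = nn.Parameter(torch.randn(m, 2, 2) * 0.1)  # raw matrices; lower-triangular part is L_i

    def forward(self, coords):
        # coords: (n, 2) normalized pixel coordinates [r/H, c/W]
        L = torch.tril(self.l)                        # keep only the lower-triangular part
        d = coords[:, None, :] - self.mu[None, :, :]  # (n, m, 2) offsets x - mu_i
        z = torch.einsum('nmi,mij->nmj', d, L)        # rows (x - mu_i)^T L_i
        quad = (z ** 2).sum(-1)                       # (x - mu_i)^T L_i L_i^T (x - mu_i)
        return (self.w * torch.exp(-quad)).sum(-1)    # (n,) predicted gray values
```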

4.2.2.2.2 Parameter optimization

Using the chain rule, the gradients with respect to $w_i$, $\boldsymbol{\mu}_i$, and $\mathbf{L}_i$ can be obtained (given only as an image in the original).
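Deriving them directly from Eq. (4), with $G_i(\mathbf{x})=\exp\{-(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i)\}$, gives the following (my own derivation; the paper's notation may differ):

$$\frac{\partial\, GmFace(\mathbf{x})}{\partial w_i}=G_i(\mathbf{x}),\qquad
\frac{\partial\, GmFace(\mathbf{x})}{\partial \boldsymbol{\mu}_i}=2\,w_i\,G_i(\mathbf{x})\,\mathbf{L}_i\mathbf{L}_i^T(\mathbf{x}-\boldsymbol{\mu}_i),\qquad
\frac{\partial\, GmFace(\mathbf{x})}{\partial \mathbf{L}_i}=-2\,w_i\,G_i(\mathbf{x})\,(\mathbf{x}-\boldsymbol{\mu}_i)(\mathbf{x}-\boldsymbol{\mu}_i)^T\mathbf{L}_i$$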

4.2.2.2.3 Loss function

The $L_2$ norm (MSE) and the infinity norm (the proposed loss term, peak absolute error) are used here. The total loss combines the two with a balancing factor $\alpha$, and $N$ denotes the number of training samples. (The exact formulas are given only as images in the original note.)
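A plausible reconstruction based on the surrounding text (MSE plus the proposed peak absolute error, balanced by $\alpha$) is:

$$L_{MSE}=\frac{1}{N}\sum_{j=1}^{N}\bigl(GmFace(\mathbf{x}_j)-f(\mathbf{x}_j)\bigr)^2,\qquad
L_{PAE}=\max_{j}\bigl|GmFace(\mathbf{x}_j)-f(\mathbf{x}_j)\bigr|,\qquad
L=L_{MSE}+\alpha\,L_{PAE}$$

The exact weighting in the paper may differ.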

Q3: What does $f(\mathbf{x})$ refer to here? The ground-truth image?
The 2D surface of the specific face image is considered the learning target, and GmNet is established to optimize the parameters.
Q4: How can the infinity norm be differentiated? For the $L_1$ norm one can use subgradients, but isn't the derivative of the infinity norm zero almost everywhere?
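A possible answer to Q4 (my own note, not from the paper): in autograd frameworks the max is handled by a subgradient that routes the gradient entirely to the element attaining the maximum, so the PAE term still provides a training signal, just for one pixel per step. A tiny PyTorch check:

```python
import torch

pred = torch.tensor([0.2, 0.9, 0.4], requires_grad=True)
target = torch.zeros(3)
pae = (pred - target).abs().max()   # infinity norm of the error
pae.backward()
print(pred.grad)                    # tensor([0., 1., 0.]) -- only the argmax element receives gradient
```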

4.2.2.3 Personal Face Modeling

Each personal face image can be regarded as a point in the space that further expresses individual characteristics on top of common face features.
Roughly speaking, every face is a common face plus individual characteristics.

4.2.2.4 Face Image Transformation through GmFace
4.2.2.4.1 Image Translation

Image translation is achieved by adjusting the means of the Gaussian functions (the formula is given only as an image in the original).
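Presumably, translating the image by an offset $\mathbf{t}$ corresponds to shifting every Gaussian center while keeping $w_i$ and $\mathbf{L}_i$ unchanged:

$$\boldsymbol{\mu}_i'=\boldsymbol{\mu}_i+\mathbf{t},\qquad i=1,\dots,m$$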

4.2.2.4.2 Image Scaling

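Here too only an image is given; presumably, scaling the image by a factor $s$ about the origin corresponds to:

$$\boldsymbol{\mu}_i'=s\,\boldsymbol{\mu}_i,\qquad \mathbf{L}_i'=\mathbf{L}_i/s \quad (\text{so } \mathbf{A}_i'=\mathbf{A}_i/s^2)$$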

4.2.2.4.3 Image Rotation

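The rotation formulas are likewise only images; presumably, rotating the image by a rotation matrix $\mathbf{R}$ about the origin corresponds to:

$$\boldsymbol{\mu}_i'=\mathbf{R}\boldsymbol{\mu}_i,\qquad \mathbf{A}_i'=\mathbf{R}\mathbf{A}_i\mathbf{R}^T$$

Putting the three transformations together, a small Python sketch of how the GmFace parameters could be updated (my own reconstruction under the assumptions above, not the paper's code):

```python
import numpy as np

def transform_gmface(mu, L, t=np.zeros(2), s=1.0, angle=0.0):
    """Scale by s, rotate by `angle` (radians), then translate by t.

    mu : (m, 2)    Gaussian centers
    L  : (m, 2, 2) lower-triangular factors with A_i = L_i L_i^T
    The weights w_i stay unchanged.
    """
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    mu_new = (s * mu) @ R.T + t                 # mu_i' = R (s * mu_i) + t
    L_new = np.einsum('ij,mjk->mik', R, L) / s  # L_i' = R L_i / s, i.e. A_i' = R A_i R^T / s^2
    return mu_new, L_new
```

Note that $\mathbf{R}\mathbf{L}_i$ is in general no longer lower triangular, but $\mathbf{L}_i'\mathbf{L}_i'^T$ still gives the correct transformed precision matrix.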

4.3 Code

4.3.1 Dataset

Validation set: 1040 frontal normal images from the Chinese face database CAS-PEAL-R1.

Preprocessing: the face image samples were cropped and resized during preprocessing; the image size was 120×120 pixels, and a selection of data examples is provided in Fig. 4.
Parameter count: the total number of parameters is $6m$, where $m$ is the number of Gaussian components in the multi-Gaussian function and 6 is the parameter size of each 2D Gaussian component (2 parameters for $\boldsymbol{\mu}$, 3 for the positive-definite symmetric $\mathbf{A}$, plus the weight coefficient $w_i$).
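For example, in the best setting reported in Section 5 ($m = 80$ Gaussian components), this amounts to $80 \times 6 = 480$ parameters per face image.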

batch_size=256
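For completeness, a minimal training-loop sketch that fits one 120×120 gray image with the GmNet module sketched in Section 4.2.2.2.1, sampling pixel batches of 256; the optimizer, learning rate, epoch count, and MSE-only loss are my own placeholder choices, not the paper's settings.

```python
import torch

H, W = 120, 120
r, c = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
coords = torch.stack([r / H, c / W], dim=-1).reshape(-1, 2).float()  # normalized (x1, x2) per pixel
target = torch.rand(H * W)   # stand-in for one face image with gray values in [0, 1]

model = GmNet(m=80)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(200):
    for idx in torch.split(torch.randperm(H * W), 256):   # batch_size = 256 pixels
        opt.zero_grad()
        loss = ((model(coords[idx]) - target[idx]) ** 2).mean()
        loss.backward()
        opt.step()
```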

5. Evaluation

  1. The parametric solution of the GmFace model is not unique.

  2. The common face model fitted by GmFace can be observed to provide the same visual effect as the average face.

  3. Evaluation method: the MSE between two face images was calculated (gray values normalized to [0, 1]).

  4. Further dissection of the face: the peak value of each Gaussian unit is 1; selecting the $k$ Gaussian components with the largest weights yields the results shown in the original figure (omitted here; see the sketch after this list).
    That is, each Gaussian component corresponds to a component of the face. These components (eyes, nostrils, ...) are represented by Gaussian functions with parameters $\mathbf{L}_i$ and $\boldsymbol{\mu}_i$. Although the order in which GmNet learns them (eyes, nose, ...) is hard to discern, it is clear that the values of the weights $w_i$ are crucial for generating the portrait.

  5. In the best case (80 Gaussian components), the 40 components with the largest $w_i$ generate the rough overall appearance of the face, while the remaining 40 act as local regulators.

  6. The authors show these figures to support the conclusions of Section 4.2.2.4 (figures omitted in this note).
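A small sketch of the dissection described in points 4 and 5 above, using the GmNet parameterization assumed earlier (my own code, not the authors'): keep only the $k$ components with the largest weights and render the partial face.

```python
import torch

def render_top_k(model, coords, k, H=120, W=120):
    """Render a face using only the k Gaussian components of GmNet with the largest |w_i|."""
    with torch.no_grad():
        keep = torch.topk(model.w.abs(), k).indices      # indices of the k largest weights
        L = torch.tril(model.l[keep])
        d = coords[:, None, :] - model.mu[None, keep, :]
        z = torch.einsum('nmi,mij->nmj', d, L)
        img = (model.w[keep] * torch.exp(-(z ** 2).sum(-1))).sum(-1)
    return img.reshape(H, W)
```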

6. Conclusion

  1. strong conclusions
    GmFace is not the simplest model for face representation, but it has taken the first step towards this goal.
  2. weak conclusions
    Another line of work is to explore the simplest face model by replacing the multi-Gaussian function in GmFace with other elementary functions, such as exponential, trigonometric, logarithmic, or composite functions.

7. Notes (optional)

Notes that do not fit the sections above but are worth recording separately.

8. References (optional)


  1. 从0开始推到多元高斯分布 (Deriving the multivariate Gaussian distribution from scratch)
