# 3D Morphable Model Method

This note is a brief summary of the 3DMM paper *A Morphable Model For The Synthesis Of 3D Faces* by Blanz and Vetter.


# Prerequisite

• 3D head laser scans
• full correspondence between the faces (the method for establishing it is described in the last section of this note)

# Model Construction

The model construction process consists of two steps: computing correspondence and constructing the model. Note that these are TRAINING steps. Once the model is constructed, it can be applied to new faces and scans through a matching algorithm.

## concept of morphable face model

A face has two major properties: geometry, represented as a shape-vector $S=(X_{1},Y_{1},Z_{1},X_{2},...,Y_{n},Z_{n})^{T}\in{\mathbb{R}^{3n}}$ that contains the $X,Y,Z$-coordinates of its $n$ vertices, and texture, represented as a texture-vector $T=(R_{1},G_{1},B_{1},R_{2},...,G_{n},B_{n})^{T}\in{\mathbb{R}^{3n}}$ that contains the $R,G,B$ color values of the $n$ corresponding vertices. An arbitrary new shape $\mathbf{S}_{mod}$ and new texture $\mathbf{T}_{mod}$ can be expressed as linear combinations of the $m$ exemplar faces:

$$\mathbf{S}_{mod}=\sum_{i=1}^{m}a_{i}\mathbf{S}_{i},\quad \mathbf{T}_{mod}=\sum_{i=1}^{m}b_{i}\mathbf{T}_{i},\quad \sum_{i=1}^{m}a_{i}=\sum_{i=1}^{m}b_{i}=1.$$

Note that this representation is based directly on the exemplar faces. For reconstruction we actually use the PCA form described in the next section.
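The exemplar-based combination above can be sketched in a few lines of NumPy. This is only a toy illustration: the function name `combine_exemplars` and the random "faces" are my own assumptions, not something from the paper.

```python
import numpy as np

def combine_exemplars(exemplars, weights):
    """Linear combination of exemplar vectors with weights summing to 1.

    exemplars: (m, 3n) array, one shape (or texture) vector per row.
    weights:   (m,) coefficients a_i (or b_i).
    """
    exemplars = np.asarray(exemplars, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("coefficients must sum to 1")
    return weights @ exemplars

# Toy data (assumption): 4 exemplar "faces" with 3 vertices each.
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 9))
a = np.array([0.1, 0.2, 0.3, 0.4])
S_mod = combine_exemplars(S, a)
```

Putting all the weight on one exemplar simply returns that exemplar, which is a quick sanity check for the constraint.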

## model representation

The construction process can be described as a PCA procedure, i.e., the model is represented with principal components (the eigenvectors of the covariance matrices of shape and texture):

$$S_{model}=\bar{S}+\sum_{i=1}^{m-1}\alpha_{i}s_{i},\quad T_{model}=\bar{T}+\sum_{i=1}^{m-1}\beta_{i}t_{i},\qquad (1)$$

In equation (1), $\bar{S},\bar{T}$ denote the average shape and texture of the training set, $s_{i},t_{i}$ are the eigenvectors of the covariance matrices (in descending order of their eigenvalues), and $\alpha_{i},\beta_{i}$ are the model coefficients.
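A minimal sketch of how equation (1) could be built with NumPy, assuming the exemplar vectors are stacked as rows of a matrix `X` and the covariance is normalized by $1/m$. The helper names `build_pca_model`, `synthesize`, and `project` are hypothetical, and the SVD is just one convenient way to get the eigenvectors without forming the large covariance matrix.

```python
import numpy as np

def build_pca_model(X):
    """PCA of exemplar vectors.

    X: (m, d) array, one shape or texture vector per row.
    Returns the mean (d,), the components (k, d) as rows, and the
    standard deviations sigma (k,), with k = min(m - 1, d).
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mean = X.mean(axis=0)
    A = X - mean
    # Rows of Vt are eigenvectors of the covariance A^T A / m,
    # already sorted by descending singular value.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(m - 1, Vt.shape[0])  # centered data has rank at most m - 1
    return mean, Vt[:k], s[:k] / np.sqrt(m)

def synthesize(mean, components, coeffs):
    """Equation (1): mean plus a weighted sum of eigenvectors."""
    return mean + np.asarray(coeffs, dtype=float) @ components

def project(mean, components, x):
    """Coefficients of x in the (orthonormal) eigenvector basis."""
    return components @ (np.asarray(x, dtype=float) - mean)

# Toy data (assumption): 5 exemplars with 4 vertices each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))
mean, components, sigma = build_pca_model(X)
alpha = project(mean, components, X[0])
reconstruction = synthesize(mean, components, alpha)
```

Because the centered training data spans at most $m-1$ directions, every training face is reproduced exactly by its $m-1$ coefficients.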

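To quantify how plausible a result is as a face, the authors fit a multivariate normal distribution to the data set of 200 faces. The probability of the coefficients $\vec{\alpha}$ is then given by

$$p(\vec{\alpha})\sim \exp\left[-\frac{1}{2}\sum_{i=1}^{m-1}(\alpha_{i}/\sigma_{i})^{2}\right],\qquad (2)$$

where $\sigma_{i}^{2}$ are the eigenvalues of the shape covariance matrix. The probability of $\vec{\beta}$ is computed in the same way from the texture covariance matrix.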
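As a small sketch of equation (2), the negative log-prior and a random draw of plausible coefficients might look like this. The function names are hypothetical, and the normalizing constant of the Gaussian is dropped on purpose.

```python
import numpy as np

def neg_log_prior(alpha, sigma):
    """0.5 * sum_i (alpha_i / sigma_i)^2, i.e. -log p(alpha) up to a constant."""
    alpha = np.asarray(alpha, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * float(np.sum((alpha / sigma) ** 2))

def sample_coefficients(sigma, rng):
    """Draw alpha_i ~ N(0, sigma_i^2) independently for each component."""
    sigma = np.asarray(sigma, dtype=float)
    return rng.normal(size=sigma.shape) * sigma

sigma = np.array([1.0, 2.0])
average_face_penalty = neg_log_prior([0.0, 0.0], sigma)   # the mean face
one_sigma_penalty = neg_log_prior([1.0, 2.0], sigma)      # one sigma along each axis
random_alpha = sample_coefficients(sigma, np.random.default_rng(0))
```

The average face (all coefficients zero) gets the smallest penalty, and the penalty grows as the coefficients move away from it in units of $\sigma_{i}$.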

In addition, we can divide the face into independent subregions that are morphed independently. In this paper, the authors define four subregions (eyes, nose, mouth, and a surrounding region). The complete 3D face is generated by computing linear combinations for each segment separately and blending them at the borders according to the algorithm in [1].

## face attributes

To map facial attributes (gender, fullness of faces, darkness of eyebrows, double chins, hooked and concave noses) defined by a hand-labeled set of example faces to the parameter space of the morphable model, first define shape and texture vectors that manipulate a specific attribute:

$$\Delta S=\sum_{i=1}^{m}\mu_{i}(S_{i}-\bar{S}),\quad \Delta T=\sum_{i=1}^{m}\mu_{i}(T_{i}-\bar{T}),\qquad (3)$$

where $\mu_{i}$ are manually assigned labels describing the markedness of the attribute in exemplar face $i$. According to the authors, this is motivated by a performance-based technique for transferring facial expressions. Multiples of $(\Delta S,\Delta T)$ can now be added to or subtracted from any face. I am still confused, though, about how to adjust the parameters for a specific attribute, e.g., a smile.
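My reading of equation (3), as a hedged sketch: label each exemplar with $\mu_{i}$, sum the labeled deviations from the average, and add a multiple of the result to any face. The function name `attribute_vector` and the tiny hand-made data are assumptions for illustration only.

```python
import numpy as np

def attribute_vector(X, mu):
    """Equation (3): sum_i mu_i * (X_i - mean(X)).

    X:  (m, d) exemplar shape (or texture) vectors, one per row.
    mu: (m,) hand-assigned markedness labels for one attribute.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return mu @ (X - X.mean(axis=0))

# Toy exemplars (assumption): 4 "faces" with a single vertex each.
S = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [2.0, 2.0, 0.0]])
mu = np.array([0.0, 1.0, 0.0, 1.0])   # faces 2 and 4 show the attribute
delta_S = attribute_vector(S, mu)

# Add a multiple of the attribute vector to a face to strengthen the attribute.
morphed = S[0] + 0.5 * delta_S
```

Under this reading, an attribute such as a smile would be handled the same way: label how strongly each exemplar smiles, compute $(\Delta S,\Delta T)$ once, and scale it per face.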

# Application: Matching 3DMMs to images and 3D scans

Matching a morphable model to images means optimizing the coefficients of the 3D model ($\vec{\alpha},\vec{\beta}$) along with a set of rendering parameters ($\vec{\rho}$) such that they produce an image as close as possible to the input image.

From the parameters $(\vec{\alpha},\vec{\beta},\vec{\rho})$, colored images

$$\mathbf{I}_{model}(x,y)=(I_{r,mod}(x,y),I_{g,mod}(x,y),I_{b,mod}(x,y))^{T}\qquad (4)$$

are rendered using perspective projection and the Phong illumination model. To maximize the posterior probability $P(\vec{\alpha},\vec{\beta},\vec{\rho}\;|\;\mathbf{I}_{input})$, the first quantity to minimize is the Euclidean distance between the rendered image and the input image:

$$E_{I}=\sum_{x,y}\|\mathbf{I}_{input}(x,y)-\mathbf{I}_{model}(x,y)\|^{2}$$
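The image error is just a sum of squared color differences over all pixels. A minimal sketch, assuming both images are arrays of shape (height, width, 3); the function name `image_error` is my own.

```python
import numpy as np

def image_error(I_input, I_model):
    """E_I: squared Euclidean distance summed over pixels and color channels."""
    diff = np.asarray(I_input, dtype=float) - np.asarray(I_model, dtype=float)
    return float(np.sum(diff ** 2))

# Toy 2x2 RGB images (assumption): every channel differs by exactly 1.
I_input = np.ones((2, 2, 3))
I_model = np.zeros((2, 2, 3))
E_I = image_error(I_input, I_model)
```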

Optimizing this cost function yields the optimal parameters $\alpha, \beta, \rho$. However, according to the authors, non-face-like surfaces can lead to the same image, so the solutions must be constrained. This is achieved by restricting solutions to the vector space spanned by the shape and texture vectors, and by a tradeoff between matching quality and the prior probabilities $P(\alpha),P(\beta),P(\rho)$. The problem thus becomes a maximum a posteriori (MAP) estimation problem:

$$p(\alpha,\beta,\rho\,|\,\mathbf{I}_{input})\sim p(\mathbf{I}_{input}\,|\,\alpha,\beta,\rho)\cdot P(\alpha,\beta,\rho)$$

If we neglect correlations between some of the variables, the right-hand side is

$$p(\mathbf{I}_{input}\,|\,\alpha,\beta,\rho)\cdot P(\alpha)\cdot P(\beta)\cdot P(\rho)$$

Here $P(\alpha),P(\beta)$ are given by Eq. (2), and $P(\rho)$ is a normal distribution whose means are the starting values $\bar{\rho}_{j}$ and whose standard deviations $\sigma_{\rho,j}$ are ad hoc values. The likelihood is $p(\mathbf{I}_{input}|\alpha ,\beta ,\rho)\sim \exp(\frac{-1}{2\sigma _{N}^{2}}\cdot E_{I})$. In [2], where the noise standard deviation is written $\sigma_{I}$ instead of $\sigma_{N}$, this form is justified as follows: “For Gaussian pixel noise with a standard deviation $\sigma_{I}$, the likelihood of observing $\mathbf{I}_{input}$, given $\alpha ,\beta ,\rho$, is a product of one-dimensional normal distributions, with one distribution for each pixel and each color channel.” I still cannot fully understand this sentence (explanations from readers are welcome). The posterior probability is then maximized by minimizing

$$E=-2\log p(\alpha,\beta,\rho\,|\,\mathbf{I}_{input}),$$

which, up to an additive constant, is

$$E=\frac{1}{\sigma_{N}^{2}}E_{I}+\sum_{j=1}^{m-1}\frac{\alpha_{j}^{2}}{\sigma_{S,j}^{2}}+\sum_{j=1}^{m-1}\frac{\beta_{j}^{2}}{\sigma_{T,j}^{2}}+\sum_{j}\frac{(\rho_{j}-\bar{\rho}_{j})^{2}}{\sigma_{\rho,j}^{2}}\qquad (5)$$
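One way to read the quoted sentence from [2], writing the pixel-noise standard deviation as $\sigma_{N}$ (it is $\sigma_{I}$ in the quotation): assume each observed color value equals the rendered value plus independent Gaussian noise. The following is only a sketch of the standard argument, with $c$ a hypothetical index over the three color channels, not a derivation copied from the paper.

$$
\begin{aligned}
p(\mathbf{I}_{input}\,|\,\alpha,\beta,\rho)
&=\prod_{x,y}\ \prod_{c\in\{r,g,b\}}\frac{1}{\sqrt{2\pi}\,\sigma_{N}}\exp\left(-\frac{\left(I_{c,input}(x,y)-I_{c,mod}(x,y)\right)^{2}}{2\sigma_{N}^{2}}\right)\\
&\sim\exp\left(-\frac{1}{2\sigma_{N}^{2}}\sum_{x,y}\|\mathbf{I}_{input}(x,y)-\mathbf{I}_{model}(x,y)\|^{2}\right)
=\exp\left(-\frac{1}{2\sigma_{N}^{2}}E_{I}\right)
\end{aligned}
$$

The product of exponentials turns into a sum in the exponent, which is exactly $E_{I}$. Taking $-2\log$ of the likelihood times the three Gaussian priors then gives one quadratic term per factor:

$$
-2\log\left[p(\mathbf{I}_{input}\,|\,\alpha,\beta,\rho)\,P(\alpha)\,P(\beta)\,P(\rho)\right]
=\frac{1}{\sigma_{N}^{2}}E_{I}+\sum_{j=1}^{m-1}\frac{\alpha_{j}^{2}}{\sigma_{S,j}^{2}}+\sum_{j=1}^{m-1}\frac{\beta_{j}^{2}}{\sigma_{T,j}^{2}}+\sum_{j}\frac{(\rho_{j}-\bar{\rho}_{j})^{2}}{\sigma_{\rho,j}^{2}}+\text{const}
$$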

During optimization, we render $\mathbf{I}_{model}$ from the current parameters, compute the cost function, and then update the parameters so that the new $\mathbf{I}_{model}$ gives a smaller cost. These steps are iterated.
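The render, compare, update loop can be illustrated on a toy problem. Everything here is an assumption made to keep the example short: the "renderer" is a fixed linear map `b + A @ alpha` instead of perspective projection with Phong shading, only shape coefficients are fitted, and plain gradient descent replaces whatever optimizer the paper actually uses.

```python
import numpy as np

def cost(alpha, I_input, A, b, sigma_N, sigma_S):
    """Image term plus shape prior term of the cost, for a linear toy renderer."""
    residual = I_input - (b + A @ alpha)
    return residual @ residual / sigma_N**2 + np.sum((alpha / sigma_S) ** 2)

def fit(I_input, A, b, sigma_N, sigma_S, n_iter=2000):
    """Gradient descent on the toy cost, starting from the average face."""
    alpha = np.zeros(A.shape[1])
    # Upper bound on the curvature of the quadratic cost gives a safe step size.
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 / sigma_N**2 + np.max(1.0 / sigma_S**2))
    for _ in range(n_iter):
        residual = I_input - (b + A @ alpha)          # render and compare
        grad = -2.0 * A.T @ residual / sigma_N**2 + 2.0 * alpha / sigma_S**2
        alpha = alpha - grad / L                      # move toward a smaller cost
    return alpha

# Toy data (assumption): 30 "pixels", 3 model coefficients.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 3))
b = rng.normal(size=30)
sigma_S = np.array([3.0, 2.0, 1.0])
alpha_true = np.array([1.5, -1.0, 0.5])
I_input = b + A @ alpha_true + 0.01 * rng.normal(size=30)

alpha_hat = fit(I_input, A, b, sigma_N=1.0, sigma_S=sigma_S)
```

With a real renderer the cost is no longer quadratic, so the step size and convergence behavior would need much more care than in this sketch.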

The above is the procedure for matching a 3D morphable model to images. To apply it to scans, we just need to replace $I(x,y)$ with $I(h,\phi)$:

$$I(h,\phi)=(R(h,\phi),G(h,\phi),B(h,\phi),r(h,\phi))^{T}.\qquad (6)$$

in which $h$ and $\phi$ are the vertical steps and angles of the laser scan representation, and $r$ is the radius of the surface point.
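To connect this cylindrical representation with the $X,Y,Z$ shape vector, a scan sample $(h,\phi,r)$ can be converted to Cartesian coordinates. The axis convention below (height along $Y$, $\phi=0$ pointing along $+Z$) is purely my assumption; a real scanner may use a different one.

```python
import numpy as np

def cylindrical_to_cartesian(r, h, phi):
    """Convert scan samples r(h, phi) to (x, y, z) points.

    Assumed convention: y is the vertical axis, phi = 0 looks along +z.
    Inputs may be scalars or arrays that broadcast against each other.
    """
    x = r * np.sin(phi)
    y = h
    z = r * np.cos(phi)
    return np.stack(np.broadcast_arrays(x, y, z), axis=-1)

front = cylindrical_to_cartesian(1.0, 2.0, 0.0)          # point facing +z
side = cylindrical_to_cartesian(1.0, 2.0, np.pi / 2)     # point facing +x

# A small grid: 3 vertical steps by 4 angles, constant radius.
h_grid, phi_grid = np.meshgrid(np.linspace(0.0, 1.0, 3),
                               np.linspace(0.0, np.pi, 4), indexing="ij")
points = cylindrical_to_cartesian(np.full_like(h_grid, 0.5), h_grid, phi_grid)
```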

# Building morphable model without correspondence

Everything stated above assumes that all exemplar faces are in full correspondence. This section describes two algorithms for computing correspondence.

## 3D correspondence using optic flow

Optic flow was first proposed to estimate corresponding points in images $I(x,y)$. A gradient-based optic flow algorithm is modified to apply to 3D scans $I(h,\phi)$, taking into account color and radius values simultaneously [3].

## Bootstrapping the model

Since optic flow does not incorporate any constraints on the set of solutions, it fails on some of the more unusual faces in the database. The modified bootstrapping method improves correspondence iteratively.

The process is as follows:
1. Use optic flow to compute preliminary correspondences between each face and a reference face.
2. Compute a morphable model from these correspondences, and use the average face as the new reference face.
3. Match the model to the 3D scans, so that each original scan now has an approximated scan.
4. Compute the correspondence between each original scan and its approximation using optic flow.
5. Iterate the steps above.

### References

[1] P.J. Burt and E.H. Adelson. Merging images through pattern decomposition. In Applications of Digital Image Processing VIII, number 575, pages 173–181. SPIE The International Society for Optical Engineering, 1985.

[2] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), pages 1063–1074, 2003.

[3] T. Vetter and V. Blanz. Estimating coloured 3D face models from single images: An example based approach. In Burkhardt and Neumann, editors, Computer Vision – ECCV’98 Vol. II, Freiburg, Germany, 1998. Springer, Lecture Notes in Computer Science 1407.
