3. Introduction
What is our goal?
Obtain a neutral face 3D model from an expressional face 3D model
How can we achieve this?
Learning how to infer the expression
Subtracting the expression
4/22/2012 Facial Expression Recognition/Removal 3
4. Motivations
3D Facial expression removal benefits…
Improving the performance of 3D face recognition
Improving 3D gender classification methods
Analyzing complex expressions
Face synthesis
5. Background
This is probably the first attempt at 3D expression removal…
Comparing it to 3D face synthesis as its opposite process
Interpolation-based
Muscle-based
Example-based
7. Alignment
We need to adapt the input to a generic 3D model
Why?
Input faces are irregular and posture-variant
They would be difficult to map
Input = A cloud of points
Generic model = Triangle mesh
How can we obtain a normalized mesh?
Fitting the cloud of points to a generic model
8. Alignment – 1st step
Landmark-constrained Rigid Adjustment
We adjust the posture of O towards G
Landmarks to constrain the fitting
Iterative Closest Point
Creating pairs between both sets
Notation: original model O with point set PO and landmark set LO; generic model G with point set PG and landmark set LG
For each point xi ∈ PO
If xi ∈ LO
Find its corresponding landmark yi ∈ LG
Else
Find its nearest point yi ∈ PG
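The pairing pass above can be sketched in Python. This is a minimal illustration, not the paper's code: the function name, the landmark dictionaries keyed by name, and the brute-force nearest search are all assumptions made for clarity.

```python
import numpy as np

def pair_points(P_O, P_G, L_O, L_G):
    """One ICP pairing pass (illustrative sketch; names are assumptions).

    P_O, P_G : (N, 3) / (M, 3) arrays of original / generic model points.
    L_O, L_G : dicts mapping a landmark name to its row index in P_O / P_G.
    Returns a list of (original index, generic index) pairs.
    """
    # map each landmark's row in P_O to its counterpart's row in P_G
    landmark_rows = {idx: L_G[name] for name, idx in L_O.items()}
    pairs = []
    for i, x in enumerate(P_O):
        if i in landmark_rows:                # landmark: use its counterpart
            pairs.append((i, landmark_rows[i]))
        else:                                 # otherwise: nearest generic point
            j = int(np.argmin(np.linalg.norm(P_G - x, axis=1)))
            pairs.append((i, j))
    return pairs
```

A real implementation would use a k-d tree for the nearest-point search instead of the O(N·M) scan shown here.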
9. Alignment – 2nd step
Energy-based Generic Model Adaptation
The generic mesh G is deformed to wrap O
It is an energy minimization problem
First, we have to explain these two energy measures
Eg = Geometric Error
Measures the quality of the wrapping
Es = Smoothness Error
Measures the smoothness of the process
10. Alignment – 2nd step
Geometric error is measured:
δ is the weight of landmarks
xi ∈ PO yi ∈ PG
ti denotes the offset between yi and its pair xi
It will be calculated by minimizing the total energy function
The equation has separate terms for the landmarks and for the rest of the points
11. Alignment – 2nd step
Smoothness error is measured:
N(i) is the 1-ring neighborhood of point i
ti and tj denote the offsets of points i and j
The equation has separate terms for the landmarks and for the rest of the points
12. Alignment – 2nd step
The energy function
λ (0 ≤ λ ≤ 1) is used as a tradeoff between the errors
Taking into account both λ and δ, they define:
A tradeoff between computation time and accuracy
13. Alignment – 2nd step
Algorithm:
For each point yi ∈ PG
If yi ∉ LG
Find its nearest point xi ∈ PO
Else
Choose its corresponding point xi ∈ LO
For each yi ∈ PG
Calculate its offset ti by minimizing the energy function: E(λ,δ)
Update the point: yi = yi + ti
Compute the total root mean squared distance εk between PO and PG
If εk < threshold
Start again, reducing the values of λ and δ
Else
Obtain the aligned 3D face: M = O
(M is the aligned 3D model)
14. Training – Building spaces
Normal Space
Properties of facial expressions
Expression Residue Space
Expression variations compared with their neutral faces
Each point in the spaces stores one face sample
15. Training – Building spaces
Normal space
T represents the triangle set of M
n = (nx,ny,nz) is a normal vector
nj is the normal of the jth triangle on M
C represents the normal space
It is composed of all the normal vectors on T
16. Training – Building spaces
Expression residue space
How is a facial expression understood?
As the difference between the expressional face and the neutral face:
Δ(Mexpressional, Mneutral)
This is stored as a combination of movements over each triangle on a neutral face model
How is each movement encoded?
5-tuple:
azimuth angle
elevation angle
x translation
y translation
z translation
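The two angular components of the 5-tuple can be illustrated with one common azimuth/elevation convention; this sketch is an assumption for illustration, not necessarily the paper's exact parameterization (the three translation components are simply (x, y, z)).

```python
import math

def encode_direction(v):
    """Azimuth and elevation of a 3-D direction (illustrative sketch of the
    two angular components of the 5-tuple; the convention is an assumption)."""
    x, y, z = v
    azimuth = math.atan2(y, x)                   # angle within the x-y plane
    elevation = math.atan2(z, math.hypot(x, y))  # angle above the x-y plane
    return azimuth, elevation
```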
17. Training – Relationship model
We want to be able to:
Infer the expression given an expressional face
In order to do that we need:
A Relationship Model that maps the normal space to the expression residue space.
This process is not trivial:
Dimension reduction of Normal Space
Inferring Expression Residue
18. Training – Relationship model
Dimension reduction of Normal Space
Normal Space contains redundant and noisy information
We will use Principal Component Analysis
ui represents the vector of the ith training sample
Cj represents the jth centralized geodesic coordinate
The N samples are stacked as the columns of the K×N matrix U = [u1 u2 … uN], each column holding the centralized geodesic coordinates C1, …, CK; from U the K×K covariance matrix S is computed
19. Training – Relationship model
Dimension reduction of Normal Space
Once we have the covariance matrix, we perform Singular Value Decomposition (SVD) to obtain:
Eigenvectors (v1, …, vN)
Eigenvalues (λ1, …, λN), sorted from highest to lowest
Selecting the most relevant eigenvectors
P is the set of eigenvectors selected (v1, …, vV)
ξ is a predefined threshold to avoid selecting too many eigenvectors
Finally, we get the reduced normal space
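The reduction described on the last two slides can be sketched as follows. The names (`reduce_normal_space`, `xi`) are illustrative, and an eigendecomposition of the symmetric covariance matrix is used in place of a full SVD, which yields the same eigenvectors and eigenvalues here.

```python
import numpy as np

def reduce_normal_space(U, xi=0.95):
    """PCA reduction of the normal space (illustrative sketch).

    U  : (K, N) matrix whose columns are the N training samples.
    xi : fraction of total variance to retain (the predefined threshold).
    Returns the projection matrix P (K, V) and the reduced samples (V, N).
    """
    mean = U.mean(axis=1, keepdims=True)
    Uc = U - mean                           # centralize each coordinate
    C = Uc @ Uc.T / U.shape[1]              # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # symmetric matrix: eigh suffices
    order = np.argsort(eigvals)[::-1]       # sort from highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    V = int(np.searchsorted(cum, xi) + 1)   # smallest V whose variance >= xi
    P = eigvecs[:, :V]
    return P, P.T @ Uc
```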
20. Training – Relationship model
Inference of Expressional Residue
RBF stands for Radial Basis Function
A radial basis function depends only on the distance from a point to a center
21. Training – Relationship model
Inference of Expressional Residue
RBF Networks use radial basis functions as activation functions
22. Training – Relationship model
Inference of Expressional Residue
[Network diagram: inputs C1, …, Cn feed RBF nodes RBF(1), …, RBF(n); weighted sums sum(1), …, sum(k) produce outputs e1, …, ek]
Inputs: centralized geodesic coordinates of the reduced normal space.
uiP = (C1, C2, …, Cn)
The intermediate nodes compute an RBF that relates Ci to its neighborhood
Outputs: a value for each dimension of the expression space
The weight matrix will be computed by the least squares method
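The least-squares fit of the network weights can be sketched as below. This is an illustrative sketch, not the paper's implementation: the Gaussian basis, the `sigma` width, and the function names are assumptions.

```python
import numpy as np

def train_rbf(X, Y, centers, sigma=0.3):
    """Fit RBF-network weights by least squares (illustrative sketch;
    the Gaussian basis and sigma are assumptions).

    X       : (N, d) inputs (reduced normal-space coordinates).
    Y       : (N, k) targets (expression-residue dimensions).
    centers : (n, d) RBF centers, e.g. the training inputs themselves.
    """
    def phi(Z):
        # distance of every input to every center -> Gaussian activation
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        return np.exp(-(d ** 2) / (2 * sigma ** 2))
    W, *_ = np.linalg.lstsq(phi(X), Y, rcond=None)  # least-squares weights
    return W, phi

def predict_rbf(Z, W, phi):
    """Network output: weighted sum of the RBF activations."""
    return phi(Z) @ W
```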
23. Testing
Given an expressional face…
Infer the expression residue
Subtract the expression residue
Reconstruct the face
Obtaining the neutral face
Mathematical expression:
Mneu = Mexp − Δ(Mexp, Mneu)
(M is the aligned 3D model)
24. Testing - Inferring
Let Cexp be the normal representation of Mexp
Let Φ(Cexp) be the output of the RBF network for the new input Cexp
Φ(Cexp) is the inference of Δ(Mexp ,Mneu)
Δ(Mexp ,Mneu) ≅ Φ(Cexp)
Final mathematical expression
Mneu = Mexp – Φ(Cexp)
25. Testing - Reconstruction
Having inferred the expression residue:
We have a set of movements for each triangle on Mexp
Applying them causes the mesh to be deformed
Poisson-based reconstruction
26. Experiments
BU-3DFED (Binghamton University 3D Facial Expression
Database)
44 males, 56 females
Each made 6 different expressions and 1 neutral face
Each expression had 4 levels of intensity
Total number of face models = 700
27. Experiments
The RMS (root mean square) distance is used to measure the difference between the two neutral face models
Xi is a point on X and Yi is the point on Y nearest to Xi
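The evaluation measure can be sketched directly from the definition above; the brute-force nearest-point search here is for clarity only, and the function name is an assumption.

```python
import numpy as np

def rms_distance(X, Y):
    """RMS of nearest-point distances from X to Y (sketch of the measure)."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    nearest = d.min(axis=1)  # for each Xi, distance to the nearest Yi
    return float(np.sqrt(np.mean(nearest ** 2)))
```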
34. Introduction
Expressions are dynamic
It is easier to recognize them from video than from static images
35. Haar-like features
Our “experts” from face detection
Binary patterns that are convolved with the images, producing a single-value result
Each frame has many important Haar-like features
36. Clustering Temporal Patterns
5 stages of an expression will be considered
A clustering method will be used to classify the Haar features into the 5 stages
37. Clustering Temporal Patterns
K-Means
N → number of clusters
N random vectors will be initialized, representing the centers of the clusters
For each point in the database:
Which is the closest vector to me?
That's the cluster I belong to!
Recalculate the cluster descriptor vectors: they must represent the mass-center of the points in the cluster
Repeat until there are no more changes
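The steps above can be sketched as plain K-means; the initialization scheme and names are illustrative assumptions.

```python
import numpy as np

def kmeans(points, n_clusters, max_iter=100, seed=0):
    """Plain K-means, following the steps above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # initialize centers as random points drawn from the data
    centers = points[rng.choice(len(points), n_clusters, replace=False)].astype(float)
    for _ in range(max_iter):
        # assignment: each point joins the cluster of its closest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update: each center moves to the mass-center of its cluster
        new_centers = np.array([points[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):  # no more changes: stop
            break
        centers = new_centers
    return labels, centers
```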
39. Building our Experts
For representation purposes, a five-dimensional vector is used for each Haar feature
[0 0 0 1 0] → the Haar feature belongs to the fourth stage (middle+)
40. Building our Experts
A normalized histogram is calculated, considering all the features in the sequence
Example for 7 features: [0 0 1/7 2/7 4/7]
41. Building our Experts
We will convert the histogram to a decimal value using binary weights:
[0/7 0/7 1/7 2/7 4/7] · [1 2 4 8 16] = 0 + 0 + 4/7 + 16/7 + 64/7 = 84/7 = 12
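The summarization step reduces to a dot product with the binary weights, as in the example above (the function name is illustrative):

```python
def summarize(stage_histogram):
    """Dot the normalized 5-bin stage histogram with binary weights
    [1, 2, 4, 8, 16] to get a single decimal value."""
    weights = [1, 2, 4, 8, 16]
    return sum(h * w for h, w in zip(stage_histogram, weights))
```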
42. Building our Experts
A one-against-all approach is used
“Is it a happy expression or not?”
Other moods will work as negative examples
43. Building our Experts
After repeating the clustering and summarizing process for all examples in the database, we can produce a YES/NO histogram for each expression
A threshold will define whether a face represents that expression or not
44. Building our Experts
That is one weak classifier
The final strong classifier is built by AdaBoost
45. Testing
For a new sequence:
Calculate the haar-features
Cluster into stages
Summarize (output a decimal)
Compare this value with the threshold of each expression
46. Experiments
Cohn-Kanade faces database
100 students, aged 18 to 30
65% women, 35% men
15% African-American, 5% Asian or Latino
Each performed 23 poses, including prototypical expressions
In this work, they used 90 of those expressions (60 for training, 30 for testing)
Experiments were made with sequences of 7 and 9 frames
51. Poisson-based reconstruction
We paste all the triangles together by solving:
AU = b
where:
U contains the coordinates of the deformed mesh
b is the divergence of the modified gradient fields
A is a sparse matrix defined as:
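As a toy analogue of the linear-system step, a 1-D Poisson system with a tridiagonal Laplacian can be assembled and solved as below. This is only an illustration under simplifying assumptions: the real method operates on mesh gradient fields, and A would be stored and solved as a sparse matrix rather than densely.

```python
import numpy as np

def solve_poisson_1d(b, u0, u1):
    """Solve A U = b for a 1-D Laplacian A with fixed boundary values u0, u1.
    Toy analogue of the mesh reconstruction step; a dense solve is used for
    clarity, whereas the real A is sparse."""
    n = len(b)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -2.0           # 1-D Laplacian stencil [1, -2, 1]
        if i > 0:
            A[i, i - 1] = 1.0
        if i < n - 1:
            A[i, i + 1] = 1.0
    rhs = np.asarray(b, dtype=float).copy()
    rhs[0] -= u0                 # move the known boundary values to the rhs
    rhs[-1] -= u1
    return np.linalg.solve(A, rhs)
```

With zero divergence, the solution linearly interpolates the boundary values, which is the expected harmonic behavior.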