3. Introduction
What is our goal?
Obtain a neutral face 3D model from an expressional face 3D model
How can we achieve this?
Learning how to infer the expression
Subtracting the expression
4/22/2012 Facial Expression Recognition/Removal 3
4. Motivations
3D Facial expression removal benefits…
Improving the performance of 3D face recognition
Improving 3D gender classification methods
Analyzing complex expressions
Face synthesis
5. Background
This is probably the first attempt at 3D expression removal…
Comparing it to 3D face synthesis as its opposite process
Interpolation-based
Muscle-based
Example-based
7. Alignment
We need to adapt the input to a generic 3D model
Why?
Input faces are irregular and posture-variant
They would be difficult to map
Input = A cloud of points
Generic model = Triangle mesh
How can we obtain a normalized mesh?
Fitting the cloud of points to a generic model
8. Alignment – 1st step
Landmark-constrained Rigid Adjustment
We adjust the posture of O towards G
Landmarks to constrain the fitting
Iterative Closest Point
Creating pairs between both sets
Notation: original model O with point set PO and landmark set LO; generic model G with point set PG and landmark set LG
For each point xi ∈ PO
If xi ∈ LO
Find its corresponding landmark yi ∈ LG
Else
Find its nearest point yi ∈ PG
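The pairing pass above can be sketched in Python. This is a minimal illustration, not the paper's code: the function name, the landmark dictionaries keyed by name, and the brute-force nearest search are all assumptions made for clarity.

```python
import numpy as np

def pair_points(P_O, P_G, L_O, L_G):
    """One ICP pairing pass (illustrative sketch; names are assumptions).

    P_O, P_G : (N, 3) / (M, 3) arrays of original / generic model points.
    L_O, L_G : dicts mapping a landmark name to its row index in P_O / P_G.
    Returns a list of (original index, generic index) pairs.
    """
    # map each landmark's row in P_O to its counterpart's row in P_G
    landmark_rows = {idx: L_G[name] for name, idx in L_O.items()}
    pairs = []
    for i, x in enumerate(P_O):
        if i in landmark_rows:                # landmark: use its counterpart
            pairs.append((i, landmark_rows[i]))
        else:                                 # otherwise: nearest generic point
            j = int(np.argmin(np.linalg.norm(P_G - x, axis=1)))
            pairs.append((i, j))
    return pairs
```

A real implementation would use a k-d tree for the nearest-point search instead of the O(N·M) scan shown here.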
9. Alignment – 2nd step
Energy-based Generic Model Adaptation
The generic mesh G is deformed to wrap O
It is an energy minimization problem
First, we have to explain these two energy measures
Eg = Geometric Error
Measures the quality of the wrapping
Es = Smoothness Error
Measures the smoothness of the process
10. Alignment – 2nd step
Geometric error is measured:
δ is the weight of landmarks
xi ∈ PO yi ∈ PG
ti denotes the offset between yi and its pair xi
It will be calculated by minimizing the total energy function
The equation has separate terms for the landmarks and for the rest of the points
11. Alignment – 2nd step
Smoothness error is measured:
N(i) is the 1-ring neighborhood of point i
ti and tj denote the offsets of points i and j
The equation has separate terms for the landmarks and for the rest of the points
12. Alignment – 2nd step
The energy function
λ (0 ≤ λ ≤ 1) is used as a tradeoff between the errors
Taking into account both λ and δ, they define:
A tradeoff between computation time and accuracy
13. Alignment – 2nd step
Algorithm:
For each point yi ∈ PG
If yi ∉ LG
Find its nearest point xi ∈ PO
Else
Choose its corresponding point xi ∈ LO
For each yi ∈ PG
Calculate its offset ti by minimizing the energy function: E(λ,δ)
Update the point: yi = yi + ti
Compute the total root mean squared distance εk between PO and PG
If εk < threshold
Start again, reducing the values of λ and δ
Else
Obtain the aligned 3D face: M = O
(M is the aligned 3D model)
14. Training – Building spaces
Normal Space
Properties of facial expressions
Expression Residue Space
Expression variations compared with their neutral faces
Each point in the spaces stores one face sample
15. Training – Building spaces
Normal space
T represents the triangle set of M
n = (nx,ny,nz) is a normal vector
nj is the normal of the jth triangle on M
C represents the normal space
It is composed of all the normal vectors on T
16. Training – Building spaces
Expression residue space
How is a facial expression understood?
As the difference between the expressional face and the neutral face:
Δ(Mexpressional, Mneutral)
This is stored as a combination of movements over each triangle on a neutral face model
How is each movement encoded?
5-tuple:
azimuth angle
elevation angle
x translation
y translation
z translation
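The two angular components of the 5-tuple can be illustrated with one common azimuth/elevation convention; this sketch is an assumption for illustration, not necessarily the paper's exact parameterization (the three translation components are simply (x, y, z)).

```python
import math

def encode_direction(v):
    """Azimuth and elevation of a 3-D direction (illustrative sketch of the
    two angular components of the 5-tuple; the convention is an assumption)."""
    x, y, z = v
    azimuth = math.atan2(y, x)                   # angle within the x-y plane
    elevation = math.atan2(z, math.hypot(x, y))  # angle above the x-y plane
    return azimuth, elevation
```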
17. Training – Relationship model
We want to be able to:
Infer the expression given an expressional face
In order to do that we need:
A Relationship Model that maps the normal space to the expression residue space.
This process is not trivial:
Dimension reduction of Normal Space
Inferring Expression Residue
18. Training – Relationship model
Dimension reduction of Normal Space
Normal Space contains redundant and noisy information
We will use Principal Component Analysis
ui represents the vector of the ith training sample
Cj represents the jth centralized geodesic coordinate
The N samples are stacked as the columns of the K×N matrix U = [u1 u2 … uN], each column holding the centralized geodesic coordinates C1, …, CK; from U the K×K covariance matrix S is computed
19. Training – Relationship model
Dimension reduction of Normal Space
Once we have the covariance matrix, we perform Singular Value Decomposition (SVD) to obtain:
Eigenvectors (v1, …, vN)
Eigenvalues (λ1, …, λN), sorted from highest to lowest
Selecting the most relevant eigenvectors
P is the set of eigenvectors selected (v1, …, vV)
ξ is a predefined threshold to avoid selecting too many eigenvectors
Finally, we get the reduced normal space
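The reduction described on the last two slides can be sketched as follows. The names (`reduce_normal_space`, `xi`) are illustrative, and an eigendecomposition of the symmetric covariance matrix is used in place of a full SVD, which yields the same eigenvectors and eigenvalues here.

```python
import numpy as np

def reduce_normal_space(U, xi=0.95):
    """PCA reduction of the normal space (illustrative sketch).

    U  : (K, N) matrix whose columns are the N training samples.
    xi : fraction of total variance to retain (the predefined threshold).
    Returns the projection matrix P (K, V) and the reduced samples (V, N).
    """
    mean = U.mean(axis=1, keepdims=True)
    Uc = U - mean                           # centralize each coordinate
    C = Uc @ Uc.T / U.shape[1]              # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # symmetric matrix: eigh suffices
    order = np.argsort(eigvals)[::-1]       # sort from highest to lowest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    V = int(np.searchsorted(cum, xi) + 1)   # smallest V whose variance >= xi
    P = eigvecs[:, :V]
    return P, P.T @ Uc
```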
20. Training – Relationship model
Inference of Expressional Residue
RBF stands for Radial Basis Function
A radial basis function depends only on the distance from a point to a center
21. Training – Relationship model
Inference of Expressional Residue
RBF Networks use radial basis functions as activation functions
22. Training – Relationship model
Inference of Expressional Residue
[Network diagram: inputs C1, …, Cn feed RBF nodes RBF(1), …, RBF(n); weighted sums sum(1), …, sum(k) produce outputs e1, …, ek]
Inputs: centralized geodesic coordinates of the reduced normal space.
uiP = (C1, C2, …, Cn)
The intermediate nodes compute an RBF that relates Ci to its neighborhood
Outputs: a value for each dimension of the expression space
The weight matrix will be computed by the least squares method
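The least-squares fit of the network weights can be sketched as below. This is an illustrative sketch, not the paper's implementation: the Gaussian basis, the `sigma` width, and the function names are assumptions.

```python
import numpy as np

def train_rbf(X, Y, centers, sigma=0.3):
    """Fit RBF-network weights by least squares (illustrative sketch;
    the Gaussian basis and sigma are assumptions).

    X       : (N, d) inputs (reduced normal-space coordinates).
    Y       : (N, k) targets (expression-residue dimensions).
    centers : (n, d) RBF centers, e.g. the training inputs themselves.
    """
    def phi(Z):
        # distance of every input to every center -> Gaussian activation
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        return np.exp(-(d ** 2) / (2 * sigma ** 2))
    W, *_ = np.linalg.lstsq(phi(X), Y, rcond=None)  # least-squares weights
    return W, phi

def predict_rbf(Z, W, phi):
    """Network output: weighted sum of the RBF activations."""
    return phi(Z) @ W
```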
23. Testing
Given an expressional face…
Infer the expression residue
Subtract the expression residue
Reconstruct the face
Obtaining the neutral face
Mathematical expression:
Mneu = Mexp − Δ(Mexp, Mneu)
(M is the aligned 3D model)
24. Testing - Inferring
Let Cexp be the normal representation of Mexp
Let Φ(Cexp) be the output of the RBF network for the new input Cexp
Φ(Cexp) is the inference of Δ(Mexp ,Mneu)
Δ(Mexp ,Mneu) ≅ Φ(Cexp)
Final mathematical expression
Mneu = Mexp – Φ(Cexp)
25. Testing - Reconstruction
Having inferred the expression residue:
We have a set of movements for each triangle on Mexp
Applying them causes the mesh to be deformed
Poisson-based reconstruction
26. Experiments
BU-3DFED (Binghamton University 3D Facial Expression
Database)
44 males, 56 females
Each made 6 different expressions and 1 neutral face
Each expression had 4 levels of intensity
Total number of face models = 700
27. Experiments
The RMS (root mean square) distance is used to measure the difference between the two neutral face models
Xi is a point on X and Yi is the point on Y nearest to Xi
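The evaluation measure can be sketched directly from the definition above; the brute-force nearest-point search here is for clarity only, and the function name is an assumption.

```python
import numpy as np

def rms_distance(X, Y):
    """RMS of nearest-point distances from X to Y (sketch of the measure)."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    nearest = d.min(axis=1)  # for each Xi, distance to the nearest Yi
    return float(np.sqrt(np.mean(nearest ** 2)))
```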
34. Introduction
Expressions are dynamic
It is easier to recognize them from video than from static images
35. Haar-like features
Our “experts” from face detection
Binary patterns that are convolved with the images, producing a single-value result
Each frame has many important Haar-like features
36. Clustering Temporal Patterns
5 stages of an expression will be considered
A clustering method will be used to classify the Haar features into the 5 stages
37. Clustering Temporal Patterns
K-Means
N → number of clusters
N random vectors will be initialized, representing the centers of the clusters
For each point in the database:
Which is the closest vector to me?
That's the cluster I belong to!
Recalculate the cluster descriptor vectors: they must represent the mass-center of the points in the cluster
Repeat until there are no more changes
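The steps above can be sketched as plain K-means; the initialization scheme and names are illustrative assumptions.

```python
import numpy as np

def kmeans(points, n_clusters, max_iter=100, seed=0):
    """Plain K-means, following the steps above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # initialize centers as random points drawn from the data
    centers = points[rng.choice(len(points), n_clusters, replace=False)].astype(float)
    for _ in range(max_iter):
        # assignment: each point joins the cluster of its closest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update: each center moves to the mass-center of its cluster
        new_centers = np.array([points[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):  # no more changes: stop
            break
        centers = new_centers
    return labels, centers
```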
39. Building our Experts
For representation purposes, a five-dimensional vector is used for each Haar feature
[0 0 0 1 0] → the Haar feature belongs to the fourth stage (middle+)
40. Building our Experts
A normalized histogram is calculated, considering all the features in the sequence
Example for 7 features: [0 0 1/7 2/7 4/7]
41. Building our Experts
We will convert the histogram to a decimal value using binary weights:
[0/7 0/7 1/7 2/7 4/7] · [1 2 4 8 16] = 0 + 0 + 4/7 + 16/7 + 64/7 = 84/7 = 12
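The summarization step reduces to a dot product with the binary weights, as in the example above (the function name is illustrative):

```python
def summarize(stage_histogram):
    """Dot the normalized 5-bin stage histogram with binary weights
    [1, 2, 4, 8, 16] to get a single decimal value."""
    weights = [1, 2, 4, 8, 16]
    return sum(h * w for h, w in zip(stage_histogram, weights))
```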
42. Building our Experts
A one-against-all approach is used
“Is it a happy expression or not?”
Other moods will work as negative examples
43. Building our Experts
After repeating the clustering and summarizing process for all examples in the database, we can produce a YES/NO histogram for each expression
A threshold will define whether a face represents that expression or not
44. Building our Experts
That is one weak classifier
The final strong classifier is built by AdaBoost
45. Testing
For a new sequence:
Calculate the haar-features
Cluster into stages
Summarize (output a decimal)
Compare this value with the threshold of each expression
46. Experiments
Cohn-Kanade faces database
100 students, aged 18 to 30
65% women, 35% men
15% African-American, 5% Asian or Latino
Each performed 23 poses, including prototypical expressions
In this work, they used 90 of those expressions (60 for training, 30 for testing)
Experiments were made with sequences of 7 and 9 frames
51. Poisson-based reconstruction
We paste all the triangles together by solving:
AU = b
where:
U contains the coordinates of the deformed mesh
b is the divergence of the modified gradient fields
A is a sparse matrix defined as:
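As a toy analogue of the linear-system step, a 1-D Poisson system with a tridiagonal Laplacian can be assembled and solved as below. This is only an illustration under simplifying assumptions: the real method operates on mesh gradient fields, and A would be stored and solved as a sparse matrix rather than densely.

```python
import numpy as np

def solve_poisson_1d(b, u0, u1):
    """Solve A U = b for a 1-D Laplacian A with fixed boundary values u0, u1.
    Toy analogue of the mesh reconstruction step; a dense solve is used for
    clarity, whereas the real A is sparse."""
    n = len(b)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -2.0           # 1-D Laplacian stencil [1, -2, 1]
        if i > 0:
            A[i, i - 1] = 1.0
        if i < n - 1:
            A[i, i + 1] = 1.0
    rhs = np.asarray(b, dtype=float).copy()
    rhs[0] -= u0                 # move the known boundary values to the rhs
    rhs[-1] -= u1
    return np.linalg.solve(A, rhs)
```

With zero divergence, the solution linearly interpolates the boundary values, which is the expected harmonic behavior.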