Learning a Multi-Center
Convolutional Network for
Unconstrained Face Alignment
Zhiwen Shao, Hengliang Zhu, Yangyang Hao,
Min Wang, and Lizhuang Ma
Shanghai Jiao Tong University
Introduction
Face Alignment
Detecting facial landmarks like pupil
centers, nose tip, mouth corners
Challenges
Unconstrained scenarios including severe
occlusions and large face variations
Motivation
 A nonlinear regression problem, which transforms
appearance to shape
 Methods based on low-level handcrafted features have a
limited capacity to represent highly complex faces
Deep convolutional network: a strong representation ability to model this highly nonlinear function
Previous Deep Learning Methods
Multiple networks based
Cascaded CNN [1], Zhou et al. [2], CFAN [3], and CDAN [4]
employ cascaded deep networks to refine predicted shapes
time-consuming training processes
high model complexity
[1] Y. Sun, X. Wang, and X. Tang, “Deep convolutional network cascade for facial point detection,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2013, pp. 3476–3483.
[2] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin, “Extensive facial landmark localization with coarse-to-fine convolutional network cascade,” in IEEE International Conference on Computer Vision Workshops. IEEE, 2013, pp. 386–391.
[3] J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment,” in European Conference on Computer Vision. Springer, 2014, pp. 1–16.
[4] R. Weng, J. Lu, Y.-P. Tan, and J. Zhou, “Learning cascaded deep auto-encoder networks for face alignment,” IEEE Transactions on Multimedia, vol. 18, no. 10, pp. 2066–2078, 2016.
Previous Deep Learning Methods
Single network based
TCDCN [5] needs extra labels of facial attributes for
training samples, which limits the universality of this method
In contrast, our method uses one single network without auxiliary information
[5] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Learning deep representation for face alignment with auxiliary attributes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 5, pp. 918–930, 2016.
Methodology
Structural Correlations
Example faces: chin is occluded; right contour is invisible
 Unconstrained faces with partial occlusion and large pose
Landmarks in the same local region have similar properties
including occlusion and visibility
Face Partition
Partition of facial landmarks for different labeling patterns (29 landmarks and 68 landmarks)
Left eye, right eye, nose, mouth, left contour, chin, and
right contour
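As a concrete illustration of this partition, the sketch below assigns the 68 landmarks of the iBUG markup to the seven clusters named above. The index ranges are an assumption for illustration only, not necessarily the exact partition used in the paper.

```python
# Illustrative partition of the 68-landmark iBUG markup into the seven clusters
# named above. The index assignment below is assumed for demonstration;
# the paper's partition may differ in detail.
FACE_CLUSTERS_68 = {
    "left_contour":  list(range(0, 6)),    # jaw line, left side
    "chin":          list(range(6, 11)),   # jaw line, center
    "right_contour": list(range(11, 17)),  # jaw line, right side
    "left_eye":      list(range(17, 22)) + list(range(36, 42)),  # brow + eye
    "right_eye":     list(range(22, 27)) + list(range(42, 48)),
    "nose":          list(range(27, 36)),
    "mouth":         list(range(48, 68)),
}

# Sanity check: every landmark belongs to exactly one cluster.
assert sorted(i for ids in FACE_CLUSTERS_68.values() for i in ids) == list(range(68))
```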
Network Architecture
 Shared layers
 Multiple center-specific shape prediction layers
Network Architecture
 Shared layers
• Eight convolutional layers and one fully-connected layer
• Each max-pooling layer follows a stack of two convolutional layers
Network Architecture
 Multiple center-specific shape prediction layers
• Each cluster of facial landmarks is treated as a separate center
• Each layer estimates x and y coordinates of all n facial landmarks
• Focusing on the shape estimation of a specific face region
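A rough PyTorch sketch of this layout is given below: shared layers (eight convolutional layers, a max-pooling layer after every stack of two convolutions, and one fully-connected layer of D = 1024 units) followed by several center-specific shape prediction layers, each outputting all 2n coordinates. The channel widths, the grayscale 64x64 input, and the class name are assumptions; only the overall structure follows the slides.

```python
import torch
import torch.nn as nn


class MCNetSketch(nn.Module):
    """Sketch of the architecture above: shared layers followed by m
    center-specific shape prediction layers, each predicting 2n coordinates.
    Channel widths and the grayscale 64x64 input are assumed values."""

    def __init__(self, n_landmarks: int = 68, n_centers: int = 7):
        super().__init__()
        chans = [1, 32, 32, 64, 64, 128, 128, 256, 256]  # assumed channel widths
        layers = []
        for i in range(8):
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                       nn.ReLU(inplace=True)]
            if i % 2 == 1:                 # max-pool after each pair of conv layers
                layers.append(nn.MaxPool2d(2))
        self.shared_convs = nn.Sequential(*layers)
        self.shared_fc = nn.Sequential(nn.Flatten(),
                                       nn.Linear(256 * 4 * 4, 1024),
                                       nn.ReLU(inplace=True))
        # One shape prediction branch per center; every branch outputs 2n values.
        self.center_heads = nn.ModuleList(
            [nn.Linear(1024, 2 * n_landmarks) for _ in range(n_centers)])

    def forward(self, x: torch.Tensor, center: int = 0) -> torch.Tensor:
        feat = self.shared_fc(self.shared_convs(x))   # high-level representation
        return self.center_heads[center](feat)        # shape from the chosen branch
```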
Loss Function
 Weighted inter-ocular distance normalized Euclidean loss
$E = \sum_{j=1}^{n} w_j \left[ (\hat{f}_{2j-1} - f_{2j-1})^2 + (\hat{f}_{2j} - f_{2j})^2 \right] / (2 d^2)$
$w_j$: weight of the j-th landmark
$f$: ground truth coordinates
$\hat{f}$: predicted coordinates
$d$: ground truth inter-ocular distance
The first center-specific layer: larger weights for landmarks around the left eye
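A minimal NumPy sketch of this loss, assuming coordinates are stored interleaved as (x_1, y_1, ..., x_n, y_n) so that positions 2j-1 and 2j hold the x and y coordinates of the j-th landmark:

```python
import numpy as np


def weighted_normalized_loss(pred, gt, weights, d):
    """Weighted inter-ocular distance normalized Euclidean loss E.

    pred, gt : shape (2n,), interleaved coordinates, i.e. the predicted
               coordinates \\hat{f} and the ground truth f of the formula above.
    weights  : shape (n,), the per-landmark weights w_j.
    d        : ground-truth inter-ocular distance."""
    diff = (np.asarray(pred) - np.asarray(gt)).reshape(-1, 2)
    sq = (diff ** 2).sum(axis=1)              # (x error)^2 + (y error)^2 per landmark
    return float((np.asarray(weights) * sq).sum() / (2.0 * d ** 2))
```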
Network Architecture
Multi-Center Learning
Basic Model
Network Architecture
Multi-Center Learning
Reinforcement
for Each Center
Network Architecture
Multi-Center Learning
Combined Model
Weight Computation
During the i-th fine-tuning step
 Multiple relationship
$w_{P_i^c} = \eta \, w_{P_i^m}$
$P_i^c$: set of center-specific landmarks
$P_i^m$: set of remaining minor landmarks
$\eta$: amplification factor
Different fine-tuning steps have different center-specific and minor facial landmarks
 Consistent with the basic model
$w_{P_i^c} \, |P_i^c| + w_{P_i^m} \, (n - |P_i^c|) = n$
$|\cdot|$: number of elements in a set
Weight Computation
During the i-th fine-tuning step, solving the two equations gives
$w_{P_i^c} = \eta n / [(\eta - 1)\,|P_i^c| + n]$
$w_{P_i^m} = n / [(\eta - 1)\,|P_i^c| + n]$
Other centers keep relatively small weights rather than zero, which utilizes implicit structural correlations among different facial parts
Landmarks from the same cluster have similar properties and share an identical weight
This helps search the solution smoothly
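A small sketch of this weight computation; the amplification factor eta = 4 used as the default below is only a placeholder, not the value from the paper:

```python
import numpy as np


def center_weights(center_idx, n, eta=4.0):
    """Per-landmark weights for the i-th fine-tuning step, following the two
    equations above. center_idx holds the indices of the center-specific
    landmarks P_i^c; the remaining landmarks form the minor set P_i^m.
    eta = 4 is only a placeholder for the amplification factor."""
    p = len(center_idx)                        # |P_i^c|
    w_c = eta * n / ((eta - 1.0) * p + n)      # weight of each center-specific landmark
    w_m = n / ((eta - 1.0) * p + n)            # weight of each minor landmark
    w = np.full(n, w_m)
    w[list(center_idx)] = w_c
    assert np.isclose(w.sum(), n)              # consistent with the basic model
    return w
```

Passing the index set of one cluster yields the weight vector used when fine-tuning that cluster's branch, with the center-specific landmarks weighted eta times more heavily than the minor ones.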
Combined Model
High-level representation: $\mathbf{x} = (x_0, x_1, \ldots, x_D)^T \in \mathbb{R}^{(D+1) \times 1}$, with $D = 1024$
Weight matrix: $\mathbf{W} = (\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{2n}) \in \mathbb{R}^{(D+1) \times 2n}$, where $\mathbf{w}_k = (w_{0k}, w_{1k}, \ldots, w_{Dk})^T$, $k = 1, \ldots, 2n$
Shape prediction: $\hat{f}_{2j-1} = \mathbf{w}_{2j-1}^T \mathbf{x}$, $\hat{f}_{2j} = \mathbf{w}_{2j}^T \mathbf{x}$
$\mathbf{W}^i$: weight matrix of the i-th center-specific layer
Combining the center-specific layers: $\mathbf{w}_{2j-1}^{combined} = \mathbf{w}_{2j-1}^i$, $\mathbf{w}_{2j}^{combined} = \mathbf{w}_{2j}^i$, for $i = 1, \ldots, m$, $j \in P_i^c$
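A sketch of how the integrated shape prediction layer could be assembled from the center-specific weight matrices, assuming each W^i is stored as a (D+1) x 2n array and each cluster lists 0-based landmark indices:

```python
import numpy as np


def combine_weight_matrices(center_W, clusters):
    """Assemble W^combined from the center-specific weight matrices.

    center_W : list of m arrays of shape (D+1, 2n), the matrices W^i.
    clusters : list of m index sets, clusters[i] = P_i^c (0-based landmark indices).
    Columns 2j-1 and 2j of the formula (0-based: 2j and 2j+1) are copied from
    the branch whose cluster contains landmark j."""
    W_combined = np.zeros_like(center_W[0])
    for W_i, cluster in zip(center_W, clusters):
        for j in cluster:
            W_combined[:, 2 * j] = W_i[:, 2 * j]          # x coordinate of landmark j
            W_combined[:, 2 * j + 1] = W_i[:, 2 * j + 1]  # y coordinate of landmark j
    return W_combined
```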
Combined Model
Combined model: $\Theta^S \cup \mathbf{W}^{combined}$, i.e. the shared layers plus the integrated shape prediction layer
Its model complexity is the same as the basic model's
It improves the localization performance by exploiting the advantage of each center-specific solution
Our multi-center learning algorithm takes full advantage of each stage and searches the optimal solution smoothly
Experiments
Datasets
COFW
occluded dataset in the wild
1345 training images
507 testing images
IBUG
large appearance variations
3148 training images
135 testing images
Evaluation Metric
 inter-ocular distance normalized mean error
 cumulative error distribution (CED) curves
 failure rate
failure: mean error larger than 10%
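A sketch of these metrics, assuming predictions and ground truths are given as (num_images, n, 2) arrays:

```python
import numpy as np


def mean_error_and_failure(preds, gts, inter_ocular, threshold=0.10):
    """preds, gts   : arrays of shape (num_images, n, 2).
    inter_ocular : shape (num_images,), ground-truth inter-ocular distances.
    Returns the inter-ocular distance normalized mean error and the failure
    rate, where an image counts as a failure if its error exceeds 10%."""
    per_landmark = np.linalg.norm(np.asarray(preds) - np.asarray(gts), axis=2)
    per_image = per_landmark.mean(axis=1) / np.asarray(inter_ocular)
    return float(per_image.mean()), float((per_image > threshold).mean())
```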
Validation of Multi-Center Learning Algorithm
Mean Error (%) and Failure Rate (%)
Method      COFW Mean   COFW Failure   IBUG Mean   IBUG Failure
Basic         6.26        3.16           9.23        33.33
Combined      6.08        2.96           8.87        25.93
The good performance of the basic model shows the effectiveness of our network
Reinforcing the learning for each local face region improves the accuracy and robustness
Validation of Multi-Center Learning Algorithm
Mean error for different clusters on COFW
Comparison with Other Methods
Mean Error (%)
Method COFW IBUG
ESR 11.2 17.00
SDM 11.14 15.40
RCPR 8.5 17.26
CFAN - 16.78
LBF - 11.98
cGPRT - 11.03
CFSS - 9.98
TCDCN 8.05 8.60
CFT 6.33 10.06
Wu et al. 5.93 -
MCNet 6.08 8.87
Comparison with Other Methods
Comparison with Other Methods
COFW
Comparison with Other Methods
IBUG
Comparison with Other Methods
Deep model Speed (FPS) CPU
Cascaded CNN 5 single core, i5-6200U 2.3GHz
CFAN* 43 i7-3770 3.4GHz
CDAN* 50 i5 3.2GHz
TCDCN 50 single core, i5-6200U 2.3GHz
CFT 31 single core, i5-6200U 2.3GHz
MCNet 67 single core, i5-6200U 2.3GHz
Time of face detection is excluded
* Speeds of CFAN and CDAN are their published results (their code is not available)
Conclusions
 We propose a novel multi-center convolutional network, which
exploits the representation power of each center
 We propose the reinforcement for each center to improve the
shape estimation precision of each facial part
 Comprehensive experiments demonstrate that our method
achieves real-time and competitive performance compared to
other state-of-the-art techniques
Code
 Matlab
https://github.com/ZhiwenShao/MCNet
 C++
https://github.com/ZhiwenShao/MCNet-Extension
Thank you!
Editor's Notes
  1. Good morning, everyone. I am Zhiwen Shao. I come from Shanghai Jiao Tong University. In our paper, we propose a Multi-Center Convolutional Network to achieve face alignment.
  2. I first show the background of face alignment
  3. These images illustrate the results of face alignment.
  4. We can observe that these face images are very challenging. They have severe occlusions and large variations of pose, expression, illumination. Our goal is to develop an efficient method to handle unconstrained faces
  5. Face alignment can be regarded as a nonlinear regression problem, which transforms appearance to shape. Most conventional methods are based on low-level handcrafted features, so they have a limited capacity to represent complex faces. As we all know, a deep convolutional network has an outstanding representation ability. Therefore we use it to model the highly nonlinear function.
  6. There are two types of deep learning methods. The first is multiple networks based. These methods employ cascaded deep networks to refine predicted shapes successively. Their training processes are complicated and time-consuming. And they have high computational cost and model complexity due to the use of multiple networks
  7. A very typical method is TCDCN. It trains only one deep network, but it needs extra labels of facial attributes for training samples. This limits the universality of this method. In contrast, our method uses one single network without auxiliary information
  8. Next, I introduce our method in detail.
  9. Partial occlusion and large pose are main characteristics of unconstrained faces. We discover that each facial landmark is not isolated but highly correlated with adjacent landmarks. There are two examples. In the left figure, facial landmarks along the chin are all occluded. And the right figure shows that landmarks on the right side of the face are almost invisible. Therefore, landmarks in the same local face region have similar properties including occlusion and visibility.
  10. We analyze the structure of a face, and partition it into seven clusters: left eye, right eye, nose, mouth, left contour, chin, and right contour. As shown in these two figures, different labeling patterns of 29 and 68 facial landmarks are partitioned into 5 and 7 clusters respectively. Each cluster contains structurally relevant facial landmarks.
  11. This is the structure of our multi-center convolutional network. Our network consists of shared layers and multiple center-specific shape prediction layers.
  12. The shared layers contain eight convolutional layers and one fully-connected layer. Each max-pooling layer follows a stack of two convolutional layers. This stacking of convolutional layers is excellent at feature learning, as proposed by VGGNet.
  13. According to the evaluation metric, we use weighted inter-ocular distance normalized Euclidean loss
  14. We first pre-train a basic model with shared layers and one shape prediction layer.
  15. Corresponding to Step 1
  16. We further fine-tune each center-specific layer respectively
  17. Corresponding to Step 2 to Step 6 Based on the pre-trained model, our network keeps shared layers and initializes each center-specific layer with the shape prediction parameters. There are m branches of center-specific layers at the end of our network. The fine-tuning of center-specific layers is mutually independent.
  18. Shared layers and integrated shape prediction layer constitute the combined model
  19. Corresponding to Step 7 We obtain the integrated shape prediction layer by combining corresponding parameters from each center-specific layer.
  20. We assume there is a multiple relationship between the two weights. To be consistent with the basic model, we keep the weights conforming to this formula, so the summation of the weights is ensured to equal n.
  21. By solving the two equations, we obtain the respective weights. When emphasizing the detection of the current center, we still consider other centers with relatively small weights rather than zero. This is beneficial for utilizing implicit structural correlations among different facial parts and searching the solution smoothly.
  22. Then I show the experiments
  23. Euclidean distance between two pupil centers
  24. We show the mean error of each cluster for basic model and combined model on COFW dataset It can be observed that the combined model improves the detection performance of each cluster
  25. We report the results of our method MCNet and previous works. We can see that our method outperforms most state-of-the-art methods. It is worth noting that TCDCN obtains better performance than our method on IBUG, partly owing to their larger training data. Although occlusions are not detected explicitly, we achieve an outstanding performance on par with Wu et al. on the COFW benchmark.
  26. We plot the CED curves for our method and several state-of-the-art methods. It is observed that our method achieves competitive performance on both benchmarks, and better performance at higher normalized mean error levels. Therefore, our method is strongly robust to unconstrained environments.
  27. These are several example images from COFW. We can see that our method achieves higher accuracy than RCPR and CFT in the details. Benefiting from utilizing structural correlations among different facial parts, our method is robust to severe occlusions.
  28. We also show example images from IBUG, where our method MCNet outperforms LBF and CFSS. Our method achieves higher accuracy in the details. Therefore, our method demonstrates a superior capability of handling severe occlusions and complex variations of pose, expression, and illumination.
  29. To obtain a more comprehensive comparison, we present the average running speed of different deep learning methods for face alignment. We evaluate these methods on a single-core i5-6200U 2.3GHz CPU with 1000 face images. Since CFAN and CDAN do not share their code, we use their published speed results. Both TCDCN and our method MCNet are based on only one network, so they run relatively fast. Cascaded CNN, CFAN, and CDAN employ multiple networks, so they cost more running time. Our method takes only 15 ms on average to process one face, profiting from the low model complexity and computational cost of our network. We believe that our method can be extended to real-time facial landmark tracking in unconstrained scenarios.
  30. Finally, I conclude our paper