Gestures and Lip Shape Integration for Cued Speech Recognition

Seminar By: Mohammed Musfir, ECE-B, 08104131
Seminar Coordinator: Mr. Rino P. C., Assistant Professor, ECE
Seminar Guide: Mr. Edet Bijoy K., Assistant Professor, ECE
02/12/2011   2


Overview of Presentation

    Objective
    Introduction
    ASR Techniques
    Lip Reading – AVSR
    Cued Speech
    Integrated Recognition
    Conclusion


Objective

    Developments in ASR techniques
    AVSR as an accessibility solution
        Lip detection
        Cued Speech detection
        Integration of both




INTRODUCTION


Briefing ASR

    First successful system in 1970
    Consists of two systems
        ASR (Automatic Speech Recognition) – transcribes speech
        SU (Speech Understanding) – understands the transcription
    Knowledge intensive




ASR TECHNIQUES


ASR Industry

    Industry pioneers – Nuance, NTT Labs, AT&T Labs
    MIT and GPL – VoxForge, Gvoice
    Desktop dictation – 1990
    Types of ASR
        DVI – word or phrase spotting
        LVCSR – several thousand words


Techniques

    Speech is treated as a sequence of sounds
    ASR involves
        Acquisition – recording
        Feature extraction – spectral analysis
        Pattern matching and decoding
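The feature-extraction step listed above can be illustrated with a minimal sketch. It assumes a plain short-time power spectrum (framing, Hamming window, naive DFT); the slides do not say which spectral analysis the presented system uses, and the function names, frame length and hop size are hypothetical choices for illustration only.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split recorded samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n, used to reduce spectral leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Windowed power spectrum of one frame (bins 0..N/2) via a naive DFT."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def dominant_bin(frame):
    """Index of the strongest frequency bin in a frame."""
    spec = power_spectrum(frame)
    return max(range(len(spec)), key=spec.__getitem__)

# Hypothetical input: a 1 kHz tone sampled at 8 kHz stands in for a recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
frames = frame_signal(tone, frame_len=64, hop=32)
peak_bins = [dominant_bin(f) for f in frames]
```

With a 64-sample frame at 8 kHz each bin is 125 Hz wide, so the 1 kHz tone falls in bin 8 of every frame. A practical front end would use an FFT and go on to derive cepstral features, which this sketch leaves out.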




[Figure: Techniques]


Approaches

    Template based
    Knowledge based
    Statistical
    Learning based
    Artificial intelligence
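As an illustration of the template-based approach, the sketch below matches an utterance against stored templates with dynamic time warping (DTW), a classical technique for this family. It is an assumption that DTW is what the slide has in mind; the one-dimensional feature sequences and the word labels are made up for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognise(utterance, templates):
    """Return the label of the template closest to the utterance."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical templates: one reference feature track per word.
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 2, 4]}
slow_yes = [1, 1, 3, 3, 4, 4, 3, 1]   # same shape as "yes", spoken more slowly
best_label = recognise(slow_yes, templates)
```

Because DTW lets one sequence stretch against the other, the slower utterance still aligns with the "yes" template at zero cost.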




LIP READING

Lip Reading – AVSR

    [Figure: Front-end lip detection]


Localisation and Tracking

    ROI determination – Sobel edge filtering
        Kalman filter – tracking
    Principal Component Analysis – feature coefficients
    Audio feature – MFCC
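Two of the steps named above, Sobel edge filtering and Kalman tracking, can be sketched in a few lines. This is a simplified illustration, not the presented system: the Kalman filter here is a scalar random-walk model for one coordinate of the ROI centre, whereas the slides do not specify the motion model, and the toy image and measurements are invented. PCA and MFCC extraction are not covered.

```python
import math

def sobel_magnitude(img):
    """Sobel gradient magnitude of a grayscale image given as a list of rows.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out

class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""

    def __init__(self, x0, p0=1.0, q=1e-2, r=1.0):
        self.x = x0   # state estimate, e.g. x-coordinate of the ROI centre
        self.p = p0   # estimate variance
        self.q = q    # process noise variance
        self.r = r    # measurement noise variance

    def update(self, z):
        # Predict: the state is assumed unchanged, its uncertainty grows.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= (1.0 - gain)
        return self.x

# Toy frame: dark left half, bright right half, giving one vertical edge.
frame = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(frame)

# Smooth noisy per-frame measurements of the ROI centre's x-coordinate.
tracker = ScalarKalman(x0=40.0, p0=100.0)
for measured_x in [52, 48, 51, 49, 50, 50]:
    smoothed_x = tracker.update(measured_x)
```

The edge response is non-zero only in the two columns next to the brightness step, and the tracker settles close to 50 despite the poor initial guess of 40.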




CUED SPEECH




[Figure: Overview of Cued Speech]




INTEGRATION


Steps

    Lip feature extraction
    Audio synchronization with the image
    Multistream HMM fusion – state-synchronous decision
    Automatic image processing to record the cues
    Lip width, aperture, area, upper pinch and lower pinch
    Modeling – 8 lip parameters and 10 hand parameters
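State-synchronous multistream fusion is commonly implemented by scoring each stream separately within the same HMM state and combining the log-likelihoods with stream weights. The sketch below assumes single diagonal-covariance Gaussians per stream and weights that sum to one; the state layout, weight value and observations are hypothetical, and the slides do not give the actual model details.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at observation x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def multistream_log_likelihood(lip_obs, hand_obs, state, lip_weight=0.5):
    """Weighted sum of lip-stream and hand-stream log-likelihoods for one state.

    Both streams are evaluated in the same state at the same frame, which is
    what makes the decision state-synchronous.
    """
    hand_weight = 1.0 - lip_weight   # assumption: weights sum to one
    lip_ll = gaussian_log_pdf(lip_obs, *state["lip"])
    hand_ll = gaussian_log_pdf(hand_obs, *state["hand"])
    return lip_weight * lip_ll + hand_weight * hand_ll

# Hypothetical two-state example; each stream stores (mean, variance).
state_a = {"lip": ([0.0, 0.0], [1.0, 1.0]), "hand": ([1.0], [1.0])}
state_b = {"lip": ([3.0, 3.0], [1.0, 1.0]), "hand": ([-2.0], [1.0])}

lip_obs, hand_obs = [0.1, -0.1], [0.9]
score_a = multistream_log_likelihood(lip_obs, hand_obs, state_a)
score_b = multistream_log_likelihood(lip_obs, hand_obs, state_b)
```

The observation lies near the means of `state_a` in both streams, so `score_a` exceeds `score_b`. Setting `lip_weight` to 1.0 reduces the score to the lip stream alone.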


Fusion

    Feature fusion – concatenation

        $O_t^{LH} = [\, O_t^{(L)\top},\ O_t^{(H)\top} \,]^{\top} \in \mathbb{R}^{D}$

        $O_t^{LH}$ – lip-hand feature vector
        $O_t^{(L)}$ – lip shape feature vector
        $O_t^{(H)}$ – hand feature vector
        $D$ – dimensionality
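Feature fusion by concatenation amounts to joining the per-frame lip and hand vectors end to end. The sketch below assumes the 8 lip and 10 hand parameters mentioned on the Steps slide are used directly, which would give D = 18; the parameter values are placeholders and the function name is hypothetical.

```python
def fuse_features(lip_features, hand_features):
    """Concatenate per-frame lip and hand features into one joint vector."""
    return list(lip_features) + list(hand_features)

lip = [0.1 * i for i in range(8)]      # 8 placeholder lip parameters
hand = [1.0 + i for i in range(10)]    # 10 placeholder hand parameters
joint = fuse_features(lip, hand)
D = len(joint)                         # dimensionality of the joint vector
```

The joint vector keeps the lip parameters first and the hand parameters second, so a single-stream HMM can be trained on it without any change to the model structure.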


Conclusion

    Cued Speech recognition – 80% accuracy
    Outperforms ASR in a normal environment
    Visual mode – education of the hearing impaired
    Phoneme recognition successful
    A possible alternative product to Siri
