IEEE EUROCON 2019
1 – 4 July 2019 Novi Sad, Serbia
Isolated Sign Recognition with a Siamese
Neural Network of RGB and Depth Streams
Anil Osman TUR (aotur@ankara.edu.tr), Hacer YALIM KELES (hkeles@ankara.edu.tr)
Ankara University Computer Engineering Department
Paper №: 02728
18th IEEE International Conference on Smart Technologies - EUROCON 2019,
1–4 JULY 2019, NOVI SAD, SERBIA
Motivation
To solve communication problems between the deaf and the
hearing communities.
To provide a human-machine interface for controlling machines
with human gestures in other applications.
Problem & Challenges
Recognizing isolated signs, i.e. signs performed independently of each other.
Each sign is a composition of hand, face and body features.
High variance of the signs among different signers, e.g. body and
pose variations, variation in sign duration, etc.
Difficulties across the input modalities, e.g. illumination
changes and occlusion problems.
Solution
1. To represent the inputs in a more effective feature space, we
employed pretrained Convolutional Neural Networks (CNNs).
2. To classify the sequences of feature vectors generated by the CNNs,
we used Recurrent Neural Networks (RNNs), specifically Long
Short-Term Memory (LSTM) [4] and Gated Recurrent Unit (GRU) [5] models.
3. To generalize over the inputs and be robust to changes and variations
in training, e.g. lighting and signer, we used regularization methods.
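
As a rough illustration of how a recurrent unit folds a sequence of per-frame features into a single state, the sketch below implements one GRU update on scalar values. The parameter names (`wz`, `uz`, ...), the scalar simplification and the update convention `h = (1 - z) * h_prev + z * h_cand` are assumptions made for illustration; the slides do not give these details.

```python
import math


def sigmoid(x):
    """Logistic function used by the GRU gates."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update for a scalar input x and scalar hidden state h_prev.

    `p` is a dict of hypothetical scalar parameters (assumed names).
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend the previous state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand


def encode_sequence(xs, p, h0=0.0):
    """Run the GRU over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h
```

In the architecture described in these slides the inputs would be feature vectors rather than scalars, and the final state would feed a classifier; the scalar form only shows the gating logic.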
Dataset
The Montalbano Gesture Dataset [1] is used in the experiments.
Video samples are 640x480 pixels, recorded at 20 fps.
20 different Italian hand gestures from 27 different users.
The dataset includes clothing, lighting and background changes.
Figure: sample RGB, Depth, User and Skeletal streams.
Preprocess
RGB and depth inputs are cropped to 400 by 400 square images.
A median filter is applied to both inputs.
User index data is used as a mask on the depth input for
background subtraction.
The number of frames is fixed to 40.
Figure: cropping of the RGB image and the depth image.
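
The preprocessing steps above can be sketched in plain Python on frames stored as nested lists. Several details are assumptions, since the slides do not state them: the crop is taken around the image center, the median filter uses a 3x3 window, background pixels carry a user index of 0, and the 40 frames are picked at evenly spaced positions. All function names are hypothetical.

```python
from statistics import median_low

TARGET_SIZE = 400    # crop size stated on the slide
TARGET_FRAMES = 40   # fixed sequence length stated on the slide


def center_crop(frame, size=TARGET_SIZE):
    """Crop a 2-D frame (list of rows) to a size x size square around its center."""
    height, width = len(frame), len(frame[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def median_filter3(frame):
    """3x3 median filter (assumed window); borders use the neighbours that exist."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [frame[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            row.append(median_low(window))
        out.append(row)
    return out


def mask_depth(depth, user_index):
    """Zero out depth pixels that the user-index map marks as background (0)."""
    return [[d if u else 0 for d, u in zip(d_row, u_row)]
            for d_row, u_row in zip(depth, user_index)]


def fix_length(frames, target=TARGET_FRAMES):
    """Pick `target` frames at evenly spaced positions; repeats frames if too short."""
    n = len(frames)
    return [frames[min(n - 1, (i * n) // target)] for i in range(target)]
```

A real pipeline would use an array library for speed; the nested-list form is only meant to make each step explicit.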
Model Architecture
The convolutional parts of the pretrained ResNet-50 [2] and VGG16 [3]
models are used.
Global max pooling or global average pooling layers are applied to the
outputs of the pretrained networks.
The pooling layer outputs are connected to fully connected (FC) layers.
We experimented with ReLU, Sigmoid and ReLU + Batch Normalization
configurations for the FC layers.
The RGB and depth outputs of the FC layers are concatenated and
connected to an LSTM.
The output of the LSTM is connected to a Softmax layer to classify the gestures.
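
To make the pooling, fusion and classification steps concrete, here is a minimal sketch in plain Python. It assumes feature maps are stored as height x width x channels nested lists and that per-frame RGB and depth vectors are plain lists; the function names are hypothetical, and the pretrained CNNs, FC layers and the LSTM itself are left out.

```python
import math


def global_average_pool(feature_map):
    """Average each channel over all spatial positions of an H x W x C map."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [sum(feature_map[y][x][c] for y in range(height) for x in range(width))
            / (height * width)
            for c in range(channels)]


def global_max_pool(feature_map):
    """Take the maximum of each channel over all spatial positions."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    return [max(feature_map[y][x][c] for y in range(height) for x in range(width))
            for c in range(channels)]


def fuse_streams(rgb_features, depth_features):
    """Concatenate per-frame RGB and depth vectors into one sequence for the RNN."""
    return [rgb + depth for rgb, depth in zip(rgb_features, depth_features)]


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Either pooling variant turns a feature map into one vector per frame, so the fused sequence has one concatenated vector per time step regardless of the spatial size of the CNN output.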
Training
We used the Adam optimizer with a learning rate of 1e-4.
We chose a batch size of 16.
The pretrained models are used as feature extractors; no fine-tuning
is applied to them.
We experimented with the L2 norm and Dropout as regularization
methods.
We chose a lambda constant of 0.2 for the L2 norm and a probability
of 0.5 for Dropout.
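
The hyperparameters above can be tied to their usual definitions with a small sketch. The constants come from the slide; everything else is an assumption: the L2 term is taken as lambda times the sum of squared weights, dropout is the inverted variant that rescales kept values, and the Adam step uses the commonly cited defaults beta1 = 0.9, beta2 = 0.999 and eps = 1e-8, none of which the slides specify.

```python
import math
import random

LEARNING_RATE = 1e-4  # from the slide
L2_LAMBDA = 0.2       # from the slide
DROPOUT_RATE = 0.5    # from the slide


def l2_penalty(weights, lam=L2_LAMBDA):
    """L2 term added to the loss: lam * sum of squared weights (assumed form)."""
    return lam * sum(w * w for w in weights)


def dropout(values, rate=DROPOUT_RATE, rng=None):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def adam_step(w, grad, m, v, t, lr=LEARNING_RATE,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight at step t >= 1.

    Returns the new weight and the updated moment estimates (m, v).
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```

With these settings the first Adam step moves each weight by roughly the learning rate in the direction opposite to its gradient, independent of the gradient's magnitude.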
Results
ResNet-50
With ResNet-50 and an LSTM network, we reach 93.19% accuracy.

Column order in both tables: Avg pooling with ReLU, Sigmoid, ReLU + Batch norm, then Max pooling with ReLU, Sigmoid, ReLU + Batch norm. Rows with fewer than six values have blank cells whose column positions are not preserved here; values are listed in their original left-to-right order.

(a) ResNet-50 + LSTM, accuracy (%)
No Regularization: 85.49, 84.7, 86.87, 87.96, 79.96, 92.1
L2: 87.17, 86.97, 85.78, 86.08, 86.97
Dropout: 89.34, 89.34, 93.19
Dropout + L2: 90.92, 54.89, 89.04

(b) ResNet-50 + GRU, accuracy (%)
No Regularization: 89.04, 88.15, 85.59, 90.03, 86.87, 90.92
L2: 85.19, 80.75, 79.17, 82.43, 67.82
Dropout: 90.92, 85.09, 27.34, 91.91
Dropout + L2: 89.24, 89.63, 82.92, 81.54
Results
VGG16
With VGG16 and an LSTM network, we reach 91.6% accuracy.

Column order in both tables: Avg pooling with ReLU, Sigmoid, ReLU + Batch norm, then Max pooling with ReLU, Sigmoid, ReLU + Batch norm. Rows with fewer than six values have blank cells whose column positions are not preserved here; values are listed in their original left-to-right order.

(a) VGG16 + LSTM, accuracy (%)
No Regularization: 87.27, 88.15, 87.56, 85.49, 83.32, 85.39
L2: 88.35, 86.57, 84.6, 87.36, 85.78, 84.01
Dropout: 89.24, 89.14, 87.86, 86.28, 88.25
Dropout + L2: 89.73, 88.25, 87.86, 88.55, 85.88

(b) VGG16 + GRU, accuracy (%)
No Regularization: 89.63, 87.07, 85.39, 82.43, 87.96, 84.5
L2: 86.48, 68.41, 87.46, 54.59
Dropout: 91.51, 90.82, 89.34, 81.84, 90.03, 89.24
Dropout + L2: 91.61, 87.27, 87.46, 86.38, 87.46
Results
Summary
Pretrained ResNet-50 and VGG16 networks are used as feature
extractors.
We obtained the best result, 93.19% accuracy, using ResNet-50
with LSTM.
We did not apply hand or face segmentation to the inputs.
We proposed a simple yet effective architecture.
We observed that in cases where the LSTM model starts memorizing
the training data, the GRU model solves the memorization problem.
Acknowledgement
The research presented is part of a project funded by TÜBİTAK (The
Scientific and Technological Research Council of Turkey) under grant
number 217E022.
References
1. S. Escalera, X. Baró, J. Gonzàlez, M.A. Bautista, M. Madadi, M. Reyes, V.
Ponce, H.J. Escalante, J. Shotton, I. Guyon, "ChaLearn Looking at People
Challenge 2014: Dataset and Results". In: ECCV Workshops, 2014.
2. K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image
Recognition". In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, 2016.
3. K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for
Large-Scale Image Recognition". arXiv preprint arXiv:1409.1556, 2014.
4. I. Sutskever, O. Vinyals, Q. V. Le, "Sequence to Sequence Learning with
Neural Networks". In: Advances in Neural Information Processing Systems,
pp. 3104-3112, 2014.
5. J. Chung, C. Gulcehre, K. Cho, Y. Bengio, "Empirical Evaluation of Gated
Recurrent Neural Networks on Sequence Modeling". arXiv preprint
arXiv:1412.3555, 2014.
Questions?
The End
Thank you for your attention