An efficient technique for ISL Recognition.pdf

28. Mar 2023
  1. Ph.D. Thesis Presentation on: An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques
  2. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques
• Introduction
  – Types of SL
  – Motivation
  – Types of gestures in SL
  – Sign Language Recognition System (SLRS)
    • Image Acquisition Technique
    • Image Pre-processing
    • Image Segmentation
    • Feature Extraction
    • Classification
  – Challenges
  3. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques
  – Objectives of the work
• Literature Review
  – Literature review of different SLR processing methods
• Comparative Analysis of Feature Detection and Extraction Methods for Vision-Based ISLRS (Objective-1)
  – Taxonomy of Feature Extraction Techniques
  – Feature Extraction
  – SIFT
  – SURF
  – FAST
  – BRIEF
  – ORB
  – Experimental Results
  – Summary
  4. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques
• A Hybrid Approach for Feature Extraction for Vision-Based Indian Sign Language Recognition (Objective-2)
  – Problems in the existing system
  – Proposed Solution
  – Brief overview of FiST_CNN
  – Basic Terminology
  – Dataset used
  – Experimental Results
  – Summary
• Hand Anatomy and Neural Network-Based Recognition for Indian Sign Language (Objective-3)
  – Hand Geometry
  – Brief overview of FiST_HGNN
  – Basic Terminology
  – Dataset used
  – Experimental Results
  – Summary
  5. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques
• Applying FiST_HGNN for Recognition of ISL Words Used in Daily Life (Objective-4)
  – Dataset Creation
  – Experimental Results
  – Code Snippets
  – Summary
• Conclusion
• Future Scope
• List of Publications
• References
  6. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Humans communicate to express their emotions, share ideas, collaborate, support one another, serve the community, and advance society. • Communication is carried out in spoken form through speech and in non-verbal form through gestures. • Non-verbal communication is based on hand motions, body parts, and facial expressions. • A non-governmental organization (NGO), the World Federation of the Deaf (WFD), states that there are around 70 million deaf-mute people across the world [1]. 6
  7. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Within each country or region where deaf-mute communities exist, sign languages develop independently from the spoken language of the region. • Each sign language has its own grammar and rules, with the shared property that all are visually perceived. As a result, there is a communication gap between the deaf-mute community and others. • However, advances in science, technology, and computer vision have evolved into tools that help the deaf-mute community interact with the broader community. 7
  8. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • The first sign language recognized by researchers was American Sign Language (ASL). British Sign Language (BSL), developed in the United Kingdom, is the most widely used. Currently, around 7000 sign languages are in use worldwide [1]. • In India, with its range of different sign languages, various sign language dictionaries have been published, such as the Delhi sign language [2], Mumbai sign language [3][4], Calcutta sign language [5] and Bangalore sign language [6]. • A decade ago, [7] produced a dictionary consisting of 1830 sign words found across 14 different states in India. 8
  9. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 9 Figure 1.1. Different SLs used around the world
  10. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • According to the 2011 Census, India's population of deaf people was around 50 lakhs (5 million). • To provide a suitable medium for ISL training, the Indian Ministry of Social Justice and Empowerment assisted the Indira Gandhi National Open University (IGNOU) in establishing the Indian Sign Language Research and Training Centre (ISLRTC) in 2011. • ISLRTC conducts research and training in various nodal centres around the country. The ISL dictionary (http://indiansignlanguage.org/dictionary/, 2018) has approximately 2500 signs from 12 states and 42 cities. • On February 27, 2019, a new dictionary containing 6000 words covering medical, academic, legal, technological, and daily phrases was released. 10
  11. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 11 Figure 1.2. India’s disability distribution (disabled persons by type of disability: seeing, hearing, speech, movement; totals split by female and male)
  12. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • SL is the only mode of communication for deaf and mute people. It provides a medium to share thoughts, emotions, and feelings with hard-of-hearing people. Some people have been deaf-mute since birth, while others became so later in life. • Learning becomes very hard for such people; they must learn primarily through vision. SL instructors resolve this problem by becoming the medium of communication between ordinary people and the deaf-mute community. But not everyone can afford these instructors. • There is a lack of schools and learning centres for such people in India, and these people still receive too little attention. There are only 478 government schools and 372 private schools throughout India, and most schools use the oral mode of communication. • Further, they face communication problems in public places such as banks, hospitals, public transport, and schools. A vision-based system should therefore be effective and reliable in the real world, as well as inexpensive, affordable, and easy to use. • This work emphasizes using soft computing techniques to reduce the conversation rift between the deaf-mute community and other people. 12
  13. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • The hand gesture recognition system primarily depends on hand posture. • Sign language gestures can broadly be divided into two categories: manual and non-manual gestures. • Based on motion, gestures are classified as static or dynamic. • Manual gestures are much simpler to recognize than non-manual gestures. • Manual gestures use only the hands, while non-manual gestures include mouth morphemes, eye gazes, facial expressions, body shifting and head tilting. 13
  14. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.3. Classification of gestures in SL 14
  15. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • The signer's body is not involved in these movements. • This category excludes alphabets like 'J' and 'Z'; the remaining 24 alphabets can be rendered in static form. • One-hand ISL alphabets and numbers are shown in Figures 1.4 and 1.5, respectively. 15
  16. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.4. ISL alphabets using a one-hand gesture Figure 1.5. ISL numbers using a one-hand gesture (adopted from google.com) 16
  17. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.6. ASL alphabet gestures 17
  18. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Both hands are used to symbolize the two-handed gestures. • Figure 1.7 illustrates the two-hand ISL alphabet gestures. • Figure 1.8 shows examples of ISL words. 18
  19. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.8. Sample images for static two-hand ISL gesture 19 Figure 1.7. Sample images for static two-hand ISL alphabets
  20. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • These gestures contain motions. Dynamic gestures can be further depicted by using one hand or two-hand. a) One-hand gesture: In these gestures, only one hand is used to depict the gesture; however, the hand will be in motion while communicating. Among the alphabet, ‘J’ and ‘Z’ are the gestures which require motion as shown in Figure 1.9. 20 Figure 1.9. Sample images for alphabet “J” and “Z.”
  21. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 21 b) Two-hand gestures: Both hands are required to depict these gestures, as shown in Figure 1.10. Based on hand movement, two-hand gestures are categorized into Type-0 and Type-1. In Type-0, one hand is considered the principal hand and the other the non-principal hand, while in Type-1, both hands are in continuous motion. Figure 1.10. Sample images for two-hand dynamic gestures
  22. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 22 Non-manual gestures mainly involve movement. They consist of mouth gestures, body posture and facial expressions, as shown in Figure 1.11. Figure 1.11. Sample images for non-manual gestures
  23. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • There are two types of SLRS: device-based and vision-based. • In a device-based system, a dedicated device is used to acquire and predict the gesture. • A vision-based system uses a webcam to capture and predict the gesture. • In vision-based sign language recognition, no trainer is required, and the approach is versatile. • The vision-based system offers much more straightforward and intuitive communication between the deaf-mute user and the computer. 23
  24. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • With the advances in image and video processing techniques, hand gesture applications such as virtual reality, medical applications, video game consoles, touch screens, and sign language recognition systems have also progressed. • The device-based method is less flexible, as the user must always wear a glove. In contrast, the vision-based method allows users to interact remotely [9]. • The basic steps presented in the next section mainly focus on the design of a vision-based sign language recognition system. The processing steps needed to develop a predictive model for SLRS are discussed further. 24
  25. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.12. Steps in SLRS recognition 25
  26. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 26
  27. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • These gadgets have recently become popular for SLR [2, 3, 4]. Surface electromyogram devices use sensors on the skin surface to non-invasively measure the signals created in the muscles. • These systems consist of sensor-based devices, such as gyroscopes and accelerometers, that can measure the hand's motion irrespective of rotation [5]. The sensors are placed in hand gloves, as shown in Figure 1.13. • In these systems, gestures are acquired through signals using data gloves [6]; two cyber-gloves, one on each hand [7], are also used. Each sensor modality has benefits and drawbacks. • However, wearing a glove for the whole duration is difficult and obstructs natural signing. Not only are these devices costly, but they also need a controlled environment to operate. 27
  28. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.13. Data acquisition devices 28
  29. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • These systems use a camera in front of the signer to track hand motion [10]. In [8], the authors classified individual signs and signed words from video streams with up to 97.3% accuracy using hand shape, velocity, and location as components. • Hand geometry parameters are also used to evaluate the hand features for 10 ISL gestures [12]. Moreover, deep learning techniques such as CNN, VGG16, MobileNets, etc., have provided a boon for SLR systems. • To improve recognition through a vision-based system, state-of-the-art techniques have been hybridized with deep learning techniques, as shown in Figure 1.14. • These systems are inexpensive to implement, and recently, much progress has been made on vision-based recognition systems. 29
  30. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.14. Recognition process through the vision-based system 30
  31. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • In ISLR, the dataset is pre-processed as a cleaning step. This stage prepares the dataset for training, making it easier to evaluate and process computationally. • Pre-processing reduces the algorithm's complexity and increases its correctness. It may include tasks such as image resizing, geometric and colour transformations, colour-to-grayscale conversion, and many others. • The greyscale images are converted to black and white, as shown in Figure 1.15. Pixels with value 1 are black, i.e., the object, while 0 represents the white background area. 31
  32. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.15. Hand gesture conversion from RGB to Grey 32
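The RGB-to-grey-to-binary conversion described above can be sketched in a few lines of numpy; the luminosity weights and the threshold of 128 below are common illustrative choices, not values taken from the thesis:

```python
import numpy as np

def to_grayscale(rgb):
    # Luminosity method: weighted sum of the R, G, B channels
    return (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)

def binarize(gray, thresh=128):
    # Dark (object) pixels -> 1, bright background -> 0,
    # matching the convention described for Figure 1.15
    return (gray < thresh).astype(np.uint8)
```

In a real pipeline the threshold would be chosen per image (e.g. by Otsu's method) rather than fixed.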
  33. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Scaling, rotations, and other affine transformations are all part of data augmentation. Different types of data augmentation techniques are: i. Flipping: Typically done to increase the dataset size and expose the neural network to various image variants, so that the model can recognize the object in any shape or form. ii. Colour space: Colour channels are used in this technique; the R, G, and B channels are isolated into single matrices to transform the colours. iii. Cropping: A centre patch of the same height and breadth is cropped for all images in the collection. Random cropping is also employed to create a translation-like effect. iv. Rotation: The rotation-degree parameter determines the precision of this augmentation; a gesture rotated at large angles may overlap with other gestures. 33
  34. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques v. Translation: To avoid positional bias in the data, translation shifts the images left, right, up, or down; padding keeps the image's spatial dimensions intact. vi. Noise injection: A matrix of random values drawn from a Gaussian distribution is added to the images to make the model more robust. Figure 1.16. Translation applied to the ISL word "Above" gesture. 34
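Three of the augmentations listed above (flipping, translation with zero padding, and Gaussian noise injection) can be sketched in plain numpy; the shift amount and noise level are illustrative assumptions:

```python
import numpy as np

def augment(img, shift=2, noise_std=5.0, seed=0):
    """Return a few augmented variants of a grayscale uint8 image."""
    rng = np.random.default_rng(seed)
    flipped = img[:, ::-1]                 # horizontal flip
    translated = np.zeros_like(img)        # shift right, zero-pad on the left
    translated[:, shift:] = img[:, :-shift]
    # Gaussian noise injection, clipped back to the valid pixel range
    noisy = np.clip(img + rng.normal(0.0, noise_std, img.shape), 0, 255).astype(np.uint8)
    return flipped, translated, noisy
```

Rotation and cropping would follow the same pattern, typically via a library routine rather than hand-rolled code.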
  35. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 35
  36. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Table 1.3. A comparison of segmentation techniques based on the study
• Thresholding-based segmentation. Pros: a simple and uncomplicated method; requires no prior knowledge, so its computational cost is low. Cons: relies heavily on histogram peaks with minimal regard for spatial detail; sensitive to noise; choosing the best threshold value is challenging.
• Edge-based segmentation. Pros: appropriate for images with high object contrast. Cons: not suitable for images with a lot of noise or many edges.
• Region-based segmentation. Pros: less prone to noise; most valuable when the similarity criteria are straightforward to define. Cons: quite expensive in processing time and memory usage.
• Clustering-based segmentation. Pros: the fuzzy partial membership used makes it beneficial for real-world problems. Cons: determining the membership functions is not simple. 36
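As an illustration of the thresholding-based segmentation compared above, the sketch below implements Otsu's method, which picks the threshold that maximizes the between-class variance of the grey-level histogram; this is a generic textbook formulation, not the thesis's exact implementation:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)  # sum of all grey levels
    best_t, best_var = 0, 0.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]                # pixels at or below t (background class)
        if w0 == 0:
            continue
        w1 = total - w0              # pixels above t (foreground class)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0               # class means
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Segmenting is then simply `gray > otsu_threshold(gray)`; the method works best when the histogram is clearly bimodal, which matches the "relies heavily on peaks" caveat in the table.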
  37. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Feature extraction is the process of discovering the most promising and informative feature set to increase the accuracy and efficiency on the data to be tested [23]. • After image pre-processing, feature extraction is the most important phase in SLRS. • Local and global descriptors are the two types of features. Global features characterize the whole image, allowing the entire object to be generalized. • Shape matrices, invariant moments (Hu, Zernike), HOG, and Co-HOG are common global descriptors [24][25][26] for image retrieval, object recognition, and classification. • Local descriptors [27][28][29] such as SIFT and SURF are used for object recognition and identification [30]. For feature extraction in vision-based gesture recognition systems, a variety of approaches have been used, including Zernike moments, Hu moments, HOG, SIFT, ED, FD, DWT, ANN, CNN, Fuzzy, and GA [33][34][36][37][41][44][64]. 37
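As a toy illustration of a global descriptor in the HOG family mentioned above, the sketch below builds a single gradient-orientation histogram for a whole image (real HOG works on cells and blocks with local normalization); the bin count of 9 is an illustrative, though conventional, choice:

```python
import numpy as np

def hog_features(gray, n_bins=9):
    """Simplified global HOG-style descriptor: one orientation histogram."""
    gy, gx = np.gradient(gray.astype(float))      # image gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180    # unsigned orientation
    # Magnitude-weighted histogram over [0, 180) degrees
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A vertical edge, for instance, produces purely horizontal gradients, so all the weight lands in the 0-degree bin.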
  38. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. Neural Network (NN): An NN is a multi-layer network with n layers (n ≥ 3), with x_n neurons in the last layer. • An NN is, at minimum, a three-layered architecture composed of an input layer, a hidden layer and an output layer. The input is processed through weighted connections, and the output layer is responsible for predicting results. • The NN model is mainly of two types: feedforward and backpropagation. A feedforward NN is shown in Figure 1.17. 38
  39. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.17. The architecture of the Neural Network 39
  40. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • A Convolutional Neural Network (CNN) is a deep, feed-forward artificial neural network that can perform various tasks with better time and accuracy than many other classifiers. • A typical CNN has three layers: a convolution layer, a max-pooling layer and a fully connected layer, as shown in Figure 1.18. • The first layer is the convolution layer, where a list of 'filters' such as 'blur', 'sharpen' and 'edge-detection' is applied by convolving a kernel with the image. • Each feature or pixel of the convolved image is a node in the hidden layer. • Each number in the kernel is a weight, and that weight is the connection between the input image features and the nodes of the hidden layer. 40
  41. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 1.18. CNN model architecture 41
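The convolution and max-pooling operations of Figure 1.18 can be illustrated in plain numpy; this is a didactic sketch of the two layer types, not a trainable CNN:

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output node is a weighted sum of one image patch,
            # with the kernel entries acting as the shared weights
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size  # trim so the map tiles evenly
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

A real CNN stacks many such filters per layer, adds a non-linearity after each convolution, and learns the kernel weights by backpropagation.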
  42. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques iii. Fuzzy Logic: The concept of fuzzy logic was introduced by Zadeh[31] as a method for representing human knowledge that is imprecise by nature. • The most significant benefit of fuzzy logic is that it provides a practical mechanism for creating non-linear control systems that are difficult to develop and stabilize using conventional methods. • Hence, fuzzy logic is most frequently used in device-based recognition. 42
  43. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques iv. Evolutionary Algorithms: These algorithms are used to solve various optimization problems encountered in real-life applications [34][23]. • The main principle behind these algorithms is to find an appropriate selection for an application by simulating natural selection: as nature evolves through survival of the fittest, an EA aims to find the fittest candidate. • Some commonly used evolutionary algorithms are the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Artificial Bee Colony optimization (ABC), the Firefly optimization Algorithm (FA), and Ant Colony Optimization (ACO). • In sign language recognition, these algorithms are best used for feature selection: feature extraction techniques such as HOG and PCA produce large feature sets, and evolutionary algorithms can then select the most discriminative subset. 43
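A minimal sketch of evolutionary feature selection in the spirit described above: a genetic algorithm over binary masks, where each bit marks whether a feature is kept. Population size, generation count, and mutation rate are illustrative assumptions, and `fitness` stands in for a classifier's validation score on the selected features:

```python
import random

def ga_feature_select(n_features, fitness, pop_size=20, gens=30, p_mut=0.05):
    """Evolve a binary mask over features, maximizing the given fitness."""
    pop = [[random.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with small probability per gene
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Because the best parents always survive, the best fitness in the population is non-decreasing across generations.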
  44. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. Segmentation: Segmenting the hand using skin-colour-based approaches is challenging; placing the hand in a cluttered, complex background and extracting its shape is a difficult computer-vision task. ii. Similar gestures: A multiclass classification system must categorize sign language with an extensive vocabulary, and recognizing similar gestures is a challenging task. iii. Feature extraction: Feature extraction plays an essential role in ISLR. A technique that can extract accurate, non-redundant features is required to reduce the time complexity of the system. iv. Dimensionality: Deep learning techniques are the most common techniques used for SLRS. They apply a number of filters at each layer to convolve the image; hence, by the final layer, the number of parameters has grown very large, increasing the dimensionality of the training model. Although these networks provide higher accuracy than traditional methods, the time complexity and feature redundancy increase. 44
  45. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques v. Hand geometry: A hand is an articulated object with 27 degrees of freedom (related to the number of joints). The interdependences between the fingers depend on the hand movements and the degrees of freedom. Hand gestures can vary in size, position, and orientation, so a combination of these parameters must be approximated to recognize a hand gesture. vi. Self-occlusion: During the formation of a double-hand gesture, parts of one hand may hide behind the other, causing self-occlusion. This makes segmentation and hand detection very difficult; a model robust to self-occlusion is required. vii. Standardized dataset: Sign language consists of a vocabulary of signs in precisely the same way spoken language consists of a vocabulary of words. Sign languages are not standard and universal, and grammars differ from state to state. A standard sign language that can be followed within the country is required; keeping this in mind, this work focuses only on ISL. viii. Dynamic gesture recognition: In double-hand recognition systems, one hand might move more quickly than the other, and the system has trouble tracking hands moving at different rates. Consequently, a fast recognition system is required. 45
  46. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • To analyze various soft computing-based techniques used for feature extraction and gesture recognition in ISL. • To propose an efficient and effective technique for feature extraction of static gestures in ISL. • To propose a soft computing-based technique for recognition of various gestures used in ISL. • To apply the above proposed techniques to some real-world problems. 46
  47. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • With the recent advancement in machine learning and computational intelligence methods, intelligent systems for sign language recognition continue to attract the attention of academic researchers and industrial practitioners. • This study presents a systematic analysis of intelligent systems employed in sign-language-recognition-related studies between 2000 and 2022. • An exhaustive search was conducted using the Google search engine, and all of the techniques were analyzed in terms of accuracy. • More than 150 articles from the field of gesture recognition were selected, with the main emphasis on vision-based ISLR. 47
  48. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 48 Table 2.1. A summary of existing works in the image acquisition phase
1. [75]: Camera; features: hand gestures and head movement; dataset: self-created; pre-processing: grayscale images with Gaussian filtering; segmentation: Otsu's thresholding + DWT + Canny edge detection.
2. [76]: Camera; features: different hand shapes; dataset: self-created; pre-processing: grayscale images with median filter + morphological operations + Gaussian filter; segmentation: thresholding and blob + crop + Sobel edge detector.
3. [77]: Camera; features: different hand shapes; dataset: self-created; pre-processing: -; segmentation: Sobel edge detector.
4. [78]: Camera; features: hand gestures and head positions; dataset: self-created; pre-processing: grayscale images with morphological operations + average filter; segmentation: Canny edge + DWT.
5. [70]: Kinect + camera; features: different hand shapes; dataset: self-created; pre-processing: skin filter; segmentation: HSV colour space used for feature extraction.
6. [79]: Camera; features: different hand shapes; dataset: publicly available; pre-processing: HE and logarithmic transformation; segmentation: CIELAB colour space + Canny edge detection.
  49. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 49 Table 2.2. Summary of ISL feature extraction techniques
1. [138]: ED; 24 alphabets; advantage: low time complexity, recognizes double-handed gestures, differentiates skin colour; disadvantage: only static images used; accuracy: 97%.
2. [17]: ED; 24 alphabets; advantage: recognizes single- and double-handed gestures in video sequences; disadvantage: works only in ideal lighting conditions; accuracy: 96.25%.
3. [111]: FD; 15 words; advantage: differentiates similar gestures; disadvantage: large dataset; accuracy: 96.15%.
4. [113]: FD; 46 alphabets, numbers, and words; advantage: handles dynamic gestures; disadvantage: a dataset of 130,000 images is used; accuracy: 92.16%.
5. [141]: DWT; 52 alphabets, numbers, and words; advantage: considers dynamic gestures; disadvantage: simple background, large dataset; accuracy: 81.48%.
6. [114]: DWT; 24 alphabets; advantage: adaptable to background complexity and illumination; disadvantage: less efficient for similar gestures; accuracy: 90%.
7. [129]: FL; 90 alphabets, numbers, and words; advantage: invariant to scaling, translation, and rotation; disadvantage: cannot work in a real-time system; accuracy: 96%.
8. [132]: ANN; 22 alphabets; advantage: no noise issue, data normalization is easily done; accuracy: 99.63%.
  50. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 50 Table 2.2 (continued). Summary of ISL feature extraction techniques
10. [78]: Fuzzy + Neural; 26 alphabets; advantage: high recognition rate for single- and double-handed gestures; disadvantage: accuracy lacks for similar gestures; accuracy: 97.1%.
11. [141]: SIFT; 26 alphabets; advantage: works better on illumination-varied and scaled images; disadvantage: high processing time; accuracy: not reported.
12. [99]: Elliptical FD and PCA; 59 alphabets, numbers, and words; advantage: recognition from video sequences, works better on moving objects; disadvantage: needs improvement on complex-background sequences; accuracy: 92.34%.
13. [125]: HOG; 36 alphabets and numbers; advantage: robust to changing lighting conditions, no additional hardwired devices required for shape extraction; disadvantage: accuracy lacks on complex-background images; accuracy: not reported.
14. [142]: HOG; 36 alphabets and numbers; advantage: reduced feature vector, minimum computational time; accuracy: 92%.
15. [93]: Adaptive thresholding and SIFT; 50 alphabets, numbers, and words; advantage: eliminates the need for image pre-processing, high accuracy; disadvantage: works only on static images; accuracy: 91.84%.
16. [105]: HOG + SIFT; 26 alphabets and numbers; advantage: invariant to illumination, orientation, and occlusion for double-handed gestures; disadvantage: accuracy lacks on dynamic gestures; accuracy: 93%.
  51. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 51 Table 2.3. Summary of classification techniques in the reviewed work
1. [105]: Shape descriptors, SIFT + HOG; classifier: SVM; accuracy: 93%; observation: unable to classify similar gestures.
2. [26]: SIFT + LDA; classifier: KNN and SVM; accuracy: 99%; observation: SVM achieved better accuracy than KNN.
3. [154]: SIFT, k-means clustering + BOW; classifier: SVM and KNN; accuracy: -; observation: on a large dataset, SVM performs better than KNN.
4. [70]: Hu moments + motion trajectory; classifier: SVM; accuracy: 97.5%; observation: ISL gestures classified.
5. [142]: TOPSIS; classifier: SVM; accuracy: 99.2% (ASL), 92% (ISL); observation: good performance under complex backgrounds.
6. [155]: AlexNet/VGG16 model; classifier: SVM; accuracy: 99.82%; observation: computational complexity is high on large datasets.
7. [76]: Centroid, area of edge; classifier: ED; accuracy: 90.19%; observation: 26 ASL gestures recognized in real time.
8. [173]: Edge detection, FD + DTW; classifier: KNN; accuracy: 96.15%; observation: ISL dynamic gestures recognized.
9. [182]: DTW; classifier: KNN; accuracy: 99.23%; observation: 13 ISL alphabets recognized.
10. [157]: ORB, k-means clustering + BOW; classifier: KNN; accuracy: KNN 95.81%, MLP 96.96%; observation: MLP performs better than KNN on ASL static gestures.
  52. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 52 Table 2.3 (continued). Summary of classification techniques in the reviewed work
11. [30]: EFD; classifier: ANN with backpropagation; accuracy: 95.10%; observation: four cameras used for acquisition.
12. [78]: EFD with PCA; classifier: ANN; accuracy: 92.34%; observation: better results compared to the morphological process.
13. [79]: Canny edge detector + FCC; classifier: ANN; accuracy: 96.50%; observation: unable to recognize gestures in low illumination.
14. [184]: HOG; classifier: ANN with feedforward and backpropagation; accuracy: 99.0%; observation: two-hand BSL alphabets recognized.
15. [170]: Boundary and region features; classifier: ANFIS; accuracy: 100% (19 rules), 97.5% (10 rules); observation: better performance than the previous model.
16. [78]: Active contours; classifier: FIS; accuracy: 96%; observation: better results than other models.
17. [185]: GLCM; classifier: fuzzy c-means; accuracy: 91%; observation: 28 ArSL alphabets recognized.
18. [35]: -; classifier: continuous HMM and AdaBoost; accuracy: 92.70%; observation: improved recognition accuracy over the individual CHMM model.
19. [186]: -; classifier: SVM with bagged tree classifier; accuracy: 80%; observation: the bagged tree classifier outperformed the SVM classifier.
20. [187]: -; classifier: RF with ANN and SVM; accuracy: 95.48%; observation: RF outperforms ANN and SVM.
21. [173]: -; classifier: ELM with multiple SVMs; accuracy: 98.7%; observation: ELM outperforms single classifiers.
  53. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques To attain benchmark performance in this context, the following points are worthy of more research attention: • Different methods are used for data acquisition [17, 30]. Vision-based acquisition is inexpensive but strongly affected by lighting and background, while device-based acquisition [70] requires a trainer and is very expensive. • During pre-processing, objects having the same skin colour as the background cannot be segmented [76]. High or low light intensity, poor background colour, and inappropriate signer position also affect gesture segmentation [94, 95]. • Several feature extraction techniques [20, 21, 100, 102] are used in ISLR. Although these techniques achieve higher accuracy [96], they do so with large feature sets and long processing times. • Most existing work focuses on increasing the accuracy of recognition of alphabets and numbers [1, 52]. 53
  54. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Recognition of words in ISL remains a demanding research area. Classifying a test image against trained images is the most crucial part of a recognition system. • Techniques like SVM [70, 109, 150, 154], HMM [39, 146, 168], KNN [31, 76, 165, 167], and ANN [38, 39, 40, 93, 160] are used, but the accuracy achieved by these systems is around 95%; an algorithm that improves both accuracy and efficiency is needed. • Even though the approaches discussed above perform effectively in the applications where they are used, current methodologies have limits in either computational complexity or recognition accuracy, so there is still opportunity for the development of new strategies.
  55. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 55
  56. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques I. CBIR: Content-Based Image Retrieval (CBIR) has been one of the most important research areas in computer vision over the last 20 years. The main idea of CBIR is to analyze image information through the low-level features of an image [83], which include color, texture, shape, and the spatial relationships of objects, and to set up feature vectors of an image as its index. The most common CBIR techniques used for sign language recognition are discussed below. i. Statistical: Zernike moments require lower computation time compared to regular moments [84][85]. • In [86] these moments are used to extract mutually independent shape information, while [87] applies this approach to Tamil scripts to overcome the information loss of the geometric mean. • Although computation of these feature vectors is easy, the recognition rate is less efficient. These features are invariant to shape and angle but variant to background and illumination.
  57. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques ii. Shape based: These techniques are based on the premise that accurate features can be extracted without any change in the shape of the image. • [93] determines the active finger count by evaluating the Euclidean distance between palm and wrist, from which a feature vector of finger projected distance (FPD) and finger base angle (FBA) is computed. However, the feature selection depends on orientation and rotation angle. • All the features of the processed frames are then extracted using the Fourier descriptor method. Instead of applying pre-processing techniques such as filtering and segmentation of the hand gesture, scaling and shifting parameters were extracted from the high and low frequencies of images up to the 7th level [99]. • These feature extraction techniques do not scale well to large databases in terms of accuracy and efficiency [97]. • They also perform poorly in cluttered backgrounds [100] and are variant to illumination changes.
  58. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Soft computing is an emerging approach in computing that provides a remarkable ability to learn in an atmosphere of uncertainty. • To describe the appearance and shape of a local object within an image, HOG is used in [115][116]. [117] works on continuous gesture recognition by storing 15 frames per gesture in a database. [118] uses HOG for vision-based gesture recognition. [119] extracts global descriptors of an image with a local histogram feature descriptor (LHFD). • [125][160][161] develop three novel methods (NN-GA, NN-EA and NN-PSO) for effective recognition of ISL gestures; the NN has been optimized using GA, EA and PSO, and experimental results show that NN-PSO outperforms the other two methods. [126][162]-[166] use CNNs to automate the construction of pools of similar local hand regions. • [127] applies CNNs to extract images directly from video. In [128][129][130], automatic clustering of all frames of a dynamic hand gesture is done by CNNs; the model consists of three max-pooling layers, two fully connected layers and one SoftMax layer.
  59. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques

S.No | Author                  | Acquisition Method | Gesture Type | Mode    | Technique                     | Accuracy | Remark
1.   | Verma and Dev (2009)    | Camera             | Both         | Dynamic | FSM                           | -        | The proposed technique was successfully applied to gestures such as waving the left hand, waving the right hand, and signalling stop, forward and rewind.
2.   | Adithya et al. [33]     | Camera             | Both         | Static  | ANN                           | 91.11%   | Hand shape is extracted using digital image processing techniques.
3.   | Kishore et al. [38]     | Camera             | Both         | Dynamic | ANN                           | 90.17%   | The word matching score over multiple training and testing instances of the neural network was around 90%.
4.   | Kaluri and Reddy (2017) | Camera             | Both         | Static  | GA-NN                         | 90.18%   | A genetic algorithm has been used to improve the recognition rate.
5.   | Prasad et al. (2016)    | Camera             | Both         | Dynamic | Sugeno fuzzy inference system | 92.5%    | The video dataset of Indian signs contains 80 words and sentence testing.
6.   | Kishore et al. (2016)   | Camera             | Both         | Dynamic | Fuzzy inference engine        | 96%      | The system achieved better results than other models in the same category.
7.   | Hasan et al. (2017)     | Camera             | Single hand  | Static  | ANN                           | 96.50%   | -
8.   | Fregoso et al. (2021)   | Camera             | Both         | Static  | PSO-CNN                       | 99.98%   | Applied optimization algorithms to find the optimal parameters of the CNN architecture.
9.   | Shin et al. (2021)      | Camera             | Both         | Static  | Gradient Boost Machine        | 96.71%   | The complex shape of the hand could be easily detected.
10.  | Meng and Li (2021)      | Camera             | Both         | Dynamic | Graph Convolution Network     | 98.08%   | The system achieved better performance and reduced motion blurring, sign variation and finger occlusion.
  60. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Fusions of soft computing-based and CBIR-based techniques are also employed in the literature to gain the advantages of both. • [81] integrates SURF and Hu moments to achieve a high recognition rate with low time complexity. • [88] embeds SIFT and HOG for robust feature extraction in cluttered backgrounds and under difficult illumination. • To improve efficiency, a multiscale oriented histogram together with contour directions is used for feature extraction [133]. • This integration of approaches makes the system memory-efficient with a high recognition rate of 97.1%. • Hybrid approaches produce efficient and effective systems, but their implementation is complex.
  61. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 2.1. Taxonomy of feature extraction techniques 61
  62. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Building on the results of the previous study, a comparative analysis of feature extraction techniques is carried out. Based on their characteristics, the commonly used feature extraction techniques are divided into three main categories: scale-based, intensity-based, and hybrid techniques, as shown in Figure 3.1. Figure 3.1. Classification of feature extraction techniques
  63. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. SIFT: SIFT features are local and robust to brightness, contrast, affine transformation, and noise. The number of octaves and scales depends on the size of the original image; each octave's image size is half of the previous one. SIFT builds a Gaussian scale space, which lets it detect and compute keypoints efficiently, as shown in Eq. 3.1:

L(x, y, σ) = G(x, y, σ) * I(x, y)    (3.1)

where G is the Gaussian blur operator, L is the blurred image, I is the input image, σ is the scale parameter, and (x, y) are the location coordinates.
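The scale-space construction of Eq. 3.1 can be sketched in plain NumPy. This is a minimal illustration, not the thesis implementation: the function names, the 3σ kernel radius, and the scale spacing inside one octave are illustrative choices.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    # Sample the 1-D Gaussian G(x, sigma) and normalize it to sum to 1.
    if radius is None:
        radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    # L(x, y, sigma) = G(x, y, sigma) * I(x, y): the 2-D Gaussian is
    # separable, so convolve rows first and then columns.
    k = gaussian_kernel_1d(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, image)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)
    return out

def build_octave(image, n_scales=5, sigma0=1.6):
    # One octave: the same image blurred at geometrically spaced sigmas;
    # the next octave would start from this image downsampled by 2.
    step = 2 ** (1.0 / (n_scales - 1))  # illustrative spacing
    return [gaussian_blur(image, sigma0 * step ** i) for i in range(n_scales)]
```

Adjacent blurred images in an octave are then subtracted to form the difference-of-Gaussians in which SIFT localizes its keypoints.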
  64. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques ii. SURF: SURF is an improved, faster version of SIFT: a fast and robust algorithm for local, scale- and rotation-invariant descriptors. SURF approximates Gaussian second-order derivatives with box filters evaluated on the integral image at X = (x, y). The Hessian matrix H(x, σ) at x and scale σ is defined in Eq. 3.2:

H(x, σ) = | Lxx(x, σ)  Lxy(x, σ) |
          | Lyx(x, σ)  Lyy(x, σ) |    (3.2)

where Lxx(x, σ) is the convolution of the Gaussian second-order derivative with the image I at point x, and similarly for Lxy(x, σ) and Lyy(x, σ).
  65. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. FAST: FAST is a corner detection method with great computational efficiency and is commonly used in real-time processing applications due to its high speed. For every candidate pixel p, it examines the 16 pixels on a Bresenham circle around p. Each circle pixel x can be in one of three states relative to p, as shown in Eq. 3.3:

S(p→x) = d (darker)   if I(p→x) ≤ Ip − t
         s (similar)  if Ip − t < I(p→x) < Ip + t
         b (brighter) if Ip + t ≤ I(p→x)    (3.3)

where S(p→x) is the state, I(p→x) is the intensity of circle pixel x, Ip is the intensity of p, and t is a threshold. Candidates without a sufficiently long contiguous run of darker or brighter circle pixels are discarded.
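The segment test of Eq. 3.3 can be sketched directly. This is a simplified illustration (the circle offsets are the standard radius-3 Bresenham circle; the function names and the contiguity length n are illustrative, and real FAST adds a quick pre-test and non-maximum suppression):

```python
import numpy as np

# Offsets of the 16-pixel Bresenham circle of radius 3 around a candidate.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def pixel_states(image, px, py, t):
    # Classify each circle pixel as darker 'd', similar 's' or brighter 'b'
    # relative to the nucleus intensity Ip and threshold t (Eq. 3.3).
    ip = float(image[py, px])
    states = []
    for dx, dy in CIRCLE:
        ix = float(image[py + dy, px + dx])
        if ix <= ip - t:
            states.append('d')
        elif ix >= ip + t:
            states.append('b')
        else:
            states.append('s')
    return states

def is_fast_corner(image, px, py, t, n=12):
    # p is a corner if n contiguous circle pixels are all darker or all
    # brighter; doubling the state string handles wrap-around on the circle.
    s = ''.join(pixel_states(image, px, py, t))
    doubled = s + s
    return ('d' * n in doubled) or ('b' * n in doubled)
```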
  66. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques ii. BRIEF: BRIEF is an efficient feature point descriptor built from binary strings. BRIEF is very fast both to build and to match, and it outperforms descriptors such as SIFT and SURF in terms of both speed and recognition rate. Each bit of the descriptor is the result of a binary test τ on a smoothed patch p, as shown in Eq. 3.4:

τ(p; x, y) = 1 if p(x) < p(y), 0 otherwise    (3.4)

where p(x) is the intensity of p at point x. Choosing a set of n (x, y)-location pairs uniquely defines a set of binary tests, where n is the length of the binary feature vector and is typically 128, 256, or 512.
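Eq. 3.4 can be sketched as follows. This is an illustrative simplification: the location pairs are sampled uniformly here (BRIEF proper draws them from a Gaussian over the patch and smooths the patch first), and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pairs(patch_size, n=256):
    # n (x, y)-location pairs, drawn once and reused for every patch.
    coords = rng.integers(0, patch_size, size=(n, 4))
    return [((r[0], r[1]), (r[2], r[3])) for r in coords]

def brief_descriptor(patch, pairs):
    # Eq. 3.4: each bit is 1 iff p(x) < p(y) for one sampled location pair.
    bits = [1 if patch[x] < patch[y] else 0 for x, y in pairs]
    return np.array(bits, dtype=np.uint8)

def hamming(d1, d2):
    # Binary descriptors are matched by Hamming distance (XOR + popcount),
    # which is what makes BRIEF so fast to match.
    return int(np.count_nonzero(d1 != d2))
```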
  67. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. ORB: ORB combines and improves FAST and BRIEF. ORB is robust to brightness, contrast, rotation, and limited scale change. • Its main contribution is the addition of a fast and accurate orientation component, and its oriented BRIEF features keep computation efficient. • Like BRIEF, ORB uses local binary descriptors, and it uses the intensity centroid to estimate keypoint orientation. • Unlike FAST (detection only) and BRIEF (description only), ORB can both detect keypoints and compute descriptors, and it has now emerged as the most efficient feature extraction technique in computer vision.
  68. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Table 3.1. Comparative analysis of techniques 68
  69. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Experimental Setup: Python 3 (Jupyter Notebook) has been used for the experiments presented in this chapter. The system's specifications are: Intel® Core™ 1.8 GHz, 8 GB RAM, 256 KB cache per core (3 MB cache in total), and a GPU with 1536 MB VRAM. SIFT, SURF, FAST, BRIEF, and ORB are used as detectors in OpenCV's environment. As no standard dataset for ISL alphabet gestures is available, a dataset from a GitHub project [18], consisting of 4962 images with more than 200 images per gesture, has been used for the experiments, excluding J and Z as these gestures require motion. The dataset includes images with all typical variations, such as different orientations, illumination, occlusion, blurring, intensity, and affine transformation, as shown in Figure 3.2.
  70. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. Match Rate: The match rate measures the correspondence between the query and training images based on the number of keypoints matched, as shown in Eq. 3.5:

Match rate (%) = (keypoints of training image + keypoints of query image) / (2 × matched keypoints) × 100    (3.5)

For example, on the intensity scale for FAST, if 144 keypoints are extracted from the training image and only 72 from the query image, and the Brute-Force (BF) matcher [2] finds 115 matched keypoints, the match rate is:

Match rate (%) = (144 + 72) / (2 × 115) × 100 = 93.91
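Eq. 3.5 is a one-line computation; a minimal sketch with an illustrative function name:

```python
def match_rate(train_keypoints, query_keypoints, matched_keypoints):
    # Eq. 3.5: mean keypoint count of the two images, relative to the
    # number of matched pairs, expressed as a percentage.
    return (train_keypoints + query_keypoints) / (2 * matched_keypoints) * 100

# Worked example from the text: match_rate(144, 72, 115) gives 93.91 (2 d.p.).
```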
  71. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. Affine Transformation: Table 3.2 shows that, under affine transformation, SURF gives the lowest match rate and takes the most time. ii. Intensity Scale: Intensity scaling changes the colour of images at different scales; features are detected from images at different intensities, and the results obtained for the intensity scale are shown in Table 3.3. iii. Orientation: In orientation, the images' views are rotated with respect to each other. Table 3.4 and Figures 3.6, 3.7, and 3.8 show the matching keypoints in training and query images at different rotation angles. iv. Blurring: Table 3.5 shows that BRIEF is the fastest while SURF is the slowest under blurring. v. Illumination: Table 3.6 shows that FAST is the fastest and SIFT the slowest under illumination change. vi. Occlusion: From Table 3.7, FAST provides the highest match rate with the least time under occlusion.
  72. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.2. Matching images at different intensity scales for SIFT, SURF, BRIEF and ORB 72
  73. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.6. Image rotated at 0° angle 73
  74. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.7. Image rotated at 45° angle 74
  75. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.8. Image rotated at 90° angle 75
  76. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques

Table 3.2. Comparison based on the affine transformation
                      | SIFT  | SURF  | FAST  | BRIEF | ORB
Match Rate (%)        | 77.23 | 70.44 | 96.62 | 95.22 | 96.21
Processing Time (sec) | 1.79  | 2.21  | 1.37  | 1.45  | 1.43

Table 3.3. Comparison based on the intensity scale
                      | SIFT  | SURF  | FAST  | BRIEF | ORB
Match Rate (%)        | 77.47 | 65.52 | 93.91 | 78.67 | 85.68
Processing Time (sec) | 0.33  | 0.39  | 0.23  | 0.24  | 0.21
  77. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 77 Table 3.4. Comparison based on the orientation Table 3.5. Comparison based on the blurring
  78. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 78 Table 3.7. Comparison based on the occlusion Table 3.6. Comparison based on the illumination
  79. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.9 illustrates the performance of all the techniques in terms of match rate across the different parameters. SIFT performed better for intensity scale, occlusion and blurring, while SURF performed the worst on all parameters. The performance of BRIEF degrades for blurring, orientation, occlusion, and intensity scale, whereas ORB performs least well only on blurring. The results show that FAST is superior to all other techniques on all parameters. Figure 3.9. Performance based on the match rate
  80. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 3.10 shows the time taken by each technique for all the parameters. The results show that SURF takes much more time than the other techniques, while ORB takes the least. Figure 3.10. Performance based on execution time
  81. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques After experimenting with all the techniques, it is concluded that they have evolved from one another. From the experimental results, the techniques have been rated in three categories:
a) Common: match rate < 70%, execution time > 2 sec.
b) Good: match rate 70–89%, execution time 1.5–2 sec.
c) Best: match rate > 90%, execution time < 1.5 sec.
Table 3.8 shows the comparative analysis of all the techniques on this common/good/best scale.

Table 3.8. Comparative analysis of all the techniques
Technique | Processing Time | Affine Transformation | Intensity Scale | Orientation | Blurring | Illumination | Occlusion
SIFT      | Common          | Good                  | Good            | Common      | Common   | Good         | Good
SURF      | Common          | Good                  | Common          | Common      | Common   | Good         | Common
FAST      | Best            | Best                  | Best            | Good        | Best     | Good         | Best
BRIEF     | Best            | Best                  | Good            | Good        | Good     | Best         | Good
ORB       | Best            | Best                  | Good            | Best        | Good     | Best         | Best
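The three rating bands can be expressed as a small rule, useful for labelling new measurements consistently. This is a sketch under the assumption that the narrow gaps between the stated bands (e.g. 89–90% match rate) resolve toward the nearest band; the function name is illustrative.

```python
def categorize(match_rate_pct, time_sec):
    # Thresholds from the comparative study: 'best' needs a high match
    # rate and a fast run; 'common' is the low-match or slow bucket.
    if match_rate_pct >= 90 and time_sec < 1.5:
        return 'best'
    if match_rate_pct >= 70 and time_sec <= 2.0:
        return 'good'
    return 'common'
```

Applied to Tables 3.2 and 3.3, FAST under affine transformation (96.62%, 1.37 s) rates 'best', SIFT (77.23%, 1.79 s) rates 'good', and SURF under intensity scaling (65.52%, 2.21 s) rates 'common', matching Table 3.8.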
  82. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Feature extraction is essential to ISLR, as the system's computational efficiency mainly depends on it. • Five feature detection and extraction techniques for vision-based ISLR have been compared in terms of match rate and processing time on original images and their deformations. • The performance of SURF is not suitable for real-time ISLR, and BRIEF and ORB do not perform well on intensity scale and blurring, although they give fast responses in detection and matching. • SIFT performed better for affine transformation, intensity scale, illumination, and occlusion; however, its performance degrades under orientation change and blurring, and it is slower than FAST, BRIEF and ORB. • FAST provides the most noticeable results for all the variation parameters. But FAST can only detect features; it cannot compute them, which limits its use as a complete algorithm for ISLR. • The limitations of FAST can be overcome by hybridizing it with another algorithm, since a vision-based ISLR system needs accurate and fast results in a real-time environment. • Further, an attempt is made to improve the performance of the existing SIFT pipeline for ISLR.
  83. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 83 A Hybrid approach for Feature Extraction for Vision- Based Indian Sign Language Recognition (Objective-2)
  84. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Existing feature extraction techniques are designed to extract specific features from an image; they may perform exceptionally well in one situation but underperform in others. • The FAST technique detects keypoints accurately and quickly, even in low-resolution images [21][25][26][33]; however, it is not stable to rotation, blurring, and illumination. • SIFT has been used for computing features, making analysis efficient and effective [21][39]; it performs well under these conditions but takes more time for feature extraction [10,19]. • The remarkable results of CNNs in image processing and classification have inspired researchers to apply them to SLR [5][12], and many SLR systems make use of CNNs [48][17][20][24]. CNNs have good generalization capability but are computationally expensive.
  85. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • To overcome the limitations of SIFT and FAST, a hybridization of the two is done. • First, FAST is used to detect keypoints, as it does so rapidly and even in low-resolution images. • The detected keypoints are then computed using SIFT, which is invariant to orientation and blurring; SIFT returns the final keypoints after computation. • These features are then passed to a CNN for classification. The computation of the CNN is reduced because the image now carries only the essential keypoints: instead of convolving every part of the image layer after layer, the CNN effectively works only on the high-intensity pixels. • Only the part of the image where the actual gesture is present is convolved; the remaining pixels carry no value and act as non-trainable parameters. • The proposed model is named FiST_CNN.
  86. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • Figure 4.1 shows the overall architecture of the hybrid FiST_CNN approach for ISL. • It consists of three major phases: data pre-processing, feature extraction, and training and testing of the CNN. • In the first phase, the stored static single-handed images are resized to 224×224, and data augmentation is applied to the resized images. • In the next phase, keypoints are localized by FAST, and the values of these localized keypoints are then computed using SIFT. • Finally, these values are passed to the CNN for training, after which the CNN classifies the images into their respective classes.
  87. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 87 Figure 4.1. Architecture of FiST_CNN for ISL
  88. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques I. Image Resize and Data Augmentation: First, all images are resized to 224×224 pixels to maintain uniformity in the dataset. Data augmentation is then applied to make the system more robust to different image orientations, occlusions, and transformations at different angles and lighting conditions. II. Feature Extraction: In this phase, keypoints are first localized using the FAST computer vision technique. To identify a pixel p as an interest point, a Bresenham circle of 16 pixels is used as a test mask. Every pixel y on this circle may be in one of the following three states [14], as shown in Eq. 4.1:

S(p→y) = d (darker)   if I_y ≤ I_p − T
         b (brighter) if I_y ≥ I_p + T
         s (similar)  if I_p − T < I_y < I_p + T    (4.1)

where I_y is the intensity value of pixel y, I_p is the intensity value of the nucleus (pixel p), and T is the threshold parameter that controls the number of corner responses.
  89. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques The magnitude and direction of the localized points are then computed by SIFT [41]. The vector of localized magnitudes and gradients computed using FiST is split into training and testing groups after data augmentation. This approach extracts only the essential keypoints from the image, setting all other pixel values to 0. III. Data Partitioning: The FiST_CNN approach reduces the chance of overfitting, which arises when the data contains noise. To validate the performance of the FiST_CNN model, the dataset (after augmentation) is divided in a 70:30 ratio.
  90. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques IV. Model Training using CNN: The group of training images (T_r) is passed to the CNN, where convolution and max-pooling operations are applied using Eqs. (4.2)–(4.4):

I_o(x, y) = K * (x − m + 1) * (y − m + 1)    (4.2)
I_o(x, y) = K * x_p * y_p                    (4.3)
I_o(x, y) = K * (x′/n) * (y′/n)              (4.4)

Here I(x, y) ∈ T_r is an input image from the training set, and K kernels of size (m, m) are applied with a stride of (n, n); Eq. (4.2) gives the convolution output size without padding, Eq. (4.3) the size with padded dimensions (x_p, y_p), and Eq. (4.4) the size after n×n max pooling. Normalization is then performed using the ReLU function:

I_o(x, y) = max(0, x′)    (4.5)

The normalized output is flattened into a single vector and fed to the dense layer. A dropout ratio of 0.5 is added at the fully connected layer to avoid over-fitting, and a dense layer with 124 neurons is linked as the fully connected layer.
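The output-size bookkeeping of Eqs. (4.2)–(4.4) can be checked with two small helper functions. These are illustrative names, generalized with the usual padding/stride terms so that Eq. (4.2) is the padding=0, stride=1 special case:

```python
def conv_output(x, y, m, k, padding=0, stride=1):
    # Each of the k kernels of size m x m yields a feature map of
    # ((x - m + 2p)/s + 1) by ((y - m + 2p)/s + 1) activations;
    # with p=0, s=1 this reduces to Eq. (4.2): (x - m + 1)(y - m + 1).
    out_x = (x - m + 2 * padding) // stride + 1
    out_y = (y - m + 2 * padding) // stride + 1
    return k, out_x, out_y

def pool_output(x, y, n):
    # Eq. (4.4): non-overlapping n x n max pooling divides each side by n.
    return x // n, y // n
```

For the 224×224 inputs used here, a 3×3 convolution with 32 kernels and no padding gives 32 maps of 222×222, and 2×2 pooling then halves each side to 111×111.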
  91. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques A Leaky Rectified Linear Unit (Leaky ReLU) is used to introduce non-linearity into the CNN. Categorical cross-entropy is used as the cost function, given in Eq. (4.6):

CE = −log( e^{S_p} / Σ_{j=1}^{C} e^{S_j} )    (4.6)

where S_p is the CNN score for the positive class, C is the number of classes, and S_j is the score for class j. The model is then optimized using Adam, an adaptive gradient-based optimization method. Class probabilities are calculated with the softmax function at the final layer using Eq. (4.7):

f(S)_i = e^{S_i} / Σ_{j=1}^{C} e^{S_j}    (4.7)

The trained FiST_CNN model is then saved and used for predicting the gestures in the testing group.
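The loss and output layer of Eqs. (4.6) and (4.7) can be sketched in NumPy; the function names are illustrative, and the max-shift is a standard numerical-stability trick not spelled out in the equations:

```python
import numpy as np

def softmax(scores):
    # Eq. (4.7): shift by the max score before exponentiating so that
    # large logits do not overflow; the result is unchanged mathematically.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def categorical_cross_entropy(scores, positive_class):
    # Eq. (4.6): CE = -log(e^{S_p} / sum_j e^{S_j}), i.e. the negative
    # log of the softmax probability assigned to the true class.
    return -np.log(softmax(scores)[positive_class])
```

A confident correct score yields a small loss, while scoring the wrong class highly yields a large one, which is exactly the gradient signal Adam then follows during training.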
  92. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 92
  93. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 93
  94. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • For an extensive evaluation of the proposed algorithm, four publicly available datasets were used. • The proposed work has been tested on: a) uniform-background datasets and b) complex-background datasets. • The uniform datasets are ISL and MNIST; the complex-background datasets are Jochen Triesch's (JTD) and NUS hand posture-II. • Data augmentation is applied to both the uniform and complex datasets.
  95. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. MNIST - This dataset contains images for numeric (0 to 9) gestures and is available at MNIST [53]. It has 2062 images, with 206 images per gesture, as shown in Figure 4.2. 95 Figure 4.2. Sample images from MNIST dataset
  96. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques ii. ISL - This dataset contains images for alphabet gestures except for J and Z, as they require motion. This dataset has been taken from a GitHub project [54]. It consists of 4962 images with more than 200 images per gesture. Sample images are shown in Figure 4.3. 96 Figure 4.3. Sample images of ISL dataset
  97. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques i. Jochen Triesch (JTD) - This dataset contains static gestures collected from 24 subjects against dark, light, and complex backgrounds. Sample images are shown in Figure 4.4; the dataset is available at [35]. The images are converted to greyscale before applying the proposed approach. The dataset has a total of 2127 images in ten different classes. 97 Figure 4.4. Sample images from JTD
  98. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques ii. NUS-II - This is another gesture dataset with complex backgrounds, containing separate training and test images. The gestures in the training set were collected from 40 subjects against complex backgrounds and comprise 2000 images in 10 different classes; samples are shown in Figure 4.5. The test set has 750 images collected from 15 subjects under different lighting conditions. It is available at [44]. 98 Figure 4.5. Sample images from NUS hand posture-II dataset
  99. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques For the evaluation of FiST_CNN, the following performance metrics are considered: i. Accuracy: the number of correct predictions made by the model over all predictions made; the accuracy of FiST_CNN is computed from correct gesture predictions. ii. Confusion Matrix: used to summarize performance at the classification stage on a set of validation data against the true labels from training. iii. Computational Time: the total processing time of the model, from image pre-processing to label prediction.
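The first two metrics can be sketched together, since accuracy is just the diagonal mass of the confusion matrix; the function names are illustrative:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] counts validation gestures of true class i predicted as j;
    # off-diagonal cells show which signs are confused with which.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    # Correct predictions sit on the diagonal of the confusion matrix.
    return np.trace(cm) / cm.sum()
```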
  100. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques • The algorithm has been implemented in Python 3 (Jupyter Notebook), and the simulation is run on an Intel® Core™ processor with 8 GB RAM, 256 KB cache per core (3 MB cache in total), and a GPU with 1536 MB VRAM. • The dataset is split into training (70%) and testing (30%) parts as per industry standards. • The main objective of the performance analysis of FiST_CNN is to maximize the accuracy of the model while reducing computational complexity.
  101. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 4.7. Accuracy comparison per epochs for alphabet set 101 Figure 4.6. Accuracy comparison of FiST_CNN, CNN and SIFT_CNN
  102. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques Figure 4.8. Accuracy comparison per epochs for numbers

Table 4.1. Accuracy matrix for FiST_CNN
Model          | Training Accuracy | Validation Accuracy
Alphabet model | 97.89             | 95.43
Numbers model  | 95.68             | 92.83
  103. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 103 Figure 4.9. Accuracy evaluation for ISL alphabets Figure 4.10. Loss evaluation for ISL alphabets
  104. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 104 Figure 4.12. Loss evaluation for numbers Figure 4.11. Accuracy evaluation for numbers
  105. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 105 Figure 4.13. Time comparison of FiST_CNN, CNN and SIFT_CNN
  106. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 106 Table 4.2. The feature vector for the ISL alphabet Table 4.3. The feature vector for the ISL number
  107. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 107 Figure 4.14 Recognition accuracy on a uniform background Figure 4.15 Recognition accuracy on a complex background
  108. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques

Table 4.4. Comparative evaluation of FiST_CNN
Author   | SL   | Technique              | Dataset  | Background           | Accuracy
[93]     | ISL  | SIFT_CNN               | 5000     | Uniform              | 92.78%
[208]    | ISL  | HOG                    | 1300     | Uniform              | 92.20%
[181]    | ISL  | HOG                    | 780      | Uniform              | <80%
[209]    | ARSL | Skin-blob tracking     | 30 signs | Uniform              | 97%
[161]    | ISL  | CNN                    | 35000    | Uniform              | 99.72%
[38]     | ISL  | CNN                    | 52000    | Uniform              | 99.40%
[210]    | ASL  | Gabor-edge             | 720      | Dark, light, complex | 86.2%
[211]    | ASL  | MCT                    | 720      | Uniform, complex     | 99.2%, 89.8%
[212]    | ASL  | MOGP                   | -        | Complex              | 91.4%
[213]    | ISL  | Krawtchouk             | 1865     | Uniform              | 97.9%
[142]    | ISL  | TOPSIS                 | 2600     | Complex              | 92%
[22]     | ASL  | Fusion (HOG+LBP)       | 2000     | Complex              | 95.09%
[232]    | ASL  | Deep learning with CNN | 2000     | Complex              | 94%
Proposed | ISL  | FiST_CNN               | 4962     | Uniform              | 95.56%
  109. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques

Table 4.5. Comparison of work on the JTD and NUS-II datasets
Dataset | Author name/Approach used | Classifier        | Accuracy
JTD     | Triesch et al. [210]      | Gabor edge filter | 86.2%
JTD     | MCT [211]                 | AdaBoost          | 98%
JTD     | MOGP [212]                | SVM               | 91.4%
JTD     | LHFD [129]                | SVM               | 95.2%
JTD     | Cubic kernel [215]        | CNN               | 91%
JTD     | Joshi et al. [142]        | SVM               | 92%
JTD     | Kelly et al. [216]        | SVM               | 93%
JTD     | X. Y. Wu [217]            | CNN               | 98.02%
JTD     | FiST_CNN                  | CNN               | 94.78%
NUS-II  | Kaur et al. [213]         | SVM               | 92.50%
NUS-II  | Adithya et al. [232]      | SVM               | 92.50%
NUS-II  | Pisharady et al. [207]    | SVM               | 94.36%
NUS-II  | Haile et al. [218]        | RTDD              | 90.66%
NUS-II  | Kumar et al. [219]        | SVM               | 94.6%
NUS-II  | Zhang et al. [22]         | -                 | 95.07%
NUS-II  | FiST_CNN                  | CNN               | 95.56%
110. Figure 4.16. Confusion matrix of FiST_CNN for ISL alphabets. Figure 4.17. Confusion matrix of FiST_CNN for ISL Numbers
111. Figure 4.18. Confusion matrix for NUS hand posture-II dataset. Figure 4.19. Confusion matrix for JTD dataset
112. Table 4.6. Precision, Recall and F1 score for FiST_CNN (%)
Sign | Precision | Recall | F1 score
A | 100 | 100 | 100
B | 100 | 100 | 100
C | 100 | 100 | 100
D | 98 | 100 | 99
E | 100 | 100 | 100
F | 100 | 100 | 100
G | 100 | 100 | 100
H | 100 | 100 | 100
I | 100 | 100 | 100
K | 100 | 100 | 100
L | 100 | 97 | 98
M | 100 | 100 | 100
N | 99 | 100 | 99
O | 100 | 100 | 100
P | 100 | 100 | 100
Q | 100 | 100 | 100
R | 100 | 100 | 100
S | 100 | 100 | 100
T | 100 | 99 | 99
U | 100 | 92 | 96
V | 86 | 100 | 93
W | 100 | 95 | 97
X | 100 | 100 | 100
Y | 100 | 100 | 100
ZERO | 98 | 98 | 98
ONE | 98 | 100 | 99
TWO | 98 | 88 | 92
THREE | 100 | 96 | 98
FOUR | 90 | 94 | 92
FIVE | 95 | 100 | 97
SIX | 87 | 94 | 90
SEVEN | 92 | 87 | 89
EIGHT | 90 | 96 | 93
NINE | 100 | 96 | 98
113. Table 4.7. Precision, Recall and F1 score for JTD
Sign | Precision | Recall | F1 score
a | 0.64 | 0.82 | 0.71
b | 0.81 | 0.68 | 0.73
c | 0.78 | 0.76 | 0.76
d | 0.80 | 0.63 | 0.70
i | 0.69 | 0.60 | 0.64
l | 0.78 | 0.79 | 0.78
g | 0.58 | 0.80 | 0.67
h | 0.72 | 0.69 | 0.70
v | 0.69 | 0.71 | 0.69
y | 0.92 | 0.82 | 0.86

Table 4.8. Precision, Recall and F1 score for NUS-II
Sign | Precision | Recall | F1 score
a | 0.77 | 0.73 | 0.74
b | 0.50 | 0.66 | 0.56
c | 0.53 | 0.61 | 0.56
d | 0.63 | 0.65 | 0.63
e | 0.76 | 0.50 | 0.60
f | 0.80 | 0.68 | 0.73
g | 0.69 | 0.71 | 0.69
h | 0.70 | 0.76 | 0.72
i | 0.62 | 0.71 | 0.66
j | 0.73 | 0.83 | 0.77
114. • A hybrid technique, FiST_CNN, has been developed for effective and efficient feature extraction from static ISL gestures.
• First, keypoints are detected rapidly using FAST. SIFT descriptors are then computed at these keypoints to make them distinctive and invariant. Finally, classification is performed using a CNN.
• The performance of the proposed FiST_CNN has been compared with two other techniques, CNN and SIFT_CNN [93].
• The results in Section 4.4 show that FiST_CNN outperforms both CNN and SIFT_CNN [93] in terms of accuracy and computation time. FiST_CNN achieved accuracies of 97.89%, 95.68%, 94.90% and 95.87% for ISL alphabets, MNIST, JTD and NUS-II, respectively.
• Although the proposed hybrid technique is effective and efficient for feature extraction of ISL gestures, there is still scope for further reducing the number of features for efficient recognition of various ISL gestures.
  115. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 115
116. • The analysis of the shape and geometry of the hand provides the essential features of the hand, as shown in Figure 5.1.
• These methods have shown impressive results, giving elevated recognition accuracy without using any sensor devices.
• They follow the state-of-the-art approach of locating a set of essential keypoints, representing coordinate positions, with the help of neural network models.
• In view of ISLR, certain points need to be considered before selecting the image pre-processing technique. Extracting the hand shape accurately not only enhances accuracy but also reduces space and time complexity.
• The hand shape can be found using pre-processing techniques such as hand segmentation, binary hand, hand contour and 3D hand models.
• The essential keypoints of hand motion are hand coordinates, motion trajectories, and two-hand motion.
117. Figure 5.1. Hand Anatomy with 21 keypoints
118. • The proposed work is a three-stage algorithm based on hand anatomy and geometry.
• First, the palm is detected in the image.
• In the second stage, keypoints are detected on the gesture using state-of-the-art techniques.
• In the third stage, prior information about geometrical features and hand kinematics is used to locate the 21 keypoints on the hand.
119. • For further training and classification of gestures, a Neural Network (NN) is used.
• The dataset is collected through a camera at different orientations, scales, and illuminations for ISL words belonging to education, medical and other real-life usage.
• These steps are repeated for all the training images, and results are then generated over the testing images.
120. Figure 5.2. Flowchart of FiST_HGNN
121. Palm detection: The basic principle is that an object in an image has pixels of similar intensity within a particular region. An image is divided into x × x grids; the value of x is calculated from the image size. This process is iterated, for all the images in the dataset, until the hand part is extracted. This stage takes all image pixels as input and can be formulated as:

I_1 = {P_v, f(x)}, x ∈ {1, …, x − 1}   (5.1)

where I_1 is the image output of this phase, P_v denotes the pixel values of the image, and f(x) are the features extracted from image x. This phase enables inferring the hand part, making it easier to detect the keypoints.
122. The next step is to detect keypoints and remove the redundant pixels. For each detected point, the sum of the absolute differences between the centre pixel and the pixels on the contiguous arc is computed as a score function v. When the v values of two neighbouring keypoints are compared, the one with the lower value is ignored:

I_2(x, y) = f(x) · { 0 if v(x, y) ≤ τ; 1 if v(x, y) > τ }   (5.2)

where τ is the threshold value; pixels with scores greater than τ are selected for further processing, and the others are discarded. The next step is to localise the detected keypoints. Blurred image octaves are then created using the Gaussian blur operator. The scale-space function is defined as Eq. (5.3):

L(x, y, σ) = G(x, y, σ) ∗ I_2(x, y)   (5.3)
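The score thresholding of Eq. (5.2) can be sketched as a simple mask over a score map v; the score values and τ below are made-up numbers purely for illustration.

```python
import numpy as np

def threshold_keypoints(v, tau):
    """Eq. (5.2): keep pixels whose score v(x, y) exceeds the threshold tau."""
    return (v > tau).astype(np.uint8)  # 1 = keypoint candidate, 0 = discarded

v = np.array([[0.2, 0.9, 0.4],
              [0.7, 0.1, 0.8],
              [0.3, 0.6, 0.5]])
mask = threshold_keypoints(v, tau=0.5)   # 4 candidate pixels survive
```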
123. where ∗ denotes the convolution operator, and G(x, y, σ) denotes the variable-scale Gaussian. To find the scale-invariant keypoints, the Laplacian-of-Gaussian (LoG) approximation is applied. At this phase, the final output is the image with keypoints located on the hand, formulated as:

I_2 = Σ_{k_i} || P_H(k_i) − L(k_i) ||_2   (5.4)

where k_i denotes the keypoints, P_H(k_i) denotes the pixels with higher threshold values, and L(k_i) denotes the pixels with lower threshold values; || P_H(k_i) − L(k_i) ||_2 denotes the L2 normalisation of the pixels. The output of this stage is the image with keypoints located on the higher-intensity pixels.
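A scale-space octave and its difference-of-Gaussian (DoG) approximation to the LoG, as in Eqs. (5.3)–(5.4), can be sketched in plain NumPy. The kernel radius, σ values and number of scales are illustrative assumptions, not the thesis settings.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian convolution G(x, y, sigma) * I (Eq. 5.3)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def dog_octave(img, sigma0=1.6, scales=4):
    """Blur at geometrically increasing sigmas and subtract adjacent levels (LoG approximation)."""
    levels = [gaussian_blur(img, sigma0 * (2 ** (i / scales))) for i in range(scales + 1)]
    return [levels[i + 1] - levels[i] for i in range(scales)]

img = np.random.default_rng(0).random((32, 32))
dogs = dog_octave(img)   # one DoG image per adjacent pair of blur levels
```

Extrema of these DoG images across position and scale are the scale-invariant keypoint candidates.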
124. After detecting the keypoints, a two-dimensional array of coordinates is obtained, where each coordinate corresponds to one of the keypoints on the hand. The location of the k_i-th keypoint is denoted by P_{k_i} = (x_{k_i}, y_{k_i}), and the location of the wrist is denoted as P_w = (x_w, y_w). To locate the wrist coordinate, the centre between the two extreme points (w_x and w_y) is calculated as:

w_xy = θ · (w_x + w_y) / 2   (5.5)

where w_xy is the wrist centre point, θ is the angle associated with the wrist, and w_x and w_y are the wrist coordinates on the x and y axes, respectively.
125. Further, a template is created for each joint using the method mentioned in [60]. The distance of each joint is calculated as:

D_j = √( x_{k_i}² + y_{k_i}² − 2 · x_{k_i} · y_{k_i} · cos φ ),  when φ ≥ 0   (5.6)

φ = 2w² − (x_{k_i} − y_{k_i})²   (5.7)

where x_{k_i} and y_{k_i} are the distances along the x and y coordinates, respectively, with x_{k_i} = min(D_{x+1,y}, D_{x−1,y}) and y_{k_i} = min(D_{x,k_i+1}, D_{x,k_i−1}), and φ denotes the angle of the wrist with respect to the x and y axes.
126. • Distance between coordinates:
d(c_i, c_j) = √( (x_i − x_j)² + (y_i − y_j)² )
Example: for two keypoints at (143.013, 105.53) and (113.916, 94.364):
d = √( 29.097² + 11.166² ) = √971.31 ≈ 31.17

• Angle between coordinates:
cos θ = (x_p·x_q + y_p·y_q + z_p·z_q) / ( √(x_p² + y_p² + z_p²) · √(x_q² + y_q² + z_q²) )
Example: for vectors p = (1, 0, 1) and q = (1, 1, 0) for joint (12, 18):
cos θ = (1·1 + 0·1 + 1·0) / (√2 · √2) = 1/2, so θ = 60°

Figure 5.3. Distance and angle between the coordinates
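The distance and angle computations of Figure 5.3 can be checked directly in a few lines; the coordinates and vectors are the ones from the worked example above.

```python
import math

def distance(c1, c2):
    """Euclidean distance between two keypoint coordinates."""
    return math.dist(c1, c2)

def angle_deg(p, q):
    """Angle between two joint vectors via the normalised dot product."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.degrees(math.acos(dot / norm))

d = distance((143.013, 105.53), (113.916, 94.364))   # ≈ 31.17
theta = angle_deg((1, 0, 1), (1, 1, 0))              # 60°
```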
127. Coordinate position | Set of joints from the coordinate position | Number of values extracted
C1 | (c1,c2)(c1,c3)(c1,c4)(c1,c5)(c1,c6)(c1,c7)(c1,c8)(c1,c9)(c1,c10)(c1,c11)(c1,c12)(c1,c13)(c1,c14)(c1,c15)(c1,c16)(c1,c17)(c1,c18)(c1,c19)(c1,c20)(c1,c21) | 20
C2 | (c2,c3)(c2,c4)(c2,c5)(c2,c6)(c2,c10)(c2,c14) | 6
C3 | (c3,c4)(c3,c5) | 2
C4 | (c4,c5) | 1
C5 | (c5,c9)(c5,c13)(c5,c17)(c5,c21) | 4
C6 | (c6,c7)(c6,c8)(c6,c9)(c6,c10)(c6,c14) | 5
C7 | (c7,c8)(c7,c9) | 2
C8 | (c8,c9) | 1
C9 | (c9,c13)(c9,c17)(c9,c21) | 3
C10 | (c10,c11)(c10,c12)(c10,c13)(c10,c14) | 4
C11 | (c11,c12)(c11,c13) | 2
C12 | (c12,c13) | 1
C13 | (c13,c17)(c13,c21) | 2
C14 | (c14,c15)(c14,c16)(c14,c17)(c14,c18)(c14,c19)(c14,c20)(c14,c21) | 7
C15 | (c15,c16)(c15,c17) | 2
C16 | (c16,c17) | 1
C17 | (c17,c21) | 1
C18 | (c18,c19)(c18,c20)(c18,c21) | 3
C19 | (c19,c20)(c19,c21) | 2
C20 | (c20,c21) | 1
C21 | {} | 0
128. Coordinate position | Set of angles from the coordinate position | Number of values extracted
C1 | (c1,c2)(c1,c6)(c1,c10)(c1,c14)(c1,c18)(c1,c21)(c1,c5)(c1,c9)(c1,c13)(c1,c17) | 10*3 = 30
C2 | (c2,c6)(c2,c7)(c2,c8)(c2,c9)(c2,c11)(c2,c12)(c2,c13)(c2,c15)(c2,c16)(c2,c17)(c2,c18)(c2,c19)(c2,c20)(c2,c21) | 14*3 = 42
C3 | (c3,c6)(c3,c7)(c3,c8)(c3,c9)(c3,c10)(c3,c11)(c3,c12)(c3,c13)(c3,c14)(c3,c15)(c3,c16)(c3,c17)(c3,c18)(c3,c19)(c3,c20)(c3,c21) | 14*3 = 42
C4 | (c4,c6)(c4,c7)(c4,c8)(c4,c9)(c4,c10)(c4,c11)(c4,c12)(c4,c13)(c4,c14)(c4,c15)(c4,c16)(c4,c17)(c4,c18)(c4,c19)(c4,c20)(c4,c21) | 14*3 = 42
C5 | (c5,c6)(c5,c7)(c5,c8)(c5,c9)(c5,c10)(c5,c11)(c5,c12)(c5,c13)(c5,c14)(c5,c15)(c5,c16)(c5,c17)(c5,c18)(c5,c19)(c5,c20)(c5,c21) | 14*3 = 42
C6 | (c6,c10)(c6,c11)(c6,c12)(c6,c13)(c6,c14)(c6,c15)(c6,c16)(c6,c17)(c6,c18)(c6,c19)(c6,c20)(c6,c21) | 12*3 = 36
C7 | (c7,c10)(c7,c11)(c7,c12)(c7,c13)(c7,c14)(c7,c15)(c7,c16)(c7,c17)(c7,c18)(c7,c19)(c7,c20)(c7,c21) | 12*3 = 36
C8 | (c8,c10)(c8,c11)(c8,c12)(c8,c13)(c8,c14)(c8,c15)(c8,c16)(c8,c17)(c8,c18)(c8,c19)(c8,c20)(c8,c21) | 12*3 = 36
129. C9 | (c9,c10)(c9,c11)(c9,c12)(c9,c13)(c9,c14)(c9,c15)(c9,c16)(c9,c17)(c9,c18)(c9,c19)(c9,c20)(c9,c21) | 12*3 = 36
C10 | (c10,c14)(c10,c15)(c10,c16)(c10,c17)(c10,c18)(c10,c19)(c10,c20)(c10,c21) | 8*3 = 24
C11 | (c11,c14)(c11,c15)(c11,c16)(c11,c17)(c11,c18)(c11,c19)(c11,c20)(c11,c21) | 8*3 = 24
C12 | (c12,c14)(c12,c15)(c12,c16)(c12,c17)(c12,c18)(c12,c19)(c12,c20)(c12,c21) | 8*3 = 24
C13 | (c13,c14)(c13,c15)(c13,c16)(c13,c17)(c13,c18)(c13,c19)(c13,c20)(c13,c21) | 8*3 = 24
C14 | (c14,c18)(c14,c19)(c14,c20)(c14,c21) | 4*3 = 12
C15 | (c15,c18)(c15,c19)(c15,c20)(c15,c21) | 4*3 = 12
C16 | (c16,c18)(c16,c19)(c16,c20)(c16,c21) | 4*3 = 12
C17 | (c17,c18)(c17,c19)(c17,c20)(c17,c21) | 4*3 = 12
C18 | (c18,c19)(c18,c20)(c18,c21) | 3*3 = 9
C19 | (c19,c20)(c19,c21) | 2*3 = 6
C20 | (c20,c21) | 1*3 = 3
C21 | {} | 0
130. Contd.
131. Contd.
132. A set of labelled training samples is provided, assuming that in the feature space these features cluster around multiple centres. FiST_HGNN is a feedforward network with four hidden layers: sixty-three neurons at the input layer; 128, 64, 32 and 16 neurons at the first, second, third and fourth hidden layers, respectively; and twenty-five neurons at the output layer, as shown in Figure 5.6. The sixty-three input neurons correspond to the vector k_i for each gesture category g_c.

Figure 5.6. Architecture of NN
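The 63–128–64–32–16–25 architecture described above can be sketched as a NumPy forward pass with random weights. The ReLU hidden activations and softmax output are assumptions for illustration; the slide does not specify the activation functions.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [63, 128, 64, 32, 16, 25]          # input, four hidden layers, output
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Feedforward pass: ReLU on hidden layers, softmax over the 25 classes."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)     # hidden layer + ReLU
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

probs = forward(rng.random(63))            # one 63-D keypoint feature vector
```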
  133. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 133
  134. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 134
135. Prediction graph of FiST_HGNN
136. • To validate the performance of the proposed model, two types of datasets are considered.
• The first category consists of isolated single-letter gestures; the second category consists of ISL word gestures with both uniform and complex backgrounds.
• For isolated hand-letter signs, gestures representing alphabets and digits from ISL [22][23] and the NUS-II [25] dataset are considered.
• In the two-handed ISL category, ISL alphabets, digits and word [26] gestures have been considered.
• A detailed description of the datasets used is given in Table 5.3.
137. Table 5.3. Description of dataset
138. I. Performance metrics: The performance of the proposed model (FiST_HGNN) is evaluated based on accuracy and the time taken by the model. Accuracy gives the percentage of correct predictions out of all predictions made:

Accuracy = (TP + TN) / (TP + FP + TN + FN)   (5.8)

The time taken by the model is the total time for training and testing the gestures.
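The metric can be computed directly from confusion-matrix counts; the counts below are made-up numbers for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

acc = accuracy(tp=90, tn=5, fp=3, fn=2)   # 95 correct out of 100 -> 0.95
```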
139. Figure 5.9. Accuracy comparison on isolated single-letter gestures
Accuracy (%): ISL-alphabet — FiST_HGNN 97.58, FiST_CNN 94.75, PointBased+Fullhand 95.00; Number — FiST_HGNN 95.68, FiST_CNN 89.23, PointBased+Fullhand 92.34.
• Improvement of FiST_HGNN on ISL-alphabet: 2.83% over FiST_CNN, 2.58% over PointBased+Fullhand.
• Improvement of FiST_HGNN on numbers: 6.45% over FiST_CNN, 3.34% over PointBased+Fullhand.
140. Figure 5.10. Accuracy comparison on ISL word gestures
Accuracy (%): Sign_word dataset — FiST_HGNN 98.88, FiST_CNN 94.00, PointBased+Fullhand 96.19; ISL alphabets and numbers — FiST_HGNN 98.00, FiST_CNN 95.03, PointBased+Fullhand 96.00; Ankita_wadhwan — FiST_HGNN 97.14, FiST_CNN 94.23, PointBased+Fullhand 95.47.
• Improvement of FiST_HGNN on Sign_word: 4.88% over FiST_CNN, 2.69% over PointBased+Fullhand.
• On ISL alphabets and numbers: 2.97% over FiST_CNN, 2% over PointBased+Fullhand.
• On Ankita Wadhwan: 2.91% over FiST_CNN, 1.67% over PointBased+Fullhand.
141. Table 5.4. Time comparison on different datasets
Dataset | FiST_HGNN (s) | FiST_CNN [90] (s) | PointBased+Fullhand [221] (s)
Isolated single letter: ISL alphabets [226] and digits [205] | 3217.43 | 4628.34 | 3924.45
Isolated single letter: NUS-II [207] | 778.54 | 889.56 | 990.76
ISL word gestures: ISL alphabets and numbers [227] | 2287.67 | 2987.90 | 3130.87
ISL word gestures: Sign-Word [228] | 3278.65 | 5678.89 | 3657.34
ISL word gestures: Ankita Wadhwan [161] | 1543.22 | 3189.43 | 2164.59
142. Table 5.5. Accuracy comparison with other approaches
Dataset | Author name/Approach used | Classifier | Accuracy (%)
NUS-II [207] | Kaur et al. [213] | SVM | 92.50
NUS-II [207] | Adithya et al. [232] | SVM | 92.5
NUS-II [207] | Pisharady et al. [207] | SVM | 94.36
NUS-II [207] | Kumar et al. [219] | SVM | 94.6
NUS-II [207] | FiST_HGNN | NN | 95.78
ISL alphabets [226] and digits [205] | Ansari and Harit [225] | NN | 63.78
ISL alphabets [226] and digits [205] | Kaur et al. [213] | SVM | 90
ISL alphabets [226] and digits [205] | Joshi et al. [142] | SVM | 93.4
ISL alphabets [226] and digits [205] | Rekha et al. [151] | SVM | 91.3
ISL alphabets [226] and digits [205] | Rao et al. [92] | ANN | 90
ISL alphabets [226] and digits [205] | FiST_HGNN | NN | 97.58
143. Figure 5.11. Accuracy comparison of different classifiers on the isolated single-letter dataset
Accuracy (%): SVM 95.15, MLP 92.68, KNN 94.19, NN 97.58.
144. Figure 5.12. Accuracy comparison of different classifiers on double-handed ISL alphabets and numbers
Accuracy (%): SVM 97.13, MLP 94.68, KNN 93.19, NN 98.00.
145. Figure 5.13. Confusion matrix on isolated alphabets and numbers
146. Figure 5.14. Confusion matrix on NUS-II. Figure 5.15. Confusion matrix on sign word dataset
147. Figure 5.16. Confusion matrix on double-handed alphabets and numbers
148. Table 5.6. Precision, Recall and F1 Score for ISL alphabets and numbers
Sign | Precision | Recall | F1 score
0 | 0.86 | 0.96 | 0.91
1 | 0.87 | 0.87 | 0.87
2 | 1.00 | 0.92 | 0.97
3 | 1.00 | 1.00 | 1.00
4 | 0.96 | 0.92 | 0.94
5 | 1.00 | 1.00 | 1.00
6 | 0.83 | 0.83 | 0.83
7 | 0.90 | 0.90 | 0.90
8 | 1.00 | 0.83 | 0.91
9 | 0.97 | 0.94 | 0.96
A | 0.98 | 1.00 | 0.99
B | 1.00 | 1.00 | 1.00
C | 0.97 | 0.97 | 0.97
D | 0.93 | 0.96 | 0.95
E | 1.00 | 1.00 | 1.00
F | 1.00 | 1.00 | 1.00
G | 1.00 | 1.00 | 1.00
H | 1.00 | 1.00 | 1.00
I | 0.95 | 1.00 | 0.97
K | 0.96 | 0.98 | 0.97
L | 1.00 | 0.97 | 0.99
M | 1.00 | 1.00 | 1.00
N | 1.00 | 1.00 | 1.00
O | 0.97 | 0.94 | 0.96
P | 1.00 | 0.98 | 0.99
Q | 1.00 | 1.00 | 1.00
R | 0.95 | 0.97 | 0.96
S | 1.00 | 0.98 | 0.99
T | 1.00 | 1.00 | 1.00
U | 1.00 | 1.00 | 1.00
V | 1.00 | 1.00 | 1.00
W | 0.93 | 1.00 | 0.96
X | 0.97 | 1.00 | 0.99
Y | 1.00 | 1.00 | 1.00
149. Table 5.7. Precision, Recall and F1 Score for sign word dataset
Sign | Precision | Recall | F1 score
Call | 1.00 | 1.00 | 1.00
Close | 1.00 | 0.99 | 1.00
Cold | 1.00 | 1.00 | 1.00
Correct | 1.00 | 1.00 | 1.00
fine | 0.99 | 1.00 | 1.00
Help | 1.00 | 1.00 | 1.00
Home | 1.00 | 1.00 | 1.00
ILoveYou | 1.00 | 1.00 | 1.00
Like | 1.00 | 1.00 | 1.00
Love | 1.00 | 1.00 | 1.00
No | 1.00 | 1.00 | 1.00
Okk | 1.00 | 1.00 | 1.00
Please | 1.00 | 1.00 | 1.00
Single | 1.00 | 1.00 | 1.00
Sit | 0.99 | 1.00 | 1.00
Tall | 1.00 | 1.00 | 1.00
Wash | 1.00 | 0.99 | 0.99
Work | 1.00 | 1.00 | 1.00
Yes | 0.99 | 1.00 | 1.00
You | 1.00 | 1.00 | 1.00
150. Table 5.9. Precision, Recall and F1 Score for NUS-II
Sign | Precision | Recall | F1 score
a | 1.00 | 1.00 | 1.00
b | 0.98 | 1.00 | 0.99
c | 0.97 | 0.97 | 0.97
d | 1.00 | 1.00 | 1.00
i | 1.00 | 0.95 | 0.97
l | 1.00 | 0.97 | 0.98
g | 1.00 | 1.00 | 1.00
h | 0.95 | 0.97 | 0.96
v | 0.93 | 0.95 | 0.94
y | 1.00 | 1.00 | 1.00
151. • A hand anatomy-based technique for recognising ISL gestures has been proposed. FiST_HGNN hybridises the earlier FiST with hand geometry to recognise ISL gestures.
• The FiST technique provides rapid detection of keypoints and generates a feature vector of 128 keypoints, from which the relevant twenty-one keypoints are selected using hand geometry.
• A multilayer feedforward NN is used as the classifier. FiST_HGNN was tested on two ISL datasets (isolated single letters and words).
• FiST_HGNN achieves an accuracy of 98.90% for isolated gestures and 97.58% for word gestures, outperforming the other approaches in the literature.
• However, FiST_HGNN has been tested on only a few real-world gestures. In the next chapter, it is applied to functional gestures commonly used in real-world situations by the deaf and mute community.
  152. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 152
153. 6.1.1 Self-Created Dataset
The dataset contains RGB images of hand gestures for twenty static ISL words, namely ‘afraid’, ‘agree’, ‘assistance’, ‘bad’, ‘become’, ‘college’, ‘doctor’, ‘from’, ‘pain’, ‘pray’, ‘secondary’, ‘skin’, ‘small’, ‘specific’, ‘today’, ‘stand’, ‘warn’, ‘which’, ‘work’ and ‘you’, which are commonly used to convey messages or seek support in real life [229]. The captured gestures are chosen from the ISL dictionary [58]. The images were gathered from 8 individuals (6 males and 2 females) with ages ranging from 9 to 30 years. Nine hundred images are captured for each gesture, giving a total of 18000 images.
154. Figure 6.1. Categorization of the self-created dataset
155. Figure 6.2. Sample images of the self-created dataset
156. Table 6.1. Specification of the self-created dataset
Subject: Computer Vision and Pattern Recognition (CVPR)
Specific subject type: SL recognition
Type of data: Images (200×200 pixels, JPG format)
Data acquisition method: The images were captured by asking participants to stand comfortably in front of a wall, using a smart camera (iPhone XI).
Data format: Labelled RGB images
Data collection parameters: All images were captured against a plain background. The collection comprised both male and female volunteers with a range of hand sizes. Images were collected in an indoor environment under normal lighting conditions. To keep the gesture displays as genuine as possible, no limitations on the pace of hand motions were enforced.
Data source location: BLOOM Speech and Hearing Clinic, Dehradun, Uttarakhand, India.
Data accessibility: Data can be accessed through the Mendeley link (Tyagi, Akansha; Bansal, Sandhya (2022), “Indian Sign Language – Real-life Words”, Mendeley Data, V2, doi:10.17632/s6kgb6r3ss.2).
157. Table 6.2. Organization of the images in the self-created dataset
Folder: Afraid — Files: afraid_1_user1_1 to afraid_900_user6_150 — Nine hundred samples of the ‘Afraid’ gesture taken from six users.
Folder: Agree — Files: agree_1_user1_1 to agree_900_user6_150 — Nine hundred samples of the ‘Agree’ gesture taken from six users.
Folder: Assistance — Files: assistance_1_user1_1 to assistance_900_user6_150 — Nine hundred samples of the ‘Assistance’ gesture taken from six users.
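The naming convention in Table 6.2 (e.g. afraid_1_user1_1) can be split with a small helper; the interpretation of the fields as gesture label, sample number, user and capture index follows the table and is otherwise an assumption.

```python
import re

# Pattern for names like "afraid_273_user4_23" (hypothetical example name).
NAME_RE = re.compile(r"^(?P<gesture>[a-z]+)_(?P<sample>\d+)_user(?P<user>\d+)_(?P<idx>\d+)$")

def parse_name(name):
    """Split a dataset file name into gesture label, sample number, user and index."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    d = m.groupdict()
    return d["gesture"], int(d["sample"]), int(d["user"]), int(d["idx"])

label, sample, user, idx = parse_name("afraid_273_user4_23")
```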
158. Figure. Authorization letter for data collection (BLOOM Speech and Hearing Clinic, Dehradun)
Recoverable text of the letter: “BLOOM Speech and Hearing Clinic — Anjali Subramanium, Audiologist, Speech Language Pathologist, ABA Certified (RBT), RCI Registration CRR No. A60779. To whomever it may concern: This is to certify that Ms. Akansha, a Ph.D. research scholar, has collected a dataset from our clinic. They are Hard of Hearing patients, and we don’t have any obligations regarding the dataset collection. This letter can be considered as the No Objection Certificate. She can use the dataset in her research publication. 118, Preet Vihar, Phase-2, Indra Gandhi Marg, Niranjanpur, Dehradun (U.K.)”
159. Figure 6.3. Categorization of the gestures for existing dataset
160. i. Medical: Gestures such as ‘Elbow’, ‘Help’, ‘Skin’, ‘Call’, ‘Doctor’, ‘Hot’, ‘Lose’, ‘Pain’, ‘Leprosy’, ‘Tobacco’, ‘Keep’, ‘Assistance’, ‘Beside’, ‘Glove’ and ‘Sample’ are mainly used in the medical field. Sample gestures are shown in Figure 6.4.
ii. Measurement: Gestures such as ‘High’, ‘How_Many’, ‘Thick’, ‘Thin’, ‘Density’, ‘Measure’, ‘Quantity’, ‘Few’, ‘Size’, ‘Unit’, ‘Little’, ‘Small’, ‘Weight’, ‘Gram’ and ‘Short’. Sample images from the dataset are shown in Figure 6.5.
Figure 6.4. Sample images for medical. Figure 6.5. Sample images for measurement
  161. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 161
162. Figure 6.12. Accuracy comparison on ISL daily-life words dataset [229]
Accuracy (%): CNN 94.23, FiST_CNN 89.78, FiST_HGNN 98.78.
• Improvement of FiST_HGNN: 4.55% over CNN, 9% over FiST_CNN.
163. Figure 6.13. Accuracy comparison of different techniques for individual gestures from the ISL daily-life words dataset
164. Figure 6.14. Accuracy comparison of different classifiers (SVM, MLP, KNN, NN) on ISL daily-life words
165. Table 6.3. Comparison based on the number of features for the self-created dataset
Technique | No. of Images | Image Size | Time (sec) | Parameters | Trainable Parameters | Non-trainable Parameters
CNN | 18000 | 200×200 | 4789.9 | 72×10⁷ | 72×10⁷ | 0
FiST_CNN [90] | 18000 | 200×200 | 4278.32 | 72×10⁷ | 23×10⁶ | 71×10⁶
FiST_HGNN | 18000 | 200×200 | 3789.75 | 11×10⁵ | 11×10⁵ | 0
166. Figure 6.15. Confusion matrix on self-created ISL daily-life words
167. Table 6.4. Precision, Recall and F1 Score for self-created dataset
S.No | Sign | Recall | Precision | F1-Score | Correctly identified gestures | Total gestures
1 | Afraid | 0.99 | 0.98 | 0.98 | 171 | 172
2 | Agree | 0.99 | 1.00 | 0.99 | 173 | 173
3 | Assistance | 1.00 | 0.98 | 0.99 | 123 | 128
4 | Bad | 0.99 | 0.99 | 0.99 | 177 | 178
5 | Become | 0.99 | 1.00 | 0.99 | 152 | 154
6 | College | 0.99 | 0.98 | 0.98 | 167 | 167
7 | Doctor | 0.95 | 0.95 | 0.95 | 99 | 107
8 | From | 0.98 | 0.97 | 0.97 | 135 | 138
9 | Pain | 0.98 | 0.99 | 0.98 | 151 | 156
10 | Pray | 0.99 | 0.97 | 0.98 | 67 | 68
11 | Secondary | 0.99 | 0.97 | 0.98 | 176 | 177
12 | Skin | 0.97 | 0.98 | 0.97 | 171 | 172
13 | Small | 1.00 | 0.98 | 0.99 | 143 | 143
14 | Specific | 1.00 | 1.00 | 1.00 | 167 | 170
15 | Stand | 0.97 | 0.96 | 0.97 | 147 | 148
16 | Today | 0.98 | 1.00 | 0.99 | 172 | 175
17 | Warn | 0.97 | 0.99 | 0.98 | 172 | 177
18 | Which | 0.98 | 0.98 | 0.98 | 172 | 174
19 | Work | 0.99 | 0.98 | 0.99 | 186 | 186
20 | You | 0.97 | 1.00 | 0.98 | 160 | 160
  168. An Efficient System for Vision-Based Recognition of Indian Sign Language Using Soft Computing Techniques 168
169. Figure 6.16. Accuracy for general-purpose gestures
Accuracy (%) on dataset [230]: SIFT_VFH & SIFT_NN 90.68, FiST_CNN 94.73, FiST_HGNN 97.55.
• Improvement of FiST_HGNN: 6.87% over SIFT_VFH & SIFT_NN, 2.82% over FiST_CNN.
170. Figure 6.17. Accuracy for family and relatives gestures
Accuracy (%) on dataset [231]: SIFT_CNN 92.78, FiST_CNN 95.53, FiST_HGNN 98.85.
• Improvement of FiST_HGNN: 6.07% over SIFT_CNN, 3.32% over FiST_CNN.