SlideShare ist ein Scribd-Unternehmen logo
1 von 8
Downloaden Sie, um offline zu lesen
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
DOI : 10.5121/ijist.2016.6222 203
ACHIEVING SECURITY VIA SPEECH
RECOGNITION
Swati Patel and Nita Lokwani
Smt. Chandaben Mohanbhai Patel Institute of Computer Applications,
Charusat University, Changa
ABSTRACT
Speech is one of the essential sources of the conversation between human beings. We as humans speak and
listen to each other in human-human interface. People have tried to develop systems that can listen and
prepare a speech as persons do so naturally. This paper presents a brief survey on Speech recognition,
allow people to compose documents and control their computers with their voice. In other words, the
process of enabling a machine (like a computer) to identify and respond to the sounds produced in human
speech. ASR can be treated as the independent, computer-driven script of spoken language into readable
text in real time. The Speech Recognition system requires careful attention to the following issues: Meaning
of various types of speeches, speech representation, feature extraction techniques, speech classifiers, and
database and performance evaluation. This paper helps in understanding the technique along with their
pros and cons. A comparative study of different technique is done as per stages.
KEYWORDS
Automatic speech recognition (ASR), Analysis, feature extraction, Modeling, Testing, speech processing,
Human Computer Interaction (HCI).
1. INTRODUCTION
Speech recognition is a process in which computer (or other type of machine) identifies spoken
words and performing required task. The task is to understand spoken words, react appropriately
and then convert into text. Sometime it is also known as speech to text (STT). Basically, it is
correct recognize of what you are saying. It is mostly used when someone’s hands and eyes are
busy. The speech recognition system consists a microphone in which person can speak; speech
recognition software to interpret the speech; a good quality sound card for input and/or output;
and a proper pronunciation.
2. SPEECH RECOGNITION BASICS
Speech recognition automatically extracts the string of words spoken from the speech signal.
The following factors are the basically used for understanding speech recognition technology.
Utterance
An utterance is the pronunciation (speaking) of either a word or words that represent a single
meaning to the computer. It can be word, words, sentence, or even multiple sentences.
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
204
Speaker dependent & speaker independent systems
Speaker dependent systems design speech patterns for a specific speaker. Generally they are more
perfect for the correct speaker, but inaccurate for other speakers. They accept the speaker will
speak in a steady and consistent voice and tempo. Speaker independent systems are constructed
from a wide range of speakers. Adaptive systems usually start with speaker independent systems
and apply training techniques to adapt to the speaker to increase their recognition accuracy.
Vocabularies
It is a collection of words or statements that can be recognized by the SR system. Generally, for a
computer, it is very easy to identify smaller vocabularies, but very difficult to identify larger
vocabularies. In normal dictionaries, it doesn't have only a single word, but it can be as long as a
sentence or two. Smaller vocabularies can remember only one or two words like “Stand up", but
large vocabularies can have a hundred thousand or more!
Accuract
The skill of a recognizer can be studied by measuring its accuracy - or how well it remembers the
words. This includes not only correctly identifying a word, but also identifying if the spoken a
word is not in its vocabulary. Good ASR systems have more than 98% correct. The satisfactory
accuracy of a system really based on the application.
Training
Some speech recognizers have the skill to adapt to a speaker. When the system has this skill, it
may allow training to take place. An ASR system is trained by having the person who has repeat
standard or common phrases and adjusting its comparison algorithms to match that particular
person. The accuracy of a recognizer usually improves with training.
Persons who have difficulty in speaking, or pronouncing certain of the words they can also use
training. As long as the person can consistently repeat a word, ASR systems with training should
be able to adapt.
3. TYPES OF SPEECH RECOGNITION
Speech recognition systems can be divided in various different classes by describing what types
of words they have the ability to remember. ASR is the ability to check when a person starts and
completes a word. Most packages can fit into more than one class, based on which mode they're
using.
Isolated Words
Isolated word required each word to have silent (lack of an audio signal) on both ends. It doesn't
mean that it accepts single words, but it does require a single word at a time. Often, these systems
have "Listen/Not-Listen" states, where they require the person to wait between words (usually
doing processing during the pauses). The isolated word might be a good name for this class.
Connected Words
Connected words (or more correctly 'connected utterances') are similar to isolated words, but
allow separate words to be run simultaneously with a minimum pause between them.
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
205
Continuous Speech
Processing with continuous speech are some of the most difficult to design because they must
process with the special methods to determine word limits. Continuous speech allows users to
speak almost naturally, while the computer defines the content. Basically, that particular content
itself the words which are dictated by Computer.
Spontaneous Speech
There appears to be a various definitions for what spontaneous speech in reality is. At a basic
point, it can be thought of as speech that is natural sounding and not planned. An ASR system
with spontaneous speech ability to cover types of natural speech comprises with the words being
run together.
Voice Verification/Identification
Some ASR systems are used for identifying specific users. This document doesn't cover the
particular verified or secure systems.
4. WORKING OF SPEECH RECOGNITION SYSTEM
Figure 1. Process of Speech Recognition
Basically, the conversion of the voice to an analog signal of microphone and that process takes
the input from digital stage. Input of the system from the user is known as utterance (Spoken
input from the user of a speech system. An utterance may be a one word, a sentence, an entire
phrase, or even several sentences.) This is the representation of the binary form of 1s and 0s that
make up programming languages used by the computer. Any other kind of sound is not heard by
the computers.
Sound-recognition system has acoustic models (An acoustic model is created by taking audio
demos of speech, and their text records, and using software to create numerical representations of
the sounds that make up each utterance. It is used by a speech recognition engine to remember
speech) convert the audio sounds to one of about four dozen basic speech components (called
phonemes). The latest versions of speech technology have been derived so that they eliminate the
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
206
extra noise and not used information that is not needed to let the computer work. The words we
speak are changed into digital forms of the basic speech components (phonemes).
Once this is complete, a second step of the software begins to work. The text is compared to the
digital dictionary that is stored in computer internal storage. This is a very vast collection of
words, usually more than 100,000. When it compares and identify a match based on the digital
form it displays the words on the display. This is the simple and basic process for all speech
recognition systems.
Figure 2. Working of Speech Recognition
To convert speech to text or a computer command, a computer has to go with the several complex
tasks. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC)
converts this analog wave to the digital data that can be understood by computer. To do this
digitizes the sound by taking small measurements of the wave at the regular time. The system
filters the digitized sound to remove unwanted noise, and sometimes to separate it into various
bands of frequency (frequency is the wavelength of the sound waves, heard by humans as
differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may
also have to be temporally aligned. The speed of the speech has been always in variable form, so
the sound must be adjusted to match the speed of the format sound that is already available in the
system's memory.
Next the signal is divided into small segments as short as a few hundreds of a second, or even
thousandths in the case of passive consonant sounds -- consonant stops produced by obstructing
airflow in the vocal tract -- like "p" or "t." The program then matches these segments to known
phonemes in the appropriate language. A phoneme is the smallest element of a language -- a
representation of the sounds we make and put together to form meaningful expressions. There are
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
207
roughly 40 phonemes in the English language (different linguists have different opinions on the
exact number), while other languages have more or fewer phonemes.
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of
most speech recognition research. The program examines phonemes in the context of the other
phonemes around them. It runs the contextual phoneme plot through a complex statistical model
and compares them to a large library of known words, phrases and sentences. The program then
determines what the user was probably saying and either outputs it as text or issues a computer
command.
5. APPROACHES TO SPEECH RECOGNITION
ď‚· Acoustic phonetic approach
ď‚· Pattern Recognition approach
ď‚· Artificial intelligence approach.
5.1 Acoustic Phonetic Approach
It is also called as rule-based approach. It is use knowledge of phonetics & linguistics to guide
the search process. This approach generally uses some principles which are defined expressing
everything or anything that might help to decrypt based in “blackboard” architecture, i.e. At each
decision point it lays out the possibilities and use rules to determine which orders are acceptable.
It has poor performance due to difficulty to express rules, to improve the organization. This
approach recognizes individual phonemes, words, sentence structure and/or significance.
5.2 Pattern Recognition Approach
This method is divided in two steps, i.e. training of speech patterns and recognition of pattern by
way of pattern comparison. In the parameter measurement phase (filter bank, LFC, DFT), a
sequence of measurements is developed based on the input signal to define the “test pattern”.
The unknown test pattern is then compared with each sound reference pattern and a measure of
comprise between the examination pattern & reference pattern. Best matches the unknown test
pattern based on the matching of the pattern classification phase (dynamic time warping).
5.2.1 Template-based approach
This approach provides a family of techniques that have advanced the field considerably during
the last six decades. A collection of prototypical speech patterns is stored as reference patterns
representing the dictionary of applicant’s words. Recognition is then carried out by matching an
unknown spoken utterance with each reference template and choosing the category of the best
matching pattern. Generally templates are created for entire words. This has the advantage that,
errors due to segmentation or classification of smaller acoustically more variable units such as
phonemes can be avoided.
5.2.2 Statistics-based approach
Stochastic modelling entails the use of probabilistic models to deal with undefined or incomplete
information. In speech recognition, many sources like, confusable sounds, speaker variability’s,
contextual effects, and homophone words are affected for uncertainty and incompleteness. In
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
208
today the most popular stochastic approach is hidden Markov modelling. A hidden Markov
model is qualified by a finite state Markov model and a set of output distributions. The transition
parameters in the Markov chain models, temporal variabilities, while the parameters in the output
distribution model, spectral variabilities. These two types of variables are the effect of speech
recognition.
5.3 Artificial Intelligence Recognition Approach
This approach is a combination of acoustic phonetic approach and pattern recognition approach.
In this, it exploits the concepts of acoustic phonetics and pattern recognition methods. The
information regarding linguistic, phonetic and spectrogram used by Knowledge based approach.
Some speech researchers developed recognition system that used acoustic phonetic knowledge to
improve classification rules for speech sounds. While template based approaches have been real
in the design of a variety of speech recognition systems; they provided little insight about human
speech processing, thereby creating error analysis and knowledge-based system enhancement
difficult. On the other hand, a large body of linguistic and phonetic literature provided insights
and understanding of human spoken language processing. In its complete form, the knowledge
engineering plan involves the direct and explicit incorporation of expert’s speech knowledge into
a recognition system. This knowledge is normally derived from careful study of spectrograms
and is incorporated using guidelines or procedures. Pure knowledge engineering was also
motivated by the interest and research in expert systems.
6. CHALLENGES AND DIFFICULTIES OF SR
Speech Recognition is still a very cumbersome problem. Following are the problems….
ď‚· Speaker Variability
Speaker or more than one speaker may be pronounced the same word in a different way.
ď‚· Channel Variability
The position and quality of the microphone and background environment may be
affecting in output
7. APPLICATIONS OF SPEECH RECOGNITION
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
209
Applications of Speech Recognition
Speech recognition applications include
ď‚· Voice dialling (e.g., "Call home"),
ď‚· Call routing (e.g., "I would like to make a collect call"),
ď‚· Simple data entry (e.g., entering a credit card number),
ď‚· Preparation of structured documents (e.g., A radiology report),
ď‚· Speech-to-text processing (e.g., word processors or emails), and
ď‚· In aircraft cockpits (usually termed Direct Voice Input).
8. PROS AND CONS OF SPEECH RECOGNITION
PROS:
 There is no need to type or write text, and generally it is quicker than “typing” &
“handwriting”.
ď‚· Allows for better spelling, whether it is in text or documents. It is very useful for mental
or physical disability person.
CONS:
ď‚· No program is 100% perfect
ď‚· Factors like slang, homonyms, signal-to-noise ratio, and overlapping speech are affecting
the accuracy of speech recognition.
ď‚· Can be expensive depending on the program
9. CONCLUSIONS
Speech is the primary, and the most appropriate means of communication between human being.
Whether due to technological curiosity to make machines that mimic humans or desire to
automate work with machines, research in speech and speaker identification. This paper
introduces the basics of speech recognition technology and also highlights the difference between
different speech recognition systems. In this paper the most common algorithms which are used
to do speech recognition are also discussed along with the current and its future use. Speech
recognition is one of the most integrating areas of machine intelligence, since, humans do a daily
activity of speech recognition.
10. ACKNOWLEDGEMENT
We are very thankful to our institute CMPICA (Smt. Chandaben Mohanbhai Patel Institute of
Computer Applications), CHARUSAT for providing necessary infrastructure for development of
the research work.
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016
210
REFERENCES
[1] M.A.Anusuya, S.K.Katti “Speech Recognition by Machine: A Review” International Journal of
Computer Science and Information Security
[2] Preeti Saini, Parneet Kaur “Automatic Speech Recognition: A Review” International Journal of
Engineering Trends and Technology.
[3] Santosh k.Gaikwad, Bharti W.Gawali, Pravin Yannawar “A Review on Speech Recognition
Technique” International Journal of Computer Applications.
[4] Parwinder pal Singh, Er. Bhupinder Singh “Speech Recognition as Emerging Revolutionary
Technology, “International Journal of Advanced Research in Computer Science and Software
Engineering.
[5] http://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/introduction.html
[6] Shipra J. Arora, Rishi Pal Singh “Automatic Speech Recognition: A Review”, International Journal of
Computer Applications.
[7] Wiqas Ghai and Navdeep Singh,“Literature Review on Automatic Speech Recognition”,International
Journal of Computer Applications vol. 41– no.8, pp. 42-50, March 2012.
[8] Nnamdi Okomba S., 2Adegboye Mutiu Adesina, and 3Candidus O. Okwor., “Survey of Technical
Progress in Speech Recognition by Machine over Few Years of Research”, IOSR Journal of
Electronics and Communication Engineering
[9] http://www.computerhope.com/jargon/v/voicreco.htm
[10] https://en.wikipedia.org/wiki/Speech_recognition
[11] https://en.wikipedia.org/wiki/Acoustic_model
AUTHORS
Swati Patel received her M.C.A. & B.C.A. degrees from Dharmsinh Desai University,
Nadiad, Gujrat, India in 2011 & 2009 respectively. She is presently working as an
Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute of Computer
Applications, Changa. Her research area includes HCI and Pattern Recognition.
Nita Lokwani received her M.C.A. degree from Smt. Chandaben Mohanbhai Patel
Institute of Computer Applications, Charusat University, Changa, Gujrat, India in 2012 &
B.C.A. degree from Dharmsinh Desai University, Nadiad, Gujrat, India in 2009. She is
presently working as an Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute
of Computer Applications, Changa. Her research area includes Human Computer
Interaction and working on Embedded Systems.

Weitere ähnliche Inhalte

Ă„hnlich wie ACHIEVING SECURITY VIA SPEECH RECOGNITION

Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Speech Recognition: Transcription and transformation of human speech
Speech Recognition: Transcription and transformation of human speechSpeech Recognition: Transcription and transformation of human speech
Speech Recognition: Transcription and transformation of human speechSubmissionResearchpa
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech RecognitionThejus Joby
 
Speech recognition system
Speech recognition systemSpeech recognition system
Speech recognition systemRipal Ranpara
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceIlhaan Marwat
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specificationijtsrd
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overviewVarun Jain
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionRHIMRJ Journal
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptxJyothiMedisetty2
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemIJERA Editor
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technologySrijanKumar18
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySrijanKumar18
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandMohammad Liton Hossain
 
An Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingAn Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingScott Faria
 
SAP (SPEECH AND AUDIO PROCESSING)
SAP (SPEECH AND AUDIO PROCESSING)SAP (SPEECH AND AUDIO PROCESSING)
SAP (SPEECH AND AUDIO PROCESSING)dineshkatta4
 
Speech Synthesis.pptx
Speech Synthesis.pptxSpeech Synthesis.pptx
Speech Synthesis.pptxSubramanian Mani
 

Ă„hnlich wie ACHIEVING SECURITY VIA SPEECH RECOGNITION (20)

Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
VOICE RECOGNITION SYSTEM
VOICE RECOGNITION SYSTEMVOICE RECOGNITION SYSTEM
VOICE RECOGNITION SYSTEM
 
Speech Recognition: Transcription and transformation of human speech
Speech Recognition: Transcription and transformation of human speechSpeech Recognition: Transcription and transformation of human speech
Speech Recognition: Transcription and transformation of human speech
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
 
Speech recognition system
Speech recognition systemSpeech recognition system
Speech recognition system
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail Inteligence
 
A Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language SpecificationA Survey on Speech Recognition with Language Specification
A Survey on Speech Recognition with Language Specification
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech Recognition
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
 
K010416167
K010416167K010416167
K010416167
 
An Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingAn Overview Of Natural Language Processing
An Overview Of Natural Language Processing
 
SAP (SPEECH AND AUDIO PROCESSING)
SAP (SPEECH AND AUDIO PROCESSING)SAP (SPEECH AND AUDIO PROCESSING)
SAP (SPEECH AND AUDIO PROCESSING)
 
Speech Synthesis.pptx
Speech Synthesis.pptxSpeech Synthesis.pptx
Speech Synthesis.pptx
 

Mehr von ijistjournal

A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVIN
A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVINA MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVIN
A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVINijistjournal
 
DECEPTION AND RACISM IN THE TUSKEGEE SYPHILIS STUDY
DECEPTION AND RACISM IN THE TUSKEGEE  SYPHILIS STUDYDECEPTION AND RACISM IN THE TUSKEGEE  SYPHILIS STUDY
DECEPTION AND RACISM IN THE TUSKEGEE SYPHILIS STUDYijistjournal
 
Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...ijistjournal
 
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...ijistjournal
 
Call for Papers - International Journal of Information Sciences and Technique...
Call for Papers - International Journal of Information Sciences and Technique...Call for Papers - International Journal of Information Sciences and Technique...
Call for Papers - International Journal of Information Sciences and Technique...ijistjournal
 
Online Paper Submission - 4th International Conference on NLP & Data Mining (...
Online Paper Submission - 4th International Conference on NLP & Data Mining (...Online Paper Submission - 4th International Conference on NLP & Data Mining (...
Online Paper Submission - 4th International Conference on NLP & Data Mining (...ijistjournal
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSijistjournal
 
Research Articles Submission - International Journal of Information Sciences ...
Research Articles Submission - International Journal of Information Sciences ...Research Articles Submission - International Journal of Information Sciences ...
Research Articles Submission - International Journal of Information Sciences ...ijistjournal
 
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHMANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHMijistjournal
 
Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...ijistjournal
 
Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...ijistjournal
 
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial DomainColour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domainijistjournal
 
Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...ijistjournal
 
International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)ijistjournal
 
Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...ijistjournal
 
Design and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression AlgorithmDesign and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression Algorithmijistjournal
 
Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...ijistjournal
 
Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...ijistjournal
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKijistjournal
 
Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...ijistjournal
 

Mehr von ijistjournal (20)

A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVIN
A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVINA MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVIN
A MEDIAN BASED DIRECTIONAL CASCADED WITH MASK FILTER FOR REMOVAL OF RVIN
 
DECEPTION AND RACISM IN THE TUSKEGEE SYPHILIS STUDY
DECEPTION AND RACISM IN THE TUSKEGEE  SYPHILIS STUDYDECEPTION AND RACISM IN THE TUSKEGEE  SYPHILIS STUDY
DECEPTION AND RACISM IN THE TUSKEGEE SYPHILIS STUDY
 
Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...
 
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...
A NOVEL APPROACH FOR SEGMENTATION OF SECTOR SCAN SONAR IMAGES USING ADAPTIVE ...
 
Call for Papers - International Journal of Information Sciences and Technique...
Call for Papers - International Journal of Information Sciences and Technique...Call for Papers - International Journal of Information Sciences and Technique...
Call for Papers - International Journal of Information Sciences and Technique...
 
Online Paper Submission - 4th International Conference on NLP & Data Mining (...
Online Paper Submission - 4th International Conference on NLP & Data Mining (...Online Paper Submission - 4th International Conference on NLP & Data Mining (...
Online Paper Submission - 4th International Conference on NLP & Data Mining (...
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWSTOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
 
Research Articles Submission - International Journal of Information Sciences ...
Research Articles Submission - International Journal of Information Sciences ...Research Articles Submission - International Journal of Information Sciences ...
Research Articles Submission - International Journal of Information Sciences ...
 
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHMANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
ANALYSIS OF IMAGE WATERMARKING USING LEAST SIGNIFICANT BIT ALGORITHM
 
Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...Call for Research Articles - 6th International Conference on Machine Learning...
Call for Research Articles - 6th International Conference on Machine Learning...
 
Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...Online Paper Submission - International Journal of Information Sciences and T...
Online Paper Submission - International Journal of Information Sciences and T...
 
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial DomainColour Image Steganography Based on Pixel Value Differencing in Spatial Domain
Colour Image Steganography Based on Pixel Value Differencing in Spatial Domain
 
Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...Call for Research Articles - 5th International conference on Advanced Natural...
Call for Research Articles - 5th International conference on Advanced Natural...
 
International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)International Journal of Information Sciences and Techniques (IJIST)
International Journal of Information Sciences and Techniques (IJIST)
 
Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...Research Article Submission - International Journal of Information Sciences a...
Research Article Submission - International Journal of Information Sciences a...
 
Design and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression AlgorithmDesign and Implementation of LZW Data Compression Algorithm
Design and Implementation of LZW Data Compression Algorithm
 
Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...Online Paper Submission - 5th International Conference on Soft Computing, Art...
Online Paper Submission - 5th International Conference on Soft Computing, Art...
 
Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...Submit Your Research Articles - International Journal of Information Sciences...
Submit Your Research Articles - International Journal of Information Sciences...
 
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTKPUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK
 
Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...Call for Papers - 13th International Conference on Cloud Computing: Services ...
Call for Papers - 13th International Conference on Cloud Computing: Services ...
 

KĂĽrzlich hochgeladen

Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxsomshekarkn64
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringJuanCarlosMorales19600
 

KĂĽrzlich hochgeladen (20)

young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
lifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptxlifi-technology with integration of IOT.pptx
lifi-technology with integration of IOT.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineering
 

ACHIEVING SECURITY VIA SPEECH RECOGNITION

  • 1. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 DOI : 10.5121/ijist.2016.6222 203 ACHIEVING SECURITY VIA SPEECH RECOGNITION Swati Patel and Nita Lokwani Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charusat University, Changa ABSTRACT Speech is one of the essential sources of the conversation between human beings. We as humans speak and listen to each other in human-human interface. People have tried to develop systems that can listen and prepare a speech as persons do so naturally. This paper presents a brief survey on Speech recognition, allow people to compose documents and control their computers with their voice. In other words, the process of enabling a machine (like a computer) to identify and respond to the sounds produced in human speech. ASR can be treated as the independent, computer-driven script of spoken language into readable text in real time. The Speech Recognition system requires careful attention to the following issues: Meaning of various types of speeches, speech representation, feature extraction techniques, speech classifiers, and database and performance evaluation. This paper helps in understanding the technique along with their pros and cons. A comparative study of different technique is done as per stages. KEYWORDS Automatic speech recognition (ASR), Analysis, feature extraction, Modeling, Testing, speech processing, Human Computer Interaction (HCI). 1. INTRODUCTION Speech recognition is a process in which computer (or other type of machine) identifies spoken words and performing required task. The task is to understand spoken words, react appropriately and then convert into text. Sometime it is also known as speech to text (STT). Basically, it is correct recognize of what you are saying. It is mostly used when someone’s hands and eyes are busy. The speech recognition system consists a microphone in which person can speak; speech recognition software to interpret the speech; a good quality sound card for input and/or output; and a proper pronunciation. 2. SPEECH RECOGNITION BASICS Speech recognition automatically extracts the string of words spoken from the speech signal. The following factors are the basically used for understanding speech recognition technology. Utterance An utterance is the pronunciation (speaking) of either a word or words that represent a single meaning to the computer. It can be word, words, sentence, or even multiple sentences.
  • 2. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 204 Speaker dependent & speaker independent systems Speaker dependent systems design speech patterns for a specific speaker. Generally they are more perfect for the correct speaker, but inaccurate for other speakers. They accept the speaker will speak in a steady and consistent voice and tempo. Speaker independent systems are constructed from a wide range of speakers. Adaptive systems usually start with speaker independent systems and apply training techniques to adapt to the speaker to increase their recognition accuracy. Vocabularies It is a collection of words or statements that can be recognized by the SR system. Generally, for a computer, it is very easy to identify smaller vocabularies, but very difficult to identify larger vocabularies. In normal dictionaries, it doesn't have only a single word, but it can be as long as a sentence or two. Smaller vocabularies can remember only one or two words like “Stand up", but large vocabularies can have a hundred thousand or more! Accuract The skill of a recognizer can be studied by measuring its accuracy - or how well it remembers the words. This includes not only correctly identifying a word, but also identifying if the spoken a word is not in its vocabulary. Good ASR systems have more than 98% correct. The satisfactory accuracy of a system really based on the application. Training Some speech recognizers have the skill to adapt to a speaker. When the system has this skill, it may allow training to take place. An ASR system is trained by having the person who has repeat standard or common phrases and adjusting its comparison algorithms to match that particular person. The accuracy of a recognizer usually improves with training. Persons who have difficulty in speaking, or pronouncing certain of the words they can also use training. As long as the person can consistently repeat a word, ASR systems with training should be able to adapt. 3. TYPES OF SPEECH RECOGNITION Speech recognition systems can be divided in various different classes by describing what types of words they have the ability to remember. ASR is the ability to check when a person starts and completes a word. Most packages can fit into more than one class, based on which mode they're using. Isolated Words Isolated word required each word to have silent (lack of an audio signal) on both ends. It doesn't mean that it accepts single words, but it does require a single word at a time. Often, these systems have "Listen/Not-Listen" states, where they require the person to wait between words (usually doing processing during the pauses). The isolated word might be a good name for this class. Connected Words Connected words (or more correctly 'connected utterances') are similar to isolated words, but allow separate words to be run simultaneously with a minimum pause between them.
  • 3. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 205 Continuous Speech Processing with continuous speech are some of the most difficult to design because they must process with the special methods to determine word limits. Continuous speech allows users to speak almost naturally, while the computer defines the content. Basically, that particular content itself the words which are dictated by Computer. Spontaneous Speech There appears to be a various definitions for what spontaneous speech in reality is. At a basic point, it can be thought of as speech that is natural sounding and not planned. An ASR system with spontaneous speech ability to cover types of natural speech comprises with the words being run together. Voice Verification/Identification Some ASR systems are used for identifying specific users. This document doesn't cover the particular verified or secure systems. 4. WORKING OF SPEECH RECOGNITION SYSTEM Figure 1. Process of Speech Recognition Basically, the conversion of the voice to an analog signal of microphone and that process takes the input from digital stage. Input of the system from the user is known as utterance (Spoken input from the user of a speech system. An utterance may be a one word, a sentence, an entire phrase, or even several sentences.) This is the representation of the binary form of 1s and 0s that make up programming languages used by the computer. Any other kind of sound is not heard by the computers. Sound-recognition system has acoustic models (An acoustic model is created by taking audio demos of speech, and their text records, and using software to create numerical representations of the sounds that make up each utterance. It is used by a speech recognition engine to remember speech) convert the audio sounds to one of about four dozen basic speech components (called phonemes). The latest versions of speech technology have been derived so that they eliminate the
  • 4. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 206 extra noise and not used information that is not needed to let the computer work. The words we speak are changed into digital forms of the basic speech components (phonemes). Once this is complete, a second step of the software begins to work. The text is compared to the digital dictionary that is stored in computer internal storage. This is a very vast collection of words, usually more than 100,000. When it compares and identify a match based on the digital form it displays the words on the display. This is the simple and basic process for all speech recognition systems. Figure 2. Working of Speech Recognition To convert speech to text or a computer command, a computer has to go with the several complex tasks. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) converts this analog wave to the digital data that can be understood by computer. To do this digitizes the sound by taking small measurements of the wave at the regular time. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into various bands of frequency (frequency is the wavelength of the sound waves, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may also have to be temporally aligned. The speed of the speech has been always in variable form, so the sound must be adjusted to match the speed of the format sound that is already available in the system's memory. Next the signal is divided into small segments as short as a few hundreds of a second, or even thousandths in the case of passive consonant sounds -- consonant stops produced by obstructing airflow in the vocal tract -- like "p" or "t." The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest element of a language -- a representation of the sounds we make and put together to form meaningful expressions. There are
  • 5. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 207 roughly 40 phonemes in the English language (different linguists have different opinions on the exact number), while other languages have more or fewer phonemes. The next step seems simple, but it is actually the most difficult to accomplish and is the focus of most speech recognition research. The program examines phonemes in the context of the other phonemes around them. It runs the contextual phoneme plot through a complex statistical model and compares them to a large library of known words, phrases and sentences. The program then determines what the user was probably saying and either outputs it as text or issues a computer command. 5. APPROACHES TO SPEECH RECOGNITION ď‚· Acoustic phonetic approach ď‚· Pattern Recognition approach ď‚· Artificial intelligence approach. 5.1 Acoustic Phonetic Approach It is also called as rule-based approach. It is use knowledge of phonetics & linguistics to guide the search process. This approach generally uses some principles which are defined expressing everything or anything that might help to decrypt based in “blackboard” architecture, i.e. At each decision point it lays out the possibilities and use rules to determine which orders are acceptable. It has poor performance due to difficulty to express rules, to improve the organization. This approach recognizes individual phonemes, words, sentence structure and/or significance. 5.2 Pattern Recognition Approach This method is divided in two steps, i.e. training of speech patterns and recognition of pattern by way of pattern comparison. In the parameter measurement phase (filter bank, LFC, DFT), a sequence of measurements is developed based on the input signal to define the “test pattern”. The unknown test pattern is then compared with each sound reference pattern and a measure of comprise between the examination pattern & reference pattern. Best matches the unknown test pattern based on the matching of the pattern classification phase (dynamic time warping). 5.2.1 Template-based approach This approach provides a family of techniques that have advanced the field considerably during the last six decades. A collection of prototypical speech patterns is stored as reference patterns representing the dictionary of applicant’s words. Recognition is then carried out by matching an unknown spoken utterance with each reference template and choosing the category of the best matching pattern. Generally templates are created for entire words. This has the advantage that, errors due to segmentation or classification of smaller acoustically more variable units such as phonemes can be avoided. 5.2.2 Statistics-based approach Stochastic modelling entails the use of probabilistic models to deal with undefined or incomplete information. In speech recognition, many sources like, confusable sounds, speaker variability’s, contextual effects, and homophone words are affected for uncertainty and incompleteness. In
  • 6. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 208 today the most popular stochastic approach is hidden Markov modelling. A hidden Markov model is qualified by a finite state Markov model and a set of output distributions. The transition parameters in the Markov chain models, temporal variabilities, while the parameters in the output distribution model, spectral variabilities. These two types of variables are the effect of speech recognition. 5.3 Artificial Intelligence Recognition Approach This approach is a combination of acoustic phonetic approach and pattern recognition approach. In this, it exploits the concepts of acoustic phonetics and pattern recognition methods. The information regarding linguistic, phonetic and spectrogram used by Knowledge based approach. Some speech researchers developed recognition system that used acoustic phonetic knowledge to improve classification rules for speech sounds. While template based approaches have been real in the design of a variety of speech recognition systems; they provided little insight about human speech processing, thereby creating error analysis and knowledge-based system enhancement difficult. On the other hand, a large body of linguistic and phonetic literature provided insights and understanding of human spoken language processing. In its complete form, the knowledge engineering plan involves the direct and explicit incorporation of expert’s speech knowledge into a recognition system. This knowledge is normally derived from careful study of spectrograms and is incorporated using guidelines or procedures. Pure knowledge engineering was also motivated by the interest and research in expert systems. 6. CHALLENGES AND DIFFICULTIES OF SR Speech Recognition is still a very cumbersome problem. Following are the problems…. ď‚· Speaker Variability Speaker or more than one speaker may be pronounced the same word in a different way. ď‚· Channel Variability The position and quality of the microphone and background environment may be affecting in output 7. APPLICATIONS OF SPEECH RECOGNITION
  • 7. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 209 Applications of Speech Recognition Speech recognition applications include ď‚· Voice dialling (e.g., "Call home"), ď‚· Call routing (e.g., "I would like to make a collect call"), ď‚· Simple data entry (e.g., entering a credit card number), ď‚· Preparation of structured documents (e.g., A radiology report), ď‚· Speech-to-text processing (e.g., word processors or emails), and ď‚· In aircraft cockpits (usually termed Direct Voice Input). 8. PROS AND CONS OF SPEECH RECOGNITION PROS: ď‚· There is no need to type or write text, and generally it is quicker than “typing” & “handwriting”. ď‚· Allows for better spelling, whether it is in text or documents. It is very useful for mental or physical disability person. CONS: ď‚· No program is 100% perfect ď‚· Factors like slang, homonyms, signal-to-noise ratio, and overlapping speech are affecting the accuracy of speech recognition. ď‚· Can be expensive depending on the program 9. CONCLUSIONS Speech is the primary, and the most appropriate means of communication between human being. Whether due to technological curiosity to make machines that mimic humans or desire to automate work with machines, research in speech and speaker identification. This paper introduces the basics of speech recognition technology and also highlights the difference between different speech recognition systems. In this paper the most common algorithms which are used to do speech recognition are also discussed along with the current and its future use. Speech recognition is one of the most integrating areas of machine intelligence, since, humans do a daily activity of speech recognition. 10. ACKNOWLEDGEMENT We are very thankful to our institute CMPICA (Smt. Chandaben Mohanbhai Patel Institute of Computer Applications), CHARUSAT for providing necessary infrastructure for development of the research work.
  • 8. International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016 210 REFERENCES [1] M.A.Anusuya, S.K.Katti “Speech Recognition by Machine: A Review” International Journal of Computer Science and Information Security [2] Preeti Saini, Parneet Kaur “Automatic Speech Recognition: A Review” International Journal of Engineering Trends and Technology. [3] Santosh k.Gaikwad, Bharti W.Gawali, Pravin Yannawar “A Review on Speech Recognition Technique” International Journal of Computer Applications. [4] Parwinder pal Singh, Er. Bhupinder Singh “Speech Recognition as Emerging Revolutionary Technology, “International Journal of Advanced Research in Computer Science and Software Engineering. [5] http://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/introduction.html [6] Shipra J. Arora, Rishi Pal Singh “Automatic Speech Recognition: A Review”, International Journal of Computer Applications. [7] Wiqas Ghai and Navdeep Singh,“Literature Review on Automatic Speech Recognition”,International Journal of Computer Applications vol. 41– no.8, pp. 42-50, March 2012. [8] Nnamdi Okomba S., 2Adegboye Mutiu Adesina, and 3Candidus O. Okwor., “Survey of Technical Progress in Speech Recognition by Machine over Few Years of Research”, IOSR Journal of Electronics and Communication Engineering [9] http://www.computerhope.com/jargon/v/voicreco.htm [10] https://en.wikipedia.org/wiki/Speech_recognition [11] https://en.wikipedia.org/wiki/Acoustic_model AUTHORS Swati Patel received her M.C.A. & B.C.A. degrees from Dharmsinh Desai University, Nadiad, Gujrat, India in 2011 & 2009 respectively. She is presently working as an Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Changa. Her research area includes HCI and Pattern Recognition. Nita Lokwani received her M.C.A. degree from Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Charusat University, Changa, Gujrat, India in 2012 & B.C.A. degree from Dharmsinh Desai University, Nadiad, Gujrat, India in 2009. She is presently working as an Assistant Professor in Smt. Chandaben Mohanbhai Patel Institute of Computer Applications, Changa. Her research area includes Human Computer Interaction and working on Embedded Systems.