SPEECH RECOGNITION SYSTEMS

TWINKLE SAHU
CSE 6TH SEM
INTRODUCTION
• Speech recognition is the process by which a computer takes a speech signal (recorded using a microphone) and converts it into words, often in real time. The software that carries out these steps is known as a ‘Speech Recognition (SR) System’.
• SR systems are commonly deployed as dictation software and as intelligent assistants on personal computers, smartphones, web browsers and many other devices.
DESIGN OF AN SR SYSTEM
SR systems have to deal with a large number of challenges, such as:
• The speaker’s voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
• A speaker may use many different words, and all of them have to be recognized accurately.
• Accents vary from person to person, which is a major challenge.
• A speaker may talk very quickly, and every word spoken still has to be recognized individually and accurately.
TYPES OF SR SYSTEMS
• Speaker-dependent SR systems: learn the unique characteristics of a single person’s voice and depend on that speaker to train them.

• Speaker-independent SR systems: designed to recognize anyone’s voice, so individual users do not have to train the system; it is instead trained in advance on speech from many speakers.
BASIC PRINCIPLES OF SPEECH RECOGNITION
• The smallest unit of sound that distinguishes one word from another is known as a phoneme.
• The English language contains approximately 44 phonemes, covering all the vowel and consonant sounds we use in speech.
• For example, a typical word such as moon can be broken down into three phonemes: m, uw, n.
• To interpret speech, a system needs a way of identifying the components of spoken words; phonemes act as identifying markers within speech.
• An algorithm is then needed to interpret the speech further. The Hidden Markov Model (HMM) is a commonly used mathematical model for this.
• To create a speech recognition engine, a large database of models is built, one to match each phoneme.
• When a comparison is performed, the system determines the most likely match between the spoken phoneme and the stored models, and further computations are performed on the result.
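The matching step above can be sketched as a nearest-template search. This is a minimal sketch under made-up assumptions: each stored phoneme model is reduced to a single hypothetical 2-D feature vector (real engines use richer acoustic features and statistical models such as HMMs), and "most likely" is taken to mean "closest", which holds for Gaussian models with equal variances and equal priors. `PHONEME_MODELS` and `match_phoneme` are illustrative names only.

```python
import math

# Hypothetical "database of models": one mean feature vector per phoneme.
# The 2-D values are invented for illustration only.
PHONEME_MODELS = {
    "m": (0.9, 0.1),
    "uw": (0.2, 0.8),
    "n": (0.5, 0.5),
}


def match_phoneme(features, models=PHONEME_MODELS):
    """Return the stored phoneme whose template is closest to `features`.

    Under equal-variance Gaussian models with equal priors, the closest
    template (smallest Euclidean distance) is also the most likely one.
    """
    return min(models, key=lambda ph: math.dist(features, models[ph]))


print(match_phoneme((0.25, 0.75)))  # prints: uw
```

`math.dist` requires Python 3.8 or later.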
COMPONENTS OF SPEECH RECOGNITION
• Corpus collection: a database of speech data built from multiple speech samples.
[Figure: Corpus collection construction for a speaker-dependent SR system]
[Figure: Corpus collection construction for a speaker-independent SR system]
• Signal analyzer: analyses the speech signal and removes background noise, so that only the speaker’s speech is processed.

• Acoustic model: identifies phonemes in the speech sample using a probability-based mathematical model.

ACOUSTIC MODEL
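The signal analyzer's noise-removal step can be approximated, for illustration only, by an energy-based gate: split the signal into short frames and discard frames whose average energy falls below a threshold. This is a deliberately simple, swapped-in technique, not necessarily what any particular SR system does; practical systems use more robust methods (for example spectral subtraction or trained voice-activity detectors) and also extract acoustic features. The frame length, the threshold and the function names are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame.

    A trailing partial frame is ignored.
    """
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def keep_speech_frames(samples, frame_len, threshold):
    """Return only the samples in frames whose energy reaches `threshold`.

    Quieter frames are treated as background noise and dropped.
    """
    kept = []
    for index, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold:
            kept.extend(samples[index * frame_len:(index + 1) * frame_len])
    return kept


# Toy signal: quiet noise, a louder "speech" burst, then quiet noise again.
signal = [0.01, -0.01, 0.01, -0.01,
          0.5, -0.4, 0.6, -0.5,
          0.02, -0.01, 0.01, 0.0]
print(keep_speech_frames(signal, frame_len=4, threshold=0.01))
# prints: [0.5, -0.4, 0.6, -0.5]
```

A fixed threshold only works when the noise is quieter than the speech; that limitation is one reason the challenge of surrounding noise remains hard.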
• Language model: identifies the words, and hence the sentences, uttered by the speaker from the phonemes, using a dictionary file and a grammar file.

[Figure: Dictionary file]

[Figure: Grammar file]
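The language model's use of a dictionary file and a grammar file can be sketched as two lookups: a dictionary that maps pronunciations to words, and a grammar that lists which words may appear in each position. The file contents below are hypothetical, built from the tomato pronunciations and the "I like/hate apple/tomato juice" sentences on the Hidden Markov Model slides, using ARPAbet-style phoneme symbols. The sketch also assumes the phonemes arrive already split into word-sized groups; a real decoder searches over word boundaries and acoustic scores jointly. `DICTIONARY`, `GRAMMAR`, `phonemes_to_words` and `is_grammatical` are illustrative names only.

```python
# Hypothetical dictionary file: each pronunciation (a phoneme tuple) maps to
# a word. Several pronunciations may map to the same word.
DICTIONARY = {
    ("ay",): "i",
    ("l", "ay", "k"): "like",
    ("hh", "ey", "t"): "hate",
    ("ae", "p", "ah", "l"): "apple",
    ("t", "ow", "m", "aa", "t", "ow"): "tomato",  # British English
    ("t", "ah", "m", "ey", "t", "ow"): "tomato",  # American English
    ("jh", "uw", "s"): "juice",
}

# Hypothetical grammar file: the set of allowed words for each position.
GRAMMAR = [{"i"}, {"like", "hate"}, {"apple", "tomato"}, {"juice"}]


def phonemes_to_words(groups):
    """Look up each word-sized phoneme group (None if not in the dictionary)."""
    return [DICTIONARY.get(tuple(group)) for group in groups]


def is_grammatical(words):
    """True if the word sequence fits the grammar position by position."""
    return len(words) == len(GRAMMAR) and all(
        word in allowed for word, allowed in zip(words, GRAMMAR)
    )


heard = [["ay"], ["l", "ay", "k"], ["t", "ah", "m", "ey", "t", "ow"], ["jh", "uw", "s"]]
words = phonemes_to_words(heard)
print(words)                  # prints: ['i', 'like', 'tomato', 'juice']
print(is_grammatical(words))  # prints: True
```

Keeping the grammar as a separate check means a dictionary match such as "i juice like tomato" can still be rejected.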
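PROCESS OF SPEECH RECOGNITION
[Figure: the spoken word "pain" passing through the recognizer]
• Speech analyzer: the spoken input "pain" is captured and cleaned of background noise.
• Acoustic model (a trained hidden Markov model): identifies the phoneme sequence /p/--/ey/--/n/ and checks that it is correct.
• Language model: the dictionary file maps /p/--/ey/--/n/ to the word "pain", and the grammar file confirms that "pain" is valid.
• Text output: pain
[Figure: The grammar file]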
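HIDDEN MARKOV MODEL
• Markov models are an effective way of abstracting simple sequential processes into a form that is relatively easy to compute: the probability of the next state depends only on the current state.
• They are used in applications ranging from data compression to speech recognition.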

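[Figure: three-state Markov graph with states N1, N2, N3]
From this graph we can create sequences such as:
N1 N2 N3
N1 N2 N2 N2 N3 N3 N3 N3 N3
N1 N1 N2 N2 N3

The probability of a sequence is the product of the transition probabilities along it. The factors used below imply N1→N1 = 0.6, N1→N2 = 0.4, N2→N2 = 0.2, N2→N3 = 0.8, N3→N3 = 0.5, and a probability of 0.5 for leaving N3, which ends every sequence:

$$P(N_1 N_2 N_3) = 0.4 \times 0.8 \times 0.5 = 0.16$$

$$P(N_1 N_2 N_2 N_2 N_3 N_3 N_3 N_3 N_3) = 0.4 \times 0.2 \times 0.2 \times 0.8 \times 0.5^5 = 0.0004$$

$$P(N_1 N_1 N_2 N_2 N_3) = 0.6 \times 0.4 \times 0.2 \times 0.8 \times 0.5 = 0.0192$$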
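The worked products can be checked with a short script. This is a sketch under stated assumptions: since the graph itself is not reproduced, the transition probabilities are inferred from the factors used on the slide, every sequence is assumed to start in its first state with probability 1, and the hypothetical `END` state stands for leaving N3, so every product ends with the 0.5 exit factor. `TRANSITIONS` and `sequence_probability` are illustrative names.

```python
# Transition probabilities inferred from the factors in the worked examples
# (assumption: the original graph is not available). "END" is a hypothetical
# marker for leaving N3 at the end of a sequence.
TRANSITIONS = {
    "N1": {"N1": 0.6, "N2": 0.4},
    "N2": {"N2": 0.2, "N3": 0.8},
    "N3": {"N3": 0.5, "END": 0.5},
}


def sequence_probability(states, transitions=TRANSITIONS):
    """Multiply the transition probabilities along `states`, then exit.

    The first state is assumed to be entered with probability 1.
    Transitions missing from the graph contribute probability 0.
    """
    path = list(states) + ["END"]
    prob = 1.0
    for current, nxt in zip(path, path[1:]):
        prob *= transitions[current].get(nxt, 0.0)
    return prob


for seq in ["N1 N2 N3", "N1 N2 N2 N2 N3 N3 N3 N3 N3", "N1 N1 N2 N2 N3"]:
    print(seq, "=", round(sequence_probability(seq.split()), 4))
```

Under this reading a sequence that stops in N1 or N2 gets probability 0, because only N3 has an exit transition.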
This accommodates pronunciations of the word tomato such as:
t ow m aa t ow - British English
t ah m ey t ow - American English
t ah m ey t ah - possible pronunciation when speaking quickly
With sentences such as:
I like apple juice - Very probable
I like tomato juice - Very improbable!
I hate apple juice - Relatively improbable
I hate tomato juice - Relatively probable
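The sentence rankings above can be reproduced with a bigram model, which is a Markov chain over words: each word's probability depends only on the previous word. The probabilities below are hypothetical, chosen only so that the four sentences come out in the same order as on the slide; they are not measured from any corpus, and the sketch omits the end-of-sentence probability that a full n-gram model would include. `BIGRAMS` and `sentence_probability` are illustrative names.

```python
# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the
# start of a sentence. Values are invented to match the slide's ranking.
BIGRAMS = {
    "<s>": {"i": 1.0},
    "i": {"like": 0.5, "hate": 0.5},
    "like": {"apple": 0.9, "tomato": 0.1},
    "hate": {"apple": 0.3, "tomato": 0.7},
    "apple": {"juice": 1.0},
    "tomato": {"juice": 1.0},
}


def sentence_probability(sentence, bigrams=BIGRAMS):
    """Product of bigram probabilities; unseen word pairs get probability 0."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigrams.get(prev, {}).get(word, 0.0)
    return prob


sentences = ["I like apple juice", "I like tomato juice",
             "I hate apple juice", "I hate tomato juice"]
for s in sorted(sentences, key=sentence_probability, reverse=True):
    print(f"{sentence_probability(s):.2f}  {s}")
```

A recognizer that hears something ambiguous between "apple" and "tomato" can use these scores to prefer the more probable sentence.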
• The Markov model makes speech recognition systems more intelligent: by scoring how likely each word sequence is, it helps them distinguish between similar-sounding phrases such as:
"James's school..."
"James is cool"
• In a simple Markov model, the state is directly visible to the observer.
• In a hidden Markov model, the state is not directly visible; only the output, which depends on the state, is observed.
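Because the states of a hidden Markov model cannot be observed directly, a recognizer has to infer the most likely state sequence from the outputs it can see; the standard way to do this is the Viterbi algorithm. The model below is a hypothetical toy: the hidden states are the three phonemes of moon (m, uw, n) in left-to-right order, the observations are two made-up acoustic classes, "nasal" and "vowel", and all probabilities are invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its probability)."""
    # Each column maps state -> (best probability of a path ending there, previous state).
    columns = [{
        s: (start_p.get(s, 0.0) * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = columns[-1]
        column = {}
        for s in states:
            # Best predecessor r for state s at this time step.
            prob, best_prev = max(
                (prev[r][0] * trans_p[r].get(s, 0.0), r) for r in states
            )
            column[s] = (prob * emit_p[s].get(obs, 0.0), best_prev)
        columns.append(column)
    # Choose the most probable final state, then follow the back-pointers.
    last = max(states, key=lambda s: columns[-1][s][0])
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, columns[-1][last][0]


# Hypothetical left-to-right model for the word "moon".
STATES = ["m", "uw", "n"]
START = {"m": 1.0}
TRANS = {"m": {"m": 0.5, "uw": 0.5},
         "uw": {"uw": 0.5, "n": 0.5},
         "n": {"n": 1.0}}
EMIT = {"m": {"nasal": 0.9, "vowel": 0.1},
        "uw": {"nasal": 0.2, "vowel": 0.8},
        "n": {"nasal": 0.9, "vowel": 0.1}}

path, prob = viterbi(["nasal", "vowel", "vowel", "nasal"], STATES, START, TRANS, EMIT)
print(path, round(prob, 4))  # prints: ['m', 'uw', 'uw', 'n'] 0.0648
```

Only the outputs ("nasal", "vowel") are given to the function; the phoneme sequence is recovered from them, which is exactly the hidden/visible distinction described above.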
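PERFORMANCE OF AN SR SYSTEM
• Accuracy is usually rated with the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. Speed is measured with the real-time factor (RTF): processing time divided by the duration of the audio.
• Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).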
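Both metrics are straightforward to compute. WER aligns the recognized words with a reference transcript using edit (Levenshtein) distance over words, and RTF is a simple ratio. A minimal sketch; the function names are illustrative, and the reference transcript is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


print(word_error_rate("james is cool", "james's school"))  # prints: 1.0
print(real_time_factor(2.0, 4.0))                          # prints: 0.5
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many extra words.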

Factors affecting the accuracy of an SR system:
• Vocabulary size and confusability
• Speaker dependence vs. independence
• Isolated, discontinuous, or continuous speech
• Task and language constraints
• Read vs. spontaneous speech
• Adverse conditions
APPLICATIONS
• Health care
• Military
  - High-performance aircraft
  - Air traffic control systems
• Telephony
  - Smartphones
  - Customer helpline services
• Personal computers
SIRI AND GOOGLE NOW

Siri is an intelligent personal assistant developed by Apple.

Google Now is an intelligent personal assistant developed by Google.

Both use a combination of speaker-dependent and speaker-independent SR systems.
CONCLUSION
• Speech recognition systems are an indispensable part of the ever-advancing field of human-computer interaction.
• Further research is needed to tackle the challenges they still face.

Thank You!
