SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Speech Recognition System Major Project On:
Content ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is SRS? Speaker recognition  is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves.
SRS is divided into following two parts: Speech Identification  Speech Verification Speaker identification  is the process of determining which registered speaker provides a given utterance. Speaker verification  , is the process of accepting or rejecting the identity claim of a speaker.
Speech Identification Block Diagram And Description
Speaker Identification:
Feature extraction: It is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching: It involves the actual procedure to identify the unknown speaker by comparing extracted features from his/her voice input with the ones from a set of known speakers. Speech Feature Extraction: The purpose of this module is to convert the speech waveform, using digital signal processing (DSP) tools, to a set of features (at a considerably lower information rate) for further analysis.  This is often referred as the  signal-processing front end .
Mel-frequency cepstrum coefficients processor
Description of MFCP: Frame Blocking: In this step the continuous speech signal is blocked into frames of  N   samples, with adjacent frames being separated by  M  ( M < N ).  The first frame consists of the first  N  samples. Windowing: The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame.  The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame.
Fast Fourier Transform: The next processing step is the Fast Fourier Transform, which converts each frame of  N  samples from the time domain into the frequency domain.  The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT), which is defined on the set of  N  samples { x n }, as follow: Mel Frequency Wrapping: As mentioned above, psychophysical studies have shown that human perception of the frequency contents of sounds for speech signals does not follow a linear scale.  Thus for each tone with an actual frequency,  f , measured in Hz, a subjective pitch is measured on a scale called the ‘Mel’ scale.
The  Mel-frequency  scale is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. Cepstrum: In this final step, we convert the log Mel spectrum back to time.  The result is called the Mel frequency cepstrum coefficients (MFCC).  The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis.
Speech Verification Block Diagram And Description
Speaker Verification :
Speaker Verification is also called as Feature Matching or Pattern Matching. Vector Quantization Method (VQ) is used for high accuracy and ease of implementation. Vector Quantization:  VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space.  Each region is called a  cluster  and can be represented by its center called a  codeword .  The collection of all codeword's is called a  codebook .
Clustering the training Vectors: After the enrolment session, the acoustic vectors extracted from input speech of each speaker provide a set of training vectors for that speaker.  As described above, the next important step is to build a speaker-specific VQ codebook for each speaker using those training vectors.  There is a well-know algorithm, namely LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of  L  training vectors into a set of  M  codebook vectors.
Applications
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
Hugo Moreno
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
ankit_saluja
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
Hira Shaukat
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
Diptimaya Sarangi
 
Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1
Samiul Parag
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
himadrigupta
 

Was ist angesagt? (20)

Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Voice morphing-
Voice morphing-Voice morphing-
Voice morphing-
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
speech processing basics
speech processing basicsspeech processing basics
speech processing basics
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
 
VOICE BROWSER
VOICE BROWSERVOICE BROWSER
VOICE BROWSER
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
 

Andere mochten auch

FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentation
omutukuda
 
10 transformada fourier
10 transformada fourier10 transformada fourier
10 transformada fourier
Alex Jjavier
 
Estudio de mercado galletas de quinua
Estudio de mercado galletas de quinuaEstudio de mercado galletas de quinua
Estudio de mercado galletas de quinua
Armida Sucasaire
 
Introduction to medical transcription
Introduction to medical transcriptionIntroduction to medical transcription
Introduction to medical transcription
jeanrummy
 
Medical Transcription
Medical TranscriptionMedical Transcription
Medical Transcription
aadhar14_b
 

Andere mochten auch (20)

Digital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and PythonDigital signal processing through speech, hearing, and Python
Digital signal processing through speech, hearing, and Python
 
FPGA Architecture Presentation
FPGA Architecture PresentationFPGA Architecture Presentation
FPGA Architecture Presentation
 
What is FPGA?
What is FPGA?What is FPGA?
What is FPGA?
 
Speech Reognition Using FPGA Technology
Speech Reognition Using FPGA TechnologySpeech Reognition Using FPGA Technology
Speech Reognition Using FPGA Technology
 
SoC FPGA Technology
SoC FPGA TechnologySoC FPGA Technology
SoC FPGA Technology
 
Developing an embedded video application on dual Linux + FPGA architecture
Developing an embedded video application on dual Linux + FPGA architectureDeveloping an embedded video application on dual Linux + FPGA architecture
Developing an embedded video application on dual Linux + FPGA architecture
 
FPGA Applications in Finance
FPGA Applications in FinanceFPGA Applications in Finance
FPGA Applications in Finance
 
10 transformada fourier
10 transformada fourier10 transformada fourier
10 transformada fourier
 
Estudio de mercado galletas de quinua
Estudio de mercado galletas de quinuaEstudio de mercado galletas de quinua
Estudio de mercado galletas de quinua
 
Universal Patient Identity: eliminating duplicate records, medical identity t...
Universal Patient Identity: eliminating duplicate records, medical identity t...Universal Patient Identity: eliminating duplicate records, medical identity t...
Universal Patient Identity: eliminating duplicate records, medical identity t...
 
The Impact of Duplicate Medical Records and Overlays on the Healthcare Industry
The Impact of Duplicate Medical Records and Overlays on the Healthcare Industry The Impact of Duplicate Medical Records and Overlays on the Healthcare Industry
The Impact of Duplicate Medical Records and Overlays on the Healthcare Industry
 
Voice & Speech Recognition Technology in Healthcare
Voice &  Speech Recognition Technology in HealthcareVoice &  Speech Recognition Technology in Healthcare
Voice & Speech Recognition Technology in Healthcare
 
Medical Records Destruction Guide
Medical Records Destruction GuideMedical Records Destruction Guide
Medical Records Destruction Guide
 
Introduction to medical transcription
Introduction to medical transcriptionIntroduction to medical transcription
Introduction to medical transcription
 
Translation and Transcription Process | Medical Transcription Service Company
Translation and Transcription Process | Medical Transcription Service Company  Translation and Transcription Process | Medical Transcription Service Company
Translation and Transcription Process | Medical Transcription Service Company
 
Medical Transcription
Medical TranscriptionMedical Transcription
Medical Transcription
 
Noise Adaptive Training for Robust Automatic Speech Recognition
Noise Adaptive Training for Robust Automatic Speech RecognitionNoise Adaptive Training for Robust Automatic Speech Recognition
Noise Adaptive Training for Robust Automatic Speech Recognition
 
What is medical transcription
What is medical transcriptionWhat is medical transcription
What is medical transcription
 
Medical Transcription Power Point Show
Medical Transcription Power Point ShowMedical Transcription Power Point Show
Medical Transcription Power Point Show
 
FPGA
FPGAFPGA
FPGA
 

Ähnlich wie Speech Recognition System By Matlab

Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
phyuhsan
 
Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Data
sipij
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition
sunnysyed
 

Ähnlich wie Speech Recognition System By Matlab (20)

Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
Speaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home ApplicationsSpeaker and Speech Recognition for Secured Smart Home Applications
Speaker and Speech Recognition for Secured Smart Home Applications
 
Speaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract FeaturesSpeaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract Features
 
Joint MFCC-and-Vector Quantization based Text-Independent Speaker Recognition...
Joint MFCC-and-Vector Quantization based Text-Independent Speaker Recognition...Joint MFCC-and-Vector Quantization based Text-Independent Speaker Recognition...
Joint MFCC-and-Vector Quantization based Text-Independent Speaker Recognition...
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
Speaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained DataSpeaker Identification From Youtube Obtained Data
Speaker Identification From Youtube Obtained Data
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnnAutomatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
Automatic speech emotion and speaker recognition based on hybrid gmm and ffbnn
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition
 
A017410108
A017410108A017410108
A017410108
 
A017410108
A017410108A017410108
A017410108
 
A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...A comparison of different support vector machine kernels for artificial speec...
A comparison of different support vector machine kernels for artificial speec...
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
 
Speaker recognition.
Speaker recognition.Speaker recognition.
Speaker recognition.
 
Speaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVMSpeaker Identification & Verification Using MFCC & SVM
Speaker Identification & Verification Using MFCC & SVM
 
Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 

Kürzlich hochgeladen

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Kürzlich hochgeladen (20)

Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 

Speech Recognition System By Matlab

  • 1. Speech Recognition System Major Project On:
  • 2.
  • 3. What is SRS? Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves.
  • 4. SRS is divided into following two parts: Speech Identification Speech Verification Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification , is the process of accepting or rejecting the identity claim of a speaker.
  • 5. Speech Identification Block Diagram And Description
  • 7. Feature extraction: It is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching: It involves the actual procedure to identify the unknown speaker by comparing extracted features from his/her voice input with the ones from a set of known speakers. Speech Feature Extraction: The purpose of this module is to convert the speech waveform, using digital signal processing (DSP) tools, to a set of features (at a considerably lower information rate) for further analysis. This is often referred as the signal-processing front end .
  • 8.
  • 10. Description of MFCP: Frame Blocking: In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames being separated by M ( M < N ). The first frame consists of the first N samples. Windowing: The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame.
  • 11. Fast Fourier Transform: The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT), which is defined on the set of N samples { x n }, as follow: Mel Frequency Wrapping: As mentioned above, psychophysical studies have shown that human perception of the frequency contents of sounds for speech signals does not follow a linear scale. Thus for each tone with an actual frequency, f , measured in Hz, a subjective pitch is measured on a scale called the ‘Mel’ scale.
  • 12. The Mel-frequency scale is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. Cepstrum: In this final step, we convert the log Mel spectrum back to time. The result is called the Mel frequency cepstrum coefficients (MFCC). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis.
  • 13. Speech Verification Block Diagram And Description
  • 15. Speaker Verification is also called as Feature Matching or Pattern Matching. Vector Quantization Method (VQ) is used for high accuracy and ease of implementation. Vector Quantization: VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center called a codeword . The collection of all codeword's is called a codebook .
  • 16.
  • 17. Clustering the training Vectors: After the enrolment session, the acoustic vectors extracted from input speech of each speaker provide a set of training vectors for that speaker. As described above, the next important step is to build a speaker-specific VQ codebook for each speaker using those training vectors. There is a well-know algorithm, namely LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors.
  • 18.
  • 20.