Organization: Speech Processing
Prerequisites
Introduction
Speech Production
Representation of Speech Signals
Speech Processing
Govind
Center for Computational Engineering & Networking
Amrita Vishwa Vidyapeetham
Govind CEN, Amrita Vishwa Vidyapeetham
Outline
Introduction
Human Speech Production and Perception Systems
Representation of Speech in the Time and Frequency Domains
Speech Sounds and Features
Signal Processing Methods for Estimating Speech Features
Speech Processing Applications
Speech Recognition
Speech Synthesis
Prerequisites: S&S, DSP & ADSP
Prior Knowledge Required:
Signals and Systems
Digital Signal Processing
Advanced DSP
Signals and Systems
Classification of Signals
LTI systems
Correlation/Convolution Operations
Fourier Representation: FS, DTFS, DTFT, DFT, FFT, Z-transform
Concepts of Impulse Response, Frequency Response etc.
Digital Signal Processing
Sampling: Nyquist, Aliasing
FFT implementation of DFT
Design of FIR and IIR filters
Structures for realization of Filters
Multirate signal processing: Filter banks
Advanced DSP
Time-Frequency Analysis
TFA by STFT
TFA by Wigner Distributions
TFA by Wavelets
References
L. Rabiner, B.-H. Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education Inc., 2009
Douglas O'Shaughnessy, "Speech Communication", University Press, 2001
Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Pearson Education Inc., 2004
Introduction
Information in Speech
Message
Language
Accent
Speaker
Emotions/Stress
Applications
Recognition
Speech recognition
Speaker Recognition/Verification
Emotion Recognition, etc.
Synthesis
Text to Speech Synthesis
Speech Enhancement
Voice Conversion
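Applications: Recognition
Columns: Speech, Objective, Information Extracted
Objective: Message; Information extracted: "Author of the danger..."
Objective: Speaker; Information extracted: "It's Govind speaking"
Objective: Speaker claim has to be verified; Information extracted: "Hi Govind, your claim is accepted"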
Applications: Synthesis
Columns: Input, Objective, Output
Text-to-Speech Synthesis
Input: Text ("Epochs Occur..."); Objective: Synthesize Text
Speech Enhancement
Objectives: Remove noise; Remove reverberation; Enhance the desired speaker's speech
Voice Conversion
Objective: Convert the source speaker's speech to the target speaker's speech
What makes automatic processing of speech complicated?
It is an interdisciplinary area:
1 Signal Processing: The process of extracting relevant information from the speech signal
2 Physics: The science of understanding the relationship between the physical speech signal and the physiological mechanisms that produced it
3 Pattern Recognition: Grouping or classifying patterns of various events in speech
4 Communication and Information Theory: Efficient ways of encoding and decoding the parameters of speech, and efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.)
5 Linguistics: The relationship between sounds (phonology), the syntax and semantics of a language, and the sense derived from the meaning (pragmatics)
6 Computer Science: The study of different algorithms for implementation in software or hardware
7 Psychology: Understanding the psychological state of the speaker or listener helps in tasks such as emotion analysis
Speaker-Listener Schematic Diagram in Speech Communication
Figure: Schematic Diagram of Speech Communication: Figure Courtesy- Rabiner et al.
Production-Perception Block Diagram
Production chain: Message Formulation -> Language Code -> Neuro-Muscular Controls -> Vocal Tract System -> Acoustic Waveform
Representations along the production chain: Text; Phonemes, Prosody; Articulatory Motion
Transmission Channel
Perception chain: Acoustic Waveform -> Basilar Membrane Motion -> Neural Transduction -> Language Translation -> Message Understanding
Representations along the perception chain: Spectrum Analysis; Feature Extraction, Coding; Phonemes, Words, Sentences; Semantics
Figure: Speech production Block Diagram: Figure Courtesy- Rabiner et al.
Speech Production
Figure: Speech production mechanism: Figure Courtesy- Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi
Mechanical Equivalent of Speech Production System
Figure: Speech production mechanism: Figure Courtesy- Rabiner et al.
Spectro-Temporal Representation
classification of Phonemes
Representation of Speech Signal
Figure: Speech Signal in Time domain
Glottal Air Flow During Speech Production
Figure: Glottal air flow: Courtesy- Rabiner et al.
Glottal Air Flow: Graphical Illustration
Figure: Two panels, "Speech Waveform" and "Glottal Flow: EGG", each plotting Amplitude against Time (Samples); annotations: Speech, EGG, Glottis Vibration
Classification of Speech Sounds
Silence (S): No speech is produced
Unvoiced (U): Vocal folds are not vibrating
Voiced (V): Periodic vibration of the vocal folds
Figure: Speech signal in time domain, with regions labelled U, S and V
Separation of voiced sounds from unvoiced sounds and silence is known as voiced-non-voiced detection
Issues in voiced-non-voiced detection:
Difficult to distinguish weak unvoiced sounds from silence
Difficult to distinguish weakly periodic voiced sounds from unvoiced sounds
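The three-way labelling above can be illustrated with a minimal sketch. The slides do not name a detection method; this example assumes a common textbook approach based on short-time energy and zero-crossing rate. The function names, frame sizes and the thresholds `energy_floor` and `zcr_split` are hypothetical values chosen for a signal normalized to [-1, 1], not values from the lecture.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Cut x into overlapping frames of frame_len samples, hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def classify_frames(x, fs, frame_ms=20, hop_ms=10,
                    energy_floor=1e-4, zcr_split=0.25):
    """Label each frame 'S' (silence), 'U' (unvoiced) or 'V' (voiced).

    Assumed rule (illustrative only): very low energy -> silence;
    otherwise a high zero-crossing rate -> unvoiced, a low one -> voiced.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)

    # Short-time energy: mean squared amplitude per frame.
    energy = np.mean(frames ** 2, axis=1)

    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    return np.where(energy < energy_floor, "S",
                    np.where(zcr > zcr_split, "U", "V"))
```

Both issues listed above show up directly in such a rule: a weak unvoiced sound can fall below `energy_floor` and be labelled silence, and a weakly periodic voiced sound can have a zero-crossing rate close to `zcr_split`.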
Spectrograms: Narrow-band & Wide-band
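The difference between the two spectrogram types comes from the analysis window length: a long window gives fine frequency resolution (narrow-band), a short window gives fine time resolution (wide-band). The sketch below is a plain NumPy short-time Fourier transform, not code from the lecture; the function name `spectrogram`, the Hann window, and the 40 ms and 4 ms window lengths in the usage note are assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(x, fs, win_ms, hop_ms=2.0, n_fft=1024):
    """Log-magnitude spectrogram in dB, shape (n_frames, n_fft // 2 + 1).

    win_ms sets the analysis window length: long windows give a
    narrow-band spectrogram, short windows a wide-band one.
    """
    win_len = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    window = np.hanning(win_len)

    # Slice the signal into overlapping, windowed frames.
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])

    # Zero-pad each frame to n_fft and keep the non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return 20 * np.log10(magnitude + 1e-10)
```

For example, with `fs = 8000`, `spectrogram(x, fs, win_ms=40)` would resolve two tones 300 Hz apart as separate horizontal lines, while `spectrogram(x, fs, win_ms=4)` would smear them into one band but follow rapid changes in time more closely.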
Spectral Envelope from a Long Segment of Speech
Figure: Three-dimensional plot with axes Frame Index, Frequency (Hz) and Magnitude
Classification of sound units
Phonemes
Vowels (Front, Mid, Back): i (eve), I (it), e (hate), ε (met), U, u (boot), Λ (up), a (father), o (obey), ɔ (all)
Diphthongs: ay (buy), aw (down), ey (bait), O (boy)
Semi-Vowels (Liquids, Glides): l (large), r (run), w (wit), y (you)
Consonants
Nasals: m (met), n (net), ng (sing)
Plosives, Voiced: b (ball), d (debt), g (get)
Plosives, Unvoiced: k (kit), p (pen), t (ten)
Fricatives, Voiced: v (vat), dh (that), z (zoo)
Fricatives, Unvoiced: f (fun), th (thing), s (sat), sh (should)
Whispers: h (he)
Affricates: tz (sports), jh (judge), ch (church)
Representation of sound units in speech
Sounds are classified into vowels and consonants
Vowels: Produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses
Vowels are classified into front, mid and back based on the tongue hump position
Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate")
Mid vowels: /a/ ("father"), /Λ/ ("up")
Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey")
Another classification is based on the length of vowels: long and short
Diphthongs: Combination of two vowels
/ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /ɔy/ as in "boy", etc.
Front Vowels
Figure: Speech signal and spectrogram for the front vowels I (it), e (hate) and i (eve)
Vowel Analysis
Front vowels are found to show high-frequency resonances
Front vowels are discriminated from each other by the tongue height during vowel production
Mid vowels are found to show a well-separated and balanced distribution of resonant frequencies
Back vowels show almost no energy beyond the low-frequency region
Diphthongs
Semivowels
Group of sounds consisting of /w/, /r/, /l/, /y/
Difficult to characterize because they are vowel-like in nature
Characterized by a gliding transition in the vocal tract area function between adjacent phonemes
Best described as transitional, vowel-like sounds
Nasal Consonants
Group of sounds consisting of /m/, /n/, /ŋ/
Produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway
The velum is lowered so that air flows through the nasal cavity while the oral passage stays blocked
Because the closed oral cavity remains acoustically coupled to the pharynx, anti-resonances are created
/m/, /n/ and /ŋ/ are produced by a constriction at the lips, behind the teeth and at the velum, respectively
Nasalized Vowels
Unvoiced Fricatives
Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction
/f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") are the unvoiced fricative sounds
/f/: Constriction at the teeth
/s/: Constriction near the middle of the oral cavity
/sh/: Constriction at the end of the oral tract
Voiced Fricatives
/v/ ("vat"), /ð/ ("that"), /z/ ("zoo") and /zh/ ("azure") are the voiced fricative sounds
/v/: Constriction at the teeth
/z/: Constriction near the middle of the oral cavity
/zh/: Constriction at the end of the oral tract
Apart from the presence of glottal vibration, the place of articulation is the same as for the corresponding unvoiced fricatives
Govind CEN, Amrita Vishwa Vidyapeetham

Weitere ähnliche Inhalte

Ähnlich wie Speech processinglecworkshop

Effective Presentation: Multimedia
Effective Presentation: MultimediaEffective Presentation: Multimedia
Effective Presentation: Multimedia
Alaa Sadik
 

Ähnlich wie Speech processinglecworkshop (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Gujarati Text-to-Speech Presentation
Gujarati Text-to-Speech PresentationGujarati Text-to-Speech Presentation
Gujarati Text-to-Speech Presentation
 
Dy36749754
Dy36749754Dy36749754
Dy36749754
 
Enterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A PrimerEnterprise Voice Technology Solutions: A Primer
Enterprise Voice Technology Solutions: A Primer
 
Web AI.pptx
Web AI.pptxWeb AI.pptx
Web AI.pptx
 
Effective Presentation: Multimedia
Effective Presentation: MultimediaEffective Presentation: Multimedia
Effective Presentation: Multimedia
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Voiceover sevices
Voiceover sevicesVoiceover sevices
Voiceover sevices
 
Instructional Design - Unit 2
Instructional Design - Unit 2Instructional Design - Unit 2
Instructional Design - Unit 2
 
visH (fin).pptx
visH (fin).pptxvisH (fin).pptx
visH (fin).pptx
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
[DL輪読会]IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION
 
Seminar
SeminarSeminar
Seminar
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Video conferencing
Video conferencingVideo conferencing
Video conferencing
 
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdfSpeechbird AI Review – Unleashing the Power of Speech Recognition.pdf
Speechbird AI Review – Unleashing the Power of Speech Recognition.pdf
 
Automated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test MeasurementAutomated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test Measurement
 
Automated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test MeasurementAutomated Voice And Audio Quality Test Measurement
Automated Voice And Audio Quality Test Measurement
 

Kürzlich hochgeladen

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Christo Ananth
 

Kürzlich hochgeladen (20)

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Speech processinglecworkshop

  • 1. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Speech Processing Govind Center for Computational Engineering & Networking Amrita Vishwa Vidyapeetham Govind CEN, Amrita Vishwa Vidyapeetham
  • 2. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Outline Introduction Human Speech Production and Perception Systems Representation of Speech in the Time and Frequency Domains Speech Sounds and Features Signal Processing Methods for Estimating Speech Features Speech Processing Applications Speech Recognition Speech Synthesis Govind CEN, Amrita Vishwa Vidyapeetham
  • 3. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Prior Knowledge Required: Signals and Systems Digital signal Processing Advanced DSP Govind CEN, Amrita Vishwa Vidyapeetham
  • 4. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Signals and Systems Classification of Signals LTI systems Correlation/Convolution Operations Fourier Representation: FS, DTFS, DTFT,DFT,FFT, Z-transform Concepts of Impulse Response, Frequency Response etc. Govind CEN, Amrita Vishwa Vidyapeetham
  • 5. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Digital signal Processing Sampling: Nyquist, Aliasing FFT implementation of DFT Design of FIR and IIR filters Structures for realization of Filters Multirate signal processing: Filter banks Govind CEN, Amrita Vishwa Vidyapeetham
  • 6. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP Advanced DSP Time-Frequency Analysis TFA by STFT TFA by wigner Distribututions TFA by Wavelets Govind CEN, Amrita Vishwa Vidyapeetham
  • 7. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Prerequisites: S&S, DSP & ADSP References L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana,"Fundamentals of Speech Recognition",Pearson Education Inc.2009 Douglas O’Shaughnessy,"Speech Communication",University Press,2001 Thomas F Quatieri,"Discrete Time Speech Signal Processing", Pearson Education Inc.,2004 Govind CEN, Amrita Vishwa Vidyapeetham
  • 8. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Introduction Information in Speech Message Language Accent Speaker Emotions/Stress Applications Recognition Speech recognition Speaker Recognition/Verification Emotion Recognition etc.. Synthesis Text to Speech Synthesis Speech Enhancement Voice Conversion Govind CEN, Amrita Vishwa Vidyapeetham
  • 9. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Applications:Recognition Speech Objective Information Extracted Message Author of the danger... Speaker Its Govind Speaking Speaker claim has to be verified Hi Govind, your claim is ac- cepted Govind CEN, Amrita Vishwa Vidyapeetham
  • 10. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Applications:Synthesis Input Objective Output Text To Speech Synthesis Text (Epochs Occur... Synthesize Text Speech Enhancement Remove noise Remove reverberation Enhance desired speaker speech Voice Conversion Convert source speaker speech target speakr speech Govind CEN, Amrita Vishwa Vidyapeetham
  • 11. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals What makes automatic processing of speech Complicated? Its an inter-disciplinary area 1 Signal Processing: The process of extracting relevant information from speech signal 2 Physics: The science of understanding relationship between physical speech signal and physiological mechanisms that produced it. 3 Pattern Recognition: Grouping or classifying patterns of various events in speech 4 Communication and information theory: Deals with efficient way of encodng or decoding parameters of speech, efficient serach for patterns of interest in speech (dynamic programming, viterbi search, stack algorithms etc..) 5 Linguistics: The relationship between sounds (phonology) with syntax and semantics of a language and sense that derived from the meaning (pragmatics) 6 Computer Science: The study of diferent algorithms for implementing in Software/Hardware 7 Psychology: Understanding the psychological state of the speaker/listener will be helpful for the tasks like emotion analysis. Govind CEN, Amrita Vishwa Vidyapeetham
  • 12. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Speaker-Listener Schematic Diagram in Speech Communication Figure: Schematic Diagram of Speech Communication: Figure Courtesy- Rabiner et al. Govind CEN, Amrita Vishwa Vidyapeetham
  • 13. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Production-Perception Block Diagram DĞƐƐĂŐĞ &ŽƌŵƵůĂƚŝŽŶ >ĂŶŐƵĂŐĞ ŽĚĞ EĞƵƌŽͲ DƵƐĐƵůĂƌ ŽŶƚƌŽůƐ sŽĐĂů dƌĂĐƚ ^LJƐƚĞŵ ĐŽƵƐƚŝĐ tĂǀĞĨŽƌŵ dƌĂŶƐŵŝƐƐŝŽŶ ŚĂŶŶĞů ĐŽƵƐƚŝĐ tĂǀĞĨŽƌŵ DĞƐƐĂŐĞ hŶĚĞƌƐƚĂŶĚŝŶŐ >ĂŶŐƵĂŐĞ dƌĂŶƐůĂƚŝŽŶ EĞƵƌĂů dƌĂŶƐĚƵĐƚŝŽŶ ĂƐŝůĂƌ DĞŵďƌĂŶĞ DŽƚŝŽŶ dĞdžƚ WŚŽŶĞŵĞƐͲ WƌŽƐŽĚLJ ƌƚŝĐƵůĂƚŽƌLJ DŽƚŝŽŶ ^ĞŵĂŶƚŝĐƐ WŚŽŶĞŵĞƐ tŽƌĚƐ ^ĞŶƚĞŶĐĞƐ &ĞĂƚƵƌĞ džƚƌĂĐƚŝŽŶ ŽĚŝŶŐ ^ƉĞĐƚƌƵŵ ŶĂůLJƐŝƐ Figure: Speech production BlockDiagram: Figure Courtesy- Rabiner et al. Govind CEN, Amrita Vishwa Vidyapeetham
  • 14. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Speech Production Figure: Speech production mechanism: Figure Courtesy- Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter. 3, pp. 58, Pearson Edu., Delhi Govind CEN, Amrita Vishwa Vidyapeetham
  • 15. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Mechanical Equivalent of Speech Production System Figure: Speech production mechanism: Figure Courtesy- Rabiner et al. Govind CEN, Amrita Vishwa Vidyapeetham
  • 16. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes Representation of Speech Signal 0 0.5 1 1.5 2 2.5 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 Figure: Speech Signal in Time domain Govind CEN, Amrita Vishwa Vidyapeetham
  • 17. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes Glottal Air Flow During Speech Production Figure: Glottal air flow: Courtesy- Rabinar et al. Govind CEN, Amrita Vishwa Vidyapeetham
  • 18. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes Glottal Air Flow: Graphical Illustration 1.3 1.35 1.4 1.45 1.5 1.55 x 10 4 −1 −0.5 0 0.5 Time (Samples) Amplitude Speech Waveform 1.3 1.35 1.4 1.45 1.5 1.55 x 10 4 −1 −0.5 0 0.5 Time (Samples) Amplitude Glottal Flow: EGG Speech EGG Glottis Vibration Govind CEN, Amrita Vishwa Vidyapeetham
  • 19. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes Classification of Speech Sounds Silence (S): No Speech is produced Unvoiced (U): Vocal folds are not vibrating Voiced (V): Periodic vibration of vocal cords 0 0.5 1 1.5 2 2.5 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 US S V V V Figure: Speech signal in time domainGovind CEN, Amrita Vishwa Vidyapeetham
  • 20. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes Classification of Speech Sounds Separation of voiced sounds from unvoiced and silence sounds is known as voiced-non-voiced detection Issues in voiced-non-voiced detection: Difficult to identify weak unvoiced sound from silence Difficult to distinguish weakly periodic voiced sounds from unvoiced sounds Govind CEN, Amrita Vishwa Vidyapeetham
  • 21. Organization: Speech Processing Prerequisites Introduction Speech Production Representation of Speech Signals Spectro-Temporal Representation classification of Phonemes SpectroGrams: Narrow-band & Wide-band Govind CEN, Amrita Vishwa Vidyapeetham
  • 22. Spectral Envelope from a Long Segment of Speech. Figure: magnitude plotted against frame index and frequency (Hz).
  • 23. Classification of sound units. Phonemes are grouped into vowels, affricates, diphthongs, semi-vowels, and consonants. Vowels (front, mid, back): i (eve), I (it), e (hate), [illegible] (met), U ([illegible]k), u ([illegible]t), [illegible] (up), a (father), o (obey), c (all). Diphthongs: ay (buy), aw (down), ey (bait), O (boy). Affricates: tz (sports), jh (judge), ch (church). Semi-vowels: liquids l (large), r (run); glides w (wit), y (you). Consonants: nasals m (met), n (net), ng (sing); whispers h (he); plosives, voiced b (ball), d (debt), g (get) and unvoiced k (kit), p (pen), t (ten); fricatives, voiced v (vat), dh (that), z (zoo) and unvoiced f (fun), th (thing), s (sat), sh (should).
  • 24. Representation of sound units in speech. Sounds are classified into vowels and consonants. Vowels are produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses. Vowels are classified as front, mid, or back according to the position of the tongue hump. Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate"). Mid vowels: /a/ ("father"), /Λ/ ("up"). Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey"). Another classification is based on vowel length: long and short. Diphthongs combine two vowels: /ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /cy/ as in "boy", etc.
  • 25. Front Vowels. Figure: speech signal and spectrogram for the front vowels I (it), e (hate), and i (eve).
  • 26. Vowel Analysis. Front vowels show high-frequency resonances. Front vowels are distinguished from one another by the tongue height during production. Mid vowels show a well-separated and balanced distribution of resonant frequencies. Back vowels show almost no energy beyond the low-frequency region.
  • 27. Diphthongs
  • 28. Semivowels. The group of sounds consisting of /w/, /r/, /l/, and /y/. They are difficult to characterize because they are vowel-like in nature. They are characterized by a gliding transition in the vocal tract area function between adjacent phonemes. They are best described as transitional, vowel-like sounds.
  • 29. Nasal Consonants. The group of sounds consisting of /m/, /n/, and /ŋ/. They are produced with glottal excitation and the vocal tract totally constricted at some point along the oral passageway. The velum is lowered so that air flows through the nasal cavity while the oral passage is blocked. Because the closed oral cavity remains acoustically coupled to the pharynx, anti-resonances are created. /m/, /n/, and /ŋ/ are produced with the constriction at the lips, behind the teeth, and at the velum, respectively.
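In transfer-function terms, a resonance corresponds to a pole pair and an anti-resonance to a zero pair, so the effect mentioned on slide 29 can be pictured as a dip in the magnitude response. The sketch below evaluates a second-order pole-zero filter on the unit circle to show such a dip. The 300 Hz pole frequency, the 1000 Hz zero frequency, the radius 0.97, and the helper names are hypothetical choices for illustration and are not measurements of any real nasal sound.

```python
import numpy as np

def pair_coeffs(freq_hz, radius, fs):
    """Coefficients of 1 - 2 r cos(w) z^-1 + r^2 z^-2 for a conjugate root pair."""
    w = 2 * np.pi * freq_hz / fs
    return np.array([1.0, -2.0 * radius * np.cos(w), radius ** 2])

def magnitude_db(b, a, freqs_hz, fs):
    """Evaluate |B(z) / A(z)| in dB on the unit circle at the given frequencies."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs_hz) / fs)
    num = np.polyval(b[::-1], z_inv)   # b[0] + b[1] z^-1 + b[2] z^-2
    den = np.polyval(a[::-1], z_inv)
    return 20 * np.log10(np.abs(num / den))

fs = 8000
freqs = np.arange(50, 4000, 10)
a = pair_coeffs(300, 0.97, fs)     # hypothetical resonance (pole pair)
b = pair_coeffs(1000, 0.97, fs)    # hypothetical anti-resonance (zero pair)
resp = magnitude_db(b, a, freqs, fs)

dip_hz = freqs[np.argmin(resp)]    # frequency of the spectral dip
peak_hz = freqs[np.argmax(resp)]   # frequency of the spectral peak
```

The response peaks near the pole frequency and dips near the zero frequency, which is the spectral signature that a purely all-pole vocal tract model cannot reproduce.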
  • 30. Nasalized Vowels
  • 31. Unvoiced Fricatives. Produced by exciting the vocal tract with turbulent airflow through a narrow constriction. /f/ ("four"), /θ/ ("thing"), /s/ ("sat"), and /sh/ ("shut") form the class of unvoiced fricatives. /f/: constriction at the teeth. /s/: constriction near the middle of the oral cavity. /sh/: constriction at the end of the oral tract.
  • 32. Voiced Fricatives. /v/ ("vat"), /δ/ ("that"), /z/ ("zoo"), and /zh/ ("azure") form the class of voiced fricatives. /v/: constriction at the teeth. /z/: constriction near the middle of the oral cavity. /zh/: constriction at the end of the oral tract. Apart from the added glottal vibration, the place of articulation is the same as for the corresponding unvoiced fricatives.