2. Organization: Speech Processing
Prerequisites
Introduction
Speech Production
Representation of Speech Signals
Outline
Introduction
Human Speech Production and Perception Systems
Representation of Speech in the Time and Frequency Domains
Speech Sounds and Features
Signal Processing Methods for Estimating Speech Features
Speech Processing Applications
Speech Recognition
Speech Synthesis
Govind CEN, Amrita Vishwa Vidyapeetham
3. Organization: Speech Processing
Prerequisites: S&S, DSP & ADSP
Prior Knowledge Required:
Signals and Systems
Digital Signal Processing
Advanced DSP
4. Organization: Speech Processing
Prerequisites: S&S, DSP & ADSP
Signals and Systems
Classification of Signals
LTI systems
Correlation/Convolution Operations
Fourier Representation: FS, DTFS, DTFT, DFT, FFT, Z-transform
Concepts of Impulse Response, Frequency Response etc.
5. Organization: Speech Processing
Prerequisites: S&S, DSP & ADSP
Digital Signal Processing
Sampling: Nyquist, Aliasing
FFT implementation of DFT
Design of FIR and IIR filters
Structures for realization of Filters
Multirate signal processing: Filter banks
6. Organization: Speech Processing
Prerequisites: S&S, DSP & ADSP
Advanced DSP
Time-Frequency Analysis
TFA by STFT
TFA by Wigner Distributions
TFA by Wavelets
7. Organization: Speech Processing
Prerequisites: S&S, DSP & ADSP
References
L. Rabiner, Biing-Hwang Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson Education Inc., 2009
Douglas O’Shaughnessy, "Speech Communication", University Press, 2001
Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Pearson Education Inc., 2004
8. Organization: Speech Processing
Introduction
Information in Speech
Message
Language
Accent
Speaker
Emotions/Stress
Applications
Recognition
Speech recognition
Speaker Recognition/Verification
Emotion Recognition, etc.
Synthesis
Text to Speech Synthesis
Speech Enhancement
Voice Conversion
9. Organization: Speech Processing
Applications: Recognition
Objective and information extracted from the input speech:
Message: "Author of the danger..."
Speaker: "It's Govind speaking"
Speaker claim has to be verified: "Hi Govind, your claim is accepted"
10. Organization: Speech Processing
Applications: Synthesis
Text-to-Speech Synthesis
Input: Text (Epochs Occur...); objective: synthesize the text as speech
Speech Enhancement
Objectives: remove noise, remove reverberation, enhance the desired speaker's speech
Voice Conversion
Objective: convert the source speaker's speech to the target speaker's speech
11. Organization: Speech Processing
What makes automatic processing of speech complicated?
It is an interdisciplinary area:
1 Signal Processing: The process of extracting relevant information from the speech signal
2 Physics: The science of understanding the relationship between the physical speech signal and the physiological mechanisms that produced it
3 Pattern Recognition: Grouping or classifying patterns of various events in speech
4 Communication and Information Theory: Deals with efficient ways of encoding and decoding the parameters of speech, and with efficient search for patterns of interest in speech (dynamic programming, Viterbi search, stack algorithms, etc.)
5 Linguistics: The relationship between sounds (phonology), the syntax and semantics of a language, and the sense derived from the meaning (pragmatics)
6 Computer Science: The study of different algorithms for implementation in software/hardware
7 Psychology: Understanding the psychological state of the speaker/listener is helpful for tasks such as emotion analysis
12. Organization: Speech Processing
Speaker-Listener Schematic Diagram in Speech Communication
Figure: Schematic diagram of speech communication. Courtesy: Rabiner et al.
14. Organization: Speech Processing
Speech Production
Figure: Speech production mechanism. Courtesy: Thomas F. Quatieri, "Discrete-Time Speech Signal Processing", Chapter 3, p. 58, Pearson Edu., Delhi
15. Organization: Speech Processing
Mechanical Equivalent of Speech Production System
Figure: Speech production mechanism. Courtesy: Rabiner et al.
16. Organization: Speech Processing
Spectro-Temporal Representation
classification of Phonemes
Representation of Speech Signal
Figure: Speech signal in time domain
17. Organization: Speech Processing
Glottal Air Flow During Speech Production
Figure: Glottal air flow. Courtesy: Rabiner et al.
18. Organization: Speech Processing
Glottal Air Flow: Graphical Illustration
Figure: Two panels, "Speech Waveform" and "Glottal Flow: EGG", each plotting Amplitude against Time (Samples) over samples 1.3 x 10^4 to 1.55 x 10^4. Labels: Speech, EGG, Glottis Vibration.
19. Organization: Speech Processing
Classification of Speech Sounds
Silence (S): No Speech is produced
Unvoiced (U): Vocal folds are not vibrating
Voiced (V): Periodic vibration of vocal cords
Figure: Speech signal in time domain, with regions labelled U, S and V
20. Organization: Speech Processing
Classification of Speech Sounds
Separation of voiced sounds from unvoiced sounds and silence is known as voiced-non-voiced detection
Issues in voiced-non-voiced detection:
Difficult to distinguish weak unvoiced sounds from silence
Difficult to distinguish weakly periodic voiced sounds from unvoiced sounds
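The slides do not name a detection method. As a minimal sketch, one common textbook approach labels each short frame from its short-time energy and zero-crossing rate. The frame length, both thresholds, the synthetic test frames and the function names below are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random

def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) of one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_split=0.25):
    """Label a frame 'S', 'U' or 'V'. Both thresholds are illustrative assumptions."""
    energy, zcr = frame_features(frame)
    if energy < energy_floor:
        return "S"                      # too little energy: treat as silence
    return "U" if zcr > zcr_split else "V"  # noise-like frames cross zero often

fs = 8000                 # assumed sampling rate in Hz
n = 240                   # 30 ms frame at 8 kHz
rng = random.Random(0)

# Synthetic stand-ins for real speech frames (assumptions, not recorded data).
silence = [0.001 * rng.gauss(0, 1) for _ in range(n)]
unvoiced = [0.1 * rng.gauss(0, 1) for _ in range(n)]
voiced = [0.5 * math.sin(2 * math.pi * 120 * t / fs) for t in range(n)]

labels = [classify_frame(f) for f in (silence, unvoiced, voiced)]
print(labels)
```

Scaling the unvoiced frame down until its energy approaches `energy_floor` makes it fall into the silence class, which is the first issue listed above.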
24. Organization: Speech Processing
Representation of sound units in speech
Sounds are classified into vowels and consonants
Vowels: Produced by exciting a fixed vocal tract shape with quasi-periodic glottal pulses
Vowels are classified into front, mid and back based on the tongue hump position
Front vowels: /i/ ("eve"), /I/ ("it"), /æ/ ("at"), /e/ ("hate")
Mid vowels: /a/ ("father"), /ʌ/ ("up")
Back vowels: /U/ ("foot"), /u/ ("boot"), /o/ ("obey")
Another classification is based on the length of vowels: long and short
Diphthongs: Combination of two vowels
/ay/ as in "buy", /aw/ as in "down", /ey/ as in "bait", /o/ as in "boat", /ɔy/ as in "boy", etc.
26. Organization: Speech Processing
Vowel Analysis
Front vowels are found to show high-frequency resonances
Front vowels are discriminated from each other by the tongue height during vowel production
Mid vowels are found to show a well-separated and balanced distribution of resonant frequencies
Back vowels show almost no energy beyond the low-frequency region
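The link between a fixed vocal tract shape and the resonances seen in a vowel spectrum can be illustrated with a small source-filter sketch: a quasi-periodic impulse train is passed through a cascade of two-pole resonators, and the spectrum is then probed near and away from the resonances. The sampling rate, pitch, resonance frequencies and bandwidths are assumed values for illustration only, not measurements from the slides, and the helper names are hypothetical.

```python
import cmath
import math

def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)          # pole radius from bandwidth
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)   # pole angle from frequency
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def dft_magnitude(signal, freq_hz, fs):
    """Magnitude of the discrete-time Fourier sum at a single frequency."""
    w = 2 * math.pi * freq_hz / fs
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(signal)))

fs = 8000            # assumed sampling rate in Hz
pitch_hz = 100       # assumed pitch, so one pulse every 80 samples
n_samples = 1600     # 200 ms of signal

# Source: idealized glottal pulses as a unit impulse train.
excitation = [1.0 if n % (fs // pitch_hz) == 0 else 0.0 for n in range(n_samples)]

# Filter: fixed "vocal tract" with two assumed resonances (frequency, bandwidth).
formants = [(700, 80), (1100, 90)]
vowel = excitation
for freq, bw in formants:
    vowel = resonator(vowel, freq, bw, fs)

near_f1 = dft_magnitude(vowel, 700, fs)
below_f1 = dft_magnitude(vowel, 400, fs)
near_f2 = dft_magnitude(vowel, 1100, fs)
far_above = dft_magnitude(vowel, 2000, fs)
print(near_f1 > below_f1, near_f2 > far_above)
```

Moving the assumed resonance frequencies up or down shifts where the spectral energy concentrates, which is the kind of difference the front, mid and back vowel descriptions above refer to.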
28. Organization: Speech Processing
Semivowels
Group of sounds consisting of /w/, /r/, /l/, /y/
Difficult to characterize because they are vowel-like in nature
Characterized by a gliding transition in the vocal tract area function between adjacent phonemes
Best described as transitional, vowel-like sounds
29. Organization: Speech Processing
Nasal Consonants
Group of sounds consisting of /m/, /n/, /ŋ/
Produced with glottal excitation and the vocal tract totally constricted along the oral passageway
The velum is lowered so that air flows through the nasal cavity while the oral passage is blocked
Due to the acoustic coupling of the closed oral cavity to the pharynx, anti-resonances are created
/m/, /n/ and /ŋ/ are produced by a constriction at the lips, behind the teeth and at the velum, respectively.
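An anti-resonance can be modelled as a pair of zeros in the transfer function, in the same way that a resonance is modelled as a pair of poles. The sketch below evaluates the magnitude response of a filter with one pole pair and one zero pair and shows the dip at the zero frequency. The frequencies, bandwidths and the function name are assumptions for illustration, not values for any particular nasal.

```python
import cmath
import math

def response_magnitude(freq_hz, fs, pole, zero):
    """|H| at freq_hz for one conjugate pole pair and one conjugate zero pair.

    pole and zero are (centre frequency in Hz, bandwidth in Hz) tuples.
    """
    z = cmath.exp(1j * 2 * math.pi * freq_hz / fs)

    def pair(freq, bandwidth):
        r = math.exp(-math.pi * bandwidth / fs)
        root = r * cmath.exp(1j * 2 * math.pi * freq / fs)
        # (1 - root * z^-1) * (1 - conj(root) * z^-1)
        return (1 - root / z) * (1 - root.conjugate() / z)

    return abs(pair(*zero) / pair(*pole))

fs = 8000                    # assumed sampling rate in Hz
pole = (300, 100)            # assumed low-frequency resonance
zero = (1000, 100)           # assumed anti-resonance

at_zero = response_magnitude(1000, fs, pole, zero)
below = response_magnitude(800, fs, pole, zero)
above = response_magnitude(1200, fs, pole, zero)
at_pole = response_magnitude(300, fs, pole, zero)
print(at_zero < below, at_zero < above, at_pole > at_zero)
```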
31. Organization: Speech Processing
Unvoiced Fricatives
Produced by exciting the vocal tract with a turbulent airflow through a narrow constriction
/f/ ("four"), /θ/ ("thing"), /s/ ("sat") and /sh/ ("shut") are the unvoiced fricatives
/f/: Constriction at the teeth
/s/: Constriction near the middle of the oral cavity
/sh/: Constriction at the end of the oral tract
32. Organization: Speech Processing
Voiced Fricatives
/v/ ("vat"), /ð/ ("then"), /z/ ("zoo") and /zh/ ("azure") are the voiced fricatives
/v/: Constriction at the teeth
/z/: Constriction near the middle of the oral cavity
/zh/: Constriction at the end of the oral tract
Except for the glottal vibrations, the place of articulation remains the same as for the unvoiced fricatives
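As a rough sketch of this contrast, the same shaping filter can be driven by two excitations: noise alone for an unvoiced fricative, and noise plus a periodic pulse train for its voiced counterpart. The normalized autocorrelation at the pitch lag then reveals the glottal periodicity. Summing both excitations into one filter is a simplification, and the resonance, pitch period, noise level and function names are all assumptions made for illustration.

```python
import math
import random

def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator standing in for the shaping of the constriction."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def normalized_autocorr(signal, lag):
    """Autocorrelation at the given lag divided by the signal energy."""
    num = sum(a * b for a, b in zip(signal, signal[lag:]))
    den = sum(x * x for x in signal)
    return num / den

fs = 8000             # assumed sampling rate in Hz
pitch_period = 80     # assumed pitch period in samples (100 Hz)
n = 1600
rng = random.Random(1)

noise = [rng.gauss(0, 1) for _ in range(n)]
pulses = [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]

unvoiced_excitation = noise
voiced_excitation = [p + 0.05 * x for p, x in zip(pulses, noise)]

# Same assumed filter for both, mirroring "same place of articulation".
unvoiced = resonator(unvoiced_excitation, 2500, 300, fs)
voiced = resonator(voiced_excitation, 2500, 300, fs)

periodicity_unvoiced = normalized_autocorr(unvoiced, pitch_period)
periodicity_voiced = normalized_autocorr(voiced, pitch_period)
print(periodicity_voiced > periodicity_unvoiced)
```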