2. SPEECH PERCEPTION
Speech perception refers to the ability to perceive linguistic structure in the acoustic speech signal.
It is the process in which speech signals are transformed into neural representations that are then projected
onto word-form representations in the mental lexicon (L. E. Bernstein, 2001).
3. THEORIES OF SPEECH PERCEPTION
Theories of speech perception explain how the auditory system analyzes the speech sounds
received and how this information is processed by the listener to correctly understand the message of the
speaker.
Theories of speech perception are necessary to explain certain facts about the acoustic speech signal:
* There is inter-speaker and intra-speaker variability among signals
* The acoustic speech signal is continuous even though it is perceived as a series of discrete units
* Speech signals contain cues that are transmitted very quickly
5. ACTIVE THEORIES vs. PASSIVE THEORIES
Active:- Cognitive/intellectual work is involved in
perception
Passive:- Perception relies on passive responses
(thresholds)
Top down:- Listeners use higher level sources of
information to supplement the acoustic signal
Bottom up:- Perception is built from information in the
physical signal
Auditory:- Listeners identify acoustic patterns or
features by matching them to stored acoustic
representations
Motor:- Listeners extract information about
articulations from the acoustic signal
6. ACTIVE THEORIES
1. Motor theory
• Early motor theory
• Revised motor theory
• Analysis by synthesis
2. Dual stream model
3. Reverse hierarchy theory
7. EARLY MOTOR THEORY
Developed by Liberman and colleagues in 1967
Has undergone significant changes since its initial formulation
The objects of speech perception are articulatory events rather than acoustic or auditory events
Basic Principle
The motor theory suggests that perception involves a reference to articulation. The
basic principle of this model lies with the production of speech sounds in the speaker’s vocal
tract. The motor theory proposes that a listener specifically perceives a speaker’s phonetic
gestures while they are speaking.
8. Liberman (1972) stated that speech perception constitutes a unique process, and gave the following
reasons for this view:
1. The speech code involves a special function
2. It exists in distinctive form
3. It is unlocked by a special key
4. It is perceived in a special mode.
9. 1. The speech code involves a special function
• Basis of speech perception:- Lies in the grammatical structure of the language
• Grammatical structure :- Serves as a matching device
• Why is there a mismatch?
The transmission system (the vocal tract and the auditory mechanism) developed independently of the
cortical intellect.
• Grammatical structure serves as a matching device to overcome the high impedance (mismatch)
that occurs at the interface between the transmission system and the phonetic system, and between the
phonetic system and the intellect.
10. The speech code bridges the gap between the acoustic and semantic levels of speech processing.
Liberman:- If we are asked to repeat an idea expressed by someone, we do not duplicate the speech input;
we rephrase it.
This suggests that long-term memory stores information in a manner different from short-term memory (STM).
Such restructuring is possible through the rules of linguistic structure (grammatical structure).
11. 2. The speech code exists in a distinctive form
• Coarticulation:- Plays an important role in speech perception
• It increases the efficiency of speech production by increasing the rate at which phonemes are
transmitted.
• Liberman (1972) used the word “bag” to explain the overlapping of acoustic features that is
caused by coarticulation.
13. Ear receives acoustic signal :- In a serial form
Processing of phonemes :- Not serial
Rapid rate of information flow:- Processed by parallel processing
This means that the acoustic signal for a phoneme must contain several coexisting bits of information, or
constraint criteria, rather than comprising a discrete pattern of phonemes.
14. 3. Unlocked by a special key
• The motor theory of speech perception proposes that speech sounds represent a code based on the
phonemic structure of the language, rather than on an alphabet or a cipher.
• In a cipher system a symbol represents each of the units of the original message.
15. • Speech sounds are produced by neuromuscular events that are at some point equivalent to the
grammar of the language (Liberman et al., 1967).
• This statement implies a direct relationship between our manner of encoding language instructions
for transmission and the decoding of the resultant acoustic signal.
16. • The features of phonemes are overlapped in the acoustic stream and encoded into syllable sized
units. This process greatly reduces the number of discrete acoustic segments which the ear must
process in a given unit of time.
• Therefore instead of receiving phonemes directly embedded in the sound stream, the phonemes
might be recovered from the sound stream.
• This assumption is based upon the concept of encoded information unlocked or decoded through a
process of reconstructing bits of constraining information transmitted by the source as an acoustic
pattern
17. • The representation of a speech sound in the acoustic signal varies depending on the preceding or
following speech sounds.
• Example: the representation of /d/ in the syllables /di/ and /du/.
18. • The first formant transition indicates that the sound is one of the voiced stops (/b/, /d/ or /g/).
• Liberman et al. (1967) concluded that the 2nd formant transitions are the cues for the perception of
/d/ as opposed to /b/ or /g/.
• As the figure shows, the 2nd formant transitions are different for the two syllables, but listeners
still perceive both syllables as starting with /d/.
• In /di/ the 2nd formant rises from 2200 to 2600 Hz, and in /du/ it falls from 1200 to 700 Hz.
This difference is due to the effect of coarticulation.
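The contrast between the two F2 transitions can be made concrete with a small sketch. It uses only the onset and end frequencies quoted above; the function and variable names are hypothetical and chosen for illustration.

```python
# F2 transition start and end frequencies (Hz), as quoted in the slide text.
F2_TRANSITIONS = {
    "di": (2200, 2600),
    "du": (1200, 700),
}


def transition_direction(start_hz, end_hz):
    """Label a formant transition as rising, falling or flat."""
    if end_hz > start_hz:
        return "rising"
    if end_hz < start_hz:
        return "falling"
    return "flat"


def describe(transitions):
    """Map each syllable to (direction, signed change in Hz)."""
    return {
        syllable: (transition_direction(start, end), end - start)
        for syllable, (start, end) in transitions.items()
    }


if __name__ == "__main__":
    for syllable, (direction, change) in describe(F2_TRANSITIONS).items():
        print(f"/{syllable}/: F2 is {direction} ({change:+d} Hz)")
```

The two transitions differ in both direction and frequency range, yet both syllables are heard as beginning with /d/. This is the lack of acoustic invariance that the motor theory tries to explain.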
19. 4. Perceived in a special mode
• Another assumption of motor theory is that speech is perceived in a different mode from non-speech.
• Lehiste (1972)
• Auditory processing:- Non-speech signals
• Phonetic processing:- Speech signals
• Lower levels of processing:- The signal goes to both the speech and non-speech processors
• Speech processor:- Extracts the phonetic features and gates the neural signal appropriate for the speech
processor.
20. REVISED MOTOR THEORY
• Proposed by Liberman and Mattingly in 1985.
• Two major revisions of the early motor theory:
1. Existence of a specialized module for the perception of speech
2. Existence of duplex perception
21. Existence of a specialized module for the perception of speech
The brain has developed specialized areas for treating certain perceptual information. These areas operate
independently of other brain processing.
Duplex Perception
The same acoustic information can be processed simultaneously in both a speech and a non-speech mode.
Refers to the phenomenon in which “part of acoustic signals is used for both a speech and a non-speech
percept”
22. Duplex perception:- First described by Rand (1974)
Base:- Consists of the first and second formants with their transitions, plus the steady-state part of the
third formant. The base is presented to one ear.
Other part:- Consists of the isolated third formant transition, presented to the other ear. The transition
in the 3rd formant can be varied (/da/ or /ga/).
The stimulus can be perceived in two ways at the same time:
a. A complete syllable (/da/ or /ga/, depending on the formant transition), heard in the ear to which the
base is presented
b. A non-speech chirp, heard in the other ear simultaneously
24. ANALYSIS BY SYNTHESIS
• The theory was proposed by Stevens and Halle (1967).
• Differs from motor theory:- More acoustic and less articulatory, and relies on a matching system.
• They postulated that “perception of speech involves the internal synthesis of patterns according to
certain rules, and a matching of these internally generated patterns against the pattern under
analysis”
25. • Incoming acoustic signal:- Decoded into an abstract representation of segments and features, employing
the same set of phonological rules used by the speaker to generate the acoustic signal.
• The listener processes information not in terms of the acoustic signal but in terms of knowledge of the
rules governing its production.
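The synthesize-and-match loop described above can be sketched as a toy program. This is only an illustration of the matching idea, not Stevens and Halle's actual model: the "production rules" are reduced to a lookup table, and every number and name in it is a hypothetical placeholder.

```python
# Toy analysis-by-synthesis loop. All targets are made-up illustrative
# values (a crude pair of numbers per segment), not measured data.
SEGMENT_TARGETS = {
    "d": (300, 1700),
    "b": (300, 900),
    "i": (270, 2290),
    "u": (300, 870),
}


def synthesize(segments):
    """Generate an internal pattern for a candidate string of segments."""
    return [SEGMENT_TARGETS[segment] for segment in segments]


def distance(pattern_a, pattern_b):
    """Sum of absolute differences between two equal-length patterns."""
    return sum(
        abs(a1 - b1) + abs(a2 - b2)
        for (a1, a2), (b1, b2) in zip(pattern_a, pattern_b)
    )


def analyze(input_pattern, candidates):
    """Return the candidate whose synthesized pattern best matches the input."""
    scored = [
        (distance(synthesize(candidate), input_pattern), candidate)
        for candidate in candidates
    ]
    return min(scored)[1]


if __name__ == "__main__":
    heard = [(310, 1650), (280, 2200)]  # hypothetical incoming pattern
    print(analyze(heard, ["di", "du", "bi", "bu"]))
```

The listener side never classifies the input directly. It generates candidate patterns from its own rules and keeps the one with the smallest mismatch, which is the sense in which perception here is "active".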
27. ADVANTAGES
Normalization (context and speaker normalization) is explained.
Explains the perception of dialectal variations.
Accounts for perception of rapid rate of speech.
Accounts for perception of all speech sounds.
LIMITATIONS
Lacks research support
Speech vs. non-speech processing is not clearly explained
28. DUAL STREAM MODEL
Hickok & Poeppel (2004)
Speech information can be processed along two different routes:-
a. Auditory-conceptual route
b. Auditory-motor route
These two routes form the basis of the dual stream model of speech processing
29. Ventral & Dorsal Stream
Ventral stream:- Structures in the superior and middle portions of the temporal lobe.
Processes speech signals for comprehension. Bilaterally organized.
Dorsal stream:- Structures in the posterior planum temporale (PT) and the posterior frontal lobe.
Supports speech production. Strongly dominant in the left hemisphere (LH).
30. VENTRAL STREAM
• Mapping from sound to meaning
• Maps phonological information onto conceptual and semantic representations
• Processes distinctive features of the sounds, phonemes, syllabic and phonological structures,
and grammatical and semantic information
• Spectro-temporal analysis takes place in the superior temporal gyrus (STG)
• Phonological processing: The STG is activated by language tasks that require phonological
information, including the perception and production of the speech signal (Indefrey & Levelt, 2004).
• Lexical-semantic access: The left STG plays an important role in the analysis of speech sounds for
comprehension at a linguistic-semantic level.
31. DORSAL STREAM
Mapping from sound to action
• The dorsal stream is left dominant and, within the left hemisphere, involves only a portion of the
PT.
• Damage to the dorsal stream may result in conduction aphasia: difficulty with the repetition of
speech with no impairment of speech recognition.
• The dorsal stream is mainly concerned with sensorimotor integration.
• There is less agreement regarding the functional role of the auditory dorsal stream.
32. REVERSE HIERARCHY THEORY
Merav Ahissar (2008)
Sense organs:- Dissect the incoming signal into its constituent elements, which are localized
along the sensory epithelium.
Auditory processing:- Lower level of processing:- Stages up to the midbrain extract the
spectro-temporal features of the signal
Higher level processing:- Above the midbrain, especially at the cortex
35. ACOUSTIC THEORY
Acoustic signal:- Contains all information necessary for the identification and recognition of
various speech sounds
Acoustic analysis:- Result in a clean non-overlapping set of groups that correspond to different
speech sounds
Speech perception then becomes a passive process wherein the only processing undertaken is the
acoustic analysis of the signal by the auditory system which will result in the natural categorization
of the various speech sounds.
36. The speech production mechanism consists of two components:
• A sound generator or source
• A sound modifier or filter
Sound generated at the source -> travels along the vocal tract (modified by various articulators) -> radiates out
through the mouth; some sound frequencies are filtered out and others are emphasized.
Thus, the vocal tract is referred to as a filter.
The frequencies that are maximally amplified depend upon the entire shape of the vocal tract.
These resonant frequencies are referred to as formant frequencies and they are labeled sequentially in an
ascending order as the first formant (F1), the second formant (F2) and so on.
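How formants depend on the vocal tract can be illustrated with a standard textbook idealization that is not part of the material above: a uniform tube, closed at the glottis and open at the lips, resonates at odd multiples of c / 4L. The sketch below assumes this idealization, a speed of sound of 350 m/s, and a tract length of 17.5 cm. The function name and defaults are hypothetical.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    A real vocal tract is not uniform, so this is only a rough idealization.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [
        (2 * n - 1) * speed_of_sound / (4 * length_m)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Assumed tract lengths in metres, chosen for illustration only.
    for label, length in [("longer tract", 0.175), ("shorter tract", 0.14)]:
        formants = [round(f) for f in tube_formants(length)]
        print(f"{label} ({length} m): F1, F2, F3 = {formants} Hz")
```

Under these assumptions the 17.5 cm tube gives formants near 500, 1500 and 2500 Hz, and a shorter tube shifts all of them upward. Changing the shape of a real vocal tract moves the formants away from this evenly spaced pattern, which is what distinguishes one vowel from another.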
37. Coding of speech sounds
• According to Fant, the acoustic boundaries of successive sound units can be fairly well defined in terms
of source and filter. He recognizes these boundaries as:
Phonetic / Acoustic
Non phonemic / Linguistic
• It’s from these units that the binary information regarding the distinctive features are
extracted.
• It’s believed that the acoustic patterns of speech are mapped onto the neurophysiologic
structures of the auditory system
38. Vowel perception
• This theory explains vowel perception on the assumption that different vowels
contain different sets of formant frequencies.
• Thus each vowel would have its own formant pattern, with a particular configuration of
formant-to-F0 ratios.
• This formant-to-F0 ratio is assumed to be unique to the vowel and not to vary significantly across acoustic
or linguistic boundaries.
• Thus, a given vowel would have a given configuration that differs from that of any other vowel.
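The idea that each vowel has its own formant configuration can be sketched as template matching. The example below is a simplification: it compares only F1 and F2 and ignores F0. The template values are rough illustrative figures for an adult male voice, treated here as assumptions rather than data from the source, and all names are hypothetical.

```python
import math

# Assumed (F1, F2) templates in Hz; illustrative values only.
VOWEL_TEMPLATES = {
    "i": (270, 2290),
    "a": (730, 1090),
    "u": (300, 870),
}


def classify_vowel(f1_hz, f2_hz, templates=VOWEL_TEMPLATES):
    """Return the vowel whose (F1, F2) template is closest to the input."""
    return min(
        templates,
        key=lambda vowel: math.dist((f1_hz, f2_hz), templates[vowel]),
    )


if __name__ == "__main__":
    # Hypothetical measured formant pairs.
    for f1, f2 in [(280, 2250), (700, 1100), (310, 900)]:
        print(f"F1={f1} Hz, F2={f2} Hz -> /{classify_vowel(f1, f2)}/")
```

A purely passive scheme like this works only while the input stays close to the stored templates. Formants shift with speaker, speaking rate and dialect, which is where the scheme runs into difficulty.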
39. Criticisms of acoustic theory
• Speaking rate: Speaking rate can also affect the formant frequencies.
• Interpersonal variability: There is enormous variation in the formant structure of any given vowel
when spoken by different speakers (Peterson & Barney, 1952), due to within-speaker and between-speaker
factors.
• Gender effects: The formant frequencies for a particular vowel vary depending on whether the
speaker is a man, a woman or a child.
• Dialect: The language and dialect of the speaker can also affect the formant frequencies.
Active theories:- Compare the input pattern to an internally generated pattern.
The individual's knowledge of sound production is an integral factor.
Speech is perceived by reference to the place and manner of production of the acoustic signal.
Passive theories:- Knowledge of speech production plays only a very minor role.
Focus on the ways in which the auditory system extracts cues from the acoustic information.