Computational Approaches to the
Syntax-Prosody Interface: Using
Prosody to Improve Parsing
Dissertation Defense
Hussein Ghaly
December 12th, 2019
Goal and Motivation
Main Goal:
Improve automatic syntactic parsing of spontaneous spoken
sentences using prosodic cues
Theoretical Motivation:
● Automatic parsing is negatively affected by syntactic ambiguity (Kummerfeld
et al., 2012)
● Prosody can help resolve some syntactic ambiguities (Cutler et al., 1997)
● Syntactic structure is related to prosodic structure (Selkirk, 1986, among
many other studies)
Challenges and Opportunities
Challenges
- Lack of congruence between syntactic and prosodic structures
- Lack of interdisciplinary engagement in prosody research between computational
linguistics and other branches of linguistics
Opportunities
- Availability of parsing frameworks
- Availability of ToBI annotation
- Availability of speech corpora (e.g. the Switchboard Corpus)
- Interest in Natural Language Understanding for speech
What is prosody?
● “(1) acoustic patterns of F0, duration, amplitude, spectral tilt, and segmental
reduction, and their articulatory correlates, that can be best accounted for by
reference to higher-level structures, and (2) the higher-level structures that
best account for these patterns.” (Shattuck-Hufnagel and Turk, 1996)
● Includes a number of speech phenomena, including: Prosodic phrasing,
Stress, Intonation, Rhythm
● Autosegmental-Metrical Theory (Ladd, 2008) was proposed to organize these
components together
Prosodic Structure is a Hierarchy of Constituents
● all languages have
hierarchically ordered
prosodic structure
● languages make use of
the same set of
prosodic categories
(Elfner, 2018)
[Figure: illustration of the prosodic hierarchy, from Elfner (2018)]
Prosodic structure is much flatter than syntactic structure
[Figure: prosodic vs. syntactic structure of “As a matter of fact that’s what I’m doing”. Each word is a prosodic word (⍔); the prosodic words group into two intermediate phrases (iP), which together form one intonational phrase (IP). The syntactic structure of the same sentence is considerably deeper. Legend: ⍔ = prosodic word; iP = intermediate phrase/phonological phrase; IP = intonational phrase.]
Prosody is influenced by syntax and other factors
Syntax
● Suci (1967): non-syntactically structured word
lists have more prosodic variation than
syntactically structured sentences
● Prosody can resolve some syntactic
ambiguities
● Some syntactic structures have characteristic prosody (e.g. parentheticals
“John, said Mary, was nice” and tag questions “She’s Italian, isn’t she?”)
● Clause boundaries are prosodically marked (e.g. “when John left, I cried”)
Other factors
● prosodic grouping can be
different from syntactic
grouping
○ syntax follows the grouping
S (V O), while prosody
follows the grouping (S V) O
(Martin, 1970)
● speech rate
● utterance length and
constituent length
● semantic and pragmatic
factors
A theoretical model depicts factors affecting prosody
Model for factors influencing prosody
(from Turk and Shattuck-Hufnagel,
2014)
● prosodic structure as a
theoretical construct,
representing the convergence of
all these factors
● Constituent length can be added
to utterance length factors
Syntax-Prosody Interface - some phonological theories
● Indirect Reference: phonological processes apply to prosodic domains
(constituents), which are related to syntactic constituents
○ Selkirk (1986) Align-XP: syntactic constituents share one edge with prosodic constituents
(Align-R or Align-L depending on the language)
○ Truckenbrodt (1995, 1999) Wrap-XP: a constraint that demands that each syntactic phrase is
contained within a phonological phrase
○ Match Theory (Selkirk 2006, 2009, Elfner 2012, Myrberg 2013): syntactic clauses map to
Intonational phrases, syntactic phrases map to phonological phrases, and morphosyntactic
words map to prosodic words
ToBI is a system for annotating prosody
ToBI: Tones and Break Indices
A system for annotating prosodic
information (Silverman et al., 1992)
Based on theories of prosodic
structure by Beckman and
Pierrehumbert (1986)
Break indexes (0-4) reflect
disjuncture levels between words
from Veilleux et al. (2006)
Part 1 - The Effect of Syntactic Phrase
Length on Prosody
Phrase length affects prosody of double center-embedded sentences in English
Double center-embedded sentences (from Fodor and Nickels, 2011):
- Encouraging Phrase length (ENC) (short inner phrases) split into 3 chunks
- Discouraging Phrase length (DISC) (long inner phrases) split into 4+ chunks
NP1 NP2 NP3 VP1 VP2 VP3
the rusty old ceiling pipes that the plumber my dad trained fixed continue to leak occasionally
NP1 NP2 NP3 VP1 VP2 VP3
the pipes that the unlicensed plumber the new janitor reluctantly assisted tried to repair burst
No difference in prosody due to phrase length was found in French
Desroses (2014) manually examined the frequency of pauses (silence intervals
>= 250 ms) at the edges of syntactic constituents and found no difference
between the two sentence types
ENC: Le joli ballon jaune vif (1) que l'enfant (2) que le maĂźtre (3) punit (4) lĂącha (5)
est vraiment coincé dans l'arbre.
DISC: Le ballon (1) que le jeune enfant (2) que le maĂźtre d'Ă©cole (3) punit trĂšs
souvent (4) lĂącha bĂȘtement (5) est jaune.
Location   Before NP2 (1)   After NP2 (2)   Before VP2 (4)   After VP2 (5)
ENC        6.7 %            14.07 %         22.6 %           28.15 %
DISC       8.5 %            10.4 %          19.63 %          27.04 %
Data reanalyzed using judge annotation and forced alignment
Re-analyzing recordings collected by Desroses (271 ENC and 272 DISC
recordings) to identify the prosodic boundaries at the edges of syntactic phrases by:
● Obtaining judgments by two native speakers of French, of where they perceive
the prosodic boundaries, in a subset of recordings (48 recordings)
● Using forced alignment (automatically mapping each word to its corresponding
portion of the audio file), to obtain silent pause durations between words (397
recordings)
Sentences were presented to judges for annotation
An example sentence of the set presented to judges
Forced alignment indicated pauses between words
● The Montreal Forced Aligner (McAuliffe et al., 2017) was used
● edges of syntactic phrases are identified manually in a copy of the sentence,
for example:
○ Le ballon <1> que le jeune enfant <2> que le maĂźtre d'Ă©cole <3> punit trĂšs souvent <4> lĂącha
bĂȘtement <5> est jaune.
● Words in the forced alignment with their start and end times are mapped to
those in the copy
● Pause values are calculated at the five locations
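As a rough illustration of this step, the sketch below computes the silent pause at each marked location from forced-alignment output. The function name, interval format, and example times are assumptions for illustration, not the dissertation's actual code:

def pauses_at_locations(intervals, boundary_indices):
    """intervals: list of (word, start_sec, end_sec) in utterance order.
    boundary_indices: index i means a marked location falls between
    word i and word i+1 (0-based)."""
    pauses = {}
    for loc, i in enumerate(boundary_indices, start=1):
        gap = intervals[i + 1][1] - intervals[i][2]   # next start minus current end
        pauses[loc] = round(max(gap, 0.0), 3)         # clip alignment-noise overlaps
    return pauses

# Made-up times for "Le ballon <1> que le ...":
words = [("le", 0.00, 0.10), ("ballon", 0.12, 0.55),
         ("que", 0.83, 0.95), ("le", 0.96, 1.02)]
print(pauses_at_locations(words, [1]))   # {1: 0.28} -> a 280 ms pause at location 1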
Judges indicated a difference in average number of breaks for each sentence type
- average number of prosodic boundaries over all five syntactic boundaries for each judge (48 recordings)
- first judge: ENC: 2.43; DISC: 2.92. second judge: ENC: 2.5; DISC: 3.2.
Forced Alignment indicated more pauses before VP2 for DISC sentences
● at location 4, percentage of ENC sentences with a pause value of 250 ms or greater
was 14%, while percentage for DISC sentences was 19.1% (397 recordings)
● Average pause duration at location 4: 105 ms (ENC), 154 ms (DISC)
○ After excluding recordings with pauses > 1 second: 80 ms (ENC, 189 recordings), 110 ms (DISC, 202
recordings), p-value .056
Part 2 - Resolving Syntactic Ambiguities
Using Prosody
Goal: examine whether it is possible to identify the syntactic attachment using
prosody, both by human listeners and by computers
I saw the boy with the telescope
Can prosody resolve syntactic ambiguity?
An experiment for production and perception of sentences with ambiguities:
● Comma ambiguity:
○ John, said Mary, was the nicest person at the party.
○ John said Mary was the nicest person at the party.
● PP-attachment ambiguity:
○ I have a new telescope. I saw the boy with the telescope.
○ One of the boys got a telescope. I saw the boy with the telescope.
Comma ambiguity and PP-attachment ambiguity are investigated
Ambiguous sentences were recorded by speakers and presented to listeners
Using crowdsourcing, through Amazon Mechanical Turk (MTurk), workers were
recruited for the production and perception experiment
Production Experiment
● Ambiguous sentences were recorded by a number of naive native speakers
● Recordings from the 6 speakers with the clearest recordings were selected
Perception Experiment
● Ambiguous sentences were presented to naive human participants both in
audio and text formats, to answer comprehension questions
● Experiment was organized in phases, where in each phase questions were
based on the recordings of only one of the speakers
Listeners answer questions about
ambiguous sentences
Question Types
1. Comma-ambiguity - Text
2. PP-attachment ambiguity with context - Text
3. PP-attachment ambiguity without context - Text
4. Comma-ambiguity - Audio
5. PP-attachment ambiguity with context - Audio
6. (and 7) PP-attachment ambiguity without context - Audio (two
different sentences of this question type)
Example: https://champolu.net/mturk/listen.html?abcd
● Participants’ disambiguation
accuracy: text 49%, audio 63%
(p-value < .001,
independent t-test)
● Results exclude sentences
that were not understood
properly even with context,
and listeners with overall low
comprehension accuracy
PP-attachment ambiguity is more accurately resolved in audio than in text
Question Type Accuracy
comma ambiguity - text 99%
comma ambiguity - audio 92%
PP-attachment ambiguity - text - with context 97%
PP-attachment ambiguity - audio - with context 98%
PP-attachment ambiguity - text - no context 49%
PP-attachment ambiguity - audio - no context 63%
Larger pauses yield better accuracy for high attachment sentences
● higher values of pauses and
normalized duration of the last
NP word lead to higher accuracy
of disambiguation of sentences
with high attachment by listeners
Normalized duration: actual duration of the
word divided by expected duration, where
expected duration is the sum of average
duration of each phoneme for each speaker
Pause (ms)   High attachment:              Low attachment:
             recordings / avg. accuracy    recordings / avg. accuracy
0            73 / 35.28%                   103 / 82.26%
10           11 / 41.41%                   11 / 76.73%
20+          34 / 66.93%                   4 / 57.29%

Normalized   High attachment:              Low attachment:
duration     recordings / avg. accuracy    recordings / avg. accuracy
<1.0         10 / 22.08%                   32 / 84.84%
1            17 / 28.53%                   39 / 77.16%
1.1          18 / 52.03%                   23 / 85.44%
1.2          22 / 44.29%                   15 / 79.63%
>1.2         51 / 52.74%                   7 / 19.72%
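For concreteness, here is a minimal sketch of the normalized-duration measure defined above (an illustrative assumption, not the dissertation's code): actual word duration divided by the sum of the speaker's average phoneme durations.

def normalized_duration(actual_sec, phonemes, speaker_avg):
    """speaker_avg: phoneme -> this speaker's average duration in seconds."""
    expected = sum(speaker_avg[p] for p in phonemes)
    return actual_sec / expected

# Hypothetical per-speaker averages:
speaker_avg = {"T": 0.07, "EH": 0.09, "L": 0.06, "AH": 0.08,
               "S": 0.10, "K": 0.08, "OW": 0.11, "P": 0.07}
# "telescope" = T EH L AH S K OW P, expected duration 0.66 s
print(round(normalized_duration(0.80, ["T", "EH", "L", "AH", "S", "K", "OW", "P"],
                                speaker_avg), 2))   # 1.21 -> lengthened vs. baseline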
● Based on the data just presented, a machine learning system (decision trees)
used pauses and duration values as features to predict the attachment.
● System accuracy ranged from 63% to 73%, based on how the data are split
into training and test portions (e.g. system performs better when training and
testing on different portions of the recordings of the same speaker)
Machine Learning predicts attachment of recorded sentences
Speaker ID   shuffled all   intra-speaker classification   odd speaker out   odd sentence out   odd recording out   Listener accuracy
my64 75.0% 50.0% 67.5% 66.3% 60.0% 64.0%
wdn 62.5% 67.5% 55.0% 53.8% 70.0% 54.0%
ds 70.0% 90.0% 70.0% 66.3% 85.0% 83.0%
dz 57.5% 60.0% 52.5% 57.5% 65.0% 56.0%
mm 70.0% 87.5% 70.0% 70.0% 85.0% 52.0%
tk 69.4% 83.2% 61.1% 65.3% 77.5% 69.0%
Average 67.4% 73.0% 62.7% 63.2% 73.8% 63.0%
● Corpus analysis was conducted for the syntactic and prosodic data in the
ToBI annotated subset from the Switchboard corpus (SWB) (Godfrey et al.,
1992) from different speakers (150 speakers)
● The focus was on PP-attachment ambiguity and relative clause attachment
(RC-attachment) ambiguity
● An algorithm was developed to identify instances of such ambiguities in the
syntactic data:
○ PP-attachment: instances of a noun phrase (NP) immediately followed by a prepositional
phrase (PP)
○ RC-attachment: instances of NP immediately followed by a relative clause (SBAR)
○ Low attachment is when there is a large NP spanning both constituents (NP + PP or NP +
SBAR), otherwise high attachment
Does attachment affect prosody in spontaneous sentences?
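A minimal sketch of the identification idea, using Penn-Treebank-style constituency trees via NLTK (the function and example tree are illustrative assumptions, not the dissertation's implementation):

from nltk import Tree

def attachment_instances(tree, second_label="PP"):
    """Yield (NP, attached constituent, 'low' or 'high') for every NP
    immediately followed by a PP (use second_label='SBAR' for relative clauses)."""
    for node in tree.subtrees():
        kids = list(node)
        for left, right in zip(kids, kids[1:]):
            if isinstance(left, Tree) and isinstance(right, Tree) \
                    and left.label() == "NP" and right.label() == second_label:
                # Low attachment: a larger NP spans both constituents.
                yield left, right, ("low" if node.label() == "NP" else "high")

t = Tree.fromstring(
    "(S (NP (PRP I)) (VP (VBD saw) (NP (NP (DT the) (NN boy)) "
    "(PP (IN with) (NP (DT the) (NN telescope))))))")
for np_, pp, kind in attachment_instances(t):
    print(" ".join(np_.leaves()), "|", " ".join(pp.leaves()), "->", kind)
# prints: the boy | with the telescope -> low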
Examples of sentences with PP-attachment identified by the algorithm
[Parse-tree examples: low attachment vs. high attachment]
Examples of sentences with RC-attachment identified by the algorithm
[Parse-tree examples: low attachment vs. high attachment]
Attachment affects the distribution of prosodic breaks
● At the end of NP (before PP or SBAR), identify ToBI break index (from the
Switchboard corpus)
● Effect of RC-attachment is much stronger than PP-attachment
[Charts: ToBI break-index distributions for PP-attachment and RC-attachment]
Phrase Length also affects the distribution of prosodic breaks
● Consistent with Shafran and Fodor (2016), Watson and Gibson (2004), phrase
length affects likelihood of prosodic breaks
● More than 75% instances of high attachment with ToBI 1 have short NPs (<3 words)
● 50% of instances of low attachment with ToBI 3,4 have longer PPs (4+ words)
Attachment   ToBI 0        ToBI 1          ToBI 2       ToBI 3        ToBI 4         Grand Total
high         27 (0.96%)    842 (30.04%)    44 (1.57%)   145 (5.17%)   212 (7.56%)    1270 (45.31%)
low          166 (5.92%)   1139 (40.64%)   43 (1.53%)   89 (3.18%)    96 (3.42%)     1533 (54.69%)
Grand Total  193 (6.89%)   1981 (70.67%)   87 (3.10%)   234 (8.35%)   308 (10.99%)   2803 (100.00%)
NP phrase length, high attachment instances with ToBI 1:
NP length   Count   (%)
1 word      425     15.16%
2 words     228     8.13%
3 words     101     3.60%
4+ words    88      3.14%
Total       842     30.04%

PP phrase length, low attachment instances with ToBI 3 and 4:
PP length   ToBI 3        ToBI 4
1 word      1 (0.04%)     0
2 words     21 (0.75%)    16 (0.57%)
3 words     23 (0.82%)    32 (1.14%)
4+ words    44 (1.57%)    48 (1.71%)
Total       89 (3.18%)    96 (3.42%)
Can we predict attachment from phrase length and prosody?
● 2803 instances of PP-attachment:
○ 1270 high attachment (45%)
○ 1533 low attachment (55%)
● 1559 instances of RC-attachment:
○ 739 high attachment (47%)
○ 820 low attachment (53%)
ambiguity   sentence ID      NP size (words)   PP/SBAR size (words)   ToBI break index   low attachment
ppa sw4890.A-s89 1 3 1 FALSE
ppa sw4890.B-s72 2 3 4 FALSE
ppa sw2018.A-s144 1 2 1 TRUE
ppa sw2018.A-s145 1 2 1 TRUE
ppa sw2018.A-s157 1 2 3 TRUE
rca sw4890.B-s72 1 4 3 FALSE
rca sw4890.B-s73 3 4 4 FALSE
rca sw4890.B-s8 1 4 4 FALSE
rca sw2018.A-s131 3 4 1 TRUE
rca sw2018.B-s163 3 4 4 TRUE
Sample of data compiled
● Using Machine Learning
(decision trees), with different
feature combinations
● PP-attachment prediction using prosody only is statistically significant
(p-value < .001, independent t-test)
● Combining prosody with phrase length gives an improvement, but not a
statistically significant one (p-value .078)
Machine Learning Predicts attachment based on prosody and phrase length
Set Description Features Accuracy (%)
RC-attachment instances
ToBI 69.02
Length of NP 62.80
Length of NP, ToBI 71.14
Length of NP, Length of SBAR 64.85
Length of NP, Length of SBAR, ToBI 71.20
Length of SBAR 57.79
Length of SBAR, ToBI 69.60
PP-attachment instances
ToBI 60.54
Length of NP 60.93
Length of NP, ToBI 63.47
Length of NP, Length of PP 61.04
Length of NP, Length of PP, ToBI 63.40
Length of PP 55.19
Length of PP, ToBI 60.86
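As an illustration of this setup, here is a toy sketch (not the actual experiment) of a decision-tree classifier over the three features, trained on the ten sample rows shown two slides back:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Rows from the sample table: [NP size, PP/SBAR size, ToBI break index]
X = [[1, 3, 1], [2, 3, 4], [1, 2, 1], [1, 2, 1], [1, 2, 3],
     [1, 4, 3], [3, 4, 4], [1, 4, 4], [3, 4, 1], [3, 4, 4]]
y = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]   # 1 = low attachment, 0 = high attachment

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
print(cross_val_score(clf, X, y, cv=2).mean())   # toy accuracy estimate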
Part 3 - Using Prosody to Improve
Parsing
Goal:
build a computational system using prosody to improve parsing of
spontaneous sentences in the Switchboard Corpus
Motivation:
Previous computational approaches (e.g. Kahn et al. (2005), Huang and Harper
(2010), Tran et al. (2017)) attempted this. This work proceeds in the same
direction, informed by the theoretical foundations of the syntax-prosody
relationship, mainly semantic coherence
Can prosody be used to improve parsing?
Hypothesis: Syntax-Prosody correspondences improve parsing
Hypothesis 1: There are elements of correspondence between prosody and
syntax that can be extracted from the syntactic structure
Hypothesis 2: Using these correspondences, along with prosodic
information, we can select the most appropriate parse for an utterance
Parsing is identifying the structure of a sentence
● Constituency parsing: a hierarchy of syntactic constituents
● Dependency parsing: dependent-head relationships
○ Main metric: Unlabeled Attachment Score (UAS), percentage of
heads identified correctly
● Dependency parsing is now the norm in computational
linguistics:
○ Faster, scalable to new languages, represents semantic relationships
○ Provides same information as constituency plus head information
● Dependency structure has not been used much in prosody
research
○ Exception: Pate and Goldwater, 2014
[Figures: constituency tree vs. dependency tree for the same sentence]
Semantic coherence affects likelihood of prosodic breaks
● Selkirk (1984): distribution of intonational phrase boundaries can be accounted
for by a semantic constraint called the Sense Unit Condition (SUC):
○ The immediate constituents of an intonational phrase must be semantically related
■ a. John gave the book // to Mary.
■ b. * John gave // the book to Mary.
■ c. John gave // the book // to Mary (examples from Watson and Gibson, 2004)
● Ferreira (1988) and Watson and Gibson (2004) developed algorithms for
predicting the likelihood of prosodic breaks, predicting higher likelihood when
there is no dependency or semantic coherence between words
Dependency configurations correspond to semantic coherence
● The concept of “dependency configurations” is proposed here to quantify
semantic coherence between adjacent words, based on dependency
structure
● It is defined in terms of dependency offsets: the distance (measured by
number of words) between a word and its head
● For each word, the offset is quantified as:
○ 0 if the word is root
○ +1 if it depends on the word immediately to the right
○ +2 if it depends on a word further to the right
○ -1 if it depends on the word immediately to the left
○ -2 if it depends on a word further to the left
● Each pair of consecutive words is characterized by a duple representation
(e.g. (+1,-2)) to describe the configurations
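The offset and configuration definitions above translate directly into code. A minimal sketch (helper names are assumptions), with heads given as 1-based indices and 0 for the root:

def offsets(heads):
    """heads: 1-based head index per word, 0 for the root."""
    out = []
    for i, h in enumerate(heads, start=1):
        if h == 0:
            out.append(0)          # root
        elif h == i + 1:
            out.append(1)          # head is the next word
        elif h > i:
            out.append(2)          # head is further to the right
        elif h == i - 1:
            out.append(-1)         # head is the previous word
        else:
            out.append(-2)         # head is further to the left
    return out

def configurations(heads):
    off = offsets(heads)
    return list(zip(off, off[1:]))   # one duple per pair of consecutive words

# "I saw the boy": I -> saw, saw = root, the -> boy, boy -> saw
print(configurations([2, 0, 4, 2]))   # [(1, 0), (0, 1), (1, -2)]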
There are 12 different observed dependency configurations
Examples from the Switchboard Corpus, converted to dependency structure by
Honnibal and Johnson (2014)
Dependency configurations correspond to prosodic breaks
● Configuration (-1, +1) accounts for 35% of
ToBI 4 and 26% of ToBI 3
● Configurations (+2,+1) and (-1,-2) combined
account for 41% of ToBI 4 and 38% of ToBI 3
● If there is a direct dependency between two
consecutive words, there is a smaller
likelihood of prosodic breaks between them
configuration   ToBI 1   ToBI 2   ToBI 3   ToBI 4   Grand Total
(+2, +1) 15657 1550 602 860 18669
(+1, -2) 10328 402 203 135 11068
(-1, +1) 6125 557 726 1456 8864
(-1, -1) 6062 281 308 358 7009
(+1, 0) 6032 233 114 94 6473
(-1, -2) 3268 334 465 829 4896
(+1, +1) 3308 102 42 30 3482
(0, -1) 2802 130 120 115 3167
(0, +1) 2645 130 174 154 3103
(+2, -1) 1011 25 31 22 1089
(-1, 0) 117 11 27 62 217
(0, 0) 74 30 2 7 113
Grand Total 57429 3785 2814 4122 68150
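A table like this can be compiled with a simple cross-tabulation. A sketch assuming a flat list of (configuration, break index) observations (toy data, not the Switchboard counts):

import pandas as pd

obs = pd.DataFrame({
    "configuration": ["(-1, +1)", "(+2, +1)", "(-1, +1)", "(+1, -2)", "(-1, +1)"],
    "tobi": [4, 1, 3, 1, 4],
})
# Count each configuration per break index, with row/column totals.
table = pd.crosstab(obs["configuration"], obs["tobi"],
                    margins=True, margins_name="Grand Total")
print(table)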
Features are extracted from parse hypotheses and prosodic information
Lexical Syntactic Prosodic
word head word POS Config normalized duration pause after
so know RB (+2,+1) 3.23 0.31
i know PRP (+1,0) 0.45 0
know - VBP (0,+1) 1.13 0
what said WP (+2,+1) 0.97 0
they said PRP (+2,+1) 1.62 0
‘ve said VBP (+1,-2) 0.71 0
said know VBN N/A 2.03 0
Prediction outcome represents which heads are correctly identified in the parse
Reference parse (Gold):   7 7 7 5 7 7 0 10 10 7

Parser hypothesis (head per word)                Expected prediction outcome
spaCy (UAS: 0.5):       3 3 0 5 3 7 5 10 10 7    0 0 0 1 0 1 0 1 1 1
syntaxnet (UAS: 0.4):   4 4 4 0 4 7 5 10 10 7    0 0 0 0 0 1 0 1 1 1
clearNLP (UAS: 0.6):    3 3 0 5 7 7 3 10 10 7    0 0 0 1 1 1 0 1 1 1
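Equivalently, the outcome row for each hypothesis is a per-word comparison against the gold heads, and UAS is its mean; using the spaCy row above:

gold  = [7, 7, 7, 5, 7, 7, 0, 10, 10, 7]
spacy = [3, 3, 0, 5, 3, 7, 5, 10, 10, 7]

outcome = [int(h == g) for h, g in zip(spacy, gold)]
print(outcome)                    # [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(sum(outcome) / len(gold))   # UAS = 0.5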
Recurrent Neural Networks offer a lot of flexibility
- RNNs accept inputs of variable length with categorical and continuous
features, and variable length output.
- Long Short-Term Memory (LSTM), a variant of RNNs that better captures long-range dependencies, is used here
source: https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/03/1-768x421.png
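A minimal sketch of an LSTM tagger of the kind this suggests (PyTorch; all layer sizes and feature choices here are assumptions, not the dissertation's architecture): embed the categorical features, concatenate the continuous prosodic features, and score each word's head.

import torch
import torch.nn as nn

class HeadCorrectnessTagger(nn.Module):
    def __init__(self, n_pos=50, n_config=13, emb=16, n_continuous=2, hidden=64):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos, emb)
        self.cfg_emb = nn.Embedding(n_config, emb)
        self.lstm = nn.LSTM(2 * emb + n_continuous, hidden,
                            batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, pos_ids, cfg_ids, continuous):
        # Concatenate embedded categorical features with continuous prosodic ones.
        x = torch.cat([self.pos_emb(pos_ids), self.cfg_emb(cfg_ids), continuous], dim=-1)
        h, _ = self.lstm(x)
        return self.score(h).squeeze(-1)   # one head-correctness score per word

model = HeadCorrectnessTagger()
pos = torch.randint(0, 50, (1, 7))    # POS tag ids for one 7-word sentence
cfg = torch.randint(0, 13, (1, 7))    # dependency-configuration ids
pros = torch.rand(1, 7, 2)            # normalized duration, pause after
print(model(pos, cfg, pros).shape)    # torch.Size([1, 7])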
System extracts features from parses and predicts correct heads
[System diagram. Training stage: sentence + parse hypothesis + sentence acoustics -> feature extraction; the gold-standard parse supplies the correct-head outcomes used as training targets. Testing stage: the same feature extraction is applied to each parse hypothesis, and the model predicts the correct heads.]
System scores parses and selects most likely parse
Using syntactic features from parse hypotheses and acoustic information to make
predictions about which parse is more likely
Per-word predictions                                    Sum
spaCy (UAS: 0.83):      0.75 0.87 0.69 1.01 0.75 1.04   5.11
clearNLP (UAS: 0.67):   0.75 0.77 0.78 1.02 0.58 0.88   4.78
syntaxnet (UAS: 0.33):  0.03 0.76 0.30 0.89 0.33 0.84   3.15
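Selection then reduces to picking the hypothesis with the highest summed score; a sketch using the numbers above:

scores = {
    "spaCy":     [0.75, 0.87, 0.69, 1.01, 0.75, 1.04],   # sum 5.11
    "clearNLP":  [0.75, 0.77, 0.78, 1.02, 0.58, 0.88],   # sum 4.78
    "syntaxnet": [0.03, 0.76, 0.30, 0.89, 0.33, 0.84],   # sum 3.15
}
best = max(scores, key=lambda parser: sum(scores[parser]))
print(best)   # spaCy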
Prosody can improve parsing for Switchboard data
System Text Features Prosodic features UAS Dev UAS Test
clearnlp 79.76 79.59
spacy 79.06 78.91
syntaxnet 72.54 72.81
Oracle 85.93 85.89
Ensemble POS, configs 80.69 80.73
Ensemble POS, configs Dur, dur log, pause 81.21 81.17
Ensemble Lexical, POS, configs, links 83.47 83.36
Ensemble Lexical, POS, configs, links Dur, pause 83.51 83.39
● Only duration and pauses were used, while pitch, intensity and other acoustic
information can still be used in further work
● Phrase length information was not used in any of the features
● Speech repairs and disfluencies are marked with prosodic cues but not
addressed in this study
● Other correspondences between prosody and dependency structure were
suggested in this study but need further development (dependency chunks)
Further improvements are possible
Output analysis doesn’t indicate clear improvement patterns
● An output analysis didn’t show clear improvement patterns in the following:
○ sentences with PP-attachment
○ sentences with RC-attachment
○ sentences with parentheticals
○ sentences with speech repairs
● By sentence size, the largest improvement was for sentences of 3-8 words;
patterns for longer sentences were unclear, with mostly smaller improvements
UAS Dev UAS Test
UAS Improvement (POS + Configs) with prosody 0.51 0.44
Sentences with improved UAS 268 240
Sentences with worse UAS 178 180
Sentences with the Same UAS 4970 5036
p-value (paired t-test comparing UAS values for all sentences) < .001 < .001
Conclusions
● Part 1:
○ Syntactic phrase length affects prosodic phrasing, also in French
● Part 2:
○ Syntactic ambiguity can be resolved prosodically by speakers
○ Prosodic cues can be used by human listeners and computers to predict the syntactic structure
○ Syntactic phrase length also affects prosodic phrasing in speaking, and can be used by
computers as a factor, along with prosody, to improve prediction of the structure
● Part 3:
○ Certain syntactic information (dependency configurations), based on dependency structure,
relates to prosodic breaks
○ Using this information together with timing (pause and duration) is more useful for selecting
better parses than syntactic information only
○ The ensemble system yields better performance than any individual parser in the ensemble
Final Note
● This dissertation is an interdisciplinary work, building on prosody research
from phonology and psycholinguistics towards computational goals
● Using dependency structure can provide a new perspective for investigating
the syntax-prosody relationship
Thank You!