Determining the Types of Temporal Relations in Discourse

Leon Derczynski
University of Sheffield
5 March, 2013

The Role of Time
Why is time important in language processing?
World state changes constantly
Every empirical assertion has temporal bounds
“The sky is blue”, but it was not always
Without it, naïve knowledge extraction will fail (given an
Almanac of Presidents, who is President?)
Understanding temporal information leads to better
knowledge extraction.
Overall goal
How do we automatically understand temporal information in
natural languages?

Temporal Information Extraction
Existing state of the art
How can we categorise types of temporal information?
Events – e.g. occurrences, states
Temporal expressions (timexes) – e.g. dates, durations
Links – relations between pairs of events or times
Supporting texts – e.g. action cardinality, event ordering
We develop and use ISO-TimeML to annotate these entities.
Main dataset: TimeBank (about 180 annotated documents)

TimeML
Organizers
<EVENT eid="e2120" class="REPORTING">state</EVENT>
the
<TIMEX3 tid="t29" type="DURATION" value="P2D"
temporalFunction="false"
functionInDocument="NONE">two days</TIMEX3>
of music, dancing, and speeches is
<EVENT eid="e2123" class="I_STATE">expected</EVENT>
to
<EVENT eid="e13" class="OCCURRENCE">draw</EVENT>
some two million people.
<TLINK eventID="e2123" relatedToTime="t29" relType="BEFORE"/>
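The fragment above is inline TimeML. As a rough sketch of how its entities can be read programmatically, the snippet below wraps the slide's fragment in an assumed `<TimeML>` root element (so that it is well-formed XML) and uses only Python's standard `xml.etree.ElementTree`. Attribute names are taken as they appear on the slide; the helper `extract` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in an assumed <TimeML> root so it parses as XML.
FRAGMENT = """<TimeML>Organizers
<EVENT eid="e2120" class="REPORTING">state</EVENT> the
<TIMEX3 tid="t29" type="DURATION" value="P2D" temporalFunction="false"
 functionInDocument="NONE">two days</TIMEX3>
of music, dancing, and speeches is
<EVENT eid="e2123" class="I_STATE">expected</EVENT> to
<EVENT eid="e13" class="OCCURRENCE">draw</EVENT> some two million people.
<TLINK eventID="e2123" relatedToTime="t29" relType="BEFORE"/>
</TimeML>"""


def extract(xml_text):
    """Return (events, timexes, links) found in a TimeML string."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    events = {e.get("eid"): (e.text, e.get("class")) for e in root.iter("EVENT")}
    timexes = {t.get("tid"): (t.text, t.get("type"), t.get("value"))
               for t in root.iter("TIMEX3")}
    links = [(lk.get("eventID"), lk.get("relatedToTime"), lk.get("relType"))
             for lk in root.iter("TLINK")]
    return events, timexes, links


events, timexes, links = extract(FRAGMENT)
print(events["e2123"])  # ('expected', 'I_STATE')
print(timexes["t29"])   # ('two days', 'DURATION', 'P2D')
print(links)            # [('e2123', 't29', 'BEFORE')]
```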

Times and Events
What are temporal expressions?
They refer to a time
Subtasks: recognition and interpretation; SotA recognition is
0.86 F1
What do we consider as events?
Verbal, nominal
State of the art: 0.90 F1 for recognition
Doesn’t cover complex structure; e.g. a music festival
Events are not very useful unless related to other temporal
entities
How can we describe this structural complexity?
Start by modelling the document as a graph
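One minimal way to hold such a graph (an illustration, not the representation used in this work): event and timex IDs are nodes, and labelled links are edges. Restricting to a single BEFORE relation over hypothetical IDs, a transitive closure (Warshall's algorithm) recovers the implied links while leaving unrelated nodes unordered, i.e. a partial order.

```python
from itertools import product


def before_closure(edges):
    """Transitive closure of BEFORE, given as (a, b) pairs meaning a BEFORE b."""
    closure = set(edges)
    nodes = {n for edge in edges for n in edge}
    # Warshall's algorithm: admit each node k in turn as an intermediate point.
    for k in nodes:
        for i, j in product(nodes, repeat=2):
            if (i, k) in closure and (k, j) in closure:
                closure.add((i, j))
    return closure


# Hypothetical document graph: e1 BEFORE t1, t1 BEFORE e2, e3 BEFORE e2.
links = {("e1", "t1"), ("t1", "e2"), ("e3", "e2")}
implied = before_closure(links)
print(("e1", "e2") in implied)  # True: derived via t1
print(("e1", "e3") in implied or ("e3", "e1") in implied)  # False: left unordered
```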

Temporal relations
What are temporal relations?
They describe the links between times and events
Can capture both complex and partial orderings
What kinds of temporal relation are there?
1 Interval (before, after, included by, simultaneous)
2 Subordinate (reported speech, modal, conditional)
3 Aspectual (start, culmination – see Vendler, Comrie)
This work is concerned with the coarsest-grained information: the
ﬁrst category

Problem Definition
How are these relations represented?
Temporal interval algebra (Allen 1984) – a set of 13 relations
between a pair of intervals
TimeML defines a set of relation types and also types of
interval
What is our problem?
Assume a discourse with perfect event and timex annotations
In fact, assume we know which intervals to link!
“Given an ordered pair of intervals (arg1, arg2), which relation in
the set R_allen describes them?”
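When both intervals' endpoints are known, the relation follows from endpoint comparisons alone; the difficulty in discourse is that endpoints are rarely stated. A minimal sketch, assuming numeric (start, end) endpoints purely for illustration, that returns one of Allen's basic relations under their usual names:

```python
def allen_relation(a, b):
    """Return the Allen relation holding from interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("intervals must have start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: proper overlap, with one interval starting and ending first.
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((1, 3), (2, 4)))  # overlaps
print(allen_relation((2, 3), (1, 4)))  # during
```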

Relation Extraction
How can relations be labelled?
Machine learning
Using TimeML attributes: some success
Using syntactic relations in tree kernels: matches SotA
What’s the state of the art?
2007: Mani et al.: baseline 56%, system has 61% accuracy
2008: Bethard, Chambers: many sophisticated improvements
– ILP, timex-timex ordering. Improved on Mani et al. by 1.5%.
2010: TempEval-2: baseline 58%, best was 65% accuracy
Why do we find this performance ceiling?
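A common reference point for accuracy figures like these is a baseline that always predicts the most frequent relation type. A minimal sketch, using made-up toy label sequences (not TimeBank data):

```python
from collections import Counter


def most_common_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for gold in test_labels if gold == majority)
    return majority, correct / len(test_labels)


# Toy, made-up label sequences for illustration only.
train = ["BEFORE"] * 5 + ["AFTER"] * 3 + ["OVERLAP"] * 2
test = ["BEFORE", "BEFORE", "AFTER", "OVERLAP"]
print(most_common_class_baseline(train, test))  # ('BEFORE', 0.5)
```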

Sources of Temporal Relation Information
What are we missing?
There is a heterogeneous set of temporal information types,
including:
Explicit signals – subsequently, as soon as
Tense – linguistic theory offers some models
What is the evidence these two types will help?
Conducted failure analysis: TempEval-2010 [1]
Multiple diverse approaches, same dataset
Find the set of difficult links
Characterise information supporting these links
[1] Verhagen et al., 2010: SemEval Task 13 – TempEval-2

Figure: TempEval-2 relation labelling tasks, showing proportions of relations according to the number of systems that gave correct labels (from “all systems correct” to “all systems fail”). Panels: Task C, event–timex intra-sentence relations; Task D, event–DCT relations; Task E, main event inter-sentence relations; Task F, event–subordinate intra-sentence relations.

Figure: Proportion of links within each task (C, D, E, F) that are difficult; x-axis: task, y-axis: % difficult.
The problem is difficult, and there is a consistently difficult set of
links. Perhaps we are ignoring some critical information.
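The failure analysis amounts to counting, per link, how many systems mislabelled it, then looking at the links in the worst bin. The slides do not define “difficult” precisely; this sketch assumes the strict reading “mislabelled by every system”, and the link IDs and system outputs are invented for illustration.

```python
from collections import Counter


def failure_counts(gold, system_outputs):
    """For each link id, count how many systems mislabelled it.

    gold: {link_id: label}; system_outputs: list of {link_id: label}, one per system.
    """
    return {link: sum(1 for out in system_outputs if out.get(link) != label)
            for link, label in gold.items()}


def difficult_links(gold, system_outputs):
    """Links that every system got wrong (one possible notion of 'difficult')."""
    n = len(system_outputs)
    return {link for link, fails in failure_counts(gold, system_outputs).items()
            if fails == n}


gold = {"l1": "BEFORE", "l2": "AFTER", "l3": "OVERLAP"}
systems = [
    {"l1": "BEFORE", "l2": "BEFORE", "l3": "BEFORE"},
    {"l1": "BEFORE", "l2": "AFTER", "l3": "AFTER"},
    {"l1": "AFTER", "l2": "AFTER", "l3": "BEFORE"},
]
counts = failure_counts(gold, systems)
print(counts)                          # {'l1': 1, 'l2': 1, 'l3': 3}
print(Counter(counts.values()))        # links per number-of-failures bin
print(difficult_links(gold, systems))  # {'l3'}
```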

New sources of ordering information
Next step: manually characterise each “difficult” link.
Attempt to identify what kind of information could be used to
label it.
Sources to investigate
Explicit text – signals: “After you pull the pin, throw the grenade”
Tensed relations: “Having eaten, I left”

Temporal Signals
What are these?
In TimeML, signals are spans of text annotated as helping to
establish a temporal relation
Used by 12.2% of TimeBank’s relations
Are temporal signals useful?
A resounding yes! 61% → 83% accuracy with simple
features [2]
This level of performance on event-event links is above the
general state of the art
Existing corpora are under-annotated
[2] Derczynski and Gaizauskas, 2010: Using signals for temporal relation
classification

Temporal Signal Annotation
How can we automatically annotate temporal signals?
Define signals formally [3]
Define a closed class of signals
Re-annotate TimeBank
Train discrimination and association
We included dependency information and function tagging.
[3] Derczynski and Gaizauskas, 2011: A corpus-based study of temporal signals
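With a closed class of signal expressions, candidate signals can be found by simple longest-match lookup; a trained discrimination step (not shown) would then decide which candidates really act as temporal signals, for example rejecting causal “since”. The word list below is a small hypothetical subset, not the published closed class.

```python
# Hypothetical subset of a closed class of signal expressions (not the published list).
SIGNALS = {("after",), ("before",), ("when",), ("while",), ("during",),
           ("until",), ("since",), ("subsequently",), ("as", "soon", "as")}
MAX_LEN = max(len(s) for s in SIGNALS)


def signal_candidates(tokens):
    """Find (start, end) token spans matching the closed class, longest match first."""
    lowered = [t.lower() for t in tokens]
    spans, i = [], 0
    while i < len(lowered):
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in SIGNALS:
                spans.append((i, i + n))
                i += n
                break
        else:  # no signal starts at position i
            i += 1
    return spans


tokens = "Call me as soon as you land , not before".split()
print(signal_candidates(tokens))  # [(2, 5), (9, 10)]
```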

Results
How well did our approach perform?
1. Discrimination: 92% accuracy, 75% accuracy on positives
(0.77 IAA)
2. Association: 99% accuracy / 80% error reduction
3. Inductive bias towards the independence assumption was harmful
(MaxEnt, NBayes)
Results: 16% of links now have signals (up from 12.2%, a 31% relative
increase) and can be labelled at high accuracy.
What remains to be done?
How can we remedy under-annotation at the source?
Clear links to spatial signal annotation (e.g. -LOC tags)
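The association figures fit together through the usual definition of relative error reduction. Assuming the 80% is measured against a single baseline classifier (the slide does not say which), the implied baseline error rate is:

```latex
\[
\text{error reduction} = \frac{\varepsilon_{\text{base}} - \varepsilon_{\text{sys}}}{\varepsilon_{\text{base}}}
\quad\Longrightarrow\quad
\varepsilon_{\text{base}} = \frac{\varepsilon_{\text{sys}}}{1 - 0.80}
= \frac{1 - 0.99}{0.20} = 0.05
\]
```

On this reading, the baseline for association already sits at about 95% accuracy.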

Reichenbach’s Model of Verbs
How can we model tense in language?
Each verb happens at event time, E
The verb is uttered at speech time, S
Past tense: E < S (“John ran.”)
Present tense: E = S (“I’m free!”)
What differentiates simple past from past perfect?
“John ran.” is not the same as “John had run.”
Introduce abstract reference time, R
“John had run.”: E < R < S
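Reichenbach's framework describes each tense as orderings over E, R and S, so the relation between event time and speech time can be derived by composing E–R with R–S. A sketch covering only six tenses (the tense names and example verbs are chosen here for illustration), where `None` marks an ordering the tense leaves open:

```python
# Each tense as (relation of E to R, relation of R to S), after Reichenbach.
# Only a subset of tenses; "<" precedes, "=" simultaneous, ">" follows.
TENSES = {
    "past perfect": ("<", "<"),     # John had run:       E < R < S
    "simple past": ("=", "<"),      # John ran:           E = R < S
    "present perfect": ("<", "="),  # John has run:       E < R = S
    "simple present": ("=", "="),   # John runs:          E = R = S
    "simple future": ("=", ">"),    # John will run:      S < R = E
    "future perfect": ("<", ">"),   # John will have run: E < R, S < R
}


def compose(x, y):
    """Compose two point relations; None means the result is not determined."""
    if x == "=":
        return y
    if y == "=" or x == y:
        return x
    return None  # "<" then ">" (or vice versa): any order is possible


def event_vs_speech(tense):
    """Relation of event time E to speech time S for a given tense."""
    e_r, r_s = TENSES[tense]
    return compose(e_r, r_s)


for tense in TENSES:
    print(tense, event_vs_speech(tense))
```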

Reasoning about tense
How is Reichenbach’s model helpful?
We can describe all verbal events as three points linked by
either equality or precedence
Automatic and quick inference for relating intervals
Does it work?
Conducted first corpus-driven validation of the framework
For reporting-type links, we used features based on pairwise
event-time relations
Add one feature representing the Reichenbachian ordering
Classifier reached 59% accuracy (most-common-class baseline: 48%)
on 9% of all temporal relations (above SotA)
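The quick inference mentioned above can be illustrated for two verbs that are assumed to share one reference time R, as in “Having eaten, I left”: if E1 < R and E2 = R, then E1 < E2. This is a minimal sketch under that shared-R assumption; the function names are hypothetical, and using its output (or “unknown”) as a single categorical feature is only one possible way to encode a Reichenbachian ordering for a classifier.

```python
def compose(x, y):
    """Compose two point relations ("<", "=", ">"); None if not determined."""
    if x == "=":
        return y
    if y == "=" or x == y:
        return x
    return None


INVERSE = {"<": ">", "=": "=", ">": "<"}


def relate_events(e1_vs_r, e2_vs_r):
    """Order E1 against E2, assuming both verbs share one reference time R.

    e1_vs_r: relation of E1 to R; e2_vs_r: relation of E2 to R.
    """
    # Chain E1 rel R with R rel E2 (the inverse of E2 rel R).
    return compose(e1_vs_r, INVERSE[e2_vs_r])


# "Having eaten, I left": eaten is anterior to R (E1 < R); left coincides with R (E2 = R).
print(relate_events("<", "="))  # <     (eating precedes leaving)
print(relate_events("<", "<"))  # None  (both precede R; E1 vs E2 is open)
```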

Extending the model
How else can we use the model?
Positional use
Timexes relate to reference points
Only consider cases where the event and time are linguistically
connected
Identify these using dependency parses
Add a feature hinting at the ordering
We reach 75% accuracy from a 67% baseline (above SotA)
Also useful for timex standard transduction [4]
[4] Derczynski, Llorens and Saquete, 2012: Massively increasing TIMEX3
resources
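The slides do not spell out the “linguistically connected” criterion. As a hypothetical stand-in, the sketch below treats an event and a timex as connected when their tokens are at most two dependency arcs apart. The toy parse is hand-written as a list of head indices (`-1` for the root); in practice it would come from a dependency parser.

```python
def ancestors(heads, i):
    """Token i followed by its chain of heads up to the root (head index -1)."""
    chain = [i]
    while heads[i] != -1:
        i = heads[i]
        chain.append(i)
    return chain


def path_length(heads, a, b):
    """Number of dependency arcs between tokens a and b, or None if not in one tree."""
    up_a, up_b = ancestors(heads, a), ancestors(heads, b)
    in_b = set(up_b)
    for steps_a, node in enumerate(up_a):
        if node in in_b:  # first shared node is the lowest common ancestor
            return steps_a + up_b.index(node)
    return None


def linguistically_connected(heads, event, timex, max_arcs=2):
    """Hypothetical criterion: event and timex tokens within max_arcs arcs."""
    d = path_length(heads, event, timex)
    return d is not None and d <= max_arcs


# Toy parse of "He left on Tuesday , then slept"; heads[i] is the head of token i.
tokens = ["He", "left", "on", "Tuesday", ",", "then", "slept"]
heads = [1, -1, 1, 2, 1, 6, 1]
print(linguistically_connected(heads, 1, 3))  # True: left <- on <- Tuesday
print(linguistically_connected(heads, 6, 3))  # False: 3 arcs via left and on
```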

Contributions
These new information sources cover a large part (roughly 60%)
of the difficult relation set.
Difficult task, with notable impact
Focus on automatic annotation of temporal relations
Pushed understanding of the problem beyond the SotA
Creation of and contribution to language resources – e.g.
ISO-TimeML, RTMML, CAVaT (among others)
So where could we go next?

Future
Forensic analysis
How can we build a consistent event model from multiple
semi-reliable accounts of an event?
Challenges:
Multi-document event and actor co-reference
Story conflict resolution [5]
Spatial and temporal IE from colloquial text
Building and resolving accurate co-constraining models from
unreliable data (belief networks)
[5] Regneri, Koller and Pinkal, 2010: Learning Script Knowledge with Web
Experiments

Future
Assertion bounding
All assertions have temporal bounds. How can we determine these?
Challenges:
Accurate extraction of document temporal structure
Automated reasoning
High-precision timex normalisation
Doing temporal IE & IR at gigaword scale

Future
Temporal dataset construction
Many current systems index whole documents by date, but
information is more nuanced than that
Challenges:
Mapping events to temporal data points
Storing and extracting events
Anchoring events with uncertain bounds (“last year’s fighting”
vs. “the fighting on April 23, 2011”)
Mining complex super-events; e.g. the Fukushima disaster;
what happened when?
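One simple way to store events with uncertain bounds is as a range of days the event may fall within. The sketch below is hypothetical: it reads “last year” as the whole previous calendar year relative to an assumed document creation time (DCT), then checks whether a precisely anchored event could lie inside a vaguely anchored one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Anchor:
    """An event anchored to the range of days it may fall within (inclusive)."""
    earliest: date
    latest: date

    def could_contain(self, other):
        """True if other's whole range lies inside this anchor's range."""
        return self.earliest <= other.earliest and other.latest <= self.latest


def last_year(dct):
    """Assumed reading of 'last year': the previous calendar year relative to the DCT."""
    y = dct.year - 1
    return Anchor(date(y, 1, 1), date(y, 12, 31))


dct = date(2012, 3, 5)                                # hypothetical DCT
vague = last_year(dct)                                # "last year's fighting"
exact = Anchor(date(2011, 4, 23), date(2011, 4, 23))  # "the fighting on April 23, 2011"
print(vague.could_contain(exact))  # True
```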

Recap
Temporality is ubiquitous, in the world around us and in the
language we use to describe our world
Processing it automatically is difficult
Doing high-performance temporal IE opens exciting research
avenues
Thank you for your time. Are there any questions?

Labellings as probability distributions
Automated methods (e.g. classifiers) may have varying degrees of
confidence about a link’s label.
We could assign a set of labels and probabilities to each label.
Consistency constraints allow us to find the most-likely possible
graph.
A:B → before: 0.9; after: 0.1
B:C → before: 0.5; simultaneous: 0.5
A:C → before: 1.0
Very time-consuming to compute
– optimisations welcome!
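For the three-link example on this slide, the most likely consistent graph can be found by brute force: score every joint labelling by the product of its label probabilities and discard those that violate transitivity. This sketch simplifies by treating the three labels like point relations and hard-codes the A, B, C triangle; the exhaustive loop is exactly what makes the general case expensive.

```python
from itertools import product

# Label distributions from the slide, for the links A:B, B:C and A:C.
DISTS = {
    ("A", "B"): {"before": 0.9, "after": 0.1},
    ("B", "C"): {"before": 0.5, "simultaneous": 0.5},
    ("A", "C"): {"before": 1.0},
}


def composable(r1, r2):
    """Labels A:C may take given A:B = r1 and B:C = r2 (point-like simplification)."""
    if r1 == "simultaneous":
        return {r2}
    if r2 == "simultaneous" or r1 == r2:
        return {r1}
    return {"before", "after", "simultaneous"}  # before/after mix: undetermined


def best_consistent_labelling(dists):
    links = list(dists)
    best, best_p = None, 0.0
    # Exhaustive search: cost grows exponentially with the number of links.
    for labels in product(*(dists[link].items() for link in links)):
        assignment = {link: lab for link, (lab, _) in zip(links, labels)}
        allowed = composable(assignment[("A", "B")], assignment[("B", "C")])
        if assignment[("A", "C")] not in allowed:
            continue  # violates transitivity
        p = 1.0
        for _, prob in labels:
            p *= prob
        if p > best_p:
            best, best_p = assignment, p
    return best, best_p


best, p = best_consistent_labelling(DISTS)
print(best, p)
```

Two labellings tie here (B:C as before or simultaneous, each with probability 0.45); the sketch keeps the first one found. The labelling A:B after, B:C simultaneous is rejected outright, since it would force A after C.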

Unuttered temporal orderings
Event/Time distance
“When I was brushing my teeth”
→ This event happens at least twice daily; assume this instance is
0-16 hours away
Complex events
“When we were putting up the tents for the festival”
→ near the beginning of / just before the “festival” event
