This document discusses techniques for temporal and event tagging to analyze documents and extract events and their temporal relationships. It proposes applying summarization as a way to filter sentences and events. The approach uses three components: prior ranking of sentences, cosine similarity ranking, and a PageRank-like algorithm. It evaluates the approach on DUC2007 data and finds it significantly reduces the number of extracted events while maintaining sentence selection quality. However, the document also notes limitations in existing technology for temporal analysis across multiple documents.
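The summary names the three ranking components but not how they combine. The sketch below shows one way a PageRank-like ranker could use them. It assumes the prior ranking gives one non-negative score per sentence, and that the cosine similarities form a square matrix. The similarities act as edge weights and the prior acts as the teleport (personalisation) distribution. The function name, damping factor and convergence settings are hypothetical and are not the author's implementation.

```python
def rank_sentences(sim, prior, damping=0.85, iters=100, tol=1e-9):
    """PageRank-style scores over a sentence-similarity graph (hypothetical helper).

    sim:   square matrix (list of lists) of non-negative cosine similarities
    prior: non-negative prior score per sentence, used as the teleport vector
    """
    n = len(sim)
    total_prior = sum(prior)
    teleport = [p / total_prior for p in prior]
    # Total outgoing edge weight per sentence, ignoring self-similarity.
    out_weight = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    scores = teleport[:]
    for _ in range(iters):
        new = []
        for j in range(n):
            # Score flowing into j from every other sentence, split by edge weight.
            incoming = sum(
                scores[i] * sim[i][j] / out_weight[i]
                for i in range(n)
                if i != j and out_weight[i] > 0
            )
            new.append((1 - damping) * teleport[j] + damping * incoming)
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores
```

The top 10 sentences would then be `sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:10]`. Using the prior as the teleport vector means a sentence with a high prior keeps some score even if it is weakly connected.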
20. Effect of Sentence Filtering
Choosing the top 10 sentences:
                             D0701A   D0720E
#Events before filtering       3320     1435
#Events after filtering          67       37
How can we represent 3320 events on a timeline?
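The slide reports the event counts but not the filtering step itself. The sketch below assumes each extracted event is stored with the index of the sentence it came from. The `(event_id, sentence_index)` layout and the function name are hypothetical. An event survives only if its sentence is among the k highest-scoring sentences.

```python
def filter_events(sentence_scores, events, k=10):
    """Keep only events that occur in the k highest-scoring sentences (hypothetical helper).

    sentence_scores: list of scores, one per sentence index
    events:          list of (event_id, sentence_index) pairs
    """
    ranked = sorted(range(len(sentence_scores)),
                    key=lambda i: sentence_scores[i], reverse=True)
    kept = set(ranked[:k])
    return [(eid, s) for eid, s in events if s in kept]
```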
21. Time-Event Anchoring
                             D0701A   D0720E
#Events before filtering       3320     1435
#Failures                      3085     1129
#Events after filtering          67       37
#Failures                        49       29
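This shows that my approach is a failure: anchoring fails for most events, both before filtering (3085 of 3320 and 1129 of 1435) and after it (49 of 67 and 29 of 37).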
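This sketch reads `#Failures` as the number of events that could not be anchored to a time expression. That is an interpretation, since the slide does not define the term. Under that reading, the failure rates follow directly from the table. The dictionary layout below is only a convenient way to hold the slide's counts.

```python
# Counts copied from the slide: (#events, #failures) per document and stage.
counts = {
    "D0701A": {"before": (3320, 3085), "after": (67, 49)},
    "D0720E": {"before": (1435, 1129), "after": (37, 29)},
}

def failure_rate(events, failures):
    """Fraction of events that could not be anchored."""
    return failures / events

for doc, stages in counts.items():
    for stage, (events, failures) in stages.items():
        print(f"{doc} {stage} filtering: {failure_rate(events, failures):.0%} unanchored")
# D0701A before filtering: 93% unanchored
# D0701A after filtering: 73% unanchored
# D0720E before filtering: 79% unanchored
# D0720E after filtering: 78% unanchored
```

Filtering lowers the rate for D0701A but barely changes it for D0720E.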
22. TARSQI only supports single documents
WHY?
e.g. with 50 tagged events, only 50 pairs of relations are tagged,
but the number of event pairs should be C(50, 2) = 50 × 49 / 2 = 1225.
Unable to deduce the relations for all pairs of events.
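The arithmetic behind the slide is that n events give n(n-1)/2 unordered pairs. So 50 events give 1225 pairs, while only 50 links are tagged. The short check below uses the standard library. The event IDs are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

events = [f"e{i}" for i in range(1, 51)]  # 50 hypothetical event IDs
all_pairs = list(combinations(events, 2))  # every unordered pair of events
assert len(all_pairs) == comb(50, 2) == 1225

tagged = 50
print(f"{tagged} of {len(all_pairs)} pairs tagged ({tagged / len(all_pairs):.1%})")
# 50 of 1225 pairs tagged (4.1%)
```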
24. My project spans 3 areas
Temporal and Event Tagging
Automatic Summarization
Topic Detection and Tracking
25. The limits of existing technology
OR EVEN
the limits of temporal analysis itself
We cannot get enough information from the documents.
26. Cosine similarity with tf-idf weighting is computationally expensive
2.5 hours for 867 sentences
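The slide gives the cost but not the implementation, so the cause of the slowdown is not known. With 867 sentences there are 867 × 866 / 2 = 375,411 sentence pairs. Recomputing tf-idf vectors or norms for every pair multiplies that work. The sketch below builds each normalised vector once and then takes sparse dot products. It assumes sentences are already tokenised into lists of terms and uses idf = log(N / df), which is one of several idf variants. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_unit_vectors(sentences):
    """One L2-normalised tf-idf vector (a sparse dict) per tokenised sentence."""
    n = len(sentences)
    df = Counter(term for sent in sentences for term in set(sent))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for sent in sentences:
        # Terms present in every sentence get idf 0 and are dropped.
        vec = {t: tf * idf[t] for t, tf in Counter(sent).items() if idf[t] > 0}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(u, v):
    """Cosine of two unit vectors is their dot product; iterate over the smaller dict."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def pairwise_similarities(sentences):
    """Similarity for every unordered sentence pair (i, j) with i < j."""
    vectors = tfidf_unit_vectors(sentences)  # built once, not once per pair
    return {(i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}
```

The number of pairs still grows quadratically, so this only removes redundant work. It does not change the O(n²) shape of the problem.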
27. DUC2007 documents are hard to parse
Different documents have different formats...
No standard date format...
Some contain special characters that cause trouble for XML parsers...
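The sketch below shows one defensive pre-processing step. It makes several assumptions:

- Each file is a sequence of `<DOC>` elements with no XML declaration.
- The troublesome characters are control codes outside XML 1.0 and bare `&` signs.
- Dates appear in a small set of known layouts.

The tag names `DOC`, `DATE_TIME` and `TEXT` and the `DATE_FORMATS` list are hypothetical. Real DUC2007 files may need other rules.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

# Characters outside the XML 1.0 Char production (mostly C0 control codes).
INVALID_XML_CHARS = re.compile("[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")
# "&" not starting one of the five predefined entities or a numeric reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")
# Hypothetical date layouts; English month names assume the default C locale.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%B %d, %Y", "%m/%d/%Y")

def sanitize(raw):
    """Drop characters XML forbids and escape stray ampersands."""
    text = INVALID_XML_CHARS.sub("", raw)
    return BARE_AMPERSAND.sub("&amp;", text)

def parse_docs(raw):
    """Parse several top-level <DOC> elements by wrapping them in one root."""
    root = ET.fromstring("<ROOT>" + sanitize(raw) + "</ROOT>")
    return root.findall("DOC")

def parse_date(text):
    """Try each known layout in turn; return None if none matches."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```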