2. presentation my research taster project
cdt?
■ 4-year PhD course
■ funded by EPSRC
■ industrial partners
■ multi-disciplinary
■ new model for all PhD training within the UK
15/02/2012, Michele Filannino 2 / 23
3. presentation my research taster project
cdt?
■ 6 months of foundation period
● 3 postgraduate courses
▶ Machine Learning and Data Mining
▶ Modelling and visualisation of high-dimensional data
▶ Semi-structured data and the web
● 3 scientific methods courses
● 1 short taster project [6 weeks]
● creativity workshops
■ 3.5 years of PhD research
where we are
■ Computer science
● natural language processing
▶ information retrieval
★ information extraction
✦ temporal expressions extraction
or...
■ Computer science
● data mining
▶ text mining
★ information extraction
✦ temporal expressions extraction
temporal expression
■ natural language phrase that denotes a temporal
entity: an interval or an instant [1]
● fully-qualified: no reference to any other temporal
entity
▶ March 15, 2001
● deictic: reference to the time of utterance
▶ today, yesterday, three weeks ago, last Thursday
● anaphoric: reference to a timex [2] previously evoked in
the text
▶ March 15, the next week, Saturday, at that time
[1] L. Ferro, I. Mani, B. Sundheim, and G. Wilson, “TIDES temporal annotation guidelines, v. 1.0.2,” MITRE, 2001
[2] timex: temporal expression
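A deictic expression only receives a calendar value once the time of utterance is known. The Python sketch below resolves a few of the deictic examples listed above against a reference date. The function name `resolve_deictic`, the small number-word table, and the reading chosen for "last Thursday" are illustrative assumptions, not part of the TIDES guidelines.

```python
from datetime import date, timedelta

# Hypothetical lookup table: only a few number words are covered.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def resolve_deictic(expression: str, utterance: date) -> date:
    """Resolve a deictic expression relative to the time of utterance."""
    text = expression.lower().strip()
    if text == "today":
        return utterance
    if text == "yesterday":
        return utterance - timedelta(days=1)
    words = text.split()
    # Pattern "<number> weeks ago", e.g. "three weeks ago"
    if len(words) == 3 and words[1] in ("week", "weeks") and words[2] == "ago":
        return utterance - timedelta(weeks=NUMBER_WORDS[words[0]])
    if text == "last thursday":
        # Assumed reading: the most recent Thursday strictly before the
        # utterance date (the phrase is ambiguous in practice).
        days_back = (utterance.weekday() - 3) % 7 or 7
        return utterance - timedelta(days=days_back)
    raise ValueError(f"unsupported expression: {expression!r}")


utterance_date = date(2012, 2, 15)  # date of the talk, used as time of utterance
for phrase in ["today", "yesterday", "three weeks ago", "last Thursday"]:
    print(phrase, "->", resolve_deictic(phrase, utterance_date).isoformat())
```

Anaphoric expressions need the same kind of anchor, except that the reference date comes from a timex found earlier in the text rather than from the time of utterance.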
why?
■ user’s perspective
● temporal aspects of events and entities provide a
natural mechanism for organising information.
■ machine’s perspective
● improvements in
▶ question answering, summarisation, browsing
how?
■ annotation
● recognition
▶ automatically detect and delimit expressions
▶ mostly machine-learning techniques
● normalisation
▶ assign attribute values to all the recognised
expressions
▶ using a shared and formal format (standard?)
▶ mostly rule-based techniques
■ reasoning or searching
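The two annotation steps can be sketched in Python as a recogniser followed by a rule-based normaliser. This is a toy sketch under stated assumptions: the regular expression covers only fully-qualified dates such as "March 15, 2001", the function names `recognise` and `normalise` and the sample sentence are hypothetical, and a regex stands in for the machine-learned recognisers that are more common in practice.

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Toy pattern: "<Month> <day>, <year>", e.g. "March 15, 2001"
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def recognise(text):
    """Recognition: return (start, end, surface string) for each match."""
    return [(m.start(), m.end(), m.group()) for m in DATE_PATTERN.finditer(text)]


def normalise(surface):
    """Normalisation: map the surface string to an ISO 8601 date value."""
    return datetime.strptime(surface, "%B %d, %Y").date().isoformat()


sentence = "The meeting took place on March 15, 2001."  # hypothetical input
for start, end, surface in recognise(sentence):
    print(start, end, surface, "->", normalise(surface))
```

Keeping the two steps as separate functions mirrors the split on the slide: the recogniser can be replaced by a statistical model without touching the normalisation rules.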
timex forms [1]
■ time or date references
● 11pm, February 14th, 2005
■ time references that anchor on another time
● one hour after midnight, two weeks before Christmas
■ durations
● few months, two days, five years
■ recurring times
● every third month, twice in the hour
[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
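Durations are among the easier forms to normalise by rule. The Python sketch below maps simple "number + unit" durations to an ISO 8601-style period string. The lookup tables and the function name `normalise_duration` are illustrative assumptions; vague quantities such as "few months" are deliberately left unsupported.

```python
# Hypothetical lookup tables: only small number words and a few units are covered.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalise_duration(expression: str) -> str:
    """Map e.g. 'two days' to an ISO 8601-style period such as 'P2D'."""
    quantity_word, unit_word = expression.lower().split()
    quantity = NUMBER_WORDS[quantity_word]     # KeyError for vague quantities
    unit = UNIT_CODES[unit_word.rstrip("s")]   # crude singularisation
    return f"P{quantity}{unit}"


for phrase in ["two days", "five years"]:
    print(phrase, "->", normalise_duration(phrase))
```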
timex forms [1]
■ context-dependent times
● today, last year
■ vague references
● somewhere in the middle of June, the near future
■ times indicated by an event
● the day S. Berlusconi resigned
▶ an event is considered a cover term for situations that
happen or occur
[1] J. Poveda, M. Surdeanu, and J. Turmo, “An analysis of Bootstrapping for the Recognition of Temporal Expressions”, 2009
timeline
[Figure: timeline from 2000 to 2013; items listed left to right]
■ standards, corpora and evaluations
● TimeML (standard)
● TimeBank (corpus)
● ACE-2004 dev & eval (TERN2004 corpus)
● TempEval Task#15 (in SemEval07)
● TempEval-2 Task#13 (in SemEval10)
● TempEval-3 Task#1 (in SemEval13)
■ approaches
● Hand grammar approach (rule-based)
● SVM (machine learning)
● Maximum Entropy classifier (machine learning)
● Conditional Random Fields (machine learning)
● Markov logic network (machine learning)
■ scores shown along the timeline: 85%, 87.8%, 90.7% [1]
[1] TERN2004 corpus
standards
■ “the nice thing about standards is, there are so
many to choose from” by Andrew S. Tanenbaum
● TimeML
● DAML-Time
● TIDES
● ACE-TERN
standards
■ there’s a tension between
● flexibility and efficiency
● usability and flexibility
● complexity and spreadability
● flexibility and agreement
about the spreadability
about the agreement
TimeML tag    agreement
TIMEX3        0.83
SIGNAL        0.77
EVENT         0.78
ALINK         0.81
SLINK         0.85
TLINK         0.55
Source: http://timeml.org/site/timebank/documentation-1.2.html
example: raw text
That means Unisys must pay about $100 million in interest every
quarter, on top of $27 million in dividends on preferred stock.
Source: TRIOS TimeBank v.0.1
example: recognition
That means Unisys must <ev>pay</ev> about $100 million in interest
<te>every quarter</te>, on top of $27 million in dividends on preferred
stock.
Source: TRIOS TimeBank v.0.1
example: normalisation
That means Unisys must <EVENT eid="e110" mainevent="YES"
class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
polarity="POS" pos="VERB">pay</EVENT> about $100 million in
interest <TIMEX3 tid="t256" type="SET" value="P1Q"
temporalFunction="false" functionInDocument="NONE"
quant="every">every quarter</TIMEX3>, on top of $27 million in
dividends on preferred stock.
<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
eventID="e107"/>
<TLINK lid="l26" relType="OVERLAP" eventID="e110"
relatedToTime="t256"/>
Source: TRIOS TimeBank v.0.1
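Once a sentence is annotated in this way, the annotations can be read back with a standard XML parser. The Python sketch below parses the annotated sentence from this slide with the standard library; the surrounding `<s>` root element is a hypothetical wrapper added only so the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# Annotated sentence from the slide, wrapped in a hypothetical <s> root element.
annotated = """<s>That means Unisys must <EVENT eid="e110" mainevent="YES"
class="OCCURRENCE" stem="pay" tense="NONE" aspect="NONE"
polarity="POS" pos="VERB">pay</EVENT> about $100 million in
interest <TIMEX3 tid="t256" type="SET" value="P1Q"
temporalFunction="false" functionInDocument="NONE"
quant="every">every quarter</TIMEX3>, on top of $27 million in
dividends on preferred stock.
<TLINK lid="l32" relType="BEFORE" relatedToEvent="e110"
eventID="e107"/>
<TLINK lid="l26" relType="OVERLAP" eventID="e110"
relatedToTime="t256"/></s>"""

root = ET.fromstring(annotated)

# Temporal expressions: identifier, type, normalised value, surface text
timexes = [(t.get("tid"), t.get("type"), t.get("value"), t.text)
           for t in root.iter("TIMEX3")]
# Events: identifier, class, surface text
events = [(e.get("eid"), e.get("class"), e.text) for e in root.iter("EVENT")]
# Temporal links: source event, relation, target (a time or another event)
links = [(l.get("eventID"), l.get("relType"),
          l.get("relatedToTime") or l.get("relatedToEvent"))
         for l in root.iter("TLINK")]

print(timexes)
print(events)
print(links)
```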
considerations
■ specialised linguistic approaches do not pay off
● machine learning techniques usually perform better
■ scarcity of pre-annotated corpora
● manual corpus annotation is very tricky
● partially solved with TempEval-3 (2013)
▶ 1M-word corpus automatically annotated by TRIOS
■ vibrant area in the biomedical domain
considerations
■ rule-based approach will never die
● CRF and MLN hybridise rule-based and machine learning approaches
■ better performance means clever decomposition
● how to divide the general problem into sub-problems
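One common decomposition, assumed here rather than taken from the slides, treats recognition as token labelling with B/I/O tags, the form in which sequence models such as CRFs are usually trained. The Python sketch below converts the recognised span from the Unisys example into such labels; whitespace tokenisation and the function name `bio_encode` are simplifying assumptions.

```python
def bio_encode(tokens, timex_spans):
    """Label each token B (begins a timex), I (inside one) or O (outside).

    timex_spans holds (start, end) token indices, with end exclusive.
    """
    labels = ["O"] * len(tokens)
    for start, end in timex_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


tokens = "Unisys must pay about $100 million in interest every quarter".split()
# "every quarter" covers tokens 8 and 9
labels = bio_encode(tokens, [(8, 10)])
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

With this encoding the recognition sub-problem reduces to predicting one label per token, while normalisation remains a separate step.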
my to-do list
■ collect a corpus in the clinical field
■ study novel machine learning approaches
● maximum likelihood, logistic regression, CRF, MLN
■ implement a prototype
● Python or MATLAB
[Progress bar, 0 to 30 days: 12 days elapsed, 18 days remaining]