3. Introduction
Answer questions such as:
what happened on 3 January
what happened on 3 January 1865
what happened in January 1825
what happened from January until July 1985
what happened during the 16th century
what started in January 1930
what ended in 1990
28/01/10 SWT - Final Project 3
9. Date Extraction – Regular Expressions
Regular expressions aren't for parsing
Day = (\d+).; Month = (Jan|Feb|...); Year = (\d+)
Date = (Day Month Year | Day Month | Month Year | Year)
Extract = (“from” Date “until” Date | Date “-” Date | “between” Date “and” Date | “from” Date)
The day number can appear in 14 positions (7 Date slots in Extract × 2 Date alternatives that contain Day)
In practice: more than 1000 slots
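The grammar above can be sketched in Python as follows. This is only an illustration under assumptions: full English month names, a 1-2 digit day, a 1-4 digit year, and the helper name `day_numbers` are all chosen here and are not taken from the project. Only Day is a capturing group, so the compiled pattern's group count shows the 14 positions directly.

```python
import re

# Assumed vocabulary: full English month names (the slide abbreviates the list).
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

DAY = r"(\d{1,2})"                       # the only capturing group
MONTH = "(?:" + "|".join(MONTHS) + ")"   # non-capturing
YEAR = r"\d{1,4}"                        # non-capturing

# Date = (Day Month Year | Day Month | Month Year | Year)
DATE = rf"(?:{DAY} {MONTH} {YEAR}|{DAY} {MONTH}|{MONTH} {YEAR}|{YEAR})"

# Extract = (from Date until Date | Date - Date | between Date and Date | from Date)
EXTRACT = re.compile(
    rf"(?:from {DATE} until {DATE}|{DATE} - {DATE}"
    rf"|between {DATE} and {DATE}|from {DATE})",
    re.IGNORECASE,
)

def day_numbers(text):
    """Return the day numbers captured by the first match, or None."""
    match = EXTRACT.search(text)
    if match is None:
        return None
    # Most of the 14 slots stay empty; the caller has to find the filled ones.
    return [g for g in match.groups() if g is not None]

print(EXTRACT.groups)  # number of Day slots in the combined expression
print(day_numbers("From 23 January 1956 until 2 February 1960"))
```

Having to search 14 mostly empty groups for two values is the bookkeeping problem the slide points at; it grows quickly once Month and Year are captured as well.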
10. Date Extraction - Tools
Standard way:
GNU Flex / GNU Bison
Ragel
Problem with UTF-8 support
Unicode – almost 100,000 characters
Big transition tables (100,000 vs. 127 symbols)
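A rough calculation shows why the table size matters. The sketch below assumes a dense table with one entry per (state, symbol) pair and a hypothetical automaton of 50 states; neither figure comes from the project. The second half shows a sparse table that stores only the transitions actually used, which is one possible way around the problem, not necessarily the one the project took.

```python
def dense_table_entries(states, alphabet_size):
    """Entries in a dense DFA transition table: one per (state, symbol)."""
    return states * alphabet_size

STATES = 50  # hypothetical automaton size, for illustration only

ascii_entries = dense_table_entries(STATES, 127)
unicode_entries = dense_table_entries(STATES, 100_000)

# Sparse alternative: keep only the transitions that exist.
# Tiny DFA for "one or more digits": state 0 = start, state 1 = accepting.
sparse = {}
for ch in "0123456789":
    sparse[(0, ch)] = 1
    sparse[(1, ch)] = 1

def accepts(text):
    """Run the sparse DFA; a missing transition means rejection."""
    state = 0
    for ch in text:
        state = sparse.get((state, ch))
        if state is None:
            return False
    return state == 1

print(ascii_entries, unicode_entries, len(sparse))
```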
12. Date Extraction - Example
Lexical Analysis
“From 23 January 1956 until 2 February 1960”
“From {{DATE_1}} until {{DATE_2}}”
Syntactic Analysis
Interval = “From” DATE “until” DATE
Interval = “Between” DATE “and” DATE
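The two phases can be sketched as below. This is a minimal illustration, not the project's parser: the date pattern covers only the "day month year" form, the function names `lex` and `parse_interval` are made up for this sketch, and the first rule accepts both "to" and "until" as an assumption.

```python
import re

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Lexical phase: recognise dates only (simplified to "day month year").
DATE_RE = re.compile(r"\d{1,2} (?:" + "|".join(MONTHS) + r") \d{1,4}")

# Syntactic phase: rules over placeholder tokens, not over raw dates.
INTERVAL_RULES = [
    re.compile(r"from \{\{DATE_(\d+)\}\} (?:to|until) \{\{DATE_(\d+)\}\}", re.I),
    re.compile(r"between \{\{DATE_(\d+)\}\} and \{\{DATE_(\d+)\}\}", re.I),
]

def lex(text):
    """Replace each date with a {{DATE_n}} placeholder; return template and dates."""
    dates = []

    def replace(match):
        dates.append(match.group(0))
        return "{{DATE_%d}}" % len(dates)

    return DATE_RE.sub(replace, text), dates

def parse_interval(text):
    """Return the (start, end) date strings of the first interval rule that matches."""
    template, dates = lex(text)
    for rule in INTERVAL_RULES:
        match = rule.search(template)
        if match:
            return dates[int(match.group(1)) - 1], dates[int(match.group(2)) - 1]
    return None

print(lex("From 23 January 1956 until 2 February 1960"))
print(parse_interval("Between 1 May 1900 and 3 June 1901"))
```

Because the interval rules only see placeholders, adding a new date format changes the lexical pattern alone and leaves the interval rules untouched.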
13. Date Representation
Dates from 10,000 BC to 2500 AD
Not exact: 13th century, June 1689
Zero (there is no year 0)
2 January - 5 days = 28 December
2 January 1 AD - 5 days = 28 December 1 BC
Simple tuples
(“I”, 23, 1, 1956, 20, 2, 2, 1960, 20)
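The year-zero problem can be illustrated with a small day-count sketch. It assumes the proleptic Gregorian calendar and encodes BC years as negative numbers (-1 = 1 BC); both choices, and all function names, are assumptions made for this example rather than the project's representation. Python's `datetime` cannot be used here because it does not support years before 1 AD.

```python
def to_astronomical(year):
    """Historical year (no year 0; -1 = 1 BC) to astronomical year (0 = 1 BC)."""
    if year == 0:
        raise ValueError("there is no year 0 in BC/AD numbering")
    return year if year > 0 else year + 1

def from_astronomical(year):
    """Astronomical year back to historical numbering."""
    return year if year > 0 else year - 1

def days_from_civil(y, m, d):
    """Day number of a proleptic Gregorian date (0 = 1 January 1970)."""
    y -= 1 if m <= 2 else 0              # treat January/February as end of previous year
    era = y // 400                       # 400-year cycle; floor division handles y < 0
    yoe = y - era * 400                  # year of era, 0..399
    doy = (153 * (m - 3 if m > 2 else m + 9) + 2) // 5 + d - 1
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468

def civil_from_days(z):
    """Inverse of days_from_civil: day number to (astronomical year, month, day)."""
    z += 719468
    era = z // 146097
    doe = z - era * 146097
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153
    d = doy - (153 * mp + 2) // 5 + 1
    m = mp + 3 if mp < 10 else mp - 9
    y = yoe + era * 400 + (1 if m <= 2 else 0)
    return y, m, d

def add_days(year, month, day, delta):
    """Add delta days to a historical date, skipping the nonexistent year 0."""
    z = days_from_civil(to_astronomical(year), month, day) + delta
    y, m, d = civil_from_days(z)
    return from_astronomical(y), m, d

print(add_days(1, 1, 2, -5))  # 2 January 1 AD minus 5 days
```

Doing the arithmetic on a continuous day count and converting year numbers only at the edges keeps the "no year 0" rule in one place.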
14. Web application
PHP5 + MySQL
Nette Framework + Dibi
http://css.majlis.cz/
GT: http://jdem.cz/dspw9
HTML, JSON, XML output
15. iGoogle Gadget
iGoogle = Google's personalized homepage
URL: http://jdem.cz/dspx7
Using JSON
Tricky development
16. Future Work
Improve performance
20th century events – 28 s – 406,980 (one OR)
20th century events – 0.0007 s – 392,573 (no OR)
Improve parser architecture
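For the performance item, one commonly used way to remove an OR from a query is to rewrite it as a UNION of two single-condition queries, each of which can use its own index. The sketch below only demonstrates that the two forms return the same rows; the `events` table, its columns, and the sample data are hypothetical, it runs on SQLite rather than MySQL, and whether the rewrite is actually faster depends on the database and has to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per event with a start and an end year.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT,"
    " start_year INTEGER, end_year INTEGER)"
)
conn.execute("CREATE INDEX idx_start ON events (start_year)")
conn.execute("CREATE INDEX idx_end ON events (end_year)")
conn.executemany(
    "INSERT INTO events (title, start_year, end_year) VALUES (?, ?, ?)",
    [("A", 1890, 1905), ("B", 1914, 1918), ("C", 1995, 2003),
     ("D", 1800, 1850), ("E", 2005, 2010)],
)

# Events that start or end in the 20th century, written with one OR ...
WITH_OR = (
    "SELECT id FROM events"
    " WHERE start_year BETWEEN 1901 AND 2000"
    " OR end_year BETWEEN 1901 AND 2000"
)
# ... and the same question as a UNION of two OR-free queries.
WITHOUT_OR = (
    "SELECT id FROM events WHERE start_year BETWEEN 1901 AND 2000"
    " UNION"
    " SELECT id FROM events WHERE end_year BETWEEN 1901 AND 2000"
)

ids_or = sorted(row[0] for row in conn.execute(WITH_OR))
ids_union = sorted(row[0] for row in conn.execute(WITHOUT_OR))
print(ids_or, ids_union)
```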