(Linked Data Development and Exploitation track) "Generating the Semantic Snapshot of Newscasts Using Entity Expansion" - José Luis Redondo-García, Giuseppe Rizzo, Lilia Pérez Romero, Michiel Hildebrand and Raphaël Troncy
1. GENERATING NEWSCASTS
SEMANTIC SNAPSHOTS USING
ENTITY EXPANSION
JOSÉ LUIS REDONDO GARCIA
GIUSEPPE RIZZO
LILIA PÉREZ ROMERO
MICHIEL HILDEBRAND
RAPHAËL TRONCY
@peputo / redondo@eurecom.fr
@giusepperizzo / giuseppe.rizzo@eurecom.fr
L.Perez@cwi.nl
@McHildebrand / Michiel.Hildebrand@cwi.nl
@rtroncy / raphael.troncy@eurecom.fr
2. NEWS CONSUMPTION
SEMANTIC SNAPSHOT
(NSS)
Named Entity
Expansion
News item
News Semantic Snapshot
(NSS)
Snowden asks
Russia for asylum
15th International Conference on Web Engineering (ICWE), June 24, 2015
3. NEWS ENTITY
EXPANSION
NSS
(20) (1) (4) (4)
Web-based, Unsupervised, Sequential
4. Involving: (experts in the news domain + users)
Dimensions:
Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
EVALUATION: NEWS ENTITIES
GOLD STANDARD
(1) Video Subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles
5. DOCUMENT
COLLECTION
(20 variations)
Using Google Custom Search Engine (CSE) [1]
[1] https://cse.google.com/cse/all
Web sites to be crawled:
- Google:
- L1 : A set of 10 international English-language newspapers
- L2 : A set of 3 international newspapers used in GS
Temporal Window:
- 1W:
- 2W:
Annotation filtering:
6. DOCUMENT
ANNOTATION
NER extractors in
NERD *
(*) Benchmarking the Extraction and
Disambiguation of Named Entities on
the Semantic Web, Rizzo et al. (2014)
7. ENTITY FILTERING
(4 variations)
Filtering dimensions:
- F1: NERD type:
- Person
- Organization
- Location
- F2: Confidence score:
> Threshold
- F3: Capitalization:
country
president
Obama
asylum
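The filtering dimensions above can be sketched as simple predicates over annotated entities. This is a minimal illustration, not the authors' implementation: the `Entity` class, the function names, the default threshold of 0.5, and the types and confidence scores assigned to the four example words are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str         # surface form found in the document
    nerd_type: str     # type assigned by the NER extractor
    confidence: float  # extractor confidence score


# F1: keep only the three types listed on the slide.
ALLOWED_TYPES = {"Person", "Organization", "Location"}


def keep_by_type(entity):
    return entity.nerd_type in ALLOWED_TYPES


# F2: keep entities whose confidence exceeds a threshold (0.5 is an assumed value).
def keep_by_confidence(entity, threshold=0.5):
    return entity.confidence > threshold


# F3: keep entities whose label starts with an uppercase letter.
def keep_by_capitalization(entity):
    return entity.label[:1].isupper()


def filter_entities(entities, *predicates):
    """Keep the entities that satisfy every given predicate."""
    return [e for e in entities if all(p(e) for p in predicates)]


# The four words from the slide, with made-up types and scores.
entities = [
    Entity("country", "Thing", 0.40),
    Entity("president", "Thing", 0.55),
    Entity("Obama", "Person", 0.90),
    Entity("asylum", "Thing", 0.30),
]

capitalized = filter_entities(entities, keep_by_capitalization)
print([e.label for e in capitalized])
```

Passing several predicates at once, for example `filter_entities(entities, keep_by_type, keep_by_confidence)`, combines the dimensions.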
9. RANKING
STRATEGIES (2)
Rules: [ Sel(e) , ]
POPULARITY EXPERT RULES
- Based on Google Trends
- w = 2 months
- µ + 2*σ (2.5%)
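One way to read the popularity criterion is as a peak test: an entity counts as popular when its search interest at the time of the newscast exceeds µ + 2σ of the interest series over the window. The sketch below assumes that reading. The function name and the interest values are hypothetical, and the series is hard-coded rather than fetched from Google Trends.

```python
from statistics import mean, pstdev


def popularity_peak(series, index):
    """Return True if series[index] lies above mean + 2 * standard deviation.

    `series` stands for the interest values over the window (w = 2 months
    on the slide); `index` is the position of the newscast date in it.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return series[index] > mu + 2 * sigma


# Made-up weekly interest values for one entity over roughly two months.
weekly_interest = [10, 12, 11, 9, 10, 11, 95, 12]

print(popularity_peak(weekly_interest, 6))  # the week with the spike
print(popularity_peak(weekly_interest, 0))  # an ordinary week
```

For normally distributed values, roughly 2.5% of observations fall above µ + 2σ, which matches the percentage quoted on the slide.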
Example:
- [ Location, = 0.48 ]
- [ Person, = 0.74 ]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
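The example rules can be sketched as (selector, factor) pairs applied to an entity's score. Two points are assumptions here, because the symbols did not survive in the slide text: the numeric value is treated as a multiplicative weight, and the `< 2` selector is read as "entity frequency below 2". The dictionary keys and function name are hypothetical.

```python
# Each rule pairs a selector Sel(e) with a factor (values from the slide).
RULES = [
    (lambda e: e["type"] == "Location", 0.48),
    (lambda e: e["type"] == "Person", 0.74),
    (lambda e: e["type"] == "Organization", 0.95),
    (lambda e: e["frequency"] < 2, 0.0),  # assumed meaning of "< 2"
]


def apply_expert_rules(entity, score, rules=RULES):
    """Multiply the score by the factor of every rule whose selector matches."""
    for selector, factor in rules:
        if selector(entity):
            score *= factor
    return score


person = {"label": "Obama", "type": "Person", "frequency": 3}
rare_place = {"label": "Russia", "type": "Location", "frequency": 1}

print(apply_expert_rules(person, 10.0))
print(apply_expert_rules(rare_place, 10.0))
```

Under these assumptions, an entity seen fewer than two times is pushed to a score of zero regardless of its type.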
(4 variations)
10. EVALUATION:
MEASURES
Mean P/R at N:
- Most popular
- Easy to interpret
Mean Average Precision at N (MAP):
- Considers ranking
- Relevant documents at the top positions
Mean Normalized Discounted Cumulative Gain at N (MNDCG):
- Different levels of document relevance
- The lower a highly relevant document is ranked, the less useful
it is for the user
N = 10
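The three measures can be sketched for a single ranked list as follows; the "mean" variants average these values over all news items. This uses common textbook definitions (precision divided by N, average precision normalized by min(|relevant|, N), log2 discount for DCG), which may differ in detail from the exact variants used in the evaluation. The gain values in the example are made up.

```python
import math


def precision_at_n(ranked, relevant, n=10):
    """Fraction of the top-n positions holding a relevant entity."""
    return sum(1 for e in ranked[:n] if e in relevant) / n


def recall_at_n(ranked, relevant, n=10):
    """Fraction of the relevant entities found in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for e in ranked[:n] if e in relevant) / len(relevant)


def average_precision_at_n(ranked, relevant, n=10):
    """Average of the precision values at each rank where a hit occurs."""
    hits, total = 0, 0.0
    for rank, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def ndcg_at_n(ranked, gains, n=10):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    dcg = sum(gains.get(e, 0) / math.log2(rank + 1)
              for rank, e in enumerate(ranked[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


# Hypothetical graded relevance for one news item (higher = more relevant).
gains = {"Snowden": 3, "Russia": 2, "Moscow": 1}
ranked = ["Snowden", "Russia", "Obama", "Moscow"]

print(precision_at_n(ranked, set(gains), n=4))
print(average_precision_at_n(ranked, set(gains), n=4))
print(ndcg_at_n(ranked, gains, n=4))
```

In this example the least relevant entity sits one position lower than in the ideal ordering, so NDCG falls slightly below 1 while precision only records that three of the four positions are hits.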
11. RESULTS (1)
Baselines:
BS1: Former Entity Expansion Implementation*
• Google
• No temporal window
• No_Schema.org
• No_Filter
BS2: TFIDF-based Function.
(*) Describing and Contextualizing
Events in TV News Show, Redondo et
al. (2014)
12. RESULTS (2)
20 x 4 x 4 =
320 runs
Google + 2W + Schema.org / F3 / Freq + POP + EXP
13. CONCLUSIONS & FUTURE WORK
- News Entity Expansion → Generate the News
Semantic Snapshot
- Best score: 0.666 in MNDCG at 10, better than BS1/2
• Collection: CSE (Google + 2W + Schema.org)
• Filtering: F3
• Ranking: Freq + POP + EXP
What’s next:
- Extend the Ground Truth
- Supervised approach
- Better exploit semantic connections between entities in KB
- Is MNDCG@10 an ideal indicator for assessing NSS quality?
14. JOSÉ LUIS REDONDO GARCIA
GIUSEPPE RIZZO
LILIA PÉREZ ROMERO
MICHIEL HILDEBRAND
RAPHAËL TRONCY
@peputo / redondo@eurecom.fr
@giusepperizzo / giuseppe.rizzo@eurecom.fr
L.Perez@cwi.nl
@McHildebrand / Michiel.Hildebrand@cwi.nl
@rtroncy / raphael.troncy@eurecom.fr
http://www.slideshare.net/joseluisredondo/newssemantic