A multimedia document taken in isolation provides little context, which hinders a proper understanding of the story being reported. International news items are a good example of this phenomenon. There is therefore a need to unveil other aspects of the story that, even though they are not explicitly present in the seed document, are crucial to fully capture the backstory. To deal with this problem, we propose a conceptual model called the News Semantic Snapshot (NSS), designed to make explicit the wider context of a news event. Following a process called Named Entity Expansion, we query the Web to bring in other viewpoints about what is happening around us, drawn from the thousands of news articles and posts where those missing story details could potentially be found. We also propose a Concentric-based approach that better spots those contextual entities by leveraging the duality between the so-called Core, which contains representative entities that are frequently mentioned in the related documents, and the Crust, made up of the entities that hold particular semantic relationships with the Core and take shape around it.
2. Outline
Semantic Annotation of News’ Context
Part 1: TOWARDS A SEMANTIC MULTIMEDIA WEB
i. Media annotation
ii. A multimedia model
iii. Semantic media exploitation
Part 2: CONTEXTUALIZING NEWS STORIES
i. The News Semantic Snapshot (NSS)
ii. The multidimensional nature of the entity relevance
iii. A concentric model for NSS generation
iv. NSS in the consumption of News
[Figure: timeline with stages Previous, PHD, Future Career. Original artwork by Matt Might, http://matt.might.net/articles/phd-school-in-pictures/]
3. Outline
Semantically Capturing and Representing News Stories on the Web 3
Part II: Semantic Annotation of News’ Context
Multidimensional Relevancy
NSS Generation
Concentric Model
NSS Gold Standard
News Prototypes
2016/03/04
4. The Use Case: Contextualizing News
http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8
(Media Fragment URI 1.0)
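The temporal fragment in the URI above (`#t=34.1,39.8`) can be read programmatically. The following is a minimal sketch, assuming only the plain-seconds form of the Media Fragment URI 1.0 temporal dimension (optionally prefixed with `npt:`); clock formats such as `hh:mm:ss` are not handled, and the function name `parse_temporal_fragment` is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None.

    Assumption: only plain seconds are supported (e.g. t=34.1,39.8),
    optionally prefixed with 'npt:'. A missing start defaults to 0.0 and
    a missing end is returned as None.
    """
    fragment = urlsplit(uri).fragment
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    value = values[0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


if __name__ == "__main__":
    uri = "http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8"
    print(parse_temporal_fragment(uri))
```

Annotations such as the named entities found in the subtitles can then be attached to the returned interval rather than to the whole video.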
Edward Snowden (NE over Subtitles)
Sarah Harrison: WikiLeaks Editor
Sheremetyevo: Airport in Moscow
5. The Use Case: Contextualizing News
6. Research Questions
Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
7. Part 1: Towards a Semantic Multimedia Web
Q.1, Q.2
8. Bringing Multimedia to the Web: Why?
Make video a first-class citizen of the Web
Make video universally accessible and shareable at different granularities (segments)
Benefit from the vast knowledge already present on the Web
9. State of the Art & Related Work
Part 1
Semantic Annotation
Topics: Named Entity, Multimodal, Expansion
Alfonseca, E. and Manandhar. An unsupervised method for General Named Entity Recognition and Automated Concept Discovery
Mendes, P., Jakob, M., Garcia-Silva, A. and Bizer, C. DBpedia spotlight: shedding light on the web of documents
Shinyama, Y. and Sekine, S. Named entity discovery using comparable news articles
Chang, S-F., Manmatha, R. and Chua, T-S. Combining text and audio-visual features in video indexing
Wang, Richard C. and Cohen, William W. Iterative Set Expansion of Named Entities Using the Web
Talukdar, P-P., Brants, T., Liberman, M. and Pereira, F. A Context Pattern Induction Method for Named Entity Extraction
Multimedia Modeling
MPEG-7 http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
TV-Anytime http://tech.ebu.ch/tvanytime
Synchronized Multimedia Integration Language https://www.w3.org/TR/REC-smil/
Media Fragment URI 1.0 specification (W3C) http://www.w3.org/TR/media-frags
◉ Synote: http://linkeddata.synote.org
◉ Ninsuna: http://ninsuna.elis.ugent.be/
BBC Programmes Ontology http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
Schema.org (SchemaDotOrgTV) http://www.w3.org/wiki/WebSchemas/
Ontology for Media Resources https://www.w3.org/TR/mediaont-10/
Web Annotation https://www.w3.org/TR/annotation-model/
10. Multimedia Annotations
Part 1.a
Automatic annotation is needed: about 300 hours of video reach YouTube every minute
What is inside the video? A multimodal approach
Semantic annotations, leveraging Web resources: more human-like operations
11. Multimedia Annotation: Named Entity Recognition
Part 1.a
1 ontology http://nerd.eurecom.fr/ontology
2 API http://nerd.eurecom.fr/api/application.wadl
3 UI http://nerd.eurecom.fr
Annotated media fragment: http://data.linkedtv.eu/media/e2899e7f#t=840,900
nerd:Product S-Bahn
nerd:Person Obama
nerd:Person Michelle
nerd:Location Berlin
ML: https://github.com/giusepperizzo/nerdml [Rizzo_LREC’14]
12. Multimedia Annotation: Named Entity Expansion
a) Entities from Seed Document DS
Other documents similar to DS
b) Expanded Entities
[Redondo_SNOW’14]
Part 1.a
13. Multimedia Annotation: Expansion Pipeline
[Redondo_SNOW’14]
Part 1.a
Available @ http://linkedtv.eurecom.fr/entitycontext/api/
14. Multimedia Annotation: Multimodal Approach
Text:
○ Keyword Extraction
○ Topic Recognition
○ From Textual Visual Cues to LSCOM Concepts
Visual:
○ Visual Concept Detection (LSCOM)
○ Shot Segmentation
○ Scene Segmentation
○ Optical Character Recognition (OCR)
○ Automatic Speech Recognition (ASR)
○ Face Detection and Tracking
○ …
Multimedia Knowledge Model
Part 1.a
15. Multimedia Model
Part 1.b
Explicitly represent video and its annotations
At the level of fragments
Based on well-known vocabularies, flexible and extensible while being Linked Data compliant
16. Multimedia Model: LinkedTV Model
Part 1.b
[Figure: LinkedTV model diagram. Blocks: BROADCAST DATA, ANALYSIS RESULTS (Support for segmentation), EXTERNAL DATASETS. Vocabularies: BBC Ontology + SchemaDotOrgTV, Media Fragments URI 1.0 (W3C), LSCOM, Ontology for Media Resources (W3C), Web Annotations (W3C), NERD, Ontology for Provenance Management. Classes: Annotation, Concept, Keyword, Entity, Provenance, Programme, Brand, Series, Episode, Version, Broadcast, Service, Broadcast Channel, Scene, Shot, MediaFragment, Face]
Available @ http://data.linkedtv.eu/ontologies/core/
17. Multimedia Model: LinkedTV Model
Part 1.b
[Figure: example graph with nodes Locator, MediaResource, MediaFragment, Annotation, Entity, URL (hyperlink), Type, OffsetBasedString]
18. Multimedia Model: TV2RDF Service
Part 1.b
[Figure: TV2RDF service diagram with labels Content Publisher, Analysis, Metadata, RDF Conversion + NERD, TV2RDF, RDF Triplestore]
Available @ http://linkedtv.eurecom.fr/tv2rdf/
19. Exploiting Knowledge
Part 1.c
Leverage the Model & Annotations for advanced mining tasks
Probe the value of the multimodal approach: evaluation on standard corpora
20. Exploitation: Enriching
Part 1.c
[Figure: oa:Annotation on rbbaktuell_20120809 with entity Berlin (nerd:Location)]
Illustrate seed video [Milicic_WWW'13]
21. Exploitation: Enriching Services & Prototypes
Part 1.c
Name                    URL                                                     Published @
MediaCollector          http://linkedtv.eurecom.fr/api/mediacollector/search/   [Rizzo_SAM’12]
MediaFinder             http://mediafinder.eurecom.fr/                          [Milicic_WWW’13]
Italian Elections 2013  http://mediafinder.eurecom.fr/story/elezioni2013        [Milicic_ESWC’13]
TVEnricher              http://linkedtv.eurecom.fr/tvenricher/api/              [LinkedTV_D2.6’14]
TVNewsEnricher          http://linkedtv.eurecom.fr/newsenricher/api/            [Redondo_ESWC’14]
23. Exploitation: Promoting Media Fragments
Part 1.c
Available @ http://linkedtv.eurecom.fr/HyperTED
[Redondo_ISWC’14]
24. Evaluation: Multimodal @ MediaEval 2013
Part 1.c
~ 1697h of BBC video data, 2323 videos
Different TV shows (news, sports, politics…) from 2012
Subtitles and ASR (English)
Output of some visual algorithms: shot and face detection
[Figure: Search Task (Query, T/V, over videos v1 v2 v3 … vn) and Hyperlinking Task (Anchor va, over videos v1 v2 v3 … vn)]
25. Evaluation: Multimodal @ MediaEval 2013
Part 1.c
Approach:
Annotations                      Processing Time        Type
Visual Concept Detection (151)   20 days on 100 cores   Visual **
Scene Segmentation               2 days on 6 cores      Visual
OCR                              1 day on 10 cores      Visual
Keywords Extraction              5 hours                Textual **
Named Entities Extraction        4 days                 Textual
Face detection and Tracking      4 days on 160 cores    Visual
Data Indexing:
◉ Lucene & Solr
◉ Granularities: Shot, Scenes, Sliding Windows…
◉ Multimodality
Query Formulation:
◉ Search: Text + Visual Cues + Visual Concept Mapping, MLSCOM
◉ Hyperlink: Subtitles, Keywords, LSCOM concepts (MoreLikeThis)
26. Evaluation: MediaEval 2013 Results
Part 1.c
Search Task: 0.19 MRR (Mean Reciprocal Rank)
Hyperlinking Task: 0.72 P10
[Sahuguet_MediaEval’13]
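The two measures reported for MediaEval 2013, MRR for the Search Task and P10 for the Hyperlinking Task, can be sketched as follows. This is an illustrative implementation of the standard definitions, not the official evaluation tooling; the function names and the toy identifiers in the usage lines are hypothetical.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1 / position of the single relevant item, or 0.0 if it is absent."""
    for position, item in enumerate(ranked_ids, start=1):
        if item == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_id) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, target) for r, target in runs) / len(runs)


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k results that are relevant."""
    relevant = set(relevant_ids)
    return sum(1 for item in ranked_ids[:k] if item in relevant) / k


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    queries = [(["v3", "v1", "v2"], "v1"), (["v7", "v8"], "v7")]
    print(mean_reciprocal_rank(queries))
    print(precision_at_k(["v1", "v2", "v3", "v4"], {"v1", "v4"}, k=4))
```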
27. Evaluation: MediaEval 2014 Results
Part 1.c
Search Task
[Hoang_MediaEval’14]
Hyperlinking Task
Changes in 2014 edition:
◉ New Dataset from BBC: 2686 hours and 3520 videos
◉ No Visual Cues on Search Queries
◉ New Approach: 22% MAP improvement in 2013 Dataset
0.71 P10
0.67 P10
28. Narrowing down…
From Multimedia Content to News Items
30. The Use Case: Contextualizing News
Part 2
Wolfgang Schäuble: Finance Minister
Christian Democratic Union: Ruling Party in Ger.
31. State of the Art & Related Work
Part 2
Semantic News Annotation
Topics: Named Entities in News, Contextualizing News, Relevancy of Entities, Graph
N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. Identityrank: Named entity disambiguation in the news domain.
S. Chabra. Entity-centric summarization: Generating text summaries for graph snippets.
A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D. Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao. Contextual insights.
N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions.
N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past: Supporting interpretations of forgotten stories by time-aware re-contextualization.
N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator: Automatically contextualizing news articles.
T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes. Automatic selection of social media responses to news.
33. The News Semantic Snapshot (NSS)
Part 2.a
What is on top: entities explicitly appearing in the documents
Going deep down… It is always challenging
Laura Poitras
Anatoly Kucherena
Edward Snowden
34. The News Semantic Snapshot (NSS)
Part 2.a
News Semantic Snapshot (NSS) [Redondo_ICWE’15]
35. The News Semantic Snapshot: Gold Standard
Part 2.a
High level of detail, significant human intervention (experts in the news domain + users)
Entities in 5 Dimensions (Visual & Text):
(1) Video Subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles
USER SURVEY
“We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned
[Romero_TVX’14]
36. The News Semantic Snapshot: Gold Standard
Part 2.a
Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
25
38. Generating the NSS: General Method
Part 2.b
a) Entities from Seed Document DS
b) Expanded Entities
c) News Semantic Snapshot
[Redondo_SNOW’14]
39. Generating the NSS: Entity Expansion
Part 2.b
a) Entities from Seed Document DS
b) Expanded Entities
c) News Semantic Snapshot
[Redondo_SNOW’14]
40. Generating the NSS: Expansion’s Settings
Part 2.b
Parameters:
Query:
- Title
- 5 W’s over Subtitles Entities
Web sites to be crawled:
- Google
- L1 : A set of 10 international English-speaking newspapers
- L2 : A set of 3 international newspapers used in GS
Temporal Window:
- 1W:
- 2W:
Annotation filtering
- Schema.org
[Redondo_ICWE’15]
41. Generating the NSS: Expansion’s Settings
Part 2.b
a) Entities DS
b) Expanded Entities
c) News Semantic Snapshot
Recall (NER on Subtitles) = 0.42
Recall (E. Expansion) = 0.91
[Redondo_SNOW’14]
42. Generating the NSS: Selection
Part 2.b
a) Entities DS
b) Expanded Entities
c) News Semantic Snapshot
[Redondo_SNOW’14]
43. Generating the NSS: The Selection problem
Part 2.b
[Figure: entities from the Expansion ranked from position 0 to N by a function FX(ei), compared with the ideal ranking FIdeal(ei) that yields the NSS: FX(ei) =? FIdeal(ei)]
44. Generating the NSS: Measures
Part 2.b
1 Precision / Recall @ N
- Popular
- Easy to interpret
2 Mean Normalized Discounted Cumulative Gain (MNDCG) @ N:
- Considers ranking
- Rewards relevant documents at the top positions
3 Compactness for Recall R:
- Compromise between Recall and NSS size
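The first two measures can be sketched as follows. This is a minimal illustration assuming the common log2-discounted formulation of DCG; the slides do not state which DCG variant or which gain values were used, so `gain_by_entity` and all function names are assumptions made for the example.

```python
import math


def recall_at_n(ranked, relevant, n):
    """Share of the relevant entities found in the top-n of the ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & relevant) / len(relevant)


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount (assumed)."""
    return sum(gain / math.log2(index + 2) for index, gain in enumerate(gains))


def ndcg_at_n(ranked, gain_by_entity, n):
    """DCG of the top-n divided by the DCG of the ideal top-n ordering."""
    gains = [gain_by_entity.get(entity, 0.0) for entity in ranked[:n]]
    ideal = sorted(gain_by_entity.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg_at_n(runs, n):
    """Average NDCG@n over (ranked, gain_by_entity) pairs, one per news item."""
    if not runs:
        return 0.0
    return sum(ndcg_at_n(r, g, n) for r, g in runs) / len(runs)


if __name__ == "__main__":
    # Toy gold standard, invented for illustration only.
    gold = {"Edward Snowden": 1.0, "Sarah Harrison": 1.0}
    ranking = ["Moscow", "Edward Snowden", "Sarah Harrison"]
    print(recall_at_n(ranking, gold, 2), ndcg_at_n(ranking, gold, 3))
```

Because of the position discount, a relevant entity placed first contributes more than the same entity placed tenth, which is why MNDCG is sensitive to the ordering of the snapshot and not only to its content.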
45. Generating the NSS: Compactness Example
Part 2.b
Recall: 22/33 = 0.66
[Figure: three rankings A, B and C reach this recall with NSS sizes Sa = 27, Sb = 33, Sc = 54; in compactness, A > B > C]
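The compactness idea in this example can be illustrated by computing how many top-ranked entities a ranking needs before it reaches a target recall. This is a sketch under assumptions: the slides do not give the exact compactness formula, so the code only returns the required snapshot size (smaller is more compact), and the name `size_for_recall` is hypothetical.

```python
def size_for_recall(ranked, gold, target_recall):
    """Smallest prefix length of `ranked` whose recall reaches target_recall.

    Returns None if the ranking never reaches the target.
    Assumption: `ranked` contains no duplicate entities.
    """
    gold = set(gold)
    if not gold:
        return None
    hits = 0
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            hits += 1
        if hits / len(gold) >= target_recall:
            return size
    return None


if __name__ == "__main__":
    # Synthetic data shaped like the slide: 33 gold entities, target 0.66.
    gold = [f"g{i}" for i in range(33)]
    ranking_a = [f"noise{i}" for i in range(5)] + gold[:22]
    ranking_c = [f"noise{i}" for i in range(32)] + gold[:22]
    print(size_for_recall(ranking_a, gold, 0.66))
    print(size_for_recall(ranking_c, gold, 0.66))
```

With 33 gold entities, a recall of 0.66 requires 22 hits, so the synthetic ranking A needs 27 entities while C needs 54 to cover the same part of the gold standard.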
46. Generating the NSS: The Approaches
Part 2.b
1 Frequency-Based Ranking [Redondo_SNOW’14]
- Leverages the bigger sample provided by the expansion
- Prioritizes representativeness
2 Multidimensional Entity Relevance Ranking [Redondo_ICWE’15]
- Relevancy of entities is grounded on different dimensions
3 Concentric Based Approach [Redondo_KCAP’15A]
- Core / Crust model
- Alleviates the problem of dealing with many dimensions
47. Generating the NSS: (1) Frequency-Based
Part 2.b
[Redondo_SNOW’14]
48. Generating the NSS: (2) Multidimensional
Part 2.b
[Redondo_ICWE’15]
49. Generating the NSS: (2) Multidimensional
Part 2.b
POPULARITY (FPOP)
- Based on Google Trends
- w = 2 months
- µ + 2*σ (2.5%)
EXPERT RULES (FEXP)
Example:
- [ Location, = 0.43]
- [ Person, = 0.78]
- [ Organization, = 0.95 ]
- [ < 2 , = 0.0 ]
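The popularity dimension relies on a µ + 2*σ threshold over a two-month window of Google Trends data. A minimal sketch of that peak test follows, assuming the trend is already available as a plain list of numbers (fetching Google Trends is out of scope here) and that the threshold is computed over the whole window; how a detected peak is converted into the final FPOP score is not specified on the slide and is not modelled. The name `popularity_peaks` is hypothetical.

```python
from statistics import mean, pstdev


def popularity_peaks(series, k=2.0):
    """Indices whose value exceeds mean + k * standard deviation.

    `series` is assumed to hold one popularity value per time step inside
    the observation window (e.g. daily values over 2 months).
    """
    if len(series) < 2:
        return []
    threshold = mean(series) + k * pstdev(series)
    return [index for index, value in enumerate(series) if value > threshold]


if __name__ == "__main__":
    # Synthetic 60-day window: flat interest with one burst on the last day.
    window = [10] * 59 + [100]
    print(popularity_peaks(window))
```

For roughly normally distributed values, only about 2.5% of observations lie above µ + 2*σ, which matches the percentage quoted on the slide.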
50. Experiment 1: Frequency VS Multidimensional
Part 2.b
20 x 4 x 4 = 320 formulas
51. Experiment 1: Frequency VS Multidimensional
Part 2.b
Multidimensional Nature of the NSS
News Entity Expansion & Dimensions → Generate NSS
Frequency-based score: 0.473 MNDCG @ 10
Best score: 0.698 MNDCG @ 10
• Collection:
  • CSE (Google + 2W + Schema.org)
• Ranking:
  • Expert Rules
  • Popularity
52. Experiment 1: Frequency VS Multidimensional
Part 2.b
[Figure: NSS ranked by FREQ]
F(Laura Poitras) = 2
F(Glenn Greenwald) = 1
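The frequency scores shown here (for example F(Laura Poitras) = 2) can be illustrated with a simple counter over the entities extracted from the related documents. This is a sketch under assumptions: each entity is counted once per document, and entity labels are already normalized; the slides do not say whether raw mention counts or document counts were used, and the function name is hypothetical.

```python
from collections import Counter


def frequency_ranking(documents_entities):
    """Rank entities by the number of documents that mention them.

    `documents_entities` holds one list of entity labels per related
    document. Ties are broken alphabetically to keep the output stable.
    """
    counts = Counter()
    for entities in documents_entities:
        counts.update(set(entities))  # one vote per document (assumption)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Toy expansion result, invented for illustration only.
    related = [
        ["Laura Poitras", "Edward Snowden"],
        ["Laura Poitras"],
        ["Glenn Greenwald"],
    ]
    for entity, score in frequency_ranking(related):
        print(entity, score)
```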
53. Experiment 1: Frequency VS Multidimensional
Part 2.b
[Figure: FREQ + POP + EXP applied to the Expansion = NSS]
54. Experiment 2: Multidimensional ++
Part 2.b
MNDCG @ 10:
1. Exploit Google relevance (+1.80%)
2. Promote subtitle entities (+2.50%)
3. Exploit named entity extractor’s confidence (+0.20%)
4. Interpret popularity dimension (+1.40%)
5. Performing clustering before filtering (-0.60%)
- NO SIGNIFICANT IMPROVEMENT -
55. Experiment 2: Multidimensional ++
Part 2.b
[Figure: labels Original, FREQ, POP, EXP, Tune Function X, Re-Shuffle, (NSS)]
56. Re-thinking the problem: measures
Part 2.b
MNDCG:
• Too focused on success at first positions (decay function)
• NSS intends to be flexible, ranking is application-dependent
COMPACTNESS:
• Prioritizes coverage over ranking while minimizing NSS size
57. Re-thinking the problem: dimensions
Part 2.b
Duality in news entity spectrum:
• Representative entities:
  • Driving the plot of the story
• Relevant entities
  • Related to the former for specific reasons
  • Exploit the entity semantic relations
Suggested by Expert? Informative? Unexpected? Interesting? Explicative?
58. Generating the NSS: (3) Concentric Approach
Part 2.b
Core
• Representative entities
• Spottable via frequency dimensions
• High degree of cohesiveness
Crust
• Attached to the Core via semantic relations
• Agnostic to relevancy nature: informativeness, interestingness, etc.
[Redondo_KCAP’15A]
59. Generating the NSS: (3) Core Creation
Part 2.b
a) Spot representative entities: Frequency Dimension
b) Cohesiveness (DBpedia)
60. Generating the NSS: (3) Crust Creation
Part 2.b
The number of Web documents talking simultaneously about a particular entity e and the Core: ?
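The co-occurrence count described here can be sketched over a local collection of documents. This is only an illustration: the experiments query a Web search engine, whereas the code below scans an in-memory list of texts, uses case-insensitive substring matching, and treats a document as "about the Core" when it mentions at least one Core entity. All of these are assumptions, and the function name is hypothetical.

```python
def web_cooccurrence(entity, core, documents):
    """Count documents mentioning `entity` together with the Core.

    Assumptions: matching is a case-insensitive substring test, and one
    Core entity in the text is enough to count as a Core mention.
    """
    entity_lower = entity.lower()
    core_lower = [label.lower() for label in core]
    count = 0
    for text in documents:
        text_lower = text.lower()
        mentions_core = any(label in text_lower for label in core_lower)
        if entity_lower in text_lower and mentions_core:
            count += 1
    return count


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    core = ["Edward Snowden", "Moscow"]
    corpus = [
        "Sarah Harrison accompanied Edward Snowden at the airport.",
        "Sarah Harrison gave a talk in Berlin.",
        "Edward Snowden remains in Moscow.",
    ]
    print(web_cooccurrence("Sarah Harrison", core, corpus))
```

Candidate entities with a high count are the ones attached to the Core to form the Crust, regardless of why they are relevant.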
61. Experiment 3: Multidimensional VS Concentric
Part 2.b
Concentric Core:
1. Entity Frequency
○ Core1: Jaro-Winkler > 0.9
○ Core2: Frequency based on Exact String matching
2. Cohesiveness:
○ Everything is Connected Engine, Skb(e1, e2) > 0.125
Everything is Connected Engine: https://github.com/mmlab/eice
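The Core1 setting groups entity labels whose Jaro-Winkler similarity exceeds 0.9 before counting frequencies. A self-contained sketch of that step follows, assuming the standard Jaro-Winkler definition (prefix scale 0.1, prefix capped at 4 characters) and a simple greedy grouping against the first label of each cluster; the actual clustering strategy is not described on the slide, and the cohesiveness step based on the Everything is Connected Engine is not covered.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def group_surface_forms(labels, threshold=0.9):
    """Greedily merge labels above the threshold and count each group."""
    clusters = []  # each item is [representative_label, frequency]
    for label in labels:
        for cluster in clusters:
            if jaro_winkler(label.lower(), cluster[0].lower()) > threshold:
                cluster[1] += 1
                break
        else:
            clusters.append([label, 1])
    ranked = sorted(clusters, key=lambda cluster: -cluster[1])
    return [(label, count) for label, count in ranked]


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    print(group_surface_forms(["Edward Snowden", "Edward Snowdon", "Sarah Harrison"]))
```

Compared with exact string matching (Core2), this fuzzy grouping lets spelling variants of the same name add up to a single frequency count.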
62. Experiment 3: Multidimensional VS Concentric
Part 2.b
Concentric Crust:
1. Candidates for CRUST generation:
○ Ex1: 1° ICWE2015 by R*(50): L2+Google, F3 1W, Gauss+ POP
○ Ex2: 2° ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP
2. Function for attaching entities to CORE:
○ SWEB(ei, Core) over Google CSE, default configuration
63. Experiment 3: Multidimensional VS Concentric
Part 2.b
Combining CORE and CRUST: Core+Crust, CrustOnly
64. Experiment 3: Multidimensional VS Concentric
Part 2.b
36.9% more compact than Multidimensional (NSS’s size decrease)
IdealGT: size of NSS according to Gold Standard
(2*2*2 + 2) Runs
65. Experiment 3: Multidimensional VS Concentric
Part 2.b
[Figure: NSS and Gold Standard for Fukushima Disaster 2013, n=22]
69. NSS Consumption: News Prototypes
Part 2.c
… short summaries, previews, hotspots …
… advanced graphs and diagrams, timelines, in-depth summaries …
… second screen apps, slideshows, info-boxes …
70. NSS Consumption: Consumption Phases
Part 2.c
The Before, The During, The After
71. NSS Consumption: Phases VS Layers
Part 2.c
[Redondo_KCAP’15B]
73. Conclusions
Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
a. Applied NER and NED as semantic annotation techniques in the multimedia domain
b. Developed other techniques such as Named Entity Expansion or Visual Concept Mapping
c. LinkedTV model to harmonize annotations into the Linked Data Web
Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
a. Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…
b. Evaluation of multimodal approaches via MediaEval 2013/2014
74. Conclusions
Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
a. Proposed the NSS model and a Gold Standard
b. The multidimensional nature of the entity relevance
• Gaussian function, popularity, experts rules…
c. Concentric model better reproduces the NSS:
• Better Compactness: 36.9% over BAS01 (similar recall, smaller size)
• Core/Crust brings up relevant entities without having to deal with fuzzy dimensions
d. NSS better supports the news consumption phases: (Before, During, After)
75. Future Work
• [S] Publish generated NSS on the Web (Linked Data)
• [S] Extend the Gold Standard:
  • From 5 to 23 videos, concentric based model for candidate selection
  • Submission to TOIS
• [S] Not depending on “big players” for retrieving knowledge during the expansion phase (Terrier VS Google experiments)
76. Future Work
• [M] Using the power of crowdsourcing in Gold Standard creation
  • Increase size of the Gold Standard without involving experts
  • Consider different levels of entity relevancy
• [M] Supervised techniques: Learning to Rank
  • Features in entities: surface forms, URLs, types…
  • Features in documents, sources, and other provenance information
77. Future Work
• [L] Spot not only the strength of the relationships between Crust and the Core, but also the predicates
Editor in WikiLeaks
Generating Explanations: analyzing documents considered in Sweb
78. Future Work
• [L] Not having to rely on “Big Players” during Crust generation:
  • Continuous indexing
  • Better curated white lists
  • Fresher structured databases: DBpedia events
• [L] Reusing concentric model in context-related tasks:
  • Named Entity Extraction/Disambiguation
    • As another feature similar to BagOfWords, Word2vec…
  • Exploratory Searches
    • Diversity, serendipity…
[Steiner_ICWE’15]
79. José Luis Redondo García
http://jluisred.github.io
@peputo
http://github.com/jluisred
“my small dent in
the vast ocean of
knowledge…”
Ph.D.
questions?
80. Publications
Journals
• Redondo Garcia J. L. and Lozano-Tello A.: OntoTV: an Ontology Based System for the Management of Information about Television Content. International Journal of Semantic Computing, 6(01), 111-130, 2012.
Conferences
• Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA.
• Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA.
Best Paper Award
• Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference on Web Engineering (ICWE'15), Rotterdam, the Netherlands.
• Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L. and Van de Walle R. (2012) What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks. In: (ACM Multimedia) International Workshop on Socially-Aware Multimedia (SAM'12), Nara, Japan.
Journals (2), Conferences (6), Workshops (5), Demo/Poster (7)
81. References
[Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways
[Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots
[Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion
[Redondo_ISWC’14] Finding and sharing hot spots in Web Videos
[Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion
[Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show
[LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations
[Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts
[Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task
[Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web
[Li_LIMe'13] Enriching Media Fragments with Named Entities for Video Classification
[Milicic_WWW'13] Live Topic Generation from Event Streams
[Milicic_ESWC’13] Tracking and Analyzing The 2013 Italian Election
[Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task
[Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks