Semantically Capturing
and Representing News
Stories on the Web
José Luis Redondo García

Jluisred.github.io @peputo
Outline
 Semantic
Annotation of
News’ Context
Original artwork by Matt Might
http://matt.might.net/articles/phd-school-in-pictures/
  TOWARDS A SEMANTIC
MULTIMEDIA WEB
i.  Media annotation
ii.  A multimedia model
iii.  Semantic media
exploitation
  CONTEXTUALIZING NEWS
STORIES
i.  The News Semantic
Snapshot (NSS)
ii.  The multidimensional
nature of the entity
relevance
iii.  A concentric model for NSS
generation
iv.  NSS in the consumption of
News
Previous | PhD | Future Career
1 2
Outline
Semantically Capturing and Representing News Stories on the Web 3
 Part II: Semantic
Annotation of News’
Context
Multidimensional Relevancy
NSS Generation
Concentric Model
NSS Gold Standard
News Prototypes
2016/03/04
The Use Case: Contextualizing News
http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8
(Media Fragment URI 1.0)
Edward
Snowden
(NE over Subtitles)
Sarah Harrison: WikiLeaks editor
Sheremetyevo: airport in Moscow
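The BBC link on this slide addresses a temporal segment of the video with a Media Fragment URI (`#t=34.1,39.8`). A minimal sketch of reading such a fragment is shown here. It assumes the simplest form of the syntax (plain seconds, with an optional `npt:` prefix) and does not handle clock formats such as `hh:mm:ss`; the function name `parse_temporal_fragment` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None.

    Assumption: times are plain seconds, optionally prefixed with 'npt:'.
    """
    fragment = urlparse(uri).fragment
    values = parse_qs(fragment).get("t")
    if not values:
        return None
    value = values[0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0  # missing start means 0
    end = float(end_text) if end_text else None       # missing end means "until the end"
    return start, end


segment = parse_temporal_fragment(
    "http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8"
)
print(segment)
```

An annotation such as a named entity found in the subtitles can then be attached to the interval rather than to the whole video.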
Research Questions
 Q1: How can multimedia content be semantically
annotated and seamlessly connected with other resources on
the Web?
 Q2: Can those semantic annotations and linked media
resources bring value for the exploitation and consumption
of multimedia content?
 Q3: Is it possible to automatically contextualize news
stories with background information so they can be
effectively interpreted by humans and machines?
Part 1
Towards a Semantic Multimedia
Web
Q.1, Q.2
Bringing Multimedia to the Web
Why?
  Make video a first-class citizen of the Web
  Make video universally accessible and
shareable at different granularities
(segments)
  Benefit from the vast knowledge already
present on the Web
Semantic Annotation
  Alfonseca, E. and Manandhar. An
unsupervised method for General Named Entity
Recognition and Automated Concept Discovery
  Mendes, P., Jakob, M. and Garcia-Silva,
A and Bizer, C. DBpedia spotlight: shedding
light on the web of documents
  Shinyama, Y. and Sekine, S. Named entity
discovery using comparable news articles
  Chang, S-F, Manmatha, R and Chua, T-S.
Combining text and audio-visual features in
video indexing
  Wang, Richard C. and Cohen, William W.
Iterative Set Expansion of Named Entities Using
the Web
  Talukdar, P-P., Brants, T., Liberman, M.
and Pereira, F. A. Context Pattern Induction
Method for Named Entity Extraction
Multimedia Modeling
  MPEG-7 http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
  TV-Anytime http://tech.ebu.ch/tvanytime
  Synchronized Multimedia Integration Language https://www.w3.org/TR/REC-smil/
  Media Fragment URI 1.0 specification (W3C) http://www.w3.org/TR/media-frags
◉  Synote: http://linkeddata.synote.org
◉  Ninsuna: http://ninsuna.elis.ugent.be/
  BBC Programmes Ontology http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
  Schema.org (SchemaDotOrgTV) http://www.w3.org/wiki/WebSchemas/
  Ontology for Media Resources https://www.w3.org/TR/mediaont-10/
  Web Annotation https://www.w3.org/TR/annotation-model/
State of the Art & Related Work
Part
1
Named
Entity
Multimodal
Expansion
 2016/03/04
Multimedia
Annotations
  Automatic annotation is needed: about 300 hours of video are uploaded to YouTube every minute
  What is inside the video? A multimodal approach
  Semantic annotations that leverage Web resources enable more human-like operations
1.a
1 ontology http://nerd.eurecom.fr/ontology
2 API http://nerd.eurecom.fr/api/application.wadl
3 UI http://nerd.eurecom.fr
Multimedia Annotation: Named Entity Recognition
S-Bahn (nerd:Product)
Obama (nerd:Person)
Michelle (nerd:Person)
Berlin (nerd:Location)
http://data.linkedtv.eu/media/e2899e7f#t=840,900
Part
1.a
https://github.com/giusepperizzo/nerdml
ML
[Rizzo_LREC’14]
Other documents
similar to DS
b) Expanded Entities
a) Entities from Seed Document DS
Multimedia Annotation: Named Entity Expansion
[Redondo_SNOW’14]
Part
1.a
Multimedia Annotation: Expansion Pipeline
[Redondo_SNOW’14]
Part
1.a
Available @ http://linkedtv.eurecom.fr/entitycontext/api/
Multimedia Annotation: Multimodal Approach
 Text:
○ Keyword Extraction
○ Topic Recognition
○ From Textual Visual Cues to LSCOM Concepts
 Visual:
○ Visual Concept Detection (LSCOM)
○ Shot Segmentation
○ Scene Segmentation
○ Optical Character Recognition (OCR)
○ Automatic Speech Recognition (ASR)
○ Face Detection and Tracking
○ …
Multimedia
Knowledge
Model
Part
1.a
Multimedia
Model
  Explicitly represent video and its annotations
  At the level of fragments
  Based on well-known vocabularies, flexible and
extensible while being Linked Data compliant
1.b
Multimedia Model: LinkedTV Model
LinkedTV model diagram (labels only):
Blocks: BROADCAST DATA; ANALYSIS RESULTS (support for segmentation); EXTERNAL DATASETS; Provenance
Vocabularies: BBC Ontology + SchemaDotOrgTV; Media Fragments URI 1.0 (W3C); LSCOM; Ontology for Media Resources (W3C); Web Annotations (W3C); NERD; Ontology for Provenance Management
Classes: Annotation, Concept, Keyword, Entity, Programme, Brand, Series, Episode, Version, Broadcast, Service, Broadcast Channel, Scene, Shot, MediaFragment, Face
Part
1.b
Available @ http://data.linkedtv.eu/ontologies/core/
Part
1.b
Locator
MediaResource
MediaFragment
Annotation
Entity
URL (hyperlink)
Type
OffsetBasedString
Multimedia Model: LinkedTV Model
Multimedia Model: TV2RDF Service
Part
1.b
Content Publisher
RDF
Conversion + NERD
TV2RDF
AnalysisMetadata
RDF
Triplestore
Available @ http://linkedtv.eurecom.fr/tv2rdf/
Exploiting
Knowledge
  Leverage the model and annotations for advanced mining tasks
  Assess the value of the multimodal approach: evaluation on standard corpora
1.c
Part
1.c
Exploitation: Enriching
oa:Annotation
rbbaktuell_20120809
nerd:Location
Berlin
Illustrate seed video [Milicic_WWW'13]
Exploitation: Enriching Services & Prototypes
Part
1.c
Name | URL | Published @
MediaCollector | http://linkedtv.eurecom.fr/api/mediacollector/search/ | [Rizzo_SAM’12]
MediaFinder | http://mediafinder.eurecom.fr/ | [Milicic_WWW’13]
Italian Elections 2013 | http://mediafinder.eurecom.fr/story/elezioni2013 | [Milicic_ESWC’13]
TVEnricher | http://linkedtv.eurecom.fr/tvenricher/api/ | [LinkedTV_D2.6’14]
TVNewsEnricher | http://linkedtv.eurecom.fr/newsenricher/api/ | [Redondo_ESWC’14]
Exploitation: Classifying videos
Part
1.c
Figure: Temporal distribution of entity types, one panel per channel (fun, tech, sport, news, creation, lifestyle, shortfilms, music, other)
Entity types: Thing, Amount, Animal, Event, Function, Loc, Organization, Person, Product, Time
x-axis: the temporal positions of NEs (0 to 1); y-axis: the number of NEs
[Li_LIME'13] Dailymotion dataset, 805 videos, 46.58% accuracy
Exploitation: Promoting Media Fragments
Part
1.c
Available @ http://linkedtv.eurecom.fr/HyperTED
[Redondo_ISWC’14]
Evaluation: Multimodal @ Mediaeval 2013
Part
1.c
~ 1697h of BBC video data, 2323 videos
 Different TV shows
(news, sports, politics…)
from 2012
 Subtitles and ASR
(English)
 Output of some visual
algorithms: shot and face
detection
Anchor
Search Task Hyperlinking Task
Query
T/V
v1 v2 v3 vn v1 v2 v3 vn
va
Evaluation: Multimodal @ Mediaeval 2013
Part
1.c
Annotations | Processing Time | Type
Visual Concept Detection (151) | 20 days on 100 cores | Visual **
Scene Segmentation | 2 days on 6 cores | Visual
OCR | 1 day on 10 cores | Visual
Keywords Extraction | 5 hours | Textual **
Named Entities Extraction | 4 days | Textual
Face Detection and Tracking | 4 days on 160 cores | Visual
 Data Indexing:
◉  Lucene & Solr
◉  Granularities: Shot, Scenes, Sliding Windows…
◉  Multimodality
 Query Formulation:
◉  Search: Text + Visual Cues + Visual Concept
Mapping, MLSCOM
◉  Hyperlink: Subtitles, Keywords, LSCOM
concepts (MoreLikeThis)
Approach:
Evaluation: Mediaeval 2013 Results
Part
1.c
Search Task: 0.19 MRR (Mean Reciprocal Rank)
Hyperlinking Task: 0.72 P10
[Sahuguet_MediaEval’13]
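The two scores on this slide, MRR and P10, are standard retrieval measures. The sketch below shows how they are usually computed; it is not the MediaEval evaluation tooling, and the function names and toy inputs are assumptions made for illustration.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 when none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """`runs` is an iterable of (ranked_list, relevant_set) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


def precision_at(ranked, relevant, n=10):
    """Fraction of the top-n results that are relevant (P10 when n == 10)."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n


# Toy example with made-up segment identifiers.
queries = [(["v3", "v1", "v7"], {"v1"}), (["v2", "v9"], {"v2"})]
print(mean_reciprocal_rank(queries))
```

MRR rewards placing the single known target near the top, which suits a known-item search task, while P10 measures how many of the first ten proposed links are relevant.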
Evaluation: Mediaeval 2014 Results
Part
1.c
Search Task
[Hoang_MediaEval’14]
Hyperlinking Task
  Changes in 2014 edition:
◉  New Dataset from BBC: 2686 hours and 3520 videos
◉  No Visual Cues on Search Queries
◉  New Approach: 22% MAP improvement in 2013 Dataset
0.71 P10
0.67 P10
Narrowing down…
From Multimedia Content to
News Items
Part 2
Semantically Contextualizing
News Stories
Q.3
The Use Case: Contextualizing News
Wolfgang Schäuble: Finance Minister
Christian Democratic Union: ruling party in Germany
Part
2
Semantic News Annotation
  N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. Identityrank: Named
entity disambiguation in the news domain.
  S. Chabra. Entity-centric summarization: Generating text summaries for graph
snippets.
  A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D.
Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao.
Contextual insights
  N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions.
  N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past:
Supporting interpretations of forgotten stories by time-aware re-contextualization.
  N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator:
Automatically contextualizing news articles.
  T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes.
Automatic selection of social media responses to news.
State of the Art & Related Work
Part
2
Graph
Named Entities
in News
Contextualizing
News
Relevancy of
Entities
Semantic
Snapshot of
News (NSS)
  Definition and Motivation
  A Gold Standard of News Entities
2.a
Going deep
down…
It is always challenging
What is on top:
Entities explicitly appearing
in the documents
Laura Poitras
Anatoly Kucherena
Edward Snowden
Part
2.a
The News Semantic Snapshot (NSS)
The News Semantic Snapshot (NSS)
Part
2.a
News Semantic Snapshot
(NSS)[Redondo_ICWE’15]
The News Semantic Snapshot: Gold Standard
Part
2.a
  High level of detail, significant human intervention (experts in the news domain + users)
  Entities in 5 dimensions (visual & text):
(1) Video subtitles
(2) Image in the video
(3) Text in the video image
(4) Suggestions of an expert
(5) Related articles
USER SURVEY
“We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned”
[Romero_TVX’14]
The News Semantic Snapshot: Gold Standard
Part
2.a
Play with the data and help us to extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
25
Automatically
Generating
the NSS
2.b
  The Selection problem
  Approaches: frequency-based, multidimensional, concentric
  Experiments and Results
b) Expanded
Entities
a) Entities from Seed Document DS
Generating the NSS: General Method
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part
2.b
b) Expanded
Entities
a) Entities from Seed Document DS
Generating the NSS: Entity Expansion
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part
2.b
Generating the NSS: Expansion’s Settings
Part
2.b
Query:
-  Title
-  5 W’s over Subtitles Entities
Web sites to be crawled:
-  Google
-  L1 : A set of 10 international English-language newspapers
-  L2 : A set of 3 international newspapers used in the Gold Standard (GS)
Temporal Window:
-  1W:
-  2W:
Annotation filtering
-  Schema.org
[Redondo_ICWE’15]
Parameters:
b) Expanded
Entities
a) Entities DS
Generating the NSS: Expansion’s Settings
[Redondo_SNOW’14]
(2)
c) News Semantic Snapshot
Part
2.b
Recall (E. Expansion) = 0.91
Recall (NER on Subtitles) = 0.42
b) Expanded
Entities
a) Entities DS
Generating the NSS: Selection
(2)
c) News Semantic Snapshot
Part
2.b
[Redondo_SNOW’14]
Generating the NSS: The Selection problem
Part
2.b
(NSS)
0
N
FIdeal(ei)
(NSS)
FX(ei)
=?Expansion
Generating the NSS: Measures
Part
2.b
1  Precision / Recall @ N
-  Popular
-  Easy to interpret
2  Mean Normalized Discounted Cumulative Gain
(MNDCG) @ N:
-  Considers ranking
-  Relevant documents at the top positions
3  Compactness for Recall R:
-  Compromise between: Recall and NSS size
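The first two measures can be sketched as follows. This is an illustration rather than the evaluation code used in the thesis: the graded relevance values in `gains`, the averaging over news items in `mean_ndcg_at`, and all function names are assumptions.

```python
import math


def precision_recall_at(ranked, gold, n):
    """Precision and recall of the top-n ranked entities against a gold set."""
    hits = sum(1 for entity in ranked[:n] if entity in gold)
    precision = hits / n if n else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def ndcg_at(ranked, gains, n):
    """NDCG@n; `gains` maps an entity to its graded relevance (absent means 0)."""
    dcg = sum(gains.get(entity, 0.0) / math.log2(rank + 2)
              for rank, entity in enumerate(ranked[:n]))
    ideal_gains = sorted(gains.values(), reverse=True)[:n]
    ideal_dcg = sum(gain / math.log2(rank + 2)
                    for rank, gain in enumerate(ideal_gains))
    return dcg / ideal_dcg if ideal_dcg else 0.0


def mean_ndcg_at(runs, n):
    """Average NDCG@n over (ranked, gains) pairs, assumed to be one per news item."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(ndcg_at(ranked, gains, n) for ranked, gains in runs) / len(runs)


# Toy gold standard with made-up relevance grades.
gains = {"Edward Snowden": 2.0, "Sarah Harrison": 1.0}
print(ndcg_at(["Sarah Harrison", "Edward Snowden"], gains, 10))
```

The logarithmic discount is what makes NDCG sensitive to the ordering of the first positions.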
Generating the NSS: Compactness Example
Part
2.b
Recall: 22/33 = 0.67
Sa = 27
Sb = 33
Sc = 54
Compactness: A > B > C
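One way to read the compactness example: for a fixed target recall (22 of 33 gold entities here), an approach is more compact when it needs a smaller NSS to reach that recall, so A (27 entities) beats B (33) and C (54). The sketch below computes that size for a ranked list. This reading of the measure, and the name `size_for_recall`, are assumptions; the published definition may differ in detail.

```python
def size_for_recall(ranked, gold, target_recall):
    """Smallest prefix of `ranked` whose recall against `gold` reaches the target.

    Returns None when the target is never reached. Assumes `ranked` holds
    no duplicate entities.
    """
    if not gold:
        return 0
    found = 0
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            found += 1
            if found / len(gold) >= target_recall:
                return size
    return None


# Toy rankings (made-up entities): the first needs fewer items for the same recall.
gold = {"a", "b", "c"}
print(size_for_recall(["a", "b", "x", "c"], gold, 2 / 3))
print(size_for_recall(["x", "a", "y", "b"], gold, 2 / 3))
```

Comparing approaches on this size at equal recall favors coverage with a small snapshot, independently of the exact order inside it.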
Generating the NSS: The Approaches
Part
2.b
1  Frequency-Based Ranking [Redondo_SNOW’14]
-  Leverages the bigger sample provided by the expansion
-  Prioritizes representativeness
2  Multidimensional Entity Relevance Ranking [Redondo_ICWE’15]
-  Relevancy of entities is grounded in different dimensions
3  Concentric-Based Approach [Redondo_KCAP’15A]
-  Core / Crust model
-  Alleviates the problem of dealing with many dimensions
Generating the NSS: (1) Frequency-Based
Part
2.b
[Redondo_SNOW’14]
A
Generating the NSS: (2) Multidimensional
Part
2.b
[Redondo_ICWE2015]
Part
2.b
Generating the NSS: (2) Multidimensional
POPULARITY (FPOP)
-  Based on Google Trends
-  w = 2 months
-  µ + 2*σ (2.5%)
EXPERT RULES (FEXP)
Example:
-  [ Location, = 0.43]
-  [ Person, = 0.78]
-  [ Organization, = 0.95 ]
-  [ < 2 , = 0.0 ]
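A rough sketch of the two dimensions on this slide. For popularity, it flags an entity whose search interest at the time of the story exceeds µ + 2σ of its interest over the observation window. For expert rules, it applies the per-type weights given in the example. Several points are assumptions: the interest series is a plain list of numbers (no Google Trends client is used), the rule `< 2` is read as "entities seen fewer than twice get 0", and the default weight for unlisted types is invented for the example.

```python
from statistics import mean, pstdev

# Weights copied from the slide's example; other types fall back to a default.
EXPERT_WEIGHTS = {"Location": 0.43, "Person": 0.78, "Organization": 0.95}


def is_popularity_peak(window_values, value_at_story):
    """True when the value at the story date exceeds mean + 2 * stdev of the window."""
    threshold = mean(window_values) + 2 * pstdev(window_values)
    return value_at_story > threshold


def expert_score(entity_type, frequency, default_weight=1.0):
    """Weight an entity by its type; frequency below 2 is assumed to score 0."""
    if frequency < 2:
        return 0.0
    return EXPERT_WEIGHTS.get(entity_type, default_weight)


# Made-up interest values over the window preceding the story.
history = [10, 12, 11, 9, 10, 11]
print(is_popularity_peak(history, 30), expert_score("Person", 3))
```

How these scores are combined with frequency into a final ranking is not specified on the slide, so no combination formula is shown.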
Experiment 1: Frequency VS Multidimensional
Part
2.b
20 x 4 x 4 = 320 formulas
Experiment 1: Frequency VS Multidimensional
Part
2.b
  News Entity Expansion & Dimensions → Generate NSS
  Frequency-based score: 0.473 MNDCG @ 10
  Best score: 0.698 MNDCG @ 10
•  Collection:
•  CSE (Google + 2W + Schema.org)
•  Ranking:
•  Expert Rules
•  Popularity
Multidimensional Nature of the NSS
Experiment 1: Frequency VS Multidimensional
Part
2.b
(NSS)
FREQ
0
(NSS)
F(Laura Poitras) = 2
F(Glenn Greenwald) = 1
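The frequency function F can be sketched as a count over the documents collected during expansion. The sketch assumes document frequency (an entity counts once per document), which is one plausible reading; counting every mention is another. The documents below are invented so that the counts match the two values on the slide.

```python
from collections import Counter


def rank_by_frequency(documents):
    """Rank entities by the number of documents that mention them.

    `documents` is an iterable of entity-label lists, one list per document.
    """
    counts = Counter()
    for entities in documents:
        counts.update(set(entities))  # count each entity once per document
    return counts.most_common()


# Made-up expansion result: three related documents and their entities.
expanded = [
    ["Edward Snowden", "Laura Poitras"],
    ["Edward Snowden", "Laura Poitras", "Glenn Greenwald"],
    ["Edward Snowden"],
]
print(rank_by_frequency(expanded))
```

A pure frequency ranking favors entities that drive the story and tends to push rarely mentioned but informative entities down the list, which motivates the additional dimensions.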
Experiment 1: Frequency VS Multidimensional
Part
2.b
(Expansion): FREQ + POP + EXP = (NSS)
Experiment 2: Multidimensional ++
Part
2.b
MNDCG @ 10:
1.  Exploit Google relevance (+1.80%)
2.  Promote subtitle entities (+2.50%)
3.  Exploit named entity extractor’s confidence (+0.20%)
4.  Interpret popularity dimension (+1.40%)
5.  Perform clustering before filtering (-0.60%)
- NO SIGNIFICANT IMPROVEMENT -
Experiment 2: Multidimensional ++
Part
2.b
Diagram labels: Original, FREQ, POP, EXP, Tune Function X, Re-Shuffle, (NSS)
Part
2.b
MNDCG:
•  Too focused on success at first positions (decay Function)
•  NSS intends to be flexible, ranking is application-dependent
COMPACTNESS:
•  Prioritizes coverage over ranking while minimizing NSS size
Re-thinking the problem: measures
Part
2.b
Duality in news entity spectrum:
•  Representative entities:
•  Driving the plot of the story
•  Relevant entities
•  Related to former via specific reasons
•  Exploit the entity semantic relations
Suggested by Expert?
Informative?
Unexpected?
Interesting?
Explicative?
Re-thinking the problem: dimensions
Part
2.b
Generating the NSS: (3) Concentric Approach
 Core
•  Representative entities
•  Spottable via frequency
dimensions
•  High degree of
cohesiveness
 Crust
•  Attached to the Core via
semantic relations
•  Agnostic to relevancy
nature:
informativeness,
interestingness, etc.
[Redondo_KCAP2015A]
Part
2.b
Generating the NSS: (3) Core Creation
a) Spot representative entities:
Frequency Dimension
(NSS)
b) Cohesiveness (DBpedia)
Part
2.b
Generating the NSS: (3) Crust Creation
The number of Web documents talking simultaneously about a particular entity e and the Core: SWEB(e, Core)
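The attachment score described here can be illustrated with a local stand-in. The slides compute it over a Web search engine; the sketch below counts co-occurrences in an in-memory list of texts instead, which is a swapped-in simplification. Requiring every Core entity to appear in a document, the case-insensitive substring matching, and the name `s_web` are all assumptions.

```python
def s_web(entity, core, documents):
    """Count documents that mention `entity` together with every Core entity.

    Matching is a case-insensitive substring test, a deliberate simplification.
    """
    entity = entity.lower()
    core = [label.lower() for label in core]
    total = 0
    for text in documents:
        lowered = text.lower()
        if entity in lowered and all(label in lowered for label in core):
            total += 1
    return total


# Made-up documents standing in for Web search results.
documents = [
    "Edward Snowden met Sarah Harrison at Sheremetyevo airport.",
    "Sarah Harrison works for WikiLeaks.",
    "Edward Snowden remained in Moscow.",
]
core = ["Edward Snowden"]
for candidate in ["Sarah Harrison", "Moscow", "WikiLeaks"]:
    print(candidate, s_web(candidate, core, documents))
```

Candidates with a higher count are attached to the Core first, without having to decide whether they are informative, interesting, or unexpected.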
Experiment 3: Multidimensional VS Concentric
Part
2.b
1.  Entity Frequency
○  Core1: Jaro-Winkler > 0.9
○  Core2: Frequency based on Exact String matching
2.  Cohesiveness:
○  Everything is Connected Engine, Skb(e1, e2) > 0.125
Everything is Connected
Engine:
https://github.com/mmlab/eice
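The Core1 setting groups entity labels whose Jaro-Winkler similarity exceeds 0.9 before counting frequencies. A self-contained sketch of that similarity and of a simple grouping pass follows. The greedy grouping against the first label of each group, the lower-casing, and the function names are assumptions made for illustration; the cohesiveness step based on the Everything is Connected Engine is not covered.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def group_surface_forms(labels, threshold=0.9):
    """Greedily group labels whose similarity to a group's first label exceeds the threshold."""
    groups = []
    for label in labels:
        for group in groups:
            if jaro_winkler(label.lower(), group[0].lower()) > threshold:
                group.append(label)
                break
        else:
            groups.append([label])
    return groups


print(group_surface_forms(["Edward Snowden", "Edward Snowdon", "Moscow"]))
```

The frequency of an entity is then the size of its group, so spelling variants reinforce each other instead of being counted separately as in the exact string matching of Core2.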
Concentric Core:
Experiment 3: Multidimensional VS Concentric
Part
2.b
1.  Candidates for CRUST generation:
○  Ex1: 1° ICWE2015 by R*(50): L2+Google, F3 1W, Gauss+ POP
○  Ex2: 2° ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP
2.  Function for attaching entities to CORE:
○  SWEB(ei, Core) over Google CSE, default configuration
Concentric Crust:
Experiment 3: Multidimensional VS Concentric
Part
2.b
Combining CORE and CRUST:
Core+Crust vs. CrustOnly
Experiment 3: Multidimensional VS Concentric
Part
2.b
36.9% more compact than Multidimensional
(NSS’s size decrease)
IdealGT: size of NSS according to Gold Standard
(2*2*2 + 2) Runs
Experiment 3: Multidimensional VS Concentric
Part
2.b
NSS
Gold
Standard
Fukushima Disaster 2013
n=22
Multidimensional
Concentric
Part
2.b
Experiment 3: Multidimensional VS Concentric
Part
2.b
NSS: Suitable model for news applications ?
Consuming
the Concentric
NSS
2.c
  News consumption phases
  The NSS for feeding news prototypes
Part
2.c
NSS Consumption: News Prototypes
… short summaries, previews, hotspots …
… advanced graphs and diagrams, timelines, in-depth summaries …
… second screen apps, slideshows, info-boxes …
Part
2.c
NSS Consumption: Consumption Phases
The Before The During The After
Part
2.c
NSS Consumption: Phases VS Layers
[Redondo_KCAP’15B]
Conclusions
& Future Work
  Publications
  References
Conclusions
Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web?
a.  Applied NER and NED as semantic annotation techniques in the multimedia domain
b.  Developed other techniques such as Named Entity Expansion or Visual Concept Mapping
c.  LinkedTV model to harmonize annotations into the Linked Data Web
Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content?
a.  Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…
b.  Evaluation of multimodal approaches via Mediaeval 2013/2014
Conclusions
Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
a.  Proposed the NSS model and a Gold Standard
b.  The multidimensional nature of the entity relevance
•  Gaussian function, popularity, expert rules…
c.  Concentric model better reproduces the NSS:
•  Better Compactness: 36.9% over BAS01 (similar recall, smaller size)
•  Core/Crust brings up relevant entities without having to deal with fuzzy dimensions
d.  NSS better supports the news consumption phases (Before, During, After)
Future Work
•  [S] Publish generated NSS on the Web (Linked Data)
•  [S] Extend the Gold Standard:
•  From 5 to 23 videos, concentric based model for candidate selection
•  Submission to TOIS
•  [S] Not depending on “big players” for retrieving
knowledge during the expansion phase
(Terrier VS Google experiments)
Future Work
•  [M] Using the power of crowdsourcing in Gold Standard
creation
•  Increase size of the Gold Standard without involving
experts
•  Consider different levels of entity relevancy
•  [M] Supervised techniques: Learn to Rank
•  Features in entities: surface forms, URL’s, types…
•  Features in documents, sources, and other provenance
information
Future Work
•  [L] Spot not only the strength of the relationships
between Crust and the Core, but also the predicates
Editor in WikiLeaks
Generating
Explanations
analyzing documents
considered in Sweb
Future Work
•  [L] Not having to rely on “Big Players” during Crust
generation:
•  Continuous indexing
•  Better curated white lists
•  Fresher structured databases: DBpedia events
•  [L] Reusing concentric model in context-related tasks:
•  Named Entity Extraction/Disambiguation
-  As another feature similar to BagOfWords, Word2vec…
•  Exploratory Searches
-  Diversity, serendipity…
[Steiner_ICWE’15]
José Luis Redondo García
http://jluisred.github.io
@peputo
http://github.com/jluisred
“my small dent in
the vast ocean of
knowledge…”
Ph.D.
questions?
Publications
Journals
•  Redondo Garcia J. L and Adolfo Lozano-Tello: OntoTV: an Ontology Based System for the
Management of Information about Television Content. International Journal of Semantic
Computing, 6(01), 111-130, 2012.
Conferences
•  Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling
a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15),
Palisades, NY, USA.
•  Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News
Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th
International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA.
Best Paper Award
•  Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating
Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference
on Web Engineering (ICWE'15), Rotterdam, the Netherlands.
•  Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L. and Van de Walle R.
(2012), What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social
Networks. In (ACM Multimedia) International Workshop on Socially-Aware Multimedia
(SAM'12), Nara, Japan
Journals (2), Conferences (6), Workshops(5), Demo/Poster(7)
References
[Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways
[Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots
[Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion
[Redondo_ISWC’14] Finding and sharing hot spots in Web Videos
[Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion
[Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show
[LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations
[Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts
[Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task
[Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web
[Li_LIMe'13] Enriching Media Fragments with Named Entities for Video Classification
[Milicic_WWW'13] Live Topic Generation from Event Streams
[Milicic_ESWC’13] Tracking and Analyzing The 2013 Italian Election
[Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task
[Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks
2016/03/04

Weitere ähnliche Inhalte

Ähnlich wie Semantically Capturing and Representing News Stories on the Web

A Semantic Multimedia Web (Part 3)
A Semantic Multimedia Web (Part 3)A Semantic Multimedia Web (Part 3)
A Semantic Multimedia Web (Part 3)Raphael Troncy
 
T3camp mallorca semantic_web
T3camp mallorca semantic_webT3camp mallorca semantic_web
T3camp mallorca semantic_webAndré Wuttig
 
Video Ecosystem and some ideas about video big data
Video Ecosystem and some ideas about video big dataVideo Ecosystem and some ideas about video big data
Video Ecosystem and some ideas about video big dataTrieu Nguyen
 
Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...
Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...
Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...LinkedTV
 
IRJET- Multimedia Summarization and Retrieval of News Broadcast
Semantically Capturing and Representing News Stories on the Web

  • 1. Semantically Capturing and Representing News Stories on the Web. José Luis Redondo García, Jluisred.github.io, @peputo
  • 2. Outline. Part 1, Towards a Semantic Multimedia Web: (i) media annotation, (ii) a multimedia model, (iii) semantic media exploitation. Part 2, Contextualizing News Stories (semantic annotation of the news context): (i) the News Semantic Snapshot (NSS), (ii) the multidimensional nature of entity relevance, (iii) a concentric model for NSS generation, (iv) NSS in the consumption of news. Timeline labels: Previous, PhD, Future Career. Original artwork by Matt Might, http://matt.might.net/articles/phd-school-in-pictures/
  • 3. Outline. Part II, Semantic Annotation of News’ Context: Multidimensional Relevancy, NSS Generation, Concentric Model, NSS Gold Standard, News Prototypes. 2016/03/04
  • 4. The Use Case: Contextualizing News. Video fragment addressed with a Media Fragment URI 1.0: http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8. Entities shown: Edward Snowden (NE over subtitles); Sarah Harrison (WikiLeaks editor); Sheremetyevo (airport in Moscow).
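The `#t=34.1,39.8` suffix in the URI above is a temporal media fragment. As a minimal sketch, assuming only the plain-seconds form of the W3C syntax (the spec also allows `hh:mm:ss` clock values, which are not handled here), the start and end offsets can be read like this. The function name `parse_temporal_fragment` is hypothetical and is not part of any tool cited in the slides.

```python
from urllib.parse import urlsplit, parse_qs


def parse_temporal_fragment(uri):
    """Return (start, end) in seconds from a '#t=start,end' fragment, or None.

    Sketch only: supports plain seconds, with an optional 'npt:' prefix.
    """
    fragment = urlsplit(uri).fragment
    params = parse_qs(fragment)  # the fragment uses name=value pairs
    if "t" not in params:
        return None
    value = params["t"][0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0  # missing start means 0
    end = float(end_text) if end_text else None       # missing end means "until the end"
    return start, end


if __name__ == "__main__":
    print(parse_temporal_fragment(
        "http://www.bbc.com/news/world-europe-23339199#t=34.1,39.8"))
```

Keeping the fragment in the URI itself is what lets an annotation point at a few seconds of video rather than at the whole resource.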
  • 5. The Use Case: Contextualizing News
  • 6. Research Questions. Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web? Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content? Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines?
  • 7. Part 1: Towards a Semantic Multimedia Web (Q1, Q2)
  • 8. Bringing Multimedia to the Web: why? Make video a first-class citizen of the Web; make video universally accessible and shareable at different granularities (segments); benefit from the vast knowledge already present on the Web.
  • 9. State of the Art & Related Work (Part 1). Topic labels: Named Entity, Multimodal, Expansion.
      Semantic Annotation:
      - Alfonseca, E. and Manandhar. An unsupervised method for General Named Entity Recognition and Automated Concept Discovery
      - Mendes, P., Jakob, M., Garcia-Silva, A. and Bizer, C. DBpedia Spotlight: shedding light on the web of documents
      - Shinyama, Y. and Sekine, S. Named entity discovery using comparable news articles
      - Chang, S-F., Manmatha, R. and Chua, T-S. Combining text and audio-visual features in video indexing
      - Wang, Richard C. and Cohen, William W. Iterative Set Expansion of Named Entities Using the Web
      - Talukdar, P-P., Brants, T., Liberman, M. and Pereira, F. A Context Pattern Induction Method for Named Entity Extraction
      Multimedia Modeling:
      - MPEG-7: http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
      - TV-Anytime: http://tech.ebu.ch/tvanytime
      - Synchronized Multimedia Integration Language: https://www.w3.org/TR/REC-smil/
      - Media Fragment URI 1.0 specification (W3C): http://www.w3.org/TR/media-frags (Synote: http://linkeddata.synote.org; Ninsuna: http://ninsuna.elis.ugent.be/)
      - BBC Programmes Ontology: http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
      - Schema.org (SchemaDotOrgTV): http://www.w3.org/wiki/WebSchemas/
      - Ontology for Media Resources: https://www.w3.org/TR/mediaont-10/
      - Web Annotation: https://www.w3.org/TR/annotation-model/
  • 10. Multimedia Annotations (1.a). Automatic annotation: 300 hours/min of YouTube video. What is inside the video? A multimodal approach. Semantic annotations leveraging Web resources: more human-like operations.
  • 11. Multimedia Annotation: Named Entity Recognition (Part 1.a). NERD: (1) ontology http://nerd.eurecom.fr/ontology, (2) API http://nerd.eurecom.fr/api/application.wadl, (3) UI http://nerd.eurecom.fr; ML: https://github.com/giusepperizzo/nerdml [Rizzo_LREC’14]. Example annotations on http://data.linkedtv.eu/media/e2899e7f#t=840,900: nerd:Product S-Bahn, nerd:Person Obama, nerd:Person Michelle, nerd:Location Berlin.
  • 12. Multimedia Annotation: Named Entity Expansion (Part 1.a) [Redondo_SNOW’14]. Diagram: (a) entities from the seed document DS; (b) expanded entities, taken from other documents similar to DS.
  • 13. Multimedia Annotation: Expansion Pipeline (Part 1.a) [Redondo_SNOW’14]. Available at http://linkedtv.eurecom.fr/entitycontext/api/
  • 14. Multimedia Annotation: Multimodal Approach (Part 1.a). Text: keyword extraction; topic recognition; from textual visual cues to LSCOM concepts. Visual: visual concept detection (LSCOM); shot segmentation; scene segmentation; optical character recognition (OCR); automatic speech recognition (ASR); face detection and tracking; … Diagram label: Multimedia Knowledge Model.
  • 15. Multimedia Model (1.b). Explicitly represent video and its annotations, at the level of fragments, based on well-known vocabularies; flexible and extensible while being Linked Data compliant.
  • 16. Multimedia Model: LinkedTV Model (Part 1.b). Available at http://data.linkedtv.eu/ontologies/core/. Diagram labels: BROADCAST DATA (BBC Ontology + SchemaDotOrgTV); ANALYSIS RESULTS, with support for segmentation (Media Fragments URI 1.0 (W3C), LSCOM, Ontology for Media Resources (W3C)); Web Annotations (W3C); EXTERNAL DATASETS (NERD); Provenance (Ontology for Provenance Management). Classes shown: Annotation, Concept, Keyword, Entity, Programme, Brand, Series, Episode, Version, Broadcast, Service, Channel, Scene, Shot, MediaFragment, Face.
  • 17. Multimedia Model: LinkedTV Model (Part 1.b). Diagram nodes: Locator, MediaResource, MediaFragment, Annotation, Entity, URL (hyperlink), Type, OffsetBasedString.
  • 18. Multimedia Model: TV2RDF Service (Part 1.b). Available at http://linkedtv.eurecom.fr/tv2rdf/. Diagram labels: Content Publisher; Analysis; Metadata; TV2RDF (RDF conversion + NERD); RDF; Triplestore.
  • 19. Exploiting Knowledge (1.c). Leverage the model and annotations for advanced mining tasks; probe the value of the multimodal approach through evaluation on standard corpora.
  • 20. Exploitation: Enriching (Part 1.c) [Milicic_WWW'13]. Illustrate the seed video. Diagram labels: oa:Annotation; rbbaktuell_20120809; nerd:Location Berlin.
  • 21. Exploitation: Enriching Services & Prototypes (Part 1.c). Name, URL, published at:
      - MediaCollector, http://linkedtv.eurecom.fr/api/mediacollector/search/, [Rizzo_SAM’12]
      - MediaFinder, http://mediafinder.eurecom.fr/, [Milicic_WWW’13]
      - Italian Elections 2013, http://mediafinder.eurecom.fr/story/elezioni2013, [Milicic_ESWC’13]
      - TVEnricher, http://linkedtv.eurecom.fr/tvenricher/api/, [LinkedTV_D2.6’14]
      - TVNewsEnricher, http://linkedtv.eurecom.fr/newsenricher/api/, [Redondo_ESWC’14]
  • 22. Exploitation: Classifying videos (Part 1.c) [Li_LIME'13]. Dailymotion dataset, 805 videos, 46.58% accuracy. [Figure: temporal distribution of entity types, one histogram per channel (fun, tech, sport, news, creation, lifestyle, shortfilms, music, other); x-axis: the temporal positions of NEs; y-axis: the number of NEs; entity types: Thing, Amount, Animal, Event, Function, Loc, Organization, Person, Product, Time.]
  • 23. Exploitation: Promoting Media Fragments (Part 1.c) [Redondo_ISWC’14]. Available at http://linkedtv.eurecom.fr/HyperTED
  • 24. Evaluation: Multimodal @ MediaEval 2013 (Part 1.c). Dataset: ~1697 h of BBC video data, 2323 videos; different TV shows (news, sports, politics…) from 2012; subtitles and ASR (English); output of some visual algorithms: shot and face detection. Tasks shown: Search (a text/visual query, T/V, returns ranked videos v1, v2, v3, …, vn) and Hyperlinking (an anchor va returns ranked videos v1, v2, v3, …, vn).
  • 25. Evaluation: Multimodal @ MediaEval 2013 (Part 1.c). Annotations, processing time, type:
      - Visual Concept Detection (151): 20 days on 100 cores (visual)
      - ** Scene Segmentation: 2 days on 6 cores (visual)
      - OCR: 1 day on 10 cores (visual)
      - Keywords Extraction: 5 hours (textual)
      - ** Named Entities Extraction: 4 days (textual)
      - Face Detection and Tracking: 4 days on 160 cores (visual)
      Approach:
      - Data indexing: Lucene & Solr; granularities: shots, scenes, sliding windows…; multimodality
      - Query formulation, Search: text + visual cues + visual concept mapping, MLSCOM
      - Query formulation, Hyperlink: subtitles, keywords, LSCOM concepts (MoreLikeThis)
  • 26. Evaluation: MediaEval 2013 Results (Part 1.c) [Sahuguet_MediaEval’13]. Search task: 0.19 MRR (mean reciprocal rank). Hyperlinking task: 0.72 P10.
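For readers unfamiliar with the two figures on this slide, here is a minimal sketch of the textbook definitions of mean reciprocal rank (MRR) and precision at 10 (P10). It is an illustration only: the benchmark's own scoring scripts may apply extra rules (for example, time-window matching of segments) that are not modelled here, and all function names are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


if __name__ == "__main__":
    # Toy queries with made-up identifiers, not MediaEval data.
    runs = [
        (["v3", "v1", "v7"], {"v1"}),  # first hit at rank 2
        (["v2", "v9"], {"v2"}),        # first hit at rank 1
        (["v4", "v5"], {"v8"}),        # no hit
    ]
    print(mean_reciprocal_rank(runs))
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4))
```

MRR rewards putting one correct result first, which suits a known-item search task; P10 rewards filling the first page with relevant links, which suits hyperlinking.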
  • 27. Evaluation: MediaEval 2014 Results (Part 1.c) [Hoang_MediaEval’14]. Changes in the 2014 edition: new dataset from the BBC (2686 hours, 3520 videos); no visual cues in search queries; new approach with a 22% MAP improvement on the 2013 dataset. Scores shown for the Search and Hyperlinking tasks: 0.71 P10, 0.67 P10.
  • 28. Narrowing down… From Multimedia Content to News Items
  • 29. Part 2: Semantically Contextualizing News Stories (Q3)
  • 30. The Use Case: Contextualizing News (Part 2). Wolfgang Schäuble (Finance Minister); Christian Democratic Union (ruling party in Germany).
  • 31. State of the Art & Related Work (Part 2): Semantic News Annotation. Topic labels: Graph, Named Entities in News, Contextualizing News, Relevancy of Entities.
      - N. Fernandez, J. A. Fisteus, L. Sanchez, and G. Lopez. Identityrank: Named entity disambiguation in the news domain.
      - S. Chabra. Entity-centric summarization: Generating text summaries for graph snippets.
      - A. Fuxman, P. Pantel, Y. Lv, A. Chandra, P. Chilakamarri, M. Gamon, D. Hamilton, B. Kohlmeier, D. Narayanan, E. Papalexakis, and B. Zhao. Contextual insights.
      - N. Kanhabua, R. Blanco, and M. Matthews. Ranking related news predictions.
      - N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Back to the past: Supporting interpretations of forgotten stories by time-aware re-contextualization.
      - N. K. Tran, A. Ceroni, N. Kanhabua, and C. Niederee. Time-travel translator: Automatically contextualizing news articles.
      - T. Stajner, B. Thomee, A.-M. Popescu, M. Pennacchiotti, and A. Jaimes. Automatic selection of social media responses to news.
  • 32. Semantic Snapshot of News (NSS) (2.a). Definition and motivation; a gold standard of news entities.
  • 33. The News Semantic Snapshot (NSS) (Part 2.a). What is on top: entities explicitly appearing in the documents. Going deep down… is always challenging. Example entities: Laura Poitras, Anatoly Kucherena, Edward Snowden.
  • 34. The News Semantic Snapshot (NSS) (Part 2.a) [Redondo_ICWE’15]
  • 35. The News Semantic Snapshot: Gold Standard (Part 2.a) [Romero_TVX’14]. High level of detail, significant human intervention (experts in the news domain + users). Entities in 5 dimensions (visual & text): (1) video subtitles, (2) image in the video, (3) text in the video image, (4) suggestions of an expert, (5) related articles; followed by a user survey. Example subtitle: “We don't have any extradition treaty with Russia. Broadly speaking our policy remains the same: that we'd like him returned”
  • 36. The News Semantic Snapshot: Gold Standard (Part 2.a). Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation (figure label: 25)
  • 37. Automatically Generating the NSS (2.b). The selection problem; approaches: frequency-based, multidimensional, concentric; experiments and results.
  • 38. Generating the NSS: General Method (Part 2.b) [Redondo_SNOW’14]. Diagram: (a) entities from the seed document DS; (b) expanded entities; (c) News Semantic Snapshot.
  • 39. Generating the NSS: Entity Expansion (Part 2.b) [Redondo_SNOW’14]. Diagram: (a) entities from the seed document DS; (b) expanded entities; (c) News Semantic Snapshot.
  • 40. Generating the NSS: Expansion’s Settings (Part 2.b) [Redondo_ICWE’15]. Parameters:
      - Query: title; 5 W’s over subtitle entities
      - Web sites to be crawled: Google; L1, a set of 10 international English-speaking newspapers; L2, a set of 3 international newspapers used in the GS
      - Temporal window: 1W; 2W
      - Annotation filtering: Schema.org
  • 41. Generating the NSS: Expansion’s Settings (Part 2.b) [Redondo_SNOW’14]. Recall (NER on subtitles) = 0.42; recall (entity expansion) = 0.91. Diagram: (a) entities from DS; (b) expanded entities; (c) News Semantic Snapshot.
  • 42. Generating the NSS: Selection (Part 2.b) [Redondo_SNOW’14]. Diagram: (a) entities from DS; (b) expanded entities; (c) News Semantic Snapshot.
  • 43. Generating the NSS: The Selection problem. [Figure: entities from the Expansion ranked from 0 to N; FIdeal(ei) yields the NSS, FX(ei) = ?]
  • 44. Generating the NSS: Measures. 1. Precision / Recall @ N: popular; easy to interpret. 2. Mean Normalized Discounted Cumulative Gain (MNDCG) @ N: considers ranking; rewards relevant documents at the top positions. 3. Compactness for Recall R: compromise between recall and NSS size.
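The first two measures can be sketched as follows. This is a minimal illustration, not the evaluation code used in the thesis: the function names, the binary gains, and the reading of "Mean" as an average over several videos are assumptions.

```python
import math

def precision_recall_at_n(ranked, relevant, n):
    """Precision and recall of the first n ranked entities against a gold set."""
    hits = sum(1 for entity in ranked[:n] if entity in relevant)
    precision = hits / n if n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def ndcg_at_n(ranked, gains, n):
    """NDCG@n; `gains` maps each gold entity to its relevance gain."""
    # Position i (0-based) has rank i + 1 and discount log2(rank + 1).
    dcg = sum(gains.get(e, 0.0) / math.log2(i + 2)
              for i, e in enumerate(ranked[:n]))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

def mean_ndcg_at_n(runs, n):
    """Average NDCG@n over (ranked, gains) pairs, one pair per video (assumed)."""
    return sum(ndcg_at_n(ranked, gains, n) for ranked, gains in runs) / len(runs)

# Hypothetical ranking and gold entities, for illustration only.
ranked = ["Edward Snowden", "Moscow", "Sarah Harrison", "WikiLeaks"]
gold = {"Edward Snowden": 1.0, "Sarah Harrison": 1.0, "WikiLeaks": 1.0}
print(precision_recall_at_n(ranked, set(gold), 2))
print(ndcg_at_n(ranked, gold, 4))
```

Because the discount shrinks with rank, NDCG rewards a ranking mostly for what it places in the first few positions.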
  • 45. Generating the NSS: Compactness Example. Three snapshots A, B and C reach the same recall, 22/33 ≈ 0.67, with sizes Sa = 27, Sb = 33 and Sc = 54. Compactness: A > B > C.
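One way to read the compactness example is as the smallest snapshot size needed to reach a target recall. The sketch below follows that reading; the function name and the prefix-based definition are assumptions, not the formula from the cited papers.

```python
def size_at_recall(ranked, gold, target_recall):
    """Smallest prefix of `ranked` whose recall against `gold` reaches the target.

    Returns None when the ranking never reaches the target recall.
    """
    if not gold:
        return None
    found = set()
    for size, entity in enumerate(ranked, start=1):
        if entity in gold:
            found.add(entity)
        if len(found) / len(gold) >= target_recall:
            return size
    return None

# Hypothetical rankings: both reach recall 2/3, the first with fewer entities.
gold = {"a", "b", "c"}
print(size_at_recall(["a", "x", "b", "c"], gold, 2 / 3))
print(size_at_recall(["x", "y", "a", "z", "b"], gold, 2 / 3))
```

Under this reading, the ranking with the smaller returned size is the more compact one at that recall level, as with A (27) against B (33) and C (54) in the example.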
  • 46. Generating the NSS: The Approaches. 1. Frequency-Based Ranking [Redondo_SNOW’14]: leverages the larger sample provided by the expansion; prioritizes representativeness. 2. Multidimensional Entity Relevance Ranking [Redondo_ICWE’15]: the relevancy of entities is grounded in different dimensions. 3. Concentric-Based Approach [Redondo_KCAP’15A]: Core / Crust model; alleviates the problem of dealing with many dimensions.
  • 47. Generating the NSS: (1) Frequency-Based [Redondo_SNOW’14]
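A frequency-based ranking can be sketched as a count of how many expansion documents mention each entity. The choice of document frequency (rather than raw mention counts), the function name, and the sample data are assumptions made for illustration.

```python
from collections import Counter

def frequency_rank(documents):
    """Rank entities by the number of documents that mention them.

    `documents` is a list of entity lists, one per retrieved document.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for entities in documents:
        counts.update(set(entities))  # count each entity once per document
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical entity annotations of three related articles.
docs = [
    ["Edward Snowden", "Moscow"],
    ["Edward Snowden", "Laura Poitras"],
    ["Laura Poitras", "Edward Snowden", "Glenn Greenwald"],
]
for entity, freq in frequency_rank(docs):
    print(entity, freq)
```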
  • 48. Generating the NSS: (2) Multidimensional [Redondo_ICWE’15]
  • 49. Generating the NSS: (2) Multidimensional. POPULARITY (FPOP): based on Google Trends; w = 2 months; µ + 2*σ (2.5%). EXPERT RULES (FEXP), example: [ Location, = 0.43 ]; [ Person, = 0.78 ]; [ Organization, = 0.95 ]; [ < 2 , = 0.0 ]
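The popularity rule can be read as a peak test: an entity is considered popular when its search interest exceeds the mean of the window by more than two standard deviations. The sketch below assumes that reading and works on an already-downloaded series of interest values; it does not call Google Trends, and the function name and data are hypothetical.

```python
from statistics import mean, pstdev

def is_popularity_peak(series, index, k=2.0):
    """True when series[index] lies above mean + k * standard deviation.

    `series` holds interest values over the time window (two months on the
    slide); k = 2 corresponds to the µ + 2σ threshold.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return series[index] > mu + k * sigma

# Hypothetical interest values: flat, with a spike on the last day.
interest = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100]
print(is_popularity_peak(interest, 9))
print(is_popularity_peak(interest, 0))
```

For a roughly normal series, values above µ + 2σ fall in about the top 2.5% of the distribution, which matches the percentage quoted on the slide.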
  • 50. Experiment 1: Frequency VS Multidimensional. 20 x 4 x 4 = 320 formulas
  • 51. Experiment 1: Frequency VS Multidimensional. News Entity Expansion & Dimensions → Generate NSS. Frequency-based score: 0.473 MNDCG @ 10. Best score: 0.698 MNDCG @ 10, obtained with Collection: CSE (Google + 2W + Schema.org) and Ranking: Expert Rules, Popularity. Multidimensional nature of the NSS.
  • 52. Experiment 1: Frequency VS Multidimensional. [Figure: ranking by FREQ compared with the NSS; F(Laura Poitras) = 2, F(Glenn Greenwald) = 1]
  • 53. Experiment 1: Frequency VS Multidimensional. [Figure: over the Expansion, FREQ + POP + EXP = NSS]
  • 54. Experiment 2: Multidimensional ++. Effect on MNDCG @ 10: 1. Exploit Google relevance (+1.80%); 2. Promote subtitle entities (+2.50%); 3. Exploit named entity extractor’s confidence (+0.20%); 4. Interpret popularity dimension (+1.40%); 5. Perform clustering before filtering (-0.60%). No significant improvement.
  • 55. Experiment 2: Multidimensional ++. [Figure labels: Tune Function; FREQ, POP, EXP; Original; Re-Shuffle; NSS]
  • 56. Re-thinking the problem: measures. MNDCG: too focused on success at the first positions (decay function); the NSS intends to be flexible, and ranking is application-dependent. COMPACTNESS: prioritizes coverage over ranking while minimizing NSS size.
  • 57. Re-thinking the problem: dimensions. Duality in the news entity spectrum: Representative entities: driving the plot of the story. Relevant entities: related to the former via specific reasons (Suggested by Expert? Informative? Unexpected? Interesting? Explicative?). Exploit the entity semantic relations.
  • 58. Generating the NSS: (3) Concentric Approach [Redondo_KCAP’15A]. Core: representative entities; spottable via frequency dimensions; high degree of cohesiveness. Crust: attached to the Core via semantic relations; agnostic to relevancy nature: informativeness, interestingness, etc.
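The Core / Crust split can be sketched as two passes over the candidate entities. This is a simplified illustration under stated assumptions: the thresholds and names are hypothetical, `relatedness` stands in for whatever semantic relation score is used, and the cohesiveness check on the Core is omitted.

```python
def build_nss(frequencies, candidates, relatedness, core_min_freq, crust_min_score):
    """Split entities into a Core and a Crust.

    Core: entities whose frequency reaches `core_min_freq`.
    Crust: remaining candidates whose best relatedness to any Core entity
    reaches `crust_min_score`. `relatedness(entity, core_entity)` is a
    caller-supplied scoring function (hypothetical stand-in).
    """
    core = [e for e, f in frequencies.items() if f >= core_min_freq]
    core_set = set(core)
    crust = []
    for entity in candidates:
        if entity in core_set:
            continue
        score = max((relatedness(entity, c) for c in core), default=0.0)
        if score >= crust_min_score:
            crust.append(entity)
    return core, crust

# Hypothetical frequencies and relation scores, for illustration only.
freqs = {"Edward Snowden": 9, "Moscow": 7, "Sarah Harrison": 2}
scores = {("Sarah Harrison", "Edward Snowden"): 0.8,
          ("WikiLeaks", "Edward Snowden"): 0.6}
core, crust = build_nss(
    freqs,
    ["Sarah Harrison", "WikiLeaks", "Fukushima", "Moscow"],
    lambda e, c: scores.get((e, c), 0.0),
    core_min_freq=5,
    crust_min_score=0.5,
)
print(core, crust)
```

The point of the design is that Crust entities never need their own relevance dimension: they are admitted only by their tie to the Core.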
  • 59. Generating the NSS: (3) Core Creation. a) Spot representative entities: frequency dimension. b) Cohesiveness (DBpedia).
  • 60. Generating the NSS: (3) Crust Creation. The number of Web documents talking simultaneously about a particular entity e and the Core.
  • 61. Experiment 3: Multidimensional VS Concentric. Concentric Core: 1. Entity frequency: Core1: Jaro-Winkler > 0.9; Core2: frequency based on exact string matching. 2. Cohesiveness: Everything is Connected Engine, Skb(e1, e2) > 0.125. Everything is Connected Engine: https://github.com/mmlab/eice
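The Core1 setting can be illustrated by grouping surface forms whose Jaro-Winkler similarity exceeds 0.9 before counting them. The similarity below is the standard Jaro-Winkler definition; the greedy grouping, the lowercasing, and the function names are assumptions, and the cohesiveness step is not covered.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity boosted by a common prefix of up to four characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def fuzzy_frequency(mentions, threshold=0.9):
    """Greedily merge similar surface forms and count each group."""
    clusters = []  # each item is [representative, count]
    for mention in mentions:
        for cluster in clusters:
            if jaro_winkler(mention.lower(), cluster[0].lower()) > threshold:
                cluster[1] += 1
                break
        else:
            clusters.append([mention, 1])
    return sorted(((rep, n) for rep, n in clusters), key=lambda kv: -kv[1])

# Hypothetical mentions, including one misspelled surface form.
print(fuzzy_frequency(["Edward Snowden", "Edward Snowdon", "Moscow", "Edward Snowden"]))
```

Compared with exact string matching (Core2), this merges spelling variants of the same entity, so their counts reinforce each other instead of being split.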
  • 62. Experiment 3: Multidimensional VS Concentric. Concentric Crust: 1. Candidates for Crust generation: Ex1: 1st in ICWE 2015 by R*(50): L2+Google, F3 1W, Gauss + POP; Ex2: 2nd in ICWE 2015 by R*(50): L2+Google, F3 1W, Freq + POP. 2. Function for attaching entities to the Core: SWEB(ei, Core) over Google CSE, default configuration.
  • 63. Experiment 3: Multidimensional VS Concentric. Combining Core and Crust: Core + Crust; Crust only.
  • 64. Experiment 3: Multidimensional VS Concentric. 36.9% more compact than Multidimensional (decrease in NSS size). IdealGT: size of the NSS according to the Gold Standard. (2*2*2 + 2) runs.
  • 65. Experiment 3: Multidimensional VS Concentric. NSS Gold Standard: Fukushima Disaster 2013 (n = 22)
  • 66. Experiment 3: Multidimensional VS Concentric. [Figure panels: Multidimensional; Concentric]
  • 67. NSS: Suitable model for news applications?
  • 68. Consuming the Concentric NSS (Part 2.c): News consumption phases; The NSS for feeding news prototypes
  • 69. NSS Consumption: News Prototypes. … short summaries, previews, hotspots …; … advanced graphs and diagrams, timelines, in-depth summaries …; … second screen apps, slideshows, info-boxes …
  • 70. NSS Consumption: Consumption Phases. The Before; The During; The After
  • 71. NSS Consumption: Phases VS Layers [Redondo_KCAP’15B]
  • 72. Conclusions & Future Work: Publications; References
  • 73. Conclusions. Q1: How can multimedia content be semantically annotated and seamlessly connected with other resources on the Web? a. Applied NER and NED as semantic annotation techniques in the multimedia domain; b. Developed other techniques such as Named Entity Expansion or Visual Concept Mapping; c. LinkedTV model to harmonize annotations into the Linked Data Web. Q2: Can those semantic annotations and linked media resources bring value for the exploitation and consumption of multimedia content? a. Exploiting multimedia semantic techniques: enriching, highlighting media fragments (hotspots), classifying videos…; b. Evaluation of multimodal approaches via MediaEval 2013/2014.
  • 74. Conclusions. Q3: Is it possible to automatically contextualize news stories with background information so they can be effectively interpreted by humans and machines? a. Proposed the NSS model and a Gold Standard; b. The multidimensional nature of the entity relevance: Gaussian function, popularity, expert rules…; c. The concentric model better reproduces the NSS: better compactness, 36.9% over BAS01 (similar recall, smaller size); Core/Crust brings up relevant entities without having to deal with fuzzy dimensions; d. The NSS better supports the news consumption phases (Before, During, After).
  • 75. Future Work. [S] Publish the generated NSS on the Web (Linked Data). [S] Extend the Gold Standard: from 5 to 23 videos, concentric-based model for candidate selection; submission to TOIS. [S] Not depending on “big players” for retrieving knowledge during the expansion phase (Terrier VS Google experiments).
  • 76. Future Work. [M] Using the power of crowdsourcing in Gold Standard creation: increase the size of the Gold Standard without involving experts; consider different levels of entity relevancy. [M] Supervised techniques, Learning to Rank: features in entities (surface forms, URLs, types…); features in documents, sources, and other provenance information.
  • 77. Future Work. [L] Spot not only the strength of the relationships between the Crust and the Core, but also the predicates (example: Editor in WikiLeaks). Generating explanations by analyzing the documents considered in Sweb.
  • 78. Future Work. [L] Not having to rely on “Big Players” during Crust generation: continuous indexing; better curated white lists; fresher structured databases: DBpedia events. [L] Reusing the concentric model in context-related tasks: Named Entity Extraction/Disambiguation (as another feature similar to BagOfWords, Word2vec…); Exploratory Searches (diversity, serendipity…) [Steiner_ICWE’15]
  • 79. José Luis Redondo García http://jluisred.github.io @peputo http://github.com/jluisred “my small dent in the vast ocean of knowledge…” Ph.D. questions?
  • 80. Publications. Journals: Redondo Garcia J. L., Lozano-Tello A. (2012) OntoTV: an Ontology Based System for the Management of Information about Television Content. International Journal of Semantic Computing, 6(01), 111-130. Conferences: Redondo Garcia J. L., Rizzo G., Troncy R. (2015) Capturing News Stories Once, Retelling a Thousand Ways. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA. Redondo Garcia J. L., Rizzo G., Troncy R. (2015) The Concentric Nature of News Semantic Snapshots: Knowledge Extraction for Semantic Annotation of News Items. In: 8th International Conference on Knowledge Capture (K-CAP'15), Palisades, NY, USA. Best Paper Award. Redondo Garcia J. L., Rizzo G., Romero L. P., Hildebrand M., Troncy R. (2015) Generating Semantic Snapshots of Newscasts using Entity Expansion. In: 15th International Conference on Web Engineering (ICWE'15), Rotterdam, the Netherlands. Rizzo G., Steiner T., Troncy R., Verborgh R., Redondo Garcia J. L., Van de Walle R. (2012) What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks. In: (ACM Multimedia) International Workshop on Socially-Aware Multimedia (SAM'12), Nara, Japan. Totals: Journals (2), Conferences (6), Workshops (5), Demo/Poster (7).
  • 81. References. [Redondo_KCAP’15B] Capturing News Stories Once, Retelling a Thousand Ways; [Redondo_KCAP’15A] The Concentric Nature of News Semantic Snapshots; [Redondo_ICWE’15] Generating Semantic Snapshots of Newscasts using Entity Expansion; [Redondo_ISWC’14] Finding and sharing hot spots in Web Videos; [Redondo_ESWC’14] Augmenting TV Newscasts via Entity Expansion; [Redondo_SNOW’14] Describing and Contextualizing Events in TV News Show; [LinkedTV_D2.6’14] LinkedTV Framework for Generating Video Enrichments with Annotations; [Romero_TVX’14] LinkedTV News: A dual mode second screen companion for web-enriched news broadcasts; [Hoang_MediaEval’14] LinkedTV at MediaEval 2014 Search and Hyperlinking Task; [Rizzo_LREC’14] Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web; [Li_LIMe'13] Enriching Media Fragments with Named Entities for Video Classification; [Milicic_WWW'13] Live Topic Generation from Event Streams; [Milicic_ESWC’13] Tracking and Analyzing The 2013 Italian Election; [Sahuguet_MediaEval’13] LinkedTV at MediaEval 2013 Search and Hyperlinking Task; [Rizzo_SAM’12] What Fresh Media Are You Looking For? Extracting Media Items from Multiple Social Networks