Global Media Monitoring
http://eventregistry.org/
Marko Grobelnik
Jozef Stefan Institute, Ljubljana, Slovenia
Contributions from Gregor Leban, Blaz Fortuna, Janez Brank, Jan Rupnik, Andrej Muhic
ESWC Summer School, Sep 2nd 2014, Kalamaki
Outline
• Introduction
• Collecting Media Data
• Document Enrichment
• Cross-linguality
• News Reporting Bias
• Event Representation
• Event Tracking
• Future Projects
Introduction
What questions will we try to answer?
• Where to get global media data?
• What is extractable from media documents?
• How to connect information across languages?
• What is an event?
• How to approach diversity in news reporting?
• How to visualize global event dynamics?
Systems/Demos used within the presentation
• NewsFeed (http://newsfeed.ijs.si/)
  • News and social media crawler
• Enrycher (http://enrycher.ijs.si/)
  • Language and semantic annotation
• XLing (http://xling.ijs.si/)
  • Cross-lingual document linking and categorization
• DiversiNews (http://aidemo.ijs.si/diversinews/)
  • News diversity explorer
• Event Registry (http://eventregistry.org/)
  • Event detection and topic tracking
The overall goal
• The goal is to establish a real-time system
  • …to collect data from global media in real-time
  • …to identify events and track evolving topics
  • …to assign stable identifiers to events
  • …to identify events across languages
  • …to detect diversity of reporting along several dimensions
  • …to provide rich exploratory visualizations
  • …to provide interoperable data export
Global Media Monitoring pipeline
[Figure: pipeline diagram with four stages: input data, pre-processing steps, event construction, event storage & maintenance]
• Inputs: mainstream news, blogs
• Components: article semantic annotation, extraction of date references, detection of article duplicates, cross-lingual article matching, article clustering, cross-lingual cluster matching, event formation, event info. extraction, identifying related events, Event Registry, API interface, GUI/visualizations
• http://EventRegistry.org
Collecting Media Data
http://newsfeed.ijs.si/
Where to get references to news publishers?
• A good start is the Wikipedia list of newspapers:
  • http://en.wikipedia.org/wiki/Lists_of_newspapers
From a newspaper home-page to an article
[Figure: http://www.nytimes.com/ HTML home page → RSS feed (list of articles) → article to be retrieved]
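The step from an RSS feed to a list of article URLs can be sketched as follows. This is a minimal illustration using only the Python standard library, not the NewsFeed crawler itself; the feed text and the `article_links` helper are hypothetical, and a real crawler would first download the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS 2.0 sample standing in for a publisher feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Publisher</title>
    <item>
      <title>First story</title>
      <link>http://example.org/news/first-story</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/news/second-story</link>
    </item>
  </channel>
</rss>"""


def article_links(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # skip items that carry no article URL
            pairs.append((title, link))
    return pairs


links = article_links(SAMPLE_RSS)
for title, link in links:
    print(title, "->", link)
```

Each returned link is an article page that still has to be fetched and cleaned of navigation and advertising before further processing.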
Collecting global media data
• Data collection service News-Feed
  • http://newsfeed.ijs.si/
  • …crawling global mainstream and social media
• Monitoring
  • ~60k mainstream publishers (RSS feeds + special feeds)
  • ~250k most influential blogs (RSS feeds)
  • free Twitter feed
• Data volume: ~350k articles & blogs per day (+5M tweets)
• Languages: eng (50%), ger (10%), spa (8%), fra (5%)
Downloading the news stream (1/2)
• The stream is accessible at http://newsfeed.ijs.si/stream/
• To download the whole stream continuously, you can use the Python script (http://newsfeed.ijs.si/http2fs.py)
• The script does the following:
Downloading the news stream (2/2)
• News stream contents and format
  • The root element, <article-set>, contains zero or more articles in the following XML format:
• …more details:
  • Trampus, Mitja and Novak, Blaz: The Internals Of An Aggregated Web News Feed. Proceedings of 15th Multiconference on Information Society 2012 (IS-2012).
Document Enrichment
http://enrycher.ijs.si/
What can be extracted from a document?
• Lexical level
  • Tokenization: extracting tokens from a document (words, separators, …)
  • Sentence splitting: set of sentences to be further processed
• Linguistic level
  • Part-of-Speech: assigning word types (nouns, verbs, adjectives, …)
  • Deep parsing: constructing parse trees from sentences
  • Triple extraction: subject-predicate-object triple extraction
  • Named entity extraction: identifying names of people, places, organizations
• Semantic level
  • Co-reference resolution: replacing pronouns with corresponding names; merging different surface forms of names into a single entity
  • Semantic labeling: assigning semantic identifiers to names (e.g. LOD/DBpedia/Freebase), including disambiguation
  • Topic classification: assigning topic categories to a document (e.g. DMoz)
  • Summarization: assigning importance to parts of a document
  • Fact extraction: extracting relevant facts from a document
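The lexical level above can be illustrated with a deliberately naive sketch. The regular expressions and the function names `split_sentences` and `tokenize` are assumptions made for this example; they are not Enrycher's implementation, and production systems handle abbreviations, numbers and non-Latin scripts far more carefully.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Enrycher was developed at JSI. Ljubljana is the capital of Slovenia."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, sentence_tokens)
```

Note that this splitter would wrongly break a sentence at an abbreviation such as "e.g. " because it only looks at punctuation followed by whitespace.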
Enrycher (http://enrycher.ijs.si/)
[Figure: plain text → text enrichment → extracted graph of triples from text]
Diego Maradona
Semantics:
  owl:sameAs: http://dbpedia.org/resource/Diego_Maradona
  owl:sameAs: http://sw.opencyc.org/concept/Mx4rvofERZwpEbGdrcN5Y29ycA
  rdf:type: http://dbpedia.org/class/yago/ArgentinaInternationalFootballers
  rdf:type: http://dbpedia.org/class/yago/ArgentineExpatriatesInItaly
  rdf:type: http://dbpedia.org/class/yago/ArgentineFootballManagers
  rdf:type: http://dbpedia.org/class/yago/ArgentineFootballers
Robbie Keane
Semantics:
  owl:sameAs: http://dbpedia.org/resource/Robbie_Keane
  rdf:type: http://dbpedia.org/class/yago/CoventryCityF.C.Players
  rdf:type: http://dbpedia.org/class/yago/ExpatriateFootballPlayersInItaly
  rdf:type: http://dbpedia.org/class/yago/F.C.InternazionaleMilanoPlayers
“Enrycher” is available as a web service generating semantic graph, LOD links, entities, keywords, categories, text summarization, sentiment
Enrycher Architecture
• Enrycher is a web service consisting of a set of interlinked modules…
  • …covering lexical, linguistic and semantic annotations
  • …exporting data in XML or RDF
• To execute the service, one should send an HTTP POST request with the raw text in the body:
  • curl -d "Enrycher was developed at JSI, a research institute in Ljubljana. Ljubljana is the capital of Slovenia." http://enrycher.ijs.si/run
[Figure: plain text → Enrycher → annotated document]
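The same POST request can be prepared from Python with the standard library. This is a sketch under assumptions: the URL is the one from the curl example on the slide, the service may no longer be online, and the helper names `build_request` and `annotate` are hypothetical. The response is returned as raw text because its exact schema is not given here.

```python
import urllib.request

# Endpoint taken from the curl example on the slide; it may no longer be reachable.
ENRYCHER_URL = "http://enrycher.ijs.si/run"


def build_request(text, url=ENRYCHER_URL):
    """Build (but do not send) an HTTP POST request with the raw text as body."""
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def annotate(text, timeout=30):
    """Send the text and return the raw response body as a string.

    The response (XML or RDF according to the slide) is not parsed here.
    """
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


request = build_request("Ljubljana is the capital of Slovenia.")
print(request.get_method(), request.full_url)
```

Calling `annotate("...")` performs the actual network request; it is kept separate from `build_request` so the request can be inspected without contacting the server.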
Cross-linguality
http://xling.ijs.si/
Cross-linguality
How to operate in many languages?
• Cross-linguality is a set of functions for transferring information across languages
  • …having this, we can track information independently of language borders
• Machine translation is expensive and slow, so the goal is to avoid machine translation to gain speed and scale
• The key building block is the function for comparing and categorizing documents in different languages
• XLing.ijs.si is an open web service to bridge information across 100 languages
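One way to picture a comparison function that avoids machine translation is to map documents from each language into a shared, language-independent space and compare them there. The sketch below does this with a tiny hand-made concept lexicon and cosine similarity. It is an assumption for illustration only: the slides do not describe how XLing is implemented, and the `CONCEPTS` table, concept IDs and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicons mapping words to language-independent concept IDs.
# These entries are made up; a real system would learn such a mapping from data.
CONCEPTS = {
    "en": {"earthquake": "c_quake", "japan": "c_japan", "tsunami": "c_tsunami",
           "election": "c_election"},
    "de": {"erdbeben": "c_quake", "japan": "c_japan", "tsunami": "c_tsunami",
           "wahl": "c_election"},
}


def to_concept_vector(text, lang):
    """Count the shared concepts mentioned in a document of the given language."""
    lexicon = CONCEPTS[lang]
    return Counter(lexicon[w] for w in text.lower().split() if w in lexicon)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


en_doc = to_concept_vector("Earthquake and tsunami hit Japan", "en")
de_doc = to_concept_vector("Erdbeben und Tsunami in Japan", "de")
de_other = to_concept_vector("Wahl in Berlin", "de")

similarity = cosine(en_doc, de_doc)
unrelated = cosine(en_doc, de_other)
print(similarity, unrelated)
```

No text is translated at any point; only the per-language mapping into the shared space is needed, which is what makes this style of comparison cheap at scale.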
Languages covered by XLing (top 100 Wikipedia languages)
XLing (XLing.ijs.si)
Service for comparing and categorizing documents across 100 languages
[Figure: XLing demo screenshot: Chinese text and English text, automatically extracted keywords for each, similarity between the two documents, selection of 100 languages]
News Reporting Bias
http://aidemo.ijs.si/diversinews/
News Reporting Bias example
Detecting News Reporting Bias
• The task:
  • Given a news story, are we able to say from which news source it came?
• We compared CNN and Aljazeera reports about the same events from the war in Iraq
  • …300 aligned articles describing the same story from both sources
• The same topics are expressed in both sources with the following keywords:
  • CNN with:
    • Insurgents, Troops, Baghdad, Iran, Militant, Police, Suicide, Terrorist, United, National, Hussein, Alleged, Israeli, Syria, Terrorism…
  • Aljazeera with:
    • Attacks, Claims, Rebels, Withdrawing, Report, Fighters, President, Resistance, Occupation, Injured, Army, Demanded, Hit, Muslim, …
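Source-specific keyword lists like the ones above can be obtained by scoring how much more often a word appears in one source than in the other. The sketch below uses a smoothed log-odds score; this is a stand-in technique chosen for illustration, not necessarily the method used in the study, and the two miniature "corpora" are made up rather than taken from the 300 aligned articles.

```python
import math
from collections import Counter

# Made-up miniature "articles" for illustration only; NOT the study's data.
SOURCE_A = ["insurgents attacked troops in baghdad", "police blamed insurgents"]
SOURCE_B = ["fighters attacked army in baghdad", "resistance fighters hit army"]


def word_counts(docs):
    """Count word occurrences over a list of whitespace-tokenized documents."""
    return Counter(w for d in docs for w in d.split())


def distinguishing_words(docs_a, docs_b, top_n=2):
    """Rank words by add-one smoothed log-odds of appearing in A versus B.

    Returns (words most typical of A, words most typical of B).
    """
    a, b = word_counts(docs_a), word_counts(docs_b)
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + len(vocab)
    total_b = sum(b.values()) + len(vocab)
    score = {
        w: math.log((a[w] + 1) / total_a) - math.log((b[w] + 1) / total_b)
        for w in vocab
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-score[w], w))
    return ranked[:top_n], ranked[-top_n:][::-1]


typical_a, typical_b = distinguishing_words(SOURCE_A, SOURCE_B)
print("A:", typical_a)
print("B:", typical_b)
```

Words shared equally by both sources (here "baghdad" or "attacked") score near zero and drop out of both lists, which is why the surviving keywords reflect wording choices rather than topic.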
DiversiNews iPad App (1/2)
• The DiversiNews iPad app uses the newsfeed.ijs.si and enrycher.ijs.si services
  • …its initial screen shows a list of current hot topics and current trending events
[Figure: app screenshot with panels "Hot Topics" and "Trending Events"]
DiversiNews iPad App (2/2)
• The DiversiNews “diversity search” screen allows dynamic reranking of articles describing an event along three dimensions:
  • Geography: where the content is being published from
  • Subtopics: what the subtopics of an event are
  • Sentiment: what is good news and what is bad news
• For each query it provides
  • Automatically generated summary
  • List of corresponding articles
[Figure: app screenshot with panels "Geography", "Subtopics", "Sentiment", "Summary" and "Articles"]
Event Representation
http://eventregistry.org/
What is an event?
(abstract description)
• …a more practical question: what definition of an event is computationally feasible?
• In general, an event is something which “sticks out” of the average in some kind of (high-dimensional) data space
  • …could be interpreted as an “anomaly”
  • …densification of data points (e.g. many similar documents)
  • …significant change of distribution (e.g. a trend on Twitter)
• In practice, the event could be:
  • A cluster of documents / change of a distribution in data
    • Detected in an unsupervised way
  • A fit to a pre-built model
    • Detected in a supervised way
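The idea of an event as something that "sticks out" of the average can be sketched in one dimension: flag a day whose article count lies far above the recent mean. The z-score rule, the window size, the threshold and the daily counts below are all assumptions chosen for illustration; Event Registry's actual detection works on clusters of documents, not on a single count series.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=7, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by more
    than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Floor sigma at 1.0 so a perfectly flat history cannot divide by zero
        # or turn tiny fluctuations into spikes.
        z = (counts[i] - mu) / max(sigma, 1.0)
        if z > threshold:
            spikes.append(i)
    return spikes


# Made-up daily counts of articles mentioning some topic.
daily_counts = [10, 12, 11, 9, 10, 13, 11, 10, 48, 35, 12]
spikes = find_spikes(daily_counts)
print(spikes)
```

Only the first day of the burst is flagged: once the spike enters the trailing window it inflates the standard deviation, so the following high day no longer stands out. Whether that behaviour is desirable depends on whether the goal is to detect the onset of an event or its full duration.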
How to represent an event?
• Baseline data for a news event is usually a cluster of documents
  • …with some preprocessing we extract linguistic and semantic annotations
  • …semantic annotations are linked to ontologies, making multi-resolution annotations possible
• Three levels of event representation:
  • Feature vector event representation:
    • …lightweight representation that can be easily expressed as a set of feature vectors augmented with external ontologies; suitable for scalable ML analysis
  • Structured event representation:
    • Infobox representation (slot filling) using an open schema or event taxonomy
  • Deep event representation:
    • Semantic representation linked to a world model (e.g. CycKB common-sense knowledge); suitable for reasoning and diagnostics
Feature vector event representation
• Feature vectors easily extractable from news documents:
  • Topical dimension: what is being talked about? (keywords)
  • Social dimension: which entities are mentioned? (named entities)
  • Temporal aspect: what is the time of an event? (temporal distribution)
  • Geographical aspect: where is an event taking place? (location)
  • Publisher aspect: who is reporting? (publisher identifiers)
  • Sentiment/bias aspect: emotional signals (numeric estimates)
• Scalable machine learning techniques can easily deal with such a representation
  • …in the “Event Registry” system we use this representation to describe events
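The six dimensions listed above translate naturally into a small data structure with one sparse vector per dimension. The sketch below is a hypothetical layout, not Event Registry's internal schema: the class name `EventFeatures`, the article dictionary keys and the sample articles are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EventFeatures:
    """One sparse count vector per dimension of a news event."""
    keywords: Counter = field(default_factory=Counter)    # topical dimension
    entities: Counter = field(default_factory=Counter)    # social dimension
    dates: Counter = field(default_factory=Counter)       # temporal distribution
    locations: Counter = field(default_factory=Counter)   # geographical aspect
    publishers: Counter = field(default_factory=Counter)  # publisher aspect
    sentiments: list = field(default_factory=list)        # numeric estimates

    def add_article(self, article):
        """Fold one annotated article (a plain dict) into the event's vectors."""
        self.keywords.update(article.get("keywords", []))
        self.entities.update(article.get("entities", []))
        for key, counter in (("date", self.dates),
                             ("location", self.locations),
                             ("publisher", self.publishers)):
            if article.get(key) is not None:
                counter[article[key]] += 1
        if article.get("sentiment") is not None:
            self.sentiments.append(float(article["sentiment"]))

    def mean_sentiment(self):
        """Average sentiment over all articles, or 0.0 if none reported one."""
        return sum(self.sentiments) / len(self.sentiments) if self.sentiments else 0.0


# Made-up annotated articles belonging to one event cluster.
event = EventFeatures()
event.add_article({"keywords": ["election", "vote"], "entities": ["Chicago"],
                   "date": "2014-09-01", "location": "Chicago",
                   "publisher": "example-news-a", "sentiment": 0.2})
event.add_article({"keywords": ["election"], "entities": ["Chicago", "Illinois"],
                   "date": "2014-09-02", "location": "Chicago",
                   "publisher": "example-news-b", "sentiment": -0.4})
print(event.keywords.most_common(1), event.mean_sentiment())
```

Because every dimension is a plain count vector, events can be compared, clustered or indexed with standard sparse-vector techniques.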
Example of “feature vector” event representation: Event Registry “Chicago” related events
[Figure: Event Registry screenshot for the query “Chicago” with panels: Where? (geography), When? (temporal distribution), Who? (named entities), What? (keywords/topics)]
Structured event representation
• Structured event representation describes an event by its “Event Type” and corresponding information slots to be filled
  • Event Types should be taken from an “Event Taxonomy”
• …at this stage of development this level of representation still requires human intervention to achieve high-accuracy (Precision/Recall) extraction
• Example on the right: Wikipedia event infobox:
  • 2011 Tōhoku earthquake and tsunami
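Slot filling can be pictured as a typed record whose slots are prescribed by the event type. The sketch below is a hypothetical illustration: the `EVENT_TYPES` table, its slot names and the helper functions are assumptions, not the taxonomy under development or the fields of the Wikipedia infobox.

```python
# Hypothetical event taxonomy fragment: event type -> slots to be filled.
EVENT_TYPES = {
    "Earthquake": ["date", "location", "magnitude", "casualties"],
}


def make_infobox(event_type, **slots):
    """Create an infobox for a known event type, rejecting unknown slots."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    unknown = set(slots) - set(EVENT_TYPES[event_type])
    if unknown:
        raise ValueError(f"slots not defined for {event_type}: {sorted(unknown)}")
    return {
        "type": event_type,
        "slots": {name: slots.get(name) for name in EVENT_TYPES[event_type]},
    }


def missing_slots(infobox):
    """List slots that extraction (or a human annotator) still has to fill."""
    return [name for name, value in infobox["slots"].items() if value is None]


infobox = make_infobox("Earthquake", date="2011-03-11", location="Tōhoku, Japan")
missing = missing_slots(infobox)
print(missing)
```

The list of unfilled slots is exactly the part that, per the slide, still needs human intervention when automatic extraction is not accurate enough.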
“Event Taxonomy”: preview of the current development
Prototype for event Infobox extraction: XLike annotation service
• The goal is to build a system for economically viable extraction of event infoboxes
  • …using crowd-sourcing
  • …aiming at high Precision & Recall at a small cost
Event sequences & Hierarchical events
• Once events are identified and represented, we can connect them into “event sequences” (also called story-lines)
  • “Event sequences” include events which are supposedly related and constitute a larger story
• A collection of interrelated events can also be organized in hierarchies (e.g. a World Cup event consists of a series of smaller events)
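Both structures can be sketched with one small class: a hierarchy is a tree of events, and a story-line is a chronologically ordered list of events. The `Event` class, the `storyline` helper, and the event titles and dates below are hypothetical illustrations, not Event Registry's data model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    title: str
    day: date
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a sub-event and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def leaves(self):
        """Return the smallest events under this one (itself if it has none)."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def storyline(events):
    """Order events chronologically to form an event sequence."""
    return sorted(events, key=lambda e: e.day)


# Illustrative hierarchy; titles and dates are example values only.
world_cup = Event("World Cup", date(2014, 6, 12))
knockout = world_cup.add(Event("Knockout stage", date(2014, 6, 28)))
knockout.add(Event("Final", date(2014, 7, 13)))
world_cup.add(Event("Opening match", date(2014, 6, 12)))

sequence = [e.title for e in storyline(world_cup.leaves())]
print(sequence)
```

The same events therefore appear in two views: nested under their parent in the hierarchy, and flattened in time order in the story-line.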
An example event: Microsoft Windows 9
Similar events example: similar events to Microsoft Windows 9 event
Event sequence identification
Hierarchy of events
Example Microsoft hierarchy of events
Zoom-in Example Microsoft hierarchy of events
Event Tracking
http://eventregistry.org/
Live Event tracking with http://EventRegistry.org/
Event description through entities and semantic keywords
Collection of events described through entity relatedness
Collection of events described through trending concepts
Collection of events described through three-level categorization
Events identified across languages
Collection of events described through reporting dynamics
Collection of events described through a story-line of related events
Event Registry exports event data through API and RDF/Storyline ontology
• API to search and export event information
  • Export of all the system data in JSON
• Event data is exported in a structured form
  • BBC Storyline ontology
    • http://www.bbc.co.uk/ontologies/storyline/2013-05-01.html
  • SPARQL endpoint:
    • http://eventregistry.org/rdf/search
    • http://eventregistry.org/rdf/event/{eventID}
    • http://eventregistry.org/rdf/article/{articleID}
    • http://eventregistry.org/rdf/storyline/{storylineID}
    • Example: http://eventregistry.org/rdf/event/1234
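The URL patterns above can be filled in programmatically. The sketch below only builds the URLs from the templates listed on the slide; the `rdf_url` helper is hypothetical, nothing is fetched, and whether these endpoints are still served in this form is not something the slides confirm.

```python
from urllib.parse import quote

# URL templates as listed on the slide.
BASE = "http://eventregistry.org/rdf"
TEMPLATES = {
    "event": BASE + "/event/{id}",
    "article": BASE + "/article/{id}",
    "storyline": BASE + "/storyline/{id}",
}


def rdf_url(kind, identifier):
    """Return the RDF URL for an event, article or storyline identifier."""
    if kind not in TEMPLATES:
        raise ValueError(f"unknown resource kind: {kind}")
    # Percent-encode the identifier so it cannot break the URL path.
    return TEMPLATES[kind].format(id=quote(str(identifier), safe=""))


url = rdf_url("event", 1234)
print(url)
```

With an HTTP client, the resulting URL could then be requested and the returned RDF parsed with an RDF library of choice.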
Future / Follow-up projects
Some of the follow-up projects
• Understanding global social dynamics
  • How does global society function?
• Integrating text-based media with TV channels
  • …requires speech recognition, video processing, visual object recognition, face recognition, …
• Event prediction / Event-consequence prediction
  • …requires understanding of causality in the social dynamics and much more
• Micro-reading / Machine-reading
  • …full understanding of individual documents: the goal for 10+ years

Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014

  • 1. Global Media Monitoring (http://eventregistry.org/) • Marko Grobelnik, Jozef Stefan Institute, Ljubljana, Slovenia • Contributions from Gregor Leban, Blaz Fortuna, Janez Brank, Jan Rupnik, Andrej Muhic • ESWC Summer School, Sep 2nd 2014, Kalamaki
  • 2. Outline • Introduction • Collecting Media Data • Document Enrichment • Cross-linguality • News Reporting Bias • Event Representation • Event Tracking • Future Projects
  • 4. What questions will we try to answer? • Where to get global media data? • What is extractable from media documents? • How to connect information across languages? • What is an event? • How to approach diversity in news reporting? • How to visualize global event dynamics?
  • 5. Systems/Demos used within the presentation • NewsFeed (http://newsfeed.ijs.si/) • News and social media crawler • Enrycher (http://enrycher.ijs.si/) • Language and semantic annotation • XLing (http://xling.ijs.si/) • Cross-lingual document linking and categorization • DiversiNews (http://aidemo.ijs.si/diversinews/) • News Diversity Explorer • Event Registry (http://eventregistry.org/) • Event detection and topic tracking
  • 6. The overall goal • The goal is to establish a real-time system • …to collect data from global media in real-time • …to identify events and track evolving topics • …to assign stable identifiers to events • …to identify events across languages • …to detect diversity of reporting along several dimensions • …to provide rich exploratory visualizations • …to provide interoperable data export
  • 7. Global Media Monitoring pipeline (diagram, http://EventRegistry.org) • Stages: Input data, Pre-processing steps, Event construction, Event storage & maintenance • Inputs: mainstream news, blogs • Components: Article semantic annotation, Extraction of date references, Detection of article duplicates, Cross-lingual article matching, Article clustering, Cross-lingual cluster matching, Event formation, Event info. extraction, Identifying related events, Event registry, API interface, GUI/Visualizations
  • 8. Collecting Media Data http://newsfeed.ijs.si/
  • 9. Where to get references to news publishers? • A good start is Wikipedia's lists of newspapers: • http://en.wikipedia.org/wiki/Lists_of_newspapers
  • 10. From a newspaper home-page to an article • http://www.nytimes.com/ (HTML) • RSS feed (list of articles) • Article to be retrieved
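The step from an RSS feed to the list of articles to retrieve can be sketched as follows. This is a minimal illustration, not the NewsFeed crawler: the sample feed, its URLs, and the function name `article_links` are made up, and only standard RSS 2.0 `<item>`, `<title>` and `<link>` elements are assumed.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a publisher's feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Publisher</title>
    <item>
      <title>First story</title>
      <link>http://news.example.org/a1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://news.example.org/a2</link>
    </item>
  </channel>
</rss>"""


def article_links(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # items without a link cannot be retrieved
            pairs.append((title, link))
    return pairs


if __name__ == "__main__":
    for title, link in article_links(SAMPLE_RSS):
        print(title, "->", link)
```

Each returned link is a candidate article page; a real crawler would also need deduplication and polite fetching, which are omitted here.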
  • 11. Collecting global media data • Data collection service NewsFeed • http://newsfeed.ijs.si/ • …crawling global mainstream and social media • Monitoring • ~60k mainstream publishers (RSS feeds + special feeds) • ~250k most influential blogs (RSS feeds) • free Twitter feed • Data volume: ~350k articles & blogs per day (+5M tweets) • Languages: eng (50%), ger (10%), spa (8%), fra (5%)
  • 12. Downloading the news stream (1/2) • The stream is accessible at http://newsfeed.ijs.si/stream/ • To download the whole stream continuously, you can use the Python script (http://newsfeed.ijs.si/http2fs.py) • The script does the following:
  • 13. Downloading the news stream (2/2) • News Stream Contents and Format • The root element, <article-set>, contains zero or more articles in the following XML format: • …more details: • Trampus, Mitja and Novak, Blaz: The Internals Of An Aggregated Web News Feed. Proceedings of 15th Multiconference on Information Society 2012 (IS-2012).
  • 15. What can be extracted from a document? • Lexical level • Tokenization – extracting tokens from a document (words, separators, …) • Sentence splitting – set of sentences to be further processed • Linguistic level • Part-of-Speech – assigning word types (nouns, verbs, adjectives, …) • Deep Parsing – constructing parse trees from sentences • Triple extraction – subject-predicate-object triple extraction • Named entity extraction – identifying names of people, places, organizations • Semantic level • Co-reference resolution – replacing pronouns with corresponding names; merging different surface forms of names into a single entity • Semantic labeling – assigning semantic identifiers to names (e.g. LOD/DBpedia/Freebase) including disambiguation • Topic classification – assigning topic categories to a document (e.g. DMoz) • Summarization – assigning importance to parts of a document • Fact extraction – extracting relevant facts from a document
  • 16. Enrycher (http://enrycher.ijs.si/) • Text Enrichment: plain text in, extracted graph of triples out • Diego Maradona, Semantics: owl:sameAs: http://dbpedia.org/resource/Diego_Maradona owl:sameAs: http://sw.opencyc.org/concept/Mx4rvofERZwpEbGdrcN5Y29ycA rdf:type: http://dbpedia.org/class/yago/ArgentinaInternationalFootballers rdf:type: http://dbpedia.org/class/yago/ArgentineExpatriatesInItaly rdf:type: http://dbpedia.org/class/yago/ArgentineFootballManagers rdf:type: http://dbpedia.org/class/yago/ArgentineFootballers • Robbie Keane, Semantics: owl:sameAs: http://dbpedia.org/resource/Robbie_Keane rdf:type: http://dbpedia.org/class/yago/CoventryCityF.C.Players rdf:type: http://dbpedia.org/class/yago/ExpatriateFootballPlayersInItaly rdf:type: http://dbpedia.org/class/yago/F.C.InternazionaleMilanoPlayers • “Enrycher” is available as a web service generating Semantic Graph, LOD links, Entities, Keywords, Categories, Text Summarization, Sentiment
  • 17. Enrycher Architecture • Enrycher is a web service consisting of a set of interlinked modules (diagram: plain text in, annotated document out)… • …covering lexical, linguistic and semantic annotations • …exporting data in XML or RDF • To execute the service, send an HTTP POST request with the raw text in the body: • curl -d "Enrycher was developed at JSI, a research institute in Ljubljana. Ljubljana is the capital of Slovenia." http://enrycher.ijs.si/run
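The same HTTP POST can be issued from Python with the standard library. This is a sketch under assumptions: the endpoint URL is the one given on the slide and may no longer be reachable, the function names are illustrative, and the response is simply returned as text since the slide only says the service exports XML or RDF.

```python
import urllib.request

ENRYCHER_URL = "http://enrycher.ijs.si/run"  # endpoint as given on the slide


def build_enrycher_request(text):
    """Build (but do not send) an HTTP POST request with the raw text as body."""
    return urllib.request.Request(
        ENRYCHER_URL,
        data=text.encode("utf-8"),
        method="POST",
    )


def annotate(text, timeout=30):
    """Send the text to the service and return the response body as a string.

    Assumption: the service is reachable and answers with a text payload.
    """
    request = build_enrycher_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    req = build_enrycher_request("Ljubljana is the capital of Slovenia.")
    print(req.get_method(), req.full_url, len(req.data), "bytes")
```

Separating request construction from sending keeps the example inspectable offline; only `annotate` touches the network.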
  • 19. Cross-linguality: How to operate in many languages? • Cross-linguality is a set of functions for transferring information across languages • …with these, we can track information independently of language borders • Machine translation is expensive and slow, so the goal is to avoid it in order to gain speed and scale • The key building block is a function for comparing and categorizing documents in different languages • XLing.ijs.si is an open web service that bridges information across 100 languages
  • 20. Languages covered by XLing (top 100 Wikipedia languages)
  • 21. XLing (XLing.ijs.si) service for comparing and categorizing documents across 100 languages • Screenshot labels: Chinese text, English text, automatically extracted keywords (for each text), similarity between two documents, selection of 100 languages
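One way to compare documents in different languages without machine translation is to map each document into a shared, language-independent space and measure similarity there. The sketch below illustrates that general idea only; it is not XLing's method, and the tiny word-to-concept dictionaries, concept IDs and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy dictionaries mapping words to language-independent concepts.
CONCEPTS = {
    "en": {"earthquake": "C_QUAKE", "tsunami": "C_TSUNAMI", "japan": "C_JAPAN"},
    "de": {"erdbeben": "C_QUAKE", "tsunami": "C_TSUNAMI", "japan": "C_JAPAN",
           "fussball": "C_FOOTBALL"},
}


def to_concept_vector(text, lang):
    """Count the shared concepts mentioned in a text written in `lang`."""
    table = CONCEPTS[lang]
    return Counter(table[w] for w in text.lower().split() if w in table)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    en = to_concept_vector("Earthquake and tsunami hit Japan", "en")
    de = to_concept_vector("Erdbeben und Tsunami in Japan", "de")
    print(round(cosine(en, de), 3))
```

Because both texts are reduced to the same concept IDs, no translation step is needed at comparison time; the cost moves into building the per-language mappings.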
  • 22. News Reporting Bias http://aidemo.ijs.si/diversinews/
  • 24. Detecting News Reporting Bias • The task: • Given a news story, are we able to say from which news source it came? • We compared CNN and Aljazeera reports about the same events from the war in Iraq • …300 aligned articles describing the same story from both sources • The same topics are expressed in both sources with the following keywords: • CNN with: • Insurgents, Troops, Baghdad, Iran, Militant, Police, Suicide, Terrorist, United, National, Hussein, Alleged, Israeli, Syria, Terrorism… • Aljazeera with: • Attacks, Claims, Rebels, Withdrawing, Report, Fighters, President, Resistance, Occupation, Injured, Army, Demanded, Hit, Muslim, …
  • 25. DiversiNews iPad App (1/2) • The DiversiNews iPad App uses the newsfeed.ijs.si and enrycher.ijs.si services • …its initial screen shows a list of current hot topics and current trending events • Screenshot labels: Hot Topics, Trending Events
  • 26. DiversiNews iPad App (2/2) • The DiversiNews “diversity search” screen allows dynamic reranking of articles describing an event along three dimensions: • Geography – where the content is being published from • Subtopics – what the subtopics of an event are • Sentiment – what is good news and what is bad news • For each query it provides • An automatically generated summary • A list of corresponding articles • Screenshot labels: Geography, Subtopics, Sentiment, Summary, Articles
  • 28. What is an event? (abstract description) • …a more practical question: what definition of an event is computationally feasible? • In general, an event is something which “sticks out” of the average in some kind of (high-dimensional) data space • …could be interpreted as an “anomaly” • …densification of data points (e.g. many similar documents) • …significant change of distribution (e.g. a trend on Twitter) • In practice, an event could be: • A cluster of documents / change of a distribution in data • Detected in an unsupervised way • A fit to a pre-built model • Detected in a supervised way
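The idea of an event as something that "sticks out" of the average can be illustrated with a simple unsupervised burst detector over daily document counts. This is a sketch under assumptions, not the Event Registry method: the trailing-window z-score, the window size, the threshold and the sample counts are all illustrative choices.

```python
from statistics import mean, stdev


def find_bursts(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is far above the trailing average.

    A day is flagged when its count exceeds the mean of the previous
    `window` days by more than `threshold` standard deviations.
    """
    bursts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 so a perfectly flat history
        # does not cause a division by zero.
        sigma = max(stdev(history), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


if __name__ == "__main__":
    # Made-up counts of articles mentioning one topic, one value per day.
    counts = [10, 12, 11, 9, 10, 11, 12, 10, 48, 11]
    print(find_bursts(counts))
```

On the made-up series only the day with 48 articles is flagged; the following day is not, because the spike itself inflates the trailing deviation.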
  • 29. How to represent an event? • Baseline data for a news event is usually a cluster of documents • …with some preprocessing we extract linguistic and semantic annotations • …semantic annotations are linked to ontologies, which makes multi-resolution annotations possible • Three levels of event representation: • Feature vector event representation: • …a lightweight representation, a set of feature vectors augmented with external ontologies – suitable for scalable ML analysis • Structured event representation: • Infobox representation (slot filling) using an open schema or event taxonomy • Deep event representation: • Semantic representation linked to a world model (e.g. CycKB common sense knowledge) – suitable for reasoning and diagnostics
  • 30. Feature vector event representation • Feature vectors easily extractable from news documents: • Topical dimension – what is being talked about? (keywords) • Social dimension – which entities are mentioned? (named entities) • Temporal aspect – what is the time of an event? (temporal distribution) • Geographical aspect – where is an event taking place? (location) • Publisher aspect – who is reporting? (publisher identifiers) • Sentiment/bias aspect – emotional signals (numeric estimates) • Scalable machine learning techniques can easily deal with such a representation • …in the “Event Registry” system we use this representation to describe events
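The six aspects listed above can be sketched as a small data structure built from a cluster of annotated articles. This is an illustration only: the class name, the field names and the article dictionary keys are hypothetical and do not reflect Event Registry's internal schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EventFeatures:
    keywords: Counter = field(default_factory=Counter)    # topical dimension
    entities: Counter = field(default_factory=Counter)    # social dimension
    dates: Counter = field(default_factory=Counter)       # temporal distribution
    locations: Counter = field(default_factory=Counter)   # geographical aspect
    publishers: Counter = field(default_factory=Counter)  # publisher aspect
    sentiment: float = 0.0                                 # mean numeric estimate


def summarize_cluster(articles):
    """Aggregate per-article annotations into one event-level feature set."""
    feats = EventFeatures()
    for art in articles:
        feats.keywords.update(art.get("keywords", []))
        feats.entities.update(art.get("entities", []))
        feats.locations.update(art.get("locations", []))
        feats.dates[art["date"]] += 1
        feats.publishers[art["publisher"]] += 1
    if articles:
        total = sum(art.get("sentiment", 0.0) for art in articles)
        feats.sentiment = total / len(articles)
    return feats


if __name__ == "__main__":
    cluster = [
        {"date": "2014-09-01", "publisher": "pub-a", "keywords": ["windows"],
         "entities": ["Microsoft"], "locations": ["Redmond"], "sentiment": 0.25},
        {"date": "2014-09-02", "publisher": "pub-b", "keywords": ["windows", "release"],
         "entities": ["Microsoft"], "locations": [], "sentiment": 0.75},
    ]
    print(summarize_cluster(cluster))
```

Sparse counters keep the representation cheap to merge when clusters are combined, which is one reason such vectors suit scalable analysis.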
  • 31. Example of “feature vector” event representation: Event Registry “Chicago” related events • Query: “Chicago” • Where? (geography) • When? (temporal distribution) • Who? (named entities) • What? (keywords/topics)
  • 32. Structured event representation • Structured event representation describes an event by its “Event Type” and corresponding information slots to be filled • Event Types should be taken from an “Event Taxonomy” • …at this stage of development this level of representation still requires human intervention to achieve high-accuracy (precision/recall) extraction • Example on the right – Wikipedia event infobox: • 2011 Tōhoku earthquake and tsunami
  • 33. “Event Taxonomy” – preview of the current development
  • 34. Prototype for event infobox extraction: XLike annotation service • The goal is to build a system for economically viable extraction of event infoboxes • …using crowd-sourcing • …aiming at high precision & recall for a small cost
  • 35. Event sequences & Hierarchical events • Once events are identified and represented, we can connect them into “event sequences” (also called story-lines) • “Event sequences” include events which are supposedly related and constitute a larger story • A collection of interrelated events can also be organized in hierarchies (e.g. the World Cup event consists of a series of smaller events)
  • 36. An example event: Microsoft Windows 9
  • 37. Similar events example: similar events to Microsoft Windows 9 event
  • 41. Zoom-in Example: Microsoft hierarchy of events
  • 43. Live Event tracking with http://EventRegistry.org/
  • 44. Event description through entities and semantic keywords
  • 45. Collection of events described through entity relatedness
  • 46. Collection of events described through trending concepts
  • 47. Collection of events described through three-level categorization
  • 49. Collection of events described through reporting dynamics
  • 50. Collection of events described through a story-line of related events
  • 51. Event Registry exports event data through an API and the RDF/Storyline ontology • API to search and export event information • Export of all the system data in JSON • Event data is exported in a structured form • BBC Storyline ontology • http://www.bbc.co.uk/ontologies/storyline/2013-05-01.html • SPARQL endpoint: • http://eventregistry.org/rdf/search • http://eventregistry.org/rdf/event/{eventID} • http://eventregistry.org/rdf/article/{articleID} • http://eventregistry.org/rdf/storyline/{storylineID} • Example: http://eventregistry.org/rdf/event/1234
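The URL patterns on the slide can be filled in programmatically. A minimal sketch, assuming only the three templates listed above; the dictionary and function names are illustrative, and the URLs date from 2014, so they may no longer resolve.

```python
from urllib.parse import quote

# URL templates as listed on the slide.
RDF_TEMPLATES = {
    "event": "http://eventregistry.org/rdf/event/{id}",
    "article": "http://eventregistry.org/rdf/article/{id}",
    "storyline": "http://eventregistry.org/rdf/storyline/{id}",
}


def rdf_url(kind, identifier):
    """Return the RDF export URL for an event, article or storyline."""
    if kind not in RDF_TEMPLATES:
        raise ValueError(f"unknown resource kind: {kind!r}")
    # Percent-encode the identifier so it is safe as a single path segment.
    return RDF_TEMPLATES[kind].format(id=quote(str(identifier), safe=""))


if __name__ == "__main__":
    print(rdf_url("event", 1234))
```

With the slide's own example ID, this produces http://eventregistry.org/rdf/event/1234.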
  • 53. Some of the follow-up projects • Understanding global social dynamics • How does global society function? • Integrating text-based media with TV channels • …requires speech recognition, video processing, visual object recognition, face recognition, … • Event prediction / Event-Consequence prediction • …requires understanding of causality in the social dynamics and much more • Micro-reading / Machine-reading • …full understanding of individual documents – the goal for 10+ years