Entities, Time and Events in BiographyNet and NewsReader

Antske Fokkens
VU University
Monday, November 11, 2013

Acknowledgement
(people)
The work presented in this
presentation was carried out by/with:
Agata Cybulska, Marieke van Erp and
Piek Vossen
Niels Ockeloen, Serge ter Braake, Willem
Robert van Hage, Jesper Hoeksema, Sara
Tonelli, Rachele Sprugnoli, Luciano Serafini,
Aitor Soroa, German Rigau and others


Overview

mini introduction to BiographyNet
mini introduction to NewsReader
representing entities and events


BiographyNet
An interdisciplinary project
involving history, computer science
and computational linguistics
Goal: inspire new historical research
by identifying relations between
people and events in biographical
dictionaries


NLP in BiographyNet
The Biography Portal of the Netherlands
125,000 biographies from 23 sources
describing 76,000 people
Text and metadata
Role of NLP:
Identify information in text
Study differences in style and focus

BiographyNet
use cases
Analysis of groups of individuals (e.g.
who were governors-general of the Dutch
Indies)
More complex questions, e.g. the relation
between influential people in the Dutch
colonies and current Dutch elite
Perspectives: how are people and events
judged in different sources?


BiographyNet data
Biographical text in Dutch
Heterogeneous corpus: 23 sources,
texts from the 17th century to the present
Metadata about basic facts:
high quality (few errors)
completeness varies


BiographyNet
Text mining
First step: fill out gaps in metadata
Basic supervised machine learning system
Next steps:
Create timelines for individuals
Identify relations between people
Identify events and relations between them


BiographyNet
Methodology
The output of NLP tools is used by other
researchers
They should have insight into the
performance of the tools and the
approaches that are used
Provenance information plays a vital role
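
A minimal sketch of how provenance could travel with a tool's output, so that other researchers can see which tool produced a value and how confident it was. The class names, the tool name, the identifiers and the example value are all hypothetical and are not taken from the BiographyNet implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRun:
    """One run of an NLP tool (hypothetical record, loosely PROV-style)."""
    tool: str      # assumed tool name
    version: str   # assumed version string


@dataclass(frozen=True)
class ExtractedFact:
    """A value extracted from text, with a pointer to what produced it."""
    person_id: str
    attribute: str
    value: str
    source: str            # which biography the text came from
    generated_by: ToolRun  # which tool run produced the value
    confidence: float


def describe(fact: ExtractedFact) -> str:
    """Render a fact together with its provenance for a human reader."""
    run = fact.generated_by
    return (f"{fact.attribute}={fact.value} for {fact.person_id} "
            f"(from {fact.source}, by {run.tool} {run.version}, "
            f"confidence {fact.confidence:.2f})")


# Illustrative values only.
run = ToolRun("birthplace-classifier", "0.1")
fact = ExtractedFact("person/0001", "birthplace", "Amsterdam",
                     "source-A", run, 0.87)
print(describe(fact))
```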


NewsReader
Automatically process massive streams of
daily news from thousands of sources in 4
different languages
Project Partners:
VU University Amsterdam, LexisNexis,
Synerscope (the Netherlands)
University of the Basque Country (Spain)
ScraperWiki (UK)
Fondazione Bruno Kessler (Italy)

NewsReader
What happened, where, when and who was
involved?
Which temporal and causal relations hold
between events, what does that tell us
about the people involved?
Place the accumulated result in a knowledge
store that can handle dynamic growth of
information: a history recorder

NewsReader
Big Data
Focus: The financial crisis
E.g. What is the impact of the financial
crisis on the car industry?
Big Data: LexisNexis estimates
1-2 million news articles per day
an archive of 10 million
English news articles about the car
industry from the last 10 years

NewsReader
Narratives
What are the stories that are being
told by all this data?
Challenges:
Duplicates, overlap and repetitions: how to
distinguish old from new?
Single results tell only parts of the story
Results can be inconsistent
News is opinionated and colored
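
One common way to approach the duplicates challenge is to compare word n-gram ("shingle") sets with Jaccard similarity. The sketch below illustrates that general technique only. It is not the NewsReader method, and the shingle size and threshold are assumed values.

```python
def shingles(text, n=3):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text1, text2, threshold=0.8):
    """True if the shingle sets overlap at least `threshold` (assumed value)."""
    return jaccard(shingles(text1), shingles(text2)) >= threshold


# Illustrative sentences, not project data.
a = "the car maker cut jobs at its plant today"
b = "The car maker cut jobs at its plant today"
c = "a new tsunami alert system was tested"
print(is_near_duplicate(a, b), is_near_duplicate(a, c))
```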


NewsReader
overall approach
Resolve all mentions of events, their
participants, locations and time in texts
and other resources
Determine coreference and other relations
between them
Combine all information from coreferring
event mentions around a hypothetical
event instance (independent from text)
Combine instances into storylines
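
The steps above can be sketched as follows, assuming event coreference has already assigned each mention an instance identifier. All class and field names, and the example mentions, are hypothetical. The storyline here is simply instances ordered by their earliest known date.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class EventMention:
    doc_id: str
    instance_id: str          # assumed output of event coreference
    participants: frozenset
    location: Optional[str]
    time: Optional[str]       # ISO 8601 date string, if known


@dataclass
class EventInstance:
    instance_id: str
    participants: set = field(default_factory=set)
    locations: set = field(default_factory=set)
    times: set = field(default_factory=set)
    mentioned_in: set = field(default_factory=set)


def build_instances(mentions):
    """Merge what all coreferring mentions say into one instance each."""
    instances = {}
    for m in mentions:
        inst = instances.setdefault(m.instance_id, EventInstance(m.instance_id))
        inst.participants |= m.participants
        if m.location:
            inst.locations.add(m.location)
        if m.time:
            inst.times.add(m.time)
        inst.mentioned_in.add(m.doc_id)
    return instances


def storyline(instances):
    """Order dated instances by earliest date (ISO strings sort correctly)."""
    dated = [i for i in instances.values() if i.times]
    return sorted(dated, key=lambda i: min(i.times))


# Illustrative mentions only.
mentions = [
    EventMention("doc1", "e1", frozenset({"company-A"}), None, "2008-09-15"),
    EventMention("doc2", "e1", frozenset({"company-A", "bank-B"}), "city-X", None),
    EventMention("doc3", "e2", frozenset({"bank-B"}), None, "2008-01-10"),
]
instances = build_instances(mentions)
print([i.instance_id for i in storyline(instances)])
```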

NLP pipeline
[Architecture diagram; recoverable labels listed below]
Input: LexisNexis documents; storage of original input data
Pipeline modules: tokenizer + sentence splitter, POS tagger, parser, time expressions, NER, NED (client/server), WSD (client/server), SRL, coreference resolution, event detection, event coreference, event relations, factuality, opinion detection, inference, story understanding
Diagram notes on the pipeline: processes that can be carried out in any order at this stage; runs in virtual machine; partner labels EHU, VUA, FBK
Knowledge store: resource, mention, entity, statement + context
Storage: HBase + Hadoop; triple store (RDF triples + named graphs)
Diagram notes on storage: distributed & replicated for scalability and fault tolerance; (possibly) distributed; partial replication
KS frontend: API implementation over layers; replicated for scalability and fault tolerance
Management scripts: start / stop, backup / restore, configuration, statistics gathering
Visualisation (Synerscope)

Both Projects

Accumulate information about the same
entities and events from various
sources
Must deal with different perspectives,
contradicting and partial information


Grounded Annotation
Framework (GAF)
Sources report on events and entities:
event mentions and entity mentions
URIs represent instances of these
entities and events in reality
GAF links instances to mentions
Information from mentions in other
sources is merged with known
information around the instance
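
A minimal sketch of the instance-to-mention link as plain triples. The GAF namespace URI, the `denotedBy` property name and the `#char=start,end` mention URI scheme are assumptions made for illustration. The example.org URIs and character offsets are hypothetical.

```python
# Assumed namespace and property name; check the GAF papers before relying on them.
GAF = "http://groundedannotationframework.org/gaf#"


def mention_uri(doc_uri, start, end):
    """Identify a mention by document URI plus character offsets (assumed scheme)."""
    return f"{doc_uri}#char={start},{end}"


def denoted_by_triples(instance_uri, mentions):
    """Link one instance URI to every mention that refers to it."""
    return [
        (instance_uri, GAF + "denotedBy", mention_uri(doc, start, end))
        for doc, start, end in mentions
    ]


# Hypothetical URIs and offsets.
instance = "http://example.org/instance/tsunami-2004"
triples = denoted_by_triples(instance, [
    ("http://example.org/doc/report-2008", 4, 15),
    ("http://example.org/doc/article-2013", 60, 85),
])
for t in triples:
    print(t)
```

Because both mentions point at the same instance URI, information from either source can be attached to that one instance.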

a GAF example
[Diagram with two parallel timelines; recoverable labels listed below]
Changes in the world (2004-2009): SEM events labelled TEMBLOR, TSUNAMI and "USS Jimmy Carter energy weapon"; also "future tsunami" and "Tsunami alert system"
Publication of sources (2004-2013): annotations (NAF, TAF) linking sources to events; source types: sensor data, direct event report, delayed event report, future event report
Example source text (placed at 2008 on the timeline):
"The catastrophe four years ago devastated Indian Ocean community and killed more than 230,000 people, over 170,000 of them in Aceh at northern tip of Sumatra Island of Indonesia."
Example source text (placed at 2013 on the timeline):
"..., the vessel is the party responsible for the 2004 Indian Ocean tsunami that killed 230,000 people. Apparently, the submarine was able to trigger seismic activity via some kind of directed energy weapon."

Linguistic information in
GAF
The NLP Annotation Format (NAF)
Knowledge Annotation Format (KAF)
stand-off layered annotation (LAF
compatible)
separating mentions from instances
NLP Interchange Format (NIF)
RDF and URIs, inline annotation
Compatible with PROV-DM

Events in GAF
extended Simple Event Model (SEM):
RDF representations of event
instances with participant, location
and time
can represent contradictory
information
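
A sketch of what an event instance with place and time might look like as RDF, serialised by hand as N-Triples. The SEM namespace URI and the `Event`, `hasActor`, `hasPlace` and `hasTimeStamp` names are given as understood from the SEM vocabulary and should be checked against it. The example.org URIs are hypothetical.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"  # assumed SEM namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def event_triples(event, actors=(), place=None, timestamp=None):
    """Build (subject, predicate, object) triples for one event instance."""
    triples = [(event, RDF_TYPE, SEM + "Event")]
    triples += [(event, SEM + "hasActor", actor) for actor in actors]
    if place:
        triples.append((event, SEM + "hasPlace", place))
    if timestamp:
        # Quoted strings are treated as plain literals by to_ntriples.
        triples.append((event, SEM + "hasTimeStamp", f'"{timestamp}"'))
    return triples


def to_ntriples(triples):
    """Serialise triples; objects starting with a quote are literals."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical URIs for the 2004 tsunami used as the example in this deck.
tsunami = event_triples(
    "http://example.org/instance/tsunami-2004",
    place="http://example.org/place/indian-ocean",
    timestamp="2004",
)
nt = to_ntriples(tsunami)
print(nt)
```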


GAF from NAF + SEM
Can accumulate information from
different sources
Can represent repeated information as a
single relation (with links to all
sources that provided this information)
Can represent contradicting information
Is compatible with PROV-DM
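
The first three properties can be illustrated with a small in-memory store: each distinct statement is kept once with the set of sources that assert it, and contradictions surface as different values for a property expected to have one value. This is a sketch under assumed names, not the GAF implementation. The predicate names and source identifiers are hypothetical, loosely following the 2004 tsunami example in this deck.

```python
from collections import defaultdict


class StatementStore:
    """Keep each (subject, predicate, object) once, with all its sources."""

    def __init__(self):
        self._sources = defaultdict(set)

    def add(self, subject, predicate, obj, source):
        """Record that `source` asserts the statement."""
        self._sources[(subject, predicate, obj)].add(source)

    def sources(self, subject, predicate, obj):
        """Return the sources asserting a statement (empty set if none)."""
        return set(self._sources.get((subject, predicate, obj), set()))

    def conflicts(self, single_valued):
        """Find subject/predicate pairs with several values for a
        predicate that is expected to have only one."""
        values = defaultdict(set)
        for subject, predicate, obj in self._sources:
            if predicate in single_valued:
                values[(subject, predicate)].add(obj)
        return {key: objs for key, objs in values.items() if len(objs) > 1}


# Hypothetical identifiers throughout.
store = StatementStore()
store.add("tsunami-2004", "hasPlace", "indian-ocean", "report-2008")
store.add("tsunami-2004", "hasPlace", "indian-ocean", "article-2013")
store.add("tsunami-2004", "causedBy", "temblor", "report-2008")
store.add("tsunami-2004", "causedBy", "energy-weapon", "article-2013")

print(store.sources("tsunami-2004", "hasPlace", "indian-ocean"))
print(store.conflicts({"causedBy"}))
```

Both conflicting values stay in the store with their sources, so a user can decide which source to trust.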


Acknowledgements
Supported by the European Union's Seventh
Framework Programme via the NewsReader
Project (ICT-316404)
Supported by the BiographyNet project
(nr. 660.011.308) funded by the
Netherlands eScience Center
(http://escience.center.nl)


References
GAF:
Fokkens, Antske, Marieke van Erp, Piek Vossen, Sara
Tonelli, Willem Robert van Hage, Luciano Serafini,
Rachele Sprugnoli and Jesper Hoeksema. 2013. GAF: A
Grounded Annotation Framework for Events. Proceedings
of the first Workshop on EVENTS: Definition, Detection,
Coreference and Representation. Atlanta USA.
Marieke Van Erp, Antske Fokkens, Piek Vossen, Sara
Tonelli, Willem Robert Van Hage, Luciano Serafini,
Rachele Sprugnoli and Jesper Hoeksema. 2013. Denoting
Data in the Grounded Annotation Framework. ISWC 2013
Posters and Demos. Sydney Australia, 21-25 October 2013


References
SEM:
Van Hage, Willem Robert, Véronique Malaisé, Roxane
Segers, Laura Hollink, and Guus Schreiber. "Design
and use of the Simple Event Model (SEM)." Web
Semantics: Science, Services and Agents on the World
Wide Web 9, no. 2 (2011): 128-136.

Cross-document coreference:
Cybulska, Agata, and Piek Vossen. “Semantic
Relations between Events and their Time, Locations
and Participants for Event Coreference Resolution.”
In: Proceedings of RANLP 2013.

References
Named Entity Recognition:
Marieke van Erp, Giuseppe Rizzo and Raphaël Troncy
(2013) Learning with the Web: Spotting Named Entities on
the intersection of NERD and Machine Learning. #MSM2013
Concept Extraction Challenge. Rio de Janeiro, Brazil,
May 2013.

Provenance:
Niels Ockeloen, Antske Fokkens, Serge Ter Braake, Piek
Vossen, Victor de Boer, Guus Schreiber and Susan Legêne.
2013. BiographyNet: Managing Provenance at multiple
levels and from different perspectives. In: Proceedings
of the Workshop on Linked Science 2013 (LISC2013).
