Entities, Time and Events
in BiographyNet &
NewsReader
Antske Fokkens
VU University
Monday, November 11, 13
Acknowledgement
(people)
The work presented in this
presentation was carried out by/with:
Agata Cybulska, Marieke van Erp and
Piek Vossen
Niels Ockeloen, Serge ter Braake, Willem
Robert van Hage, Jesper Hoeksema, Sara
Tonelli, Rachele Sprugnoli, Luciano Serafini,
Aitor Soroa, German Rigau and others

Overview

mini introduction to BiographyNet
mini introduction to NewsReader
representing entities and events

BiographyNet
An interdisciplinary project
involving history, computer science
and computational linguistics
Goal: inspire new historical research
by identifying relations between
people and events in biographical
dictionaries

NLP in BiographyNet
The Biography Portal of the Netherlands
125,000 biographies from 23 sources
describing 76,000 people
Text and metadata
Role of NLP:
Identify information in text
Study differences in style and focus
BiographyNet
use cases
Analysis of groups of individuals (e.g.
who were governors-general of the Dutch
Indies)
More complex questions, e.g. the relation
between influential people in the Dutch
colonies and current Dutch elite
Perspectives: how are people and events
judged in different sources?

BiographyNet data
Biographical text in Dutch
Heterogeneous corpus: 23 sources,
texts from the 17th century to now
Metadata about basic facts:
high quality (few errors)
completeness varies

BiographyNet
Text mining
First step: fill out gaps in metadata
Basic supervised machine learning system
Next steps:
Create timelines for individuals
Identify relations between people
Identify events and relations between them

BiographyNet
Methodology
The output of NLP tools is used by other
researchers
They should have insight into the
performance of the tools and the
approaches that are used
Provenance information plays a vital role

NewsReader
Automatically process massive streams of
daily news from thousands of sources in 4
different languages
Project Partners:
VU University Amsterdam, LexisNexis,
Synerscope (the Netherlands)
University of the Basque Country (Spain)
ScraperWiki (UK)
Fondazione Bruno Kessler (Italy)
NewsReader
What happened, where and when, and who was
involved?
Which temporal and causal relations hold
between events, and what does that tell us
about the people involved?
Place the accumulated result in a knowledge
store that can handle dynamic growth of
information: a history recorder
NewsReader
Big Data
Focus: The financial crisis
E.g. what is the impact of the financial
crisis on the car industry?
Big Data: LexisNexis estimates that
there are 1-2 million news articles per day
its archive holds 10 million English news
articles about the car industry from the
last 10 years
NewsReader
Narratives
What are the stories that are being
told by all this data?
Challenges:
Duplicates, overlap and repetitions: how to
distinguish old from new?
Single results tell only parts of the story
Results can be inconsistent
News is opinionated and colored

NewsReader
overall approach
Resolve all mentions of events, their
participants, locations and time in texts
and other resources
Determine coreference and other relations
between them
Combine all information from coreferring
event mentions around a hypothetical
event instance (independent from text)
Combine instances into storylines
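
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the NewsReader implementation: the `Mention` fields, the "same type, participant and year" coreference criterion, the company name `CarCo` and the document ids are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One event mention found in one text (hypothetical fields)."""
    source: str       # id of the document containing the mention
    trigger: str      # the words that express the event
    event_type: str   # normalised event type
    participant: str  # main participant
    year: int         # resolved time


def coreference_key(mention):
    # Toy criterion (an assumption): mentions with the same type,
    # participant and year refer to the same event.
    return (mention.event_type, mention.participant, mention.year)


def build_instances(mentions):
    """Group coreferring mentions and merge them into event instances."""
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[coreference_key(mention)].append(mention)
    instances = []
    for (event_type, participant, year), group in clusters.items():
        instances.append({
            "type": event_type,
            "participant": participant,
            "year": year,
            # The instance is independent of any single text but keeps
            # track of every source that mentioned it.
            "sources": sorted({m.source for m in group}),
        })
    return instances


def storyline(instances, participant):
    """Order the instances involving one participant by time."""
    selected = [i for i in instances if i["participant"] == participant]
    return sorted(selected, key=lambda i: i["year"])


mentions = [
    Mention("doc1", "bankruptcy", "bankruptcy", "CarCo", 2009),
    Mention("doc2", "went bankrupt", "bankruptcy", "CarCo", 2009),
    Mention("doc3", "laid off", "layoff", "CarCo", 2008),
]
instances = build_instances(mentions)
story = storyline(instances, "CarCo")
for event in story:
    print(event["year"], event["type"], event["sources"])
```

The two bankruptcy mentions collapse into one instance with two sources, which is how repeated reports can be told apart from new events.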
NLP pipeline
[Architecture diagram. Labels recoverable from the figure:
Input: LexisNexis documents; storage of original input data.
NLP modules: tokenizer + sentence splitter, POS tagger, parser, NER, NED (client and server), WSD (client and server), time expressions, SRL, event detection, coreference resolution, event coreference, event relations, opinion detection, factuality. Some processes can be carried out in any order at a given stage. Modules run in virtual machines; partner labels: VUA, EHU, FBK.
KnowledgeStore: layers for resource, mention, entity, and statement + context; HBase + Hadoop and a triple store holding RDF triples + named graphs; distributed and replicated for scalability and fault tolerance, with partial replication; KS frontend with an API implementation over the layers; management scripts for start/stop, backup/restore, configuration and statistics gathering.
Downstream: inference, story understanding, visualisation (Synerscope).]
Both Projects

Accumulate information about the same
entities and events from various
sources
Must deal with different perspectives,
contradicting and partial information

Grounded Annotation
Framework (GAF)
Sources report on events and entities:
event mentions and entity mentions
URIs represent instances of these
entities and events in reality
GAF links instances to mentions
Information from mentions in other
sources is merged with known
information around the instance
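
A minimal sketch of the instance/mention split described above, in plain Python rather than RDF. The class and field names (`Instance`, `denoted_by`, `add_mention`), the example.org URIs and the character offsets are assumptions made up for illustration; only the fatality figures come from the source texts quoted in the GAF example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """A span of text in one source that mentions an entity or event."""
    source: str  # URI of the document (hypothetical)
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends

    def uri(self):
        return f"{self.source}#char={self.start},{self.end}"


@dataclass
class Instance:
    """An entity or event in reality, identified by its own URI."""
    uri: str
    properties: dict = field(default_factory=dict)  # property -> set of values
    denoted_by: set = field(default_factory=set)    # URIs of mentions

    def add_mention(self, mention, **properties):
        # Link the instance to the mention, then merge whatever the
        # mention says into what is already known about the instance.
        self.denoted_by.add(mention.uri())
        for name, value in properties.items():
            self.properties.setdefault(name, set()).add(value)


tsunami = Instance("http://example.org/instance/tsunami-2004")
tsunami.add_mention(
    Mention("http://example.org/news/2008-report", 4, 15),
    time="2004", fatalities="more than 230,000",
)
tsunami.add_mention(
    Mention("http://example.org/blog/2013-post", 58, 65),
    time="2004", fatalities="230,000",
)
print(sorted(tsunami.denoted_by))
print(tsunami.properties)
```

Keeping sets of values per property means that a second source either confirms a known value or adds a competing one, without overwriting anything.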
a GAF example
[Figure: two parallel timelines connected by ANNOTATION links. The timeline "changes in the world" (2004-2009) holds SEM-EVENT instances: TEMBLOR, TSUNAMI, "USS Jimmy Carter energy weapon", a future tsunami and a tsunami alert system. The timeline "publication of sources" (2004-2013) holds the sources that mention them: sensor data, direct event reports, delayed event reports and future event reports. The labels NAF and TAF also appear in the figure.]
Source texts quoted in the figure:
"The catastrophe four years ago devastated Indian
Ocean community and killed more than 230,000
people, over 170,000 of them in Aceh
at northern tip of Sumatra Island of Indonesia."
"..., the vessel is the party responsible for the 2004 Indian
Ocean tsunami that killed 230,000 people. Apparently,
the submarine was able to trigger seismic activity via
some kind of directed energy weapon."
Linguistic information in
GAF
The NLP Annotation Format (NAF)
Knowledge Annotation Format (KAF)
stand-off layered annotation (LAF
compatible)
separating mentions from instances
NLP Interchange Format (NIF)
RDF and URIs, inline annotation
Compatible with PROV-DM
Events in GAF
extended Simple Event Model (SEM):
RDF representations of event
instances with participant, location
and time
can represent contradictory
information

GAF from NAF + SEM
Can accumulate information from
different sources
Can represent repeated information as a
single relation (with links to all
sources that provided this information)
Can represent contradicting information
Is compatible with the PROV-DM
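
The first three properties can be illustrated with a small store that keeps one entry per statement and a set of sources for each. This is a sketch in plain Python, with a dictionary standing in for RDF triples and their provenance links; the `StatementStore` class, the `ex:` identifiers, the source ids and the choice to treat `cause` as single-valued are all assumptions for the example.

```python
from collections import defaultdict


class StatementStore:
    """Statements as (subject, predicate, object) with their sources."""

    def __init__(self):
        self._sources = defaultdict(set)  # triple -> set of source ids

    def add(self, subject, predicate, obj, source):
        # A repeated statement does not create a new relation; it only
        # adds one more source to the existing one.
        self._sources[(subject, predicate, obj)].add(source)

    def statements(self):
        return {t: sorted(s) for t, s in self._sources.items()}

    def conflicts(self, single_valued):
        """Find subjects with several values for a single-valued predicate."""
        values = defaultdict(set)
        for subject, predicate, obj in self._sources:
            if predicate in single_valued:
                values[(subject, predicate)].add(obj)
        return {k: sorted(v) for k, v in values.items() if len(v) > 1}


store = StatementStore()
store.add("ex:tsunami2004", "cause", "earthquake", "report-2005")
store.add("ex:tsunami2004", "cause", "earthquake", "report-2008")
store.add("ex:tsunami2004", "cause", "energy weapon", "blog-2013")

merged = store.statements()
conflicts = store.conflicts({"cause"})
print(merged)
print(conflicts)
```

Both claims about the cause stay in the store, each traceable to the sources that made it, so a user can judge them rather than have one silently discarded.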

Acknowledgements
Supported by the European Union’s 7th
Framework program via the NewsReader
Project (ICT-316404)
Supported by the BiographyNet project
(nr. 660.011.308) funded by the
Netherlands eScience center
(http://escience.center.nl)

References
GAF:
Fokkens, Antske, Marieke van Erp, Piek Vossen, Sara
Tonelli, Willem Robert van Hage, Luciano Serafini,
Rachele Sprugnoli and Jesper Hoeksema. 2013. GAF: A
Grounded Annotation Framework for Events. Proceedings
of the first Workshop on EVENTS: Definition, Detection,
Coreference and Representation. Atlanta USA.
Marieke Van Erp, Antske Fokkens, Piek Vossen, Sara
Tonelli, Willem Robert Van Hage, Luciano Serafini,
Rachele Sprugnoli and Jesper Hoeksema. 2013. Denoting
Data in the Grounded Annotation Framework. ISWC 2013
Posters and Demos. Sydney Australia, 21-25 October 2013

References
SEM:
Van Hage, Willem Robert, Véronique Malaisé, Roxane
Segers, Laura Hollink, and Guus Schreiber. "Design
and use of the Simple Event Model (SEM)." Web
Semantics: Science, Services and Agents on the World
Wide Web 9, no. 2 (2011): 128-136.

Cross-document coreference:
Cybulska, Agata, and Piek Vossen. “Semantic
Relations between Events and their Time, Locations
and Participants for Event Coreference Resolution.”
In: Proceedings of RANLP 2013.
References
Named Entity Recognition:
Marieke van Erp, Giuseppe Rizzo and Raphaël Troncy
(2013) Learning with the Web: Spotting Named Entities on
the intersection of NERD and Machine Learning. #MSM2013
Concept Extraction Challenge. Rio de Janeiro, Brazil,
May 2013.

Provenance:
Niels Ockeloen, Antske Fokkens, Serge Ter Braake, Piek
Vossen, Victor de Boer, Guus Schreiber and Susan Legêne.
2013. BiographyNet: Managing Provenance at multiple
levels and from different perspectives. In: Proceedings
of the Workshop on Linked Science 2013 (LISC2013).
