Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
1. Scenario-Driven Selection and Exploitation of
Semantic Data for Optimal Named Entity
Disambiguation
Panos Alexopoulos, Carlos Ruiz, Jose Manuel Gomez Perez
1st Semantic Web and Information Extraction Workshop,
Galway, Ireland, October 9th, 2012
2. Agenda
Introduction
Problem Definition and Paper
Focus
Approach Overview and Rationale
Proposed Disambiguation Framework
Disambiguation Evidence Model
Entity Disambiguation Process
Framework Evaluation
Evaluation Process
Evaluation Results
Conclusions and Future Work
2
3. Introduction
Problem Definition
Entity Resolution & Disambiguation
● Named entity resolution involves detecting mentions of named entities (e.g. people,
organizations or locations) within texts and mapping them to their corresponding
entities in a given knowledge source.
● One important challenge in this task is the correct disambiguation of the detected
entities.
● For example:
● “Siege of Tripolitsa took place in Tripoli with Theodoros Kolokotronis being the
leader of the Greeks. This event marked an early victory for the fight for
independence from Turkey but it was also a massacre against the Muslim and
Jewish population of the city”.
● The term “Tripoli” here refers to http://dbpedia.org/resource/Tripoli,_Greece
but it can be mistaken, for example, with Tripoli in Libya or that in Lebanon.
3
4. Introduction
Disambiguation Approaches
● The majority of disambiguation approaches rely on the strong contextual
hypothesis that terms with similar meanings are often used in similar contexts.
● The role of these contexts is typically played by already annotated documents (e.g.
wikipedia articles) which are used to train term classifiers.
● These classifiers link a term to its correct meaning entity, based on the similarity
between the term’s textual context and the contexts of its potential entities.
● Some more recent approaches utilize semantic structures in order to determine this
similarity in a semantic way.
● The effectiveness of these latter approaches is highly dependent on:
● The availability of comprehensive semantic information.
● The degree of alignment between the content of the texts to be
disambiguated and the semantic data to be used.
4
5. Introduction
Alignment and its Importance
● Alignment means that the ontology’s elements should cover the domain(s) of the
texts to be disambiguated but should not contain other additional elements that:
● Do not belong to the domain.
● Do belong to it but do not appear in the texts.
● For example assume the text “Ronaldo scored two goals for Real Madrid“ from a
contemporary football match description.
● To disambiguate the term “Ronaldo” using an ontology, the only contextual
evidence that can be used is the entity “Real Madrid”.
● Yet there are two players with that name that are semantically related to Real:
● Cristiano Ronaldo (current player)
● Ronaldo Luis Nazario de Lima (former player).
● This means that if both relations are considered then the term will not be
disambiguated.
5
6. Introduction
Towards Better Alignment
● In the previous example the fact that the text describes a contemporary football
match suggests that, in general, the relation between a team and its former players
is not expected to appear in it.
● Thus, for such texts, it would make sense to ignore this relation in order to achieve
more accurate disambiguation.
● Based on this observation, we make two claims:
● That there are certain scenarios where there is available a priori knowledge
about what entities and relations are expected to be present in the text.
● That this knowledge can be exploited for better alignment between semantic
information and content leading to more effective disambiguation.
● To verify these claims we define an entity disambiguation framework that can
perform better disambiguation in such scenarios.
6
7. Proposed Framework
Approach
● We target the task of entity disambiguation based on the intuition that a given
ontological entity is more likely to represent the meaning of an ambiguous term
when there are many ontologically related to it entities in the text.
● E.g. in the example text the entities “Siege of Tripolitsa” and “Theodoros
Kolokotronis” indicate that the term “Tripoli” refers to the city of Greece.
● These evidential entities are derived from one or more domain ontologies.
● However, which entities and to what extent may serve as evidence in a given
application scenario depends on the domain and expected content of the texts.
● For that, the key ability our framework provides to its users is to construct, in a
semi-automatic manner, semantic evidence models for specific disambiguation
scenarios and use them to perform entity disambiguation within them.
7
8. Proposed Framework
Framework Components
● A Disambiguation Evidence Model that contains the semantic entities that may
serve as disambiguation evidence for the scenario’s target entities in the given
scenario.
● Each pair of a target entity and an evidential one is accompanied by a degree
that quantifies the latter’s evidential power for the given target entity.
● A Disambiguation Evidence Model Construction Process that builds, in a semi-
automatic manner, a disambiguation evidence model for a given scenario.
● An Entity Disambiguation Process that uses the evidence model to detect and
extract from a given text terms that refer to the scenario’s target entities.
● Each term is linked to one or more possible entity uris along with a confidence
score calculated for each of them.
● The entity with the highest confidence should be the one the term actually refers
to.
8
9. Proposed Framework
Disambiguation Evidence Model
● Defines for each ontology entity which other instances and to what extent should be
used as evidence towards its correct meaning interpretation.
● It consists of entity pairs where a particular entity provides quantified evidence for a
another one.
9
10. Proposed Framework
Evidence Model Construction
● Construction of the evidence model depends on the characteristics of the domain and
the texts.
● The first step of the construction is manual and involves:
● The identification of the concepts whose instances we wish to disambiguate
(e.g. locations)
● The determination, for each of these concepts, of the related to them concepts
whose instances may serve as contextual disambiguation evidence/
● For example, in texts that describe historical events, some concepts
whose instances may act as location evidence are related locations,
historical events, and historical groups and persons.
● The identification, for each pair of evidence and target concept, of the relation
paths that links them.
10
11. Proposed Framework
Evidence Model Construction
● The result of this first step is a table like the following ones:
11
12. Proposed Framework
Evidence Model Construction
● Based on these tables, the second step of the construction is automatic and involves
the generation of the target-evidence entity pairs along with a disambiguation
evidential strength.
● This strength is inversely proportional to the number of different same-name target
entities a given evidential entity provides evidence for.
● For example, “Getafe” provides evidence for “Pedro Leon” to a strength of 0.5
because it has another player called Pedro.
12
13. Proposed Framework
Entity Resolution Process
● Step 1: We extract from the text the terms that possibly refer to the target entities as
well as those that refer to their respective evidential entities.
● Extraction is performed with Knowledge Tagger, an in-house tool based on
GATE.
● Step 2: Using the evidential entities we compute for each extracted target entity term
the confidence that it refers to a particular target entity.
● The target entity with the highest confidence is expected to be the correct one.
13
14. Proposed Framework
Disambiguation Example
Correct disambiguation
of the term Atletico
14
15. Framework Evaluation
Evaluation Process
Description
● Two disambiguation scenarios:
● Football match descriptions.
● Texts describing military conflicts.
● DBPedia as a source of semantic information in both cases.
● Disambiguation effectiveness measured through precision and recall.
● Evaluation results were compared to those achieved by two publicly available
semantic annotation and disambiguation systems;
● DBPedia Spotlight
● AIDA
● The two systems:
● Use also DBPedia as a knowledge source.
● Provide the users the capability to select the classes whose instances are to
be included in the process.
15
16. Framework Evaluation
Evaluation Results
Football Match Descriptions Scenario
● 50 texts describing football matches.
● E.g. “It's the 70th minute of the game and after a magnificent pass by Pedro, Messi
managed to beat Claudio Bravo. Barcelona now leads 1-0 against Real."
Disambiguation Results
16
17. Framework Evaluation
Evaluation Results
Military Conflict Texts Scenario
● 50 historical texts describing military conflicts.
● E.g. “The Siege of Augusta was a significant battle of the American Revolution.
Fought for control of Fort Cornwallis, a British fort near Augusta, the battle was a
major victory for the Patriot forces of Lighthorse Harry Lee and a stunning reverse to
the British and Loyalist forces in the South”.
Disambiguation Results
17
18. Conclusions and Future Work
Key Points
● We proposed a novel framework for optimizing named entity disambiguation in well-
defined and adequately constrained scenarios through the customized selection
and exploitation of semantic data.
● Our purpose was not to build another generic disambiguation system but rather a
reusable framework that can:
● Be relatively easily adapted to the particular characteristics of the domain and
application scenario at hand.
● Exploit these characteristics to increase the effectiveness of the
disambiguation process.
● The key aspect of the framework is the semi-automatic process it defines for
selecting the optimal evidence model for the scenario at hand.
18
19. Conclusions and Future Work
Key Points
● Comparative evaluation in two specific scenarios verified the framework’s
superiority over existing approaches that are designed to work in open domains and
unconstrained scenarios.
● This verified our hypothesis that the scenario adaptation capabilities of such generic
disambiguation systems can be inadequate in certain scenarios.
● Of course, the framework’s usability and effectiveness is directly proportional to the
content specificity of the texts to be disambiguated and the availability and
quality of a priori semantic knowledge about their content.
● The greater these two parameters are, the more applicable is our approach
and the more effective the disambiguation is expected to be.
● The opposite is true as the texts become more generic and the information we
have out about them more scarce.
19
20. Conclusions and Future Work
Framework Extensions
● Fully automated construction of the disambiguation evidence model.
● Challenge here is how to automatically identify the text’s domain/topic.
● Combination with statistical methods for cases where available domain semantic
information is incomplete.
● Challenge here is how to select the optimal ratio of ontological evidence v.s.
statistical one.
● Development of tool to enable users to dynamically build such models out of existing
semantic data and use them for disambiguation purposes
20
21. Thank you!
Contact iSOCO
Dr. Panos Alexopoulos
Senior Researcher
palexopoulos@isoco.com
Questions?
Barcelona Madrid Pamplona Valencia
Tel +34 935 677 200 Tel +34 913 349 797 Tel +34 948 102 408 Tel +34 963 467 143
Edificio Testa A Av. del Partenón, 16-18, 1º7ª Parque Tomás Oficina 107
C/ Alcalde Barnils, 64-68 Campo de las Naciones Caballero, 2, 6º-4ª C/ Prof. Beltrán Báguena, 4
St. Cugat del Vallès 28042 Madrid 31006 Pamplona 46009 Valencia
08174 Barcelona
21