BabelNet 3.0
1. Shrikrishna R. Parab
Mtech Part 1
DCST, Goa University
Mtech.parab@unigoa.ac.in
BabelNet: The automatic construction of a wide-coverage multilingual semantic network
2. Outline
• Reference papers
• Introduction
• Knowledge resources
• WordNet
• Wikipedia
• BabelNet
• Mapping WordNet synsets to Wikipedia pages
• Disambiguation contexts
3. Reference papers
• R. Navigli and S. Ponzetto. "BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network." Artificial Intelligence, 193, Elsevier, 2012, pp. 217-250.
• M. Ehrmann, F. Cecconi, D. Vannella, J. McCrae, P. Cimiano and R. Navigli. "Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0." Proc. of the 9th Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 26-31 May 2014.
• S. Fernando and M. Stevenson. "Mapping WordNet Synsets to Wikipedia Articles." Proc. of LREC 2012, pp. 590-596.
4. Introduction
• In the information society, knowledge, i.e., the information and expertise needed to understand any subject of interest, is a key skill for understanding and decoding an ever-changing world.
• Much information is conveyed by means of linguistic communication; therefore, it is critical to know how words are used to express meaning, i.e., we need lexical knowledge.
• Lexical knowledge is an essential component for performing language-oriented automatic tasks effectively.
• Many areas of Natural Language Processing (NLP) have been shown to benefit from the availability of lexical knowledge at different levels.
5. • In this paper, the authors present an automatic approach to the construction of BabelNet.
• Key to this approach is the integration of lexicographic and encyclopaedic knowledge from WordNet and Wikipedia.
• Machine Translation is applied to enrich the resource with lexical information for all languages.
6. Knowledge resources
• BabelNet aims at providing an "encyclopaedic dictionary" by merging two resources:
• WordNet
• Wikipedia
7. WordNet
• WordNet is by far the most popular lexical knowledge resource in the field of NLP.
• A concept in WordNet is represented as a synonym set (called a synset), i.e., the set of words that share the same meaning.
• WordNet provides a textual definition, or gloss, for each synset.
• Synsets can contain sample sentences that provide examples of their usage.
• Each synset is also tagged with its part of speech (noun, verb, adjective or adverb).
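To make the synset description concrete, it can be pictured as a small record type. This is a minimal sketch only: the `Synset` class and its field names are hypothetical and are not the API of WordNet or of any WordNet library; the sample entry abbreviates WordNet's first noun sense of car.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Synset:
    """Hypothetical record for one WordNet synset (not a real library type)."""
    synset_id: str                 # illustrative id in "lemma.pos.nn" style
    pos: str                       # part of speech: "n", "v", "a" or "r"
    lemmas: List[str]              # synonyms that share this meaning
    gloss: str                     # textual definition
    examples: List[str] = field(default_factory=list)  # usage sentences

    def contains(self, word: str) -> bool:
        """Return True if `word` is one of the synset's synonyms (case-insensitive)."""
        return word.lower() in {lemma.lower() for lemma in self.lemmas}


car = Synset(
    synset_id="car.n.01",
    pos="n",
    lemmas=["car", "auto", "automobile", "machine", "motorcar"],
    gloss="a motor vehicle with four wheels",   # abbreviated gloss
    examples=["he needs a car to get to work"],
)
print(car.contains("Automobile"))  # True
print(car.contains("bicycle"))     # False
```

All words in `lemmas` denote the same concept, which is why a lookup for any of them lands on the same synset.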
8. • Synsets are related to each other by means of lexical and semantic relations:
• is-a relations, such as hypernymy and hyponymy;
• instance-of relations, denoting set membership between a named entity and the class it belongs to;
• part-of relations, expressing the elements of a partition by means of meronymy and holonymy.
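The relations above turn WordNet into a graph whose nodes are synsets. The sketch below assumes a hypothetical, simplified toy fragment (one hypernym per synset, illustrative ids) and follows one relation transitively, e.g. to collect every generalization of a concept.

```python
from collections import deque
from typing import Dict, List

# Toy, simplified fragment of WordNet's noun relations (ids are illustrative).
# Each synset maps a relation name to the ids of related synsets.
RELATIONS: Dict[str, Dict[str, List[str]]] = {
    "car.n.01": {"hypernym": ["motor_vehicle.n.01"], "meronym": ["car_door.n.01"]},
    "motor_vehicle.n.01": {"hypernym": ["self-propelled_vehicle.n.01"],
                           "hyponym": ["car.n.01"]},
    "self-propelled_vehicle.n.01": {"hypernym": ["wheeled_vehicle.n.01"]},
    "wheeled_vehicle.n.01": {"hypernym": ["vehicle.n.01"]},
    "vehicle.n.01": {},
    "car_door.n.01": {"holonym": ["car.n.01"]},
}


def closure(start: str, relation: str, graph: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Breadth-first list of synsets reachable from `start` via `relation` edges."""
    seen = {start}
    queue = deque([start])
    reached: List[str] = []
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, {}).get(relation, []):
            if neighbour not in seen:   # guard against revisiting and cycles
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append(neighbour)
    return reached


print(closure("car.n.01", "hypernym", RELATIONS))
# ['motor_vehicle.n.01', 'self-propelled_vehicle.n.01', 'wheeled_vehicle.n.01', 'vehicle.n.01']
print(closure("car_door.n.01", "holonym", RELATIONS))  # ['car.n.01']
```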
9. Wikipedia
• Wikipedia is a multilingual, Web-based encyclopaedia.
• It is a collaborative open source medium, edited by volunteers, that provides a very large, wide-coverage repository of encyclopaedic knowledge.
• Each article in Wikipedia is represented as a page, known as a Wikipage, and presents information about a specific concept or named entity.
• The title of a Wikipage is composed of the lemma of the concept defined plus an optional label in parentheses, which specifies its meaning if the lemma is ambiguous (e.g., Play (theatre)).
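The title convention above can be parsed mechanically. This is a minimal sketch assuming titles of the form "Lemma (label)"; `split_title` is a hypothetical helper, and underscores are treated as spaces as in Wikipedia URLs.

```python
import re
from typing import Optional, Tuple

# "Lemma (label)": the parenthesised label is optional.
TITLE_PATTERN = re.compile(r"^(?P<lemma>.+?)(?:\s*\((?P<label>[^()]+)\))?$")


def split_title(title: str) -> Tuple[str, Optional[str]]:
    """Split a Wikipage title into its lemma and optional sense label."""
    match = TITLE_PATTERN.match(title.replace("_", " ").strip())
    if match is None:                      # only happens for an empty title
        return title, None
    return match.group("lemma"), match.group("label")


print(split_title("Play (theatre)"))   # ('Play', 'theatre')
print(split_title("Wikipedia"))        # ('Wikipedia', None)
```

The lazy lemma group stops before the optional label, so unambiguous titles simply come back with no label.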
10. • Relations between pages:
• Redirect pages: these pages forward to the Wikipage containing the actual information about a concept of interest.
• Disambiguation pages: these pages collect links to the possible concepts to which an arbitrary expression could refer.
• Internal links: Wikipages typically contain hypertext linked to other Wikipages, which refer to related concepts.
• Inter-language links: Wikipages also provide links to their counterparts (i.e., corresponding concepts) in Wikipedias in other languages.
• Categories: Wikipages can be assigned to one or more categories.
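The page relations above can be modelled with a few lookup tables. This is a minimal sketch over a hypothetical toy snapshot: the dictionaries `REDIRECTS`, `DISAMBIGUATION` and `PAGES` and their values are illustrative and were not taken from live Wikipedia.

```python
from typing import Dict, List, Optional

# Hypothetical toy snapshot of a few Wikipedia pages (values are illustrative).
REDIRECTS: Dict[str, str] = {"Stage play": "Play (theatre)"}
DISAMBIGUATION: Dict[str, List[str]] = {"Play": ["Play (theatre)", "Play (activity)"]}
PAGES: Dict[str, dict] = {
    "Play (theatre)": {
        "links": ["Literature", "Comedy"],                 # internal links
        "categories": ["Plays", "Drama", "Theatre"],       # categories
        "interlanguage": {"fr": "Pièce de théâtre"},       # inter-language links
    },
    "Play (activity)": {"links": ["Game"], "categories": ["Play"], "interlanguage": {}},
}


def resolve(title: str, max_hops: int = 10) -> str:
    """Follow redirect pages until reaching a non-redirect page (bounded hops)."""
    hops = 0
    while title in REDIRECTS and hops < max_hops:
        title = REDIRECTS[title]
        hops += 1
    return title


def candidate_pages(expression: str) -> List[str]:
    """Pages an expression may denote: the disambiguation list, else the resolved page."""
    if expression in DISAMBIGUATION:
        return [resolve(t) for t in DISAMBIGUATION[expression]]
    return [resolve(expression)]


def translation(title: str, lang: str) -> Optional[str]:
    """Counterpart title in another language's Wikipedia, if an inter-language link exists."""
    page = PAGES.get(resolve(title), {})
    return page.get("interlanguage", {}).get(lang)


print(candidate_pages("Play"))           # ['Play (theatre)', 'Play (activity)']
print(candidate_pages("Stage play"))     # ['Play (theatre)']
print(translation("Stage play", "fr"))   # Pièce de théâtre
```

The hop limit in `resolve` is a defensive choice in case redirects ever form a loop.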
11. What is BabelNet?
• BabelNet is a multilingual lexicalized semantic network and ontology.
• BabelNet was automatically created by linking the largest multilingual Web encyclopaedia, Wikipedia, to the most popular computational lexicon of the English language, WordNet.
• First, WordNet and Wikipedia are combined by automatically acquiring a mapping between WordNet senses and Wikipages.
• Second, multilingual lexicalizations of the available concepts are harvested by using (a) the human-generated translations provided by Wikipedia (inter-language links) and (b) a machine translation system to translate occurrences of the concepts within sense-tagged corpora.
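The two construction steps can be sketched as merging one mapped WordNet sense and Wikipage into a multilingual entry. This is a sketch under stated assumptions: `BabelSynset` and `build_babel_synset` are hypothetical names, the machine translation output is supplied as precomputed (language, translated word) pairs instead of calling a real MT system, and keeping the most frequent translation per language is an assumed illustrative policy.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class BabelSynset:
    """Hypothetical multilingual concept: lexicalizations grouped by language code."""
    wordnet_id: Optional[str]
    wikipage: Optional[str]
    lexicalizations: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, lang: str, word: str) -> None:
        self.lexicalizations.setdefault(lang, set()).add(word)


def build_babel_synset(
    wordnet_id: str,
    wordnet_lemmas: List[str],
    wikipage: str,
    interlanguage_links: Dict[str, str],
    mt_occurrences: Iterable[Tuple[str, str]],
) -> BabelSynset:
    """Merge one WordNet sense with its mapped Wikipage and collect translations."""
    synset = BabelSynset(wordnet_id, wikipage)
    for lemma in wordnet_lemmas:                      # English lexicalizations
        synset.add("en", lemma.replace("_", " "))
    for lang, title in interlanguage_links.items():   # (a) human-generated translations
        synset.add(lang, title)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lang, word in mt_occurrences:                 # (b) machine-translated occurrences
        counts[lang][word] += 1
    for lang, counter in counts.items():
        best, _ = counter.most_common(1)[0]           # assumed policy: most frequent wins
        synset.add(lang, best)
    return synset


syn = build_babel_synset(
    "play.n.01", ["play", "drama", "dramatic_play"], "Play (theatre)",
    {"fr": "Pièce de théâtre"},
    [("it", "dramma"), ("it", "dramma"), ("it", "commedia")],
)
print(sorted(syn.lexicalizations["en"]))  # ['drama', 'dramatic play', 'play']
print(syn.lexicalizations["it"])          # {'dramma'}
```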
12. Mapping WordNet synsets to Wikipedia pages
• This process is divided into three phases:
• Generation of Candidate Articles: aims to reduce the search space by identifying a small set of candidate articles for each noun synset.
• This can be done in two ways: first, by matching the words in a WordNet synset against Wikipedia article titles;
• second, by using an Information Retrieval system to search the full text of Wikipedia articles.
• Selecting the Best Mappings: uses this candidate article set to select the best matching article for each synset.
• Refining the Mappings: uses a global view of all the mappings to remove conflicting ones.
13. Generation of Candidate Articles
• Two methods were used to find candidate articles: title matching and Information Retrieval.
• The title matching approach examines the titles of Wikipedia articles to identify WordNet synsets that could map onto them.
• The Information Retrieval approach indexes Wikipedia and makes use of entire articles rather than just their titles.
• This stage returns a set of candidate Wikipedia articles for each noun synset in WordNet.
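Both candidate-generation methods can be sketched in a few lines. This is a minimal sketch, not the method of Fernando and Stevenson: the "IR system" here is a hypothetical stand-in that ranks articles by the number of query terms they share, the stopword list is hand-picked, and the article texts are invented toy data.

```python
import re
from typing import Dict, List, Set

STOPWORDS = {"a", "an", "the", "is", "of", "by", "on", "for", "in", "and", "with", "to"}


def tokens(text: str) -> Set[str]:
    """Lower-cased word set with a few stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def title_lemma(title: str) -> str:
    """Title without its parenthesised sense label, lower-cased."""
    return re.sub(r"\s*\([^()]*\)$", "", title).strip().lower()


def title_candidates(synset_lemmas: List[str], titles: List[str]) -> Set[str]:
    """Title matching: keep articles whose title lemma is one of the synset's words."""
    lemmas = {l.replace("_", " ").lower() for l in synset_lemmas}
    return {t for t in titles if title_lemma(t) in lemmas}


def ir_candidates(query: str, articles: Dict[str, str], k: int = 2) -> Set[str]:
    """Toy stand-in for an IR system: rank full article texts by shared query terms."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), title) for title, text in articles.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return {title for _, title in ranked[:k]}


ARTICLES = {
    "Play (theatre)": "A play is a form of literature written by a playwright, performed by actors on a stage.",
    "Play (activity)": "Play is a range of voluntary activities undertaken for enjoyment.",
    "Drama": "Drama is the specific mode of fiction represented in performance on a stage by actors.",
    "Football": "Football is a team sport played with a ball.",
}
lemmas = ["play", "drama", "dramatic_play"]
gloss = "a dramatic work intended for performance by actors on a stage"

candidates = title_candidates(lemmas, list(ARTICLES)) | ir_candidates(" ".join(lemmas) + " " + gloss, ARTICLES)
print(sorted(candidates))  # ['Drama', 'Play (activity)', 'Play (theatre)']
```

Title matching is cheap but only finds articles named after a synset word; the text-based search can add articles whose titles differ.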
14. Selecting the Best Mappings
• This stage attempts to identify the best matching article from this set using two methods: text similarity and title similarity.
• For text similarity, Wikipedia articles are pre-processed by removing markup, then stemming and removing stopwords from the remaining text.
• Various combinations of features from the WordNet synset (lemmas, glosses, related lemmas, etc.) are used to calculate the similarities.
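The text similarity step can be sketched as pre-processing followed by a vector comparison. This is a sketch under assumptions: cosine similarity over term frequencies is one common choice and is not claimed to be the exact measure used in the paper; `crude_stem` is a deliberately rough stand-in for a real stemmer such as Porter's; the markup stripping handles only a few wiki constructs; the article snippets are invented.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "is", "of", "by", "on", "for", "in", "and", "to"}


def strip_markup(wikitext: str) -> str:
    """Remove a little wiki markup: templates, link brackets and bold/italic quotes."""
    text = re.sub(r"\{\{[^{}]*\}\}", " ", wikitext)                  # {{templates}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)     # [[target|label]] -> label
    return text.replace("'''", "").replace("''", "")


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag(text: str) -> Counter:
    """Bag of stemmed, non-stopword tokens."""
    return Counter(crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())
                   if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


synset_text = "play drama dramatic play a dramatic work intended for performance by actors on a stage"
theatre = "'''Play''' is a form of [[literature]] written by a [[playwright]], performed by [[Actor|actors]] on a [[stage]]."
activity = "'''Play''' is a range of voluntary activities undertaken for [[Recreation|recreational]] purposes."

synset_bag = bag(synset_text)
print(round(cosine(synset_bag, bag(strip_markup(theatre))), 3))   # 0.378
print(round(cosine(synset_bag, bag(strip_markup(activity))), 3))  # 0.202
```

Here the synset features are its lemmas plus its gloss; other feature combinations would simply change `synset_text`.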
15. • The text similarity method uses the whole Wikipedia article for comparison.
• However, the title of the article is the single most important feature when considering similarity to the synset.
• Therefore, the title similarity method assigns a similarity score using the title alone.
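A title-only score might look like the sketch below. The scoring rule is hypothetical and chosen only for illustration (best string match between the title lemma and a synset lemma, plus a bonus when the parenthesised label appears among the synset's context words); it is not the formula from the reference papers, and `context_words` is a hand-picked stand-in for words drawn from the gloss and related synsets.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Set, Tuple


def split_title(title: str) -> Tuple[str, str]:
    """Split "Lemma (label)" into lemma and label; label is "" when absent."""
    match = re.match(r"^(.*?)\s*\(([^()]*)\)$", title)
    return (match.group(1), match.group(2)) if match else (title, "")


def title_score(title: str, lemmas: Iterable[str], context_words: Set[str]) -> float:
    """Hypothetical title-only score: best lemma string match plus a label bonus."""
    lemma, label = split_title(title)
    best = max(SequenceMatcher(None, lemma.lower(), l.replace("_", " ").lower()).ratio()
               for l in lemmas)
    label_bonus = 1.0 if label and set(label.lower().split()) & context_words else 0.0
    return best + label_bonus


lemmas = ["play", "drama", "dramatic_play"]
context = {"dramatic", "work", "performance", "actors", "stage", "theatre"}
titles = ["Play (theatre)", "Play (activity)", "Drama"]
print(max(titles, key=lambda t: title_score(t, lemmas, context)))  # Play (theatre)
```

Without the label bonus all three titles would tie at 1.0, which shows why the sense label in parentheses carries most of the disambiguating signal in a title.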
16. Refining the Mappings
• The result of the mapping from WordNet to Wikipedia is a set of synset-article pairings.
• A global view of the mappings and information about the link structure in Wikipedia is then used to refine the mappings.
• All mappings in which more than one synset maps to the same Wikipedia article are removed, i.e., only one-to-one synset-article pairings are kept.
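The many-to-one filter described above is straightforward to sketch. This minimal version assumes mappings are held in a dictionary from (hypothetical) synset ids to article titles; it models only the removal of shared articles, not the use of Wikipedia's link structure.

```python
from collections import Counter
from typing import Dict


def refine(mappings: Dict[str, str]) -> Dict[str, str]:
    """Drop every synset->article pair whose article is claimed by more than one synset."""
    claims = Counter(mappings.values())          # how many synsets map to each article
    return {synset: article for synset, article in mappings.items() if claims[article] == 1}


mappings = {
    "play.n.01": "Play (theatre)",
    "drama.n.01": "Drama",
    "drama.n.02": "Drama",       # conflicts with drama.n.01
    "football.n.01": "Association football",
}
print(refine(mappings))
# {'play.n.01': 'Play (theatre)', 'football.n.01': 'Association football'}
```

Note that both conflicting pairs are dropped rather than choosing a winner, which is what keeps the surviving mapping one-to-one.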
17. Disambiguation contexts
• Disambiguation context of a Wikipage w:
• Sense labels: e.g., given the page Play (theatre), the word theatre is added to the disambiguation context.
• Links: the lemmas of the titles of the pages linked from w (i.e., outgoing links). For instance, the links in the Wikipage Play (theatre) include literature, comedy, etc.
• Redirections: the lemmas of the titles of the pages that redirect to w.
• Categories: Wikipages are typically classified according to one or more categories, which are added to the context. For example, the Wikipage Play (theatre) is categorized as PLAYS, DRAMA, THEATRE, etc.
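The four sources of a Wikipage's context can be collected into one bag of words. This is a minimal sketch with hypothetical helper names; category names are lower-cased and kept whole rather than lemmatized, and the redirect "Stage play" is an illustrative input.

```python
import re
from typing import Iterable, Set


def title_lemma(title: str) -> str:
    """Lower-cased title with any trailing "(label)" removed."""
    return re.sub(r"\s*\([^()]*\)$", "", title).strip().lower()


def wikipage_context(title: str, links: Iterable[str],
                     redirects_to: Iterable[str], categories: Iterable[str]) -> Set[str]:
    """Bag-of-words disambiguation context of a Wikipage (simplified)."""
    context: Set[str] = set()
    label = re.search(r"\(([^()]*)\)$", title)
    if label:
        context.add(label.group(1).lower())                  # sense label
    context.update(title_lemma(t) for t in links)            # outgoing links
    context.update(title_lemma(t) for t in redirects_to)     # redirections
    context.update(c.lower() for c in categories)            # categories (kept whole)
    return context


ctx = wikipage_context(
    "Play (theatre)",
    links=["Literature", "Comedy"],
    redirects_to=["Stage play"],
    categories=["PLAYS", "DRAMA", "THEATRE"],
)
print(sorted(ctx))
# ['comedy', 'drama', 'literature', 'plays', 'stage play', 'theatre']
```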
18. • Disambiguation context of a WordNet sense s (belonging to synset S):
• Synonymy: all synonyms of s in S. For instance, given the synset of play, all its synonyms are included in the context.
• Hypernymy/Hyponymy: all synonyms in the synsets H such that H is either a hypernym (i.e., a generalization) or a hyponym (i.e., a specialization) of S.
• Gloss: the set of lemmas of the content words occurring within the gloss of s.
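The WordNet-side context can be built the same way, and the two contexts can then be compared. This is a sketch under assumptions: content words are approximated by dropping a small stopword list (no POS tagging or lemmatization), the related synsets are a toy selection, and the set-overlap comparison at the end is only an illustration of how the contexts might be used, not a scoring rule stated in these slides.

```python
import re
from typing import Iterable, List, Set

STOPWORDS = {"a", "an", "the", "of", "by", "on", "for", "in", "to", "and"}


def normalise(lemma: str) -> str:
    """WordNet-style lemma to lower-cased words with spaces."""
    return lemma.replace("_", " ").lower()


def sense_context(synonyms: Iterable[str], related: Iterable[List[str]], gloss: str) -> Set[str]:
    """Bag-of-words disambiguation context of a WordNet sense (simplified)."""
    context = {normalise(s) for s in synonyms}                       # synonymy
    for lemmas in related:                                            # hypernym/hyponym synsets
        context.update(normalise(l) for l in lemmas)
    context.update(w for w in re.findall(r"[a-z]+", gloss.lower())    # gloss words
                   if w not in STOPWORDS)
    return context


play_ctx = sense_context(
    synonyms=["play", "drama", "dramatic_play"],
    related=[["dramatic_composition", "dramatic_work"], ["comedy"], ["tragedy"]],
    gloss="a dramatic work intended for performance by actors on a stage",
)
page_ctx = {"comedy", "drama", "literature", "plays", "stage play", "theatre"}
print(sorted(play_ctx & page_ctx))  # ['comedy', 'drama']
```

The overlap misses "plays" versus "play", which illustrates why the contexts are defined over lemmas rather than raw word forms.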