This slide deck discusses the use of word sense disambiguation (WSD) in concept identification for ontology construction. It describes an implemented approach that forms concepts from terms when the terms meet certain criteria, such as having an intensional definition and a set of instances. WSD is needed to identify the domain-relevant sense of a term when forming concepts. The Lesk algorithm and its variants are discussed as methods for WSD and concept disambiguation, based on computing similarity between terms and WordNet senses. Evaluation shows the approach identified domain-specific concepts with reasonable precision and recall compared to other methods. Choosing the best WSD algorithm depends on factors such as the nature of the problem and the required performance.
Usage of Word Sense Disambiguation in Concept Identification for Ontology Construction
1. Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction
Guest Talk at University of Moratuwa, Department of Computer Science and Engineering
5th November, 2016
Discussed by: Kiruparan Balachandran
2. Background Information - Ontology
Ontology provides a potential method to describe domain knowledge
[Example ontology fragment: "sorting algorithm" is a "algorithm"; "algorithm" solves "problem"; "algorithm" has "complexity"]
3. Background Information - Ontology learning layer-cake approach
• Terms: {Randomized algorithm, sorting algorithm, system software, application software}
• Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}
• Concepts: Algorithm (I, E, L)
• Concept Hierarchy: isA(sorting algorithm, algorithm) - known as a taxonomic relationship
• Relations: solve(algorithm, problem) - known as a non-taxonomic relationship
• Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)
4. Implemented approach follows Buitelaar et al. criteria in forming concepts from terms
• An intensional definition of the concept
  • Formal definition: a term can be considered a concept if it is linked by a valid relation to another term.
  • Informal definition: a term should have a textual description.
• A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances.
• A set of linguistic realizations.
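The (I, E, L) triple and the criteria above can be sketched as a small data structure. This is an illustrative sketch, not the deck's implementation; all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    term: str
    intension: str = ""                                   # informal (textual) definition
    relations: list = field(default_factory=list)         # formal definition: valid relations to other terms
    extension: list = field(default_factory=list)         # concept instances
    lexicalisations: list = field(default_factory=list)   # linguistic realizations

    def satisfies_criteria(self) -> bool:
        # Buitelaar et al.: an intensional definition (formal or informal),
        # instances, and linguistic realizations
        has_intension = bool(self.intension) or bool(self.relations)
        return has_intension and bool(self.extension) and bool(self.lexicalisations)

algo = Concept(
    term="algorithm",
    intension="a precise rule set for solving a problem",
    relations=[("solve", "problem")],
    extension=["quicksort", "mergesort"],
    lexicalisations=["algorithm", "algorithms"],
)
print(algo.satisfies_criteria())  # True
```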
5. Need of WSD in forming concepts from terms
• Iterate over each sentence from the corpus
• Identify the subject phrase (ts) and object phrase (to) in each sentence
• Check whether the full subject and object phrases, or part of them, exist in the list of domain-specific terms
• Feed each phrase (ts and to separately, referred to as t) together with its sentence to the next step
• Retrieve the list of senses that exist in WordNet for t
• Identify the sense tsense related to the domain from the list of senses (disambiguating the sense)
• If tsense exists for both, the tsense of ts and to are candidates for domain-specific concepts

For example, ts = "we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality"
6. Need of WSD in forming concepts from terms (contd.)
• For the term "cache", WordNet lists the senses cache#n#1, cache#n#2, and cache#n#3
• The domain-related sense must be identified before the tsense of ts and to can become candidates for domain-specific concepts
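The candidate-formation loop described on slides 5-6 can be sketched as follows. The domain-term list, sense inventory, and the stubbed disambiguation step are illustrative stand-ins, not the deck's actual resources.

```python
DOMAIN_TERMS = {"cache", "virtual memory"}  # hypothetical domain-specific term list

SENSE_INVENTORY = {  # term -> WordNet-style sense labels (illustrative)
    "cache": ["cache#n#1", "cache#n#2", "cache#n#3"],
    "virtual memory": ["virtual_memory#n#1"],
}

def disambiguate(term, sentence):
    """Placeholder: return the domain-related sense tsense, or None.
    A real system would run a WSD algorithm (e.g. Lesk) here."""
    senses = SENSE_INVENTORY.get(term, [])
    return senses[-1] if senses else None  # stub: assume the last sense is domain-related

def candidate_concepts(parsed_sentences):
    """parsed_sentences: iterable of (sentence, subject_phrase, object_phrase)."""
    candidates = []
    for sentence, ts, to in parsed_sentences:
        # keep only sentences whose subject and object phrases match the domain-term list
        if ts in DOMAIN_TERMS and to in DOMAIN_TERMS:
            s_sense = disambiguate(ts, sentence)
            o_sense = disambiguate(to, sentence)
            if s_sense and o_sense:  # tsense must exist for both
                candidates.append((s_sense, o_sense))
    return candidates

parsed = [("the cache maps into virtual memory", "cache", "virtual memory"),
          ("the user opens a file", "user", "file")]
print(candidate_concepts(parsed))  # [('cache#n#3', 'virtual_memory#n#1')]
```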
7. Which algorithm is best suited?
• LESK
  • Original LESK
    • Uses the definition (gloss) of a word sense as the only source of contextual information for that sense
    • Suffers from combinatorial explosion
    • Simulated annealing has been used to tame the search space
8. Which algorithm is best suited? (contd.)
• LESK
  • Simplified LESK
    • Solves the combinatorial explosion
    • Runs a separate disambiguation process for each ambiguous word in the input text
    • Less accuracy
  • Adapted LESK
    • Enlarged context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions
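A minimal sketch of Simplified LESK: each ambiguous word is disambiguated independently by overlapping its sense glosses with the surrounding context, avoiding the combinatorial explosion of scoring all sense combinations. The tiny sense inventory and stopword list are illustrative assumptions.

```python
STOPWORDS = {"a", "an", "the", "of", "on", "in", "that", "he", "it", "and", "or", "to"}

def tokens(text):
    """Bag of content words, lowercased, stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(word, context, glosses):
    """glosses: dict mapping sense label -> gloss text.
    Score each sense by gloss/context overlap alone (no sense combinations)."""
    ctx = tokens(context)
    return max(glosses, key=lambda s: len(tokens(glosses[s]) & ctx))

BANK = {  # toy two-sense inventory
    "bank#n#1": "sloping land beside a body of water",
    "bank#n#2": "a financial institution that accepts deposits",
}
print(simplified_lesk("bank", "he sat on the bank of the river watching the water", BANK))
# bank#n#1
```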
9. Which algorithm is best suited? (contd.)
• Other well-known algorithms with good performance use:
  • Path length
  • Depth of the least common subsumer (LCS), referred to as WUP
  • Path length and path direction, referred to as HSO
  • Link strength of a parent-child link using corpus statistics

WUP: ConSim(C1, C2) = 2*N3 / (N1 + N2 + 2*N3), where N1 and N2 are the path lengths from C1 and C2 to their least common subsumer C3, and N3 is the depth of C3 from the root.
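The WUP formula above is directly computable once the three counts are known; a small sketch:

```python
def wup_consim(n1: int, n2: int, n3: int) -> float:
    """Wu-Palmer conceptual similarity.
    n1, n2: path lengths from C1 and C2 to their least common subsumer C3.
    n3: depth of C3 from the root."""
    return (2 * n3) / (n1 + n2 + 2 * n3)

# Two sibling concepts one edge below a subsumer at depth 3:
print(wup_consim(1, 1, 3))  # 0.75
```

As the subsumer gets deeper (larger N3) relative to the paths N1 and N2, the similarity approaches 1, which matches the intuition that concepts sharing a specific ancestor are more alike.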
10. Which algorithm is best suited? (contd.)
• Path length and path direction, referred to as HSO

HSO: Weight = C - path length - k * number of changes of direction, where C and k are constants.
11. Which algorithm is best suited? (contd.)
• Link strength of a parent-child link using corpus statistics
  • Combines information content and distance
  • Information content: obtained by estimating the probability of occurrence of a class in a large text corpus
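A sketch of corpus-based information content and the information-content-plus-distance combination, in the style of Jiang and Conrath (cited in the references). The class counts are a toy corpus, not real WordNet statistics.

```python
import math

CLASS_FREQ = {"entity": 1000, "artifact": 300, "memory": 100, "cache": 50}
TOTAL = CLASS_FREQ["entity"]  # the root class subsumes every occurrence

def ic(cls: str) -> float:
    """Information content: -log of the class's estimated corpus probability.
    Rarer (more specific) classes carry more information."""
    return -math.log(CLASS_FREQ[cls] / TOTAL)

def jc_distance(c1: str, c2: str, lcs: str) -> float:
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return ic(c1) + ic(c2) - 2 * ic(lcs)

print(jc_distance("cache", "cache", "cache"))            # 0.0 (identical concepts)
print(round(jc_distance("memory", "cache", "artifact"), 3))  # 2.89
```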
12. Disambiguating Concepts (LESK?)
For each sense:
• Extract the informal definition (gloss) of the sense from WordNet
• Calculate the similarity between ts and WNsn by building a similarity matrix between ts and WNsn using a LESK algorithm; the value is normalized by the number of entries in the distance matrix
• Return the synset with the highest similarity value

Example: the term "cache", with senses cache#n#1, cache#n#2, and cache#n#3
13. Disambiguating Concepts (LESK?) (contd.)
For example, the glosses of the three senses of "cache":
• WNs1: "a hidden storage space for money or provisions or weapons"
• WNs2: "a secret store of valuables or money"
• WNs3: "RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics"
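The step on slide 12 can be sketched as follows: build a token-match matrix between the sentence and each WordNet gloss, sum it, and normalize by the number of matrix entries. The tokenizer and stopword list are simplifying assumptions; a full LESK would use a richer similarity than exact token match.

```python
STOPWORDS = {"a", "an", "the", "of", "that", "for", "or", "is", "as", "which",
             "to", "with", "we", "from", "when", "between", "and", "used"}

def toks(text):
    return [w.strip(",.;") for w in text.lower().split()
            if w.strip(",.;") not in STOPWORDS]

def normalized_overlap(sentence, gloss):
    s, g = toks(sentence), toks(gloss)
    matrix = [[1 if a == b else 0 for b in g] for a in s]  # similarity matrix
    return sum(map(sum, matrix)) / (len(s) * len(g))       # normalize by entry count

SENTENCE = ("we propose a hardware design, call the virtual line scheme, that "
            "allows the utilization of large virtual cache line when fetch datum "
            "from memory for better exploitation of spatial locality")

GLOSSES = {  # the three WordNet glosses from the slide
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, "
                  "which is continually updated; used to optimize data transfers "
                  "between system elements with different characteristics"),
}

best = max(GLOSSES, key=lambda s: normalized_overlap(SENTENCE, GLOSSES[s]))
print(best)  # cache#n#3
```

Only the hardware sense shares content words ("memory") with the sentence, so cache#n#3 wins, matching the domain-relevant sense the deck is after.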
16. Evaluation - domain-specific concept extraction
• Identified 253 computer science domain-specific concepts, validated by three domain experts
• Measured inter-annotator agreement using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories)
• Identified 47 domain-specific concepts for the GENIA corpus
• Compared with two approaches discussed by Zhou et al. and Subramaniam et al.

Precision for concepts (ComSci):
  Annotator 1: 75%   Annotator 2: 56%   Annotator 3: 78%

Recall (BioMedical):
  Our approach: 58.70%   MaxMatcher (Zhou et al.): 57.73%   BioAnnotator (Subramaniam et al.): 20.27%
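The Fleiss' kappa figure on this slide can be reproduced with a short computation. The rating matrix below is a toy example (4 items), not the deck's actual 253-concept annotation data.

```python
def fleiss_kappa(rows):
    """rows: per-item category counts, e.g. [3, 0] means all 3 raters chose category 0."""
    n_items, n_raters = len(rows), sum(rows[0])
    # mean per-item agreement
    p_bar = sum((sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in rows) / n_items
    # chance agreement from the marginal category proportions
    totals = [sum(row[j] for row in rows) for j in range(len(rows[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]  # 4 items, 3 raters, 2 categories
print(round(fleiss_kappa(ratings), 4))  # 0.3333
```

A value around 0.33, like the deck's 0.36712, falls in the conventional "fair agreement" band of the kappa scale.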
17. Why LESK?

Conclusion
Choosing the best WSD algorithm depends on:
• The nature of your problem
• The factors available (e.g. glosses, taxonomy structure, corpus statistics)
• Performance with respect to accuracy and time
18. References

K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology Learning from Text: Methods, Evaluation and Applications, vol. 123. IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence. Springer, 2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the Twelfth International Conference on Information and Knowledge Management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An Electronic Lexical Database, vol. 305, pp. 305-332, 1998.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational Linguistics and Intelligent Text Processing. Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verb semantics and lexical selection," in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th Annual International Conference on Systems Documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining local context and WordNet similarity for word sense disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283. MIT Press, 1998.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.