Usage of Word Sense Disambiguation in
Concept Identification in Ontology
Construction
Guest Talk at University of Moratuwa, Department of Computer Science and Engineering
5th November, 2016
Discussed by: Kiruparan Balachandran
Background Information - Ontology
Ontology provides a potential method to describe domain knowledge
[Figure: example ontology graph with the nodes algorithm, sorting algorithm, problem, and complexity, connected by edges labelled "is a", "solve", and "has"]
Background Information - Ontology learning layer-cake approach
Layers, from bottom to top, with examples:
• Terms: {Randomized algorithm, sorting algorithm, system software, application software}
• Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}
• Concepts: Algorithm (I, E, L)
• Concept Hierarchy: isA(sorting algorithm, algorithm), known as a taxonomic relationship
• Relations: solve(algorithm, problem), known as a non-taxonomic relationship
• Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem)
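The rule layer can be illustrated with a minimal Python sketch. It assumes the rule generalizes to "if isA(x, y) and r(y, z) hold, infer r(x, z)", which is an interpretation of the single example on the slide, and it uses plain sets rather than any real ontology library.

```python
# Toy knowledge base built from the slide's examples (not a real ontology API).
taxonomy = {("sorting algorithm", "algorithm")}   # isA(child, parent)
relations = {("solve", "algorithm", "problem")}   # relation(subject, object)


def apply_inheritance_rule(taxonomy, relations):
    """Assumed rule: if isA(x, y) and r(y, z) hold, infer r(x, z)."""
    inferred = set()
    for child, parent in taxonomy:
        for name, subject, obj in relations:
            if subject == parent:
                inferred.add((name, child, obj))
    return inferred


inferred = apply_inheritance_rule(taxonomy, relations)
print(inferred)
```

Here the only inferred fact is solve(sorting algorithm, problem), matching the rule shown in the layer list.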
The implemented approach follows the criteria of Buitelaar et al. for forming concepts from terms
• An intensional definition of the concept
  • Formal definition: a term can be considered a concept if it is linked by a valid relation to another term.
  • Informal definition: a term should have a textual description.
• A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances.
• A set of linguistic realizations.
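One way to picture these criteria is as checks on a candidate term record. The sketch below is illustrative only: the class name, field names, and the sample description are hypothetical, and it reports which criteria a term satisfies without assuming how the talk combines them.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateTerm:
    name: str
    description: str = ""                             # informal intensional definition
    relations: list = field(default_factory=list)     # (relation, other term) pairs
    instances: list = field(default_factory=list)     # extension
    realizations: list = field(default_factory=list)  # linguistic realizations


def satisfied_criteria(term):
    """Report which of the listed criteria hold for a candidate term."""
    return {
        "formal_intension": bool(term.relations),
        "informal_intension": bool(term.description.strip()),
        "extension": bool(term.instances),
        "linguistic_realizations": bool(term.realizations),
    }


# Hypothetical record; the description text is invented for illustration.
algorithm = CandidateTerm(
    name="algorithm",
    description="a procedure for solving a problem",
    relations=[("solve", "problem")],
    realizations=["algorithm", "algorithms"],
)
print(satisfied_criteria(algorithm))
```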
Need for WSD in forming concepts from terms
Steps, in order:
• Iterate over each sentence (ts) in the corpus.
• Identify the subject phrase and the object phrase in each sentence.
• Check whether all or part of the subject phrase (ts) and the object phrase (to) exist in the domain-specific list.
• Feed ts and to separately, each referred to as t, together with the sentence ts.
• Retrieve the list of senses that exist in WordNet for t.
• Identify the sense tsense related to the domain from the list of senses (disambiguating the sense).
• If tsense exists for both, the tsense of ts and the tsense of to are candidates for domain-specific concepts.
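The flow above can be sketched as a small function. This is a rough outline under stated assumptions: the function names are hypothetical, the domain-list check is a crude substring match, and the disambiguation step is a stub standing in for the WordNet-based procedure.

```python
def candidate_concepts(subject_phrase, object_phrase, sentence, domain_terms, disambiguate):
    """Return the (subject sense, object sense) pair, or None if either phrase fails."""
    senses = []
    for phrase in (subject_phrase, object_phrase):
        # Keep the phrase only if all or part of it matches a domain-specific entry
        # (crude substring match, an assumption of this sketch).
        term = next((t for t in domain_terms if t in phrase), None)
        if term is None:
            return None
        sense = disambiguate(term, sentence)  # WSD step; None means no domain sense
        if sense is None:
            return None
        senses.append(sense)
    return tuple(senses)


def stub_disambiguate(term, sentence):
    # Hypothetical stand-in for the WordNet-based disambiguation step.
    return f"{term}#domain"


sentence = "the virtual cache line is fetched from memory"
result = candidate_concepts("virtual cache line", "memory", sentence,
                            ["cache", "memory"], stub_disambiguate)
print(result)
```

Both phrases must yield a domain sense; if either one fails, the sentence contributes no candidate concepts.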
For example, ts = “we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality”
For t = cache, the senses are cache#n#1, cache#n#2, and cache#n#3
Which algorithm is best suited?
• Lesk
  • Original Lesk
    • Uses the definition of a word meaning as the only source of contextual information for a given sense
    • Suffers from combinatorial explosion
    • Use of simulated annealing
  • Simplified Lesk
    • Solves the combinatorial explosion
    • Runs a separate disambiguation process for each ambiguous word in the input text
  • Adapted Lesk
    • Enlarged context: considers hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions
Lower accuracy
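As a rough illustration of the combinatorial explosion, the sketch below compares how many candidates are scored when all ambiguous words are disambiguated jointly versus one word at a time. The sense counts are made-up numbers, and the counting model is a simplification.

```python
from math import prod


def joint_candidates(sense_counts):
    """Sense combinations to score when all words are disambiguated together."""
    return prod(sense_counts)


def per_word_candidates(sense_counts):
    """Senses to score when each ambiguous word is disambiguated separately."""
    return sum(sense_counts)


counts = [3, 4, 5, 2]  # hypothetical number of senses for four ambiguous words
print(joint_candidates(counts), per_word_candidates(counts))
```

The joint count grows multiplicatively with every added word, while the per-word count grows additively.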
Which algorithm is best suited?
• Other well-known algorithms with good performance use
  • Path
  • Depth of the least common subsumer (LCS), referred to as WUP
  • Path length and path direction, referred to as HSO
  • Link strength of a parent-child link, using corpus statistical information
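WUP:
$$\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$$
[Figure: taxonomy tree with the root at the top and C3 below it as the common ancestor of C1 and C2; N1 and N2 label the paths from C1 and C2 to C3, and N3 labels the path from C3 to the root]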
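A minimal sketch of this measure on a toy taxonomy follows. The tree (including the intermediate node "procedure") is invented for illustration, and path lengths are counted in edges, which is one possible reading of N1, N2, and N3.

```python
# Hypothetical toy taxonomy: child -> parent.
PARENT = {
    "procedure": "root",
    "algorithm": "procedure",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
}


def path_to_root(node):
    """List of nodes from `node` up to and including the root."""
    path = [node]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def con_sim(c1, c2):
    """ConSim = 2*N3 / (N1 + N2 + 2*N3), with path lengths counted in edges."""
    path1, path2 = path_to_root(c1), path_to_root(c2)
    ancestors2 = set(path2)
    c3 = next(n for n in path1 if n in ancestors2)  # least common subsumer
    n1 = path1.index(c3)            # edges from C1 up to C3
    n2 = path2.index(c3)            # edges from C2 up to C3
    n3 = len(path_to_root(c3)) - 1  # edges from C3 up to the root
    return 2 * n3 / (n1 + n2 + 2 * n3)


print(con_sim("sorting algorithm", "randomized algorithm"))
```

In this tree C3 is "algorithm", so N1 = N2 = 1 and N3 = 2, giving 4 / 6. The sketch does not guard against the degenerate case where both concepts are the root.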
HSO: Weight = C - path length - k × (number of changes of direction), where C and k are constants
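The weight can be written as a one-line function. The default values for C and k below are illustrative assumptions and are not taken from the slides.

```python
def hso_weight(path_length, direction_changes, C=8, k=1):
    """Weight = C - path length - k * number of changes of direction."""
    return C - path_length - k * direction_changes


print(hso_weight(path_length=3, direction_changes=1))
```

Longer paths and more changes of direction both lower the weight, so short, straight paths between two senses score highest.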
• Link strength of a parent-child link: information content + distance
  • Information content is obtained by estimating the probability of occurrence of a class in a large text corpus
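A small sketch of information content follows, using the common definition IC(c) = -log p(c). The distance shown is the Jiang and Conrath formulation, which is assumed here to be the measure this slide has in mind. The corpus counts are invented.

```python
import math

# Hypothetical corpus counts per class (each count includes its descendants).
COUNTS = {
    "entity": 1000,
    "algorithm": 50,
    "sorting algorithm": 10,
    "randomized algorithm": 5,
}
TOTAL = COUNTS["entity"]


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from corpus counts."""
    return -math.log(COUNTS[concept] / TOTAL)


def jcn_distance(c1, c2, lcs):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))


distance = jcn_distance("sorting algorithm", "randomized algorithm", "algorithm")
print(distance)
```

Rare classes carry more information, so two concepts whose common parent is specific (high IC) end up closer than two that only share a very general ancestor.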
Disambiguating Concepts (Lesk?)
• For each sense:
  • Extract the informal definition of the sense (WNsn) from WordNet.
  • Calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn using a Lesk algorithm. The value is normalized by the number of entries in the matrix.
• Return the synset with the highest similarity value.
For example, for t = cache:
• WNs1: “a hidden storage space for money or provisions or weapons”
• WNs2: “a secret store of valuables or money”
• WNs3: “RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics”
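The disambiguation steps can be tried on this example with a self-contained sketch. It is not the talk's implementation: the similarity matrix here uses exact word matches (1 or 0), the stopword list is ad hoc, and the definitions are hard-coded instead of being read from WordNet.

```python
import re

# Ad hoc stopword list (an assumption of this sketch).
STOPWORDS = {"a", "the", "of", "or", "for", "that", "is", "as", "which",
             "with", "we", "when", "from", "to", "between"}


def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lesk_score(sentence, gloss):
    """Sum of an exact-match similarity matrix, normalized by its number of entries."""
    context, definition = tokens(sentence), tokens(gloss)
    matrix = [[1.0 if c == d else 0.0 for d in definition] for c in context]
    entries = len(context) * len(definition)
    return sum(map(sum, matrix)) / entries if entries else 0.0


def disambiguate(sentence, glosses):
    """Return the sense whose definition is most similar to the sentence."""
    scores = {sense: lesk_score(sentence, gloss) for sense, gloss in glosses.items()}
    return max(scores, key=scores.get), scores


ts = ("we propose a hardware design, call the virtual line scheme, that allows "
      "the utilization of large virtual cache line when fetch datum from memory "
      "for better exploitation of spatial locality")
glosses = {
    "WNs1": "a hidden storage space for money or provisions or weapons",
    "WNs2": "a secret store of valuables or money",
    "WNs3": ("RAM memory that is set aside as a specialized buffer storage, which is "
             "continually updated; used to optimize data transfers between system "
             "elements with different characteristics"),
}
best, scores = disambiguate(ts, glosses)
print(best)
```

With exact matching, only WNs3 shares a content word ("memory") with the sentence, so it is the sense returned. A graded word-to-word similarity in the matrix would make the scores less sparse.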
Evaluation – domain-specific concept extraction
Computer science, precision for concepts:
  Annotator 1   Annotator 2   Annotator 3
  75%           56%           78%
Biomedical, recall:
  Our approach   MaxMatcher (Zhou et al.)   BioAnnotator (Subramaniam et al.)
  58.70%         57.73%                     20.27%
• Identified 253 computer science domain-specific concepts, validated by three domain experts
  • Measured the inter-annotator agreement using Fleiss' kappa
    • 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories)
• Identified 47 domain-specific concepts for the GENIA corpus
  • Compared with two different approaches, discussed by Zhou et al. and Subramaniam et al.
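For readers unfamiliar with Fleiss' kappa, here is a minimal sketch of the standard formula on invented toy ratings (3 annotators, 2 categories, 4 items). It does not reproduce the study's annotations or its 0.36712 value.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of annotators who assigned item i to category j."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_categories)]
    # Observed agreement for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Toy data: columns are (concept, not a concept); each row sums to 3 annotators.
toy_ratings = [[3, 0], [2, 1], [1, 2], [0, 3]]
kappa = fleiss_kappa(toy_ratings)
print(kappa)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.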
Why Lesk?
Conclusion
Choose the best WSD algorithm based on
• The nature of your problem
• Available factors
• Performance with respect to accuracy and time
References
K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
P. Buitelaar, P. Cimiano, and B. Magnini, Ontology learning from text: methods, evaluation and applications, vol. 123. IOS Press, 2005.
X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence. Springer, 2006, pp. 1145-1149.
L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the twelfth international conference on Information and knowledge management, 2003, pp. 410-417.
G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An electronic lexical database, vol. 305, pp. 305-332, 1998.
S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational linguistics and intelligent text processing. Springer, 2002, pp. 136-145.
Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994, pp. 133-138.
M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th annual international conference on Systems documentation, 1986, pp. 24-26.
C. Leacock and M. Chodorow, "Combining Local Context and Wordnet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.
Questions?
Thank You…
OECD bibliometric indicators: Selected highlights, April 2024
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 

Usage of Word Sense Disambiguation in Concept Identification for Ontology Construction

  • 1. Usage of Word Sense Disambiguation in Concept Identification in Ontology Construction. Guest talk at the University of Moratuwa, Department of Computer Science and Engineering, 5th November 2016. Discussed by: Kiruparan Balachandran.
  • 2. Background Information - Ontology. An ontology provides a potential method to describe domain knowledge. Diagram labels: algorithm, sorting algorithm, problem, complexity; relation labels: solve, has, is a.
  • 3. Background Information - Ontology learning layer-cake approach. Terms: {Randomized algorithm, sorting algorithm, system software, application software}. Synonyms: {Randomized algorithm, sorting algorithm}, {system software, application software}. Concepts: Algorithm (I, E, L), i.e. intension, extension, and linguistic realizations. Concept Hierarchy: isA(sorting algorithm, algorithm), known as a taxonomic relationship. Relations: solve(algorithm, problem), known as a non-taxonomic relationship. Rules: isA(sorting algorithm, algorithm) -> solve(sorting algorithm, problem).
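The rule layer on this slide can be read as a relation being inherited along an isA link. The following is a minimal sketch of that reading, assuming a tuple encoding of facts that is not part of the talk; the function name `apply_inheritance_rule` is hypothetical.

```python
# Facts taken from the slide; encoding them as tuples is an assumption.
is_a = {("sorting algorithm", "algorithm")}          # (child, parent)
relations = {("solve", "algorithm", "problem")}      # (name, subject, object)


def apply_inheritance_rule(is_a, relations):
    """For every isA(child, parent) and rel(parent, obj), derive rel(child, obj)."""
    derived = set()
    for child, parent in is_a:
        for name, subject, obj in relations:
            if subject == parent:
                derived.add((name, child, obj))
    return derived


derived = apply_inheritance_rule(is_a, relations)
print(derived)
```

Under this reading, the only derived fact is solve(sorting algorithm, problem), which matches the right-hand side of the rule shown on the slide.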
  • 4. The implemented approach follows the criteria of Buitelaar et al. for forming concepts from terms. (1) An intensional definition of the concept. Formal definition: a term can be considered a concept if it is linked by a valid relation to another term. Informal definition: a term should have a textual description. (2) A set of concept instances, i.e. its extension: a term can be considered a concept if it has instances. (3) A set of linguistic realizations.
  • 5. Need of WSD in forming concepts from terms. Steps: (1) iterate over each sentence (ts) in the corpus; (2) identify the subject phrase and object phrase in each sentence; (3) check whether all or part of the subject phrase (ts) and object phrase (to) exists in the domain-specific list; (4) feed ts and to separately, each referred to as t, together with the sentence ts; (5) retrieve the list of senses that exist in WordNet for t; (6) identify the sense tsense related to the domain from that list (disambiguating the sense); (7) if tsense exists for both, the tsense of ts and of to are candidates for domain-specific concepts. For example, ts = “we propose a hardware design, call the virtual line scheme, that allows the utilization of large virtual cache line when fetch datum from memory for better exploitation of spatial locality”.
  • 6. Need of WSD in forming concepts from terms (same steps as slide 5). Candidate WordNet senses for "cache": cache#n#1, cache#n#2, and cache#n#3.
  • 7. Which algorithm is best suited? LESK. Original LESK: uses the definition of a word's meaning as the only source of contextual information for a given sense; suffers from combinatorial explosion; simulated annealing has been used to mitigate this.
  • 8. Which algorithm is best suited? (LESK variants, continued from slide 7.) Simplified LESK: addresses the combinatorial explosion by running a separate disambiguation process for each ambiguous word in the input text. Adapted LESK: enlarges the context by considering hypernyms, hyponyms, holonyms, meronyms, troponyms, attribute relations, and their associated definitions. Slide callout: less accuracy.
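  • 9. Which algorithm is best suited? Other well-known algorithms with good performance use: path; depth of the least common subsumer (LCS), referred to as WUP; path length and path direction, referred to as HSO; link strength of a parent-child link using corpus statistical information. For WUP: $\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$, where C3 is the least common subsumer of C1 and C2, N1 and N2 are the path lengths from C1 and C2 to C3, and N3 is the path length from C3 to the root.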
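The ConSim formula on this slide can be tried out on a small hand-built taxonomy. The sketch below assumes path lengths are counted in edges and uses a hypothetical `parent` mapping; the taxonomy itself is illustrative and not taken from WordNet.

```python
def path_to_root(node, parent):
    """Return the list [node, ..., root] obtained by following parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def consim(c1, c2, parent):
    """ConSim(C1, C2) = 2*N3 / (N1 + N2 + 2*N3), with lengths counted in edges."""
    p1 = path_to_root(c1, parent)
    p2 = path_to_root(c2, parent)
    ancestors_of_c2 = set(p2)
    # C3: the first node on C1's path to the root that also lies on C2's path.
    c3 = next(n for n in p1 if n in ancestors_of_c2)
    n1 = p1.index(c3)                        # edges from C1 up to C3
    n2 = p2.index(c3)                        # edges from C2 up to C3
    n3 = len(path_to_root(c3, parent)) - 1   # edges from C3 up to the root
    denominator = n1 + n2 + 2 * n3
    return 1.0 if denominator == 0 else 2 * n3 / denominator


# Hypothetical toy taxonomy (child -> parent); "entity" is the root.
parent = {
    "algorithm": "entity",
    "software": "entity",
    "sorting algorithm": "algorithm",
    "randomized algorithm": "algorithm",
}

print(consim("sorting algorithm", "randomized algorithm", parent))
print(consim("sorting algorithm", "software", parent))
```

Two siblings under "algorithm" score higher than a pair whose only shared ancestor is the root, because N3 (the depth of the common subsumer) is what drives the numerator.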
  • 10. Which algorithm is best suited? (Continued from slide 9.) For HSO: weight = C - path length - k × number of changes of direction, where C and k are constants.
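The HSO weight on this slide is simple arithmetic once a path between two synsets is known. The sketch below assumes a path is given as a list of link directions; the default constants C=8 and k=1 are illustrative assumptions, not values stated in the talk.

```python
def count_direction_changes(directions):
    """Count how often consecutive links on the path switch direction."""
    return sum(1 for a, b in zip(directions, directions[1:]) if a != b)


def hso_weight(directions, C=8, k=1):
    """weight = C - path length - k * number of changes of direction."""
    path_length = len(directions)
    return C - path_length - k * count_direction_changes(directions)


# Hypothetical path: two upward links followed by one downward link.
directions = ["up", "up", "down"]
print(hso_weight(directions))
```

A longer path or one that changes direction more often receives a lower weight, which is the intuition behind treating such pairs as less related.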
  • 11. Which algorithm is best suited? Link strength of a parent-child link using corpus statistical information: information content + distance. Information content is obtained by estimating the probability of occurrence of a class in a large text corpus.
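One common way to combine information content with distance is the Jiang-Conrath distance from the reference list. The slide does not spell out the formula, so the sketch below is an assumption about which formulation is meant; the class counts are hypothetical.

```python
import math


def information_content(count, total):
    """IC(c) = -log p(c), with p(c) estimated as count / total from a corpus."""
    return -math.log(count / total)


def jcn_distance(ic_c1, ic_c2, ic_lcs):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(least common subsumer)."""
    return ic_c1 + ic_c2 - 2 * ic_lcs


# Hypothetical corpus counts; a class count includes its subclasses.
total = 1000
counts = {"algorithm": 100, "sorting algorithm": 20, "randomized algorithm": 10}
ic = {name: information_content(c, total) for name, c in counts.items()}

distance = jcn_distance(
    ic["sorting algorithm"], ic["randomized algorithm"], ic["algorithm"]
)
print(round(distance, 4))
```

Rare classes carry more information content, so two specific classes that share only a very general subsumer end up far apart.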
  • 12. Disambiguating Concepts (LESK?). For each sense (e.g. cache#n#1, cache#n#2, and cache#n#3): extract the informal definition of the sense from WordNet; calculate the similarity between ts and WNsn by computing a similarity matrix between ts and WNsn using a LESK algorithm; normalize the value by the number of entries in the matrix. Return the synset with the highest similarity value.
  • 13. Disambiguating Concepts (LESK?), example (same steps as slide 12). WNs1: “a hidden storage space for money or provisions or weapons”. WNs2: “a secret store of valuables or money”. WNs3: “RAM memory that is set aside as a specialized buffer storage, which is continually updated; used to optimize data transfers between system elements with different characteristics”.
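The disambiguation steps can be sketched with the example sentence from slide 5 and the three glosses quoted here. The slides do not say how each matrix cell is scored, so this sketch assumes exact word match (1.0 or 0.0) after removing a small hand-picked stopword list; `matrix_similarity` and `disambiguate` are hypothetical names.

```python
import re

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"a", "an", "the", "of", "or", "for", "that", "is", "as", "to",
             "we", "when", "from", "which", "with", "and", "in", "on", "by"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def matrix_similarity(sentence, gloss):
    """Sum of a word-by-word match matrix, normalized by its number of entries."""
    s_words, g_words = tokenize(sentence), tokenize(gloss)
    if not s_words or not g_words:
        return 0.0
    matrix = [[1.0 if a == b else 0.0 for b in g_words] for a in s_words]
    return sum(map(sum, matrix)) / (len(s_words) * len(g_words))


def disambiguate(sentence, glosses):
    """Return the sense key whose gloss is most similar to the sentence."""
    return max(glosses, key=lambda sense: matrix_similarity(sentence, glosses[sense]))


sentence = ("we propose a hardware design, call the virtual line scheme, that "
            "allows the utilization of large virtual cache line when fetch datum "
            "from memory for better exploitation of spatial locality")
glosses = {
    "cache#n#1": "a hidden storage space for money or provisions or weapons",
    "cache#n#2": "a secret store of valuables or money",
    "cache#n#3": ("RAM memory that is set aside as a specialized buffer storage, "
                  "which is continually updated; used to optimize data transfers "
                  "between system elements with different characteristics"),
}

print(disambiguate(sentence, glosses))
```

With these inputs only the third gloss shares a content word ("memory") with the sentence, so the computing sense of "cache" is selected.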
  • 16. Evaluation: domain-specific concept extraction. Computer science (ComSci), precision for concepts: Annotator 1 75%, Annotator 2 56%, Annotator 3 78%. Biomedical, recall: our approach 58.70%, MaxMatcher (Zhou et al.) 57.73%, BioAnnotator (Subramaniam et al.) 20.27%. Identified 253 computer science domain-specific concepts, validated by three domain experts. Inter-annotator agreement measured using Fleiss' kappa: 0.36712, a fair agreement (3 annotators, 253 concepts, 2 categories). Identified 47 domain-specific concepts for the GENIA corpus, compared with the two approaches discussed by Zhou et al. and Subramaniam et al.
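For readers unfamiliar with Fleiss' kappa, the sketch below shows how it is computed from a table of per-item category counts. The four-item rating table is hypothetical and is not the talk's annotation data, so it does not reproduce the 0.36712 figure.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Share of all assignments that went to each category.
    p = [sum(row[j] for row in ratings) / (n_items * n_raters)
         for j in range(n_categories)]
    # Observed agreement per item, then averaged over items.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in ratings]
    p_observed = sum(per_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(x * x for x in p)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: 4 items, 3 raters, 2 categories (concept / not a concept).
toy_ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 4))
```

The statistic corrects the raw agreement rate for the agreement that would be expected if annotators labelled at random with the observed category proportions.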
  • 17. Why LESK? Conclusion: choose the best WSD algorithm based on the nature of your problem, the available factors, and performance with respect to accuracy and time.
  • 18. References
    K. Balachandran and S. Ranathunga, "Domain-Specific Term Extraction for Concept Identification in Ontology Construction," in IEEE/WIC/ACM International Conference on Web Intelligence, Omaha, Nebraska, USA, 2016, pp. 34-41.
    P. Buitelaar, P. Cimiano, and B. Magnini, Ontology learning from text: methods, evaluation and applications, vol. 123: IOS Press, 2005.
    X. Zhou, X. Zhang, and X. Hu, "MaxMatcher: Biological concept extraction using approximate dictionary lookup," in PRICAI 2006: Trends in Artificial Intelligence, ed: Springer, 2006, pp. 1145-1149.
    L. V. Subramaniam, S. Mukherjea, P. Kankar, B. Srivastava, V. S. Batra, P. V. Kamesam, et al., "Information extraction from biomedical literature: methodology, evaluation and an application," in Proceedings of the twelfth international conference on Information and knowledge management, 2003, pp. 410-417.
    G. Hirst and D. St-Onge, "Lexical chains as representations of context for the detection and correction of malapropisms," WordNet: An electronic lexical database, vol. 305, pp. 305-332, 1998.
    S. Banerjee and T. Pedersen, "An adapted Lesk algorithm for word sense disambiguation using WordNet," in Computational linguistics and intelligent text processing, ed: Springer, 2002, pp. 136-145.
    Z. Wu and M. Palmer, "Verbs semantics and lexical selection," in Proceedings of the 32nd annual meeting on Association for Computational Linguistics, 1994, pp. 133-138.
    M. Lesk, "Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone," in Proceedings of the 5th annual international conference on Systems documentation, 1986, pp. 24-26.
    C. Leacock and M. Chodorow, "Combining Local Context and WordNet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
    J. J. Jiang and D. W. Conrath, "Semantic similarity based on corpus statistics and lexical taxonomy," in Proc. Int. Conf. Research in Computational Linguistics, 1998, pp. 19-33.