SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Can we use Linked Data
Semantic Annotators for
the Extraction of Domain-
Relevant Expressions?
Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada
Prof. Amal Zouaq, Royal Military College of Canada, Canada
Dr. Ludovic Jean-louis, Ecole Polytechnique de
Montréal, Canada
Introduction
What is Semantic Annotation on the LOD?
Candidate Selection
Disambiguation
Traditionally focused on named entity annotation
Semantic Annotators
LOD-based Semantic Annotators
Most used annotators
REST API / Web Services
Default configuration
Alchemy
(entities +
keywords)
YahooZemanta
Wikimeta
DBPedia
Spotlight
OpenCalais
Lupedia
Annotation Evaluation
Few comparative studies have been published
on the performance of linked data Semantic
Annotators
The available studies generally focus on
“traditional” named entity annotation evaluation
(ORG, Person, etc.)
NEs versus domain topics
Semantic Annotators start to go beyond these
traditional NEs
Predefined classes such as PERSON, ORG
Expansion of possible classes (NERD Taxonomy :
sport events, operating systems, political events, …)
 RDF (NE? Domain topic ?)
 Semantic Web (NE? Domain topic ?)
Domain Topics: important “concepts” for the
domain of interest
NE versus Topics Annotation
Research Question 1
Previous studies did not draw any conclusion about
the relevance of the annotated entities
But relevance might prove to be a crucial point for
tasks such as information seeking or query
answering
RQ1: Can we use linked data-based semantic
annotators to accurately identify Domain-relevant
expressions?
Research Methodology
We relied on a corpus of 8 texts taken from
Wikipedia, all related to the artificial intelligence
domain. Together, these texts represent 10570
words.
We obtained 2023 expression occurrences among
which 1151 were distinct.
Building the Gold Standard
We asked a human evaluator to analyze each
expression and make the following decisions:
Does the annotated expression represent an
understandable named entity or topic?
Is the expression a relevant keyword according
to the document?
Is the expression a relevant named entity
according to the document?
Distribution of Expressions
Note that only 639 out of 1151 detected expressions
are understandable (56%). Thus a substantial
number of detected expressions represents noise.
Frequency of detected Expressions
Most of the expressions were detected by only
one (76%) or two (16%) annotators.
Semantic annotators seemed complementary.
(326 relevant)
(117 relevant)
(40 relevant)
(15 relevant)
(4 relevant)
(4 relevant)
(1 relevant)
Evaluation
Standard precision, recall and F-score
“Partial” recall, since our Gold Standard
considers only the expressions detected by
at least one annotator, instead of considering
all possible expressions in the corpus.
All Expressions
mend
Named Entities
Domain Topics
Few Highlights
F-score is unsatisfactory for almost all annotators
Best F-score around 62-63 (All expressions, topics)
Alchemy seems to stand out
It obtains a high value for recall when all expressions
or only topics are considered
But the precision of Alchemy (59%) is not very
high, compared to the results of Zemanta
(81%), OpenCalais (75%) and Yahoo (74%)
None of the annotators was good at detecting relevant
named entities.
DBpedia Spotlight annotated many expressions that
are not considered understandable
Research Question 2
Given the complementarity of Semantic
Annotators, we decided to experiment with
this second research question:
Can we improve the overall performance
of semantic annotators using voting
methods and machine learning?
Voting and machine learning
methods
We experimented with the following
methods:
Simple votes
Weighted votes where the weight is the
precision of individual annotators on a
training corpus
K-nearest-neighbours classifier
Naïve Bayes classifier
Decision tree (C4.5)
Rule induction (PART algorithm)
Experiment Setup
We divided the set of annotated
expressions into 5 balanced partitions
One partition for testing
Remaining partitions for training
We ran 5 experiments for each of the
following situations: all entities, only
named entities and only topics
Baseline: Simple Votes
All Expressions:
Best prec : 77%
(Zemanta)
Best rec/F : 69% -
62% (Alchemy)
Best P: 81% (R/F
12%-21%)
(Zemanta)
Best R/F: 69%-63%
(Alchemy) P=59%
Threshold: Number of annotators
Weighted Votes
Best P: 81%
(Zemanta) (R/F
12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Machine Learning
Best P: 81% (Zemanta)
(R/F 12%-21%)
Best R/F: 69%-63%
(Alchemy) P=59%
Highlights
On the average, weighted vote displays the
best precision (0.66) among combination
methods if only topics are considered
Alchemy (0.59)
Alchemy finds many relevant expressions
that are not discovered by other annotators
Machine learning methods achieve better
recall and F-score on the average, especially
if decision tree or rule induction (PART) is
used
Conclusion
Can we use Linked Data Semantic
Annotators for the Extraction of Domain-
Relevant Expressions?
Best overall performance (Recall/F-Score:
Alchemy) (59%- 69%-63%)
Best precision: Zemanta (81%)
Need of filtering!
Combination/machine learning methods
Increased recall with weighted voting
scheme
Future Work
Limitations
The size of the gold standard
A limited set of semantic annotators
Increase the size of the GS
Explore various combinations for annotators/
features
Compare Semantic Annotators with keyword
extractors
QUESTIONS?
Thank you for your attention!
amal.zouaq@rmc.ca
http://azouaq.athabascau.ca

Weitere ähnliche Inhalte

Was ist angesagt?

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondPeter Geil
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemMark Cieliebak
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic ComputingMeena Nagarajan
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Khirulnizam Abd Rahman
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Povedasemanticsconference
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionAldo Gangemi
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingOntotext
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrasesCassandra Jacobs
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Andres Garcia-Silva
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010AAT Taiwan
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionsssw2012
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineTrey Grainger
 

Was ist angesagt? (19)

Ontology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyondOntology Engineering for the Semantic Web and beyond
Ontology Engineering for the Semantic Web and beyond
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
Can Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis ProblemCan Deep Learning solve the Sentiment Analysis Problem
Can Deep Learning solve the Sentiment Analysis Problem
 
Abstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF DatasetsAbstract Access Control Model for Dynamic RDF Datasets
Abstract Access Control Model for Dynamic RDF Datasets
 
Text Analytics for Semantic Computing
Text Analytics for Semantic ComputingText Analytics for Semantic Computing
Text Analytics for Semantic Computing
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - IntroductionOntology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
Ontology Design Patterns for Linked Data Tutorial at ISWC2016 - Introduction
 
Ontology engineering
Ontology engineering Ontology engineering
Ontology engineering
 
Best Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining ProcessingBest Practices for Large Scale Text Mining Processing
Best Practices for Large Scale Text Mining Processing
 
Using and learning phrases
Using and learning phrasesUsing and learning phrases
Using and learning phrases
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
 
Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010Teldap4 getty multilingual vocab workshop2010
Teldap4 getty multilingual vocab workshop2010
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introduction
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 

Ähnlich wie Zouaq wole2013

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisAditya Joshi
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needsIvan Berlocher
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRubén Izquierdo Beviá
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRubén Izquierdo Beviá
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveJames Hendler
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Quinsulon Israel
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppratnapatil14
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...Enrico Santus Aversano
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative researchGhulam Qambar
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Marcia Zeng
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsNaveen Ashish
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15Hans Höchtl
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Sunil Kumar Kopparapu
 

Ähnlich wie Zouaq wole2013 (20)

SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment Analysis
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
VOC real world enterprise needs
VOC real world enterprise needsVOC real world enterprise needs
VOC real world enterprise needs
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
Dissertation defense slides on "Semantic Analysis for Improved Multi-document...
 
Day2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt ppppppppppppppppppppppppppDay2-Slides.ppt pppppppppppppppppppppppppp
Day2-Slides.ppt pppppppppppppppppppppppppp
 
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
EVALution 1.0 - An Evolving Semantic Dataset for Trainining and Evaluation of...
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Themes identification techniques in qualitative research
Themes identification techniques in qualitative researchThemes identification techniques in qualitative research
Themes identification techniques in qualitative research
 
Ontology
OntologyOntology
Ontology
 
Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...Extending models for controlled vocabularies to classification systems: model...
Extending models for controlled vocabularies to classification systems: model...
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
Deep Machine Reading for Customer Analytics
Deep Machine Reading for Customer AnalyticsDeep Machine Reading for Customer Analytics
Deep Machine Reading for Customer Analytics
 
Knowledge Extraction
Knowledge ExtractionKnowledge Extraction
Knowledge Extraction
 
Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15
 
Eacl 2006 Pedersen
Eacl 2006 PedersenEacl 2006 Pedersen
Eacl 2006 Pedersen
 
Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.Do you Mean what you say? Recognizing Emotions.
Do you Mean what you say? Recognizing Emotions.
 

Kürzlich hochgeladen

MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxAneriPatwari
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Celine George
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptxAneriPatwari
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 

Kürzlich hochgeladen (20)

MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
CHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptxCHEST Proprioceptive neuromuscular facilitation.pptx
CHEST Proprioceptive neuromuscular facilitation.pptx
 
Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17Tree View Decoration Attribute in the Odoo 17
Tree View Decoration Attribute in the Odoo 17
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 

Zouaq wole2013

  • 1. Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Prof. Michel Gagnon, Ecole Polytechnique de Montréal, Canada Prof. Amal Zouaq, Royal Military College of Canada, Canada Dr. Ludovic Jean-louis, Ecole Polytechnique de Montréal, Canada
  • 2. Introduction What is Semantic Annotation on the LOD? Candidate Selection Disambiguation Traditionally focused on named entity annotation
  • 3. Semantic Annotators LOD-based Semantic Annotators Most used annotators REST API / Web Services Default configuration Alchemy (entities + keywords) YahooZemanta Wikimeta DBPedia Spotlight OpenCalais Lupedia
  • 4. Annotation Evaluation Few comparative studies have been published on the performance of linked data Semantic Annotators The available studies generally focus on “traditional” named entity annotation evaluation (ORG, Person, etc.)
  • 5. NEs versus domain topics Semantic Annotators start to go beyond these traditional NEs Predefined classes such as PERSON, ORG Expansion of possible classes (NERD Taxonomy : sport events, operating systems, political events, …)  RDF (NE? Domain topic ?)  Semantic Web (NE? Domain topic ?) Domain Topics: important “concepts” for the domain of interest
  • 6. NE versus Topics Annotation
  • 7. Research Question 1 Previous studies did not draw any conclusion about the relevance of the annotated entities But relevance might prove to be a crucial point for tasks such as information seeking or query answering RQ1: Can we use linked data-based semantic annotators to accurately identify Domain-relevant expressions?
  • 8. Research Methodology We relied on a corpus of 8 texts taken from Wikipedia, all related to the artificial intelligence domain. Together, these texts represent 10570 words. We obtained 2023 expression occurrences among which 1151 were distinct.
  • 9. Building the Gold Standard We asked a human evaluator to analyze each expression and make the following decisions: Does the annotated expression represent an understandable named entity or topic? Is the expression a relevant keyword according to the document? Is the expression a relevant named entity according to the document?
  • 10. Distribution of Expressions Note that only 639 out of 1151 detected expressions are understandable (56%). Thus a substantial number of detected expressions represents noise.
  • 11. Frequency of detected Expressions Most of the expressions were detected by only one (76%) or two (16%) annotators. Semantic annotators seemed complementary. (326 relevant) (117 relevant) (40 relevant) (15 relevant) (4 relevant) (4 relevant) (1 relevant)
  • 12. Evaluation Standard precision, recall and F-score “Partial” recall, since our Gold Standard considers only the expressions detected by at least one annotator, instead of considering all possible expressions in the corpus.
  • 16. Few Highlights F-score is unsatisfactory for almost all annotators Best F-score around 62-63 (All expressions, topics) Alchemy seems to stand out It obtains a high value for recall when all expressions or only topics are considered But the precision of Alchemy (59%) is not very high, compared to the results of Zemanta (81%), OpenCalais (75%) and Yahoo (74%) None of the annotators was good at detecting relevant named entities. DBpedia Spotlight annotated many expressions that are not considered understandable
  • 17. Research Question 2 Given the complementarity of Semantic Annotators, we decided to experiment with this second research question: Can we improve the overall performance of semantic annotators using voting methods and machine learning?
  • 18. Voting and machine learning methods We experimented with the following methods: Simple votes Weighted votes where the weight is the precision of individual annotators on a training corpus K-nearest-neighbours classifier Naïve Bayes classifier Decision tree (C4.5) Rule induction (PART algorithm)
  • 19. Experiment Setup We divided the set of annotated expressions into 5 balanced partitions One partition for testing Remaining partitions for training We ran 5 experiments for each of the following situations: all entities, only named entities and only topics
  • 20. Baseline: Simple Votes All Expressions: Best prec : 77% (Zemanta) Best rec/F : 69% - 62% (Alchemy) Best P: 81% (R/F 12%-21%) (Zemanta) Best R/F: 69%-63% (Alchemy) P=59% Threshold: Number of annotators
  • 21. Weighted Votes Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 22. Machine Learning Best P: 81% (Zemanta) (R/F 12%-21%) Best R/F: 69%-63% (Alchemy) P=59%
  • 23. Highlights On the average, weighted vote displays the best precision (0.66) among combination methods if only topics are considered Alchemy (0.59) Alchemy finds many relevant expressions that are not discovered by other annotators Machine learning methods achieve better recall and F-score on the average, especially if decision tree or rule induction (PART) is used
  • 24. Conclusion Can we use Linked Data Semantic Annotators for the Extraction of Domain- Relevant Expressions? Best overall performance (Recall/F-Score: Alchemy) (59%- 69%-63%) Best precision: Zemanta (81%) Need of filtering! Combination/machine learning methods Increased recall with weighted voting scheme
  • 25. Future Work Limitations The size of the gold standard A limited set of semantic annotators Increase the size of the GS Explore various combinations for annotators/ features Compare Semantic Annotators with keyword extractors
  • 26. QUESTIONS? Thank you for your attention! amal.zouaq@rmc.ca http://azouaq.athabascau.ca