Multilingual Semantic Annotation Engine for Agricultural Documents
Benjamin Chu Min Xian
Arun Anand Sadanandan
Fadzly Zahari
Dickson Lukose
04.09.2012
International Symposium on Agricultural Ontology Service (AOS2012)
Outline
   Introduction
   Related Work
   System Description: Text Annotation Engine
   Challenges
   Conclusion




Introduction




Related Work
• Semantic annotation techniques are
  typically categorized as pattern-based
  or machine learning-based
• Most annotation tools can handle only
  a single language
• They are not easily customized to work
  across different domains



Text Annotation Engine (T-ANNE) [1]
• Semantic tagging system
    – Semantic web of tags
• Knowledge base approach
• Scalable system
    – Handles large sets of documents
    – Web services
• Distributed approach
    – Document Splitter
• Multilingual tagging
    – Language identifier
 [1] Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition
 (2012). (Patent Pending)
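
The scalable, distributed design listed on this slide (a document splitter feeding independent taggers) could be sketched as follows. This is a minimal illustration under stated assumptions, not T-ANNE's implementation: `split_document`, `tag_chunk`, `tag_document`, the `VOCAB` set, and the chunk size are all hypothetical, and a local thread pool stands in for the web services the slide mentions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy vocabulary; a real deployment would query a knowledge base.
VOCAB = {"maize", "rice", "wheat"}

def split_document(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def tag_chunk(chunk):
    """Return the vocabulary terms found in one chunk."""
    return {w.strip(".,;:").lower() for w in chunk.split()} & VOCAB

def tag_document(text, max_words=50, workers=4):
    """Tag each chunk independently, then merge the per-chunk results."""
    chunks = split_document(text, max_words)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(tag_chunk, chunks)
    tags = set()
    for chunk_tags in results:
        tags |= chunk_tags
    return tags
```

Splitting on a fixed word count can cut a multi-word term in two at a chunk boundary, so a practical splitter would more likely break on sentence or paragraph boundaries.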
Text Annotation Engine (T-ANNE)
Multilingual Semantic Annotation System Overview
Text Annotation Engine (T-ANNE)

[Figure: system overview. Labels: Documents; Semantic Annotation Engine (T-ANNE); AGROVOC Knowledge Base; Semantic Annotations; Tags; Knowledge Base]
Text Annotation Engine (T-ANNE)
Example (Japanese)

[Figure: Japanese-language example. Labels: Semantic Annotation Engine (T-ANNE); AGROVOC Knowledge Base; Tags; Knowledge Base]
Text Annotation Engine (T-ANNE)
• Knowledge-based approach
  • The number of languages and domains it can
    handle is limited only by the knowledge base
    it uses
  • Easily customized
  • Uses AGROVOC as the knowledge base
    for recognizing and annotating
    agriculture-related documents
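
To illustrate why a knowledge-based tagger is limited only by its knowledge base, here is a minimal dictionary-lookup sketch. It is an assumption-laden toy, not the T-ANNE method: the `KNOWLEDGE_BASE` structure, the concept IDs, and the `build_index` and `annotate` functions are invented for illustration and do not reflect real AGROVOC data or identifiers.

```python
import re

# Hypothetical miniature knowledge base: concept ID -> labels per language.
# The IDs and entries are invented for illustration, not real AGROVOC data.
KNOWLEDGE_BASE = {
    "c_maize": {"en": ["maize", "corn"], "fr": ["maïs"]},
    "c_rice": {"en": ["rice"], "fr": ["riz"]},
}

def build_index(kb, lang):
    """Map each lowercase label in the given language to its concept ID."""
    index = {}
    for concept_id, labels in kb.items():
        for label in labels.get(lang, []):
            index[label.lower()] = concept_id
    return index

def annotate(text, kb, lang):
    """Return (matched text, concept ID, start offset) for each label found."""
    index = build_index(kb, lang)
    annotations = []
    for label, concept_id in index.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((m.group(0), concept_id, m.start()))
    return sorted(annotations, key=lambda a: a[2])
```

Passing a different `kb` or `lang` changes the domain or language without touching the matching code. The word-boundary pattern assumes whitespace-delimited text, so languages written without spaces, such as Japanese, would need a different matching strategy.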



Text Annotation Engine (T-ANNE)
• Multilingual capability
  • Automatically determines the language of the text
  • AGROVOC: a multilingual thesaurus with more than
    40,000 concepts in up to 22 languages
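
Automatic language detection can be approximated in a few lines. The sketch below is a deliberately simple heuristic, not the language identifier used in T-ANNE: it checks for Japanese kana by Unicode range and otherwise counts stopwords. The `STOPWORDS` lists and the `contains_kana` and `identify_language` functions are hypothetical.

```python
# Hypothetical stopword lists; real language identifiers use far richer models.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in"},
    "fr": {"le", "la", "et", "les", "des"},
    "es": {"el", "la", "y", "los", "de"},
}

def contains_kana(text):
    """True if the text contains hiragana or katakana characters."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)

def identify_language(text, default="en"):
    """Guess a language code from script and stopword counts."""
    if contains_kana(text):
        return "ja"
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

The detected code would then select which language's labels to match against in the knowledge base.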




Challenges
1. Ambiguity
2. Morphological Variations
3. Detail / Granularity Level




Challenges
1. Ambiguity

 “They performed Kashmir, written by Page and Plant. Page played unusual chords on
 his Gibson.”

    Kashmir: a song or the Himalayan region?
    Page: guitarist “Jimmy Page” or the Google founder “Larry Page”?
    Gibson: guitar brand or actor “Mel Gibson”?
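
One common way to approach this kind of ambiguity is to compare the words around a mention with words typical of each candidate sense. The sketch below is a simplified overlap heuristic in the spirit of the Lesk algorithm. It is not part of T-ANNE (the deck lists disambiguation as future work), and the `SENSES` table with its hand-picked context words is entirely hypothetical.

```python
# Hypothetical candidate senses with hand-picked context words.
SENSES = {
    "Kashmir": {
        "Kashmir (song)": {"performed", "written", "chords", "band"},
        "Kashmir (region)": {"himalayan", "valley", "border", "mountains"},
    },
    "Gibson": {
        "Gibson (guitar brand)": {"played", "chords", "guitar", "strings"},
        "Mel Gibson (actor)": {"film", "actor", "starred", "director"},
    },
}

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return {w.strip(".,;:\"'").lower() for w in text.split()}

def disambiguate(mention, context):
    """Pick the sense whose context words overlap most with the sentence."""
    words = tokenize(context)
    candidates = SENSES[mention]
    return max(candidates, key=lambda sense: len(candidates[sense] & words))
```

On the example sentence from this slide, the words "performed", "written", and "chords" favor the song reading of Kashmir, while "played" and "chords" favor the guitar brand for Gibson.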




Challenges
2. Morphological Variations

The same concept can appear in different surface forms because of:
    Plurals
    Acronyms / Abbreviations
    Different Spellings
    Compound Words
    Language
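
Several of these variations can be reduced by normalizing a term before it is looked up. The following is a minimal sketch, assuming English input. The `VARIANTS` table, the naive `singularize` rule, and the `normalize` function are hypothetical, and a production system would more likely rely on a proper stemmer or lemmatizer and on the alternative labels stored in the knowledge base.

```python
# Hypothetical variant table: acronyms and alternate spellings -> preferred label.
VARIANTS = {
    "gmo": "genetically modified organism",
    "fertiliser": "fertilizer",
    "corn": "maize",
}

def singularize(word):
    """Very naive English plural stripping; real systems use a stemmer or lemmatizer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def normalize(term):
    """Map a surface form to a canonical label for knowledge-base lookup."""
    # Lowercase, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(term.lower().replace("-", " ").split())
    key = VARIANTS.get(key, key)
    words = key.split()
    if words:
        words[-1] = singularize(words[-1])
    key = " ".join(words)
    # Look up again in case only the singular form is a known variant.
    return VARIANTS.get(key, key)
```

With this sketch, "Fertilisers", "GMOs", and "Waxy-Maize" normalize to "fertilizer", "genetically modified organism", and "waxy maize" respectively. Cross-language variation is not handled here and would depend on the multilingual labels in the knowledge base.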




Challenges
3. Detail / Granularity Level

 • Some annotation systems issue more generic tags, while
   others issue more specific tags.

 • For example, a general tag such as ‘Cereals’ versus a specific
   tag such as ‘Waxy maize’.

 • Whether the system should return coarse-grained or fine-grained
   annotation tags depends on how the results will be used, so it is
   important to choose the right granularity (detail) level.
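
If the knowledge base records broader-term links, the granularity of a tag can be adjusted by walking up the hierarchy. The sketch below assumes such links exist. The `BROADER` table is a hypothetical hierarchy built around the slide's example and is not taken from AGROVOC, and `ancestors` and `tag_at_level` are illustrative names.

```python
# Hypothetical broader-term links, narrow -> broad; not taken from AGROVOC.
BROADER = {
    "Waxy maize": "Maize",
    "Maize": "Cereals",
    "Cereals": "Plant products",
}

def ancestors(concept):
    """Return the chain from the concept up to its most general ancestor."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain

def tag_at_level(concept, levels_up):
    """Generalize a concept by climbing at most levels_up broader-term links."""
    chain = ancestors(concept)
    return chain[min(levels_up, len(chain) - 1)]
```

Calling `tag_at_level("Waxy maize", 0)` keeps the fine-grained tag, while a larger `levels_up` returns a coarser one, so the caller chooses the detail level that suits the application.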


Conclusions
 • The annotation engine uses a knowledge-based approach
   to perform concept and entity recognition

 • The application domains and the number of languages it can
   handle depend on the knowledge base used for
   recognition

 • Future work: address the challenges (entity resolution,
   disambiguation)



