Presentation held by Benjamin Chu Min Xian, Arun Anand Sadanandan, Fadzly Zahari, Dickson Lukose at the Agricultural Ontology Service (AOS) Workshop 2012 in Kuching, Sarawak, Malaysia, September 3-4, 2012
Multilingual Semantic Annotation Engine for Agricultural Documents
1. Multilingual Semantic Annotation Engine for Agricultural Documents
Benjamin Chu Min Xian
Arun Anand Sadanandan
Fadzly Zahari
Dickson Lukose
04.09.2012
International Symposium on Agricultural
Ontology Service (AOS2012)
2. Outline
Introduction
Related Work
System Description: Text Annotation Engine
Challenges
Conclusion
4. Related Work
• Semantic annotation techniques are typically categorized as pattern-based or machine-learning-based
• Most annotation tools can handle only a single language
• Most are not easily customized to work across different domains
5. Text Annotation Engine (T-ANNE1)
• Semantic tagging system
– Semantic web of tags
• Knowledge base approach
• Scalable system
– Handles large sets of documents
– Web services
• Distributed approach
– Document Splitter
• Multilingual tagging
– Language identifier
1. Chu, M.X., Bahls, D., Lukose, D.: A System and Method for Concept and Named Entity Recognition (2012). (Patent Pending)
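The pipeline stages listed above (language identifier, document splitter, knowledge-base tagging) can be sketched roughly as below. This is an illustrative assumption of how such a pipeline fits together, not the patented system; the toy knowledge base, concept IDs, and function names are all hypothetical.

```python
# Illustrative sketch of the T-ANNE stages: language identification,
# document splitting for scalability, and knowledge-base tag lookup.
# TOY_KB and all concept IDs below are invented for demonstration.

TOY_KB = {  # language -> {thesaurus label: concept ID}
    "en": {"maize": "c_12332", "rice": "c_6599"},
    "ja": {"トウモロコシ": "c_12332", "米": "c_6599"},
}

def identify_language(text: str) -> str:
    """Crude stand-in for a language identifier: non-ASCII -> Japanese."""
    return "ja" if any(ord(ch) > 127 for ch in text) else "en"

def split_document(text: str, chunk_size: int = 1000) -> list[str]:
    """Split a large document into chunks that could be tagged in parallel."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def annotate(chunk: str, lang: str) -> list[tuple[str, str]]:
    """Tag every known thesaurus term found in the chunk with its concept ID."""
    return [(term, cid) for term, cid in TOY_KB[lang].items() if term in chunk]

def t_anne(text: str) -> list[tuple[str, str]]:
    """Run the full sketch pipeline: identify language, split, annotate."""
    lang = identify_language(text)
    tags: list[tuple[str, str]] = []
    for chunk in split_document(text):
        tags.extend(annotate(chunk, lang))
    return tags
```

For example, `t_anne("Farmers plant maize and rice.")` tags both crop terms with their (hypothetical) concept IDs.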
7. Text Annotation Engine (T-ANNE)
[Diagram: documents enter the Semantic Annotation Engine (T-ANNE), which consults the AGROVOC knowledge base and outputs semantic annotation tags.]
8. Text Annotation Engine (T-ANNE)
Example (Japanese)
[Diagram: a Japanese document is processed by the Semantic Annotation Engine (T-ANNE) against the AGROVOC knowledge base, producing tags.]
9. Text Annotation Engine (T-ANNE)
• Knowledge-based approach
• The number of languages and domains it can handle is limited only by the knowledge base it uses
• Easily customized
• Utilizes AGROVOC as the knowledge base for recognizing and annotating agriculture-related documents
10. Text Annotation Engine (T-ANNE)
• Multilingual capability
• Automatically determines the language of the text
• AGROVOC – a multilingual thesaurus with more than 40,000 concepts in up to 22 languages
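The multilingual capability rests on the thesaurus mapping labels in many languages to the same concept. A minimal sketch of that idea, with an invented concept ID and label set (real AGROVOC uses SKOS concept URIs and `prefLabel`/`altLabel` properties):

```python
# Sketch of how a multilingual thesaurus like AGROVOC lets one concept be
# recognized under labels in many languages. The concept ID and labels are
# illustrative, not real AGROVOC data.

THESAURUS = {
    "c_12332": {"en": "maize", "fr": "maïs", "ja": "トウモロコシ"},
}

# Invert to a label -> concept index so annotation is one lookup per term,
# regardless of which language the label comes from.
LABEL_INDEX = {
    label: cid
    for cid, labels in THESAURUS.items()
    for label in labels.values()
}
```

With this index, looking up "maïs" or "トウモロコシ" returns the same concept as "maize", which is what makes cross-language annotation against a single knowledge base possible.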
12. Challenges
1. Ambiguity
“They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson.”
Kashmir: a song or the Himalayan region?
Page: guitarist “Jimmy Page” or the Google founder “Larry Page”?
Gibson: the guitar brand or the actor “Mel Gibson”?
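The slides list disambiguation as an open challenge rather than a solved one; one common technique (not claimed by the presentation) is to score each candidate sense by word overlap with the surrounding sentence, in the spirit of the Lesk algorithm. The sense inventories below are invented for illustration:

```python
# Toy context-overlap disambiguation: pick the candidate sense whose
# associated words overlap most with the sentence. Sense data is invented.

SENSES = {
    "Page": {
        "Jimmy Page": {"guitarist", "led", "zeppelin", "chords", "gibson"},
        "Larry Page": {"google", "founder", "search", "internet"},
    },
}

def disambiguate(term: str, sentence: str) -> str:
    """Return the candidate sense sharing the most words with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    return max(SENSES[term], key=lambda sense: len(SENSES[term][sense] & context))
```

Given the example sentence, the words "chords" and "gibson" pull the score toward the guitarist reading.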
14. Challenges
3. Detail / Granularity Level
Some annotation systems issue more generic tags while others issue more specific ones.
For example, a general tag such as ‘Cereals’ in contrast to a specific tag such as ‘Waxy maize’.
Whether the system should return coarse-grained or fine-grained annotation tags depends on the actual need for the results; choosing the right granularity (detail) level is important.
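Since a SKOS thesaurus such as AGROVOC records broader/narrower relations between concepts, one simple way to control granularity is to climb a tag's broader-term chain to the desired level. The hierarchy below is an illustrative assumption, not actual AGROVOC structure:

```python
# Sketch of granularity control: generalize a specific tag by walking its
# broader-term chain a given number of levels. Hierarchy is invented.

BROADER = {  # child concept -> parent concept
    "Waxy maize": "Maize",
    "Maize": "Cereals",
    "Cereals": "Plant products",
}

def generalize(concept: str, levels: int) -> str:
    """Climb `levels` broader-term links, stopping at the hierarchy root."""
    for _ in range(levels):
        if concept not in BROADER:
            break
        concept = BROADER[concept]
    return concept
```

So a fine-grained tag like 'Waxy maize' can be reported as-is, or generalized two levels to the coarse tag 'Cereals', depending on what the consumer of the annotations needs.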
15. Conclusions
The annotation engine uses a knowledge-based approach that performs concept and named entity recognition.
The application domains and the number of languages it can handle rely on the knowledge base used for recognition.
Future work: address the remaining challenges (Entity Resolution, Disambiguation).