Application of Ontology in Semantic Information Retrieval
by Prof Shahrul Azman from FSTM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
1. Application of Ontology in Semantic Information Retrieval
Presentation for MyREN Seminar
Berjaya Hotel, Kuala Lumpur
27 November 2014
2. Brief speaker’s info
Shahrul Azman Mohd. Noah, Ph.D.
Knowledge Technology Research Group
Center for AI Technology (CAIT)
shahrul@ukm.edu.my
Graduated in BSc (Mathematics) from UKM
Graduated in MSc (IS) from Sheffield U.
Graduated in PhD (IS) from Sheffield U., in knowledge-based systems
From Muar, Johor
4. What is ontology?
• Ontology may be considered a method for representing knowledge.
• It originates from a philosophical discipline, the science of “what is”: the kinds and structures of objects, properties, events, processes and relations in every area of reality.
• Aristotle's classification of animals is one of the first ontologies developed.
5. Ontology in Computing
• An ontology is an engineering artifact:
– It is constituted by a specific vocabulary used to describe a certain reality, plus
– A set of explicit assumptions regarding the intended meaning of the vocabulary.
• Thus, an ontology describes a formal specification of a certain domain:
– Shared understanding of a domain of interest
– Formal and machine-manipulable model of a domain of interest
6. Ontology Definition
“Formal, explicit specification of a shared conceptualization” [Gruber93]
• Formal: machine-readability with computational semantics
• Explicit: unambiguous terminology definitions
• Shared: commonly accepted understanding
• Conceptualization: conceptual model of a domain (ontological theory)
7. An ontology is… (arranged in the figure by increasing complexity)
• a catalog
• a set of text files
• a glossary
• a thesaurus
• a collection of taxonomies
• a collection of frames
• a set of general logical constraints
Source: Smith & Welty (2001)
8. Various approaches to classify ontologies
• Classify ontologies according to the information the ontology needs to express and the richness of its internal structure (Lassila & McGuinness, 2001)
• Classify along two orthogonal dimensions: the amount and type of structure, and the subject (Van Heijst et al., 1997)
• Classify ontologies according to their level of dependence on a particular task (Guarino, 1998)
9. Ontology language
• Ontology languages are formal languages used to construct ontologies
– allow the encoding of knowledge about specific domains and often
– include reasoning rules that support the processing of that knowledge
• Various languages have been proposed: CycL, KL-One, Ontolingua, F-Logic, OCML, LOOM, Telos, RDF(S), OIL, DAML+OIL, XOL, SHOE, OWL etc.
• Usually based on Description Logic (DL).
• Summarised as (Kalibatiene & Vasilecas, 2011):
10. Example of ontologies
• Top-level ontology: Suggested Upper Merged Ontology (SUMO)
11. [Figure: portion of the SUMO ontology with USGS geo-concepts inserted]
17. Concepts
• “Information retrieval (IR) is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
• Applications of IR: recommendations, Q&A, filtering… and of course searching.
18. Issues in IR
• Some issues in IR:
– Relevance
– Evaluation
– Users and information needs
• Context-based search
• Semantic search
• Etc.
21. Ontology and semantic search
• Various ways to support semantic search:
– Query expansion: the user's query is expanded with related terms
– Disambiguation: resolving terms or concepts that refer to more than one topic
– Classifying: documents such as ads are classified into ontological topics to support semantic search
– Enhanced IR model: an ontology is embedded into an existing IR model, resulting in a modified IR model
22. Query Expansion
• Query expansion (QE) is needed due to the ambiguity of natural language.
• Main aim of QE: to add new meaningful terms to the initial query.
Bhogal, J., Macfarlane, A. & Smith, A. 2007. A review of ontology based query expansion. Information Processing and Management, 43: 866-886.
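As an illustration of query expansion, here is a minimal Python sketch that adds related terms from a toy ontology to the initial query. The `TOY_ONTOLOGY` data and the `expand_query` helper are hypothetical and not taken from the talk; a real system would draw the related terms from an actual ontology or lexical resource.

```python
# Toy ontology (hypothetical data): each term maps to synonyms and broader terms.
TOY_ONTOLOGY = {
    "car": {"synonyms": ["automobile"], "broader": ["vehicle"]},
    "theft": {"synonyms": ["stealing"], "broader": ["crime"]},
}


def expand_query(query, ontology=TOY_ONTOLOGY, use_broader=True):
    """Return the query terms followed by related ontology terms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        entry = ontology.get(term)
        if entry is None:
            continue  # term not covered by the ontology: keep it unchanged
        related = list(entry["synonyms"])
        if use_broader:
            related += entry["broader"]
        for candidate in related:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("car theft"))
# → ['car', 'theft', 'automobile', 'vehicle', 'stealing', 'crime']
```

Whether to include broader terms is a design choice: they tend to raise recall at the cost of precision, so the sketch exposes it as a flag.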
24. Semantic index
• Textual documents are indexed according to some ontology model.
• Remember the concept of vocabulary in IR?
[Diagram: computer science collection → (Extract, Indexing) → index terms or vocabulary of the collection: architecture, bus, computer, database, …, xml]
25. Semantic index
• Textual documents are indexed according to some ontology model.
• Remember the concept of vocabulary in IR?
• Replace the index with an ontological index.
[Diagram: computer science collection → (Extract, Indexing) → architecture, bus, computer, database, …, xml]
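The idea of replacing a term index with an ontological index can be sketched in Python as follows. The `TERM_TO_CONCEPT` mapping and the `cs:` concept names are hypothetical; in practice the mapping would come from annotating the collection against a real ontology.

```python
from collections import defaultdict

# Hypothetical mapping from surface terms to ontology concepts.
TERM_TO_CONCEPT = {
    "database": "cs:Database",
    "dbms": "cs:Database",
    "xml": "cs:MarkupLanguage",
    "bus": "cs:ComputerArchitecture",
    "architecture": "cs:ComputerArchitecture",
}


def build_indexes(docs):
    """Build a conventional term index and a concept (ontological) index.

    docs maps a document id to its text; each index maps a key to the
    sorted list of ids of the documents that contain it.
    """
    term_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term_index[token].add(doc_id)
            concept = TERM_TO_CONCEPT.get(token)
            if concept is not None:
                concept_index[concept].add(doc_id)

    def as_sorted(index):
        return {key: sorted(ids) for key, ids in index.items()}

    return as_sorted(term_index), as_sorted(concept_index)


docs = {"d1": "database architecture", "d2": "dbms and xml", "d3": "bus design"}
term_index, concept_index = build_indexes(docs)
print(term_index["dbms"])            # only d2 contains the literal term
print(concept_index["cs:Database"])  # d1 and d2 share the concept
```

The concept index groups documents that use different words for the same concept, which a purely term-based vocabulary cannot do.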
26. Examples
• Three research projects that illustrate the applications of ontology-based IR:
– Semantic digital library
– Crime news retrieval
– Multi-modality ontology-based image retrieval
27. Semantic digital library
• Proposed an approach for managing, organizing and populating an ontology for document collections in a digital library.
• The document metadata and content are populated into a knowledge base, which allows sophisticated querying and searching.
• Proposed an ontology-based information retrieval model, built on the classic vector space model, that includes document annotation, instance-based weighting and concept-based ranking.
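Concept-based ranking in a vector space model can be sketched as cosine similarity between a query vector and document vectors whose dimensions are ontology instances rather than plain words. This is a generic illustration, not the project's actual model; the weights and the instance names used as dimensions are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by decreasing similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical instance weights per document (dimensions are ontology instances).
doc_vecs = {
    "Doc1": {"Supervisor1": 0.45, "Student1": 0.11},
    "Doc2": {"Student2": 0.30},
}
query_vec = {"Supervisor1": 1.0}  # the query was annotated with one instance

ranking = rank(query_vec, doc_vecs)
print(ranking[0][0])  # → Doc1
```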
29. Semantic digital library
• Involved three ontologies: ACM topic hierarchies, a Geo ontology and Dublin Core metadata
• Portion of the domain ontology, focusing on academic theses
32. VSM Index
<!-- Create class Person -->
<!-- Create instances of class Student -->
<Student rdf:ID="Student1">
  <rdfs:label>Arifah Alhadi</rdfs:label>
</Student>
<Student rdf:ID="Student2">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Asyraf Arifin</rdfs:label>
</Student>
<!-- Create instances of class Supervisor -->
<Supervisor rdf:ID="Supervisor1">
  <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
  <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
</Supervisor>
<Supervisor rdf:ID="Supervisor2">
  <rdfs:label>Prof Aziz Deraman</rdfs:label>
</Supervisor>
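To show how instance declarations of this kind could be read programmatically, here is a small Python sketch using only the standard library. The enclosing `rdf:RDF` element, the `THESIS_NS` default namespace and the `read_instances` helper are assumptions added for the example; the slide shows only the inner fragments.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
THESIS_NS = "http://ukm.my/thesis#"  # hypothetical default namespace

# The rdf:RDF wrapper is an assumption; the slide shows only the inner elements.
RDF_XML = f"""<rdf:RDF xmlns="{THESIS_NS}"
         xmlns:rdf="{RDF_NS}"
         xmlns:rdfs="{RDFS_NS}">
  <Student rdf:ID="Student1">
    <rdfs:label>Arifah Alhadi</rdfs:label>
  </Student>
  <Supervisor rdf:ID="Supervisor1">
    <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
    <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
  </Supervisor>
</rdf:RDF>"""


def read_instances(rdf_xml):
    """Return (class name, instance id, labels) for each top-level instance."""
    root = ET.fromstring(rdf_xml)
    instances = []
    for node in root:
        class_name = node.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        instance_id = node.get(f"{{{RDF_NS}}}ID")
        labels = [label.text for label in node.findall(f"{{{RDFS_NS}}}label")]
        instances.append((class_name, instance_id, labels))
    return instances


for class_name, instance_id, labels in read_instances(RDF_XML):
    print(class_name, instance_id, labels)
```

Multiple labels on one instance, as with Supervisor1, let different surface forms of a name resolve to the same instance during annotation.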
Concept | Instance | Documents
http://www.ukm.my/thesis/supervisor#, http://www.ukm.my/thesis/person# | Supervisor1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student2 | Doc1
Id | Term | TFIDF | Frq | Doc Id
1 | Arifah Alhadi | 0.11 | 2 | Doc1
2 | Asyraf Arifin | 0.123 | 1 | Doc1
3 | PM Dr Shahrul Azman | 0.45 | 1 | Doc1
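Index rows of this shape (term, TF-IDF weight, frequency, document id) could be produced along the following lines. This sketch uses the plain tf × log(N/df) weighting as an assumption; the slides do not state which weighting variant produced the values shown, so the numbers here will not match them. The annotation data and the `tfidf_rows` helper are hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(annotations):
    """annotations maps a doc id to the list of instance labels found in it.

    Returns rows of (term, tfidf, frequency, doc id).
    """
    n_docs = len(annotations)
    doc_freq = Counter()
    for labels in annotations.values():
        doc_freq.update(set(labels))  # count each label once per document
    rows = []
    for doc_id, labels in annotations.items():
        counts = Counter(labels)
        for term, freq in counts.items():
            tf = freq / len(labels)
            idf = math.log(n_docs / doc_freq[term])
            rows.append((term, round(tf * idf, 3), freq, doc_id))
    return rows


# Hypothetical annotation output for two documents.
annotations = {
    "Doc1": ["Arifah Alhadi", "Arifah Alhadi", "Asyraf Arifin", "PM Dr Shahrul Azman"],
    "Doc2": ["PM Dr Shahrul Azman"],
}
rows = tfidf_rows(annotations)
for row in rows:
    print(row)
```

A label that occurs in every document gets weight 0 under this variant, which is why smoothed IDF formulas are often preferred in practice.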
33. Ontology-based IR for crime news retrieval
• Each crime news item must be classified into categories: Traffic Violation, Theft, Sex Crime, Murder, Kidnap, Fraud, Drugs, Cybercrime, Arson and Gang (Chen et al. 2004)
• Useful entities need to be identified: Person, Location, Organisation, Date/Time, Weapon, Amount, Vehicle, Drug, Personal properties, and Age.
• Clustering of crime news into topics, e.g. the Nurin Jazlin murder, Canny Ong, Sosilawati, etc.
• Clustering of each specific topic into various chronological events.
• Mapping of named entities into a news ontology to support semantic querying and retrieval.
34. Example
[Figure: organization of crime news]
• Classification (categories): Murder, Kidnap, Theft, Gang
• Cluster into topics: Nurin Jazlin, Sosilawati, Canny Ong
• Clustering (events within the Canny Ong topic):
– Investigation into the Canny Ong case, including medical report and trial
– Evidence/suspect in the Canny Ong case
– DNA test
– Family reaction to the Canny Ong case and negligence suit
– Court sentence, guilty plea
• Counts shown in the figure: (17), (6), (3), (9), (13), …
35. Required methods
• In order to support the aforementioned requirements:
– Conventional text processing: tokenizing, indexing, stopping, stemming, etc.
– Named entity recognition (NER)
– Classification and clustering
– Ontology mapping
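The conventional text-processing steps listed above (tokenizing, stopping, stemming) can be sketched in a few lines of Python. The stop-word list is a tiny illustrative sample, and `naive_stem` is a crude suffix stripper used as a stand-in for a real stemmer such as Porter's; neither is taken from the project.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "was", "were", "by", "to"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The suspects were arrested by police in Kuala Lumpur"))
# → ['suspect', 'arrest', 'police', 'kuala', 'lumpur']
```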
37. Document representation
• Documents are represented in meaningful forms:
– BoW: Bag of Words
– Named Entity Recognition: uses GATE ANNIE and JAPE rules
– Adopts the Vector Space Model (VSM), enhanced with an ontological model
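A document representation that combines a bag of words with named entities might look like the sketch below. The gazetteer lookup is only a simple stand-in for the GATE ANNIE and JAPE rules mentioned above, and the `GAZETTEER` entries and the `represent` helper are hypothetical.

```python
from collections import Counter

# Hypothetical gazetteer: surface phrase -> entity type.
GAZETTEER = {
    "kuala lumpur": "Location",
    "canny ong": "Person",
    "knife": "Weapon",
}


def represent(text):
    """Return (bag of words, typed entity counts) for one document."""
    lowered = text.lower()
    bag = Counter(lowered.split())
    entities = Counter()
    for phrase, entity_type in GAZETTEER.items():
        hits = lowered.count(phrase)  # plain substring match, for illustration only
        if hits:
            entities[f"{entity_type}:{phrase}"] = hits
    return bag, entities


bag, entities = represent("Canny Ong case heard in Kuala Lumpur court")
print(dict(entities))
```

The typed entity keys can then serve as extra dimensions next to the word dimensions in a vector space model.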
39. Document organization
• Documents need to be organised into categories, topics and events.
– Classification: AdaBoost algorithm
– Clustering: KNN clustering
– Ontology mapping: we have developed a crime news ontology by extending the existing SNaP ontology. It includes classes/entities that are important to crime, such as classification of crimes, location and weapon.
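For the classification step, a compact sketch of discrete AdaBoost over keyword-presence decision stumps is given below. It illustrates the general algorithm only; the training data, the vocabulary and the two-class setup (+1 for Murder, -1 for Theft) are hypothetical and are not the project's actual features or configuration.

```python
import math


def train_adaboost(samples, labels, vocabulary, rounds=3):
    """samples: list of word sets; labels: +1 or -1.

    Each weak learner is a stump (word, sign) predicting `sign` when the
    word is present and `-sign` otherwise. Returns [(alpha, word, sign)].
    """
    n = len(samples)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for word in vocabulary:
            for sign in (1, -1):
                preds = [sign if word in s else -sign for s in samples]
                error = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or error < best[0]:
                    best = (error, word, sign, preds)
        error, word, sign, preds = best
        error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, word, sign))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, sample):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * (sign if word in sample else -sign) for alpha, word, sign in model)
    return 1 if score >= 0 else -1


# Hypothetical training data: +1 = Murder, -1 = Theft.
samples = [{"stabbed", "victim"}, {"killed", "knife"}, {"stolen", "car"}, {"stolen", "wallet"}]
labels = [1, 1, -1, -1]
model = train_adaboost(samples, labels, vocabulary=["stolen", "killed", "knife"])
print(predict(model, {"stolen", "bike"}), predict(model, {"killed"}))
```

Handling the ten crime categories would need a multi-class extension, for example one such binary classifier per category.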
43. Ontology-based Image Retrieval
• Rapid growth of visual information (VI) leads to difficulty in finding and accessing VI.
• Inability to capture the semantic content.
• The problem arises from a lack of coincidence between the information extracted from VI and user needs.
• Conventional approaches to image retrieval (IMR), text-based (TBIR) and content-based (CBIR), have reached their limit in attempting to solve this problem.
• As a result, the semantic-based (SBIR) approach emerged: ontology-based methods provide explicit, domain-oriented semantics for concepts and relationships.
44. Ontology-based Image Retrieval
• Illustrate how images are described based on their visual, textual and domain semantic features.
• Proposed a multi-modality ontology: visual ontology, textual ontology and domain ontology.
• Illustrate how such an ontology can be integrated with an open-source knowledge base (DBpedia) to support a more comprehensive search.
48. Conclusion - Practical implementation of ontology-based IR
[Diagram: architecture with components Ontology (TBox, ABox), Documents, Index, Extraction, Population, Annotation and Query Processing; labelled flows: build, query, ranked docs]
49. Research issues
• Index representation: most are still based on the conventional VSM.
• Ranking: weighting and ranking mechanisms
• Automatic population: supervised and unsupervised
• Extraction & annotation
• Multilingual and cross-language
50. References
• Castells, P., Fernandez, M., & Vallet, D. 2007. An Adaptation of Vector Space Model for Ontology Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering, 19(2):
• Shahrul Azman Noah, Nor Afni Raziah Alias, Nurul Aida Osman, Zuraidah Abdullah, Nazlia Omar, Yazrina Yahya, Maryati Mohd Yusof: Ontology-Driven Semantic Digital Library. AIRS 2010: 141-150.
• Shahrul Azman Noah, Datul Aida Ali: The Role of Lexical Ontology in Expanding the Semantic Textual Content of On-Line News Images. AIRS 2010: 193-202.
• Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P., & Motta, E. 2011. Semantically enhanced information retrieval: an ontology-based approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9: 434-452.
• Kara, S., Alan, O., Sabuncu, O., Akpınar, S., Cicekli, N.K., & Alpaslan, F.N. 2012. An ontology-based retrieval system using semantic indexing. Information Systems, 37: 294-305.
• Kohler, J., Philippi, S., Specht, M., & Ruegg, A. 2006. Ontology based text indexing and querying for the semantic web. Knowledge-Based Systems, 19: 744-754.
• Etc.