Understanding large ontologies, with diverse semantics and modelling practices, is still an issue, and has an impact on many ontology engineering tasks. While existing methods summarise ontologies by extracting the most important nodes or subgraphs, a complete overview of an ontology, and a comparison between multiple ontologies, are not supported. Based on the hypothesis that ontologies are designed as compositions of patterns, this slides present a method able to extract conceptual components from multiple ontologies and the observed ontology design patterns implementing them.
related paper: https://arxiv.org/abs/2106.12831
Extraction of common conceptual components from multiple ontologies
1. The circles in this presentation are freely adapted from
Color Study: Squares with Concentric Circles by Wassily Kandinsky
Extraction of common
conceptual components
from multiple ontologies
Valentina Anita Carriero
PhD student
Computer Science and Engineering
University of Bologna
In collaboration with Valentina Presutti
& Luigi Asprino
STLab Spring seminars - May 19th, 2021
2. Before we start
KG = KB that encodes knowledge using a graph-based
structure
nodes = real-world entities (instances)
edges = relations between entities (properties)
1
e.g. ontologies
Dostoevskij
was born in
Moscow
was born in
Person Place
knowledge is formally
represented by schemas
defining categories of concepts
(classes) and relations between
concepts triple <s, p, o>
Knowledge
Graph
3. ONTOLOGY UNDERSTANDING
2
KGs use multiple and heterogeneous schemas
corresponding to diverse design choices
Understanding large ontologies is still an issue
crucial for many ontology engineering tasks
ontology reuse
ontology matching
ontology evaluation
(federated) querying
expressiveness, granularity,
coverage, intended meaning,
naming conventions, …
knowledge
soup
problem
[1]
[1] Aldo Gangemi and Valentina Presutti, Towards pattern science for the semantic web In SWJ (2010)
4. ONTOLOGY UNDERSTANDING
3
Ontology summarisation methods [2]
extract a subset of predicates (or a subgraph)
from the original ontology
summary
key concepts
and relations
You get the most important nodes of one ontology, but not
all the facts one (or more) ontology can represent
e.g.
location
cultural property
time
event
technique
location
cultural property
time event
technique
subject
subject
[2] Sejla Cebiric et al., Summarizing semantic graphs: a survey In VLDB Journal (2019)
5. CONCEPTUAL COMPONENTS
4
These facts can be represented by complex structures
often expressing a relational meaning
Conceptual components are the intensional counterparts
of OWL implementations in ontologies
= Ontology Design Patterns (ODPs)
membership,
locating,
interpreting,
observing …
sets of related
classes,
properties,
axioms
modelling solutions
answering to a number of
competency questions
=
conceptual
components (CCs)
6. CONCEPTUAL COMPONENTS
5
Membership
O1
O2
hasMember
Collection Object
involvesObject
Collection Object
Membership
involvesCollection
Ontology Design
Pattern X
Ontology Design
Pattern Y
implements implements
ontology understanding + comparison
possible CQ: which are the
members of a collection?
Time
atTime
possible CQ: which are the
members of a collection during a
specific time interval?
the fact of being a
member of a collection
8. CATALOGUE of CONCEPTUAL
COMPONENTS and observed ODPs
7
corpus of ontologies
ontology understanding + comparison
O1
O2
O3
O4
O6
O5
O7
O8
Membership
O2 ODP X
O8 ODP Y
O1 ODP Z
Event
O2 ODP A
O6 ODP B
… …
… …
9. APPROACH
8
An ontology is developed as a composition of
ontology design patterns
= solutions observed in existing ontologies,
regardless their correctness or quality
intuition
1) the density of their internal connections is
higher than the density of the connections
between entities from different ODPs
2) the combination of the words describing an
ODP is semantically coherent with the relation
it represents
hypotheses
(un)intentionally
10. APPROACH
9
1) the density of their internal connections is
higher than the density of the connections
between entities from different ODPs
2) the combination of the words describing an
ODP is semantically coherent with the relation
it represents
hypotheses
ODP
Address
Street
Object
Region
Province
Address
ODP
Event
Partici
pant
Time
Event
Place
ODP
Membership
has member
is member of
collection
member
connections between the
entities of the same ODP
13. CULTURAL HERITAGE corpus
12
43 ontologies on Cultural Heritage
except the ontologies that focus on related domains
and top-level ontologies
How? LOV [3] + online survey [4]
Inferred versions when possible (33/43), asserted otherwise
e.g. geometry
chemistry
[3] https://lov.linkeddata.es/
[4] https://t.co/ghwk6lxCOH?amp=1
Which of the following ontologies
modelling CH do you know?
Do you know any other relevant CH
ontology that was not included in the list?
Vocabs
sections
40 participants
4 ontology
networks
HermiT
reasoner
classes properties
2707 9132
14. CONFERENCE corpus
13
16 ontologies on the Conference domain
How? dataset of the Conference track of the Ontology Alignment
Evaluation Initiative (OAEI) 2020 campaign [5]
Inferred versions (16/16)
[5] http://oaei.ontologymatching.org/2020/results/conference
HermiT
reasoner
classes properties
851 714
16. INTENSIONAL ONTOLOGY GRAPHS
15
A graph derived from an ontology, encoding its intensional level,
where:
[credits] Aldo Gangemi
[credits]
[credits]
no
classes/properties
hierarchy
it captures the
context of use of :p
it enables overlapping
communities
19. INTENSIONAL ONTOLOGY GRAPHS
18
properties without domain/range declarations are assumed
to have owl:Thing as domain/range (r1)
domain/range declarations involving blank nodes are
ignored (r1)
class expressions in property restrictions on classes are
ignored (r2)
only 5.42% (CH corpus) and 9.22% (Conf corpus) of all domain/range restrictions
only 1.62% (CH corpus) and 1.48% (Conf corpus) of all subClassOf/equivalence axioms
empirically
observed:
it has no
great impact
no range
21. COMMUNITY DETECTION
20
Community detection aims at grouping the nodes of a network
such that there is a higher density of edges within groups
than between them
Clauset-Newman-Moore algorithm [7]
modified version: we run it recursively on communities with
a density lower than the average density of all communities
detected at the previous step
after running
some experiments
[7] Aaron Clauset et al. Finding community structure in very large networks In Physical review E 70.6 (2004)
22. COMMUNITY DETECTION
22
e.g.
without
recursion
with recursion on communities
with density < average
E11_Modifcation
E57_Material
P126_employed
E24_Physical_Man-Made_Thing
P31_has_modifed
P126i_was_employed_in P31i_was_modifed_by
E12_Production
P108i_was_produced_by
E79_Part_Addition
P110i_was_augmented_by
E1_CRM_Entity
P62_depicts E36_Visual_Item
P65_shows_visual_item
P108_has_produced
E18_Physical_Thing
P111i_was_added_by
P110_augmented
P111_added
P62i_is_depicted_by
P138i_has_representation
P65i_is_shown_by
P138_represents
E80_Part_Removal
P112_diminished
E11_Modifcation
E57_Material
P126_employed
E24_Physical_Man-Made_Thing
P31_has_modifed
P126i_was_employed_in
E24_Physical_Man-Made_Thing
E11_Modifcation
P31i_was_modifed_by
E1_CRM_Entity
P62_depicts P62i_is_depicted_by
E79_Part_Addition
P110_augmented
with recursion on
all communities
E11_Modifcation
E57_Material
P126_employed
E24_Physical_Man-Made_Thing
P31_has_modifed
P126i_was_employed_in
E18_Physical_Thing
P45i_is_incorporated_in
P31i_was_modifed_by
P45_consists_of
24. OBSERVED ODPs
24
OWL/RDF
implementations
retrieval
For each node:
triples asserting its type(s)
domain and range axioms
inverse properties
super- and equivalent classes and properties
all restrictions that involve at least one property in
the community
annotations
OWL/RDF ontology fragments that contain the nodes
of each community
27. CLUSTERING
27
If we cluster the communities according to their vocabularies, we
may identify the conceptual components that are shared by them
One virtual document for each community, by concatenating:
all English rdfs:label values from its entities
when no label, the local IDs
all repetitions are removed
except entities with
namespaces owl: rdf:
rdfs: xsd:
compound terms are split
e.g. using camel case
Redaction Agent is Involved In date event
Narration Mentioned Entity type of
Agent
Event
isInvolvedIn
DateEntity
date
string
eventNarration typeOfEvent
Redaction
isMentionedIn
28. CLUSTERING
28
All virtual documents are disambiguated
All FrameNet frames that have a skos:closeMatch with the
synsets in the virtual documents are extracted from Framester [11]
Additional more general frames are included, by exploiting the
hierarchy of frames
UKB [9], based on WordNet
(English) version 3.0 [10]
wn30:synset-event-noun-1 - frame:Event - frame:State
wn30:synset-mention-verb-1 - frame:Mention
wn30:synset-mention-verb-1 - frame:Adducing
wn30:synset-type-noun-1 - frame:Type
wn30:synset-involve-verb-1 - frame:Topic
wn30:synset-editing-noun-1
wn30:synset-agent-noun-1
wn30:synset-date-noun-1
wn30:synset-narrative-noun-1
wn30:synset-in-adverb-1
[9] https://github.com/asoroa/ukb
[10] https://wordnet.princeton.edu/
[11] https://github.com/framester/Framester
synset
Agent
Event
isInvolvedIn
DateEntity
date
string
eventNarration typeOfEvent
Redaction
isMentionedIn
frame
(closeMatch)
more general
frame
Redaction Agent is Involved In date event
Narration Mentioned Entity type of
29. CLUSTERING
29
K-Means [12] algorithm clusters communities virtual documents
partitioning the observations into a predefined number of k
disjoint groups
All clusters are part of a hierarchical network
two clusters c1 and c2 are hierarchically related if at least
one frame f1 - associated with c1 - inherits from at least
one frame f2 - associated with c2
[12] James MacQueen, Some methods for classification and analysis of multivariate observations In Proceedings of the 5th
Berkeley symposium on mathematical statistics and probability (1967)
Each cluster represents a conceptual component
weight [0,1]
sum of the frames in c1 that are subsumed
under at least one frame in c2, divided by max
= max number of inheritance relations between
frames of the two clusters
30. CLUSTERING
30
A name is assigned to each conceptual component
= the frame(s) or the synset(s) with the highest frequency
Each conceptual component is accompanied by a description
= concatenation of all terms representing its observed ODPs
= how many times the same
synset or frame is included in the virtual
documents belonging to the cluster
Each cluster represents a conceptual component
33. CATALOGUE of CCs and OBSERVED ODPs
33
A catalogue of ontologies
classified based on the conceptual components they implement
Each conceptual component
is linked to its associated ODPs within the ontologies
HTML rendering
CC
ODP
34. EXPERIMENTS and RESULTS
34
We run the method on the Cultural Heritage and Conference corpora
(CH) 43 ontologies (Conf) 16 ontologies
CH Conf
avg # nodes of intensional graphs 165 91
avg # edges of intensional graphs 217 115
avg % classes / properties preserved
in intensional graphs
47 % / 90 % 54 % / 87 %
this is because ontologies with poor axiomatisation are affected by these rules
ontologies with
no domain/range restrictions
no property restrictions on classes
basically, taxonomies
not really ODPs
e.g.
Intensional ontology graphs
35. EXPERIMENTS and RESULTS
35
CH Conf
tot # communities 1,300 419
avg # communities per ontology 30 26
There are patterns common to many communities, which have a
correlation with the modelling practice adopted for a specific
ontology (fragment)
Community detection
36. EXPERIMENTS and RESULTS
36
e.g. EDM
Thing P129_is_aboutP138_representsP67_refers_toP79_beginning_is_qualifed_byP80_end_is_qualifed_byaggregatescollectionNamecontributorcoveragecreatordateeuropeanaProxyformathasFormathasMethasVersionincorporatesisDerivativeOfisFormatOfisNextInSequenceisReplacedByisSimilarToisSuccessorOfisVersionOflanguageproxyForpublisherrelationreplacesrightssourcesubjecttableOfContentstypeuri
E71_Man-Made_Thing
string
U194_has_system_requirements U211_has_composition_of_material U212_has_technical_features U215_has_groove_caracteristics U216_has_tape_confguration U67_has_subtitle U68_has_variant_title
e.g. bad communities without a conceptual unity correspond to poor
axiomatisation: the topology can t support the identification of
significant modules, but the vocabulary highlights different
conceptual areas (about 5% of CH, 1% of Conf)
e.g. DOREMUS
they can
support
ontology
evaluation
37. EXPERIMENTS and RESULTS
37
e.g.
FABIO Newspaper
NewspaperIssue
partOf
NewspaperIssue
Newspaper
part
In some cases, community detection splits too much
e.g. only one axiom on each class, properties with no range/domain
NB
1) when we retrieve the
RDF/OWL implementation of
the ODPs, we include also
inverse property axioms
2) these two communities end
up in the same cluster
and
38. EXPERIMENTS and RESULTS
38
The majority of the detected communities has a good level of
semantic coherence
e.g. CIDOC
CRM E10_Transfer_of_Custody
E39_Actor
P28_custody_surrendered_by P29_custody_received_by P28i_surrendered_custody_through P29i_received_custody_through
NB none of the
(clearly) inverse
property have
been explicitly
asserted as
inverse in the
original ontology
E85_Joining
E39_Actor
P143_joined
E74_Group
P144_joined_with
P143i_was_joined_by P144i_gained_member_by
E6_Destruction
E18_Physical_Thing
P13_destroyedP13i_was_destroyed_by
39. EXPERIMENTS and RESULTS
39
The majority of the detected communities have a good level of
semantic coherence
e.g. ArCo
Thing
Situation
hasSituation isSituationOf
TimeInterval
atTime
Agent
involvesAgent
Estimate
Thing
isEstimateOf
Literal
estimatedValue
hasEstimate
SexInterpretation
CulturalEntity
isSexInterpretationOf
Literal
sex
hasSexInterpretation
ArchaeologicalProperty
hasSexInterpretation AgeOfDeathInterpretation
CulturalEntity
isAgeOfDeathInterpretationOf
Literal
ageOfDeath
ArchaeologicalProperty
hasAgeOfDeathInterpretation
hasAgeOfDeathInterpretation
Legibility
ConservationStatus
isLegibilityOf hasLegibility
ConservationStatusType
hasConservationStatusType
CulturalEntity
isConservationStatusOf
CulturalProperty
hasConservationStatus
isConservationStatusTypeOf hasConservationStatus
40. EXPERIMENTS and RESULTS
40
Clustering
CH Conf
# clusters 100 81
avg # communities per cluster 13 5
avg # ontologies, that the communities belong to, per cluster 4.5 2.6
based on
the elbow
method
Based on the overlap coefficient, the clusters of both corpora result
to be dissimilar good quality of the clusters
a similarity measure that measures the overlap between 2 finite sets
0.20 (CH) and 0.17 (Conf) [0.0, 1.0]
dissimilar
similar
An evaluation of the results against the ontology matching (OM) task
shows that clusters and their relations may be used to improve
the performance of OM algorithms
hypothesis: given a pair of similar entities to align, they should belong to either the same cluster or 2 related clusters
41. EXPERIMENTS and RESULTS
41
The clusters identify a wide range of different conceptual
components, with different levels of abstraction
general components emerge from both corpora
there are also components specific to the domain
Event
Categorization
Membership
Intentionally act
Performing arts
Measurement
Attribution
Submitting documents
Respond to proposal
Award
Acquisition
CH Conf
45. ONGOING and FUTURE WORK
45
Not all clusters are good conceptual components
need to be split / merged
bad names (because of disambiguation)
how to improve CCs extraction?
synset-work-noun-1 - Being_employed: 6 times
Conceptual Work Written embodies
has derivated work type Derivated is of author Agent location
C1001 has descriptive work
Content type Work
Geographic coverage Work
item is portrayal of work has creator
e.g.
!= FRBR: the work is an intellectual or artistic creation
e.g.
[…]
111 communities
No frame(s) / synset(s) really emerges
synset-agent-noun-1: only 14 times
e.g. merge?
split!
46. ONGOING and FUTURE WORK
46
How to identify and exploit communities features for
evaluating ontologies, e.g. recognising bad communities
User-based evaluation of the method, by e.g.
evaluating a pattern-based visualisation tool [Christian Colonna]
using the method for ontology selection / reuse tasks
Including class expressions in the intensional graphs
Automatically linking catalogue s and foundational ontologies ODPs
to observed ODPs
Annotation language of CCs and ODPs in ontologies [starting from OPLa]
[…]
47. Thanks!
Color Study: Squares with Concentric Circles Wassily Kandinsky (1913)
Valentina
Anita
Carriero
valentina.carriero3
@unibo.it