M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.
1. www.isocat.org
ISOcat and RELcat:
2 cooperating Semantic Registries
Menzo Windhouwer
menzo.windhouwer@dans.knaw.nl
The Language Archive – DANS
Ineke Schuurman
ineke@ccl.kuleuven.be
KU Leuven, CLARIN-NL – Utrecht University
17 January 2014
CLIN 24
1
Outline
• The need for explicit semantics
– ISOcat
• Mapping issues
– Languages, theoretical frameworks
– Granularity levels
– RELcat
• CGN case study
• Conclusions and future work
Typological Database Nijmegen
TOP NOTION tds:Noun GROUPS{
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION
        <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...
Notes: TDN is not archived in TLA; it is curated in TDS, a previous project Menzo worked on, which is now archived at DANS. Also, this is not a TDN punchcard.
ISOcat
• An open Data Category/Concept Registry where everyone can
– find and select data categories/concepts
– create new data categories/concepts
– share data categories/concepts
• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
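To make "embedding a PID in a schema" concrete, here is a minimal Python sketch that annotates an XML Schema element with a data category PID. It assumes the `dcr:datcat` attribute convention in the `http://www.isocat.org/ns/dcr` namespace (the convention we understand CMDI component schemas to use); the DC number in the PID is a placeholder, not a real ISOcat entry, and the element name `partOfSpeech` is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace, plus the "dcr" namespace assumed here for
# pointing a schema element at an ISOcat data category.
XS = "http://www.w3.org/2001/XMLSchema"
DCR = "http://www.isocat.org/ns/dcr"
ET.register_namespace("xs", XS)
ET.register_namespace("dcr", DCR)


def annotated_element(name: str, pid: str) -> ET.Element:
    """Build an xs:element whose dcr:datcat attribute carries a data category PID."""
    el = ET.Element(f"{{{XS}}}element", {"name": name})
    el.set(f"{{{DCR}}}datcat", pid)
    return el


# Hypothetical PID: DC-0000 is a placeholder, not an actual ISOcat entry.
PID = "http://www.isocat.org/datcat/DC-0000"
schema = ET.Element(f"{{{XS}}}schema")
schema.append(annotated_element("partOfSpeech", PID))

print(ET.tostring(schema, encoding="unicode"))
```

A tool reading such a schema can follow the PID to the registry entry and compare it with PIDs used in other schemas, which is what makes the intended semantics explicit rather than implied by the element name.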
Mapping issues
• Resources of interest for a specific research question might
– use very different theoretical frameworks, which might share few or no data categories/concepts
– use coarser- or finer-grained data categories/concepts
• How can these differences be overcome by mapping data categories/concepts to each other?
Some examples
• definite article (PoS)
– EN: 1 (-)
– FR: 2 (masc, fem)
– NL: 2 (neuter, non-neuter)
– DE: 3 (masc, fem, neuter)
Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
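A minimal Python sketch of what relating these values could enable: with a few relation triples, German (or French) article genders can be mapped onto the coarser Dutch distinction. The value identifiers and the choice of `isSuperClassOf`/`sameAs` for these links are assumptions made for illustration, not entries taken from ISOcat or RELcat.

```python
from typing import Optional

# Gender values distinguished by the definite article, per the slide.
ARTICLE_GENDERS = {
    "en": set(),                          # a single form, no gender distinction
    "fr": {"masc", "fem"},
    "nl": {"nl:neuter", "nl:non-neuter"},
    "de": {"masc", "fem", "neuter"},
}

# Assumed relations: Dutch 'non-neuter' covers 'masc' and 'fem';
# Dutch 'neuter' is the same as the shared 'neuter' value.
TRIPLES = {
    ("nl:non-neuter", "isSuperClassOf", "masc"),
    ("nl:non-neuter", "isSuperClassOf", "fem"),
    ("nl:neuter", "sameAs", "neuter"),
}


def dutch_counterpart(value: str) -> Optional[str]:
    """Return the Dutch article gender that is the same as or subsumes `value`."""
    for subj, rel, obj in TRIPLES:
        if obj == value and rel in {"isSuperClassOf", "sameAs"} and subj in ARTICLE_GENDERS["nl"]:
            return subj
    return None


for g in sorted(ARTICLE_GENDERS["de"]):
    print(g, "->", dutch_counterpart(g))
# fem -> nl:non-neuter
# masc -> nl:non-neuter
# neuter -> nl:neuter
```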
Some examples
• Indirect object (syntax)
– EN: indirect object
– NL:
• meewerkend voorwerp (1), or
• meewerkend voorwerp (2) plus belanghebbend voorwerp
– All translated as ‘indirect object’
=> 3 definitions of ‘indirect object’, whose relations need to be made explicit!
Some examples
• Event (semantics)
– ISO-TimeML: event and state, where ‘state’ is a type of event
– Other theories (Kamp & Reyle, etc.): eventuality, with two subtypes ‘event’ and ‘state’
The concepts ‘eventuality’, ‘event’ and ‘state’ need to be related
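Once the two frameworks' concepts are related, a query can move between them. The minimal Python sketch below encodes each framework's subsumption links and one cross-framework link, then collects everything above a concept. The `timeml:`/`drt:` prefixes are hypothetical, and linking TimeML 'event' to Kamp & Reyle's 'eventuality' with `sameAs` is an assumption suggested by the slide (both cover events and states), not a published mapping.

```python
from collections import defaultdict
from typing import Dict, Set

# isSubClassOf links within each framework, as (child, parent).
SUBCLASS = [
    ("timeml:state", "timeml:event"),   # ISO-TimeML: 'state' is a type of event
    ("drt:event", "drt:eventuality"),   # Kamp & Reyle: two subtypes of eventuality
    ("drt:state", "drt:eventuality"),
]
# Assumed cross-framework link, for illustration only.
SAME_AS = [("timeml:event", "drt:eventuality")]


def broader(concept: str) -> Set[str]:
    """All concepts reachable upwards via isSubClassOf, treating sameAs as a two-way link."""
    up: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in SUBCLASS:
        up[child].add(parent)
    for a, b in SAME_AS:  # sameAs is symmetric, so follow it in both directions
        up[a].add(b)
        up[b].add(a)
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for nxt in up[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(concept)
    return seen


print(sorted(broader("drt:state")))  # ['drt:eventuality', 'timeml:event']
```

So a search for TimeML events would, under this assumed link, also find resources annotated with Kamp & Reyle's 'state'.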
ISOcat internal issues
Data categories that are almost the same, apart from type, profile, language, …
Currently we insert a new DC, but the original one and the new one should be marked as having a same-as relation
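Recording those same-as links pays off when they are chased. The minimal Python sketch below groups data categories into equivalence classes with a union-find structure. The DC identifiers are hypothetical, and the sketch assumes same-as is symmetric and transitive; for "almost same-as" links that assumption may not hold, so the groups are best read as candidates.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical identifiers: an original DC and near-duplicates created
# for another profile or language, linked pairwise by same-as.
SAME_AS: List[Tuple[str, str]] = [
    ("DC-A", "DC-A-nl"),
    ("DC-A-nl", "DC-A-fr"),
    ("DC-B", "DC-B-morph"),
]


def equivalence_classes(pairs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers, treating same-as as symmetric and transitive."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)


print(equivalence_classes(SAME_AS))
```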
RELcat
• A Relation Registry (under construction) to store the following relationships between data categories/concepts:
– (almost) same-as relationships
– subsumption relationships (isSuperClassOf, isSubClassOf)
– mereology relationships (isPartOf, hasPart)
– …
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
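Because RELcat is based on triples and its relationship types come in inverse pairs, a store can materialize the inverse of each added relation so that lookups work from either end. The minimal Python sketch below illustrates this idea; it is a toy in-memory store, not RELcat's implementation, and the `ex:` identifiers are hypothetical.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    s: str  # subject: a data category/concept identifier
    p: str  # predicate: a RELcat relationship type
    o: str  # object: a data category/concept identifier


# Inverse pairs for the relationship types listed above; sameAs is its own inverse.
INVERSES = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "hasPart": "isPartOf",
    "isPartOf": "hasPart",
    "sameAs": "sameAs",
}


class RelationStore:
    """Toy in-memory triple store for illustration only."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add(Triple(s, p, o))
        if p in INVERSES:  # materialize the inverse so queries work in both directions
            self.triples.add(Triple(o, INVERSES[p], s))

    def objects(self, s: str, p: str) -> Set[str]:
        return {t.o for t in self.triples if t.s == s and t.p == p}


store = RelationStore()
store.add("ex:noun", "isSuperClassOf", "ex:commonNoun")  # hypothetical identifiers
store.add("ex:cgnTag", "hasPart", "ex:PoS")
print(store.objects("ex:commonNoun", "isSubClassOf"))  # {'ex:noun'}
```

Materializing inverses doubles storage but keeps queries trivial; a real RDF store would more likely derive inverses at query time or through inference rules.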
CGN case study
• Atomic building blocks of CGN tags are defined in ISOcat (still private)
• The EBNF schema of a CGN tag is stored in SCHEMAcat
• The subsumption relations in the value domains are stored in RELcat
• (almost) same-as relationships with other data categories/concepts are also stored in RELcat
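To show how a tag grammar splits a CGN tag into its atomic building blocks, here is a minimal Python sketch. The grammar is a hypothetical, simplified stand-in for the EBNF kept in SCHEMAcat (`tag ::= pos "(" [ value { "," value } ] ")"`), and the noun slot order (NTYPE, GETAL, GRAAD, GENUS, NAAMVAL) reflects our reading of the CGN tagset; treat both as assumptions.

```python
import re
from typing import Dict

# Hypothetical simplified grammar: tag ::= pos "(" [ value { "," value } ] ")"
TAG_RE = re.compile(r"^(?P<pos>[A-Z]+)\((?P<values>[A-Za-z0-9,\-]*)\)$")

# Assumed slot order per PoS; only nouns are sketched here.
SLOTS = {"N": ["NTYPE", "GETAL", "GRAAD", "GENUS", "NAAMVAL"]}


def parse_cgn_tag(tag: str) -> Dict[str, str]:
    """Split a CGN-style tag into named slots, e.g. 'N(soort)' -> {'PoS': 'N', 'NTYPE': 'SOORT'}."""
    m = TAG_RE.match(tag.strip())
    if m is None:
        raise ValueError(f"not a CGN-style tag: {tag!r}")
    pos = m["pos"]
    values = [v.upper() for v in m["values"].split(",") if v]
    names = SLOTS.get(pos, [])
    if len(values) > len(names):
        raise ValueError(f"too many values for {pos}: {values}")
    return {"PoS": pos, **dict(zip(names, values))}


print(parse_cgn_tag("N(SOORT)"))                     # {'PoS': 'N', 'NTYPE': 'SOORT'}
print(parse_cgn_tag("N(soort,ev,basis,zijd,stan)"))
```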
CGN granularity mappings
• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
– Example: N(SOORT) = Common Noun
• Based on the CGN EBNF, this involves the following slots of the /CGN tag/:
– /PoS/ = /N/
– /NTYPE/ = /SOORT/
• How to express this in RDF?
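One possible answer, loosely following the node structure of the N(SOORT) = Common Noun diagrams, is to introduce a node for the specific tag, give it one part per slot with a concrete value, and attach the sameAs link to that node. The minimal Python sketch below generates such triples. The `cgn:` prefix, the node-naming scheme and `isocat:commonNoun` are hypothetical; `isA`, `hasPart`, `hasValue`, `hasPotentialValue` and `sameAs` are the relation names appearing in the talk.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def tag_mapping_triples(slots: Dict[str, str], target: str) -> List[Triple]:
    """Express '<CGN tag> sameAs <target>' by decomposing the tag into slot/value nodes.

    All identifiers ("cgn:..." and the node-naming scheme) are hypothetical.
    """
    tag_id = "cgn:" + "/".join(f"{k}={v}" for k, v in slots.items())
    triples: List[Triple] = [
        (tag_id, "isA", "cgn:tag"),  # this specific tag is a CGN tag
        (tag_id, "sameAs", target),  # the actual mapping
    ]
    for slot, value in slots.items():
        part_id = f"{tag_id}#{slot}"
        triples += [
            ("cgn:tag", "hasPart", f"cgn:{slot}"),                 # schema level
            (f"cgn:{slot}", "hasPotentialValue", f"cgn:{value}"),  # schema level
            (tag_id, "hasPart", part_id),                          # instance level
            (part_id, "isA", f"cgn:{slot}"),
            (part_id, "hasValue", f"cgn:{value}"),
        ]
    return triples


for triple in tag_mapping_triples({"PoS": "N", "NTYPE": "SOORT"}, "isocat:commonNoun"):
    print(*triple)
```

The cost of this design is visible in the output: a single mapping yields a dozen triples, which is why friendlier tooling for complex mappings becomes attractive.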
RELcat RDF mapping
• Data categories/concepts can function as subjects and objects in an RDF triple
• The predicate of an RDF triple is a RELcat relationship type
• Alternative: complex data categories as properties
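The basic mapping (data categories as subject and object, a RELcat relationship type as predicate) can be illustrated by serializing triples in the W3C N-Triples format. In this minimal Python sketch the predicate namespace `http://example.org/relcat#` is a made-up placeholder, the `http://www.isocat.org/datcat/` base is assumed from ISOcat's PID pattern, and the DC numbers are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical URI bases: ISOcat-style PIDs for data categories and a
# made-up namespace standing in for RELcat relationship types.
DC_BASE = "http://www.isocat.org/datcat/"
REL_BASE = "http://example.org/relcat#"


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (subject DC, RELcat relationship type, object DC) as N-Triples lines."""
    return "\n".join(
        f"<{DC_BASE}{s}> <{REL_BASE}{p}> <{DC_BASE}{o}> ." for s, p, o in triples
    )


print(to_ntriples([("DC-1", "isSuperClassOf", "DC-2")]))
# <http://www.isocat.org/datcat/DC-1> <http://example.org/relcat#isSuperClassOf> <http://www.isocat.org/datcat/DC-2> .
```

The alternative on the slide, complex data categories as properties, would instead place a data category URI in the predicate position; this sketch covers only the basic mapping.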
N(SOORT) = Common Noun
[Figure: graph in which the CGN tag hasPart PoS and hasPart NTYPE (and has more parts); PoS hasPotentialValue N and NTYPE hasPotentialValue SOORT (both have more potential values); a sameAs edge relates the structure to Common Noun, and the diagram also contains an isA edge.]
N(SOORT) = Common Noun
[Figure: refinement of the previous graph over the CGN tag, PoS, NTYPE, N, SOORT and Common Noun: besides the hasPart and hasPotentialValue edges, it adds further hasPart and isA edges and hasValue edges next to the hasPotentialValue edges for N and SOORT; a sameAs edge again relates the structure to Common Noun.]
Cooperation between
ISOcat and RELcat
• ISOcat: value domains of closed data categories
– RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
– RELcat: subsumption relations
• SCHEMAcat: part-of relationships
– RELcat: mereology relationships
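The division of labour above can be illustrated by deriving RELcat-style triples from registry entries: value domains of closed data categories become `hasPotentialValue` triples, and is-a links between simple data categories become subsumption triples. In this minimal Python sketch the `DataCategory` record is a hypothetical simplification, not ISOcat's data model, and all `ex:` identifiers are made up; SCHEMAcat part-of relationships would be turned into mereology triples in the same way.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataCategory:
    """Hypothetical minimal view of a registry entry, only the fields needed here."""
    pid: str
    value_domain: List[str] = field(default_factory=list)  # closed DCs: admissible simple DCs
    is_a: Optional[str] = None                             # simple DCs: broader simple DC


def derive_relcat_triples(dcs: List[DataCategory]) -> List[Tuple[str, str, str]]:
    """Turn value domains into hasPotentialValue triples and is-a links into subsumption triples."""
    triples = []
    for dc in dcs:
        for value in dc.value_domain:
            triples.append((dc.pid, "hasPotentialValue", value))
        if dc.is_a is not None:
            triples.append((dc.pid, "isSubClassOf", dc.is_a))
    return triples


# Hypothetical identifiers loosely modelled on the CGN example.
dcs = [
    DataCategory("ex:NTYPE", value_domain=["ex:SOORT", "ex:EIGEN"]),
    DataCategory("ex:commonNoun", is_a="ex:noun"),
]
for t in derive_relcat_triples(dcs):
    print(*t)
```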
Conclusions and future work
• Simple mappings are easy
• Complex mappings quickly become fairly complex
– UI support?
– DSL support?
– Alternative RDF mapping?
• User front-end for RELcat
– Integration of RELcat and ISOcat?