2.
• Create an interoperable domain of Language Resources (LR)
– Interoperable formats for LR content
– Persistent identification (and citation) of LRs
– Use of SAML-based AAI for access to LRs
– Use of the Component Metadata Infrastructure (CMDI) for describing LRs
3.
• Created in response to the fragmented landscape of LR metadata
• Flexible
– Not a single schema, but support for different metadata schemas
– Different schemas for different situations
– Semantic interoperability via links to semantic registries
• Community driven
– Communities can model their own metadata schemas
– They know their data and can create the right schema
– They know the right terminology
• Sharing
– Concepts, terminology, vocabularies
• CLARIN Concept Registry for linguistic concepts
• ISO 368 and other relevant vocabularies
• CLAVAS for organisation names
– Components & profiles via the CLARIN Component Registry
4.
• A Component groups together metadata Elements that naturally belong together to describe a property of the resources
– The Location where a SpeechRecording took place
– The Location of an Actor
– A Location is described by an address and/or region and/or country and/or continent
• Components can be nested
– The Language a specific Actor speaks
– An Actor who takes part in a SpeechRecording for a specific Project
• A Profile is a specific collection of Components for a specific type of resource, e.g., speech recordings
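[Figure: example profile SpeechRecordingP with nested components ActorC, LocationC (elements addressE, regionE, countryE, continentE), ProjectC, LanguageC and TechnicalMetadataC; the suffixes P, C and E mark profiles, components and elements]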
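The component/profile model above can be sketched as a recursive data structure. This is a minimal Python illustration, not the CMDI implementation: the `Component` class, the `location()` helper and the element names `name` (in Language) and `title` (in Project) are hypothetical, and the profile is simply modelled as the root component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    """A component: named elements plus nested child components."""
    name: str
    elements: Dict[str, Optional[str]] = field(default_factory=dict)
    children: List["Component"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> List[str]:
        """Return every element as a slash-separated path through the nesting."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [f"{here}/{element}" for element in self.elements]
        for child in self.children:
            result.extend(child.paths(here))
        return result


def location() -> Component:
    """Reusable Location: address and/or region and/or country and/or continent."""
    return Component("Location", dict.fromkeys(
        ["address", "region", "country", "continent"]))


# Hypothetical SpeechRecording profile, modelled here as the root component.
speech_recording = Component("SpeechRecording", children=[
    location(),                                  # where the recording took place
    Component("Actor", children=[
        location(),                              # where the actor is located
        Component("Language", {"name": None}),   # a language the actor speaks
    ]),
    Component("Project", {"title": None}),       # project the recording belongs to
])

for path in speech_recording.paths():
    print(path)
```

Calling `location()` twice mirrors how one Location component is reused for both the recording and the actor; the printed paths make the nesting explicit.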
6.
• Started in 2010; version 1.2, released in 2016, supports remote vocabularies
• Actively supported by CLARIN ERIC and several national CLARIN consortia
• Many supporting tools:
– VLO, COMEDI, ARBIL, CMDI maker, Virtual Collection Registry …
• Link to the Linked (Open) Data world: CMDI2RDF
[Figure: CMDI → CMDI2RDF → LOD]
7.
• Started in 2014 as a CLARIN-NL project by TLA/MPI and DANS
• Now a service supported by CLARIAH WP2 (X11.400)
• Also links to other “linguistic” LoD information sources:
– WALS for linguistic typology information
– CLAVAS for organization names
– DBpedia (currently only used as glue)
• Automatic synchronization of CMDI metadata
• Simplification of the RDFS CMDI model
8.
• CMD is classic W3C XML Schema-constrained XML
• To map a CMD record to RDF we need
– A mapping of the basic component model to RDFS
• Basic classes and properties to represent profiles, components, elements, attributes and their relationships and values
– A mapping of a specific profile or component to RDFS
• A specific subclass or subproperty of the basic component model
– A mapping of specific metadata records to RDF instances of the RDFS classes
• Instances of a profile or component
– Additionally, there is a generic CMD envelope that is mapped using common LOD vocabularies
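The three mapping levels can be illustrated with a few hand-written triples. This sketch uses only the standard library and emits N-Triples; the `example.org` namespaces and the names `hasComponent` and `elementValue` are hypothetical stand-ins, not the vocabulary CMDI2RDF actually uses, and for brevity elements are mapped directly to properties.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Standard W3C namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespaces, for illustration only.
CMD = "http://example.org/cmd#"
PROF = "http://example.org/profiles/SpeechRecording#"
DATA = "http://example.org/records/"


@dataclass(frozen=True)
class Lit:
    """A plain string literal (as opposed to an IRI)."""
    value: str


Triple = Tuple[str, str, Union[str, Lit]]


def basic_model() -> List[Triple]:
    """Level 1: generic classes and properties of the component model."""
    return [
        (CMD + "Profile", RDF + "type", RDFS + "Class"),
        (CMD + "Component", RDF + "type", RDFS + "Class"),
        (CMD + "hasComponent", RDF + "type", RDF + "Property"),
        (CMD + "elementValue", RDF + "type", RDF + "Property"),
    ]


def profile_schema() -> List[Triple]:
    """Level 2: a specific profile and component as subclasses/subproperties."""
    return [
        (PROF + "SpeechRecording", RDFS + "subClassOf", CMD + "Profile"),
        (PROF + "Location", RDFS + "subClassOf", CMD + "Component"),
        (PROF + "country", RDFS + "subPropertyOf", CMD + "elementValue"),
    ]


def record_instances(record_id: str, country: str) -> List[Triple]:
    """Level 3: one metadata record as instances of the level-2 classes."""
    rec = DATA + record_id
    loc = rec + "/Location"
    return [
        (rec, RDF + "type", PROF + "SpeechRecording"),
        (rec, CMD + "hasComponent", loc),
        (loc, RDF + "type", PROF + "Location"),
        (loc, PROF + "country", Lit(country)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise as N-Triples: IRIs in <...>, literals quoted and escaped."""
    def term(t: Union[str, Lit]) -> str:
        if isinstance(t, Lit):
            escaped = (t.value.replace("\\", "\\\\")
                       .replace('"', '\\"').replace("\n", "\\n"))
            return f'"{escaped}"'
        return f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


graph = basic_model() + profile_schema() + record_instances("rec1", "NL")
print(to_ntriples(graph))
```

Keeping the three levels in separate functions reflects that levels 1 and 2 are schema (shared across records), while only level 3 grows with every metadata record.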
9.
• The basic CMD model is described by ISO/DIS 24622-1
• First part of the family of three CMD standards in ISO TC 37/SC 4
• A natural mapping to RDF would be:
• Profiles/components to RDF classes
• Elements to RDF properties
• Complication
• CLARIN’s CMDI allows attributes on both components and elements
• So elements have to be RDF classes as well
10.
• Modelling elements as classes nevertheless introduces extra hierarchy
• CMDI is already a hierarchical metadata schema
• Human readability decreases
• Other solutions welcome!
Simplified example:
<Description URI="…">
  <Age>14</Age>
  …
</Description>
maps to: R --Age--> 14
<Description …>
  <Age status="U">14</Age>
  …
</Description>
maps to: R --Age--> [ value 14, status U ]
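The Age example can be reproduced with a small converter. This is an illustrative sketch, not the CMDI2RDF code: an element without attributes becomes a direct property, while an element with attributes gets its own blank node carrying `rdf:value` plus one property per attribute. Terms are plain strings rather than full IRIs, and the subject name `R` follows the example.

```python
import xml.etree.ElementTree as ET
from itertools import count
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def element_triples(subject: str, elem: ET.Element,
                    ids: Iterator[int]) -> List[Triple]:
    """Map one CMD element to triples hanging off `subject`."""
    value = (elem.text or "").strip()
    if not elem.attrib:
        # No attributes: the element can simply be a property.
        return [(subject, elem.tag, value)]
    # Attributes present: the element needs its own node (an instance of an
    # element class), which adds one level of hierarchy.
    node = f"_:b{next(ids)}"
    triples = [(subject, elem.tag, node), (node, "rdf:value", value)]
    for name, attr_value in elem.attrib.items():
        triples.append((node, name, attr_value))
    return triples


def description_triples(xml: str, subject: str = "R") -> List[Triple]:
    """Map the child elements of a (simplified) Description record."""
    root = ET.fromstring(xml)
    ids = count()
    triples: List[Triple] = []
    for child in root:
        triples.extend(element_triples(subject, child, ids))
    return triples


print(description_triples("<Description><Age>14</Age></Description>"))
# [('R', 'Age', '14')]
print(description_triples('<Description><Age status="U">14</Age></Description>'))
# [('R', 'Age', '_:b0'), ('_:b0', 'rdf:value', '14'), ('_:b0', 'status', 'U')]
```

The second output shows the extra hop through `_:b0` that the slide warns about: a consumer asking for the age must now follow two edges instead of one.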
13.
• Offers LoD for different LR metadata infrastructures
– LRE Map (LREC)
– META-SHARE
– CLARIN
– DataHub (linguistic part)
• However
– With respect to CLARIN, only data with DC profiles is included
• Just a small part of CLARIN
– Seems to be partly based on old, static data dumps
14.
• Goals:
– Find metadata-type information about LRs in LD format
– Translate it into a “suitable” CMDI profile-based metadata record
• Is there such LD that is not already directly available in another format (OLAC, CLARIN, DC, META-SHARE)?
– If so, it is useful to have this metadata in the CLARIN VLO metadata catalogue
– Humanities data archives mostly have DC (inventories are available from different projects, e.g. DASISH) and frequently offer LD
– Easier ways exist to translate DC into CMDI (e.g. the CMDI DC profile)
– But LD can be a pivot for many such translations
• Still in an exploratory phase
– We would like to use a general strategy,
– as it is very labor-intensive to craft specific transformations for every LD set.
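One possible shape of a general LD-to-CMDI strategy is a predicate lookup table plus a list of leftovers that still need dataset-specific handling. The sketch assumes the input is already a list of (subject, predicate, object) strings; the mapping table, the component name `DcComponent`, the resource IRI and the output layout (which only loosely mimics the CMDI envelope and is not schema-valid) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

DCT = "http://purl.org/dc/terms/"

# Hypothetical mapping from LD predicates to element names of one target
# component; in practice this would be derived from the chosen CMDI profile.
PREDICATE_TO_ELEMENT: Dict[str, str] = {
    DCT + "title": "title",
    DCT + "creator": "creator",
    DCT + "language": "language",
}


def ld_to_cmdi(subject: str, triples: List[Tuple[str, str, str]],
               component: str = "DcComponent") -> Tuple[ET.Element, List[str]]:
    """Build a simplified, CMDI-like record for one LD subject.

    Returns the XML tree and the predicates that had no mapping, i.e. the
    parts that would still need dataset-specific rules."""
    cmd = ET.Element("CMD")
    header = ET.SubElement(cmd, "Header")
    ET.SubElement(header, "MdSelfLink").text = subject
    target = ET.SubElement(ET.SubElement(cmd, "Components"), component)
    unmapped: List[str] = []
    for s, p, o in triples:
        if s != subject:
            continue                      # only describe the requested resource
        name = PREDICATE_TO_ELEMENT.get(p)
        if name is None:
            unmapped.append(p)
        else:
            ET.SubElement(target, name).text = o
    return cmd, unmapped


LR = "http://example.org/lr/1"            # hypothetical resource IRI
record, todo = ld_to_cmdi(LR, [
    (LR, DCT + "title", "Spoken corpus"),
    (LR, DCT + "language", "nld"),
    (LR, "http://example.org/vocab/size", "42"),
])
print(ET.tostring(record, encoding="unicode"))
print(todo)  # ['http://example.org/vocab/size']
```

The generic table covers shared vocabularies such as DC once; only the reported leftovers would require the labor-intensive, per-dataset work.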
15.
• Useful for CLARIN?
– Enriching existing CMDI metadata and recycling it
– Relations to already known sources:
• WALS, DBpedia, CLAVAS, Glottolog, …
• Relations to CLARIAH LD sources?
– Enabling the VLO (or an alternative browser) to visualize this information
– Increasing metadata quality:
• Use CLAVAS to repair errors
• Include preferred labels
– Some CMDI adaptations required
• Foreign namespace support in the CMDI payload
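[Figure: enrichment workflow (steps A, B, C) linking CMDI from CLARIN centres (and possibly CLARIAH), an RDF store connected to DBpedia and Glottolog, RDF2CMD, and enriched CMDI shown in the VLO]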
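The idea of using CLAVAS to repair errors and add preferred labels can be sketched as a lookup over preferred and alternative labels, as in a SKOS vocabulary. This sketch assumes a local, in-memory copy of the vocabulary; the `Concept` class, the example.org URI and the sample labels are illustrative assumptions, not actual CLAVAS entries or its API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A vocabulary entry with one preferred and several alternative labels."""
    uri: str
    pref_label: str
    alt_labels: Tuple[str, ...] = ()


def normalise(label: str) -> str:
    """Case- and whitespace-insensitive key for label matching."""
    return " ".join(label.lower().split())


def build_index(concepts: List[Concept]) -> Dict[str, Concept]:
    """Index every preferred and alternative label of every concept."""
    index: Dict[str, Concept] = {}
    for concept in concepts:
        for label in (concept.pref_label, *concept.alt_labels):
            index[normalise(label)] = concept
    return index


def enrich(value: str, index: Dict[str, Concept]) -> Dict[str, Optional[str]]:
    """Keep the original value and add the preferred label and URI if known."""
    concept = index.get(normalise(value))
    return {
        "original": value,
        "prefLabel": concept.pref_label if concept else None,
        "uri": concept.uri if concept else None,
    }


# Illustrative entry only; not taken from CLAVAS.
index = build_index([Concept(
    "http://example.org/org/1",
    "Max Planck Institute for Psycholinguistics",
    ("MPI for Psycholinguistics", "MPI Nijmegen"),
)])
print(enrich("mpi  for  psycholinguistics", index))
print(enrich("Unknown Institute", index))
```

Keeping the original value next to the preferred label means the enrichment is additive: nothing in the source CMDI record is overwritten.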
Virtuoso as a triplestore
Tomcat as application server
Elda as browser
Conversion pipeline in Java, core transforms in XSLT
All in a Docker package
All code on GitHub: