The second episode of metadata and brokering.
Topics covered:
1. additional definitions (ontologies, relational databases and others)
2. the big picture: data fabric elements from the Research Data Alliance (RDA) and possible concrete implementations of those guidelines
Metadata & brokering - a modern approach #2
1. Daniele Bailo
Metadata & Brokering: a modern approach
Episode #2
2. Previously on…
Metadata & Brokering #1
Main concepts
- Digital Data
- Metadata
- Brokering system
- The triad <PID, MD, DO>
- Database
- APIs (web services)
Side concepts
- Ontologies / Semantics
- PID
- Digital Object
- Standard
- Interoperability
- Open Access
3. [Diagram: several data sets, each exposed through an API]
Discovery (DC, CKAN, eGMS)
Contextual (CERIF metadata model)
Detailed (community specific)
Features
1. APIs
2. <PID, metadata, DO>
3. Contextualization metadata
4. Support ontologies
Data from Irpinia
<PID, metadata, DO>
request response
THE PERFECT SYSTEM
#6 Metadata-driven canonical brokering with contextualization & PID
5. Metadata
Purposes
1. Discovery (humans & machines)
2. Contextualization: what is the context of the data
3. Use it for processing or other advanced tasks
Usually attached to the D.O.
6. Interoperability
What & Why
Enables two systems to
1. Exchange information
2. Understand information
Usually achieved through:
- Agreed language
- Software “translators”: interfaces, thin layers
...are you speaking Arabic???
7. Ontologies
Why an ontology?
It is the way machines
manage “meaning”
How does it work?
1. Connects concepts
2. Needs vocabulary
Issues
• Many ontologies exist
• Vocabulary Mapping
[Diagram: example ontology graph. Concepts: Michelini, CNT, INGV, Gresta, Sailing, Trieste, Italy, Boat, sea. Relations: is director of, is section of, is president of, has hobby, is born, located in, use.]
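The idea that an ontology "connects concepts" using a vocabulary can be sketched as a list of subject-predicate-object triples. This is only an illustration: the concept and relation names come from the slide's diagram labels, but which concept is linked to which is an assumed reading of that diagram, and the helper names `build_index` and `related` are hypothetical.

```python
from collections import defaultdict

# Triples (subject, predicate, object). The pairing of concepts below is an
# ASSUMED reading of the diagram labels, not something the slide text states.
TRIPLES = [
    ("Michelini", "is director of", "CNT"),
    ("CNT", "is section of", "INGV"),
    ("Gresta", "is president of", "INGV"),
    ("Michelini", "has hobby", "Sailing"),
    ("Michelini", "is born in", "Trieste"),
    ("Trieste", "located in", "Italy"),
    ("Sailing", "uses", "Boat"),
    ("Boat", "uses", "sea"),
]


def build_index(triples):
    """Group outgoing (predicate, object) pairs by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def related(index, start):
    """Return every concept reachable from `start` by following relations."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


INDEX = build_index(TRIPLES)
REACHABLE = related(INDEX, "Michelini")
```

Following the links is what lets a machine answer questions that no single statement contains, for example that a person is connected to a country through a city.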
9. Metadata Catalogue #2
How to implement it?
Single table (bad habit)
- One table with all data
Multi table (good habit)
- Data is stored in multiple tables (one per concept)
- Tables are linked
- Can contextualize data
Metadata catalogue = relational database *
[Diagram: single table vs. multi table]
10. Metadata Catalogue #2
How to implement it?
Single table (bad habit)
- One table with all data
Multi table (good habit)
- Data is stored in unique tables (one per concept)
- Tables are linked
- Can contextualize data
Metadata catalogue = relational database *
[Diagram: single table vs. multi table and contextualization]
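A minimal sketch of the multi-table approach, using Python's built-in `sqlite3` module. The table names, column names and sample rows are all hypothetical; the point is only that each concept gets its own table and that linking the tables is what provides the context.

```python
import sqlite3

# In-memory database; every table, column and value here is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organisation (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    owner_id INTEGER REFERENCES organisation(id),
    contact_id INTEGER REFERENCES person(id)
);
INSERT INTO organisation VALUES (1, 'Example Institute');
INSERT INTO person VALUES (1, 'Example Contact');
INSERT INTO dataset VALUES (1, 'Example waveform collection', 1, 1);
INSERT INTO dataset VALUES (2, 'Example station inventory', 1, 1);
""")


def datasets_with_context(connection):
    """Join the linked tables so each dataset comes with its context."""
    rows = connection.execute("""
        SELECT d.title, o.name, p.name
        FROM dataset AS d
        JOIN organisation AS o ON o.id = d.owner_id
        JOIN person AS p ON p.id = d.contact_id
        ORDER BY d.id
    """)
    return rows.fetchall()


ROWS = datasets_with_context(conn)
```

In the single-table variant the organisation and contact names would be repeated on every dataset row, so a change to either would have to be made in many places.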
11. Catalogue Interface
Human interface (GUI)
- Website or portal
Machine interface
- API or Web service
- Executes scripts or queries
- Returns metadata in a given standard
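A machine interface of this kind can be sketched as a function that looks up a record and returns it using the element names of a standard. Dublin Core is used here as the example standard. The internal field names, the `CATALOGUE` content and the function names are assumptions made for illustration, not part of any real catalogue.

```python
import json

# Hypothetical internal field names mapped to Dublin Core element names.
INTERNAL_TO_DC = {
    "name": "title",
    "author": "creator",
    "created": "date",
    "pid": "identifier",
    "file_type": "format",
}


def to_dublin_core(record):
    """Keep only mapped fields and rename them to Dublin Core elements."""
    return {INTERNAL_TO_DC[k]: v for k, v in record.items() if k in INTERNAL_TO_DC}


def handle_request(catalogue, pid):
    """Toy machine interface: look up a record by PID and return JSON text."""
    record = catalogue.get(pid)
    if record is None:
        return json.dumps({"error": "not found", "identifier": pid})
    return json.dumps(to_dublin_core(record), sort_keys=True)


# Illustrative catalogue content; none of these values come from the slides.
CATALOGUE = {
    "example-pid-1": {
        "name": "Example dataset",
        "author": "Example Author",
        "created": "2015-01-01",
        "pid": "example-pid-1",
        "file_type": "application/json",
        "internal_note": "not exposed",
    }
}

RESPONSE = handle_request(CATALOGUE, "example-pid-1")
```

The caller sees only the standard element names, so the internal layout of the catalogue can change without breaking clients.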
12. What is it?
It does something for the user (delivers value to the customer)*
A “thin layer”
We usually don’t know what’s under the hood
Examples
- FDSN stations (web) service
[Diagram labels: FDSN stations, FDSN Dataselect, Database (MD catalogue), Waveform repository]
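As an illustration of such a thin layer, a request to an FDSN station web service is just a URL with a few query parameters. The sketch below only builds the URL and does not contact any server. The host `https://example.org` is a placeholder, the network code `IV` is an illustrative value, and `station_query_url` is a hypothetical helper; the `/fdsnws/station/1/query` path and the `network`, `level` and `format` parameters follow the FDSN web service specification.

```python
from urllib.parse import urlencode


def station_query_url(base_url, network, level="station", fmt="text"):
    """Build an FDSN station query URL; the client never sees the database."""
    params = {"network": network, "level": level, "format": fmt}
    return f"{base_url.rstrip('/')}/fdsnws/station/1/query?{urlencode(params)}"


# Placeholder host and illustrative network code.
URL = station_query_url("https://example.org", "IV")
```

Whatever sits behind the service (a relational database, files, another service) is invisible to whoever uses the URL.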
13. CKAN
What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- No direct CKAN <-> sources connection
Examples
- Works with FDSN stations
- Doesn’t work with FDSN dataselect
[Diagram labels: CKAN GUI, METADATA catalogue, CKAN APIs, Plugins, EIDA stations, ISIDE stations, Metadata replication]
14. Brokering System (e.g. VERCE framework)
What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- System manager
- Other modules
- BROKER <-> sources interactive connection
Examples
- EIDA stations
- EIDA dataselect
- Processing Job at
[Diagram labels: BROKER GUI, METADATA catalogue, BROKER APIs, EIDA stations, ISIDE stations, Metadata replication, System manager, Interactive access to service, EIDA dataselect, Processing facility, ? ? ?]
16. A global view
Data initiatives
RDA
- “regulate” data sharing/use
EUDAT
- Common data infrastructure
EGI
- Organize National Grid Infrastructures (CINECA)
EPOS
- ESFRI integrating Solid
17. RDA
Do for data what has been done for the internet (TCP/IP)
18. RDA concepts
Data Fabric
What?
Identifies mechanisms, standards, components and interfaces that make data science efficient and cost effective
Data Management Plan
• Data management
• Data analysis
• Data preservation
• Data publication
• Data sharing
[UK data Archive http://www.data-archive.ac.uk/]
19. RDA concepts
Data Fabric
[RDA WG outputs https://indico.cern.ch/event/370271/session/2/contribution/6/material/0/0.pdf]
- How to store?
- How to register?
- How to discover?
- How to cite?
- How to document processing?
- How to integrate?
- How to collect new DP?
- How to access?
21. standards? How to preserve data?
Registry system
What?
An agreed/legacy catalog of:
- data formats (schemas)
- metadata formats
- Vocabularies & semantic
categories
- Data types
- Trusted repositories
- ….
[Diagram label: Registry]
Aha.. but in practice it's a database..
...indeed...
22. How to register/cite data or publications?
PID system
Purpose
- DO / publication can be uniquely referenced
- Assign a PID at data creation time
Issues
- Need for a simple mechanism to implement it
- Now EUDAT can help
- Peter & Massimo
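The purpose of a PID system can be sketched with a toy in-memory registry that assigns an identifier when a digital object is created and later resolves it to the object's metadata and location. This is not how a production PID service works: the `PidRegistry` class, the `example` prefix and the UUID-based identifiers are all assumptions made for illustration.

```python
import uuid


class PidRegistry:
    """Toy registry keeping the <PID, metadata, DO location> triad together."""

    def __init__(self, prefix="example"):
        self.prefix = prefix  # hypothetical prefix, not a real PID namespace
        self._records = {}

    def register(self, metadata, location):
        """Assign a new PID at creation time and store what it points to."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._records[pid] = {"metadata": dict(metadata), "location": location}
        return pid

    def resolve(self, pid):
        """Return the metadata and location registered for a PID."""
        return self._records[pid]


registry = PidRegistry()
PID_A = registry.register({"title": "Example dataset A"}, "file:///data/a")
PID_B = registry.register({"title": "Example dataset B"}, "file:///data/b")
```

The identifier stays the same even if the stored location is later updated, which is what makes it usable for citation.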
23. How to access data?
AAI system (federated & distributed)
Purpose
- Authenticate users
- Authorize users
Issues
- Delegation
- Many systems, sometimes not interoperable
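The difference between authenticating and authorizing a user can be sketched as two separate checks. The token and role tables below are hypothetical in-memory stand-ins; a federated AAI would delegate both lookups to external identity and attribute providers, which is not shown here.

```python
# Hypothetical in-memory stores; a federated AAI would delegate these lookups.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ROLES = {"alice": {"read", "submit_job"}, "bob": {"read"}}


def authenticate(token):
    """Authentication: who is the caller? Returns a user name or None."""
    return TOKENS.get(token)


def authorize(user, action):
    """Authorization: is this known user allowed to perform the action?"""
    return action in ROLES.get(user, set())


def access(token, action):
    """Run both checks in order and report the outcome as a status string."""
    user = authenticate(token)
    if user is None:
        return "401 unauthenticated"
    if not authorize(user, action):
        return "403 forbidden"
    return f"200 {user} may {action}"
```

For example, `access("token-bob", "submit_job")` passes the first check but fails the second, because the user is known but lacks the permission.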
24. How to store data?
Data repository (trusted)
What?
- Store data
- Couple with PIDs
- Ensure preservation (not curation)
- Can be trusted (DSA)
Opportunity
- INGV DSA repository…
25. How to document data processing?
Workflow engines
Purpose
- Tracks data transformation
- Allows versioning
- Allows reproducibility
Comments
- Interoperability among various workflow engines
- VERCE did it
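How a workflow engine tracks data transformation can be sketched by logging, for each step, a fingerprint of its input and of its output. This is a toy illustration and not how any particular engine works: `run_workflow`, `fingerprint` and the two processing steps are hypothetical names, and the sample values are made up.

```python
import hashlib
import json


def fingerprint(value):
    """Stable SHA-256 digest of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_workflow(data, steps):
    """Apply each (name, function) step in order and log what happened."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


def remove_mean(samples):
    """Illustrative step: subtract the mean from every sample."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def absolute(samples):
    """Illustrative step: take the absolute value of every sample."""
    return [abs(s) for s in samples]


STEPS = [("remove_mean", remove_mean), ("absolute", absolute)]
RESULT, PROVENANCE = run_workflow([1.0, 2.0, 3.0], STEPS)
```

Because each step's output fingerprint equals the next step's input fingerprint, the log forms a chain, and re-running the same steps on the same input reproduces the same log.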
26. Brokering System (e.g. VERCE framework)
Full version includes
- Metadata Catalogue
- Interfaces (GUI+API)
- System manager
- AAI system
- Workflow engine
External actors
- PID System
- Trusted repositories
- Registries
- Processing facilities
[Diagram labels: BROKER GUI, METADATA catalogue, BROKER APIs, System manager, Data sets, APIs, AAI system, Workflow Engine, Trusted repositories, Registry, PID system, HPC center]
DIGITAL DATA
Sequence of (digital) symbols
With a meaning
Can be stored
Can be transmitted
Can be computed
METADATA
DATA ABOUT DATA
What is metadata to me can be data to others
Many standards
Ontologies
BROKERING SYSTEM
- Intermediary software
Accesses several systems on your behalf
Collects data for you (integration)
DATABASE
- Collection of (organized) DATA
Usually has a DBMS
APIs
Application Programming Interface
Standard procedures or instructions to access a service (or function)
Example: identity card
Data management: enterprise to build a data repository, manage an information catalog, & enforce management policy
Data analysis: enterprise to process a data collection, apply analysis tools, and automate a processing pipeline
Data preservation: enterprise to build reference collections and knowledge bases that comprise the intellectual capital, while managing technology evolution
Data publication: discovery and access of data collections
Data sharing: controlled sharing of a data collection, shared analysis workflows, and information catalogs