2. Content
Motivation and driving considerations about the service
Service architecture and interfaces: overview
- How the user can access the service
E.g.: REST, GUI, CLIs, etc.
- Service options and attributes
Acceptable Usage Policy (AUP)
Use cases
Documentation/tutorial/information
3/5/2019
3. Motivation
A research community wants to improve the services offered to its users,
guaranteeing that:
- their data will still be available after years;
- the data are easily accessible both to a researcher with just a browser and to a
data manager who needs to transfer massive amounts of data;
- the data are easily discoverable through a well-defined set of metadata attributes
and tools;
- the data can be moved to computing resources when needed and back;
- those improvements will not disrupt the user workflows, because they will be
implemented transparently through seamless integration with the current
community services, which will enforce the authorization policies defined by the
community.
Data planning
Long term data preservation
Data curation
Data access
Data discovery
Well defined API and protocols
Data transfer
4. Distributed architecture 1
EUDAT infrastructure (CDI)
• Different administrative domains
• We need to federate them to offer common user management
Easy access
7. Architecture 2
EUDAT has built an additional layer on top of iRODS to
streamline the processes that support replication and
long-term data archiving.
iRODS + EUDAT B2SAFE package + back-end storage = B2SAFE service
http://eudat.eu/services/userdoc/configure-b2safe
https://github.com/EUDAT-B2SAFE/B2SAFE-core
12. Interfaces towards other services: data flow 1
thanks to www.vecteezy.com for the pictures
Community data
Policies: data are stored according to the rules defined by the community
data are identified
data are registered
data are made discoverable
data can be easily retrieved
data can be easily moved
data are secured
13. Interfaces towards other services: data flow 2
Data are stored according to the rules defined by the community
Data are identified
Data are made discoverable
Data are registered
Data can be easily retrieved
Data are secured
Data can be easily moved
A set of EUDAT rules is defined: they implement the most common data flows.
Community-specific rules are added when needed.
Long term data preservation
Persistent Identifiers (PIDs) are associated with the data and
registered in the B2HANDLE service.
PIDs are globally resolvable and can be used in the B2SHARE
and B2STAGE services.
Data are replicated according to the defined policy across
different nodes of the EUDAT CDI, making them tolerant to
single-node failures and single-copy corruption.
HTTP API and GridFTP allow users to download and upload
data using standard protocols.
Data discovery
Data transfer
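The tolerance to single-copy corruption mentioned above can be illustrated with a checksum comparison across replicas. This is only a minimal sketch, not the B2SAFE implementation: the node names are hypothetical, the replica contents are held in memory, and it assumes that the checksum shared by most nodes is the correct one.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a replica's content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupted_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Return the node names whose copy disagrees with the majority checksum.

    `replicas` maps a hypothetical node name to the bytes stored there.
    Assumption: the checksum held by most nodes is the correct one.
    """
    checksums = {node: sha256_hex(content) for node, content in replicas.items()}
    if not checksums:
        return []
    majority, _count = Counter(checksums.values()).most_common(1)[0]
    return sorted(node for node, digest in checksums.items() if digest != majority)


if __name__ == "__main__":
    copies = {
        "node-a": b"observation data",
        "node-b": b"observation data",
        "node-c": b"observation dat4",  # one corrupted copy
    }
    print(find_corrupted_replicas(copies))
```

With three copies, one damaged replica is outvoted by the other two and can be restored from a healthy node.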
14. Interfaces towards other services: Data Policy Manager 1
DPM
Definition of policies for data management
Policies life cycle management
Policies translation
Policies enforcement
User authentication
Data manager
Resource provider
Resource provider feedback
15. Interfaces towards other services: Data Policy Manager 2
Data curation
User authentication
DPM relies on B2ACCESS for authentication
through the Shibboleth protocol.
Definition of policies for data management
Policies are implemented as XML documents, which
can be created through a web portal.
Policies translation
The policies, described in a high-level language, are
translated into B2SAFE rules.
Policies enforcement
The B2SAFE rules are scheduled according to the
policy trigger and executed by the rule engine.
Resource provider feedback
The status of the policy is reported back to the data
manager. It can be waiting in a queue, enforced,
rejected by the resource provider, or completed.
Policies life cycle management
Policies are stored in an XML DB and identified through a
unique id. They can be modified and removed.
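The policy life cycle described above (policies stored under a unique id, open to modification and removal, with a status reported back to the data manager) can be modelled roughly as follows. This is a hedged sketch, not the DPM code: the class and method names are hypothetical, an in-memory dictionary stands in for the XML DB, and starting every new policy in the "waiting" status is an assumption.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class PolicyStatus(Enum):
    """The four statuses named in the text."""
    WAITING = "waiting in queue"
    ENFORCED = "enforced"
    REJECTED = "rejected by the resource provider"
    COMPLETED = "completed"


@dataclass
class Policy:
    policy_id: str      # unique id assigned when the policy is stored
    xml_document: str   # the policy itself, kept as an XML string
    status: PolicyStatus = PolicyStatus.WAITING  # assumed initial status


class PolicyStore:
    """In-memory stand-in for the XML DB (hypothetical interface)."""

    def __init__(self) -> None:
        self._policies: dict[str, Policy] = {}

    def add(self, xml_document: str) -> Policy:
        policy = Policy(policy_id=str(uuid.uuid4()), xml_document=xml_document)
        self._policies[policy.policy_id] = policy
        return policy

    def get(self, policy_id: str) -> Policy:
        return self._policies[policy_id]

    def modify(self, policy_id: str, xml_document: str) -> None:
        self._policies[policy_id].xml_document = xml_document

    def remove(self, policy_id: str) -> None:
        del self._policies[policy_id]

    def report_status(self, policy_id: str, status: PolicyStatus) -> None:
        """Record the feedback coming from the resource provider."""
        self._policies[policy_id].status = status
```

A data manager would add a policy, then read its status after the resource provider has reported back, for example `store.get(policy_id).status`.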
16. How to access the service 1
iRODS icommands: a set of CLI commands which can be
deployed through RPM or DEB packages.
(https://irods.org/download)
Davrods: a WebDAV interface on top of iRODS.
(https://github.com/UtrechtUniversity/davrods)
The B2STAGE service offers two interfaces for B2SAFE:
- the GridFTP iRODS-DSI, to enable fast data transfer through the GridFTP
protocol;
- the HTTP API, to provide a RESTful interface towards EUDAT
services.
17. The GridFTP iRODS-DSI
● DSI (Data Storage Interface): GridFTP can be extended to
support different underlying storage systems
● Implemented using the iRODS C API
● Supports the main iRODS operations (get, put, delete, list,
checksum calculation)
UberFTP
Globus Online
globus-url-copy
WebFTS
FTS3 Rest CLI
The GridFTP iRODS-DSI allows users to manage
data on EUDAT nodes (B2SAFE) through any
standard GridFTP client
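As an illustration of using a standard GridFTP client, the sketch below assembles a globus-url-copy download command (source URL first, destination URL second). The host name and paths are hypothetical, the port defaults to 2811 (the usual GridFTP control port), and the command is only built, not executed.

```python
def build_download_command(
    host: str, remote_path: str, local_path: str, port: int = 2811
) -> list[str]:
    """Build a globus-url-copy command that downloads one object.

    `remote_path` is the absolute path exposed by the GridFTP server
    (with the iRODS-DSI, assumed here to be the iRODS logical path).
    `local_path` must be absolute so that it forms a valid file:// URL.
    """
    if not remote_path.startswith("/") or not local_path.startswith("/"):
        raise ValueError("both paths must be absolute")
    source = f"gsiftp://{host}:{port}{remote_path}"
    destination = f"file://{local_path}"
    return ["globus-url-copy", source, destination]


if __name__ == "__main__":
    # Hypothetical endpoint and paths, for illustration only.
    command = build_download_command(
        "gridftp.example.org", "/zone/home/alice/data.nc", "/tmp/data.nc"
    )
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where the client is installed and valid credentials are available.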
18. HTTP API
The user authenticates with username/password.
OAuth2: the HTTP API gets an OAuth2 token from
B2ACCESS and provides an API token to the user.
The HTTP API talks to B2SAFE on behalf of
the user, using the OAuth2 token.
B2SAFE validates the OAuth2 token and gets the
user attributes to map the user to a local account.
Upload: data are streamed from the HTTP client to B2SAFE,
without being cached at the HTTP API server.
Download: data are streamed from B2SAFE to the HTTP client,
without being cached at the HTTP API server.
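The streaming behaviour described above, where the HTTP API server relays data without caching it, can be sketched with a chunked relay between two file-like objects. This is an illustrative sketch under stated assumptions, not the B2STAGE code: the function names are hypothetical, and sending the API token as a Bearer header is an assumption about how the token is presented.

```python
import io
from typing import BinaryIO, Iterator


def bearer_header(api_token: str) -> dict[str, str]:
    """Build the Authorization header (Bearer scheme is an assumption)."""
    return {"Authorization": f"Bearer {api_token}"}


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the stream content piece by piece instead of reading it whole."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def relay(source: BinaryIO, sink: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy source to sink chunk by chunk and return the bytes transferred.

    At most one chunk is held in memory at a time, so the relaying
    server never stores the complete object.
    """
    total = 0
    for chunk in iter_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    incoming = io.BytesIO(b"abcdefghij")  # stands in for the client upload
    storage = io.BytesIO()                # stands in for the B2SAFE side
    print(relay(incoming, storage, chunk_size=4), storage.getvalue())
```

The same generator can be handed to an HTTP client that accepts an iterable request body, so that an upload is sent with chunked transfer encoding.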
23. Use cases
Featured use cases
CLARIN https://www.eudat.eu/communities/common-language-resources-and-technology-infrastructure
ClimateModel https://www.eudat.eu/communities/support-to-scientific-research-on-seasonal-to-decadal-climate-and-air-quality-modelling
EISCAT https://www.eudat.eu/communities/unified-access-to-eiscat-radar-data
EPOS https://www.eudat.eu/communities/european-plate-observing-system
Herbadrop https://www.eudat.eu/communities/long-term-preservation-of-herbarium-specimen-images
IST https://www.eudat.eu/communities/eudat-services-to-guarantee-long-time-archiving-and-visibility-to-the-repository-of-ist
VPH https://www.eudat.eu/communities/virtual-humans
SDC https://www.seadatanet.org/About-us/SeaDataCloud
24. SeaDataCloud: the challenge
The SeaDataNet portal (CDI: Common Data Index) collects only part of
the data produced by more than one hundred marine research
institutions.
The rest are stored locally by the institutions and offered to
users on request via email. They are made accessible via a
temporary web service endpoint.
The quality checks are performed by the local institutions, without any
central mechanism, so the risk of inconsistencies and
duplications is high.
There is no Virtual Research Environment, only a set of desktop and
web applications, independent from each other. The user is forced to
upload the data set that she wants to analyze and to download the
result: there is no shared data space, nor is there a personal one.
26. SeaDataCloud: the solution
B2SAFE and B2STAGE services are hidden behind the community web portal (CDI), which manages
user and community-specific metadata registration (DATA DISCOVERY).
Each of the five EUDAT data centers offers a B2SAFE instance federated with the others.
Each data center provides two storage areas:
- one for the ingestion of the new data uploaded by the data producers, which are the hundreds of marine
science institutions of SeaDataNet (DATA TRANSFER);
- one for the production-ready data, which have been validated by the data manager through the community
web portal.
The community web portal triggers quality check workflows on the B2SAFE and B2HOST side (DATA
PLANNING, DATA CURATION).
Once moved into the production area, the data are replicated following a star pattern: every replica is
made from the same master copy. A B2HANDLE PID is associated with them (LONG TERM DATA PRESERVATION).
Data can then be shared with applications running on the B2HOST environment (DATA TRANSFER).
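The star replication pattern mentioned above, where every replica comes from the same master copy, can be sketched as a simple transfer plan. The site names below are hypothetical and the function only computes which copies would be made; it does not perform any transfer.

```python
def star_replication_plan(master: str, sites: list[str]) -> list[tuple[str, str]]:
    """Return (source, destination) pairs for a star pattern.

    Every pair has the master as its source, so no replica is ever
    copied from another replica.
    """
    return [(master, site) for site in sites if site != master]


if __name__ == "__main__":
    # Five hypothetical data centers, one of which holds the master copy.
    centers = ["site-1", "site-2", "site-3", "site-4", "site-5"]
    for source, destination in star_replication_plan("site-1", centers):
        print(f"{source} -> {destination}")
```

Because each transfer starts from the master, a damaged replica cannot propagate to the other sites, at the cost of concentrating the outgoing traffic on the master site.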