PID Training - B2HANDLE Python Library Overview
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Persistent Identifiers in EUDAT
services
B2HANDLE Python library
Version 1
June 2017
This work is licensed under the Creative
Commons CC-BY 4.0 licence
2. Content
Data and Persistent Identifiers
B2HANDLE python library
Data life cycle
Data life cycle with PID
PID Training
3. What do we want from data?
Findable – Easy to find by both humans and computer
systems, based on metadata
Accessible – Stored for long term, accessed and/or
downloaded with well-defined license and access conditions
Interoperable – Ready to be combined with other
datasets by humans as well as computer systems;
Reusable – Ready to be used for future research and to
be processed further using computational methods.
4. What do we know about Persistent Identifiers?
A Persistent Identifier, also known as PID, is an identifier
that is effectively permanently assigned to a resource.
identifier: a permanent name or identity
resource
location: may change over time
1. points to a resource(s)
2. globally unique
3. PID is persistent over time
5. Use persistent identifier for your data
Managing data online includes managing the persistent
identifier (PID) for the data.
Synchronize PID and data during creation, maintenance,
update and deletion of your data!
A PID should always:
be updated to point to the new location (URL);
continue to provide the latest information about the
resource.
6. What do you actually need?
a PID service, that offers an API for creating and managing
PIDs
EUDAT has adopted Handle-based persistent identifiers
Some EUDAT services require integration of Handle in your
infrastructure
EUDAT offers B2HANDLE, a service dedicated to provide,
resolve and mint persistent identifiers
supported by the ePIC consortium.
7. Data life cycle
Data has a life cycle, which involves going online and being accessed by users:
Publish online -> Move to another location -> Used by another researcher
Published online: http://www.test.com/test.html
Other users may cite, access, re-use this url
Relocate the resource at http://www.example.com/
Other users are not informed -> 404
8. Data Life Cycle
Data has a life cycle, which involves going online and being accessed by users:
Publish online -> Move to another location -> Used by another researcher
PIDs are subject to the same life cycle
Python library: Register PID, Update PID, Get PID Details
Handle Resolution service: Resolve PID (at every stage)
9. Hands-on material
Material on PID hands-on (part 7a and c)
Hands-on tutorial which shows how to:
Create, manage and delete PIDs
Employ the HTTP RESTful API with cURL
Employ the B2HANDLE Python library
https://github.com/EUDAT-Training/B2SAFE-B2STAGE-Training
Training module which provides hands-on material for:
EUDAT B2SAFE
iRODS4
B2HANDLE
and the EUDAT B2STAGE service.
11. www.eudat.eu
Authors Contributors
Themis Zamani, GRNET Ellen Leenarts, DANS
Christine Staiger, SURFsara
Thank you
Editor's notes
In this presentation we are going to see the use of PIDs based on the data life cycle.
Data and persistent identifiers: What do we want from data in the era of data-driven science? What do we know about Persistent Identifiers? And how can we use persistent identifiers for our data?
In order to understand PIDs and the PID service we are going to discuss:
The Data life Cycle: The simplest data life cycle.
And the Data life cycle with the PID system. Data has its own lifecycle but at the same time PIDs are subject to the same life cycle.
Data generation is getting easier and cheaper.
At the same time there is a shift from data generation to data processing and analysis: a new way to do science.
As a result, the volume of data output is increasing: a new data world.
To make the data world a better place for science, we must keep some data principles in mind. These principles are:
Findable – Easy to find by both humans and computer systems, based on metadata
Accessible – Stored for long term, accessed and/or downloaded with well-defined license and access conditions
Interoperable – Ready to be combined with other datasets by humans as well as computer systems;
Reusable – Ready to be used for future research and to be processed further using computational methods.
An identifier is a unique name or identity applied to something so that it can be easily referenced.
- Points to a resource: The resource is a black box. The type of resource behind the URL doesn't matter. It may be a file, a metadata record, a code collection.
- Globally unique: Once it is created, the resource is globally addressable. You won't find an identifier with the same name that points to a different resource.
- Persistent over time: There is a persistent relationship between the PID and the same resource over time.
The infrastructure can support access to resources as they move from one repo to another.
From the moment you decide to use PIDs, data and PIDs are strictly connected.
Managing data online includes managing the persistent identifier for the data so that it continues to provide information about whatever it identifies, no matter where it is stored online.
So you actually have to synchronize PID and data during creation, maintenance, update and deletion of your data!
The main goal is that a PID should always:
be updated to point to the new location (URL).
continue to provide the latest information about the resource.
What do you actually need?
a PID service, that offers an API for creating and managing PIDs
EUDAT has adopted Handle-based persistent identifiers
So some EUDAT services require integration of Handle in your infrastructure.
EUDAT offers B2HANDLE, a service dedicated to providing, resolving and minting persistent identifiers (PIDs), supported by the ePIC consortium.
In order to understand PIDs and the PID service let’s see and discuss a real example.
The simplest data life cycle.
You have a research output and you want to share it with other researchers.
The traditional way to store your data is to upload it to a site, a repository or a directory. To access it, you bookmark or share a URL. a) Published online: http://www.test.com/test.html. b) Other users may cite, access and re-use this URL.
As long as nothing changes about the way the data is accessed, this works fine. But one day you decide to move the resource to another location. c) Relocate the resource to http://www.example.com/.
Other users are not informed about this relocation, and when they try to access the resource at the first location they get a Page Not Found (404) response.
So this arrangement has proven to be fragile.
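This failure mode can be sketched in a few lines of Python. The sketch is an illustration only: the `web` dictionary is a hypothetical stand-in for the hosting servers and `fetch` is a made-up helper, not a real HTTP client. Only the two URLs come from the example above.

```python
# Hypothetical in-memory stand-in for the web: URL -> content.
web = {"http://www.test.com/test.html": "research output"}


def fetch(url):
    """Return (status, body) the way a much simplified HTTP client would."""
    if url in web:
        return 200, web[url]
    return 404, None


# a) + b) The resource is published and other users bookmark or cite the URL.
cited_url = "http://www.test.com/test.html"
status_before, _ = fetch(cited_url)

# c) The owner relocates the resource; users of the old URL are not informed.
web["http://www.example.com/"] = web.pop(cited_url)

status_after, _ = fetch(cited_url)
print(status_before, status_after)
```

The content still exists at the new location, but every citation of the old URL is now broken.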
Let's see what happens when you decide to connect to a PID service.
As we can see in this slide, data and PIDs are strictly connected.
As we have already discussed, managing data online includes managing the persistent identifier for the data so that it continues to provide information about whatever it identifies, no matter where it is stored online.
Data has its own lifecycle. PIDs are subject to the same life cycle.
When you decide to publish something online, you must register a PID at the same time.
When you decide to move the data to another location, you must at the same time inform the service and update the PID.
After a PID is created, it is always resolvable by the Handle resolution service.
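The register, update and resolve steps can be sketched with a small in-memory model. This is not the B2HANDLE library API: the class `PidRegistry`, its method names and the handle `21.T12345/demo` are all hypothetical and chosen only to mirror the life cycle described above.

```python
class PidRegistry:
    """Hypothetical in-memory model of a PID service plus resolver."""

    def __init__(self):
        self._records = {}  # PID -> current location (URL)

    def register(self, pid, url):
        # A PID is globally unique: registering it twice is an error.
        if pid in self._records:
            raise ValueError(f"{pid} is already registered")
        self._records[pid] = url

    def update(self, pid, new_url):
        # Only the location changes; the PID itself stays the same.
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_url

    def resolve(self, pid):
        return self._records[pid]


registry = PidRegistry()
pid = "21.T12345/demo"  # hypothetical handle, for illustration only

# Publish online: register the PID together with the first location.
registry.register(pid, "http://www.test.com/test.html")
first = registry.resolve(pid)

# Move to another location: update the PID at the same time.
registry.update(pid, "http://www.example.com/")
second = registry.resolve(pid)

print(first)
print(second)
```

Users who cite the PID rather than the URL keep reaching the resource after the move, because resolution always returns the current location.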
The whole training module provides hands-on material for iRODS4, EUDAT B2SAFE, B2HANDLE (based on Handle version 8) and B2STAGE.
It provides install files which indicate how the training machines are set up and which will give the users an idea how to install the software stack themselves.
The training material itself is targeted at scientist end-users and site admins. The order of the markdown files proposes the curriculum of the training. Each component takes about 1 hour.
EUDAT PID hands-on
The PID training employing cURL requests is part of that training module.
This hands-on tutorial shows you how to work with the Handle System version 8 as a data user. It gives insight into how to create and manage PIDs for different purposes and use cases. In the following we will use the HTTP RESTful API with cURL requests.
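As a rough companion to the cURL requests, the sketch below builds the resolution URL for a handle and extracts the URL entry from a JSON handle record, using only the Python standard library. It assumes the `/api/handles/<handle>` path of the Handle System REST API on the public proxy `https://hdl.handle.net`. The handle `21.T12345/demo` and the canned response are made up for illustration, so the field layout should be checked against a real server response.

```python
import json
from urllib.parse import quote

PROXY = "https://hdl.handle.net"  # assumption: public Handle proxy


def resolution_url(handle, proxy=PROXY):
    """Build the REST URL that returns the record of a handle as JSON."""
    # Keep the slash between prefix and suffix, escape anything unusual.
    return f"{proxy}/api/handles/{quote(handle, safe='/')}"


def location_of(record):
    """Return the value of the first entry of type URL, or None."""
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Canned response for illustration; a real request would fetch this
# document from resolution_url(...) with cURL or an HTTP client.
sample = json.loads("""
{"responseCode": 1, "handle": "21.T12345/demo",
 "values": [{"index": 1, "type": "URL",
             "data": {"format": "string",
                      "value": "http://www.example.com/"}}]}
""")

print(resolution_url("21.T12345/demo"))
print(location_of(sample))
```

No network access is needed to run the sketch, which keeps it usable on a training machine without credentials.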