1. Get Data to Computation
eudat.eu/b2stage
www.eudat.eu
B2STAGE Installation
How to enable B2STAGE on your site
Version 1.1
August 2016
This work is licensed under the Creative
Commons CC-BY 4.0 licence.
Attribution: EUDAT – www.eudat.eu
2. B2STAGE
B2STAGE is a reliable, efficient, lightweight and easy-to-use service for transferring research data sets between EUDAT storage resources and high-performance computing (HPC) workspaces.
4. Allowing third party transfers
[Diagram: user desktop (holding a data location or PID), your site, the HPC GridFTP server and the PID registry, connected by two "control" links, a "data" link and "PID" links.]
5. Purpose
Deploying B2STAGE allows your users to:
Move large amounts of data between data stores and high-performance compute resources by means of different protocols and API clients
Ingest computational results back into EUDAT
Deposit large data sets into EUDAT resources for long-term preservation
Features:
High-speed transfer
Reliable and lightweight
Data access by PIDs
6. What exactly will this allow?
[Diagram: user desktop (GridFTP client), your site (GridFTP server with iRODS-DSI), the HPC GridFTP server and the PID registry, connected by "control", "data" and "PID" links.]
8. Prerequisites
9. Prerequisites
iRODS v4.1 deployment and configuration, including the Development Tools and Runtime Libraries packages (see http://irods.org/download/)
Globus GridFTP server (globus-gridftp-server-progs) deployment and configuration
Software components deployment:
CMake 2.7 or higher
libglobus-common-dev (.deb) or globus-common-devel (.rpm)
libglobus-gridftp-server-dev (.deb) or globus-gridftp-server-devel (.rpm)
libglobus-gridmap-callout-error-dev (.deb) or globus-gridmap-callout-error-devel (.rpm) (see http://www.ige-project.eu/downloads/software/releases/downloads)
libcurl4-openssl-dev
It is possible to use the official iRODS and GridFTP server packages without recompiling them.
10. Basic deployment and configuration
11. Hands-on material
Training module which provides hands-on material for EUDAT B2SAFE, iRODS4, B2HANDLE and the EUDAT B2STAGE service.
Material on https://github.com/EUDAT-Training/B2SAFE-B2STAGE-Training
B2STAGE installation (part 9):
Example installation on Ubuntu
Installation of the iRODS-DSI
Configuring the GridFTP server
Configuring the PID resolution
Giving access to users
12. B2STAGE Examples - Listing
List data in iRODS with globus-url-copy:
globus-url-copy -list gsiftp://<server>/<irodszone>/home/<user>/
$ globus-url-copy -list gsiftp://eve.eudat-sara.vm.surfsara.nl/eveZone/home/eve/Collection/
globus-url-copy -list gsiftp://<server>/<PID>
where the PID is attached either to a file or to an iRODS collection
$ globus-url-copy -list gsiftp://eve.eudat-sara.vm.surfsara.nl/846/cc83ae10-5e37-11e6-9c19-04040a64004a/
Both commands will list the same folder.
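The two URL forms on this slide differ only in what follows the server name: an iRODS path or a PID. As a minimal sketch (the helper name gsiftp_url is hypothetical and not part of B2STAGE), a small shell function can assemble either form before it is handed to globus-url-copy:

```shell
#!/bin/sh
# Hypothetical helper, not part of B2STAGE: build a gsiftp:// URL from a
# server name and either an iRODS path or a PID.
gsiftp_url() {
    server=$1
    target=${2#/}    # drop one leading slash so the URL never contains "//"
    printf 'gsiftp://%s/%s\n' "$server" "$target"
}

# Print the two URLs used in the examples on this slide.
gsiftp_url eve.eudat-sara.vm.surfsara.nl /eveZone/home/eve/Collection/
gsiftp_url eve.eudat-sara.vm.surfsara.nl 846/cc83ae10-5e37-11e6-9c19-04040a64004a/

# To list for real (needs the Globus client and valid grid credentials):
#   globus-url-copy -list "$(gsiftp_url <server> <PID>)"
```

The function only formats strings; it does not check that the path or PID exists on the server.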
13. B2STAGE Examples - Copy
Copy data from iRODS to another server:
globus-url-copy -r gsiftp://<server>/<irodszone>/home/<user>/ <local Path>
$ globus-url-copy -r gsiftp://eve.eudat-sara.vm.surfsara.nl/eveZone/home/eve/Collection/ /home/eve/getData/
globus-url-copy -r gsiftp://<server>/<PID> <local Path>
$ globus-url-copy -r gsiftp://eve.eudat-sara.vm.surfsara.nl/846/cc83ae10-5e37-11e6-9c19-04040a64004a/ /home/eve/getData/
Both commands will copy the data in Collection to the folder getData on your local machine.
14. Additional configuration
16. Globus Online Checksums
Enabling the checksum checking offered by Globus.org:
Configure iRODS to use MD5 checksums by default (iRODS 4 otherwise defaults to SHA-256).
Edit /etc/irods/server_config.json and set:
"default_hash_scheme": "MD5",
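After editing the file it can be useful to confirm that the setting is really in place. The following is a minimal sketch; the function name uses_md5 is hypothetical, and the check is a plain text match rather than a full JSON parse:

```shell
#!/bin/sh
# Return success if the given iRODS server configuration file sets
# "default_hash_scheme" to "MD5" (simple text match, not a JSON parser).
uses_md5() {
    grep -Eq '"default_hash_scheme"[[:space:]]*:[[:space:]]*"MD5"' "$1"
}

# Default to the path named on the slide; another file can be passed as $1.
config=${1:-/etc/irods/server_config.json}
if [ -r "$config" ] && uses_md5 "$config"; then
    echo "default_hash_scheme is MD5"
else
    echo "default_hash_scheme is not MD5 (or $config is not readable)"
fi
```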
17. Specify a policy to manage more than one iRODS resource
Edit $GLOBUS_LOCATION/etc/gridftp.conf.
Set $irodsResourceMap to a file, e.g. one called mapResourcefile:
$irodsResourceMap "path/to/mapResourcefile"
Populate path/to/mapResourcefile with lines that map particular iRODS paths to the iRODS resource to be used, separating the two with ';'. For example, assume that resc-repl is an alternative iRODS resource:
$ cat path/to/mapResourcefile
/CINECA01/home/cin_staff/rmucci00;resc-repl
/CINECA01/home/cin_staff/mrossi;resc-repl
If none of the listed paths is matched, the iRODS default resource is used.
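To make the mapping rule concrete, here is a minimal sketch of how such a file can be read. It is an illustration only, not the DSI's own code: the function name lookup_resource is hypothetical, and the rule that the first matching entry wins and that entries match whole collections is an assumption.

```shell
#!/bin/sh
# Illustration only: look up the resource for an iRODS path in a map file
# whose lines have the form "<iRODS path>;<resource>". The first entry whose
# path equals the target, or is a parent collection of it, is used.
lookup_resource() {
    map=$1
    target=$2
    while IFS=';' read -r prefix resource; do
        [ -n "$prefix" ] || continue    # skip empty lines
        case $target in
            "$prefix"|"$prefix"/*)
                printf '%s\n' "$resource"
                return 0
                ;;
        esac
    done < "$map"
    return 1    # no entry matched: the iRODS default resource applies
}
```

With the map file from this slide, lookup_resource path/to/mapResourcefile /CINECA01/home/cin_staff/rmucci00/run1/out.dat prints resc-repl, while a path outside the listed collections prints nothing and returns a non-zero status.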
18. Handling unmapped users
Users whose distinguished name (DN) is not yet mapped to an iRODS user can be given access automatically.
Configure the DSI to invoke an iRODS server-side command with iexecmd.
The command receives the certificate's DN.
Edit $GLOBUS_LOCATION/etc/gridftp.conf.
Set '$irodsDnCommand' to the name of the command to execute.
E.g., to invoke a script called 'createUser', add:
$irodsDnCommand "createUser"
On the iRODS server, the command should be installed in '$IRODS_HOME/server/bin/cmd/'.
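What such a command does is site-specific. The following is a minimal sketch of a 'createUser' script under stated assumptions: the DN is taken from the first argument, and the iRODS user name is derived from the last CN= component of the DN. Both the argument convention and the naming rule are assumptions, not documented B2STAGE behaviour. The script only prints the iadmin commands so they can be reviewed before being run by an iRODS administrator.

```shell
#!/bin/sh
# Sketch of a 'createUser' command. Assumptions: the DN arrives as the first
# argument, and the user name is derived from the last CN= component.

# Derive a lowercase user name from the last CN= component of a DN.
username_from_dn() {
    printf '%s\n' "$1" |
        sed -e 's|.*CN=||' -e 's|/.*||' |
        tr '[:upper:]' '[:lower:]' |
        sed 's/[^a-z0-9]/_/g'
}

# Print (do not run) the iadmin commands that would create the iRODS user
# and attach the DN to it as an authentication name.
create_user() {
    dn=$1
    user=$(username_from_dn "$dn")
    printf 'iadmin mkuser %s rodsuser\n' "$user"
    printf 'iadmin aua %s "%s"\n' "$user" "$dn"
}

# Only act when a DN was actually supplied.
if [ -n "${1:-}" ]; then
    create_user "$1"
fi
```

A production script would also need to handle name collisions and decide which DNs are allowed to obtain an account at all.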
19. For more info: http://eudat.eu/services/b2stage
User documentation: http://eudat.eu/services/userdoc/b2stage
Thank you
20. Authors and contributors
Roberto Mucci (CINECA)
Kostas Kavoussanakis (EPCC)
Christine Staiger (SURFsara)
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
This work is licensed under the Creative Commons CC-BY 4.0 licence
Editor's Notes
The EUDAT datacentres store and replicate large amounts of data for the communities. But what about processing these data? And how do these data get into the EUDAT datacentres in the first place?
As a community centre or EUDAT centre you have already installed iRODS at your site to replicate data to other centres in the network.
But how can users access or deposit files in iRODS and make use of, e.g., the automatic replication to another site or the PID registry?
Most users and scientists are not familiar with iRODS, but they may know some standard protocols and how to script, e.g. in Python. You can give them access to iRODS by means of GridFTP and thereby offer them a way to use tools like Globus Online or webFTS to manage their data in iRODS.
The installation at your site.
Users have a PID or an iRODS path to locate the data.
The user would like to do some computations on the data at an HPC site.
Thus the user needs to be able to control data movements at both sites and between the two sites.
In the following we will show you how you can employ B2STAGE to facilitate these data movements.
The purpose of this session is to train administrators to deploy B2STAGE at their site. This allows large research data sets to move efficiently between storage and computation. The service also takes care of depositing the computation output from the HPC facilities into EUDAT, and it can be used to deposit community data into the EUDAT facilities. B2STAGE uses the established GridFTP protocol to ensure high-speed transfer between sites. Data transfer is reliable and requires very little user interaction. B2STAGE also allows data to be accessed by their persistent identifiers (PIDs), more specifically handles. Output that the user elects to inject back into the EUDAT datacentres can be transferred by B2STAGE and, in combination with B2SAFE, be labelled with a PID.
This session will show you how to deploy the EUDAT Data Storage Interface (DSI) component. To this end you need to deploy a GridFTP server on the same machine as the iRODS server. You can then install the iRODS DSI, which ties the GridFTP server to this particular iRODS instance. Your users will then be able to use the GridFTP client of their choice to transfer data efficiently between your iRODS instance and other sites or their own computer. They will be able to use iRODS paths or PIDs to specify their digital objects.
In combination with the B2SAFE module you can define event hooks in iRODS. For example:
When users place data in a designated space at your site, the B2SAFE service ensures that a PID is generated by B2HANDLE for each data object and recorded in the PID registry. The iRODS server also handles any replication required for these data objects, according to the community policies that apply to the user who initiated the transfer.
You can configure B2STAGE in such a way that users can access their data directly by PID without needing to know the exact iRODS path.
This session does not cover deployment and configuration of iRODS v4.1; see the B2SAFE training material for this. Deployment and configuration of GridFTP is also assumed; note in particular that firewall considerations apply to GridFTP.
You will also need the following software components:
CMake 2.7 or higher
libglobus-common-dev (.deb) or globus-common-devel (.rpm)
libglobus-gridftp-server-dev (.deb) or globus-gridftp-server-devel (.rpm)
libglobus-gridmap-callout-error-dev (.deb) or globus-gridmap-callout-error-devel (.rpm) (see http://www.ige-project.eu/downloads/software/releases/downloads)
libcurl4-openssl-dev
Note that you can use the official iRODS and GridFTP server binaries without recompiling them.
This training module provides hands-on material for iRODS4, EUDAT B2SAFE, B2HANDLE (based on handle version 8) and B2STAGE.
It provides install files which show how the training machines are set up and give users an idea of how to install the software stack themselves. The training material itself is targeted at scientist end-users and site admins. The order of the markdown files defines the curriculum of the training. Each component takes about 1 hour.
EUDAT B2STAGE hands-on
This hands-on tutorial illustrates how to install B2STAGE on top of B2SAFE. It takes you through the configuration steps that are necessary to:
Access data in iRODS
Access data and folders registered by PIDs (To register data with PIDs use the respective B2SAFE rules or define your own PID-workflow by means of B2HANDLE)
It will also show you how to give users simple access via grid certificates and map them to iRODS accounts.
After the installation and configuration of B2STAGE, users can list their iRODS collections with a GridFTP client. Here we show examples for the globus-url-copy client.
If B2STAGE is configured with the PID resolver, users can access data in this iRODS instance by their PID. Note that this only works when the PID resolves to the iRODS instance that the B2STAGE module/GridFTP endpoint is configured for.
See slide above.
Instead of <local Path> you can also specify another GridFTP endpoint and do third-party transfers. This also works when you are transferring a collection from your iRODS/B2STAGE instance via the PID.
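As a sketch of such a third-party transfer, assume a second GridFTP endpoint at the hypothetical host hpc.example.org with a hypothetical destination path. The helper below (its name is also hypothetical) only prints the command, so that it can be inspected before being run with valid grid credentials:

```shell
#!/bin/sh
# Print (do not run) a recursive third-party transfer between two GridFTP
# endpoints; both source and destination are gsiftp:// URLs.
third_party_copy_cmd() {
    src=$1
    dst=$2
    printf 'globus-url-copy -r %s %s\n' "$src" "$dst"
}

# hpc.example.org and /scratch/eve/getData/ are placeholders, not real endpoints.
third_party_copy_cmd \
    gsiftp://eve.eudat-sara.vm.surfsara.nl/846/cc83ae10-5e37-11e6-9c19-04040a64004a/ \
    gsiftp://hpc.example.org/scratch/eve/getData/
```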
Some external services that can be combined with the GridFTP server, such as Globus Online, depend on a particular checksum algorithm. For example, Globus Online uses MD5 to verify the integrity of the data after a transfer. We therefore need to make sure that iRODS provides the appropriate checksums by default.
This can be configured in the iRODS server_config.json.
For now, all data written to iRODS via GridFTP is stored in the default resource specified in the iRODS environment file of the user who runs GridFTP (see installation and configuration); i.e. in our example the iRODS demo resource is used.
Here we show you how to configure different resources for iRODS paths. Note that in the example above all data under /CINECA01/home/cin_staff/rmucci00, including subcollections, will be stored on the resc-repl resource.
The resources that the iRODS paths are mapped to need to be defined in iRODS, and such setups can become fairly complex. Please refer to the iRODS manual for more information on this topic.
B2STAGE offers the possibility to automatically create new iRODS users, or to automatically map DNs to existing users, when there is no mapping yet.
On the iRODS server you need to provide a script that will be called in that case. The script needs to be installed in the $IRODS_HOME/server/bin/cmd/ folder.
By setting the variable $irodsDnCommand to the name of this script you enable this feature.
For more info please visit: http://eudat.eu/services/b2stage. The User documentation can be found at: http://eudat.eu/services/userdoc/b2stage