This document provides an introduction to the ARC middleware, including:
- An overview of the NorduGrid collaboration and the ARC middleware.
- Steps for getting started with ARC such as installing the client, requesting and installing certificates, and logging into the grid.
- Examples of writing job descriptions, submitting jobs, monitoring jobs, and fetching results.
- Additional topics: using storage elements, runtime environments, and examples of real-life applications on the grid.
Introduction to ARC Middleware Hands-on Tutorial
1. Introduction to ARC Middleware
ISSGC’09, Sophia Antipolis, Nice, France
Ivan Degtyarenko and Michael Gindonis
CSC – IT Center for Science, Espoo, Finland
July 11th, 2009
ISSGC09, Sophia Antipolis, France - Intro to ARC middleware, CSC – IT Center for Science Ltd. Slide 1 / 36
2. Today’s session
What is it about?
After a quick introduction, you will familiarize
yourselves with the ARC middleware through practical
examples.
By this point you have already covered grid
middleware basics, X.509, certificates, proxies,
virtual organizations, etc., so let’s dive in!
3. ARC Tutorial: timetable for this morning
Time             Title
Session I        Welcoming and Seminar Practicalities
09:00 – 10:00    Applications on the Grid
Amphitheater     NorduGrid / ARC Middleware Overview
Session II       Off to the PC Class
10:00 – 12:30    Hands-on ARC tutorial Exercises
Class room       (break as required / when the coffee etc. is available)
4. Short introduction:
A “Hello Grid” job with ARC
$ grid-proxy-init        # generate proxy
$ ngsub -f hello.xrsl    # submit
$ ngstat -a              # monitor
$ ngget hello            # fetch the results
hello.xrsl
& (executable=hello.sh)
(jobname=hello)
(stdout=hello.out)
(stderr=hello.err)
(gmlog=gridlog)
(cputime=10)
(memory=200)
(disk=1)
hello.sh
#!/bin/sh
echo "Hello Grid!"
5. Steps to start running on Grid
Once:
● get an account for a system with a Grid User Interface
installed (or install it on your own PC)
● request a certificate from a Certificate Authority (CA)
● install the certificate into ~/.globus/
● join a VO
Every session:
● log in to the Grid (create a proxy)
● write a job description in a file
● check available resources (optional)
● submit the job
● monitor the progress of the job
● fetch the results
6. Privacy
Note! When working on the Grid, you must
accept that some information about your jobs and
your Grid identity may be made public, for
example via monitoring tools. This can include:
− your name / affiliation
− IP address of your client computer
− job names and duration
− runtime environment
− other information
Fortunately, for today you are relatively anonymous:
/C=IT/O=GILDA/OU=Personal Certificate/L=Sophia Antipolis/CN=ISSGCXX
7. Security Policies
● policies vary between grids and VOs
● you need to accept the policy of a grid or VO to use
its resources
● since you are in the GILDA VO, you have
already accepted its policy
● you will also need to accept the M-grid
Acceptable Use Policy, since some resources
used in this tutorial are part of M-grid
8. The NorduGrid collaboration
a community around the open source ARC Grid middleware
− national Grids (e.g. M-grid, SweGrid, NorGrid),
users also outside the Nordic countries
− real users, real applications
− implemented a production Grid system
working non-stop since May 2002
− open for anyone to participate
9. ARC Middleware
ARC middleware (Advanced Resource Connector)
● open source, out-of-the-box Grid solution which
enables production-quality computational and data Grids
● easily installable/buildable on a variety of distributions
− non-intrusive server installation
● supports many common LRMSs (batch systems)
− Grid Engine, PBS/Torque, Platform LSF
● builds upon standard Open Source solutions such as
OpenLDAP, OpenSSL, SASL and Globus Toolkit
− adds services not provided by Globus such as
scheduling
− extends or completely replaces some Globus
components
10. ARC Middleware (cont.)
● provides a reliable implementation of the fundamental Grid
services, such as information services, resource discovery and
monitoring, job submission and management, brokering,
data management and resource management
● integrates computing resources and storage elements via a
secure Grid layer
● provides a light-weight standalone client, the User Interface,
which lets users submit, manage and monitor jobs on the Grid,
move data around and query resource information
● the UI's built-in broker selects the best resource for a job
● Grid job requirements are expressed in the extended Resource
Specification Language (xRSL)
12. The not so short introduction:
Installing the ARC client
● required to submit jobs to NorduGrid
● download from
http://ftp.nordugrid.org/download/
− binaries for various Linux distributions, source code
also available
● the easiest way to install the client is to use the
standalone version
− uncompress in a directory (no root privileges
required):
$ tar zxvf nordugrid-standalone-<latest>.i386.tgz
− run the environment setup script:
$ cd nordugrid-standalone-<latest>
$ . ./setup.sh
● RPM packages are recommended for multi-user
installations
13. Requesting and Installing the grid Certificate
● create a certificate request
$ grid-cert-request -int
− generates the .globus subdirectory with a key
(userkey.pem) and the request (usercert_request.pem)
− identity string: e.g.
/O=Grid/O=NorduGrid/OU=bccs.uib.no/CN=Per Hansen
− remember to select a good passphrase and keep the
key secret!
● send the file ~/.globus/usercert_request.pem to a
Certification Authority (CA)
− see the instructions at your local site / country for which CA to
contact
● wait for an answer from the CA
− the signed certificate returned by the Certification Authority
should be saved as the file ~/.globus/usercert.pem
14. Logging in to the Grid
● "Log in": grid-proxy-init
− the command does not actually log in anywhere,
but decrypts the private key and uses it to create
a time-limited proxy
− the proxy is used for authenticating to the
resources
● "Log out": grid-proxy-destroy
− destroys the proxy
● "whoami": grid-proxy-info
− Shows information about the validity of the proxy
subject : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis/CN=413289378
issuer : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
identity : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
type : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u7060
timeleft : 11:59:39
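Before submitting a longer job it can be useful to check that the proxy will outlive it. The following is a minimal Python sketch that reads text in the "key : value" layout shown above and converts the timeleft field to seconds. The function names and the sample text are illustrative assumptions; the sketch does not call grid-proxy-info itself.

```python
# Sample text copied from the grid-proxy-info output shown on the slide.
SAMPLE = """subject : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis/CN=413289378
issuer : /O=Grid/O=NorduGrid/OU=csc.fi/CN=Michael Gindonis
type : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u7060
timeleft : 11:59:39"""


def parse_proxy_info(text):
    """Split each 'key : value' line at the first ' : ' into a dict entry."""
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" : ")
        if separator:
            info[key.strip()] = value.strip()
    return info


def timeleft_seconds(info):
    """Convert the HH:MM:SS 'timeleft' field to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in info["timeleft"].split(":"))
    return hours * 3600 + minutes * 60 + seconds


def has_enough_lifetime(info, required_seconds):
    """True if the proxy is valid for at least required_seconds."""
    return timeleft_seconds(info) >= required_seconds


info = parse_proxy_info(SAMPLE)
print(info["path"], timeleft_seconds(info))
```

If the remaining lifetime is shorter than the expected run time of the job, create a new proxy with grid-proxy-init before submitting.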
15. Writing a job description file
● Resource Specification Language (RSL) files
are used to specify job requirements and
parameters for submission
−NorduGrid uses an extended language (xRSL)
based on the Globus RSL
● similar to scripts for local batch systems, but
include some additional attributes
− job name
− executable location and parameters
− location of input and output files of the job
− architecture, memory, disk and CPU time
requirements
− runtime environment requirements
16. xRSL example
● hellogrid.sh
#!/bin/sh
echo "Hello Grid!"
● hellogrid.xrsl
& (executable=hellogrid.sh)
(jobname=hellogrid)
(stdout=hello.out)
(stderr=hello.err)
(gmlog=gridlog)
(cputime=10)
(memory=200)
(disk=1)
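When many similar jobs are needed, the job description can be generated instead of written by hand. The sketch below is one possible way to do that in Python; the helper name render_xrsl is an assumption made for illustration and is not part of the ARC client. It only produces the same simple "& (attribute=value)" layout as the example above.

```python
def render_xrsl(attributes):
    """Render attribute/value pairs as '&' followed by one relation per line.

    Values are inserted verbatim; quoting of values that contain spaces
    or special characters is deliberately left out of this sketch.
    """
    relations = [f"({name}={value})" for name, value in attributes.items()]
    return "& " + "\n".join(relations) + "\n"


# Same attributes as in hellogrid.xrsl above.
hello_job = {
    "executable": "hellogrid.sh",
    "jobname": "hellogrid",
    "stdout": "hello.out",
    "stderr": "hello.err",
    "gmlog": "gridlog",
    "cputime": 10,
    "memory": 200,
    "disk": 1,
}

xrsl_text = render_xrsl(hello_job)
print(xrsl_text)
```

The resulting text can be saved to a file and passed to ngsub with the -f option, as in the submission step.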
17. Submitting the job
● submit the job
$ ngsub -d 1 -f hellogrid.xrsl
● a job id is returned
=> Job submitted with jobid
gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307
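The job id is a URL, so its parts (the cluster that accepted the job, the port, and the job number) can be pulled apart with standard tools. A small Python sketch follows; the function name split_job_id is an illustrative assumption.

```python
from urllib.parse import urlparse


def split_job_id(job_id):
    """Split a gsiftp job id URL into scheme, host, port and job number."""
    parsed = urlparse(job_id)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        # The job number is the last component of the URL path.
        "job": parsed.path.rsplit("/", 1)[-1],
    }


job_id = "gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307"
parts = split_job_id(job_id)
print(parts["host"], parts["job"])
```

The host part shows which cluster the broker selected for the job.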
18. ARC Grid Monitor
● shows currently
connected resources
● almost all elements
"clickable"
− browse queues and
job states by cluster
− list jobs belonging to
a certain user
● no authentication,
anyone can browse
the info
− privacy issues
19. Monitoring the Job
● Query the status using the command line
$ ngstat hellogrid
=> Job gsiftp://ametisti.grid.helsinki.fi:2811/jobs/455611239779372141331307
Jobname: hellogrid
Status: INLRMS:Q
− Most common status values are ACCEPTED,
PREPARING, INLRMS:Q, INLRMS:R, FINISHING,
FINISHED
● Or use the Grid Monitor
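Polling the status by hand gets tedious for longer jobs. The Python sketch below shows one way to wait for a job to leave the in-progress states listed above. It assumes the "Status:" line layout of the sample ngstat output, and the caller supplies a function that returns the status text, so the sketch itself does not run any ARC command. All names are illustrative.

```python
import time

# In-progress states named on the slide; any other state ends the wait.
IN_PROGRESS = ("ACCEPTED", "PREPARING", "INLRMS:Q", "INLRMS:R", "FINISHING")


def parse_status(status_text):
    """Return the value of the first 'Status:' line, or None if there is none."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Status:"):
            return stripped.split(":", 1)[1].strip()
    return None


def wait_until_done(read_status_text, poll_seconds=60, max_polls=100, sleep=time.sleep):
    """Poll until the job is no longer in progress or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = parse_status(read_status_text())
        if status is not None and status not in IN_PROGRESS:
            return status
        sleep(poll_seconds)
    return status


# Demonstration with canned status texts and no real waiting.
fake_outputs = iter(["Status: INLRMS:Q", "Status: INLRMS:R", "Status: FINISHED"])
final_state = wait_until_done(lambda: next(fake_outputs), sleep=lambda seconds: None)
print(final_state)
```

In real use, read_status_text would wrap a call to ngstat for the job in question; a generous polling interval keeps the load on the information system low.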
20. Fetching the results
● print the job output
$ ngcat hellogrid
− shows the standard output of the job
− this can also be done while the job is running
● download the result files
$ ngget hellogrid
=> ngget: downloading files to
/home/ajt/455611239779372141331307
ngget: download successful - deleting job
from gatekeeper.
21. Using a storage element
● Storage Elements are disk servers accessible via the
Grid
− can be used to store job output while the user is logged out and
the client machine is disconnected from the Grid
● they allow input files to be stored close to the cluster where the
program is executed, on a high-bandwidth network
● files can be local and remote in the same job:
(inputFiles=
("input1" "/home/user/myexperiment")
("input2" "gsiftp://se.example.com/files/data"))
(outputFiles=
("output"
"gsiftp://se.example.com/mydir/result1")
("prog.out"
"gsiftp://se.example.com/mydir/stdout"))
(stdout="prog.out")
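File lists like these are easy to get wrong by hand (a missing parenthesis is enough to break the job description), so generating them can help. The following Python sketch renders a list of (name, location) pairs as an inputFiles or outputFiles attribute. The helper name is an assumption, and the pairs are written as whitespace-separated quoted strings, which is my understanding of the xRSL syntax; check the xRSL manual for your client version.

```python
def render_file_list(attribute, pairs):
    """Render (name, location) pairs as one xRSL file-list attribute."""
    rows = "\n".join(f'  ("{name}" "{location}")' for name, location in pairs)
    return f"({attribute}=\n{rows})"


# Same files as in the example above; se.example.com is a placeholder host.
input_files = render_file_list("inputFiles", [
    ("input1", "/home/user/myexperiment"),
    ("input2", "gsiftp://se.example.com/files/data"),
])
output_files = render_file_list("outputFiles", [
    ("output", "gsiftp://se.example.com/mydir/result1"),
    ("prog.out", "gsiftp://se.example.com/mydir/stdout"),
])
print(input_files)
print(output_files)
```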
22. Runtime environments
● software packages which are preinstalled on a
computing resource and made available through the Grid
− just send the data and/or parameters to be processed
− useful if there are many users of the same software
or if the same program is used frequently
− allows local platform specific optimizations
● For a specific CPU or Parallel Environment
● Perhaps in the near future… GPUs, CUDA
● required runtime environments can be specified in
the job description file, for example:
(runtimeenvironment=APPS/GRAPH/POVRAY-3.6)
● Runtime Environment Registry:
− http://www.csc.fi/grid/rer/
23. ARC / NorduGrid / M-grid references
NorduGrid (resource monitor, presentations, tutorials, docs, …)
http://nordugrid.org/
ARC middleware
http://nordugrid.org/middleware
User guide: http://www.nordugrid.org/documents/ui.pdf
user support mailing list: nordugrid-support at nordugrid.org
M-grid (Finnish National Grid)
http://www.csc.fi/english/research/Computing_services/grid_environments/mgrid
https://extras.csc.fi/mgrid/
support email at CSC: grid-support at csc.fi
regular ARC training by CSC: http://www.csc.fi/english/csc/courses
24. Do I need to change my application to use ARC?
three different approaches:
● using the application as is: grid middleware will move the
executable and the data to the target system
− library dependencies often need to be resolved by linking
statically or packing them to go with the application
● installing the application on the target system and using it via the
Grid interface
− batch processing type applications normally work without
changes, interactive applications are more difficult
− with ARC middleware this is facilitated by runtime
environments (RE)
● modifying the application to fully exploit a distributed
environment
− using ARC libraries
− distributing over a large geographical area is not practical
unless the computation can be split into independent parts
25. Real life applications
● it's common to send several smaller jobs to the
Grid to solve a larger problem
● parallel MPI jobs to a single cluster are supported
(if correct runtime environment installed), but no
MPI between clusters
● splitting the job into suitable parts and gathering
the parts together is left to the user
− more error-prone environment than traditional local
systems => error checking and recovery are important
− fault reporting and debugging have room for
improvement
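Since splitting, gathering and error checking are left to the user, a small amount of bookkeeping code goes a long way. The Python sketch below splits a list of work items into independent parts, detects parts whose results never came back, and merges the results in order. All names are illustrative assumptions, and the "results" here are computed locally to keep the sketch self-contained.

```python
def split_into_parts(items, part_size):
    """Split items into consecutive parts of at most part_size elements."""
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    return [items[i:i + part_size] for i in range(0, len(items), part_size)]


def missing_parts(part_count, results):
    """Return the indices of parts that have no result yet."""
    return [index for index in range(part_count) if index not in results]


def gather(part_count, results):
    """Merge per-part result lists in part order; fail if any part is missing."""
    missing = missing_parts(part_count, results)
    if missing:
        raise RuntimeError(f"parts without results: {missing}")
    merged = []
    for index in range(part_count):
        merged.extend(results[index])
    return merged


inputs = list(range(10))
parts = split_into_parts(inputs, 4)

# Stand-in for grid jobs: each part squares its inputs; part 1 "fails" at first.
results = {i: [x * x for x in part] for i, part in enumerate(parts) if i != 1}
to_resubmit = missing_parts(len(parts), results)

# Recovery: rerun only the missing parts, then gather everything.
for i in to_resubmit:
    results[i] = [x * x for x in parts[i]]
merged = gather(len(parts), results)
print(to_resubmit, merged)
```

On the Grid, each part would become one job and a missing result would trigger a resubmission of just that part.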
26. Real life applications
● Size your job to best exploit the grid
− group many short jobs into one to avoid submission
overhead
− if possible, break up larger or longer jobs into
independent parts
− if your job must run for a long time, checkpoint your
results so that your calculation can be resumed; no
resource will stay up indefinitely…
− M-grid is ideally suited to jobs of length 1 hour to 1 day.
● Use file caching if it is available
− Eliminate unnecessary file transfers (load on network)
− Save time needed to stage files
− Save disk space on the cluster front-ends
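Checkpointing, as recommended above, can be as simple as writing the loop state to a file after each step and reading it back at start-up. The Python sketch below illustrates the idea with a toy calculation (summing step numbers). The file name, the state layout and the function names are assumptions made for illustration; a real application would store its own state.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old one."""
    temporary = path + ".tmp"
    with open(temporary, "w") as handle:
        json.dump(state, handle)
    os.replace(temporary, path)


def run(path, steps, stop_after=None):
    """Sum 0..steps-1, saving after each step; stop_after simulates an outage."""
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    interrupted = run(checkpoint, steps=10, stop_after=3)  # first attempt stops early
    resumed = run(checkpoint, steps=10)                    # second attempt continues
    print(interrupted, resumed)
```

For a grid job, the checkpoint file would be listed among the output files so that a resubmitted job can fetch it as input and continue.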
27. Further development of ARC middleware
● Stated goal: not to undermine existing functionality and capabilities
available in “pre-…”ARC components (current stable version)
● Two SVN branches
● ARC0 (version 0.6.5, 0.8rc)
● Pre-existing production components (Pre-KnowARC project)
− Backported features from KnowARC
− Nordic DataGrid Facility provides support and backports features from
the KnowARC project into the current stable releases of ARC
● ARC1 (0.9.xxx)
− Next generation components developed by the KnowARC project
● More information at www.ndgf.org and www.knowarc.eu
28. What is new
Service Oriented Architecture
Modular structure
Self-sufficient core components
Interoperability built on standards
User and developer friendly
Business friendly open source
License Apache 2.0
Portable – runs on almost all Linux variants,
Solaris, porting to Windows and Mac OS in progress
Aiming at integration into Fedora, Debian and Ubuntu
07/11/09 www.knowarc.eu 28
30. Key Feature - New ARC client
Relies on dedicated library
Implemented in C++
Python and Java bindings
Allows easy development of application-specific clients
Implements a user Grid toolbox
Handling of user & host credentials
computing resource discovery & information retrieval
matchmaking & brokering & job submission
input/output data handling
The new library and arc* commands can handle glite-CREAM and UNICORE
Windows and Mac OS client
GUI – user interface, just delivered!
31. Key Feature - HED
HED – The Hosting Environment Daemon
Container for all the server-side functional components
Main functions:
Route messages between the services and the outside world
Provide inter-service communication
Provide a basic security infrastructure
Consists of pluggable modules
Light-weight (no Apache, no Axis)
32. Key Service – A-Rex
ARC Resource-coupled Execution Service
Provides Execution Management capability
The Grid Manager from ARC Classic as core
Extended with WS interface implementing Basic Execution
Service (BES)
Accepts Job Submission Description Language (JSDL)
Information and resource discovery – GLUE 2 schema
Support for wide range of Local Resource
Management Systems:
Torque, PBS/OpenPBS, SGE,
LoadLeveler, LSF, Condor and SLURM
Released in ARC 0.8, available at:
http://wiki.nordugrid.org/index.php/ARC_v0.8
33. Key Service – New Storage
‘Distributed by Design’ storage system
− Global namespace
− Supports collections and subcollections to any depth
A-Hash – a replicated database to store metadata
Librarian – handles:
− Metadata and hierarchy of collections and files
− The location of replicas
− Health data of the Shepherd services
Bartender – high-level interface for the users and for other services
Shepherd – manages storage services, and provides a simple
interface for storing files on storage nodes
34. Welcome to ARC
Let’s begin…
Off to the PC classroom! (unless the coffee is ready)
35. Abstracting the middleware
http://technical.eu-egee.org/index.php?id=290
Expand the functionality of the grid infrastructure for users,
reduce duplicated development when porting applications, and
speed the porting of new applications to the grid.
GridWay Metascheduler (http://www.gridway.org/)
The GridWay Metascheduler performs job execution management and resource brokering, allowing
unattended, reliable, and efficient execution of jobs, job arrays, and workflows on heterogeneous and
dynamic Grids.
P-GRADE Portal (http://portal.p-grade.hu/)
The Parallel Grid Run-time and Application Development Environment Portal (P-GRADE Portal) is a
workflow-oriented graphical environment that covers every stage of Grid application lifecycles.
Ganga (http://ganga.web.cern.ch/ganga/)
Ganga is an easy-to-use frontend for job definition and management, implemented in Python. Ganga
allows trivial switching between testing on a local batch system and large-scale processing on Grid
resources.