Grid computing enables sharing and aggregation of distributed resources as a single system. It originated in 1997 to better utilize idle computing resources. Key developments included OGSA/OGSI for services and Globus Toolkit for security. Grids coordinate decentralized resources using open standards to provide quality of service. They allow exploiting underutilized resources and provide massive parallel CPU capacity for data-intensive applications like bioinformatics. Users must enroll in a virtual organization and install client software to access shared resources for their jobs.
1. An Introduction to Grid Computing
Prashanth Chengi
NPSF, C-DAC Pune
July 09, 2012
Prashanth Chengi (NPSF, C-DAC Pune) An Introduction to Grid Computing July 09, 2012 1 / 35
2. Plan of Talk
1 What is ‘Grid Computing’ ?
2 History of Grid Computing
3 Milestones in Grid Computing
4 Grid Checklist
5 Need for Grid Computing
6 Advantages of Grid Computing
7 Applications which can run on the grid
8 Architecture of the Grid
9 Virtual Organizations
10 Grid software components
11 Grid Computing: A user’s perspective
12 Grid Computing: An administrator’s perspective
13 Challenges of Grid Computing
14 Questions?
15 References
3. What is Grid Computing?
Grid computing enables the sharing, selection, and aggregation of
distributed resources and presents them as a single, unified resource.
Grand vision: Analogous to power grids.
Users can use resources without needing to know where they come from.
An abstraction of implementation specifics from users.
‘The whole is bigger than the part’.
Allows users to use more resources than they independently own.
Related terms: Collaborative computing, cooperative computing,
shared computing.
4. History of Grid Computing
Born at a workshop called “Building a Computational Grid” held at
Argonne National Laboratory in September 1997.
Ian Foster, Carl Kesselman, and Steve Tuecke are regarded as the
‘fathers of grid computing’.
Immediate ancestor: Metacomputing, Circa 1990
FAFNER (Factoring via Network-Enabled Recursion) and I-WAY
(Information Wide Area Year) were early adopters of metacomputing.
SETI@Home, Folding@Home, BOINC, GIMPS
5. Milestones in Grid Computing [4]
Open Grid Services Architecture (OGSA)
Specifies security policies.
Uses Grid Security Infrastructure (GSI) protocol.
Uses Web Services Description Language (WSDL) and Simple Object
Access Protocol (SOAP) for grid services.
Open Grid Services Infrastructure (OGSI)
Specifies communication protocols.
Web Services Resource Framework (WSRF)
Refactoring of OGSI to exploit web services.
Globus Toolkit
Toolkit for developing grid software.
Includes authentication framework, message-level and transport-level
security.
Provides Java classes and libraries for certificate-based authentication
support, access controls and credential management
6. Foster’s Three point grid checklist [1]
Coordinates resources that are not subject to centralized control...
...using standard, open, general-purpose protocols and interfaces...
...to deliver nontrivial qualities of service.
7. Need for Grid Computing
Millions of CPU cycles go to waste while machines sit idle.
Users’ programs are constrained by the limited amount of available
resources.
If CPU cycle scavenging could be done, and the saved cycles shared,
resources could be better utilized.
Mutual resource sharing would mean users are no longer constrained
to use only resources actually owned/operated by themselves.
Enter grid computing.
8. Advantages of Grid Computing
Exploitation of underutilized resources
In most organizations, desktop machines are busy less than 5% of the
time.
Often, even servers are idle.
Resources such as storage may also be underutilized.
These resources can be shared over the grid.
Parallel CPU capacity
In addition to pure scientific needs, industries such as biomedicine,
finance, oil exploration and motion picture animation require massive
parallel CPU capacity.
These applications can easily tap into resources available over the grid.
9. Advantages of Grid Computing
Data grids
Files and databases can span many systems and thus have larger
capacities than on any single system.
Such spanning can improve data transfer rates through use of striping
techniques.
Data can be duplicated throughout the grid to serve as backup.
Resource balancing
For applications that are grid-enabled, scheduling can be done on
machines with low utilization thereby achieving a resource balancing
effect.
An unexpected peak can be routed to relatively idle machines on the
grid.
If the grid is already fully utilized, lowest priority tasks can be
suspended or even cancelled and taken up later.
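The resource-balancing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Task`, `Machine` and `place` names are hypothetical, priorities are plain integers where a larger number means more important, and "suspending" a task just moves it to a list to be taken up later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int  # larger number = more important (an assumption of this sketch)


@dataclass
class Machine:
    name: str
    slots: int  # how many tasks the machine can run at once; must be >= 1
    running: List[Task] = field(default_factory=list)

    def utilization(self) -> float:
        return len(self.running) / self.slots


def place(task: Task, machines: List[Machine], suspended: List[Task]) -> Optional[str]:
    """Place `task` and return the chosen machine name, or None if it must wait."""
    # Normal case: route the work to the least-utilized machine.
    target = min(machines, key=lambda m: m.utilization())
    if target.utilization() < 1.0:
        target.running.append(task)
        return target.name

    # Grid fully utilized: find the lowest-priority task currently running.
    host, victim = min(
        ((m, t) for m in machines for t in m.running),
        key=lambda pair: pair[1].priority,
    )
    if victim.priority >= task.priority:
        suspended.append(task)  # nothing less important to displace
        return None

    # Suspend the victim so it can be taken up later, and run the new task.
    host.running.remove(victim)
    suspended.append(victim)
    host.running.append(task)
    return host.name
```

A real scheduler would also checkpoint or requeue the suspended task; here it is only recorded in `suspended`.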
10. Advantages of Grid Computing
[Figure: Representation of a data grid. Diagram courtesy of [3], IBM Redbook.]
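The striping technique mentioned for data grids can be illustrated with a small Python sketch. It assumes the simplest possible layout, fixed-size stripes dealt round-robin across storage nodes; the function names `stripe` and `reassemble` are hypothetical, and real data grids add replication, metadata and network transfer on top of this.

```python
from typing import List


def stripe(data: bytes, num_nodes: int, stripe_size: int) -> List[List[bytes]]:
    """Deal fixed-size stripes of `data` round-robin across `num_nodes` nodes."""
    if num_nodes < 1 or stripe_size < 1:
        raise ValueError("num_nodes and stripe_size must be positive")
    nodes: List[List[bytes]] = [[] for _ in range(num_nodes)]
    for offset in range(0, len(data), stripe_size):
        stripe_index = offset // stripe_size
        nodes[stripe_index % num_nodes].append(data[offset:offset + stripe_size])
    return nodes


def reassemble(nodes: List[List[bytes]]) -> bytes:
    """Interleave the stripes held by each node back into the original order."""
    pieces = []
    longest = max((len(node) for node in nodes), default=0)
    for row in range(longest):
        for node in nodes:
            if row < len(node):
                pieces.append(node[row])
    return b"".join(pieces)
```

Because consecutive stripes live on different nodes, a reader can fetch them from several nodes at once, which is where the improved transfer rate comes from.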
11. Advantages of Grid Computing
Reliability
High-end conventional computing systems use expensive hardware to
increase reliability.
A grid allows for machine redundancy and instant failover to other
resources.
Resources can be taken down for maintenance/upgrades without
crippling the projects involved.
Communication
When machines on a grid are connected to the internet and don’t share
the same communication paths, they add to the total available
bandwidth.
It makes it possible to have redundant communication paths, as
communication can quickly be rerouted through other paths.
12. Applications which can run on the grid
The nature of the grid restricts what it can be used for.
The grid cannot be used for all applications, but it is extremely
practical for certain types of applications.
High Throughput problems
Computing grids can be used to schedule these tasks across resources.
As soon as a processor finishes one task, the next task arrives. In this
way, hundreds of tasks can be performed in a very short time.
Embarrassingly parallel problems
These are problems which can be broken down into parts which are
completely independent of each other.
Example: Fingerprint matching in an extremely large database. The
images are unique and not dependent on each other.
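The fingerprint example can be sketched as follows. This is a toy illustration, not a real matcher: templates are short strings, `similarity` is a made-up score, and a thread pool stands in for grid workers. The point is only that each chunk of the database is searched with no reference to any other chunk, so on a grid each chunk could be a separate job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, str]  # (person id, fingerprint template); hypothetical format


def similarity(a: str, b: str) -> float:
    """Toy score: fraction of positions at which two equal-length templates agree."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_in_chunk(probe: str, chunk: List[Record]) -> Tuple[float, str]:
    """Search one chunk; needs nothing from any other chunk."""
    return max((similarity(probe, template), person) for person, template in chunk)


def match(probe: str, database: List[Record], num_chunks: int = 4) -> Tuple[float, str]:
    if not database:
        raise ValueError("database is empty")
    # Split the database into independent parts and drop empty ones.
    chunks = [database[i::num_chunks] for i in range(num_chunks)]
    chunks = [chunk for chunk in chunks if chunk]
    # Threads stand in for grid nodes; each chunk is processed independently.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = list(pool.map(lambda chunk: best_in_chunk(probe, chunk), chunks))
    # The only coordination is this final reduction over the partial results.
    return max(partial)
```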
13. Applications which can run on the grid
Coarse-grained calculations
These are often embarrassingly parallel “Monte Carlo simulations”,
where parameters are varied and results observed.
High-performance problems
These are problems which require supercomputing resources.
Supercomputers generally deal with compute-centric problems; the
secret to solving these problems is “teraflops”: as many as possible.
HPC grids require extremely low-latency/high-throughput networks.
TeraGrid in US and DEISA in Europe are examples of supercomputing
grids.
In general, HPC applications are not suitable for running on grids
where network connectivity is not excellent or bandwidth is a
constraint.
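To make the coarse-grained Monte Carlo pattern from this slide concrete, here is a minimal Python sketch of a parameter sweep. The quantity being estimated (the chance of at least 6 heads in 10 flips of a biased coin) and the names `simulate` and `sweep` are chosen purely for illustration. Each parameter value gets its own random generator, so the tasks share no state and could each run as a separate grid job.

```python
import random
from typing import Dict, List


def simulate(p_heads: float, trials: int, seed: int) -> float:
    """Estimate P(at least 6 heads in 10 flips) for a coin with bias `p_heads`."""
    rng = random.Random(seed)  # private generator, so tasks do not share state
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_heads for _ in range(10))
        if heads >= 6:
            hits += 1
    return hits / trials


def sweep(parameters: List[float], trials: int = 20000) -> Dict[float, float]:
    """Vary the parameter and record the observed result for each value."""
    # Each (parameter, seed) pair is an independent task; on a grid each one
    # could be submitted as a separate job. Here they simply run in a loop.
    return {p: simulate(p, trials, seed=i) for i, p in enumerate(parameters)}
```

Since the tasks exchange no data while running, this kind of workload tolerates the slow or unreliable networks that make tightly coupled HPC jobs impractical on a grid.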
14. Architecture of the Grid
Network layer: The lowest layer, connecting the grid resources.
Resource layer: Resources may be computers, storage systems,
electronic catalogues, sensors etc connected to the network.
Middleware layer: Tools that enable various elements of the grid to
participate in a grid.
Application layer: The highest layer, it includes applications in
science, engineering, business, finance and more, as well as portals
and development toolkits to support applications.
15. Virtual Organizations
Virtual Organizations (VOs) are groups of people who share a
common goal.
To achieve their mutual goal, VO members share access to each
other’s computers, programs, files, data and networks in a controlled,
secure and flexible manner.
For example, the Earth sciences VO unites scientists and researchers
working in the domain of Earth sciences.
16. Grid software components
Management components
All grids have some management components to keep track of resource
availability, membership information etc.
Grid software also needs to track capacities and current utilization of
nodes in real time.
It’s also responsible for monitoring node health, usage patterns and
statistics.
Some grid systems provide their own login to the grid while others
depend on the native operating systems for user authentication.
17. Grid software components
Distributed grid management
Often, grid software is hierarchical, thereby allowing for decentralized
management.
Clusters of clusters approach.
For example, a top-level scheduler only submits tasks to the
cluster-level scheduler, instead of trying to schedule the actual run of
the job.
Lower level schedulers handle the assignment of the task to the
individual machines and gathering of output to be passed to the
higher-level job manager.
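The "clusters of clusters" arrangement can be sketched in Python as two cooperating classes. Everything here is a simplifying assumption: the class names are hypothetical, a job is just a list of callables, the top level picks the cluster that has accepted the fewest jobs, and tasks run in-process instead of on remote machines.

```python
from itertools import cycle
from typing import Callable, List, Tuple

Task = Callable[[], object]


class ClusterScheduler:
    """Lower level: assigns tasks to individual machines and gathers output."""

    def __init__(self, name: str, machines: List[str]) -> None:
        self.name = name
        self.machines = machines
        self.jobs_accepted = 0

    def run_job(self, tasks: List[Task]) -> List[Tuple[str, object]]:
        self.jobs_accepted += 1
        results = []
        # Hand tasks to the cluster's machines in turn and collect each output.
        for machine, task in zip(cycle(self.machines), tasks):
            results.append((machine, task()))
        return results


class TopLevelScheduler:
    """Top level: only chooses a cluster; it never touches individual machines."""

    def __init__(self, clusters: List[ClusterScheduler]) -> None:
        self.clusters = clusters

    def submit(self, tasks: List[Task]) -> Tuple[str, List[Tuple[str, object]]]:
        cluster = min(self.clusters, key=lambda c: c.jobs_accepted)
        return cluster.name, cluster.run_job(tasks)
```

The division of labour is the point: the top-level scheduler needs no knowledge of how many machines a cluster has or how it assigns work to them.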
18. Grid software components
Donor software
Each machine on the grid needs to install some software which is
required by other members of the VO.
This software may include scientific libraries, compilers and other
packages.
The machines need to have the necessary binaries to execute the users’
jobs.
Submission software
Usually any member machine of a grid can be used to submit jobs to
the grid and initiate grid queries.
However, on some grid systems, this function is implemented as a
separate component installed on submission nodes or submission
clients.
19. Grid software components
Schedulers
Most grid systems include some sort of job scheduling software.
The scheduler locates a machine on which to run a grid job that has
been submitted by a user.
The simplest schedulers are round-robin schedulers, which cyclically
assign jobs to machines matching the job requirements.
Other schedulers have complex scheduling logic and manage multiple
queues of jobs.
Schedulers measure current utilization of machines or depend on
cluster management software to provide the relevant figures.
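A round-robin scheduler of the kind described above might look like the following Python sketch. The class name, the capability dictionaries and the keys such as `cores` are assumptions made for the example; no particular grid middleware is being modelled.

```python
from typing import Dict, List, Optional


class RoundRobinScheduler:
    """Cycle through machines, skipping those that do not meet the job's needs."""

    def __init__(self, machines: Dict[str, Dict[str, int]]) -> None:
        self.machines = machines             # name -> capabilities, e.g. {"cores": 8}
        self.names: List[str] = list(machines)
        self.next_index = 0                  # where the next search starts

    def assign(self, requirements: Dict[str, int]) -> Optional[str]:
        count = len(self.names)
        for offset in range(count):
            index = (self.next_index + offset) % count
            name = self.names[index]
            capabilities = self.machines[name]
            # A machine matches if it meets or exceeds every stated requirement.
            if all(capabilities.get(key, 0) >= need for key, need in requirements.items()):
                self.next_index = (index + 1) % count  # continue after this machine
                return name
        return None  # no machine on the grid satisfies the requirements
```

Note that this scheduler ignores current load entirely, which is exactly what separates it from the more complex schedulers mentioned on the slide.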
20. Grid software components
Schedulers (contd)
Schedulers may also be hierarchical, i.e. a top-level scheduler submits
jobs to cluster schedulers.
Schedulers generally maintain job state information and are responsible
for resubmitting jobs in the event of failures.
Schedulers also offer resource reservation, thereby eliminating the need
for the users to manually monitor resource availability.
Schedulers also often offer opportunistic job migration.
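As a rough illustration of job state tracking and resubmission on failure, the sketch below records every state a job passes through and retries it a bounded number of times. The `JobState` values and the `run_with_resubmission` function are hypothetical, and a job is modelled as a plain Python callable that raises an exception when it fails.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


def run_with_resubmission(
    job: Callable[[], object], max_attempts: int = 3
) -> Tuple[Optional[object], List[JobState]]:
    """Run `job`, resubmitting after failures; return (result, state history)."""
    history = [JobState.QUEUED]
    for attempt in range(1, max_attempts + 1):
        history.append(JobState.RUNNING)
        try:
            result = job()
        except Exception:
            history.append(JobState.FAILED)
            if attempt < max_attempts:
                history.append(JobState.QUEUED)  # resubmitted for another try
            continue
        history.append(JobState.DONE)
        return result, history
    return None, history  # gave up after max_attempts failures
```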
21. Grid software components
Communications
Jobs submitted on the grid may need to communicate with each other.
For example, a job may split itself into a large number of subjobs which
need to exchange information amongst themselves.
The subjobs would need to be able to locate other subjobs and send
appropriate data.
As a result, the open standard Message Passing Interface (MPI) and its
variations are often included as part of the grid system.
22. Grid software components
Monitoring and measurement
Donor software often includes tools that measure current load and
activity on a given machine, either using OS tools or by direct
measurement.
Some grid systems provide means for implementing custom load
sensors for other than CPU or storage resources.
Schedulers often depend on these tools to make scheduling decisions.
These statistics are also useful for discovering usage patterns in the grid.
Usage pattern analysis is used to better predict resource requirements
of the job for its next run.
The measurement information can be saved for accounting purposes or
as the basis for grid resource brokering.
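The idea of pluggable load sensors can be sketched with a small registry in Python. The `sensor` decorator, the `SENSORS` table and the sensor names are all hypothetical. The load sensor relies on `os.getloadavg`, which is only available on Unix-like systems, so it reports NaN elsewhere.

```python
import os
import shutil
from typing import Callable, Dict

SENSORS: Dict[str, Callable[[], float]] = {}


def sensor(name: str) -> Callable[[Callable[[], float]], Callable[[], float]]:
    """Register a function as a named load sensor."""
    def register(fn: Callable[[], float]) -> Callable[[], float]:
        SENSORS[name] = fn
        return fn
    return register


@sensor("load_per_core")
def load_per_core() -> float:
    # Uses an OS facility: the 1-minute load average, normalized by core count.
    try:
        one_minute = os.getloadavg()[0]
    except (AttributeError, OSError):
        return float("nan")  # not available on this platform
    return one_minute / (os.cpu_count() or 1)


@sensor("disk_free_fraction")
def disk_free_fraction() -> float:
    usage = shutil.disk_usage(os.getcwd())
    return usage.free / usage.total if usage.total else 0.0


def collect() -> Dict[str, float]:
    """Take one reading from every registered sensor."""
    return {name: read() for name, read in SENSORS.items()}
```

A site could register further sensors the same way, for example for a resource other than CPU or storage, and a scheduler or accounting tool would then consume the dictionary returned by `collect()`.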
23. Grid Computing: A user’s perspective
In this section, we will see the grid from a user’s perspective.
Enrolling and installing grid software
While there may be testbed grid setups with free and unrestricted
access to all, production grids require users to first sign up for VO
membership.
To obtain VO membership, the user must obtain a digital
certificate vouching for his/her identity.
The identity certificate will have to be obtained from a certification
authority (CA) trusted by the VO which the user wishes to subscribe to.
Upon verifying the identity of the user, the CA will issue a digital
certificate to the user, which the user has to safeguard and take
responsibility for.
Upon installing identity credentials, the user then has to install client
software for accessing the grid.
24. Grid Computing: A user’s perspective
Logging onto the grid.
Many grid systems require the user to log on to a system using an id
enrolled in the grid.
Often, the digital certificate itself forms the user’s id for logging onto
the grid.
In the former case, the user’s login information must be replicated
identically all over the grid.
In the latter case, the user’s credentials may be mapped to any local
account, and the mapping is hidden from the user.
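Mapping a certificate identity to a local account can be sketched as a simple lookup table. The file format below, a quoted certificate subject followed by a local account name, is modelled loosely on the Globus grid-mapfile; the subjects, account names and function name are made up for the example.

```python
import shlex
from typing import Dict


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {raw_line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


# Hypothetical entries: two users, one mapped to a shared pool account.
EXAMPLE = '''
# certificate subject                      local account
"/C=IN/O=Example Grid/CN=Asha Rao" asha
"/C=IN/O=Example Grid/CN=Vikram Nair" gridpool01
'''
```

Calling `parse_gridmap(EXAMPLE)` yields a dictionary from subject to account. The user only ever presents the certificate; which local account the job runs under is decided by the site.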
25. Grid Computing: A user’s perspective
Querying and submitting jobs
The user usually performs queries to check resource availability
on the grid.
The user may specify custom requirements in the submit script.
Grid systems usually provide command-line tools, if not graphical, to
check the status of jobs already submitted by the user and to query
status of the grid.
This allows users to write custom scripts to check the status of the grid
and automatically fire jobs, if conditions are favorable.
Scripts can also be used to submit pipeline jobs: a series of jobs in
which each job depends on the output of its predecessor.
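A user script that checks the grid and then fires a pipeline could be structured like the Python sketch below. It is deliberately abstract: `free_slots` stands in for whatever status query the grid's command-line tools provide, and each stage is a local callable where a real script would submit a job and wait for its output. All names are hypothetical.

```python
from typing import Callable, List, Optional


def run_pipeline(
    stages: List[Callable[[object], object]],
    first_input: object,
    free_slots: Callable[[], int],
    min_free_slots: int = 1,
) -> Optional[object]:
    """Run the stages in order if the grid looks favorable; otherwise do nothing."""
    # Check grid status first and only fire the jobs when enough slots are free.
    if free_slots() < min_free_slots:
        return None
    data = first_input
    for stage in stages:
        # Each stage consumes the output of its predecessor.
        data = stage(data)
    return data
```

For example, `run_pipeline([str.split, len], "a b c", free_slots=lambda: 4)` counts the words in the input, while the same call with `free_slots=lambda: 0` returns `None` without running anything.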
26. Grid Computing: A user’s perspective
The job submit process
Firstly, the job input data and possibly the executable program/script are
staged in. Alternatively, the data and/or executable may already be on
the grid machine.
The job is then executed on the grid machine, either using a common
user credential or the user’s own grid identity.
The results of the job are sent back to the submitter in a process called
staging out.
In some cases, intermediate output is made available to the user
through console/GUI.
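The stage-in, execute, stage-out cycle can be imitated on a single machine, as in the Python sketch below. It assumes the job is a Python script that takes an input path and an output path as arguments, uses a temporary directory as a stand-in for the grid machine's scratch space, and fixes the output name as `output.txt`. A real grid would transfer the files over the network instead of copying them locally.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_staged_job(script: Path, input_file: Path, results_dir: Path) -> Path:
    """Stage in the script and data, run the job, then stage out its output."""
    with tempfile.TemporaryDirectory() as scratch:
        work = Path(scratch)  # plays the role of the remote working directory

        # Stage in: copy the executable script and the input data.
        shutil.copy(script, work / script.name)
        shutil.copy(input_file, work / input_file.name)

        # Execute the job inside the working directory.
        subprocess.run(
            [sys.executable, script.name, input_file.name, "output.txt"],
            cwd=work,
            check=True,  # raise if the job exits with a non-zero status
        )

        # Stage out: send the result back to the submitter.
        results_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy(work / "output.txt", results_dir / "output.txt"))
```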
27. Grid Computing: A user’s perspective
Data configuration
The data accessed by grid jobs may simply be staged in and out by the
grid system.
However, in the case of pipeline jobs and other subjobs, repeated staging
in can be avoided by using a networked file system instead.
Many grid sites also offer storage resource manager services which can
be used to store input data for repeated retrieval.
A user should always respect the grid site’s file storage policies.
28. Grid Computing: A user’s perspective
Resource reservation
Many grid sites offer advance job reservations.
A user wanting to execute a job may apply for a slot in advance, in
which case the user’s jobs wait for either unreserved resources to
become available or the reservation window to begin, whichever
comes first.
Reservations fix the latest time a job may come into execution.
Users have to be careful in estimating resource requirements as
inaccurate estimations may adversely affect the job’s time spent in
queued state.
Sites offer reservations for not only compute resources but also other
resources such as scanners, sensors and storage.
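The "whichever comes first" rule for reserved jobs reduces to taking a minimum, as this small Python sketch shows. The function name and arguments are hypothetical; `next_free_slot` is `None` when no unreserved resources are expected to become available.

```python
from datetime import datetime
from typing import Optional


def expected_start(reservation_start: datetime,
                   next_free_slot: Optional[datetime]) -> datetime:
    """Return when a job with a reservation is expected to start running."""
    if next_free_slot is None:
        # No unreserved resources in sight: the reservation window decides.
        return reservation_start
    # The job starts at whichever comes first, so the reservation
    # acts as an upper bound on the start time.
    return min(next_free_slot, reservation_start)
```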
29. Grid Computing: An administrator’s perspective
Planning
The admin should understand the organization’s requirements and
accordingly deploy resources.
It is advisable to deploy a testbed grid to gain understanding of the
system and to experiment with settings before deploying them onto the
production environment.
Security
Admins must take care to prevent unauthorized access of data in a
multi-user grid environment.
The machines must be constantly monitored and updated to fix
vulnerability issues as and when they are discovered.
Public keys must be backed up and private keys must be carefully
secured.
30. Grid Computing: An administrator’s perspective
User and quota management
Admins are required to ensure that VO members possess valid
accounts/credential mapping on all grid resources.
Admins must stay updated about user credential revocations and
cancellations and revoke the access privileges of such users.
They must plan and enforce restrictions on resources such as
processors, storage etc to ensure fair usage opportunity to all users.
They must actively monitor machines to ensure that all necessary
services are up and running.
31. Grid Computing: An administrator’s perspective
Certificate Authority (CA)
It is critical to maintain highest levels of security in a grid because it
allows multiple users to not only access data but also to execute code.
The CA is responsible for positively identifying entities requesting
VO membership/credentials and verifying their bona fides.
It issues certificates to users whose bona fides have been verified.
The CA should take all measures to protect the CA server.
The CA administrator should ensure that members who have quit the VO are
promptly removed and revocation lists are regularly published.
32. Challenges of Grid Computing
Security
Access policy: What is shared? Who is allowed to share? When can sharing
occur?
Authentication: How do you identify a user or resource?
Authorization: How do you determine whether a given operation
adheres to the rules?
These questions led to development of security infrastructure for the
grid.
User requirements
Compute resources are often not general purpose. They are tuned for
performance for certain classes of applications.
Users often require installation of custom software to run their
applications. This is problematic in shared access scenarios.
A need was therefore felt for setting up ‘Virtual Organizations’ (VOs),
in which people working on similar technologies/domains could share
resources amongst each other.
33. Challenges of Grid Computing
Networking performance
Grids by definition have to allow geographic distribution.
Networking becomes a major problem when resources are spread across
a WAN, across cities or even countries.
Grid middleware needs to have high degrees of fault tolerance, to allow
for intermittent and transient network failures.
Gridifying applications
Not all applications can be transformed to run in parallel or on a grid.
There are no practical tools for transforming arbitrary applications to
exploit the parallel capabilities of a grid: applications often need to be
rewritten.
Parallelizing a non-parallel application requires mathematical and
programming expertise.
Scalability of the actual problem.
35. References
[1] Ian Foster. What is the Grid? A Three Point Checklist. url:
http://dlib.cs.odu.edu/WhatIsTheGrid.pdf.
[2] url: http://www.gridcafe.org/.
[3] B. Jacob et al. Introduction to grid computing. 2005. url:
http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf.
[4] Ken North. Milestones in Grid Computing. url:
http://www.gridsummit.com/Articles/Milestones.htm.