1. GRID COMPUTING
Sandeep Kumar Poonia
Head Of Dept. CS/IT
B.E., M.Tech., UGC-NET
LM-IAENG, LM-IACSIT, LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
2. WHY GRID COMPUTING?
40% of mainframes are idle
90% of Unix servers are idle
95% of PC servers are idle
0-15% of mainframes are idle in peak hours
70% of PC servers are idle in peak hours
Source: “Grid Computing” Dr Daron G Green
3. OUTLINE
Introduction to Grid Computing
Methods of Grid computing
Grid Middleware
Grid Architecture
4. ELECTRICAL POWER GRID ANALOGY
Electrical power grid
Users (or electrical appliances) get access to electricity through wall sockets with no care or consideration for where or how the electricity is actually generated.
“The power grid” links together power plants of many different kinds.
The Grid
Users (or client applications) gain access to computing resources (processors, storage, data, applications, and so on) as needed, with little or no knowledge of where those resources are located or what the underlying technologies, hardware, operating system, and so on are.
“The Grid” links together computing resources (PCs, workstations, servers, storage elements) and provides the mechanism needed to access them.
5. WHY DO WE NEED GRID COMPUTING?
Core networking technology now improves at a much
faster rate than microprocessor speeds
Exploiting underutilized resources
Parallel CPU capacity
Virtual resources and virtual organizations for
collaboration
Access to additional resources
6. WHO NEEDS GRID COMPUTING?
Not just computer scientists…
Scientists “hit the wall” when faced with situations where:
The amount of data they need is huge and the data is stored in
different institutions.
The number of similar calculations they have to do is
huge.
Other areas:
Government
Business
Education
Industrial design
……
7. LIVING IN AN EXPONENTIAL WORLD:
(1) COMPUTING & SENSORS
Moore's Law: transistor count doubles every 18 months
[Figure: magnetohydrodynamics simulation of star formation]
8. LIVING IN AN EXPONENTIAL WORLD:
(2) STORAGE
Storage density doubles every 12 months
Dramatic growth in online data (1 petabyte =
1,000 terabytes = 1,000,000 gigabytes)
2000 ~0.5 petabyte
2005 ~10 petabytes
2010 ~100 petabytes
2015 ~1000 petabytes?
Transforming entire disciplines in physical and,
increasingly, biological sciences; humanities
next?
9. DATA INTENSIVE PHYSICAL SCIENCES
High energy & nuclear physics
Including new experiments at CERN
Gravity wave searches
LIGO, GEO, VIRGO
Time-dependent 3-D systems (simulation, data)
Earth Observation, climate modeling
Geophysics, earthquake modeling
Fluids, aerodynamic design
Pollutant dispersal scenarios
Astronomy: Digital sky surveys
10. ONGOING ASTRONOMICAL MEGA-SURVEYS
Large number of new surveys
Multi-TB in size, 100M objects or larger
In databases
Individual archives planned and under way
Multi-wavelength view of the sky
Coverage in more than 13 wavelengths within 5 years
Impressive early discoveries
Finding exotic objects by unusual colors
L and T dwarfs, high-redshift quasars
Finding objects by time variability
Gravitational micro-lensing
Surveys: MACHO, 2MASS, SDSS, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
11. COMING FLOODS OF ASTRONOMY DATA
The planned Large Synoptic Survey Telescope
will produce over 10 petabytes per year by 2008!
An all-sky survey every few days will provide fine-grained
time series for the first time
12. DATA INTENSIVE BIOLOGY AND MEDICINE
Medical data
X-Ray, mammography data, etc. (many petabytes)
Digitizing patient records
X-ray crystallography
Molecular genomics and related disciplines
Human Genome, other genome databases
Proteomics (protein structure, activities, …)
Protein interactions, drug delivery
Virtual Population Laboratory (proposed)
Simulate likely spread of disease outbreaks
Brain scans (3-D, time dependent)
13. A BRAIN IS A LOT OF DATA! (MARK ELLISMAN, UCSD)
We need to get to one micron to know the location of every cell. We're just now
starting to get to 10 microns; Grids will help get us there and further.
And comparisons must be made among many brains.
14. Fastest virtual supercomputers
As of April 2013, Folding@home – 11.4 x86-equivalent
(5.8 "native") PFLOPS.
As of March 2013, BOINC – processing on average 9.2
PFLOPS.
As of April 2010, MilkyWay@Home computes at over
1.6 PFLOPS, with a large amount of this work coming from
GPUs.
As of April 2010, SETI@Home averages
more than 730 TFLOPS.
As of April 2010, Einstein@Home is crunching more than
210 TFLOPS.
As of June 2011, GIMPS is sustaining 61 TFLOPS.
15. HOW GRID COMPUTING WORKS
[Figure: a supercomputer or big mainframe alongside machines with idle time and idle CPUs]
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
16. HOW GRID COMPUTING WORKS
[Figure: virtual machines and virtual CPUs, still leaving idle time and idle CPUs]
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
17. HOW GRID COMPUTING WORKS
[Figure: grid computing pools the resources, leaving 0% idle]
Source: “The Evolving Computing Model: Grid Computing” Michael Teyssedre
19. WHAT IS A GRID?
Many definitions exist in the literature
Early definitions: Foster and Kesselman, 1998
“A computational grid is a hardware and software
infrastructure that provides dependable, consistent,
pervasive, and inexpensive access to high-end
computational facilities”
Kleinrock 1969:
“We will probably see the spread of ‘computer utilities’,
which, like present electric and telephone utilities, will
service individual homes and offices across the country.”
20. 3-POINT CHECKLIST (FOSTER 2002)
1. Coordinates resources not subject to
centralized control
2. Uses standard, open, general purpose protocols
and interfaces
3. Delivers nontrivial qualities of service
• e.g., response time, throughput, availability,
security
21. DEFINITION
Grid computing is…
A distributed computing system
In which a group of computers is connected
To create and work as one large virtual source of
computing power, storage, databases, applications,
and services
22. DEFINITION
Grid computing…
Allows a group of computers to share the system
securely and
Optimizes their collective resources to meet
required workloads
By using open standards
23. GRID COMPUTING
Grid computing is a form of distributed computing
whereby a "super and virtual computer" is composed of a
cluster of networked, loosely coupled computers, acting in
concert to perform very large tasks.
Grid computing (Foster and Kesselman, 1999) is a
growing technology that facilitates the execution of
large-scale, resource-intensive applications on
geographically distributed computing resources.
Facilitates flexible, secure, coordinated large-scale
resource sharing among dynamic collections of
individuals, institutions, and resources
Enables communities (“virtual organizations”) to share
geographically distributed resources as they pursue
common goals
Ian Foster and Carl Kesselman
24. A COMPARISON
SERIAL: Fetch/Store; Compute
PARALLEL: Fetch/Store; Compute/communicate; Cooperative game
GRID: Fetch/Store; Discovery of resources; Interaction with remote application; Authentication/Authorization; Security; Compute/Communicate; Etc.
25. DISTRIBUTED COMPUTING VS. GRID
Grid is an evolution of distributed computing
Dynamic
Geographically independent
Built around standards
Internet backbone
Distributed computing is an “older term”
Typically built around proprietary
software and networks
Tightly coupled systems/organizations
26. WEB VS. GRID
Web: uniform naming and access to documents
Grid: uniform, high-performance access to computational resources
[Figure: Web and Grid connecting colleges/R&D labs, software catalogs, and sensor nets]
27. IS THE WORLD WIDE WEB A GRID?
Seamless naming? Yes
Uniform security and authentication? No
Information service? Yes or No
Co-scheduling? No
Accounting & authorization? No
User services? No
Event services? No
Is the browser a global shell? No
28. WHAT DOES THE WORLD WIDE WEB BRING TO THE GRID?
Uniform naming
A seamless, scalable information service
A powerful new metadata language: XML
XML will be the standard language for
describing information in the grid
SOAP (Simple Object Access Protocol)
Uses XML for encoding and HTTP for transport
SOAP may become a standard RPC
mechanism for Grid services
Portal ideas
29. THE ULTIMATE GOAL
In the future I will not know or care
where my application is
executed, as I will acquire and pay
for these resources as I need
them
30. WHY GRIDS?
Large-scale science and engineering are done
through the interaction of people, heterogeneous
computing resources, information systems, and
instruments, all of which are geographically and
organizationally dispersed.
The overall motivation for “Grids” is to facilitate
the routine interactions of these resources in order
to support large-scale science and engineering.
31. AN EXAMPLE VIRTUAL ORGANIZATION:
CERN'S LARGE HADRON COLLIDER
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
32. GRID COMMUNITIES & APPLICATIONS:
DATA GRIDS FOR HIGH ENERGY PHYSICS
[Figure: tiered data grid for high energy physics, from the CERN online system down to physicist workstations]
Tier 0: Online System and Offline Processor Farm (~20 TIPS) at the CERN Computer Centre
Tier 1: Regional centres: FermiLab (~4 TIPS), France, Italy, Germany
Tier 2: Tier 2 centres (~1 TIPS each), e.g., Caltech
Institutes (~0.25 TIPS each), each with a physics data cache
Tier 4: Physicist workstations
Link rates shown: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec (or air freight, deprecated), and ~1 MBytes/sec.
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”.
Each institute will have ~10 physicists working on one or more
channels; data for these channels should be cached by the
institute server.
1 TIPS is approximately 25,000
SpecInt95 equivalents
www.griphyn.org www.ppdg.net www.eu-datagrid.org
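The rates on the HEP slide can be checked with a little arithmetic. This sketch only re-derives numbers the slide itself gives: 25 ns crossings, 100 triggers/s, ~1 MByte events, and 1 TIPS ≈ 25,000 SpecInt95. It assumes decimal units (1 MByte = 10^6 bytes) and is a back-of-envelope aid, not a model of the experiment's trigger system.

```python
# Back-of-envelope rates for the HEP tiered data grid.
# Inputs are the figures quoted on the slide; decimal units are assumed
# (1 MByte = 10**6 bytes, 1 Mbit = 10**6 bits).

BUNCH_CROSSING_INTERVAL_S = 25e-9   # a "bunch crossing" every 25 ns
TRIGGERS_PER_SECOND = 100           # 100 "triggers" per second
EVENT_SIZE_BYTES = 1e6              # each triggered event is ~1 MByte
SPECINT95_PER_TIPS = 25_000         # 1 TIPS ~ 25,000 SpecInt95 equivalents


def crossing_rate_hz() -> float:
    """Bunch crossings per second implied by the crossing interval."""
    return 1.0 / BUNCH_CROSSING_INTERVAL_S


def triggered_rate_bytes_per_s() -> float:
    """Data rate that survives the trigger: events/s times bytes/event."""
    return TRIGGERS_PER_SECOND * EVENT_SIZE_BYTES


def tips_to_specint95(tips: float) -> float:
    """Convert a capacity in TIPS to SpecInt95 equivalents."""
    return tips * SPECINT95_PER_TIPS


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert a link speed from Mbit/s to MByte/s (8 bits per byte)."""
    return mbits_per_s / 8


if __name__ == "__main__":
    print(f"bunch crossings per second: {crossing_rate_hz():.3g}")
    print(f"data rate after trigger:    {triggered_rate_bytes_per_s() / 1e6:.0f} MB/s")
    print(f"Tier 0 farm (~20 TIPS):     {tips_to_specint95(20):,.0f} SpecInt95")
    print(f"~622 Mbit/s link:           {mbits_to_mbytes(622):.2f} MB/s")
```

The product of 100 triggers/s and ~1 MByte per event reproduces the ~100 MBytes/sec rate in the figure. By comparison, a ~622 Mbits/sec wide-area link carries roughly 78 MBytes/sec.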
34. THE GRID: A BRIEF HISTORY
Early 90s
Gigabit testbeds, metacomputing
Mid to late 90s
Early experiments (e.g., I-WAY), academic software projects
(e.g., Globus, Legion), application experiments
2002
Dozens of application communities & projects
Major infrastructure deployments
Significant technology base (esp. Globus Toolkit™)
Growing industrial interest
Global Grid Forum: ~500 people, 20+ countries
35. HOW IT EVOLVES
Utility computing
Service grid
Data grid
Processing grid
Virtualization
Service-oriented
Open standard
36. EARLY ADOPTERS
Academic
Big science
Life science
Nuclear engineering
Simulation…
37. MARKET POTENTIAL
Financial services:
risk management and compliance
Automotive:
acceleration of product development
Petroleum:
discovery of oil
Source: “Perspectives on grid: Grid computing - next-generation distributed computing" Matt Haynos, 01/27/04
38. Criteria for a Grid:
Coordinates resources that are not subject to
centralized control.
Uses standard, open, general-purpose protocols
and interfaces.
Delivers nontrivial qualities of service.
e.g., response time, throughput, availability, security
Benefits
Exploit underutilized resources
Resource load balancing
Virtualize resources across an enterprise
Data Grids, Compute Grids
Enable collaboration for virtual organizations
39. WHY DO WE NEED GRIDS?
Many large-scale problems cannot be solved by a
single computer
Globally distributed data and resources
40. GRID APPLICATIONS
Data and computationally intensive applications:
This technology has been applied to computationally
intensive scientific, mathematical, and academic problems
like drug discovery, economic forecasting, and seismic analysis,
and to back-office data processing in support of e-commerce
A chemist may utilize hundreds of processors to screen
thousands of compounds per hour.
Teams of engineers worldwide pool resources to analyze
terabytes of structural data.
Meteorologists seek to visualize and analyze petabytes of
climate data with enormous computational demands.
Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy,
negotiation, payment, …
Coordinated problem solving
distributed data analysis, computation, collaboration, …
41. GRID TOPOLOGIES
• Intragrid
– Local grid within an organisation
– Trust based on personal contracts
• Extragrid
– Resources of a consortium of organisations
connected through a (Virtual) Private Network
– Trust based on Business to Business contracts
• Intergrid
– Global sharing of resources through the
internet
– Trust based on certification
42. COMPUTATIONAL GRID
“A computational grid is a hardware and software infrastructure
that provides dependable, consistent, pervasive, and inexpensive
access to high-end computational capabilities.”
“The Grid: Blueprint for a New Computing Infrastructure”,
Kesselman & Foster
Example: Science Grid (US Department of Energy)
43. DATA GRID
A data grid is a grid computing system that deals with
data — the controlled sharing and management of
large amounts of distributed data.
Data Grid is the storage component of a grid environment.
Scientific and engineering applications require access to
large amounts of data, and often this data is widely
distributed. A data grid provides seamless access to the
local or remote data required to complete compute
intensive calculations.
Examples:
Biomedical Informatics Research Network (BIRN),
Southern California Earthquake Center (SCEC).
45. CLUSTER COMPUTING
Idea: put some PCs together and get them to
communicate
Cheaper to build than a mainframe
supercomputer
Different sizes of clusters
Scalable – can grow a cluster by adding more PCs
47. PEER-TO-PEER COMPUTING
Connect to other computers
Can access files from any computer on the
network
Allows data sharing without going through
central server
Decentralized approach also useful for Grid
50. DISTRIBUTED SUPERCOMPUTING
Combining multiple high-capacity resources on
a computational grid into a single, virtual
distributed supercomputer.
Tackle problems that cannot be solved on a
single system.
Examples: climate modeling, computational
chemistry
Challenges include:
Scheduling scarce and expensive resources
Scalability of protocols and algorithms
Maintaining high levels of performance across
heterogeneous systems
51. HIGH-THROUGHPUT COMPUTING
Uses the grid to schedule large numbers of
loosely coupled or independent tasks, with the
goal of putting unused processor cycles (e.g., from
idle workstations) to work.
Unlike distributed supercomputing, tasks are loosely
coupled
Examples: parameter studies, cryptographic
problems
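High-throughput computing comes down to a bag of independent tasks handed to whichever worker is free. This is a minimal Python sketch built on two hypothetical choices: a local thread pool stands in for idle grid machines, and a toy primality check is the task. On a real grid, middleware such as Condor would ship each task to a separate machine. For CPU-bound work on one host, `ProcessPoolExecutor` would be the better fit because of Python's GIL.

```python
# Hypothetical sketch of high-throughput computing: a bag of independent
# tasks is handed to whichever worker is free. The thread pool stands in for
# idle grid machines; the primality check is a toy task.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable


def is_prime(n: int) -> bool:
    """Independent task: needs no communication with other tasks."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def run_bag_of_tasks(inputs: Iterable[int], workers: int = 4) -> Dict[int, bool]:
    """Submit every task up front and collect results as they finish.

    Completion order does not matter because the tasks are independent;
    results are keyed by input so they can be reassembled afterwards.
    """
    results: Dict[int, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(is_prime, n): n for n in inputs}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    results = run_bag_of_tasks(range(1, 30))
    print(sorted(n for n, prime in results.items() if prime))
```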
52. On-Demand Computing
Uses grid capabilities to meet short-term
requirements for resources that are not
locally accessible or cannot conveniently be located locally.
Models real-time computing demands.
Unlike distributed supercomputing, driven by
cost-performance concerns rather than absolute
performance
Dispatch expensive or specialized
computations to remote servers
53. COLLABORATIVE COMPUTING
Concerned primarily with enabling and
enhancing human-to-human interactions.
Enable shared use of data archives and
simulations
Applications are often structured in terms of a
virtual shared space.
Examples:
Collaborative exploration of large geophysical data sets
Challenges:
Real-time demands of interactive applications
Rich variety of interactions
54. Data-Intensive Computing
The focus is on synthesizing new information
from data that is maintained in geographically
distributed repositories, digital libraries, and
databases.
Particularly useful for distributed data mining.
Examples:
•High-energy physics experiments generate terabytes of distributed data and need complex
queries to detect “interesting” events
•Distributed analysis of Sloan Digital Sky Survey data
55. LOGISTICAL NETWORKING
Logistical networks focus on exposing storage
resources inside networks by optimizing the
global scheduling of data transport and data
storage.
Contrasts with traditional networking, which
does not explicitly model storage resources in the
network.
High-level services for Grid applications
Called "logistical" because of the analogy it bears
with the systems of warehouses, depots, and
distribution channels.
56. P2P COMPUTING VS. GRID COMPUTING
The two differ in their target communities
Grid systems deal with a more complex, more
powerful, more diverse, and more highly interconnected
set of resources than
P2P systems.
57. A TYPICAL VIEW OF A GRID ENVIRONMENT
[Figure: User, Resource Broker, Grid Resources, and Grid Information Service, linked by the Grid application, computational jobs, details of Grid resources, processed jobs, and the computation result (steps 1 to 4)]
A User sends a computation- or data-intensive application
to Global Grids in order to speed up its execution.
A Resource Broker distributes the jobs in an application to the Grid
resources for execution, based on the user's QoS requirements and the
details of available Grid resources.
Grid Resources (clusters, PCs, supercomputers, databases,
instruments, etc.) in the Global Grid execute the user jobs.
The Grid Information Service collects the details of
the available Grid resources and passes the information
to the Resource Broker.
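The broker's decision can be sketched as a filter-then-choose step over the information the GIS returns. This is a hypothetical Python sketch. The resource list, the per-second price, and the policy "cheapest resource that meets both the deadline and the budget" are assumptions for illustration. They do not describe any particular broker.

```python
# Hypothetical resource broker: choose a Grid resource for a job using details
# published by a Grid Information Service (GIS) and the user's QoS limits.
# Field names, prices and the selection policy are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    mips: float              # processing rating (million instructions/second)
    cost_per_sec: float      # assumed price per CPU-second
    available: bool = True


@dataclass
class Job:
    length_mi: float         # job length in million instructions
    deadline_s: float        # QoS: latest acceptable completion time
    budget: float            # QoS: most the user is willing to pay


def query_information_service() -> List[Resource]:
    """Stand-in for a GIS lookup of currently known Grid resources."""
    return [
        Resource("cluster-a", mips=2000, cost_per_sec=0.05),
        Resource("pc-b", mips=500, cost_per_sec=0.01),
        Resource("supercomputer-c", mips=8000, cost_per_sec=0.40),
        Resource("cluster-d", mips=3000, cost_per_sec=0.06, available=False),
    ]


def select_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Cheapest available resource that meets both deadline and budget."""
    best = None
    best_key = None
    for r in resources:
        if not r.available:
            continue
        runtime = job.length_mi / r.mips
        cost = runtime * r.cost_per_sec
        if runtime > job.deadline_s or cost > job.budget:
            continue
        key = (cost, runtime)            # tie-break on faster completion
        if best_key is None or key < best_key:
            best, best_key = r, key
    return best


if __name__ == "__main__":
    job = Job(length_mi=100_000, deadline_s=100, budget=10)
    chosen = select_resource(job, query_information_service())
    print(chosen.name if chosen else "no resource satisfies the QoS")
```

Tightening the deadline pushes the choice toward the faster but pricier machine. If no resource can meet the QoS, the broker returns nothing rather than silently breaking the user's limits.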
58. GRID MIDDLEWARE
Grids are typically managed by gridware,
a special type of middleware that enables sharing and
manages grid components based on user requirements
and resource attributes (e.g., capacity, performance)
Software that connects other software components or
applications to provide the following functions:
Run applications on suitable available resources
– Brokering, Scheduling
Provide uniform, high-level access to resources
– Semantic interfaces
– Web Services, Service Oriented Architectures
Address inter-domain issues of security, policy, etc.
– Federated Identities
Provide application-level status
monitoring and control
59. MIDDLEWARES
Globus – University of Chicago
Condor – University of Wisconsin – high-throughput
computing
Legion – University of Virginia – virtual workspaces,
collaborative computing
IBP (Internet Backplane Protocol) – University of Tennessee –
logistical networking
NetSolve – solving scientific problems in
heterogeneous environments – high-throughput & data-intensive computing
60. TWO KEY GRID COMPUTING GROUPS
The Globus Alliance (www.globus.org)
Composed of people from:
Argonne National Labs, University of Chicago, University of
Southern California Information Sciences Institute,
University of Edinburgh and others.
OGSA/OGSI standards initially proposed by the Globus Group
The Global Grid Forum (www.ggf.org)
Heavy involvement of Academic Groups and Industry
(e.g. IBM Grid Computing, HP, United Devices, Oracle,
UK e-Science Programme, US DOE, US NSF, Indiana
University, and many others)
Process
Meets three times annually
Solicits involvement from industry, research groups, and
academics
61. GRID USERS
Many levels of users
Grid developers
Tool developers
Application developers
End users
System administrators
62. SOME GRID CHALLENGES
Data movement
Data replication
Resource management
Job submission
63. SOME OF THE MAJOR GRID PROJECTS
Name – URL/Sponsor – Focus
EuroGrid, Grid Interoperability (GRIP) – eurogrid.org, European Union – Create technology for remote access to supercomputing resources & simulation codes; in GRIP, integrate with Globus Toolkit™
Fusion Collaboratory – fusiongrid.org, DOE Office of Science – Create a national computational collaboratory for fusion research
Globus Project™ – globus.org, DARPA, DOE, NSF, NASA, Microsoft – Research on Grid technologies; development and support of Globus Toolkit™; application and deployment
GridLab – gridlab.org, European Union – Grid technologies and applications
GridPP – gridpp.ac.uk, U.K. eScience – Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Dev. & Support Center – grids-center.org, NSF – Integration, deployment, support of the NSF Middleware Infrastructure for research & education
64. GRID IN INDIA: GARUDA
•GARUDA is India's Grid Computing
initiative connecting 17 cities across the
country.
•The 45 participating institutes in this
nationwide project include all the IITs and
C-DAC centers and other major institutes
in India.
65. GLOBUS GRID TOOLKIT
Open source toolkit for building Grid systems and
applications
Enabling technology for the Grid
Share computing power, databases, and other tools securely
online
Facilities for:
Resource monitoring
Resource discovery
Resource management
Security
File management
66. DATA MANAGEMENT IN GLOBUS TOOLKIT
Data movement
GridFTP
Reliable File Transfer (RFT)
Data replication
Replica Location Service (RLS)
Data Replication Service (DRS)
67. GRIDFTP
High performance, secure, reliable data transfer protocol
Optimized for wide area networks
Superset of Internet FTP protocol
Features:
Multiple data channels for parallel transfers
Partial file transfers
Third party transfers
Reusable data channels
Command pipelining
68. MORE GRIDFTP FEATURES
Auto tuning of parameters
Striping
Transfer data in parallel among multiple senders and
receivers instead of just one
Extended block mode
Send data in blocks
Know block size and offset
Data can arrive out of order
Allows multiple streams
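Extended block mode works because each block describes itself, so the receiver can place it without relying on arrival order. The sketch below models blocks as plain (offset, payload) pairs striped round-robin over several streams. It is a simplified illustration of the idea, not the GridFTP wire format, and the function names are hypothetical.

```python
# Simplified model of the idea behind GridFTP extended block mode: each block
# carries its own offset, so blocks striped over several parallel streams can
# arrive in any order and still land in the right place. Plain tuples stand
# in for block headers; this is NOT the on-the-wire format.
import random
from typing import Iterable, List, Optional, Tuple

Block = Tuple[int, bytes]        # (byte offset, payload)


def split_into_blocks(data: bytes, block_size: int, start: int = 0,
                      end: Optional[int] = None) -> List[Block]:
    """Cut data[start:end] into (offset, payload) blocks.

    Choosing start/end models a partial file transfer of a byte range.
    """
    end = len(data) if end is None else end
    return [(off, data[off:min(off + block_size, end)])
            for off in range(start, end, block_size)]


def stripe(blocks: List[Block], streams: int) -> List[List[Block]]:
    """Deal blocks round-robin onto `streams` parallel data channels."""
    return [blocks[i::streams] for i in range(streams)]


def reassemble(blocks: Iterable[Block], size: int) -> bytes:
    """Write every block at its offset; arrival order is irrelevant."""
    buf = bytearray(size)
    for offset, payload in blocks:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    original = bytes(range(256)) * 40                  # 10,240-byte "file"
    channels = stripe(split_into_blocks(original, 1000), streams=4)
    arrived = [blk for channel in channels for blk in channel]
    random.shuffle(arrived)                            # out-of-order arrival
    print(reassemble(arrived, len(original)) == original)
```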
70. LIMITATIONS OF GRIDFTP
Not a web service protocol (does not employ
SOAP, WSDL, etc.)
Requires client to maintain open socket
connection throughout transfer
Inconvenient for long transfers
Cannot recover from client failures
72. RELIABLE FILE TRANSFER (RFT)
Web service with “job-scheduler” functionality for data
movement
User provides source and destination URLs
Service writes job description to a database and moves
files
Service methods for querying transfer status
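The key idea in RFT is to write the request to a database before moving any data, so the transfer's state outlives the client. This is a hypothetical Python sketch. An in-memory `sqlite3` table serves as the job store, and a local file copy stands in for a GridFTP transfer. The table layout, the state names, and the use of local paths instead of URLs are all assumptions.

```python
# Hypothetical sketch of the RFT pattern: persist each transfer request in a
# database first, then move files and record the outcome, so status can be
# queried and unfinished work retried independently of the client.
# Table layout, state names and local-path "URLs" are assumptions.
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " dest TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'PENDING')"
    )
    return db


def submit(db: sqlite3.Connection, source: str, dest: str) -> int:
    """Record the job description before any data moves; return its id."""
    cur = db.execute("INSERT INTO transfers (source, dest) VALUES (?, ?)",
                     (source, dest))
    db.commit()
    return cur.lastrowid


def run_pending(db: sqlite3.Connection) -> None:
    """Attempt every PENDING or FAILED transfer and store the new state."""
    rows = db.execute("SELECT id, source, dest FROM transfers "
                      "WHERE state IN ('PENDING', 'FAILED')").fetchall()
    for job_id, src, dst in rows:
        try:
            shutil.copyfile(src, dst)        # stand-in for a GridFTP transfer
            state = "DONE"
        except OSError:
            state = "FAILED"
        db.execute("UPDATE transfers SET state = ? WHERE id = ?",
                   (state, job_id))
        db.commit()


def status(db: sqlite3.Connection, job_id: int) -> str:
    """Service method for querying a transfer's state."""
    return db.execute("SELECT state FROM transfers WHERE id = ?",
                      (job_id,)).fetchone()[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.dat"
        src.write_bytes(b"grid data")
        db = open_db()
        job = submit(db, str(src), str(Path(tmp) / "copy.dat"))
        run_pending(db)
        print(job, status(db, job))
```

Failed rows stay in the table, so calling `run_pending` again retries them. That is the property that lets a service of this kind recover where a bare client connection cannot.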
74. REPLICA LOCATION SERVICE (RLS)
Registry to keep track of where replicas exist on physical
storage system
Users or services register files in RLS when files are created
Distributed registry
May consist of multiple servers at different sites
Increase scale
Fault tolerance
75. REPLICA LOCATION SERVICE (RLS)
Logical file name – unique identifier for contents of
file
Physical file name – location of copy of file on
storage system
User can provide logical name and ask for replicas
Or query to find logical name associated with
physical file location
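The mapping described above is a many-to-one relation. One logical file name can have many physical copies, and each physical copy belongs to exactly one logical file. This hypothetical single-process Python sketch models that registry with lookups in both directions. The example `gsiftp://` URLs are made up. The real RLS distributes this information across local catalogs and index servers, which the sketch does not model.

```python
# Hypothetical single-process model of a replica catalog: logical file names
# (LFNs) map to the physical file names (PFNs) of their copies, and a reverse
# index answers "which logical file is stored at this location?".
from collections import defaultdict
from typing import Dict, Optional, Set


class ReplicaCatalog:
    def __init__(self) -> None:
        self._pfns_by_lfn: Dict[str, Set[str]] = defaultdict(set)
        self._lfn_by_pfn: Dict[str, str] = {}

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a copy of logical file `lfn` is stored at `pfn`."""
        owner = self._lfn_by_pfn.get(pfn)
        if owner is not None and owner != lfn:
            raise ValueError(f"{pfn} is already registered for {owner}")
        self._pfns_by_lfn[lfn].add(pfn)
        self._lfn_by_pfn[pfn] = lfn

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one copy, e.g. after it is deleted from storage."""
        if self._lfn_by_pfn.get(pfn) == lfn:
            del self._lfn_by_pfn[pfn]
            self._pfns_by_lfn[lfn].discard(pfn)

    def replicas(self, lfn: str) -> Set[str]:
        """Logical name in, all known physical locations out."""
        return set(self._pfns_by_lfn.get(lfn, set()))

    def logical_name(self, pfn: str) -> Optional[str]:
        """Physical location in, its logical name (or None) out."""
        return self._lfn_by_pfn.get(pfn)


if __name__ == "__main__":
    rc = ReplicaCatalog()
    rc.register("run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
    rc.register("run42.dat", "gsiftp://site-b.example.org/cache/run42.dat")
    print(sorted(rc.replicas("run42.dat")))
    print(rc.logical_name("gsiftp://site-b.example.org/cache/run42.dat"))
```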
76. DATA REPLICATION SERVICE (DRS)
Pull-based replication capability
Implemented as a web service
Higher-level data management service built on top of RFT
and RLS
Goal: ensure that a specified set of files exists on a storage
site
First, query RLS to locate desired files
Next, create a transfer request using RFT
Finally, register the new replicas with RLS
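The three DRS steps can be strung together as one function. This is a hypothetical Python sketch, not the Globus API. A plain dictionary stands in for RLS, and `transfer` is a caller-supplied placeholder for an RFT request. The PFN layout (`scheme://host/path`) and the "first replica alphabetically" selection rule are also assumptions.

```python
# Hypothetical sketch of the DRS pull-based workflow: ensure a set of logical
# files exists at a target site by (1) looking up existing replicas,
# (2) requesting transfers for missing ones and (3) registering new copies.
# The dict catalog stands in for RLS and `transfer` for an RFT request.
from typing import Callable, Dict, List, Set

Catalog = Dict[str, Set[str]]          # logical name -> physical locations


def site_of(pfn: str) -> str:
    """Host part of a PFN, assuming the form 'scheme://host/path'."""
    return pfn.split("/")[2]


def replicate(lfns: List[str], target_site: str, catalog: Catalog,
              transfer: Callable[[str, str], None]) -> List[str]:
    """Return the PFNs of the replicas created at `target_site`."""
    created = []
    for lfn in lfns:
        pfns = catalog.get(lfn, set())                 # step 1: query catalog
        if any(site_of(p) == target_site for p in pfns):
            continue                                   # already at target
        if not pfns:
            raise LookupError(f"no registered replica of {lfn}")
        source = sorted(pfns)[0]                       # naive replica choice
        dest = f"gsiftp://{target_site}/{lfn}"
        transfer(source, dest)                         # step 2: transfer
        catalog.setdefault(lfn, set()).add(dest)       # step 3: register
        created.append(dest)
    return created


if __name__ == "__main__":
    catalog = {"run1.dat": {"gsiftp://a.example.org/run1.dat"}}
    new = replicate(["run1.dat"], "b.example.org", catalog,
                    lambda src, dst: print("transfer", src, "->", dst))
    print(new, sorted(catalog["run1.dat"]))
```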
77. CONDOR
Original goal: high-throughput computing
Harvest wasted CPU power from other machines
Can also be used on a dedicated cluster
Condor-G – Condor interface to Globus resources
78. CONDOR
Provides many features of batch systems:
job queueing
scheduling policy
priority scheme
resource monitoring
resource management
Users submit their serial or parallel jobs
Condor places them into a queue
Scheduling and monitoring
Informs the user upon completion
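The queue-then-dispatch cycle above can be sketched with a priority heap. This is a hypothetical Python illustration. The priority numbers, the machine names, and the one-job-per-idle-machine rule are assumptions. Real Condor matches jobs to machines with ClassAd matchmaking, which this sketch does not model.

```python
# Hypothetical sketch of batch-system queueing like Condor's: jobs wait in a
# priority queue and are placed on idle machines. This does not model
# Condor's ClassAd matchmaking or its actual priority rules.
import heapq
import itertools
from typing import List, Tuple


class JobQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()     # tie-breaker: FIFO within priority

    def submit(self, job: str, priority: int = 0) -> None:
        """Queue a job; a higher priority number is scheduled earlier."""
        heapq.heappush(self._heap, (-priority, next(self._order), job))

    def schedule(self, idle_machines: List[str]) -> List[Tuple[str, str]]:
        """Place queued jobs on idle machines, at most one job per machine."""
        placements = []
        for machine in idle_machines:
            if not self._heap:
                break
            _, _, job = heapq.heappop(self._heap)
            placements.append((job, machine))
        return placements

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = JobQueue()
    q.submit("sim-a")
    q.submit("render", priority=5)
    q.submit("sim-b")
    for job, machine in q.schedule(["ws1", "ws2"]):
        print(f"job {job} placed on {machine}")
    print(len(q), "job(s) still queued")
```

The sequence counter keeps jobs of equal priority in submission order, which is the usual behavior users expect from a queue.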
79. NIMROD-G
Tool to manage execution of parametric studies across
distributed computers
Manages experiment
Distributing files to remote systems
Performing the remote computation
Gathering results
User submits declarative plan file
Parameters, default values, and commands necessary for
performing the work
Nimrod-G takes advantage of Globus toolkit features
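Turning a plan into jobs is essentially a cross product over the parameter values. The sketch below is a hypothetical Python illustration. The dictionary "plan", the `$name` command template, and the `./model` program are invented for the example and are not Nimrod-G's plan-file syntax.

```python
# Hypothetical sketch of parametric-study expansion: swept parameters,
# default values and a command template become one independent job per
# parameter combination. The dict "plan" is illustrative only and is not
# Nimrod-G's plan-file syntax.
import itertools
from string import Template
from typing import Dict, List


def expand_plan(parameters: Dict[str, List[object]],
                defaults: Dict[str, object], command: str) -> List[str]:
    """Cross product of all swept values; unswept names use defaults."""
    names = sorted(parameters)                       # deterministic order
    jobs = []
    for values in itertools.product(*(parameters[n] for n in names)):
        bindings = dict(defaults)
        bindings.update(zip(names, values))          # swept values win
        jobs.append(Template(command).substitute(bindings))
    return jobs


if __name__ == "__main__":
    plan_params = {"angle": [0, 45], "speed": [10, 20, 30]}
    plan_defaults = {"mesh": "coarse"}
    template = "./model --angle $angle --speed $speed --mesh $mesh"
    for cmd in expand_plan(plan_params, plan_defaults, template):
        print(cmd)
```

Each generated command line is independent, so the resulting list is exactly the kind of bag of tasks a high-throughput scheduler can spread across remote systems.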
82. EARTH SYSTEM GRID
Provide climate studies scientists with access to
large datasets
Data generated by computational models –
requires massive computational power
Most scientists work with subsets of the data
Requires access to local copies of data
83. ESG INFRASTRUCTURE
Archival storage systems and disk storage systems at
several sites
Storage resource managers and GridFTP servers to
provide access to storage systems
Metadata catalog services
Replica location services
Web portal user interface
86. LASER INTERFEROMETER GRAVITATIONAL-WAVE OBSERVATORY (LIGO)
Instruments at two sites to detect gravitational waves
Each experiment run produces millions of files
Scientists at other sites want these datasets on local storage
LIGO deploys RLS servers at each site to register local
mappings and collect info about mappings at other sites
87. LARGE-SCALE DATA REPLICATION FOR LIGO
Goal: detection of gravitational waves
Three interferometers at two sites
Generate 1 TB of data daily
Need to replicate this data across 9 sites to make
it available to scientists
Scientists need to learn where data items are,
and how to access them
89. LIGO SOLUTION
Lightweight Data Replicator (LDR)
Uses parallel data streams, tunable TCP windows, and
tunable write/read buffers
Tracks where copies of specific files can be found
Stores descriptive information (metadata) in a
database
Can select files based on description rather than filename
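Two quick calculations show why LDR bothers with parallel streams and tunable TCP windows. The 1 TB/day figure comes from the LIGO slide. The 1 Gbit/s link, 50 ms round-trip time, and 64 KiB window in the example are assumed values, chosen only for illustration. The bandwidth-delay product (link rate × RTT) is the amount of data that must be in flight to keep a link full.

```python
# Back-of-envelope numbers for bulk data replication. The 1 TB/day volume is
# from the LIGO slide; link speed, RTT and window size in the example are
# assumed values chosen only for illustration. Decimal units throughout.
import math

SECONDS_PER_DAY = 86_400


def sustained_rate(bytes_per_day: float) -> float:
    """Average bytes/s needed to keep up with a daily data volume."""
    return bytes_per_day / SECONDS_PER_DAY


def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill a link.

    A TCP stream whose window is smaller than this cannot use the whole
    link, which is why tools tune TCP windows or open parallel streams.
    """
    return link_bits_per_s * rtt_s / 8


def streams_needed(link_bits_per_s: float, rtt_s: float,
                   window_bytes: float) -> int:
    """Parallel streams of a fixed window size needed to cover the BDP."""
    return math.ceil(bdp_bytes(link_bits_per_s, rtt_s) / window_bytes)


if __name__ == "__main__":
    print(f"1 TB/day needs {sustained_rate(1e12) / 1e6:.1f} MB/s")
    print(f"BDP at 1 Gbit/s, 50 ms RTT: {bdp_bytes(1e9, 0.05) / 1e6:.2f} MB")
    print(f"64 KiB-window streams needed: {streams_needed(1e9, 0.05, 65536)}")
```

With the assumed 1 Gbit/s link and 50 ms RTT, the bandwidth-delay product is 6.25 MB. A stream limited to a 64 KiB window would therefore need about 96 parallel streams, or a much larger window, to fill the link. By contrast, 1 TB/day alone needs only about 11.6 MB/s sustained per destination.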
90. TERAGRID
NSF high-performance computing facility
Nine distributed sites, each with different
capabilities, e.g., computation power, archiving
facilities, visualization software
Applications may require more than one site
Data sizes on the order of gigabytes or terabytes
92. TERAGRID
Solution: Use GridFTP and RFT with front end
command line tool (tgcp)
Benefits of system:
Simple user interface
High performance data transfer capability
Ability to recover from both client and server software
failures
Extensible configuration
93. TGCP DETAILS
Idea: hide low level GridFTP commands from users
Copy file smallfile.dat in a working directory to another
system:
tgcp smallfile.dat tg-login.sdsc.teragrid.org:/users/ux454332
GridFTP command:
globus-url-copy -p 8 -tcp-bs 1198372 \
  gsiftp://tg-gridftprr.uc.teragrid.org:2811/home/navarro/smallfile.dat \
  gsiftp://tg-login.sdsc.teragrid.org:2811/users/ux454332/smallfile.dat
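Most of what tgcp hides is string assembly. This hypothetical Python sketch rebuilds the slide's `globus-url-copy` command from the short tgcp request. The local GridFTP server name and the working directory are passed in as inputs. The `-p 8` and `-tcp-bs 1198372` tuning values default to the slide's example, because the slide does not show how the real tgcp chose them. The sketch only builds the argument list. Actually running the command would need `globus-url-copy` installed and valid grid credentials.

```python
# Hypothetical sketch of a tgcp-style front end: turn "file host:dir" into a
# full globus-url-copy command line. The local GridFTP server, working
# directory and tuning values are inputs (defaults copied from the slide's
# example); how the real tgcp picks them is not described here.
# Assumes `cwd` and the remote directory are absolute paths.
import posixpath
import shlex
from typing import List

GRIDFTP_PORT = 2811          # port used in the slide's gsiftp:// URLs


def build_globus_url_copy(local_file: str, remote: str, local_server: str,
                          cwd: str, parallel: int = 8,
                          tcp_buffer: int = 1198372) -> List[str]:
    """Return the argv list; nothing is executed."""
    host, _, remote_dir = remote.partition(":")
    name = posixpath.basename(local_file)
    src_path = posixpath.join(cwd, local_file)
    dst_path = posixpath.join(remote_dir, name)
    return [
        "globus-url-copy",
        "-p", str(parallel),              # number of parallel data streams
        "-tcp-bs", str(tcp_buffer),       # TCP buffer size in bytes
        f"gsiftp://{local_server}:{GRIDFTP_PORT}{src_path}",
        f"gsiftp://{host}:{GRIDFTP_PORT}{dst_path}",
    ]


if __name__ == "__main__":
    argv = build_globus_url_copy(
        "smallfile.dat", "tg-login.sdsc.teragrid.org:/users/ux454332",
        local_server="tg-gridftprr.uc.teragrid.org", cwd="/home/navarro")
    print(shlex.join(argv))
```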
95. THE HOURGLASS MODEL
Focus on architecture issues
Propose set of core services as
basic infrastructure
Used to construct high-level,
domain-specific solutions
(diverse)
Design principles
Keep participation cost low
Enable local control
Support for adaptation
[Figure: “IP hourglass” model: applications and diverse global services at the top, core services at the narrow waist, local OS at the base]
96. LAYERED GRID ARCHITECTURE
(BY ANALOGY TO INTERNET ARCHITECTURE)
Application
Collective (“Coordinating multiple resources”): ubiquitous infrastructure services, app-specific distributed services
Resource (“Sharing single resources”): negotiating access, controlling use
Connectivity (“Talking to things”): communication (Internet protocols) & security
Fabric (“Controlling things locally”): access to, & control of, resources
[Figure: the grid layers shown alongside the Internet protocol architecture (Application, Transport, Internet, Link)]
97. EXAMPLE: DATA GRID ARCHITECTURE
App: Discipline-Specific Data Grid Application
Collective (App): Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs
Resource: Access to data, access to computers, access to network performance data, …
Connect: Communication, service discovery (DNS), authentication, authorization, delegation
Fabric: Storage systems, clusters, networks, network caches, …
99. SIMULATION TOOL
GridSim is a Java-based toolkit for the modeling
and simulation of distributed resource
management and scheduling in conventional
Grid environments.
GridSim is based on SimJava, a general
purpose discrete-event simulation package
implemented in Java.
All components in GridSim communicate with
each other through message passing operations
defined by SimJava.
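A toy discrete-event kernel can illustrate the message-passing style described above. This hypothetical Python sketch is not the SimJava or GridSim API. Entities interact only by scheduling timestamped messages, and the kernel delivers those messages in time order. A single-CPU resource runs jobs first come, first served, taking length (MI) / rating (MIPS) time units per job.

```python
# Toy discrete-event simulation in the message-passing style described above.
# Entities never call each other directly: they schedule timestamped messages
# and the kernel delivers them in time order. Hypothetical; not SimJava/GridSim.
import heapq
import itertools
from typing import Any, Dict, List, Tuple


class Simulation:
    def __init__(self) -> None:
        self.clock = 0.0
        self._events: List[Tuple[float, int, str, str, Any]] = []
        self._seq = itertools.count()          # keeps same-time events FIFO
        self._entities: Dict[str, "Entity"] = {}

    def add(self, entity: "Entity") -> None:
        entity.sim = self
        self._entities[entity.name] = entity

    def send(self, src: str, dst: str, delay: float, data: Any) -> None:
        """Schedule delivery of `data` from `src` to `dst` after `delay`."""
        heapq.heappush(self._events,
                       (self.clock + delay, next(self._seq), src, dst, data))

    def run(self) -> None:
        """Pop events in time order, advancing the clock to each one."""
        while self._events:
            time, _, src, dst, data = heapq.heappop(self._events)
            self.clock = time
            self._entities[dst].receive(src, data)


class Entity:
    def __init__(self, name: str) -> None:
        self.name = name
        self.sim = None                        # set by Simulation.add

    def receive(self, src: str, data: Any) -> None:
        raise NotImplementedError


class Resource(Entity):
    """Single CPU: runs jobs first come, first served at `mips` MIPS."""

    def __init__(self, name: str, mips: float) -> None:
        super().__init__(name)
        self.mips = mips
        self.busy_until = 0.0

    def receive(self, src: str, data: Any) -> None:
        job_id, length_mi = data
        start = max(self.sim.clock, self.busy_until)
        self.busy_until = start + length_mi / self.mips
        # Reply to the submitter when the job finishes.
        self.sim.send(self.name, src, self.busy_until - self.sim.clock, job_id)


class User(Entity):
    """Submits jobs to one resource and records their completion times."""

    def __init__(self, name: str, jobs: List[Tuple[str, float]],
                 resource: str) -> None:
        super().__init__(name)
        self.jobs = jobs
        self.resource = resource
        self.finished: Dict[str, float] = {}

    def submit_all(self) -> None:
        for job_id, length_mi in self.jobs:
            self.sim.send(self.name, self.resource, 0.0, (job_id, length_mi))

    def receive(self, src: str, data: Any) -> None:
        self.finished[data] = self.sim.clock


if __name__ == "__main__":
    sim = Simulation()
    user = User("user", [("j1", 1000), ("j2", 500)], "res")
    sim.add(Resource("res", mips=100))
    sim.add(user)
    user.submit_all()
    sim.run()
    print(user.finished)
```

On a 100 MIPS resource, jobs of 1,000 and 500 MI submitted together finish at t = 10 and t = 15. The clock jumps straight from one event to the next instead of ticking through idle time.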
100. SALIENT FEATURES OF THE GRIDSIM
It allows modeling of heterogeneous types of
resources.
Resources can be modeled operating under space-
or time-shared mode.
Resource capability can be defined in the form of a
MIPS (Million Instructions Per Second)
benchmark rating.
Resources can be located in any time zone.
Weekends and holidays can be mapped
depending on the resource's local time to model
non-Grid (local) workload.
Resources can be booked for advance reservation.
Applications with different parallel application
models can be simulated.
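A MIPS rating turns a job length in million instructions (MI) into a run time: run time = length / MIPS. The two sharing modes then differ in their completion times. This is a hypothetical Python sketch under three assumptions: a single processing element, all jobs arriving at time 0, and time-shared mode modeled as an ideal equal split of the CPU. GridSim's own scheduling details may differ.

```python
# Completion times on one processing element under the two sharing modes.
# Hypothetical sketch: all jobs arrive at t = 0 and time-sharing is modeled
# as an ideal equal split of the CPU; GridSim's own scheduler may differ.
from typing import Dict, List, Tuple

Jobs = List[Tuple[str, float]]      # (job name, length in MI)


def runtime_s(length_mi: float, mips: float) -> float:
    """Run time on a dedicated CPU: length (MI) / rating (MIPS)."""
    return length_mi / mips


def space_shared(jobs: Jobs, mips: float) -> Dict[str, float]:
    """First come, first served: each job gets the whole CPU in turn."""
    clock, finish = 0.0, {}
    for name, length in jobs:
        clock += runtime_s(length, mips)
        finish[name] = clock
    return finish


def time_shared(jobs: Jobs, mips: float) -> Dict[str, float]:
    """All unfinished jobs share the CPU equally (mips / active each)."""
    clock, done, finish = 0.0, 0.0, {}
    by_length = sorted(jobs, key=lambda job: job[1])   # shortest ends first
    for i, (name, length) in enumerate(by_length):
        active = len(by_length) - i                    # jobs still running
        clock += (length - done) * active / mips       # time to finish this one
        done = length                                  # MI completed per job
        finish[name] = clock
    return finish


if __name__ == "__main__":
    jobs = [("j1", 1000), ("j2", 500)]
    print("space-shared:", space_shared(jobs, mips=100))
    print("time-shared: ", time_shared(jobs, mips=100))
```

Take jobs of 1,000 MI and 500 MI on a 100 MIPS resource. Space-shared mode finishes them at 10 s and 15 s, in submission order. Time-shared mode finishes the shorter job first, at 10 s, and the longer one at 15 s.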
101. SALIENT FEATURES OF THE GRIDSIM
Application tasks can be heterogeneous and they can
be CPU or I/O intensive.
There is no limit on the number of application jobs
that can be submitted to a resource.
Multiple user entities can submit tasks for execution
simultaneously in the same resource, which may be
time-shared or space-shared. This feature helps in
building schedulers that can use different market-
driven economic models for selecting services
competitively.
Network speed between resources can be specified.
It supports simulation of both static and dynamic
schedulers.
Statistics of all or selected operations can be recorded
and they can be analyzed using GridSim statistics
analysis methods.
102. A MODULAR ARCHITECTURE FOR GRIDSIM PLATFORM AND COMPONENTS
[Figure: GridSim layers, top to bottom:
Application, User, Grid Scenario's input and Results (application configuration, resource configuration, user requirements, grid scenario, output)
Grid Resource Brokers or Schedulers …
GridSim Toolkit (application modeling, resource entities, information services, job management, resource allocation, statistics)
Resource Modeling and Simulation (single CPU, SMPs, clusters, load, network, reservation)
Basic Discrete Event Simulation Infrastructure (SimJava, Distributed SimJava)
Virtual Machine; Distributed Resources (PCs, workstations, clusters, SMPs)]