Grid07 7 Gagliardi
6/28/2007 11:46 PM
Opportunities and Challenges in e_Science
Fabrizio Gagliardi & Christophe Van Mollekot
Microsoft Corporation

Outline
• Introductory remarks
• Reviewing emergence of e_Science
  • the intensive computing side
  • the massive data side
• The opportunity of e_Science
• The challenges of e_Science
• A Microsoft contribution
• Conclusions
Introductory remarks
• Who am I?
• A computer scientist who has spent 30 years at CERN (and in other scientific laboratories) developing HPC systems for physics and other sciences
• Started in real-time, data acquisition and networking
• Pioneered ES, AI, MPP systems, cluster computing and, in the last 7 years, Grid computing
• Initiator of EU-DataGrid, EGEE and more than 10 other HPC and Grid projects (mostly within the EU IST programmes)
• Co-founder of the Global Grid Forum (started in Amsterdam in 2001 together with EU-DataGrid)
• See my last article on IEEE Spectrum Magazine (July 2006)

Introductory remarks 2
• Joined Microsoft on 1 November 2005
• My mission: Promoting Microsoft Computing into Science and Science into Microsoft Computing
  • by exploring and building important collaborations with science in Europe, Middle East, Africa and Latin America
• Director in the Technical Computing team led by Tony Hey (Corporate VP)
A New Science Paradigm
Thousand years ago:
Experimental Science
- description of natural phenomena
Last few hundred years:
Theoretical Science
- Newton's Laws, Maxwell's Equations …
$$\left(\frac{\dot{a}}{a}\right)^{2} = \frac{4\pi G \rho}{3} - K\,\frac{c^{2}}{a^{2}}$$
Last few decades:
Computational Science
- simulation of complex phenomena
Today:
e-Science or Data-centric Science
- unify theory, experiment, and simulation
- using massive computing and large data exploration and mining:
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
Scientists mostly work on computers
(With thanks to Jim Gray)

Accelerating Discovery
[Diagram: Multidisciplinary Research linking Life Sciences, Social Sciences, Earth Sciences, Computer & Information Sciences, Math and Physical Science, and New Materials, Technologies & Processes]
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S.
and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft
Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be
interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided
after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE
INFORMATION IN THIS PRESENTATION.
CERN LHC
40 million particle collisions every second, reduced by online computers to a few hundred “good” events per sec.
Which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec
~15 PetaBytes per year for all four experiments

Technology evolution has helped…
                1991                   1998                        2005
System          Cray Y-MP C916         Sun HPC10000                Small Form Factor PCs
Architecture    16 x Vector,           24 x 333MHz Ultra-SPARCII,  4 x 2.2GHz Athlon64,
                4GB, Bus               24GB, SBus                  4GB, GigE
OS              UNICOS                 Solaris 2.5.1               Windows Server 2003 SP1
GFlops          ~10                    ~10                         ~10
Top500 #        1                      500                         N/A
Price           $40,000,000            $1,000,000 (40x drop)       < $4,000 (250x drop)
Customers       Government Labs        Large Enterprises           Every Engineer & Scientist
Applications    Classified, Climate,   Manufacturing, Energy,      Bioinformatics, Materials
                Physics Research       Finance, Telecom            Sciences, Digital Media
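The figures on the two slides above can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope illustration: it assumes decimal units (1 PB = 10^9 MB) and, unrealistically, continuous recording over a full calendar year; the function names are hypothetical and not part of the original talk.

```python
# Back-of-envelope check of the CERN LHC data rate and the price drops
# quoted in the technology-evolution table.

# Assumption: data is recorded continuously for a whole calendar year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained rate in MB/s for a given yearly volume (decimal units)."""
    megabytes = petabytes_per_year * 1e9  # 1 PB = 1e9 MB
    return megabytes / SECONDS_PER_YEAR


def price_drop(old_price: float, new_price: float) -> float:
    """Factor by which the price fell between two generations."""
    return old_price / new_price


rate = average_rate_mb_per_s(15)          # ~15 PB per year from the slide
drop_1991_1998 = price_drop(40_000_000, 1_000_000)
drop_1998_2005 = price_drop(1_000_000, 4_000)

print(f"15 PB/year is about {rate:.0f} MB/s on average")
print(f"1991 -> 1998 price drop: {drop_1991_1998:.0f}x")
print(f"1998 -> 2005 price drop: {drop_1998_2005:.0f}x")
```

Under these assumptions the yearly average falls inside the quoted 100-1,000 MegaBytes/sec band, and the two price ratios match the 40x and 250x drops given in the table.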
Top 500 Architectures / Systems
[Chart: number of Top 500 systems (0 to 500) per year, 1993 to 2006, by architecture: SIMD, Single Proc., SMP, Const., Cluster, MPP]

High Energy Physics (LCG)
Enabling Grids for E-sciencE
LCG depends on two major science Grid infrastructures (plus regional Grids)
EGEE - Enabling Grids for E-Science
OSG - US Open Science Grid
Scale (June 2006):
~ 200 sites in 40 countries
~ 25 000 CPUs
> 10 PB storage
> 35 000 jobs per day
> 100 Virtual Organizations
INFSO-RI-508833
Grids in Biomedical Sciences
Enabling Grids for E-sciencE
• A multiplication of projects around the world
– Example: the National Bioinformatics Initiative in Holland
• The example of EGEE
– More than 20 applications in medical imaging, bioinformatics and drug discovery
– Large scale deployment of in silico drug discovery initiatives
In Silico Docking On Malaria on 5 grid infrastructures is breaking the world record for in silico docking throughput
Impact of mutations on drug efficiency against H5N1
[Chart: T01 (E119A) energy statistics; number of compounds (0 to 90000) versus binding energy and docking energy (kcal/mol, -23 to 0); labelled structures 1f8b, 1f8c, 2qwe]
EGEE-II INFSO-RI-031688

Future ITER Fusion reactor
Applications with distributed calculations: Monte Carlo, Separate estimates, …
Multiple Ray Tracing: e. g. TRUBA
Stellarator Optimization: VMEC
Transport and Kinetic Theory: Monte Carlo Codes
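The fusion slide above lists Monte Carlo applications built from distributed calculations and separate estimates. As a rough illustration of why such workloads suit a grid, the sketch below runs several independent Monte Carlo jobs and merges their results. Everything in it is an assumption made for illustration: it estimates pi rather than any fusion quantity, the jobs run in a plain loop rather than on grid nodes, and the function names are hypothetical.

```python
import random


def estimate_pi(samples: int, seed: int) -> float:
    """One independent job: estimate pi from `samples` random points."""
    rng = random.Random(seed)  # each job has its own seed, so jobs do not interact
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


def combine(estimates: list[float]) -> float:
    """Merge separate estimates; jobs are equal-sized, so a plain mean suffices."""
    return sum(estimates) / len(estimates)


# Hypothetical job list: on a grid, each entry would be sent to a different node.
jobs = [(20_000, seed) for seed in range(8)]
estimates = [estimate_pi(n, seed) for n, seed in jobs]
combined = combine(estimates)

print(f"combined estimate from {len(jobs)} jobs: {combined:.4f}")
```

Because the jobs share no state, they can run anywhere and in any order; only the final, tiny merge step needs all the results in one place.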
The data deluge
Courtesy of Carole Goble

Data, Data, Data
• e_Science is now dominated by huge amounts of data
• Many discoveries are hidden in those data, but…
• How to organize, mine and understand the data?
• How to address these issues in a scientist-friendly environment? This is where commodity computing tools developed by Microsoft for business and industry could help…
Courtesy of Carole Goble

The opportunity in e_Science
• Replacing experimental activity (or part of it) with computing simulation and modelling based on large distributed computing infrastructures is what is now called e_Science
• Allowing sharing of resources, not only computing but also data and people’s knowledge, is what motivated the emergence of grid computing and the establishment of international virtual organisations which replace local resident scientists
• This is a major paradigm shift which requires scientists to become experts in complex computing methods
The challenges (still) in e_Science
The applied scientist is obliged to become a computer scientist as well
Far too much time is spent developing often over-engineered computing solutions, distracting the applied scientist from their primary mission
This has shifted the conventional scientific computing paradigm and could limit scientific discovery in the future and produce major setbacks

The Problem for the e-Scientist
[Diagram: Experiments & Instruments, Other Archives, Literature and Simulations supply facts; questions go in and answers come out]
Data ingest
Managing Petabytes
Common schemas
How to organize it?
How to reorganize it?
How to coexist & cooperate with others?
Data Query and Visualization tools
Support/training
Performance
  Execute queries in a minute
  Batch (big) query scheduling
Can “Here and Now” technologies accelerate discovery?

Can “Business” Tools and techniques for dealing with
[Diagram: Computational Modeling; Real-world Data; Persistent Distributed Data; Workflow, Data Mining & Algorithms; Interpretation & Insight]
be used in scientific research to allow researchers to be scientists and not computer scientists…
Conclusion
We need to make computing easier to use, so that scientists can concentrate their energy on their science rather than on the computing tools
Only in this way will e_Science be successful in accelerating discovery and producing new breakthroughs
Microsoft is investigating solutions in collaboration with leading scientists around the world through its Technical Computing Initiative
Technical Computing @ Microsoft
Mission Statement:
‘Promoting Computing into Science and Science into Computing’

Four ‘Pillars’ of Technical Computing @ Microsoft
Commitment to Science
Global Collaboration
Technology Excellence
Interoperability
Technical Computing at Microsoft
Advanced Computing for Science and Engineering
  Application of new algorithms, tools and technologies to scientific and engineering problems
High Productivity Computing
  Application of high performance clusters, information worker tools and database technologies to industrial and scientific applications
Radical Computing
  Research in potential breakthrough technologies

Fighting HIV with Computer Science
A major problem: Over 40 million infected
  Drug treatments are effective but are an expensive life commitment
  Vaccine needed for third world countries
  Effective vaccine could eradicate disease
Methods from computer science are helping with the design of vaccine
  Machine learning: Finding biological patterns that may stimulate the immune system to fight the HIV virus
  Optimization methods: Compressing these patterns into a small, effective vaccine
MICROSOFT SPONSORED RESEARCH AT THE CENTER FOR BIOINFORMATICS AND GENOME BIOLOGY AND THE FUNDACION CIENCIA PARA LA VIDA, CHILE
Courtesy of David Holmes

Technical Computing and HPC
Collaboration with MS HPC product groups
complement and extend MS HPC institutes
Some examples:
HPC for Aerospace at Southampton
Cancer research, financial and climate modeling at Oxford OeRC
HPC for automotive industry at HLRS Stuttgart
HPC support to computational system biology at MSRC joint centre with University of Trento in Italy
Top Challenges
• Setup is painful
  • Takes a long time to get clusters up and running
  • Clusters are separate islands
  • Lack of integration into IT infrastructure
• Job management
  • Lack of integration into end-user apps
• Application availability
  • Limited eco-system of applications that can exploit parallel processing capabilities

“Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems.”
High-End Computing Revitalization Task Force, 2004
(Office of Science and Technology Policy, Executive Office of the President)

Microsoft HPC Institutes
TACC – University of Texas, Austin, TX USA
University of Virginia, Charlottesville, VA USA
Southampton University, Southampton, UK
Nizhni Novgorod University, Nizhni Novgorod, Russia
University of Utah, Salt Lake City, UT USA
Cornell Theory Center, Ithaca, NY USA
Tokyo Institute of Technology, Tokyo, Japan
University of Tennessee, Knoxville, TN USA
HLRS – University of Stuttgart, Stuttgart, Germany
Shanghai Jiao Tong University, Shanghai, PRC
Radical Computing: The End of Moore’s Law?
[Chart: Power Density (W/cm2), 1 to 10,000, versus year (‘70 to ‘10) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486 and Pentium® processors, with reference levels Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun’s Surface]
Intel Developer Forum, Spring 2004 - Pat Gelsinger

Future of silicon chips
“100’s of cores on a chip in 2015” (Justin Rattner, Intel)
Challenge for IT industry and Computer Science community
Can we make parallel computing on a chip easier than message-passing?
Challenge for the Scientific Community
How will the Multi-Core transition affect scientific computing?
Radical Computing @ BSC
Major collaboration at the Barcelona Supercomputing Center (Prof. Mateo Valero) on development of S/W environment for support of Many-multicore architectures in collaboration with Microsoft Research in Cambridge

Summary
Microsoft wishes to work with the university research and business communities to:
• develop interoperable high-level services, work flows, tools and data services (make computing easy)
• accelerate progress in a small number of societally important scientific applications (make a difference)
• explore radical new directions in computing and ways and applications to exploit on-chip parallelism
www.microsoft.com/science