1. Current Trends in High Performance Computing
Dr. Putchong Uthayopas
Department Head, Department of Computer Engineering,
Faculty of Engineering, Kasetsart University
Bangkok, Thailand.
pu@ku.ac.th
3. Introduction
• High Performance Computing
– An area of computing involving the hardware and software that help solve large, complex problems quickly
• Many applications
– Science and Engineering research
• CFD, Genomics, Automobile Design, Drug Discovery
– High Performance Business Analysis
• Knowledge Discovery
• Risk analysis
• Stock portfolio management
– Business is moving more toward the analysis of data from data warehouses
4. Why do we need HPC?
• Change in scientific discovery
– From experiment to simulation and visualization
• Critical need to solve ever larger problems
– Global climate modeling
– Life science
– Global warming
• Modern business needs
– Design of more complex machinery
– More complex electronics design
– Complex, large-scale financial system analysis
– More complex data analysis
5. Top 500: Fastest Computers on Our Planet
• List of the 500 most powerful supercomputers, generated twice a year (June and November)
• Latest was announced in June 2012
12. Processors are just not running faster
• Processor speeds kept increasing for the last 20 years
• Common techniques
– Smaller process technology
– Increased clock speed
– Improved microarchitecture
• Pentium, Pentium II, Pentium III, Pentium IV, Centrino, Core
13. Pitfalls
• Smaller process technology leads to denser transistors, but…
– Heat dissipation
– Noise as voltage is reduced
• Increasing clock speed
– Uses more power, since CMOS consumes power mainly when switching
• Improving microarchitecture
– Small improvements for a much more complex design
• The only solution left is to use concurrency: doing many things at the same time
14. Parallel Computing
• Speeding up execution by splitting a task into many independent subtasks and running them on multiple processors or cores
– Break the large task into many small subtasks
– Execute these subtasks on multiple cores or processors
– Collect the results together
15. How to achieve concurrency
• Adding more concurrency into hardware
– Processor
– I/O
– Memory
• Adding more concurrency into software
– How to express parallelism better in software
• Adding more concurrency into algorithms
– How to do many things at the same time
– How to make people think in parallel
18. Rationale for Hybrid Architecture
• Most scientific applications have fine-grained parallelism inside
– CFD, financial computation, image processing
• Energy efficiency
– Employing a large number of slow processors in parallel can help lower power consumption and heat
19. Two main approaches
• Using many multithreaded, scaled-down processors that are compatible with conventional processors
– Intel MIC
• Using a very large number of small processor cores in a SIMD model, evolving from graphics technology
– NVIDIA GPU
– AMD Fusion
20. Many Integrated Core Architecture
• An effort by Intel to add a large number of cores into a computing system
24. Challenges
• A large number of cores will have to divide memory among them
– Much smaller memory per core
– Demands high memory bandwidth
• Still need an effective fine-grained parallel programming model
• No free lunch: programmers have to do some work
26. What is GPU Computing?
• Computing with CPU + GPU: heterogeneous computing
(Diagram: a 4-core CPU paired with a many-core GPU.)
27. Not 2x or 3x: Speedups are 20x to 150x
• 146X: Medical Imaging (U of Utah)
• 36X: Molecular Dynamics (U of Illinois, Urbana)
• 18X: Video Transcoding (Elemental Tech)
• 50X: Matlab Computing (AccelerEyes)
• 100X: Astrophysics (RIKEN)
• 149X: Financial simulation (Oxford)
• 47X: Linear Algebra (Universidad Jaime)
• 20X: 3D Ultrasound (Techniscan)
• 130X: Quantum Chemistry (U of Illinois, Urbana)
• 30X: Gene Sequencing (U of Maryland)
28. CUDA Parallel Computing Architecture
• Parallel computing architecture and programming model
• Includes a C compiler plus support for OpenCL and DX11 Compute
• Architected to natively support all computational interfaces (standard languages and APIs)
(Diagram: the CUDA architecture alongside ATI's compute "solution".)
29. Compiling C for CUDA Applications
• Source is split into the CUDA key kernels and the rest of the C application
• NVCC compiles the kernels into CUDA object files; the CPU code is compiled into CPU object files
• The linker combines both into a single CPU-GPU executable
30. Simple "C" Description For Parallelism

Standard C code:

    void saxpy_serial(int n, float a, float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    // Invoke serial SAXPY kernel
    saxpy_serial(n, 2.0, x, y);

Parallel C for CUDA code:

    __global__ void saxpy_parallel(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) y[i] = a*x[i] + y[i];
    }

    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
31. Computational Finance
Financial computing software vendors:
• SciComp: Derivatives pricing modeling
• Hanweck: Options pricing & risk analysis
• Aqumin: 3D visualization of market data
• Exegy: High-volume tickers & risk analysis
• QuantCatalyst: Pricing & hedging engine
• Oneye: Algorithmic trading
• Arbitragis Trading: Trinomial options pricing
Ongoing work:
• LIBOR Monte Carlo market model
• Callable swaps and continuous-time finance
Sources: SciComp; CUDA SDK
32. Weather, Atmospheric, & Ocean Modeling
• CUDA-accelerated WRF available
• Other kernels in WRF being ported
Ongoing work:
• Tsunami modeling
• Ocean modeling
• Several CFD codes
Sources: Michalakes, Vachharajani; Matsuoka, Akiyama, et al.
33. New Emerging Standards
• OpenCL
– Supported by many vendors, including Apple
– Targets both GPU-based SIMD and multithreading
– More complex to program than CUDA
• OpenACC
– A programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI
– Simplifies parallel programming of heterogeneous CPU/GPU systems
– Directive based
34. Cluster computing
• The use of a large number of servers, linked by a high-speed local network, as one single large supercomputer
• Popular way of building supercomputers
• Software
– Cluster-aware OS
• Windows Compute Cluster Server 2008
• NPACI Rocks Linux
• Programming systems such as MPI
• Used mostly in computer-aided design, engineering, and scientific research
35. Comment
• Cluster computing is a very mature discipline
• We know how to build a sizable cluster very well
– Hardware integration
– Storage integration: Lustre, GPFS
– Schedulers: PBS, Torque, SGE, LSF
– Programming: MPI
– Distribution: ROCKS
• Clusters are a foundation fabric for grid and cloud
36. TERA Cluster
• 1 frontend (HP ProLiant DL360 G5 server) and 192 compute nodes
– Intel Xeon 3.2 GHz (dual core, dual processor)
– Memory: 4 GB (8 GB for frontend & InfiniBand nodes)
– 70x4 GB SCSI HDD (RAID1)
• 4 storage servers
– Lustre file system for TERA cluster's storage
– Attached with Smart Array P400i controller for 5 TB space
• Storage: 48 TB
(Diagram: 2.5 Gbps uplink to UniNet over the KU fiber backbone (1 Gbps fiber); frontends (FE Sunyata, FE Araya, WinHPC, TERA, Anatta, and two spares); node groups of 96, 64 + 15, 4, and 4 nodes plus 16 spare nodes on a 200-port Gigabit Ethernet switch; storage tier of 4 file servers serving a 5 TB Lustre FS.)
TGCC 2008, Khon Kaen University, August 29, 2008, Thailand
37. Grid Computing Technology
• Grid computing enables the
virtualization of distributed
computing and data resources such
as processing, network bandwidth
and storage capacity to create a
single system image, granting users
and applications seamless access to
vast IT capabilities.
• Just as an Internet user views a
unified instance of content via the
Web, a grid user essentially sees a
single, large virtual computer.
38. Grid Architecture
• Fabric Layer
– Protocols and interfaces that provide access to computing resources such as CPU and storage
• Connectivity Layer
– Protocols for Grid-specific network transactions, such as security (GSI)
• Resource Layer
– Protocols to access a single resource from an application
• GRAM (Grid Resource Allocation Management)
• GridFTP (data access)
• Grid Resource Information Service
• Collective Layer
– Protocols that manage and access groups of resources
(Diagram: layer stack, top to bottom: Application, Collective, Resource, Connectivity, Fabric.)
40. Globus as Service-Oriented Infrastructure
• Uniform interfaces, security mechanisms, Web service transport, monitoring
(Diagram: user applications and tools call Globus services (GRAM, GridFTP, Reliable File Transfer, MDS-Index, MyProxy, DAIS, and user services in hosting environments) running over specialized resources: computers, storage, and databases.)
41. Introduction to ThaiGrid
• A national project under the Software Industry Promotion Agency (Public Organization), Ministry of Information and Communication Technology
• Started in 2005 with 14 member organizations
• Expanded to 22 organizations in 2008
42. Thai Grid Infrastructure
• 19 sites
• About 1,000 CPU cores
(Network diagram: member sites connected by links ranging from 155 Mbps and 310 Mbps up to 1 Gbps and 2.5 Gbps.)
43. ThaiGrid Usage
• ThaiGrid provides about 290 years of computing time for members
– 9 years on the grid
– 280 years on TERA
• 41 projects from 8 areas are being supported on the Teraflop machine
• More small projects on each machine
44. Medicinal Herb Research
• Partner
– Cheminformatics Center, Kasetsart University (Chak Sangma and team)
• Objective
– Using a 3D-molecular database and virtual screening to verify traditional medicinal herbs
• Benefit
– Scientific proof of ancient traditional drugs
– Benefits poor people who still rely on drugs from medicinal herbs
– Potential benefit for the local pharmaceutical industry
(Workflow: virtual screening on the infrastructure, followed by lab tests.)
45. NanoGrid
• Objective
– A platform that supports computational nanoscience research
• Technology used
– Accelrys Materials Studio
– Cluster schedulers: Sun Grid Engine and Torque
(Diagram: Materials Studio gateways submitting jobs to ThaiGrid computing resources.)
46. Challenges
• Size and scale
• Manageability
– Deployment
– Configuration
– Operation
• Software and hardware compatibility
47. Grid System Architecture
• Clusters
– Satellite sets
• 16 clusters delivered by ThaiGrid to initial members
• Each composed of 5 nodes of IBM eServer xSeries 336
– Intel Xeon 2.8 GHz (dual processor)
– x86_64 architecture
– Memory: 4 GB (DDR2 SDRAM)
– Other sets
• Various types of servers and numbers of nodes
• Provided by member institutes of ThaiGrid
48. Grid as a Super Cluster
(Diagram: a grid scheduler dispatches work over the research and education network to the head nodes (H) and compute nodes (C) of member clusters.)
49. Is grid still alive?
• Yes, grid is a useful technology for certain tasks
– BitTorrent as a massive file-exchange infrastructure
– The European Grid is using it to share LHC data
• Pitfalls of the grid
– The network is still not reliable and fast enough for long-term operation
– The multi-site, multi-authority concept makes it very complex for
• System management
• Security
• Users to really use the system
• Recent trend is to move to centralized clouds
50. What is Cloud Computing?
(Diagram: applications and services from providers such as Google, Salesforce, Amazon, Microsoft, and Yahoo delivered through the cloud.)
Source: Wikipedia (cloud computing)
51. Why Cloud Computing?
• The illusion of infinite computing resources available on
demand, thereby eliminating the need for Cloud
Computing users to plan far ahead for provisioning.
• The elimination of an up-front commitment by Cloud
users, thereby allowing companies to start small and
increase hardware resources only when there is an
increase in their needs.
• The ability to pay for use of computing resources on a
short-term basis as needed (e.g., processors by the hour
and storage by the day) and release them as needed,
thereby rewarding conservation by letting machines and
storage go when they are no longer useful.
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD lab, UC
Berkeley
52. Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD lab, UC
Berkeley
53. Cloud Computing Explained
• SaaS (Software as a Service): applications delivered over the Internet as services (e.g., Gmail)
• The cloud is the massive server and network infrastructure that serves SaaS to large numbers of users
• The service being sold is called utility computing
Source: "Above the Clouds: A Berkeley View of Cloud Computing", RAD Lab, UC Berkeley
54. Enabling Technology for Cloud Computing
• Cluster and grid technology
– The ability to build a highly scalable computing system that consists of 100,000-1,000,000 nodes
• Service-Oriented Architecture
– Everything is a service
– Easy to build, distribute, and integrate into large-scale applications
• Web 2.0
– Powerful and flexible user interfaces for an Internet-enabled world
57. Architecture of Service Oriented Cloud Computing Systems (SOCCS)
• SOCCS can be constructed by combining CCR/DSS software to form a scalable service for a client application.
• Cloud Service Management (CSM) acts as a resource management system that keeps track of the availability of services on the cloud.
(Diagram: a user interface and cloud application sit on top of DSS, CCR, and CSM, which run over operating systems on node hardware joined by an interconnection network.)
58. Cloud System Configuration
(Diagram: a cloud user interface (Excel) talks to the cloud application and Cloud Service Management (CSM); services run on OS/hardware nodes connected by an interconnection network.)
59. A Proof-of-Concept Application
• The Pickup and Delivery Problem with Time Windows (PDPTW) is the problem of serving a number of transportation requests with a limited number of vehicles.
• The objective is to minimize both the sum of the distances traveled by the vehicles and the sum of the time spent by each vehicle.
60. PDPTW on the cloud using SOCCS
• The master/worker model is adopted as the framework for service interaction.
• The algorithm is partitioned using a domain decomposition approach.
• The cloud application controls the decomposition of the problem, sending each subproblem to a worker service and collecting the results back to find the best answer.
63. We are living in the world of Data
• Video surveillance
• Social media
• Mobile sensors
• Gene sequencing
• Smart grids
• Geophysical exploration
• Medical imaging
64. Big Data
“Big data is data that exceeds the processing capacity of
conventional database systems. The data is too big,
moves too fast, or doesn’t fit the strictures of your
database architectures. To gain value from this data, you
must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”,
Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
65. The Value of Big Data
• Analytical use
– Big data analytics can reveal insights hidden
previously by data too costly to process.
• peer influence among customers, revealed by analyzing
shoppers’ transactions, social and geographical data.
– Being able to process every item of data in reasonable
time removes the troublesome need for sampling and
promotes an investigative approach to data.
• Enabling new products.
– Facebook has been able to craft a highly personalized
user experience and create a new kind of advertising
business
67. Big Data Challenge
• Volume
– How to process data so big that it cannot be moved or stored
• Velocity
– A lot of data arrives so fast that it cannot all be stored, such as Web usage logs and Internet and mobile messages; stream processing is needed to filter out unused data or extract knowledge in real time
• Variety
– So many types of unstructured data formats make conventional databases useless
68. How to deal with big data
• Integration of
– Storage
– Processing
– Analysis algorithms
– Visualization
(Diagram: massive data streams pass through stream processing into storage, processing, analysis, and visualization.)
69. A New Approach For Distributed Big Data
Storage islands (L.A., Boston, London) versus a single storage pool:
• Disparate systems → a single system across locations
• Manual administration → automated policies
• One tenant, many systems → many tenants, one system
• IT-provisioned storage → self-service access
70. Hadoop
• Hadoop is a platform for distributing computing problems across a
number of servers. First developed and released as open source by
Yahoo.
– Implements the MapReduce approach pioneered by Google in
compiling its search indexes.
– Distributing a dataset among multiple servers and operating on the
data: the “map” stage. The partial results are then recombined: the
“reduce” stage.
• Hadoop utilizes its own distributed filesystem, HDFS, which makes
data available to multiple computing nodes
• The Hadoop usage pattern involves three stages:
– loading data into HDFS,
– MapReduce operations, and
– retrieving results from HDFS.
71. WHAT FACEBOOK KNOWS
Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.
http://www.facebook.com/data
72. The Links of Love
• Often young women specify that they are "in a relationship" with their "best friend forever".
– Roughly 20% of all relationships for the 15-and-under crowd are between girls.
– This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• For anonymous US users who were over 18 at the start of the relationship:
– The average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7.
– This is much higher than the 4.74 steps you'd need to go from any Facebook user to another through friendship, as opposed to romantic, ties.
(Graph: the relationships of anonymous US users who were over 18 at the start of the relationship.)
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
73. Why?
• Facebook can improve the user experience
– Make useful predictions about users' behavior
– Make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship
74. Data Tsunami
• The data flood is coming; nowhere to run now!
– Data is being generated anytime, anywhere, by anyone
– Data is moving in fast
– Data is too big to move, too big to store
• Better be prepared
– Use this to enhance your business and offer better services to customers
75. The Opportunities and Challenges of Exascale Computing
• A summary of findings from many workshops in the US
• Lists the issues that need to be overcome
• We will present only some of the challenges
77. Power Challenge
• Power consumption of the computers is the largest hardware research challenge.
• Today, power costs for the largest petaflop systems are in the range of $5-10M annually.
• For an exascale system using current technology:
– The annual power cost to operate the system would be above $2.5B.
– The power load would be over a gigawatt.
• The target of 20 megawatts, identified in the DOE Technology Roadmap, is primarily based on keeping the operational cost of the system in some kind of feasible range.
80. System Resiliency Challenge
• For exascale systems, the number of system components will be increasing faster than component reliability, with the mean time between failures projected in minutes or seconds.
• Exascale systems will experience various kinds of faults many times per day.
– Systems running 100 million cores will continually see core failures, and the tools for dealing with them will have to be rethought.
82. The Computer Science Challenges
• A programming model effort is a critical component
– Clock speeds will be flat or even dropping to save energy, so all performance improvements within a chip will come from increased parallelism
– The amount of memory per arithmetic unit will shrink, increasing the need for fine-grained parallelism and a programming model other than message passing or coarse-grained threads
83. Under the radar
• Mobile processors running supercomputers
• Hybrid war! GPU vs. MIC
• I/O goes solid state
• Programming standards war
– CUDA / OpenCL / OpenMP / OpenACC
84. Summary
• We are living in a challenging world
• Demand for HPC systems and applications will increase
– Software tools, technology, and hardware are changing to catch up
• The greatest challenge is how to quickly develop software for the next generation of computing systems
Speaker notes:
• CUDA is an architecture with a number of entry points. Today, developers program in C for CUDA using NVIDIA compilers; programming language support for Fortran and other languages is coming soon. CUDA also supports emerging API programming standards such as OpenCL, and because the OpenCL and CUDA constructs for parallelism are so similar, applications written in C can easily be ported to OpenCL if desired. OpenCL applications sit on top of the CUDA architecture.
• Not just WSDLs on things, but common abstractions that apply across many resources and services. (A work in progress.)
• The sources of information are expanding, and many new sources are machine generated. It is also big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). Leading companies have for decades sought to leverage new sources of data, and the insights gleaned from them, as sources of competitive advantage: more detailed structured data, new unstructured data, device-generated data. But big data isn't only about the data; a comprehensive big data strategy also needs to consider the role of new enabling technologies such as scale-out storage, MPP database architectures, Hadoop and the Hadoop ecosystem, in-database analytics, in-memory computing, data virtualization, and data visualization.
• Content and service providers, as well as global organizations that need to distribute large content files, are challenged with managing and ensuring the performance of these distributed systems. A new approach using a single storage pool in the cloud, with policies for content placement, multi-tenancy, and self-service, can benefit their business.