2. A Personal Reflection
Let me start with a story.
The setting, early 1982:
Digital Equipment Corporation (DEC) had introduced the VAX (a superminicomputer) three years earlier
Sales were going well
The VAX was a popular product
And then one day…
3. The Moral of the Story
We see that…
Unmet needs: powerful personal workstations, a large market
+ A set of new technologies: bitmapped displays, 10 Mb Ethernet, powerful µprocessors, open-source UNIX
= An inflection point ($$$$): the market for workstations
4. What are the big questions and issues (IMHO)
Data: Scale up vs. Scale out vs. Hybrid
Shared kernel… (or not)? (Shared Memory and/or Message Passing?)
Just what is the role for flash? Memory, SSD, other?
Can we innovate without disrupting? (HW moves faster than software)
How far can we push scalable coherent shared memory?
System Design: simplify complex problems and connect the dots
5. Is “the Cloud” an inflection point?
The emergence of the Cloud indicates that we have been missing something.
Emerging needs and drivers: mobility, cloud, big data, new UI, elastic capacity, social, new business models
Current systems represent the work of the last two decades.
Is there a GAP between current systems and these emerging needs?
6. Cloud computing has emerged as a viable mode for the enterprise
Ease of access
Elastic
Mobility and sensors
More pervasive data storage
Computing as a utility, devices as appliances
Expandable storage
New business models are emerging (SaaS, PaaS, IaaS)
7. Today I’ll talk a bit about
Current commodity server design
How this creates a computing gap for the enterprise *
How businesses of tomorrow will function
And how enterprise supercomputers will support them
A different layer of cloud
[Diagram: “The Perception” stacks Cloud Computing directly on Current Systems; “My Reality” (*a closer look) shows A Gap between Cloud Computing and Current Systems]
8. The short story…
Current view: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3 … Virtual Machine N running on one Physical Machine
Emerging view: Physical Machine 1, Physical Machine 2, Physical Machine 3 … Physical Machine M running as one Virtual Machine
9. Enterprise HW Computing Landscape – And what has emerged
The Chip – The Board (BladeServer) – The Cloud
10. And this creates a perceived technology gap
Problems with today’s servers:
Cores and memory scale in lock step
Access latencies to large memories from modern cores
Addressing those memories
Current and emerging business needs:
Low latency decision support and analytics
“What if?” simulation
Market/sentiment analysis
Pricing and inventory analysis
In other words, we need “real” real time
This creates a technology conflict: cores and memory need to scale more independently rather than in lock step. How can we resolve this conflict?
11. Flattening the layers
We need to innovate without disrupting customers. (Like your multicore laptops.)
Seamless platform transitions are difficult, but doable (Digital, Apple, Intel, Microsoft)
Large memory technology is poised to take off, but it needs appropriate hardware
Early work in High Performance Computing (HPC)* has yielded results that can now be applied to enterprise software. But today it’s different:
We have a lot of cores per chip now and better interconnects
We can “flatten” the layers and simplify
We can build systems that improve our software now without making any major modifications
The Nail Soup argument: if we are willing to modify our software, we can win bigger, but we do so on our own schedule. (See “Stone Soup” on Wikipedia.)
*Scientific/Technical Computing
12. The Chip
How do we leverage advances in:
Processor Architecture
Multiprocessing (Cores and threads)
NUMA (e.g. QPI, Hypertransport)
Connecting to the board (sockets)
Core to memory ratio
We have 64 bits of address, but can we reach those addresses?
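As one way to make that last question concrete (not from the original slides), here is a minimal C sketch, assuming a Linux/x86 machine, that prints how many physical and virtual address bits the processor actually implements; in practice this is well short of the full 64 bits:

```c
/* Minimal sketch (assumes Linux/x86): /proc/cpuinfo reports a line such as
 * "address sizes : 46 bits physical, 48 bits virtual".                    */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];

    if (!f) { perror("cpuinfo"); return 1; }
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "address sizes", 13) == 0) {
            fputs(line, stdout);      /* physical vs. virtual address bits */
            break;
        }
    }
    fclose(f);
    return 0;
}
```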
13. Typical Boards
Highlights:
Sockets (2, 4, 8)
Speed versus Access
How do we make these choices for the enterprise?
[Figure: an 8-socket board layout]
14. Storage
Considerations for storage
Cost per Terabyte (for the data explosion)
Speed of access (for real time access)
Storage network (for global business)
Disk, Flash, DRAM
How can applications be optimized for storage?
[Figure: example, a 2 TB OCZ card]
15. Memory Considerations
Highlights
Today, cores and memory scale in lock step
Large memory computing (to handle modern business needs)
Collapse the multi-tiered architecture levels
NUMA support is built in to Linux (see the sketch below)
Physical limits (board design, connections, etc.)
[Figure: a memory riser]
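As a hedged illustration of that built-in NUMA support (assuming a Linux system with libnuma installed; compile with -lnuma; this is not from the original deck), a minimal sketch that places a buffer on a chosen NUMA node rather than wherever the default policy happens to put it:

```c
/* Minimal libnuma sketch (assumes Linux + libnuma; link with -lnuma). */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }
    int last = numa_max_node();
    printf("NUMA nodes: 0..%d\n", last);

    size_t sz = 1UL << 30;                    /* 1 GiB buffer            */
    void *buf = numa_alloc_onnode(sz, last);  /* place it on node 'last' */
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    /* ... use buf; accesses from cores on node 'last' stay local ...    */

    numa_free(buf, sz);
    return 0;
}
```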
17. Reincarnating the “mainframe”
• In-memory computing provides the real time performance we’re looking for
• In-memory computing requires more memory per core in a symmetric way
• I remember what Dave Cutler said about the VAX:
• “There is nothing that helps virtual memory like real memory”
• Ike’s corollary: “There is nothing that helps an in-memory database like real memory”
• The speed of business should drive the need for more flexible server architectures
• Current server architectures are limiting, and we must change the architecture and do so inexpensively
• And how do we do this?
18. Taking In-Memory Computing Seriously
Basic Assumptions and observations
Hard disks are for archives only. All active data must be in DRAM memory.
Good caching and data locality are essential.
– Otherwise CPUs are stalled due to too many cache misses
There can be many levels of caches
Problems and Opportunities for In-Memory Computing
Addressable DRAM per box is limited due to processor physical address pins.
– But we need to scale memory independently from physical boxes
Scaling Architecture
– Arbitrary scaling of the amount of data stored in DRAM
– Arbitrary and independent scaling of the number of active users and associated computing load
DRAM access times have been the limiting factor for remote communications
– Adjust the architecture to DRAM latencies (~100 ns? see the sketch after this list)
– InterProcess Communication is slow and hard to program (latencies are in the area of ~0.5–1 ms)
We can do better.
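To put a number on the ~100 ns DRAM figure above, a rough pointer-chase sketch (an assumption-laden microbenchmark, not part of the slides: it assumes Linux, gcc, and an array much larger than the caches; each load depends on the previous one, so misses cannot be overlapped):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (1UL << 26)          /* 64M pointers * 8 B = 512 MiB        */
#define STEPS (1UL << 24)

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: shuffle the indices into one long cycle.      */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;               /* j < i             */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t s = 0; s < STEPS; s++) p = next[p];  /* dependent loads   */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per dependent load (p=%zu)\n", ns / STEPS, p);
    free(next);
    return 0;
}
```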
20. What is “Big Iron” Technology Today?
Big Iron-1: Extreme Performance, Low Cost – a traditional cluster
10 x 4U nodes (Intel Xeon X7560, 2.26 GHz)
320 cores (640 hyper-threads), 32 cores per node
5 TB memory (10 TB max), 20 TB solid state disk
1 NAS (72 TB, expandable to 180 TB)
System architecture: SAP
Big Iron-2: Extreme Performance, Scalability, and a much simpler system model – an enterprise supercomputer
Developers can treat it as a single large physical server
Large shared coherent memory across nodes
5 x 4U nodes (Intel Xeon X7560, 2.26 GHz)
160 cores (320 hyper-threads), 32 cores per node
5 TB memory total, 30 TB solid state disk
160 GB InfiniBand interconnect per node
Scalable coherent shared memory (via ScaleMP)
System architecture: SAP
21. The Next Generation Servers a.k.a. “Big Iron” are a reality
Big Iron-1: fully operational September 30, 2010, 2:30 pm – all 10 Big Iron-1 nodes up and running
Big Iron-2: fully operational February 4, 2011 – all 5 Big Iron-2 nodes up and running
22. At a Modern Data Center
[Photo of the data center: the backup generators (“these bad boys are huge”), special cooling, and Big Iron way down there; the pod has won awards]
23. Progress, Sapphire and others:
Sapphire Type 1 Machine (Senior):
• 32 nodes (5 racks, 2.5 tons)
• 1024 cores
• 16 TB of memory
• 64 TB of solid state storage
• 108 TB of network attached storage
Sapphire Type 1 Machine (Junior):
• 1 node (1 box, on stage)
• 80 cores
• 1 TB of memory
• 2 TB of SSD
• No attached storage
24. Coherent Shared Memory – A Much Simpler Alternative to Remote Communication
Uses high-speed, low latency networks (optical fiber/copper at 40 Gb/s or above)
Typical latencies of this are in the area of 1–5 μs
Aggregated throughput matches CPU consumption
L4 cache needed to balance the longer latency of non-local access (cache-coherent non-uniform memory access over different physical machines)
Separate the data transport and cache layers into a separate tier below the operating system
• (almost) never seen by the application or the operating system!
Applications and database code can just reference data
The data is just “there”, i.e. it’s a load/store architecture, not network datagrams (sketched in code after this list)
Application level caches are possibly not necessary – the system does this for you (?)
Streaming of query results is simplified; the L4 cache schedules the read operations for you.
Communication is much lighter weight. Data is accessed directly and thread calls are simple and fast (higher quality through less code)
Application designers do not confront communications protocol design issues
Parallelization of analytics and combining simulation with data are far simpler, enabling powerful new business capabilities of mixed analytics and decision support at scale
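A minimal sketch of the load/store idea, using POSIX mmap and fork within a single OS image as a stand-in for the cache-coherent fabric described above (illustrative only; the real system extends this across physical nodes): the reader simply dereferences memory and never parses a network datagram.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* One shared region visible to both processes.                       */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                         /* the "remote" writer     */
        strcpy(shared, "row 42: price=19.99"); /* plain stores            */
        _exit(0);
    }
    wait(NULL);                                /* writer has finished     */
    printf("reader sees: %s\n", shared);       /* plain loads, no recv()  */

    munmap(shared, 4096);
    return 0;
}
```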
25. Software implications of Coherent Shared Memory
On Application Servers
Access to the database can be “by reference” (see the sketch after this list)
No caches on the application server side. The application can refer to database query results, including metadata, master data, etc., within the database process.
Caches are handled by “hardware” and are guaranteed to be coherent.
Lean and fast application server for in-memory computing
On Database Code
A physically distributed database can approach DRAM-like latencies
Database programmers can focus on database problems
Data replication and sharding are handled by touching the data; the L4 cache does the distribution automatically
In fact, do we need to separate application servers from database servers at all?
No lock-in to fixed machine configurations or clock rates
No need to make app-level design tradeoffs between communications and memory access
One O/S instance, one security model, easier to manage
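A hypothetical sketch of that “by reference” access (the ResultRow type and db_run_query function are invented for illustration and are not an actual API): the application reads the result row where the database engine left it, with no serialization, no copy, and no application-side cache.

```c
#include <stdio.h>

typedef struct {             /* illustrative result-row layout            */
    long   key;
    double price;
    char   region[8];
} ResultRow;

/* In the coherent-memory design this row would live in memory owned by
 * the database tier; here it is simulated with a static instance.        */
static ResultRow example = { 42, 19.99, "EMEA" };

const ResultRow *db_run_query(const char *sql)
{
    (void)sql;               /* placeholder: no parsing, no wire protocol */
    return &example;         /* hand back a reference, not a buffer       */
}

int main(void)
{
    const ResultRow *row = db_run_query("SELECT ... WHERE key = 42");
    printf("key=%ld price=%.2f region=%s\n",
           row->key, row->price, row->region);
    return 0;
}
```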
26. But what about…
Distributed Transactions?
• We don’t need no stinkin’ distributed transactions!
What about traditional relational databases?
• In the future, databases become data structures!
(Do I really believe this? Well, not really. Just wanted to make the point.)
Why not build a mainframe?
• From a SW perspective, it is a mainframe (with superior price/performance for in-memory computing)
Is it Virtualization?
• In traditional virtualization, you take multiple virtual machines and multiplex them onto the same physical hardware. We’re taking multiple physical hardware instances and running them as a single virtual instance.
27. A simplified programming model and landscape
Current view: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3 … Virtual Machine N running on one Physical Machine
Emerging view: Physical Machine 1, Physical Machine 2, Physical Machine 3 … Physical Machine M running as one Virtual Machine (single OS instance)
28. The business benefits of coherent shared memory
Improved in-memory computing performance at dramatically lower cost
The ability to build high performance “mainframe-like” computing systems with commodity cluster server components
Ability to scale memory capacity in a more in-memory computing-friendly fashion
Simplified software system landscape using a system architecture that can be made invisible to application software
Minimize changes to applications
Enables applications to scale seamlessly without changes to the application code or additional programming effort
Traditional cluster based parallel processing requires specialized programming skills held only by top developers; this will constrain an organization’s ability to scale development for in-memory computing
With coherent shared memory, the bulk of developers can develop as they do today and let the underlying hardware and lower level software handle some of the resource allocation
But existing, standard NUMA capabilities in Linux remain available.
29. Big Iron Research
Next generation server architectures for in-memory computing
Processor Architectures
Interconnect technology
Systems software for coherent memory
OS considerations
Storage
Fast data load utilities
Reliability and Failover
Performance engineering for enterprise computing loads
Price/performance analysis
Private and public cloud approaches
And more
30. Some experiments that we have run
Industry standard system benchmarks
Evaluation of shared memory program scalability
Scalability of “Big Data” analytics – just beginning
31. Industry Standard Benchmarks
STREAM: large memory vector processing (triad kernel sketched below)
Measures scalability of memory access; nearly linear in system size
Big Iron would be the 12th highest recorded test
specJbb: highly concurrent Java transactions; more computational than STREAM
Big Iron would be the 13th highest recorded test
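For context, the kernel STREAM times is essentially the triad below; this is a sketch with OpenMP (compile with -fopenmp), not the official benchmark, and the array size and bandwidth accounting are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1UL << 27)                    /* 128M doubles per array = 1 GiB */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    #pragma omp parallel for
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t = omp_get_wtime();
    #pragma omp parallel for             /* the triad: a = b + s*c          */
    for (size_t i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];
    t = omp_get_wtime() - t;

    /* Three arrays touched, 8 bytes per element each.                     */
    printf("triad: %.1f GB/s (a[1]=%g)\n", 3.0 * 8.0 * N / t / 1e9, a[1]);
    free(a); free(b); free(c);
    return 0;
}
```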
32. To sum up
Cloud computing has emerged
The businesses of tomorrow will function in real time. Really!
There is a growing computing technology gap
There is an unprecedented wave of advances in HW
The “mainframe” has been reincarnated with its attendant benefits, but can be made from commodity parts
– Enterprise supercomputers leveraging coherent shared memory principles and the wave of SW+HW advances will enable this new reality