2. A Personal Reflection
Let me start with a story.
The setting, early 1982:
Digital Equipment Corporation (DEC) had introduced the VAX (a superminicomputer) three years earlier
Sales were going well
The VAX was a popular product
And then one day…
3. The Moral of the Story
We see that…
Unmet needs: powerful personal workstations, a large market
+ A set of new technologies: bitmapped displays, 10 Mb Ethernet, powerful µprocessors, open-source UNIX
= An inflection point ($$$$): the market for workstations
4. What are the big questions and issues (IMHO)
Data: Scale up vs. Scale out vs. Hybrid
Shared kernel… (or not)? (Shared Memory and/or Message Passing?)
Just what is the role for flash? Memory, SSD, other?
Can we innovate without disrupting? (HW moves faster than software)
How far can we push scalable coherent shared memory?
System Design: simplify complex problems and connect the dots
5. Is “the Cloud” an inflection point?
The emergence of the Cloud indicates that we have been missing something.
Emerging needs and drivers: mobility, cloud, big data, new UI, elastic capacity, social, new business models
Current systems represent the work of the last two decades.
Is there a GAP between current systems and these emerging needs?
6. Cloud computing has emerged as a viable mode for the enterprise
Ease of access
Elastic
Mobility and sensors
More pervasive data storage
Computing as a utility, devices as appliances
Expandable storage
New business models are emerging (SaaS, PaaS, IaaS)
7. Today I’ll talk a bit about
Current commodity server design
How this creates a computing gap for the enterprise *
How businesses of tomorrow will function
And how enterprise supercomputers will support them
A different layer of cloud
[Diagram: “The Perception” stacks Cloud Computing directly on Current Systems; “My Reality” (*a closer look) shows A Gap between Cloud Computing and Current Systems]
8. The short story…
Current view: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3 … Virtual Machine N running on one Physical Machine
Emerging view: Physical Machine 1, Physical Machine 2, Physical Machine 3 … Physical Machine M running as one Virtual Machine
9. Enterprise HW Computing Landscape – And what has emerged
The Chip – The Board (BladeServer) – The Cloud
10. And this creates a perceived technology gap
Problems with today’s servers:
Cores and memory scale in lock step
Access latencies to large memories from modern cores
Addressing those memories
Current and emerging business needs:
Low latency decision support and analytics
“What if?” simulation
Market/sentiment analysis
Pricing and inventory analysis
In other words, we need “real” real time
This creates a technology conflict: cores and memory need to scale more independently rather than in lock step. How can we resolve this conflict?
11. Flattening the layers
We need to innovate without disrupting customers. (Like your multicore laptops.)
Seamless platform transitions are difficult, but doable (Digital, Apple, Intel, Microsoft)
Large memory technology is poised to take off, but it needs appropriate hardware
Early work in High Performance Computing (HPC)* has yielded results that can now be applied to enterprise software. But today it’s different:
We have a lot of cores per chip now and better interconnects
We can “flatten” the layers and simplify
We can build systems that improve our software now without making any major modifications
The Nail Soup argument: if we are willing to modify our software, we can win bigger, but we do so on our own schedule. (See “Stone Soup” on Wikipedia.)
*Scientific/Technical Computing
12. The Chip
How do we leverage advances in:
Processor Architecture
Multiprocessing (Cores and threads)
NUMA (e.g. QPI, Hypertransport)
Connecting to the board (sockets)
Core to memory ratio
We have 64 bits of address, but can we reach those addresses?
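As one way to make that last question concrete (not from the original slides), here is a minimal C sketch, assuming a Linux/x86 machine, that prints how many physical and virtual address bits the processor actually implements; in practice this is well short of the full 64 bits:

```c
/* Minimal sketch (assumes Linux/x86): /proc/cpuinfo reports a line such as
 * "address sizes : 46 bits physical, 48 bits virtual".                    */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];

    if (!f) { perror("cpuinfo"); return 1; }
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "address sizes", 13) == 0) {
            fputs(line, stdout);      /* physical vs. virtual address bits */
            break;
        }
    }
    fclose(f);
    return 0;
}
```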
13. Typical Boards
Highlights:
Sockets (2, 4, 8)
Speed versus Access
How do we make these choices for the enterprise?
[Figure: an 8-socket board layout]
14. Storage
Considerations for storage
Cost per Terabyte (for the data explosion)
Speed of access (for real time access)
Storage network (for global business)
Disk, Flash, DRAM
How can applications be optimized for storage?
[Figure: example, a 2 TB OCZ card]
15. Memory Considerations
Highlights
Today, cores and memory scale in lock step
Large memory computing (to handle modern business needs)
Collapse the multi-tiered architecture levels
NUMA support is built in to Linux (see the sketch below)
Physical limits (board design, connections, etc.)
[Figure: a memory riser]
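As a hedged illustration of that built-in NUMA support (assuming a Linux system with libnuma installed; compile with -lnuma; this is not from the original deck), a minimal sketch that places a buffer on a chosen NUMA node rather than wherever the default policy happens to put it:

```c
/* Minimal libnuma sketch (assumes Linux + libnuma; link with -lnuma). */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }
    int last = numa_max_node();
    printf("NUMA nodes: 0..%d\n", last);

    size_t sz = 1UL << 30;                    /* 1 GiB buffer            */
    void *buf = numa_alloc_onnode(sz, last);  /* place it on node 'last' */
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    /* ... use buf; accesses from cores on node 'last' stay local ...    */

    numa_free(buf, sz);
    return 0;
}
```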
17. Reincarnating the “mainframe”
• In-memory computing provides the real time performance we’re looking for
• In-memory computing requires more memory per core in a symmetric way
• I remember what Dave Cutler said about the VAX:
• “There is nothing that helps virtual memory like real memory”
• Ike’s corollary: “There is nothing that helps an in-memory database like real memory”
• The speed of business should drive the need for more flexible server architectures
• Current server architectures are limiting, and we must change the architecture and do so inexpensively
• And how do we do this?
18. Taking In-Memory Computing Seriously
Basic Assumptions and observations
Hard disks are for archives only. All active data must be in DRAM memory.
Good caching and data locality are essential.
– Otherwise CPUs are stalled due to too many cache misses
There can be many levels of caches
Problems and Opportunities for In-Memory Computing
Addressable DRAM per box is limited due to processor physical address pins.
– But we need to scale memory independently from physical boxes
Scaling Architecture
– Arbitrary scaling of the amount of data stored in DRAM
– Arbitrary and independent scaling of the number of active users and associated computing load
DRAM access times have been the limiting factor for remote communications
– Adjust the architecture to DRAM latencies (~100 ns? see the sketch after this list)
– InterProcess Communication is slow and hard to program (latencies are in the area of ~0.5–1 ms)
We can do better.
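To put a number on the ~100 ns DRAM figure above, a rough pointer-chase sketch (an assumption-laden microbenchmark, not part of the slides: it assumes Linux, gcc, and an array much larger than the caches; each load depends on the previous one, so misses cannot be overlapped):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (1UL << 26)          /* 64M pointers * 8 B = 512 MiB        */
#define STEPS (1UL << 24)

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: shuffle the indices into one long cycle.      */
    for (size_t i = 0; i < N; i++) next[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;               /* j < i             */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t s = 0; s < STEPS; s++) p = next[p];  /* dependent loads   */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per dependent load (p=%zu)\n", ns / STEPS, p);
    free(next);
    return 0;
}
```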
20. What is “Big Iron” Technology Today?
Big Iron-1: Extreme Performance, Low Cost – a traditional cluster
10 x 4U nodes (Intel Xeon X7560, 2.26 GHz)
320 cores (640 hyper-threads), 32 cores per node
5 TB memory (10 TB max), 20 TB solid state disk
1 NAS (72 TB, expandable to 180 TB)
System architecture: SAP
Big Iron-2: Extreme Performance, Scalability, and a much simpler system model – an enterprise supercomputer
Developers can treat it as a single large physical server
Large shared coherent memory across nodes
5 x 4U nodes (Intel Xeon X7560, 2.26 GHz)
160 cores (320 hyper-threads), 32 cores per node
5 TB memory total, 30 TB solid state disk
160 GB InfiniBand interconnect per node
Scalable coherent shared memory (via ScaleMP)
System architecture: SAP
21. The Next Generation Servers a.k.a. “Big Iron” are a reality
Big Iron-1: fully operational September 30, 2010, 2:30 pm – all 10 Big Iron-1 nodes up and running
Big Iron-2: fully operational February 4, 2011 – all 5 Big Iron-2 nodes up and running
22. At a Modern Data Center
[Photo of the data center: the backup generators (“these bad boys are huge”), special cooling, and Big Iron way down there; the pod has won awards]
23. Progress, Sapphire and others:
Sapphire Type 1 Machine (Senior):
• 32 nodes (5 racks, 2.5 tons)
• 1024 cores
• 16 TB of memory
• 64 TB of solid state storage
• 108 TB of network attached storage
Sapphire Type 1 Machine (Junior):
• 1 node (1 box, on stage)
• 80 cores
• 1 TB of memory
• 2 TB of SSD
• No attached storage
24. Coherent Shared Memory – A Much Simpler Alternative to Remote Communication
Uses high-speed, low latency networks (optical fiber/copper at 40 Gb/s or above)
Typical latencies of this are in the area of 1–5 μs
Aggregated throughput matches CPU consumption
L4 cache needed to balance the longer latency of non-local access (cache-coherent non-uniform memory access over different physical machines)
Separate the data transport and cache layers into a separate tier below the operating system
• (almost) never seen by the application or the operating system!
Applications and database code can just reference data
The data is just “there”, i.e. it’s a load/store architecture, not network datagrams (sketched in code after this list)
Application level caches are possibly not necessary – the system does this for you (?)
Streaming of query results is simplified; the L4 cache schedules the read operations for you.
Communication is much lighter weight. Data is accessed directly and thread calls are simple and fast (higher quality through less code)
Application designers do not confront communications protocol design issues
Parallelization of analytics and combining simulation with data are far simpler, enabling powerful new business capabilities of mixed analytics and decision support at scale
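A minimal sketch of the load/store idea, using POSIX mmap and fork within a single OS image as a stand-in for the cache-coherent fabric described above (illustrative only; the real system extends this across physical nodes): the reader simply dereferences memory and never parses a network datagram.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* One shared region visible to both processes.                       */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                         /* the "remote" writer     */
        strcpy(shared, "row 42: price=19.99"); /* plain stores            */
        _exit(0);
    }
    wait(NULL);                                /* writer has finished     */
    printf("reader sees: %s\n", shared);       /* plain loads, no recv()  */

    munmap(shared, 4096);
    return 0;
}
```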
25. Software implications of Coherent Shared Memory
On Application Servers
Access to the database can be “by reference” (see the sketch after this list)
No caches on the application server side. The application can refer to database query results, including metadata, master data, etc., within the database process.
Caches are handled by “hardware” and are guaranteed to be coherent.
Lean and fast application server for in-memory computing
On Database Code
A physically distributed database can approach DRAM-like latencies
Database programmers can focus on database problems
Data replication and sharding are handled by touching the data; the L4 cache does the distribution automatically
In fact, do we need to separate application servers from database servers at all?
No lock-in to fixed machine configurations or clock rates
No need to make app-level design tradeoffs between communications and memory access
One O/S instance, one security model, easier to manage
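A hypothetical sketch of that “by reference” access (the ResultRow type and db_run_query function are invented for illustration and are not an actual API): the application reads the result row where the database engine left it, with no serialization, no copy, and no application-side cache.

```c
#include <stdio.h>

typedef struct {             /* illustrative result-row layout            */
    long   key;
    double price;
    char   region[8];
} ResultRow;

/* In the coherent-memory design this row would live in memory owned by
 * the database tier; here it is simulated with a static instance.        */
static ResultRow example = { 42, 19.99, "EMEA" };

const ResultRow *db_run_query(const char *sql)
{
    (void)sql;               /* placeholder: no parsing, no wire protocol */
    return &example;         /* hand back a reference, not a buffer       */
}

int main(void)
{
    const ResultRow *row = db_run_query("SELECT ... WHERE key = 42");
    printf("key=%ld price=%.2f region=%s\n",
           row->key, row->price, row->region);
    return 0;
}
```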
26. But what about…
Distributed Transactions?
• We don’t need no stinkin’ distributed transactions!
What about traditional relational databases?
• In the future, databases become data structures!
(Do I really believe this? Well, not really. Just wanted to make the point.)
Why not build a mainframe?
• From a SW perspective, it is a mainframe (with superior price/performance for in-memory computing)
Is it Virtualization?
• In traditional virtualization, you take multiple virtual machines and multiplex them onto the same physical hardware. We’re taking multiple physical hardware instances and running them as a single virtual instance.
27. A simplified programming model and landscape
Current view: Virtual Machine 1, Virtual Machine 2, Virtual Machine 3 … Virtual Machine N running on one Physical Machine
Emerging view: Physical Machine 1, Physical Machine 2, Physical Machine 3 … Physical Machine M running as one Virtual Machine (single OS instance)
28. The business benefits of coherent shared memory
Improved in-memory computing performance at dramatically lower cost
The ability to build high performance “mainframe-like” computing systems with commodity cluster server components
Ability to scale memory capacity in a more in-memory computing-friendly fashion
Simplified software system landscape using a system architecture that can be made invisible to application software
Minimize changes to applications
Enables applications to scale seamlessly without changes to the application code or additional programming effort
Traditional cluster based parallel processing requires specialized programming skills held only by top developers; this will constrain an organization’s ability to scale development for in-memory computing
With coherent shared memory, the bulk of developers can develop as they do today and let the underlying hardware and lower level software handle some of the resource allocation
But existing, standard NUMA capabilities in Linux remain available.
29. Big Iron Research
Next generation server architectures for in-memory computing
Processor Architectures
Interconnect technology
Systems software for coherent memory
OS considerations
Storage
Fast data load utilities
Reliability and Failover
Performance engineering for enterprise computing loads
Price/performance analysis
Private and public cloud approaches
And more
30. Some experiments that we have run
Industry standard system benchmarks
Evaluation of shared memory program scalability
Scalability of “Big Data” analytics – just beginning
31. Industry Standard Benchmarks
STREAM: large memory vector processing (triad kernel sketched below)
Measures scalability of memory access; nearly linear in system size
Big Iron would be the 12th highest recorded test
specJbb: highly concurrent Java transactions; more computational than STREAM
Big Iron would be the 13th highest recorded test
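For context, the kernel STREAM times is essentially the triad below; this is a sketch with OpenMP (compile with -fopenmp), not the official benchmark, and the array size and bandwidth accounting are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1UL << 27)                    /* 128M doubles per array = 1 GiB */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    #pragma omp parallel for
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t = omp_get_wtime();
    #pragma omp parallel for             /* the triad: a = b + s*c          */
    for (size_t i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];
    t = omp_get_wtime() - t;

    /* Three arrays touched, 8 bytes per element each.                     */
    printf("triad: %.1f GB/s (a[1]=%g)\n", 3.0 * 8.0 * N / t / 1e9, a[1]);
    free(a); free(b); free(c);
    return 0;
}
```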
32. To sum up
Cloud computing has emerged
The businesses of tomorrow will function in real time. Really!
There is a growing computing technology gap
There is an unprecedented wave of advances in HW
The “mainframe” has been reincarnated with its attendant benefits, but can be made from commodity parts
– Enterprise supercomputers leveraging coherent shared memory principles and the wave of SW+HW advances will enable this new reality