1. One Stone, Three Birds:
Performance, Power and Space
Ihab Bishara
Director, Cloud Computing Products
May 21st 2010
2. What is cloud?
[Figure: web-surfing clients connecting to thousands and millions of servers for your computing needs]
3. Cloud needs bigger, more cost-efficient,
and more power-efficient datacenters
[Figure: datacenter trends over time]
Timeline: <1998 – 2002, 2003 – 2007, 2008, 2015
Power budget: 200KW/R-Cu/Ft, 150KW/R-Cu/Ft, 50KW/R-Cu/Ft, 10KW/R-Cu/Ft
Datacenter operation: cost center, then profit center
Compute requirement: enterprise scale serving thousands, then cloud scale serving 100s of millions
• Compute requirements increasing to serve millions of users
• Focus on maximizing the bottom line by reducing datacenter cost
• Power allowances decreasing due to size and delivery complexities
4. Power and cooling are top cost issues
• Power and cooling cost is growing faster than new server investment
• Only 3% of the power is used for computing
• IT organizations will spend almost $1 on P&C for every $1 they spend on new servers
• Developing a power efficient datacenter is more important than ever
5. Current solutions are not solving cloud
datacenter issues
“To build servers for companies like Facebook, and Amazon, and other
people who are operating fairly homogeneous applications, the servers
have to be cheap, and they have to be super power-efficient. The latest
generations of server processors from Intel and AMD don't deliver the
performance”
Jonathan Heiliger,
Facebook's VP of technical operations
6. “Problems cannot be solved by the same level of
thinking that created them.” Albert Einstein
• Current few-core technologies fail to deliver for cloud
– Power too high
– Performance not increasing fast enough
– Integration too low
– Cores are inefficient and continue to bloat
• Manycore is the way to a new horizon
– Higher performance at much lower power
– SoC integration to reduce cost and real estate
– Standard programming models and improving
7. Manycore provides performance, low
power and low cost
Current Technology (Madison Itanium2):
• Less than 4% of area to ALUs
• High frequency, high power
Tilera Manycore:
• More cores → more BOPS
• Lower frequency → low power (Chandrakasan effect)
• Simple, integrated → lower cost
[Figure: tile array, each tile labeled C, S, P; tiles serve as workers, helpers and management]
8. Tilera: the only technology delivering on
the promise of multicore
[Figure: scalability chart, x-axis: # of Cores]
Tilera (16, 36, 64 and 100 cores): Outstanding Scalability
- Performance & Scalability
- Power, Price, Footprint
- Same architecture up and down
The Other Solutions (dual core, quad core, up to 8 cores, up to 32 starved cores):
• Discontinuity in architectures
• Limited scalability
• Power inefficiency
9. Proven best performance/Watt
2X Quad XEON x86 Server: 300W under load
1X TILEPro Server: 40W under load, 30W target (optimized server)
Measured by Tier 1 Server OEM running MemcacheD on its own Tilera and x86-based servers
One TILEPro64 performance = Dual Quad XEON
Much Lower Power
7X Compute/Watt advantage
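The compute-per-watt claim follows from the two power figures once equal throughput is assumed. The sketch below is only an illustration of that arithmetic: the function name and the `relative_performance` parameter are hypothetical, and the equal-performance assumption is the slide's own claim, not something verified here.

```python
# Power figures as printed on the slide.
X86_WATTS = 300.0             # 2x quad-core Xeon server under load
TILEPRO_WATTS = 40.0          # 1x TILEPro server under load
TILEPRO_TARGET_WATTS = 30.0   # stated target for an optimized server


def perf_per_watt_advantage(baseline_watts, candidate_watts,
                            relative_performance=1.0):
    """Return candidate perf/W divided by baseline perf/W.

    relative_performance is candidate throughput / baseline throughput;
    1.0 encodes the slide's claim that one TILEPro64 matches the
    dual quad-core Xeon server (an assumption taken from the slide).
    """
    return relative_performance * baseline_watts / candidate_watts


measured = perf_per_watt_advantage(X86_WATTS, TILEPRO_WATTS)
target = perf_per_watt_advantage(X86_WATTS, TILEPRO_TARGET_WATTS)
print(f"measured: {measured:.1f}x, target: {target:.1f}x")
```

Under these assumptions the measured ratio is 7.5, in line with the quoted 7X advantage, and the 30W target would raise it to 10.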
10. An order of magnitude better processing in
a standard 5K watt rack
High efficiency x86 server:
• 12 2U servers per rack
• 48 processors
• 196 cores
• 1,944 BOPS
• 96 Gbps I/O
• 5K watts
Tilera-based high density 2U production server:
• 20 2U servers per rack
• 160 processors
• 10,240 cores
• 20,000+ BOPS
• 3,500 Gbps I/O
• 5K watts
New Tilera server ideal for cloud throughput applications
Best performance, I/O, power, and density
Complete utilization of power and space of a standard rack
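The per-rack comparison can be checked by dividing the Tilera figures by the x86 figures. This is a minimal sketch using only the numbers printed on the slide; the dictionary keys and the `rack_ratios` helper are hypothetical names, and "20,000+" is treated as its lower bound of 20,000.

```python
# Per-rack figures as printed on the slide (both racks rated at 5K watts).
x86_rack = {
    "servers": 12, "processors": 48, "cores": 196,
    "bops": 1_944, "io_gbps": 96,
}
tilera_rack = {
    "servers": 20, "processors": 160, "cores": 10_240,
    "bops": 20_000,  # slide says "20,000+", so this is a lower bound
    "io_gbps": 3_500,
}


def rack_ratios(candidate, baseline):
    """Return candidate/baseline for every metric in the baseline rack."""
    return {metric: candidate[metric] / baseline[metric] for metric in baseline}


ratios = rack_ratios(tilera_rack, x86_rack)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")

# Sanity check: cores per processor on the Tilera rack.
cores_per_processor = tilera_rack["cores"] // tilera_rack["processors"]
print(f"Tilera cores per processor: {cores_per_processor}")
```

The BOPS ratio comes out slightly above 10, which is where the "order of magnitude" in the slide title comes from; the I/O and core-count ratios are larger still.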
11. Slashing Total Cost of Ownership for a
given performance
Same performance:
Tilera: 60 dual socket servers, 4K watts
x86: 100 dual socket servers, 20K watts
• Up to 40% CAPEX savings
• Up to 80% OPEX savings
• Slashing the TCO by up to 50%
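The CAPEX and OPEX percentages are consistent with the server counts and power figures under a deliberately simple model. The sketch below assumes CAPEX scales with the number of servers and OPEX with power draw; the slide does not state its cost model, so both assumptions and the `savings_fraction` helper are illustrative only.

```python
def savings_fraction(baseline, candidate):
    """Fraction saved when moving from baseline to candidate."""
    return (baseline - candidate) / baseline


# Figures from the slide, for the same delivered performance.
X86_SERVERS, TILERA_SERVERS = 100, 60
X86_WATTS, TILERA_WATTS = 20_000, 4_000

# Assumption: CAPEX is proportional to server count.
capex_savings = savings_fraction(X86_SERVERS, TILERA_SERVERS)
# Assumption: OPEX is proportional to power draw.
opex_savings = savings_fraction(X86_WATTS, TILERA_WATTS)

print(f"CAPEX savings: {capex_savings:.0%}")
print(f"OPEX savings: {opex_savings:.0%}")
```

With these inputs the model reproduces the 40% and 80% figures. The combined TCO figure depends on how CAPEX and OPEX are weighted, which the slide does not specify.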
12. Standard SW stack on Tier 1 servers
Markets: Front end Web, Database, Data-Mining
Infrastructure Apps: Lighttpd, Memcached
Language Support: Perl
Compiler, OS, Hypervisor: gcc & g++, Commercial Linux Distribution
Servers: Tilera-based Low Power Prototype Server (now), ~40 Watts; Tier 1 ODM/OEM Production Server starting Q3 10
13. Focused on internet datacenters running
LAMP stack
[Figure: internet datacenter; components shown: web-surfing clients, load balancers, network switches, web servers (Apache/PHP), apps servers, Memcached, data-mining, database servers]
15. Tilera single core performance comparable
to Atom & ARM Cortex-A9 cores
Single-Core Single thread CoreMark™ Comparison
[Figure: bar chart, y-axis: CoreMark Score (0 to 3,500)]
Processors compared: Tilera TILEPro64 (866 MHz), Tilera TILE-Gx36 (1.25 GHz), ARM Cortex-A9 (1 GHz), Intel Atom N270 (1600 MHz)
- Data for TILEPro, ARM Cortex-A9, Atom N270 is available on the CoreMark website http://coremark.org/home.php
- TILE-Gx and single thread Atom results were measured in Tilera labs
- Single core, single thread result for ARM is calculated based on chip scores
16. Tilera offers standards-based tools and
software stack
Multicore Development Environment
Standards-based tools
Standard programming
• SMP Linux 2.6.26
• Java
• ANSI C/C++
Integrated tools
• SGI or GCC compiler
• Standard gdb, gprof
• Eclipse IDE
Innovative tools
• Multicore debug
• Multicore profile
Standard application stack
Application layer
• Open source apps
• Standard C/C++ libs
Operating System layer
• 64-way SMP Linux
• Zero Overhead Linux
• Bare metal environment
Hypervisor layer
• Virtualizes hardware
• I/O device drivers
[Figure: layered stack, top to bottom: Applications; Applications libraries; Operating System kernel drivers; Hypervisor (virtualization and high speed I/O drivers); Tile Processor (Tile, Tile, Tile, Tile, …)]
17. Example implementation on a
Manycore-based server
• Complete functionality
– Networking
– Load balancer
– Application
• Manycore values
– Granularity: Complete server on one chip
– Integration: I/O and memory controllers
– Low power: 20 watts
– High performance
– Highly efficient communication on-chip
[Figure: chip block diagram; tile array running cloud applications, TCP termination, management, load balance and NIC functions under one SMP OS; peripherals: DDR2 Controllers 0 to 3, SerDes, PCIe 0 and 1, XAUI 0 and 1, GbE 0 and 1, Flexible I/O, UART, JTAG, SPI, I2C]
18. A complete LAMP server at 40 watts
• Complete LAMP stack running on server
– Linux, Apache, PHP, MySQL
• Serving standard cloud applications over network
– SugarCRM for enterprise
– Gallery2 for photo sharing
• Using standard server management protocols
– Standard SNMP
– MRTG
• At 40 watts
– 40 watts power draw
– Target 30 watts for optimized platform
[Figure: client browser exchanging data and management traffic with the LAMP server, ~40 watts]
19. This is just the beginning…
Tilera’s portfolio demonstrates the scale of many-core
[Figure: roadmap, performance vs. year (2007, 2008, 2009, 2010)]
Products shown: TILE64, TILEPro64, TILEPro36, TILE-Gx (Gx16, Gx36, Gx64, Gx100)
Tiers shown: High Tier, Mid Tier, Low Tier
Next generation:
• >4X the performance of Pro64
• Twice the power density of Pro64
TILE-Gx Mid-Low Tier:
• Small footprint
• Low power
• Rich I/O
22. Summary
• Cloud faces critical performance, power and cost issues
• Tilera TILEPro processors deliver a proven solution to all three problems, slashing TCO
• Tilera roadmap continues to deliver the promise of manycore for performance, power and cost