Rise of the (Wimpy) Machines
Datacenter Efficiency with ARM-based Servers
John Mao
Director of Strategy, Calxeda
What is the name of the computer system in
this movie that tried to end the human race?

Skynet
Origins of Wimpy Core Computing
•  FAWN: A Fast Array of Wimpy Nodes
–  Project from CMU led by Prof. David Andersen, started in 2008 (active through 2012)
–  Measured and compared performance-per-Joule advantages over traditional servers
–  Original focus on large distributed key-value store applications and use-cases (e.g. Amazon Dynamo, LinkedIn’s Voldemort, Facebook’s memcached)

[Publication] http://www.sigops.org/sosp/sosp09/papers/andersen-sosp09.pdf
[Website] http://www.cs.cmu.edu/~fawnproj/
  
FAWN: A Fast Array of Wimpy Nodes
•  Why FAWN? Motivated by key trends:
–  Increasing CPU-I/O gap
–  CPU power consumption grows super-linearly with speed
–  Dynamic power scaling on traditional systems is surprisingly inefficient
FAWN: A Fast Array of Wimpy Nodes

[Photo: FAWN hardware, generations 1G through 5G]
[Photo Credit] http://www.cs.cmu.edu/~fawnproj/
FAWN: A Fast Array of Wimpy Nodes
•  Multiple generations of hardware used:
–  1G (2008)
•  Single-core 500MHz AMD Geode LX processor
•  256MB DDR SDRAM (400MHz)
•  100Mbps Ethernet

–  5G (2012)
•  Intel Atom D510 – 1.66GHz dual-core w/HT
•  2-4GB DDR2 (667MHz)
•  100Mbps Ethernet
Key Findings from FAWN Project

“The FAWN cluster achieves 364 queries per Joule — two orders of magnitude better than traditional disk-based clusters.”

[Source] http://www.sigops.org/sosp/sosp09/papers/andersen-sosp09.pdf
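To unpack the metric: queries per Joule is simply sustained query throughput divided by average power draw, since one watt is one joule per second. A minimal sketch of that arithmetic, using illustrative placeholder numbers rather than the paper's raw measurements:

```python
# Queries per Joule = sustained queries/sec divided by average watts (1 W = 1 J/s).
# The throughput and power numbers below are illustrative placeholders.
def queries_per_joule(queries_per_sec: float, avg_watts: float) -> float:
    return queries_per_sec / avg_watts

print(queries_per_joule(queries_per_sec=36_400, avg_watts=100.0))  # -> 364.0
```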
  

	
  
So what about ARM®?

ARM is a good “wimpy” processor & CPU
architecture for the datacenter because:
1.  Focus on low power: origins in embedded
systems and mobile devices
2.  Datacenter focused roadmap: 32-bit CPUs
today, 64-bit CPUs in 1-2 years; increasing
performance (with same energy efficiency)
3.  Business model: ability to integrate for specific
markets and applications
4.  Emerging software ecosystem: while not x86,
ARM has a growing ecosystem
Focus on Low Power
•  History in targeting energy-sensitive markets:
–  Netbooks, Smartbooks, Tablets, Thin Clients
–  Smartphones, Feature phones
–  Set-top Box, Digital TV, Blu-Ray players, Gaming
consoles
–  Automotive Infotainment, Navigation
–  Wireless base-stations, VoIP phones and
equipment

•  Design Goals
–  Performance, Power, Easy Synthesis
Focus on Low Power
In 2005, about 98% of all mobile phones sold used at least one ARM processor.

As of 2009, due to low power consumption, the ARM architecture is the most widely used 32-bit RISC architecture in mobile devices and embedded systems.

[Source] http://en.wikipedia.org/wiki/ARM_architecture
Focus on Low Power
Translating ARM energy-efficiency into the
modern datacenter with Cortex-A9:
Workload (on 24 nodes & SSDs)      Total System* Power    ~Power per ECX-1000 Node (with disk, @wall)
Linux at Rest                      130 W                  5.4 W
phpbench                           155 W                  6.5 W
Coremark (4 threads per SOC)       169 W                  7.0 W
Website @ 70% Utilization          172 W                  7.2 W
LINPACK                            191 W                  7.9 W
STREAM                             205 W                  8.5 W

*All measurements done on a 24-node system @1.1GHz, with 24 SSDs and 96 GB DRAM in the Calxeda Lab.

For specific workloads, the ECX-1000 can run a complete 24-node cluster at a power level similar to a single 2-socket x86 server.
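As a sanity check, the per-node column above is roughly the wall measurement divided across the 24 nodes (disk included). A small sketch of that arithmetic using the table's own wall-power numbers:

```python
# Per-node power is roughly wall power divided across the 24 nodes.
# Wall-power figures are taken from the table above.
WALL_WATTS = {
    "Linux at Rest": 130,
    "phpbench": 155,
    "Coremark (4 threads per SOC)": 169,
    "Website @ 70% Utilization": 172,
    "LINPACK": 191,
    "STREAM": 205,
}
NODES = 24
for workload, watts in WALL_WATTS.items():
    print(f"{workload:30s} {watts:4d} W total  ~{watts / NODES:.1f} W/node")
```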
But, what about performance?
Online Review: Calxeda’s ARM Server Tested

AnandTech review comparing the Boston Viridis 24-node Calxeda ECX-1000
(Cortex-A9) cluster against an Intel Xeon E5-2650L system.
(March 2013)

http://www.anandtech.com/show/6757/calxedas-arm-server-tested
Calxeda Provides Better Web Throughput

Boston Viridis outperforms
Xeon E5-2650L by 30% with
more than 15 users.
	
  
Test is phpBB running on Apache2 with variable numbers of users (concurrency) generating traffic.
Calxeda Provides Lower Response Times

Boston Viridis outperforms
Xeon E5-2650L by 60% with
more than 15 users.
	
  
Test is phpBB running on Apache2 with variable numbers of users (concurrency) generating traffic.
Calxeda Provides Highest Performance/Watt

Boston Viridis provides 80%
more throughput per Watt
than Xeon E5.
•  10-36% less raw power
	
  
Test is phpBB running on Apache2 with variable numbers of users (concurrency) generating traffic.
Online Review: Calxeda’s ARM Server Tested
Reviewer’s Key Takeaways:
–  For scale-out workloads, Calxeda’s ARM-based scale-out
hardware architecture is very promising.
–  Microbenchmarks show Calxeda ECX-1000 ~10% behind Intel Atom N2800 @1.86 GHz
–  “Real World” Application Benchmarking shows 70%+ higher
performance-per-watt than Intel Xeon E5 at mid to high user load
–  “Calxeda really did it: each server needs about 8.3W (200W/24),
measured at the wall…about 6W (at 1.4GHz) per server node…”
–  “So on the one hand, no, the current Calxeda servers are no
Intel Xeon killers (yet). However, we feel that Calxeda's
ECX-1000 server node is revolutionary technology.”
ARM® Cortex-A15

•  Based on the ARMv7-A architecture
–  Ensures software application compatibility with other Cortex-A processors

•  LPAE support up to 1TB physical memory
•  Full hardware virtualization support
•  From ARM: delivers 2X performance over
Cortex-A9 processor with similar power
•  big.LITTLE configuration support for
mobile devices
Datacenter Focused Roadmap
•  Highbank: ECX-1000 (4 Core, ARM® Cortex A9), 2013 – Power Efficient Solution for Storage and Web Hosting
•  Midway: ECX-2000 (4 Core, ARM® Cortex A15), 2013–2014 – Performance/$ for Cloud and Analytics
•  Sarita (ARM® Cortex A57), 2014 – Compatible 64-bit On-Ramp for Early Access and Ecosystem Enablement; completes the “Triple Play” of 3 generations of pin-compatible SOCs
•  Lago (ARM® Cortex A57), 2015 – Flagship 64-bit Product for a Broader Application Set, with 3rd-Generation Calxeda Fabric and I/O

[Source] Calxeda public SOC roadmap (June 2013)
“Midway”: Calxeda ECX-2000
Compared to Calxeda’s Cortex-A9 SOC
(ECX-1000), the “Midway” SOC delivers:
–  1.5X more single-thread performance
–  2X more floating point performance
–  3X STREAM (memory b/w) performance
–  4X+ more physical memory support (16GB+)
–  Same performance-per-Watt
Plan to update the AnandTech benchmark report
But, ARM doesn’t make/sell SOCs?
ARM® Business Model

•  ARM does not make or sell SOCs.
•  Instead, ARM licenses IP and technology
to partners (like Calxeda) who design and
build System-on-Chips (SOCs) for various
industries and markets.
•  Calxeda is focused exclusively on bringing
ARM-based technology to the datacenter.
–  Calxeda provides its own IP (e.g. the fabric) as additional value for servers.
EnergyCore® architecture at a glance
A complete building block for hyper-efficient computing

•  Processor Complex: multi-core ARM® processors integrated with high-bandwidth memory controllers
•  EnergyCore Management Engine: advanced system, power, and fabric management for energy-proportional computing
•  EnergyCore Fabric Switch: integrated high-performance fabric provides inter-node connectivity with industry-standard networking
•  I/O Controllers: standard drivers, standard interfaces. No surprises.
®
EnergyCore

Fabric (F1/F2)
Integrated 80Gb (8x10Gb cross-bar)
Fabric Switch:
•  Up to 5 external links:
–  Dynamic bandwidth: 1Gb to 10Gb per link
–  <200 ns latency, node to node

•  3 internal links (to the SOC):
–  2x 10Gb Ethernet ports to the OS
–  1x 10Gb Ethernet port to Mgmt
–  Transparent to OS and software

•  Topology agnostic

à Eliminates Top-of-Rack-Switch ports & cabling
à Enables extreme density; lowers cost and power
So, what can we use this for?
Target Workloads
•  Data-Intensive Applications:
–  Storage (scale-out, distributed storage)
•  e.g. Ceph, Gluster, etc.

–  Analytics (NoSQL, MapReduce, distributed
databases)
•  e.g. Hadoop, Cassandra, etc.

•  Distributed, Stateless Applications
–  Web Front End
–  Caching Servers
–  Content Distribution Networks (CDN)
Use-Case: Storage via Ceph
•  The official Ceph “Dumpling” release (and later) now supports Calxeda-based platforms
•  Initial benchmarks complete (with x86 comparison)
–  Even without optimizations, performance is promising

•  Identified optimization areas (under investigation):
–  Potentially use NEON instructions for CRC32 (see the sketch after this list)
–  Implement zero-copy on OSDs
–  Transition reads/writes to bufferlists
–  Optimize the client side too – librados/librbd
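The CRC32 item comes down to raw checksum throughput on the wimpy cores: a NEON (or other hardware-assisted) path only pays off if software checksumming eats a visible share of OSD time. Ceph itself is C++ and uses CRC32C; the Python sketch below is merely an illustration of how one might gauge software CRC cost on a node, not Ceph code:

```python
# Rough software-CRC throughput probe (illustration only, not Ceph code).
# zlib.crc32 is CRC-32 (IEEE) rather than the CRC-32C Ceph uses, but it gives
# a feel for what an unaccelerated checksum costs per MB on a given core.
import os
import time
import zlib

buf = os.urandom(64 * 1024 * 1024)  # 64 MiB of data to checksum
start = time.perf_counter()
checksum = zlib.crc32(buf)
elapsed = time.perf_counter() - start
print(f"crc32 over {len(buf) >> 20} MiB in {elapsed * 1000:.1f} ms "
      f"({len(buf) / elapsed / 1e6:.0f} MB/s)")
```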
Use-Case: Storage via Ceph

With the same number of HDDs, the Calxeda-based system delivers 50% more performance than traditional x86 servers.
The AAEON CRS-200S-2R Advantage
An ARM-based, lower-cost, higher-performance server platform for scale-out storage

Calxeda’s ARM-based SOCs:
•  Energy Efficient
•  More cores per HDD
•  Lower system power
•  High Bandwidth Fabric
•  Multi-10Gb links for
data-intensive apps

Compared to traditional x86-based 2U rack-mount servers, the AAEON
CRS-200S-2R server platform delivers:

✓  35% Lower TCO*
✓  66% Less Rack Space
✓  50% Higher Performance
Summary
•  Even 64-bit ARM processors are not ideal for
every single workload.
•  However, scale-out, data-intensive workloads
can leverage ARM’s energy efficiency to provide
significantly better TCO.
•  For the server market (especially with scale-out
apps), replacing the CPU core is not enough.
–  Look for SOCs that optimize “between the nodes” in a
cluster (e.g. fabric interconnects will help dramatically)

•  Interested in joining the “ARM revolution”?
–  Contact us! – John Mao, john.mao@calxeda.com
Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
IBM Switzerland
 
ISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel PresentationISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel Presentation
Eric Van Hensbergen
 

Was ist angesagt? (20)

Solaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and TuningSolaris Linux Performance, Tools and Tuning
Solaris Linux Performance, Tools and Tuning
 
Optimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for EnergyOptimizing High Performance Computing Applications for Energy
Optimizing High Performance Computing Applications for Energy
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Co-Design Architecture for Exascale
Co-Design Architecture for ExascaleCo-Design Architecture for Exascale
Co-Design Architecture for Exascale
 
EC2 Foundations - Laura Thomson
EC2 Foundations - Laura ThomsonEC2 Foundations - Laura Thomson
EC2 Foundations - Laura Thomson
 
Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
Ceph Day Seoul - Ceph on Arm Scaleable and Efficient
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Huawei Powers Efficient and Scalable HPC
Huawei Powers Efficient and Scalable HPCHuawei Powers Efficient and Scalable HPC
Huawei Powers Efficient and Scalable HPC
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
IBM Power for Life Sciences
IBM Power for Life SciencesIBM Power for Life Sciences
IBM Power for Life Sciences
 
IBM HPC Transformation with AI
IBM HPC Transformation with AI IBM HPC Transformation with AI
IBM HPC Transformation with AI
 
OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM OpenPOWER/POWER9 Webinar from MIT and IBM
OpenPOWER/POWER9 Webinar from MIT and IBM
 
ISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel PresentationISC14 Embedded HPC BoF Panel Presentation
ISC14 Embedded HPC BoF Panel Presentation
 
Performing Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPCPerforming Simulation-Based, Real-time Decision Making with Cloud HPC
Performing Simulation-Based, Real-time Decision Making with Cloud HPC
 
Nimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing MiddleNimbix: Cloud for the Missing Middle
Nimbix: Cloud for the Missing Middle
 

Andere mochten auch

Haeinsa deview _최종
Haeinsa deview _최종Haeinsa deview _최종
Haeinsa deview _최종
NAVER D2
 
(Michal karnicki & alex chiang) canonical
(Michal karnicki & alex chiang) canonical(Michal karnicki & alex chiang) canonical
(Michal karnicki & alex chiang) canonical
NAVER D2
 
Deview 발표자료 v1.5.3
Deview 발표자료 v1.5.3Deview 발표자료 v1.5.3
Deview 발표자료 v1.5.3
NAVER D2
 
241 towards real-time collaboration system
241 towards real-time collaboration system241 towards real-time collaboration system
241 towards real-time collaboration system
NAVER D2
 
213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key
NAVER D2
 
112 deview
112 deview112 deview
112 deview
NAVER D2
 
Deview 2013 keynote final
Deview 2013 keynote finalDeview 2013 keynote final
Deview 2013 keynote final
NAVER D2
 
232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현
NAVER D2
 
파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표
NAVER D2
 

Andere mochten auch (12)

Haeinsa deview _최종
Haeinsa deview _최종Haeinsa deview _최종
Haeinsa deview _최종
 
(Michal karnicki & alex chiang) canonical
(Michal karnicki & alex chiang) canonical(Michal karnicki & alex chiang) canonical
(Michal karnicki & alex chiang) canonical
 
Deview 발표자료 v1.5.3
Deview 발표자료 v1.5.3Deview 발표자료 v1.5.3
Deview 발표자료 v1.5.3
 
241 towards real-time collaboration system
241 towards real-time collaboration system241 towards real-time collaboration system
241 towards real-time collaboration system
 
213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key213 event processingtalk-deviewkorea.key
213 event processingtalk-deviewkorea.key
 
Recommendation for dummy
Recommendation for dummyRecommendation for dummy
Recommendation for dummy
 
112 deview
112 deview112 deview
112 deview
 
Deview 2013 keynote final
Deview 2013 keynote finalDeview 2013 keynote final
Deview 2013 keynote final
 
232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현232 deview2013 oss를활용한분산아키텍처구현
232 deview2013 oss를활용한분산아키텍처구현
 
Deview 2013 - 나는 왜 개발자인데 자신이 없을까?
Deview 2013 - 나는 왜 개발자인데자신이 없을까?Deview 2013 - 나는 왜 개발자인데자신이 없을까?
Deview 2013 - 나는 왜 개발자인데 자신이 없을까?
 
파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표파이어베이스 네이버 밋업발표
파이어베이스 네이버 밋업발표
 
Culture
CultureCulture
Culture
 

Ähnlich wie Deview 2013 rise of the wimpy machines - john mao

Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
NETWAYS
 
3. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 20133. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 2013
Taldor Group
 

Ähnlich wie Deview 2013 rise of the wimpy machines - john mao (20)

RedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power SystemsRedisConf17 - Redis Enterprise on IBM Power Systems
RedisConf17 - Redis Enterprise on IBM Power Systems
 
9/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'169/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'16
 
Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK Ceph Day Taipei - Accelerate Ceph via SPDK
Ceph Day Taipei - Accelerate Ceph via SPDK
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
IBM Power leading Cognitive Systems
IBM Power leading Cognitive SystemsIBM Power leading Cognitive Systems
IBM Power leading Cognitive Systems
 
Demystify OpenPOWER
Demystify OpenPOWERDemystify OpenPOWER
Demystify OpenPOWER
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
 
Superior Cloud Economics with Power Systems
Superior Cloud Economics with Power Systems Superior Cloud Economics with Power Systems
Superior Cloud Economics with Power Systems
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
Grid rac preso 051007
Grid rac preso 051007Grid rac preso 051007
Grid rac preso 051007
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Application Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster InterconnectsApplication Report: Big Data - Big Cluster Interconnects
Application Report: Big Data - Big Cluster Interconnects
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems Advantage
 
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
 
3. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 20133. ami big data hadoop on ucs seminar may 2013
3. ami big data hadoop on ucs seminar may 2013
 

Mehr von NAVER D2

Mehr von NAVER D2 (20)

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Deview 2013 rise of the wimpy machines - john mao

  • 1. Rise of the (Wimpy) Machines Datacenter Efficiency with ARM-based Servers John Mao! Director of Strategy, Calxeda!
  • 2. What is the name of the computer system in this movie that tried to end the human-race? Skynet
  • 3.
  • 4. Origins of Wimpy Core Computing •  FAWN:  A  Fast  Array  of  Wimpy  Nodes   –  Project  from  CMU  led  by  Prof.  David  Anderson,   started  in  2008  (acDve  through  2012)   –  Measure  and  compare  performance  per  Joule  of     energy  advantages  over  tradiDonal  servers   –  Original  focus  on  large  distributed  key-­‐value  store     applicaDons  and  use-­‐cases  (i.e.  Amazon  Dynamo,     LinkedIn’s  Voldemort,  Facebook’s  memcached)     [PublicaDon]  hTp://www.sigops.org/sosp/sosp09/papers/andersen-­‐sosp09.pdf   [Website]  hTp://www.cs.cmu.edu/~fawnproj/  
  • 5. FAWN: A Fast Array of Wimpy Nodes •  Why  FAWN?  MoDvated  by  key  trends:   –  Increasing  CPU-­‐I/O  Gap   –  CPU  power  consumpDon  grows  super-­‐linearly     with  speed   –  Dynamic  power  scaling  on  tradiDonal  systems  is   surprisingly  inefficient  
  • 6. FAWN: A Fast Array of Wimpy Nodes 1G 3G 2G 5G 4G [Photo  Credit]   h-p://www.cs.cmu.edu/~fawnproj/  
  • 7. FAWN: A Fast Array of Wimpy Nodes •  Multiple generations of hardware used: –  1G (2008) •  Single-core 500MHz AMD Geode LX processor •  256MB DDR SDRAM (400MHz) •  100Mbps Ethernet –  5G (2012) •  Intel Atom D510 – 1.66GHz dual-core w/HT •  2-4GB DDR2 (667MHz) •  100Mbps Ethernet
  • 8. Key Findings from FAWN Project   “The  FAWN  cluster  achieves  364  queries  per     Joule  —  two  orders  of  magnitude  be-er  than     tradiDonal  disk-­‐based  clusters.”         [Source]  hTp://www.sigops.org/sosp/sosp09/papers/andersen-­‐sosp09.pdf    
  • 9. So what about ®? ARM ARM is a good “wimpy” processor & CPU architecture for the datacenter because: 1.  Focus on low power: origins in embedded systems and mobile devices 2.  Datacenter focused roadmap: 32-bit CPUs today, 64-bit CPUs in 1-2 years; increasing performance (with same energy efficiency) 3.  Business model: ability to integrate for specific markets and applications 4.  Emerging software ecosystem: while not x86, ARM has growing ecosystem
  • 10. Focus on Low Power •  History in targeting energy-sensitive markets: –  Netbooks, Smartbooks, Tablets, Thin Clients –  Smartphones, Feature phones –  Set-top Box, Digital TV, Blu-Ray players, Gaming consoles –  Automotive Infotainment, Navigation –  Wireless base-stations, VoIP phones and equipment •  Design Goals –  Performance, Power, Easy Synthesis
  • 11. Focus on Low Power In  2005,  about  98%  of  all  mobile  phones  sold   used  at  least  one  ARM  processor.     As  of  2009,  due  to  low  power  consumpDon  the  ARM   architecture  is  the  most  widely  used  32-­‐bit  RISC     architecture  in  mobile  devices  and  embedded     systems.     [Source]  hTp://en.wikipedia.org/wiki/ARM_architecture    
  • 12. Focus on Low Power Translating ARM energy-efficiency into the modern datacenter with Cortex-A9: Total System* Power (Today!) ~Power per ECX-1000 Node (with disk @Wall) Linux at Rest 130 W 5.4 W phpbench 155 W 6.5 W Coremark (4 threads per SOC) 169 W 7.0 W Website @ 70% Utilization 172 W 7.2 W LINPACK 191 W 7.9 W STREAM 205 W 8.5 W Workload (on 24 nodes & SSDs) *All measurements done on a 24-node system @1.1GHz, with 24 SSDs and 96 GB DRAM in the Calxeda Lab. For specific workloads, ECX-1000 can enable a complete 24-node cluster at similar power level as a 2 socket x86.
  • 13. But, what about performance?
  • 14. Online Review: Calxeda’s ARM Server Tested Anandtech chartered review comparing Boston Viridis’ 24-Calxeda ECX-1000 (Cortex-A9) cluster against Intel E5-2650Lsystem. (March 2012) http://www.anandtech.com/show/6757/calxedas-arm-server-tested
  • 15. Calxeda Provides Better Web Throughput Boston Viridis outperforms Xeon E5-2650L by 30% with more than 15 users.   Test  is  PHPbb  running  on  Apache2  with   variable  numbers  of  users  (concurrency)   generaDng  traffic.  
  • 16. Calxeda Provides Lower Response Times Boston Viridis outperforms Xeon E5-2650L by 60% with more than 15 users.   Test  is  PHPbb  running  on  Apache2  with   variable  numbers  of  users  (concurrency)   generaDng  traffic.  
  • 17. Calxeda Provides Highest Performance/Watt Boston Viridis provides 80% more throughput per Watt than Xeon E5. •  10-36% less raw power   Test  is  PHPbb  running  on  Apache2  with   variable  numbers  of  users  (concurrency)   generaDng  traffic.  
  • 18. Online Review: Calxeda’s ARM Server Tested Reviewer’s Key Takeaways: –  For scale-out workloads, Calxeda’s ARM-based scale-out hardware architecture is very promising. –  Microbenchmarks show Calxeda ECX-1000 ~10% behind Intel Atom N2800 @1.86 MHz –  “Real World” Application Benchmarking shows 70%+ higher performance-per-watt than Intel Xeon E5 at mid to high user load –  “Calxeda really did it: each server needs about 8.3W (200W/24), measured at the wall…about 6W (at 1.4GHz) per server node…” –  “So on the one hand, no, the current Calxeda servers are no Intel Xeon killers (yet). However, we feel that Calxeda's ECX-1000 server node is revolutionary technology.”
  • 19. ® ARM Cortex-A15 •  Based on ARMv7A architecture –  Ensures software application compatibility with orther Cortex-A processors •  LPAE support up to 1TB physical memory •  Full hardware virtualization support •  From ARM: delivers 2X performance over Cortex-A9 processor with similar power •  big.LITTLE configuration support for mobile devices
  • 20. Datacenter Focused Roadmap 3rd Generation Calxeda Fabric and I/O Lago (ARM® Cortex A57) “Triple Play”: 3 Generations of Pin-Compatible SOCs Sarita (ARM® Cortex A57) Flagship 64-bit Product for a Broader Application Set Compatible 64-bit On-Ramp for Early Access and Ecosystem Enablement Midway: ECX-2000 (4 Core, ARM® Cortex A15) Performance/$ for Cloud and Analytics Highbank: ECX-1000 (4 Core, ARM® Cortex A9) Power Efficient Solution for Storage and Web Hosting 2013 2014 2015 [Source] Calxeda public SOC roadmap (June 2013)
  • 21. “Midway”: Calxeda ECX-2000 Compared to Calxeda’s Cortex-A9 SOC (ECX-1000), the “Midway” SOC delivers: –  1.5X more single-thread performance –  2X more floating point performance –  3X STREAM (memory b/w) performance –  4X+ more physical memory support (16GB+) –  Same performance-per-Watt Plan to update Anandtech benchmark report
  • 22. But, ARM doesn’t make/sell SOCs?
  • 23. ® ARM Business Model •  ARM does not make or sell SOC. •  Instead, ARM licenses IP and technology to partners (like Calxeda) who design and build System-on-Chips (SOCs) for various industries and markets. •  Calxeda is focused exclusively on bringing ARM-based technology to the datacenter. –  Calxeda provides own IP (e.g. Fabric) as additional value for servers.
  • 24. EnergyCore® architecture at a glance A complete building block for hyper-efficient computing EnergyCore Management Engine Advanced system, power and fabric management for energy-proportional computing I/O Controllers Standard drivers, standard interfaces. No surprises. Processor Complex Multi-core ARM® processors integrated with high bandwidth memory controllers EnergyCore Fabric Switch Integrated high-performance fabric provides inter-node connectivity with industry standard networking
  • 25. ® EnergyCore Fabric (F1/F2) Integrated 80Gb (8x10Gb cross-bar) Fabric Switch: •  Up to 5 external links: –  Dynamic bandwidth: 1Gb to 10 Gb per link –  < 200 Nano-Seconds latency, node to node •  3 internal links (to the SOC): –  2x 10Gb Ethernet ports to the OS –  1x 10Gb Ethernet port to Mgmt –  Transparent to OS and software •  Topology agnostic à Eliminates Top-of-Rack-Switch ports & cabling à Enables extreme density; lowers cost and power
  • 26. So, what can we use this for?
  • 27. Target Workloads •  Data-Intensive Applications: –  Storage (scale-out, distributed storage) •  i.e. Ceph, Gluster, etc. –  Analytics (NoSQL, MapReduce, distributed databases) •  i.e. Hadoop, Cassandra, etc. •  Distributed, State-less Applications –  Web Front End –  Caching Servers –  Content Distribution Networks (CDN)
  • 28. Use-Case: Storage via Ceph •  Official Ceph “Dumpling”+ release now supports Calxeda-based platforms •  Initial benchmarks complete (with x86 comparison) –  Even without optimizations, performance is promising •  Identified optimization areas (under investigation): –  Potentially use NEON instructions for CRC32 –  Implement zero-copy on OSD’s –  Transition reads/write to bufferlists –  Optimize client side too – librados/librbd
  • 29. Use-Case: Storage via Ceph With same number of HDD’s, Calxeda-based system delivers 50% more performance than traditional x86-servers.
  • 30. The AAEON CRS-200S-2R Advantage An ARM-based, lower cost, higher performance server platform for scale-out storage Calxeda’s ARM-based SOCs: •  Energy Efficient •  More cores per HDD •  Lower system power •  High Bandwidth Fabric •  Multi-10Gb links for data-intensive apps Compared to traditional x86-based, 2U rack mount servers, the AAEON CRS-200S-2R server platform is: ü  35% Lower TCO* ü  66% Less Rack Space ü  50% Higher performance
  • 31. Summary •  Even 64-bit ARM processors are not ideal for every single workload. •  However, scale-out, data-intensive, workloads can leverage ARM’s energy-efficiency to provide a significantly better TCO. •  For the server market (especially with scale-out apps), replacing the CPU core is not enough. –  Look for SOCs that optimize “between the nodes” in a cluster (e.g. fabric interconnects will help dramatically) •  Interested in joining the “ARM revolution”? –  Contact us! – John Mao, john.mao@calxeda.com