An introduction to the
Design of Warehouse-Scale Computers
Computer Architecture
A.Y. 2014/2015
Authors:
Piscione Pietro
Villardita Alessio
Degree: Computer Engineering
What is a WSC
Warehouse-Scale
Computer
● Scalable
● Distributed
● Cost efficiency
[Figure: overview of the topics around "WSC @": VMs and applications, disks, networking, servers, cooling, energy proportionality, costs, repair and failures, web search and QPS, e-commerce]
Why WSC
Motivations:
● Cloud services
● E-mail
● Social network
● News
● E-commerce
and so on...
Is a WSC a data center?
Data centers:
● Not co-located
● Host services for
multiple providers
● Third party SW
solution
WSCs:
● Co-located
● Single organization
● Homogeneous SW
and HW organization
Cost efficiency at scale
It requires more:
● Computing power
● Storage
● Throughput
● Reliability
More costs
WSC architecture overview
[Figure: WSC architecture, from low-end server to cluster]
WSC: SW and HW techniques
● Replication
● Error correction
● Sharding
● Load-balancing
● Health checking
● Compression
● Consistency
● Canaries
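Two of the techniques listed above, sharding and replication, can be illustrated with a minimal sketch. It assumes hash-based placement with replicas on consecutive shards, which is only one of many possible schemes and is not taken from the slides; the function names `shard_for` and `replicas_for` are hypothetical.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replicas_for(key, num_shards, replication_factor):
    """Place copies on consecutive shards, starting at the key's home shard.

    Assumption for illustration only: real systems use richer placement rules.
    """
    home = shard_for(key, num_shards)
    return [(home + i) % num_shards for i in range(replication_factor)]

print(replicas_for("user:42", num_shards=8, replication_factor=3))
```

Because the hash is stable, any front end can compute where a key lives without a lookup, and losing one shard still leaves the other replicas available.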
Software Layers
● Platform-level software: common firmware,
kernel, operating system distribution, and
libraries
● Cluster-level infrastructure software:
MapReduce, BigTable, Hadoop, Spanner, etc.
● Application-level software: Google search,
Gmail, Google Maps, etc.
Platform-level software
Virtual machines
Pros: versatility, reliability, isolation, performance,
encapsulation, costs, flexibility, checkpointing,
live migration.
Cons: performance overhead for I/O-intensive workloads.
Hardware Building Blocks
● Server hardware
● Network fabric
● Storage hierarchy components
Large SMP vs low-end server nodes
Warehouse scale
Limits of very low-end cores
● Amdahl’s law: difficult to reduce serialization
and communication overheads
● The larger the number of threads, the larger the
variability in response times
Ex.: web server latency per request
● High-end cores: 1 s/request (50% CPU)
● Low-end cores (3x slower): 2 s/request (75% CPU)
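The latency example above can be reproduced with simple arithmetic. The sketch below assumes (the split is implied by the "50% CPU" figure rather than stated outright) that the 1 s request spends 0.5 s on the CPU and 0.5 s elsewhere, and that only the CPU part slows down on the low-end cores; `request_latency` is a hypothetical helper.

```python
def request_latency(cpu_seconds, other_seconds, slowdown):
    """Return (total latency, CPU share) when only the CPU part is slowed down."""
    cpu = cpu_seconds * slowdown
    total = cpu + other_seconds
    return total, cpu / total

high_end = request_latency(0.5, 0.5, slowdown=1)
low_end = request_latency(0.5, 0.5, slowdown=3)
print(high_end)  # (1.0, 0.5)
print(low_end)   # (2.0, 0.75)
```

Under these assumptions a core that is 3x slower only doubles the request latency, because the non-CPU half of the request is unaffected.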
Network fabric
● Network scalability: hard to put in practice;
offloading some traffic to a special-purpose
network
● Protocols: FCoE (FibreChannel over Ethernet)
and iSCSI (SCSI over IP)
● Programmable network: OpenFlow and SDN
WSC architecture overview - Network
Characteristics       Ethernet cable    Optical fiber
Performance (Gb/s)    1-10              10-1000
MTBF (years)          >45               >10
Costs ($/km)          200-500           700-1200
What protocol is used in the data center? InfiniBand and Ethernet
Storage hierarchy components
[Figure: storage hierarchy; axes: latency, size]
WSC architecture overview - Disks
Characteristics       HDD          SSD
Performance (MB/s)    R:59 W:60    R:100 W:80
Active power (W)      3.86         1
MTBF (Mh)             >2           <0.7
Costs ($/TB)          60-75        130-150
Which file system is used? GFS
Modelling costs
Total cost = Capital cost + Operational cost
Capital cost depends on:
● Design
● Size
● Location
● Speed of construction
Operational cost:
It depends heavily on the applications
Capital Cost - example (Ref. [2])
● Servers: $2,997,090
● Power & Cooling: $1,296,902
● Power: $1,042,440
● Other: $284,686
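To see how the four figures in this example relate to each other, the following sketch sums them and prints each item's share. It assumes the four items are parts of a single total, as the chart on the slide suggests; the variable names are illustrative.

```python
# Figures as given in the example (US dollars).
costs = {
    "Servers": 2_997_090,
    "Power & Cooling": 1_296_902,
    "Power": 1_042_440,
    "Other": 284_686,
}

total = sum(costs.values())
shares = {name: 100 * value / total for name, value in costs.items()}

print(f"Total: ${total:,}")  # Total: $5,621,118
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Servers account for a little over half of the total, which is why server cost and utilization dominate the cost discussion.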
Operational Cost - example
● Power consumption
○ Cooling
○ Servers
○ Energy power efficiency
○ Workload
● Repairs and failure
WSC Power Consumption: overview
● A datacenter uses
10-20% of the
servers power
● Cooling
● High-efficiency in
power conversion
[Figure: power consumption breakdown: CPUs, DRAM, disks, cooling]
Closed Cooling System
Energy and power efficiency
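● Measures are workload dependent
● Distinguish between three main factors (facility, server, computing):
$$\text{Efficiency} = \frac{1}{\text{PUE}} \times \frac{1}{\text{SPUE}} \times \frac{C}{\text{TEEC}}$$
where C is the computation performed and TEEC is the total energy to electronic components
● State-of-the-art: TPUE = PUE × SPUE, around 1.44
● Average data centers have TPUE = 3.2
For each productive watt, 2.2 more are consumed!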
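The TPUE figures above can be checked numerically. In this sketch the split PUE = 1.2 and SPUE = 1.2 is an assumption chosen only because its product is about 1.44; the slide gives the product, not the individual factors. The function names are hypothetical.

```python
def tpue(pue, spue):
    """True PUE: facility overhead (PUE) times server overhead (SPUE)."""
    return pue * spue

def overhead_watts_per_productive_watt(tpue_value):
    """Watts lost for each watt that reaches the electronic components."""
    return tpue_value - 1.0

best = tpue(1.2, 1.2)  # assumed split; the product is about 1.44
average_overhead = overhead_watts_per_productive_watt(3.2)
print(round(best, 2), round(average_overhead, 1))  # 1.44 2.2
```

The second value shows where the "2.2 more watts" statement comes from: a TPUE of 3.2 means 3.2 W drawn for every 1 W doing useful work.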
Sources of Efficiency Losses
[Figure: breakdown of efficiency losses: IT equipment, cooling, UPS, air movement]
Workload
Large continuous
batch
Mix: online services
Energy proportionality
Energy efficiency key factors
● Efficient load distribution: live migration and
Google File System
● Idle time must be kept short
● Energy-proportional computing
● Workload-peak prediction models (complex)
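Why idle time and energy proportionality matter can be shown with a toy power model. The sketch below assumes power grows linearly from an idle floor to peak; the 50% idle floor is a hypothetical value, not a measurement from the slides.

```python
def power_fraction(utilization, idle_fraction):
    """Power draw as a fraction of peak under a linear model (an assumption)."""
    return idle_fraction + (1.0 - idle_fraction) * utilization

def relative_efficiency(utilization, idle_fraction):
    """Work done per unit of power, normalized so that full load gives 1.0."""
    return utilization / power_fraction(utilization, idle_fraction)

# 0.5: hypothetical server drawing half of peak power when idle
# 0.0: ideal energy-proportional server
for idle in (0.5, 0.0):
    print(idle, [round(relative_efficiency(u, idle), 2) for u in (0.1, 0.3, 1.0)])
```

With a high idle floor, efficiency collapses at low utilization, while the ideal proportional server stays at 1.0 across the whole range.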
Energy efficiency Benchmarks
● LINPACK: world’s top supercomputers
● JouleSort
● SPECpower
● Emerald
● SPC-2/E
● SPECpower_ssj2008: based on a broad class of
server workloads
Storage: # of transactions per Watt
Server-level: performance-to-power
Dealing with failures and repairs
[Figure: system timeline: available @ 99.9%; unavailable because of failure, HW upgrade, or maintenance]
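The 99.9% availability figure on this slide translates into a yearly downtime budget. A minimal sketch, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability):
    """Hours per year a system may be unavailable at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR

print(round(downtime_hours_per_year(0.999), 2))  # 8.76
```

Failures, hardware upgrades, and maintenance together have to fit inside those roughly nine hours.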
Tolerating faults, not hiding them
“A gracefully degraded service”
But how?
Fault-Tolerant SW Infrastructure
Requirements:
● HW faults can be tolerated
● HW level: faults must
always be detected and
reported to software
● Support a broad class of
operational procedures
Pros:
● Inexpensive PC-class HW
● Cost savings and
optimization
● Reactive containment and
recovery actions
Truly faulty
Main fault causes
@Google:
● Software errors
● Human mistakes
● Wrong
configurations
But also (10-25%):
● Hardware-related
○ Disk errors
○ DRAM soft errors
[Figure: fault causes by category: Config, SW, Human, HW, Net, Other]
The world is not perfect, yet it holds on
And so do Google’s WSCs:
● 1.2-2 crashes per server per year (mature servers)
● With 2,000 servers, approximately 1 crash every 2.5 h
(about 10 per day)
● ⅓ of servers are affected by correctable DRAM errors
each year (on average, 1 error per server every 2.5 h)
● With ECC, only 1.3% of all machines experience
uncorrectable memory errors per year
Hardware
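The cluster-level crash figures can be derived from the per-server rate. The sketch below assumes crashes are spread evenly over a 365-day year; `cluster_crashes` is a hypothetical helper.

```python
def cluster_crashes(servers, crashes_per_server_per_year):
    """Return (crashes per day, hours between crashes) for the whole cluster."""
    per_year = servers * crashes_per_server_per_year
    per_day = per_year / 365
    hours_between = 24 / per_day
    return per_day, hours_between

for rate in (1.2, 2.0):
    per_day, hours = cluster_crashes(2000, rate)
    print(f"{rate} crashes/server/year: {per_day:.1f} per day, one every {hours:.1f} h")
```

The quoted "1 crash every 2.5 h (10 per day)" falls inside the range this produces for 1.2 to 2 crashes per server per year, closer to the upper rate.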
Google’s Availability
[Figure: availability chart; labels recovered from the chart: 55%, 6, 30, 25%, 1% > 1 day!, 99.84%]
Google System Health
● Monitors servers’
configuration, activity,
environmental, and
error data
● Individual machine
diagnostics
● Stability of new system
software versions
● Suggests repair actions
Case study: web search
Web size?
Nobody knows.
Classification?
Using PageRank.
QPS?
Cannot be established
a priori
Logical view of a web index
Case study: web search - 2
[Figure: chart with x-axis "Hour"; annotation: no energy proportionality]
CPU - energy proportionality
VFS (voltage and frequency scaling) solution
Trade-off: performance vs. power consumption
A real life example
Benchmark for Enterprise applications
16 x DELL M1000e
14 x IBM Blade
Center Model
16 x HP C7000
What are we going to test?
SPECpower_ssj2008 description
How is it composed?
● New Order (30.3%)
● Payment (30.3%)
● Order Status (3.0%)
● Delivery (3.0%)
● Stock Level (3.0%)
● Customer Report (30.3%)
How does it work?
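The transaction mix listed above can be simulated with weighted random sampling. This is only an illustrative sketch and not how the benchmark's own load generator is implemented; `sample_transactions` and the fixed seed are assumptions.

```python
import random
from collections import Counter

# Transaction mix as listed on the slide (percent of transactions).
MIX = {
    "New Order": 30.3,
    "Payment": 30.3,
    "Order Status": 3.0,
    "Delivery": 3.0,
    "Stock Level": 3.0,
    "Customer Report": 30.3,
}

def sample_transactions(n, seed=0):
    """Draw n transaction types at random, weighted by the mix above."""
    rng = random.Random(seed)
    return rng.choices(list(MIX), weights=list(MIX.values()), k=n)

counts = Counter(sample_transactions(10_000))
for name in MIX:
    print(f"{name}: {100 * counts[name] / 10_000:.1f}%")
```

Over many draws the observed shares approach the listed percentages, with the three heavy transaction types dominating the load.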
Benchmark results
Lower is better
Benchmark results - 2
Higher is better
Conclusions
The Internet keeps growing!
User side
● Services
● Price
● Latency
● Availability
WSC side
● Hardware
● Costs
● Performance
● Reliability and
fault tolerance
References
[1] Barroso, Clidaras, Hölzle, The Datacenter as a Computer: An
Introduction to the Design of Warehouse-Scale Machines, Morgan &
Claypool Publishers, 2013
[2] James Hamilton, AWS Team, http://perspectives.mvdirona.com/2008/11/cost-of-power-in-large-scale-data-centers/
Thank you for
listening!
