©2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Phil Day, HP Consulting
8th November 2010

Small vs Large Clusters
Small Production Clusters and Proof of Concept
– Built and run by a few skilful people
– Can be a natural extension to conventional IT
– You know the servers by name
Large Production Clusters
– Built and run by pioneers
– Large development staff
– Major Hadoop contributors
– Understand the problems of scale
Images: Creative Commons 2.0 – Attribution Andrew Morrell (Flickr)

Large Scale Early Adopters
– Have, or want to start with, a small PoC (tens of nodes)
– Want to scale quickly to a large cluster (hundreds of nodes)
– Want the scale of large clusters, but with the build and operational model of a small one
– Want to run the cluster rather than build and develop it
– Need to integrate it with existing systems
Unfortunately, not all things in life scale as well as Hadoop:
– Design – The Technology Challenge
– Build – The Engineering Challenge
– Transfer to Operations – The Service Management Challenge

Design – The Technology Challenge
Selecting all the right bits
Server Selection
– Core Nodes: resilient, big memory, RAID
– Data Nodes: not resilient, no RAID or hot swap, basic iLO
– Trade off disks vs cores vs memory to match the target load
– Need to consider the disk allocation policy
– Network redundancy is useful to protect against rack switch failures
– Edge Nodes (data ingress/egress and management)
    – Higher-spec data nodes
    – Help provide the “appliance” view of the cluster
    – Have Hadoop installed but don’t run as part of the cluster
Network Selection
– Dual 1Gb from data nodes to rack switches
– 10Gb from rack switches to core, and from Edge Nodes
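As a rough worked example of the network choice above, the sketch below computes how oversubscribed a rack uplink would be when every data node has dual 1Gb links and the rack switch has a 10Gb uplink. The number of nodes per rack and the number of uplinks are illustrative assumptions, not figures from the talk.

```python
def oversubscription(nodes_per_rack: int,
                     links_per_node: int = 2,
                     link_gbps: float = 1.0,
                     uplinks: int = 1,
                     uplink_gbps: float = 10.0) -> float:
    """Ratio of worst-case node bandwidth to rack uplink bandwidth."""
    if nodes_per_rack <= 0 or uplinks <= 0:
        raise ValueError("nodes_per_rack and uplinks must be positive")
    downstream = nodes_per_rack * links_per_node * link_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


# Hypothetical rack of 20 data nodes (an assumption for illustration).
ratio = oversubscription(nodes_per_rack=20)
print(f"{ratio:.1f}:1")  # 4.0:1
```

If the two 1Gb links are used as an active/standby pair for redundancy rather than for bandwidth, passing `links_per_node=1` gives the more realistic ratio.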

Build – The Engineering Challenge
Do you realise how many cardboard boxes that is?
Building at the scale of 500+ servers has its own set of problems
• Space and Environment
• Consistency of Build
• Failures during the Build
• Deployment time and the cost of rework
Two things we found very helpful:
• Factory Integration Services
• Cluster Management Utility
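One way to make "Consistency of Build" checkable is to fingerprint each node's build settings and flag the nodes that differ from the majority. The sketch below is only an illustration of that idea: the inventory format, the setting names and the node names are all hypothetical, and it is not how either of the HP tools above works.

```python
from collections import Counter
import hashlib


def fingerprint(settings: dict) -> str:
    """Stable hash of a node's build settings, independent of key order."""
    canonical = "\n".join(f"{key}={settings[key]}" for key in sorted(settings))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_outliers(inventory: dict) -> list:
    """Return the nodes whose fingerprint differs from the most common one."""
    prints = {node: fingerprint(s) for node, s in inventory.items()}
    if not prints:
        return []
    majority, _ = Counter(prints.values()).most_common(1)[0]
    return sorted(node for node, fp in prints.items() if fp != majority)


# Hypothetical inventory; the keys and values are made up for illustration.
inventory = {
    "node001": {"bios": "A1", "kernel": "2.6.18", "disks": 12},
    "node002": {"bios": "A1", "kernel": "2.6.18", "disks": 12},
    "node003": {"bios": "A0", "kernel": "2.6.18", "disks": 12},
}
print(find_outliers(inventory))  # ['node003']
```

Comparing against the majority rather than a hand-written reference keeps the check useful even before a golden configuration has been agreed.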

Build – HP Factory Integration Services
Reducing risk and time
• Many years’ experience of building large clusters
• Site inspection
• Build, Configure, Soak Test
• Diagnose and fix DOAs (dead-on-arrival units)
• Rack and Label
• Asset tagging
• Custom build and set-up
• Pack and Ship
• On-Site build and integration
www.hp.com/go/factoryexpress
Complex solutions ... made simple

Build – HP Cluster Management Utility
Rack-aware deployment and monitoring
• Proven cluster deployment and management tool
    – 11 years of experience
    – Proven with clusters of 3500+ nodes
• Deployment
    – Network and power load aware deployment
    – Easily extensible
    – Kickstart integration
• Monitoring
    – Scalable, non-intrusive monitoring
    – Collectl integration
• Administration
    – Command line or GUI
    – Cluster-wide configuration
www.hp.com/go/cmu
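To illustrate what "network and power load aware deployment" can mean in practice, the sketch below splits a deployment into waves so that no rack installs more than a fixed number of nodes at once. This is a generic illustration, not CMU's actual algorithm; the function name, the node-to-rack mapping and the per-rack limit are assumptions.

```python
from collections import defaultdict


def deployment_waves(node_racks: dict, max_per_rack: int) -> list:
    """Group nodes into waves with at most max_per_rack nodes per rack each."""
    if max_per_rack < 1:
        raise ValueError("max_per_rack must be at least 1")
    by_rack = defaultdict(list)
    for node in sorted(node_racks):
        by_rack[node_racks[node]].append(node)

    waves = []
    wave_index = 0
    while True:
        start = wave_index * max_per_rack
        wave = []
        for rack in sorted(by_rack):
            # Take the next slice of this rack's nodes, if any remain.
            wave.extend(by_rack[rack][start:start + max_per_rack])
        if not wave:
            break
        waves.append(wave)
        wave_index += 1
    return waves


# Hypothetical layout: three nodes in rack r1, one node in rack r2.
layout = {"a1": "r1", "a2": "r1", "a3": "r1", "b1": "r2"}
print(deployment_waves(layout, max_per_rack=2))
# [['a1', 'a2', 'b1'], ['a3']]
```

Capping concurrency per rack limits the simultaneous load on each rack switch and power feed while still deploying all racks in parallel.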
CMU Dashboard

Cluster Performance over time
[Figure: panels labelled Disk (read), CPU, Disk (write), Network, Map and Red; time axis from 05:00 to 15:00]

Operate – the organisational challenge
How do we know when it’s working?
Clusters are not just large numbers of servers
• At scale it may never be 100% up (like a network), but it can be 100% down (like a server)
• Need to think more in terms of “How healthy is it?”
    – Core nodes are important
    – Data nodes much less so, unless they fail in patterns
    – Edge nodes are somewhere in between
• Look at HDFS health for replication counts
• Nagios & Ganglia
• Collectl / CMU to visualise the cluster
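The "How healthy is it?" view above can be sketched as a small report that weighs nodes by role and looks for data-node failures that cluster in one rack. This is a minimal illustration only: the node record format, the status names and the 50% per-rack threshold are assumptions, not something the talk specifies.

```python
from collections import defaultdict


def cluster_health(nodes: list, rack_alert_fraction: float = 0.5) -> dict:
    """Summarise health by role instead of a single up/down flag.

    Each node is a dict with 'name', 'role' ('core', 'edge' or 'data'),
    'rack' and 'up' (bool).
    """
    core_down = sorted(n["name"] for n in nodes
                       if n["role"] == "core" and not n["up"])
    edge_down = sorted(n["name"] for n in nodes
                       if n["role"] == "edge" and not n["up"])

    per_rack = defaultdict(lambda: [0, 0])  # rack -> [down, total] data nodes
    for n in nodes:
        if n["role"] == "data":
            counts = per_rack[n["rack"]]
            counts[1] += 1
            if not n["up"]:
                counts[0] += 1

    # A rack losing a large share of its data nodes is a failure pattern.
    suspect_racks = sorted(rack for rack, (down, total) in per_rack.items()
                           if down / total >= rack_alert_fraction)
    data_down = sum(down for down, _ in per_rack.values())

    if core_down:
        status = "critical"
    elif edge_down or suspect_racks:
        status = "degraded"
    else:
        status = "healthy"
    return {"status": status, "core_down": core_down, "edge_down": edge_down,
            "data_down": data_down, "suspect_racks": suspect_racks}


# Hypothetical cluster: both data nodes in rack A are down.
nodes = [
    {"name": "nn01", "role": "core", "rack": "A", "up": True},
    {"name": "edge01", "role": "edge", "rack": "A", "up": True},
    {"name": "dn01", "role": "data", "rack": "A", "up": False},
    {"name": "dn02", "role": "data", "rack": "A", "up": False},
    {"name": "dn03", "role": "data", "rack": "B", "up": True},
    {"name": "dn04", "role": "data", "rack": "B", "up": True},
]
report = cluster_health(nodes)
print(report["status"], report["suspect_racks"])  # degraded ['A']
```

Scattered data-node failures below the threshold leave the status at "healthy" and only raise `data_down`, which matches the idea that individual data nodes matter little unless they fail in patterns.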

Summary
Key considerations when building a large cluster
• Use a pilot system to establish your server configuration
• Stand on the shoulders of the Pioneers
• Build and test in the factory if you can
• Consistency in the build and configuration is vital
• Cherish the NameNode, protect the Edge Nodes, and develop the right level of indifference to the Data Nodes
• Practise the key recovery cases
• Match training and support to the service expectations
And remember: not all things in life scale as well as Hadoop

Questions?

Lessons from building large clusters
