Distro-independent Hadoop cluster management
Denis Shestakov
Hadoop engineer
Denis.Shestakov@BrightComputing.Com
Hadoop cluster dream …
• Fast deployment
• Easy maintenance and management
• Re-use of existing IT personnel’s expertise
• Cost-efficiency
Outline
• Overview of Hadoop cluster operations
• Hadoop deployment/maintenance
• Architecture for distro-agnostic Hadoop cluster manager
• Bright Cluster Manager for Apache Hadoop
• Open issues
Hadoop cluster operations
Hadoop cluster operations
• (hardware selection/network design/storage considerations/…)
• Deployment
• Provisioning
• Hadoop distro & component selection
• Initial setup and configuration
• Management
• Monitoring
• Health-checking
• Optimization
Hadoop cluster operations
[Diagram: the same operations mapped onto two layers, the cluster (infrastructure) and the Hadoop stack]
Hadoop deployment/maintenance
Hadoop deployment
• Includes cluster deployment and Hadoop-stack deployment
• Without proper infrastructure setup, Hadoop will not run
• Proper Hadoop setup relies on network/OS/filesystem tuning
• Knowledge and expertise in both are rare
Hadoop deployment
Challenge for cluster admins
• E.g., configuring Hadoop components
• Understanding of HDFS/MapReduce/YARN/HBase/… essential
• Numerous configuration settings
• Hadoop distribution choice
• Different Hadoop modes
• HDFS & HDFS HA, YARN & YARN HA, …
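To make the "different Hadoop modes" point concrete: enabling HDFS HA alone requires generating a coordinated set of properties across nodes. A minimal sketch, using standard Hadoop HA property names (the nameservice, NameNode labels and hostnames below are made up for illustration):

```python
def hdfs_ha_properties(nameservice, namenodes):
    """Build the core hdfs-site.xml property set for an HA nameservice.

    Uses standard Hadoop HA property names; hostnames are illustrative.
    """
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes.%s" % nameservice: ",".join(namenodes),
        "dfs.ha.automatic-failover.enabled": "true",
    }
    # Each NameNode needs its own RPC address entry.
    for nn in namenodes:
        props["dfs.namenode.rpc-address.%s.%s" % (nameservice, nn)] = (
            "%s.example.com:8020" % nn)
    return props

props = hdfs_ha_properties("cluster1", ["nn1", "nn2"])
```

A cluster manager has to emit this set consistently on every node, which is exactly what makes manual mode switches error-prone.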
Hadoop deployment
Challenge for Hadoop admins
• Configuration is still hard
• Numerous configuration settings
• Deprecated properties
• MRv1 to YARN migration
• OS/network tuning essential
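The "deprecated properties" and MRv1-to-YARN points can be illustrated with a small rename table. The mappings below are a few well-known Hadoop 1 to Hadoop 2 property renames; a tool that rewrites old site files needs such a table:

```python
# A few well-known property renames from Hadoop 1 (MRv1-era names)
# to Hadoop 2 (YARN-era names).
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
}

def modernize(conf):
    """Return conf with deprecated keys replaced by their current names."""
    return {DEPRECATED.get(k, k): v for k, v in conf.items()}

new = modernize({
    "fs.default.name": "hdfs://nn1:8020",
    "io.file.buffer.size": "65536",
})
```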
Hadoop deployment/maintenance
• Hadoop validation by running test/benchmarking jobs
• Monitoring and health checking on both OS and Hadoop stack levels
• Upgrades
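Validation by benchmark jobs typically means running the standard TeraGen/TeraSort pair from the Hadoop examples jar. A sketch that only assembles the commands (the jar path varies per distribution and is an assumption here):

```python
# Assumed examples-jar location; the real path depends on the distro.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"

def terasort_commands(rows, in_dir="/benchmarks/terasort-in",
                      out_dir="/benchmarks/terasort-out"):
    """Build the TeraGen and TeraSort command lines used to validate a cluster."""
    gen = ["hadoop", "jar", EXAMPLES_JAR, "teragen", str(rows), in_dir]
    sort = ["hadoop", "jar", EXAMPLES_JAR, "terasort", in_dir, out_dir]
    return [gen, sort]

cmds = terasort_commands(1000000)
```

The commands would be run via `subprocess` on a node with Hadoop installed; separating command construction from execution keeps the validation step scriptable.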
Hadoop deployment/maintenance
• Provisioning workflow
• Automation tools (chef, puppet, scripts, …)
• Monitoring tools
• Hadoop cluster manager
Unified tool?
Architecture for distro-independent Hadoop manager
Architectural considerations
• Pilot/development/production Hadoop cluster
• Choose from different Hadoop distros/versions
• Several types of nodes:
• Master nodes
• Worker nodes
• Gateway nodes
• Worker nodes have similar OS/software stack
• Cluster growth expected: more workers added
• Easy node replacement
• Heterogeneous hardware
• Grouping nodes by their hardware
Architecture
[Diagram: a cluster head node runs the cluster management daemon and connects to nodes A, B and C; a cluster management interface and third-party applications talk to the daemon]
Architecture
• Cluster management daemon:
• Low overhead
• All nodes run the same daemon
• Assigned roles define which tasks the cluster management daemon can perform
Architecture
• A role can be assigned to a node to perform a task
• E.g., the provisioning role makes a node distribute software images to other nodes
• The HDFS NameNode role makes a node store HDFS metadata and control nodes with the HDFS DataNode role
• Assigning the HDFS DataNode role to a node adds and starts the DataNode service
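The role mechanism can be sketched as a toy model: assigning a role to a node writes that role's configuration files and starts its services, and unassigning reverses it. Class and method names below are illustrative, not Bright's actual API:

```python
class Node:
    """Toy node: tracks written config files and running services."""
    def __init__(self, name):
        self.name = name
        self.configs = {}      # filename -> property dict written on this node
        self.services = set()  # services currently running

class DataNodeRole:
    """Illustrative HDFS DataNode role."""
    service = "hadoop-hdfs-datanode"

    def assign(self, node, namenode_host):
        # Write the role's configuration, then start its service.
        node.configs["hdfs-site.xml"] = {"dfs.datanode.data.dir": "/data/hdfs"}
        node.configs["core-site.xml"] = {
            "fs.defaultFS": "hdfs://%s:8020" % namenode_host,
        }
        node.services.add(self.service)

    def unassign(self, node):
        node.services.discard(self.service)

n = Node("node001")
DataNodeRole().assign(n, "head")
```

The design choice is that the role, not the admin, owns the mapping from "this node serves HDFS blocks" to the concrete config files and services.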
Bright Cluster Manager for Apache Hadoop
Architecture
[Diagram: in a Bright cluster, CMDaemon runs on the head node and on node001–node003, communicating over JSON+SSL; the Cluster Management GUI, Cluster Management Shell, Web-Based User Portal and third-party applications access it through a JSON API over SSL]
Interfaces
Graphical User Interface (GUI)
• Offers administrator full cluster control
• Standalone desktop application
• Manages multiple clusters simultaneously
• Runs natively on Linux, Windows and OS X
Cluster Management Shell (CMSH)
• All GUI functionality also available through the Cluster Management Shell
• Interactive and scriptable in batch mode
Managing Clusters
• Bright Cluster Manager manages several types of clusters
• HPC, private cloud (OpenStack), …
• Hadoop
• A cluster of any type is:
• Deployed
• Configured
• Provisioned
• Managed
• Monitored
• Health-checked
Hadoop support
• Choice of distributions
• Management/monitoring from one place
• CLI and GUI: cmsh, cmgui
• Hadoop stack support
• Including support for Spark (Spark Standalone mode since release 7.1)
• Flexible configuration
Hadoop configuration
Hadoop configuration through roles
• Nodes are configured to run certain Hadoop-related services by assigning roles
• 15 Hadoop and 3 Spark roles:
E.g., HDFS DataNode, MRv1 JobTracker, YARN ResourceManager, HBaseMaster, ZooKeeper, SparkWorker, …
• Assigning/unassigning a role will:
• Write out the corresponding configuration files based on configurable role parameters
• Start/stop/monitor the relevant services
• Hadoop configuration settings are changed from inside Bright
Bright’s Hadoop Cluster Management
Bright Cluster Manager 7.1 for Apache Hadoop
• Just released
• Single-pane-of-glass for managing both physical cluster and Hadoop
• Easy installation of Hadoop
• Apache Hadoop 1.2.1, 2.6.0 (on Bright DVD)
• Cloudera CDH 4.6.x, 4.7.x, 5.2.x, 5.3.x (5.4.x soon)
• HortonWorks HDP 1.3.x, 2.1.x, 2.2.x
• Pivotal HD 2.1.0 (3.0.0 soon)
• Configuration, monitoring and health-checking of Hadoop instances
• Graphical UI, command-line interface and API access
Key Features
• Multiple Hadoop cluster instances on the same cluster
• Choice of Hadoop distributions/versions
• Flexible Hadoop configuration controlled through GUI and CLI
• Hadoop configuration groups address ‘cluster heterogeneity’ problem
• JSON/Python API
• Scriptable deployment/configuration operations
• Alternative filesystems to HDFS (e.g. Hadoop on Lustre)
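The JSON/Python API point means deployment and configuration operations can be driven from scripts rather than the GUI. A sketch of building such a request; the service name, method and payload format here are hypothetical, not Bright's actual API schema:

```python
import json

def make_call(service, method, args):
    """Build a JSON-RPC-style request body for a management daemon.

    The field names ("service", "call", "args") are illustrative only.
    """
    return json.dumps({"service": service, "call": method, "args": args})

# Hypothetical example: ask the daemon to start a DataNode on node001.
req = make_call("hadoop", "startService", ["node001", "DataNode"])
```

In practice such a payload would be sent over the SSL-protected JSON interface shown in the architecture diagram.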
Open issues & conclusion
Open issues
Building and running cost-efficient Hadoop clusters
• Hard to optimize
• Workload-specific
• Tuning on all levels: OS/network/Hadoop
• Bright’s architecture
• All cluster/Hadoop operational data aggregated in one place
• Flexible configuration of hardware/software components
Conclusion
• Architecture of distro-agnostic Hadoop cluster manager
• Bright provides tried & tested implementation of this architecture
• Hundreds of clusters are being managed using Bright Cluster Manager
• Complete solution for setup, management & monitoring of Hadoop clusters
• Single pane of glass for cluster & Hadoop stack
• Well suited for ‘multi-purpose’ clusters: e.g., supporting both HPC computations and Hadoop jobs
Come to our booth
• Meet with Bright guys
• See demo
• Tell us about your cluster
Credits
Questions?
BigDataTeam@brightcomputing.com
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 

KĂźrzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Distro-independent Hadoop cluster management

  • 1. Distro-independent Hadoop cluster management Denis Shestakov Hadoop engineer Denis.Shestakov@BrightComputing.Com
  • 2. Hadoop cluster dream … • Fast deployment • Easy maintenance and management • Re-use of existing IT personnel’s expertise • Cost-efficiency
  • 3. Outline • Overview of Hadoop cluster operations • Hadoop deployment/maintenance • Architecture for distro-agnostic Hadoop cluster manager • Bright Cluster Manager for Apache Hadoop • Open issues
  • 5. Hadoop cluster operations • (hardware selection/network design/storage considerations/…) • Deployment • Provisioning • Hadoop distro & component selection • Initial setup and configuration • Management • Monitoring • Health-checking • Optimization
  • 6. Hadoop cluster operations • Deployment • Provisioning • Hadoop distro & component selection • Initial setup and configuration • Management • Monitoring • Health-checking • Optimization Cluster Hadoop stack
  • 8. Hadoop deployment • Includes cluster deployment and Hadoop-stack deployment • Without proper infrastructure setup, Hadoop will not run • Proper Hadoop setup relies on network/OS/filesystem tuning • Knowledge and expertise in both are rare
  • 9. Hadoop deployment Challenge for cluster admins • E.g., configuring Hadoop components • Understanding of HDFS/MapReduce/YARN/HBase/… essential • Numerous configuration settings • Hadoop distribution choice • Different Hadoop modes • HDFS & HDFS HA, YARN & YARN HA, …
  • 10. Hadoop deployment Challenge for Hadoop admins • Configuration is still hard • Numerous configuration settings • Deprecated properties • MRv1 to YARN migration • OS/network tuning essential
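The deprecated-properties problem above can be caught mechanically. The sketch below is a minimal checker, not part of any real tool; the mapping is a small sample of Hadoop's documented property deprecations, and a real checker would load the full list for the target Hadoop version.

```python
# Minimal sketch: flag deprecated Hadoop property names in a configuration.
# The mapping below is a small sample of Hadoop's documented deprecations.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
}

def find_deprecated(conf):
    """Return {old_name: new_name} for every deprecated key used in conf."""
    return {k: DEPRECATED[k] for k in conf if k in DEPRECATED}

conf = {"fs.default.name": "hdfs://master:8020", "dfs.replication": "3"}
print(find_deprecated(conf))  # {'fs.default.name': 'fs.defaultFS'}
```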
  • 11. Hadoop deployment/maintenance • Hadoop validation by running test/benchmarking jobs • Monitoring and health checking on both OS and Hadoop stack levels • Upgrades
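Validation by running a test job typically means submitting one of the stock MapReduce examples (e.g. the pi estimator) after deployment. The hypothetical helper below only builds that command line; the examples jar path varies per distribution, so it is a parameter rather than a hard-coded location.

```python
# Hypothetical helper: build the command line for a post-deployment smoke
# test using the stock 'pi' example job. Only constructs the command; actually
# running it requires a working Hadoop client on the node.
def validation_cmd(examples_jar, maps=10, samples=1000):
    """Return the 'hadoop jar <examples> pi <maps> <samples>' command."""
    return ["hadoop", "jar", examples_jar, "pi", str(maps), str(samples)]

cmd = validation_cmd("/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar")
print(" ".join(cmd))
```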
  • 12. Hadoop deployment/maintenance • Provisioning workflow • Automation tools (Chef, Puppet, scripts, …) • Monitoring tools • Hadoop cluster manager
  • 13. Hadoop deployment/maintenance Provisioning workflow Automation tools (Chef, Puppet, scripts, …) Monitoring tools Hadoop cluster manager Unified tool?
  • 15. Architectural considerations • Pilot/development/production Hadoop cluster • Choose from different Hadoop distros/versions • Several types of nodes: • Master nodes • Worker nodes • Gateway nodes • Worker nodes have similar OS/software stack • Cluster growth expected: more workers added • Easy node replacement • Heterogeneous hardware • Grouping nodes by their hardware
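The "grouping nodes by their hardware" point above can be sketched in a few lines: nodes with the same hardware profile end up in one group, which can then share a software image and Hadoop configuration. The node data below is made up for illustration.

```python
from collections import defaultdict

# Sketch: group worker nodes by hardware profile so each group can share one
# software image / Hadoop configuration. Node inventory data is illustrative.
nodes = [
    {"name": "node001", "cores": 16, "ram_gb": 64},
    {"name": "node002", "cores": 16, "ram_gb": 64},
    {"name": "node003", "cores": 32, "ram_gb": 128},
]

groups = defaultdict(list)
for n in nodes:
    groups[(n["cores"], n["ram_gb"])].append(n["name"])

print(dict(groups))  # {(16, 64): ['node001', 'node002'], (32, 128): ['node003']}
```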
  • 17. Architecture • Cluster management daemon: • Low overhead • All nodes run the same daemon • Assigned roles define which tasks the cluster management daemon performs
  • 18. Architecture • A role can be assigned to a node to perform a task • E.g., a provisioning role makes a node spread software images onto other nodes • The HDFS NameNode role makes a node store HDFS metadata and control nodes with the HDFS DataNode role • Assigning the HDFS DataNode role to a node adds and starts the DataNode service
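The role mechanism described above can be modeled as follows. This is a toy model, not Bright's actual API: assigning a role to a node declares which services the management daemon should configure and run there, and the role-to-service mapping shown is illustrative.

```python
# Toy model of role-based configuration (not Bright's actual API).
class Node:
    def __init__(self, name):
        self.name = name
        self.roles = set()
        self.services = set()

# Which services each role implies (illustrative names).
ROLE_SERVICES = {
    "hdfs-datanode": {"hadoop-hdfs-datanode"},
    "hdfs-namenode": {"hadoop-hdfs-namenode"},
}

def assign_role(node, role):
    """Record the role; the daemon would then write configs and start services."""
    node.roles.add(role)
    node.services |= ROLE_SERVICES[role]

n = Node("node001")
assign_role(n, "hdfs-datanode")
print(n.services)  # {'hadoop-hdfs-datanode'}
```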
  • 19. Bright Cluster Manager for Apache Hadoop
  • 21. Interfaces Graphical User Interface (GUI) • Offers administrators full cluster control • Standalone desktop application • Manages multiple clusters simultaneously • Runs natively on Linux, Windows and OS X Cluster Management Shell (CMSH) • All GUI functionality is also available through the Cluster Management Shell • Interactive and scriptable in batch mode
  • 22.
  • 23. Managing Clusters • Bright Cluster Manager manages several types of clusters • HPC, private cloud (OpenStack), … • Hadoop • Cluster of any type: • Deployed • Configured • Provisioned • Managed • Monitored • Health-checked
  • 24. Hadoop support • Choice of distributions • Management/monitoring from one place • CLI and GUI: cmsh, cmgui • Hadoop stack support • Including support for Spark (Spark Standalone mode since release 7.1) • Flexible configuration
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. Hadoop configuration Hadoop configuration through roles • Nodes are configured to run certain Hadoop-related services by assigning roles • 15 Hadoop and 3 Spark roles: e.g., HDFS DataNode, MRv1 JobTracker, YARN ResourceManager, HBaseMaster, ZooKeeper, SparkWorker, … • Assigning/unassigning a role will: • Write out the corresponding configuration files based on configurable role parameters • Start/stop/monitor the relevant services • Hadoop configuration settings can be changed from within Bright
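"Write out corresponding configuration files based on configurable role parameters" boils down to rendering key/value pairs into Hadoop's standard `*-site.xml` format. A minimal sketch, assuming nothing about Bright's internals beyond that format:

```python
import xml.etree.ElementTree as ET

# Sketch: render role parameters into Hadoop's *-site.xml layout
# (<configuration><property><name/><value/></property>...</configuration>).
def to_site_xml(props):
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

xml = to_site_xml({"dfs.replication": 3, "dfs.blocksize": "134217728"})
print(xml)
```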
  • 30.
  • 31. Bright’s Hadoop Cluster Management Bright Cluster Manager 7.1 for Apache Hadoop • Just released • Single pane of glass for managing both the physical cluster and Hadoop • Easy installation of Hadoop • Apache Hadoop 1.2.1, 2.6.0 (on Bright DVD) • Cloudera CDH 4.6.x, 4.7.x, 5.2.x, 5.3.x (5.4.x soon) • Hortonworks HDP 1.3.x, 2.1.x, 2.2.x • Pivotal HD 2.1.0 (3.0.0 soon) • Configuration, monitoring and health checking of Hadoop instances • Graphical UI, command-line interface and API access
  • 32. Key Features • Multiple Hadoop cluster instances on same cluster • Choice of Hadoop distributions/versions • Flexible Hadoop configuration controlled through GUI and CLI • Hadoop configuration groups address ‘cluster heterogeneity’ problem • JSON/Python API • Scriptable deployment/configuration operations • Alternative filesystems to HDFS (e.g. Hadoop on Lustre)
  • 34. Open issues Building and running cost-efficient Hadoop clusters • Hard to optimize • Workload-specific • Tuning on all levels: OS/network/Hadoop • Bright’s architecture • All cluster/Hadoop operational data aggregated in one place • Flexible configuration of hardware/software components
  • 35. Conclusion • Architecture of a distro-agnostic Hadoop cluster manager • Bright provides a tried & tested implementation of this architecture • Hundreds of clusters are being managed using Bright Cluster Manager • Complete solution for setup, management & monitoring of Hadoop clusters • Single pane of glass for the cluster & the Hadoop stack • Well suited for ‘multi-purpose’ clusters: e.g., supporting both HPC computations and Hadoop jobs
  • 36. Come to our booth • Meet with Bright guys • See demo • Tell us about your cluster

Editor's notes

  Bright Cluster Manager is image-based: a slave-node image is stored in a directory on the head node and contains a complete Linux file system structure, including directories such as /etc, /var and /usr. An unlimited number of images can be created; for example, different images for different types of slave nodes, or a clone of an image to experiment with before changing it. Software changes for the slave nodes are made inside the image(s) on the head node, e.g. with an RPM command using the root option, by “chroot”-ing into the image to install or remove RPMs, or by editing a file with an editor. The provisioning system ensures that changes are propagated to the slave nodes; because only the changes are propagated, this happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (although this can be disabled). Slave nodes PXE-boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI), detects and enables GPUs, and partitions disks and creates file systems if necessary. It then installs or updates the software image, if necessary, and finally “pivots” the root from NFS to the local file system.
  15. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  16. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  17. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  18. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  19. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  20. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  21. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  22. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  23. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  24. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  25. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.
  26. Bright Cluster Manager is Image based, which means that A slave node image stored in a directory on the head node which includes a complete Linux file system structure, including directories such as /etc, /var, /usr An unlimited number of images can be created. For example, you can have different images for different types of slaves nodes, or you can clone an image before you start experimenting with it. Software changes for the slave nodes are made inside the image(s) on the head node. For example, you can use an RPM command with the root option, or “chroot” into the image to install or remove RPMs, or you can edit a file using an editor. The provisioning system ensures that changes are propagated to the slave nodes. Because only changes are propagated, its happens as fast as possible and consumes minimal network bandwidth. Nodes always boot over the network (Although this can be disabled) Slave nodes PXE boot into the Node Installer, which identifies the node, configures the BMC (such as IPMI) and detects and enables GPUs, partition disks and creates file systems, if necessary. It then installs or updates the software image, if necessary. and finally, it “Pivots” the root from NFS to the local file system.