Apache Ambari
Hadoop Cluster Manifest/Blueprint
Sumit Mohanty
Member of Technical Staff
Hortonworks
Agenda
• Cluster Manifest
• Scenarios
• Cluster Blueprint
• Using Cluster Manifest
• What’s next?
Hadoop Cluster Manifest
• Declarative representation of a Hadoop Cluster
– Stack Definition
– Configuration
– Host Details
– Component Mapping
• A common specification across tools and services
• Target audience
– Package authors, Hadoop admins, and system admins
Cluster Manifest:
Package Definition
• Package metadata
• Repository details
• Constituent services and their components
• Service specific metadata
• Configurable parameters
Cluster Manifest:
Package Definition
{
  "schemaVersion" : "1",
  "version" : "1.3.0",
  "author" : "Hortonworks",
  "created" : "03-31-2013",
  "manifestId" : "GUID",
  "stackVersion" : "1.3.0",
  "stackName" : "HDP",
  "context" : […],
  "packages" : {
    "type" : "rpm",
    "osSpecificPackages" : […]
  },
  "services" : [
    {
      "name" : "HDFS",
      "components" : [
        {
          "name" : "NAMENODE",
          "category" : "MASTER",
          …
        },
        { "name" : "DATANODE", … }
      ],
      "configurations" : [
        {
          "type" : "core-site.xml",
          "properties" : [
            {
              "propertyName" : "fs.trash.interval",
              "defaultValue" : "360",
              "propertyDescription" : "..."
            },
            …
          ]
        }
      ],
      "isManageable" : "true",
      "isRequired" : "true",
      "packages" : […],
      "serviceContext" : […]
    }
  ]
}
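As a rough illustration of how a tool might consume a package definition like the one above, the following Python sketch collects the default value of each configurable property, grouped by service and configuration type. The field names follow the slide; the function name `collect_defaults` and the trimmed sample data are assumptions made for this example and are not part of Ambari.

```python
from typing import Dict

# Trimmed package definition; field names follow the slide, elided parts are omitted.
PACKAGE_DEFINITION = {
    "schemaVersion": "1",
    "stackName": "HDP",
    "stackVersion": "1.3.0",
    "services": [
        {
            "name": "HDFS",
            "components": [
                {"name": "NAMENODE", "category": "MASTER"},
                {"name": "DATANODE"},
            ],
            "configurations": [
                {
                    "type": "core-site.xml",
                    "properties": [
                        {
                            "propertyName": "fs.trash.interval",
                            "defaultValue": "360",
                            "propertyDescription": "...",
                        }
                    ],
                }
            ],
        }
    ],
}


def collect_defaults(definition: dict) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Return {service: {config type: {property: default value}}}."""
    defaults: Dict[str, Dict[str, Dict[str, str]]] = {}
    for service in definition.get("services", []):
        per_type = defaults.setdefault(service["name"], {})
        for config in service.get("configurations", []):
            props = per_type.setdefault(config["type"], {})
            for prop in config.get("properties", []):
                props[prop["propertyName"]] = prop["defaultValue"]
    return defaults


if __name__ == "__main__":
    print(collect_defaults(PACKAGE_DEFINITION))
```

Keeping defaults in the package definition means a later configuration document only needs to list the values that differ.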
Cluster Manifest:
Package Configuration
• Configurable parameters and values
– Non-default
– Organization, environment, instance specific
• Service or component specific values
{
  "schemaVersion" : "1", …
  "context" : [
    { "name" : "targetStackVersion", "value" : "1.3.0" }
  ],
  "deployedServices" : ["HDFS", … ],
  "configuration" : [
    {
      "type" : "core-site.xml",
      "properties" : [
        { "name" : "fs.trash.interval", "value" : "300" },
        ...
      ]
    },
    …
  ],
  "configOverrides" : [
    /* delta changes applied on top of the top-level configuration */
    {
      "type" : "SERVICE", "name" : "HDFS",
      "configuration" : [
        {
          "type" : "core-site.xml",
          "properties" : [
            { "name" : "fs.trash.interval", "value" : "480" },
            ...
          ]
        }
      ]
    },
    {
      "type" : "COMPONENT",
      "name" : "JOBTRACKER",
      ...
    }
  ]
}
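The layering of top-level values and overrides can be sketched in Python as follows. This is a minimal sketch, not Ambari's behavior: it assumes that SERVICE overrides are applied on top of the top-level configuration and that COMPONENT overrides are applied last. The slide only says that overrides are deltas on the top-level values, so the relative order of the two override scopes is an assumption, as are the names `effective_config` and `_apply`.

```python
from typing import Dict, Optional

# Trimmed package configuration; field names follow the slide.
PACKAGE_CONFIGURATION = {
    "schemaVersion": "1",
    "deployedServices": ["HDFS"],
    "configuration": [
        {
            "type": "core-site.xml",
            "properties": [{"name": "fs.trash.interval", "value": "300"}],
        }
    ],
    "configOverrides": [
        {
            "type": "SERVICE",
            "name": "HDFS",
            "configuration": [
                {
                    "type": "core-site.xml",
                    "properties": [{"name": "fs.trash.interval", "value": "480"}],
                }
            ],
        }
    ],
}

Config = Dict[str, Dict[str, str]]


def _apply(target: Config, configuration: list) -> None:
    """Merge a list of {type, properties} entries into target in place."""
    for entry in configuration:
        props = target.setdefault(entry["type"], {})
        for prop in entry.get("properties", []):
            props[prop["name"]] = prop["value"]


def effective_config(manifest: dict, service: str,
                     component: Optional[str] = None) -> Config:
    """Resolve the values seen by one service (and optionally one component)."""
    result: Config = {}
    _apply(result, manifest.get("configuration", []))
    # Assumed precedence: SERVICE overrides first, then COMPONENT overrides.
    for scope, name in (("SERVICE", service), ("COMPONENT", component)):
        for override in manifest.get("configOverrides", []):
            if override["type"] == scope and override["name"] == name:
                _apply(result, override.get("configuration", []))
    return result
```

With the sample data, `effective_config(PACKAGE_CONFIGURATION, "HDFS")` picks up the service-level value of `fs.trash.interval`, while a service without an override keeps the top-level value.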
Cluster Manifest:
Host List
• List of hosts
– Can be fully specified
– Or, can be a set of requirements
– Or, can even be non-existent
{
  "schemaVersion" : "1", …
  "context" : […],
  "hostGroups" : [
    {
      "name" : "masterHosts",
      "members" : {
        "count" : "1",
        "hosts" : [
          { "FQDN" : "host1.domain1.com", "ip" : "" }
        ]
      },
      "properties" : […]
    },
    {
      "name" : "slaveHosts",
      "members" : {…},
      "properties" : […]
    },
    {
      "name" : "clientHosts",
      "members" : {…},
      "properties" : [
        { "name" : "host_type", "value" : "High-CPU Medium" }
      ]
    },
    ...
  ]
}
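Because a host group can be fully specified, expressed as requirements, or left without hosts, a consumer of the host list has to work out which hosts are still missing. The Python sketch below shows one way to do that. It assumes that `count` is the desired number of hosts in a group and that an entry with a non-empty `FQDN` is a concrete host; the slides do not spell out either point. The `members` values for `clientHosts` are hypothetical, since the slide elides them.

```python
from typing import Dict

HOST_LIST = {
    "schemaVersion": "1",
    "hostGroups": [
        {
            "name": "masterHosts",
            "members": {
                "count": "1",
                "hosts": [{"FQDN": "host1.domain1.com", "ip": ""}],
            },
            "properties": [],
        },
        {
            "name": "clientHosts",
            # Hypothetical members: two hosts wanted, none named yet.
            "members": {"count": "2", "hosts": []},
            "properties": [{"name": "host_type", "value": "High-CPU Medium"}],
        },
    ],
}


def unresolved_hosts(host_list: dict) -> Dict[str, int]:
    """Return {group name: number of hosts still to be found or provisioned}."""
    missing: Dict[str, int] = {}
    for group in host_list.get("hostGroups", []):
        members = group.get("members", {})
        wanted = int(members.get("count", "0"))
        named = [h for h in members.get("hosts", []) if h.get("FQDN")]
        missing[group["name"]] = max(wanted - len(named), 0)
    return missing
```

A provisioning tool could use the group `properties` (such as `host_type`) to pick machines for the unresolved slots.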
Cluster Manifest:
Host Component Mapping
• A mapping of components to hosts
– Simple component mapping to named hosts
– A set of constraints used to find the best match (e.g. evaluated against host properties)
• System resources
– users, groups, ports, etc.
• Host specific configuration
– Non-homogeneous cluster
Cluster Manifest:
Host Component Mapping
{
  "schemaVersion" : "1", …
  "context" : […],
  "hostResourceMapping" : [
    {
      "hosts" : [
        { "predicate" : "name=*" }
      ],
      "systemResources" : {
        "hadoopGroup" : "hadoop",
        "groups" : [
          { "name" : "hadoop", ... }
        ],
        "users" : [
          {
            "groups" : [ "hadoop" ],
            "name" : "hdfs",
            "type" : "LOCAL"
          }
        ],
        "ports" : […],
        ...
      }
    }
  ],
  "hostComponentMapping" : [
    {
      "hosts" : [
        { "predicate" : "name=masterhosts1" }
      ],
      "configOverrides" : [
        {
          "type" : "core-site.xml",
          "properties" : [
            { "name" : "fs.trash.interval", "value" : "480" },
            ...
          ]
        }
      ],
      "components" : [
        "NAMENODE",
        "JOBTRACKER",
        ...
      ]
    },
    ...
  ]
}
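How the `predicate` entries select hosts is not defined in the slides. The following Python sketch assumes the simplest reading: a predicate of the form `name=<pattern>` is matched against the host name, with `*` acting as a shell-style wildcard. The functions `matches` and `components_for` and the second mapping entry are hypothetical and exist only for illustration.

```python
from fnmatch import fnmatchcase
from typing import List

HOST_COMPONENT_MAPPING = [
    {
        "hosts": [{"predicate": "name=masterhosts1"}],
        "components": ["NAMENODE", "JOBTRACKER"],
    },
    {
        # Hypothetical second entry, added only to exercise the wildcard.
        "hosts": [{"predicate": "name=*"}],
        "components": ["HDFS_CLIENT"],
    },
]


def matches(predicate: str, host_name: str) -> bool:
    """Evaluate a 'name=<pattern>' predicate; other keys are not supported here."""
    key, _, pattern = predicate.partition("=")
    if key.strip() != "name":
        raise ValueError(f"unsupported predicate: {predicate!r}")
    return fnmatchcase(host_name, pattern.strip())


def components_for(host_name: str, mapping: list) -> List[str]:
    """Collect the components of every mapping entry whose predicates match."""
    found: List[str] = []
    for entry in mapping:
        if any(matches(h["predicate"], host_name) for h in entry["hosts"]):
            for component in entry["components"]:
                if component not in found:
                    found.append(component)
    return found
```

A constraint-based mapping, as mentioned on the previous slide, would replace `matches` with an evaluation against host properties rather than the host name.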
Scenarios
• Define cluster templates
– and, host specific templates
• On demand cluster creation
– Cluster extension (e.g. add DataNodes)
• Export cluster manifest
• A uniform “language” across cluster managers and environments
Cluster Blueprint
• A blueprint is a manifest with “holes”
– Typically
• Hostnames
• Config parameters that use a hostname
– But also any config parameters that a Hadoop admin
deems necessary to parameterize
• Blueprint = Manifest + Parameter Values
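Reading a blueprint as a manifest with named holes, supplying parameter values turns it into a concrete manifest. The Python sketch below shows that step under loud assumptions: the `${...}` placeholder syntax, the `fill` function, and the sample property value (including the port) are all hypothetical, since the slides do not define how holes are written.

```python
from string import Template
from typing import Any, Mapping

# A blueprint fragment: host-dependent values are left as holes.
# The ${...} placeholder syntax is hypothetical; the slides do not define one.
BLUEPRINT = {
    "hostGroups": [
        {
            "name": "masterHosts",
            "members": {"hosts": [{"FQDN": "${namenode_host}"}]},
        }
    ],
    "configuration": [
        {
            "type": "core-site.xml",
            "properties": [
                # Illustrative value only; it shows a config parameter that uses a hostname.
                {"name": "fs.default.name", "value": "hdfs://${namenode_host}:8020"}
            ],
        }
    ],
}


def fill(node: Any, params: Mapping[str, str]) -> Any:
    """Recursively substitute parameters; raises KeyError if a hole is left unfilled."""
    if isinstance(node, str):
        return Template(node).substitute(params)
    if isinstance(node, list):
        return [fill(item, params) for item in node]
    if isinstance(node, dict):
        return {key: fill(value, params) for key, value in node.items()}
    return node


if __name__ == "__main__":
    manifest = fill(BLUEPRINT, {"namenode_host": "host1.domain1.com"})
    print(manifest["configuration"][0]["properties"][0]["value"])
```

Failing on an unfilled hole, rather than leaving the placeholder in place, keeps a half-parameterized blueprint from being deployed by mistake.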
Using Cluster Manifest
What’s Next?
• Apache Ambari JIRA AMBARI-1783 is tracking this project
– https://issues.apache.org/jira/browse/AMBARI-1783
– Comments and suggestions are welcome
• In upcoming releases, we will enhance Ambari to
add support for manifests and blueprints

Apache Ambari BOF - Blueprint - Hadoop Summit 2013
