Page1 © Hortonworks Inc. 2014
Enterprise-Grade Rolling Upgrade for a Live
Hadoop Cluster
Sanjay Radia, Vinod Kumar Vavilapalli, Hortonworks Inc
Page2 © Hortonworks Inc. 2014
Agenda
• Introduction
• What is Rolling Upgrade?
• Problem – several key issues to be addressed
– Wire compatibility and side-by-side installs are not sufficient!!
– Must address: data safety, service degradation and disruption
• Enhancements to various components
– Packaging – side-by-side install
– HDFS, YARN, Hive, Oozie
Page3 © Hortonworks Inc. 2014
Hello, my name is Sanjay Radia
•Chief Architect, Founder, Hortonworks
•Part of the Hadoop team at Yahoo! since 2007
–Chief Architect of Hadoop Core at Yahoo!
–Apache Hadoop PMC and Committer
• Prior
– Data center automation, schedulers, virtualization, Java, HA, OSs, File Systems
– (Startup, Sun Microsystems, Inria …)
–Ph.D., University of Waterloo
Page4 © Hortonworks Inc. 2014
HDP Upgrade: Two Upgrade Modes
Stop the Cluster Upgrade
Shut down the services and the cluster, then upgrade.
Traditionally this was the only way.
Rolling Upgrade
Upgrade the cluster and its services while the cluster is
actively running jobs and applications.
Note: Upgrade time is proportional to # nodes, not data size
Enterprises run critical services and data on a Hadoop cluster.
They need a live cluster upgrade that maintains SLAs without degradation.
Page5 © Hortonworks Inc. 2014
But you can Revert to Prior State
Rollback
Revert the bits and state of the cluster and its services back to a
checkpointed state.
Why? This is an emergency procedure.
Downgrade
Downgrade the services and components to the prior version, but
keep any new data and metadata that has been generated.
Why? You are not happy with performance, app compatibility, ….
Page6 © Hortonworks Inc. 2014
But aren’t wire compatibility and side-by-side installs
sufficient for rolling upgrades?
Unfortunately, no!! Not if you want to:
• Preserve data safety
• Keep running jobs/apps running correctly
• Maintain SLAs
• Allow downgrade/rollbacks in case of problems
Page7 © Hortonworks Inc. 2014
Issues that need to be addressed (1)
• Data safety
• HDFS’s upgrade checkpoint does not work for rolling upgrade
• Service degradation – note that every daemon is restarted in rolling fashion
• HDFS write pipeline
• YARN App Masters restart
• Node manager restart
• Hive server is processing client queries – it cannot restart to the new version without loss
• Clients must not see failures – many components do not have retry logic
BUT Hadoop deals with failures – it fixes pipelines and restarts tasks –
so what is the big deal!?
Service degradation will be high because every daemon is restarted.
Page8 © Hortonworks Inc. 2014
Issues that need to be addressed (2)
• Maintaining the job submitter’s context (correctness)
• YARN tasks get their context from the local node
– In the past the submitter’s and the node’s contexts were identical
– But with RU, a node’s binaries are being upgraded and hence may be inconsistent with the submitter’s
- Half of a job could execute with the old binaries and the other half with the new ones!!
• Persistent state
• Backward compatibility for upgrade (or convert)
• Forward compatibility for downgrade (or convert)
• Wire compatibility
• With clients (forward and backward)
• Internally (Between Masters and Slaves or Peers)
– Note: the upgrade is in a rolling fashion
Page9 © Hortonworks Inc. 2014
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page10 © Hortonworks Inc. 2014
Packaging: Side-by-side Installs (1)
• Need side-by-side installs of multiple versions on the same node (package sketch below)
• Some components are at version N, while others are at N+1
• For the same component, some daemons may be at version N and others at N+1 on the same node (e.g. NN and DN)
• HDP’s solution: Use OS-distro standard packaging solution
• Rejected a proprietary packaging solution (no lock-in)
• Want to support RU via Ambari and manually
• Standard packaging solutions like RPMs have useful tools and mechanisms
– Tools to install, uninstall, query, etc
– Manage dependencies automatically
– Admins do not need to learn new tools and formats
• Side benefit for “stop-the-world” upgrades:
• Can install the new binaries before the shutdown
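As a minimal sketch of what versioned package names buy you – the package names follow the slide (hadoop_2_2_0_0), while the repository details and exact version strings are illustrative assumptions:

    # Because the version is baked into the package name, two releases
    # install side by side instead of upgrading each other in place.
    yum install -y hadoop_2_2_0_0    # existing version, lands under /usr/hdp/2.2.0.0
    yum install -y hadoop_2_3_0_0    # new version, lands under /usr/hdp/2.3.0.0
    rpm -qa | grep '^hadoop_'        # both versions now coexist on the node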
Page11 © Hortonworks Inc. 2014
Packaging: Side-by-side installs (2)
• Layout: side-by-side
• /usr/hdp/2.2.0.0/hadoop
• /usr/hdp/2.2.0.0/hive
• /usr/hdp/2.3.0.0/hadoop
• /usr/hdp/2.3.0.0/hive
• Define what is current for each component’s daemons and clients (symlink sketch below)
• /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop
• /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop
• /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop
• Distro-select helps you manage the version switch
• Our solution: the package name contains the version number:
• E.g., hadoop_2_2_0_0 is the RPM package name itself
– hadoop_2_3_0_0 is a different peer package
• Bin commands point to current:
/usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
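A minimal sketch of the two-level symlink scheme above; the paths come from the slide, and the ln command stands in for what the selector tool does under the hood:

    # Per-daemon "current" pointers let the NN move to 2.3.0.0 while the
    # DN and clients stay on 2.2.0.0 (the layout shown above):
    ls -l /usr/hdp/current/
    #   hdfs-nn       -> /usr/hdp/2.3.0.0/hadoop
    #   hdfs-dn       -> /usr/hdp/2.2.0.0/hadoop
    #   hadoop-client -> /usr/hdp/2.2.0.0/hadoop
    readlink -f /usr/bin/hadoop   # -> /usr/hdp/current/hadoop-client/bin/hadoop
    # Upgrading just the DN is one atomic symlink swap:
    ln -sfn /usr/hdp/2.3.0.0/hadoop /usr/hdp/current/hdfs-dn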
Page12 © Hortonworks Inc. 2014
Packaging: Side-by-side installs (3)
• distro-select tool to select the current binary (usage sketch below)
• Per-component, Per-daemon
• Maintain stack consistency – that is what QE tested
• Each component refers to its siblings of the same stack version
• Each component knows the “hadoop home” of the same stack
– Wrapper bin-scripts set this up
• Config updates can optionally be synchronized with the binary upgrade
• Configs can sit in their old location
• But what if the new binary version requires a slightly different config?
• Each binary version has its own config pointer
– /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
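A hedged sketch of the selector tool in use. The deck calls it distro-select; in shipped HDP 2.2 the command surfaces as hdp-select, and the subcommands, component names, and version string below are assumptions based on that tool:

    hdp-select versions                       # list installed stack versions
    hdp-select status                         # show what each component points at
    hdp-select set hadoop-hdfs-datanode 2.3.0.0-2557   # retarget only the DN
    # Each installed version carries its own config pointer, e.g.:
    #   /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf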
Page13 © Hortonworks Inc. 2014
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page14 © Hortonworks Inc. 2014
HDFS Enhancements (1)
Data safety
• Since 2007, HDFS has supported an upgrade checkpoint
• Backups of HDFS are not practical – too large
• Protects against bugs in the new HDFS version deleting files
• Standard practice to use for ALL upgrades, even patch releases
• But this only works for “stop-the-world” full upgrade and does not support downgrade
• Irresponsible to do rolling upgrade without such a mechanism
HDP 2.2 has an enhanced upgrade checkpoint (HDFS-5535; command sketch below)
• Markers for rollback
• “Hardlinks” to protect against deletes due to bugs in the new version of HDFS code
• Old scheme had hardlinks but we now delay the deletes
• Added downgrade capability
• Protobuf-based fsImage for compatible extensibility
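As a sketch, the rolling-upgrade checkpoint is driven from dfsadmin; the commands are per the Apache HDFS rolling-upgrade documentation, with the surrounding flow paraphrased:

    hdfs dfsadmin -rollingUpgrade prepare    # create the rollback image/markers
    hdfs dfsadmin -rollingUpgrade query      # poll until the rollback image is ready
    # ... roll through the NameNodes and DataNodes ...
    hdfs dfsadmin -rollingUpgrade finalize   # commit; delayed deletes are released
    # Downgrade/rollback remain possible only until finalize.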
Page15 © Hortonworks Inc. 2014
HDFS Enhancements (2)
Minimize service degradation and retain data safety
• Fast datanode restart (HDFS-5498; command sketch below)
• Write pipeline – every DN will be upgraded, hence many write pipelines
will break and be repaired
• Umbrella Jira HDFS-5535
– Repair it to the same DN during RU (avoid replica data copy)
– Retain same number of replicas in pipeline
• Upgrade the HA standby and fail over (NN HA has been available for a long time)
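A hedged per-DataNode sketch; the dfsadmin commands are from the Apache rolling-upgrade documentation, while the host name and IPC port (default 50020 in Hadoop 2) are illustrative:

    # Tell the DN to shut down for upgrade, so writers briefly wait on the
    # pipeline instead of failing it (fast restart, HDFS-5498):
    hdfs dfsadmin -shutdownDatanode dn-host:50020 upgrade
    hdfs dfsadmin -getDatanodeInfo dn-host:50020   # poll until it is down
    # ... swap the DN's binaries to the new version, restart, next DN ...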
Page16 © Hortonworks Inc. 2014
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page17 © Hortonworks Inc. 2014
YARN Enhancements: Minimize Service Degradation
• YARN RM retains app/job queue (2013)
• YARN RM HA (2014)
• Note: this retains the queues, but ALL jobs are restarted
• YARN RM can restart while retaining jobs (2015)
Page18 © Hortonworks Inc. 2014
YARN Enhancements: Minimize Service Degradation
• A restarted YARN NodeManager retains existing containers (2015)
• Recall that restarting containers would cause serious SLA degradation (config sketch below)
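A minimal yarn-site.xml sketch of the Apache settings behind these two features; the property names are from Apache YARN, while the values are illustrative assumptions:

    <!-- RM restart that keeps running apps alive (work-preserving recovery) -->
    <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
    <property><name>yarn.resourcemanager.work-preserving-recovery.enabled</name><value>true</value></property>
    <property><name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
    <!-- NM restart that keeps its containers running -->
    <property><name>yarn.nodemanager.recovery.enabled</name><value>true</value></property>
    <property><name>yarn.nodemanager.recovery.dir</name><value>/var/lib/hadoop-yarn/nm-recovery</value></property>
    <!-- NM must bind a fixed port so a restarted NM is the "same" node -->
    <property><name>yarn.nodemanager.address</name><value>0.0.0.0:45454</value></property>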
Page19 © Hortonworks Inc. 2014
YARN Enhancement: Compatibility
• Versioning of state-stores of RM and NMs
• Compatible evolution of tokens over time
• Wire compatibility between mixed versions of RM
Page20 © Hortonworks Inc. 2014
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page21 © Hortonworks Inc. 2014
Retaining Job/App context
Previously, a job/app used libraries from the local node
• This worked because the client node and compute nodes had the same version
• But during RU, the node manager has multiple versions
• A job must use the same version the client used when submitting it
• Solution:
• Framework libraries are now installed in HDFS
• Client context sent as a “distro-version” variable in the job config (config sketch below)
• Has side benefits
– Frameworks are now installed on a single node and then uploaded to HDFS
• Note: Oozie was also enhanced to maintain a consistent context
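A hedged mapred-site.xml sketch of framework libraries living in HDFS. The property names are from Apache MapReduce; the HDFS path, the ${hdp.version} substitution (standing in for the deck’s “distro-version” variable), and the classpath value are illustrative assumptions:

    <property>
      <name>mapreduce.application.framework.path</name>
      <value>hdfs:/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
    </property>
    <property>
      <name>mapreduce.application.classpath</name>
      <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/common/*</value>
    </property>
    <!-- Each job unpacks the tarball version recorded at submit time, so a
         task never mixes old and new binaries mid-upgrade. -->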
Page22 © Hortonworks Inc. 2014
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page23 © Hortonworks Inc. 2014
Hive Enhancements
• Fast restarts + client-side reconnection
• Hive metastore and Hive client
• HiveServer2: a stateful server that submits the client’s query
• Need to keep it running until the old queries complete
• Solution:
• Allow multiple Hive servers to run, each registered in ZooKeeper (config sketch below)
• New client requests go to new servers
• The old server completes old queries but does not receive any new ones
– The old server is removed from ZooKeeper
• Side benefits
• HA + load-balancing solution for HiveServer2
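A hedged hive-site.xml sketch of HiveServer2 dynamic service discovery; the property names are from Apache Hive, the hosts are illustrative:

    <property><name>hive.server2.support.dynamic.service.discovery</name><value>true</value></property>
    <property><name>hive.zookeeper.quorum</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>

Clients then connect through ZooKeeper instead of a fixed server, e.g.
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2,
and an old server can be taken out of rotation (assumed CLI form) with
hive --service hiveserver2 --deregister <version>.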
Page24 © Hortonworks Inc. 2014
Automated Rolling Upgrade
• Via Ambari
• Via your own cluster management scripts
Page25 © Hortonworks Inc. 2014
HDP Rolling Upgrades Runbook
The runbook flows through these stages:
• Pre-requisites – HA, configs
• Prepare – install bits, DB backups, HDFS checkpoint
• Rolling Upgrade
• Finalize
Until you finalize, you can fall back via:
• Rolling Downgrade
• Rollback – NOT rolling; shut down all services
Note: Upgrade time is proportional to # nodes, not data size
Page28 © Hortonworks Inc. 2014
Both Manual and Automated Rolling Upgrade
• Ambari supports fully automated upgrades
• Verifies prerequisites
• Performs HDFS upgrade-checkpoint, prompts for DB backups
• Performs rolling upgrade
• All the components, in the right order
• Smoke tests at each critical stage
• Opportunities for Admin verification at critical stages
• Downgrade if you change your mind
• We have published the runbook for those who do not use Ambari
• You can do it manually or automate your own process
Page29 © Hortonworks Inc. 2014
Runbook: Rolling Upgrade
Ambari has an automated process for rolling upgrades. Services are
switched over to the new version in rolling fashion; any components not
installed on the cluster are skipped.
Upgrade order:
• Zookeeper
• Ranger
• Core Masters – HDFS, YARN, HBase
• Core Slaves – HDFS, YARN, HBase
• Hive
• Oozie
• Falcon
• Clients – HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout
• Kafka
• Knox
• Storm
• Slider
• Flume
• Hue
• Finalize
Page30 © Hortonworks Inc. 2014
Runbook: Rolling Downgrade
Services are switched back to the prior version in rolling fashion,
then the downgrade is finalized:
• Zookeeper
• Ranger
• Core Masters
• Core Slaves
• Hive
• Oozie
• Falcon
• Clients
• Kafka
• Knox
• Storm
• Slider
• Flume
• Hue
• Finalize
Page31 © Hortonworks Inc. 2014
Summary
• Enterprises run critical services and data on a Hadoop cluster.
• Need a live cluster upgrade that maintains SLAs without degradation
• We enhanced Hadoop components for enterprise-grade rolling upgrade
• Non-proprietary packaging using OS-standard solutions (RPMs, Debs, …)
• Data safety
– HDFS checkpoints and write-pipelines
• Maintain SLAs – solve a number of service degradation problems
– HDFS write pipelines, YARN RM, NM state recovery, Hive, …
• Jobs/apps continue to run correctly with the right context
• Allow downgrade/rollbacks in case of problems
• Are all enhancements truly open source and pushed back to Apache?
• Yes, of course – that is how Hortonworks does business …
Page32 © Hortonworks Inc. 2014
Backup slides
Page33 © Hortonworks Inc. 2014
Why didn’t you use (Linux) alternatives?
• Alternatives generally keeps one version active, not two (sketch below)
• We need to move some services as a pack (clients)
• We need to support managing confs and binaries together and separately
• Maybe we could have done it, but it was getting complex …..
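As a minimal sketch of the limitation, assuming the standard update-alternatives CLI; the paths mirror the layout earlier in the deck:

    # alternatives manages ONE active target per link group, node-wide:
    update-alternatives --install /usr/bin/hadoop hadoop /usr/hdp/2.2.0.0/hadoop/bin/hadoop 100
    update-alternatives --install /usr/bin/hadoop hadoop /usr/hdp/2.3.0.0/hadoop/bin/hadoop 200
    update-alternatives --set hadoop /usr/hdp/2.3.0.0/hadoop/bin/hadoop
    # Now every caller sees 2.3.0.0 – there is no per-daemon pointer that
    # keeps the DN on 2.2.0.0 while the NN runs 2.3.0.0.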
Editor's Notes
Slide 7:
– HDFS write pipeline: slows down writes, risks data
– YARN App Masters restart: app failure if the App Master does not have persistent state
– Node manager restart: tasks fail and restart, SLAs degrade
– Hive server is processing client queries: it cannot restart for the new version
– Clients must not see failures: many components do not have retry logic