Apache Ambari is now the preferred way to provision, manage, and monitor Hadoop clusters. It simplifies cluster operations such as upgrades, configuration management, and service management. Ambari has supported automated Rolling Upgrades since release 2.0. Release 2.2.0.0 added Express Upgrades, which upgrade large clusters faster but require cluster downtime.
This talk covers the planning and execution of Hadoop cluster upgrades from an operational perspective. It also covers the internals of the upgrade process, including stages such as pre-upgrade, backup, service checks, configuration upgrades, and finalization. Finally, it covers troubleshooting upgrade failures, monitoring services during upgrades, and post-upgrade actions. The presentation concludes with a case study of the upgrade process on a large cluster, including planning the upgrade, the time required for the various stages, and troubleshooting.
stack.upgrade.auto.retry.timeout.mins : Number of minutes to keep retrying a failed command. A value between 15 and 20 minutes is recommended. The default is 0, which leaves the feature turned off.
stack.upgrade.auto.retry.check.interval.secs : Thread sleep interval between checks, in seconds. Defaults to 20 seconds.
stack.upgrade.auto.retry.command.names.to.ignore : Commands whose names are in this list are not auto-retried. Each name is enclosed in quotes and names are separated by commas. Default value: "ComponentVersionCheckAction","FinalizeUpgradeAction"
stack.upgrade.auto.retry.command.details.to.ignore : Commands whose details are in this list are not auto-retried. Each entry is enclosed in quotes and entries are separated by commas. Default value: "Execute HDFS Finalize"
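The interaction of the four properties can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not Ambari's implementation: the names AutoRetryConfig, parse_quoted_list and should_auto_retry are hypothetical, and the case-insensitive substring match against the ignore lists is an assumption.

```python
import csv
from dataclasses import dataclass, field
from typing import List


def parse_quoted_list(value: str) -> List[str]:
    """Split a value such as '"A","B"' into ['A', 'B']."""
    if not value.strip():
        return []
    row = next(csv.reader([value], skipinitialspace=True))
    return [item.strip() for item in row]


@dataclass
class AutoRetryConfig:
    # Defaults mirror the documented defaults of the four properties.
    timeout_mins: int = 0
    check_interval_secs: int = 20
    names_to_ignore: List[str] = field(
        default_factory=lambda: parse_quoted_list(
            '"ComponentVersionCheckAction","FinalizeUpgradeAction"'
        )
    )
    details_to_ignore: List[str] = field(
        default_factory=lambda: parse_quoted_list('"Execute HDFS Finalize"')
    )

    @property
    def enabled(self) -> bool:
        # A timeout of 0 means auto-retry is turned off.
        return self.timeout_mins > 0


def should_auto_retry(cfg: AutoRetryConfig, command_name: str,
                      command_detail: str, elapsed_secs: int) -> bool:
    """Decide whether a failed command would be retried (assumed matching rule)."""
    if not cfg.enabled:
        return False
    if elapsed_secs >= cfg.timeout_mins * 60:
        return False  # retry window has expired
    name = command_name.lower()
    detail = command_detail.lower()
    if any(ignored.lower() in name for ignored in cfg.names_to_ignore):
        return False
    if any(ignored.lower() in detail for ignored in cfg.details_to_ignore):
        return False
    return True
```

With the defaults nothing is retried, because the timeout is 0. Setting timeout_mins to 15 allows retries for ordinary commands for 15 minutes while still skipping the finalize actions.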
Based on the example above, the status can be changed from “HOLDING_FAILED” to “PENDING” with the “Retry” button or with the following API call:
PUT http://vpamb2010.novalocal:8080/api/v1/clusters/Ambari21/upgrades/441/upgrade_groups/106/upgrade_items/1
{"UpgradeItem": { "status" : "PENDING" } }
Then refresh the Ambari server page to continue the upgrade or downgrade.
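The same PUT call can be prepared from a script. The following is a minimal sketch using only the Python standard library. The function name build_retry_request and the credentials in the usage note are hypothetical, and it assumes the server accepts HTTP basic authentication together with the X-Requested-By header that Ambari expects on modifying requests.

```python
import base64
import json
import urllib.request


def build_retry_request(base_url, cluster, upgrade_id, group_id, item_id,
                        user, password):
    """Build (but do not send) the PUT request that resets an upgrade item to PENDING."""
    url = (
        f"{base_url}/api/v1/clusters/{cluster}/upgrades/{upgrade_id}"
        f"/upgrade_groups/{group_id}/upgrade_items/{item_id}"
    )
    body = json.dumps({"UpgradeItem": {"status": "PENDING"}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "X-Requested-By": "ambari",  # assumed header value
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")
```

To send it, pass the result to urllib.request.urlopen, for example with build_retry_request("http://vpamb2010.novalocal:8080", "Ambari21", 441, 106, 1, "admin", "secret"), where the user name and password are placeholders.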
/**
* Not queued for a host.
*/
PENDING,
/**
* Queued for a host, or has already been sent to host, but host did not answer yet.
*/
QUEUED,
/**
* Host reported it is working, received an IN_PROGRESS command status from host.
*/
IN_PROGRESS,
/**
* Task is holding, waiting for command to proceed to completion.
*/
HOLDING,
/**
* Host reported success
*/
COMPLETED,
/**
* Failed
*/
FAILED,
/**
* Task is holding after a failure, waiting for command to skip or retry.
*/
HOLDING_FAILED,
/**
* Host did not respond in time
*/
TIMEDOUT,
/**
* Task is holding after a time-out, waiting for command to skip or retry.
*/
HOLDING_TIMEDOUT,
/**
* Operation was abandoned
*/
ABORTED,
/**
* The operation failed and was automatically skipped.
*/
SKIPPED_FAILED;
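According to the comments in the status listing above, only the two states that hold after a failure or a time-out wait for a skip or retry. A script that resets items to PENDING could guard on that. The sketch below mirrors the listing in Python for illustration only. The names Status, RETRYABLE and can_reset_to_pending are hypothetical and are not part of Ambari.

```python
from enum import Enum, auto


class Status(Enum):
    # Same names and order as the status listing above.
    PENDING = auto()
    QUEUED = auto()
    IN_PROGRESS = auto()
    HOLDING = auto()
    COMPLETED = auto()
    FAILED = auto()
    HOLDING_FAILED = auto()
    TIMEDOUT = auto()
    HOLDING_TIMEDOUT = auto()
    ABORTED = auto()
    SKIPPED_FAILED = auto()


# States documented as "waiting for command to skip or retry".
RETRYABLE = frozenset({Status.HOLDING_FAILED, Status.HOLDING_TIMEDOUT})


def can_reset_to_pending(status: Status) -> bool:
    """Return True if the item is holding after a failure or time-out."""
    return status in RETRYABLE
```

A caller would parse the status string returned for the upgrade item with Status[name] and issue the retry only when can_reset_to_pending returns True.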