Chef Patterns at Bloomberg Scale
WHY A VM?
• LIGHTWEIGHT PRE-REQUISITE
• Low memory/Storage Requirements
• RAPID DEPLOYMENT
• Vagrant for Bring-Up
• Vagrant for Re-Configuration
• EASY RELEASE MANAGEMENT
• MULTIPLE VM PER HYPERVISOR
• Multiple Clusters
• EASY RELOCATION
SERVICES OFFERED
• REPOSITORIES
• APT
• Ruby Gems
• Static Files (Chef!)
• CHEF SERVER
• KERBEROS KDC
• PXE SERVER
• DHCP/TFTP Server
• Cobbler (https://github.com/bloomberg/cobbler-cookbook)
• Bridged Networking (for test VMs)
• STRONG ISOLATION
BUILDING BOOTSTRAP
• CHEF AND VAGRANT
• Generic Image (Jenkins)
• NETWORK CONFIGURATION
• CORRECTING “KNIFE.RB”
• CHEF SERVER RECONFIGURATION
• CLEAN UP (CHEF REST API)
• CONVERT BOOTSTRAP TO BE AN ADMIN CLIENT
• Secrets/Keys
BUILDING BOOTSTRAP
• CHEF-SOLO PROVISIONER
# Chef provisioning
bootstrap.vm.provision "chef_solo" do |chef|
  chef.environments_path = [[:vm, ""]]
  chef.environment = env_name
  chef.cookbooks_path = [[:vm, ""]]
  chef.roles_path = [[:vm, ""]]
  chef.add_recipe("bcpc::bootstrap_network")
  chef.log_level = "debug"
  chef.verbose_logging = true
  chef.provisioning_path = "/home/vagrant/chef-bcpc/"
end
• CHEF SERVER RECONFIGURATION
• NGINX, SOLR, RABBITMQ
# Reconfigure chef-server
bootstrap.vm.provision :shell, :inline => "chef-server-ctl reconfigure"
BUILDING BOOTSTRAP
• CLEAN UP (REST API)
ruby_block "cleanup-old-environment-databag" do
  block do
    rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                          "/etc/chef-server/admin.pem")
    rest.delete("/environments/GENERIC")
    rest.delete("/data/configs/GENERIC")
  end
end
ruby_block "cleanup-old-clients" do
  block do
    # Clients that must survive the cleanup
    system_clients = ["chef-validator", "chef-webui"]
    rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                          "/etc/chef-server/admin.pem")
    rest.get_rest("/clients").each do |client|
      unless system_clients.include?(client.first)
        rest.delete("/clients/#{client.first}")
      end
    end
  end
end
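The filtering step of the client cleanup can be tried without a Chef server. This is a minimal sketch in plain Ruby, assuming the `/clients` listing is a hash of client name to URL (which is why the recipe reads `client.first`). The helper `clients_to_delete` and the sample listing are hypothetical.

```ruby
# Hypothetical helper: pick the clients the cleanup would delete.
# Assumes the listing is a Hash of client name => URL.
SYSTEM_CLIENTS = ["chef-validator", "chef-webui"].freeze

def clients_to_delete(client_listing, system_clients = SYSTEM_CLIENTS)
  client_listing.keys.reject { |name| system_clients.include?(name) }
end

# Made-up sample listing, for illustration only.
listing = {
  "chef-validator" => "https://chef.example/clients/chef-validator",
  "chef-webui"     => "https://chef.example/clients/chef-webui",
  "old-node-1"     => "https://chef.example/clients/old-node-1"
}

clients_to_delete(listing).each do |name|
  puts "would delete /clients/#{name}"
end
```

Keeping the selection in a pure function makes it possible to review what would be removed before any DELETE request is sent.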
BUILDING BOOTSTRAP
• CONVERT TO ADMIN (BOOTSTRAP_CONFIG.RB)
ruby_block "convert-bootstrap-to-admin" do
  block do
    rest = Chef::REST.new(node[:chef_client][:server_url],
                          "admin",
                          "/etc/chef-server/admin.pem")
    # Promote this node's client to admin and assign the bootstrap role
    rest.put_rest("/clients/#{node[:hostname]}", { :admin => true })
    rest.put_rest("/nodes/#{node[:hostname]}",
                  { :name => node[:hostname],
                    :run_list => ['role[BCPC-Bootstrap]'] })
  end
end
DEPLOY TO HDFS
• USE CHEF DIRECTORY RESOURCE
• USE CUSTOM PROVIDER
• https://github.com/bloomberg/chef-bach/blob/master/cookbooks/bcpc-hadoop/libraries/hdfsdirectory.rb
directory "/projects/myapp" do
  mode 0755            # octal; a bare decimal 755 would set the wrong bits
  owner "foo"
  recursive true
  provider BCPC::HdfsDirectory
end
DEPLOY KAFKA TOPIC
• USE LWRP
• Dynamic Topic; Right Zookeeper
• PROVIDER CODE AVAILABLE AT
• https://github.com/mthssdrbrg/kafka-cookbook/pull/49
# Kafka Topic Resource
actions :create, :update
attribute :name, :kind_of => String , :name_attribute => true
attribute :partitions, :kind_of => Integer, :default => 1
attribute :replication, :kind_of => Integer, :default => 1
KERBEROS
• KEYTABS
• Per Service / Host
• Up to 10 Keytabs per Host
• WHAT ABOUT MULTI HOMED HOSTS?
• Hadoop imputes _HOST
• PROVIDERS
• WebHDFS uses SPNEGO
• SYSTEM ROLE ACCOUNTS
• TENANT ROLE ACCOUNTS
• AVAILABLE AT
• https://github.com/bloomberg/chef-bach/tree/kerberos
LOGIC INJECTION
• COMPLETE CODE CAN BE FOUND AT
• Community cookbook
• https://github.com/mthssdrbrg/kafka-cookbook#controlling-restart-of-kafka-brokers-in-a-cluster
• Wrapper custom recipe
• https://github.com/bloomberg/chef-bach/blob/rolling_restart/cookbooks/kafka-bcpc/recipes/coordinate.rb
Statutory Warning
Code snippets are edited to fit the slides, which may have introduced logic incoherence, bugs, and unreadability. Reader's discretion requested.
LOGIC INJECTION
• WE USE COMMUNITY COOKBOOKS
• Take care of standard installation, enabling, and starting of services
• NEED TO ADD LOGIC TO COOKBOOK RECIPES
• Take action on a service only when conditions are satisfied
• Take action on a service based on dependent service state
LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK:
template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  # ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end
LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK:
template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  # ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :restart, 'service[kafka]', :delayed
  end
end
#----- Remove ----#
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end
#----- Remove ----#
LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK 2.0:
template ::File.join(node.kafka.config_dir, 'server.properties') do
  source 'server.properties.erb'
  # ...
  helpers(Kafka::Configuration)
  if restart_on_configuration_change?
    notifies :create, 'ruby_block[pre-shim]', :immediately
  end
end
#----- Replace ----#
include_recipe node["kafka"]["start_coordination"]["recipe"]
#----- Replace ----#
LOGIC INJECTION
COOKBOOK COORDINATOR RECIPE:
ruby_block 'pre-shim' do
  block do
    # pre-restart no-op
  end
  action :nothing   # runs only when notified by the template
  notifies :restart, 'service[kafka]', :delayed
end
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
end
LOGIC INJECTION
WRAPPER COORDINATOR RECIPE:
ruby_block 'pre-shim' do
  block do
    # pre-restart done here
  end
  action :nothing   # runs only when notified by the template
  notifies :restart, 'service[kafka]', :delayed
end
service 'kafka' do
  provider kafka_init_opts[:provider]
  supports start: true, stop: true, restart: true, status: true
  action kafka_service_actions
  notifies :create, 'ruby_block[post-shim]', :immediately
end
ruby_block 'post-shim' do
  block do
    # clean-up done here
  end
  action :nothing   # runs only when notified by the service
end
SERVICE ON DEMAND
• COMMON SERVICE WHICH CAN BE REQUESTED
• Copy log files from applications into a centralized location
• A single location for users to review logs, which also helps with security
• Service available on all the nodes
• Applications can request the service dynamically
SERVICE ON DEMAND
• NODE ATTRIBUTE TO STORE SERVICE REQUESTS
default['bcpc']['hadoop']['copylog'] = {}
• DATA STRUCTURE TO MAKE SERVICE REQUESTS
{
  'app_id' => { 'logfile' => "/path/file_name_of_log_file",
                'docopy'  => true   # or false
              },
  ...
}
SERVICE ON DEMAND
• APPLICATION RECIPES MAKE SERVICE REQUESTS
#
# Updating node attributes to copy HBase master log file to HDFS
#
node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
  'docopy'  => true
}
node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
  'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
  'docopy'  => true
}
SERVICE ON DEMAND
• RECIPE FOR THE COMMON SERVICE
node['bcpc']['hadoop']['copylog'].each do |id, f|
  if f['docopy']
    # One Flume agent configuration per requested log file
    template "/etc/flume/conf/flume-#{id}.conf" do
      source "flume_flume-conf.erb"
      action :create
      # ...
      variables(:agent_name => "#{id}",
                :log_location => "#{f['logfile']}")
      notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
    end
    service "flume-agent-multi-#{id}" do
      supports :status => true, :restart => true, :reload => false
      service_name "flume-agent-multi"
      action :start
      start_command "service flume-agent-multi start #{id}"
      restart_command "service flume-agent-multi restart #{id}"
      status_command "service flume-agent-multi status #{id}"
    end
  end
end
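The request-to-agent mapping performed by the common-service recipe can be sketched in plain Ruby, outside of Chef. The helper name `flume_configs` and the sample host name are hypothetical. The attribute layout and the config path pattern are taken from the slides.

```ruby
# Hypothetical helper: map copylog requests to the Flume config files
# the common-service recipe would render. Requests whose 'docopy' flag
# is false or missing are skipped.
def flume_configs(copylog)
  copylog.select { |_id, req| req['docopy'] }
         .map { |id, req| ["/etc/flume/conf/flume-#{id}.conf", req['logfile']] }
         .to_h
end

# Sample requests, as two application recipes might register them.
requests = {
  'hbase_master'     => { 'logfile' => '/var/log/hbase/hbase-master-host1.log',
                          'docopy'  => true },
  'hbase_master_out' => { 'logfile' => '/var/log/hbase/hbase-master-host1.out',
                          'docopy'  => false }
}

flume_configs(requests).each { |conf, log| puts "#{conf} <- #{log}" }
```

Because each application only writes its own key under `copylog`, requests from different recipes do not collide, and setting `docopy` to false withdraws a request without deleting the entry.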
PLUGGABLE ALERTS
• SINGLE SOURCE FOR MONITORED STATS
• Allows users to visualize stats across different parameters
• Did not want the alerting system to duplicate the stats collection
• Need to feed data to the alerting system to generate alerts
PLUGGABLE ALERTS
• ATTRIBUTE WHERE USERS CAN DEFINE ALERTS
default["bcpc"]["hadoop"]["graphite"]["queries"] = {
  'hbase_master' => [
    {
      # Query to pull stats from the data source
      'type'  => "jmx",
      'query' => "memory.NonHeapMemoryUsage_committed",
      'key'   => "hbasenonheapmem",
      # Alert criteria
      'trigger_val'  => "max(61,0)",
      'trigger_cond' => "=0",
      'trigger_name' => "HBaseMasterAvailability",
      'trigger_dep'  => ["NameNodeAvailability"],
      'trigger_desc' => "HBase master seems to be down",
      'severity'     => 1
    }, {
      'type'  => "jmx",
      'query' => "memory.HeapMemoryUsage_committed",
      'key'   => "hbaseheapmem",
      ...
    }, ...],
  'namenode' => [...] ...}
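Since alerts are plain attribute data, they can be checked before they reach the alerting system. This is a minimal sketch, assuming that each entry in `trigger_dep` refers to the `trigger_name` of another query. The slides suggest this but do not state it. The helper `undefined_trigger_deps` is hypothetical.

```ruby
# Hypothetical check: every name listed in 'trigger_dep' should be
# defined as a 'trigger_name' somewhere in the queries attribute.
def undefined_trigger_deps(queries)
  entries = queries.values.flatten
  defined = entries.map { |q| q['trigger_name'] }.compact
  entries.flat_map { |q| q['trigger_dep'] || [] }.uniq - defined
end

# Trimmed-down sample in the shape of the attribute shown on the slide.
queries = {
  'hbase_master' => [
    { 'trigger_name' => 'HBaseMasterAvailability',
      'trigger_dep'  => ['NameNodeAvailability'] }
  ],
  'namenode' => [
    { 'trigger_name' => 'NameNodeAvailability' }
  ]
}

missing = undefined_trigger_deps(queries)
if missing.empty?
  puts "all trigger dependencies are defined"
else
  puts "undefined trigger dependencies: #{missing.join(', ')}"
end
```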
TEMPLATE PITFALLS
• LIBRARY FUNCTION CALLS IN WRAPPER COOKBOOKS
• Community cookbook provider accepts template as an attribute
• Template passed from wrapper makes a library function call
• Wrapper recipe includes the module of library function
TEMPLATE PITFALLS
• WRAPPER RECIPE
...
Chef::Resource.send(:include, Bcpc::OSHelper)
...
cobbler_profile "bcpc_host" do
  kickstart "cobbler.bcpc_ubuntu_host.preseed"
  distro "ubuntu-12.04-mini-x86_64"
end
...
• FUNCTION CALL IN TEMPLATE
...
d-i passwd/user-password-crypted password <%="#{get_config(@node, 'cobbler-root-password-salted')}"%>
d-i passwd/user-uid string
...
DYNAMIC RESOURCES
• ANTI-PATTERN?
ruby_block "create namenode directories" do
  block do
    node[:bcpc][:storage][:mounts].each do |d|
      # Build and converge a directory resource at converge time
      dir = Chef::Resource::Directory.new("#{mount_root}/#{d}/dfs/nn",
                                          run_context)
      dir.owner "hdfs"
      dir.group "hdfs"
      dir.mode 0755
      dir.recursive true
      dir.run_action :create
      # Fix ownership only when the tree is not already owned by hdfs
      exe = Chef::Resource::Execute.new("fixup nn owner", run_context)
      exe.command "chown -Rf hdfs:hdfs #{mount_root}/#{d}/dfs"
      exe.only_if {
        Etc.getpwuid(File.stat("#{mount_root}/#{d}/dfs/").uid).name != "hdfs"
      }
      exe.run_action :run   # run the fix-up immediately, like the directory above
    end
  end
end
DYNAMIC RESOURCES
• SYSTEM CONFIGURATION
• Lengthy Configuration of a Storage Controller
• Setting Attributes at Converge Time
• Compile Time Actions?
• MUST WRAP IN RUBY_BLOCKS
• Does not Update the Resource Collection
• Lazy Everywhere:
• Guards: not_if{lazy{node[…]}.call.map{…}}
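Deferred evaluation matters here because Chef evaluates recipe code in a compile phase and runs resource actions in a later converge phase. The effect can be simulated in plain Ruby. In this sketch the `node` hash and the `'controller'` key are stand-ins for illustration, not attributes from the deck.

```ruby
# Plain-Ruby simulation of the two phases; no Chef required.
node = { 'controller' => nil }               # value not yet known at "compile time"

eager_value = node['controller']             # read immediately: captures nil
lazy_value  = lambda { node['controller'] }  # read deferred until called

# "Converge time": an earlier step discovers the value and stores it.
node['controller'] = 'configured'

puts "eager: #{eager_value.inspect}"
puts "lazy:  #{lazy_value.call.inspect}"
```

The eager read still holds the compile-time value, while the deferred read sees what was set during convergence. Guards written as blocks and attributes wrapped in `lazy` rely on the same deferral.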
SERVICE RESTART
• WE USE JMXTRANS TO MONITOR JMX STATS
• Service to be monitored varies with node
• There can be more than one service to be monitored
• Restarting a monitored service requires JMXtrans to be restarted
SERVICE RESTART
• DATA STRUCTURE IN ROLES TO DEFINE THE SERVICES
"default_attributes" : {
  "jmxtrans": {
    "servers": [
      {
        "type": "datanode",
        "service": "hadoop-hdfs-datanode",
        "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
      }, {
        "type": "hbase_rs",
        "service": "hbase-regionserver",
        "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
      }
    ]
  } ...
service: dependent service name
service_cmd: string to uniquely identify the service process
SERVICE RESTART
• JMXTRANS SERVICE RESTART LOGIC BUILT DYNAMICALLY
# Store the dependent service names and process identifier strings
# in local variables
jmx_services = Array.new
jmx_srvc_cmds = Hash.new
node['jmxtrans']['servers'].each do |server|
  jmx_services.push(server['service'])
  jmx_srvc_cmds[server['service']] = server['service_cmd']
end
service "restart jmxtrans on dependent service" do
  service_name "jmxtrans"
  supports :restart => true, :status => true, :reload => true
  action :restart
  # Subscribe to all dependent services
  jmx_services.each do |jmx_dep_service|
    subscribes :restart, "service[#{jmx_dep_service}]", :delayed
  end
  # What if a process is re/started externally?
  only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar",
                                     jmx_srvc_cmds) }
end
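To see what the recipe derives from the role data, the same transformation can be run in plain Ruby. This is a sketch only: the `servers` array copies the two role entries from the slides, and `subscriptions` is a hypothetical name for the resource strings the `subscribes` calls would receive.

```ruby
# Role data in the shape shown on the slide.
servers = [
  { 'type' => 'datanode', 'service' => 'hadoop-hdfs-datanode',
    'service_cmd' => 'org.apache.hadoop.hdfs.server.datanode.DataNode' },
  { 'type' => 'hbase_rs', 'service' => 'hbase-regionserver',
    'service_cmd' => 'org.apache.hadoop.hbase.regionserver.HRegionServer' }
]

# Service names, and a lookup from service name to process identifier.
jmx_services  = servers.map { |s| s['service'] }
jmx_srvc_cmds = servers.map { |s| [s['service'], s['service_cmd']] }.to_h

# Resource names that the jmxtrans service would subscribe to.
subscriptions = jmx_services.map { |name| "service[#{name}]" }
puts subscriptions
```

Adding a monitored service to a node then only requires a new entry in the role. The restart wiring follows from the data.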
SERVICE RESTART
def process_require_restart?(process_name, process_cmd, dep_cmds)
  # Start time of the service process
  tgt_process_pid = `pgrep -f #{process_cmd}`
  ...
  tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
  ...
  ret = false
  restarted_processes = Array.new
  dep_cmds.each do |dep_process, dep_cmd|
    # Start time of all the service processes on which it depends
    dep_pids = `pgrep -f #{dep_cmd}`
    if dep_pids != ""
      dep_pids_arr = dep_pids.split("\n")
      dep_pids_arr.each do |dep_pid|
        dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
        # Compare the start times
        if DateTime.parse(tgt_process_stime) <
           DateTime.parse(dep_process_stime)
          restarted_processes.push(dep_process)
          ret = true
        end ...
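The start-time comparison can be separated from the `pgrep` and `ps` calls so that it is testable on its own. This is a minimal sketch of that idea, not the code from the deck. The helper `restarted_since` and the sample timestamps are hypothetical.

```ruby
require 'time'

# Hypothetical pure version of the comparison: returns the names of
# dependent services with at least one process that started after
# the target process.
def restarted_since(target_start, dep_starts)
  dep_starts.select { |_name, starts| starts.any? { |t| t > target_start } }.keys
end

# Made-up start times for illustration.
jmxtrans_start = Time.parse('2015-01-01 10:00:00 UTC')
dep_starts = {
  'hadoop-hdfs-datanode' => [Time.parse('2015-01-01 09:00:00 UTC')],
  'hbase-regionserver'   => [Time.parse('2015-01-01 11:30:00 UTC')]
}

stale = restarted_since(jmxtrans_start, dep_starts)
puts "restart jmxtrans: #{!stale.empty?} (#{stale.join(', ')})"
```

Note that `ps -o start_time` has coarse resolution (hours and minutes, or only a date for older processes), so a version like this would more likely be fed from `ps -o lstart`. That substitution is a suggestion, not something the slides do.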
ROLLING RESTART
• FLAGGING
• Negative Flagging – flag when a service is down
• Positive Flagging – flag when a service is reconfiguring
• Deadlock Avoidance
• CONTENTION
• Poll & Wait
• Fail the Run
• Simply Skip Service Restart and Go On
• Store the Need for Restart
• Breaks Assumptions of Procedural Chef Runs
ROLLING RESTART
• SERVICE DEFINITION
hadoop_service "zookeeper-server" do
  dependencies ["template[/etc/zookeeper/conf/zoo.cfg]",
                "template[/usr/lib/zookeeper/bin/zkServer.sh]",
                "template[/etc/default/zookeeper-server]"]
  process_identifier "org.apache.zookeeper ... QuorumPeerMain"
end
ROLLING RESTART
• SYNCH STATE STORE
• Zookeeper
• SERVICE RESTART (KAFKA) VALIDATION CHECK
• Based on Jenkins pattern for wait_until_ready!
• Verifies that the service is up to an acceptable level
• Passes or stops the Chef run
• FUTURE DIRECTIONS
• Topology Aware Deployment
• Data Aware Deployment
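The validation check described above (poll until the service is healthy, otherwise stop the run) can be sketched as a small polling helper. This is an illustration in plain Ruby under assumed names. `wait_until_ready!` here is a hypothetical stand-in, not the Jenkins cookbook's implementation, and the readiness check is a stub.

```ruby
# Hypothetical helper: poll a readiness check until it passes or the
# timeout expires. Raising on timeout is what would stop a Chef run.
def wait_until_ready!(timeout: 30, interval: 1)
  deadline = Time.now + timeout
  until yield
    raise "service not ready after #{timeout}s" if Time.now >= deadline
    sleep interval
  end
  true
end

# Stand-in check that succeeds on the third poll.
attempts = 0
wait_until_ready!(timeout: 5, interval: 0.01) do
  attempts += 1
  attempts >= 3
end
puts "ready after #{attempts} checks"
```

In a recipe, the block would hold the real health probe, for example a check that the restarted broker has rejoined the cluster. An unhandled exception fails the run, so the next node is not restarted.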