The Opscode Push Jobs Service
Mark Anderson
April 28, 2013
Push jobs
in a command line
• knife job start --quorum 90% 'chef-client' --search 'role:webapp'
• Finds all nodes with role webapp
• Submits a job with quorum of 90% to the pushy server.
• Checks quorum
• Starts job on available nodes
• Gathers successes and failures
• And will do this for ten nodes...or a thousand
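The quorum check in the steps above can be sketched as follows. This is a minimal illustration, not the pushy server's code: the function names are hypothetical, and rounding the percentage up is an assumption the slides do not state.

```python
import math


def required_nodes(quorum: str, matched: int) -> int:
    """Convert a percentage quorum such as '90%' into a node count."""
    if not quorum.endswith("%"):
        raise ValueError("this sketch only handles percentage quorums")
    percent = float(quorum[:-1])
    # Assumption: round up, so 90% of 11 matched nodes means 10 nodes.
    return math.ceil(matched * percent / 100)


def quorum_met(quorum: str, matched: int, available: int) -> bool:
    """True when enough of the matched nodes are available to start the job."""
    return available >= required_nodes(quorum, matched)


if __name__ == "__main__":
    # Ten nodes matched the search and nine are available.
    print(quorum_met("90%", matched=10, available=9))
    # A thousand nodes matched and 899 are available.
    print(quorum_met("90%", matched=1000, available=899))
```

The same check applies unchanged whether the search matched ten nodes or a thousand; only the counts differ.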
Why not use X?
• We wanted to build a tool that could be deeply integrated into Chef
  • Integrated with the authentication model
    • Clients use their client key to authenticate to the server
    • Users use their keys to send commands to the API
  • Integrated with the authorization model
    • Groups control access now
    • Eventually there will be fine-grained ACLs
  • Integrated with search and other Chef features
• Scalability
Server
• Erlang service
• Extends the Chef REST API
  • Job creation and tracking
  • Push client configuration
• Controls the clients via ZeroMQ
  • Heartbeating to track node availability
  • Command execution
Client
• Simple Ruby client
  • Receives heartbeats from the server
  • Sends heartbeats back to the server
  • Executes commands
• Configuration requirements are minimal
  • The client initiates all connections to the server
  • Most configuration comes from a Chef API call made with the client key
  • Opens ZeroMQ connections to the server for all other communication
The lifecycle of a job
[Diagram, server and client lanes: Job Accepted → Send Command → Clients ACK → Wait for Quorum → Start Exec → Clients Exec → Collect Results]
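The lifecycle on this slide reads naturally as a small state machine. The sketch below is one hypothetical way to model it; the class, state names, and the QUORUM_FAILED state are assumptions for illustration and are not taken from the pushy server.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    ACCEPTED = "accepted"
    WAITING_FOR_QUORUM = "waiting_for_quorum"
    EXECUTING = "executing"
    COMPLETE = "complete"
    QUORUM_FAILED = "quorum_failed"  # assumption: not shown on the slide


@dataclass
class Job:
    command: str
    nodes: frozenset
    required_acks: int
    state: JobState = JobState.ACCEPTED
    acked: set = field(default_factory=set)
    results: dict = field(default_factory=dict)

    def send_command(self) -> None:
        """Server offers the command to every targeted client."""
        self._expect(JobState.ACCEPTED)
        self.state = JobState.WAITING_FOR_QUORUM

    def client_ack(self, node: str) -> None:
        """A client reports that it is ready to run the command."""
        self._expect(JobState.WAITING_FOR_QUORUM)
        if node in self.nodes:
            self.acked.add(node)

    def check_quorum(self) -> None:
        """Server decides whether enough clients acknowledged."""
        self._expect(JobState.WAITING_FOR_QUORUM)
        if len(self.acked) >= self.required_acks:
            self.state = JobState.EXECUTING  # "Start Exec" would go out here
        else:
            self.state = JobState.QUORUM_FAILED

    def client_result(self, node: str, succeeded: bool) -> None:
        """Server collects one client's result."""
        self._expect(JobState.EXECUTING)
        if node in self.acked:
            self.results[node] = succeeded
        if set(self.results) == self.acked:
            self.state = JobState.COMPLETE

    def _expect(self, state: JobState) -> None:
        # Reject calls that arrive out of order for the lifecycle.
        if self.state is not state:
            raise RuntimeError(f"expected {state.name}, job is {self.state.name}")


if __name__ == "__main__":
    job = Job("chef-client", frozenset({"chico", "harpo", "groucho"}), required_acks=2)
    job.send_command()
    job.client_ack("chico")
    job.client_ack("harpo")
    job.check_quorum()
    job.client_result("chico", True)
    job.client_result("harpo", False)
    print(job.state, job.results)
```

Only nodes that acknowledged are expected to report results, which is why the job completes without hearing from the third node.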
Knife extension
• All control of pushy jobs goes through extensions to the Chef API
  • Node status
  • Job control
    • start
    • stop
    • status
  • Job listing
Pushy Demo
[Diagram: a Chef/Pushy server and five nodes: Chico, Harpo, Groucho, Gummo, Zeppo]
The nitty gritty
Internals: client-server interaction
• The client initiates all connections to the server
• The client authenticates to the server and receives:
  • A session key and TTL
  • ZeroMQ connection information (ports, heartbeat rate, etc.)
• Subscribes via ZeroMQ to server heartbeats (one-to-many)
• Connects via ZeroMQ to the server (one-to-one)
• Sends heartbeats to the server as long as it receives server heartbeats
• Awaits commands from the server
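The rule that a client keeps heartbeating only while it still hears the server can be sketched with a small liveness tracker. The class name and the three-missed-heartbeats threshold are assumptions; the slides give only the heartbeat idea, not the tolerance. Timestamps are passed in explicitly so the logic stays deterministic.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks whether a peer is considered up, based on its last heartbeat."""

    def __init__(self, interval_s: float, max_missed: int = 3) -> None:
        self.interval_s = interval_s
        # Assumption: the peer is down after this many silent intervals.
        self.max_missed = max_missed
        self.last_seen: Optional[float] = None

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen = now

    def is_up(self, now: float) -> bool:
        """True if a heartbeat arrived within the allowed silence window."""
        if self.last_seen is None:
            return False
        return (now - self.last_seen) <= self.interval_s * self.max_missed


def client_should_send_heartbeat(server: HeartbeatMonitor, now: float) -> bool:
    """The client heartbeats only while it still receives server heartbeats."""
    return server.is_up(now)


if __name__ == "__main__":
    server = HeartbeatMonitor(interval_s=15)
    server.beat(now=0.0)
    print(client_should_send_heartbeat(server, now=30.0))
    print(client_should_send_heartbeat(server, now=60.0))
```

The server side could reuse the same tracker per node to decide which nodes are available when a job checks quorum.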
Security
• Protocol security
  • We leverage the existing API signing mechanism to exchange session keys
  • All ZeroMQ messages are signed
    • HMAC-SHA256 signing protects point-to-point messages
    • RSA 2048/SHA-1 signing protects broadcast messages (just like the Chef API)
  • Relies on the SSL chain of trust to the server
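As an illustration of the point-to-point case, the sketch below signs and verifies a message body with HMAC-SHA256 using Python's standard library. The session key value, the JSON body, and the hex encoding are assumptions made for the example; the slides do not describe the actual wire format.

```python
import hashlib
import hmac
import json


def sign(session_key: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 signature of a message body."""
    return hmac.new(session_key, body, hashlib.sha256).hexdigest()


def verify(session_key: bytes, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(session_key, body), signature)


if __name__ == "__main__":
    # Hypothetical session key; in the talk it comes from the signed API exchange.
    key = b"example-session-key"
    body = json.dumps({"type": "heartbeat", "node": "chico"}).encode("utf-8")
    signature = sign(key, body)
    print(verify(key, body, signature))
    print(verify(key, body + b"tampered", signature))
```

Signing gives integrity and authenticity only; as with any signature-only scheme, the message body itself remains readable on the wire.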
Access control
• Access rights are controlled by groups
  • The ‘push_job_writers’ group controls job creation and deletion
  • The ‘push_job_readers’ group controls read access to job status and results
• Whitelist for commands
  • The client rejects commands that aren’t on the whitelist
• In the future we’d like to do finer-grained access control
  • Perhaps persistent job templates with their own access rights and commands
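The two layers of access control described above can be sketched as a server-side group check plus a client-side whitelist check. The function names and the exact-match whitelist rule are assumptions for illustration; only the two group names come from the slide.

```python
WRITERS_GROUP = "push_job_writers"
READERS_GROUP = "push_job_readers"


def can_create_job(user_groups: set) -> bool:
    """Server side: only members of the writers group may create or delete jobs."""
    return WRITERS_GROUP in user_groups


def can_read_job(user_groups: set) -> bool:
    """Server side: only members of the readers group may read status and results."""
    return READERS_GROUP in user_groups


def client_accepts(command: str, whitelist: set) -> bool:
    """Client side: reject any command that is not on the whitelist.

    Assumption: an exact string match, so extra arguments are rejected too.
    """
    return command in whitelist


if __name__ == "__main__":
    whitelist = {"chef-client"}  # hypothetical whitelist contents
    print(can_create_job({"push_job_writers"}))
    print(client_accepts("chef-client", whitelist))
    print(client_accepts("rm -rf /", whitelist))
```

Checking on the client as well as the server means a node still refuses unexpected commands even if a job was created by someone with write access.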
Implementation
Why Erlang?
• The Chef 11 work has been very successful
• Easier integration
• The process (think threads) model allows a great deal of parallelism
  • Every node has an Erlang process to track its state
  • Every job has a process to track its state
Server process structure
[Diagram: Clients, Message switch, Heartbeat generator, REST API, Job Monitor processes (one per job), Client Monitor processes (one per client)]
Why ZeroMQ?
• Abstracts away much of the pain of socket libraries
• Reliable delivery
• Portable
• Broad language support
• Proven scalability
Some impedance mismatch
• ZeroMQ provides a lot of goodness for asynchronous execution
  • That is really helpful in many languages
  • Erlang doesn’t need that so much, and encourages finer-grained tasks
• ZeroMQ hides a lot of interesting state
  • It turns out we care about whether a node dies and comes back
• Need a better ZeroMQ/Erlang glue library
• ZeroMQ 3 offers some very interesting options for future work
Performance and scalability results
• We can run a job over 2000 nodes
  • 15-second heartbeats
  • c1.medium
• Bottlenecks
  • Heartbeats consume a lot of resources
  • All ZeroMQ messages go through the router process
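A quick back-of-the-envelope calculation shows why heartbeats add up at this scale. It assumes each node sends exactly one heartbeat per interval, which the slides imply but do not state.

```python
def heartbeats_per_second(nodes: int, interval_s: float) -> float:
    """Inbound heartbeat rate, assuming one heartbeat per node per interval."""
    return nodes / interval_s


if __name__ == "__main__":
    rate = heartbeats_per_second(nodes=2000, interval_s=15)
    print(f"{rate:.1f} heartbeats per second")
```

With 2000 nodes and 15-second heartbeats that is roughly 133 messages per second, all passing through the same router process before any job traffic is counted.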
Moving towards a federated architecture
[Diagram: Chef API, Erchef, Push, Reporting, and Auth services, with Solr and SQL data stores]
Where are we now?
Availability: Limited release
• Time to get people using it
• Private Chef only for now
• Hosted Chef deferred
  • Scalability: hosted has more than 2k nodes
  • Security: ZeroMQ messages aren’t encrypted
• Open Source: Eventually
Future directions
• Scalability improvements
  • Our biggest customers will want more
• Jobs as first-class, persistent objects
• More control over the execution of jobs
  • Inside a job (push jobs makes an excellent DDoS tool)
  • Between jobs: chaining results between jobs
• Push jobs as a building block
  • We are experimenting with it as a building block for continuous delivery
Questions?

Push jobs: an orchestration building block for private Chef
