1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Running Cloudbreak on Kubernetes
Richard Doktorics
Krisztian Horvath
Who are we?
• Krisztian Horvath
  – Staff Engineer at Hortonworks
  – Has worked on Cloudbreak from the beginning
  – @keyki
• Richard Doktorics
  – Senior Software Engineer
  – Has worked on Cloudbreak from the beginning
  – @doktoric
Agenda
• Cloudbreak
• Kubernetes
• Helm
• Cloudbreak Rolling Update
• Log collection
• Monitoring & Alerting
• Questions
What is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure.
• Simplified Cluster Provisioning
• Automated Cluster Scaling
  – AMS (Ambari Metrics System)
  – Prometheus-based metrics
• Highly Extensible
  – Recipes for scripting extensions that run before/after cluster provisioning
  – Custom cloud images
• Multiple platforms are supported
  – AWS
  – GCP
  – Azure
  – OpenStack
  – BYOS (Bring Your Own Stack)
Single node deployment
• Cloudbreak Deployer (CBD)
  – Written in Go and Bash (go-basher)
  – Compiled into a single binary
• Microservice architecture
  – Each service runs in a Docker container
  – Each container can be replaced with a custom one
  – Services are managed with docker-compose

IMAGE                                     NAMES
traefik:v1.3.8-alpine                     cbreak_traefik_1
hortonworks/cloudbreak:2.1.0              cbreak_cloudbreak_1
postgres:9.6.1-alpine                     cbreak_commondb_1
hortonworks/cloudbreak-uaa                cbreak_identity_1
hortonworks/hdc-auth:2.1.0                cbreak_sultans_1
hortonworks/cloudbreak-autoscale:2.1.0    cbreak_periscope_1
hortonworks/hdc-web:2.1.0                 cbreak_uluwatu_1
gliderlabs/consul-server:0.5              cbreak_consul_1
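
As a rough illustration of how such a listing can come out of docker-compose, here is a minimal sketch of a compose file that uses only the images and service names visible in the listing above. It is not the file CBD generates: the `version` value is an assumption, and ports, environment variables, volumes and dependencies are left out because the slides do not show them.

```yaml
# docker-compose.yml (illustrative sketch, images only)
version: "2"            # assumption: compose file format version
services:
  traefik:
    image: traefik:v1.3.8-alpine
  cloudbreak:
    image: hortonworks/cloudbreak:2.1.0
  commondb:
    image: postgres:9.6.1-alpine
  identity:
    image: hortonworks/cloudbreak-uaa
  sultans:
    image: hortonworks/hdc-auth:2.1.0
  periscope:
    image: hortonworks/cloudbreak-autoscale:2.1.0
  uluwatu:
    image: hortonworks/hdc-web:2.1.0
  consul:
    image: gliderlabs/consul-server:0.5
```

With a project name of `cbreak`, the classic docker-compose naming scheme `<project>_<service>_<index>` yields container names such as `cbreak_cloudbreak_1`, which matches the NAMES column.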
Our goal was to...
• Run Cloudbreak in HA (highly available) mode
  – Ability to recover flows in case of node failure
  – Avoid master-slave design / leader election problems
• Scale Cloudbreak as we desire
  – Distribute each cluster-related flow
  – Two flows cannot run for the same cluster at the same time (e.g. 2 upscale flows)
  – Flow cancellation must be handled
• Scale the Web UI
  – Had to introduce a Redis cluster for the session store
• Scale every other service as well
• Find a tool that makes it easy to deploy these services to multiple nodes
• Offer Cloudbreak as a service that is accessible to everyone and can start clusters anywhere
Kubernetes
What is Kubernetes?
Kubernetes is an open-source platform designed to automate deploying, scaling and operating application containers.
• Deploy your applications quickly and predictably
• Scale your applications on the fly
• Roll out new features seamlessly
• Limit hardware usage to required resources only
• Portable: public, private, hybrid, multi-cloud
• Extensible: modular, pluggable, hookable, composable
• Self-healing: auto-placement, auto-restart, auto-replication, auto-scaling
Why Kubernetes?
• Not because it's fancy...
• Evaluated Kubernetes, Swarm, Mesos, Rancher
• Open source / active community with hands-on experience
• Many cloud providers already support it
• Lots of tooling behind it: API / CLI / Helm / Ansible / Salt
• Integration with most of the cloud providers
  – Provision Load Balancer (GCP, AWS, Azure)
  – Use object stores to share data (Ceph, S3, GCP bucket, Azure Storage Account)
  – Dynamic volume provisioning / Persistent disk (EBS, Azure Blob)
Running Kubernetes on Azure
az aks create --resource-group k8srg --name k8s \
  --agent-count 5 --agent-osdisk-size 100 --agent-vm-size Standard_D12_v2 \
  --service-principal sp --client-secret cs --dns-name-prefix k8s \
  --location westus --ssh-key-value ~/.ssh/id_rsa.pub
ACS / AKS / ACI
• ACS (Azure Container Service)
  – Can run Kubernetes, Swarm, DC/OS
• AKS (Managed Kubernetes)
  – No master VMs (at least on your side)
  – Multiple agent pools with different VM types
  – Scale the agent pools independently
  – Automatic upgrades
• ACI (Azure Container Instances)
  – No VMs to provision
  – “Endless” resource pool
  – Billed per second
  – Can act “as a node” in the Kubernetes cluster
Kubernetes resources
• Pod
  – Group of one or more containers with shared storage/network
  – Its containers are always co-located and co-scheduled, and run in a shared context
• Deployment
  – Provides declarative updates for Pods
• StatefulSet
  – Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods
  – Each Pod has a persistent identifier that it maintains across any rescheduling
• Service
  – Abstraction which defines a logical set of Pods and a policy by which to access them
• All of them are declared in YAML (.yml) files
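
To make the StatefulSet description more concrete, here is a minimal sketch of one. The workload is hypothetical: a Redis instance is used only because the goals mention a Redis session store, and the slides do not show how Redis was actually deployed. The names, image tag, replica count and storage size are all assumptions, and the sketch does not configure Redis clustering itself.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis                 # hypothetical name
spec:
  serviceName: redis          # headless Service (not shown) that gives each Pod a stable DNS name
  replicas: 3                 # Pods are created in order: redis-0, redis-1, redis-2
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4-alpine # assumption: any Redis image would do
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:       # one PersistentVolumeClaim per Pod, kept across rescheduling
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

The ordinal Pod names and the per-Pod volume claims are what the "persistent identifier" bullet refers to: a rescheduled `redis-1` comes back as `redis-1` with the same claim.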
Deployment and Service example

Deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service (cloudbreak.default.svc.cluster.local)
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    # annotation values must be strings, so the boolean and the port are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105
Helm
Why Helm?
• No real competitor
• Helps you manage Kubernetes applications
• Officially approved by the community
• Official Charts
• Rolling upgrade
• Helm is the client, Tiller is the server
• Tiller runs as a Kubernetes pod
Running Helm on Kubernetes
• Helm package ~= Chart
  – Define
  – Install
  – Upgrade
• Chart
  – values.yaml: stores the default values for the template files in the templates directory
  – Chart.yaml: describes the chart: its name, description and version
  – templates/: Kubernetes manifests with Go template support
• Separate Charts for every component
  – Cloudbreak
  – Monitoring
  – Analytics
Deployment and Service example

Deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service
apiVersion: v1
kind: Service
metadata:
  name: cloudbreak
  annotations:
    # annotation values must be strings, so the boolean and the port are quoted
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105

Deployment template (Helm)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cloudbreak
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: cloudbreak
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: cloudbreak
        release: {{ .Release.Name }}
    spec:
      containers:
      - name: cloudbreak
        image: {{ .Values.cbImage }}
        ports:
        - containerPort: 8080
          name: http-port
        - containerPort: 20105
          name: jmx-port

Service template (Helm)
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cloudbreak
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/"
    prometheus.io/port: "20105"
spec:
  selector:
    app: cloudbreak
    release: {{ .Release.Name }}
  ports:
  - name: http
    protocol: TCP
    port: 8080
  - name: jmx
    protocol: TCP
    port: 20105
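
The Helm templates read two values, `.Values.replicas` and `.Values.cbImage`. A values file that supplies them might look like the sketch below. The two key names come from the templates; the defaults simply reuse the numbers from the plain manifests and are an assumption about what the real chart shipped with.

```yaml
# values.yaml (sketch): defaults that the templates read through .Values
replicas: 5                            # rendered into {{ .Values.replicas }}
cbImage: hortonworks/cloudbreak:2.1.0  # rendered into {{ .Values.cbImage }}
```

Either value can be overridden per release without editing the chart, for example with `--set replicas=3` on `helm install` or `helm upgrade`.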
Rolling Update
• The goal is a zero-downtime update
• Ability to roll back in case something goes wrong
• Rolling Update strategy with Readiness Probe
• Canary releasing
• Prepare for running 2 versions of the application at the same time

Strategy
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Readiness Probe
readinessProbe:
  httpGet:
    path: /cb/info
    port: 8080
  initialDelaySeconds: 90
  failureThreshold: 5
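
The two fragments live at different levels of the Deployment: `strategy` sits under the Deployment `spec`, while `readinessProbe` belongs to the container. The sketch below puts them into one manifest to show the placement. It uses `apps/v1`, because `extensions/v1beta1` (as used in the slides) is no longer served for Deployments since Kubernetes 1.16. The rest is taken from the slides, and the comments are an interpretation of the settings, not the authors' wording.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudbreak
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # start at most one extra Pod above the desired count
      maxUnavailable: 0  # keep all desired replicas available during the update
  selector:
    matchLabels:
      app: cloudbreak
  template:
    metadata:
      labels:
        app: cloudbreak
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.1.0
        ports:
        - containerPort: 8080
          name: http-port
        readinessProbe:
          httpGet:
            path: /cb/info
            port: 8080
          initialDelaySeconds: 90  # wait 90 s after container start before the first probe
          failureThreshold: 5      # mark the Pod not ready after 5 consecutive failures
```

With these settings an old Pod is removed only after its replacement has passed the readiness probe, which is what makes the update zero-downtime.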
Canary releasing
• Run a new version of the application alongside the stable one and route some of the users to this version
• Run your tests against the new version and, once you are happy with the results, shut down the old version
• Maintain backward compatibility or you'll break the update
• Hard to change the database schema
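
The slides do not show how the canary traffic was routed. One common Kubernetes pattern, sketched here as an assumption rather than as the authors' setup, is to run a second, smaller Deployment whose Pods carry the same `app` label as the stable ones plus a distinguishing `track` label. A Service that selects only `app: cloudbreak` then spreads requests over both versions, roughly in proportion to the number of Pods. The `track` label, the Deployment name and the `2.2.0` image tag are hypothetical, and the stable Deployment would need `track: stable` in its own selector so the two selectors do not overlap.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudbreak-canary      # hypothetical name
spec:
  replicas: 1                  # 1 canary Pod next to 5 stable Pods: about 1 in 6 requests
  selector:
    matchLabels:
      app: cloudbreak
      track: canary
  template:
    metadata:
      labels:
        app: cloudbreak        # matched by the Service selector
        track: canary          # distinguishes canary Pods from stable ones
    spec:
      containers:
      - name: cloudbreak
        image: hortonworks/cloudbreak:2.2.0   # hypothetical new version
        ports:
        - containerPort: 8080
          name: http-port
```

Scaling the canary Deployment to zero removes the new version again, while scaling it up and the stable one down completes the switch.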
Logging and Monitoring
Logging
• Logspout
  – Collects the logs from the Docker socket
• Logstash
  – Redirects logs to file outputs
• Azure File Share
  – Stores the log files on an SMB (Samba) share
• LogSearch
  – Owned by Hortonworks
  – Uses Solr under the hood
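
The first step of this pipeline could run on Kubernetes as a DaemonSet, so that one Logspout Pod per node reads that node's Docker socket. The sketch below is an assumption about such a setup, not the configuration used in the talk. The `logstash:5000` endpoint and the syslog route are hypothetical, and the approach only works on nodes that use Docker as the container runtime.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logspout
spec:
  selector:
    matchLabels:
      app: logspout
  template:
    metadata:
      labels:
        app: logspout
    spec:
      containers:
      - name: logspout
        image: gliderlabs/logspout
        env:
        - name: ROUTE_URIS                     # where Logspout forwards the collected logs
          value: syslog+tcp://logstash:5000    # hypothetical Logstash endpoint
        volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock           # the node's Docker socket
```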
Monitoring
• Prometheus
  – Java metrics (custom metrics)
  – Provider per cluster
  – REST status codes
  – Response times
  – Active flows per node
  – Go metrics
  – Consul metrics
  – Linux / host metrics
  – Node.js metrics
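
The `prometheus.io/*` annotations on the Cloudbreak Service only take effect if the Prometheus scrape configuration reads them. The slides do not include that configuration. A widely used pattern, shown here as a sketch with an assumed job name, is Kubernetes service discovery combined with relabeling rules.

```yaml
scrape_configs:
- job_name: kubernetes-service-endpoints   # assumption: any job name works
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # keep only endpoints whose Service is annotated with prometheus.io/scrape: "true"
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # take the metrics path from prometheus.io/path
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # rewrite the target address to use the port from prometheus.io/port
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: '([^:]+)(?::\d+)?;(\d+)'
    replacement: $1:$2
```

For the Cloudbreak Service this would make Prometheus scrape each Pod on port 20105 at path `/`.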
Alerting
ALERT successful_stack_creation_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.successful.aws"}[5m])) > 0
  ANNOTATIONS {
    status="INFO",
    description="A new stack has been created on AWS."
  }

ALERT stack_creation_failed_aws
  IF sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
  ANNOTATIONS {
    status="WARN",
    description="Failed to create a stack on AWS."
  }

ALERT node_down
  IF up{job='node_exporter'} == 0
  FOR 5m
  ANNOTATIONS {
    status="ERROR",
    description = "Node {{ $labels.instance }} has been down for more than 5 minutes",
  }
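
These rules use the Prometheus 1.x rule syntax. Since Prometheus 2.0, rules are written as YAML rule groups, so two of the alerts would look roughly like the sketch below. The group name is an assumption, and the `node_down` description states 5 minutes so that it agrees with the `for: 5m` clause.

```yaml
groups:
- name: cloudbreak          # assumption: the slides do not name a rule group
  rules:
  - alert: stack_creation_failed_aws
    expr: sum(changes(org_springframework_metrics_cloudbreak_value{value=~"stack.creation.failed.aws"}[5m])) > 0
    annotations:
      status: WARN
      description: Failed to create a stack on AWS.
  - alert: node_down
    expr: up{job="node_exporter"} == 0
    for: 5m                 # the condition must hold for 5 minutes before the alert fires
    annotations:
      status: ERROR
      description: "Node {{ $labels.instance }} has been down for more than 5 minutes"
```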
Questions?
Thank you!
Instagram (@hortonworks.hungary)
