Monitoring Kubernetes Across Data
Center and Cloud
Specifically Tectonic and Google Container Engine using Datadog
Presenters:
Ilan Rabinovitch, Director of Technical Community, Datadog
Aleks Saul, Customer-Facing Engineer, CoreOS
Aparna Sinha, Senior Product Manager, Google
Google Cloud Platform
Kubernetes at a glance
Open source production-grade container scheduling and management
● Top 0.01% of all GitHub projects: 950+ contributors & 35,000+ commits
Run Anywhere: multi-cloud, on-prem, bare-metal, OpenStack etc
Broad industry adoption
Commercial Enterprise Support
Kubernetes provides container-centric infrastructure
Once specific containers are no longer bound to specific machines/VMs,
host-centric infrastructure no longer works
• Scheduling: Decide where my containers should run
• Lifecycle and health: Keep my containers running despite failures
• Scaling: Make sets of containers bigger or smaller
• Naming and discovery: Find where my containers are now
• Load balancing: Distribute traffic across a set of containers
• Storage volumes: Provide data to containers
• Logging and monitoring: Track what’s happening with my containers
• Debugging and introspection: Enter or attach to containers
• Identity and authorization: Control who can do things to my containers
Kubernetes offers choice and flexibility for Hybrid Cloud
Setting up and managing a cluster
• Choose a cloud: GCP, AWS, Azure, Rackspace, on-premises, ...
• Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ...
• Provision machines: create VMs, install Docker, ...
• Configure networking: IP ranges for Pods, Services, SDN, firewalls, ...
• Start cluster services: DNS, logging, monitoring, …
• Start and configure Kubernetes
• Manage nodes: kernel upgrades, OS updates, hardware failures, …
GKE is Google-hosted and managed Kubernetes
• Directly uses upstream open source
• Rolls out within 3-5 business days of the latest open source release
• Alpha features also now available through ‘alpha clusters’
Google Container Engine (GKE)
“It delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency”
~ Philips (Hue Lights)
“Made our engineers more productive and helped us do more work with less staff”
~ CCP Games (EVE Online)
How Monitoring Works in Google Container Engine
[Diagram: Master, Heapster, and Storage Backend; two Nodes, each running Kubelet and cAdvisor]
Google Container Engine Monitoring Server
Metrics used for self repair, and exposed to end users via Stackdriver
Primary job is to ensure that each Kubernetes master is available
● Implements the repair logic for when a cluster is non-responsive
● Automatically resizes master machines as the number of nodes grows
Also collects metrics for each cluster
● Number of resources (nodes, pods, services, namespaces, etc)
● CPU usage, limit, utilization ratio; Memory usage and limit; Page faults;
Disk usage and limit; Uptime
● Uses the number of nodes to report billing status
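The "utilization ratio" listed above is usage divided by limit. A minimal sketch of that calculation follows; the `ResourceSample` type and the sample values are hypothetical and are not taken from Google Container Engine or Stackdriver.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One usage/limit pair, e.g. CPU cores or memory bytes (hypothetical type)."""
    usage: float
    limit: float

    def utilization(self) -> float:
        """Return usage as a fraction of the limit; 0.0 when no limit is set."""
        if self.limit <= 0:
            return 0.0
        return self.usage / self.limit


# Made-up sample values, for illustration only.
cpu = ResourceSample(usage=1.5, limit=4.0)                      # cores
memory = ResourceSample(usage=3 * 1024**3, limit=8 * 1024**3)   # bytes

print(f"cpu utilization: {cpu.utilization():.3f}")
print(f"memory utilization: {memory.utilization():.3f}")
```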
Pluggable interface for cloud monitoring
Run Influx and Grafana in the cluster
● alternative to Google Cloud Monitoring
Plug in your own!
● e.g., Prometheus, Datadog etc.
Kube State metrics: (node status, node capacity, replica state, etc)
Prometheus
Kube State Metrics
● Generates metrics about the state of
Kubernetes logical objects
(node status, node capacity, replica state, etc)
● Deployed alongside your other
applications as a Kubernetes service.
● Exposes metrics via HTTP API or
Prometheus format
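To make the Prometheus format concrete, here is a minimal sketch of reading that text format with the Python standard library. The sample payload and its metric names are illustrative only and were not captured from a real kube-state-metrics endpoint; the parser is deliberately simplified and is not a full implementation of the exposition format.

```python
import re

# Illustrative payload; metric names and values are assumptions.
SAMPLE = '''\
# HELP kube_node_status_ready Illustrative sample, not from a real cluster.
# TYPE kube_node_status_ready gauge
kube_node_status_ready{node="node-a",condition="true"} 1
kube_node_status_ready{node="node-b",condition="true"} 0
kube_deployment_status_replicas_available{deployment="web"} 3
'''

# name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
not_ready = [
    labels["node"]
    for name, labels, value in samples
    if name == "kube_node_status_ready" and value == 0
]
print(not_ready)
```

A monitoring agent would fetch this text over HTTP from the service and then tag or aggregate the samples; the fetch is left out so the sketch runs without a cluster.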
We focus on delivering the capabilities required by enterprise organizations
to run and manage Kubernetes at scale...
● Cluster installers (for AWS and bare metal, to start).
● Management software to upgrade, back up, roll back, and scale the cluster up and down.
● Console UI that surfaces management functionality, cluster information, and compute
usage to the user and includes add-on services (Quay, identity and authentication).
Extending Kubernetes for the Enterprise
Tectonic Extends
Upstream Kubernetes
● Container orchestration
● Horizontal scale
● High availability
● Service discovery & load balancer
● Installer
● Management console
● Painless updates
● Cluster scaling
● Disaster recovery
● Alerts and logging
● Security (integrated)
● Container registry (Quay)
● Integration across environments
[Diagram: Tectonic stack with layers Security, Mgmt, Kubernetes, CoreOS Linux, Cloud Integration, Container Registry, Storage & Compute, apps/container/microservices]
Tectonic
Kubernetes Security
● Clair: container vulnerability
scanning
● KMS integration
● LDAP integration
● RBAC integration
• SaaS-based infrastructure and application monitoring
• Focus on modern environments
• Cloud, Containers, Microservices
• Dynamic configuration models
• Processing nearly a trillion data points per day
• Intelligent Alerting and Insightful Dashboards
• Anomaly and Outlier Detection
Datadog Overview
Collecting data is cheap;
not having it when you
need it can be expensive
Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches,
Queues and more...
Monitor Everything
Datadog
● Deployed as a DaemonSet. One
instance per node.
● Collects metrics and events from:
○ container engine (e.g., Docker)
○ Kubernetes Heapster
○ kube-state-metrics
○ Deployed Applications
○ Google Monitoring APIs
● Exposes a StatsD endpoint for custom
metrics.
● Metrics are automatically tagged by
pods, labels, etc.
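As a rough illustration of sending a custom metric to a StatsD endpoint, the sketch below builds a datagram in the StatsD wire format with the `|#tag:value` tag extension and sends it over UDP. The metric name, tags, host, and port (8125, the conventional StatsD port) are assumptions for the example, not values from the talk.

```python
import socket


def format_datagram(name, value, metric_type="g", tags=None):
    """Build a datagram of the form name:value|type|#key:val,key:val."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; StatsD does not acknowledge packets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical gauge metric with two tags.
payload = format_datagram(
    "checkout.queue_depth", 42, "g", {"app": "checkout", "version": "2"}
)
print(payload)
```

Calling `send_metric(payload)` from a pod would hand the value to the agent on the same node, assuming the agent's endpoint is reachable at that host and port.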
Operational Complexity Increases with..
• Number of things to measure
• Velocity of change
How much do we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
~50 metrics per application
Operational Complexity: Scale
• 100 instances, 500 containers
• 160 metrics per host grows to 800 metrics per host, assuming 5 containers per host
• 100 instances: 80,000 metrics, assuming 5 containers per host
How much do we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
~50 metrics per application
N containers
• 150*N metrics
Metrics Overload!
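The arithmetic behind these slides can be reproduced in a few lines. This is a sketch under one stated assumption: the slides appear to reach 800 metrics per host by counting each of the 5 containers as a full 160-metric unit, and the 150*N rule is read here as operating-system plus application metrics per container. Neither reading is spelled out in the deck.

```python
CLOUD_METRICS = 10   # per instance, from cloud providers
OS_METRICS = 100     # per operating system
APP_METRICS = 50     # per application (approximate on the slide)


def per_unit_metrics():
    """Metrics for one instance running one OS and one application."""
    return CLOUD_METRICS + OS_METRICS + APP_METRICS


def fleet_metrics(instances, containers_per_host):
    """Assumed slide arithmetic: every container counts as a full unit."""
    return instances * containers_per_host * per_unit_metrics()


def container_metrics(n):
    """Assumed reading of the 150*N rule: OS plus application metrics per container."""
    return (OS_METRICS + APP_METRICS) * n


print(per_unit_metrics())        # 160 metrics per host
print(per_unit_metrics() * 5)    # 800 metrics per host with 5 containers
print(fleet_metrics(100, 5))     # 80000 metrics across 100 instances
print(container_metrics(5))      # 750 metrics from 5 containers under the 150*N rule
```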
Source: Datadog
Monitoring Questions
• Where is a given container running?
• What is the overall capacity of my cluster?
• What port(s) are my applications running on?
• What’s the total throughput of my application?
• What’s its response time per tag? (app, version, data center)
• What’s the distribution of 5xx errors per container? What about by data center?
Host Centric
Service Centric
Query Based Monitoring
“What’s the average throughput of application:nginx per version?”
“Alert me when one of my pods from replication controller:foo is
not behaving like the others.”
“Show me the rate of HTTP 500 responses from nginx”
“… grouped by data center … running my app version 2….”
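The queries above boil down to filtering data points by tag, grouping by another tag, and comparing members of a group. The sketch below shows that idea over an in-memory list; the metric name, tag keys, values, and the median-based comparison are all assumptions for illustration and do not represent Datadog's query language or its outlier algorithms.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical tagged data points (requests per second per pod).
POINTS = [
    {"metric": "nginx.requests_per_s", "value": 120.0,
     "tags": {"application": "nginx", "version": "1", "replication_controller": "foo", "pod": "web-a"}},
    {"metric": "nginx.requests_per_s", "value": 100.0,
     "tags": {"application": "nginx", "version": "1", "replication_controller": "foo", "pod": "web-b"}},
    {"metric": "nginx.requests_per_s", "value": 80.0,
     "tags": {"application": "nginx", "version": "2", "replication_controller": "foo", "pod": "web-c"}},
    {"metric": "nginx.requests_per_s", "value": 90.0,
     "tags": {"application": "nginx", "version": "2", "replication_controller": "foo", "pod": "web-d"}},
    {"metric": "nginx.requests_per_s", "value": 10.0,
     "tags": {"application": "nginx", "version": "2", "replication_controller": "foo", "pod": "web-e"}},
]


def select(points, metric, **filters):
    """Keep points for one metric whose tags match every filter."""
    return [
        p for p in points
        if p["metric"] == metric
        and all(p["tags"].get(k) == v for k, v in filters.items())
    ]


def average_by(points, group_tag):
    """Average the values of the points, grouped by one tag."""
    groups = defaultdict(list)
    for p in points:
        groups[p["tags"].get(group_tag)].append(p["value"])
    return {key: mean(values) for key, values in groups.items()}


def unlike_the_others(points, id_tag, threshold=0.5):
    """Flag members whose value deviates from the group median by more than threshold."""
    mid = median(p["value"] for p in points)
    return [
        p["tags"][id_tag] for p in points
        if mid and abs(p["value"] - mid) / mid > threshold
    ]


nginx = select(POINTS, "nginx.requests_per_s", application="nginx")
print(average_by(nginx, "version"))

foo_pods = select(POINTS, "nginx.requests_per_s", replication_controller="foo")
print(unlike_the_others(foo_pods, "pod"))
```

Because the queries name tags rather than hosts, the same code keeps working when pods are rescheduled onto different machines.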
Service Discovery
[Diagram: Monitoring Agent connected to the Docker API (containers list and metadata), Kubernetes (additional metadata such as tags), config backends (integration configurations), containers, and host-level metrics]
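One way to picture the service-discovery flow on this slide: the agent lists running containers, looks up an integration configuration template for each image, and fills in the container's address and tags. The sketch below is a simplified model of that idea; the template layout, the `%%host%%` and `%%port%%` placeholder syntax, and all container data are assumptions, not the agent's actual configuration format.

```python
# Hypothetical configuration templates, keyed by short image name.
TEMPLATES = {
    "nginx": {"check": "nginx",
              "instance": {"status_url": "http://%%host%%:%%port%%/status"}},
    "redis": {"check": "redis",
              "instance": {"host": "%%host%%", "port": "%%port%%"}},
}

# Hypothetical container list, as a container runtime or orchestrator might report it.
CONTAINERS = [
    {"image": "registry.example.com/team/nginx:1.11", "ip": "10.2.0.7",
     "ports": [80], "labels": {"app": "web", "version": "2"}},
    {"image": "redis:3.2", "ip": "10.2.0.9",
     "ports": [6379], "labels": {"app": "cache"}},
    {"image": "busybox", "ip": "10.2.0.11", "ports": [], "labels": {}},
]


def short_image_name(image):
    """'registry.example.com/team/nginx:1.11' -> 'nginx'."""
    return image.rsplit("/", 1)[-1].split(":", 1)[0]


def render(template, container):
    """Fill the placeholders with the container's address and attach its labels as tags."""
    host = container["ip"]
    port = str(container["ports"][0])
    instance = {
        key: value.replace("%%host%%", host).replace("%%port%%", port)
        for key, value in template["instance"].items()
    }
    tags = [f"{k}:{v}" for k, v in sorted(container["labels"].items())]
    return {"check": template["check"], "instance": instance, "tags": tags}


def discover(containers, templates):
    """Return one rendered check configuration per container with a known template."""
    configs = []
    for container in containers:
        template = templates.get(short_image_name(container["image"]))
        if template is not None:
            configs.append(render(template, container))
    return configs


configs = discover(CONTAINERS, TEMPLATES)
for config in configs:
    print(config)
```

Containers without a matching template (here, `busybox`) are skipped, so only host-level and container-level metrics would be collected for them.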
Q&A
You can also follow us on Twitter:
@datadoghq
@googlecloud
@tectonicstack

Monitoring kubernetes across data center and cloud

  • 1. Monitoring Kubernetes Across Data Center and Cloud Specifically Tectonic and Google Container Engine using Datadog Presenters: Ilan Rabinovitch, Director of Technical Community, Datadog Aleks Saul, Customer-Facing Engineer, CoreOS Aparna Sinha, Senior Product Manager, Google
  • 2. Google Cloud Platform Kubernetes at a glance Open source production-grade container scheduling and management ● Top 0.01% of all GitHub projects: 950+ contributors & 35,000+ commits Run Anywhere: multi-cloud, on-prem, bare-metal, OpenStack etc Broad industry adoption Commercial Enterprise Support Kubernetes at a glance
  • 3. Google Cloud Platform Kubernetes provides container-centric infrastructure Once specific containers are no longer bound to specific machines/VMs, host-centric infrastructure no longer works • Scheduling: Decide where my containers should run • Lifecycle and health: Keep my containers running despite failures • Scaling: Make sets of containers bigger or smaller • Naming and discovery: Find where my containers are now • Load balancing: Distribute traffic across a set of containers • Storage volumes: Provide data to containers • Logging and monitoring: Track what’s happening with my containers • Debugging and introspection: Enter or attach to containers • Identity and authorization: Control who can do things to my containers
  • 4. Google Cloud Platform Kubernetes offers choice and flexibility for Hybrid Cloud Setting up and managing a cluster • Choose a cloud: GCP, AWS, Azure, Rackspace, on-premises, ... • Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ... • Provision machines: create VMs, install Docker, ... • Configure networking: IP ranges for Pods, Services, SDN, firewalls, ... • Start cluster services: DNS, logging, monitoring, … • Start and configure Kubernetes • Manage nodes: kernel upgrades, OS updates, hardware failures, … GKE is Google hosted and managed Kubernetes • Directly uses upstream open source • Rolls out within 3-5 business days of the latest open source release • Alpha features also now available through ‘alpha clusters’
  • 5. Google Cloud Platform Google Container Engine (GKE) “It delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency” ~ Philips (Hue Lights) “Made our engineers more productive and helped us do more work with less staff” ~ CCP Games (EVE Online)
  • 6. Google Cloud Platform How Monitoring Works in Google Container Engine Master Storage BackendHeapster Kubelet cAdvisor Node Kubelet cAdvisor Node
  • 7. Google Cloud Platform Google Container Engine Monitoring Server Metrics used for self repair, and exposed to end users via Stackdriver Primary job is to ensure that each Kubernetes master is available ● Implements the repair logic for when a cluster is non-responsive ● Automatically resizes master machines as the number of nodes grows Also collects metrics for each cluster ● Number of resources (nodes, pods, services, namespaces, etc) ● CPU usage, limit, utilization ratio; Memory usage and limit; Page faults; Disk usage and limit; Uptime ● Uses number of nodes for report billing status
  • 8. Google Cloud Platform Pluggable interface for cloud monitoring Run Influx and Grafana in the cluster ● alternative to Google Cloud Monitoring Plug in your own! ● e.g., Prometheus, Datadog etc. Kube State metrics: (node status, node capacity, replica state, etc) Prometheus
  • 9. Kube State Metrics
      ● Generates metrics about the state of Kubernetes logical objects (node status, node capacity, replica state, etc.)
      ● Deployed alongside your other applications as a Kubernetes service
      ● Exposes metrics via HTTP API or Prometheus format
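To make the "Prometheus format" bullet concrete, here is a minimal sketch of reading samples from a payload in the Prometheus text exposition format. The metric names (`example_node_ready`, `example_replicas_available`) and the sample payload are hypothetical placeholders, not actual kube-state-metrics output, and the parser deliberately ignores optional timestamps and escaped characters inside label values.

```python
from typing import Dict, List, Tuple

# Illustrative payload in the Prometheus text exposition format.
# Metric names and values are hypothetical, not captured output.
SAMPLE = """\
# HELP example_node_ready Whether the node is ready (hypothetical metric).
# TYPE example_node_ready gauge
example_node_ready{node="node-a"} 1
example_node_ready{node="node-b"} 0
example_replicas_available{deployment="web",namespace="default"} 3
"""

Sample = Tuple[str, Dict[str, str], float]


def parse_labels(raw: str) -> Dict[str, str]:
    """Parse 'k1="v1",k2="v2"' into a dict (no escapes or commas in values)."""
    labels: Dict[str, str] = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition("=")
        labels[key.strip()] = value.strip().strip('"')
    return labels


def parse_exposition(text: str) -> List[Sample]:
    """Return (name, labels, value) for every sample line, skipping comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        head, _, value = line.rpartition(" ")
        if "{" in head:
            name, _, rest = head.partition("{")
            labels = parse_labels(rest.rstrip("}"))
        else:
            name, labels = head, {}
        samples.append((name, labels, float(value)))
    return samples


def not_ready_nodes(samples: List[Sample]) -> List[str]:
    """Names of nodes whose (hypothetical) readiness gauge is 0."""
    return sorted(
        labels["node"]
        for name, labels, value in samples
        if name == "example_node_ready" and value == 0
    )
```

With the sample payload, `not_ready_nodes(parse_exposition(SAMPLE))` returns `['node-b']`. A monitoring agent would fetch the payload over HTTP from the service rather than from a string constant.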
  • 10. Extending Kubernetes for the Enterprise
    We focus on delivering the capabilities required by enterprise organizations to run and manage Kubernetes at scale:
      ● Cluster installers (for AWS and bare metal, to start)
      ● Management software to upgrade, back up, roll back, and scale the cluster up and down
      ● Console UI that surfaces management functionality, cluster information, and compute usage to the user, and includes add-on services (Quay, identity and authentication)
  • 11. Extending Kubernetes for the Enterprise
    Tectonic Extends Upstream Kubernetes
      ● Container orchestration
      ● Horizontal scale
      ● High availability
      ● Service discovery & load balancer
      ● Installer
      ● Management console
      ● Painless updates
      ● Cluster scaling
      ● Disaster recovery
      ● Alerts and logging
      ● Security (integrated)
      ● Container registry (Quay)
      ● Integration across environments
    [Diagram labels: Security, Mgmt, Kubernetes, CoreOS Linux, Cloud Integration, Container Registry, Storage & Compute, apps/container/microservices]
  • 12. Extending Kubernetes for the Enterprise
    Tectonic Kubernetes Security
      ● Clair: container vulnerability scanning
      ● KMS integration
      ● LDAP integration
      ● RBAC integration
    [Diagram labels: Mgmt, Kubernetes, CoreOS Linux, Cloud Integration, Container Registry, Storage & Compute, apps/container/microservices, Security]
  • 13. Datadog Overview
      • SaaS-based infrastructure and application monitoring
      • Focus on modern environments
      • Cloud, Containers, Microservices
      • Dynamic configuration models
      • Processing nearly a trillion data points per day
      • Intelligent Alerting and Insightful Dashboards
      • Anomaly and Outlier Detection
  • 14. Collecting data is cheap; not having it when you need it can be expensive
  • 15. Monitor Everything
    Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more...
  • 16. Datadog
      ● Deployed as a DaemonSet, one instance per node
      ● Collects metrics and events from:
          ○ Container engine (e.g., Docker)
          ○ Kubernetes Heapster
          ○ kube-state-metrics
          ○ Deployed applications
          ○ Google Monitoring APIs
      ● Exposes a statsd endpoint for custom metrics
      ● Metrics are automatically tagged by pods, labels, etc.
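As an illustration of the statsd endpoint for custom metrics, the sketch below builds a statsd-style datagram and sends it over UDP. The `name:value|type` layout is the common statsd wire format, and the `|#key:value` tag suffix follows the DogStatsD extension. The metric name, the tags, and the default address `127.0.0.1:8125` (the conventional statsd port) are assumptions for the example, not details taken from the slides.

```python
import socket
from typing import Dict, Optional


def format_metric(
    name: str,
    value: float,
    metric_type: str = "g",
    tags: Optional[Dict[str, str]] = None,
) -> str:
    """Build a statsd datagram: 'name:value|type', plus '|#k:v,...' if tagged.

    metric_type is 'g' for a gauge or 'c' for a counter.
    """
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the output is deterministic.
        tag_list = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
        payload += f"|#{tag_list}"
    return payload


def send_metric(payload: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))
```

For example, `format_metric("checkout.queue_depth", 12, "g", {"app": "checkout", "version": "2"})` yields `checkout.queue_depth:12|g|#app:checkout,version:2`, which `send_metric` would then deliver to the agent running on the same node.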
  • 18. Operational Complexity Increases with...
      • Number of things to measure
      • Velocity of change
  • 19. How much do we measure?
    1 instance
      • 10 metrics from cloud providers
    1 operating system (e.g., Linux)
      • 100 metrics
    ~50 metrics per application
  • 22. Operational Complexity: Scale
      • 160 metrics per host
      • 800 metrics per host, assuming 5 containers per host
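The two totals can be reproduced with a small worked calculation. This is one reading of the arithmetic, under the assumption that 160 is the sum of the cloud provider, operating system, and application counts from the talk, and that 800 comes from multiplying that footprint by five containers. The talk also quotes 150 metrics per container, which would give a slightly different total, so treat this as a sketch rather than the presenters' exact method.

```python
# Per-source metric counts quoted in the talk.
CLOUD_PROVIDER_METRICS = 10
OS_METRICS = 100
APP_METRICS = 50  # the talk says "~50", so this is approximate


def metrics_per_host() -> int:
    """One instance running one OS and one application."""
    return CLOUD_PROVIDER_METRICS + OS_METRICS + APP_METRICS


def metrics_with_containers(containers_per_host: int) -> int:
    """Assumption: each container adds a full per-host metric footprint."""
    return metrics_per_host() * containers_per_host
```

Here `metrics_per_host()` gives 160 and `metrics_with_containers(5)` gives 800, matching the two figures on the slide.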
  • 25. How much do we measure?
    1 instance
      • 10 metrics from cloud providers
    1 operating system (e.g., Linux)
      • 100 metrics
    ~50 metrics per application
    N containers
      • 150*N metrics
    Metrics Overload!
  • 29. Monitoring Questions
      • Where is a given container running?
      • What is the overall capacity of my cluster?
      • What port(s) are my applications running on?
      • What’s the total throughput of my application?
      • What’s its response time per tag? (app, version, data center)
      • What’s the distribution of 5xx errors per container? What about by data center?
  • 34. Query Based Monitoring
    “What’s the average throughput of application:nginx per version?”
    “Alert me when one of my pods from replication controller:foo is not behaving like the others”
    “Show me the rate of HTTP 500 responses from nginx”
    “… grouped by data center … running my app version 2 …”
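The first query above can be sketched as a filter on tags followed by a group-by. This is a minimal in-memory illustration, not Datadog's query engine or query syntax. The data points, tag names, and the `average_by` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Each point is (tags, throughput value). All values are made up.
Point = Tuple[Dict[str, str], float]

POINTS: List[Point] = [
    ({"application": "nginx", "version": "1", "datacenter": "dc1"}, 100.0),
    ({"application": "nginx", "version": "1", "datacenter": "dc2"}, 140.0),
    ({"application": "nginx", "version": "2", "datacenter": "dc1"}, 200.0),
    ({"application": "redis", "version": "2", "datacenter": "dc1"}, 900.0),
]


def average_by(
    points: List[Point], scope: Dict[str, str], group_by: str
) -> Dict[str, float]:
    """Average the values of points matching every tag in `scope`,
    grouped by the value of the `group_by` tag."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for tags, value in points:
        if all(tags.get(key) == wanted for key, wanted in scope.items()):
            groups[tags.get(group_by, "unknown")].append(value)
    return {key: mean(values) for key, values in groups.items()}
```

With the sample points, `average_by(POINTS, {"application": "nginx"}, "version")` returns `{'1': 120.0, '2': 200.0}`. Narrowing the scope to `{"application": "nginx", "version": "2"}` and grouping by `"datacenter"` returns `{'dc1': 200.0}`. Because the query names tags rather than hosts, it keeps working as containers move between machines.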
  • 35. Service Discovery
    [Diagram labels: Docker API, Kubernetes, Monitoring Agent, Container, Containers List & Metadata, Additional Metadata (Tags, etc.), Config Backends, Integration Configurations, Host Level Metrics]
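One way to picture the flow in this diagram: the agent takes the container list and metadata, looks up an integration configuration for each container, and fills in the container's address and tags. The sketch below is a hypothetical illustration of that idea. The `Container` type, the `TEMPLATES` table, the `{host}`/`{port}` placeholders, and matching by image name are all assumptions for the example, not the agent's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    """Stand-in for one entry of the container list and metadata."""
    name: str
    image: str
    host: str
    port: int
    labels: Dict[str, str] = field(default_factory=dict)


# Hypothetical integration configuration templates, as might be served
# by a config backend, keyed by short image name.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "nginx": {"check": "nginx", "status_url": "http://{host}:{port}/nginx_status"},
    "redis": {"check": "redis", "address": "{host}:{port}"},
}


def image_name(image: str) -> str:
    """'registry.example.com/team/nginx:1.11' -> 'nginx'."""
    return image.rsplit("/", 1)[-1].split(":", 1)[0]


def resolve_configs(containers: List[Container]) -> List[Dict[str, object]]:
    """Build one concrete check configuration per recognized container."""
    configs: List[Dict[str, object]] = []
    for container in containers:
        template = TEMPLATES.get(image_name(container.image))
        if template is None:
            continue  # no integration known for this image
        config: Dict[str, object] = {
            key: value.format(host=container.host, port=container.port)
            for key, value in template.items()
        }
        # Carry container labels along as tags (the additional metadata).
        config["tags"] = sorted(f"{k}:{v}" for k, v in container.labels.items())
        configs.append(config)
    return configs
```

Given a container with image `nginx:1.11` at `10.0.0.5:80` and label `app=web`, `resolve_configs` produces a single nginx check with `status_url` set to `http://10.0.0.5:80/nginx_status` and tags `['app:web']`. Containers with unrecognized images are skipped.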
  • 37. Q&A
    You can also follow us on Twitter: @datadoghq, @googlecloud, @tectonicstack