Developers see and realize the benefits of Kubernetes: it improves efficiency, saves time, and lets them focus on the unique business requirements of each project. InfoSec, infrastructure, and software operations teams, however, still face challenges when managing a new set of tools and technologies and integrating them into an existing enterprise infrastructure.
These meetup slides describe a general architecture for a centralized Kubernetes operations layer based on open source components such as Prometheus, Grafana, the ELK Stack, and Keycloak, and explain how to set up reliable clusters with a multi-master configuration that does not require a load balancer. They also outline how these components can be combined into an operations-friendly enterprise Kubernetes management platform with centralized monitoring and log collection, identity and access management, backup and disaster recovery, and infrastructure management capabilities. The presentation shows real-world open source use cases for implementing an ops-friendly environment.
Check out this and more webinars in our BrightTalk channel: https://goo.gl/QPE5rZ
2. Introductions
Oleg Chunikhin
CTO, Kublr
• Nearly 20 years in the field of software
architecture and development.
• Joined Kublr as the CTO in 2016.
• Kublr is an enterprise Kubernetes management and
operations platform that helps accelerate Kubernetes
adoption and containerized applications management for
enterprises.
3. History
• Custom software development company
• Dozens of projects per year
• Varying target environments: clouds, on-prem, hybrid
• Unified application delivery and ops platform wanted:
monitoring, logs, security, multiple env, ...
4. Docker and Kubernetes to the Rescue
• Docker is great, but local
• Kubernetes is great... when it is up and running
• Who sets up and operates K8S clusters?
• Who takes care of operational aspects at scale?
• How do you provide governance and ensure
compliance?
6. Kubernetes Management Platform Wanted
• Portability – clouds, on-prem, air-gapped, different OSes
• Centralized multi-cluster operations saves resources – many
environments (dev, prod, QA, ...), teams, applications
• Self-service and governance for Kubernetes operations
• Reliability – cluster self-healing, self-reliance
• Limited management profile – cloud and K8S API
• Architecture – flexible, open, pluggable, compatible
• Sturdy – secure, scalable, modular, HA, DR, etc.
7. Central Control Plane: Operations
[Diagram: a central control plane with API and UI, operations, log collection, monitoring, authn and authz with SSO and federation, audit, image repo, infrastructure management, and backup & DR; it manages multiple K8S clusters (dev, prod, PoC) running in clouds and data centers through the K8S API and cloud APIs]
10. Cluster: Portability
• (Almost) everything runs in containers
• Simple (single-binary) management agent
• Minimal store requirements
• Shared, eventually consistent
• Secure: read-write files for masters, read-only for nodes
• Thus the store can be anything:
S3, Azure storage account, NFS, an rsynced dir, provided files, ...
• Minimal infra automation requirements
• Configure and run configuration agent
• Enable access to the store
• Can be AWS CF, Azure ARM, BOSH,
Ansible, ...
• Load balancer is not required for multi-master;
each agent can independently fail over to a healthy
master
[Diagram: infrastructure automation provisions masters and nodes; the Kublr agent and kubelet run on each host; the K8s master components (etcd, scheduler, API server, controller manager) run in Docker; masters and nodes coordinate through the orchestration store (secrets, discovery) and an overlay network providing discovery and connectivity; infrastructure and application containers run on the nodes]
11. Cluster: Reliability
• Rely on underlying platform as much as
possible
• ASG on AWS
• IAM on AWS for store access
• SA on Azure, S3 on AWS
• ARM on Azure, CF on AWS
• Minimal infrastructure SLA;
tolerate temporary failures
• Multi-master API failover on nodes
• Resource management: memory requests
and limits for OS and K8s components (sketched below)
[Diagram: the same master/node cluster architecture as on slide 10]
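A minimal sketch of the resource-management point above, assuming system components run as pods in kube-system; the values and flags are illustrative, not Kublr's defaults:

# Illustrative resources stanza for a K8s system component container
# (fragment of a pod spec in kube-system); values are examples only.
containers:
  - name: kube-controller-manager
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
      limits:
        memory: 512Mi
# Kubelet-side reservations keep room for the OS and K8s daemons
# (illustrative flags):
#   --kube-reserved=cpu=500m,memory=512Mi
#   --system-reserved=cpu=250m,memory=256Mi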
12. Central Control Plane: Logs and Metrics
[Diagram: the same control plane and managed clusters as on slide 7, with log collection, monitoring, and audit highlighted as the centralized logs and metrics components]
13. Centralized Monitoring and Log Collection: Why Bother?
• Prometheus and ELK are heavy and not easy to operate;
they need attention and at least 4-8 GB of RAM... each, per cluster
• Cloud/SaaS monitoring is not always permitted or available
• Existing monitoring is often not container-aware
• No aggregated view and analysis
• No alerting governance
14. K8S Monitoring with Prometheus
• Discover nodes, services, pods
via K8S API
• Query metrics from discovered
endpoints
• Endpoints are accessed directly
via internal cluster addresses
[Diagram: inside the Kubernetes cluster, Prometheus discovers nodes, pods, and service endpoints through the K8S API and scrapes their metrics directly; Grafana visualizes the collected metrics]
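A minimal prometheus.yml fragment illustrating this discovery pattern, assuming Prometheus runs in the cluster with the standard service-account credentials; the job names and the opt-in annotation follow a common convention, not necessarily Kublr's exact configuration:

# In-cluster Prometheus scrape config (fragment): targets are discovered
# through the K8S API and scraped directly at their internal addresses.
scrape_configs:
  - job_name: kubernetes-nodes
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node              # one target per cluster node (kubelet)
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod               # one target per pod
    relabel_configs:
      # keep only pods that opt in via the conventional annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"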
15. Centralized Monitoring
[Diagram: the control plane runs a central Prometheus and Grafana, a configurator, and a cluster registry; Prometheus config and data can be shipped externally; each managed Kubernetes cluster runs Prometheus collectors that gather metrics from nodes, pods, and service endpoints, and the central Prometheus reaches them through the K8S proxy API]
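One way to connect the central Prometheus to the per-cluster collectors is federation through each managed cluster's apiserver proxy. The fragment below is a sketch of that mechanism, assuming a collector service named prometheus-collector in a monitoring namespace; cluster names, paths, and credentials are hypothetical, and the real configuration is generated by the configurator from the cluster registry:

# Central Prometheus config fragment, one job per managed cluster.
# The /federate endpoint of the in-cluster collector is reached through
# the managed cluster's K8S apiserver proxy. All names are illustrative.
scrape_configs:
  - job_name: federate-cluster-dev
    honor_labels: true
    metrics_path: /api/v1/namespaces/monitoring/services/prometheus-collector:9090/proxy/federate
    params:
      'match[]': ['{job!=""}']                       # pull all collector series
    scheme: https
    tls_config:
      ca_file: /etc/kublr/clusters/dev/ca.crt        # hypothetical credentials
    bearer_token_file: /etc/kublr/clusters/dev/token
    static_configs:
      - targets: ['dev-master.example.com:443']      # managed cluster API server
        labels:
          cluster: dev                               # tag metrics per cluster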
16. Centralized Monitoring: Considerations
• Prometheus resource usage tuning
• Long-term storage (e.g., M3)
• Configuration file growth with many clusters
• Metrics labeling
• Additional load on API server
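For the metrics-labeling concern, one common approach (an assumption here, not necessarily Kublr's) is to stamp every series leaving a cluster with external labels so aggregated dashboards and alerts can be scoped per cluster:

# Per-cluster collector config fragment: external_labels are attached to
# every series when it is federated or shipped externally. Values are examples.
global:
  external_labels:
    cluster: prod-us-east
    environment: prod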
Where the project comes from
Company overview
Kubernetes as a solution – standardized delivery platform
Kubernetes is great for managing containers, but who manages Kubernetes?
How to streamline monitoring and collection of logs with multiple Kubernetes clusters?
Requirements
Portability – support for cloud environments, on-prem deployments, and isolated deployments
Multi-cluster operations support
Centralized log collection and monitoring
Reliability – self healing, modularity, cluster self-reliance
Limited connectivity profile – do not require many ports
Architecture – flexible, open, pluggable
Security
The control plane is only critically involved in the cluster when the cluster is created
The control plane uses cloud specific infrastructure management automation frameworks – CF, ARM, BOSH, VMware, etc.
After the cluster infrastructure is created and configured, the cluster does not need the control plane
Self-coordination via the orchestration store
Orchestration store and underlying platform are the only coordination devices the cluster needs to operate and recover failures from
Masters and nodes are configured for the orchestration store access
Master(s) will try to get secrets and discovery information from the store; if not available, they will generate and publish a new set
With multiple masters – the latest published package wins
Nodes will take the latest published data and use it.
Control plane keeps track of managed clusters
Configurator reconfigures Prometheus when cluster list changes
Prometheus configuration is in K8S config maps
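A sketch of what that might look like, assuming the configurator writes the generated configuration into a ConfigMap mounted by the central Prometheus; the names are hypothetical:

# Hypothetical ConfigMap holding the generated central Prometheus config;
# the configurator updates it whenever the cluster registry changes and
# Prometheus reloads the file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: central-prometheus-config
  namespace: kublr-monitoring          # illustrative namespace
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
    scrape_configs: []                 # populated per managed cluster by the configurator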