In this InfluxDays NYC 2019 talk by Gunnar Aasen (Manager of Partner Engineering at InfluxData), you will get an overview of the AWS Container Monitoring Stack as well as how you can use InfluxDB on AWS for container monitoring. This session will include a demo of the solution.
Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen
1. Gunnar Aasen / Partner Engineering
Container Monitoring
Best Practices
Using AWS and InfluxData
2. Agenda
• What is container monitoring
• Options for running containers on AWS
• Best practices for container monitoring
• Run TICK on AWS container services
• Demo
• Questions
3. Partner Engineering Manager, InfluxData
InfluxDB expert
Based in San Francisco, Gunnar is a former InfluxData
support engineer. He has intimate knowledge of InfluxDB
and the rest of the TICK stack. As a partner engineer, he’s
focused on integrating InfluxDB into the larger open source
and cloud ecosystems to help InfluxData’s partners and
customers succeed.
9. AWS ECS/Fargate
• Elastic Container Service (ECS)
– Docker-based container deployment
– Essentially AWS’ version of Kubernetes
• Terminology a bit different: Tasks vs services
– Exposes the EC2 hosts used underneath
– Can use Docker compose
• Fargate
– The same as ECS, with no EC2 instances exposed
– Pay only for container CPU/memory used
10. AWS EKS
• EKS is AWS’ managed Kubernetes offering
– Equivalent to Google’s GKE
• Uses EC2 instances underneath
– These are exposed to the user
• AWS manages the Kubernetes API
• Some integration with IAM and load balancers
12. Options for deploying TICK on AWS
• CloudFormation module for EC2
• Link: https://github.com/influxdata/amazon-cloud-formation-influxdb-enterprise
• ECS/Fargate via Docker Compose
• Link: https://github.com/influxdata/sandbox
• EKS
– Via Helm (On the AWS Marketplace)
• Link: https://aws.amazon.com/marketplace/pp/B07KGM885K
– Via InfluxDB operator
• Link: https://docs.influxdata.com/platform/integrations/kubernetes/
13. Kubernetes resources
• Summary
– Link: https://docs.influxdata.com/platform/integrations/kubernetes/
• kube-influxdb project
– Makes monitoring Kubernetes with TICK easy on different platforms
• Link: https://github.com/influxdata/kube-influxdb
– Similar to kube-prometheus
– Includes common container and Kubernetes inputs to enable
– Includes graphs and dashboards for those metrics
– Will include alerts as well
15. What’s different
• Proliferation of containers
– Running in AWS…
• Enables microservices
– Increases the amount of inter-container (inter-process) communication
• Minimal environments
– Lack of familiar debugging tools and techniques
16. Observability is the new paradigm
• A holistic understanding of reality in a system
– Monitoring
• Current state of the system
– Logging
• Actions taken by services in the system
– Tracing
• Interactions between different services
– Graphs/alerting
• Translating machine information into human information
17. Levels of container monitoring
• Host/node level monitoring
– EC2 node failures
• Container monitoring
– Lack of resources
• Application monitoring
– Service does not respond
• Cluster monitoring
– Is Kubernetes overextended?
18. Telegraf in Kubernetes
• Three options
– DaemonSet: monitoring per node (one Telegraf per EC2 instance)
• Collect host/node metrics
– Deployment: single service for a cluster (Prometheus scraping)
• Collect application and cluster metrics
– Sidecar: tight coupling with the application
• Collect container metrics
• DaemonSet or sidecar? Start with DaemonSet
• Understand the metrics you’re generating before deploying
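As a rough illustration of the DaemonSet option above, the following Python sketch builds a minimal Kubernetes DaemonSet manifest that schedules one Telegraf pod per node and prints it as JSON, which kubectl accepts as well as YAML. The namespace, labels, and image tag are assumptions chosen for the example, not values from the talk. A real deployment would also mount a Telegraf configuration and the host paths it needs to read node metrics.

```python
import json


def telegraf_daemonset(namespace="monitoring", image="telegraf:latest"):
    """Build a minimal DaemonSet manifest: one Telegraf pod per node.

    The namespace, labels, and image tag are illustrative assumptions.
    """
    labels = {"app": "telegraf"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "telegraf", "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "telegraf",
                            "image": image,
                            # Expose the node name to the pod. This assumes the
                            # Telegraf config sets hostname = "$HOSTNAME".
                            "env": [
                                {
                                    "name": "HOSTNAME",
                                    "valueFrom": {
                                        "fieldRef": {"fieldPath": "spec.nodeName"}
                                    },
                                }
                            ],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(telegraf_daemonset(), indent=2))
```

The output could be saved to a file and applied with kubectl. Switching to the Deployment option would mainly mean changing the kind and adding a replica count.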
19. Telegraf input plugins for instrumenting nodes
• cpu: standard CPU metrics
• system: general stats on system load, uptime, and number of users logged in
• processes: number of processes grouped by state
• procstat: fine-grained process stats like RSS memory
• diskio: metrics about disk traffic and timing
• disk: metrics about disk usage
• mem: system memory metrics
• netstat: network-related metrics
• http_response: set up a local ping
• filestat: Files to gather stats about (meta node only)
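To show how the node-level plugins listed above fit together, here is a small Python sketch that renders a minimal telegraf.conf enabling the inputs that need no extra options. The InfluxDB URL, database name, and collection interval are placeholder assumptions. Plugins such as procstat, http_response, and filestat are left out because each needs plugin-specific settings (a process selector, target URLs, or file paths).

```python
# Node-level inputs from the list above that work without extra options.
NODE_INPUTS = ["cpu", "system", "processes", "diskio", "disk", "mem", "netstat"]


def render_telegraf_conf(inputs, influx_url="http://influxdb:8086", database="telegraf"):
    """Render a minimal Telegraf TOML config as a string.

    influx_url and database are placeholder assumptions for illustration.
    """
    lines = [
        "[agent]",
        '  interval = "10s"',
        "",
        "[[outputs.influxdb]]",
        f'  urls = ["{influx_url}"]',
        f'  database = "{database}"',
    ]
    for name in inputs:
        # Each input plugin gets its own table; defaults are used throughout.
        lines += ["", f"[[inputs.{name}]]"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_telegraf_conf(NODE_INPUTS))
```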
20. Telegraf input plugins for instrumenting containers
• logs: requires syslog
• swap: system swap metrics
• internal: Telegraf-related stats
• docker: if deployed in containers
• kubernetes: kubelet stats like per-node pod metrics
• kube_inventory: Kubernetes state metrics
• prometheus: Prometheus-style /metrics endpoints
• syslog: structured logging
21. Monitoring recommendations
• Remember to set up black box testing
– Kubernetes may look fine internally but egress may be failing
– Always start here for alerting
• Node health is still important in Kubernetes
– OOM killer, no disk space are still problems
– Pay attention to local system disk space
• Believe your users’ reports
– Most small problems are never reported
– Microservices/container scheduling can create many small outages
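The black box testing recommendation above can be sketched as a probe that runs outside the cluster, requests a public endpoint, and records whether it answered and how long it took. The Python example below is one possible shape, not the approach shown in the talk. The measurement name http_probe and the field names are hypothetical. The result is formatted as InfluxDB line protocol so it could be written to InfluxDB or alerted on.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Request url from the outside and report success, status, and latency."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
        ok = 200 <= status < 400
    except urllib.error.HTTPError as err:
        # The service answered, but with a 4xx/5xx status.
        status, ok = err.code, False
    except OSError:
        # Covers URLError, refused connections, DNS failures, and timeouts.
        status, ok = 0, False
    return {
        "url": url,
        "ok": ok,
        "status": status,
        "response_time": time.monotonic() - start,
    }


def to_line_protocol(result, timestamp_ns):
    """Format a probe result as one line of InfluxDB line protocol.

    The measurement name "http_probe" is a hypothetical choice.
    """
    tag = result["url"]
    for ch in (",", "=", " "):
        # Commas, equals signs, and spaces must be escaped in tag values.
        tag = tag.replace(ch, "\\" + ch)
    ok = "true" if result["ok"] else "false"
    fields = "ok={},status={}i,response_time={:.6f}".format(
        ok, result["status"], result["response_time"]
    )
    return "http_probe,url={} {} {}".format(tag, fields, timestamp_ns)


if __name__ == "__main__":
    # example.com is a placeholder; point this at a real public endpoint.
    print(to_line_protocol(probe("https://example.com/"), time.time_ns()))
```

Running the probe from a host outside the monitored cluster keeps it useful when the cluster's own networking is the part that is failing.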
22. System recommendations
• Decouple the monitoring system from the target infrastructure
– SaaS, VMs work well for decoupling
• Test the monitoring system
– All large environments should have staging metrics
• Monitoring should be deployed with your application
– Infrastructure as code like CloudFormation or Terraform templates
• Always consider how cascading failures will affect monitoring
– Monitoring systems tend to go down during other service issues
23. AWS recommendations
• Keep an accessible record of CloudWatch stats
– Keep in mind CloudWatch API limits
• Always consider AWS limits ahead of time
– Available instance classes
– Hard to monitor without access to the AWS support API
• Kubernetes
– Stay up to date for the best experience
– Pay attention to IAM roles
– Use CloudFormation
24. Future Plans
• Next couple months
– Migrating to official Helm charts repo
• Deprecating TICK charts and kube-influxdb repos
• One well-known place for all charts
• This summer
– Operator extended for InfluxDB Enterprise
– Additional operator functionality for other TICK components
– Publish more tools for tracing