OSMC 2019 | Monitoring Nomad with Prometheus and Icinga by Bram Vogelaar

Infrastructure as Code, service discovery and config management have helped us quickly build and rebuild infrastructure, but we haven't spent nearly enough time training ourselves to review, monitor and respond to outages. Does our platform degrade gracefully, and what does a high CPU load really mean? What can we learn from level 1 outages to run our platforms more reliably? We all love infrastructure as code; we automate everything™. However, making sure all of our infrastructure assets are monitored effectively can be a slow, resource-intensive, multi-stage process. In this talk we investigate how to set up and monitor a scalable, cloud-native container platform using HashiCorp's Consul and Nomad (for service discovery and container scheduling) and Traefik, an edge router. The talk focuses on keeping alerts and metrics working in this quickly changing infrastructure landscape. We show how to integrate Icinga 2 with Consul and Nomad, and we finish by visualizing the Prometheus data in a way that resembles Netflix's Vizceral, using freely available Grafana dashboards and plugins.

  1. Monitoring Nomad with Prometheus and Icinga
  2. Containers – the “friends don’t let friends use kubernetes” edition
  3. ~$ whoami
     ● I used to be a Molecular Biologist,
     ● Then became a Dev,
     ● Now an Ops.
     ● Open Source Consultant @inuits.eu
  4. Today’s Schedule
  5. Traefik
  6. Traefik
     ● Open Source edge router
     ● Static and dynamic routes
     ● Loads of native integrations*: K8s, Rancher, etcd, Consul
     ● Let’s Encrypt integration
  7. Ingress
     ```toml
     [entryPoints.http]
     address = ":80"
       [entryPoints.http.redirect]
       entryPoint = "https"

     [entryPoints.https]
     address = ":443"
     ```
  8. SSL ingress *
     ```toml
     [acme]
     entryPoint = "https"
     onHostRule = true
     storage = "/etc/traefik.d/acme.json"
       [acme.httpChallenge]
       entryPoint = "http"

     [entryPoints.https.tls]
     ```
  9. Dynamic ingress
     ```toml
     [consulCatalog]
     domain = "attachmentgenie.com"
     endpoint = "127.0.0.1:8500"
     exposedByDefault = false
     ```
  10. Good neighbor
      ```toml
      [api]
      dashboard = true

      [metrics.prometheus]

      [ping]
      ```
  11. Nice Dashboards #4475
  12. Consul
  13. Consul
      ● Open Source service discovery tool, e.g. dig @127.0.0.1 -p 8600 puppetmaster.service.consul ANY
      ● Built-in KV store
      ● Service mesh tool
  14. Consul Services
      ```puppet
      ::consul::service { 'traefik-ui':
        port => 8080,
      }

      ::consul::check { 'traefik_http':
        interval   => '60s',
        http       => 'http://localhost:8080',
        service_id => 'traefik-ui',
      }
      ```
  15. Consul~Icinga Exit Codes
      ```puppet
      ::consul::check { 'traefik-status':
        interval   => '10s',
        script     => '/usr/lib64/nagios/plugins/traefik-status',
        service_id => 'traefik-ui',
      }
      ```
  16. Consul metrics for all
      ```hcl
      telemetry {
        prometheus_retention_time = "30s"
        disable_hostname          = true
      }
      ```
  17. Nice Dashboards #2351
  18. Nomad
  19. Nomad
      ● Open Source tool to do dynamic workload scheduling
      ● Batch, containerized, and non-containerized applications
      ● Has native Consul and Vault integration
  20. Nomad metrics for all
      ```json
      "telemetry": {
        "collection_interval": "1s",
        "disable_hostname": true,
        "prometheus_metrics": true,
        "publish_allocation_metrics": true,
        "publish_node_metrics": true
      }
      ```
  21. Nice Dashboards #6281 #6278
  22. Let’s deploy our blog
      ```hcl
      job "blog" {
        datacenters = ["prod"]
        type        = "service"

        group "hugo" {
          task "nginx" {
            driver = "docker"

            config {
              image = "private.dkr.ecr.us-east-1.amazonaws.com/blog:1.3"
            }

            # the service stanza from the next slide goes here
          }
        }
      }
      ```
  23. Service Definition
      ```hcl
      service {
        name = "blog"
        tags = [
          "traefik.enable=true",
        ]
        port = "http"

        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
      ```
  24. Over-engineered Personal website
  25. Consul UI
  26. Nomad UI
  27. Nomad UI
  28. Static vs Dynamic (Problems)
  29. Query Consul
      ```php
      use SensioLabs\Consul\ServiceFactory;

      // base_uri comes from this source's 'consul_url' setting
      $sf = new ServiceFactory(
          array('base_uri' => $this->getSetting('consul_url'))
      );
      $catalog = $sf->get('catalog');

      // GET /v1/catalog/nodes, decoded into PHP objects
      return json_decode($catalog->nodes()->getBody());
      ```
  30. Adding a sync source
  31. Adding a sync rule
  32. Add sync properties
  33. Plumbing it together
  34. Icinga Business Processes
      ```puppet
      ::consul::watch { 'detect_backend_changes':
        type    => 'service',
        handler => '/usr/bin/update_bp.sh',
        service => 'nomad-client',
        require => File['/usr/bin/update_bp.sh'],
      }
      ```
  35. 11/13/2019 Bram Vogelaar bram@inuits.eu @attachmentgenie
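Slides 9 and 23 work together: with exposedByDefault = false, Traefik's Consul catalog provider only routes services that carry the traefik.enable=true tag. The Python sketch below models that decision. It assumes the catalog data has the shape returned by Consul's /v1/catalog/services endpoint (service name mapped to its list of tags) and that Traefik 1.x's default frontend rule, Host:{service name}.{domain}, is in effect. exposed_hosts is a hypothetical helper, not part of Traefik.

```python
from typing import Dict, List

# Values from the [consulCatalog] slide; "traefik" is assumed to be the tag prefix (Traefik 1.x default).
DOMAIN = "attachmentgenie.com"
ENABLE_TAG = "traefik.enable=true"
DISABLE_TAG = "traefik.enable=false"


def exposed_hosts(catalog: Dict[str, List[str]],
                  domain: str = DOMAIN,
                  exposed_by_default: bool = False) -> Dict[str, str]:
    """Return {service: host name} for the services Traefik would route.

    catalog mirrors GET /v1/catalog/services: service name -> list of tags.
    Host names follow the assumed default rule Host:{service}.{domain}.
    """
    hosts = {}
    for service, tags in catalog.items():
        if DISABLE_TAG in tags:
            enabled = False                # explicit opt-out always wins
        elif ENABLE_TAG in tags:
            enabled = True                 # explicit opt-in, as in the Nomad service stanza
        else:
            enabled = exposed_by_default   # fall back to the provider setting
        if enabled:
            hosts[service] = f"{service}.{domain}"
    return hosts


catalog = {
    "blog": ["traefik.enable=true"],
    "consul": [],
    "traefik-ui": [],
}
print(exposed_hosts(catalog))  # {'blog': 'blog.attachmentgenie.com'}
```

Calling exposed_hosts(catalog, exposed_by_default=True) would also route consul and traefik-ui, which is a likely reason the slide keeps the default off and opts services in one tag at a time.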
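The traefik_http check on slide 14 is a Consul HTTP check. According to Consul's documentation, any 2xx response counts as passing, 429 Too Many Requests as warning, and anything else, including a failed connection, as critical; the default request timeout is 10 seconds. A small standard-library Python sketch of that classification follows; probe and http_check_status are illustrative names, not Consul APIs.

```python
import urllib.error
import urllib.request


def http_check_status(status_code: int) -> str:
    """Map an HTTP status code to a Consul check status."""
    if 200 <= status_code < 300:
        return "passing"
    if status_code == 429:  # Too Many Requests
        return "warning"
    return "critical"


def probe(url: str, timeout: float = 10.0) -> str:
    """Request url once and classify the result like a Consul HTTP check."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return http_check_status(resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return http_check_status(err.code)
    except OSError:  # URLError, refused connections and timeouts
        return "critical"
```

probe("http://localhost:8080") would classify the Traefik dashboard roughly the way the check on the slide does; details such as redirect handling may differ between urllib and Consul.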
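Slide 15 reuses a Nagios plugin as a Consul script check. This works because both sides read exit codes the same way for the common cases: Consul treats exit code 0 as passing, 1 as warning and any other code as critical, while the Nagios plugin convention is 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The contents of traefik-status are not shown on the slide, so what follows is a hypothetical Python sketch. It assumes the plugin queries Traefik's [ping] endpoint (slide 10) at http://localhost:8080/ping, which answers with the body OK when Traefik is healthy.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Exit codes understood by both Nagios/Icinga plugins and Consul script checks.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumption: Traefik's [ping] endpoint on its default API port.
PING_URL = "http://localhost:8080/ping"


def evaluate(status_code: int, body: str) -> Tuple[int, str]:
    """Turn a ping response into (exit code, one-line plugin output)."""
    if status_code == 200 and body.strip() == "OK":
        return OK, "OK - traefik answered ping"
    if status_code == 200:
        return WARNING, f"WARNING - unexpected ping body {body.strip()!r}"
    return CRITICAL, f"CRITICAL - ping returned HTTP {status_code}"


def run_check(url: str = PING_URL, timeout: float = 5.0) -> Tuple[int, str]:
    """Query the ping endpoint once; any transport error is CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return evaluate(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        return evaluate(err.code, "")
    except OSError as err:  # URLError, refused connections and timeouts
        return CRITICAL, f"CRITICAL - cannot reach traefik: {err}"
```

As an installed plugin, the script would finish with code, message = run_check(), then print(message) and sys.exit(code), so Icinga and Consul both derive their status from the same exit code.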
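With the telemetry settings on slides 16 and 20, Consul serves Prometheus-format metrics at /v1/agent/metrics?format=prometheus (only when prometheus_retention_time is greater than zero) and Nomad at /v1/metrics?format=prometheus; Traefik's [metrics.prometheus] adds a /metrics endpoint. To show what a scrape of those endpoints returns, here is a deliberately simplified Python parser for the Prometheus text exposition format. It is a sketch, not a substitute for the official client library: it skips HELP/TYPE comments and does not handle escaped quotes or braces inside label values. The metric names in the example are made up.

```python
import re
from typing import Dict, Tuple

# One sample per line: name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\S+)?\s*$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

SampleKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_exposition(text: str) -> Dict[SampleKey, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[SampleKey, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE metadata
        match = _SAMPLE.match(line)
        if match is None:
            continue  # outside what this simplified parser understands
        try:
            value = float(match.group("value"))  # also accepts NaN and +Inf
        except ValueError:
            continue
        labels = tuple(sorted(_LABEL.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = value
    return samples


example = """\
# HELP demo_allocations_running Made-up gauge for illustration.
# TYPE demo_allocations_running gauge
demo_allocations_running{datacenter="prod",node="nomad-1"} 3
demo_up 1
"""
for key, value in parse_exposition(example).items():
    print(key, value)
```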
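Slide 29 fetches the node list from Consul's catalog (GET /v1/catalog/nodes), which returns a JSON array of objects with fields such as Node, Address and Datacenter. The Python sketch below performs the same query and flattens the response into rows that an import source could hand to Icinga Director. The column names node, address and datacenter are illustrative choices; mapping them onto host properties is the job of the sync rule and sync properties on slides 31 and 32.

```python
import json
import urllib.request
from typing import Dict, List


def nodes_to_rows(payload: str) -> List[Dict[str, str]]:
    """Flatten a /v1/catalog/nodes response into rows sorted by node name."""
    rows = [
        {
            "node": entry["Node"],
            "address": entry.get("Address", ""),
            "datacenter": entry.get("Datacenter", ""),
        }
        for entry in json.loads(payload)
    ]
    return sorted(rows, key=lambda row: row["node"])


def fetch_rows(consul_url: str = "http://127.0.0.1:8500",
               timeout: float = 5.0) -> List[Dict[str, str]]:
    """Query the Consul catalog over HTTP; consul_url mirrors the slide's setting."""
    url = consul_url.rstrip("/") + "/v1/catalog/nodes"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return nodes_to_rows(resp.read().decode("utf-8"))
```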
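The watch on slide 34 runs /usr/bin/update_bp.sh whenever the nomad-client service changes. For a watch of type service, Consul passes the handler the same JSON as GET /v1/health/service/<name> on stdin: one entry per instance, each with Node, Service and Checks. The script itself is not shown, so here is a hypothetical Python equivalent of its first step, working out which Nomad clients are currently healthy. Turning that list into an Icinga Business Process definition is left out.

```python
import json
import sys
from typing import List, TextIO


def healthy_nodes(payload: str) -> List[str]:
    """Names of nodes whose checks for this service are all passing."""
    entries = json.loads(payload) if payload.strip() else []
    nodes = set()
    for entry in entries or []:  # also tolerates a JSON null payload
        checks = entry.get("Checks") or []
        if all(check.get("Status") == "passing" for check in checks):
            nodes.add(entry["Node"]["Node"])
    return sorted(nodes)


def main(stream: TextIO = sys.stdin) -> None:
    """Handler entry point: read the watch payload, print one node per line."""
    for node in healthy_nodes(stream.read()):
        print(node)
```

A handler script would simply call main(). Because Consul hands the handler the complete current result each time the watch fires, the Business Process definition can be regenerated from scratch rather than patched.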
