Monitoring infrastructure with prometheus
2. Currently…
➔ 600+ servers
➔ External load balancers see avg 5k+ requests per second
➔ Internal Amplification of 8x to 12x
➔ Self-managed deployments:
◆ Elasticsearch (Dynamic Scaling)
◆ PostgreSQL
◆ Cassandra
◆ Kafka
◆ Redis
◆ RabbitMQ
◆ And more…
➔ Uptime of 99.95%
➔ Ability to handle AZ failures
9. ➔ GCE SD configurations to populate hosts
➔ Instance metadata to cluster nodes.
Data source: Prometheus (cont… )
➔ Multiple Instances with different retention
➔ Separate dedicated instances for APM, Node Metrics, ICMP, Kubernetes
➔ Grafana connects to all of these
10. ➔ GCE SD configurations to populate hosts
Metrics source: GCE with exporters
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    scrape_timeout: 15s
    gce_sd_configs:
      - project: <project_name>
        zone: <zone_name>
        port: 9100
        filter: "(name ne .*stage.*)(name ne .*test.*)"
    relabel_configs:
      - source_labels: [__meta_gce_instance_name]
        target_label: host
      - source_labels: [__meta_gce_zone]
        separator: '/'
        regex: '(.*)/(.*)'
        replacement: '${2}'
        target_label: zone
      - source_labels: [__meta_gce_metadata_cluster, __meta_gce_metadata_cluster_name]
        separator: ';'
        regex: '(.*);(.*)'
        replacement: '${1}${2}'
        target_label: cluster
      - source_labels: [__meta_gce_metadata_env]
        target_label: env
      - source_labels: [__meta_gce_metadata_component]
        target_label: component
Target Labels
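The zone and cluster rules above are the least obvious part of this config, so here is a minimal Go sketch of what they appear to do. It only imitates Prometheus "replace" semantics with the standard `regexp` package; the `relabel` helper, the sample zone URL, and the cluster value `search` are illustrative assumptions, and it assumes `__meta_gce_zone` holds a full zone URL, as the `/`-based regex suggests.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabel imitates a Prometheus "replace" rule: join the source values with
// sep, match the fully anchored pattern, and expand the replacement template.
// The boolean is false when the pattern does not match (the rule is a no-op).
func relabel(values []string, sep, pattern, replacement string) (string, bool) {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	joined := strings.Join(values, sep)
	m := re.FindStringSubmatchIndex(joined)
	if m == nil {
		return "", false
	}
	return string(re.ExpandString(nil, replacement, joined, m)), true
}

func main() {
	// Hypothetical zone URL; the greedy first group swallows everything up to
	// the last "/", so ${2} is the short zone name.
	zoneURL := "https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-a"
	zone, _ := relabel([]string{zoneURL}, "/", "(.*)/(.*)", "${2}")
	fmt.Println(zone)

	// Only one of the two metadata keys is set, so concatenating both groups
	// yields whichever value is present.
	cluster, _ := relabel([]string{"", "search"}, ";", "(.*);(.*)", "${1}${2}")
	fmt.Println(cluster)
}
```

Note that if an instance set both `cluster` and `cluster_name`, this rule would concatenate the two values, so the pattern works best when exactly one key is present per instance.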
11. ➔ K8S SD configurations to populate pods
Metrics source: K8S API
  - job_name: 'pod-metrics'
    scrape_interval: 15s
    scrape_timeout: 5s
    kubernetes_sd_configs:
      - api_server: '<api-path>'
        role: node
        basic_auth:
          username: <user>
          password: <access_token>
        tls_config:
          insecure_skip_verify: true
      - api_server: '<api-path>'
        role: pod
        basic_auth:
          username: <user>
          password: <access_token>
        tls_config:
          insecure_skip_verify: true
    scheme: http
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: host
      - source_labels: [__address__]
        action: keep
      - source_labels: [__address__]
        action: replace
        target_label: address
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: cluster
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: 9284
        action: keep
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node_name
Target Labels
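The two `keep` rules in this job behave quite differently, which a short Go sketch can make concrete. It imitates the `keep` action with the standard `regexp` package; the `keep` helper and the sample ports and address are hypothetical, and the empty-pattern default of `(.*)` reflects the documented Prometheus default regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// keep imitates a relabel rule with "action: keep": the target survives only
// if the source label value fully matches the pattern.
// An empty pattern stands for the Prometheus default regex, "(.*)".
func keep(value, pattern string) bool {
	if pattern == "" {
		pattern = "(.*)"
	}
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(value)
}

func main() {
	// Hypothetical container ports discovered for three pods. The match is
	// anchored, so "19284" does not slip through a rule written for 9284.
	for _, port := range []string{"9284", "8080", "19284"} {
		fmt.Printf("port %s kept: %v\n", port, keep(port, "9284"))
	}

	// A keep rule on __address__ without a regex matches any value,
	// so it never drops a target.
	fmt.Println("address kept:", keep("10.0.0.7:9284", ""))
}
```

Under these semantics, only containers exposing port 9284 remain as scrape targets, while the regex-less `keep` on `__address__` has no filtering effect.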
12. ➔ Hystrix is great for real-time monitoring.
➔ It helps identify failures quickly.
➔ We export Hystrix data to Prometheus.
➔ The stored data helps in debugging and retrospectives.
Metrics source: Hystrix
[Diagram: app, statsd-exporter, prometheus, turbine]
13. ➔ Custom metrics with Hystrix
Metrics source: Hystrix
scrape_configs:
  - job_name: 'hystrix-stats'
    scrape_interval: '15s'
    file_sd_configs:
      - files:
          - 'hystrix-stats*.yml'
        refresh_interval: 5m
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '^(.*)_hystrix.*'
        target_label: hcluster
        replacement: '${1}'
      - source_labels: [__name__]
        regex: '^.*_(hystrix.*)'
        target_label: __name__
        replacement: '${1}'
      - source_labels: [__name__]
        regex: '^hystrix_([^_]+)_.*'
        target_label: command
        replacement: '${1}'
      - source_labels: [__name__]
        regex: '^hystrix_[^_]+_(.*)'
        target_label: __name__
        replacement: 'hystrix_${1}'
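The four metric relabel rules run in order and each one sees the name left by the previous rule. The following Go sketch traces that chain for one metric, using only the standard `regexp` package to imitate the "replace" action. The input name `orders_hystrix_getUser_latency_mean` is a made-up example of a `<cluster>_hystrix_<command>_<metric>` name, and the `rule` type and `apply` helper are illustrative, not Prometheus code.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for one "replace" relabel rule with a
// single source label.
type rule struct {
	source, pattern, target, replacement string
}

// hystrixRules mirrors the metric_relabel_configs from the slide.
var hystrixRules = []rule{
	{"__name__", `^(.*)_hystrix.*`, "hcluster", "${1}"},
	{"__name__", `^.*_(hystrix.*)`, "__name__", "${1}"},
	{"__name__", `^hystrix_([^_]+)_.*`, "command", "${1}"},
	{"__name__", `^hystrix_[^_]+_(.*)`, "__name__", "hystrix_${1}"},
}

// apply runs the rules in order on a copy of labels. A rule whose anchored
// pattern does not match leaves the labels unchanged.
func apply(labels map[string]string, rules []rule) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	for _, r := range rules {
		re := regexp.MustCompile("^(?:" + r.pattern + ")$")
		val := out[r.source]
		m := re.FindStringSubmatchIndex(val)
		if m == nil {
			continue
		}
		out[r.target] = string(re.ExpandString(nil, r.replacement, val, m))
	}
	return out
}

func main() {
	in := map[string]string{"__name__": "orders_hystrix_getUser_latency_mean"}
	out := apply(in, hystrixRules)
	fmt.Println(out["__name__"], out["hcluster"], out["command"])
}
```

For this input the cluster prefix and the command name move out of the metric name into the `hcluster` and `command` labels, leaving a shared `hystrix_latency_mean` series name that can be aggregated across services. The third rule assumes command names contain no underscore, since `[^_]+` stops at the first one.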
14. ➔ Official Go client: https://github.com/prometheus/client_golang
➔ Supports 4 metric types (counter, gauge, histogram, summary)
➔ Built-in support for common gRPC metrics
➔ Exposed at http://ip:port/metrics
Metrics source: Custom
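To show what a scrape of `/metrics` returns, here is a dependency-free Go sketch that serves a single counter in the Prometheus text exposition format. It deliberately does not use client_golang so that it runs with the standard library alone; in a real service the official client would register the metrics and provide the HTTP handler. The metric name `app_requests_total`, its help text, and the `render` helper are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter; counters only ever increase.
var requestsTotal uint64

// render formats one counter in the Prometheus text exposition format.
func render(name, help string, value uint64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %d\n",
		name, help, name, name, value)
}

// metricsHandler serves the current counter value as plain text.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	io.WriteString(w, render("app_requests_total", "Requests handled.",
		atomic.LoadUint64(&requestsTotal)))
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)

	// A throwaway test server stands in for http://ip:port so the sketch
	// terminates instead of listening forever.
	srv := httptest.NewServer(mux)
	defer srv.Close()

	atomic.AddUint64(&requestsTotal, 3)

	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

Prometheus scrapes this endpoint on each `scrape_interval` and stores the counter value it finds, so the application only has to keep the number current in memory.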