Monitoring Infrastructure with Prometheus
- Mohd Sahnawaz, Sr. SRE
Singapore Kubernetes User Group, 4th July 2018
Currently…
➔ 600+ servers
➔ External load balancers see 5k+ requests per second on average
➔ Internal request amplification of 8x to 12x
➔ Self managed deployments:
◆ Elasticsearch (dynamic scaling)
◆ PostgreSQL
◆ Cassandra
◆ Kafka
◆ Redis
◆ RabbitMQ
◆ And more…
➔ Uptime of 99.95%
➔ Ability to handle AZ failures
Architecture
Monitoring stack:
● Dashboard: Grafana
● Data store: Prometheus
● Metrics source:
  ○ Exporters: Node, Postgres, JVM, Elasticsearch, HAProxy, Hystrix, StatsD, …
  ○ Write your own
Dashboard: Grafana
(Screenshots: K8S deployment view, Hystrix metric view)
Data source: Prometheus
➔ GCE service discovery (SD) configurations populate the list of hosts
➔ Instance metadata is used to group nodes into clusters
Data source: Prometheus (cont…)
➔ Multiple Prometheus instances with different retention periods
➔ Separate dedicated instances for APM, node metrics, ICMP, and Kubernetes
➔ Grafana connects to all of these instances
Metrics source: GCE with exporters
➔ GCE SD configurations to populate hosts
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    scrape_timeout: 15s
    gce_sd_configs:
      - project: <project_name>
        zone: <zone_name>
        port: 9100
        # Skip instances whose name contains "stage" or "test".
        filter: "(name ne .*stage.*)(name ne .*test.*)"
    relabel_configs:
      - source_labels: [__meta_gce_instance_name]
        target_label: host
      # __meta_gce_zone is a zone URL; keep only its last path segment.
      - source_labels: [__meta_gce_zone]
        separator: '/'
        regex: '(.*)/(.*)'
        replacement: '${2}'
        target_label: zone
      # Concatenate the "cluster" and "cluster_name" instance metadata values.
      - source_labels: [__meta_gce_metadata_cluster, __meta_gce_metadata_cluster_name]
        separator: ';'
        regex: '(.*);(.*)'
        replacement: '${1}${2}'
        target_label: cluster
      - source_labels: [__meta_gce_metadata_env]
        target_label: env
      - source_labels: [__meta_gce_metadata_component]
        target_label: component
Target Labels
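The zone and cluster relabel rules above can be traced by hand. The following is a minimal Go sketch, not Prometheus code: the `relabel` helper is hypothetical and only mimics the `replace` action (join the source values with the separator, match a fully anchored RE2 regex, expand the replacement). The zone URL and metadata values are made-up inputs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabel mimics the "replace" action of a relabel rule (hypothetical helper):
// the source values are joined with sep, matched against a fully anchored
// regex, and the replacement template is expanded when the regex matches.
// ok is false when there is no match, in which case the rule is skipped.
func relabel(values []string, sep, pattern, replacement string) (result string, ok bool) {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	joined := strings.Join(values, sep)
	idx := re.FindStringSubmatchIndex(joined)
	if idx == nil {
		return "", false
	}
	return string(re.ExpandString(nil, replacement, joined, idx)), true
}

func main() {
	// Assumed example of a zone URL as found in __meta_gce_zone.
	zone, _ := relabel(
		[]string{"https://www.googleapis.com/compute/v1/projects/example-project/zones/asia-southeast1-a"},
		"/", "(.*)/(.*)", "${2}")
	fmt.Println(zone) // asia-southeast1-a

	// Assumed metadata: "cluster" is set, "cluster_name" is empty.
	cluster, _ := relabel([]string{"search", ""}, ";", "(.*);(.*)", "${1}${2}")
	fmt.Println(cluster) // search
}
```

Because the first group is greedy, `${2}` always ends up holding the text after the last slash, which is why the rule yields the bare zone name.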
Metrics source: K8S API
➔ K8S SD configurations to populate pods
  - job_name: 'pod-metrics'
    scrape_interval: 15s
    scrape_timeout: 5s
    kubernetes_sd_configs:
      - api_server: '<api-path>'
        role: node
        basic_auth:
          username: <user>
          password: <access_token>
        tls_config:
          # Skips certificate verification for the API server.
          insecure_skip_verify: true
      - api_server: '<api-path>'
        role: pod
        basic_auth:
          username: <user>
          password: <access_token>
        tls_config:
          insecure_skip_verify: true
    scheme: http
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: host
      # No regex is given, so the default (.*) matches every target
      # and this rule drops nothing.
      - source_labels: [__address__]
        action: keep
      - source_labels: [__address__]
        action: replace
        target_label: address
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: kubernetes_container
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: cluster
      # Keep only targets whose container port is 9284.
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: 9284
        action: keep
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node_name
Target Labels
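The two `keep` rules in the pod job behave quite differently, which a small simulation makes visible. This is a rough Go sketch under stated assumptions: `target` and `keep` are hypothetical names, the pod names and ports are invented, and only the matching logic of the `keep` action is modelled (a target survives when the fully anchored regex matches its source label value).

```go
package main

import (
	"fmt"
	"regexp"
)

// target is a simplified discovered target: just a set of labels.
type target map[string]string

// keep mimics the "keep" relabel action (hypothetical helper): a target
// survives only if the fully anchored regex matches the value of
// sourceLabel. A missing label is treated as the empty string.
func keep(targets []target, sourceLabel, pattern string) []target {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	var kept []target
	for _, t := range targets {
		if re.MatchString(t[sourceLabel]) {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// Invented targets: one pod with two container ports, one with none.
	targets := []target{
		{"__address__": "10.0.0.1:9284", "__meta_kubernetes_pod_container_port_number": "9284"},
		{"__address__": "10.0.0.1:8080", "__meta_kubernetes_pod_container_port_number": "8080"},
		{"__address__": "10.0.0.2"},
	}

	// Default regex (.*): every target matches, so nothing is dropped.
	fmt.Println(len(keep(targets, "__address__", "(.*)"))) // 3

	// regex 9284: only the target exposing that port remains.
	fmt.Println(len(keep(targets, "__meta_kubernetes_pod_container_port_number", "9284"))) // 1
}
```

In this model the port rule is the one that actually narrows the scrape targets to the metrics port.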
Metrics source: Hystrix
➔ Hystrix is great for real-time monitoring.
➔ It helps in quickly identifying failures.
➔ We capture Hystrix data in Prometheus.
➔ The stored data helps in debugging and retrospectives.
(Diagram: app, turbine, statsd-exporter, prometheus)
Metrics source: Hystrix
➔ Custom metrics with Hystrix
scrape_configs:
  - job_name: 'hystrix-stats'
    scrape_interval: '15s'
    file_sd_configs:
      - files:
          - 'hystrix-stats*.yml'
        refresh_interval: 5m
    metric_relabel_configs:
      # Text before "_hystrix" in the metric name becomes the hcluster label.
      - source_labels: [__name__]
        regex: '^(.*)_hystrix.*'
        target_label: hcluster
        replacement: '${1}'
      # Strip that prefix so the name starts with "hystrix".
      - source_labels: [__name__]
        regex: '^.*_(hystrix.*)'
        target_label: __name__
        replacement: '${1}'
      # The first name segment after "hystrix_" becomes the command label.
      - source_labels: [__name__]
        regex: '^hystrix_([^_]+)_.*'
        target_label: command
        replacement: '${1}'
      # Drop the command segment from the metric name.
      - source_labels: [__name__]
        regex: '^hystrix_[^_]+_(.*)'
        target_label: __name__
        replacement: 'hystrix_${1}'
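The four metric relabel rules above run in order, and each later rule sees the name rewritten by the earlier ones. The Go sketch below walks one metric through that chain. It is an approximation for illustration: the `rule` type, `applyRules`, and the sample metric name `search_hystrix_getListing_errorCount` are all assumptions, not names taken from the deck.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified "replace" relabel rule with one source label.
type rule struct {
	source, pattern, target, replacement string
}

// hystrixRules mirrors the four metric_relabel_configs entries above.
var hystrixRules = []rule{
	{"__name__", `^(.*)_hystrix.*`, "hcluster", "${1}"},
	{"__name__", `^.*_(hystrix.*)`, "__name__", "${1}"},
	{"__name__", `^hystrix_([^_]+)_.*`, "command", "${1}"},
	{"__name__", `^hystrix_[^_]+_(.*)`, "__name__", "hystrix_${1}"},
}

// applyRules applies the rules in order, mutating labels in place.
// A rule whose fully anchored regex does not match is skipped.
func applyRules(labels map[string]string, rules []rule) {
	for _, r := range rules {
		re := regexp.MustCompile("^(?:" + r.pattern + ")$")
		val := labels[r.source]
		idx := re.FindStringSubmatchIndex(val)
		if idx == nil {
			continue
		}
		labels[r.target] = string(re.ExpandString(nil, r.replacement, val, idx))
	}
}

func main() {
	// Hypothetical metric name as it might arrive from the exporter.
	labels := map[string]string{"__name__": "search_hystrix_getListing_errorCount"}
	applyRules(labels, hystrixRules)
	fmt.Println(labels["__name__"], labels["hcluster"], labels["command"])
	// prints: hystrix_errorCount search getListing
}
```

The effect is that per-service, per-command metric names collapse into one shared name, with the service and command moved into labels that can be filtered and aggregated in queries. Note that the command segment is matched with `[^_]+`, so a command name containing an underscore would be split at its first underscore.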
Metrics source: Custom
➔ Official Go client: https://github.com/prometheus/client_golang
➔ Supports four metric types (counter, gauge, histogram, summary)
➔ Built-in support for common gRPC metrics
➔ Metrics are exposed at http://ip:port/metrics
Alerting: Alertmanager
(Diagram: Prometheus → Alertmanager → Slack, VictorOps)
(Screenshots: alerting process, alert rule)
Thank you!
Q & A
We’re hiring, visit careers.carousell.com
