Metrics & more
How to monitor big data systems @scale
About me
Stefan Thies
@seti321
DevOps Evangelist @sematext
Why monitoring is important
• Tuning
• Detecting bugs
• Stability
• Benchmarks
• Capacity planning
Monitoring tools must endure the load
Would you start building your own scales if you operated a real zoo?

- What’s your mechanical engineering expertise?
- How long does it take to get tools and raw materials?
- Who feeds the animals while you are in the workshop?
- When do we need it, and could it be ready 'in time'?
Let’s take something off the shelf and build a custom interface
Picture labels: 'load balancers', 'Custom Interface'
What happens @scale?
• Many VMs & apps: each one generates ~5-130 metrics at short intervals
  • Aggregation, compromises on resolution, etc.
• Transactions: each creates N log entries
  • Limit recording, time-based indices + aliases
• High throughput: high rate of logs & metrics
  • Build a monitoring infrastructure (remember this)
Example: number of metrics per application

METRIC SOURCE          NUMBER OF METRICS TO COLLECT
OS (CPU, Mem, Disk)    21
Hadoop                 133
HBase                  68
Elasticsearch          62
Apache Storm           25
Total                  309

~3.1 million data points per week x N machines
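The weekly figure can be checked with a little arithmetic. The sketch below assumes one sample per metric per minute; that interval is not stated on the slide, but it is the one that reproduces the ~3.1 million figure. The function name `data_points_per_week` is made up for this illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Metric counts copied from the table on the slide.
METRICS_PER_SOURCE = {
    "OS (CPU, Mem, Disk)": 21,
    "Hadoop": 133,
    "HBase": 68,
    "Elasticsearch": 62,
    "Apache Storm": 25,
}


def data_points_per_week(metric_count: int, interval_seconds: int = 60, machines: int = 1) -> int:
    """Number of samples stored per week (the 60 s default interval is an assumption)."""
    samples_per_metric = SECONDS_PER_WEEK // interval_seconds
    return metric_count * samples_per_metric * machines


total_metrics = sum(METRICS_PER_SOURCE.values())
print(total_metrics)                        # 309
print(data_points_per_week(total_metrics))  # 3114720, i.e. ~3.1 million per machine
```

Multiplying by the number of machines (the `machines` parameter) gives the cluster-wide volume the monitoring system has to ingest.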
Metrics – Apache Kafka
25 metric categories
#monitoringsucks
• Find out and define metrics to collect
• Install and configure collectd, statsd, Graphite, …
• Build, install, and configure available agents
• Define reports or arrange all collected metrics into dashboards, e.g. Grafana, …
• These are the basics!
• Automate deployment of agents
#monitoringlove
• Integrate with the organization
  • Alerting workflows + multi-user + security
• Scale out:
  • Distributed event processing (e.g. Kafka)
  • Scalable data stores (e.g. Elasticsearch, HBase)
• Add intelligence:
  • Machine learning for metrics & events
  • Alerting & reporting based on it
Monitoring Architecture
Diagram components: agents for all monitored applications, receiver, aggregator, scalable storage, reporting, machine learning, alerting, forwarding, user management, visualisation, admin
What can we find in the wild?
Network Level
• Packets: loss, size, counts
• Latency, jitter, delays
• Bandwidth – total, per link, per service
• Firewalls / security breaches
• IDS, IPS – yet another malware detected
• On physical, transport, application layer, ...
Server Level
• Disk I/O
• CPU load
• Disk space
• Memory
• Logs / security / events / syslog
Standard Applications
• Web servers, databases, search engines, MQs
• Request rates, disk space, partitions, locks, connections, queue sizes, cache sizes
• Log files
Hadoop, Elasticsearch, Cassandra, Kafka, Storm, Spark, ...
Example: Elasticsearch
Link: Top Metrics
Own Application: Custom Metrics & Logs
• Logs & API for measurement
• Time measurements, KPIs, usage tracking, object counters, click streams
Application Traces
• Post-mortem analysis
  process.on('exit', heapdumpAndDie)
• DTrace
• Call traces, error stacks
• Heap dumps & flame graphs
Log files as source of metrics
• Simplest: log rate of an application
• Generate counts for operations
  • Search for related events and count them
  • E.g. count slow operations
• Extract values from logs
  • Apply regex or field search to extract numbers
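A minimal sketch of these three ideas (log rate, counting slow operations, extracting numbers with a regex). The log format, the `took=…ms` field, and the 500 ms threshold are invented for this illustration and are not taken from the talk.

```python
import re

# Hypothetical application log lines, made up for this sketch.
SAMPLE_LOG = """\
2015-06-01T10:00:01 INFO request path=/search took=35ms
2015-06-01T10:00:02 INFO request path=/index took=1250ms
2015-06-01T10:00:02 WARN cache miss key=abc
2015-06-01T10:00:03 INFO request path=/search took=810ms
"""

# Regex that extracts the numeric duration from a log line.
TOOK_PATTERN = re.compile(r"took=(\d+)ms")


def metrics_from_logs(lines, slow_threshold_ms=500):
    """Derive simple metrics from raw log lines."""
    line_count = 0
    durations = []
    for line in lines:
        line_count += 1
        match = TOOK_PATTERN.search(line)
        if match:
            durations.append(int(match.group(1)))
    return {
        "log_lines": line_count,        # simplest metric: log volume
        "operations": len(durations),   # lines that describe an operation
        "slow_operations": sum(1 for d in durations if d > slow_threshold_ms),
        "max_took_ms": max(durations, default=0),
    }


result = metrics_from_logs(SAMPLE_LOG.splitlines())
print(result)
```

With the sample input this reports 4 log lines, 3 operations, 2 slow operations, and a maximum duration of 1250 ms.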
Logs2Metrics
Logs -> Index -> Scheduled queries -> Custom metric -> Monitoring system
Scheduled queries aggregate all messages matching e.g. "session opened" every minute, e.g. on auth.log
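The counting step of such a scheduled query can be sketched in memory as follows. In the pipeline on the slide the logs sit in an index and the query runs on a schedule; here a short list of syslog-style lines stands in for auth.log. The sample lines and the function name `count_per_minute` are assumptions made for this illustration.

```python
from collections import Counter

# Made-up auth.log excerpt in classic syslog format ("Mmm dd HH:MM:SS host ...").
SAMPLE_AUTH_LOG = [
    "Jun  1 10:00:12 web1 sshd[811]: pam_unix(sshd:session): session opened for user alice by (uid=0)",
    "Jun  1 10:00:47 web1 sshd[815]: pam_unix(sshd:session): session opened for user bob by (uid=0)",
    "Jun  1 10:01:03 web1 sshd[811]: pam_unix(sshd:session): session closed for user alice",
    "Jun  1 10:01:30 web1 sshd[820]: pam_unix(sshd:session): session opened for user carol by (uid=0)",
]


def count_per_minute(lines, phrase="session opened"):
    """Count matching messages per minute; the minute is the first 12 characters."""
    counts = Counter()
    for line in lines:
        if phrase in line:
            counts[line[:12]] += 1  # e.g. "Jun  1 10:00"
    return dict(counts)


custom_metric = count_per_minute(SAMPLE_AUTH_LOG)
print(custom_metric)
```

Each (minute, count) pair is one data point of the custom metric that would be forwarded to the monitoring system.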
A Checklist for the introduction of monitoring solutions
Define your criteria
• Coverage of monitors/agents
• Quality of agents & setup
• Multi-user support
• Reporting capability & secure sharing
• Alerting capabilities
• Integrations / notifications / APIs
• Estimate required resources
Map your landscape
• Quantity of servers & applications to monitor
• What are the components of your app stack?
  • E.g. Linux on AWS, NGINX, Node.js, Redis, Elasticsearch
• Which programming languages are used?
• Can you find agents/monitors for all your 'apps'?
• List missing parts -> find alternatives or build a monitor
Customizing – custom metrics/plugins
• What metrics are relevant for each 'app'?
• What is covered by existing agents?
• How to aggregate each of these metrics?
  • min, max, sum, avg
• Pre-aggregation vs. query-time aggregation
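To make the aggregation options concrete, here is a small sketch of pre-aggregation: raw samples are rolled up into fixed time buckets before storage, keeping min, max, sum, avg, and count per bucket. The `(timestamp, value)` sample format and the 60-second bucket size are assumptions chosen for this example.

```python
from collections import defaultdict


def pre_aggregate(samples, bucket_seconds=60):
    """Roll up (unix_timestamp, value) samples into fixed-size time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Five raw samples spread over two one-minute buckets.
samples = [(0, 10.0), (20, 30.0), (40, 20.0), (60, 50.0), (90, 70.0)]
rollup = pre_aggregate(samples)
print(rollup[0])   # {'min': 10.0, 'max': 30.0, 'sum': 60.0, 'avg': 20.0, 'count': 3}
print(rollup[60])  # {'min': 50.0, 'max': 70.0, 'sum': 120.0, 'avg': 60.0, 'count': 2}
```

Pre-aggregation reduces storage and query cost but fixes the resolution up front; query-time aggregation keeps the raw samples and pays the cost on every query. Storing `sum` and `count` alongside `avg` lets buckets be merged later without averaging averages.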
Dashboards
• Graphs
  • Which metrics belong together?
  • Display options …
  • Query language
• Dashboards
  • What combination of graphs provides the best insight?
  • Can you share and re-use arranged dashboards for similar setups or situations?
  • Or do you need to configure them again for other servers?
  • Is sharing secured? Or is it just a link to your UI?
Alerts
• Threshold-based alerts
• Status changes
• Heartbeat alerts
• Anomaly detection
• Challenges: number of alert rules and queries & tuning the 'noise level'
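The alert types above can be illustrated with three small checks. This is only a sketch: the function names, the 120-second heartbeat timeout, and the z-score rule used for anomaly detection are assumptions chosen for the example; the talk does not say which anomaly detection method is used.

```python
from statistics import mean, pstdev


def threshold_alert(value, limit):
    """Threshold-based alert: fire when the value exceeds a fixed limit."""
    return value > limit


def heartbeat_alert(last_seen, now, timeout_seconds=120):
    """Heartbeat alert: fire when no data has arrived within the timeout."""
    return now - last_seen > timeout_seconds


def anomaly_alert(history, value, z_limit=3.0):
    """Simple anomaly check: fire when the value is more than z_limit
    standard deviations away from the mean of recent history."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = pstdev(history)
    if sigma == 0:
        return value != history[0]
    return abs(value - mean(history)) / sigma > z_limit


cpu_history = [10, 11, 9, 10, 10, 11, 9, 10]
print(threshold_alert(95, limit=90))                 # True
print(heartbeat_alert(last_seen=1000, now=1200))     # True
print(anomaly_alert(cpu_history, 25))                # True
print(anomaly_alert(cpu_history, 10.5))              # False
```

The `limit`, `timeout_seconds`, and `z_limit` parameters are the knobs for tuning the noise level: loosening them reduces false alarms at the price of later detection.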
Alert notifications
Anomaly detection and alerting
Integrate metrics & logs
• Metrics show that "something happens"
• Logs provide evidence of "what happened"
• Faster insights by reporting them together
• Correlate logs and metrics
• Metrics can be created from logs
Correlate Logs & Metrics
A brief overview of Centralizing Logs
Centralizing Logs with ELK
Pipeline stages: raw logs (files, syslog), log shipper and parser (Logstash), storage (Elasticsearch), visualization (Kibana)
Where is the work? Format adaptation & transport, tuning, maintenance, queries, security
• Input: unstructured log lines
• Filter & parser: Grok / RegEx
• Output: structured JSON
• Forwarder:
  • Elasticsearch, …
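The input, filter, and output steps can be sketched with a plain regular expression standing in for a Grok pattern. The access-log line and the field names below are assumptions for illustration; a real Grok pattern library covers far more formats.

```python
import json
import re

# Regex with named groups, playing the role of a Grok pattern for a
# common-log-format access line (field names are chosen for this sketch).
LINE_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one unstructured log line into a structured dict, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


raw = '203.0.113.7 - - [01/Jun/2015:10:00:01 +0000] "GET /search?q=kafka HTTP/1.1" 200 5120'
event = parse_line(raw)
print(json.dumps(event, sort_keys=True))  # structured JSON, ready for a forwarder
```

Lines that do not match return `None`, so the caller can decide whether to drop them or ship them unparsed.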
Setup Elasticsearch
• Schema: define the right mapping
• Insert rate:
  • Use bulk indexing
  • Increase the refresh interval for a higher insert rate
• Volume:
  • Aliases and time-based indices
• Memory usage: configure caching limits
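A hedged sketch of how bulk indexing and time-based indices fit together: events are grouped into one newline-delimited request body in the shape the Elasticsearch bulk API expects (an action line followed by the document), and each event is routed to a daily index derived from its timestamp. The `logs-YYYY.MM.DD` naming, the `@timestamp` field, and the helper names are assumptions for this example; the exact action fields depend on the Elasticsearch version (older versions also expect a `_type`), and sending the body over HTTP is left out.

```python
import json
from datetime import datetime


def index_name(prefix, timestamp):
    """Daily index name such as 'logs-2015.06.01' (naming scheme is an assumption)."""
    return f"{prefix}-{timestamp:%Y.%m.%d}"


def bulk_body(events, prefix="logs"):
    """Build a newline-delimited bulk request body: one action line plus one
    document line per event, terminated by a final newline."""
    lines = []
    for event in events:
        timestamp = datetime.fromisoformat(event["@timestamp"])
        lines.append(json.dumps({"index": {"_index": index_name(prefix, timestamp)}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


events = [
    {"@timestamp": "2015-06-01T23:59:58+00:00", "message": "session opened"},
    {"@timestamp": "2015-06-02T00:00:03+00:00", "message": "session closed"},
]
body = bulk_body(events)
print(body)
```

With daily indices, old data can be dropped by deleting whole indices, and an alias can point queries at the set of indices that is currently relevant.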
Kibana
• How to secure it?
  • Proxies, security plugins, hosted solutions
• Queries and dashboard creation
  • Generators/templates for specific setups
  • Learn the Lucene query language
Thank you for your attention!
http://blog.sematext.com
