Simple Practices in
Performance Monitoring and Evaluation
Schubert Zhang
2016.3.24
SLA
Service Level Agreements
https://en.wikipedia.org/wiki/Service-level_agreement
SLAs commonly include segments to address: a definition of services, performance measurement, problem management, customer duties, warranties, disaster recovery, and termination of agreement.
• API
• IM SLA
• Performance
• Performance
Performance-oriented SLA
Metrics
SLA Performance SLA
Performance Metrics
e.g. 1: API
• (99%)
e.g. 2: Call Center
• Abandonment Rate: Percentage of calls abandoned while waiting to be answered.
• ASA (Average Speed to Answer): Average time it takes for a call to be answered
by the service desk.
• TSF (Time Service Factor): Percentage of calls answered within a definite
timeframe, e.g., 80% in 20 seconds.
• FCR (First-Call Resolution): Percentage of incoming calls that can be resolved
without the use of a callback or without having the caller call back the helpdesk to
finish resolving the case.
• TAT (Turn-Around Time): Time taken to complete a certain task.
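The call-center metrics above can be computed directly from call records. The following is a minimal sketch, not taken from the slides: the `Call` record and its field names are hypothetical, the sample data is made up, and TSF is computed here against all incoming calls (some definitions use answered calls only).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """Hypothetical call record; field names are assumptions for illustration."""
    wait_s: float              # seconds spent waiting in the queue
    answered: bool             # False if the caller hung up before an answer
    resolved_first_call: bool  # True if no callback was needed


def call_center_metrics(calls: List[Call], tsf_threshold_s: float = 20.0) -> dict:
    """Compute the metrics for a non-empty list with at least one answered call."""
    total = len(calls)
    answered = [c for c in calls if c.answered]
    return {
        # Abandonment Rate: share of calls abandoned while waiting
        "abandonment_rate": (total - len(answered)) / total,
        # ASA: average wait of the calls that were answered
        "asa_s": sum(c.wait_s for c in answered) / len(answered),
        # TSF: share of all calls answered within the threshold (assumed denominator)
        "tsf": sum(1 for c in answered if c.wait_s <= tsf_threshold_s) / total,
        # FCR: share of incoming calls resolved without a callback
        "fcr": sum(1 for c in calls if c.resolved_first_call) / total,
    }


# Made-up sample: three answered calls and one abandoned call
calls = [
    Call(5.0, True, True),
    Call(15.0, True, True),
    Call(30.0, True, False),
    Call(40.0, False, False),
]
metrics = call_center_metrics(calls)
```

With this sample, one call in four is abandoned and two of four are answered within 20 seconds, so the abandonment rate is 25% and the TSF is 50%.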
Metrics
Performance Metrics
Benchmarking
The quality of a service must be measured, evaluated, … and benchmarked,
and we must have a set of approaches for benchmarking.
Metrics to be monitored
Throughput
QPS, TPS, CPS
per second, per minute, per hour …
Concurrency
Latency
Response Time, Round-Trip Time (RTT) …
Average, Median, Min., Max., Percentile …
Quantile / Percentile
Refer to the Google Sawzall paper
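The latency statistics listed above can be sketched in a few lines. This example uses the nearest-rank definition of a percentile, which is one of several common definitions and not necessarily the method described in the Sawzall paper. The function name `percentile` and the sample latencies are made up for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 100 made-up latencies in ms: 95 fast requests and a long tail of 5 slow ones
latencies_ms = [10.0] * 95 + [1000.0] * 5
summary = {
    "average": sum(latencies_ms) / len(latencies_ms),
    "median": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
    "min": min(latencies_ms),
    "max": max(latencies_ms),
}
```

For this sample the average is 59.5 ms, a value that no request actually experienced: the median and the 95th percentile are 10 ms while the 99th percentile is 1000 ms. This is why a latency report usually gives quantiles next to the average.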
A Summary of these Concepts
[Diagram: clients (Client-1, Client-2, Client-3, … Client-N) send requests to a server with a pool of work threads; labels: Throughput, Latency, Concurrency]
A Real-World Example
Example-1
Paper: Amazon Dynamo
[Figure: average latency and 99.9% quantile latency]
Example-2
Evaluation Report of a NoSQL DB
Cassandra
Benchmark for Write API
[Figure: Benchmark for Writes, cluster overview, with Throughput and Latency charts]
• Each node runs 6 clients (threads), 54 clients in total.
• Each client generates random CDRs for 50 million users/phone numbers and puts them into DaStor one by one.
– Key space: 50 million
– Size of a CDR: Thrift-compacted encoding, ~200 bytes
✓ Throughput: average ~80K ops/s; per node: average ~9K ops/s
✓ Latency: average ~0.5 ms
• Bottleneck: network (and memory)
Benchmark for Read API
• Each node runs 8 clients (threads), 72 clients in total.
• Each client randomly picks a user ID/phone number out of the 50-million key space and gets its most recent 20 CDRs (one page) from DaStor.
• All clients read CDRs of the same day/bucket.
[Figure: histogram of read latency; x-axis: latency in units of 100 ms (1 to 61); y-axis: percentage of read ops (0% to 25%); the average and the 97% quantile are marked]
✓ Throughput: average ~140 ops/s; per node: average ~16 ops/s
✓ Latency: average ~500 ms, 97% < 2 s (SLA)
• Bottleneck: disk I/O (random seek); CPU load is very low
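The reported benchmark numbers can be cross-checked with simple arithmetic. This sketch rests on assumptions the slides do not state outright: the node count (9) is derived from the client counts (54 / 6 and 72 / 8), and each client is assumed to keep exactly one request in flight, so that throughput is roughly concurrency divided by average latency (Little's law). The function names are made up for illustration.

```python
def per_node(total_ops_per_s: float, total_clients: int, clients_per_node: int) -> float:
    """Average per-node throughput, with the node count derived from client counts."""
    nodes = total_clients // clients_per_node
    return total_ops_per_s / nodes


def closed_loop_throughput(clients: int, avg_latency_s: float) -> float:
    """Little's law for clients that each keep one request in flight (assumption)."""
    return clients / avg_latency_s


# Write benchmark: ~80K ops/s, 54 clients, 6 per node, ~0.5 ms average latency
write_per_node = per_node(80_000, 54, 6)
write_estimate = closed_loop_throughput(54, 0.0005)

# Read benchmark: ~140 ops/s, 72 clients, 8 per node, ~500 ms average latency
read_per_node = per_node(140, 72, 8)
read_estimate = closed_loop_throughput(72, 0.5)
```

The per-node values come out near 8.9K and 15.6 ops/s, matching the rounded ~9K and ~16 on the slides. The read estimate of 144 ops/s is close to the measured ~140 ops/s. The write estimate of about 108K ops/s is above the measured ~80K, which would be consistent with clients spending some time outside the call itself, although the slides do not say so.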
Total & Delta
Total:
Delta:
Generate the metrics and monitor them
• On the server side
• Record an operation count and the time cost for every client call
• At every monitoring interval, pull the current Throughput and Latency and push them to the monitoring tool (Ganglia/Zabbix) or console.
• Throughput = sum of count / time interval
• Latency = average (sum of latency / sum of count), max, min, quantile …
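The server-side bookkeeping described above can be sketched as a small recorder. This is an illustrative sketch, not the code referenced in the slides: the class name `OpMetrics` and its methods are hypothetical, and pushing the values to Ganglia or Zabbix is left out. It keeps a running total and also reports per-interval (delta) values.

```python
import threading
from typing import Dict, List


class OpMetrics:
    """Counts operations and their time cost; reports per-interval (delta) values."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latencies: List[float] = []  # time costs recorded in the current interval
        self.total_count = 0               # running total since start

    def record(self, latency_s: float) -> None:
        """Call once per client call with its measured time cost in seconds."""
        with self._lock:
            self._latencies.append(latency_s)
            self.total_count += 1

    def snapshot(self, interval_s: float) -> Dict[str, float]:
        """Return metrics for the interval just ended and reset the delta state."""
        with self._lock:
            samples, self._latencies = self._latencies, []
            total = self.total_count
        count = len(samples)
        if count == 0:
            return {"total": total, "count": 0, "throughput": 0.0}
        return {
            "total": total,
            "count": count,
            "throughput": count / interval_s,      # sum of count / time interval
            "latency_avg": sum(samples) / count,   # sum of latency / sum of count
            "latency_min": min(samples),
            "latency_max": max(samples),
        }


metrics = OpMetrics()
for latency in (0.010, 0.020, 0.030, 0.100):
    metrics.record(latency)
first = metrics.snapshot(interval_s=2.0)   # 4 ops in a 2 s interval
second = metrics.snapshot(interval_s=2.0)  # nothing recorded since the last snapshot
```

In a real server a timer would call `snapshot` once per monitoring interval. Keeping every sample makes quantiles easy to add but uses memory in proportion to throughput; a fixed set of histogram buckets is a common alternative.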
Code in Gitlab and Gerrit
Code for Spring Project
• Java
• JMX (Java Management Extensions, a simple example at https://github.com/schubertzhang/jsketch)
• javaagent (java -javaagent:<jar path>[=<options passed to premain>])
• jmxetric (uses JMX and javaagent to report metrics to Ganglia, https://github.com/schubertzhang/jmxetric)
• Ganglia
• Zabbix
• …
Ganglia Zabbix etc.
Performance Benchmark Programming
Demo
Test and evaluate the Throughput and Latency of http://www.fangdd.com
Demo Time …
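A benchmark client of the kind demonstrated here can be sketched as follows. This is not the demo program from the talk: `run_benchmark` and `fake_request` are hypothetical names, and the request is simulated with a short sleep so the example runs without network access. Each client thread issues its requests one at a time and records the time cost of each.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_benchmark(op: Callable[[], None], clients: int, requests_per_client: int) -> Dict[str, float]:
    """Run `op` from `clients` threads, each issuing requests one at a time."""

    def client_loop() -> List[float]:
        latencies = []
        for _ in range(requests_per_client):
            start = time.perf_counter()
            op()
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(client_loop) for _ in range(clients)]
        latencies = sorted(t for f in futures for t in f.result())
    elapsed = time.perf_counter() - started

    p95_index = math.ceil(95 * len(latencies) / 100) - 1  # nearest-rank index
    return {
        "requests": len(latencies),
        "throughput": len(latencies) / elapsed,
        "latency_avg": sum(latencies) / len(latencies),
        "latency_p95": latencies[p95_index],
        "latency_max": latencies[-1],
    }


def fake_request() -> None:
    """Stand-in for a real call such as an HTTP GET; sleeps about 5 ms."""
    time.sleep(0.005)


result = run_benchmark(fake_request, clients=4, requests_per_client=20)
```

To measure a real service, `fake_request` would be replaced by a function that performs one request, for example with `urllib.request.urlopen`. The number of client threads sets the concurrency, so rerunning with different values of `clients` shows how throughput and latency change with load.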
demo screenshots
Average 95%
The long tail …
Statistical Monitoring for Outliers
Usually for troubleshooting
Captured from UTStarcom mSwitch R5 system, Guangxi Site, 2004.
The magic matrix:
• Redis, Memcache
• Just add at a point, very low-cost
• Very
• Logs, ELK
Heavy Logs & ELK
It’s another topic!
Thank You!
