Performance In Geode:
How Fast Is It, How Is It Measured, and How Can It Be Improved?
Helena Bales, Senior Software Engineer at Pivotal
What Is The Performance Of Geode?
Performance of Geode 1.9.0
[Chart values: 203,855; 244,463; 181,655; 207,697]
What do those numbers mean?
● 200,000 operations per second means nothing to a person.
○ Is that good?
○ Is the performance consistent and accurate?
○ Has it improved or regressed since the last version?
○ Can it be better?
What do those numbers mean?
● 200,000 operations per second means nothing to a person.
○ Is that good? Pretty good, yes.
○ Is the performance consistent and accurate? Not yet.
○ Has it improved since the last version? Yes, slightly.
○ Can it be better? YES.
How do you know???
How Is Performance Measured?
Creating the Geode Benchmark - Features
● On demand
● Against any revision of Geode
● On an AWS cluster deployment of Geode
● On any dev machine in the office
● From a Concourse CI pipeline
● With a profiler attached
● Compare two runs of benchmarks for performance changes
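As a rough illustration of what an operations-per-second figure is, the sketch below times a fixed number of `get` calls against an in-memory map after a warm-up pass. It is not how geode-benchmarks works internally: the `ThroughputSketch` class, the `ConcurrentHashMap` stand-in for a region, and the operation counts are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-threaded throughput measurement; a stand-in, not the Geode benchmark harness. */
class ThroughputSketch {

  /** Runs {@code ops} gets against the map and returns operations per second. */
  static double measureGets(Map<Long, Long> region, long keyCount, long ops) {
    long sink = 0; // accumulate results so the loop has an observable effect
    long start = System.nanoTime();
    for (long i = 0; i < ops; i++) {
      sink += region.get(i % keyCount);
    }
    long elapsedNanos = Math.max(1, System.nanoTime() - start);
    if (sink < 0) {
      // Never true for the non-negative values used here; keeps `sink` live.
      System.out.println("unexpected checksum: " + sink);
    }
    return ops * 1_000_000_000.0 / elapsedNanos;
  }

  public static void main(String[] args) {
    long keyCount = 10_000;
    Map<Long, Long> region = new ConcurrentHashMap<>();
    for (long k = 0; k < keyCount; k++) {
      region.put(k, k);
    }
    measureGets(region, keyCount, 200_000); // warm-up pass, result discarded
    double opsPerSec = measureGets(region, keyCount, 1_000_000);
    System.out.printf("%.0f operations per second%n", opsPerSec);
  }
}
```

A single run like this is noisy, which is one reason the ability to compare two runs matters.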
Creating the Geode Benchmark - Goals
● Run by anyone interested in Geode
● Have others create benchmarks
● Visualize benchmark results over time
● Increase benchmark coverage of Geode
Tests Currently in the Benchmarks
○ ReplicatedGetBenchmark
○ ReplicatedGetLongBenchmark
○ ReplicatedPutBenchmark
○ ReplicatedPutLongBenchmark
○ ReplicatedPutAllBenchmark
○ ReplicatedPutAllLongBenchmark
○ ReplicatedFunctionExecutionBenchmark
○ ReplicatedFunctionExecutionWithArgumentsBenchmark
○ ReplicatedFunctionExecutionWithFiltersBenchmark
○ PartitionedGetBenchmark
○ PartitionedGetLongBenchmark
○ PartitionedPutBenchmark
○ PartitionedPutLongBenchmark
○ PartitionedPutAllBenchmark
○ PartitionedPutAllLongBenchmark
○ PartitionedIndexedQueryBenchmark
○ PartitionedFunctionExecutionWithArgumentsBenchmark
○ PartitionedFunctionExecutionWithFiltersBenchmark
Other Tested Configurations
● With SSL
● With JDKs: 8, 11, 12, 13
● With Security Manager
● With Garbage Collectors:
○ CMS
○ G1
○ ZGC
○ Shenandoah
● Adjustable max heap size
How Can Performance Be Improved?
Finding Performance Bottlenecks
● Monitor locks
● Thread park/unpark from reentrant locks
● Allocations/GC
● Overuse of synchronization
● Getting a system property in a hot path
● Lazy initialization of objects in a hot path
● Synchronization on a container (e.g., a hash map)
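Two of the patterns above can be sketched side by side. This is a minimal illustration, not Geode code: the `HotPathSketch` class, its method names, and the `example.verbose` property key are hypothetical. The slow variant reads a system property and takes a monitor on a shared `HashMap` on every call. The fast variant reads the property once and relies on a `ConcurrentHashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical hot-path examples; the class, methods, and property key are made up for illustration. */
class HotPathSketch {
  // Read once at class initialization instead of on every operation.
  private static final boolean VERBOSE = Boolean.getBoolean("example.verbose");

  private final Map<String, Integer> lockedCounts = new HashMap<>();
  private final Map<String, Integer> concurrentCounts = new ConcurrentHashMap<>();

  /** Slow pattern: a property lookup plus a monitor on the container for every call. */
  int recordSlow(String key) {
    boolean verbose = Boolean.getBoolean("example.verbose"); // looked up per call
    synchronized (lockedCounts) { // all threads serialize here
      int updated = lockedCounts.merge(key, 1, Integer::sum);
      if (verbose) {
        System.out.println(key + "=" + updated);
      }
      return updated;
    }
  }

  /** Faster pattern: cached flag, and a concurrent map that manages its own thread safety. */
  int recordFast(String key) {
    int updated = concurrentCounts.merge(key, 1, Integer::sum);
    if (VERBOSE) {
      System.out.println(key + "=" + updated);
    }
    return updated;
  }

  public static void main(String[] args) {
    HotPathSketch sketch = new HotPathSketch();
    sketch.recordSlow("get");
    sketch.recordFast("get");
    System.out.println("fast count: " + sketch.recordFast("get"));
  }
}
```

The trade-off is that a cached flag no longer reacts to property changes made after class initialization.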
Case Study – The Connection Pool
● Why were we even looking for anything?
○ Couldn’t saturate network, CPU, or memory, no matter how many resources were available
○ The profiler showed no suspicious hot spots
● How did we find the issue?
○ Found the secret profiler option to measure zero-time reentrant locks
○ Thread.park() became a hot spot, with reentrant lock and connection pool as callers
○ The connection pool was holding a reentrant lock in a hot path while using a deque.
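The shape of the problem can be sketched as follows. The original code slides are not part of this transcript, so this is an assumed reconstruction of the general pattern and not Geode's actual connection pool: `LockedPoolSketch` and its methods are hypothetical names. Every borrow and return takes the same `ReentrantLock`, so under contention the waiting threads are parked (via `LockSupport.park`) and unparked instead of doing work.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical pool that guards a deque with a ReentrantLock; not Geode's actual code. */
class LockedPoolSketch<C> {
  private final ReentrantLock lock = new ReentrantLock();
  private final Deque<C> idle = new ArrayDeque<>();

  /** Every borrow takes the lock; contended callers park until it is released. */
  C borrow() {
    lock.lock();
    try {
      return idle.pollFirst(); // null when no idle connection exists
    } finally {
      lock.unlock();
    }
  }

  /** Returning a connection takes the same lock, so borrowers and returners contend. */
  void giveBack(C connection) {
    lock.lock();
    try {
      idle.addFirst(connection);
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    LockedPoolSketch<String> pool = new LockedPoolSketch<>();
    pool.giveBack("connection-1");
    System.out.println("borrowed: " + pool.borrow());
  }
}
```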
Case Study – Finding the Problem
Case Study – Solving the Problem
● No lock!
● Lock-free structure
● No locks!
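A lock-free version of such a pool might look like the sketch below. The slides say only "lock free structure", so the choice of `ConcurrentLinkedDeque` is an assumption, as are the `LockFreePoolSketch` class and its method names. Borrowing and returning use the deque's non-blocking operations, so no thread parks while waiting for another.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

/** Hypothetical lock-free pool; ConcurrentLinkedDeque is an assumed choice, not confirmed by the slides. */
class LockFreePoolSketch<C> {
  private final ConcurrentLinkedDeque<C> idle = new ConcurrentLinkedDeque<>();

  /** Non-blocking: returns an idle connection, or null if none is available. */
  C borrow() {
    return idle.pollFirst();
  }

  /** Non-blocking: makes the connection available to other threads again. */
  void giveBack(C connection) {
    idle.addFirst(connection);
  }

  /** Counts idle connections by traversal; fine for a demo, not for a hot path. */
  int idleCount() {
    return idle.size();
  }

  public static void main(String[] args) throws InterruptedException {
    LockFreePoolSketch<String> pool = new LockFreePoolSketch<>();
    for (int i = 0; i < 4; i++) {
      pool.giveBack("connection-" + i);
    }
    Runnable worker = () -> {
      for (int i = 0; i < 100_000; i++) {
        String connection = pool.borrow();
        if (connection != null) {
          pool.giveBack(connection);
        }
      }
    };
    Thread[] threads = new Thread[8];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new Thread(worker);
      threads[t].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.println("idle connections after run: " + pool.idleCount());
  }
}
```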
Case Study - Profiling
Case Study – Testing
● Unit Testing
● Integration Testing
● Distributed Testing
● Concurrency Testing
● Performance Testing
Case Study - Performance Testing
● Before: 197,686
● After: 659,980
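The size of that change can be worked out from the two figures. The sketch below assumes both numbers are throughput measured the same way, although the slide states no unit. `SpeedupSketch` and its methods are hypothetical helper names. The result is a little over three times the original value.

```java
/** Hypothetical helper for comparing a before and after benchmark figure. */
class SpeedupSketch {

  /** Ratio of the new figure to the old one. */
  static double speedup(double before, double after) {
    return after / before;
  }

  /** Relative change as a percentage of the old figure. */
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    double before = 197_686; // "before" figure from the slide
    double after = 659_980;  // "after" figure from the slide
    System.out.printf("speedup: %.2fx%n", speedup(before, after));
    System.out.printf("change: %.1f%%%n", percentChange(before, after));
  }
}
```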
Other Bottlenecks – Over Eager Allocations
● 2 potentially unused objects per call: new HashSet() => 1 HashSet & 1 HashMap
Other Bottlenecks – Over Eager Allocations (fixed)
● Do not allocate eagerly
● Allocate near first use
● Allocate after early returns that don’t use the allocated object
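The guidance above can be sketched with a small example. The original code slide is not in this transcript, so `AllocationSketch` and its methods are hypothetical. The eager version creates a `HashSet` (and the `HashMap` inside it) before an early return that never uses it. The lazy version allocates only after the early return.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical example of eager versus deferred allocation; not taken from Geode. */
class AllocationSketch {

  /** Eager: the HashSet and its backing HashMap are created even when the early return fires. */
  static Set<String> nonEmptyEager(Collection<String> input) {
    Set<String> result = new HashSet<>(); // wasted if we return on the next line
    if (input == null || input.isEmpty()) {
      return Collections.emptySet();
    }
    for (String s : input) {
      if (!s.isEmpty()) {
        result.add(s);
      }
    }
    return result;
  }

  /** Lazy: allocation happens after the early return, near first use. */
  static Set<String> nonEmptyLazy(Collection<String> input) {
    if (input == null || input.isEmpty()) {
      return Collections.emptySet(); // shared immutable instance, no allocation
    }
    Set<String> result = new HashSet<>();
    for (String s : input) {
      if (!s.isEmpty()) {
        result.add(s);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(nonEmptyLazy(Arrays.asList("a", "", "b")));
    System.out.println(nonEmptyLazy(Collections.<String>emptyList()));
  }
}
```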
Other Bottlenecks – Know Your Structures
● Methods are called for every operation and result in 1 add and 1 remove per op
Other Bottlenecks – Know Your Structures (fixed)
● Methods are still called for every operation but do not allocate or cause GC
How much has performance improved?
Comparing Performance of 1.9.0 & 1.10.0
[Chart values]
● 1.9.0: 203,855; 244,463; 181,655; 207,697
● 1.10.0: 692,725; 736,022; 357,507; 372,430
Comparing Performance of 1.9.0 & 1.10.0
[Chart values]
● 1.9.0: 1,764,765; 1,980,391; 1,471,434; 1,731,946
● 1.10.0: 518,534; 488,051; 1,005,730; 965,404
Why Upgrade to Geode 1.10.0?
Comparing Performance of 1.9.0 & 1.10.0
[Chart: PartitionedGetBenchmark, v. 1.10.0 compared with v. 1.9.0]
Relevant Links
● Geode repo: https://github.com/apache/geode
● Benchmark repo: https://github.com/apache/geode-benchmarks
● JIRA query for Performance Issues: https://issues.apache.org/jira/browse/GEODE-7134?jql=project%20%3D%20GEODE%20AND%20labels%20%3D%20performance
Thank You
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 

Performance in Geode: How Fast Is It, How Is It Measured, and How Can It Be Improved?

  • 1. Performance In Geode: How Fast Is It, How Is It Measured, and How Can It Be Improved? Helena Bales, Senior Software Engineer at Pivotal
  • 2. What Is The Performance Of Geode?
  • 3. Performance of Geode 1.9.0 (chart values, throughput in operations per second): 203,855; 244,463; 181,655; 207,697
  • 4. What do those numbers mean? ● 200,000 operations per second means nothing to a person. ○ Is that good? ○ Is the performance consistent and accurate? ○ Has it improved or regressed since the last version? ○ Can it be better?
  • 5. What do those numbers mean? ● 200,000 operations per second means nothing to a person. ○ Is that good? Pretty good, yes. ○ Is the performance consistent and accurate? Not yet. ○ Has it improved since the last version? Yes, slightly. ○ Can it be better? YES.
  • 6. What do those numbers mean? ● 200,000 operations per second means nothing to a person. ○ Is that good? Pretty good, yes. ○ Is the performance consistent and accurate? Not yet. ○ Has it improved since the last version? Yes, slightly. ○ Can it be better? YES. How do you know???
  • 7. How Is Performance Measured?
  • 8. Creating the Geode Benchmark – Features ● On demand ● Against any revision of Geode ● On AWS cluster deployment of Geode ● On any dev machine in the office ● From Concourse CI pipeline ● With a profiler attached ● Compare two runs of benchmarks for performance changes
  • 9. Creating the Geode Benchmark – Goals ● Run by anyone interested in Geode ● Have others create benchmarks ● Visualize benchmark results over time ● Increase benchmark coverage of Geode
  • 10. Tests Currently in the Benchmarks ○ ReplicatedGetBenchmark ○ ReplicatedGetLongBenchmark ○ ReplicatedPutBenchmark ○ ReplicatedPutLongBenchmark ○ ReplicatedPutAllBenchmark ○ ReplicatedPutAllLongBenchmark ○ ReplicatedFunctionExecutionBenchmark ○ ReplicatedFunctionExecutionWithArgumentsBenchmark ○ ReplicatedFunctionExecutionWithFiltersBenchmark ○ PartitionedGetBenchmark ○ PartitionedGetLongBenchmark ○ PartitionedPutBenchmark ○ PartitionedPutLongBenchmark ○ PartitionedPutAllBenchmark ○ PartitionedPutAllLongBenchmark ○ PartitionedIndexedQueryBenchmark ○ PartitionedFunctionExecutionWithArgumentsBenchmark ○ PartitionedFunctionExecutionWithFiltersBenchmark
  • 11. Other Tested Configurations ● With SSL ● With JDKs: 8, 11, 12, 13 ● With Security Manager ● With Garbage Collectors: ○ CMS ○ G1 ○ Z ○ Shenandoah ● Adjustable max heap size
  • 12. How Can Performance Be Improved?
  • 13. Finding Performance Bottlenecks ● Monitor locks ● Thread Park/Unpark Reentrant Locks ● Allocations/GC ● Overuse of synchronization ● Getting a system property in a hot path ● Lazy initialization of objects in a hot path ● Synchronization on a container (ex. hash map)
  • 14. Case Study – The Connection Pool ● Why were we even looking for anything? ○ Couldn’t saturate network, CPU, memory; no matter the available resources ○ Profiler gave us no suspect hot spots ● How did we find the issue? ○ Found the secret profiler option to measure zero-time reentrant locks ○ Thread.park() became a hot spot, with reentrant lock and connection pool as callers ○ The connection pool was holding a reentrant lock in a hot path while using a deque.
  • 15. Case Study – Finding the Problem
  • 16. Case Study – Finding the Problem
  • 17. Case Study – Finding the Problem
  • 18. Case Study – Finding the Problem
  • 19. (no text on slide)
  • 20. Case Study – Solving the Problem: no lock!
  • 21. Case Study – Solving the Problem: lock free structure
  • 22. Case Study – Solving the Problem: no locks!
  • 23. Case Study – Solving the Problem
  • 24. Case Study – Profiling
  • 25. Case Study – Testing ● Unit testing ● Integration Testing ● Distributed Testing ● Concurrency Testing ● Performance Testing
  • 26. Case Study – Performance Testing: 197,686 before; 659,980 after
  • 27. Case Study – Performance Testing
  • 28. Other Bottlenecks – Over Eager Allocations: 2 potentially unused objects per call – new HashSet() => 1 HashSet & 1 HashMap
  • 29. Other Bottlenecks – Over Eager Allocations (fixed) ● Do not allocate eagerly ● Allocate near first use ● Allocate after early returns that don’t use the allocated object
  • 30. Other Bottlenecks – Know Your Structures: Methods called for every operation and results in 1 add and 1 remove per op
  • 31. Other Bottlenecks – Know Your Structures (fixed): Methods still called for every operation but does not allocate/gc
  • 32. How much has performance improved?
  • 33. Comparing Performance of 1.9.0 & 1.10.0 (chart values): 203,855 (1.9.0); 244,463 (1.9.0); 181,655 (1.9.0); 207,697 (1.9.0); 692,725 (1.10.0); 736,022 (1.10.0); 357,507 (1.10.0); 372,430 (1.10.0)
  • 34. Comparing Performance of 1.9.0 & 1.10.0 (chart values): 1,764,765 (1.9.0); 518,534 (1.10.0); 488,051 (1.10.0); 1,005,730 (1.10.0); 965,404 (1.10.0); 1,980,391 (1.9.0); 1,471,434 (1.9.0); 1,731,946 (1.9.0)
  • 35. Why Upgrade to Geode 1.10.0?
  • 36. Comparing Performance of 1.9.0 & 1.10.0 (chart legend: v. 1.10.0, v. 1.9.0; PartitionedGetBenchmark)
  • 37. Relevant Links ● Geode repo: https://github.com/apache/geode ● Benchmark repo: https://github.com/apache/geode-benchmarks ● JIRA query for Performance Issues: https://issues.apache.org/jira/browse/GEODE-7134?jql=project%20%3D%20GEODE%20AND%20labels%20%3D%20performance
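
Slide 13 lists "getting a system property in a hot path" as a bottleneck. The slides do not show the offending code, so the following is only a minimal sketch of the general idea: look the property up once and let the hot path read a constant. The class name, the property name `example.pool.retryCount`, and the default value are all hypothetical and are not taken from Geode.

```java
public class CachedProperty {
    // Hypothetical property name and default, used only for this illustration.
    static final String RETRY_PROPERTY = "example.pool.retryCount";
    static final int DEFAULT_RETRIES = 3;

    // Looked up and parsed once, when the class is initialized.
    private static final int CACHED_RETRIES =
            Integer.getInteger(RETRY_PROPERTY, DEFAULT_RETRIES);

    // Slow variant: a property lookup and an integer parse on every call.
    static int retriesUncached() {
        return Integer.getInteger(RETRY_PROPERTY, DEFAULT_RETRIES);
    }

    // Fast variant: the hot path only reads the cached constant.
    static int retriesCached() {
        return CACHED_RETRIES;
    }
}
```

The trade-off is that the cached value no longer follows changes made to the property after class initialization, which is usually acceptable for settings that are fixed at startup.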
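
Slides 28 and 29 describe over-eager allocation and the fix of allocating after early returns. The original code is only visible as a screenshot in the deck, so the sketch below uses a hypothetical `pickServer` scenario (the method names, the `isUp` predicate, and the server strings are assumptions) to contrast the two placements of the same `HashSet` allocation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class LazyAllocation {
    // Eager: the set (and the HashMap that backs it) is allocated even when
    // the early return below makes it unnecessary.
    static String pickServerEager(String preferred, List<String> servers,
                                  Predicate<String> isUp) {
        Set<String> attempted = new HashSet<>();
        if (isUp.test(preferred)) {
            return preferred; // 'attempted' becomes garbage without being used
        }
        attempted.add(preferred);
        for (String server : servers) {
            if (attempted.add(server) && isUp.test(server)) {
                return server;
            }
        }
        return null;
    }

    // Fixed: identical logic, but the allocation happens after the early
    // return, right before the first use.
    static String pickServerLazy(String preferred, List<String> servers,
                                 Predicate<String> isUp) {
        if (isUp.test(preferred)) {
            return preferred; // fast path allocates nothing
        }
        Set<String> attempted = new HashSet<>();
        attempted.add(preferred);
        for (String server : servers) {
            if (attempted.add(server) && isUp.test(server)) {
                return server;
            }
        }
        return null;
    }
}
```

Both methods return the same result for the same inputs; only the amount of garbage produced on the fast path differs.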

Editor's Notes

  1. Hi, my name is Helena Bales, and my pronouns are they/them. I am a Senior Software Engineer at Pivotal, working on GemFire, and have been a Geode committer for about a year and a half. Today I want to talk to you about Geode’s performance. Specifically, what is the performance, how it is measured, and how it can be improved.
  2. So let’s start with the most basic of those three questions: what is the performance of Geode?
  3. So this is the performance of Geode. On the vertical axis is the throughput in operations per second, and the horizontal axis has four different benchmark tests. So we can see that PartitionedGetBenchmark had an average throughput of about 200,000 operations per second. But what does that mean? AWS machine info: type c5.9xlarge; 36 vCPUs; 72 GiB memory; 10 Gbps network; 7,000 Mbps EBS bandwidth.
  4. A number on its own doesn’t tell us much about the performance of Geode; it just raises more questions. Is 200,000 good? Is the measurement consistent and accurate? Has it changed since the last version? And perhaps most importantly, can it be improved?
  5. Well, here are my answers to those questions from when we started these new benchmarks. We had pretty good performance, but we were seeing some variance between runs, and some issues with stop-the-world garbage collections. We also saw some improvements from the previous version, but also a lot of room for improvement.
  6. But that brings up one more question. How do we know any of this?
  7. To answer that, let’s start by talking about what the benchmarks test now.
  8. When the Performance team started replacing the previous bare metal performance testing of Geode, we had several goals for the project. These are the ones that we have completed so far. The benchmarks can be run on demand against any revision of Geode (released or in development), on an AWS cluster or on any dev machine. They can also be run from Concourse CI pipelines. We also enabled running with a profiler attached for use in debugging performance bottlenecks. And finally, we can compare any two runs of benchmarks for changes in performance.
  9. And these are the goals that we are still working on. We want benchmarks to be run by members of the Geode community against their changes to Geode, or against their deployments. So far we have not received feedback that anyone outside of our office has used this project. We also would like for members of the community to create their own benchmarks to add to the existing list. The visualization of data is also something that is in progress, as that requires many iterations to get right. And finally, we would like to increase the test coverage that Benchmarks provide over Geode.
  10. This is our current list of tests. We’re only going to focus on the highlighted four today, but you can see that we do have some good coverage over operations so far. The four that we are going to focus on are ReplicatedGetBenchmark, PartitionedGetBenchmark, ReplicatedPutBenchmark, and PartitionedPutBenchmark. That’s because gets and puts on replicated and partitioned regions are some of the most commonly used and basic operations that Geode supports.
  11. With those tests, we also provide many different configuration options for running the benchmarks. We support running the cluster with and without SSL enabled, with JDKs 8 through 13, with or without security manager enabled, with a variety of garbage collectors, and with adjustable max heap size. These options all change the Geode cluster, so running with them has increased our coverage.
  12. So now that we all know a bit about the goals and features of this new benchmarking framework, let’s pivot and talk about how we can improve performance.
  13. And the first step to fixing performance issues is finding them. Here are some of the things to look for using the profiler, including both monitor and reentrant locks, extra allocations and garbage collections. Other things to look for are overuse of synchronization, getting a system property in a hot path, lazy initialization of objects in a hot path, and synchronization on a container such as a hash map. And I’ll go over some examples of these in a bit.
  14. So now let’s focus on a specific example of a performance refactor, starting with the reason that we thought that anything was wrong in the first place. When the benchmark was run, none of the resources were saturated, and we couldn’t figure out where the bottleneck was since the profiler gave us no hot spots. Eventually we found the secret profiler option that shows the zero-time reentrant locks and found that Thread.park() became a hot spot, and its callers were the reentrant lock and the connection pool. So eventually we found that the connection pool was holding a reentrant lock in a hot path while using a deque.
  15. (Presenter note: highlight the 36-vCPU AWS instance.) This graph shows the average performance of the get operation with different numbers of threads on the client, using version 1.9.0 of Geode. As you can see, the performance stops scaling pretty quickly after 32 threads. This ended up being due to the connection pool. It did not support enough concurrent operations for more than 32 threads, causing decreasing performance.
  16. And the profiler shows where the issue is occurring in the code. Every operation that is executed on the server results in one call to borrowConnection and one call to returnConnection. Both of those methods get a reentrant lock. This lock is responsible for almost half the time spent in these two methods. This is the cause of that taper in performance as the thread count increases, and contention for the lock increases as operations both borrow and return connections concurrently.
  17. Here is the issue in code. This is a pared-down version of the ConnectionManagerImpl, which implements the ConnectionManager. With the first arrow here I have highlighted that the available connections are being stored in a deque. Because a deque is not a thread-safe structure, the second arrow highlights the reentrant lock that was appearing in the profiler.
  18. I’m going to focus on the borrow operation when talking about this issue, but it is also an issue in the returnConnection method. There are also two signatures of borrowConnection, one of which looks for a connection to a specific server. The other just takes a timeout and gets a connection to any available server. This is the one that I’ll be focusing on from here on.
  19. So this is the borrowConnection method. The red arrows on the left highlight that the lock is held for a significant portion of the method. And note that this is a collapsed view of the method to fit on one page. Holding the lock for this long makes it difficult for multiple threads to use the connection pool at the same time. Another issue with this code is that there is an await in here. The await causes the thread to be paused until the condition has been met and a signal is received. During this time, the lock is released. This means that it must be reacquired before the thread can continue. This further delays the return of a connection to the caller by, in the worst case, the duration of the timeout plus the time it takes to reacquire the lock under contention.
  20. Let’s move on and discuss the solution to this issue. The first part is to replace the deque in the connection manager with something else. In order to introduce some modularity of this code, all of the behavior related to the available connections will be moved into another class, called the AvailableConnectionManager. This allows us to get rid of the lock in the connection manager. This is due to the implementation of AvailableConnectionManager.
  21. This is the signature for the AvailableConnectionManager. As you can see, the deque has been replaced with a concurrent linked deque. The linked nature of the deque does provide some performance hits due to the need to allocate and garbage collect the nodes. This structure relies on Compare And Swap for a lock-free implementation, making the ConcurrentLinkedDeque the ideal choice for this implementation.
  22. With that change in mind, this is what borrowConnection() in the ConnectionManager looks like now. There are no locks in this method. Instead, we call useFirst on the available connection manager, with a predicate to get a connection to the server that we want.
  23. And this is the implementation of useFirst. There is still no locking in this method, and removeFirstOccurrence is thread-safe, meaning that with a sufficiently large pool of connections, scaling should continue well past 32 threads on the client.
  24. To test if this new solution has other hot spots, we can use a profiler. And this time, you can see that the operation still results in an execution on the server, which calls both borrow and return connection, but both of those calls take 0-1% of the time spent in those methods. This provides good confidence that this implementation does not have a performance bottleneck.
  25. The new implementation of ConnectionManagerImpl and AvailableConnectionManager have been thoroughly tested at every level. I’m sure most of you are familiar with the concepts of unit and integration testing. But this has also been tested in three other ways. Distributed tests test how the connection manager behaves in a real Geode cluster. A cluster is spun up in several VMs and operations are run, causing connections to be created and destroyed, borrowed and returned. The next type of test is the Concurrency Test. For concurrency testing, an executor is given multiple threads to be run in parallel, applying pressure to the connection manager to test that certain timings do not result in concurrency issues. And finally, the testing that we’ve been talking about this whole time, performance testing.
  26. These are the results of the performance test, comparing the commit before the refactor with the refactored code. As you can see, this one commit results in roughly a 234% increase in PartitionedGetBenchmark (from 197,686 to 659,980 on the slide). And with this run of the tests the CPU of the client was saturated.
  27. Here is how the performance scales with the number of threads on the client in version 1.10.0, which includes the connection pool refactor as well as several other smaller refactors. As you can see, scaling continues significantly beyond 32 threads.
  28. So now let’s quickly look at a couple of other performance bottlenecks, starting with over eager allocations. What I mean by that is allocating objects long before they are used, resulting in excess garbage production. So once again, ignore most of the code here, and focus on the highlighted areas. Note that the declaration of the attemptedServers object (the first highlighted area) occurs well before the first use of that object, the second yellow highlighted area. And since there is an early return, highlighted in green, between the declaration and the first usage, there is a chance that the object could be allocated and garbage collected without ever having been used. And in this case, a HashSet is being allocated, which results in one HashSet and one HashMap, creating a significant amount of garbage.
  29. The best way to avoid this issue is to allocate close to the first use of that object. And make sure that you’re allocating after any early returns, so that the paths that return early avoid allocating the object in the first place.
  30. Another common performance bottleneck is caused by choosing the wrong structure for the implementation. In this case, a linked list is used. This code is a hot path, and the borrowConnection and returnConnection are each called once per operation. This means that each operation results in one allocation of a node and one dereference of a node.
  31. In this case, a deque is a better choice: since the connection pool is of relatively constant size, the deque will not need to be resized very often. This shows how important it is for performance to understand your data structures.
  32. So to wrap things up, let’s talk about how much performance has improved.
  33. This is a graph of the throughputs of our four tests in version 1.10.0 compared to version 1.9.0. Each of these tests saw a significant improvement in performance due to the connection pool and other refactors.
  34. This is a similar graph to the previous slide but for latency. It shows that latency was also reduced by a significant amount between versions 1.9.0 and 1.10.0.
  35. So why should you upgrade to Geode 1.10.0?
  36. Well I think we’d all rather see a vsd output like the red line instead of the blue.
  37. Finally, I’d like to point you to some useful resources. We’d love to have more people use these benchmarks. There are instructions for running, and adding new benchmarks, in the benchmarks repository. We also have a great list of performance bottlenecks that we’ve found in our investigations but have not been able to prioritize. If you’re interested in working on performance issues, check out this JIRA search.
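
Notes 21 to 23 describe replacing the locked deque with a ConcurrentLinkedDeque and borrowing a connection through a useFirst call that takes a predicate and relies on removeFirstOccurrence. The slides' code is not reproduced in this transcript, so the following is only a minimal sketch of that idea and not Geode's actual AvailableConnectionManager. The class name `AvailableConnections`, the generic parameter, and the `addFirst` method are assumptions, and the sketch assumes connection objects are distinct (no two equal entries in the pool).

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Predicate;

public class AvailableConnections<C> {
    // Lock-free, thread-safe deque of idle connections.
    private final Deque<C> available = new ConcurrentLinkedDeque<>();

    /** Returns a connection to the pool by placing it at the head. */
    public void addFirst(C connection) {
        available.addFirst(connection);
    }

    /** Removes and returns the first idle connection, or null if none is idle. */
    public C useFirst() {
        return available.pollFirst();
    }

    /**
     * Removes and returns the first idle connection that matches the
     * predicate, or null if there is none. No lock is taken: the iterator is
     * weakly consistent, and removeFirstOccurrence returns true for only one
     * of several threads racing for the same element.
     */
    public C useFirst(Predicate<C> matches) {
        for (C candidate : available) {
            if (matches.test(candidate) && available.removeFirstOccurrence(candidate)) {
                return candidate; // this thread won the removal and owns it
            }
        }
        return null;
    }
}
```

A caller that wants a connection to a particular server would pass a predicate that checks the connection's server, and fall back to creating a new connection when the result is null.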
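
Note 25 describes concurrency testing as giving an executor multiple threads to run in parallel so that they put pressure on the connection manager. As a rough illustration of that style of test (not the Geode test suite itself), the sketch below hammers a ConcurrentLinkedDeque of integer "connections" from several threads and checks that no connection is ever held by two threads at once and that none is lost. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolConcurrencyCheck {
    /** Returns true if no connection was double-borrowed and none was lost. */
    static boolean run(int poolSize, int threads, int iterations) {
        ConcurrentLinkedDeque<Integer> pool = new ConcurrentLinkedDeque<>();
        for (int id = 0; id < poolSize; id++) {
            pool.addLast(id);
        }
        Set<Integer> inUse = ConcurrentHashMap.newKeySet();
        AtomicInteger doubleBorrows = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService executor = Executors.newFixedThreadPool(threads);

        Callable<Void> worker = () -> {
            start.await(); // release all workers together to maximize contention
            for (int i = 0; i < iterations; i++) {
                Integer borrowed = pool.pollFirst();
                if (borrowed == null) {
                    continue; // pool momentarily empty, try again
                }
                if (!inUse.add(borrowed)) {
                    doubleBorrows.incrementAndGet(); // two threads hold the same one
                }
                inUse.remove(borrowed);
                pool.addFirst(borrowed); // return the connection
            }
            return null;
        };

        try {
            List<Future<Void>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(executor.submit(worker));
            }
            start.countDown();
            for (Future<Void> result : results) {
                result.get(); // propagate any worker failure
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow();
        }
        return doubleBorrows.get() == 0
                && pool.size() == poolSize
                && new HashSet<>(pool).size() == poolSize;
    }
}
```

Using more threads than pooled connections, for example `run(4, 8, 10000)`, forces threads to compete for the same few entries, which is the situation the talk identifies as the source of contention.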