Scylla Performance Toolbox
ScyllaDB
Avi Kivity
Understanding environment and application impact on performance
CTO, ScyllaDB
Avi Kivity
Avi Kivity
KVM hypervisor author and ex-maintainer
ScyllaDB co-founder and CTO
Agenda
▪ Environment
▪ Tracing
▪ Metrics
Environment
▪ Networking
▪ Disk interrupts
▪ Disk write cache
▪ Virtualization and containers
Networking model (multiqueue)
[Diagram labels: NIC, OS/HW, Rx Queue, six cores]
Networking model (singlequeue)
[Diagram labels: NIC, OS/HW, Rx Queue, S/W Rx Queue, six cores]
Networking model (hybrid)
▪ Each core group is assigned a single hardware queue
▪ One core in core group handles networking
▪ Useful when too few hardware queues
▪ Too difficult to draw
How is the networking model configured?
▪ Determined by scylla_setup based on the hardware
▪ Stored in /etc/scylla.d/perftune.yaml
$ cat /etc/scylla.d/perftune.yaml
cpu_mask: '0x000000ff'
mode: mq
nic: eth0
tune:
- net
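To read the cpu_mask value above, a minimal sketch follows. It assumes cpu_mask is a conventional hexadecimal CPU bitmask in which bit N stands for CPU N; the helper name cores_from_mask is hypothetical and not part of any Scylla tool.

```python
def cores_from_mask(cpu_mask: str) -> list[int]:
    """Return the CPU indices whose bits are set in a hexadecimal mask.

    Assumption: bit N of the mask stands for CPU N.
    """
    value = int(cpu_mask, 16)  # base 16 accepts an optional "0x" prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(cores_from_mask("0x000000ff"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under that assumption, the mask on the slide covers CPUs 0 through 7.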
Unbalanced networking
top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st
%Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st
%Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st
KiB Mem : 62882836 total, 61356464 free, 1129072 used, 397300 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 61124456 avail Mem
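In the output above, Cpu0 spends 42.6% of its time in softirq (si) while the other cores stay in the low single digits. A rough sketch of spotting that pattern automatically is shown here; the function names, the 4x factor, and the 1% floor are illustrative assumptions, not Scylla tooling.

```python
import re
from statistics import median

# Three of the per-CPU lines from the top output above.
TOP_LINES = """\
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
"""


def softirq_by_cpu(top_output: str) -> dict[int, float]:
    """Map CPU index to its softirq (si) percentage."""
    result = {}
    for line in top_output.splitlines():
        match = re.match(r"%Cpu(\d+)\s*:(.*)", line)
        if not match:
            continue  # skip the summary, memory, and swap lines
        fields = {
            name: float(value)
            for value, name in re.findall(r"([\d.]+)\s+(\w+)", match.group(2))
        }
        result[int(match.group(1))] = fields["si"]
    return result


def overloaded_cpus(si: dict[int, float], factor: float = 4.0) -> list[int]:
    """CPUs whose softirq share exceeds `factor` times the median (floor of 1%)."""
    threshold = factor * max(median(si.values()), 1.0)
    return [cpu for cpu, value in si.items() if value > threshold]


print(overloaded_cpus(softirq_by_cpu(TOP_LINES)))  # [0]
```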
Disk write cache - write back cache
Write-back cache
▪ Scylla writes to disk
▪ Disk places data in DRAM cache, and acknowledges
▪ Disk initiates data write to actual SSD in background
▪ Scylla asks disk to verify that the data made it to non-volatile storage
▪ Disk waits until background write completes
o Potential stall
Disk write cache - write back
[Sequence diagram (Scylla, disk controller, media): Write, media access, Flush, ACK, media access complete, ACK, with the STALL marked on the flush]
Disk write cache - write back cache
Write-back cache
▪ Scylla writes to disk
▪ Disk places data in DRAM cache, and acknowledges
▪ Disk initiates data write to actual SSD in background
▪ Scylla asks disk to verify that the data made it to non-volatile storage
▪ Disk does not wait until background write completes
o No stall
Disk write cache - write back
[Sequence diagram (Scylla, disk controller, media): Write, media access, Flush, ACK, media access complete, ACK]
Beware of iowait
▪ iowait caused by pushing XFS out of its comfort zone
top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
%Cpu0 : 34.1 us, 10.2 sy, 0.0 ni, 0.0 id, 47.0 wa, 6.1 hi, 2.6 si, 0.0 st
Tracing
Types of tracing
▪ Single-shot
▪ Probabilistic
▪ Slow query
Single-shot tracing
▪ Useful for gaining an understanding of a query during development
▪ Issue from cqlsh
Probabilistic tracing
▪ Useful to gain an insight about what the application is doing
▪ Controlled by nodetool
▪ Start with very low probability to avoid disturbing the workload
$ nodetool settraceprobability 0.000001
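To get a feel for how little a probability of 0.000001 disturbs the workload, the expected tracing rate is simply the request rate times the probability. The sketch below assumes a hypothetical load of 100,000 requests per second; that figure is not from the slides.

```python
def expected_traces_per_second(requests_per_second: float, probability: float) -> float:
    """Expected number of traced requests per second at a given trace probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return requests_per_second * probability


# Hypothetical load: 100,000 requests per second at the probability shown above.
rate = expected_traces_per_second(100_000, 0.000001)
print(f"about {rate:.2f} traced requests per second, one every {1 / rate:.0f} s")
```

At that assumed load, roughly one request is traced every ten seconds.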
Slow-query logging
▪ Catch that long (and slow) tail
▪ Caution: a slow query can interfere with fast queries
Metrics
Metrics overview
▪ Aggregated vs. Shard metrics
▪ CPU metrics
▪ I/O metrics
▪ Coordinator-side metrics
▪ Replica-side metrics
Zooming into aggregated metrics
▪ Start with cluster-level view
▪ Look at individual nodes
o Cluster runs at speed of slowest node
▪ Look at individual shards
o Node runs at speed of slowest shard
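The drill-down above can be sketched as a search for the worst shard on the worst node. The latency figures and the names P99_MS, slowest_node, and slowest_shard are made up for illustration; real numbers would come from the monitoring stack.

```python
# Hypothetical p99 read latencies in milliseconds, keyed by node and then by shard.
P99_MS = {
    "node1": {0: 2.1, 1: 2.3, 2: 2.0},
    "node2": {0: 2.2, 1: 9.8, 2: 2.4},
    "node3": {0: 1.9, 1: 2.0, 2: 2.2},
}


def slowest_node(latencies: dict) -> str:
    """Return the node whose worst shard has the highest latency."""
    return max(latencies, key=lambda node: max(latencies[node].values()))


def slowest_shard(latencies: dict, node: str) -> int:
    """Return the shard with the highest latency on the given node."""
    shards = latencies[node]
    return max(shards, key=shards.get)


node = slowest_node(P99_MS)
shard = slowest_shard(P99_MS, node)
print(f"slowest node: {node}, slowest shard: {shard}, p99: {P99_MS[node][shard]} ms")
```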
CPU metrics
▪ Utilization / load
o For throughput load, should achieve 100%
o If not
• Does one shard reach 100% and the others don’t?
– Hot partition
– Check networking environment
• Sufficient client concurrency?
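The checklist above can be written down as a small decision function. This is only a sketch: the 95% saturation threshold and the name diagnose_cpu are assumptions, and the input is a plain list of per-shard utilization percentages.

```python
def diagnose_cpu(shard_utilization: list[float], saturated: float = 95.0) -> str:
    """Classify per-shard CPU utilization (percent) for a throughput workload."""
    if not shard_utilization:
        raise ValueError("need at least one shard")
    hot = [shard for shard, load in enumerate(shard_utilization) if load >= saturated]
    if len(hot) == len(shard_utilization):
        return "all shards saturated"
    if hot:
        return (
            f"shards {hot} saturated, others not: "
            "suspect a hot partition or unbalanced networking"
        )
    return "no shard saturated: check client concurrency"


print(diagnose_cpu([100.0, 41.0, 38.0, 40.0]))
print(diagnose_cpu([55.0, 60.0, 52.0, 58.0]))
```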
I/O Queue metrics
I/O by type of operation: query, compaction, commitlog
▪ Bandwidth, IOPS (and average size)
▪ Delay
▪ Correlates with iostat command output
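Average request size is not a separate measurement; it follows from bandwidth divided by IOPS. A short sketch with made-up numbers (200 MiB/s at 1600 IOPS) and a hypothetical function name:

```python
def average_request_size(bandwidth_bytes_per_second: float, iops: float) -> float:
    """Average I/O request size in bytes; 0.0 when there is no I/O."""
    if iops <= 0:
        return 0.0
    return bandwidth_bytes_per_second / iops


# Hypothetical figures: 200 MiB/s of compaction I/O at 1600 IOPS.
size = average_request_size(200 * 1024 * 1024, 1600)
print(f"{size / 1024:.0f} KiB per request")  # 128 KiB per request
```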
Coordinator-side metrics
▪ CQL requests per second
▪ CQL connections and their distribution
o High connection open rate?
o Sufficient connections per shard?
o Bad connection distribution?
▪ Statements prepared
o Is the client using prepared statements correctly?
▪ Foreground reads and writes
▪ Background reads and writes
▪ Reconciliation
Replica-side metrics
▪ Reads and writes - hot shard, hot node
▪ Cache hits/misses - compare with expectations
▪ Cache total memory - watch for sudden drops
▪ Active SSTable reads - high value indicates weak I/O
▪ Queued SSTable reads - high value indicates weak I/O
▪ Current compactions
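Comparing cache hits and misses with expectations usually means comparing a hit ratio. A minimal sketch, with illustrative counter values and an assumed expected ratio of 0.95:

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from cache; 0.0 when there were no reads."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical counters and expectation, for illustration only.
ratio = cache_hit_ratio(hits=900, misses=100)
expected = 0.95
if ratio < expected:
    print(f"hit ratio {ratio:.2f} is below the expected {expected:.2f}")
```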
Summary
▪ Many moving parts
▪ Despite automation, things can go wrong
▪ Application may get things wrong
▪ Need combination of methodical approach and intuition
▪ Engage the developers so we can improve things
THANK YOU
avi@scylladb.com
@AviKivity
Please stay in touch
Any questions?

Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

  • 1. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Scylla Performance Toolbox ScyllaDB Avi Kivity
  • 2. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Understanding environment and application impact on performance CTO, ScyllaDB Avi Kivity
  • 3. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Avi Kivity 3 KVM hypervisor author and ex-maintainer ScyllaDB co-founder and CTO
  • 4. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Agenda 4 ▪ Environment ▪ Tracing ▪ Metrics
  • 5. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Environment
  • 6. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Environment ▪ Networking ▪ Disk interrupts ▪ Disk write cache ▪ Virtualization and containers 6
  • 7. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Networking model (multiqueue) 7 NIC OS/HW Core Core Core Core Core Core Rx Queue
  • 8. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Networking model (singlequeue) 8 NIC OS/HW Core Core Core Core Core Core Rx Queue S/W Rx Queue
  • 9. Networking model (hybrid) ▪ Each core group is assigned a single hardware queue ▪ One core in each core group handles networking ▪ Useful when there are too few hardware queues ▪ Too difficult to draw
  • 10. How is the networking model configured? ▪ Determined by scylla_setup based on the hardware ▪ Stored in /etc/scylla.d/perftune.yaml
      $ cat /etc/scylla.d/perftune.yaml
      cpu_mask: '0x000000ff'
      mode: mq
      nic: eth0
      tune:
      - net
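The cpu_mask value in perftune.yaml is a hexadecimal bitmask. Assuming the usual Linux convention that bit n stands for CPU n, a minimal Python sketch for decoding it could look like this. The function name cpus_from_mask is made up for illustration and is not part of scylla_setup or perftune.

```python
def cpus_from_mask(mask: str) -> list[int]:
    """Return the CPU indices whose bits are set in a hex mask such as '0x000000ff'.

    Assumption: bit n of the mask corresponds to CPU n.
    """
    value = int(mask, 16)  # int() accepts the '0x' prefix when the base is 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # The mask from the slide selects CPUs 0 through 7.
    print(cpus_from_mask("0x000000ff"))
```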
  • 11. Unbalanced networking
      top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
      Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
      %Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
      %Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
      %Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
      %Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
      %Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st
      %Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st
      %Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
      %Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st
      KiB Mem : 62882836 total, 61356464 free, 1129072 used, 397300 buff/cache
      KiB Swap: 0 total, 0 free, 0 used. 61124456 avail Mem
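In the top output on this slide, Cpu0 spends 42.6% of its time in softirq (si) while every other core stays under 5%, which is the signature of network processing landing on one core. The following Python sketch spots that pattern automatically. The 3x-median threshold and the function names are assumptions chosen for illustration, not a Scylla tool.

```python
import re
from statistics import median

# Per-CPU lines copied from the slide.
TOP_SAMPLE = """\
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st
%Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st
%Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st
"""

# Captures the CPU index and the number that precedes " si".
CPU_LINE = re.compile(r"%Cpu(\d+)\s*:.*?([\d.]+) si")


def softirq_by_cpu(top_output: str) -> dict[int, float]:
    """Map each CPU index to its softirq percentage."""
    result = {}
    for line in top_output.splitlines():
        match = CPU_LINE.search(line)
        if match:
            result[int(match.group(1))] = float(match.group(2))
    return result


def hot_cpus(si: dict[int, float], factor: float = 3.0) -> list[int]:
    """CPUs whose softirq share exceeds `factor` times the median (assumed threshold)."""
    limit = factor * median(si.values())
    return sorted(cpu for cpu, value in si.items() if value > limit)


if __name__ == "__main__":
    print(hot_cpus(softirq_by_cpu(TOP_SAMPLE)))
```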
  • 12. Disk write cache - write-back cache ▪ Scylla writes to disk ▪ Disk places data in DRAM cache, and acknowledges ▪ Disk initiates data write to actual SSD in background ▪ Scylla asks disk to verify that the data made it to non-volatile storage ▪ Disk waits until background write completes o Potential stall
  • 13. Disk write cache - write back [sequence diagram; participants: Scylla, Disk controller, Media; labels: Write, Media access, Flush, ACK, Media access complete, ACK; the wait is marked STALL]
  • 14. Disk write cache - write-back cache ▪ Scylla writes to disk ▪ Disk places data in DRAM cache, and acknowledges ▪ Disk initiates data write to actual SSD in background ▪ Scylla asks disk to verify that the data made it to non-volatile storage ▪ Disk does not wait until background write completes o No stall
  • 15. Disk write cache - write back [sequence diagram; participants: Scylla, Disk controller, Media; labels: Write, Media access, Flush, ACK, Media access complete, ACK; no stall]
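From software, the stall described in the write-back slides appears as a slow flush. The sketch below is a rough probe that times os.fsync after small writes. It is not how Scylla performs I/O, the function name fsync_latencies is hypothetical, and the numbers are only meaningful when the target directory sits on the disk under test (a tmpfs directory will show nothing useful).

```python
import os
import tempfile
import time


def fsync_latencies(directory: str, writes: int = 20, size: int = 4096) -> list[float]:
    """Write `writes` blocks of `size` bytes, fsync after each, and return each fsync's latency in seconds."""
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # ask the kernel, and through it the drive, to make the data durable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    samples = fsync_latencies(tempfile.gettempdir())
    print(f"max fsync latency: {max(samples) * 1e3:.2f} ms")
```

A drive that has to drain its cache on every flush would be expected to show noticeably higher and more variable latencies here than one that can acknowledge flushes immediately.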
  • 16. Beware of iowait ▪ iowait caused by pushing XFS out of its comfort zone
      top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
      Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
      %Cpu0 : 34.1 us, 10.2 sy, 0.0 ni, 0.0 id, 47.0 wa, 6.1 hi, 2.6 si, 0.0 st
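The wa column in top is derived from the iowait counter in /proc/stat. As a minimal sketch, the percentage can be computed from two snapshots of a cpu line. The counter values below are invented so that the result reproduces the 47.0 wa figure on the slide, and the function name is an assumption.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""

    def fields(line: str) -> list[int]:
        return [int(x) for x in line.split()[1:]]

    delta = [b - a for a, b in zip(fields(before), fields(after))]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already counted in user, so only the first eight fields are summed.
    total = sum(delta[:8])
    return 100.0 * delta[4] / total if total else 0.0


if __name__ == "__main__":
    # Invented snapshots, not real measurements.
    before = "cpu0 1000 0 300 5000 200 10 20 0 0 0"
    after = "cpu0 1341 0 402 5000 670 71 46 0 0 0"
    print(f"iowait: {iowait_percent(before, after):.1f}%")
```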
  • 17. Tracing
  • 18. Types of tracing ▪ Single-shot ▪ Probabilistic ▪ Slow query
  • 19. Single-shot tracing ▪ Useful for gaining an understanding of a query during development ▪ Issue from cqlsh
  • 20. Probabilistic tracing ▪ Useful for gaining insight into what the application is doing ▪ Controlled by nodetool ▪ Start with a very low probability to avoid disturbing the workload
      $ nodetool settraceprobability 0.000001
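To get a feel for what a trace probability means in practice, multiply it by the request rate. The sketch below uses a made-up rate of 100,000 requests per second; only the probability comes from the slide.

```python
def traced_per_second(requests_per_second: float, probability: float) -> float:
    """Expected number of traced requests per second."""
    return requests_per_second * probability


if __name__ == "__main__":
    # 100_000 req/s is a hypothetical workload; 0.000001 is the value from the slide.
    rate = traced_per_second(100_000, 0.000001)
    print(f"{rate:.2f} traced requests per second, about one every {1 / rate:.0f} s")
```

At that rate, roughly one request in every ten seconds is traced, which is why such a low probability still yields useful samples on a busy node.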
  • 21. Slow-query logging ▪ Catch that long (and slow) tail ▪ Caution: a slow query can interfere with fast queries
  • 22. Metrics
  • 23. Metrics overview ▪ Aggregated vs. shard metrics ▪ CPU metrics ▪ I/O metrics ▪ Coordinator-side metrics ▪ Replica-side metrics
  • 24. Zooming into aggregated metrics ▪ Start with cluster-level view ▪ Look at individual nodes o Cluster runs at speed of slowest node ▪ Look at individual shards o Node runs at speed of slowest shard
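The drill-down on this slide (cluster, then node, then shard) can be sketched as a search for the worst value at each level. The data layout, the latency metric, and the node names below are all hypothetical and do not correspond to a Scylla API.

```python
def drill_down(cluster: dict[str, dict[int, float]]) -> tuple[str, int]:
    """Given node -> {shard: latency_ms}, return the slowest node and its slowest shard."""
    # A node is only as fast as its slowest shard.
    node_latency = {node: max(shards.values()) for node, shards in cluster.items()}
    # The cluster is only as fast as its slowest node.
    node = max(node_latency, key=node_latency.get)
    shard = max(cluster[node], key=cluster[node].get)
    return node, shard


if __name__ == "__main__":
    # Invented sample values in milliseconds.
    sample = {
        "node-a": {0: 1.2, 1: 1.1, 2: 1.3},
        "node-b": {0: 1.0, 1: 9.8, 2: 1.2},
        "node-c": {0: 1.4, 1: 1.3, 2: 1.1},
    }
    print(drill_down(sample))
```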
  • 25. CPU metrics ▪ Utilization / load o For a throughput load, should reach 100% o If not • Does one shard reach 100% while the others don’t? – Hot partition – Check networking environment • Sufficient client concurrency?
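The questions on this slide form a small decision tree, sketched below in Python. The 95% cutoff for "saturated" and the function name diagnose_cpu are assumptions made for the example.

```python
def diagnose_cpu(shard_load: list[float], busy: float = 95.0) -> str:
    """Classify per-shard CPU utilization (percent) under a throughput load."""
    saturated = [load >= busy for load in shard_load]
    if all(saturated):
        return "all shards saturated: CPU-bound, as expected for a throughput load"
    if any(saturated):
        return "some shards saturated: look for a hot partition or a networking imbalance"
    return "no shard saturated: check client concurrency"


if __name__ == "__main__":
    print(diagnose_cpu([100.0, 41.0, 38.5, 40.2]))
```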
  • 26. I/O Queue metrics ▪ I/O by type of operation: query, compaction, commitlog ▪ Bandwidth, IOPS (and average size) ▪ Delay ▪ Correlates with iostat command output
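The average request size mentioned on this slide is bandwidth divided by IOPS. A minimal sketch follows, with invented figures for the three operation types.

```python
def average_request_size(bandwidth_bytes_per_s: float, iops: float) -> float:
    """Average I/O size in bytes; returns 0.0 when there is no I/O."""
    return bandwidth_bytes_per_s / iops if iops else 0.0


if __name__ == "__main__":
    # (bandwidth in bytes/s, IOPS) per operation type; hypothetical numbers.
    classes = {
        "query": (200e6, 20_000),
        "compaction": (400e6, 3_200),
        "commitlog": (50e6, 400),
    }
    for name, (bandwidth, iops) in classes.items():
        print(f"{name}: {average_request_size(bandwidth, iops) / 1000:.0f} kB per request")
```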
  • 27. Coordinator-side metrics ▪ CQL requests per second ▪ CQL connections and their distribution o High connection open rate? o Sufficient connections per shard? o Bad connection distribution? ▪ Statements prepared o Is the client using prepared statements correctly? ▪ Foreground reads and writes ▪ Background reads and writes ▪ Reconciliation
  • 28. Replica-side metrics ▪ Reads and writes - hot shard, hot node ▪ Cache hits/misses - compare with expectations ▪ Cache total memory - watch for sudden drops ▪ Active SSTable reads - high value indicates weak I/O ▪ Queued SSTable reads - high value indicates weak I/O ▪ Current compactions
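Two of the replica-side checks reduce to simple arithmetic: the cache hit ratio, and spotting a sudden drop in cache memory between consecutive samples. The sketch below illustrates both. The 20% drop threshold and the function names are assumptions.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were hits; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def sudden_drops(samples: list[float], fraction: float = 0.2) -> list[int]:
    """Indices where a sample fell by more than `fraction` relative to the previous one."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i] < samples[i - 1] * (1 - fraction)
    ]


if __name__ == "__main__":
    print(hit_ratio(900, 100))
    # Hypothetical cache memory samples, in arbitrary units.
    print(sudden_drops([100, 98, 60, 61]))
```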
  • 29. Summary
  • 30. Summary ▪ Many moving parts ▪ Despite automation, things can go wrong ▪ Application may get things wrong ▪ Need combination of methodical approach and intuition ▪ Engage the developers so we can improve things
  • 31. Thank you. Any questions? Please stay in touch: avi@scylladb.com, @AviKivity