High Performance NoSQL Masterclass
Scaling for Performance
Felipe Cardeneti Mendes
Felipe Cardeneti Mendes
● Solutions Architect at ScyllaDB
● Published Author
● Linux and Open Source enthusiast
Agenda
● About ScyllaDB
● Getting started with NoSQL databases
● Deployment and Production Readiness
● Observability Tips and Tricks
About ScyllaDB
ScyllaDB Database Architecture
● Horizontal & Vertical Scaling
● Built in C++ (no Java overhead)
● System and Data Center Aware
● Sharding Per Core
● Shard-Aware Drivers
● Auto-Performance Tuning
● Unique Close-to-Metal Architecture: Network, Processor, NUMA, Storage
Shard per Core
[Figure: threads vs. shards]
Asynchronous Architecture
[Figure: request/answer timelines for a synchronous and an asynchronous architecture. The synchronous timeline shows time spent waiting for a response; the asynchronous timeline shows the resulting time savings.]
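The time savings on this slide can be illustrated with a minimal sketch. It assumes a hypothetical `fake_request` coroutine that stands in for one database round trip; no real driver or cluster is involved, and the 50 ms latency is an arbitrary illustration value.

```python
import asyncio
import time


async def fake_request(i, latency=0.05):
    # Hypothetical stand-in for one database round trip.
    await asyncio.sleep(latency)
    return i


async def run_sequential(n):
    # Synchronous style: wait for each answer before sending the next request.
    return [await fake_request(i) for i in range(n)]


async def run_concurrent(n):
    # Asynchronous style: send all requests first, then collect the answers.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro_fn, n):
    # Run one of the coroutines above and measure its wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return list(result), time.perf_counter() - start


seq_result, seq_time = timed(run_sequential, 10)
con_result, con_time = timed(run_concurrent, 10)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both runs return the same answers. The concurrent run overlaps the waiting periods, so its total time is close to a single round trip rather than the sum of all of them.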
Specialized Cache
● Cassandra: key cache, row cache, and Linux page cache in front of SSTables; complex tuning (on-heap / off-heap)
● ScyllaDB: unified cache in front of SSTables
Ecosystem Compatibility
+ CQL native protocol
+ JMX management protocol
+ Management command line / REST
+ SSTable file format
+ Configuration file format
+ CQL language
Getting Started with NoSQL Databases
Modern Business Challenges
+ Keep CapEx & OpEx in check
+ Reduce complexity
+ Scale as the data grows
+ Queries in milliseconds
+ Leverage massive amounts of data
+ Predictable, consistent performance
Oh… The CAP Theorem
Workload Types
OLAP: Decision Support
+ More complex queries over large amounts of data
+ Latency important but not critical
+ Seconds to hours
OLTP: Fundamental business tasks
+ Simple queries
+ Latency critical
+ Milliseconds per transaction
[Figure: time vs. complexity of query, one plot each for OLAP and OLTP]
OLAP Characteristics
Main Characteristics
+ Bounded concurrency
+ Scans and aggregations
+ Rely on MapReduce paradigms
Examples
+ How many users are from the US?
+ How many Twitter posts happened in 2022?
+ Which devices haven’t communicated back within the past hour?
OLTP Characteristics
Main Characteristics
+ Unbounded concurrency
+ Designed for speed and simplicity
+ Often user-facing APIs
Examples
+ When was the last time a user logged in?
+ What were the last 10 temperature measurements for a given device?
+ How many likes does a given post have?
Why not both? Meet Workload Prioritization
+ Ratio = 100:100 (1:1) means equal shares of processing/resources to complete tasks
+ Ratio = 100:50 (2:1) means 2X as many shares of processing/resources for Transactions (OLTP) to complete tasks compared to Analytics (OLAP)
[Figure: OLTP and OLAP share counts feeding the decision of which task to run]
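The share ratios above can be turned into resource fractions with a small sketch. It assumes that, under contention, each workload receives resources in proportion to its shares; the function name and the dictionary keys are illustrative and do not represent ScyllaDB's scheduler or its configuration syntax.

```python
def resource_fractions(shares):
    # Assumption: under contention, a workload's fraction of resources is its
    # share count divided by the total shares of all competing workloads.
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


equal = resource_fractions({"oltp": 100, "olap": 100})
weighted = resource_fractions({"oltp": 100, "olap": 50})
print("100:100 ->", equal)
print("100:50  ->", weighted)
```

With a 100:50 ratio, the transactional workload ends up with roughly two thirds of the contended resources and the analytics workload with one third, which matches the 2:1 ratio on the slide.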
Wide Column Databases Write Path
LSM storage engine’s write path:
[Figure, built up over six slides: writes, commit log, compaction]
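The write path shown on these slides can be sketched as a toy in-memory model. The slides only label writes, the commit log, and compaction; the memtable and the flush into sorted runs (SSTables) are the usual remaining pieces of an LSM engine and are filled in here as assumptions. The class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, and nothing here reflects ScyllaDB's actual implementation.

```python
class TinyLSM:
    """Toy LSM store: commit log + memtable + immutable sorted runs."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []    # append-only record of unflushed writes
        self.memtable = {}      # in-memory view of the most recent writes
        self.sstables = []      # flushed key-sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to the commit log
        self.memtable[key] = value             # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Write the memtable out as an immutable, key-sorted run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []    # flushed data no longer needs log replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):    # the newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # 4. Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in self.sstables:              # oldest to newest
            merged.update(run)
        self.sstables = [sorted(merged.items())]


db = TinyLSM()
for i in range(7):
    db.write(f"k{i % 4}", i)
print(len(db.sstables), "runs before compaction")
```

Writes never modify an existing run; they only append. That is why writes are cheap, and why reads may have to consult several runs until compaction merges them.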
What is compaction?
Hidden Gems
+ This technique of keeping sorted files and merging them is well known and is often called a Log-Structured Merge (LSM) tree
+ Published in 1996; the earliest known popular application is the Lucene search engine (1999)
Characteristics
+ High performance write.
+ Immediately readable.
+ Reasonable performance for read.
What is a compaction strategy?
▪ Which files to compact, and when?
▪ This is called the compaction strategy
▪ The goal of the strategy is low amplification:
○ Read amplification: avoid read requests needing many SSTables.
○ Space amplification: avoid overwritten/deleted/expired data staying on disk, and avoid excessive temporary disk space needs.
○ Write amplification: avoid compacting the same data again and again.
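The effect of compaction on read and space amplification can be illustrated with a small merge sketch. It assumes runs are key-sorted lists given oldest first, and that `None` marks a deleted key (a tombstone); both conventions are illustrative choices, not a real SSTable format. It also assumes every run is part of the merge, which is what makes dropping tombstones safe here.

```python
import heapq

TOMBSTONE = None  # assumption: None marks a deleted key


def compact(runs):
    """Merge key-sorted runs (oldest first) into a single key-sorted run.

    Newer values overwrite older ones; tombstoned keys are dropped.
    """
    # Tag each entry with its run index so that, for equal keys, the entry
    # from the newest run sorts last in the merged stream.
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    latest = {}
    for key, _age, value in heapq.merge(*tagged):
        latest[key] = value  # a newer entry replaces an older one
    return [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


runs = [
    [("a", 1), ("b", 2), ("c", 3)],   # oldest run
    [("b", 20), ("d", 4)],            # overwrites b
    [("a", TOMBSTONE), ("e", 5)],     # newest run, deletes a
]
compacted = compact(runs)
entries_before = sum(len(run) for run in runs)
print(f"runs to check per read: {len(runs)} -> 1")
print(f"entries on disk: {entries_before} -> {len(compacted)}")
```

Before compaction a read may need to check three runs and the disk holds seven entries; afterwards there is one run with four live entries. The price is that every surviving entry was rewritten, which is the write amplification a strategy tries to keep low.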
Which one to choose?
Know your workload
Deployment and Production Readiness
It all starts with Data Modeling
Do’s
+ Denormalize
+ Query oriented approach
+ High data distribution / cardinality
Don’ts
+ Create hotspots
+ Large partitions/rows/cells/collections/etc
+ Low cardinality tables/views/indexes
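Why cardinality matters for data distribution can be shown with a rough sketch. It assumes a simplified placement rule (hash of the partition key modulo the node count) in place of a real token ring, and the country codes, device ids, and node count are made-up illustration values.

```python
import hashlib
from collections import Counter


def owner(partition_key, nodes=6):
    # Simplified placement: hash the partition key and take it modulo the
    # node count. Real clusters use a token ring; this is only an illustration.
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def load_per_node(keys, nodes=6):
    # Count how many rows land on each node for the given partition keys.
    counts = Counter(owner(key, nodes) for key in keys)
    return [counts.get(node, 0) for node in range(nodes)]


# Low cardinality: 60,000 readings partitioned by one of 3 country codes.
low = load_per_node(["US", "BR", "DE"][i % 3] for i in range(60_000))
# High cardinality: the same number of readings partitioned by device id.
high = load_per_node(f"device-{i % 10_000}" for i in range(60_000))
print("by country:", low)
print("by device: ", high)
```

With only three distinct partition keys, at most three of the six nodes receive any data and each partition is very large, which is the hotspot pattern listed under Don’ts. With thousands of distinct keys the rows spread across all nodes.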
Test, test, test…!
Unit Testing
+ Test your workload and access patterns in a Docker container
+ Use specialized stress tools to simulate workload
○ cassandra-stress
○ nosqlbench
○ YCSB
+ OBSERVE the results (more on that later)
Application Development
Functional Testing
+ Use Prepared Statements
+ Configure your routing and load balancing policy correctly
○ DCAware and TokenAware policies
○ ShardAware Drivers if using ScyllaDB!
+ Make use of Asynchronous APIs
+ Ensure your client is NOT a bottleneck
+ Paging is important: make sure you tune it correctly
Test, test, test…!
Readiness Testing
+ What’s the unreplicated data set size?
+ How many operations per second do I need to achieve?
○ Out of these, what is the read vs. write distribution?
○ What’s the average payload size?
○ What are my latency requirements?
+ How many regions should it replicate to?
+ Do I need indexes or views to satisfy my queries?
+ What is/are the target deployment location(s)?
+ Is the use case growth predictable or unpredictable?
+ What are the data retention requirements?
+ Is the use case storage or CPU bound?
Oh mighty sizing…!
A Sizing Exercise
+ 500k ops/sec with 1KB rows
+ 5TB data set size
+ P99 reads and writes < 10ms
+ Target deployment region: AWS
Simple Math
+ 5TB * RF=3 / 6 nodes = 2.5TB per node
+ 500k ops/sec / 12.5K ops/core = 40 physical cores
+ Result: 6 nodes of i4i.8xlarge
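The simple math above can be reproduced with a short sketch. The inputs are taken from the slide (5TB data set, RF=3, 6 nodes, 500k ops/sec, 12.5K ops/core); the function and field names are hypothetical, and the calculation deliberately ignores headroom for compaction, growth, or node failures.

```python
import math


def size_cluster(dataset_tb, replication_factor, nodes, ops_per_sec, ops_per_core):
    # Storage: every byte is stored replication_factor times across the cluster.
    total_tb = dataset_tb * replication_factor
    tb_per_node = total_tb / nodes
    # Compute: divide the target throughput by the assumed per-core throughput.
    cores = math.ceil(ops_per_sec / ops_per_core)
    cores_per_node = math.ceil(cores / nodes)
    return {
        "total_tb": total_tb,
        "tb_per_node": tb_per_node,
        "cores": cores,
        "cores_per_node": cores_per_node,
    }


plan = size_cluster(
    dataset_tb=5,
    replication_factor=3,
    nodes=6,
    ops_per_sec=500_000,
    ops_per_core=12_500,
)
print(plan)
```

The per-node figures (2.5TB of storage and about 7 physical cores) are the minimums to compare against the chosen instance type; the gap between them and the instance's actual capacity is the headroom.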
Observability Tips and Tricks
ScyllaDB Monitoring Architecture
Keep in touch!
Felipe Cardeneti Mendes
Solutions Architect
ScyllaDB
felipemendes@scylladb.com
Find me on LinkedIn
