There are hundreds of possible databases you can choose from today. Yet if you draw up a short list of critical criteria related to performance and scalability for your use case, the field of choices narrows and your evaluation decision becomes much easier.
In this session, we’ll explore 5 essential factors to consider when selecting a high-performance, low-latency database, including options, opportunities, and tradeoffs related to software architecture, hardware utilization, interoperability, RASP, and deployment.
5 Factors When Selecting a High Performance, Low Latency Database
1. 5 Factors When Selecting a High Performance, Low Latency Database
Peter Corless — Director of Technical Advocacy, ScyllaDB
Arthur Pesa — Solutions Architect, ScyllaDB
2. Brought to you by
VIRTUAL EVENT | OCTOBER 19 + 20
All Things Performance
The event for developers who care about P99
percentiles and high-performance, low-latency
applications.
Register at p99conf.io
5. Introductions
Peter Corless, Director of Technical Advocacy, ScyllaDB
+ Editor of and frequent contributor to the ScyllaDB blog
+ Program chair for ScyllaDB Summit and P99 CONF
+ Host of ScyllaDB Masterclass series
+ @PeterCorless on Twitter
Arthur Pesa, Solutions Architect, ScyllaDB
+ Helps customers successfully implement databases
+ Formerly at Nike, DataStax, Columbia Sportswear
6. + Five Factors — What’s most important for making a database decision for your
organization?
+ ScyllaDB — How our big, fast NoSQL database holds up against these
considerations
What We’ll Talk About
7. + “SQL vs. NoSQL” — If you need a table JOIN, you need a JOIN; if you need a
wide column, you need a wide column
+ 394 other database systems — Feel free to use these criteria to compare other
databases listed on DB-engines.com. Your Mileage May Vary (YMMV)
What We Won’t Talk About
9. + ScyllaDB is the database for data-intensive apps that require high performance and low
latency
+ ScyllaDB is a wide-column NoSQL database compatible with Apache Cassandra CQL &
Amazon DynamoDB APIs — only much faster
+ ScyllaDB, the company, started in 2016
+ ScyllaDB, the database, is available as Open Source, Enterprise and Cloud
ScyllaDB Intro
10. + Infoworld 2020 Technology of the Year!
+ Founded by designers of KVM Hypervisor
The Database Built for Gamechangers
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
+ Resolves challenges of legacy NoSQL databases
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ DBaaS/Cloud, Enterprise and Open Source solutions
+ Proven globally at scale
11. +400 Gamechangers Leverage ScyllaDB
+ Seamless experiences across content + devices
+ Fast computation of flight pricing
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Real-time location tracking for friends/family
+ Video recommendation management
+ IoT for industrial machines
+ Synchronize browser properties for millions
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber-scale, mission-critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Distributed storage for distributed ledger tech
+ Global operations: Avon, Body Shop + more
+ Predictable performance for on-sale surges
+ GPS-based exercise tracking
13. 1. Software Architecture — Does the database use the most efficient data structures, flexible
data models, and rich query languages to support your workloads and query patterns?
2. Hardware Utilization — Can it take full advantage of modern hardware platforms? Or will
you be leaving a significant amount of CPU cycles underutilized?
3. Interoperability — How easy is it to integrate into your development environment? Does it
support your programming languages, frameworks and projects? Was it designed to
integrate into your microservices and event streaming architecture?
4. RASP — Does it have the necessary qualities of Reliability, Availability, Scalability,
Serviceability and, of course, Performance?
5. Deployment — Does this database only work in a limited environment, such as only
on-premises, or only in a single datacenter or a single cloud vendor? Or does it lend itself to
being deployed exactly where and how you want around the globe?
5 Factors When Selecting a High
Performance, Low Latency Database
14. Does the database use the most efficient data structures, flexible data models, and
rich query languages to support your workloads and query patterns?
+ Workload — Transactional or Analytical? Hybrid?
+ Data Model — Key-Value, Wide Column, Column Store, Document, Graph, RDBMS, or other?
+ Query Language — SQL, SQL-like (CQL), JSON, or other?
+ Transactions/Operations/CAP — Which is more important, Consistency or Availability?
+ Data Distribution — Multi-datacenter or local clustering? Cross-cluster updates?
Software Architecture
15. Can it take full advantage of modern hardware platforms? Or will you be leaving a
significant amount of CPU cycles underutilized?
+ CPU utilization / efficiency — Process distribution; single- or multi-threading
+ RAM utilization / efficiency — read path and write path; caching; [JVM, heap tuning, etc.]
+ Storage utilization / efficiency — storage format, mutability, concurrency, tiering
+ Network utilization / efficiency — client/server vs. intra-cluster communications
Hardware Utilization
16. How easy is it to integrate into your development environment? Does it support your
programming languages, frameworks and projects? Was it designed to integrate into
your microservices and event streaming architecture?
+ Programming Languages/Frameworks — Clients, Libraries, SDKs, ORMs, Packages
+ Event Streaming/Message Queuing — Sink and/or Source, Kafka, Pulsar, RabbitMQ
+ APIs — RESTful, GraphQL, microservices
+ Other — e.g., Pluggable storage layer [ex: JanusGraph]
Interoperability
17. Does it have the necessary qualities of Reliability, Availability, Scalability, Serviceability
and, of course, Performance?
+ Reliability — Durability, Survivability, Guardrails
+ Availability — “Five Nines”
+ Scalability — Capacity, Elasticity
+ Serviceability — Manageability, Observability, Usability
+ Performance — Throughput, latency
RASP
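To make two of the RASP terms concrete, here is a minimal Python sketch of the arithmetic behind an availability target such as “Five Nines” and behind a P99 latency figure. The function names and the sample latencies are hypothetical, chosen only for illustration; the percentile uses the simple nearest-rank method, which is one of several common definitions.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def max_downtime_seconds(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def percentile_nearest_rank(samples, percent: int):
    """Nearest-rank percentile: the smallest sample such that at least
    `percent` percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (percent * len(ordered) + 99) // 100  # integer ceiling division
    return ordered[max(rank, 1) - 1]


# "Five Nines" leaves a budget of roughly 315 seconds of downtime per year.
five_nines_budget = max_downtime_seconds(99.999)

# Hypothetical latency samples: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
p99_ms = percentile_nearest_rank(latencies_ms, 99)
```

The point of tracking P99 rather than the average is that it describes the experience of the slowest 1% of requests, which an average can hide.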
18. Does this database only work in a limited environment, such as only on-premises, or
only in a single datacenter or a single cloud vendor? Or does it lend itself to being
deployed exactly where and how you want around the globe?
+ Cloud Vendor Lock-in?
+ On-Prem Deployable?
+ Kubernetes (k8s)
+ Multi-Cloud
Deployment
20. + Architected from the ground up based on Seastar
+ Seastar is an advanced, open-source C++ framework for high-performance server
applications on modern hardware.
+ Seastar uses a shared-nothing model that shards all requests onto individual cores.
+ Seastar is designed for sharing information between CPU cores without time-consuming
locking.
+ Seastar is the differentiator that allows ScyllaDB to run natively on the hardware rather than
inside a JVM
1. ScyllaDB Architecture
21. + ScyllaDB supports the Apache Cassandra CQL query language
+ If you're a Cassandra user today, you will have the same experience when using CQL
in both cqlsh and your APIs
+ ScyllaDB also supports a DynamoDB-compatible API, called “Alternator”
+ Also supports DynamoDB Streams (“Alternator Streams”)
Cassandra CQL & DynamoDB Queries
22. + Wide Column NoSQL
+ “Key-Key-Value” row store (Partition Key, Clustering Key)
+ Highly optimized for OLTP workloads.
+ Not to be confused with “columnar stores” like ClickHouse, Druid or Pinot (OLAP-oriented)
+ Designed for extremely fast data access
+ Data is ordered within each partition by Clustering Key(s)
+ Data retrieval speeds measured in single-digit milliseconds
+ Use-case-based data modeling: a single table per query
+ ScyllaDB employs Indexing, Secondary Indexing and Materialized Views that are far
superior in performance to those of Cassandra
Data Model
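The “Key-Key-Value” idea above can be sketched in a few lines of Python. This is a toy in-memory model, not ScyllaDB's storage engine: the class name `WideColumnTable`, its methods, and the sensor data are all hypothetical. It only illustrates that the partition key selects a partition and the clustering key keeps rows sorted inside it, so a range read is a contiguous slice.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model of a wide-column table: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # Each partition is a list of (clustering_key, values) tuples kept in sorted order.
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, values):
        rows = self._partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, values)  # same key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, values))  # new key: keep sort order

    def slice(self, partition_key, start, end):
        """Rows with start <= clustering key < end, in clustering order."""
        rows = self._partitions.get(partition_key, [])
        keys = [ck for ck, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]


table = WideColumnTable()
table.insert("sensor-1", 30, {"temp": 21.5})
table.insert("sensor-1", 10, {"temp": 20.0})
table.insert("sensor-1", 20, {"temp": 20.7})
table.insert("sensor-2", 10, {"temp": 18.2})

# Range read inside one partition; rows come back ordered by clustering key.
readings = table.slice("sensor-1", 10, 30)
```

Designing one table per query, as the slide suggests, amounts to choosing the partition and clustering keys so that each query becomes a single slice like this one.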
24. + Shard-per-core — each vCPU assigned its own data partitions
+ NUMA-aware — each vCPU also assigned its own RAM
+ Single-threaded per vCPU
+ Custom CPU and IO schedulers
Shard-per-Core Software Architecture
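As a rough illustration of the shard-per-core idea, the sketch below routes each partition key to exactly one shard, so a shard never has to lock data owned by another. It is a simplification under stated assumptions: the hash (`blake2b`) and the modulo mapping are stand-ins chosen for this example, not ScyllaDB's actual token or shard-selection algorithm, and `shard_for` and `Node` are hypothetical names.

```python
import hashlib


def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to one shard, deterministically.

    Assumption: a generic 64-bit hash plus modulo, used only for illustration.
    """
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    token = int.from_bytes(digest, "big")
    return token % num_shards


class Node:
    """Toy node with one private key-value store per shard (one shard per core)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def write(self, key: str, value) -> None:
        # Only the owning shard's store is touched; no cross-shard locking is needed.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def read(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


node = Node(num_shards=8)
for i in range(1000):
    node.write(f"user-{i}", i)

rows_per_shard = [len(store) for store in node.shards]
```

Because the mapping is deterministic, a client that knows it can send a request straight to the owning shard, which is the idea behind shard-aware drivers.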
25. + Linear scalability for the latest cloud computing hardware
+ I4i.metal: 128 vCPUs, 1 TB RAM, 30 TB NVMe SSD per node
+ I3en.metal: up to 60 TB NVMe SSD per node
+ iotune and Diskplorer
+ Optimizing NVMe SSD
+ CPU + IO Schedulers
+ Best utilization of HW
2. Maximize Hardware Utilization
27. CQL
+ ScyllaDB uses a shard-per-core architecture and has its own shard-aware drivers
+ These make better use of ScyllaDB's built-in efficiencies
+ Shard-aware drivers are available in Rust, Python, Go, and C++
+ ScyllaDB supports drivers that use the standard Apache Cassandra native transport protocol
+ Drivers exist for almost every programming language in use today.
DynamoDB API
+ ScyllaDB has its own DynamoDB-compatible API, called Alternator, that lets you point your
current DynamoDB-based application directly at ScyllaDB Alternator
+ ScyllaDB works with any of the AWS SDKs for DynamoDB without modification
Programming Languages / Drivers
28. + Kafka Sink Connector — Shard-Aware, optimized for ScyllaDB
+ Kafka Source Connector — based on Debezium
Event Streaming
29. 4. RASP
+ Reliability
+ Partition tolerant: you can lose a node and still handle traffic.
+ “I just want the thing to run without any babysitting at all.”
+ Availability
+ Always-on architecture, tunable consistency
+ Scalability
+ Add more nodes when needed
+ Vertical as well as horizontal scalability — any number of vCPUs and any number of TBs of SSD
+ Serviceability
+ ScyllaDB Monitoring Stack — real-time observability makes identifying problems simple
+ ScyllaDB Manager — for backups and repairs
+ Performance
+ Millions of ops per second at single-digit ms P99 latencies
+ Allows full use of available resources: CPU, memory and storage
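“Tunable consistency” in Cassandra-style databases usually comes down to how many replicas must acknowledge a write and answer a read. The sketch below shows the standard quorum arithmetic; the function names are hypothetical, and it models only the counting rule, not any driver or server behavior.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must share at least one replica with every write set.

    This holds whenever write_acks + read_acks > replication_factor.
    """
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)  # 2 of 3 replicas

# Quorum writes plus quorum reads overlap and still tolerate one node being down.
strong = reads_overlap_writes(rf, write_acks=q, read_acks=q)

# Single-replica writes and reads favor availability and latency over consistency.
weak = reads_overlap_writes(rf, write_acks=1, read_acks=1)
```

With a replication factor of 3, quorum operations keep working after losing one node, which matches the "lose a node and still handle traffic" point above while keeping reads consistent with acknowledged writes.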
30. + ScyllaDB Open Source
+ ScyllaDB Enterprise
+ ScyllaDB Operator for k8s
+ ScyllaDB Cloud
+ On premises or any cloud
5. Deployment
31. Poll
How much data do you have under management in your
transactional database?
32. Q&A
WANT TO KEEP LEARNING?
Join ScyllaDB University for Free:
university.scylladb.com
SCYLLADB VIRTUAL WORKSHOP
Getting Started with ScyllaDB
29 September, 2022, 12PM GMT | 8 AM ET | 5:30 PM IST
33. Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/