The process of optimizing shard-aware drivers for ScyllaDB has involved several initiatives, often requiring a complete rewrite from the ground up. Discover the efforts put into enhancing the performance of ScyllaDB drivers with a focus on Rust, and how the Rust driver's code base will serve as a foundation for drivers in other languages through language bindings. This session emphasizes the performance gains achieved by using the asynchronous Tokio framework as the backbone of a new, high-performance driver, while carefully architecting and optimizing the driver's various components.
3. Presenter
Piotr Grabowski, Software Team Leader, ScyllaDB
+ Software Team Leader at ScyllaDB
+ Responsible for all ScyllaDB drivers, ScyllaDB Kafka
Connectors (ScyllaDB Sink Connector and ScyllaDB CDC
Source Connector)
+ Joined ScyllaDB 2.5 years ago
4. + For data-intensive applications that require high
throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of
modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon
DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source
solutions
The Database for Gamechangers
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
5. + ScyllaDB runs only on Linux
+ We take advantage of many Linux-only APIs:
+ io_uring
+ (previously) epoll/aio
+ Avi Kivity, CTO and cofounder of ScyllaDB, began
the development of the KVM hypervisor in the Linux kernel
+ Great performance and low latencies are our
focus; we frequently look into how ScyllaDB can
work more efficiently with the Linux kernel
The Linux-native Database
6. 400+ Gamechangers Leverage ScyllaDB
Seamless experiences
across content + devices
Digital experiences at
massive scale
Corporate fleet
management
Real-time analytics
2,000,000 SKU e-commerce
management
Video recommendation
management
Threat intelligence service
using JanusGraph
Real time fraud detection
across 6M
transactions/day
Uber scale, mission critical
chat & messaging app
Network security threat
detection
Power ~50M X1 DVRs with
billions of reqs/day
Precision healthcare via
Edison AI
Inventory hub for retail
operations
Property listings and
updates
Unified ML feature store
across the business
Cryptocurrency exchange
app
Geography-based
recommendations
Global operations: Avon,
Body Shop + more
Predictable performance
for on-sale surges
GPS-based exercise
tracking
Serving dynamic live
streams at scale
Powering India's top
social media platform
Personalized
advertising to players
Distribution of game
assets in Unreal Engine
13. Drivers 101
+ Drivers (in this presentation) - libraries that allow sending queries to
ScyllaDB
+ Primary protocol: CQL (Cassandra Query Language) protocol
+ TCP
+ ScyllaDB supports CQL v4
+ Frame-based protocol, supporting multiple streams
+ Supports LZ4 and Snappy compression
+ ScyllaDB drivers support shard awareness:
+ Driver can connect to a specific shard of ScyllaDB
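To make "frame-based protocol, supporting multiple streams" concrete, here is a minimal sketch of the 9-byte CQL v4 frame header (version, flags, stream id, opcode, body length). The stream id is what lets many requests be in flight on one TCP connection. The `FrameHeader` type and its methods are illustrative names, not the driver's API; the body length is treated as unsigned here for simplicity.

```rust
/// Size of a CQL v4 frame header in bytes.
const HEADER_LEN: usize = 9;

/// Opcode of a QUERY request in the native protocol.
const OPCODE_QUERY: u8 = 0x07;

/// CQL v4 frame header (illustrative type, not taken from any driver).
#[derive(Debug, PartialEq)]
struct FrameHeader {
    version: u8, // 0x04 for a v4 request, 0x84 for a v4 response
    flags: u8,   // e.g. 0x01 = body is compressed (LZ4 or Snappy)
    stream: i16, // identifies the request so its response can be matched
    opcode: u8,
    length: u32, // length of the body that follows the header
}

impl FrameHeader {
    /// Serializes the header in network (big-endian) byte order.
    fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.version;
        out[1] = self.flags;
        out[2..4].copy_from_slice(&self.stream.to_be_bytes());
        out[4] = self.opcode;
        out[5..9].copy_from_slice(&self.length.to_be_bytes());
        out
    }

    /// Parses a header; returns None if fewer than 9 bytes are available yet.
    fn decode(buf: &[u8]) -> Option<FrameHeader> {
        if buf.len() < HEADER_LEN {
            return None;
        }
        Some(FrameHeader {
            version: buf[0],
            flags: buf[1],
            stream: i16::from_be_bytes([buf[2], buf[3]]),
            opcode: buf[4],
            length: u32::from_be_bytes([buf[5], buf[6], buf[7], buf[8]]),
        })
    }
}
```

The response to a request sent with stream id 1 carries stream id 1 as well, which is how a driver matches responses to in-flight requests on a shared connection.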
14. Drivers 101 - Role of Drivers
+ The role of drivers:
+ Serialization/deserialization of CQL frames
+ Serialization/deserialization of ScyllaDB types
+ Querying and maintaining metadata about tables/nodes
+ Routing requests to correct nodes (and shards)
+ Sending requests across the network
+ Conveniently constructing and executing queries in your language of choice:
+ gocqlx
+ Java Driver’s Mapper interface
15. Drivers 101 - Performance
+ How can the driver improve performance?
+ Shard awareness: sending the query to the correct shard
+ Partitioners: ScyllaDB's CDC (Change Data Capture) implements a custom
partitioner, which determines the node to send the query to
+ LWT optimization: consistently prefer a single replica when executing an
LWT (lightweight transaction) query to avoid Paxos conflicts
+ Optimizing hot paths in the driver:
+ Serialization/deserialization
+ Routing code
+ Avoiding copies, allocations and locks
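The LWT optimization can be sketched as a difference in replica ordering. Regular queries can rotate the starting replica to spread load, while LWT queries keep the ring order so that every client prefers the same replica and concurrent Paxos rounds are less likely to collide. `replica_plan` and the round-robin counter are hypothetical illustrations under that assumption, not the driver's implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: orders a partition's replicas (given in ring order)
/// for a single request.
fn replica_plan(replicas: &[u32], is_lwt: bool, counter: &AtomicUsize) -> Vec<u32> {
    if replicas.is_empty() || is_lwt {
        // LWT: keep the ring order so all clients target the same replica first.
        return replicas.to_vec();
    }
    // Regular query: rotate the starting replica round-robin to spread load.
    let start = counter.fetch_add(1, Ordering::Relaxed) % replicas.len();
    replicas[start..]
        .iter()
        .chain(&replicas[..start])
        .copied()
        .collect()
}
```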
17. ScyllaDB Rust Driver
+ The idea was born during a hackathon in 2020
+ Over the last 3 years we continued the development
+ Uses Tokio framework
+ The driver is now feature-complete, supporting many advanced features:
+ Shard awareness
+ Asynchronous interface with support for large concurrency
+ Compression
+ All CQL types
+ Speculative execution
+ TLS support
24. ScyllaDB Rust Driver - Runtime
+ Async Rust is based on a rather unique future/promise model:
+ Calling a function that returns a future does not automatically spawn an
asynchronous task, as it does in many other languages
+ Instead, futures are lazy and need a runtime to execute them
+ Which runtime to choose?
+ Tokio (http://tokio.rs) is the de facto standard runtime for async Rust
projects.
We chose it for its rich set of APIs, its popularity, and its very active
community of developers and contributors.
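The claim that futures do nothing until a runtime drives them can be shown with only the standard library. The toy `block_on` below is a minimal single-future executor (hypothetical, and far simpler than Tokio's multi-threaded scheduler with I/O and timer drivers). `fetch_value` is an illustrative async function that records when its body actually starts running.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor: polls one future to completion on the current thread,
/// parking the thread whenever the future returns `Pending`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // spurious wakeups just cause a re-poll
        }
    }
}

/// Illustrative async function: its body runs only when the future is polled.
async fn fetch_value(started: &AtomicBool) -> i32 {
    started.store(true, Ordering::SeqCst);
    42
}
```

Calling `fetch_value(&flag)` only builds a state machine; the flag stays `false` until `block_on` (or, in real code, the Tokio runtime, e.g. via `#[tokio::main]`) polls it.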
25. ScyllaDB Rust Driver - API Design
+ A central component of our driver is a session, established once and then
used to communicate with Scylla. It has many customizable parameters,
but most of them have sensible defaults.
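use scylla::{IntoTypedRows, Session, SessionBuilder};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uri = "127.0.0.1:9042";
    // Establish the session once and reuse it for all queries.
    let session: Session = SessionBuilder::new().known_node(uri).build().await?;
    // `rows` is None for statements that do not return a result set.
    if let Some(rows) = session.query("SELECT a, b, c FROM ks.t", &[]).await?.rows {
        // Convert untyped rows into tuples matching the selected columns.
        for row in rows.into_typed::<(i32, i32, String)>() {
            let (a, b, c) = row?;
            println!("a, b, c: {}, {}, {}", a, b, c);
        }
    }
    Ok(())
}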
26. ScyllaDB Rust Driver - O(N²) in Tokio?
+ Issue raised by the author of latte, a benchmarking tool for ScyllaDB and Cassandra
+ The driver had problems scaling with a high concurrency of requests
+ We identified the root cause in the implementation of FuturesUnordered (from the
futures crate), a utility that gathers many futures and waits for them
+ Due to cooperative scheduling in Tokio, FuturesUnordered could end up
iterating over all of its futures each time
it was polled
+ A fix was merged upstream to limit the number of
futures iterated over in each poll
27. ScyllaDB Rust Driver - Connection Management
The ability to customize the number of connections is critical for performance. Our driver
uses a default of 1 connection per shard, but it can be configured to establish a
fixed number of connections instead, either per node or per shard.
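The two sizing modes can be modeled as an enum, with a function that decides which shard each connection to a node should target. `PoolSize` here is a local stand-in written for this sketch (it is not imported from the driver), and the round-robin spreading of per-node connections across shards is an assumption.

```rust
use std::num::NonZeroUsize;

/// Local stand-in for the two pool-sizing options described above.
#[derive(Clone, Copy, Debug)]
enum PoolSize {
    PerShard(NonZeroUsize),
    PerHost(NonZeroUsize),
}

impl Default for PoolSize {
    /// Mirrors the default described above: one connection per shard.
    fn default() -> Self {
        PoolSize::PerShard(NonZeroUsize::new(1).unwrap())
    }
}

/// Returns, for every connection to open to one node, the shard it targets.
fn connection_targets(pool: PoolSize, nr_shards: NonZeroUsize) -> Vec<usize> {
    let shards = nr_shards.get();
    match pool {
        // n connections to every shard: n * shards connections in total.
        PoolSize::PerShard(n) => (0..shards)
            .flat_map(|shard| std::iter::repeat(shard).take(n.get()))
            .collect(),
        // n connections per node, spread round-robin across shards (assumption).
        PoolSize::PerHost(n) => (0..n.get()).map(|i| i % shards).collect(),
    }
}
```

With the default and a 4-shard node, this yields one connection per shard, `[0, 1, 2, 3]`. Per-node sizing instead caps the total regardless of the node's core count.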
28. ScyllaDB Rust Driver - Shard Awareness
ScyllaDB takes it even further: drivers can connect directly to the core (shard) that
owns a particular partition, which results in lower latency. Shard awareness has been built
into the ScyllaDB Rust Driver from the start.
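A sketch of the two calculations behind shard awareness, assuming the partition's token (Murmur3 over the serialized key) is already known. `shard_of` follows the biased-token formula used by ScyllaDB's shard-aware drivers, where `msb_ignore` is the "ignore MSB" value advertised by the node. `source_port_for_shard` reflects how ScyllaDB's shard-aware port (19042 by default) routes a connection to shard `source_port % nr_shards`. Both function names are illustrative.

```rust
/// Maps a partition token to the shard (CPU core) that owns it.
fn shard_of(token: i64, nr_shards: u32, msb_ignore: u8) -> u32 {
    // Shift the signed token range [i64::MIN, i64::MAX] onto [0, u64::MAX].
    let mut biased = (token as u64).wrapping_add(1u64 << 63);
    biased <<= msb_ignore;
    // Scale into [0, nr_shards) with a 128-bit multiply instead of a modulo.
    (((biased as u128) * (nr_shards as u128)) >> 64) as u32
}

/// Picks the lowest local port >= `min_port` that the shard-aware port
/// routes to `shard`; None if no such port fits in 16 bits.
fn source_port_for_shard(shard: u32, nr_shards: u32, min_port: u16) -> Option<u16> {
    let min = min_port as u32;
    let mut port = min - min % nr_shards + shard;
    if port < min {
        port += nr_shards;
    }
    if port <= u16::MAX as u32 {
        Some(port as u16)
    } else {
        None
    }
}
```

For example, with 4 shards and `msb_ignore = 0`, token 0 maps to shard 2. To reach shard 3 on an 8-shard node, a client binding from port 49152 upward would use port 49155.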
30. ScyllaDB Rust Driver - Load Balancing
SELECT * FROM table
WHERE partition_key = 'R1250GS'
hash('R1250GS') → token → replica nodes
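The token-to-replicas step can be sketched with a sorted token ring. The owner of a token is the first node whose ring token is greater than or equal to it (wrapping around), followed by the next distinct nodes clockwise until the replication factor is reached. This models SimpleStrategy-style placement only. The token is taken as an input instead of computing Murmur3, and `replicas_for` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Returns up to `rf` distinct replica nodes for `token` on a ring that maps
/// ring tokens to node names (a node may own several ring tokens, i.e. vnodes).
fn replicas_for<'a>(ring: &BTreeMap<i64, &'a str>, token: i64, rf: usize) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = Vec::with_capacity(rf);
    // Walk clockwise from `token` to the end of the ring, then wrap to the start.
    for (_, &node) in ring.range(token..).chain(ring.range(..token)) {
        if out.len() == rf {
            break;
        }
        if !out.contains(&node) {
            out.push(node);
        }
    }
    out
}
```

On a ring `{-100: A, 0: B, 100: C}` with RF 2, token 50 is owned by C, and the second replica is A (after wrapping around).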
31. ScyllaDB Rust Driver - Load Balancing Refactor
+ Main goal: reduce the number of allocations and atomic operations while
building the query plan, especially on the happy path:
+ The plan function was split into pick() and fallback() methods. This made it possible to
better optimize the most common case, where only one node from the load
balancing plan is needed
+ Precomputation of replica sets:
+ A struct was introduced that precomputes replica lists for given replication strategies and
provides O(1) access to the desired replica slices
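Both ideas can be sketched together. A replica table is precomputed once per topology change, so a request only does a binary search for its ring position followed by an O(1) borrowed slice lookup. A policy then exposes an allocation-free `pick()` for the happy path and a lazily built `fallback()` that is only created when the first node fails. `PrecomputedReplicas`, `LoadBalancing`, `TokenAware` and `execute` are hypothetical simplifications (single replication factor, SimpleStrategy-style placement), not the driver's actual types.

```rust
type NodeId = u32;

/// Replica lists for one replication factor, built once per topology change.
struct PrecomputedReplicas {
    ring_tokens: Vec<i64>,      // sorted ring tokens
    replicas: Vec<Vec<NodeId>>, // replicas[i] = replica list owned by ring_tokens[i]
}

impl PrecomputedReplicas {
    fn new(ring: &[(i64, NodeId)], rf: usize) -> Self {
        let mut ring = ring.to_vec();
        ring.sort_by_key(|&(token, _)| token);
        let replicas = (0..ring.len())
            .map(|start| {
                let mut set: Vec<NodeId> = Vec::with_capacity(rf);
                for k in 0..ring.len() {
                    if set.len() == rf {
                        break;
                    }
                    let node = ring[(start + k) % ring.len()].1;
                    if !set.contains(&node) {
                        set.push(node);
                    }
                }
                set
            })
            .collect();
        let ring_tokens = ring.iter().map(|&(token, _)| token).collect();
        PrecomputedReplicas { ring_tokens, replicas }
    }

    /// Binary search for the owning ring position, then a borrowed slice (no allocation).
    fn replicas_for(&self, token: i64) -> &[NodeId] {
        if self.ring_tokens.is_empty() {
            return &[];
        }
        let idx = self.ring_tokens.partition_point(|&t| t < token);
        &self.replicas[if idx == self.ring_tokens.len() { 0 } else { idx }]
    }
}

/// Illustrative policy trait with the pick()/fallback() split.
trait LoadBalancing {
    /// Happy path: a single target node, no allocation.
    fn pick(&self, token: i64) -> Option<NodeId>;
    /// The rest of the plan, built only if the picked node fails.
    fn fallback<'a>(&'a self, token: i64) -> Box<dyn Iterator<Item = NodeId> + 'a>;
}

struct TokenAware {
    replicas: PrecomputedReplicas,
    all_nodes: Vec<NodeId>,
}

impl LoadBalancing for TokenAware {
    fn pick(&self, token: i64) -> Option<NodeId> {
        self.replicas.replicas_for(token).first().copied()
    }

    fn fallback<'a>(&'a self, token: i64) -> Box<dyn Iterator<Item = NodeId> + 'a> {
        let replicas = self.replicas.replicas_for(token);
        // Remaining replicas first, then every non-replica node.
        let others = self.all_nodes.iter().copied().filter(move |n| !replicas.contains(n));
        Box::new(replicas.iter().copied().skip(1).chain(others))
    }
}

/// Tries `pick()` first; the boxed fallback plan is only created on failure.
fn execute(policy: &dyn LoadBalancing, token: i64, mut try_node: impl FnMut(NodeId) -> bool) -> Option<NodeId> {
    if let Some(node) = policy.pick(token) {
        if try_node(node) {
            return Some(node);
        }
    }
    policy.fallback(token).find(|&node| try_node(node))
}
```

In this sketch, the only per-request allocation is the `Box` in `fallback()`, and it is only paid when the first choice fails.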
37. ScyllaDB Rust Driver - Other Efforts
+ Rack-aware load balancing
+ Reduce the cost of cross-rack traffic by avoiding unnecessary queries to ScyllaDB
nodes in other racks (which may correspond, for example, to AWS Availability Zones)
+ Reduce latency by querying the nearest rack
+ Iterator-based deserialization
+ The current implementation deserializes row data into the equivalent of
Vec<Vec<Option<CqlValue>>>
+ Skip materializing all rows into a vector; deserialize on the fly
+ Make great use of Rust lifetimes to guarantee memory safety
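Rack-aware ordering can be sketched as a lazy iterator over a partition's replicas. It yields nodes in the local rack first, then other racks in the local datacenter, then remote datacenters. Because it chains filtered iterators instead of sorting into a new vector, it does not allocate. The `Node` struct and `rack_aware_order` are hypothetical names for this illustration.

```rust
#[derive(Debug)]
struct Node {
    id: u32,
    dc: &'static str,
    rack: &'static str,
}

/// Orders replicas: same rack, then same DC (other racks), then remote DCs.
fn rack_aware_order<'a>(
    replicas: &'a [Node],
    local_dc: &'a str,
    local_rack: &'a str,
) -> impl Iterator<Item = &'a Node> + 'a {
    let same_rack = replicas
        .iter()
        .filter(move |n| n.dc == local_dc && n.rack == local_rack);
    let same_dc = replicas
        .iter()
        .filter(move |n| n.dc == local_dc && n.rack != local_rack);
    let remote = replicas.iter().filter(move |n| n.dc != local_dc);
    same_rack.chain(same_dc).chain(remote)
}
```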
38. ScyllaDB Rust Driver - Iterator-based
Deserialization
+ Reworked Deserialization API
+ Solves performance issues and improves type safety
+ Old API marked as "Legacy" for backward compatibility
+ Problems with Current API
+ Inefficient representation with rows and vecs
+ Incomplete information for FromCqlVal and FromRow
+ New API with DeserializeCql and DeserializeRow
+ Allows on-demand deserialization, reducing allocations
+ More comprehensive type checking and improved deserialization
+ Migration from Legacy API
+ Mechanical changes for most use cases
+ Legacy and new API can be used simultaneously
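The on-demand idea can be sketched with the CQL `[bytes]` encoding (a big-endian 32-bit length followed by that many bytes, where a negative length means NULL). Rows are parsed one at a time as the iterator advances, and text columns borrow `&str` directly from the frame buffer, with the lifetime `'frame` guaranteeing that they cannot outlive it. `take_value`, `FromRawRow` and `TypedRows` are hypothetical names, not the driver's `DeserializeRow`/`DeserializeCql` API. Metadata parsing and type checking are omitted.

```rust
use std::marker::PhantomData;

/// Reads one CQL `[bytes]` value from the front of `buf` and advances `buf`.
/// The returned slice borrows from the frame; nothing is copied.
fn take_value<'frame>(buf: &mut &'frame [u8]) -> Result<Option<&'frame [u8]>, String> {
    let data: &'frame [u8] = *buf;
    if data.len() < 4 {
        return Err("truncated value length".to_string());
    }
    let len = i32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let data = &data[4..];
    if len < 0 {
        *buf = data;
        return Ok(None); // negative length encodes NULL
    }
    let len = len as usize;
    if data.len() < len {
        return Err("truncated value".to_string());
    }
    let (value, rest) = data.split_at(len);
    *buf = rest;
    Ok(Some(value))
}

/// Hypothetical trait: deserialize one row directly from the frame bytes.
trait FromRawRow<'frame>: Sized {
    fn parse(buf: &mut &'frame [u8]) -> Result<Self, String>;
}

/// An (int, text) row; the text column borrows from the frame.
impl<'frame> FromRawRow<'frame> for (i32, &'frame str) {
    fn parse(buf: &mut &'frame [u8]) -> Result<Self, String> {
        let a = take_value(buf)?.ok_or("unexpected NULL in int column")?;
        if a.len() != 4 {
            return Err("int column must be 4 bytes".to_string());
        }
        let b = take_value(buf)?.ok_or("unexpected NULL in text column")?;
        let b = std::str::from_utf8(b).map_err(|e| e.to_string())?;
        Ok((i32::from_be_bytes([a[0], a[1], a[2], a[3]]), b))
    }
}

/// Lazily deserializes `remaining` rows, one per call to `next()`.
struct TypedRows<'frame, R> {
    buf: &'frame [u8],
    remaining: usize,
    _row: PhantomData<R>,
}

fn typed_rows<'frame, R: FromRawRow<'frame>>(buf: &'frame [u8], rows_count: usize) -> TypedRows<'frame, R> {
    TypedRows { buf, remaining: rows_count, _row: PhantomData }
}

impl<'frame, R: FromRawRow<'frame>> Iterator for TypedRows<'frame, R> {
    type Item = Result<R, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let row = R::parse(&mut self.buf);
        if row.is_err() {
            self.remaining = 0; // stop after the first malformed row
        }
        Some(row)
    }
}
```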
39. ScyllaDB Rust Driver - Removing All
Allocations?
+ A community-started project led by Joseph Perez (@wyfo), written from
scratch to achieve zero-copy deserialization and zero (or one) allocations per
request
+ Core ideas:
+ Query plan caching
+ Zero/one allocation per request
+ We are looking into incorporating the ideas shown in this project into
ScyllaDB Rust Driver
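One common way to get down to zero allocations per request is to reuse a serialization buffer. `Vec::clear` keeps the allocated capacity, so once a connection's buffer has grown to fit its typical requests, later requests are written without allocating. This is a hypothetical sketch of that general technique, not code from the project above. It writes a CQL v4 QUERY frame with consistency ONE and no bound values.

```rust
/// Hypothetical per-connection writer that reuses one buffer across requests.
struct RequestWriter {
    buf: Vec<u8>,
}

impl RequestWriter {
    fn new() -> Self {
        RequestWriter { buf: Vec::new() }
    }

    /// Serializes a CQL v4 QUERY frame: 9-byte header, the query as a
    /// `[long string]`, consistency ONE (0x0001) and an empty flags byte.
    fn write_query(&mut self, stream: i16, query: &str) -> &[u8] {
        self.buf.clear(); // drops the old contents but keeps the allocated capacity
        let body_len = 4 + query.len() + 2 + 1;
        self.buf.extend_from_slice(&[0x04, 0x00]); // version (v4 request), frame flags
        self.buf.extend_from_slice(&stream.to_be_bytes());
        self.buf.push(0x07); // opcode: QUERY
        self.buf.extend_from_slice(&(body_len as u32).to_be_bytes());
        self.buf.extend_from_slice(&(query.len() as u32).to_be_bytes());
        self.buf.extend_from_slice(query.as_bytes());
        self.buf.extend_from_slice(&0x0001u16.to_be_bytes()); // consistency ONE
        self.buf.push(0x00); // query flags: no values, no paging options
        &self.buf
    }
}
```

After the first (largest) request, subsequent smaller requests reuse the same allocation. Caching query plans, the other idea listed above, targets the remaining per-request work on the routing side.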
40. ScyllaDB Rust Driver - Profiling tools
The Rust ecosystem makes it easy to look for performance issues in your
project. One such tool is cargo flamegraph, a utility for creating
flamegraphs, which can be examined to see whether any function calls take up too
much CPU time.
42. ScyllaDB Rust Driver - Profiling tools
For projects based on Tokio, tokio-console can be used to inspect
running asynchronous tasks in real time, browse the used resources, and so
on.
Ref: https://tokio.rs/blog/2021-12-announcing-tokio-console
44. Bindings to ScyllaDB Rust Driver
+ When benchmarking the ScyllaDB Rust Driver against other drivers, we
found it was the most performant driver, outperforming even the C++ driver
+ Why not develop a way to use ScyllaDB Rust Driver from C++ code?
+ Benefits of a unified core:
+ Higher performance
+ Easier maintenance
+ Fewer bugs
45. Bindings to ScyllaDB Rust Driver - C/C++
+ We started development for the C/C++ language
+ C++ bindings to the Rust driver; the same API as the original
C++ driver
+ Drop-in replacement (just replacing .so file)
+ The resulting project has an order of magnitude fewer lines of code
+ Better stability, fewer problems compared to the original C++
driver
47. Q&A
ScyllaDB University
Free online learning
scylladb.com/university
scylladb.com/events
Build Low-Latency
Rust Applications
on ScyllaDB
June 21 2023
October 18 + 19, 2023
p99conf.io
48. Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
Editor's Notes
Before we begin, we are pushing a quick poll question.
Where are you in your NoSQL adoption?
I currently use ScyllaDB
I currently use another NoSQL database
I am currently evaluating NoSQL
I am interested in learning more about ScyllaDB
None of the above
Ok, thanks for those responses. Let’s get started.
For those of you who are not familiar with ScyllaDB yet, it is the database behind gamechangers - organizations whose success depends upon delivering engaging experiences with impressive speed. Discord, Disney+ Hotstar, Palo Alto Networks, and ShareChat are some examples of the most extreme scale – and ScyllaDB is used by many smaller rapidly-growing organizations as well.
Created by the founders of the KVM hypervisor, ScyllaDB was built with a close-to-the-metal design that squeezes every possible ounce of performance out of modern infrastructure.
This translates to predictable low latency even at high throughputs. And ScyllaDB is scalable to terabytes or petabytes of storage, and capable of millions of IOPS at single-digit millisecond latencies on surprisingly small and efficient clusters.
With such consistent innovation, adoption of our database technology has grown to over 400 key players worldwide…
“Many of you will recognize some of the companies among the selection pictured here, such as Starbucks who leverage ScyllaDB for inventory management, Zillow for real-time property listing and updates, and Comcast Xfinity who power all DVR scheduling with ScyllaDB.”
As you can see, ScyllaDB is used across many different industries and for entirely different types of use cases. Chat applications, IoT, social networking, e-commerce, fraud detection, and security are some of the examples pictured on this slide. More often than not, your company has a use case that is a perfect fit for ScyllaDB, and you may not know it yet!
If you are interested in learning more about how we can help you, feel free to engage with us!
To summarize, if you care about having low latencies while having high throughput for your application, we are certain that ScyllaDB is a good fit for you.