Discord initially used MongoDB and then Cassandra to store messages, but ran into scalability and performance problems with both. To get the most out of ScyllaDB, they built a data service library in Rust and a "Superdisk" storage topology that combines local NVMe SSDs with persistent disks for fast, durable storage. Discord then migrated trillions of messages from Cassandra to ScyllaDB in just 9 days, ending up with lower disk usage, more consistent latencies, and a quieter system. Together, the Rust library, Superdisk, and ScyllaDB let Discord reliably store trillions of messages at scale.
2. Progress Over Perfection
Day 1: MongoDB/TokuMX (a temporary choice)
Ultimately, we wanted a database that was:
● Scalable
● Fault Tolerant
● Low Maintenance
● Not a blob store (serialization is expensive!)
5. What Even Is A Message?
CREATE TABLE messages (
    channel_id bigint,  -- partition key: the channel the message belongs to
    bucket int,         -- partition key: a coarse time window, so no single partition grows unbounded
    message_id bigint,  -- clustering key: a time-ordered Snowflake ID
    author_id bigint,
    content text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);  -- newest messages first
6. Cassandra Problems
Size of dataset and access patterns cause trouble
● Hot partitions tank latency
● Compactions run behind
● 2 expensive 2 repair
● Garbage collection tax
If messages are slow, Discord is slow!
7. 2020 Architecture
● Python API monolith
● Talked directly to databases
● No more Cassandra… except for messages
● Figuring out ScyllaDB - how can we get the most performance?
9. Built in Rust!
● Experience
● It’s fast!
● It’s fun!
● “Fearless concurrency”
● Strong Cassandra/ScyllaDB support
● Leverage Tokio ecosystem for async I/O
● Lets us say we “rewrote it in Rust”
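As a rough sketch of what a read path through such a Rust data service might look like, here is a minimal Tokio program using the community scylla driver against the messages schema above. The node address, function name, and error handling are illustrative, and Session::query as used here comes from pre-1.0 releases of the driver; this is not Discord's actual code.

use scylla::{IntoTypedRows, Session, SessionBuilder};

// Fetch the newest messages in one (channel_id, bucket) partition.
async fn recent_messages(
    session: &Session,
    channel_id: i64,
    bucket: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    let result = session
        .query(
            "SELECT message_id, author_id, content FROM messages \
             WHERE channel_id = ? AND bucket = ? LIMIT 50",
            (channel_id, bucket),
        )
        .await?;
    // CLUSTERING ORDER BY (message_id DESC) means rows already arrive newest-first.
    for row in result.rows.unwrap_or_default().into_typed::<(i64, i64, String)>() {
        let (message_id, author_id, content) = row?;
        println!("{message_id} {author_id}: {content}");
    }
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let session = SessionBuilder::new()
        .known_node("127.0.0.1:9042") // illustrative address
        .build()
        .await?;
    recent_messages(&session, 42, 0).await
}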
13. Better, but it’s still Cassandra
● It helps!
● Still having issues (just not quite as frequently)
● Concurrently, we’re ironing out ScyllaDB performance
We want the best ScyllaDB we can have so that Discord thrives
15. ScyllaDB Disk Options in GCP
Local NVMe SSDs
● ScyllaDB’s preference
● Fast
● Low reliability = increased risk of quorum loss and downtime
● Potential data loss on underlying host error
Persistent Disks
● Discord’s preference
● Point-in-time snapshots
● Slow
● Cascading latency if used in production
22. We turned it on…
● Reads still occurring at the speed of PD writes
● ScyllaDB uses one I/O bandwidth channel by default, shared by reads and writes
● Worked with ScyllaDB to get duplex I/O: separate channels for reads and writes
● Write intent bitmap speeds up recovery, but was causing latency issues, so we turned it off
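The duplex change is easier to see as a model than in prose. A purely conceptual sketch in Rust (this is not ScyllaDB's scheduler or its configuration; the semaphores just stand in for per-direction bandwidth budgets):

use std::sync::Arc;
use tokio::sync::Semaphore;

// With one shared budget, fast NVMe reads queue behind slow PD writes and
// degrade to PD speed. Duplex I/O gives each direction its own budget.
struct DuplexIo {
    read_bw: Arc<Semaphore>,
    write_bw: Arc<Semaphore>,
}

impl DuplexIo {
    fn new(read_permits: usize, write_permits: usize) -> Self {
        Self {
            read_bw: Arc::new(Semaphore::new(read_permits)),
            write_bw: Arc::new(Semaphore::new(write_permits)),
        }
    }

    async fn read(&self) {
        // Reads only compete with other reads for permits.
        let _permit = self.read_bw.acquire().await.unwrap();
        // ...issue the read against NVMe...
    }

    async fn write(&self) {
        // Slow PD-bound writes consume their own budget, not the readers'.
        let _permit = self.write_bw.acquire().await.unwrap();
        // ...issue the write...
    }
}

#[tokio::main]
async fn main() {
    let io = DuplexIo::new(64, 16); // illustrative budgets
    tokio::join!(io.read(), io.write());
}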
23. Reducing Complexity
● Superdisk makes for a system that’s fast, scalable, durable… and complex
● Automation + logical simplicity: can we reduce the number of states the system can be in?
● Bias towards removing/replacing nodes in an unknown scenario
● Write bitmap removal: having it gives faster resyncs, but if a failure ends in either a minuscule resync or a full rebuild, then what advantage does it have?
24. Preparing the RAID
● Preflight script runs before ScyllaDB startup - in charge of giving the green light
● Brand new node - provision RAID, join the cluster
● Data on PD but not NVMe - unclean shutdown, wait for the RAID to recover
● Data on both PD and NVMe - health check the RAID
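That decision tree is small enough to write down as code. A hedged sketch in Rust, with hypothetical type and action names (the actual preflight script isn't shown in the talk):

// Hypothetical model of the preflight decision described above.
#[derive(Debug)]
enum DiskState {
    BrandNewNode,       // no data on either disk
    PersistentDiskOnly, // data on PD but not NVMe: unclean shutdown
    BothDisks,          // data on both PD and NVMe
}

#[derive(Debug)]
enum Action {
    ProvisionRaidAndJoinCluster,
    WaitForRaidRecovery,
    HealthCheckRaid,
}

fn preflight(state: DiskState) -> Action {
    match state {
        DiskState::BrandNewNode => Action::ProvisionRaidAndJoinCluster,
        DiskState::PersistentDiskOnly => Action::WaitForRaidRecovery,
        DiskState::BothDisks => Action::HealthCheckRaid,
    }
}

fn main() {
    // e.g. a node whose NVMe contents were lost on a host error
    println!("{:?}", preflight(DiskState::PersistentDiskOnly));
}

Enumerating every state the node can boot into is exactly the “reduce the number of states” idea from the previous slide.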
25. Read More
How Discord Supercharges Network Disks for Extreme Low Latency
26. The Groundwork
● Data service library - optimized database accesses
● Superdisk storage topology - fast + reliable
Meanwhile, Cassandra is a high-toil system…
28. Plan v1
Goal: Get value quickly because Cassandra is giving us literal headaches
● Try to use ScyllaDB for recent data and migrate historical data behind it
● Set up ScyllaDB’s Spark migrator, which requires a lot of tuning
ETA: 3+ months
31. Plan v2
Goal: Get value even faster than Plan v1
● Rewrite it in Rust!
● Reimplement the data migrator in an afternoon via our data service code (sketched below)
ETA: 9 days
Up to 3.2 million messages per second!
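The talk doesn't show the migrator itself, but its shape is easy to sketch: split the token ring into chunks and copy each chunk in its own Tokio task, reading from Cassandra and writing to ScyllaDB over the same CQL driver. Everything below is illustrative (cluster addresses, chunk count, the pre-1.0 scylla-driver API, and the glossed-over paging and retry logic), not Discord's code:

use std::sync::Arc;
use scylla::{IntoTypedRows, Session, SessionBuilder};

type DynError = Box<dyn std::error::Error + Send + Sync>;

// Copy one token-range chunk of the messages table. Sketch only: real code
// would page through results, batch writes, and retry failures.
async fn copy_chunk(src: Arc<Session>, dst: Arc<Session>, lo: i64, hi: i64) -> Result<(), DynError> {
    let result = src
        .query(
            "SELECT channel_id, bucket, message_id, author_id, content FROM messages \
             WHERE token(channel_id, bucket) >= ? AND token(channel_id, bucket) <= ?",
            (lo, hi),
        )
        .await?;
    for row in result
        .rows
        .unwrap_or_default()
        .into_typed::<(i64, i32, i64, i64, String)>()
    {
        dst.query(
            "INSERT INTO messages (channel_id, bucket, message_id, author_id, content) \
             VALUES (?, ?, ?, ?, ?)",
            row?,
        )
        .await?;
    }
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), DynError> {
    let src = Arc::new(SessionBuilder::new().known_node("cassandra:9042").build().await?);
    let dst = Arc::new(SessionBuilder::new().known_node("scylla:9042").build().await?);

    // Split the token ring into equal chunks and migrate them concurrently.
    let (chunks, span) = (1024i128, (u64::MAX as i128) + 1);
    let mut tasks = Vec::new();
    for i in 0..chunks {
        let lo = (i64::MIN as i128 + i * span / chunks) as i64;
        let hi = (i64::MIN as i128 + (i + 1) * span / chunks - 1) as i64;
        tasks.push(tokio::spawn(copy_chunk(src.clone(), dst.clone(), lo, hi)));
    }
    for task in tasks {
        task.await??; // surface both task panics and query errors
    }
    Ok(())
}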
33. It’s Great!
● Quiet (knock on wood)!
● Better storage density - less than half the nodes of Cassandra
● 53% reduction in disk utilization
● Much more consistent latencies