ScyllaDB co-founder and CTO Avi Kivity will cover ScyllaDB's 2022 accomplishments and deliveries, focusing on the core database. This year we enabled full repair-based node operations, made distributed aggregation the default in ScyllaDB, focused on goodput in the face of overload, and made many other changes.
2. Agenda
● Increasing Streaming Robustness
● Autoparallel Queries
● WebAssembly User Defined Functions
● Per-partition Throttling
● Alternator Updates
● Consistent Schema and Topology
● New SSD Disk Modeling
● Taming Corner Cases
● What's Cooking Now
3. Repair-Based Node Operations
● Resumable bootstrap/decommission
● Stream from primary replica
  ● Or a quorum if primary is missing
● Increases resilience and improves correctness
4. Autoparallel Queries
● Aggregations previously done via Spark or custom code
● Instead, recognize certain CQL patterns
● Dispatch to all nodes, all vCPUs within nodes
[Diagram: ring of Node 1 through Node 5 executing SELECT COUNT(*) FROM t]
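The fan-out described on this slide can be sketched as a scatter-gather count. This is a minimal illustration, not ScyllaDB's implementation: the `cluster` layout and the function names `count_shard`, `count_node`, and `count_all` are hypothetical, and threads stand in for nodes and vCPUs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: node -> list of shards (vCPUs) -> rows owned by that shard.
cluster = {
    "node1": [[1, 2, 3], [4]],
    "node2": [[5, 6], []],
    "node3": [[7], [8, 9, 10]],
}

def count_shard(rows):
    """Partial aggregate computed locally on one shard."""
    return len(rows)

def count_node(shards):
    """A node fans out to its shards and sums their partial counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_shard, shards))

def count_all(cluster):
    """The coordinator fans out to every node and merges the partial counts."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_node, cluster.values()))

print(count_all(cluster))
```

Only small partial counts travel back to the coordinator, which is why this pattern removes the need to pull rows into Spark or custom client code.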
5. WebAssembly UDF/UDA
● Push compute into database
● Use any language*
● Computations run in a WASM sandbox
● Use case: analytics
*as long as it's Rust
6. Per Partition Rate Limit
● New CQL table attribute to limit access rate to partition
● Works for reads and writes
● Prevent bot accounts from spamming database
● "Hot partition"
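To make the idea of a per-partition limit concrete, here is a minimal fixed-window sketch. The slides do not describe how ScyllaDB enforces the limit internally, so the class `PartitionRateLimiter`, its parameters, and the fixed-window approach are all assumptions made for illustration.

```python
from collections import defaultdict

class PartitionRateLimiter:
    """Hypothetical fixed-window limiter: at most max_ops per partition per second."""

    def __init__(self, max_ops_per_second):
        self.max_ops = max_ops_per_second
        # (partition_key, one-second window) -> operations admitted so far.
        # A real implementation would expire old windows; this sketch does not.
        self.counts = defaultdict(int)

    def allow(self, partition_key, now):
        """Return True if the operation is admitted, False if it is rejected."""
        window = (partition_key, int(now))
        if self.counts[window] >= self.max_ops:
            return False
        self.counts[window] += 1
        return True

limiter = PartitionRateLimiter(max_ops_per_second=2)
for t in (0.1, 0.5, 0.9):
    print("bot", t, limiter.allow("bot", now=t))
print("user", 0.9, limiter.allow("user", now=0.9))
```

The point of keying the counter by partition is that a single hot partition is throttled while requests to other partitions are unaffected.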
11. Reverse Queries
● 4.5 and older slow for large partitions
● 4.6 fast, but skipped cache
● 5.0+ fast, supports cache
● Works well with paging

SELECT *
FROM tab
WHERE …
ORDER BY clustering_key DESC
12. Better Handling of Tombstones
● Queries that encounter large consecutive tombstone runs are now well supported
● Partitions with many range tombstones work well
13. Improved Out-of-Memory Handling
● Escalating countermeasures as memory usage increases
  ● Prevent new queries from starting
  ● Allow only one query to make progress
  ● Kill all but one query
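The escalation ladder on this slide can be sketched as a mapping from memory pressure to an action. The threshold values and the names `Action`, `THRESHOLDS`, and `countermeasure` are hypothetical; the slides give only the order of the countermeasures, not the trigger points.

```python
from enum import Enum

class Action(Enum):
    NORMAL = "admit queries normally"
    BLOCK_NEW = "prevent new queries from starting"
    SERIALIZE = "allow only one query to make progress"
    KILL = "kill all but one query"

# Hypothetical thresholds (fraction of memory in use), most severe first.
THRESHOLDS = [
    (0.95, Action.KILL),
    (0.85, Action.SERIALIZE),
    (0.70, Action.BLOCK_NEW),
]

def countermeasure(memory_used_fraction):
    """Pick the most severe action whose threshold has been reached."""
    for threshold, action in THRESHOLDS:
        if memory_used_fraction >= threshold:
            return action
    return Action.NORMAL

for usage in (0.50, 0.75, 0.90, 0.97):
    print(usage, countermeasure(usage).value)
```

Each step trades more throughput for a better chance that at least one query finishes and releases memory.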
14. Repair-Based Tombstone Garbage Collection
● Eliminate gc_grace_seconds
● Tie tombstone garbage collection to last repair
● Improves performance for clusters that have frequent repair
● Improves correctness for clusters that missed repair
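The two claims on this slide follow from comparing the purge rules. The sketch below is a simplified model, not ScyllaDB's code: the function names are hypothetical, times are plain seconds, and the repair rule is reduced to "the tombstone was written before the last completed repair".

```python
DAY = 86_400  # seconds

def purgeable_by_timeout(deletion_time, now, gc_grace_seconds):
    """Timeout rule: purge once gc_grace_seconds have elapsed since the delete."""
    return now - deletion_time >= gc_grace_seconds

def purgeable_by_repair(deletion_time, last_repair_time):
    """Repair rule (simplified): purge only if a repair completed after the delete."""
    return last_repair_time is not None and deletion_time < last_repair_time

gc_grace = 10 * DAY  # the common default of 864000 seconds

# Cluster that repairs daily: the tombstone can go after 1 day instead of 10.
print(purgeable_by_timeout(0, now=2 * DAY, gc_grace_seconds=gc_grace))  # False
print(purgeable_by_repair(0, last_repair_time=1 * DAY))                 # True

# Cluster that missed repair: the timeout rule purges anyway, risking
# resurrected data; the repair rule keeps the tombstone.
print(purgeable_by_timeout(0, now=11 * DAY, gc_grace_seconds=gc_grace))  # True
print(purgeable_by_repair(0, last_repair_time=None))                     # False
```

The first pair illustrates the performance gain for frequently repaired clusters, and the second pair the correctness gain when a repair was missed.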
16. Nudging the CQL Grammar Towards SQL
● Relaxing constraints
● Reconciling semantic oddities
● Increasing the scope of autoparallel queries
17. Object Storage
● A spectrum of cost/performance tradeoffs
  ● RAM: Extremely fast (100 ns), very expensive
  ● NVMe: Very fast (100 µs), expensive
  ● HDD: Slow (10 ms), cheap
  ● Cloud object storage (S3 and similar)
    ● Slow (40 ms), cheap
    ● Infinitely expandable
    ● Easy to manipulate
    ● Shared access
18. Use Cases for Object Storage
● Very dense databases
  ● Where latency is not critical
● Tiered storage
  ● Mix service levels and cost
  ● Optimize both cost and latency
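One way to picture tiered storage is as picking the cheapest medium that still meets a latency budget. The latencies below are the approximate figures from the previous slide; the relative cost ranking (in particular, object storage ranked below HDD) and the names `TIERS` and `pick_tier` are assumptions for this sketch.

```python
# (name, approximate access latency in seconds, assumed cost rank: lower is cheaper)
TIERS = [
    ("RAM", 100e-9, 3),
    ("NVMe", 100e-6, 2),
    ("HDD", 10e-3, 1),
    ("Object storage", 40e-3, 0),
]

def pick_tier(latency_budget_seconds):
    """Return the cheapest tier whose latency fits the budget, or None."""
    candidates = [t for t in TIERS if t[1] <= latency_budget_seconds]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t[2])[0]

for budget in (1e-6, 1e-3, 20e-3, 50e-3):
    print(budget, pick_tier(budget))
```

Data with a relaxed latency budget lands on object storage, while latency-critical data stays on faster and more expensive media.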
19. Thank You
Stay in Touch
Avi Kivity
avi@scylladb.com
@AviKivity
@avikivity