Scylla began with a Cassandra compatibility story, implementing Cassandra’s query language (CQL) and replicating its user-visible architecture. Recently we introduced “Alternator” - an experimental feature adding compatibility with a second NoSQL database: Amazon’s DynamoDB. In this talk we look at why DynamoDB’s API was chosen as a good target for our API extension, how DynamoDB is similar to Scylla - and how it differs, and how we can implement DynamoDB’s API in Scylla. We will describe our progress so far in making Alternator compatible with DynamoDB - and what still remains to be done so that any DynamoDB application can run unmodified on Scylla.
2. Presenter
Nadav Har’El, distinguished engineer
Nadav Har’El has had a diverse 25-year career in computer
programming and computer science. In the past he worked on
scientific computing, networking software, information retrieval,
virtualization and operating systems. Today he contributes to,
and is a maintainer of, the OSv kernel, Seastar and ScyllaDB.
3. Project Alternator
■ Scylla is compatible with Cassandra® and its APIs (CQL, Thrift).
■ The Alternator project: adding a DynamoDB™-compatible API to Scylla.
■ Available in open-source Scylla since September.
■ Preview release - with some limitations. GA expected soon.
■ Beta release on Scylla Cloud (ScyllaDB’s SaaS).
4. Agenda
■ Why Alternator?
■ How to run Alternator, and its state today
■ What’s still missing and planned
■ A bit about how Alternator works
6. Scylla design principles
■ Efficient implementation for modern hardware
● Significantly higher throughput than Cassandra
● Linear scalability to many-core machines
● Focused on modern fast SSDs
■ Low tail latency
■ Reliability
■ Observability
■ Autonomous database (minimal configuration)
We can apply these advantages to more than just Cassandra
compatibility!
7. Why DynamoDB API?
DynamoDB is similar in design and data model to Scylla
More details on the similarities, and differences, later.
Amazon Dynamo
(2007 paper)
Google Bigtable
(2006 paper)
8. Why DynamoDB API?
DynamoDB is SaaS
SaaS is easy to get started with, and is an industry trend.
DynamoDB’s popularity is growing relative to Cassandra’s.
[Chart: Cassandra vs. DynamoDB]
9. Why DynamoDB API?
Better price/performance
In the past we compared the total
cost of running a workload on
DynamoDB and on Scylla.
Managed Scylla (“Scylla Cloud”) was
5 times cheaper than DynamoDB’s
cheapest option (yearly reservation).
[Chart: DynamoDB vs. Scylla Cloud]
10. Why DynamoDB API?
Vendor lock-in
Users want to move their DynamoDB application to
■ a different cloud provider,
■ a private datacenter,
■ or a hybrid of multiple clouds or datacenters.
Scylla can be run on any cloud or datacenter.
12. Getting Alternator running in 5 minutes
■ Running Alternator is simply running Scylla
● with the parameter “alternator-port” set.
● other options (HTTPS, authorization) - see
docs/alternator/alternator.md.
■ You can get it running on your local machine in 5 minutes using
docker:
docker run --name scylla -d -p 8000:8000 scylladb/scylla-nightly:latest --alternator-port=8000
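The container may need some time after `docker run` returns before it accepts connections. A minimal sketch of a readiness check, assuming the port mapping from the command above (8000 on the local machine); the helper name `port_is_open` is made up for this illustration and is not part of Scylla or Docker:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("localhost", 8000)` can be polled in a loop until it returns True. An open port only shows that something is listening; it does not prove that the DynamoDB API is ready to serve requests.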
13. Test it works
■ Run unmodified Amazon DynamoDB client libraries or CLI tools
against Alternator:
● aws --endpoint-url http://172.17.0.1:8000 \
  dynamodb create-table --table-name mytab \
  --attribute-definitions AttributeName=key,AttributeType=S \
  --key-schema AttributeName=key,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
■ Let’s try a test case which uses many more DynamoDB features:
14. DynamoDB's Tic-Tac-Toe demo
■ An open-source Python application using DynamoDB
■ Written by Amazon, to demonstrate many DynamoDB features
● Various keys, attributes, conditional updates and secondary indexes, ...
■ Written in Python, using Amazon’s AWS client library (boto3)
● python application.py --mode local --port 8000
A multiplayer Tic-Tac-Toe game server. Many users can connect, invite
each other to games, play against each other, and keep score.
15. A much more intensive test
■ Cluster of three 30-core nodes in AWS, each in a separate AZ.
■ 1.1 TB data - 1 billion items, 1.1KB each.
■ YCSB workload, 50% read 50% write, Zipfian distribution.
17. A much more intensive test (cont.)
120 Kops/sec is pretty intensive - in
DynamoDB provisioned pricing, it
would cost $85 per hour.
VMs (EC2, on-demand pricing) for
running Scylla - $7.5 per hour.
Year reservation cheaper for both.
3 copies of data
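The figures quoted on these slides can be checked with a few lines of arithmetic. This sketch uses only the numbers given above and assumes decimal units (1.1 KB = 1,100 bytes):

```python
# Figures quoted on the slides.
dynamodb_cost_per_hour = 85.0   # USD, DynamoDB provisioned pricing
scylla_cost_per_hour = 7.5      # USD, EC2 on-demand VMs running Scylla
items = 1_000_000_000           # 1 billion items
item_size_bytes = 1_100         # 1.1 KB each, assuming decimal units

# How many times more the DynamoDB option costs per hour.
cost_ratio = dynamodb_cost_per_hour / scylla_cost_per_hour

# Logical dataset size in terabytes (before replication).
dataset_tb = items * item_size_bytes / 1e12

print(f"DynamoDB / Scylla hourly cost: {cost_ratio:.1f}x")  # 11.3x
print(f"Dataset size: {dataset_tb:.1f} TB")                 # 1.1 TB
```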
18. A much more intensive test (cont.)
[Dashboard screenshots: statistics per node; statistics per shard (CPU)]
19. A much more intensive test (cont.)
Many more Scylla statistics
Alternator-specific statistics
20. Now also DynamoDB API compatible
Alternator on Scylla Cloud
Scylla Cloud
Fully Managed Database as a Service
● Industry’s fastest and most affordable NoSQL DBaaS
● Low and predictable latencies to support real-time
applications
● Fully CQL compatible
28. Alternator implementation
A short survey of:
■ How Alternator works
■ Where it differs from DynamoDB
■ What still needs to be done?
A much more detailed survey can be found in this document.
29. The DynamoDB API
DynamoDB API is: JSON requests and responses over HTTP/HTTPS.
Request:
POST / HTTP/1.1
...
X-Amz-Target: DynamoDB_20120810.CreateTable
{ "TableName": "mytab",
  "KeySchema": [{"AttributeName": "key",
                 "KeyType": "HASH"}],
  "AttributeDefinitions":
                [{"AttributeName": "key",
                  "AttributeType": "S"}],
  "BillingMode": "PAY_PER_REQUEST"
}
Response:
{ "TableDescription":
  { "AttributeDefinitions": [{
        "AttributeName":"key",
        "AttributeType":"S"}],
    "TableName": "mytab",
    "KeySchema": [{"AttributeName":"key",
                   "KeyType":"HASH"}],
    "TableStatus": "ACTIVE",
    "CreationDateTime": 1569242964,
    "TableId": "91347050-de00-11e9-a100-000000000000"
}}
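The request shown on this slide can be assembled with the Python standard library alone. The following is a sketch, not a full client: the function names are invented for this example, the `Content-Type` value is the one AWS SDKs send for DynamoDB (it is not shown on the slide), and request signing is omitted, so whether a server accepts the request depends on its authorization settings.

```python
import json
import urllib.request
from typing import Dict, Tuple


def build_dynamodb_request(operation: str, payload: dict) -> Tuple[Dict[str, str], bytes]:
    """Build the headers and JSON body for one DynamoDB API call."""
    headers = {
        # Assumption: content type used by AWS SDKs for DynamoDB.
        "Content-Type": "application/x-amz-json-1.0",
        # The operation name travels in a header, not in the URL.
        "X-Amz-Target": f"DynamoDB_20120810.{operation}",
    }
    return headers, json.dumps(payload).encode("utf-8")


def post_request(endpoint: str, headers: Dict[str, str], body: bytes) -> dict:
    """Send the request and decode the JSON response (defined, not called here)."""
    req = urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# The CreateTable payload from the slide.
create_table = {
    "TableName": "mytab",
    "KeySchema": [{"AttributeName": "key", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "key", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}
headers, body = build_dynamodb_request("CreateTable", create_table)
```

With a server listening locally, `post_request("http://localhost:8000/", headers, body)` would send it. Every other operation (PutItem, GetItem, and so on) follows the same pattern with a different target name and payload.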
30. Alternator structure
Alternator is part of Scylla, not a proxy layer.
Each Scylla node also answers DynamoDB API requests.
■ No need for separate sizing for an API-translation cluster.
■ Same nodes can do both CQL and DynamoDB API.
DynamoDB API implemented with internal function calls and RPC
■ No inefficient translation to CQL.
Client needs to send requests to the different Scylla nodes.
■ Can be done via separate HTTP load balancer or DNS.
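Spreading requests over the nodes is what matters here. The slide names a load balancer or DNS for this; purely to illustrate the idea, the sketch below does the same thing on the client side with a round-robin picker. This is a swapped-in technique, not something the slides propose, and the node addresses are hypothetical.

```python
import itertools
from typing import Iterator, List


def round_robin(endpoints: List[str]) -> Iterator[str]:
    """Yield endpoints in turn, forever, so each node gets an equal share."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return itertools.cycle(endpoints)


# Hypothetical addresses of three Scylla nodes with Alternator enabled.
nodes = [
    "http://10.0.0.1:8000",
    "http://10.0.0.2:8000",
    "http://10.0.0.3:8000",
]
picker = round_robin(nodes)

# Each request would take the next endpoint from the picker.
first_four = [next(picker) for _ in range(4)]
```

A real deployment also has to notice nodes that join, leave, or fail, which is one reason to prefer a load balancer or DNS over a fixed list like this.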
31. Data model
■ Same as Scylla’s tables, partitions, rows.
■ Item attributes are schema-less (and nested, as in JSON).
● Emulated as a single Scylla column (a map, which allows concurrent
updates to different top-level attributes).
[Diagram: a DynamoDB table is divided into partitions by hash key; each partition holds items, each identified by a sort key and carrying its attributes]
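To see why a map column lets two clients update different top-level attributes at the same time, consider a toy model in which every map entry carries its own write timestamp and the newest write wins per entry. This is a simplified illustration, not Scylla's storage code; the type and function names are invented here, and timestamp ties are ignored.

```python
from typing import Any, Dict, Tuple

Cell = Tuple[int, Any]       # (write timestamp, value)
AttrMap = Dict[str, Cell]    # top-level attribute name -> cell


def write(attrs: AttrMap, name: str, value: Any, ts: int) -> None:
    """Apply one write to one top-level attribute; no read of other attributes."""
    current = attrs.get(name)
    if current is None or ts > current[0]:
        attrs[name] = (ts, value)


def merge(a: AttrMap, b: AttrMap) -> AttrMap:
    """Combine two replicas' views; per attribute, the newer timestamp wins."""
    merged = dict(a)
    for name, cell in b.items():
        if name not in merged or cell[0] > merged[name][0]:
            merged[name] = cell
    return merged


# Two replicas receive concurrent writes to different attributes of one item.
replica_1: AttrMap = {}
replica_2: AttrMap = {}
write(replica_1, "color", "red", ts=100)
write(replica_2, "size", {"w": 3, "h": 4}, ts=101)  # nested value, as in JSON

merged = merge(replica_1, replica_2)  # both updates survive
```

Because the merge is per attribute, neither write overwrites the other, and merging in either order gives the same result. A nested value such as `size` is replaced as a whole in this model.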
32. Read-Modify-Write
■ DynamoDB natively supports Read-Modify-Write (RMW) updates:
● set a = 2 if a == 1 - Conditional updates
● set a = a + 1 - Counters
● set a = b - Attribute copy
● Easy for DynamoDB, since all writes do a read anyway (leader model and B-tree)
■ Scylla natively supports independent writes to different columns:
● Efficient updates to different columns - without requiring a read.
● Uses CRDT - Conflict-free Replicated Data Type
Temporarily, our implementation performs the read and the write as
separate steps, which is unsafe for concurrent operations.
We are adding support for safe read-modify-write operations using
lightweight transactions (LWT).
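The hazard of a separate read and write can be shown with a single-process toy store. In this sketch two clients both try to perform `set a = a + 1`: with plain read-then-write one increment is lost, while a compare-and-set loop preserves both. The class and function names are invented for the example; in a real cluster the compare-and-set step must be atomic across replicas, which this toy does not model.

```python
from typing import Dict, Optional


class ToyStore:
    """In-memory stand-in for a table with one integer attribute per key."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def read(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def write(self, key: str, value: int) -> None:
        self._data[key] = value

    def compare_and_set(self, key: str, expected: Optional[int], new: int) -> bool:
        """Write `new` only if the current value still equals `expected`."""
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


def safe_increment(store: ToyStore, key: str) -> None:
    """Retry until the conditional write succeeds against a fresh read."""
    while True:
        seen = store.read(key)
        if store.compare_and_set(key, seen, seen + 1):
            return


store = ToyStore()

# Unsafe: both clients read before either writes, so one increment is lost.
store.write("a", 1)
seen_by_a = store.read("a")
seen_by_b = store.read("a")
store.write("a", seen_by_a + 1)
store.write("a", seen_by_b + 1)
lost_update_result = store.read("a")   # 2, although two increments ran

# Safe: the same interleaving, but writes are conditional on the value read.
store.write("a", 1)
seen_by_a = store.read("a")
seen_by_b = store.read("a")
a_first_try = store.compare_and_set("a", seen_by_a, seen_by_a + 1)  # succeeds
b_first_try = store.compare_and_set("a", seen_by_b, seen_by_b + 1)  # stale, fails
safe_increment(store, "a")             # client B retries with a fresh read
cas_result = store.read("a")           # 3
```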
33. Alternator’s compatibility and limitations
■ See detailed current status in alternator.md, and issues in bug
tracker.
■ Several DynamoDB applications already work unmodified.
■ Some of the issues we plan to address for the GA:
● A few operations and parameters not yet supported.
● Safe concurrent read-modify-write operations.
● On-demand backups.
● DynamoDB streams (CDC).
34. Migrating from DynamoDB to Alternator
■ Install Scylla and load balancer, or use Scylla Cloud.
■ Point your application, written to use DynamoDB, at Scylla’s endpoint
address.
■ This is a preview release. Watch out for unsupported features and
unsafe concurrent RMW operations.
■ Migrate existing data from DynamoDB to Scylla using the DynamoDB API
● E.g. Spark migrator: https://www.scylladb.com/2019/09/12/migrating-from-dynamodb-to-scylla
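Pointing an existing application at Scylla can come down to one connection setting. The sketch below builds the keyword arguments for a boto3 connection; the helper name, the `us-east-1` default, and the local endpoint address are illustrative assumptions. boto3 itself is not imported, so the block runs without it.

```python
from typing import Dict, Optional


def dynamodb_connection_kwargs(
    endpoint_url: Optional[str] = None,
    region_name: str = "us-east-1",  # arbitrary example region
) -> Dict[str, str]:
    """Keyword arguments for boto3; only endpoint_url differs for Alternator."""
    kwargs = {"region_name": region_name}
    if endpoint_url is not None:
        # Overrides the default AWS endpoint with the Scylla address.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


# Talking to Amazon DynamoDB: no endpoint override.
aws_kwargs = dynamodb_connection_kwargs()

# Talking to Alternator: same code, plus Scylla's endpoint address.
alternator_kwargs = dynamodb_connection_kwargs("http://localhost:8000")

# With boto3 installed and credentials configured as boto3 requires:
#   import boto3
#   table = boto3.resource("dynamodb", **alternator_kwargs).Table("mytab")
```

The rest of the application code, such as table names, item reads, and item writes, stays as written for DynamoDB, subject to the preview-release limitations listed above.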
35. Summary
■ Scylla is a very efficient, reliable, low-latency NoSQL data store that
began with Cassandra compatibility.
■ The Alternator Project adds DynamoDB API compatibility to Scylla.
● Can run existing applications designed for DynamoDB,
● On any cloud or data center, not just on AWS.
● Open source.
● Also available as managed service (DBaaS) on Scylla Cloud.
■ Currently a preview release, with some limitations, but GA expected
soon.