Opera chose Scylla over Cassandra to sync the data of millions of browsers to a back-end data repository. The migration, together with further optimizations to their stack, gave Opera better latency and throughput and lower resource usage, beyond their expectations.
Attend this session to learn how to:
■ Migrate your data in a sane way, without any downtime
■ Connect a Python+Django web app to Scylla and use intranode sharding to improve your application
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
1. How to Sync
Tens of Millions of Browsers
and Sleep Well at Night
Rafał Furmański & Piotr Olchawa
2. Presenters
Rafał Furmański, Engineering Manager
Project Manager, Software Engineer, Big Data enthusiast and certified Cassandra developer.
Rafał has 10+ years of experience in programming.
After work: addicted volleyball player.
Piotr Olchawa, Software Engineer
Piotr is a Software Engineer at Opera, working across backend and SysOps.
He has over 4 years of experience in programming. He is a big fan of everything that’s extreme:
rock climbing, hackathons, public speaking.
3. Outline
■ About Opera and Sync
■ Problems with Cassandra and first encounter with Scylla
■ Migration process and results
■ Automated repairs with scylla-cli
■ Scylla proxy & shard awareness
6. About Opera
■ Founded in 1995 in Norway
■ HQ in Oslo
■ Branches in Poland, Sweden and China
■ Listed on NASDAQ
■ We make browsers & apps
● Desktop:
■ Opera
■ Opera GX
● Mobile:
■ Opera Mini
■ Opera for Android
■ Opera Touch
■ Opera News
7. About Opera
■ Opera has pioneered many concepts
found in the major browsers today
■ We continue to introduce unique features
in our products
8. Opera syncs
■ Favorite sites on the Speed Dial
■ Bookmarks
■ Open tabs from all devices
■ Browsing history
■ Passwords
■ Browser preferences
About Opera Sync
10. Opera Sync - infrastructure/software
■ Deployed on bare metal boxes in 2 datacenters:
● Backend - 2x10
● Database - 2x13
■ On each backend host:
● Debian Stretch
● Docker containers:
■ uWSGI (Python/Django App)
■ Nginx
■ Celery workers
■ RabbitMQ
■ statsd
■ Configuration/Deployment: Ansible & Docker Swarm
■ Monitoring: Graphite/Grafana + Nagios + PagerDuty
11. Opera Sync - example model and queries
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model

class Bookmark(Model):
    user_id = columns.Text(partition_key=True)
    version = columns.BigInt(primary_key=True, clustering_order='ASC')
    id = columns.Text(primary_key=True)
    parent_id = columns.Text()
    position = columns.Bytes()
    name = columns.Text()
    ctime = columns.DateTime()
    mtime = columns.DateTime()
    deleted = columns.Boolean(default=False)
    folder = columns.Boolean(default=False)
    specifics = columns.Bytes()
12. Opera Sync - example model and queries
Query 1: Get all bookmarks of user ‘Adam’ from version=5 # version == precise timestamp
Query 2: Change/remove bookmark of user ‘Adam’ with version=5 and id=’6’
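Both queries follow the table layout above: `user_id` is the partition key, and `version` (a precise timestamp) plus `id` are clustering keys, so "from version=5" is a clustering-range scan within one partition. A self-contained sketch of that access pattern, with plain dicts standing in for rows (the real queries go through django-cassandra-engine):

```python
# Minimal stand-in for the Bookmark table: rows share the partition key
# (user_id) and are clustered by version, mirroring the model above.
bookmarks = [
    {"user_id": "Adam", "version": 3, "id": "4", "deleted": False},
    {"user_id": "Adam", "version": 5, "id": "6", "deleted": False},
    {"user_id": "Adam", "version": 7, "id": "8", "deleted": False},
]

# Query 1: all bookmarks of user 'Adam' from version=5 onward
updates = [b for b in bookmarks
           if b["user_id"] == "Adam" and b["version"] >= 5]

# Query 2: change/remove the bookmark with version=5 and id='6'
# (here: a soft delete via the `deleted` flag from the model)
for b in bookmarks:
    if b["user_id"] == "Adam" and b["version"] == 5 and b["id"] == "6":
        b["deleted"] = True
```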
14. Problems with Cassandra
■ We started with Cassandra 2.1 and immediately got hit by:
● [CASSANDRA-9935] Repair fails with RuntimeException
● [CASSANDRA-10689] java.lang.OutOfMemoryError: Direct buffer memory
● [CASSANDRA-10697] Leak detected while running offline scrub
● [CASSANDRA-8558] Deleted row still can be selected out
● [CASSANDRA-8446] Lost writes when using lightweight transactions
● [CASSANDRA-8280] Crash on inserting data over 64K into indexed strings
● [CASSANDRA-8067] NullPointerException in KeyCacheSerializer
● [CASSANDRA-9681] Memtable heap size grows and GC pauses are triggered
15. Problems with Cassandra
■ Bugs, bugs, bugs…
■ Very high p95/p99 read/write latencies
■ Long GC pauses(!!!)
■ Insane CPU usage
■ Failing Gossip/Binary protocols
■ Restarts without specific reason
■ Problems with bootstrapping new nodes
■ Neverending repairs
16. Our “solutions”
■ Add more and more C* nodes!
■ Tune every piece of C*/Java config
■ Seek help from C* gurus
■ [SYNC-1146] Cron job to restart C* periodically (sic!)
17. Our journey with Scylla
■ September 2015 - First encounter: Cassandra Summit
■ July 2018 - First Scylla cluster & benchmarks
■ August 2018 - Decision to migrate
■ 13 May 2019 - Decommissioning of the last Cassandra node
18. Initial benchmarks
■ setup: 3 bare metal nodes in the cluster
■ tool: cassandra-stress
■ keyspace: sync, table: bookmark, time: 10 minutes
■ mixed workload: 50% GetUpdates / 50% Commit
20. Migration process
1. Make django-cassandra-engine connect to more than one database
2. Prepare a 2x3-node Scylla cluster (with monitoring)
3. Update the backend to be connection-aware:
   Bookmark.objects.using(connection='scylla').filter(...)
4. Move a few test users to Scylla (me and coworkers)
5. Make all new users use Scylla
6. Slowly migrate all existing users from Cassandra to Scylla
   a. decommission nodes from the Cassandra cluster
   b. add the decommissioned nodes to the Scylla cluster
7. Disconnect Cassandra and make Scylla the default database engine
8. Cleanup
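The first step above hinges on registering two connections side by side. A hedged sketch of what that might look like in Django's settings.py with django-cassandra-engine (aliases, hosts, and keyspace name are illustrative, not Opera's actual config):

```python
# settings.py (sketch): two django-cassandra-engine connections, so that
# .using(connection='cassandra') / .using(connection='scylla') both work
# during the migration window. Hostnames below are hypothetical.
DATABASES = {
    'cassandra': {
        'ENGINE': 'django_cassandra_engine',
        'NAME': 'sync',                    # keyspace
        'HOST': 'cassandra-node-1',
    },
    'scylla': {
        'ENGINE': 'django_cassandra_engine',
        'NAME': 'sync',
        'HOST': 'scylla-node-1',
    },
}
```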
22. Determining user’s connection
def get_user_store(user_id):
    connection = UserStore.maybe_get_user_connection(user_id)  # from cache
    if connection is not None:
        # We know exactly which connection to use
        with ContextQuery(UserStore, connection=connection) as US:
            return US.objects.get(user_id=user_id)
    else:
        # We have no clue which connection is correct for this user
        try:
            with ContextQuery(UserStore, connection='cassandra') as US:
                user_store = US.objects.get(user_id=user_id)
        except UserStore.DoesNotExist:
            with ContextQuery(UserStore, connection='scylla') as US:
                user_store = US.objects.get(user_id=user_id)
        user_store.cache_user_connection()
        return user_store
23. Migration script
Requirements:
■ Ability to move user data from Cassandra to Scylla (and back)
■ Consistency check after migrating
■ Concurrent execution is a must
■ Measure everything:
● Number of migrated users
● Migration time (with distribution)
● Errors with reasons
● Failed migrations
24. Migration script
Algorithm:
1. Pick free user from Cassandra DB (check if not already being migrated) and
mark as picked for migration
2. Set user_store.migration_pending = True (with TTL!)
3. Copy all the data to Scylla DB
4. Perform consistency check
5. Remove leftovers from Cassandra (and clear the connection cache)
6. Set user_store.migration_pending = False
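The six steps above can be sketched as a self-contained loop body. Plain dicts stand in for the two databases and the connection cache, and an expiring entry stands in for the TTL'd `migration_pending` flag (so a crashed worker's lock releases itself); all names here are hypothetical, not Opera's actual script:

```python
import time

LOCK_TTL = 60  # seconds; the TTL from step 2, so dead workers auto-unlock
migration_locks = {}                       # user_id -> lock expiry timestamp
cassandra_db = {"adam": ["b1", "b2"]}      # stand-in source database
scylla_db = {}                             # stand-in target database
connection_cache = {"adam": "cassandra"}   # stand-in connection cache

def migrate_user(user_id):
    now = time.time()
    # 1-2. pick a free user and mark it as being migrated (with TTL)
    if migration_locks.get(user_id, 0) > now:
        return False  # someone else is already migrating this user
    migration_locks[user_id] = now + LOCK_TTL
    # 3. copy all the data to Scylla
    scylla_db[user_id] = list(cassandra_db[user_id])
    # 4. consistency check before anything is removed
    assert scylla_db[user_id] == cassandra_db[user_id]
    # 5. remove leftovers from Cassandra and clear the connection cache
    del cassandra_db[user_id]
    connection_cache.pop(user_id, None)
    # 6. release the lock (the real flag would also expire via its TTL)
    del migration_locks[user_id]
    return True
```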
25. Challenges during migration
■ Timeouts and Unavailables in Cassandra
■ Migrating huge accounts takes some time
■ User is cut off from Sync during the migration period
■ Synchronization of concurrent processes
26. Migration results
■ Reduced number of nodes: from 32 (a year ago) to 26 (now) to 8 (next)
■ Faster node bootstrap time (hours instead of days)
■ Huge drops in latency
■ No more sleepless nights!
28. scylla-cli overview
■ Console script for
● Checking status of the cluster
● Performing range repairs
■ Connects to Scylla API on each
host via SSH tunnel (or direct)
■ Written in Python
■ Available on PyPI:
$ pip install scylla-cli
29. Why repair with scylla-cli?
■ It works with Scylla Open Source
■ Performs repairs only on the primary range of a Scylla node (in discrete
steps, node by node)
■ Performs advanced repair techniques (subrange repair)
■ Scheduled repairs - what and when (specific node, table)
■ Built-in retry mechanism
■ Real time repair progress and ETA
■ Works better on a busy cluster than regular nodetool repair
Example repair usage:
$ scli repair sync session --dc=Amsterdam
31. Scylla per-node CPU shard awareness
■ “Gains by Using Scylla-Specific Drivers” - over 2x latency decrease:
(Scylla Summit 2018 - Piotr Jastrzębski)
■ Cassandra native protocol extension
■ Achieved by per-node CPU connections
32. Shard-awareness for Sync
Rationale
■ 48 shards per Scylla server - potential performance improvement
Obstacles
■ No Python driver support
■ 300 uwsgi + celery workers per host
● 13 DBs * 48 shards * 300 workers = ~187000 connections
● Port range up to 65535
The solution
■ Proxy Scylla client/server (gocqlproxy)
1 connection / worker, and 1 connection / host-shard
(~300 workers + ~600 shards = ~900 connections)
■ Simplified protocol with just one message type
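The connection arithmetic behind that decision can be checked directly, using the numbers from the slide: per-shard connections from every worker multiply (M*N), while routing through a single-process proxy makes them add (M+N):

```python
# Connection counts with and without the proxy (numbers from the slide).
workers = 300                    # uwsgi + celery workers per host
hosts = 13                       # database servers
shards_per_host = 48
shards = hosts * shards_per_host # ~600 shard endpoints cluster-wide

# Without the proxy: every worker holds one connection per shard.
without_proxy = workers * shards          # ~187,000 - exceeds the port range

# With the proxy: workers connect once to the proxy, and the proxy
# holds one connection per shard.
with_proxy = workers + shards             # ~900
```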
34. gocqlproxy implementation - driver and proxy
cassandra/proxy_session.py:
class ProxyConnection(DefaultConnection):
    # (...)
    def send_msg(self, msg, *args, **kwargs):
        # (...)
        proxied_msg = ProxiedMessage(msg, routing_key)
        return super().send_msg(proxied_msg, ...)
cassandra/protocol.py:
class ProxiedMessage(_MessageType):
    opcode = 0xF0
    # (...)
    def send_body(self, f, protocol_version):
        message_bytes = encode_message(self.message)
        write_longstring(f, message_bytes)
        write_longstring(f, self.routing_key)
proxy.go:
frameWriter := &writeProxiedFrame{
    head:      nestedHead,
    frameData: nestedFramer.rbuf,
}
// find the appropriate host/shard to forward the frame to:
partitionKey := query.clientFramer.readBytes()
serverConn, err := query.session.pickHost(partitionKey)
// send the frame to the chosen server/shard:
serverConn.exec(context.TODO(), frameWriter, nil)
// return the response to client (use client’s stream id)
clientFramer.writeHeader(response, outerHeader.stream)
clientFramer.wbuf = append(clientFramer.wbuf, frameData...)
clientFramer.finishWrite()
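The `ProxiedMessage` body above is just two native-protocol [long string]s (a 4-byte big-endian length followed by that many bytes): first the encoded inner message, then the routing key the proxy uses to pick a shard. A self-contained sketch of that layout (the payload bytes here are placeholders, not a real encoded frame):

```python
import struct

def write_longstring(buf: bytearray, data: bytes) -> None:
    """Append a CQL-protocol [long string]: 4-byte signed length + bytes."""
    buf += struct.pack(">i", len(data))
    buf += data

# Body of a ProxiedMessage (opcode 0xF0), as written by send_body() above:
body = bytearray()
write_longstring(body, b"<encoded QUERY frame>")  # the wrapped inner message
write_longstring(body, b"adam")                   # routing key = partition key bytes
```

The outer frame header (including the client's stream id, which the proxy must echo back) wraps this body exactly as on the slide.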
35. Shard-aware Sync and gocqlproxy – results
■ Production:
● We’ve enabled a working prototype of gocqlproxy in production; it has been running stably for a few days
● We can use shard-awareness with 900 connections instead of 180000
■ Local synthetic benchmarks - measured latency decreases
  (cluster-wide approximated latencies, averaged over 75-second test runs;
  ‘non-shard-aware → shard-aware’, improvement = 100% * (before-after) / before)

        read [μs]            write [μs]
  avg   580 → 480 (~17%)     570 → 470 (~18%)
  p95   1000 → 980 (~2%)     1000 → 980 (~2%)
  p99   2000 → 1000 (~50%)   1900 → 1000 (~47%)
37. Takeaways
■ Download Opera Browser
■ Django-cassandra-engine
■ Scylla-cli
■ Scylla-proxy:
● gocqlproxy
● Python driver
38. Thank you & stay in touch
Any questions?
Rafał Furmański
rfurmanski@opera.com
r4fek
Piotr Olchawa
polchawa@opera.com
BugsKillPeople
Editor’s notes
350M monthly active users
shard-awareness for python-driver?
(+) no proxy required
(-) non-obvious logic to implement (shard num calculation)
(-) likely too many connections anyway
gocqlproxy
(+) simple - most responsibilities in the proxy - already implemented in gocql
(+) connection numbers, similarly to twemproxy
(-) non-standard protocol changes, both in the driver and in the proxy
Without proxy:
every worker has one connection per shard
(about 300 workers x 600 shards = almost 200k)
With proxy:
every worker has just one connection to the single-process proxy (about 300 connections)
the proxy has one connection per shard (about 600 connections)
300 + 600 = 900
M workers, N shards -> M+N instead of M*N
Messages wrapped as (message, routing_key)
Idea #1: routing_key extracted in the proxy “transparently”
parsing logic to implement
Idea #2: routing_key duplicated by the driver
proxy can use it directly to find the right shard
simple to implement: just (1) add message type, (2) pass the message through to the server in the proxy
remember that stream id needs to come from the “outer”/”wrapping” message
Promising results in multiple-container, single-machine clusters
Up to 50% latency decreases (p99), when alternating between shard-aware/non-aware tests
Little performance improvement in production (except decreased connection counts)
Perhaps IPC overhead is negligible in Sync for some reason?