PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Filipchik, Sony) | C* Summit 2016

PlayStation
and Searchable Cassandra:
How we built user-specific search using C*
without Solr

Who are we?
Alexander Filipchik (PSN: LaserToy)
Principal Software Engineer at Sony Interactive Entertainment
Dustin Pham (PSN: quibfan)
Principal Software Engineer at Sony Interactive Entertainment

The Rise of PlayStation 4
PlayStation Network is big and growing.
– Over 65 million monthly active users.
– Hundreds of millions of users.
– A lot of services.
– More than 40M devices.

PlayStation 4 growth
• Pre-warm – November 2013, a couple of
thousand PS4s for Taco Bell.
• Launch day – 1,000,000 PS4s a few days
later.
• Adding 1.3 million devices a month.

Year  Unicorn’s Tech                     Our Tech
2009  MySQL
2011  MongoDB/MySQL
2012  Redis/MySQL                        PS3: MySQL + Memcached, Solr
2013  Redis/Postgres                     MySQL + Memcached/Cassandra, Solr
2014  Redis/Shards for Postgres + MySQL  MySQL + Memcached/Cassandra, Solr
2015  Riak/Shards for Postgres + MySQL   MySQL + Memcached/Cassandra + Redis, Solr
2016  Who knows what/Cassandra           MySQL + Memcached/Cassandra + Redis, Solr

What is it?
• It is an online games store for PlayStation
• To give you an idea:
– Revenue went from 800M per year 4 years ago to
almost 5B last year
– It makes more than all of Nintendo
• And it is not just eCommerce, it is a whole set of
services – video streaming, game streaming,
social, etc.

Some Challenges
• We are not Amazon: content must be delivered
right away
• What you bought is not just a transactional record
that the user checks once in a while. Multiple services
need access to this information in real time
• Which means it should be
– highly available
– fast
– and easy to scale

The Problem
• So, the legacy system uses a well-known relational DB
to handle our transactions.
• It is state-of-the-art software, but it doesn’t scale
well in our circumstances.
• We wanted to allow clients to run any query
without consulting hundreds of DBAs first.
• Sharding sounds like a pain.
• Multiple regions should be easy.

But
Axiom
It is Not Easy to Replace a Relational Database
with Cassandra for user-facing traffic.

Simple Digital Store Model
[Diagram: the store’s relational schema, annotated “Another hundred tables”]

CQL Is Going to Save Us!!!
• No Joins.
• No Transactions.
• No search.
• Just weird.

Some observations
• For us, most of the load comes from user-centric
activities
• So, we mostly query within a user’s dataset
• Which means we don’t need to join across
users often

What if we denormalize?
[Diagram: a user’s purchased items denormalized into one record per account]

So, we came up with a schema:
Account 1 | JSON 1 | JSON 2 | … | JSON n
Now it is horizontally scalable
We have in-row transactions
Reads are very fast: no joins
Now we need to propagate user purchases
from the relational DB to C*
And figure out how to support queries
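
A minimal in-memory sketch of the schema above, assuming each purchase is stored as one JSON document inside the account’s row. The class and method names (`AccountStore`, `add_purchase`, `purchases`) and the CQL table in the comment are hypothetical illustrations, not the talk’s actual code. In Cassandra the row would be a partition, and writes confined to a single partition are applied atomically and in isolation, which is what the slide calls in-row transactions.

```python
import json
from collections import defaultdict

# Roughly equivalent CQL for a hypothetical table (not from the talk):
#   CREATE TABLE purchases_by_account (
#       account_id text, purchase_id text, doc text,
#       PRIMARY KEY (account_id, purchase_id));


class AccountStore:
    """Hypothetical model of the schema: one row per account holding JSON documents."""

    def __init__(self):
        # account_id -> list of JSON strings (JSON 1 ... JSON n)
        self._rows = defaultdict(list)

    def add_purchase(self, account_id, purchase):
        # Every document for a user lands in that user's row, so a write
        # never spans more than one partition.
        self._rows[account_id].append(json.dumps(purchase, sort_keys=True))

    def purchases(self, account_id):
        # One row read returns everything the user owns: no joins needed.
        return [json.loads(doc) for doc in self._rows.get(account_id, [])]


store = AccountStore()
store.add_purchase("account1", {"product_id": "p1", "title": "Game A"})
store.add_purchase("account1", {"product_id": "p2", "title": "Game B"})
print([p["title"] for p in store.purchases("account1")])  # ['Game A', 'Game B']
```

The trade-off is that shared data (product names, images) is copied into many rows, which is why cross-user updates need a separate sync process.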

Solving the Puzzle
• There are a number of ways to
notify C* about account-level changes in the
source of truth; let’s not talk about them for now.
• Let’s talk about queries.

Going deeper
• What the client wants:
– Search, sort, filter
• What we can do:
– Use a secondary index
– Fetch everything in memory and process it
– How about…
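
The second option, fetching a user’s whole row and processing it in memory, can be sketched as follows. It assumes the row has already been read into a list of JSON strings; the helper `query_row` and the field names (`title`, `platform`, `price`) are hypothetical.

```python
import json


def query_row(docs, predicate=None, sort_key=None, reverse=False, limit=None):
    """Filter, sort and page one user's documents entirely in application memory."""
    items = [json.loads(d) for d in docs]  # deserialize the whole row every time
    if predicate is not None:
        items = [it for it in items if predicate(it)]  # filter
    if sort_key is not None:
        items.sort(key=lambda it: it[sort_key], reverse=reverse)  # sort
    return items[:limit] if limit is not None else items  # page


row = [
    json.dumps({"title": "Game A", "platform": "PS4", "price": 59.99}),
    json.dumps({"title": "Game B", "platform": "PS3", "price": 9.99}),
    json.dumps({"title": "Game C", "platform": "PS4", "price": 19.99}),
]
cheapest_ps4 = query_row(row, lambda it: it["platform"] == "PS4", sort_key="price", limit=1)
print(cheapest_ps4[0]["title"])  # Game C
```

This works for any query shape, but every request pays for transferring and deserializing the entire row, which is the cost the later index-based design tries to avoid.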

Solr?
• Can we use it to support our flexible user-level
query requirements?
• Not really:
– Data has high cardinality properties
– And it will not be very fast because Solr is optimized
for a different use case
– It will be another set of systems to support and scale

What can We Do?
• We can index, and writing an indexer sounds like
a lot of fun
• Wait, someone already had the fun and made:

Schema v1: Account 1 | JSON 1 | JSON 2 | … | JSON n
Schema v2: Account 1 | JSON 1 | … | JSON n | Version
Now we can search on anything inside the row that represents the user
The index is small, and it is fast to pull from C*
But we are still pulling all these bytes all the time
And what if two servers write to the same row?
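
One way to picture the per-account index is a row that carries, next to its JSON documents, a small inverted index over those documents and a version number. The sketch below builds a toy inverted index (`field:token` → document positions) in plain Python. It only illustrates the idea under that assumption; it is not the indexer the talk used, which this transcript does not name, and `build_index` and `search` are hypothetical helpers.

```python
import json
from collections import defaultdict


def build_index(docs):
    """Toy inverted index for one account: "field:token" -> sorted doc positions."""
    index = defaultdict(set)
    for pos, raw in enumerate(docs):
        for field, value in json.loads(raw).items():
            for token in str(value).lower().split():  # naive tokenization
                index[f"{field}:{token}"].add(pos)
    # Plain dict of lists, so the index can be serialized into the same row.
    return {term: sorted(positions) for term, positions in index.items()}


def search(row, *terms):
    """AND query: documents that contain every term."""
    hits = None
    for term in terms:
        postings = set(row["index"].get(term, []))
        hits = postings if hits is None else hits & postings
    return [json.loads(row["docs"][p]) for p in sorted(hits or set())]


docs = [
    json.dumps({"title": "Racing Game", "platform": "PS4"}),
    json.dumps({"title": "Puzzle Game", "platform": "PS3"}),
]
# The row as stored: documents, their index, and a version for change detection.
row = {"docs": docs, "index": build_index(docs), "version": 1}
print([d["title"] for d in search(row, "title:game", "platform:ps4")])  # ['Racing Game']
```

Because the index is ordinary serializable data, it can be written next to the documents and pulled back in the same read, so a search never touches another user’s data.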

Distributed Cache?
• It is nice to keep things as close to our MicroService as
possible
• In something that can do fast reads
• And we have a lot of RAM these days
• So we can have a beefy Memcached/Redis/Aerospike
deployment
• And still pay the network penalty and have to think about scaling them
• What if

Soft State Pattern
• Cache lives inside the MicroService, so no network penalty
• Requests for the same user are processed on the same
instance, so we save a network round trip and can apply
some optimizations (e.g. sequencing)
• Changes to state are also replicated to storage (C*) and
are identified by a version number
• If an instance goes down, the user’s session is moved to
another live instance automatically
• It is much easier to scale up Microservices than C*
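
The soft state pattern can be sketched as an in-process cache whose entries carry the version last written to storage. This is a minimal sketch assuming a storage layer that supports compare-and-set on that version (in Cassandra this could be a lightweight transaction such as `UPDATE ... IF version = ?`); `VersionedStore`, `SoftStateCache` and `ConflictError` are hypothetical names, and the store is an in-memory stand-in for C*. The version check is what protects the row if two instances ever write the same account, for example during a failover.

```python
import threading


class ConflictError(Exception):
    """Raised when another writer has already moved the row to a newer version."""


class VersionedStore:
    """In-memory stand-in for C*: account_id -> (version, state) with compare-and-set."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, account_id):
        with self._lock:
            return self._rows.get(account_id, (0, {}))

    def compare_and_set(self, account_id, expected_version, state):
        with self._lock:
            current_version, _ = self._rows.get(account_id, (0, {}))
            if current_version != expected_version:
                return False  # someone else wrote first
            self._rows[account_id] = (expected_version + 1, state)
            return True


class SoftStateCache:
    """Per-instance cache: reads are served from RAM, writes go through to storage."""

    def __init__(self, store):
        self._store = store
        self._cache = {}  # account_id -> (version, state)

    def get(self, account_id):
        if account_id not in self._cache:  # cold: load once from storage
            self._cache[account_id] = self._store.read(account_id)
        return self._cache[account_id]

    def update(self, account_id, change):
        version, state = self.get(account_id)
        new_state = {**state, **change}
        if not self._store.compare_and_set(account_id, version, new_state):
            self._cache.pop(account_id, None)  # stale entry: drop it, caller may retry
            raise ConflictError(account_id)
        self._cache[account_id] = (version + 1, new_state)


store = VersionedStore()
instance1, instance2 = SoftStateCache(store), SoftStateCache(store)
instance2.get("account1")                      # instance2 caches version 0
instance1.update("account1", {"p1": "Game A"})  # storage moves to version 1
try:
    instance2.update("account1", {"p2": "Game B"})  # based on stale version 0
except ConflictError:
    print("stale write rejected")
print(instance1.get("account1"))  # (1, {'p1': 'Game A'})
```

In normal operation routing keeps each user on one instance, so conflicts should be rare; the version check is a safety net rather than the main coordination mechanism.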

Or in Other Words
[Diagram: Instances 1, 2 and 3 each keep a subset of users (Accounts 1 to 6) in memory, together with each account’s version; Cassandra stores every account, Account 1 … Account n, as its JSON documents plus a version.]
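
The slides do not say how requests for a user are pinned to one instance or moved when an instance fails. One common technique is a consistent-hash ring, sketched below as an assumption: each account maps to the first live instance at or after its hash on the ring, so when an instance disappears only its users move, and everyone else keeps their warm in-memory state. `HashRing`, the `vnodes` parameter and the instance names are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit hash, so every router computes the same placement.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping account IDs to service instances."""

    def __init__(self, instances, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per instance, for a smoother spread
        self._ring = []        # sorted list of (hash, instance)
        for instance in instances:
            self.add(instance)

    def add(self, instance):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{instance}#{i}"), instance))

    def remove(self, instance):
        # A failed instance's virtual nodes disappear; the list stays sorted.
        self._ring = [(h, inst) for h, inst in self._ring if inst != instance]

    def owner(self, account_id):
        # First virtual node at or after the account's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._ring, (_hash(account_id),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["instance1", "instance2", "instance3"])
accounts = [f"account{n}" for n in range(1, 1001)]
before = {a: ring.owner(a) for a in accounts}

ring.remove("instance2")  # instance2 goes down
after = {a: ring.owner(a) for a in accounts}
moved = [a for a in accounts if before[a] != after[a]]
print(f"{len(moved)} of {len(accounts)} accounts moved")
```

Only accounts that lived on the failed instance are reassigned, and their new owner reloads them from Cassandra on first access.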

But what if cross-user data changes?
• Product was renamed
• Game image just got updated
• And so on…

Cross User Data sync
• A process that can detect a change in the data
and notify all the affected users
• Simple solution: a reverse lookup table from data
to users
• And you can optimize it
• Users don’t have to see updates at the same time
• It updates the account’s version, so reindexing can
be done lazily
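
The reverse lookup described above can be sketched as a map from a piece of shared data (here a product) to the accounts that reference it. When the product changes, the sync process bumps each affected account’s version instead of rewriting its row, so the owning instance can notice the newer version and reindex lazily. All names (`ReverseLookup`, `on_product_changed`, `needs_reindex`, `account_versions`) are hypothetical, and plain dicts stand in for the MetaData and Accounts tables.

```python
from collections import defaultdict


class ReverseLookup:
    """Hypothetical reverse lookup table: product_id -> account_ids that own it."""

    def __init__(self):
        self._accounts_by_product = defaultdict(set)

    def link(self, product_id, account_id):
        self._accounts_by_product[product_id].add(account_id)

    def accounts_for(self, product_id):
        return sorted(self._accounts_by_product.get(product_id, ()))


def on_product_changed(product_id, lookup, account_versions):
    """Bump the version of every affected account; reindexing happens later."""
    affected = lookup.accounts_for(product_id)
    for account_id in affected:
        account_versions[account_id] = account_versions.get(account_id, 0) + 1
    return affected


def needs_reindex(account_id, indexed_version, account_versions):
    # An instance compares the version its index was built from with the stored one.
    return account_versions.get(account_id, 0) > indexed_version


lookup = ReverseLookup()
lookup.link("product1", "account1")
lookup.link("product1", "account6")
lookup.link("productN", "account26")
account_versions = {"account1": 3, "account6": 1, "account26": 7}

print(on_product_changed("product1", lookup, account_versions))  # ['account1', 'account6']
print(needs_reindex("account1", 3, account_versions))  # True
print(needs_reindex("account26", 7, account_versions))  # False
```

Because only version numbers change, affected users pick up the update whenever their instance next checks, not all at the same moment.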

High level
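[Diagram: MetaData updates flow into a Data-sync microservice, which checks whether the metadata changed (“Is meta updated?”) against MetaData versions, looks up the affected accounts in MetaData Cassandra (e.g. Product 1 → Account 1 … Account 2312, Product n → Account 26 … Account 123), and issues account updates that bump versions in Accounts Cassandra (Account 1 … Account n, each stored as JSONs plus a version). Instances holding accounts such as Account 1 and Account 6 then see the new versions.]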

Was Dustin Wrong?
• Tens of billions of documents
• Average API latency is below 10ms
• Actual search latency is in microseconds
• Hundreds of thousands of documents are indexed per second
• Another system based on the same idea indexes
millions of documents per second on 18 servers
• And most importantly:
– No major incidents in production.

PlayStation is hiring:
hackitects.com
