1,000,000 daily users and no cache (Splash 2011)

Who is that guy?

Jesper Richter-‐Reichhelm
Twi1er: @jrirei

Head of Engineering
wooga
Berlin, Germany

wooga is #3 game developer on Facebook

Wooga has dedicated game teams

Cooming
soon

Flash client sends state changes to backend

Flash client Ruby backend

Social games need to scale quite a bit

400 million PIs / month


14 billion requests / month



100,000 DB operaKons / second



50,000 DB updates / second



50,000 DB updates / second

no cache

A journey to 1,000,000 daily users

Start of the journey
6 weeks of pain
Paradise
Conclusion

October 2009: wooga’s ﬁrst simulaKon game

Instead of PHP we used Ruby

Our database was MySQL

even user ids odd user ids

And we went into the cloud

Master-‐slave replicaKon for DBs worked ﬁne

lb

app app app

db db

We added a few applicaKon servers over Kme

lb

app app app app app app app app app

db db

250K daily users and no problems
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"
Life was good

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

Life was well and I went on a nice vacaKon
TO DO

<picture: Jesper in clot
canyon>

Our bane: MySQL hiccups

(!!"#

'!"#

&!"#

%!"#

$!"#

!"#
!# )# (!# ()# $!# $)# *!# *)# %!#

SQL queries generated by Rubyamf gem

AMF responses to Flash client


Wrong conﬁg...
... so associated data was included, too


Wrong conﬁg...
... so associated data was included, too

=> Easy to ﬁx

More traﬃc using the same cluster

lb


db db

Config tweaks brought us to 300K DAU
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"
Config fixes

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

AcKveRecord’s checks caused 20% extra DB

Checking connecKon state
MySQL process list full of ‘status’ calls

AcKveRecord’s checks caused 20% extra DB

Checking connecKon state
MySQL process list full of ‘status’ calls

=> Fixed by 1 line of code

I/O on MySQL masters sKll was the bo^leneck

New Relic: 60% of all UPDATEs on ‘Kles’ table

Tiles are part of the core game loop

Core game loop
1) plant
2) wait
3) harvest

We started to shard on model, too

Adding new shards

old old
master slave


Adding new shards
1) Setup new masters as slaves of old ones

old old new
master slave master


Adding new shards
1) Setup new masters

old old new new
master slave master slave


Adding new shards
2) Start using new masters

old old new new


Adding new shards
3) Cut replica<on

old old new new


Adding new shards
3) Cut replica<on
4) Truncate

old old new new

4 DB masters and a few more servers

lb

app app app app app app app app

app app app app app app app app

<les <les
db db
db db

Sharding by model brought us to 400K DAU
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"
Shard by model

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

We improved our MySQL setup

RAID-‐0 of EBS volumes



Using XtraDB



Using XtraDB

Tweaking my.cnf

Sharding gem circumvented AR’s internal cache

AcKveRecord caches SQL queries...



... only in our development environment!



... only in our development environment!

=> Fixed by 2 lines of code

I/O sKll was not fast enough

If 2 + 2 is not enough, ...

I/O sKll was not fast enough

If 2 + 2 is not enough, ...

… perhaps 4 + 4 masters will do?

It’s no fun to handle 8+8 MySQL DBs

lb



<les <les
db db
db db

It’s no fun to handle 8+8 MySQL DBs

lb



<les <les <les <les
db db db db
db db db db

At 500K DAU we were at a dead end
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

I/O remained bo^leneck for MySQL UPDATEs

Each DB master could do
about 1000 DB write/s.

I/O remained bo^leneck for MySQL UPDATEs

Each DB master could do
about 1000 DB write/s.

That’s not enough!

Pick the right tool for the job!

Redis is fast but goes beyond simple key/value

Redis is a key-‐value store
Hashes, Sets, Sorted Sets, Lists
Atomic opera<ons like set, get, increment

Redis is fast but goes beyond simple key/value

Redis is a key-‐value store
Hashes, Sets, Sorted Sets, Lists
Atomic opera<ons like set, get, increment

50,000 transacKons/s on EC2
Writes are as fast as reads

Wooga has dedicated game teams

Shelf Kles : An ideal candidate for using

Shelf Kles:
{ plant1 => 184,
plant2 => 141,
plant3 => 130,
plant4 => 112,
… }

Shelf Kles : An ideal candidate for using Redis

Redis Hash
HGETALL
HGETHSET
HINCRBY
…

Migrate on the ﬂy when accessing new model

Migrate on the ﬂy -‐ but only once

true if id could be added
else false

Typical migraKon throughput over 3 days

Migrate on the ﬂy -‐ and clean up later

1. Let migraKon run unKl everything cools down


2. Migrate the rest manually


3. Remove migraKon code


4. Wait unKl no fallback necessary


4. Wait unKl no fallback necessary
5. Remove SQL table

A journey to 1,000,000 daily users

Start of the journey
6 weeks of pain
Paredise (or not?)
Conclusion

Again: Tiles are part of the core game loop

Core game loop
1) plant
2) wait
3) harvest

Size ma^ers for migraKons

MigraKon check overload
Migra<on only on startup

Size ma^ers for migraKons

MigraKon check overload
Migra<on only on startup

Overlooked an edge case
Only migrate 1% of users
Con<nue if everything is ok

In-‐memory DBs don’t like to dump to disk

Dumping to disk
SAVE is blocking
BGSAVE needs free RAM


Dumping to disk
SAVE is blocking

Latency increase by 100%


Dumping to disk
SAVE is blocking

Latency increase by 100%

=> BGSAVE on slaves every 15 minutes

Redis replicaKon starts with a BGSAVE

BGSAVE on master
Slave imports dumped ﬁle

Redis replicaKon starts with a BGSAVE

BGSAVE on master
Slave imports dumped ﬁle

=> No RAM means no new slaves

Redis had a memory fragmenKon problem

44 GB

in 8 days

24 GB

Redis had a memory fragmenKon problem

38 GB
in 3 days
24 GB

If MySQL is a truck
Fast enough
Disk based
Robust

Fast enough disk based robust

If MySQL is a truck, Redis is a race car
Super fast
RAM based
Fragile

Super fast RAM based fragile

Big and staKc data in MySQL, rest goes to Redis

256 GB data 60 GB data
10% writes 50% writes
hSp://www.ﬂickr.com/photos/erix/245657047/

Lots of boxes, but automaKon helps a lot!

lb lb

app app app app app app app app app app app app app



db db db db db redis redis redis redis redis

We reached 1 million daily users!
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"
1,000,000 -‐ Big party!

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

We started archiving inacKve users
&$!!!$!!!"

%$#!!$!!!"

50% DB
%$!!!$!!!"
reducKon

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

We even survived a complete data center loss
&$!!!$!!!"

EBS no
%$#!!$!!!"
more!

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

We improved our MySQL schema on-‐the-‐ﬂy
&$!!!$!!!"

30% DB
%$#!!$!!!"
reducKon

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

Will we reach 2 million daily users?
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

You do not know the future

Plan ahead


Plan ahead

Learn


Plan ahead

Learn

Adapt

EvoluKon every week

EVOLUTION
of sonware
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

EvoluKon every week

EVOLUTION
of sonware

EvoluKon every week, RevoluKon if necessary

REVOLUTION
of sonware

EvoluKon every week, RevoluKon if necessary

REVOLUTION
of sonware
&$!!!$!!!"

%$#!!$!!!"

%$!!!$!!!"

#!!$!!!"

!"
'()*%!" +,-*%!" ./0*%!" +12*%%" '()*%%" +,-*%%" ./0*%%"

Each new game is a revoluKon

Each new game is a revoluKon

Cooming
soon

Works for teams and for companies

!""#$%&"'()"*+,

Thank you!

Jesper Richter-‐Reichhelm
@jrirei

slideshare.net/wooga
wooga.com/jobs

1,000,000 daily users and no cache (Splash 2011)

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (15)

Mehr von Wooga

Mehr von Wooga (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

1,000,000 daily users and no cache (Splash 2011)