Presentation about the Spil Storage Platform (SSP), written in Erlang. This talk was first given at the Erlang User Group Netherlands in July 2012, hosted at Spil Games in Hilversum.
3. Background
Spil Games’ mission: “unite the world in play”
• localized social-gaming platforms
• focus on: teens, girls, and families
• many portals:
• girlsgogames.com
• agame.com
5. Background
• Over 200 countries, 15+ different languages
• On average 85 minutes per month per user
• Over 4000 online games
• 200 million unique users per month
6. Background
• Traditional LAMP stack
• Tweaked over time to keep up with growth
• Reaching limits of current system
• One of the largest problems is the database
7. Problems: the database
• Not all developers are DB experts
• security
• performance
• caching
• Changing requirements
• Difficult to shard the databases
8. Wish list
1. Transparent scalability
• Sharding data
• Scalable applications on top of sharded data
2. Multi-database transactions
• atomic operations across machines
3. Fast enough (low-ish latency, high throughput)
4. Highly available (central system)
5. Can handle large dataset
6. Offer flexibility (trade consistency for speed for instance)
7. Use MySQL (the in-house DB team’s experience)
8. Don’t expose SQL to devs, offer business-specific model
• Storage-specific security measures (character escaping)
9. Allow changes to storage layer without affecting business (versioning)
10. Centralize ownership of caching
10. Solution
• No matching Open Source projects
• So we want a massively scalable, soft real-time, highly available system
• Implement it ourselves: Erlang is the obvious candidate
Not the first to think of this:
• Amazon SimpleDB
• Riak
• Use Open Source where possible
11. Solution: mindset
1. Our system should be always on
2. No global locks
3. Inconsistencies are the norm
• Hardware breaks down (power failures, etc.)
• Version mismatches (upgrading the system is non-atomic)
• State mismatches (adding a new machine)
13. SSP : Overview
• A bucket is a list of records of a specific type: structured data! A bucket can map to one or several MySQL database tables and offers a CRUD-like interface (with filters)
• All data is identified by a unique GID (64 bit integer)
• All requests for a particular GID are handled by one Pipeline process (sequentially)
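As a minimal sketch of the idea (hypothetical names; the real system uses Pipeline Factories and a lookup ring, and avoids the spawn race this toy version has): a table maps each GID to its pipeline process, and every request for that GID is funnelled through that single process, so requests are handled sequentially per GID.

```erlang
-module(pipeline_sketch).
-export([start/0, request/2]).

%% Create the (public) lookup table mapping GID -> pipeline pid.
start() ->
    ets:new(pipelines, [named_table, public, set]),
    ok.

%% Run Fun inside the pipeline process that owns this GID; all requests
%% for one GID are therefore serialized.
request(Gid, Fun) ->
    Pid = case ets:lookup(pipelines, Gid) of
              [{Gid, P}] -> P;
              [] ->
                  P = spawn(fun loop/0),
                  ets:insert(pipelines, {Gid, P}),
                  P
          end,
    Ref = make_ref(),
    Pid ! {run, self(), Ref, Fun},
    receive {Ref, Result} -> Result end.

%% The pipeline mailbox is the serialization point: one request at a time.
loop() ->
    receive
        {run, From, Ref, Fun} ->
            From ! {Ref, Fun()},
            loop()
    end.
```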
15. SSP: Pipeline
• Why do we need Pipelines?
• Sequential = bottleneck !?!
• Don’t you guys know Erlang is about PARALLELIZING work?
16. SSP: Pipeline
• Drawbacks:
• For hotspots (a game with a gazillion users, say), sequential (read) access is indeed bad
• Optimization: allow dirty reads (try the local cache first, outside the pipeline); other solutions are possible
• Advantages:
• Facilitates scalability (no global locks, but per bucket/GID sync)
• Pipelines make multi-database consistency easier
• Requests to most GIDs (users) are evenly distributed
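The dirty-read optimization can be sketched as follows (hypothetical helper names; `cache_lookup/1` stands in for the local Memcached instance and `pipeline_get/1` for the consistent, serialized per-GID path):

```erlang
-module(dirty_read_sketch).
-export([get/1]).

%% Stand-ins for the real backends: a local cache that may miss, and a
%% consistent read routed through the GID's pipeline process.
cache_lookup(_Gid) -> miss.                 %% pretend the cache misses
pipeline_get(Gid) -> {ok, {record, Gid}}.   %% serialized, consistent path

%% Dirty read: try the local cache first, outside the pipeline. Only on
%% a miss do we pay for the sequential per-GID pipeline access.
get(Gid) ->
    case cache_lookup(Gid) of
        {ok, Value} -> {ok, Value};
        miss        -> pipeline_get(Gid)
    end.
```

The trade-off is explicit: a cache hit may return slightly stale data, which is acceptable for hot read paths.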
18. SSP: Bucket
• Each bucket is an OTP application
• Buckets are largely generated
• XML -> SQL + PIQI -> Erlang
– Using XSLT
– Piqic
19. Piqi?
• Piqi is
• a data definition language
• a cross-language data serialization system compatible with Protocol Buffers
• Piqi-RPC — an RPC-over-HTTP system for Erlang
• Would be better if transport was pluggable
• http://piqi.org/
23. SSP: bucket implementation
• bucketX.erl
– include_lib("…/bucketX_accessors.hrl")
– verify_record(R)
– start/0 and start_link/0
– init/1
– get_fun(Version), del_fun(V), insert_fun(V),…
• bucketX_v1.erl
– del, insert, … (Gid, Shard, Filters)
– get mysql pool
– build some SQL
– emysql:execute(Poolname, Sql)
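The `get_fun(Version)`-style dispatch from the listing above can be sketched like this (hypothetical module; in the real layout the per-version implementations are separate `bucketX_vN` modules that build SQL and call `emysql:execute/2`):

```erlang
-module(bucket_dispatch_sketch).
-export([get_fun/1, get/2]).

%% Stand-ins for per-version implementations (bucketX_v1, bucketX_v2).
get_v1(Gid) -> {v1, Gid}.
get_v2(Gid) -> {v2, Gid}.

%% bucketX.erl-style dispatch: resolve a version to the fun implementing it.
get_fun(1) -> fun get_v1/1;
get_fun(2) -> fun get_v2/1.

%% The stable interface: callers never see which version ran.
get(Version, Gid) ->
    (get_fun(Version))(Gid).
```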
24. SSP: Versions
1. A bucket is versioned. The interface of a bucket is stable, but the implementation can vary
2. We can go up or down a version, migration is automatic
• Mirror-mode is introduced so we can write to multiple versions (but read from only one version)
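Mirror-mode can be sketched as follows (hypothetical `store/3` and `fetch/2` helpers; the real buckets issue version-specific SQL via emysql): writes fan out to every mirrored version, reads come only from the active one.

```erlang
-module(mirror_mode_sketch).
-export([write/3, read/2]).

%% Mirror-mode write: the record is stored under every mirrored version,
%% so a later version switch finds the data already in place.
write(Gid, Record, Versions) ->
    [{V, store(V, Gid, Record)} || V <- Versions].

%% Reads only ever touch the single active version.
read(Gid, ActiveVersion) ->
    fetch(ActiveVersion, Gid).

%% Hypothetical per-version storage calls.
store(Version, Gid, Record) -> {stored, Version, Gid, Record}.
fetch(Version, Gid) -> {ok, Version, Gid}.
```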
25. SSP: Shards (storage level)
1. GIDs (e.g. users) are sharded automatically.
• Each version might have multiple shards
2. Redundancy (of data) is handled by MySQL
{bucket, GID} -> {Version, Shard} mapping
• Version default: taken from config
• Shard default: the rule GID % shards
• The actual version/shard per GID is stored in the DB (cached)
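The {bucket, GID} -> {Version, Shard} lookup can be sketched like this (the `Overrides` map is a hypothetical stand-in for the per-GID mapping the real system stores in the DB and caches):

```erlang
-module(shard_sketch).
-export([locate/3]).

%% Resolve the {Version, Shard} for a GID: use the stored per-GID
%% mapping if one exists, otherwise fall back to the defaults
%% (version from config, shard from the rule GID rem NumShards).
locate(Gid, {DefaultVersion, NumShards}, Overrides) ->
    case maps:find(Gid, Overrides) of
        {ok, {Version, Shard}} -> {Version, Shard};
        error -> {DefaultVersion, Gid rem NumShards}
    end.
```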
26. SSP: Cache
• Each node has a private Memcached instance
• We store all data for a GID/bucket in this cache
• Filters applied after retrieving data from cache
• Don’t change data in storage outside of the SSP!
29. Challenge: controlled shutdown node
How do we shut down a node without losing jobs?
• Shutdown bucketX application on a node
• stop pipeline factories on this node (for bucketX)
• hand over work to other PF (on other nodes)
– couple of mnesia ring reads
– move ETS table contents to new PF
– remember which PF took over (so we can forward)
• If we go to another node, clone Pipeline (gen2 pri)
• remove this node from the lookup ring
• all PFs fix their hash range based on ring
• Because there is a race condition when handing over many (non-contiguous) blocks to one PF:
• sleep a while (actually, wait for the pipeline handovers)
30. Note: shutdown application
• If you terminate an application, all processes that were started (even if not linked) are terminated!
• This is a bit hidden in the documentation of application:start/2 and stop/1
• so we need to explicitly set the group_leader to something that never shuts down:

init(#state{} = S) ->
    group_leader(whereis(init), self()),
    {ok, S}.
31. Challenge: shutdown pipeline
• The Pipeline process that we spawn per GID needs to shut down when done (to use less memory)
• When is it actually done?
• Work might be assigned to the Pipeline just when the Pipeline decides it is done: race conditions!
32. Challenge: shutdown pipeline (2)
• All requests for a GID are handled by a single Pipeline Factory
• The pipeline issues a ‘work done’ command to the PF with a ‘CommandCounter’
• The PF maintains an ETS table
• It looks up whether the registered CommandCounter for that GID matches the reported number
• If so: it tells the Pipeline to die
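The counter handshake can be sketched as follows (hypothetical names; only the comparison logic is shown, not the message flow): the PF counts commands per GID, and a pipeline reporting ‘work done’ is only allowed to die if no further commands were assigned in the meantime.

```erlang
-module(pf_counter_sketch).
-export([new/0, assign/2, work_done/3]).

%% The Pipeline Factory keeps a per-GID command counter in ETS.
new() ->
    ets:new(pf_counters, [set, public]).

%% Each command assigned to a GID's pipeline bumps its counter
%% (creating the entry on first use).
assign(Tab, Gid) ->
    ets:update_counter(Tab, Gid, 1, {Gid, 0}).

%% The pipeline reports the counter of the last command it executed.
%% If it still matches the registered counter, nothing raced in and the
%% pipeline may die; otherwise it must keep running.
work_done(Tab, Gid, ReportedCount) ->
    case ets:lookup(Tab, Gid) of
        [{Gid, ReportedCount}] -> die;
        _                      -> keep_running
    end.
```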
33. Challenge: high uptime
• We want continuous usage of SSP
– Even while upgrading bucket versions
– So there can be multiple versions running
simultaneously
• Take care when creating closures
• Atomic behavior per GID
36. Performance
• Currently we run SSP in ‘shadow’ mode, so no real data yet. Making realistic benchmarks is quite a lot of work.
• Latency (local machine):
– 6-26ms to do a GET request on a primary key (cache miss)
– 0.6ms with a cache hit
– Cache stores Erlang terms currently (term_to_binary)
• Always read from cache
– Does not detect changes in storage done outside SSP
37. Performance
• Requests (local):
– Getting from cache at about 13.5K req/sec
• elibs_benchmark:test_fun(gidlog_get, fun() -> gidlog:get(123456) end, 10, 10000).
– Getting from MySQL at about 615 req/sec (incl. cache miss)
• elibs_benchmark:test_fun(gidlog_get, fun() -> {_,_,C} = os:timestamp(), gidlog:get(C) end, 10, 100).
– ~2 SSP machines can saturate a MySQL machine
– 8K writes/sec for 2 MySQL + 4 SSP machines (old hardware)
39. Lessons learned (1)
• There are many good Open Source libraries
• Emysql: we have added transaction support
• Eep0018: fast JSON encoder/decoder (based on yajl)
• Estatsd: graphite-capable monitoring
• Poolboy: Erlang worker pool factory (used for memcached)
• Twig/Lager: logging (syslog)
40. Lessons learned (2)
• Mnesia is great to replicate state across machines
• Faster local lookups
• Less error prone
• Encapsulate all Mnesia usage in a module
• Adding nodes to Mnesia
• Use ram_copies
• Transactions are great
• We deploy an Erlang cluster (with Mnesia replication) only inside a single data center
• Not across unreliable connections!
41. Lessons learned (3)
• XML + XSD + XSLT are great to define API
• They might have a bad name, but work great
• Can transform in any other format
• Used to generate documentation
Todo:
• generate more code (Buckets)
• write a gen_bucket behaviour
• don’t start with generating code
42. Lessons learned (4)
• Rebar is great
• Compilation is pretty convenient, but the best part is the “dependencies”
• Also the worst part
• We have proposed two improvements:
• Allow different projects to share dependencies (major speedup for compiling)
• Smarter version conflict resolution (semantic versioning: [“>= 1.3.1”, “< 2.0.0”])
43. Lessons learned (5)
• We use #records{} for all APIs
– Piqi input/output
– Stable and well-defined
– Will move to ProtocolBuffers
• Use OTP applications everywhere
– Start/stop stuff
– See started apps: application:which_applications()
• Terminate on fatal errors
– Memcached down: terminate all buckets, don’t try to recover (prevents overloading the DB)