SSP : Spil Storage Platform
Thijs Terlouw – Senior Backend Engineer

12th July 2012
Schedule

1. Background
   • Problems
   • Wish list
2. Solution
3. Challenges
4. Performance
5. Lessons learned




Background

Mission of Spil Games: “unite the world in play”

•   localized social-gaming platforms
•   focus on: teens, girls and families
•   many portals:
    • girlsgogames.com
    • agame.com




Background


•   Over 200 countries, 15+ different languages

•   On average 85 minutes per month per user

•   Over 4000 online games

•   200 million unique users per month



Background

•   Traditional LAMP stack
•   Tweaked over time to keep up with growth
•   Reaching limits of current system
•   One of the largest problems is the database




Problems: the database

• Not all developers are DB experts
  • security
  • performance
  • caching
• Changing requirements
• Difficult to shard the databases




Wish list
1.   Transparent scalability
     •   Sharding data
     •   Scalable applications on top of sharded data
2.   Multi-database transactions
     •   Atomic operations across machines
3.   Fast enough (low-ish latency, high throughput)
4.   Highly available (central system)
5.   Can handle a large dataset
6.   Offer flexibility (for instance, trade consistency for speed)
7.   Use MySQL (our in-house DB team has experience with it)
8.   Don’t expose SQL to devs; offer a business-specific model
     •   Storage-specific security measures (character escaping)
9.   Allow changes to the storage layer without affecting business logic (versioning)
10.  Centralize ownership of caching



Solution

• No matching Open Source projects
• So we want a massively scalable, soft real-time,
  highly available system
• Implement it ourselves: Erlang is the obvious candidate.
  We are not the first to think of this:
  • Amazon SimpleDB
  • Riak
• Use Open Source where possible



Solution : mindset

1. Our system should be always on
2. No global locks
3. Inconsistencies are the norm
  • Hardware breaks down (power failures etc.)
  • Version mismatches (upgrading the system is not atomic)
  • State mismatches (adding a new machine)




SSP: Spil Storage Platform

[Diagram; surviving labels: Bucket, Buckets, Erlang]

SSP : Overview

•   Bucket is a list of records of a specific type.
    Structured data! A bucket can map to one or several
    MySQL database tables and offers a CRUD-like
    interface (with filters)
•   All data is identified by a unique GID (64-bit integer)
•   All requests for a particular GID are handled by one
    Pipeline process (sequentially)




SSP: Overview

[Overview diagram]

SSP: Pipeline

•   Why do we need Pipelines?
    • Sequential = bottleneck !?!
    • Don’t you guys know Erlang is
      about PARALLELIZING work?




SSP: Pipeline

•   Drawbacks:
    • For hotspots (a game with a gazillion […]), sequential (read)
      access is indeed bad
    • Optimization: allow dirty reads (try the local cache first, outside
      the pipeline); other solutions are possible.

•   Advantages:
    • Facilitates scalability (no global locks, only per-bucket/GID synchronization)
    • Pipelines make multi-database consistency easier

Requests to most GIDs (users) are evenly distributed


SSP: Finding the Pipeline




      {bucket, phash2(Gid, Ringsize)}
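
A minimal sketch of the lookup key above, assuming the ring is simply a fixed number of slots. The module name `pipeline_lookup` and the function `slot/3` are made up for illustration; only `erlang:phash2/2` is a real BIF. It returns an integer in the range 0..Range-1, so the same GID always maps to the same slot as long as the ring size does not change.

```erlang
-module(pipeline_lookup).
-export([slot/3]).

%% Map a GID to a ring slot for one bucket (hypothetical helper).
%% erlang:phash2(Term, Range) yields a value in 0..Range-1.
slot(Bucket, Gid, RingSize) when is_integer(RingSize), RingSize > 0 ->
    {Bucket, erlang:phash2(Gid, RingSize)}.
```

Because the slot depends on the ring size, changing the size remaps GIDs; the slides handle that through the handover steps described under Challenges.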




SSP: Bucket

• Each bucket is an OTP application
• Buckets are largely generated
• XML -> SQL + Piqi -> Erlang
   – XML to SQL and Piqi using XSLT
   – Piqi to Erlang using piqic




Piqi?

•   Piqi is
    • a data definition language
    • a cross-language data serialization system
      compatible with Protocol Buffers
    • Piqi-RPC: an RPC-over-HTTP system for Erlang
       • Would be better if the transport was pluggable
    • http://piqi.org/




SSP: Example Bucket XML definition

[Figure: example bucket XML definition]

gidlog.piqi

[Figure: gidlog.piqi listing] Mostly templated via XSLT

gidlog_accessors.hrl

• Parse the piqi-generated hrl with epp:parse_file/3
• Mostly templated
• Added as a dependency

SSP: bucket implementation

•   bucketX.erl
    –   -include_lib("…/bucketX_accessors.hrl")
    –   verify_record(R)
    –   start/0 and start_link/0
    –   init/1
    –   get_fun(Version), del_fun(V), insert_fun(V),…

•   bucketX_v1.erl
    –   del, insert, … (Gid, Shard, Filters)
    –   get mysql pool
    –   build some SQL
    –   emysql:execute(Poolname, Sql)
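
The split between the bucket module and a versioned implementation module could look roughly like this. Everything here is an assumption made for illustration: the module name, the table layouts and the SQL text are invented, and the sketch only builds the SQL string instead of calling `emysql:execute/2`. Formatting the GID and shard with `~b` accepts integers only, which is one simple way to keep unescaped input out of the query.

```erlang
-module(bucket_x_sketch).
-export([get_fun/1, get/3]).

%% Return the 'get' implementation for a storage version.
get_fun(1) -> fun v1_get/3;
get_fun(2) -> fun v2_get/3.

%% Public entry point: callers do not need to know the version layout.
get(Version, Gid, Shard) ->
    (get_fun(Version))(Gid, Shard, []).

%% Hypothetical v1 layout: one table per shard.
v1_get(Gid, Shard, _Filters) when is_integer(Gid), is_integer(Shard) ->
    lists:flatten(io_lib:format(
        "SELECT * FROM bucketx_~b WHERE gid = ~b", [Shard, Gid])).

%% Hypothetical v2 layout: a single table with a shard column.
v2_get(Gid, Shard, _Filters) when is_integer(Gid), is_integer(Shard) ->
    lists:flatten(io_lib:format(
        "SELECT * FROM bucketx_v2 WHERE shard = ~b AND gid = ~b", [Shard, Gid])).
```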


SSP: Versions

1. A bucket is versioned. The interface of a bucket is
    stable, but the implementation can vary
2. We can go up or down a version; migration is automatic
   • Mirror mode is introduced so we can write to multiple
      versions (but read from only one version)
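
Mirror mode can be pictured with a small sketch, assuming a configuration that names one read version and a list of write versions. A plain map stands in for the MySQL storage, and the module name and config shape are invented for this example.

```erlang
-module(mirror_sketch).
-export([write/4, read/3]).

%% Config is assumed to look like #{read => V, write => [V1, V2, ...]}.
%% Store is a map keyed by {Version, Gid}; it stands in for the real storage.

%% Write the record to every version listed under 'write'.
write(#{write := Versions}, Gid, Record, Store) ->
    lists:foldl(fun(V, Acc) -> maps:put({V, Gid}, Record, Acc) end,
                Store, Versions).

%% Read from the single version listed under 'read'.
read(#{read := Version}, Gid, Store) ->
    maps:find({Version, Gid}, Store).
```

With `#{read => 1, write => [1, 2]}` the new version fills up while reads still come from the old one; switching `read` to 2 afterwards completes the move.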




SSP: Shards (storage level)

1. GIDs (e.g. users) are sharded automatically.
   • Each version might have multiple shards
2. Redundancy (of data) is handled by MySQL


{bucket, GID} -> {Version, Shard} mapping
   • Version default: config
   • Shard default: default rule GID % shards
   • Actual version/shard per GID stored in DB (cached)
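
The mapping rules above can be sketched as a single lookup with a fallback. The `Overrides` map stands in for the cached database table of per-GID assignments, and the module name and config keys are assumptions for this example.

```erlang
-module(shard_map).
-export([locate/3]).

%% Defaults is assumed to be #{version => V, shards => N}.
%% Overrides maps {Bucket, Gid} to an explicit {Version, Shard}.
locate({Bucket, Gid}, #{version := DefVersion, shards := NumShards}, Overrides)
  when is_integer(Gid), Gid >= 0, is_integer(NumShards), NumShards > 0 ->
    case maps:find({Bucket, Gid}, Overrides) of
        {ok, {Version, Shard}} -> {Version, Shard};      % stored assignment wins
        error -> {DefVersion, Gid rem NumShards}         % default rule: GID % shards
    end.
```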



SSP: Cache

•   Each node has a private Memcached instance
•   We store all data for a GID/bucket in this cache
    • Filters applied after retrieving data from cache
•   Don't change data in storage outside of the SSP!
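
A minimal sketch of "store everything for a GID, filter afterwards", with a plain map standing in for the per-node Memcached instance. The module and function names are invented. Values are stored with `term_to_binary/1`, matching the note under Performance that the cache holds Erlang terms.

```erlang
-module(cache_sketch).
-export([store/4, fetch/4]).

%% Store all records of a GID/bucket under one key.
store(Bucket, Gid, Records, Cache) when is_list(Records) ->
    maps:put({Bucket, Gid}, term_to_binary(Records), Cache).

%% Fetch the whole list, then apply the filter on the decoded records.
fetch(Bucket, Gid, Filter, Cache) when is_function(Filter, 1) ->
    case maps:find({Bucket, Gid}, Cache) of
        {ok, Bin} -> {hit, lists:filter(Filter, binary_to_term(Bin))};
        error     -> miss
    end.
```

Since the cache is the only copy consulted on a hit, a row changed directly in MySQL would stay invisible until the entry is replaced, which is why the last bullet above matters.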




Challenge: controlled shutdown of a node
How do we shut down a node without losing jobs?
• Shut down the bucketX application on a node
  • Stop the pipeline factories (PFs) on this node (for bucketX)
      • Hand over work to other PFs (on other nodes)
          – a couple of Mnesia ring reads
          – move the ETS table contents to the new PF
          – remember which PF took over (so we can forward)
      • If we go to another node, clone the Pipeline (gen2 pri)
  • Remove this node from the lookup ring
  • All PFs fix their hash range based on the ring
      • Because there is a race condition when handing over many
         (non-contiguous blocks) to one PF
  • Sleep a while (actually: wait for the pipeline handovers)


Note: shutdown application

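•   If you terminate an application, all processes that
    were started by it (even if not linked) are terminated,
    because they share the application's group leader!
•   This is a bit hidden in the documentation of
    application:start/2 and application:stop/1
•   So we need to explicitly set the group_leader to
    something that never shuts down:

    init(#state{} = S) ->
      %% Move this process under init's group leader so that
      %% stopping the application does not kill it.
      group_leader(whereis(init), self()),
      {ok, S}.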
Challenge: shutdown pipeline

•   The Pipeline process that we spawn per GID needs
    to shut down when done (less memory)
•   When is it actually done?
•   Work might be assigned to the Pipeline just when
    the Pipeline decides it is done: race conditions!




Challenge: shutdown pipeline (2)

• All requests for a GID are handled by a single
  Pipeline Factory
• The pipeline will issue a ‘work done’ command to
  the PF with a ‘CommandCounter’
• PF maintains an ETS table
  • Lookup if the registered CommandCounter for
    that GID is the same as the reported number
  • If so: tell the Pipeline to die
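
The counter check can be sketched as follows, assuming the Pipeline Factory bumps a per-GID counter every time it hands work to a Pipeline. The module and function names are made up; `ets:update_counter/4` with a default object is a real call. In the real system a single PF process would own the table, so the lookup and delete below would not race with a new assignment.

```erlang
-module(pf_counter).
-export([new/0, assign/2, work_done/3]).

new() ->
    ets:new(pf_counters, [set, public]).

%% PF side: count each command handed to the Pipeline of this GID.
%% Returns the new CommandCounter value.
assign(Tab, Gid) ->
    ets:update_counter(Tab, Gid, 1, {Gid, 0}).

%% Pipeline reports the counter of the last command it finished.
%% Only if nothing was assigned since then may the Pipeline stop.
work_done(Tab, Gid, Reported) ->
    case ets:lookup(Tab, Gid) of
        [{Gid, Reported}] -> ets:delete(Tab, Gid), stop;
        _ -> continue
    end.
```

If work arrives just as the Pipeline reports "done", the stored counter is already higher than the reported one, so the answer is `continue` and no command is lost.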



Challenge: high uptime

• We want continuous usage of SSP
  – Even while upgrading bucket versions
  – So there can be multiple versions running
    simultaneously
• Take care when creating closures
• Atomic behavior per GID




Challenge: quite complex system




Performance

•   Currently we run SSP in “shadow” mode, so there is no real
    data yet. Making realistic benchmarks is quite a lot
    of work.

•   Latency (local machine):
    – 6-26 ms to do a GET request on a primary key (cache miss)
    – 0.6 ms with a cache hit
    – The cache currently stores Erlang terms (term_to_binary)
•   Always read from cache
    – Does not detect changes in storage made outside SSP

Performance

•   Requests (local):
    – Getting from cache at about 13.5K req/sec
       • elibs_benchmark:test_fun(gidlog_get, fun() ->
         gidlog:get(123456) end, 10, 10000).
    – Getting from MySQL at about 615 req/sec, including the cache miss
       • elibs_benchmark:test_fun(gidlog_get, fun() -> {_,_,C} =
         os:timestamp(), gidlog:get(C) end, 10, 100).
    – ~2 SSP machines can saturate a MySQL machine
    – 8K writes/sec for 2 MySQL + 4 SSP machines (old
      hardware)
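
`elibs_benchmark` is an in-house module, so its API is not reproduced here. As a stand-in, the sketch below shows one common way to get a requests-per-second figure with `timer:tc/1`: run a fun a fixed number of times in several processes and divide by the wall-clock time. Treating the two numeric arguments as "processes" and "iterations per process" is an assumption, not something the slides state.

```erlang
-module(bench_sketch).
-export([run/3]).

%% Run Fun Iterations times in each of Procs processes.
%% Returns the observed requests per second as a float.
run(Fun, Procs, Iterations) when Procs > 0, Iterations > 0 ->
    Parent = self(),
    {Micros, ok} = timer:tc(fun() ->
        Pids = [spawn_link(fun() ->
                    loop(Fun, Iterations),
                    Parent ! {done, self()}
                end) || _ <- lists:seq(1, Procs)],
        %% Wait until every worker has reported back.
        lists:foreach(fun(Pid) -> receive {done, Pid} -> ok end end, Pids)
    end),
    (Procs * Iterations) / (max(Micros, 1) / 1000000).

loop(_Fun, 0) -> ok;
loop(Fun, N) -> Fun(), loop(Fun, N - 1).
```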



Lessons learned (1)

•   There are many good Open Source libraries
    • Emysql: we have added transaction support
    • Eep0018: fast JSON encoder/decoder (yajl, a C library)
    • Estatsd: Graphite-capable monitoring
    • Poolboy: Erlang worker pool factory (for
      Memcached)
    • Twig/Lager: logging (syslog)




Lessons learned (2)

• Mnesia is great to replicate state across machines
  • Faster local lookups
  • Less error prone
• Encapsulate all Mnesia usage in a module
  • Adding nodes to Mnesia
  • Use ram_copies
  • Transactions are great
• We deploy an Erlang cluster (with Mnesia
  replication) only inside a single data center
  • Not across unreliable connections!
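
A small sketch of what "encapsulate all Mnesia usage in a module" could look like, assuming a single replicated table that maps ring slots to nodes. The module name, table name and record are invented; the Mnesia calls themselves (`create_table/2`, `change_config/2`, `add_table_copy/3`, `transaction/1`, `dirty_read/2`) are standard.

```erlang
-module(ring_store).
-export([init/0, add_node/1, put_owner/2, owner/1]).

-record(ring, {slot, node}).

%% Start Mnesia and create the in-memory table on this node.
init() ->
    ok = mnesia:start(),
    case mnesia:create_table(ring, [{ram_copies, [node()]},
                                    {attributes, record_info(fields, ring)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, ring}} -> ok
    end,
    ok = mnesia:wait_for_tables([ring], 5000).

%% Connect another node and give it its own RAM copy of the table.
add_node(Node) ->
    {ok, _} = mnesia:change_config(extra_db_nodes, [Node]),
    mnesia:add_table_copy(ring, Node, ram_copies).

%% Writes go through a transaction so all replicas agree.
put_owner(Slot, Node) ->
    {atomic, ok} = mnesia:transaction(
        fun() -> mnesia:write(#ring{slot = Slot, node = Node}) end),
    ok.

%% Reads are local and fast because every node holds a RAM copy.
owner(Slot) ->
    case mnesia:dirty_read(ring, Slot) of
        [#ring{node = Node}] -> {ok, Node};
        [] -> not_found
    end.
```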
Lessons learned (3)

•   XML + XSD + XSLT are great for defining an API
    • They might have a bad name, but they work great
    • Can be transformed into any other format
    • Used to generate documentation

Todo:
• generate more code (Buckets)
• write a gen_bucket behaviour
• don't start with generating code


Lessons learned (4)

• Rebar is great
  • Compilation is pretty convenient, but the best part
    is the “dependencies”
  • Also the worst part
• We have proposed two improvements:
  • Allow different projects to share dependencies
    (major speedup for compiling)
  • Smarter version conflict resolution (semantic
    versioning: [ ">= 1.3.1", "< 2.0.0" ] )
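
To make the constraint list concrete, here is a small sketch that checks a version against constraints written in the form shown above. It is not Rebar code: the module name is invented, and it assumes purely numeric versions with the same number of parts (such as 1.3.1), ignoring pre-release tags.

```erlang
-module(semver_sketch).
-export([satisfies/2]).

%% True if Version meets every constraint, e.g. [">= 1.3.1", "< 2.0.0"].
satisfies(Version, Constraints) ->
    V = parse(Version),
    lists:all(fun(C) -> check(V, C) end, Constraints).

check(V, Constraint) ->
    [Op, Wanted] = string:lexemes(Constraint, " "),
    W = parse(Wanted),
    case Op of
        ">=" -> V >= W;
        ">"  -> V > W;
        "<=" -> V =< W;
        "<"  -> V < W;
        "==" -> V =:= W
    end.

%% "1.10.0" -> [1, 10, 0]; lists of integers compare element by element.
parse(Str) ->
    [list_to_integer(Part) || Part <- string:lexemes(Str, ".")].
```

Comparing parsed integer lists rather than strings is what makes 1.10.0 sort after 1.3.1.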


Lessons learned (5)

• We use #records{} for all APIs
  – Piqi input/output
  – Stable and well-defined
  – Will move to Protocol Buffers
• Use OTP applications everywhere
  – Start/stop stuff
  – See started apps: application:which_applications()
• Terminate on fatal errors
  – Memcached down: terminate all buckets, don't
    try to recover (prevents overloading the DB)
Lessons learned (6)

• You need to add an admin/monitoring interface




Open Source



  We will not open-source SSP, but we do actively
  contribute to libraries used in SSP (so far Emysql,
  Rebar, Piqi)




THANKS!
           Questions?
Thijs.Terlouw@spilgames.com




Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Spil Storage Platform (Erlang) @ EUG-NL

  • 1. SSP: Spil Storage Platform • Thijs Terlouw, Senior Backend Engineer • 12th July 2012
  • 2. Schedule 1. Background • Problems • Wish list 2. Solution 3. Challenges 4. Performance 5. Lessons learned
  • 3. Background • Mission of Spil Games: “unite the world in play” • localized social-gaming platforms • focus on: teens, girls and family • many portals: • girlsgogames.com • agame.com
  • 4. (no text on this slide)
  • 5. Background • Over 200 countries, 15+ different languages • On average 85 minutes per month per user • Over 4000 online games • 200 million unique users per month
  • 6. Background • Traditional LAMP stack • Tweaked over time to keep up with growth • Reaching the limits of the current system • One of the largest problems is the database
  • 7. Problems: the database • Not all developers are DB experts • security • performance • caching • Changing requirements • Difficult to shard the databases
  • 8. Wish list 1. Transparent scalability • Sharding data • Scalable applications on top of sharded data 2. Multi-database transactions • atomic operations across machines 3. Fast enough (low-ish latency, high throughput) 4. Highly available (central system) 5. Can handle a large dataset 6. Offer flexibility (trade consistency for speed, for instance) 7. Use MySQL (experience of the in-house DB team) 8. Don’t expose SQL to devs, offer a business-specific model • Storage-specific security measures (character escaping) 9. Allow changes to the storage layer without affecting business (versioning) 10. Centralize ownership of caching
  • 9. Schedule 1. Background 2. Solution 3. Challenges 4. Performance 5. Lessons learned
  • 10. Solution • No matching Open Source projects • So we want a massively scalable, soft real-time, highly available system • Implement it ourselves: Erlang is the obvious candidate • Not the first to think of this: • Amazon SimpleDB • Riak • Use Open Source where possible
  • 11. Solution: mindset 1. Our system should be always on 2. No global locks 3. Inconsistencies are the norm • Hardware breaks down (power failures etc.) • Version mismatches (upgrading the system is not atomic) • State mismatches (adding a new machine)
  • 12. SSP: Spil Storage Platform (diagram; labels: Bucket, Buckets: Erlang)
  • 13. SSP: Overview • A bucket is a list of records of a specific type. Structured data! A bucket can map to one or several MySQL database tables and offers a CRUD-like interface (with filters) • All data is identified by a unique GID (64-bit integer) • All requests for a particular GID are handled by one Pipeline process (sequentially)
  • 14. SSP: Overview (diagram)
  • 15. SSP: Pipeline • Why do we need Pipelines? • Sequential = bottleneck!? • Don’t you guys know Erlang is about PARALLELIZING work?
  • 16. SSP: Pipeline • Drawbacks: • For hotspots (e.g. an extremely popular game), sequential (read) access is indeed bad • Optimization: allow dirty reads (try the local cache first, outside the pipeline); other solutions are possible • Advantages: • Facilitates scalability (no global locks, but per bucket/GID sync) • Pipelines make multi-database consistency easier • Requests to most GIDs (users) are evenly distributed
  • 17. SSP: Finding the Pipeline • {bucket, phash2(Gid, Ringsize)}
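
The lookup key on slide 17 can be sketched in Erlang as below. This is only an illustration under assumptions, not SSP's code: the module name `pipeline_lookup` and the function `slot/3` are hypothetical. The one real call is the BIF `erlang:phash2/2`, which hashes any term to an integer in the range 0..Range-1.

```erlang
-module(pipeline_lookup).
-export([slot/3]).

%% Hypothetical helper (not from the slides): builds the lookup key
%% {Bucket, Slot} for a GID. erlang:phash2/2 always returns an integer
%% in 0..RingSize-1, so one GID maps to the same slot for as long as
%% RingSize stays the same.
-spec slot(atom(), non_neg_integer(), pos_integer()) ->
          {atom(), non_neg_integer()}.
slot(Bucket, Gid, RingSize) when is_integer(RingSize), RingSize > 0 ->
    {Bucket, erlang:phash2(Gid, RingSize)}.
```

Because the slot depends only on the GID and the ring size, every node computes the same owner for a GID without coordinating, which is presumably what lets the design avoid global locks.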
  • 18. SSP: Bucket • Each bucket is an OTP application • Buckets are largely generated • XML -> SQL + PIQI -> Erlang – using XSLT – piqic
  • 19. Piqi? • Piqi is • a data definition language • a cross-language data serialization system compatible with Protocol Buffers • Piqi-RPC: an RPC-over-HTTP system for Erlang • Would be better if the transport was pluggable • http://piqi.org/
  • 20. SSP: Example Bucket XML definition
  • 21. gidlog.piqi • Mostly templated via XSLT
  • 23. SSP: bucket implementation • bucketX.erl – include_lib(“…/bucketX_accessors.hrl”) – verify_record(R) – start/0 and start_link/0 – init/1 – get_fun(Version), del_fun(V), insert_fun(V), … • bucketX_v1.erl – del, insert, … (Gid, Shard, Filters) – get the MySQL pool – build some SQL – emysql:execute(Poolname, Sql)
  • 24. SSP: Versions 1. A bucket is versioned. The interface of a bucket is stable, but the implementation can vary 2. We can go up or down a version; migration is automatic • Mirror mode is introduced so we can write to multiple versions (but read from only one version)
  • 25. SSP: Shards (storage level) 1. GIDs (e.g. users) are sharded automatically • Each version might have multiple shards 2. Redundancy (of data) is handled by MySQL • {bucket, GID} -> {Version, Shard} mapping • Version default: config • Shard default: default rule GID % shards • Actual version/shard per GID stored in DB (cached)
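
The {bucket, GID} -> {Version, Shard} mapping on slide 25 could look roughly like the sketch below. Everything named here is an assumption for illustration: the module `shard_lookup`, the function `locate/4`, and the use of a plain map to stand in for the cached per-GID table in the database. Only the default rule (GID modulo the number of shards, with the default version taken from config) comes from the slide.

```erlang
-module(shard_lookup).
-export([locate/4]).

%% Hypothetical sketch. Overrides is a map
%%   #{ {Bucket, Gid} => {Version, Shard} }
%% standing in for the cached per-GID table mentioned on the slide.
%% If no entry exists, fall back to the defaults: the configured
%% version and the rule "GID rem number of shards".
-spec locate(atom(), non_neg_integer(), map(),
             {pos_integer(), pos_integer()}) ->
          {pos_integer(), non_neg_integer()}.
locate(Bucket, Gid, Overrides, {DefaultVersion, ShardCount})
  when is_integer(ShardCount), ShardCount > 0 ->
    case maps:find({Bucket, Gid}, Overrides) of
        {ok, {Version, Shard}} -> {Version, Shard};
        error                  -> {DefaultVersion, Gid rem ShardCount}
    end.
```

Keeping an explicit per-GID entry next to the default rule is what would let a single GID be moved to another shard or version without rehashing everyone else.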
  • 26. SSP: Cache • Each node has a private Memcached instance • We store all data for a GID/bucket in this cache • Filters are applied after retrieving data from the cache • Don't change data in storage outside of the SSP!
  • 27. Schedule 1. Background 2. Solution 3. Challenges 4. Performance 5. Lessons learned
  • 29. Challenge: controlled shutdown of a node • How do we shut down a node without losing jobs? • Shut down the bucketX application on a node • stop the pipeline factories (PFs) on this node (for bucketX) • hand over work to other PFs (on other nodes) – a couple of Mnesia ring reads – move ETS table contents to the new PF – remember which PF took over (so we can forward) • If we go to another node, clone Pipeline (gen2 pri) • remove this node from the lookup ring • all PFs fix their hash range based on the ring • Because there is a race condition handing over many to one (non-continuous blocks) PF • Sleep a while (actually wait for pipeline handovers)
  • 30. Note: shutdown application • if you terminate an application, all processes that were started (even if not linked) are terminated! • a bit hidden in the documentation of application:start/2 and stop/1 • so we need to explicitly set the group_leader to something that never shuts down: init(#state{} = S) -> group_leader(whereis(init), self()), {ok, S}.
  • 31. Challenge: shutdown pipeline • The Pipeline process that we spawn per GID needs to shut down when done (less memory) • When is it actually done? • Work might be assigned to the Pipeline just when the Pipeline decides it is done: race conditions!
  • 32. Challenge: shutdown pipeline (2) • All requests for a GID are handled by a single Pipeline Factory (PF) • The Pipeline will issue a ‘work done’ command to the PF with a ‘CommandCounter’ • The PF maintains an ETS table • Look up whether the registered CommandCounter for that GID is the same as the reported number • If so: tell the Pipeline to die
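
The CommandCounter check on slide 32 can be sketched as follows. This is a minimal illustration under assumptions, not the SSP implementation: the module `pf_counter` and its functions are hypothetical, and the real Pipeline Factory presumably keeps more state per GID. The ETS calls (`ets:new/2`, `ets:update_counter/4`, `ets:lookup/2`, `ets:delete/2`) are standard.

```erlang
-module(pf_counter).
-export([new/0, assign/2, work_done/3]).

%% Hypothetical sketch of the PF-side bookkeeping.
%% The table holds {Gid, CommandCounter} tuples.
new() ->
    ets:new(pf_counters, [set, private]).

%% Called by the PF each time it hands a command to the Pipeline for
%% Gid. Returns the new counter value; the entry is created with 0
%% before incrementing if the GID is not in the table yet.
assign(Tab, Gid) ->
    ets:update_counter(Tab, Gid, 1, {Gid, 0}).

%% Called when a Pipeline reports 'work done' together with the last
%% counter value it has processed. Only if that value equals the
%% registered one has no new work arrived in the meantime, so the
%% Pipeline may stop; otherwise it must keep running.
work_done(Tab, Gid, Reported) ->
    case ets:lookup(Tab, Gid) of
        [{Gid, Reported}] ->
            ets:delete(Tab, Gid),
            stop;
        _ ->
            continue
    end.
```

Since a single PF process owns the table and handles both the assignment and the 'work done' message for a GID, the comparison and the decision happen in one place, which closes the race described on slide 31.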
  • 33. Challenge: high uptime • We want continuous usage of SSP – Even while upgrading bucket versions – So there can be multiple versions running simultaneously • Take care when creating closures • Atomic behavior per GID
  • 35. Schedule 1. Background 2. Solution 3. Challenges 4. Performance 5. Lessons learned
  • 36. Performance • Currently we run SSP in 'shadow' mode, so no real data yet. Making realistic benchmarks is quite a lot of work. • Latency (local machine): – 6-26 ms to do a GET request on a primary key (cache miss) – 0.6 ms with a cache hit – Cache currently stores Erlang terms (term_to_binary) • Always read from cache – Does not detect changes in storage done outside SSP
  • 37. Performance • Requests (local): – Getting from cache at about 13.5K req/sec • elibs_benchmark:test_fun(gidlog_get, fun() -> gidlog:get(123456) end, 10, 10000). – Getting from MySQL at about 615 req/sec, including the cache miss • elibs_benchmark:test_fun(gidlog_get, fun() -> {_,_,C} = os:timestamp(), gidlog:get(C) end, 10, 100). – ~2 SSP machines can saturate a MySQL machine – 8K writes/sec for 2 MySQL + 4 SSP machines (old hardware)
  • 38. Schedule 1. Background 2. Solution 3. Challenges 4. Performance 5. Lessons learned
  • 39. Lessons learned (1) • There are many good Open Source libraries • Emysql: we have added transaction support • Eep0018: fast JSON encoder/decoder (yajl c++) • Estatsd: graphite-capable monitoring • Poolboy: Erlang worker pool factory (for Memcached) • Twig/Lager: logging (syslog)
  • 40. Lessons learned (2) • Mnesia is great for replicating state across machines • Faster local lookups • Less error prone • Encapsulate all Mnesia usage in a module • Adding nodes to Mnesia • Use ram_copies • Transactions are great • We deploy an Erlang cluster (with Mnesia replication) only inside a single data center • Not across unreliable connections!
  • 41. Lessons learned (3) • XML + XSD + XSLT are great for defining an API • They might have a bad name, but they work great • Can be transformed into any other format • Used to generate documentation • Todo: • generate more code (Buckets) • write a gen_bucket behaviour • don't start with generating code
  • 42. Lessons learned (4) • Rebar is great • Compilation is pretty convenient, but the best part is the “dependencies” • Also the worst part • We have proposed two improvements: • Allow different projects to share dependencies (major speedup for compiling) • Smarter version conflict resolution (semantic versioning: [ “>= 1.3.1”, “< 2.0.0” ])
  • 43. Lessons learned (5) • We use #records{} for all APIs – Piqi input/output – Stable and well-defined – Will move to Protocol Buffers • Use OTP applications everywhere – Start/stop stuff – See started apps: application:which_applications() • Terminate on fatal errors – Memcached down: terminate all buckets, don't try to recover (prevent overloading the DB)
  • 44. Lessons learned (6) • You need to add an admin/monitoring interface
  • 45. Open Source • We will not open-source SSP, but we do actively contribute to libraries used in SSP (so far Emysql, Rebar, Piqi)
  • 46. THANKS! Questions? Thijs.Terlouw@spilgames.com