SCALING LoL CHAT 
TO 70 MILLION PLAYERS 
Michal Ptaszek, @michalptaszek 
Riot Games
WHAT’S PLANNED 
1. GAME 
2. CHAT 
3. TECH 
4. LESSONS LEARNED 
5. Q&A
WHAT IS LEAGUE OF LEGENDS? 
2009 LAUNCH 
TEAM ORIENTED 
100+ CHAMPS 
MODERN FANTASY
CHAT: WHAT IS IT? 
MESSAGING SERVICE 
Private player chat and group chats. 
PRESENCE SERVICE 
Friend lists, availability and status. 
SOCIAL GRAPH SERVICE 
Internal service for store, match history, leagues.
CHAT BY THE NUMBERS 
67 million monthly players 
27 million daily players 
7.5 million concurrent players 
1 billion events routed per server, per day
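As a rough sanity check, the per-server figure above can be turned into an average per-second rate. The sketch below assumes events are spread evenly over the day, which real traffic is not; the names are illustrative and not from the talk.

```python
# Average event rate implied by "1 billion events routed per server, per day".
# Assumption: events arrive evenly over 24 hours (real traffic has peaks).
EVENTS_PER_SERVER_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Return the mean number of events per second for a daily total."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_events_per_second(EVENTS_PER_SERVER_PER_DAY)
    print(f"{rate:,.0f} events per second on average")
```

Under that even-spread assumption the average works out to roughly 11,600 events per second per server; peak rates would be higher.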
CHAT AT 10K FEET 
STABLE, SCALABLE CHAT SERVICE 
PROTOCOL, SERVER, DATA STORE
PROTOCOL: XMPP 
‣ Decentralized Architecture 
‣ Openness 
‣ Extensibility 
‣ Availability of Client Libraries 
‣ Security 
‣ Wide Adoption
SERVER: EJABBERD 
‣ Open source Jabber/XMPP server 
‣ Relatively nice scalability and performance with default configuration 
‣ Wide adoption and active, helpful community 
‣ Very good as a starting point for our own server solution 
▾ We were aware that one day we would need to start customizing it 
‣ Written in the Erlang programming language
TECHNOLOGY: ERLANG/OTP 
Erlang is... 
‣ A functional language 
‣ Built with concurrency and distribution in mind 
‣ Able to scale extremely well 
‣ Capable of reloading code on the fly 
Which gives us... 
‣ A declarative style of programming 
‣ An easier way to build our distributed applications 
‣ More time to focus on coding 
‣ Less downtime
SERVER: EJABBERD - PHILOSOPHY 
ARCHITECTURE: Share-nothing approach; enables massive, near-linear horizontal scalability. 
FAULT TOLERANCE: Implementation of self-healing properties, which bring the system to a well-known, stable state. 
LET IT CRASH: When something is massively broken, do not fix it!
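The "let it crash" idea can be illustrated with a toy supervisor: when the worker fails, its state is thrown away and rebuilt from a known starting point instead of being repaired. This is only a sketch in Python with hypothetical names; it is not ejabberd or Erlang/OTP supervisor code.

```python
# A toy "let it crash" supervisor. All names are hypothetical; this is not
# ejabberd or OTP code, only an illustration of restarting from a known state.
from typing import Callable, List


class Supervisor:
    """Runs a worker and, if it crashes, restarts it from a fresh state."""

    def __init__(self, make_state: Callable[[], dict], max_restarts: int = 3):
        self.make_state = make_state      # builds the well-known initial state
        self.max_restarts = max_restarts  # give up after this many crashes
        self.restarts = 0

    def run(self, work: Callable[[dict, object], None], inputs: List[object]) -> dict:
        state = self.make_state()
        for item in inputs:
            try:
                work(state, item)
            except Exception:
                # Do not try to repair the broken state: discard it and restart.
                self.restarts += 1
                if self.restarts > self.max_restarts:
                    raise
                state = self.make_state()
        return state


def handle(state: dict, item: object) -> None:
    """Worker logic: counts messages and crashes on malformed input."""
    if not isinstance(item, str):
        raise ValueError(f"malformed message: {item!r}")
    state["handled"] += 1


if __name__ == "__main__":
    sup = Supervisor(lambda: {"handled": 0})
    final = sup.run(handle, ["hi", "there", 42, "again"])
    print(final, sup.restarts)
```

The malformed item crashes the worker once; the supervisor drops the old state and the worker continues from the clean initial state.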
SERVER: EJABBERD - ARCHITECTURE 
[Architecture diagram. Labels: External Traffic (5223), LB, Ejabberd Servers, Internal Traffic, Riak nodes, Secondary Riak Cluster, ETL Queries.]
SERVER: EJABBERD - IMPLEMENTATION 
PHASE 1 - MAKE IT WORK 
‣ Mostly rewritten over time 
‣ Removed unwanted and unneeded parts 
‣ Optimized certain flow paths 
‣ Made it compatible with industry standards 
‣ Wrote over 600 tests to cover it 
[Diagram. Labels: Invite and Accept exchanges between Alice and Bob.]
SERVER: EJABBERD - IMPLEMENTATION 
PHASE 2 - MAKE IT RIGHT 
‣ Removed clear bottlenecks 
‣ Avoid shared, mutable state 
‣ “Make it work, make it right, make it fast” 
[Diagram. Labels: MUC router, user sessions, MUC rooms.]
[Diagram. Labels: user sessions, MUC rooms (no MUC router).]
Session Table: JID -> Session Handler 
[Diagram. Labels: session table, Alice, Bob, Charlie.]
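A session table of this kind is essentially a lookup from a user's JID to whatever handles that user's connection. The sketch below is a minimal in-memory version in Python; the class, method names and example JIDs are hypothetical and are not taken from ejabberd.

```python
# Hypothetical sketch of a session table: JID -> session handler.
# Names and JIDs are illustrative only, not ejabberd's data structures.
from typing import Callable, Dict

SessionHandler = Callable[[str], None]


class SessionTable:
    """Maps a JID to the handler that delivers messages to that session."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionHandler] = {}

    def register(self, jid: str, handler: SessionHandler) -> None:
        """Record the handler for a newly connected user."""
        self._sessions[jid] = handler

    def unregister(self, jid: str) -> None:
        """Forget a user's session; does nothing if the user is not online."""
        self._sessions.pop(jid, None)

    def route(self, jid: str, message: str) -> bool:
        """Deliver a message if the recipient is online; report success."""
        handler = self._sessions.get(jid)
        if handler is None:
            return False
        handler(message)
        return True


if __name__ == "__main__":
    inbox = []
    table = SessionTable()
    table.register("bob@example.com", inbox.append)
    print(table.route("bob@example.com", "hi Bob"), inbox)
    print(table.route("charlie@example.com", "hi Charlie"))
```

Routing a message is then a single lookup by JID, with no shared router in between.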
SERVER: EJABBERD - IMPLEMENTATION 
PHASE 3 - MAKE IT FAST 
‣ Patched VM and stdlibs 
‣ Sacrificed the generic nature of the Erlang/OTP framework in favor of better scalability and fault tolerance 
‣ Better traceability and profiling functions 
‣ More visibility into the system 
‣ Improved logging for code reloading and real-time system upgrades
NOSQL DATA STORE: RIAK 
SCALE: Linearly scalable -> No growth headaches 
FAULT TOLERANCE: No SPoF -> Higher uptime 
SCHEMA-LESS: Faster feature iterations -> More shipped features 
‣ Distributed, fault-tolerant, key-value store 
‣ Masterless, fully peer-to-peer architecture 
‣ AP in CAP theorem, with eventual consistency 
‣ Low, predictable latency 
‣ Extreme scalability 
‣ Multi data center replication
LESSONS LEARNED: UNDERSTAND YOUR SYSTEM 
‣ Over 500 real-time counters, rates, histograms collected each minute 
‣ Make sure to know counter values for “correct” and “abnormal” conditions 
‣ Alerts and logs for long running operations 
‣ Integration with Graphite, Zabbix and Nagios
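Knowing the "correct" range for each counter is what makes an alert possible. The sketch below shows one way this could look: counters are collected for an interval, compared with a known normal range, and reset. The class and metric names are hypothetical, and it does not talk to Graphite, Zabbix or Nagios.

```python
# Hypothetical sketch: per-interval counters checked against known-normal ranges.
# Not the talk's actual metrics code; no external monitoring system is involved.
from collections import defaultdict
from typing import Dict, List, Tuple


class Metrics:
    def __init__(self) -> None:
        self.counters: Dict[str, int] = defaultdict(int)
        self.normal_ranges: Dict[str, Tuple[int, int]] = {}

    def increment(self, name: str, amount: int = 1) -> None:
        """Count one or more occurrences of an event."""
        self.counters[name] += amount

    def set_normal_range(self, name: str, low: int, high: int) -> None:
        """Record what a "correct" per-interval value looks like."""
        self.normal_ranges[name] = (low, high)

    def flush(self) -> List[str]:
        """Run once per collection interval: return alerts, then reset counters."""
        alerts = []
        for name, (low, high) in self.normal_ranges.items():
            value = self.counters.get(name, 0)
            if not low <= value <= high:
                alerts.append(f"{name}={value} outside [{low}, {high}]")
        self.counters.clear()
        return alerts


if __name__ == "__main__":
    metrics = Metrics()
    metrics.set_normal_range("logins", 1, 100)
    metrics.increment("logins", 150)
    print(metrics.flush())
```

In a real deployment the flushed values would also be shipped to the monitoring system rather than only checked locally.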
LESSONS LEARNED: IMPLEMENT FEATURE TOGGLES 
‣ Safety valve for things that might cause problems 
‣ Partial deployments allowing features to be enabled only for certain groups of people 
[Diagram. Labels: Alice, Bob, Charlie; group reordering feature; whitelist: Bob.]
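A whitelist-based toggle like the one in the diagram can be sketched in a few lines. The version below is a Python illustration with hypothetical names; it assumes three modes (off, whitelist only, on for everyone), which the slide does not spell out.

```python
# Hypothetical feature toggle with a whitelist. The three modes are an
# assumption made for illustration; the talk only shows a whitelist.
from enum import Enum
from typing import Iterable, Set


class Mode(Enum):
    OFF = "off"              # safety valve: disabled for everyone
    WHITELIST = "whitelist"  # partial deployment: selected users only
    ON = "on"                # enabled for everyone


class FeatureToggle:
    def __init__(self, name: str, mode: Mode = Mode.OFF,
                 whitelist: Iterable[str] = ()) -> None:
        self.name = name
        self.mode = mode
        self.whitelist: Set[str] = set(whitelist)

    def is_enabled_for(self, user: str) -> bool:
        """Decide at runtime whether this user gets the feature."""
        if self.mode is Mode.ON:
            return True
        if self.mode is Mode.WHITELIST:
            return user in self.whitelist
        return False


if __name__ == "__main__":
    toggle = FeatureToggle("group_reordering", Mode.WHITELIST, ["Bob"])
    for user in ("Alice", "Bob", "Charlie"):
        print(user, toggle.is_enabled_for(user))
```

Switching the mode to `Mode.OFF` turns the feature off for everyone, including whitelisted users, without a deploy.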
LESSONS LEARNED: SUPPORT CODE RELOADING 
‣ Patching bugs on the fly 
‣ Changing server configuration 
‣ Collecting data for future analysis 
‣ No downtime deploys 
[Diagram. Labels: buggy code, server restart, fixed code; buggy code, fixed code.]
LESSONS LEARNED: GET YOUR LOGGING RIGHT 
‣ Proper logging and tracing facilities 
‣ Debug modes for selected users 
‣ Tools for analysis of the collected data 
[Diagram. Labels: Alice, ejabberd.log, slow_db.log, trace_alice.log, roster_audit.log, muc_audit.log, Honu.]
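A debug mode for selected users can be approximated with a log filter that only lets through records tagged with a traced user. The sketch below uses Python's standard `logging` module; the logger name, the `user` field and the in-memory handler are assumptions for illustration, and a file such as trace_alice.log would be the destination in practice.

```python
# Hypothetical per-user trace logging built on Python's standard logging module.
# The "user" record attribute and the logger name are illustrative assumptions.
import logging
from typing import Iterable, List, Tuple


class UserTraceFilter(logging.Filter):
    """Pass only records for users whose debug mode is switched on."""

    def __init__(self, traced_users: Iterable[str]) -> None:
        super().__init__()
        self.traced_users = set(traced_users)

    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "user", None) in self.traced_users


class ListHandler(logging.Handler):
    """Collects formatted records in memory (stand-in for a trace file)."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def build_trace_logger(traced_users: Iterable[str]) -> Tuple[logging.Logger, ListHandler]:
    """Create a logger whose handler keeps only traced users' records."""
    logger = logging.getLogger("chat.trace")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(user)s: %(message)s"))
    handler.addFilter(UserTraceFilter(traced_users))
    logger.addHandler(handler)
    return logger, handler


if __name__ == "__main__":
    logger, handler = build_trace_logger({"Alice"})
    logger.debug("roster fetched", extra={"user": "Alice"})
    logger.debug("roster fetched", extra={"user": "Bob"})
    print(handler.lines)
```

Only Alice's record is kept, so detailed tracing can stay on for one user without flooding the main log.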
LESSONS LEARNED: ALWAYS LOAD TEST YOUR CODE 
‣ Automatic verification of the latest builds 
‣ Collecting historical results for comparison 
‣ Measuring the impact of new features and changes to the code 
‣ Simulating various failures
LESSONS LEARNED: THINGS WILL FAIL 
‣ Prepare for the worst 
‣ It’s just a matter of time before a crash happens 
‣ It’s not only our code that fails 
‣ At this scale, unlikely events happen every second
CURRENT SITUATION 
CHAT IS DOING GREAT! 
Quality uptime is over 99% each month and increasing, with hundreds of servers deployed all over the world. 
SCALE AND PERFORMANCE 
Each server offers reliable, low-latency service to the players, routing over 1B events a day with low resource utilization. 
CHAT IS EVOLVING 
Rolling out Riak worldwide, making LoL Chat available outside of the client, exploring possibilities around using social graph data, and more...
THANK YOU! 
ANY QUESTIONS?
