RIOT GAMES
TRACKING YORDLES THROUGH 630 MILLION
  MINUTES OF HARDCORE GAMING A DAY



        BARRY LIVINGSTON & DANI RAYAN

INTRODUCTION

ABOUT THE SPEAKERS

THIS PRESENTATION IS ABOUT…
  • The history of Riot’s data warehouse
  • Why we incorporated Hadoop
  • Our high-level architecture
  • Use cases Hadoop has enabled
  • Lessons learned
  • Where we’re headed

WHO?
  • Developer and publisher of League of Legends
  • Founded in 2006 by gamers, for gamers
  • Focused on player experience, which requires data
  • 4.2 million daily
  • 32.5 million registered
  • 1.3 million concurrent
  • 11.5 million monthly

HISTORY

MEET ANDY HO
  “With enough data, even simple questions become difficult questions”

SCRAPPY START-UP PHASE
  • One initial beta environment for North America
  • Queries ran directly against production MySQL slaves
  • This is obviously not a good practice

AROUND OUR INITIAL LAUNCH
  • Moved to a dedicated, single MySQL instance for the DW
  • Data ETL’d from production slaves into this instance (by Andy)
  • Queries run in MySQL (by Andy)
  • Reporting was done in Excel (by Andy)

  This worked great!

THEN WE STARTED GROWING
  • Resources were focused elsewhere
     – We had competition
     – Focused on producing features and scaling our systems
  • Opened EU environment June 2010
  • Needed something speedy, so we created a parallel installation
     – This was bad
     – But we could still get the answers we wanted

AND THEN – CRAZY GROWTH!
  [Chart: total active players (# unique logins) over time; 1.5MM in July 2011, 4.2M in Nov. 2011]

THE BREAKING POINT
  • NA Data Warehouse reached a breaking point 9 months ago
     – 24 hours of data took 24.5 hours to ETL
  • We couldn’t handle…
     – multiple environments in a vertical MySQL instance
     – a single environment in a vertical MySQL instance
  • We needed to change!
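
The ETL figure above implies a backlog that grows every day: if each 24-hour batch takes 24.5 hours to load, the warehouse falls a further half hour behind per batch. A minimal sketch of that arithmetic follows; the function name and the 30-day horizon are illustrative assumptions, not from the talk, and it assumes batches run back to back.

```python
def etl_lag_hours(days, hours_of_data_per_day=24.0, etl_hours_per_day=24.5):
    """Cumulative lag after `days` daily batches, assuming batches run back to back."""
    return days * (etl_hours_per_day - hours_of_data_per_day)

print(etl_lag_hours(1))   # 0.5
print(etl_lag_hours(30))  # 15.0
```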

SOLUTION

WHY HADOOP?
  COST EFFECTIVE
    Expanding rapidly, so CAPEX was a concern
  SCALABLE
    Handles massive and diverse data sets (both structured and unstructured)
  OPEN SOURCE
    Our engineers can dive into problems
  SPEED OF EXECUTION
    We needed to move fast!

HIGH LEVEL ARCHITECTURE – CURRENT
  [Diagram: Audit, Plat and LoL sources in each region (North America, Europe, Korea) → Pentaho + Custom ETL + Sqoop → Hive Data Warehouse → Pentaho → MySQL → Tableau → Business Analyst; Analysts also work against the Hive Data Warehouse]

WHAT MAKES UP OUR ETL
  • Pentaho + Custom ETL + Sqoop, all of these orchestrated by Pentaho
  • We use Sqoop for staging data only
  • Then dynamically partition data into Hive tables

WHAT MAKES UP OUR ETL
  [Diagram: Data → Temp Staging Area → Hive Data Warehouse (Partitions A to E, each with sub-partitions such as A1, A2, A3)]
  1. Data written into temp staging area
     • Prevents analysts from running queries against partially written tables
     • Helps us leverage Hive’s merging and compression settings
  2. Hive dynamically inserts data into appropriate partitions
     • According to the value generated for the partition key in the target table
     • Non-existent partitions will be created by Hive
  3. Layered partitioning = very helpful for region-based partitioning
     • Helps maintain one table definition across regions
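
The three steps above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not Hive itself and not Riot's ETL: the column names (`region`, `dt`, `player_id`), the sample rows and the function name are hypothetical. Rows are fully staged first, then routed to a partition chosen by their partition-key values, and a partition that does not exist yet is created on first use.

```python
from collections import defaultdict

def load_with_dynamic_partitions(staged_rows, warehouse, partition_keys=("region", "dt")):
    """Move fully staged rows into partitions keyed by their partition-column values."""
    for row in staged_rows:
        key = tuple(row[k] for k in partition_keys)
        # A partition that does not exist yet is created on first use.
        warehouse[key].append(row)
    return warehouse

# Step 1: rows land in a temporary staging area first (hypothetical sample data).
staging = [
    {"region": "NA", "dt": "2012-06-01", "player_id": 1},
    {"region": "EU", "dt": "2012-06-01", "player_id": 2},
    {"region": "NA", "dt": "2012-06-02", "player_id": 3},
]

# Steps 2 and 3: one table definition, layered (region, dt) partitions.
warehouse = load_with_dynamic_partitions(staging, defaultdict(list))
print(sorted(warehouse))
```

Because the region is just the first level of the partition key, the same table definition serves every region, which is the point of the layered layout.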

WHAT MAKES UP OUR ETL
  To optimize disk IO for user queries, we enabled compression

  WHY COMPRESSION?
    We have 24 cores and disk IO is always the bottleneck, so compression is essential

  WHY SNAPPY COMPRESSED SEQUENCEFILE BLOCKS?
    • Lots of “why Snappy” discussion on the interwebs already
    • SequenceFile can be split by Hadoop, so multiple maps can run in parallel
    • Block compression yields a better compression ratio while keeping the file splittable; the block size is configurable
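
The block-compression point can be illustrated with a small experiment. This sketch uses Python's standard `zlib` as a stand-in for Snappy (which is not in the standard library), and the record layout and block size are made-up assumptions. It compares compressing every record on its own with compressing groups of records, and shows that each block can still be decompressed independently.

```python
import zlib

# Hypothetical log-like records; real warehouse rows would look different.
records = [f"player={i % 50},champion=Teemo,result=win\n".encode() for i in range(2000)]

def record_compressed_size(records):
    """Total size when every record is compressed separately."""
    return sum(len(zlib.compress(r)) for r in records)

def block_compress(records, records_per_block=500):
    """Compress records in groups; each group becomes one independent block."""
    blocks = []
    for start in range(0, len(records), records_per_block):
        blocks.append(zlib.compress(b"".join(records[start:start + records_per_block])))
    return blocks

blocks = block_compress(records)
raw_size = sum(len(r) for r in records)
block_size = sum(len(b) for b in blocks)
record_size = record_compressed_size(records)
print(raw_size, record_size, block_size)

# Each block decompresses on its own, so separate workers can take separate blocks.
first_block = zlib.decompress(blocks[0])
```

Larger blocks give the compressor more repetition to exploit, while smaller blocks give more units of parallel work, which is why a configurable block size matters.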

WHAT WE DO IN HIVE
  • We ETL data from OLTP MySQL slaves daily
  • Our analysts shoot Hive queries every day, translating to 1000s of MapReduce (MR) jobs daily
  • We have some pretty large tables, e.g., one with 50,795,997,734 rows
  • We use metrics derived from Hive queries to improve our matchmaking system and player behavior

WHAT DID WE LEARN FROM ETL?
  • If you use custom ETL, keep an eye out for block distribution
  • DRY: re-inventing the wheel is not a good idea
     – Invest time in researching proper tools that suit your needs
     – Tons of options for ETL and workflow management
     – Just because company X uses a particular ETL or workflow management tool does not mean it will work effectively for you

WHY TABLEAU?
  • We needed to democratize access for non-technical folks
     – Design
     – Execs
     – Player Support
  • Great visualization capability
  • Easy to work with
  • Has a Hive connector*

LEAGUE OF LEGENDS GAMEPLAY BASICS

USECASE #1
THE STORY OF SHEN

WAIT, SO WHAT’S A YORDLE?
  • Yordles = very cute race of champions in League of Legends
  • We track Yordles (and the rest of our champions) because game balance is exceptionally important

DESIGN BALANCE IS IMPORTANT
  • Highly competitive game
  • Updated every 2-3 weeks
     – New champions
     – New items
  • Game is a living, breathing service that’s always in motion
  • Have to maintain a level playing field

QUICKLY REACTING TO CHANGES
  [Chart: total plays over time; legend entry “= wins”]

HOW DID WE CREATE THAT?
  *All logos are trademarks of respective owners

WHY NOT JUST HIVE?
  HIVE IS FOR MASSIVE JOBS

HIVE TO MYSQL TRANSFORMATION
  • Many of our stakeholders use Tableau
  • Transformed required data into cubes for direct Tableau consumption using Pentaho
  • Initially experimented with Hive-to-Tableau connector
     – Had issues, e.g., triggering MR jobs for every change and non-persistent Hive-Server

WE WANTED TO KNOW MORE ABOUT…
  • Which champions and skins are popular across all regions?
  • What are the win-rates of champions across all regions?
  • Are better players choosing different champions?

WE CREATED CUBES OF AGGREGATED DATA
  [Chart: win rates by champions]
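
As an illustration of what such a cube holds, here is a minimal sketch that rolls per-game rows up into play counts and win rates per (region, champion). The row layout, the sample games and the function name are hypothetical; the talk does not show the actual schema or the Pentaho transformation.

```python
from collections import defaultdict

def win_rate_cube(games):
    """Aggregate per-game rows into {(region, champion): (plays, win_rate)}."""
    plays = defaultdict(int)
    wins = defaultdict(int)
    for g in games:
        key = (g["region"], g["champion"])
        plays[key] += 1
        wins[key] += 1 if g["won"] else 0
    return {k: (plays[k], wins[k] / plays[k]) for k in plays}

# Hypothetical sample data.
games = [
    {"region": "NA", "champion": "Shen", "won": True},
    {"region": "NA", "champion": "Shen", "won": False},
    {"region": "EU", "champion": "Shen", "won": True},
    {"region": "EU", "champion": "Teemo", "won": False},
]
cube = win_rate_cube(games)
print(cube[("NA", "Shen")])  # (2, 0.5)
```

A reporting tool can then read the small aggregated result directly instead of scanning the per-game table for every chart.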

HOW WE DID IT: TRANSFORMATION++
  • Massive tables reside in Hive
  • Hive → MySQL: transformation creates dimension tables
  • MySQL → Tableau: transformed into cubes for Tableau consumption
  • Some dimension tables moved to join with other fact tables in Hive

WHY DID WE GO THIS ROUTE?
  Hive is not good for slowly changing dimensions
     • No automatic primary key generation
     • Can’t regenerate dimension table quickly enough since it requires a full-table scan
  MySQL is not awesome for joining massive tables

  • Decided to use best of both worlds
  • Also leveraged map-side joins and distributed cache
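
The map-side join mentioned above works by shipping the small table to every mapper (in Hadoop, typically through the distributed cache) and streaming the large table past it, so no shuffle of the large table is needed. A minimal plain-Python sketch of the idea follows; the table contents and names are hypothetical, and this is not Hive's implementation.

```python
def map_side_join(fact_rows, dimension_rows, key):
    """Join by loading the small table into a dict and streaming the large one past it."""
    lookup = {d[key]: d for d in dimension_rows}   # small side, held in memory
    for fact in fact_rows:                         # large side, streamed row by row
        dim = lookup.get(fact[key])
        if dim is not None:                        # inner join: drop facts with no match
            yield {**fact, **dim}

# Hypothetical dimension and fact tables.
champions = [{"champion_id": 1, "name": "Shen"}, {"champion_id": 2, "name": "Teemo"}]
games = [
    {"game_id": 10, "champion_id": 2},
    {"game_id": 11, "champion_id": 1},
    {"game_id": 12, "champion_id": 99},
]
joined = list(map_side_join(games, champions, "champion_id"))
print(joined)
```

This only pays off when the dimension table fits comfortably in memory on each worker.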

USECASE #2
MATCHMAKING AND REGIONAL METRICS

FIRST, SOME CONTEXT
  • League of Legends is global in scale, with players logging in from >145 countries in a typical day
  • No-fee play means very low barrier to play
  • Players often play on multiple environments regularly (e.g. EU players on NA environments and vice versa)
  • Same features and mechanics deployed in all territories
  • It’s vitally important that we understand game performance metrics by geography and region

MATCHMAKING
  • One of the most important features outside of gameplay
  • Like a dating service, the objective is to match people up
  • A number of different queues that players can line up in, depending on the type of match they’re looking for

  Critical that this system is balanced and able to create good matches quickly

MATCHMAKING – IS IT WORKING?
  • Matchmaking algorithm based on modified Elo system
  • Inspecting the “curve” of these scores:
     – Should show a similar distribution in all regions
     – May show interesting trends, such as win/lose ratios
  [Chart: Elo distribution graph; % players vs. Elo score]
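
For readers unfamiliar with Elo, here is a sketch of the standard textbook rating update. The talk only says the matchmaking uses a modified Elo system and does not describe the modifications, so this is not Riot's algorithm; the K-factor of 32 and the starting ratings are common illustrative choices, and the function names are assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_won, k=32):
    """Return the new (rating_a, rating_b) after one game; k is an assumed K-factor."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return rating_a + delta, rating_b - delta

print(expected_score(1200, 1200))      # 0.5
print(update(1200, 1200, a_won=True))  # (1216.0, 1184.0)
```

Because every game moves points from the loser to the winner, the overall distribution of scores is the "curve" that can be compared across regions.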

WHAT WAS NEEDED TO GENERATE IT?
  1. Had to join massive tables with session and player data
     (three massive tables: session data, player data, game data)
  2. Needed to look up and range-query IP addresses in the same join
     • Required for many region-based metrics

LIMITATIONS OF HIVE
  • No good indexing mechanism in our version
  • Not efficient for lookup and range queries

  This made region-based queries computationally difficult

SOLUTION
  • Leveraged open-source libraries available online: GeoIP UDFs for Hive
  • UDFs = user-defined functions that one can add to the Hive interpreter
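
The core of a GeoIP lookup is a range query: find the address range that contains a given IP. The sketch below shows one common way to do that with a sorted range table and binary search. It is not the UDF library used in the talk; the range table uses reserved documentation address blocks with made-up region labels, and all names are hypothetical.

```python
import bisect
import ipaddress

# Hypothetical range table: (first address, last address, region), sorted and non-overlapping.
# These are reserved documentation ranges, not real GeoIP data.
RANGES = sorted(
    (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), region)
    for lo, hi, region in [
        ("192.0.2.0", "192.0.2.255", "NA"),
        ("198.51.100.0", "198.51.100.255", "EU"),
        ("203.0.113.0", "203.0.113.255", "KR"),
    ]
)
STARTS = [r[0] for r in RANGES]

def region_for(ip):
    """Return the region whose range contains `ip`, or None if no range matches."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1   # last range starting at or before n
    if i >= 0 and n <= RANGES[i][1]:
        return RANGES[i][2]
    return None

print(region_for("198.51.100.7"))  # EU
print(region_for("10.0.0.1"))      # None
```

Wrapping this kind of lookup in a function called per row avoids expressing the range condition as a join predicate, which is the part that is hard to do efficiently without indexes.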

LESSONS

THE FUTURE

OUR IMMEDIATE GOALS
  • Shorten time to insight
  • Increase depth of insight
  • Enable data analysis for client-side features
  • Log ingestion and analysis
  • Flexible auditing framework
  • International data infrastructure

CHALLENGE: MAKE IT GLOBAL
  • Data centers across the globe, since latency has a huge effect on gameplay → log data scattered around the world
  • Large presence in Asia; some areas (e.g., PH) have bandwidth challenges or bandwidth is expensive

CHALLENGE: WE HAVE BIG DATA
  • Structured data: 500G daily
  • Application and operational logs: 4.5TB daily
  • Official LoL site traffic: 6MM hits daily
  • Riot YouTube channel: 1.7MM subscribers, 270+MM views
  + chat logs
  + detailed gameplay event tracking
  + so on…

OUR AUDACIOUS GOALS
  Build a world-class data and analytics organization
  • Deeply understand players across the globe
  • Apply that understanding to improve games for players
  • Deeply understand our entire ecosystem, including social media

  Have ability to identify, understand and react to meaningful trends in real time

  Have deep, real-time understanding of our systems from player experience and operational standpoints

SHAMELESS HIRING PLUG
  • Like most everybody else at this conference… we’re hiring!
  • The Riot Manifesto
     – Player experience first
     – Challenge convention
     – Focus on talent and team
     – Take play seriously
     – Stay hungry, stay humble

  And yes, you can play games at work. It’s encouraged!

THANK YOU!
QUESTIONS?
     BARRY LIVINGSTON        &        DANI RAYAN
 blivingston@riotgames.com       drayan@riotgames.com

Presentation: Big Data At Riot Games - Hadoop Summit'12
Logging with Elasticsearch, Logstash & Kibana
 
Hippo get together presentation solr integration
Hippo get together presentation   solr integrationHippo get together presentation   solr integration
Hippo get together presentation solr integration
 

Big Data At Riot Games - Hadoop Summit'12

  • 1. RIOT GAMES TRACKING YORDLES THROUGH 630 MILLION MINUTES OF HARDCORE GAMING A DAY BARRY LIVINGSTON & DANI RAYAN
  • 2. INTRODUCTION
  • 3. ABOUT THE SPEAKERS
  • 4. THIS PRESENTATION IS ABOUT… • The history of Riot’s data warehouse • Why we incorporated Hadoop • Our high-level architecture • Use cases Hadoop has enabled • Lessons learned • Where we’re headed
  • 5. WHO? • Developer and publisher of League of Legends • Founded 2006 by gamers for gamers • Player experience focused – requires data
  • 6. 4.2 MILLION DAILY • 32.5 MILLION REGISTERED • 1.3 MILLION CONCURRENT • 11.5 MILLION MONTHLY
  • 7. HISTORY
  • 8. MEET ANDY HO: “With enough data, even simple questions become difficult questions”
  • 9. SCRAPPY START-UP PHASE • One initial beta environment for North America • Queries done directly off production MySQL slaves • This is obviously not a good practice
  • 10. AROUND OUR INITIAL LAUNCH • Moved to a dedicated, single MySQL instance for the DW • Data ETL’d from production slaves into this instance (by Andy) • Queries run in MySQL (by Andy) • Reporting was done in Excel (by Andy) • This worked great!
  • 11. THEN WE STARTED GROWING • Resources were focused elsewhere – We had competition – Focused on producing features and scaling our systems • Opened EU environment June 2010 • Needed something speedy – created parallel installation – This was bad – But we could still get the answers we wanted
  • 12. AND THEN – CRAZY GROWTH! [Chart: total active players (# unique logins) over time, rising from 1.5MM in July 2011 to 4.2M in Nov. 2011]
  • 13. THE BREAKING POINT • NA Data Warehouse reached a breaking point 9 months ago – 24 hours of data took 24.5 hours to ETL • We couldn’t handle… – multiple environments in a vertical MySQL instance – a single environment in a vertical MySQL instance • We needed to change!
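
Slide 13's figure, 24 hours of data taking 24.5 hours to ETL, means the warehouse can never catch up: every daily batch adds half an hour of lag. A minimal sketch of that arithmetic follows; the function name and the sample horizons are illustrative assumptions, not from the talk.

```python
def etl_lag_hours(days, hours_per_batch=24.5, batch_window=24.0):
    """Hours the warehouse trails production after `days` daily batches.

    Assumes batches run back to back and each covers exactly one day of data.
    """
    return max(0.0, days * (hours_per_batch - batch_window))


for day in (1, 7, 30):
    print(f"after {day:2d} days: {etl_lag_hours(day):.1f} h behind")
```

Any batch time above the 24-hour window produces lag that grows without bound, which is why tuning alone could not rescue the single MySQL instance.
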
  • 14. SOLUTION
  • 15. WHY HADOOP? • COST EFFECTIVE: Expanding rapidly, so CAPEX was a concern • SCALABLE: Handles massive data sets and diverse data sets (both structured and unstructured) • OPEN SOURCE: Our engineers can dive into problems • SPEED OF EXECUTION: We needed to move fast!
  • 16. HIGH LEVEL ARCHITECTURE – CURRENT [Diagram components: Audit, Plat, and LoL databases in North America, Europe, and Korea; Pentaho + Custom ETL + Sqoop; Hive Data Warehouse; Pentaho; MySQL; Tableau; Business Analyst; Analysts]
  • 17. WHAT MAKES UP OUR ETL [Same architecture diagram as slide 16]
  • 18. WHAT MAKES UP OUR ETL • Pentaho + Custom ETL + Sqoop: all of these orchestrated by Pentaho • We use Sqoop for staging data only • Then dynamically partition data into Hive tables
  • 19. WHAT MAKES UP OUR ETL [Same architecture diagram as slide 16]
  • 20. WHAT MAKES UP OUR ETL • Step 1: Data written into temp staging area – Prevents analysts from running queries out of partially written tables – Helps us leverage Hive’s merging and compression settings
  • 21. WHAT MAKES UP OUR ETL • Step 2: Hive dynamically inserts data into appropriate partitions – According to value generated for partition key in the target table – Non-existent partitions will be created by Hive
  • 22. WHAT MAKES UP OUR ETL • Step 3: Layered partitioning = very helpful for region-based partitioning – Helps maintain one table definition across regions [Diagram: partitions A to E, each split into sub-partitions 1 to 3]
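
Slides 20 to 22 describe staging rows first, then letting Hive route each row to a partition chosen by its key values, with layered keys such as region and date. In Hive this is a dynamic-partition insert. The Python sketch below only imitates the routing logic in memory; the column names, the `region`/`dt` key order, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def partition_path(row, keys=("region", "dt")):
    """Build a Hive-style layered partition path, e.g. 'region=NA/dt=2012-06-01'."""
    return "/".join(f"{k}={row[k]}" for k in keys)


def dynamic_insert(staged_rows, warehouse, keys=("region", "dt")):
    """Route each staged row to its partition, creating partitions on demand."""
    for row in staged_rows:
        # Partition columns live in the path, not in the stored row.
        payload = {k: v for k, v in row.items() if k not in keys}
        warehouse[partition_path(row, keys)].append(payload)
    return warehouse


# Hypothetical staged rows; column names are illustrative only.
staging = [
    {"player_id": 1, "champion": "Teemo", "region": "NA", "dt": "2012-06-01"},
    {"player_id": 2, "champion": "Shen", "region": "EU", "dt": "2012-06-01"},
    {"player_id": 3, "champion": "Shen", "region": "NA", "dt": "2012-06-02"},
]
warehouse = dynamic_insert(staging, defaultdict(list))
for path in sorted(warehouse):
    print(path, warehouse[path])
```

Because the region is the outer partition key, one table definition serves every region, and a query restricted to one region only touches that subtree.
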
  • 23. WHAT MAKES UP OUR ETL • To optimize disk IO for user queries, we enabled compression
  • 24. WHY COMPRESSION? • We have 24 cores and disk IO is always the bottleneck, so compression is essential • WHY SNAPPY COMPRESSED SEQUENCEFILE BLOCKS? – Lots of “why Snappy” discussion on the interwebs already – SequenceFile can be split by Hadoop and can run multiple maps in parallel – Block compression yields better compression ratio while keeping the file splittable; this block size is configurable
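
Slide 24's claim that block compression beats per-record compression can be checked with a small experiment. The sketch below uses `zlib` from the Python standard library as a stand-in, because Snappy is not in the standard library; absolute ratios will differ from Snappy's, but the effect of compressing many similar records together should point the same way. The record fields are invented for illustration.

```python
import json
import zlib

# Hypothetical game-event records; field names are illustrative only.
records = [
    json.dumps({"game_id": i, "champion": "Shen", "region": "NA", "win": i % 2 == 0}).encode()
    for i in range(1000)
]

raw_size = sum(len(r) for r in records)
# Record compression: every record is compressed on its own.
record_size = sum(len(zlib.compress(r)) for r in records)
# Block compression: many records are compressed together as one block.
block_size = len(zlib.compress(b"\n".join(records)))

print(f"raw={raw_size} record-compressed={record_size} block-compressed={block_size}")
```

Short records carry little internal redundancy, so compressing them one at a time gains almost nothing; the redundancy sits across records, which is what a block exposes to the compressor.
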
  • 25. WHAT WE DO IN HIVE • We ETL data from OLTP MySQL slaves daily
  • 26. WHAT WE DO IN HIVE • Our analysts shoot Hive queries every day • Translating to 1000s of MR jobs daily
  • 27. WHAT WE DO IN HIVE • We have some pretty large tables, e.g., one with 50,795,997,734 rows • We use metrics derived from Hive queries to improve our matchmaking system and player behavior
  • 28. WHAT DID WE LEARN FROM ETL? • If you use custom ETL, keep an eye out for block distribution • DRY: Re-inventing the wheel is not a good idea – Invest time in researching proper tools that suit your needs – Tons of options for ETL and workflow management – Just because company X is using a particular ETL or workflow management tool, it may or may not work effectively for you
  • 29. WHY TABLEAU? [Same architecture diagram as slide 16]
  • 30. WHY TABLEAU? • We needed to democratize access for non-technical folks – Design – Execs – Player Support • Great visualization capability • Easy to work with • Has a Hive connector*
  • 31. LEAGUE OF LEGENDS GAMEPLAY BASICS
  • 32. [No slide text]
  • 33. [No slide text]
  • 34. USE CASE #1: THE STORY OF SHEN
  • 35. WAIT, SO WHAT’S A YORDLE? • Yordles = very cute race of champions in League of Legends • We track Yordles (and the rest of our champions) because game balance is exceptionally important
  • 36. DESIGN BALANCE IS IMPORTANT • Highly competitive game • Updated every 2-3 weeks – New champions – New items • Game is a living, breathing service that’s always in motion • Have to maintain a level playing field
  • 37. QUICKLY REACTING TO CHANGES [Chart: wins and total plays over time]
  • 38. HOW DID WE CREATE THAT? [Diagram of tool logos] *All logos are trademarks of respective owners
  • 39. WHY NOT JUST HIVE? [Diagram of tool logos] *All logos are trademarks of respective owners
  • 40. WHY NOT JUST HIVE? • HIVE IS FOR MASSIVE JOBS
  • 41. HIVE TO MYSQL TRANSFORMATION • Many of our stakeholders use Tableau • Transformed required data into cubes for direct Tableau consumption using Pentaho • Initially experimented with Hive-to-Tableau connector – Had issues, e.g., triggering MR jobs for every change and non-persistent Hive-Server
  • 42. WE WANTED TO KNOW MORE ABOUT… • Which champions and skins are popular across all regions? • What are the win-rates of champions across all regions? • Are better players choosing different champions?
  • 43. WE CREATED CUBES OF AGGREGATED DATA [Chart: win rates by champion]
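
The cubes on slide 43 are pre-aggregated summaries small enough for a reporting tool to query directly. A minimal sketch of one such aggregation, win rate per region and champion, is shown here; the row layout and sample games are assumptions, and the talk does not describe the actual cube schema.

```python
from collections import defaultdict


def win_rate_cube(games):
    """Aggregate games into {(region, champion): (plays, wins, win_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # [plays, wins]
    for g in games:
        cell = counts[(g["region"], g["champion"])]
        cell[0] += 1
        cell[1] += int(g["win"])
    return {key: (plays, wins, wins / plays) for key, (plays, wins) in counts.items()}


# Hypothetical per-game rows, invented for illustration.
games = [
    {"region": "NA", "champion": "Shen", "win": True},
    {"region": "NA", "champion": "Shen", "win": False},
    {"region": "EU", "champion": "Shen", "win": True},
    {"region": "NA", "champion": "Teemo", "win": True},
]
cube = win_rate_cube(games)
for key in sorted(cube):
    print(key, cube[key])
```

The cube has one row per region and champion pair rather than one per game, which is what makes it cheap to serve from MySQL.
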
  • 44. HOW WE DID IT: TRANSFORMATION++ • Massive tables reside in Hive • Hive: transformation creates dimension tables • MySQL: transformed into cubes for Tableau consumption • Tableau • Some dimension tables moved to join with other fact tables in Hive
  • 45. WHY DID WE GO THIS ROUTE? • Hive: Not good for slowly changing dimensions – No automatic primary key generation – Can’t regenerate dimension table quickly enough since it requires a full-table scan • MySQL is not awesome for joining massive tables • Decided to use best of both worlds • Also leveraged map-side joins and distributed cache
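
Slide 45 mentions map-side joins and the distributed cache. The idea is that a small dimension table is shipped to every mapper and held in memory, so the large fact table can be joined while it streams through, with no shuffle step. The sketch below shows that pattern in plain Python; it is not Hive or Hadoop code, and the table contents are invented.

```python
def map_side_join(fact_rows, dimension_rows, key):
    """Join a large fact stream against a small in-memory dimension table."""
    # The dict plays the role of the dimension table loaded from the distributed cache.
    lookup = {d[key]: d for d in dimension_rows}
    for fact in fact_rows:  # streamed one row at a time, as a mapper would see it
        dim = lookup.get(fact[key])
        if dim is not None:  # inner join: facts without a dimension row are dropped
            yield {**dim, **fact}


# Hypothetical tables, invented for illustration.
champions = [
    {"champion_id": 1, "name": "Shen"},
    {"champion_id": 2, "name": "Teemo"},
]
plays = [
    {"game_id": 10, "champion_id": 1, "win": True},
    {"game_id": 11, "champion_id": 3, "win": False},  # no matching champion row
]
joined = list(map_side_join(plays, champions, "champion_id"))
print(joined)
```

The approach only works while the dimension table fits in each mapper's memory, which matches the slide's split between small dimension tables and massive fact tables.
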
  • 46. USE CASE #2: MATCHMAKING AND REGIONAL METRICS
  • 47. FIRST, SOME CONTEXT • League of Legends is global in scale, with players logging in from >145 countries in a typical day • No-fee play means very low barrier to play • Players often play on multiple environments regularly (e.g. EU players on NA environments and vice versa) • Same features and mechanics deployed in all territories • It’s vitally important that we understand game performance metrics by geography and region
  • 48. MATCHMAKING • One of the most important features outside of gameplay • Like a dating service, the objective is to match people up • Number of different queues that players can line up in, depending on the type of match they’re looking for • Critical that this system is balanced and able to create good matches quickly
  • 49. MATCHMAKING – IS IT WORKING? • Matchmaking algorithm based on modified Elo system • Inspecting the “curve” of these scores: – Should show a similar distribution in all regions – May show interesting trends, such as win/lose ratios
  • 50. MATCHMAKING – IS IT WORKING? [Chart: Elo distribution graph, % players vs. Elo score]
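
Slides 49 and 50 check matchmaking health by comparing the shape of the Elo curve across regions. One simple way to do that is to bin scores, convert counts to percentages of players, and look at the largest gap between regions. The sketch below assumes a bin width of 100 and uses invented scores; the talk does not say how Riot binned or compared its curves.

```python
from collections import Counter


def elo_distribution(scores, bin_width=100):
    """Return {bin_start: percent_of_players} for a list of Elo scores."""
    bins = Counter((score // bin_width) * bin_width for score in scores)
    total = len(scores)
    return {start: 100.0 * count / total for start, count in sorted(bins.items())}


def max_gap(dist_a, dist_b):
    """Largest absolute difference, in percentage points, between two distributions."""
    return max(abs(dist_a.get(b, 0.0) - dist_b.get(b, 0.0)) for b in set(dist_a) | set(dist_b))


# Hypothetical scores, invented for illustration.
na = [1150, 1210, 1260, 1305, 1190, 1240, 1280, 1420]
eu = [1120, 1230, 1250, 1330, 1170, 1290, 1310, 1450]

print(elo_distribution(na))
print(elo_distribution(eu))
print("largest gap (percentage points):", max_gap(elo_distribution(na), elo_distribution(eu)))
```

Percentages are used instead of raw counts so that regions with very different player populations can be compared on the same axis.
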
  • 51. WHAT WAS NEEDED TO GENERATE IT? • 1: Had to join massive tables with session and player data [Diagram: three massive tables with session, player, and game data] • 2: Needed to lookup and range-query IP-addresses in same join – Required for many region-based metrics
  • 52. LIMITATIONS OF HIVE • No good indexing mechanism in our version • Not efficient for lookup and range queries • This made region-based queries computationally difficult
  • 53. SOLUTION • Hive + GeoIP UDFs (leveraged open-source libraries online) • UDFs = user-defined functions that one can add to the Hive interpreter
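
Slides 51 to 53 describe resolving IP addresses to regions inside a join, which a GeoIP UDF makes possible. The core of such a function is a range lookup: convert the address to an integer and binary-search a sorted table of address ranges. The sketch below shows only that lookup in Python, not an actual Hive UDF (those are typically written in Java). The range table is hypothetical and uses documentation-only address blocks.

```python
import bisect
import ipaddress

# Hypothetical range table: (first_ip, last_ip, region), using documentation-only
# address blocks (RFC 5737). A real GeoIP database has far more rows.
RANGES = sorted(
    (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), region)
    for lo, hi, region in [
        ("192.0.2.0", "192.0.2.255", "NA"),
        ("198.51.100.0", "198.51.100.255", "EU"),
        ("203.0.113.0", "203.0.113.255", "KR"),
    ]
)
STARTS = [lo for lo, _, _ in RANGES]


def region_for(ip):
    """Binary-search the sorted ranges; return the region or None if unmapped."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, value) - 1  # last range starting at or before value
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None


for ip in ("198.51.100.42", "203.0.113.255", "10.0.0.1"):
    print(ip, region_for(ip))
```

Evaluating the lookup per row inside the function avoids expressing the range condition as a join predicate, which is the kind of query slide 52 says Hive handled poorly.
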
  • 54. LESSONS
  • 55. LESSONS [no slide text]
  • 56. LESSONS [no slide text]
  • 57. LESSONS [no slide text]
  • 58. LESSONS [no slide text]
  • 59. LESSONS [no slide text]
  • 60. THE FUTURE
  • 61. OUR IMMEDIATE GOALS • Shorten time to insight • Increase depth of insight • Enable data analysis for client-side features • Log ingestion and analysis • Flexible auditing framework • International data infrastructure
  • 62. CHALLENGE: MAKE IT GLOBAL • Data centers across the globe since latency has huge effect on gameplay → log data scattered around the world • Large presence in Asia -- some areas (e.g., PH) have bandwidth challenges or bandwidth is expensive
  • 63. CHALLENGE: WE HAVE BIG DATA • STRUCTURED DATA: 500G DAILY • APPLICATION AND OPERATIONAL LOGS: 4.5TB DAILY • OFFICIAL LOL SITE TRAFFIC: 6MM HITS DAILY • RIOT YOUTUBE CHANNEL: 1.7MM SUBSCRIBERS, 270+MM VIEWS • + chat logs + detailed gameplay event tracking + so on….
  • 64. OUR AUDACIOUS GOALS • Build a world-class data and analytics organization – Deeply understand players across the globe – Apply that understanding to improve games for players – Deeply understand our entire ecosystem, including social media • Have ability to identify, understand and react to meaningful trends in real time • Have deep, real-time understanding of our systems from player experience and operational standpoints
  • 65. SHAMELESS HIRING PLUG • Like most everybody else at this conference… we’re hiring! • The Riot Manifesto – Player experience first – Challenge convention – Focus on talent and team – Take play seriously – Stay hungry, stay humble
  • 66. SHAMELESS HIRING PLUG • And yes, you can play games at work. It’s encouraged!
  • 67. THANK YOU! QUESTIONS? BARRY LIVINGSTON & DANI RAYAN blivingston@riotgames.com drayan@riotgames.com

Editor's Notes

  1. Andy was a designer and analyst who began our data warehouse. He was the only resource focused on building out the DW and our analytical capability for the first year of its existence. He’s also a really nice guy! He made an excellent point, and one that I want to carry through this presentation.
  2. There were times with 20% month-over-month growth in a single environment. We went from 2 environments with ~200K CCU to 16 environments and 1.3 million CCU in the space of 12 months. Resources were focused on getting our operational systems to scale along with demand.
  3. One table has hand-entered values that live only in MySQL. Hive cannot generate primary keys out of the box, so we need to associate facts with dimensions in further steps. For instance, we introduce new champions, skins (MySQL), Elo range expands, game types, etc. (in Hive).
  4. Before we talk about our first use case, we need to give you a little bit of context about the game and gameplay (super high level). Session-based team play: the basic idea is like “capture the flag” – MOBA! If you die, you re-spawn after a certain amount of time (that time grows as the game progresses). Lots of strategy to the game.
  5. Each player “summons” a Champion that he plays. Each champion has very different abilities.
  6. All players begin at level 1 in a gameplay session and can progress to a maximum of level 18. Gain abilities. Gain gold and use that gold to equip your player.
  7. Shen is not a Yordle. Shen is a ninja.
  8. Early this year, Shen was underpowered. We decided to fix him. However, we accidentally made him highly overpowered. We recognized this fact quickly, and a fix was in place within 2 days.
  9. For international player populations on the North American environment.