RIOT GAMES
TRACKING YORDLES THROUGH 630 MILLION
  MINUTES OF HARDCORE GAMING A DAY



        BARRY LIVINGSTON & DANI RAYAN
INTRODUCTION

ABOUT THE SPEAKERS
THIS PRESENTATION IS ABOUT…
•  The history of Riot’s data warehouse
•  Why we incorporated Hadoop
•  Our high-level architecture
•  Use cases Hadoop has enabled
•  Lessons learned
•  Where we’re headed

WHO?
•  Developer and publisher of League of Legends
•  Founded in 2006 by gamers, for gamers
•  Player-experience focused – which requires data

4.2 MILLION DAILY          32.5 MILLION REGISTERED
1.3 MILLION CONCURRENT     11.5 MILLION MONTHLY
HISTORY

MEET ANDY HO
“With enough data, even simple questions become difficult questions”
SCRAPPY START-UP PHASE
•  One initial beta environment for North America
•  Queries done directly off production MySQL slaves
•  This is obviously not a good practice

AROUND OUR INITIAL LAUNCH
•  Moved to a dedicated, single MySQL instance for the DW
•  Data ETL’d from production slaves into this instance (by Andy)
•  Queries run in MySQL (by Andy)
•  Reporting was done in Excel (by Andy)

This worked great!

THEN WE STARTED GROWING
•  Resources were focused elsewhere
   –  We had competition
   –  Focused on producing features and scaling our systems
•  Opened EU environment in June 2010
•  Needed something speedy – created a parallel installation
   –  This was bad
   –  But we could still get the answers we wanted
AND THEN – CRAZY GROWTH!
[Chart: total active players (# unique logins) over time – 1.5MM in July 2011, rising to 4.2M by Nov. 2011]
THE BREAKING POINT
•  NA Data Warehouse reached a breaking point 9 months ago
   –  24 hours of data took 24.5 hours to ETL
•  We couldn’t handle…
   –  multiple environments in a vertical MySQL instance
   –  a single environment in a vertical MySQL instance
•  We needed to change!
SOLUTION

WHY HADOOP?
•  COST EFFECTIVE – Expanding rapidly, so CAPEX was a concern
•  SCALABLE – Handles massive, diverse data sets (both structured and unstructured)
•  OPEN SOURCE – Our engineers can dive into problems
•  SPEED OF EXECUTION – We needed to move fast!
HIGH LEVEL ARCHITECTURE – CURRENT
[Diagram: regional source databases (Audit, Plat, LoL for North America, Europe, and Korea) feed through Pentaho + custom ETL + Sqoop into the Hive Data Warehouse; analysts query Hive directly, while Pentaho loads aggregates into MySQL, which business analysts consume through Tableau]
WHAT MAKES UP OUR ETL
[Same architecture diagram, with the Pentaho + custom ETL + Sqoop stage in focus]
WHAT MAKES UP OUR ETL
•  Pentaho + custom ETL + Sqoop – all of these orchestrated by Pentaho
•  We use Sqoop for staging data only
•  Then we dynamically partition the data into Hive tables
WHAT MAKES UP OUR ETL
[Same architecture diagram repeated]
WHAT MAKES UP OUR ETL
Hive Data Warehouse: data lands in a temp staging area first

1. Data written into the temp staging area
   –  Prevents analysts from running queries against partially written tables
   –  Helps us leverage Hive’s merging and compression settings
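As a rough illustration of step 1, here is a minimal HiveQL sketch. The table name, columns, and staging location are hypothetical, not Riot’s actual schema; the idea is that ETL jobs land data in an external staging table first, with Hive’s small-file merging enabled so queries never run against partially written warehouse files.

```sql
-- Hypothetical external staging table over the directory the ETL jobs write into
CREATE EXTERNAL TABLE game_sessions_staging (
  session_id  BIGINT,
  player_id   BIGINT,
  champion_id INT,
  win         INT,
  event_time  STRING,
  region      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/staging/game_sessions';

-- Merge the many small files produced by map/reduce tasks into larger ones
SET hive.merge.mapfiles = true;
SET hive.merge.mapredfiles = true;
SET hive.merge.size.per.task = 256000000;
```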
WHAT MAKES UP OUR ETL
Hive Data Warehouse: temp staging area → partitions A–E

2. Hive dynamically inserts data into the appropriate partitions
   –  According to the value generated for the partition key in the target table
   –  Non-existent partitions will be created by Hive
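A hedged sketch of step 2 with the same hypothetical names: once dynamic partitioning is enabled, Hive routes each row from staging into the partition matching its partition-key value, creating any partitions that do not yet exist.

```sql
-- Allow fully dynamic partitioning (no static partition prefix required)
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Target table is assumed to be PARTITIONED BY (log_date STRING);
-- the dynamic partition column must be the last column selected
INSERT OVERWRITE TABLE game_sessions_by_day PARTITION (log_date)
SELECT session_id, player_id, champion_id, win,
       to_date(event_time) AS log_date
FROM game_sessions_staging;
```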
WHAT MAKES UP OUR ETL
Hive Data Warehouse: each partition (A–E) is further split into sub-partitions (A1–A3, B1–B3, …)

3. Layered partitioning = very helpful for region-based partitioning
   –  Helps maintain one table definition across regions
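A minimal sketch of the layered-partitioning idea, again with hypothetical table and column names: partitioning first by region and then by date keeps one table definition shared across all regions while still letting queries prune down to a single region’s partitions. The dynamic-partition settings from the previous sketch still apply.

```sql
-- One table definition shared by every region, partitioned region-first
CREATE TABLE game_sessions (
  session_id  BIGINT,
  player_id   BIGINT,
  champion_id INT,
  win         INT
)
PARTITIONED BY (region STRING, log_date STRING)
STORED AS SEQUENCEFILE;

-- Both partition keys can be filled dynamically from the staging table
INSERT OVERWRITE TABLE game_sessions PARTITION (region, log_date)
SELECT session_id, player_id, champion_id, win,
       region, to_date(event_time) AS log_date
FROM game_sessions_staging;
```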
WHAT MAKES UP OUR ETL

TO OPTIMIZE DISK IO FOR USER QUERIES, WE ENABLED COMPRESSION
WHY COMPRESSION?
We have 24 cores and disk IO is always the bottleneck, so compression is essential

WHY SNAPPY-COMPRESSED SEQUENCEFILE BLOCKS?
•  Lots of “why Snappy” discussion on the interwebs already
•  SequenceFiles can be split by Hadoop, so multiple maps can run in parallel
•  Block compression yields a better compression ratio while keeping the file splittable; the block size is configurable
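For concreteness, these are the kinds of session settings involved, using the Hadoop 1.x / Hive property names of that era; the exact values Riot used aren’t in the slides, so treat this as a sketch rather than their configuration.

```sql
-- Write Hive job output as block-compressed, Snappy-encoded SequenceFiles
SET hive.exec.compress.output = true;
SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type = BLOCK;

-- Compress intermediate map output as well to cut shuffle IO
SET hive.exec.compress.intermediate = true;
SET mapred.map.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;

-- Tables meant to stay splittable are declared as SequenceFile
CREATE TABLE example_events (payload STRING)
STORED AS SEQUENCEFILE;
```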
WHAT WE DO IN HIVE
•  We ETL data from OLTP MySQL slaves into the Hive Data Warehouse daily
•  Our analysts run Hive queries every day, translating into 1000s of MR jobs daily
•  We have some pretty large tables: e.g., one with 50,795,997,734 rows
•  We use metrics derived from Hive queries to improve our matchmaking system and player behavior
WHAT DID WE LEARN FROM ETL?
•  If you use custom ETL, keep an eye out for block distribution
•  DRY: re-inventing the wheel is not a good idea
   –  Invest time in researching proper tools that suit your needs
   –  There are tons of options for ETL and workflow management
   –  Just because company X uses a particular ETL or workflow management tool doesn’t mean it will work effectively for you
WHY TABLEAU?
[Same architecture diagram, with the MySQL → Tableau → business analyst path in focus]
WHY TABLEAU?
•  We needed to democratize access for non-technical folks
   –  Design
   –  Execs
   –  Player Support
•  Great visualization capability
•  Easy to work with
•  Has a Hive connector*
LEAGUE OF LEGENDS GAMEPLAY BASICS
[Gameplay screenshot slides]
USECASE #1
THE STORY OF SHEN

WAIT, SO WHAT’S A YORDLE?
•  Yordles = very cute race of champions in League of Legends
•  We track Yordles (and the rest of our champions) because game balance is exceptionally important
DESIGN BALANCE IS IMPORTANT
•  Highly competitive game
•  Updated every 2-3 weeks
   –  New champions
   –  New items
•  Game is a living, breathing service that’s always in motion
•  Have to maintain a level playing field
QUICKLY REACTING TO CHANGES
[Chart: total plays over time, with wins highlighted]
HOW DID WE CREATE THAT?
[Slide shows product logos]
*All logos are trademarks of respective owners
WHY NOT JUST HIVE?
HIVE IS FOR MASSIVE JOBS
*All logos are trademarks of respective owners
HIVE TO MYSQL TRANSFORMATION
•  Many of our stakeholders use Tableau
•  Transformed the required data into cubes for direct Tableau consumption, using Pentaho
•  Initially experimented with the Hive-to-Tableau connector
   –  Had issues, e.g., triggering MR jobs for every change and a non-persistent Hive Server
WE WANTED TO KNOW MORE ABOUT…
•  Which champions and skins are popular across all regions?
•  What are the win rates of champions across all regions?
•  Are better players choosing different champions?
WE CREATED CUBES OF AGGREGATED DATA
[Chart: win rates by champion]
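For illustration, a cube like the one pictured could come out of a Hive aggregation along these lines; the table and column names are invented for the example. The resulting small table would then flow through the Pentaho pipeline into MySQL for Tableau.

```sql
-- Small, Tableau-friendly cube: one row per (region, champion)
CREATE TABLE champion_winrate_cube (
  region       STRING,
  champion_id  INT,
  games_played BIGINT,
  games_won    BIGINT,
  win_rate     DOUBLE
);

INSERT OVERWRITE TABLE champion_winrate_cube
SELECT region,
       champion_id,
       COUNT(*)                                            AS games_played,
       SUM(CASE WHEN win = 1 THEN 1 ELSE 0 END)            AS games_won,
       SUM(CASE WHEN win = 1 THEN 1 ELSE 0 END) / COUNT(*) AS win_rate
FROM game_sessions
GROUP BY region, champion_id;
```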
HOW WE DID IT: TRANSFORMATION++
•  Massive tables reside in Hive
•  A Hive transformation creates dimension tables
•  Dimension tables are transformed into cubes in MySQL for Tableau consumption
•  Some dimension tables are moved to join with other fact tables in Hive
WHY DID WE GO THIS ROUTE?
•  Hive is not good for slowly changing dimensions
   –  No automatic primary key generation
   –  Can’t regenerate a dimension table quickly enough, since it requires a full-table scan
•  MySQL is not awesome for joining massive tables
•  Decided to use the best of both worlds
•  Also leveraged map-side joins and the distributed cache (a sketch follows below)
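A hedged example of the map-side join technique mentioned in the last bullet: the small dimension table is shipped to every mapper (via the distributed cache), so the large fact table is never shuffled. Table names are illustrative only.

```sql
-- MAPJOIN hint asks Hive to broadcast the small dimension table to all mappers
SELECT /*+ MAPJOIN(c) */
       c.champion_name,
       COUNT(*) AS plays
FROM game_sessions s
JOIN champion_dim c ON (s.champion_id = c.champion_id)
GROUP BY c.champion_name;

-- Later Hive versions can pick map-side joins automatically for small tables
SET hive.auto.convert.join = true;
```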
USECASE #2
MATCHMAKING AND REGIONAL METRICS

FIRST, SOME CONTEXT
•  League of Legends is global in scale, with players logging in from >145 countries in a typical day
•  No-fee play means a very low barrier to play
•  Players often play on multiple environments regularly (e.g. EU players on NA environments and vice versa)
•  Same features and mechanics deployed in all territories
•  It’s vitally important that we understand game performance metrics by geography and region
MATCHMAKING
•  One of the most important features outside of gameplay
•  Like a dating service, the objective is to match people up
•  There are a number of different queues that players can line up in, depending on the type of match they’re looking for

Critical that this system is balanced and able to create good matches quickly
MATCHMAKING – IS IT WORKING?
•  Matchmaking algorithm based on a modified Elo system
•  Inspecting the “curve” of these scores:
   –  Should show a similar distribution in all regions
   –  May show interesting trends, such as win/lose ratios
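The distribution itself can be produced by a bucketing query roughly like this sketch; the table, columns, and the 50-point bucket width are assumptions for illustration, not Riot’s actual query.

```sql
-- Count players per 50-point Elo bucket, per region, to plot the curve
SELECT region,
       floor(elo_rating / 50) * 50 AS elo_bucket,
       COUNT(DISTINCT player_id)   AS players
FROM player_ratings
GROUP BY region, floor(elo_rating / 50) * 50
ORDER BY region, elo_bucket;
```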
MATCHMAKING – IS IT WORKING?
[Chart: Elo distribution – % of players (y-axis) vs. ELO score (x-axis)]
WHAT WAS NEEDED TO GENERATE IT?
1  Had to join massive tables with session, player, and game data
2  Needed to look up and range-query IP addresses in the same join
   –  Required for many region-based metrics
LIMITATIONS OF HIVE
•  No good indexing mechanism in our version
•  Not efficient for lookup and range queries

This made region-based queries computationally difficult
SOLUTION
•  Added GeoIP UDFs to Hive, leveraging open-source libraries available online
•  UDFs = user-defined functions that one can add to the Hive interpreter
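Registering and calling such a UDF looks like this in HiveQL; the jar path and Java class name below are placeholders standing in for whichever open-source GeoIP UDF library is used.

```sql
-- Register a GeoIP lookup UDF for this session (jar and class are placeholders)
ADD JAR /opt/hive/aux/hive-geoip-udf.jar;
CREATE TEMPORARY FUNCTION geoip_country AS 'com.example.hive.udf.GeoIpCountry';

-- Region-based metric: logins per country, resolved from the session IP address
SELECT geoip_country(s.ip_address) AS country,
       COUNT(*)                    AS logins
FROM session_data s
GROUP BY geoip_country(s.ip_address);
```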
LESSONS

LESSON #1
Analysts are greedy in a mid-sized cluster
Enable a scheduler with cap-limited resources for users
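Assuming the Capacity (or Fair) Scheduler is configured on the cluster, analyst sessions then just submit their jobs to a capped queue; a minimal sketch with a hypothetical queue name:

```sql
-- Route this session's MapReduce jobs to a capacity-limited analyst queue
SET mapred.job.queue.name = analysts;

SELECT champion_id, COUNT(*) AS plays
FROM game_sessions
GROUP BY champion_id;
```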
LESSON #2
Configuration follows usecase
•  Hardware profiles and file structure should match the workload
•  Enabling compression to trade disk IO for CPU helped us
•  Larger block sizes on beefy machines are performant; 256MB worked for us → YMMV
•  Leverage all the spindles – stripe directory access with the appropriate parameters
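As a sketch of the block-size and compression points, using Hadoop 1.x-era property names (256 MB matches the value that worked for the speakers; your numbers may vary):

```sql
-- Larger HDFS block size for files written by this job (Hadoop 1.x property name)
SET dfs.block.size = 268435456;   -- 256 MB

-- Trade disk IO for CPU by compressing job output
SET hive.exec.compress.output = true;
```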
LESSON #3
Cover your downside (and your backside)
•  RAID the Namenode, Jobtracker and Secondary Namenode (Checkpoint Node)
•  Back up the namespace through HTTP
•  Backups and Trash configs are crucial – remember “rmr” doesn’t give you a warning
LESSON #4
Plan capacity for at least one year ahead
•  Instead of using the production cluster, have a dark launch cluster for experimenting with new usecases
•  In-place upgrades are trickier than with other enterprise software, since wire-compatibility is not available in current distros
LESSON #5
Automate for reality!
We wrote chef recipes and open-sourced them at https://github.com/RiotGames/cloudera-cookbook

Copyright 2012 Riot Games. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
THE FUTURE

OUR IMMEDIATE GOALS
•  Shorten time to insight
•  Increase depth of insight
•  Enable data analysis for client-side features
•  Log ingestion and analysis
•  Flexible auditing framework
•  International data infrastructure
CHALLENGE: MAKE IT GLOBAL
•  Data centers across the globe, since latency has a huge effect on gameplay → log data is scattered around the world
•  Large presence in Asia – some areas (e.g., PH) have bandwidth challenges or bandwidth is expensive
CHALLENGE: WE HAVE BIG DATA
STRUCTURED DATA – 500G DAILY
APPLICATION AND OPERATIONAL LOGS – 4.5TB DAILY
OFFICIAL LOL SITE TRAFFIC – 6MM HITS DAILY
RIOT YOUTUBE CHANNEL – 1.7MM SUBSCRIBERS, 270+MM VIEWS
+  chat logs
+  detailed gameplay event tracking
+  so on…
OUR AUDACIOUS GOALS
Build a world-class data and analytics organization
•  Deeply understand players across the globe
•  Apply that understanding to improve games for players
•  Deeply understand our entire ecosystem, including social media

Have the ability to identify, understand and react to meaningful trends in real time

Have a deep, real-time understanding of our systems from player experience and operational standpoints
SHAMELESS HIRING PLUG
•  Like most everybody else at this conference… we’re hiring!
•  The Riot Manifesto
   –  Player experience first
   –  Challenge convention
   –  Focus on talent and team
   –  Take play seriously
   –  Stay hungry, stay humble

And yes, you can play games at work. It’s encouraged!
THANK YOU!
QUESTIONS?
     BARRY LIVINGSTON        &        DANI RAYAN
 blivingston@riotgames.com       drayan@riotgames.com
