Rob Winters
Stefano Oldeman
Today’s Agenda
• Introductions: Rob, Stefano, Spil
• Analytics at Spil, the Journey: questions that drove the evolution
• Architectural Overview: designing a successful analytics architecture
• Analytical Case Studies: how can you change the business?
▫ Information empowerment (self-service BI)
▫ Data mining / predictive analytics
▫ Personalization
• Key Learnings: which mistakes should you avoid?
The value of an idea lies in the using of it.
-Thomas Edison
A person who is gifted sees the essential point and leaves the rest as surplus.
-Thomas Carlyle
If you torture the data long enough, it will confess.
-Ronald Coase
Rob Winters
• Current role: Head of Data Technology, de Bijenkorf
• Formerly Director of Analytics, Spil Games
• Eight years in analytics, four in leadership roles
• Industries include telecom, gaming, retail, and e-commerce
Spil Games
• Web and mobile gaming company based in Hilversum
• >150 million monthly unique visitors, >1 billion gameplays monthly
• Activity measured in >150 countries across ~50 sites
Stefano Oldeman
• Current role: Big Data Developer, Shop2market
• Former developer in the Big Data program, Spil Games
• Four years in high-availability / high-performance applications, three years building BI solutions
The Story of our Journey
What drove Spil to Analytics?
• Stagnating growth, needed to act differently
▫ Zynga was growing by using data, Spil’s growth was slowing
• Buzzword ideas without understanding
▫ “Data-driven” was a hot buzzword in 2007
▫ 2014’s version: “I need a Hadoop!”
[Figure labels: “We were here”, “2014 version”]
• Nightly copies of production data into Postgres
• No integration model for data sources
• Event tracking = Google Analytics
• Reporting was “send out the numbers” rather than “analyze data and answer questions”
Starting the Big Data program
Rationale:
• We needed better connectivity, more flexibility, and quicker insights than were possible with existing solutions
• Wanted to “own” our data rather than keep it on an external system
First steps:
• Answer “what do we want to track?”
• Deploy the fundamental components:
▫ Tracking Library
▫ Logging infrastructure
▫ MapReduce platform
▫ Scheduling
• Start basic event tracking
Systems and Architecture
Architectural Overview
Plus a scheduler!
Event tracking principles
• Think of it as Information architecture
• Each event should refer to an actual business-user interaction
• Use multiple events over time to tell what happened
• Never tell the system what did NOT happen
• Agree upfront on structures and definitions that explain your business
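The principles above can be illustrated with a minimal sketch. The `Event` class and its field names (`event_type`, `user_id`, `game_id`, `ts`) are hypothetical and are not Spil's actual tracking library. Each record is an interaction that really happened; something that did not happen (here, a game that was started but never finished) is derived from the sequence of events rather than tracked as its own event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user interaction that actually happened (hypothetical schema)."""
    event_type: str   # e.g. "game_start", "game_end"
    user_id: str
    game_id: str
    ts: int           # seconds since epoch


def unfinished_games(events):
    """Derive what did NOT happen: starts without a later matching end."""
    open_sessions = set()
    for e in sorted(events, key=lambda e: e.ts):
        key = (e.user_id, e.game_id)
        if e.event_type == "game_start":
            open_sessions.add(key)
        elif e.event_type == "game_end":
            open_sessions.discard(key)
    return open_sessions


events = [
    Event("game_start", "u1", "bubble", 100),
    Event("game_end", "u1", "bubble", 160),
    Event("game_start", "u2", "bubble", 120),  # u2 never sends game_end
]
abandoned = unfinished_games(events)
print(abandoned)
```

Because the structure of an event is agreed upfront, a question such as "which games were abandoned?" can be answered later without adding a new event type.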
Challenges with data pipelines
Think in pipes
Things will fail somewhere
Be generic
Keep moving until the end
The Data Lake
• Two-tier architecture: Hadoop/Disco for “big data” persistence and processing, an analytical database for data warehousing and analytics
• All data is persisted in Hadoop
• Some data is made available in your DB
• Offload big data calculations to Hadoop
• When data is incomplete or business logic changes: replay data from Hadoop to your DB
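The replay idea can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the talk: a Python list of JSON lines stands in for the raw store in Hadoop, an in-memory SQLite database stands in for the analytical database, and the table name `daily_gameplays` and the "exclude test users" rule are made up for the example.

```python
import json
import sqlite3

# Stand-in for the raw store (Hadoop in the slides): every event is kept as-is.
RAW_EVENTS = [
    '{"user_id": "u1", "event_type": "game_start", "day": "2014-05-01"}',
    '{"user_id": "u2", "event_type": "game_start", "day": "2014-05-01"}',
    '{"user_id": "test_bot", "event_type": "game_start", "day": "2014-05-01"}',
]


def replay(db, raw_lines, counts_as_gameplay):
    """Rebuild the derived table from raw data using the current business rule."""
    db.execute("DROP TABLE IF EXISTS daily_gameplays")
    db.execute("CREATE TABLE daily_gameplays (day TEXT PRIMARY KEY, gameplays INTEGER)")
    totals = {}
    for line in raw_lines:
        event = json.loads(line)
        if counts_as_gameplay(event):
            totals[event["day"]] = totals.get(event["day"], 0) + 1
    db.executemany("INSERT INTO daily_gameplays VALUES (?, ?)", list(totals.items()))
    db.commit()


db = sqlite3.connect(":memory:")  # stand-in for the analytical database

# Original rule: every game_start counts as a gameplay.
replay(db, RAW_EVENTS, lambda e: e["event_type"] == "game_start")
before = db.execute("SELECT gameplays FROM daily_gameplays").fetchone()[0]

# The business logic changes (hypothetical rule: ignore test users).
# Nothing was thrown away, so the table is simply rebuilt from the raw events.
replay(
    db,
    RAW_EVENTS,
    lambda e: e["event_type"] == "game_start" and not e["user_id"].startswith("test_"),
)
after = db.execute("SELECT gameplays FROM daily_gameplays").fetchone()[0]
print(before, after)
```

The point of the design is that the database only ever holds a derived view; the raw events remain the source of truth.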
The right tool for the right job: ETL tools plus raw code
Load first, integrate later – ELT versus ETL
Everybody lies. Manage your own metadata and provide a feedback loop
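"Load first, integrate later" can be sketched as below. This assumes an in-memory SQLite database as a stand-in for the warehouse; the staging tables `stg_events` and `stg_games` and the view `reporting_gameplays` are hypothetical names chosen for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the warehouse

# 1. Load: land each source as-is, with no cleaning or joining on the way in.
db.execute("CREATE TABLE stg_events (user_id TEXT, game_id TEXT, day TEXT)")
db.execute("CREATE TABLE stg_games (game_id TEXT, title TEXT)")
db.executemany("INSERT INTO stg_events VALUES (?, ?, ?)", [
    ("u1", "g1", "2014-05-01"),
    ("u2", "g1", "2014-05-01"),
    ("u1", "g2", "2014-05-02"),
])
db.executemany("INSERT INTO stg_games VALUES (?, ?)", [("g1", "Bubble"), ("g2", "Racer")])

# 2. Integrate later: the transformation runs inside the database on top of the
# staged data, so it can be changed and rerun without reloading the sources.
db.execute("""
    CREATE VIEW reporting_gameplays AS
    SELECT g.title AS game, COUNT(*) AS gameplays
    FROM stg_events e JOIN stg_games g USING (game_id)
    GROUP BY g.title
""")
rows = db.execute("SELECT game, gameplays FROM reporting_gameplays ORDER BY game").fetchall()
print(rows)
```

In a classic ETL flow the join would happen before loading; here the integration logic lives next to the data and stays easy to revise.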
Vertica: Our column store data warehouse
• The goal: offer users complete data in a high performance environment
• Reporting namespace for normalized tables
▫ Names are user friendly
▫ Optimized for drag and drop queries / reports
• Users escalate when they find incorrect data
▫ This feedback is then processed in the data pipeline
▫ Data is processed again to correct mistakes
Why not just Hadoop?
Source Merging, Ad hoc, Run-time query performance
Tools for the use case: visual analytics, standard dashboards, statistical environment
Analysts need to use development best practices – version control, deployment mechanisms, metadata-driven models
Everyone else needs something simple, intuitive, and FUN
Performance is critical. You have <5 seconds to load a report
Use Cases
Primary Objective:
An organization that can formulate, ask, explore, and answer questions using data
Engaging the frontline is not the same as management support
Balancing operational needs versus management needs
Scale the BI team support at a better than linear rate
Roadshows, 1:1 sessions, and informal learn@lunches to discuss data questions
Centralize your systems, distribute your support via power users
Challenge all requests equally on a value basis; fit “tweaks” to dev windows
How do we enforce consistency without limiting future flexibility?
• Avoid presumptions in keys; avoid “interpretation”; keep the raw records!
How can we be resilient, fail gracefully, and recover automatically?
• Build to FAIL – jobs should be able to run at any time, repeatedly, without requiring intervention
How can we allow deep exploration without compromising performance?
• It’s in the tooling – use systems that don’t require pre-aggregation or complex end-user querying
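One common way to get the "Build to FAIL" property (a job that can run at any time, repeatedly, without intervention) is to make each load idempotent. The sketch below is an assumption about how that could look, not the implementation from the talk: the job replaces a whole day's partition inside a single transaction, with SQLite standing in for the warehouse and `gameplays` as a hypothetical table.

```python
import sqlite3


def load_day(db, day, rows):
    """Idempotent daily load: replace the day's partition in one transaction."""
    with db:  # commits on success, rolls back if anything raises
        db.execute("DELETE FROM gameplays WHERE day = ?", (day,))
        db.executemany(
            "INSERT INTO gameplays (day, game, plays) VALUES (?, ?, ?)",
            [(day, game, plays) for game, plays in rows],
        )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gameplays (day TEXT, game TEXT, plays INTEGER)")

rows = [("Bubble", 10), ("Racer", 4)]
load_day(db, "2014-05-01", rows)
load_day(db, "2014-05-01", rows)  # rerun after a failure or by accident: same result

total = db.execute("SELECT SUM(plays) FROM gameplays").fetchone()[0]
print(total)
```

Because a rerun produces the same table contents, a scheduler can retry the job blindly after any failure.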
Is our technology ready to support personalization?
Can we use data to (semi) automatically improve our business?
There are known knowns – Donald Rumsfeld
APIs to integrate with the product
Releases and Servers
Can we create additional user and business value by delivering an individual experience to (almost) everyone?
Loading data into production
Non-happy flow
Key Learnings
• Plan for 10x more than today, design for 100x more
• Build versus off-the-shelf: I’ve built an event tracker
• Testing on production: NOT OPTIONAL
• “Cheap” isn’t always cheap
▫ Expensive software that offsets hardware and (most importantly) people costs can often deliver a much lower TCO
What We Learned about Hadoop
• Hadoop is great to work on; there are so many tools
▫ But you don’t want to worry about the infrastructure (outsourced)
• And yet, developers and infra engineers have to work closely together
▫ Users/scripts should not store small files (who should/can influence this?)
• Simple is better than complex (Flume + Avro FF)
• Security
▫ Problem 1: user X uploads a file, Hive can’t read it
▫ Problem 2: user X creates a table with Hive, user Y can’t write to it
• Conclusion:
▫ Clear agreements on who can do what, and follow defined requirements set by the users (TTL of files, file formats, when to upgrade)
What we learned about analytics
• Reduce your analytical cycles!
▫ 15-minute query time = <20 iterations per day; 2-minute query time = >100
• Walk before you run
• Be wary of the HIPPO (Highest Paid Person in the Organization)
• Focus on developing internal bench strength and power users above the general organization
• Old tricks are the best tricks (when done right)
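The iteration figures for 15-minute and 2-minute queries can be reproduced with simple arithmetic. The slides do not state their assumptions, so the values below (five focused hours of analysis per day and half a minute of overhead per query) are illustrative guesses, chosen only to show how strongly query time drives the number of question-and-answer cycles.

```python
def iterations_per_day(query_minutes, overhead_minutes=0.5, focused_hours=5):
    """Question-and-answer cycles that fit in one day of focused analysis time."""
    cycle = query_minutes + overhead_minutes
    return int(focused_hours * 60 // cycle)


slow = iterations_per_day(15)  # 300 minutes / 15.5 minutes per cycle
fast = iterations_per_day(2)   # 300 minutes / 2.5 minutes per cycle
print(slow, fast)
```

Under these assumptions the slow query allows 19 iterations and the fast one 120, consistent with the "<20" and ">100" on the slide.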
The greatest value of a picture is when it forces us to notice what we never expected to see.
-John Tukey

Big Data at a Gaming Company: Spil Games
