SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Billions of Rows, Millions of Insights
Right Now
Developing a Landscape for Real Time Information
Who is Spil Games?
• 180 million monthly and 12 million daily players
• More than one billion gameplays monthly
• Active in every country of the world (even Vatican City!)
One of the largest casual gaming companies on the planet
Local EVERYWHERE
Titles
We are a platform first, but also a publisher and a developer
The paradigm is shifting
The Data Lake
• Highly consistent
• Highly connectable
• Inflexible
• Slow
• Flexible
• Fast
• Going to get wet
You always
need both
Traditionally, we define data
based on what we expect
With streaming data, we capture
first and define later
Defining BIG Data
The Four Vs
Velocity
Variety
Volume
Veracity
Small Data = BIG Data?
Real Time
ETL, Events, Excel
Drinking from the Firehose
Heuristics
VALUE: The Only V that Matters
Defining the VELOCITY and VARIETY
Traditional ETL“Real Time”
• Once a day
• Once a week
• Delayed
• Faster than human
perception
• <200 milliseconds
“In Time”
In Time: Information is available fast enough to influence decisions
•While in the shop/on the site (minutes)
•While the query runs (seconds)
•While the page loads (milliseconds)
The Velocity Continuum
Deriving the VALUE at Spil
Informing Decisions Making Decisions
• Day to day business reporting
• Analytical reporting for self-service
analysis
• Business analytics for advising decisions
• Descriptive models to explain our
business
• Customer Lifetime Value
• Marketing ROI
• Customer content recommendations
• Email campaign targeting
• Site learning and optimization
• System monitoring and alerting
Why Real Time Reporting Matters
Value of
Reporting
Real time reporting is a paradigm-shifting component
of our cloud based big data strategy!
I need to see everything happening RIGHT NOW
System Monitoring
Product Changes
In-Time Customer Support
Real Time Systems Requirements
Requirement Rationale Our Experience
Scalable with fast loads Must handle intraday
variable load
Load swings up to 300%
during the day
Fast join performance Synthesizing traditional ETL
data and real time events
on the fly
Denormalization is great
but volume expensive; 3NF
is BAD
Resilient Real time means as few
buffers as possible
Tableau extracts can slow
the process too much
Good query optimizer Minor inefficiencies
translate to expensive
performance hits
The best MySQL engine is
still too slow for BIG data
aggregation
Concurrent loading and
querying
No offline processing for
real time data
ETLs running at the same
time as queries up to 20%
of the time
Solution: C-Store Databases
C-Stores and Fast Dashboards
• C-Stores persist each column independently and
allow column compression
• Queries retrieve data only from needed columns
Example: 7 billion rows, 25 columns, 10 bytes/column =
1,6 TB table
Query: Select A, sum( D ) from table where C >= X;
Row Store: 1,6TB of Data
Column Store (30% compression): <195 GB data
The Result: Dashboards can run
direct on large tables
Dashboard on 7 billion row table with
two joins, <20 seconds to refresh
Our Technical Infrastructure
How much data do we handle?
Through Map/Reduce:
1.2 Billion Events/Day
(150 Million Rows/Day
into DWH)
Through ETL:
100-200 Million
Rows/Day into DWH
Map/Reduce: 20 Billion
Rows
Vertica:
45 Billion Rows
Long Term Storage:
All of 2013 Events
Predictive models: >500
million scores per day
ETLs to Production DBs:
>10 Models
Reporting: 150
Dashboards, 80 data
sources
Queries: >2000 per day
Ingestion Persistence Usage
Data Flow for Event Data
{ token:"BAEDIDtxmZoAWAEA",
sessionId:1358331540132,
visitorId:515876866411417,
pageInSession:3, environment:"stg",
eventList:[{ eventCategory: 'displayAds',
eventAction: 'fetch', eventLabel:
'Miniclip,leaderboard,160x60,SE,2.9',
eventValue: 1, //the depth in the daizy
chaining pageInSession: 2, timing:
1730 }] }:
JSON event data is generated by client
Visitor Session Page Timing Type Action Source Value
123 456 3 1730 DisplayAd Fetch Miniclip 2
Data is structured in Map/Reduce and put into flat files
Data is loaded into Vertica for Reporting + Analysis
Tableau queries directly from fact tables
Why we chose our tech
• Affordable
• Highly available and resilient
• Extremely fast development due to SQL
• Excellent query performance = lazy
optimization
• Right price
• Easy (and fun!) development
• Excellent library availability
• Industry standard for Map/Reduce
• Cheap storage of “data lake”
• Easy integration with existing tech
• Denormalize like crazy (or cheat with pre-join projections)
• Map/Reduce doesn’t like “real” time (try Storm)
• Network is the first limit you hit
• Let Tableau write the SQL, but optimize the projections
• Tableau’s caching is inflexible, scripting can solve (kind of)
What we’ve learned along the way
Q&A + Demo

Weitere ähnliche Inhalte

Was ist angesagt?

Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics SingleStore
 
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeSingleStore
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsSingleStore
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
MemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics PlatformMemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics PlatformSingleStore
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusDatabricks
 
Yellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Data
 
Building an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesBuilding an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesSingleStore
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data IntegrationsPat Patterson
 
The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018confluent
 
How Yellowbrick Data Integrates to Existing Environments Webcast
How Yellowbrick Data Integrates to Existing Environments WebcastHow Yellowbrick Data Integrates to Existing Environments Webcast
How Yellowbrick Data Integrates to Existing Environments WebcastYellowbrick Data
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersVoltDB
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...HostedbyConfluent
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondSingleStore
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkDatabricks
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureSingleStore
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 

Was ist angesagt? (20)

Converging Database Transactions and Analytics
Converging Database Transactions and Analytics Converging Database Transactions and Analytics
Converging Database Transactions and Analytics
 
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big DataVoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
VoltDB and HPE Vertica Present: Building an IoT Architecture for Fast + Big Data
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
MemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics PlatformMemSQL - The Real-time Analytics Platform
MemSQL - The Real-time Analytics Platform
 
Building the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for FluviusBuilding the Next-gen Digital Meter Platform for Fluvius
Building the Next-gen Digital Meter Platform for Fluvius
 
Yellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time Analytics
 
Building an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesBuilding an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 Minutes
 
Building Custom Big Data Integrations
Building Custom Big Data IntegrationsBuilding Custom Big Data Integrations
Building Custom Big Data Integrations
 
The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018The Future of ETL - Strata Data New York 2018
The Future of ETL - Strata Data New York 2018
 
How Yellowbrick Data Integrates to Existing Environments Webcast
How Yellowbrick Data Integrates to Existing Environments WebcastHow Yellowbrick Data Integrates to Existing Environments Webcast
How Yellowbrick Data Integrates to Existing Environments Webcast
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
Money Heist - A Stream Processing Original! | Meha Pandey and Shengze Yu, Net...
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
The State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and BeyondThe State of the Data Warehouse in 2017 and Beyond
The State of the Data Warehouse in 2017 and Beyond
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
 
Internet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data InfrastructureInternet of Things and Multi-model Data Infrastructure
Internet of Things and Multi-model Data Infrastructure
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 

Andere mochten auch

Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!
Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!
Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!BigDataExpo
 
Top bi travelbird
Top bi travelbirdTop bi travelbird
Top bi travelbirdBigDataExpo
 
Building a Personalized Offer Using Machine Learning
Building a Personalized Offer Using Machine LearningBuilding a Personalized Offer Using Machine Learning
Building a Personalized Offer Using Machine LearningRob Winters
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil GamesRob Winters
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfRob Winters
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
ProductTank AMS - Building Products to Win - Edial Dekker
ProductTank AMS - Building Products to Win - Edial DekkerProductTank AMS - Building Products to Win - Edial Dekker
ProductTank AMS - Building Products to Win - Edial DekkerProductTank AMS
 

Andere mochten auch (7)

Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!
Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!
Big Data Expo 2015 - Infotopics Zien, Begrijpen, Doen!
 
Top bi travelbird
Top bi travelbirdTop bi travelbird
Top bi travelbird
 
Building a Personalized Offer Using Machine Learning
Building a Personalized Offer Using Machine LearningBuilding a Personalized Offer Using Machine Learning
Building a Personalized Offer Using Machine Learning
 
Tableau @ Spil Games
Tableau @ Spil GamesTableau @ Spil Games
Tableau @ Spil Games
 
Data Vault Automation at the Bijenkorf
Data Vault Automation at the BijenkorfData Vault Automation at the Bijenkorf
Data Vault Automation at the Bijenkorf
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
ProductTank AMS - Building Products to Win - Edial Dekker
ProductTank AMS - Building Products to Win - Edial DekkerProductTank AMS - Building Products to Win - Edial Dekker
ProductTank AMS - Building Products to Win - Edial Dekker
 

Ähnlich wie Billions of Rows, Millions of Insights, Right Now

ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_DataNAVER D2
 
WebAction In-Memory Computing Summit 2015
WebAction In-Memory Computing Summit 2015WebAction In-Memory Computing Summit 2015
WebAction In-Memory Computing Summit 2015WebAction
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1GurinderG
 
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...Caserta
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightEduardo Castro
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitRekha Joshi
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage Jedha Bootcamp
 
Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE John Furrier
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAmazon Web Services
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh BellurSTS FORUM 2016
 

Ähnlich wie Billions of Rows, Millions of Insights, Right Now (20)

ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data[2C6]Everyplay_Big_Data
[2C6]Everyplay_Big_Data
 
WebAction In-Memory Computing Summit 2015
WebAction In-Memory Computing Summit 2015WebAction In-Memory Computing Summit 2015
WebAction In-Memory Computing Summit 2015
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1
 
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...
 
WebAction-Sami Abkay
WebAction-Sami AbkayWebAction-Sami Abkay
WebAction-Sami Abkay
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage Les objets connectés : de nombreux cas d'usage
Les objets connectés : de nombreux cas d'usage
 
Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE Wikibon #IoT #HyperConvergence Presentation via @theCUBE
Wikibon #IoT #HyperConvergence Presentation via @theCUBE
 
Hyper-Convergence CrowdChat
Hyper-Convergence CrowdChatHyper-Convergence CrowdChat
Hyper-Convergence CrowdChat
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh Bellur
 

Kürzlich hochgeladen

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Billions of Rows, Millions of Insights, Right Now

  • 1. Billions of Rows, Millions of Insights Right Now Developing a Landscape for Real Time Information
  • 2. Who is Spil Games? • 180 million monthly and 12 million daily players • More than one billion gameplays monthly • Active in every country of the world (even Vatican City!) One of the largest casual gaming companies on the planet Local EVERYWHERE Titles We are a platform first, but also a publisher and a developer
  • 3. The paradigm is shifting The Data Lake • Highly consistent • Highly connectable • Inflexible • Slow • Flexible • Fast • Going to get wet You always need both Traditionally, we define data based on what we expect With streaming data, we capture first and define later
  • 4. Defining BIG Data The Four Vs Velocity Variety Volume Veracity Small Data = BIG Data? Real Time ETL, Events, Excel Drinking from the Firehose Heuristics VALUE: The Only V that Matters
  • 5. Defining the VELOCITY and VARIETY Traditional ETL“Real Time” • Once a day • Once a week • Delayed • Faster than human perception • <200 milliseconds “In Time” In Time: Information is available fast enough to influence decisions •While in the shop/on the site (minutes) •While the query runs (seconds) •While the page loads (milliseconds) The Velocity Continuum
  • 6. Deriving the VALUE at Spil Informing Decisions Making Decisions • Day to day business reporting • Analytical reporting for self-service analysis • Business analytics for advising decisions • Descriptive models to explain our business • Customer Lifetime Value • Marketing ROI • Customer content recommendations • Email campaign targeting • Site learning and optimization • System monitoring and alerting
  • 7. Why Real Time Reporting Matters Value of Reporting Real time reporting is a paradigm-shifting component of our cloud based big data strategy! I need to see everything happening RIGHT NOW System Monitoring Product Changes In-Time Customer Support
  • 8. Real Time Systems Requirements Requirement Rationale Our Experience Scalable with fast loads Must handle intraday variable load Load swings up to 300% during the day Fast join performance Synthesizing traditional ETL data and real time events on the fly Denormalization is great but volume expensive; 3NF is BAD Resilient Real time means as few buffers as possible Tableau extracts can slow the process too much Good query optimizer Minor inefficiencies translate to expensive performance hits The best MySQL engine is still too slow for BIG data aggregation Concurrent loading and querying No offline processing for real time data ETLs running at the same time as queries up to 20% of the time Solution: C-Store Databases
  • 9. C-Stores and Fast Dashboards • C-Stores persist each column independently and allow column compression • Queries retrieve data only from needed columns Example: 7 billion rows, 25 columns, 10 bytes/column = 1,6 TB table Query: Select A, sum( D ) from table where C >= X; Row Store: 1,6TB of Data Column Store (30% compression): <195 GB data The Result: Dashboards can run direct on large tables Dashboard on 7 billion row table with two joins, <20 seconds to refresh
  • 11. How much data do we handle? Through Map/Reduce: 1.2 Billion Events/Day (150 Million Rows/Day into DWH) Through ETL: 100-200 Million Rows/Day into DWH Map/Reduce: 20 Billion Rows Vertica: 45 Billion Rows Long Term Storage: All of 2013 Events Predictive models: >500 million scores per day ETLs to Production DBs: >10 Models Reporting: 150 Dashboards, 80 data sources Queries: >2000 per day Ingestion Persistence Usage
  • 12. Data Flow for Event Data { token:"BAEDIDtxmZoAWAEA", sessionId:1358331540132, visitorId:515876866411417, pageInSession:3, environment:"stg", eventList:[{ eventCategory: 'displayAds', eventAction: 'fetch', eventLabel: 'Miniclip,leaderboard,160x60,SE,2.9', eventValue: 1, //the depth in the daizy chaining pageInSession: 2, timing: 1730 }] }: JSON event data is generated by client Visitor Session Page Timing Type Action Source Value 123 456 3 1730 DisplayAd Fetch Miniclip 2 Data is structured in Map/Reduce and put into flat files Data is loaded into Vertica for Reporting + Analysis Tableau queries directly from fact tables
  • 13. Why we chose our tech • Affordable • Highly available and resilient • Extremely fast development due to SQL • Excellent query performance = lazy optimization • Right price • Easy (and fun!) development • Excellent library availability • Industry standard for Map/Reduce • Cheap storage of “data lake” • Easy integration with existing tech
  • 14. • Denormalize like crazy (or cheat with pre-join projections) • Map/Reduce doesn’t like “real” time (try Storm) • Network is the first limit you hit • Let Tableau write the SQL, but optimize the projections • Tableau’s caching is inflexible, scripting can solve (kind of) What we’ve learned along the way