Game Analytics with AWS - GDC 2014

AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect

Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue

Projected Mobile App Revenue
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
2011 2012 2013 2014 2015 2016 2017
Ads
IAP
Paid
Source:
Gartner

Winning at Free to Play
• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit

Analyze What?
Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment
Behaviors
• Hours played day/week
• Number of sessions/day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent/week

Example: Level Progression (One Metric)
0
2
4
6
8
10
L1 L2 L3 L4 L5 L6 L7 L8 L9 L10
Tries / Level
# of Tries

Example: Level Progression (Two Metrics)
0
10
20
30
40
50
60
0
2
4
6
8
10
L1 L2 L3 L4 L5 L6 L7 L8 L9 L10
Tries / Level
% Highest Level # of Tries

Key Takeaways
• Multiple data sources
• Correlate variables
• Deltas vs absolutes
• Settle on terminology (game vs level)
• Time matters

Events & Metrics
• Event = Moment in Time
– Login/quit
– Game start/end
– Level up
– In-app purchase
• Metrics = What to Measure
– KISS
– Numbers
– Booleans
– Strings (Enums)
• Always Include (ALWAYS)
– User
– Action
– Session (context-dependent)
– Timestamp in ISO8601
2014-03-16T16:28:26

Off The Shelf Analytics
• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In

Ok, A Real Business Plan
Ingest Store Process Analyze

Ok, A Real Business Plan
Ingest
• HTTP PUT
• Kafka
• Kinesis
• Scribe
Store
• S3
• DynamoDB
• HDFS
• Redshift
Process
• EMR (Hadoop)
• Spark
• Storm
Analyze
• Tableau
• Pentaho
• Jaspersoft

• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
Start Simple
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-
25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Profit!

Redshift at a Glance
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Columnar table storage
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available

Tableau + Redshift

Plumbing
① Create S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app:
http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Setup Tableau to access Redshift
⑥ Go to the Beach

Loading Redshift from S3
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;
aws_secret_access_key=<secret-access-key>'
delimiter=',';
Scheduled Redshift Load using Data Pipeline:
http://aws.amazon.com/articles/1143507459230804

• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
More Data Sources
EC2
External
Analytics

Logrotate to S3
/var/log/apache2/*.log {
sharedscripts
postrotate
sudo /usr/sbin/apache2ctl graceful
s3cmd sync /var/log/*.gz s3://mygame-logs/
endscript
}
Blog Entry on Log Rotation:
http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, Use ELB Access Logs:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/acce
ss-log-collection.html

• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
Dealing With Messy Data
EC2

Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales > PB's
• Transient

• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
Direct From DynamoDB
EC2

• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
Direct From DynamoDB
EC2

Loading Redshift from DynamoDB
copy games
from 'dynamodb://games'
aws_secret_access_key=<secret-access-key>';
copy events
from 's3://mygame-analytics-events'
aws_secret_access_key=<secret-access-key>'
delimiter=',';

Funnel Cake

Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-
25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart

Measure Retention: Repeated Plays
create view events_by_user_by_month as
select user_id,
date_trunc('month', event_date)
as month_active,
count(*) as total_events
from events
group by user_id, month_active;

First-Pass Retention – Too Noisy
0
5
10
15
20
25
30
35
40
# Play Sessions / Month
nateware
Lazyd0g
AK187
3strikes

Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
– Month game installed
– Demographics
• Run analysis by cohort
– Join with metrics
• Use Redshift as it's SQL
– Example of where SQL is a good fit

Creating Cohorts with Redshift
create view cohort_by_first_event_date as
select user_id,
date_trunc('month', min(event_date))
as first_month
from events
group by user_id;
http://snowplowanalytics.com/analytics/customer-
analytics/cohort-analysis.html

Retention by Cohort – Join Events with Cohort
0
5
10
15
20
25
Week 1 Week 2 Week 3 Week 5 Week 6 Week 7
# Sessions / Week
2013-11
2013-12
2014-01
2014-02
2014-03
2014-04

Moar Cohorts
• Define multiple cohorts
– By activity, time, demographics
– As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
– Retention by date
– Retention by demographic
– Retention by average plays/month quartile

Example Event Stream
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend

Cohorts by Type of Activity
create view cohort_by_first_play_date as
select user_id,
date_trunc('month', min(event_date))
as first_month
from events
where action = 'gamestart'
group by user_id;

Post-Match Heatmaps

Real-Time Analytics
Batch
• What game modes do
people like best?
• How many people have
downloaded DLC pack 2?
• Where do most people
die on map 4?
• How many daily players
are there on average?
Real-Time
• What game modes are
people playing now?
• Are more or less people
downloading DLC today?
• Are people dying in the
same places? Different?
• How many people are
playing today? Variance?

Why Real-Time Analytics?
30x in 24 hours
What if you ran a promo?

Real-Time Tools
Spark
• High-Performance
Hadoop Alternative
• Berkeley.edu
• Compatible with HiveQL
• 100x faster than Hadoop
• Runs on EMR
Kinesis
• Amazon fully-managed
streaming data layer
• Similar to Kafka
• Streams contain Shards
• Each Shard ingests data
up to 1MB/sec, 1000 TPS
• Data stored for 24 hours

• Always Batch Due to S3
Back To Basics [Dubstep Remix]
EC2

• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
Need Data Faster!
EC2

• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
Lots of Ins and Outs
EC2
EC2

Data
Sources
App.4
[Machine
Learning]
AWSEndpoint
App.1
[Aggregate &
De-Duplicate]
Data
Sources
Data
Sources
Data
Sources
App.2
[Metric
Extraction]
S3
DynamoDB
Redshift
App.3
[Sliding
Window
Analysis]
Data
Sources
Availability
Zone
Shard 1
Shard 2
Shard N
Availability
Zone
Availability
Zone
Introducing Amazon Kinesis
Service for Real-Time Big Data Ingestion

Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation:
http://docs.aws.amazon.com/kinesis/latest/dev/
introduction.html

Writing to a Kinesis Stream
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-
type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid,
Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord
{
"StreamName": "exampleStreamName",
"Data": "XzxkYXRhPl8x",
"PartitionKey": "partitionKey"
}

Kinesis + Spark
http://aws.amazon.com/articles/4926593393724923

Death in Real-Time
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}

Real-Time Heatmaps

But A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple – S3 + Redshift
• Add data sources – process with EMR
• Real-time – Kinesis + Spark
• Tons of untapped potential for gaming

Fallback Plan
Cheers – Nate Wiger @nateware

Game Analytics with AWS - GDC 2014

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (6)

Mehr von Amazon Web Services

Mehr von Amazon Web Services (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Game Analytics with AWS - GDC 2014