
Snowplow: open source game analytics powered by AWS

This is a presentation by Alex Dean and Yali Sassoon of Snowplow about open source game analytics powered by AWS. It was presented at the Game Developers Conference (GDC) in San Francisco in February 2017.



  1. Snowplow: open source game analytics powered by AWS
  2. Hello! We’re Alex and Yali. We created Snowplow
     • We cofounded Snowplow
     • Open source event data pipeline built on AWS tech
     • Collect granular, rich, event-level data across digital platforms
     • Validate, enrich, model and deliver that data to the places it can be analysed and acted on
  3. Wonder at what this data makes possible drove us to create Snowplow
     • Digital event data is rich, behavioural information on how millions of people do things (play, work, socialise, flirt, unwind etc.), collected at scale
     • Endless possibilities to ask and answer different questions, build intelligence and act on that intelligence
     • Packaged solutions do a poor job of enabling companies to realise all the possibilities this data presents
     • Many companies build their own event data pipelines to realise those possibilities. If we can build a standard pipeline, companies can focus on doing things with the data
  4. A call to arms for games analysts
  5. Games companies are typically very analytically sophisticated
     • They invest in an event data warehouse / data pipeline at an (often early) stage
     • Analytics is often very specific to each game: packaged solutions can only get you so far
     • Data sophistication is a competitive advantage
     • Larger game studios typically have very large data teams (engineering, science and analysis) and significant analytics infrastructure that they have built themselves
  6. But you don’t need to build your own event data pipeline from scratch
     • We have a tried and tested open-source stack that you can deploy directly to your own AWS account
     • Built on top of AWS services incl. Kinesis, Lambda, Redshift, Elasticsearch, S3, EMR
     • Use your data engineers to build analyses specific to your game, not to re-build the pipe!
  7. Building high quality event data pipelines is hard
     • Data quality
     • Schema evolution
     • Enrichment
     • Data modeling
  8. Today Snowplow is used by games studios… …and companies in other sectors
  9. Snowplow and our early gaming influences
  10. Early work with games studios heavily influenced our thinking
     • Flexible data schemas that evolve!
     • Event grammar: events vs entities
     • Evolving data models: understanding sequences of play
  11. Game analytics has grown up
  12. Game analytics encompasses a lot
     • Product analytics: use data to improve the game
     • Customer acquisition analytics: sustainably drive user growth
     • Game health analytics: monitor the game
     • Data-driven applications within the game, e.g. player-matching
     • Plenty more that is specific to your game
  13. We distinguish between analytics on read vs analytics on write
     Analytics on read:
     • Decide how you want to process the data at the point of query
     • Prioritise having the flexibility to query the data in a rich / varied way
     • De-prioritise query latency
     • Example: product analytics
     Analytics on write:
     • Define in advance how the data will be queried
     • Prioritise low latency
     • De-prioritise query flexibility
     • Example: game health monitoring
     Different architectures are appropriate for these two cases
  14. With Snowplow, we meet both requirements via a Lambda Architecture
     • Analytics on write: Kinesis + AWS Lambda / Spark Streaming
     • Analytics on read: Redshift / Spark / Athena
  15. Analytics on read
  16. Analytics on read example: A/B testing to drive product development
     • Limitless possibilities for experiments
     • Wide set of metrics that you might be looking to influence with each experiment
     • Tracking the experiments should be easy
     • All enabled by the flexibility to compute segments and metrics after the fact (at query time)
  17. Delivering the A/B testing framework with Redshift and/or Spark on EMR
     Process:
     • Product manager defines the A/B test in advance, incl. KPI and success threshold
     • Rolling program of tests run each week
     • Test history documented
     Technology:
     • Event tracked to indicate that a user is assigned to a specific group and a particular experiment is run
     • KPI can be measured after the fact
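The "measure the KPI after the fact" idea can be sketched in a few lines of Python: assignment is tracked as a plain event, and conversion per variant is computed later from the raw event stream (analytics on read). The event and field names here are illustrative assumptions, not Snowplow's actual tracking API.

```python
import hashlib
from collections import defaultdict

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically bucket a user into an experiment variant by hashing."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def conversion_by_variant(events, experiment, goal_event):
    """Compute per-variant conversion after the fact from the raw event
    stream: the KPI did not need to be wired in at tracking time."""
    assigned = {}      # user_id -> variant
    converted = set()  # user_ids that reached the goal
    for e in events:
        if e["event"] == "experiment_assigned" and e["experiment"] == experiment:
            assigned[e["user_id"]] = e["variant"]
        elif e["event"] == goal_event:
            converted.add(e["user_id"])
    totals = defaultdict(lambda: [0, 0])  # variant -> [users, conversions]
    for user_id, variant in assigned.items():
        totals[variant][0] += 1
        totals[variant][1] += user_id in converted
    return {v: c / n for v, (n, c) in totals.items()}
```

Because the goal event is a parameter of the query, the same tracked data can answer new KPI questions defined weeks after the test ran.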
  18. Analytics on read example 2: level optimisation analytics
  19. Delivering level analytics with Redshift and/or Spark on EMR
     Process:
     • Define key metrics to understand player engagement with each level
     • Build out a data modeling process to compute level aggregations on the underlying event stream
     • Extend over time: build out more sophisticated metrics as understanding of play evolves
     Technology:
     • Attach level metadata to all events
     • Aggregate the event stream in Redshift / Spark
     • Recompute over historical data as new metrics are developed
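The aggregation step above can be illustrated with a minimal Python sketch: because the level name rides on every event, per-level metrics are a simple fold over the event stream, and new metrics can be recomputed over the full history later. Event names and the `level` field are assumptions for illustration.

```python
from collections import defaultdict

def level_metrics(events):
    """Aggregate a raw event stream into per-level engagement metrics.
    Keeping the raw events means new metrics can be added and recomputed
    over historical data at any time."""
    stats = defaultdict(lambda: {"starts": 0, "completes": 0, "fails": 0})
    for e in events:
        level = e["level"]  # level metadata attached to every event
        if e["event"] == "level_start":
            stats[level]["starts"] += 1
        elif e["event"] == "level_complete":
            stats[level]["completes"] += 1
        elif e["event"] == "level_fail":
            stats[level]["fails"] += 1
    # Derived metric, computed at query time; add more as understanding evolves
    return {
        level: {**s, "completion_rate": s["completes"] / s["starts"] if s["starts"] else 0.0}
        for level, s in stats.items()
    }
```

In production this fold would run as SQL in Redshift or a Spark job on EMR, but the shape of the computation is the same.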
  20. AWS provides a rich and growing toolkit for analytics on read
     • EMR, enabling Hadoop, Spark and Flink
     • Athena
     • Redshift
     • Elasticsearch Service
  21. Analytics on write
  22. Analytics on write example 1: Surface aggregate play data in the game
     • https://next.codecombat.com/play/dungeon
  23. Delivering aggregate play data into the game with Kinesis, Lambda and DynamoDB
     • Example: calculating the number of users live on each level right now
     • Elegantly handles computing complex metrics (count distincts) in real time
     • Pipeline: Kinesis event stream of { event_name, level_name, user_name, timestamp } → AWS Lambda (compute player state) → DynamoDB player state table + stream → AWS Lambda (compute level state) → DynamoDB level state table
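A toy version of the first Lambda in this pipeline might look like the following. It is a sketch under stated assumptions: plain dicts stand in for the two DynamoDB tables, and the handler consumes records in the base64-encoded shape that Kinesis delivers to Lambda. The event field names match the slide's sample payload.

```python
import base64
import json
from collections import defaultdict

# In-memory stand-ins for the two DynamoDB tables in the pipeline.
player_state = {}               # user_name -> level the player is currently on
level_state = defaultdict(set)  # level_name -> set of users live on that level

def handle_kinesis_batch(kinesis_event):
    """Lambda-style handler: consume a batch of Kinesis records, update
    per-player state, then derive the per-level 'users live now' counts.
    Tracking the set of users per level makes the count a true distinct count."""
    for record in kinesis_event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        user, level = payload["user_name"], payload["level_name"]
        previous = player_state.get(user)
        if previous is not None:
            level_state[previous].discard(user)  # player moved off the old level
        player_state[user] = level
        level_state[level].add(user)
    return {level: len(users) for level, users in level_state.items()}

def make_record(event):
    """Wrap an event dict the way Kinesis delivers it inside a Lambda event."""
    data = base64.b64encode(json.dumps(event).encode()).decode()
    return {"kinesis": {"data": data}}
```

In the real architecture the second Lambda would react to the DynamoDB stream of player-state changes rather than recomputing counts inline, but the state transition is the same.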
  24. Analytics on write example 2: Tiered support based on player LTV
     Triage the user based on expected LTV:
     1. Standard user: minimise support cost
     2. Silver user: personalised service
     3. Platinum user: concierge service
  25. Delivering tiered support using Kinesis, Lambda, DynamoDB and API Gateway
     • Example: computing customer lifetime value and serving it from a customer API
     • Pipeline: Kinesis event stream of { event_name, user_name, transaction_value, timestamp } → AWS Lambda (compute player lifetime value) → DynamoDB player state table + stream → API Gateway (serve player state) → triage player support tier
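The LTV fold and the triage lookup behind the API can be sketched as follows. A dict stands in for the DynamoDB player state table, and the tier thresholds are purely illustrative assumptions; field names follow the slide's sample payload.

```python
import base64
import json
from collections import defaultdict

lifetime_value = defaultdict(float)  # stand-in for the DynamoDB player state table

def handle_transactions(kinesis_event):
    """Lambda-style handler: fold transaction events into per-player LTV."""
    for record in kinesis_event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload["event_name"] == "transaction":
            lifetime_value[payload["user_name"]] += payload["transaction_value"]

def support_tier(user_name, silver_at=50.0, platinum_at=500.0):
    """Triage a player by accumulated LTV, as the API behind API Gateway
    would. The thresholds here are illustrative, not from the slides."""
    ltv = lifetime_value.get(user_name, 0.0)
    if ltv >= platinum_at:
        return "platinum"
    if ltv >= silver_at:
        return "silver"
    return "standard"

def make_record(event):
    """Wrap an event dict the way Kinesis delivers it inside a Lambda event."""
    data = base64.b64encode(json.dumps(event).encode()).decode()
    return {"kinesis": {"data": data}}
```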
  26. AWS provides a rich and growing toolkit for analytics on write
     Stream processing frameworks:
     • Spark Streaming on EMR
     • Kinesis Client Library
     Serverless event processing:
     • AWS Lambda
     • Kinesis Analytics
  27. Design considerations for game analytics
  28. 1. Keep your analytics stack independent from your game’s stack
     • Evolve game and analytics independently: helpful for larger teams; reduces fragility
     • Best of breed components for analytics and game: limited overlap between the best tools for game engines and the best for event analytics
     • Handle order-of-magnitude different scale requirements: game event volumes will dwarf active game data
  29. 2. Develop your analytics on read first, then migrate them to on write
     • Example: customer acquisition model to set bid prices for different user cohorts
     • Model developed, tested and trained on historical data in the data warehouse
     • Model then put live on real-time data, in-stream
  30. 3. Have a formal framework for managing change
     • Change is inevitable through the lifetime of the game:
       • The game evolves
       • Analysts and scientists ask new questions of the game
     • The analytics team must agree a framework to handle:
       • Updates to the in-game event and entity schemas (affects the developers)
       • Evolution of the event data modeling (affects the wider company)
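Snowplow's own framework for this is self-describing JSON: every event carries a schema URI versioned with SchemaVer (MODEL-REVISION-ADDITION), where a MODEL bump signals a breaking change. A minimal sketch of checking for breaking changes between schema versions (the `com.acme` vendor and event name are hypothetical):

```python
def parse_schema_uri(uri):
    """Split a self-describing schema URI of the form
    iglu:vendor/name/format/MODEL-REVISION-ADDITION into its parts."""
    vendor, name, fmt, version = uri.removeprefix("iglu:").split("/")
    model, revision, addition = (int(p) for p in version.split("-"))
    return vendor, name, fmt, (model, revision, addition)

def is_breaking_change(old_uri, new_uri):
    """Under SchemaVer, a MODEL bump means existing data and downstream
    data models cannot be assumed compatible; developers and analysts
    must handle the migration explicitly."""
    *_, old_version = parse_schema_uri(old_uri)
    *_, new_version = parse_schema_uri(new_uri)
    return new_version[0] != old_version[0]
```

A check like this can gate deployments: REVISION and ADDITION bumps flow through automatically, while MODEL bumps trigger the agreed change process.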
  31. A call to arms for games analysts
  32. Standardise on your event data pipeline
     • Why re-invent the wheel?
     • Deploy our tried and tested open-source stack, directly in your AWS account
     • Use your data engineers to build analyses specific to your game, not to re-build the pipe!
  33. Learn more
     • http://snowplowanalytics.com
     • https://github.com/snowplow/snowplow
  34. Thank you for attending #AmazonDevDay. Please take a moment to complete our survey for a chance to win the grand prize: bit.ly/DevDaySurvey. Q&A will be in a room on the third floor.