-
1.
DeNA West & BigQuery
Yoshiki Izawa
-
2.
Who is DeNA?
Japan
China
West
-
3.
Who is DeNA West?
• 1st Party: developed in-house
• 2nd-3rd Party: developed externally with/without our help and published by DeNA
• JP 1st Party: import hits made by DeNA in Japan
-
4.
What data do we care about?
Standard Game KPIs • Marketing Data • Custom Game Insights
-
5.
What’s our data like?
~60MB of raw logs per minute
~50GB of raw logs per day
-
6.
How were we dealing with it?
Log Table
Logs from over 100 titles and multiple studios
42TB of raw log data from May 2011 to Jan 2015
Accessed via HiveQL
-
7.
What were our data woes?
Data sources
ETL
Table Storage
Data access for ad hoc analysis
Visualization solutions
Visualization workflows
-
8.
Issue #1 - 3 Hour Data Delay
Production Collector
Amazon S3 (for backup)
ETL (Validation, Normalization)
PS Logs
GS Logs
GC/PC Logs
-
9.
Issue #2 - Too Many Cooks
Hadoop
Marketing
Finance
Game Analysts
User Retention
Clipart credit: iconka.com, webdesignhot.com
-
10.
Issue #3 - Slow Queries
-
11.
What was our solution?
Simplify common tech using AppEngine and BigQuery
-
12.
Solution #1 - BigQuery Instant Ingestion
Google API
Ad Data
PS/PC Logs
GC/PC Logs
PF Data
-
13.
Solution #1 - BigQuery Instant Ingestion (original)
Gateway module
PS/PC Logs
GC/PC Logs
Pull Task Queue
Game Project
Common
Project
App Engine / BigQuery
streaming insert
Game Projects
-
14.
Solution #1 - BigQuery Instant Ingestion (current)
Gateway module
PS/PC Logs
GC/PC Logs
Cloud Storage
Game Project
Common Project
App Engine / BigQuery
Game Projects
Log
Common Project
TEMP dataset
batch load every 3 min
create log-timestamp-based files by 3-min cron job
bq query every 3 min
-
15.
Solution #2 - BigQuery Scaling
Scaling + support = happier analysts!
Who ya gonna call?
-
16.
Solution #2 - BigQuery Permissions
-
17.
Solution #3 - BigQuery is fast
-
18.
Lessons Learned
• Quotas
• Different cost structure
• New(ish) Product - some features may not work as expected
-
19.
Is BigQuery for you?
• Cutting edge tech
• High control of your analysis
• Zero maintenance
-
20.
Let’s see some data!
-
21.
Let’s see some data!
Dropped entry price, increased conversion.
Reduced revenue at the start of the funnel but increased overall revenue.
-
22.
Let’s see some data!
Ruby Medal Balance over Time
Tutorial Completion Rate
TCP Latency Distribution
Don’t plan on saying anything here. This is just the slide to bring up while you’re getting on stage. But you can certainly duplicate this if you want.
Hi everyone! I work at DeNA, a Japanese company growing its Western presence in gaming. Last year, we had just under $2 billion worth of virtual currency spent in Japan and around $270 million across the rest of the world.
We develop internally, like Blood Brothers 2, or with licensed IP like Transformers, Star Wars, and Marvel. And we're bringing over Final Fantasy: Record Keeper, one of our biggest hits in Japan, which is making us over $10 million a month there.
We collect common logs across all our games to compare KPIs. We get marketing data from ad vendors on install sources and spend. And we implement custom logs in each game for design tuning.
<have time> On busy days, we get around 60 megabytes of raw player logs per minute and 50 gigabytes of raw logs per day. <struggled to process this volume>
Old solution: a Hadoop cluster accessed primarily via Hive. All of our player logs since May 2011, from over 100 titles across multiple studios, are stored in one big 42TB table.
Old infrastructure <pause> 15 seconds is definitely not enough time to describe it all. We had a pretty complicated set up before, and we would run into various bottlenecks and failure points that I’ll dive into next.
Our first big issue was a hefty delay between when a player would trigger logs to when analysts could query the data. The log collection and ETL process took a while, and analysts would need to wait 3 or more hours for new data - not fun when your games run live events.
Next, as DeNA West grew our portfolio, we also grew our data users - our increasing team of analysts would clog our systems and we had issues controlling permissions, especially with external developers.
And - this one drove me crazy - queries took so long to run that you'd forget what you were looking for, and exploratory analysis was clunky - just not as fun and intuitive as it should be.
<have time> So how did we decide to solve these issues? We simplified our games’ common tech in the West by using Google AppEngine as a platform server and Google BigQuery for analysis.
This addressed a lot of the problems we used to deal with. For starters, we experimented with Google’s Streaming API, Cloud Logging, and other set ups to get our logs almost instantly after they’re sent - not 3 hours later.
Gateway module receives ~35,000 records/min
Pull Task Queue handles 8K-9K records/min including retries
Hit URLFetch Quota on GAE
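The original pipeline buffered incoming records on the gateway before streaming them into BigQuery. Here is a minimal sketch of that kind of batching logic; the class name, limits, and the `flush_fn` callback (standing in for the real `tabledata.insertAll` streaming call) are all illustrative, not DeNA's actual code:

```python
import time

class LogBatcher:
    """Buffers log records and flushes them in batches, the way a gateway
    module might feed BigQuery's streaming insert API. `flush_fn` stands in
    for the actual streaming-insert request."""

    def __init__(self, flush_fn, max_batch=500, max_age_sec=1.0):
        self.flush_fn = flush_fn
        self.max_batch = max_batch        # rows-per-request cap (illustrative)
        self.max_age_sec = max_age_sec    # flush at least this often
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, record):
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch
                or time.monotonic() - self.last_flush >= self.max_age_sec):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)  # one streaming-insert request per batch
            self.buffer = []
        self.last_flush = time.monotonic()

# Seven records with a batch cap of 3 go out as batches of 3, 3, and 1.
batches = []
b = LogBatcher(batches.append, max_batch=3)
for i in range(7):
    b.add({"event": "login", "n": i})
b.flush()
```

Batching like this is what keeps the request count under the per-minute quotas mentioned above: fewer, larger inserts instead of one request per record.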
3 queries per minute per game
x 7 game projects
x 1440 minutes per day
= 30,240 queries per day
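The query-volume figure above is simple multiplication, worth sanity-checking against BigQuery's daily quotas:

```python
# Back-of-envelope check of the daily query volume from the slide.
queries_per_min_per_game = 3
game_projects = 7
minutes_per_day = 1440

queries_per_day = queries_per_min_per_game * game_projects * minutes_per_day
print(queries_per_day)  # 30240
```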
Not to mention pager duty for our Hadoop cluster will soon be a thing of the past. Now if we have too many users clogging the system, it’s Google’s problem, not ours. BigQuery is built to scale, so if we launch 20 more titles, we don’t need to worry about stress on our cluster like we used to.
And we can much more flexibly control permissions - internally as well as externally. For example, we can easily share 3rd party game data with the partner developer, an issue we struggled with before.
<if time, our old solution for this was a nightmare>
And queries are SO much faster. In this video I'm getting tutorial completion rate by country, platform, and device. As the rules go, I only have 15 seconds to crunch over 150 gigs of data - and it's done! In Hadoop, the same query over the same volume took 2 minutes.
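The heavy lifting for that query happens in BigQuery SQL, but the aggregation itself is simple: count tutorial starts and completions per group, then divide. A small Python sketch of the same logic, with illustrative event names and fields:

```python
from collections import defaultdict

def completion_rate(events):
    """events: (country, platform, step) tuples. Returns finished/started
    per (country, platform) group. Field and step names are illustrative."""
    started = defaultdict(int)
    finished = defaultdict(int)
    for country, platform, step in events:
        key = (country, platform)
        if step == "tutorial_start":
            started[key] += 1
        elif step == "tutorial_complete":
            finished[key] += 1
    return {k: finished[k] / started[k] for k in started if started[k]}

events = [
    ("US", "iOS", "tutorial_start"),
    ("US", "iOS", "tutorial_complete"),
    ("US", "iOS", "tutorial_start"),
    ("JP", "Android", "tutorial_start"),
    ("JP", "Android", "tutorial_complete"),
]
rates = completion_rate(events)  # US/iOS: 0.5, JP/Android: 1.0
```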
Not to say it’s been completely smooth sailing in our transition, mostly because we’ve had to look at things differently. We’ve learned to keep an eye on hitting quotas, as well as query and storage costs. And given it’s a newer product, we’ve had some problems where things don’t work as expected. It’s been crucial to be able to iterate quickly and work with Google when we’ve noticed issues.
<time, sum bq has helped us deal with some headaches> We’ve felt that it’s been a good solution given that we like doing our analysis in house, but don’t want to maintain a cluster ourselves.
Before I go, let me show you some data with Tableau. In Blood Brothers 2, for example, we saw an issue where players were losing easy missions early on, so we improved how we educate them, and by drilling into the data by day we could see our change helped.
We dropped the entry price for a live event's step-up gacha, which is one with unlocking steps that increase in price. Though this meant lower revenue early in the funnel, overall revenue (the green line) increased due to higher conversion to the end.
We always strive to drive actions with data - prioritizing engineering effort by monitoring game performance, optimizing tutorial and purchase funnels, tuning our games, and iterating on live events, which are our primary driver of monetization.