DeNA West & BigQuery

  1. DeNA West & BigQuery - Yoshiki Izawa
  2. Who is DeNA? Japan, China, West
  3. Who is DeNA West? • 1st Party: developed in house • 2nd-3rd Party: developed externally with/without our help and published by DeNA • JP 1st Party: import hits made by DeNA in Japan
  4. What data do we care about? Standard game KPIs, marketing data, custom game insight
  5. What’s our data like? ~60MB of raw logs per minute, ~50GB of raw logs per day
  6. How were we dealing with it? Log table: logs from over 100 titles and multiple studios, 42TB of raw log data from May 2011 to Jan 2015, accessed via HiveQL
  7. What were our data woes? Data sources, ETL, table storage, data access for ad hoc analysis, visualization solutions, visualization workflows
  8. Issue #1 - 3 Hour Data Delay: PS logs, GS logs, GC/PC logs → Production Collector (add rats, scrip) → Amazon S3 (for backup) → ETL (validation, normalization)
  9. Issue #2 - Too Many Cooks: Marketing, Finance, Game Analysts, and User Retention all hitting Hadoop. Clipart credit: iconka.com, webdesignhot.com
  10. Issue #3 - Slow Queries
  11. What was our solution? Simplify common tech using AppEngine and BigQuery
  12. Solution #1 - BigQuery Instant Ingestion: Google API, ad data, PS/PC logs, GC/GC logs, PF data
  13. Solution #1 - BigQuery Instant Ingestion (original): game projects send PS/PC and GC/GC logs to a Gateway module, into a Pull Task Queue in a common project on App Engine, then into BigQuery via streaming insert
  14. Solution #1 - BigQuery Instant Ingestion (current): game projects send PS/PC and GC/GC logs to a Gateway module, which writes them to Cloud Storage; a cron job in the common project creates log files bucketed by timestamp every 3 min, batch loads them into a TEMP dataset in BigQuery every 3 min, and a bq query runs every 3 min
  15. Solution #2 - BigQuery Scaling: scaling + support = happier analysts! Who ya gonna call?
  16. Solution #2 - BigQuery Permissions
  17. Solution #3 - BigQuery is fast
  18. Lessons Learned • Quotas • Different cost structure • New(ish) product - some features may not work as expected
  19. Is BigQuery for you? • Cutting edge tech • High control of your analysis • Zero maintenance
  20. Let’s see some data!
  21. Let’s see some data! Dropped entry price, increased conversion: reduced revenue at the start of the funnel but increased overall revenue
  22. Let’s see some data! Ruby Medal Balance over Time, Tutorial Completion Rate, TCP Latency Distribution

Notes

  • Don’t plan on saying anything here. This is just the slide to bring up while you’re getting on stage. But you can certainly duplicate this if you want.
  • Hi everyone! I work at DeNA, a Japanese company growing its western presence in gaming. Last year, we had just under $2 billion worth of virtual currency spent in Japan and around $270 million across the rest of the world.
  • We develop internally, like Blood Brothers 2, or with licensed IP like Transformers, Star Wars, and Marvel. And we’re bringing over Final Fantasy: Record Keeper, one of our biggest hits in Japan that’s making us over $10 million a month there.
  • We implement common logs across all our games to compare KPIs. We get marketing data from ad vendors on install sources and spend. And we implement custom logs in each game for design tuning.
  • <have time> On busy days, we get around 60 megabytes of raw player logs per minute and 50 gigabytes of raw logs per day. <struggled to process this volume>
  • Old solution: Hadoop cluster accessed primarily via Hive. All of our player logs since May 2011, from over 100 titles across multiple studios, are stored in one big 42TB table.
  • Old infrastructure <pause> 15 seconds is definitely not enough time to describe it all. We had a pretty complicated setup before, and we would run into various bottlenecks and failure points that I’ll dive into next.
  • Our first big issue was a hefty delay between when a player would trigger logs and when analysts could query the data. The log collection and ETL process took a while, and analysts would need to wait 3 or more hours for new data - not fun when your games run live events.
  • Next, as DeNA West grew its portfolio, we also grew our data users - our expanding team of analysts would clog our systems, and we had issues controlling permissions, especially with external developers.
  • And - this one drove me crazy - queries took so long to run you’d forget what you were looking for, and exploratory analysis was clunky - just not as fun and intuitive as it should be.
  • <have time> So how did we decide to solve these issues? We simplified our games’ common tech in the West by using Google AppEngine as a platform server and Google BigQuery for analysis.
  • This addressed a lot of the problems we used to deal with. For starters, we experimented with Google’s Streaming API, Cloud Logging, and other setups to get our logs almost instantly after they’re sent - not 3 hours later (a sketch of a streaming insert follows these notes).
    Gateway module receives ~35,000 records/min
    Pull Task Queue handles 8K~9K/min including retries
    Hit URLFetch quota on GAE
  • 3 queries per minute per game
    x 7 game projects
    x 1440 minutes per day
    = 30,240 queries per day (a sketch of the 3-minute batch load follows these notes)
  • Not to mention pager duty for our Hadoop cluster will soon be a thing of the past. Now if we have too many users clogging the system, it’s Google’s problem, not ours. BigQuery is built to scale, so if we launch 20 more titles, we don’t need to worry about stress on our cluster like we used to.
  • And we can much more flexibly control permissions - internally as well as externally. For example, we can easily share 3rd party game data with the partner developer, an issue we struggled with before (a sketch of dataset sharing follows these notes).
    <if time, our old solution for this was a nightmare>
  • And queries are SO much faster. In this video I’m getting tutorial completion rate by country, platform, and device. As the rules go, I only have 15 seconds to crunch over 150 gigs of data - and it’s done! In Hadoop, the same query over the same volume took 2 minutes (a sketch of such a query follows these notes).
  • Not to say it’s been completely smooth sailing in our transition, mostly because we’ve had to look at things differently. We’ve learned to keep an eye on hitting quotas, as well as query and storage costs. And given it’s a newer product, we’ve had some problems where things don’t work as expected. It’s been crucial to be able to iterate quickly and work with Google when we’ve noticed issues.
  • <if time, sum up: BQ has helped us deal with some headaches> We’ve felt that it’s been a good solution given that we like doing our analysis in house, but don’t want to maintain a cluster ourselves.
  • Before I go, let me show you some data with Tableau. In Blood Brothers 2, for example, we saw an issue where players lost easy missions early on, so we improved how we educate them, and by drilling into the data by day we could see our change helped.
  • We dropped the entry price for a live event’s step-up gacha, which is one that has unlocking steps increasing in price. Though this meant lower revenue early in the funnel, overall revenue (the green line) rose due to increased conversion to the end of the funnel.
  • We always strive to drive actions with data - prioritizing engineering effort by monitoring game performance, optimizing tutorial and purchase funnels, tuning our games, and iterating on live events, which are our primary driver for monetization.
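
For reference, here is a minimal sketch of the kind of BigQuery streaming insert the original pipeline used, assuming the google-cloud-bigquery Python client; the project, table, and field names are hypothetical, not DeNA’s actual schema.

```python
# Minimal streaming-insert sketch; project, table, and fields are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="common-project")  # hypothetical Common Project ID

table_id = "common-project.logs.player_events"  # hypothetical log table

rows = [
    {"event": "login", "player_id": "p123", "ts": "2015-03-01T12:00:00Z"},
    {"event": "purchase", "player_id": "p456", "ts": "2015-03-01T12:00:05Z"},
]

# Streamed rows become queryable within seconds, which is what removed
# the 3-hour delay from the old Hadoop ETL.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert errors:", errors)
```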
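
The current pipeline’s 3-minute batch load could look roughly like the following; again a sketch assuming the google-cloud-bigquery client, with a hypothetical bucket path and TEMP dataset name.

```python
# Minimal batch-load sketch for the 3-minute cron job; names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="common-project")  # hypothetical

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load the files the gateway wrote for the last 3-minute window into a
# staging table in the TEMP dataset; a follow-up bq query would then
# distribute rows to the per-game tables.
load_job = client.load_table_from_uri(
    "gs://common-project-logs/2015-03-01/1200/*.json",  # hypothetical path
    "common-project.temp.logs_staging",                 # hypothetical TEMP table
    job_config=job_config,
)
load_job.result()  # block until the load completes
```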
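
Sharing one game’s dataset with a partner studio might look like this sketch; the dataset name and the partner’s group address are made up for illustration.

```python
# Minimal dataset-sharing sketch; dataset and group address are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="game-project")  # hypothetical Game Project

dataset = client.get_dataset("game-project.partner_game_logs")  # hypothetical

# Grant the external developer read access to just this one dataset,
# rather than cluster-wide access as in the old Hadoop setup.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="analysts@partner-studio.example.com",  # hypothetical group
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```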
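
And an ad hoc query along the lines of the tutorial-completion demo could be run like this; the SQL, table, and column names are illustrative guesses, not the actual query from the talk.

```python
# Minimal ad hoc query sketch; table and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="game-project")  # hypothetical

sql = """
SELECT
  country,
  platform,
  device,
  COUNTIF(event = 'tutorial_complete') / COUNT(DISTINCT player_id) AS completion_rate
FROM `game-project.logs.player_events`
GROUP BY country, platform, device
"""

# BigQuery scans the full table (~150 GB in the demo) in seconds,
# versus roughly 2 minutes for the equivalent Hive query.
for row in client.query(sql).result():
    print(row.country, row.platform, row.device, row.completion_rate)
```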
  • ×