How to make lean Big Data with six tools from Google 
Nikolay Novozhilov (nik@bubbly.net) 
June 2014
2 
One slide about Bubbly 
Overview: Leading mobile social media & messaging service across Asia 
Offices: Singapore (HQ) + Mumbai, Manila, Jakarta, Tokyo, Hanoi & Bangkok 
Investors: Sequoia Capital, SingTel, JAFCO & Comcast 
Users: 40M+ users just over three years since launch of Bubbly 
Team: Silicon Valley veterans with top engineers from around the world 
Board / Advisors: 
• Tony Bates, CEO Skype / CSO Microsoft 
• Jeff Karras, MD SingTel Innov8 
• Dave Williams, former CTO O2, AT&T, and Telefonica 
• Jimmy Iovine, Chairman, Interscope Records (Judge on American Idol) 
• Gaurav Garg, Sequoia Capital US 
• Nikki Han, President, SM Entertainment (Korea) 
• Mohit Bhatnagar, Sequoia India 
3 
What do we want from Data Analytics? 
Make the Dashboard with key metrics 
Dive deep in user behavior and A/B testing 
Monitor availability and performance 
Produce reports for external users 
Etc… 
Everybody needs the same things
4 
What did we do? 
We tried many things to satisfy our needs 
and found the solution that is optimal for us: 
Fast to make and cheap 
Flexible and with a lot of functionality 
Able to deal with Big Data – we log 60 million events a day 
In this presentation we show how it’s done
5 
Why we didn’t use Mixpanel 
Not enough configurability 
Once you really care about your data – standard charts are not enough! 
Mixpanel export APIs don’t solve all issues 
What about extra features – not data mining: 
Use results inside your product 
Send monitoring alerts the way you want 
Give limited access to 3rd parties 
Costs a lot! 
People often sample data to Mixpanel. 
But what if you need full data dumped in one place? 
There are tons of other cloud solutions that might do 
some of these tricks, but I don’t trust “small projects”
6 
Why we didn’t use Hadoop 
It is too complicated 
Hadoop needs server infrastructure 
Even with a hosted Hadoop solution there is a lot to set up 
Batch processing – Hadoop is not reactive to your queries. It kills 
you when you do: 
Ad hoc and trial-and-error data analysis 
Mistakes in scripts 
…I mean – you do it every day! 
Hadoop doesn’t give you visualization, monitoring, etc… You still 
have to build it.
7 
Why we didn’t use MySQL 
We have too much data for MySQL 
Still need to host it, build all functionality, etc… 
Already enough reasons!
8 
What did we do instead? 
Google BigQuery: Store all possible events from users 
Google Spreadsheets: Query and transform data 
Google Charts: Interactive visualization 
Google Drive / Google Sites: Host the Dashboard 
Google Analytics: Look after Dashboard users
9 
Why BigQuery? 
Solution hosted by Google – ready to use today! 
Much cheaper than hosting our own applications on AWS. 
Established API – easy to add logging to your code. 
Web UI for queries 
Our trick to make it “schema-less”: 
For every upload, check the current schema in BigQuery 
Compare it with the schema of the current upload 
If the upload has extra fields, add them using the BigQuery API
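
The schema comparison above can be sketched in a few lines of Python. This is only an illustration, not the code used at Bubbly: the function names `missing_fields` and `merged_schema` and the list-of-dicts schema shape are assumptions, and the actual BigQuery API call that patches the table is deliberately left out.

```python
from typing import Dict, List

# Assumed schema shape: a list of {"name": ..., "type": ...} dicts.
Schema = List[Dict[str, str]]


def missing_fields(table_schema: Schema, upload_schema: Schema) -> Schema:
    """Return the fields that the upload has but the table does not."""
    known = {field["name"] for field in table_schema}
    return [field for field in upload_schema if field["name"] not in known]


def merged_schema(table_schema: Schema, upload_schema: Schema) -> Schema:
    """Existing fields first, then the new ones appended at the end."""
    return table_schema + missing_fields(table_schema, upload_schema)


# Hypothetical example data.
table = [{"name": "user_id", "type": "STRING"},
         {"name": "event", "type": "STRING"}]
upload = [{"name": "user_id", "type": "STRING"},
          {"name": "device", "type": "STRING"}]

extra = missing_fields(table, upload)
if extra:
    # Here the real uploader would send merged_schema(table, upload)
    # to BigQuery before loading the rows.
    print("fields to add:", [field["name"] for field in extra])
```

Appending new fields at the end keeps the existing columns untouched, so older rows simply have no value for the new fields.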
10 
Why Google Spreadsheets? 
Nothing is better for analytics than spreadsheets!!! 
But why not MS Excel? Several reasons: 
Easy to query data from BigQuery (Tutorial from Google) 
Cloud hosted solution with cron-like scheduler for scripts 
Cross platform solution (Excel VBA scripts fail on Mac) 
Security – you can give read-only rights to some users 
Already has email functionality for alerts and much more…
11 
How to use Google Spreadsheets? 
Example - link! 
The goal was to make it usable for SQL-only people (no coding) 
How it works 
Our Google Apps Script is triggered periodically 
It scans all sheets for the value “SQL” in A1 
If it finds “SQL”, then A2 contains the SQL query that is pushed to BigQuery 
Results are populated below on the same sheet
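
The scan loop described above is simple enough to model outside of Google Apps Script. The Python sketch below is an assumption-laden illustration: sheets are modelled as plain lists of rows, and `refresh_sheets`, `fake_run_query` and the sample data are hypothetical names, with the BigQuery call replaced by an injected function.

```python
from typing import Callable, Dict, List

Grid = List[List[str]]  # a sheet as rows of cells; grid[0][0] is cell A1


def refresh_sheets(sheets: Dict[str, Grid],
                   run_query: Callable[[str], Grid]) -> List[str]:
    """Run the query in A2 of every sheet whose A1 cell is "SQL"."""
    refreshed = []
    for name, grid in sheets.items():
        if len(grid) < 2 or not grid[0] or not grid[1]:
            continue  # no A1/A2 cells, nothing to do
        if grid[0][0] != "SQL":
            continue  # not a query sheet
        sql = grid[1][0]
        # Keep the marker and query rows, replace everything below them.
        grid[2:] = run_query(sql)
        refreshed.append(name)
    return refreshed


def fake_run_query(sql: str) -> Grid:
    """Stand-in for the BigQuery call; returns a header row and one data row."""
    return [["day", "events"], ["2014-06-01", "42"]]


sheets = {
    "Daily": [["SQL"], ["SELECT day, COUNT(*) FROM events GROUP BY day"], ["stale"]],
    "Notes": [["Just text"], ["nothing to run"]],
}
refreshed = refresh_sheets(sheets, fake_run_query)
```

Passing the query runner in as a function keeps the scanning logic testable without any access to BigQuery.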
12 
Why Google Charts? 
Big visualization library, free, done by Google 
Integrated with Google spreadsheets (Google Tutorial) 
Interactive controls – business people can explore data 
too! 
Example - link
13 
Why Google Sites / Google Drive? 
Easy to manage access to data for all users (including 3rd 
parties) 
Dropbox gives you only “full-access” 
Google Drive has many roles: “owner”, “can edit”, “read only” 
After using BigQuery, Spreadsheets and Charts from Google – 
why not use Google for everything else? 
Google Drive – hosts HTML files with Charts. It has a good desktop 
client, so it is easy to manage charts 
Google Sites has a WYSIWYG site builder
14 
Why Google Analytics? 
The Dashboard is a product itself. In our case it has about 30 
users. 
You need data from users to improve your product 
You need analytics tool for it! 
I use Google Analytics to watch how users visit my 
Dashboard on Google Sites 
… and punish the ones who are not using it ;)
15 
What about costs? 
In the whole solution only BigQuery costs money! 
We never paid more than $200 per month 
Real costs come from time/efforts to develop and 
support. Our solution is smart but lean: 
The whole project is done by one analyst/developer 
1 month from idea to first live version
16 
Best practice to optimize costs of BigQuery 
BigQuery performs full-table scans 
In most queries you care only about recent events 
If you store all data in one table, over time you scan a lot of data for nothing, 
resulting in: 
Higher costs 
Slower queries 
We rotate event tables monthly, creating tables inside one dataset (like 
events_2014Jan, events_2014Feb,…) 
Google Apps Script is ideal for monthly rotation 
For queries that require historical data we use meta-SQL that is parsed by 
the Google Spreadsheets script 
• “FROMDATASET dataset” – query all tables in dataset 
• “FROMLAST table” – query “table” and “table_2014Jan” (table from last 
month)
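
The two meta-SQL keywords above could be expanded with a small preprocessor. The Python sketch below is a guess at how such an expansion might work, not the script from the deck: `expand_meta_sql`, `last_month_suffix` and the `dataset_tables` argument are hypothetical, and the output assumes the bracketed, comma-separated table list of BigQuery legacy SQL, where a comma between tables means a union.

```python
import re
from datetime import date
from typing import Dict, List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def last_month_suffix(today: date) -> str:
    """Suffix of last month's rotated table, e.g. "2014Jan"."""
    if today.month == 1:
        year, month = today.year - 1, 12
    else:
        year, month = today.year, today.month - 1
    return f"{year}{MONTHS[month - 1]}"


def expand_meta_sql(sql: str, dataset_tables: Dict[str, List[str]],
                    today: date) -> str:
    """Rewrite FROMDATASET / FROMLAST into plain FROM clauses."""

    def from_dataset(match: "re.Match[str]") -> str:
        dataset = match.group(1)
        tables = dataset_tables[dataset]  # table names, listed elsewhere
        return "FROM " + ", ".join(f"[{dataset}.{t}]" for t in tables)

    def from_last(match: "re.Match[str]") -> str:
        table = match.group(1)
        return f"FROM [{table}], [{table}_{last_month_suffix(today)}]"

    sql = re.sub(r"\bFROMDATASET\s+(\w+)", from_dataset, sql)
    sql = re.sub(r"\bFROMLAST\s+([\w.]+)", from_last, sql)
    return sql


recent = expand_meta_sql("SELECT COUNT(*) FROMLAST logs.events",
                         {}, date(2014, 2, 15))
everything = expand_meta_sql("SELECT COUNT(*) FROMDATASET logs",
                             {"logs": ["events", "events_2014Jan"]},
                             date(2014, 2, 15))
```

Keeping the expansion in one function means analysts only ever write the short keywords, while the rotation naming scheme stays in a single place.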
17 
Example dashboard 
Check out this page for example dashboard with all 
working source code: 
https://sites.google.com/site/leanbigdatawith6tools/
