1. Analytics at Motorola,
a journey to enable
self-service using Google tools
Patrick Deglon, June 2015
2. Patrick Deglon Bio
After a PhD in Particle Physics and 10 years at the University of Geneva studying
the creation of the Universe, Patrick spent the next decade driving business
insights at eBay and Motorola Mobility.
At eBay, he led significant improvements in marketing effectiveness by developing
methods to measure incremental sales, and by running large-scale experiments on
Internet marketing channels.
At Motorola Mobility, he raised the bar in Analytics and on-boarded open Google
tools and technologies including Google Docs, BigQuery, App Engine and
Compute Engine.
In June 2015, he founded Deglon Consulting to help companies adopt the latest
technologies in Cloud computing as well as integrate sound analytical
methodologies to measure business impact and marketing incrementality.
He is married with two kids and recently moved to Sarasota, Florida.
3. Agenda
● Industry overview (mobile)
● Motorola Example
o Daily Activations Report (BQ + Spreadsheet)
o Big Feed (Big Query ETL)
o Moto Insights (BQ + gChart + gDrive + GA)
o Moto Monitor (BQ usage & cost optimization)
o Self-Serve Email (Next gen of self-service)
● Q & A
9. Mobile was a revolution,
but Mobile is an outdated concept.
The Digital World (Cloud, Internet,
World’s Information, Digital Personal
Assistant…) will be available
everywhere: phones, watches, glasses,
cars, appliances, microchip implants, ...
10. Evolution of mankind
1973: First hand-held portable telephone
1989: Web proposal
2009: First microchip implant
...
Homo Sapiens → Homo Technicus
13. Motorola Cloud Customers Ecosystem
Motorola Cloud serves:
● Consumers: Phones, Wearables & Companion Products
● Moto Maker
● Partners & Carriers
● Internal Business Teams: Web, Product, Sales, Business Operation, Customer Support, Marketing, Finance, Engineering
14. Motorola Cloud Applications & Services
● Infrastructure as a Service
● On-Device Applications & Services
● Web Applications & Software as a Service
● Platform as a Service
● Cloud Applications & Services
15. Google Cloud Platform (GCP) 101
GAE (Google App Engine): Web server with automatic scaling
GCE (Google Compute Engine): Virtual Linux/Windows server
GCS (Google Cloud Storage): “FTP”
BQ (Google BigQuery): Big Data warehouse, public version of Dremel, the query engine Google uses internally
GA (Google Analytics): Website, mobile and IoT tracking & analysis
16. Analytics Ecosystem
● Confluence’s Data Wiki
● OSQA’s FAQ (“Stackoverflow”)
● Data & Analytics Summits
● Solution Engineering
● Device Instrumentation
o Check it out: Android Settings → Motorola Privacy
o ⌧ Help Improve Motorola Products (On/Off)
o ⌧ Moto Care (On/Off)
● Motorola Big Data Environment
o Motorola Cloud (GAE/GCE)
o Big Data (BQ)
o Moto Insights (GAE)
o BigFeed (GAE)
18. Daily Activations Report
How to provide a global source of truth, available on any form factor, with an outreach mentality?
Existing Situation
- Numerous (conflicting) sources of truth
- Too many variations of the same data cube
- “Table in your face” approach
- No global business definition
- No curation of manually entered data points
- Report accessible on an internal portal only (through VPN)
- No mobile form factor
22. Features of Daily Report
● Get data (pivot) from BigQuery
● Spreadsheet magic
● Insights: WoW trends with statistical tests, key drivers of growth, key milestones, internal QA tests
● Email
● Embedded Chart
● Scheduler
Demo:
https://docs.google.com/spreadsheet/ccc?key=0AjgpL8JvOwsvdDJjV0s3NFphS3RnRzBXakNpZUR1ZGc#gid=21
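The slide mentions WoW trends with a statistical test but does not say which test the spreadsheet uses. As a minimal sketch, assuming weekly activation totals behave like Poisson counts, a two-sided z-test on the difference could look like this (the function name and the sample numbers are illustrative only):

```python
import math

def wow_change(this_week, last_week):
    """Week-over-week change with a two-sided z-test for Poisson counts."""
    change = (this_week - last_week) / last_week
    # Under the null hypothesis (same underlying rate), the difference of two
    # Poisson counts has a variance of roughly this_week + last_week.
    z = (this_week - last_week) / math.sqrt(this_week + last_week)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return change, z, p_value

# Illustrative weekly totals (not real Motorola data).
change, z, p_value = wow_change(10_500, 10_000)
print(f"WoW {change:+.1%}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the week-over-week move is larger than plain counting noise would explain; the threshold to use is a business choice.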
23. Description of the illustrative simulation
• Assume sales follow a diffusion S-shape, i.e.
ΔN = a (Nmax - N) + b N (Nmax - N)
where a (Nmax - N) is the Marketing term and b N (Nmax - N) is the Word of mouth term
• Add random noise to theoretical daily activations (Poisson)
• Simulated daily activations (sales) for United States, Canada,
Brazil, India, Russia, China, Germany and United Kingdom
with various launch dates per region
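The diffusion equation ΔN = a (Nmax - N) + b N (Nmax - N) can be iterated one day at a time to see the S-shape appear. The sketch below is only an illustration: the values of a, b and Nmax are assumptions chosen to give a visible peak, not parameters from the deck.

```python
def simulate_diffusion(n_max, a, b, days):
    """Iterate dN = a*(Nmax - N) + b*N*(Nmax - N) one day at a time.

    Returns (daily, cumulative): daily[t] is the number of new activations
    on day t, cumulative[t] is the installed base N at the end of day t.
    """
    n = 0.0
    daily, cumulative = [], []
    for _ in range(days):
        delta = a * (n_max - n) + b * n * (n_max - n)
        # Never activate more units than the remaining market.
        delta = min(delta, n_max - n)
        n += delta
        daily.append(delta)
        cumulative.append(n)
    return daily, cumulative

# Illustrative parameters only (assumed, not taken from the slides).
daily, cumulative = simulate_diffusion(n_max=1_000_000, a=0.001, b=5e-8, days=365)
peak_day = daily.index(max(daily))
print(f"peak on day {peak_day}, final base {cumulative[-1]:,.0f}")
```

The marketing term dominates early, when N is small; the word-of-mouth term takes over as the installed base grows, then both fade as N approaches Nmax.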
24. Step 1: Create a backbone table
motorola.com:sandbox:demo.backbone:
SELECT
  A.CAL_DT AS CAL_DT,
  B.Country AS Country,
  B.Launch_Date AS Launch_Date,
  B.Scale AS Scale
FROM
  (
  SELECT
    CAL_DT,
    1 AS Dummy
  FROM
    [motorola.com:sandbox:pdeglon.calendar]
  ) AS A
INNER JOIN
  (
  SELECT
    Country,
    CASE
      WHEN Country IN ('United States','Canada') THEN '2013-08-01'
      WHEN Country IN ('Brazil','Russia','India','China') THEN '2013-10-01'
      ELSE '2013-12-01'
    END AS Launch_Date,
    GDP_USD/1e7 AS Scale,
    1 AS Dummy
  FROM
    [motorola.com:sandbox:pdeglon.countries]
  WHERE
    Country IN ('United States','Canada','Brazil','Russia','India',
    'China','Germany','United Kingdom')
  ) AS B
ON A.Dummy=B.Dummy
WHERE
  A.CAL_DT>=B.Launch_Date
Note: joining on the constant Dummy key pairs every calendar day (A) with every country (B). Launch_Date and Scale are carried along because Step 2 reads them from the backbone.
BACKUP
25. Step 2: Calculate KPI value over time
motorola.com:sandbox:demo.baseline:
SELECT
  CAL_DT,
  Country,
  'Phone 123' AS Model,
  INTEGER(Scale*
    EXP(-POW(DATEDIFF(TIMESTAMP(CAL_DT),TIMESTAMP(Launch_Date))-150,2)/2/POW(75,2))
    /(75*SQRT(2*PI()))
  ) AS Daily_Activations
FROM
  [motorola.com:sandbox:demo.backbone]
Normal Distribution: bell curve centered 150 days after launch, with a standard deviation of 75 days
...
BACKUP
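The Step 2 expression is a Normal density (mean 150 days, standard deviation 75 days) multiplied by a per-country Scale. A minimal Python sketch of the same formula follows; the scale of 1,000,000 is a made-up value, and int() truncation is assumed to mirror what INTEGER() does in the query.

```python
import math

def baseline_activations(days_since_launch, scale, mean=150.0, sigma=75.0):
    """Gaussian bell of total area `scale`, truncated to an integer."""
    density = math.exp(-((days_since_launch - mean) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )
    return int(scale * density)

# Hypothetical scale; the query derives it from GDP_USD/1e7 instead.
curve = [baseline_activations(d, scale=1_000_000) for d in range(0, 301)]
print(curve[0], curve[150], curve[300])
```

Because the density integrates to 1, Scale is roughly the total number of activations over the life of the product.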
26. Step 3: Add Random Noise
motorola.com:sandbox:demo.simulation:
SELECT
  CAL_DT,
  Model,
  Country,
  INTEGER(
    Daily_Activations + SQRT(Daily_Activations) *
    SQRT(-2*LN(RAND()))*COS(2*PI()*RAND())
  ) AS Daily_Activations
FROM
  [motorola.com:sandbox:demo.baseline]
Notes:
- SQRT(-2*LN(RAND()))*COS(2*PI()*RAND()) is a Normal (Gaussian) random number (mu=0, sigma=1)
- Scaling it by SQRT(Daily_Activations) gives a (pseudo) Poisson distribution for N=Daily_Activations
BACKUP
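The noise term in Step 3 is the Box-Muller transform, scaled by the square root of the expected count so that the variance equals the mean, as it would for a Poisson variable. A small sketch with Python's standard library, using a hypothetical expected value of 5,000 activations per day:

```python
import math
import random

def standard_normal(rng):
    """Box-Muller transform: two uniform draws -> one N(0, 1) draw."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def noisy_activations(expected, rng):
    """Approximate a Poisson draw with mean `expected` by a Normal draw
    with mean `expected` and standard deviation sqrt(expected)."""
    return int(expected + math.sqrt(expected) * standard_normal(rng))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [noisy_activations(5000, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean={mean:.1f}, variance={variance:.1f}")
```

The Normal approximation is reasonable for large counts; for very small daily counts it can produce negative values, which a true Poisson draw never would.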
27. Step 4: Final Pivot for report
SELECT
CAL_DT,
SUM(Daily_Activations) AS Total,
SUM(CASE WHEN Country IN ('United States','Canada') THEN Daily_Activations ELSE 0 END) AS NA,
SUM(CASE WHEN Country IN ('Brazil','Russia','India','China') THEN Daily_Activations ELSE 0 END) AS BRIC,
SUM(CASE WHEN Country IN ('Germany','United Kingdom') THEN Daily_Activations ELSE 0 END) AS EU,
SUM(CASE WHEN Country='United States' THEN Daily_Activations ELSE 0 END) AS UnitedStates,
SUM(CASE WHEN Country='Canada' THEN Daily_Activations ELSE 0 END) AS Canada,
SUM(CASE WHEN Country='Brazil' THEN Daily_Activations ELSE 0 END) AS Brazil,
SUM(CASE WHEN Country='Russia' THEN Daily_Activations ELSE 0 END) AS Russia,
SUM(CASE WHEN Country='India' THEN Daily_Activations ELSE 0 END) AS India,
SUM(CASE WHEN Country='China' THEN Daily_Activations ELSE 0 END) AS China,
SUM(CASE WHEN Country='Germany' THEN Daily_Activations ELSE 0 END) AS Germany,
SUM(CASE WHEN Country='United Kingdom' THEN Daily_Activations ELSE 0 END) AS UnitedKingdom
FROM
[motorola.com:sandbox:demo.simulation]
WHERE
CAL_DT<CURRENT_DATE()
GROUP BY 1
ORDER BY 1 DESC
BACKUP
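The pivot in Step 4 turns one row per (date, country) into one row per date with a column per country and per region. The same reshaping can be sketched in Python; the input rows are invented for illustration, and the CURRENT_DATE() filter is left out.

```python
from collections import defaultdict

REGIONS = {
    "NA": {"United States", "Canada"},
    "BRIC": {"Brazil", "Russia", "India", "China"},
    "EU": {"Germany", "United Kingdom"},
}

def pivot(rows):
    """rows: iterable of (cal_dt, country, daily_activations) tuples."""
    report = defaultdict(lambda: defaultdict(int))
    for cal_dt, country, activations in rows:
        line = report[cal_dt]
        line["Total"] += activations
        line[country.replace(" ", "")] += activations  # e.g. UnitedStates
        for region, members in REGIONS.items():
            if country in members:
                line[region] += activations
    # Most recent date first, like ORDER BY 1 DESC.
    return [(d, dict(report[d])) for d in sorted(report, reverse=True)]

# Made-up sample rows, not real activations.
rows = [
    ("2013-12-01", "United States", 120),
    ("2013-12-01", "Canada", 30),
    ("2013-12-01", "Brazil", 50),
    ("2013-12-02", "Germany", 10),
]
table = pivot(rows)
```

Unlike the SQL version, columns with no data are simply absent here rather than set to 0; a caller can read them with `.get(column, 0)`.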
44. Agenda
● Industry overview (mobile)
● Demos
o Daily Activations Report (BQ + Spreadsheet)
o Big Feed (Big Query ETL)
o Moto Insights (BQ + Charts + gDrive + GA)
o Moto Monitor (BQ usage & cost optimization)
o Self-Serve Email (Next gen of self-service)
● Wish List
● Q & A
45. BigFeed: BigQuery to BigQuery ETL
Check-in Data (PB) → Staging Data (TB) → Reporting Data (GB)
47. Moto Insights: Democratizing Business Intelligence
Old Analytics Portal
1. Requires VPN
2. New reports take weeks to develop
3. New portal features take months
4. Tableau incompatibility with BigQuery
5. Reports are produced by a centralized team
6. Role management is getting out of control
Moto Insights (GAE+BQ+gChart)
1. Global access with App Engine
2. Programmatic approach (SQL + metadata)
3. Lightweight App Engine framework (Go/AngularJS) using Google APIs
4. Google Charts and native BQ SQL
5. Google Drive API
6. Google Groups
56. Enlightenment Questions for an Analyst
When was this BigQuery table last refreshed?
How often is it refreshed?
How was it created?
Which underlying data sources/tables is it using?
Who created this table?
Who knows how to use this table?
Where can I find this great query I ran?
Who knows how to use this tag?
How much bandwidth am I using in BigQuery?
How much space are my tables using?
How much does my usage of BigQuery cost?
www.holymolecartoon.com
57. How to track Big Query usage?
Google does not provide a data feed on Motorola’s usage of BigQuery.
However, three APIs can help us:
● bigquery.projects.list: list all (visible) projects
● bigquery.jobs.list: list all the jobs in a specified project. Note: use projection = full to get the email of the user
● bigquery.jobs.get: retrieve the specified job by ID
We created an App Engine application (Moto Monitor) that crawls the Google APIs to recursively
collect all queries run (since mid-2013, for a specific list of projects).
The queries are parsed to extract the underlying tables used, and the data is stored in the
App Engine datastore as well as in BigQuery through the streaming API (every 15
minutes).
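The deck does not show how Moto Monitor parses queries. As a rough sketch of the idea, assuming legacy BigQuery SQL where tables are written in square brackets, a regular expression can pull out the table references (the function name and the sample query are illustrative):

```python
import re

# Legacy BigQuery SQL writes tables as [project:dataset.table] or [dataset.table].
TABLE_REF = re.compile(r"\[([^\[\]\s]+)\]")

def extract_tables(query):
    """Return the distinct bracketed table references in a query, in order."""
    seen = []
    for name in TABLE_REF.findall(query):
        if name not in seen:
            seen.append(name)
    return seen

query = """
SELECT a.CAL_DT, b.Country
FROM [motorola.com:sandbox:pdeglon.calendar] AS a
INNER JOIN [motorola.com:sandbox:pdeglon.countries] AS b ON a.Dummy = b.Dummy
"""
print(extract_tables(query))
```

A regular expression like this would also match bracketed text inside string literals or comments, so a production parser would need to be more careful.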
58. Product Architecture
Moto Monitor App Engine (Web Service)
● default module: web pages, CSS, JS, etc.
● bqusage module: user requests
● worker module: CRON jobs
● datastore: queries/tables information
● Big Query
● Google APIs
59. Moto Monitor Browsing
Browse a table
https://moto-monitor.appspot.com/bq/info/{long table name}
e.g. https://moto-monitor.appspot.com/bq/info/motorola.com:analytics-data:activations.gcp_activations_shipments
Browse a job
https://moto-monitor.appspot.com/bq/jobinfo/{long job name}
e.g. https://moto-monitor.appspot.com/bq/jobinfo/bold-site-589:job_pz_J3anj2HjIz5AEX0_STXPtWb4
Browse a flow
https://moto-monitor.appspot.com/bq/flow/{long table name}
e.g. https://moto-monitor.appspot.com/bq/flow/motorola.com:analytics-data:devices.asn_r12
Browse your usage
https://moto-monitor.appspot.com/bq/about/me
60. How often is activations.gcp_foundation3 refreshed?
63. Which tables are impacted by motorola.com:analytics-data:devices.asn_r12?
64. Where can I find this great query I ran the other day?
https://moto-monitor.appspot.com/bq/about/me
https://moto-monitor.appspot.com/bq/about/me?before=2014-12-09
https://moto-monitor.appspot.com/bq/about/me?before=2014-12-09&limit=500
https://moto-monitor.appspot.com/bq/about/me?before=1418223600&limit=500
65. Moto Monitor is available in BigQuery too
Table: moto-monitor:usage.bq_raw
Id                    STRING   Unique Identifier for the job (ProjectId:JobId)
ProjectId             STRING   Project Id under which the job was run
JobId                 STRING   Job Id
CreationTime          INTEGER  Unix Time when the job was submitted
StartTime             INTEGER  Unix Time when the job started
EndTime               INTEGER  Unix Time when the job finished
GateTime              INTEGER  Gating time in ms between CreationTime and StartTime
RunTime               INTEGER  Running time in ms between StartTime and EndTime
TotalBytesProcessed   INTEGER  Total Bytes scanned for the job
CacheHit              BOOLEAN  Boolean flag to indicate if cache was used
User                  STRING   Email of user running the job
MD5                   STRING   MD5 of the full query
Query                 STRING   Query truncated to 18,000 characters
Status                STRING   Status of the job (=DONE here)
AllowLargeResults     BOOLEAN  If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.
Priority              STRING   Specifies a priority for the query. Possible values include INTERACTIVE and BATCH.
UseQueryCache         BOOLEAN  Boolean flag to indicate if cache was requested in the job
DestinationProjectId  STRING   Define the project where the results of the query will be written.
DestinationDatasetId  STRING   Define the dataset where the results of the query will be written.
DestinationTableId    STRING   Define the table where the results of the query will be written.
ErrorLocation         STRING   Specifies where the error occurred, if present.
ErrorMessage          STRING   A human-readable description of the error.
ErrorReason           STRING   A short error code that summarizes the error.
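Several columns of moto-monitor:usage.bq_raw are derived from the raw job record: Id, GateTime, RunTime, MD5 and the truncated Query. The sketch below shows one plausible way to build them. It assumes the three timestamps are in milliseconds, which the schema does not state, and the job id and query are placeholders.

```python
import hashlib

MAX_QUERY_CHARS = 18_000  # truncation limit from the schema

def to_usage_row(project_id, job_id, creation_ms, start_ms, end_ms, query):
    """Build the derived fields of a usage row from raw job attributes."""
    return {
        "Id": f"{project_id}:{job_id}",
        "ProjectId": project_id,
        "JobId": job_id,
        "CreationTime": creation_ms,
        "StartTime": start_ms,
        "EndTime": end_ms,
        "GateTime": start_ms - creation_ms,  # time spent waiting, in ms
        "RunTime": end_ms - start_ms,        # time spent running, in ms
        "MD5": hashlib.md5(query.encode("utf-8")).hexdigest(),
        "Query": query[:MAX_QUERY_CHARS],
    }

# Placeholder values for illustration only.
row = to_usage_row("bold-site-589", "job_example", 1_418_223_600_000,
                   1_418_223_600_250, 1_418_223_603_250, "SELECT 1")
print(row["Id"], row["GateTime"], row["RunTime"])
```

Hashing the full query before truncating it keeps the MD5 usable as a key for finding repeated queries even when they exceed 18,000 characters.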
66. Who was using the tag MOT_DEVICE_STATS_L1 in the last 7 days?
67. How much bandwidth am I using in BigQuery?
Use the view moto-monitor:usage.bq_view
68. Beyond Queries, we also scan Tables
● bigquery.projects.list: list projects visible
● bigquery.datasets.list: list datasets within a project
● bigquery.tables.list: list tables within a dataset
● bigquery.tables.get: get details about a table
Table details are stored in the datastore, together with the queries information and user email.
69. A snapshot of Table statistics is kept as well
moto-monitor:usage.daily_table (daily snapshot)
or moto-monitor:usage.snapshot (latest manual snapshot with self-destruction after 3 days)
CreationTime INTEGER Table Creation Time (Unix)
Description STRING Table Description
Etag STRING NULLABLE
ExpirationTime INTEGER Expiration Time (Unix)
FriendlyName STRING Friendly name
Id STRING Unique Id
LastModifiedTime INTEGER Last Modified Time (Unix)
NumBytes INTEGER Number of bytes
NumRows INTEGER Number of rows
Fields STRING Schema definition
ProjectId STRING Project Id
DatasetId STRING Dataset Id
TableId STRING Table Id
Type STRING TABLE or VIEW
View STRING View Query
User STRING Last user who populated
Query STRING Last query used to populate
JobId STRING Last Job Id to populate
RefreshedTime INTEGER Last time it was populated
SnapshotTime STRING Snapshot timestamp
70. How did the size of a dataset grow over time?
71. How much space are my tables using?
BigQuery storage cost =
$0.02 per GB per month,
i.e. $0.68 per TB per day,
i.e. $246 per TB per year
(with 1 TB = 1,024 GB and a 30-day month)
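The unit conversion can be checked with a few lines of arithmetic. Starting from $0.02 per GB per month, and assuming 1 TB = 1,024 GB and a 30-day month, storage comes to about $20.48 per TB per month, $0.68 per TB per day and $245.76 per TB per year. The helper function name is illustrative.

```python
PRICE_PER_GB_MONTH = 0.02  # USD, from the slide
GB_PER_TB = 1024           # assumption: binary terabytes
DAYS_PER_MONTH = 30        # assumption: a 30-day billing month

per_tb_month = PRICE_PER_GB_MONTH * GB_PER_TB
per_tb_day = per_tb_month / DAYS_PER_MONTH
per_tb_year = per_tb_month * 12

def storage_cost_per_month(num_bytes):
    """Monthly storage cost in USD for a table of `num_bytes` bytes."""
    return num_bytes / 1024 ** 3 * PRICE_PER_GB_MONTH

print(f"${per_tb_month:.2f}/TB/month, ${per_tb_day:.2f}/TB/day, ${per_tb_year:.2f}/TB/year")
```

Applied to the NumBytes column of a table snapshot, `storage_cost_per_month` gives a per-table monthly storage estimate.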
72. How much does my usage of BigQuery cost?
Storage Cost
$0.02 per GB per month
$0.68 per TB per day
$246 per TB per year
Query Cost
On-demand: $5 per TB
Reserved capacity: $20,000 per month for a 5 GB/s unit, i.e. $1.58 per TB*
* Note: for continuous usage of the 5 GB/s bandwidth
73. How much does my usage of BigQuery cost?
Assuming that the Motorola bandwidth is elastic, i.e. we always pay for the
optimal number of units (5 GB/s each), we can use $1.58 per TB as a proxy.
Caveat:
API Volume ~ Billing Volume
<> Real Volume Used
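The $1.58 per TB proxy follows from running one 5 GB/s unit flat out for a month. The sketch below reproduces that figure under the assumptions of a 30-day month and 1 TB = 1,024 GB, and adds a break-even utilization against the $5 per TB on-demand price. The variable and function names are illustrative.

```python
UNIT_PRICE_PER_MONTH = 20_000.0     # USD per 5 GB/s unit, from the slide
UNIT_BANDWIDTH_GB_S = 5
ON_DEMAND_PER_TB = 5.0              # USD, from the slide
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

# TB scanned per month if the unit is used continuously.
tb_per_month = UNIT_BANDWIDTH_GB_S * SECONDS_PER_MONTH / 1024
reserved_per_tb = UNIT_PRICE_PER_MONTH / tb_per_month

def query_cost(total_bytes_processed, price_per_tb=reserved_per_tb):
    """Cost proxy in USD for a job, from its TotalBytesProcessed."""
    return total_bytes_processed / 1024 ** 4 * price_per_tb

# Fraction of the unit's capacity above which reserving beats on-demand.
break_even_utilization = reserved_per_tb / ON_DEMAND_PER_TB
print(f"${reserved_per_tb:.2f} per TB, break-even at {break_even_utilization:.0%} utilization")
```

The proxy only holds when the unit is kept busy; at low utilization the effective price per TB is higher than on-demand.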
79. Examples
Data Issue (illustrative)
Chart: Number of Active Users using their camera in US (# Active Users vs. time in days)
Possible Root Causes
● Some files don’t get loaded properly in BigQuery, creating gaps in user count.
● The instrumentation changed on the device
● Customer behavior
Business Issue (real life)
Chart: Number of System Restarts in Brazil in Oct ‘14 (# System Restarts)
Real-life Root Cause
A buggy Android app (Color Notes) doesn’t handle the timezone change in Brazil
properly, crashing the devices.
80. Approach
1. Define multi-dimensional cubes with real data. For example: Day, Product, Market, # Users taking a picture
2. Each cell then becomes a time series
3. Clean the data (remove seasonality, weekday cycle and any other known perturbation)
4. Fit a trend and establish a volatility band (2 standard deviations)
5. Measure the variance versus prediction for each cell (e.g. market/product/metric) and trigger an exception if it falls outside the band
6. Collect all exceptions into a matrix (markets × products) and apply fuzzy logic* to propose potential root causes (prescriptive analytics)
* Note: Bayesian likelihood with knowledge base
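Steps 4 and 5 of the approach can be sketched for a single cell of the cube. The deck does not specify the trend model, so this example assumes a straight-line least-squares fit and skips the cleaning of step 3; the series values are invented.

```python
import statistics

def fit_line(ys):
    """Ordinary least squares of ys against x = 0..len(ys)-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def find_exceptions(history, actuals, n_sigma=2.0):
    """Fit a trend on `history`, then flag `actuals` outside the volatility band.

    Returns (index, actual, predicted) for each flagged point; indices
    continue from the end of `history`.
    """
    slope, intercept = fit_line(history)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(history)]
    band = n_sigma * statistics.pstdev(residuals)
    flagged = []
    for offset, actual in enumerate(actuals):
        x = len(history) + offset
        predicted = intercept + slope * x
        if abs(actual - predicted) > band:
            flagged.append((x, actual, predicted))
    return flagged

# Invented series: steady growth with small alternating noise, then a drop.
history = [100 + 2 * d + (3 if d % 2 else -3) for d in range(28)]
actuals = [157, 160, 95, 161]
flagged = find_exceptions(history, actuals)
print(flagged)
```

Running the same check over every market/product/metric cell yields the exception matrix that step 6 works from.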
82. Exceptions Report POC
Real-life example with Moto E in Spain
https://moto-monitor.appspot.com/fcst/trend?market=Spain&product=Moto%20E
Trend: where we should have been
Actuals: where we are
Story
After investigating with the Spain GTM team, we found that this large increase is
seasonal and due to “The Three Kings Day” (Día de los Reyes Magos), when sales are
usually larger than before the holidays.
83. Demo Exception Report
Daily Email on Exceptions/Anomalies → Online Report & Drilldown → Immediate Learning & Findings
Juno Storm impact on Daily Activations
Daily Activation WoW on Jan 27th 2015
85. Self-Service Email System
How to democratize daily/weekly email reports with an App Engine solution?
Existing Situation
- Numerous teams use Spreadsheet to send weekly/daily emails
- Enables very agile development of the email body
- Ease of connection to BigQuery
- Can’t easily enable customization and open-rate tracking at a user level
- Can’t leverage advanced statistics (R in GCE)
86. Self-Service Email System
Email Widget (AppEngine)
● HTML Header template
● HTML Body template
● HTML Footer template
● SQL to produce data for Body
● gDrive Objects (image, attachment)
● Underlying Widgets
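The widget structure above (header, body and footer templates plus a SQL query feeding the body) can be illustrated with Python's standard string templates. This is only a sketch of the idea, not the App Engine implementation: the template text, field names and rows are hypothetical, and in the real system the rows would come from the widget's SQL.

```python
from string import Template

def render_email(header, body_row, footer, rows):
    """Render header + one body fragment per data row + footer."""
    parts = [Template(header).substitute(count=len(rows))]
    parts += [Template(body_row).substitute(row) for row in rows]
    parts.append(footer)
    return "\n".join(parts)

# Hypothetical widget definition and data.
header = "<h1>Daily Activations ($count markets)</h1><table>"
body_row = "<tr><td>$Country</td><td>$Daily_Activations</td></tr>"
footer = "</table>"
rows = [
    {"Country": "Brazil", "Daily_Activations": 1200},
    {"Country": "India", "Daily_Activations": 950},
]
html = render_email(header, body_row, footer, rows)
print(html)
```

Values are inserted as-is, so a real widget would need to HTML-escape anything that comes from user-entered data.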
87. Q & A