Data Science is increasingly being used to build new products in every industry, from Internet companies to physical businesses, and from large enterprise systems to the consumer products we carry in our pockets. The ability to understand the Data Science process is an increasingly important skill for Software Product Managers. What are some of the unique challenges when building a Data Science product? How do we build products that scale when there is an element of experimentation and research? In this seminar, you will learn what it takes to manage a Data Science product, and hear practical tips and examples from our experience at Eureka Analytics. This seminar is brought to you by Eureka Analytics.
Eureka Analytics Seminar Series - Product Management for Data Science Products
1. Product Management for Data Science Products
Aloysius Lim, Director of AI Products, Eureka Analytics
aloysius@eureka.ai
Eureka Seminar Series on Building Data Science Products at Scale
2 Oct 2018
3. Eureka Seminar Series
Building Data Science Products at Scale
2 Oct Product Management for Data Science Products
15 Oct Technology Choices for Data Science Products
23 Oct Building a Targeted Advertising Product
12 Nov Alternative Data and Algorithms for a Risk Modelling Product
3 Dec Spatial Temporal Data and Algorithms for a Mobility Intelligence Product
Sign up and get updates on our Meetup page
Tentative schedule: 6.30 pm to 8.30 pm, WeWork @ 71 Robinson Road
5. What are Data Science products?
https://enterprise.foursquare.com/products/places
6. What are Data Science products?
https://www.creditsesame.com/blog/credit/credit-bureau-guide-what-the-differences-are-between-equifax-transunion-experian/
7. What are Data Science products?
https://www.spotify.com/us/discoverweekly/
9. What are Data Science products?
Data as Product
Data Science-Powered Products & Services
Data Science Software & Tools
https://www.datarobot.com/product/
http://www.moorinsightsstrategy.com/nvidia-gpu-cloud-its-not-what-you-may-think-it-is/
10. Every industry is using Data Science in its products and services
Adobe
Airbnb
AlephD
Alibaba
Amazon
Amazon Web Services
AMD
Ant Financial
ASOS.com
Astound
Baidu
Boeing
CareerBuilder
Channel 4 Television
Charlotte-Mecklenburg Police Department
City of Denver, CO
City of Syracuse, NY
CognitiveScale
Comcast
Comodo Security Solutions
comScore
Cray
Criteo
Didi Chuxing
Dstillery
Dynatrace
Facebook
Flipkart
Fox Chase Cancer Center
Galois
Google
Huawei
IBM
iFLYTEK
Intel
JD.com
KD Consulting
LineZone Data
LinkedIn
Microsoft
Mobike
NEC
NetEase
Netflix
NTT
Nvidia
Oath
Pinterest
Pittsburgh Bureau of Fire
risQ
Roku
S&P
SAS
SciSports
Secretaría de Hacienda Distrital
ShopRunner
Snap
Sutter Health
Symantec
Tableau
Tata Consultancy Services
Tencent
Textkernel
The Globe and Mail
The Lab at DC
Thomson Reuters
Three Bridges Capital
Translational MRI
Two Sigma Investments
Uber
United States Census Bureau
Vatican Secret Archives
Workday
Yahoo
Zhejiang Cainiao Supply Chain Management
Non-academic organizations with papers at KDD 2018, a top data science research conference
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
12. What does “scale” mean?
Organization Scale
Revenue Scale
Customer Scale
From Blitzscaling by Reid Hoffman & Chris Yeh
https://hbr.org/2016/04/blitzscaling
13. What makes it challenging to build Data Science products at scale?
A Product Manager’s perspective
Getting to a precise and correct articulation of the Data Science problem
→ How can the PM define the correct problem for the team to solve?
Designing a solution when feasibility and performance are not certain, and require further research
→ How can the PM plan the roadmap and product requirements if the solution is not known?
Lack of data, or lack of control over data collection and quality
→ How can the PM find creative solutions to work with data limitations?
Building a product robust to local differences (data, geopolitical, cultural, behavioral, etc.)
→ How can the PM identify these issues, and design a product that works everywhere?
Ensuring privacy of consumers
→ How can the PM design privacy into the product?
Preparing for (possibly unknown) future shifts in data distributions
→ How can the PM identify potentially catastrophic changes in the external environment, and respond in an agile manner?
14. There is no “best” way to do Product Management
But here’s one framework
“A startup is a human institution
designed to deliver a new product or
service under conditions of extreme
uncertainty.”
Extreme Uncertainty: “situations that
cannot be modeled, are not clear-cut,
and where the risk is not necessarily
large – it’s just not yet known.”
– Eric Ries
https://mm1.com/about-us/newsroom/publications/poster-lean-startup/
http://www.startuplessonslearned.com/2010/06/what-is-startup.html
15. How we do Agile
[Diagram: Scrum process with 2-week sprints. Each sprint ends with a retrospective and produces artefacts; backlog planning happens every 2 to 3 months; continuous integration runs throughout.]
16. What does Data Science look like in practice?
Example: Microsoft Team Data Science Process
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
17. Case Example
Eureka Customer Acquisition
Targeted marketing using mobile operator ("telco") data to generate leads for clients.
19. CEO: “Your mission, should you choose to accept it…”
Other business models:
• Insights / reports (e.g. Experian Mosaic)
• Data APIs (e.g. Foursquare)
• B2C products and services (e.g. Spotify)
• B2B products and services (e.g. Criteo)
• Machine Learning APIs (e.g. Azure / AWS / GCP)
• SaaS (e.g. DataRobot)
• Etc.
[Business Model Canvas for Eureka Customer Acquisition: partners are 10+ telcos in 10 countries; customers are global advertisers; channels are campaign channels and direct global contracts; key resources are the team, data infrastructure, and access to 1 billion mobile subscribers; key activities are development, deployment, scoring and campaigns; the value proposition is acquiring new customers; revenue streams are cost per lead and cost per acquired customer; costs are revenue share to telco, campaign costs, and dev & ops costs.]
https://strategyzer.com/canvas/business-model-canvas
21. Who are the stakeholders (not only customers), and what are their needs?
Scaling challenge: one product; many advertisers & telcos; many stakeholders; many contexts (language, geopolitical, cultural).
Advertisers: "Help me get high quality leads." "I can't give you data about existing customers."
Telcos: "Help me to monetize my data." "I have 200m subscribers." "Data cannot leave my firewall." "I can only give you 1 month of daily aggregated data." "My data comes in this format."
Investors: "Is your technology patentable?" "Is your technology defensible against competitors?"
Senior Execs (internal): "We need to demonstrate revenue in 2 months!" "We can only hire 10 people." "We need to deploy in 10 countries in the next 6 months."
Team (internal): "I want to publish my work."
Regulators: "You must comply with data protection laws!" "I must approve any data shared with partners."
Subscribers: "I only want to receive ads I'm interested in." "Is my privacy being protected?"
Icons from www.flaticon.com
23. What are the Data Science problems?
Business Problem: Find good leads for product X
Data Science Problem: Create customer segments for different product categories
Flow: Telco data → create customer segments (Segment A: soccer players; Segment B: mums with more than one child; Segment C: watch enthusiasts) → for product X, select the relevant segment → run campaign → collect responses / conversions
Challenges
How to create "good" segments?
• Rules: How do we know they work?
• Supervised: What training data to use?
• Clustering: How to interpret the clusters? (see the sketch after this list)
How to measure the quality of the segments?
How to scale to thousands of segments across telcos?
How to select the best segment for each campaign?
What if no matching segment can be found?
What if the "best" segment responds poorly to the campaign?
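For the clustering route, here is a minimal sketch of what segment creation could look like, assuming behavioral features have already been computed per subscriber; the feature layout, cluster count and choice of scikit-learn are illustrative assumptions, not our production pipeline.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-subscriber behavioral features, e.g.
# [calls_per_day, data_mb_per_day, distinct_towers_visited, night_usage_ratio]
rng = np.random.default_rng(42)
X = rng.random((10_000, 4))

# Standardize so no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Cluster subscribers into candidate segments; choosing the number of
# clusters and interpreting them is exactly the hard part noted above
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_scaled)
segments = kmeans.labels_

# Inspect cluster centers (in standardized units) to label segments by hand
print(kmeans.cluster_centers_[:3])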
24. How else can we frame the problem?
Business Problem: Find good leads for product X
Data Science Problem: Find behavioral lookalikes of existing customers, iteratively
Flow: Telco data → behavioral data features (mathematically defined) → create Seed group (existing customers, heuristic rules, or random) → run lookalike model → run campaign on Target group → collect responses / conversions → update & tune Seed group, and repeat
What features are useful and easy to compute?
How to select a good seed group?
How to find lookalikes? (a sketch follows)
How to update / tune the seed group?
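One common way to implement the lookalike step is as a propensity model: treat the seed group as positives, train a classifier, score every subscriber, and take the top of the ranking as the target group. A minimal sketch under that assumption (an illustration, not necessarily our exact algorithm):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.random((100_000, 30))  # behavioral features per subscriber
seed = np.zeros(100_000, dtype=int)
seed[rng.choice(100_000, size=1_000, replace=False)] = 1  # seed = existing customers

# Treat seed membership as the positive label (positive-unlabeled caveats
# apply: non-seed subscribers are not confirmed negatives)
model = LogisticRegression(max_iter=1000).fit(features, seed)
scores = model.predict_proba(features)[:, 1]

# Target group = the most seed-like subscribers outside the seed group
candidates = np.flatnonzero(seed == 0)
target = candidates[np.argsort(-scores[candidates])[:1_000]]

Campaign responders can then be folded back into the seed group, which is the "update & tune" loop in the flow above.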
25. Framing the problem differently could lead to a very different product!
Business Problem: Find good leads for product X
Framing 1, a "customer segmentation engine": create customer segments for different product categories.
Framing 2, a "behavioral lookalike engine": find behavioral lookalikes of existing customers, iteratively.
[Side-by-side comparison of the two flows from the previous two pages.]
26. It is critical to define the Data Science problem well
Add one more step to the Data Science process: Data Science Problem Definition.
29. Most critical risks to test
Does it work?
Can telco data predict campaign response?
Is lookalike an effective method for campaign targeting?
Does it work in multiple settings?
Different telcos, data sources, cultures
Can it scale?
1m, 10m, 100m subscribers
30. Build – Measure – Learn helps to manage extreme uncertainty
Applying Agile in Data Science product development
Literature Review (Artefact: Research report)
Analysis & Experimentation (Artefact: Vignette or report)
MVP Development (Artefacts: Code, tests & docs)
We cannot solve all problems at once!
1. Break the problem down
2. Identify the most critical hypotheses
3. Plan a research, analysis or experimental task to test the hypothesis
4. Time-bound the problem
31. Build – Measure – Learn
Conduct literature review
Hypothesis: Mathematically defined behavioral features from telco data are useful for identifying campaign targets.
Artefact: Research Report
32. Build – Measure – Learn
Conduct literature review
Hypothesis: A scalable and effective algorithm for lookalike modeling exists that can be used for campaign scoring.
Artefact: Research Report
33. Build – Measure – Learn
Perform analysis
Hypothesis: Feature X (e.g. Radius of Gyration) can reveal meaningful patterns in human behavior.
Artefact: Analysis Vignette
Eureka Feature Engine Run on CDR Data
Analyst: Ying Li, 2018-08-06

Problem Statement
Upon completion of the Eureka Feature Engine MVP run in the telco's Zeppelin environment, we want to analyze the results to:
1. Inspect and verify the engine output for feasibility and correctness
2. Explore whether any insights can be gained for the telco or Eureka
3. Demonstrate an analysis artifact through this Vignette

Methodology Description
• Run Scala/SQL/R EDA to understand the data and the distributions of features produced by the Eureka Feature Engine
• Run cumulative distributions
• Run correlations on different features
• Spot any worthy data nuggets
• Spot any suspicious data issues

Radius of Gyration and Total Distance
• 50% of MSISDNs move within a radius of 24 km but traveled up to 630 km of total distance within a month; this includes the 11% of MSISDNs that did not move
• 75% of MSISDNs move within a radius of 97 km and made up to 3,264 km total distance
• 90% move within a radius of 280 km (total distance of 9,431 km), and 95% move within a radius of 440 km (total distance of 15,294 km)
• There are MSISDNs whose radius of gyration is about the whole north-south length of the country, 1,598 km
• Radius of gyration is computed as $r_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert \vec{r}_i - \vec{r}_{\mathrm{cm}}\rVert^2}$, where $\vec{r}_1,\dots,\vec{r}_N$ are the recorded positions and $\vec{r}_{\mathrm{cm}} = \frac{1}{N}\sum_{i=1}^{N}\vec{r}_i$ is their centroid

[Charts: Cumulative Distribution by Radius of Gyration; Cumulative Distribution by Total Distance]

Correlations
• Radius of Gyration and Total Distance are moderately correlated
• Total Distance and Distinct Count are strongly correlated
• Radius of Gyration is more correlated with Distinct Count than with Total Count

                     Total Distance   Total Count   Distinct Count
Radius of Gyration   0.456928567      0.332110891   0.478864808
Total Distance                        0.727597715   0.745295476
Total Count                                         0.772572346

Summary
• The Feature Engine covered the entire user base, and the computations look directionally correct
• Meaningful analysis can be conducted regarding user behavior
• Some data issues were identified, to be debugged further

Calls to Action
1. Seek feedback from the commercial team
2. Build clustering models to investigate the feasibility of unsupervised learning
3. Debug the data issues
4. Continue development with a richer feature set
5. Compare "truth about the data" (e.g., median radius of gyration) against "truth about the world" (e.g., commute distance for people in this city)

Caveat
• Distance computation was done as point-to-point distance between consecutive cell tower locations on the trajectory
• Using any insights from this analysis in its current state is not advised, as the computation engine is an MVP for demonstrating feasibility and hence not fully tested for accuracy
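To make the definitions concrete, here is a minimal sketch of both computations over one subscriber's tower locations, assuming projected x/y coordinates in km; the data layout is an assumption for illustration.

import numpy as np

def radius_of_gyration(points_km):
    """RMS distance of visited points from their centroid, per the formula above."""
    pts = np.asarray(points_km, dtype=float)
    centroid = pts.mean(axis=0)
    return float(np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean()))

def total_distance(points_km):
    """Point-to-point distance between consecutive tower locations (see Caveat)."""
    pts = np.asarray(points_km, dtype=float)
    return float(np.sqrt((np.diff(pts, axis=0) ** 2).sum(axis=1)).sum())

# One subscriber observed at three tower locations over a month
trajectory = [[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]]
print(radius_of_gyration(trajectory), total_distance(trajectory))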
34. Build – Measure – Learn
Build and test MVP
Hypothesis: The product works in the real world!
Artefacts: Code, tests, docs; offline experimental results; online (live deployment) results
Flow: Behavioral data features → run lookalike model → Lookalike group vs Random group → run campaign → collect responses / conversions
Offline evaluation: Lift = 4.8%
Online evaluation: Lift = 5.2%
35. Build – Measure – Learn
Build and test MVP
Hypothesis: The product works in multiple telcos and countries, with different data, environments and behavior
Artefacts: Deployed MVP; offline experimental results; online (live deployment) results
36. Build – Measure – Learn
[Recap: the three Build – Measure – Learn stages (Literature Review → research report; Analysis & Experimentation → vignette or report; MVP Development → code, tests & docs), alongside the lookalike evaluation flow and lift results shown above.]
37. Build – Measure – Learn in real settings as soon as possible
“Best model” is not always best!
“If you followed the Prize competition, you might be wondering
what happened with the final Grand Prize ensemble that won the
$1M two years later. This is a truly impressive compilation and
culmination of years of work, blending hundreds of predictive
models to finally cross the finish line. We evaluated some of the
new methods offline but the additional accuracy gains that we
measured did not seem to justify the engineering effort needed
to bring them into a production environment. Also, our focus on
improving Netflix personalization had shifted to the next level by
then.”
https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429
40. What does this statement mean?
“Our model can achieve 70% accuracy.”
Is this good? How can we tell?
41. It depends on the application
“Our model can achieve 70% accuracy.”
If it is a disease detection rate: not good, possibly catastrophic.
If it is an ad click-through rate: astounding! Too good to be true.
42. It depends on what one means by “accuracy”
“Our model can achieve 70% accuracy.”
Definition of accuracy: the % of correct predictions across all classes.
43. It depends on what one means by “accuracy”
Sometimes, people mean Precision (True Positives / predicted Positives): "70% of people predicted to like soccer actually like soccer."
Sometimes, people mean Recall (True Positives / actual Positives): "Our model can detect 70% of soccer lovers."
Sometimes, they mean something else altogether!
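A toy worked example of how the three readings diverge on the same ten predictions (the numbers are illustrative, chosen so that accuracy comes out at 70%):

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Ten subscribers; 1 = actually likes soccer
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# The model predicts four soccer lovers, two of them correctly
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))   # 0.70 -> "70% accuracy"
print(precision_score(y_true, y_pred))  # 0.50 -> half the predicted fans are real
print(recall_score(y_true, y_pred))     # 0.67 -> two of the three fans were found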
44. What to measure?
OFFLINE
Model performance: accuracy, precision, recall, AUROC, Gini, lift, etc.
Privacy: k-anonymity, differential privacy (see the sketch below)
Compute performance: compute time, memory / CPU utilization, storage
ONLINE
Model performance, privacy and compute performance (as above)
Response behavior: click-through rate, API utilization
Business outcomes: revenue, risk reduction, cost savings
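As one hedged illustration from the privacy row, a sketch of a k-anonymity check: every released combination of quasi-identifiers should be shared by at least k records. The pandas layout and column names here are assumptions.

import pandas as pd

def satisfies_k_anonymity(df, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears >= k times."""
    return bool(df.groupby(quasi_identifiers).size().min() >= k)

# Hypothetical aggregated release with two quasi-identifiers
released = pd.DataFrame({
    "age_band": ["20-29", "20-29", "30-39", "30-39", "30-39"],
    "home_city": ["A", "A", "B", "B", "B"],
})
print(satisfies_k_anonymity(released, ["age_band", "home_city"], k=2))  # True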
45. You build what you measure
If the wrong measurement is used, the wrong thing will be built in the next iteration.
46. What is the right performance metric for campaigns?
One possibility: Area Under the ROC Curve (AUROC)
• Captures overall classification performance
• 1 is best, 0.5 is random, 0 is worst
• Easy to compare different models
[ROC chart: True Positive Rate vs False Positive Rate; the highlighted model has the bigger AUROC.]
47. What is the right performance metric for campaigns?
But when we run a campaign, we really only care about the True Positive Rate for the users with the highest scores. So a better measure would be Lift@n%, e.g. Lift@5% if we want to target the top 5% of targets.
[ROC chart: True Positive Rate vs False Positive Rate; the highlighted model has the higher lift@5%.]
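A small sketch contrasting the two metrics on the same synthetic scores; roc_auc_score is scikit-learn's, while lift_at_n is a hypothetical helper defined here just for illustration.

import numpy as np
from sklearn.metrics import roc_auc_score

def lift_at_n(y_true, scores, pct):
    """Response rate in the top pct of scores, divided by the overall base rate."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    n = max(1, int(len(scores) * pct))
    top = np.argsort(-scores)[:n]
    return y_true[top].mean() / y_true.mean()

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.02, size=100_000)  # 2% base response rate
scores = 0.9 * rng.random(100_000) + y * rng.random(100_000)  # noisy but informative

print(roc_auc_score(y, scores))    # overall ranking quality
print(lift_at_n(y, scores, 0.05))  # what the top-5% campaign slice actually sees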
48. The real test is when we measure performance online
Offline (modeling phase): Estimated Lift@1%: 10.2X
Target size: 100,000

Group             Size      CTR     Actual Lift
Lookalike         40,000    3.82%   2.0X
Random            30,000    1.95%   (baseline)
Heuristic rules   30,000    1.58%   0.8X

Actual results from one of our campaigns using lookalike modeling.
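These actual lift figures are consistent with each group's CTR divided by the random group's CTR: 3.82% / 1.95% ≈ 2.0X for lookalike and 1.58% / 1.95% ≈ 0.8X for the heuristic rules, far below the 10.2X estimated offline.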
50. Superpowers for a Data Science Product Manager
Data Science: intuition for data, statistics, machine learning, data quality
Computer Science: algorithms, complexity (Big O), distributed computing
Mode of Working: hands on, "feel" the data; detail orientation; logical, critical thinking; not afraid to be critical about one's own assumptions
Product Management
51. Our team structure: Scrum teams with all the skills required to research, develop and ship a User Story
A cross-functional team allows diverse perspectives to be considered from the get-go:
• Algo performance vs engineering cost
• Tech choices
• Pipeline design
• Etc.
Team members are empowered and required to work across the spectrum of tasks from data science to engineering: "Know all the basics, but be expert in a few areas." Data Scientists write unit tests; Engineers read research papers. No "you vs me".
Recipe for a Data Science product team (adjust as needed): 1 PM, 2 Data Scientists, 3 Engineers.
52. Key takeaways
Data Science is being used in more products and services, in all industries
Build – Measure – Learn to work with extreme uncertainty, unknown unknowns
Get the right framing of the Data Science problem, and the right measurement of results
Test with customers