3. calculation | consulting data science leadership
Who Are We?
c|c
(TM)
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry
Over 10 years experience in applied Machine Learning
Developed ML algos for Demand Media; the first $1B IPO since Google
Lean Start Ups: Aardvark (acquired by Google), eHow, Mode
Wall Street: BlackRock, GLG
Fortune 500: Big Pharma, Telecom, eBay
www.calculationconsulting.com
charles@calculationconsulting.com
4. BackStory: in 2011, Search Changed. Forever.
• first $1B IPO since Google
• Machine Learning based SEO algorithms
• Measure the demand for search, and fulfill it
data science algorithms created a billion $ company
Demand Media
eHow.com
5. BackStory: in 2011, Search Changed. Forever.
• Google adapted (Panda)
• Lack of diversification
• Lack of adaptation
• Stock price never recovered
algorithmic accountability: DMD or Google?
[Figure: DMD stock price, 2011-2012, annotated with the IPO and the Panda update]
6. • first $1B collapse due to Panda?
• CPC revenues down
• premium online publishers died
[Figure: stock price, 2011-2012, with the collapse marked]
$1B in ad revenue was repriced and reallocated
Problem: Cornering the market on
search induced a market crash
8. Data Science is Different
Davenport
Generating sustainable revenue requires
Data Science Leadership and Execution
“Companies need a Spock in the boardroom”
9. Data Science is Different
Davenport
Generating sustainable revenue requires
Data Science Leadership and Execution
http://www.theonion.com/articles/national-science-foundation-science-hard,1405/
10. Problem: Data Scientists are Different
Davenport
not all techies are the same
11. Problem: Data Scientists are Different
Davenport
theoretical physics
machine learning specialist
experimental physics
data scientist
engineer
software, browser tech, dev ops, …
not all techies are the same
12. Problem: Data Scientists are Different
Davenport
not all techies are the same
13. Managing: Data Science Process
• Acquire Domain Knowledge
• Formulate Hypothesis
• Generate Model(s) from the Data
• Predict Revenue Gains
• Backtest Predictions on your Data
• A/B Test in Production
• Attribute Gains to Model(s)
acting
solving
framing
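The last steps of the process (A/B test in production, then attribute gains to the model) can be sketched roughly as follows. This is a minimal illustration, not the method used at any company named in these slides: it assumes a simple two-proportion z-test on conversion counts, and the function name `ab_test`, the traffic figure, and the value per conversion are all hypothetical.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: control (a) vs. model-driven variant (b).

    Returns both conversion rates, the z statistic, and a two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical experiment: 10,000 visitors per arm
p_a, p_b, z, p_value = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)

# Attribute the gain to the model only if the lift is unlikely to be noise.
monthly_visitors = 1_000_000   # assumption for illustration
value_per_conversion = 5.0     # assumption for illustration, in dollars
significant = p_value < 0.05
attributed_gain = (
    (p_b - p_a) * monthly_visitors * value_per_conversion if significant else 0.0
)
print(f"lift={p_b - p_a:.4f} z={z:.2f} p={p_value:.4f} gain=${attributed_gain:,.0f}")
```

The point of the sketch is the ordering: the revenue number is computed only after the experiment shows a lift that is unlikely to be chance, which is what makes the gain attributable to the model.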
14. Managing: Data Science Process
15. Managing: Learning from Data
• Systems Thinking: leveraging the inter-relationships between data, marketing, and the customer
• Knowledge Transfer: mentoring, not training, to develop both personal mastery and team learning
• Mental Models: create a base of small-scale models for thinking about how to use your data
• Knowledge Sharing: foster collaboration between research, engineering, and product to drive revenue
16. Data Science is Different
• Cross-functional: engineering, product, marketing, finance
• Autonomous: separate from the traditional engineering product lifecycle; self-organizing and self-managing
• Experimental: form hypothesis, analyze data, make predictions, run backtests, A/B testing
• Self-sustaining: not a cost center; generates revenue
17. Solution: Collecting and Organizing Data
• Most companies are struggling to organize their data
• Data needs to be examined
• Don’t assume data is correct or useful
• More is More: simple algos work
• More is Less: noise is noise
Data not examined is not collected
18. Solutions: Hadoop and Big Data
• Hadoop is an internal data ecosystem
• Hadoop appears to have won the adoption wars?
• Hadoop: 90% of deployments are internal
• Hadoop is a cost center
• ROI needs cut across business divisions
Algorithms, not data, generate revenue
19. Solutions: Cloud
• Startups don’t need infrastructure
• long term Data Storage is virtually free
• Amazon Redshift
• Google BigQuery
• Cloud is the future?
Algorithms, not data, generate revenue
20. Solutions: Spark
• Next Gen Platform for Machine Learning
• Sits on Hadoop or the Cloud
• Still very high touch
• Limited algos
Algorithms, not data, generate revenue
21. Problem: Measurements
good experiments are amazing
“If you can’t measure it, you can’t fix it.”
DJ Patil, White House Chief Data Scientist
22. Data Science’s Measurement Problem
good experiments are hard to design
http://www.forbes.com/sites/lizryan/2014/02/10/if-you-cant-measure-it-you-cant-manage-it-is-bs/
23. Data Science’s Measurement Problem
good experiments are hard to design
“Data science has a measurement problem.
Simple metrics may not address complex situations.
But complex metrics present myriad problems.”
“As we strive for better algorithms,
we often fail to think critically about what it means
for predictions to be ‘good’”
http://www.kdnuggets.com/2015/03/data-science-measurement-problem-accuracy-auroc-f1.html
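To make the point about simple metrics concrete, here is a small sketch comparing accuracy with F1 on an imbalanced problem. The data is made up for illustration (5 positives in 100 cases), and the helper names `confusion`, `accuracy`, and `f1` are hypothetical, not taken from the cited article.

```python
def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up imbalanced data: 5 positive cases, then 95 negative cases
y_true = [1] * 5 + [0] * 95

# Model A never predicts the positive class
always_negative = [0] * 100

# Model B finds 4 of the 5 positives, at the cost of 6 false alarms
finds_positives = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, y_pred in [("always_negative", always_negative),
                     ("finds_positives", finds_positives)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"f1={f1(y_true, y_pred):.2f}")
```

Under these assumptions the useless model wins on accuracy while the useful one wins on F1, so which model counts as "good" depends on the metric chosen and on what the business needs.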
24. Data Science’s Measurement Problem
good experiments are hard to design
“Buffett found it 'extraordinary' that academics studied such things.
They studied what was measurable, rather than what was meaningful.”
“… to a man with a hammer,
everything looks like a nail.”
Roger Lowenstein, Buffett:
The Making of an American Capitalist
25. Problem: The Cult of the Algorithm
what can algos actually do?
“We have a new machine learning algo that anticipate
your needs over time and behave accordingly”
26. Problem: What can Machine Learning Do?
what can algos actually do?
27. Demand Algos: Gas Station Analogy
Problem: where to open a gas station?
Need: good traffic, weak competition
[Diagram: fewer competitors but no traffic | sweet spot | great traffic but too many competitors]
all businesses balance supply and demand
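The gas station analogy can be sketched as a toy scoring problem. The scoring rule below (traffic divided by one plus the number of competitors) is an assumption chosen for illustration, not a formula from the slides, and the locations and numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class Location:
    name: str
    daily_traffic: int   # proxy for demand
    competitors: int     # proxy for existing supply

def demand_score(loc: Location) -> float:
    """Toy score: demand shared among existing stations plus the new one.

    Assumed rule for illustration: traffic / (1 + competitors).
    """
    return loc.daily_traffic / (1 + loc.competitors)

# Invented candidate sites covering the three cases in the analogy
candidates = [
    Location("quiet_road", daily_traffic=100, competitors=0),      # no traffic
    Location("highway_exit", daily_traffic=5000, competitors=9),   # crowded
    Location("suburb_junction", daily_traffic=3000, competitors=1),
]

ranked = sorted(candidates, key=demand_score, reverse=True)
best = ranked[0]
for loc in ranked:
    print(f"{loc.name}: score={demand_score(loc):.0f}")
```

With these made-up numbers the site with moderate traffic and one competitor scores highest, which is the "sweet spot" between no demand and too much supply.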
28. SAAS Machine Learning Algos
machine learning contests
• Diabetic Retinopathy Detection: $100,000, 167 teams
• March Machine Learning Mania 2015: $15,000, 341 teams
32. Problem: Externalities
“Zynga is our best company ever!” (2010)
John Doerr, Google Investor, Legendary VC
http://venturebeat.com/2010/11/16/google-investor-john-doerr-zynga-is-our-best-company-ever/
one marketplace | big risks
33. Solution: Algorithmic Accountability
An asset is an economic resource.
Anything tangible or intangible that is capable of
being owned or controlled to produce value and
that is held to have positive economic value is
considered an asset.
algorithms can be valuable assets
34. Algorithmic Accountability
does revenue depend on hidden algos?
• WebMD Google SEO
• Amazon Product Listing Algo
• Pinterest Relevance Algo
• Twitter Spam filter
• Apple App Store Rankings
35. Algorithmic Accountability
do decisions depend on hidden factors?
A 'Crisis' in Online Ads: One-Third of Traffic Is Bogus
http://www.wsj.com/articles/SB10001424052702304026304579453253860786362
Now Algorithms Are Deciding Whom To Hire…
http://www.npr.org/blogs/alltechconsidered/2015/03/23/394827451/now-algorithms-are-deciding-whom-to-hire-based-on-voice
What you don’t know about Internet algorithms is hurting you…
http://www.washingtonpost.com/news/the-intersect/wp/2015/03/23/what-you-dont-know-about-internet-algorithms-is-hurting-you-and-you-probably-dont-know-very-much/
36. Solution: Algorithmic Transparency
can you be transparent and not be gamed?
How do you govern a (hidden, fluid and amoral) algorithm?
http://fortune.com/2015/03/18/how-do-you-govern-a-hidden-fluid-and-amoral-algorithm/
83% of the participants in the study changed their behavior
once they knew about the algorithm
participants mistakenly believed that their friends intentionally
chose not to show them stories
37. Algorithmic Accountability
Do you depend on someone else’s marketplace?
How does your revenue depend on algos?
Do you need an internal algo?
Who will manage it? build it? maintain it?
algorithms have unforeseen liabilities