Growth accounting &
Time-based data structures
How to get re-usable datasets (1)
Bertil Hatt
Data science @ RentalCars, Booking.com
Previously on MancML…
Three structures
1. Separating growth into In vs. Out
2. Maturity level of departures
3. Retention lozenge
Three stories
1. Unemployment US vs. France
2. How to fix a casual video game
3. Great startup vs. bonfire
No sophisticated models
How to structure data
1. Accounting for growth
Separating In vs. Out
Similar unemployment pattern
[Figure: two charts, "Unemployment US" and "Unemployment France". Y-axis: unemployment (M), 0 to 6; x-axis: quarters Q1 to Q4 over four years.]
Numbers are made-up; for real ones, go check Labor Economics, The MIT Press 2004 Pierre Cahuc, André Zylberberg
Very different issues
[Figure: the same two charts, "Unemployment US" and "Unemployment France", each with three series: Unemployment (M), Lost job, Found job. Y-axis: 0 to 6; x-axis: quarters Q1 to Q4 over four years.]
Numbers are made-up; for real ones, go check Labor Economics, The MIT Press 2004 Pierre Cahuc, André Zylberberg
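The In vs. Out split rests on a stock-and-flow identity: the level at the end of a period is the previous level, plus those who came in, minus those who went out. Below is a minimal sketch of that identity. The function name `roll_forward`, the two economies and all figures are made up for illustration and do not come from the slides.

```python
def roll_forward(start, lost_job, found_job):
    """Level after each period: previous level + lost job - found job."""
    levels = []
    level = start
    for entered, left in zip(lost_job, found_job):
        level = level + entered - left
        levels.append(level)
    return levels


# Made-up figures, in millions, for two hypothetical economies.
# Economy A: many people lose a job each quarter, and many find one.
economy_a = roll_forward(5.0, lost_job=[2.0, 2.0, 2.0], found_job=[2.5, 2.5, 2.5])
# Economy B: few people lose a job each quarter, and few find one.
economy_b = roll_forward(5.0, lost_job=[0.5, 0.5, 0.5], found_job=[1.0, 1.0, 1.0])

print(economy_a)  # [4.5, 4.0, 3.5]
print(economy_b)  # [4.5, 4.0, 3.5]
```

Both series show the same level every quarter, yet the flows behind them differ by a factor of four; only the In and Out columns reveal that.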
How to build a detailed reference table
Period (day, week, month)  User ID  Present or active this period  Present or active last period  Last active (period)  Status
2018-01-01                 12345    TRUE                           NULL                           NULL                  New
2018-01-08                 12345    TRUE                           TRUE                           2018-01-01            Active
2018-01-15                 12345    FALSE                          TRUE                           2018-01-08            Lapsed
2018-01-22                 12345    FALSE                          FALSE                          2018-01-08            Lost
2018-01-29                 12345    TRUE                           FALSE                          2018-01-08            Re-activated
…
SELECT … AS period, id, CASE WHEN …, LAG(…) OVER w, MAX(…) OVER w, CASE WHEN …
GROUP BY period, id
WINDOW w AS …
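The status logic in the table above can be sketched in plain Python. This is an illustration, not the query from the slide. It assumes that one inactive period means "Lapsed" and two or more mean "Lost", which is just one possible pair of thresholds. The name `label_periods` and the row layout are hypothetical.

```python
from datetime import date, timedelta


def label_periods(user_id, active_periods, periods):
    """Label every period for one user, from that user's first activity onwards."""
    rows = []
    active_last = None  # None plays the role of NULL before the first activity
    last_active = None  # most recent active period strictly before this one
    for period in periods:
        active_now = period in active_periods
        if last_active is None and not active_now:
            continue  # the user has not appeared yet: no row
        if active_now:
            if active_last is None:
                status = "New"
            elif active_last:
                status = "Active"
            else:
                status = "Re-activated"
        else:
            status = "Lapsed" if active_last else "Lost"
        rows.append((period, user_id, active_now, active_last, last_active, status))
        active_last = active_now
        if active_now:
            last_active = period
    return rows


# Same weekly pattern as the example rows: active, active, absent, absent, active.
weeks = [date(2018, 1, 1) + timedelta(weeks=i) for i in range(5)]
rows = label_periods(12345, {weeks[0], weeks[1], weeks[4]}, weeks)
for row in rows:
    print(row)
```

The loop keeps the two running values that `LAG` and `MAX` supply in the windowed query: whether the user was present in the previous period, and the last period in which they were active.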
How to build an aggregated reference table
Period (day, week, month)  Status        Count   Last active  Source  Last action
2018-01-01                 New           17 854
2018-01-08                 Active        78 442
2018-01-15                 Lapsed        12 325
2018-01-22                 Lost          10 548
2018-01-29                 Re-activated   2 428
SELECT … AS period, status, COUNT(*)
GROUP BY period, status
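As a rough illustration of the aggregation and of the "everything adds up" check, here is a small Python sketch. The rows, user IDs and helper names (`count_by_period_and_status`, `present`) are made up. It assumes the five statuses from the detailed table, with "Lapsed" meaning active last period but not this one.

```python
from collections import Counter

ACTIVE_STATUSES = ("New", "Active", "Re-activated")


def count_by_period_and_status(rows):
    """Equivalent of COUNT(*) ... GROUP BY period, status."""
    return Counter((period, status) for period, _user_id, status in rows)


def present(counts, period):
    """Users present in a period: New + Active + Re-activated."""
    return sum(counts[(period, status)] for status in ACTIVE_STATUSES)


# Made-up detailed rows: (period, user_id, status).
rows = [
    ("2018-01-01", 1, "New"), ("2018-01-01", 2, "New"), ("2018-01-01", 3, "New"),
    ("2018-01-08", 1, "Active"), ("2018-01-08", 2, "Lapsed"),
    ("2018-01-08", 3, "Active"), ("2018-01-08", 4, "New"),
    ("2018-01-15", 1, "Active"), ("2018-01-15", 2, "Lost"),
    ("2018-01-15", 3, "Lapsed"), ("2018-01-15", 4, "Active"),
]
counts = count_by_period_and_status(rows)

# Accounting identity: present now = present before + New + Re-activated - Lapsed.
for previous, current in [("2018-01-01", "2018-01-08"), ("2018-01-08", "2018-01-15")]:
    net = (counts[(current, "New")] + counts[(current, "Re-activated")]
           - counts[(current, "Lapsed")])
    assert present(counts, current) == present(counts, previous) + net
```

If the identity fails on real data, some users are being dropped or double-counted between the detailed and the aggregated table.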
2. Maturity levels of departures
When are people leaving
Distinct user statuses allow better insight
[Figure: "Players" chart, daily active players, y-axis 0 to 6.]
[Figure: "Players funnel" chart, y-axis 0% to 120%, series Lost and Active.]
Numbers are made-up; they look nothing like a project I worked on.
[Figure: "Players" chart, daily active players split into New, Active and Lost, y-axis 0 to 6.]
3. Seniority triangle &
Retention lozenge
How to represent users’ experience
Cohort
• a group of people with a shared characteristic (Cambridge Eng. Dict.)
• a group of people who did something all during the same period (Me)
• Don’t focus exclusively on registration: first order, or third, re-activation, etc.
Triangle of user experience
[Figure: triangle chart. X-axis: time of the action; y-axis: time of the first action or registration. Labels: Cohort, Promotion, Now.]
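One half of that plane should stay empty: no action can happen before the user's first action or registration. A minimal sketch of that sanity check follows, with made-up data and hypothetical names (`time_travellers`, `first_seen`):

```python
from datetime import date


def time_travellers(actions, first_seen):
    """Return the actions dated before their user's first action or registration."""
    return [(user_id, when) for user_id, when in actions
            if when < first_seen[user_id]]


# Made-up data: user 2 has an action dated before their registration.
first_seen = {1: date(2018, 1, 1), 2: date(2018, 1, 15)}
actions = [
    (1, date(2018, 1, 8)),
    (2, date(2018, 1, 10)),
    (2, date(2018, 1, 22)),
]
suspicious = time_travellers(actions, first_seen)
print(suspicious)  # [(2, datetime.date(2018, 1, 10))]
```

Any non-empty result points to a data problem worth investigating before modelling anything.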
Retention lozenge
[Figure: lozenge inside the triangle. X-axis: time of the action; y-axis: time of the first action or registration. Labels: Too recently acquired; Now; 8 weeks; After 8 weeks.]
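The lozenge can be read as two filters: drop users acquired too recently to have a full window, and drop actions that fall after the window. Here is a sketch under assumptions: the boundary choices (a window of exactly eight weeks, with actions kept strictly before the end of it) and the names `lozenge` and `first_seen` are illustrative, not taken from the slides.

```python
from datetime import date, timedelta


def lozenge(actions, first_seen, now, window=timedelta(weeks=8)):
    """Keep actions of mature cohorts that fall within `window` of the user's start."""
    kept = []
    for user_id, when in actions:
        start = first_seen[user_id]
        if start + window > now:
            continue  # too recently acquired: the full window is not observed yet
        if when < start or when >= start + window:
            continue  # before the start (time travel) or after the window
        kept.append((user_id, when))
    return kept


# Made-up data: user 1 joined long ago, user 2 joined a month before `now`.
first_seen = {1: date(2018, 1, 1), 2: date(2018, 5, 1)}
actions = [
    (1, date(2018, 1, 20)),  # inside user 1's first eight weeks: kept
    (1, date(2018, 3, 15)),  # after eight weeks: excluded
    (2, date(2018, 5, 10)),  # user 2 is too recent: excluded
]
kept = lozenge(actions, first_seen, now=date(2018, 6, 1))
print(kept)  # [(1, datetime.date(2018, 1, 20))]
```

With both filters in place, every cohort is compared over the same length of observed history.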
Avoid retention bias
[Figure: "Cohort retention" chart, y-axis 0% to 25%, comparing Last week and 8th week.]
More considerations
• Arbitrary thresholds
• Simple, imperfect, memorable
• Communicate: catchy names
• Alex Schultz, VP Growth Facebook
• More time-like metrics
• Activity totals vs. Behaviour step
• Time spent vs. since registration
• Demographic age vs. seniority
• Experience on wider platform
• Friends’ experience levels
Editor's notes

  1. This talk is probably part of a series that I started by suggesting a list of 12 questions to ask to find out how mature a data organization is. This is a small step on how you can be a bit more systematic in handling your data.
  2. A long way to say I'm old and cranky. A small change since last time.
  3. Last presentation, I talked about 12 things that need to be there to make up the small core part of data science. Mainly, it's a good ETL process and good habits around it. A lot of that is good engineering and addressing analyst frustrations. One of the most impactful parts is reusable data structures. This is something that fewer people in your organization are likely to ask for, because it's a less glaring pain point. But enforcing consistent concepts is important.
  4. No sophisticated models: how to structure data. I'm more than happy to talk about convoluted models and counter-intuitive corrections.
  5. I learned last week that Facebook credits itself with growth accounting. This, I learned during my Master's, before Facebook was founded. Well, one of my Master's.
  6. Let’s have fun! Let’s talk about unemployment!
  7. You see, more jobs are created than destroyed when unemployment goes down. Same thing in both economies. But how you get there is not the same. What is essential is that all things add up exactly.
  8. The fact that a user is considered lapsed or lost after X or Y periods is rather arbitrary. Try to pick a number that separates well: few people transition from one to the other. Once you have your period and your status, you can…
  9. Once you have your period and your status, you can… You can also draw a transition graph. You can and you should add a lot of things to that GROUP BY: you should have detailed totals by as many dimensions as you think are relevant. Do you have any questions so far? Does this make sense to you? Do you see the applications to data science?
  10. Now we have a relevant distinction in our population. We can now train models to try to predict it! We can compare leavers with non-leaving customers of the same maturity or experience. Keep in mind: any departure is temporary, and every cut-off is arbitrary. That doesn't matter. What matters is that everyone is accounted for. The key feature so far is how the statuses add up to the active customers.
  11. Would you invest in a company with that kind of growth? Now, let's apply the framework. Of course, you know about funnels and retention, and you would have caught that. But even the funnel can look different with that insight: big drops might be relevant, but not all obstacles are hostile; some are a good challenge. Do not confuse progress with retention.
  12. The wider idea in this talk is: time is a key dimension in any product experience. Cohorts are a great abstraction. You need to come up with more vocabulary around them. Two geometric arguments explain a non-trivial concept.
  13. The key notion in those two parts, and a word that I have oddly not used so far, is the cohort. What is a cohort? What is the widest, simplest definition of a cohort?
  14. Let's represent every action of your users on a two-dimensional plane. X, the abscissa, is the time of the action. Y, the ordinate, is the time when that user registered or first did something. In this corner, you can't have any action, because that would involve time travel, or some sort of weird CGI. Don't go there. Or rather, do go there: count how many actions you have in there. That's a great thing to check. If there are any, you know you have a problem. This graph is actually commonly used to represent retention, using colour maps.
  15. This triangle becomes very useful when you are studying retention. Same brown time travellers, still empty, hopefully. If you want to know if people are retained after, say, eight weeks, you should exclude recent joiners. Most people remember to do that. What people often overlook is to exclude activity *that we know about* after eight weeks of stay. That lozenge is what you should be looking at.
  16. Once again, you are probably better off looking at a detailed colour map. But if you are trying to model retention, this is important. Let's assume you want to know if people who joined on week 1 are st… And you should make sure that the right number to model is easy to find. That's the big lesson here: *make the right metric easier to find*.
  17. Once again, this is not directly related to machine learning, but getting those right is essential to building a relevant model. Two things I would like to say a word about. First: all those thresholds are arbitrary. Second: the right unit might not be calendar time. Think about alternative ways of counting: time spent in your service vs. calendar time since registration. Or think beyond what you see: the wider platform, relations.
  18. Do you have questions? Or would you rather have me give the floor to someone smarter than me?