Customer lifetime value/revenue (LTV/R) is the present value of the future profits or revenue from a customer. Estimating it is important for businesses that want to optimise the marketing costs of acquiring and retaining customers. Complex consumer behaviour and the innumerable ways a consumer interacts with a business make it challenging to estimate. Years of ongoing research in this field have led to the development of various ML tools and techniques. We would like to take this opportunity to walk through some of these techniques and their applications in specific business contexts.
Condé Nast is a global media company that produces some of the world’s leading print, digital, video and social brands. These include Vogue, GQ, The New Yorker, Vanity Fair, Wired, Architectural Digest (AD), Condé Nast Traveler and La Cucina Italiana, among others. Subscription revenue is one of the major revenue streams for the organization. We’d like to demonstrate the implementation of an LTV/R model for the subscription revenue of one of the brands using survival models, and along with that illustrate the following.
Estimate the average lifetime (ALT) of a brand’s subscriber.
Estimate the average lifetime of various segments within the brand and identify the most and least valuable segments, so marketing teams can devise appropriate targeting strategies.
Finally, attempt to estimate the lifetime at a subscriber level.
Key insights & findings through the analysis.
Demo of sample code
Leveraging Databricks Delta files for our big data processing needs.
2. Agenda
▪ About the company
▪ LTV/R overview
▪ Brand & Segment level insights
▪ Key business use cases
▪ Survival models introduction
▪ Demo of notebook
▪ Data processing in delta lake
4. A media company for the future
Condé Nast is a global media house with
over a century of distinguished publishing
history with a portfolio of iconic brands.
11. Brand & Segment level insights
*ALT - Average Lifetime in years
LTR - Lifetime Revenue
TNY - The New Yorker
ALT* (LTR) by segment
Segment        Population      Online Eng.
TNY Bundle     7.6 ($880)      9.2 ($1,007)
TNY Digital    10.9 ($753)     -
Observations
• The ALT of digital subscribers is 46% greater than that of bundle subscribers
• Subscribers with Online Engagement tend to have longer lifetimes
Population - All TNY Subscribers
Online Eng. - TNY Subscribers with online engagement
Bundle - Includes Print with digital access
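The slides do not state how LTR is derived from the survival estimates. As a minimal sketch, assuming a constant annual revenue per subscriber and a fixed discount rate (both hypothetical, as are the function name and the sample numbers), expected lifetime revenue can be computed by weighting each year's discounted revenue by the probability that the subscriber is still active:

```python
def lifetime_revenue(survival_by_year, annual_revenue, discount_rate):
    """Discounted expected revenue from yearly survival probabilities.

    survival_by_year[k] is the probability that the subscriber is still
    active at the start of year k, so survival_by_year[0] is 1.0.
    Assumption: revenue is collected at the start of each year.
    """
    return sum(
        s * annual_revenue / (1.0 + discount_rate) ** k
        for k, s in enumerate(survival_by_year)
    )


# Illustrative inputs only; these are not Condé Nast figures.
survival = [1.0, 0.8, 0.6]
print(lifetime_revenue(survival, annual_revenue=100.0, discount_rate=0.1))
```

With a discount rate of zero this reduces to annual revenue multiplied by the sum of the survival probabilities, which is a discrete version of the area under the survival curve.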
12. Brand & Segment level insights
[Charts: ALT by Age, ALT by Income, ALT by RFM]
*ALT - Average Lifetime
Population - All TNY Subscribers
Bundle - Includes Print with digital access
RFM, in this context, stands for Recency, Frequency and Magnitude
● Recency - When did a subscriber last visit the website?
● Frequency - How frequently did a subscriber visit the website?
● Magnitude - How many articles did a subscriber read on the website?
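The three RFM definitions above can be sketched as a small aggregation over pageview logs. The record layout `(subscriber_id, visit_date, articles_read)`, the function name, and the choice to count frequency as distinct visit days are assumptions for illustration, not the schema used in the talk:

```python
from collections import defaultdict
from datetime import date


def rfm_features(pageviews, as_of):
    """Aggregate pageview records into per-subscriber RFM features.

    pageviews: iterable of (subscriber_id, visit_date, articles_read).
    as_of: the date against which recency is measured.
    """
    visit_days = defaultdict(set)
    articles = defaultdict(int)
    for subscriber_id, visit_date, articles_read in pageviews:
        visit_days[subscriber_id].add(visit_date)
        articles[subscriber_id] += articles_read
    return {
        sid: {
            "recency_days": (as_of - max(days)).days,  # days since last visit
            "frequency": len(days),                    # distinct visit days
            "magnitude": articles[sid],                # total articles read
        }
        for sid, days in visit_days.items()
    }


# Hypothetical sample log.
logs = [
    ("a", date(2020, 1, 1), 2),
    ("a", date(2020, 1, 1), 1),
    ("a", date(2020, 1, 10), 4),
    ("b", date(2020, 1, 5), 1),
]
features = rfm_features(logs, as_of=date(2020, 1, 15))
print(features["a"])
```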
13. Brand & Segment level insights
Observations
• Subscribers engaging with a newsletter tend to have ~75% longer lifetimes
• Subscribers engaging through mobile devices have shorter lifetimes
• Subscribers using all three device types, by contrast, have longer lifetimes
14. Business use cases
Optimise subscriber acquisition costs
▪ Influence marketing spends
▪ Determining the efficacy of paid social media
▪ Personalised pricing plan
Subscriber activation & retention KPI
• Health metric for engagement
• Test LTR lift analysis for data product efficacy
• Evaluate NL recommendations efficacy
• Evaluate site recommendations efficacy
Unified customer analytics KPI
• Estimate ARPU of subscriber from other brands, ad revenue, ecommerce
17. What are Survival Models?
• Prediction of Time to event
• What is the expected time before an event occurs?
• What is the probability of surviving at any given point of time?
• What is the general behaviour of the subject?
• Fails too quickly? Or lasts longer?
• How do various conditions affect the survival probability of the subjects?
• Applications
• Reliability of Machinery - Machinery Failure
• Lifetime of the customers - Churn
• Efficacy of therapy - Death, Next Heart Attack, etc.
Average Lifetime (ALT, in years) = Area Under the Survival Curve
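To make the area-under-the-curve definition concrete, here is a minimal pure-Python sketch of a Kaplan-Meier estimate followed by integration of the resulting step function. It is written from scratch for illustration and is not the notebook code from the talk; the function names and sample data are hypothetical. When the estimated curve has not reached zero by the end of the data, the area is taken only up to a chosen horizon (a restricted mean), so the horizon is an explicit parameter.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is truthy if the event (e.g. churn) was observed for
    subject i, and falsy if the subject is right-censored.
    """
    observed = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = observed.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def area_under_curve(curve, horizon):
    """Integrate the step function S(t) from 0 to horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical data: durations in years, 1 = churned, 0 = still subscribed.
durations = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
print(f"ALT up to year 4: {area_under_curve(curve, horizon=4):.2f}")
```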
18. Censoring Types
[Figure: timelines over the time of observation (Start to End) for Uncensored, Left Censored, Right Censored and Interval Censored subjects; "?" marks the unobserved portion]
“Censoring is a condition in which the value of
a measurement or observation is only partially
known.”
Interval Censoring: It can occur when
observing a value requires follow-ups or
inspections. The value of measurement is
somewhere on an interval between two values.
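For subscription data, the most common case is right censoring: a subscriber who is still active when the observation window ends has a lifetime that is only known to be at least as long as what was observed. A minimal sketch of how such records might be encoded as a duration plus an event flag follows; the function and argument names are hypothetical, and an `end` of `None` is assumed to mean "still subscribed".

```python
from datetime import date


def to_duration_and_event(start, end, observation_end):
    """Return (duration_in_years, event_observed) for one subscription.

    end is None when the subscriber is still active at observation_end,
    which makes the record right-censored (event_observed is False).
    """
    event_observed = end is not None and end <= observation_end
    last_seen = end if event_observed else observation_end
    duration_years = (last_seen - start).days / 365.25
    return duration_years, event_observed


window_end = date(2020, 1, 1)
churned = to_duration_and_event(date(2015, 1, 1), date(2017, 1, 1), window_end)
active = to_duration_and_event(date(2018, 1, 1), None, window_end)
print(churned, active)
```

The resulting pairs are the two inputs that survival estimators generally expect: how long each subject was observed, and whether the event was actually seen.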
19. Types of Survival Models
Non-Parametric
▪ No parameters are estimated → Empirical formulae
▪ E.g., Kaplan-Meier, Nelson-Aalen, Breslow-Fleming-Harrington, etc.
Parametric
• Assumes a distribution → Parameter estimation techniques
• Parameters can either be the characteristics that define the assumed distribution or coefficients of the covariates
• E.g., Weibull, Exponential, Lognormal, Accelerated Failure Time (AFT), etc.
Semi-Parametric
• Part parametric, part non-parametric: the baseline hazard is left unspecified, while the covariate effects are parametric
• Parameters are the coefficients of the covariates
• E.g., Cox Proportional Hazards
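As a small illustration of the parametric family, the exponential model has a single parameter, the hazard rate, whose maximum-likelihood estimate under right censoring is the number of observed events divided by the total observed time. The sketch below uses hypothetical function names and sample data; it is not the model used in the talk.

```python
import math


def fit_exponential(durations, events):
    """MLE of the exponential hazard rate with right-censored data.

    rate = (number of observed events) / (total time under observation)
    """
    total_events = sum(1 for e in events if e)
    if total_events == 0:
        raise ValueError("at least one observed event is needed")
    return total_events / sum(durations)


def exponential_survival(rate, t):
    """S(t) = exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)


# Hypothetical data: durations in years, 1 = churned, 0 = censored.
durations = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
rate = fit_exponential(durations, events)
mean_lifetime = 1.0 / rate  # area under exp(-rate * t) from 0 to infinity
print(rate, mean_lifetime, exponential_survival(rate, 4))
```

Unlike a non-parametric estimate, the fitted curve extends beyond the observation window, so the mean lifetime does not need a horizon. The price is the assumption of a constant hazard, which may not hold for subscribers.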
22. Data processing in Delta lake
[Diagram: data flow]
Subscriber Data (in the scale of a million per brand)
Customer Engagement Data (in the scale of 1 million pageviews per day per brand; ~2 billion records per brand)
Format - ORC
→ Aggregated data with demographics and various engagement features at subscriber level (in the scale of a million per brand)
→ Conversion to a pandas dataframe
→ Used lifelines for modelling (lifelines is an open source library for survival analysis authored by Cameron Davidson-Pilon)
23. Data processing in Delta lake
• Reasons why Delta Lake is preferred:
• Big data
• Dealing with more than 2 billion records of engagement data
• Running multiple queries to aggregate data at the required level with necessary features
• Lower query execution time
• Tightly integrated into the unified analytics framework
• Aligns with the future MLOps design using MLflow
• Data processing with Delta Lake is easier
• Easier integration with ML libraries