This document discusses segmenting social game players to minimize the number who become inactive. It proposes using machine learning techniques on user data to classify players into risk categories (low, medium, high) for becoming inactive. For high-risk players, targeted actions such as emails could bring them back online. Analyzing which variables predict inactivity could also improve game design, for example by removing distracting elements found to reduce long-term play.
2. I see the problem of segmenting customers in social games
(such as Farmville) as very similar to churn analysis in the
TELCO industry: companies need to deal with customers
who end their relationship.
3. Between the two areas, there are substantial differences:
• TELCO companies are interested in tracking customers who
switch to a competitor;
• for social games (I guess) we are “simply” interested in
players who leave the platform (usually because they are bored)
and do not return.
We can therefore adapt the Machine Learning techniques,
already used in churn analysis, to handle this problem.
4. So, first of all we identify two types of users:
1. active players;
2. dead players.
I define a dead player as a user who:
• creates a profile;
• plays for at least one session;
• (possibly) plays in other sessions;
• is away from the platform for at least a time interval X.
I’d like to look at the data before defining X, and then decide on a
value that is useful for the business (perhaps in collaboration
with the marketing team). For now, let’s say X = 8 days.
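The dead-player definition can be turned into a labeling rule. The sketch below is a minimal illustration, assuming each player's history is available as a list of session start times (a hypothetical input format) and measuring inactivity from the start of the last session as a simplification; X is the provisional 8 days.

```python
from datetime import datetime, timedelta

# X: inactivity threshold. 8 days is the provisional value; it should be
# revised after looking at the data and talking with marketing.
INACTIVITY_THRESHOLD = timedelta(days=8)


def player_state(session_starts, now, threshold=INACTIVITY_THRESHOLD):
    """Return 1 (active) or 2 (dead) for one player.

    session_starts: list of datetimes, one per game session (hypothetical
    format). Inactivity is measured from the start of the last session,
    a simplification. A player with no sessions never played, so the
    dead-player definition does not apply and None is returned.
    """
    if not session_starts:
        return None  # registered but never played: handled separately
    last_session = max(session_starts)
    # "away from the platform for at least X" -> inclusive comparison
    return 2 if now - last_session >= threshold else 1


now = datetime(2024, 1, 20)
print(player_state([datetime(2024, 1, 1), datetime(2024, 1, 5)], now))
```

Here the last session is 15 days before `now`, so the player is labeled dead (state 2).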
5. The goal of this segmentation is, therefore, to minimize the
number of players passing from state 1 to state 2 (each new
customer necessarily starts in state 1).
This is my approach:
• organize the data (details below);
• develop a Stat Model (details below);
• define risk classes, let’s say 3:
A. low risk: prob(state=2) in [0, 0.33]
B. medium risk: prob(state=2) in (0.33, 0.80]
C. high risk: prob(state=2) in (0.80, 1.00]
• assign each gamer to a segment A, B or C;
• take an action for users in C (details below);
• take an action to improve the game design (details below).
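Assigning gamers to segments is a simple mapping from the predicted probability of state 2. The cut-offs as first written leave small gaps (for example, 0.335 would belong to no class), so this sketch keeps the stated upper bounds 0.33 and 0.80 inclusive and lets each class start just above the previous one. The function names and the dict-of-probabilities input are illustrative assumptions.

```python
# Upper bounds from the risk classes (inclusive), so every probability
# in [0, 1] falls into exactly one segment.
LOW_MAX = 0.33
MEDIUM_MAX = 0.80


def risk_class(p_dead):
    """Map the predicted probability of state 2 to segment A, B or C."""
    if not 0.0 <= p_dead <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_dead <= LOW_MAX:
        return "A"  # low risk
    if p_dead <= MEDIUM_MAX:
        return "B"  # medium risk
    return "C"  # high risk


def assign_segments(probs):
    """probs: {player_id: predicted prob(state=2)} -> {player_id: segment}."""
    return {pid: risk_class(p) for pid, p in probs.items()}


print(assign_segments({"u1": 0.10, "u2": 0.50, "u3": 0.95}))
```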
6. Organize the data
Useful data could include:
• Generic user data
• Time between each session
• Game time without any actions
• Number of game sessions without particular actions
• Interactions with neighbors (if provided by the game)
• Number of invites sent to other users
• Etc.
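Most of these variables are aggregates over a player's session log. The sketch below shows one way to flatten that log into a per-player feature row, assuming a hypothetical `Session` record whose field names are illustrative and not taken from any real game; generic user data from registration would be merged in separately.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical per-session log record; field names are illustrative.
    start: datetime
    end: datetime
    actions: int                 # number of "particular" game actions
    idle_minutes: float          # game time in the session with no action
    neighbor_interactions: int
    invites_sent: int


def build_features(sessions):
    """Turn one player's session log into a flat feature dict."""
    sessions = sorted(sessions, key=lambda s: s.start)
    # Time between consecutive sessions, in hours
    gaps_hours = [
        (nxt.start - prev.end).total_seconds() / 3600
        for prev, nxt in zip(sessions, sessions[1:])
    ]
    return {
        "n_sessions": len(sessions),
        "mean_gap_hours": sum(gaps_hours) / len(gaps_hours) if gaps_hours else 0.0,
        "idle_minutes": sum(s.idle_minutes for s in sessions),
        "sessions_without_actions": sum(1 for s in sessions if s.actions == 0),
        "neighbor_interactions": sum(s.neighbor_interactions for s in sessions),
        "invites_sent": sum(s.invites_sent for s in sessions),
    }
```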
7. Developing the Stat Model
The statistical model is developed using the historical data collected in
previous sessions. Users who have been away for at least 8 days are
labeled as dead (state 2); together with the still-active users (state 1),
they form our training dataset, with all the data available for these gamers.
Of course, we treat separately the gamers who registered on the platform
without ever playing the game. For them we don't have enough behavioral
data, so we should probably use only the information provided by the user
during registration.
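We follow the classical Machine Learning approach and use cross-validation
to estimate the prediction error. Using the data we have, we develop a model
that predicts the probability of state 2 (dead player), for example a CART
or a logit model; the risk classes A, B, C then follow from the cut-offs above.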
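In practice a CART or logit model would usually come from a statistics or ML library. To keep the example self-contained, here is a minimal pure-Python sketch of a logit model fitted by batch gradient descent, with k-fold cross-validation to estimate the misclassification rate. The learning rate, number of epochs, number of folds and the 0.5 decision cut-off are assumptions, not tuned values.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.1, epochs=500):
    """Fit P(state=2 | x) with logistic regression via gradient descent.

    X: list of feature lists; y: list of 0/1 labels (1 = dead player).
    Returns (intercept, weights).
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-loss: (predicted prob - label) * x
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model, x):
    """Predicted probability that the player becomes dead (state 2)."""
    b, w = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def cross_val_error(X, y, k=5, seed=0):
    """Estimate the misclassification rate with k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            predicted = 1 if predict_proba(model, X[i]) >= 0.5 else 0
            errors += predicted != y[i]
    return errors / len(X)
```

The probabilities returned by `predict_proba` on a fitted model are what the risk classes are applied to; the cross-validated error is the figure of merit when the model is treated as a prediction rule.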
By using the users in C we can:
- take an action to bring them back to segment A or B;
- analyze the variables to understand what's going wrong in the game.
See below for details.
8. Taking an action for users in C
I'm considering two situations:
1. we are using the model in an “offline” way;
2. we are using the model in real time.
In the first case, we try to bring the users back to
segment A or B with the usual marketing approaches. For
example, we send them an email with “presents”.
In the second case, using the information on the current
session (in real time), the platform suggests which events
should happen in the game. These details should be
decided together with the IT team and the game developers.
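The two situations differ mainly in when the model is scored and what happens to a high-risk player. This sketch contrasts them under stated assumptions: the offline path returns the list of segment-C players for an email campaign, and the real-time path calls a hypothetical `trigger_event` hook supplied by the game (no real platform API is implied). The risk cut-offs are repeated so the snippet runs on its own.

```python
from typing import Callable, Dict, List


def risk_class(p_dead: float) -> str:
    # Same cut-offs as the risk classes (upper bounds inclusive)
    if p_dead <= 0.33:
        return "A"
    if p_dead <= 0.80:
        return "B"
    return "C"


def offline_campaign(probs: Dict[str, float]) -> List[str]:
    """Offline mode: pick segment-C players to receive an email with presents.

    probs: {player_id: predicted prob(state=2)}, e.g. from a nightly batch.
    """
    return sorted(pid for pid, p in probs.items() if risk_class(p) == "C")


def on_session_update(player_id: str, p_dead: float,
                      trigger_event: Callable[[str], None]) -> bool:
    """Real-time mode: react during the current session.

    trigger_event is a hypothetical hook provided by the game; which event
    to fire is a design decision for the IT team and game developers.
    Returns True if an event was triggered.
    """
    if risk_class(p_dead) == "C":
        trigger_event(player_id)
        return True
    return False


fired = []
on_session_update("p1", 0.95, fired.append)
print(offline_campaign({"a": 0.9, "b": 0.1, "c": 0.85}), fired)
```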
9. Take an action to improve the game design
It's possible to improve the game design using the
results of the developed model.
Following an "old" statistical approach (not necessarily
outdated!), we could try to interpret the variables
that are significant in the model.
But following the modern Data Mining paradigm, our
model is usually developed as a "black box": we don't
know how it works; we are only interested in estimating
(and minimizing) the error rate of our prediction rules!
10. In such a situation, a second analysis is necessary.
Using the subset of data falling in segment C, we can then
analyze all the variables. We can therefore estimate a new
model, based on the classical approach, aimed at understanding
how much each variable increases the probability of becoming
a dead gamer!
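If the classical model chosen for this second analysis is a logit, the question "how much does each variable increase the probability of becoming dead?" has a standard reading through odds ratios. The formulas below are the textbook logit interpretation, offered as one option; a CART model would need different tools.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad p = P(\text{state}=2 \mid x_1,\dots,x_k)

\frac{\mathrm{odds}(x_j = 1)}{\mathrm{odds}(x_j = 0)} = e^{\beta_j}
\qquad \text{(boolean } x_j\text{, other variables held fixed)}
```

For a boolean variable, a coefficient with $e^{\beta_j} > 1$ means that the odds of becoming a dead gamer are multiplied by $e^{\beta_j}$ when the variable is true, other variables held fixed.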
Farmville Example:
We could discover that after level 13, players who use
decorations become dead with high probability. We could
suppose that decorations (that is, the boolean variable “the
player uses decorations”) can divert users from the
real nature of the game. Game developers, therefore, could
work to remove decorations or change how they are used.
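Before fitting a model, a quick descriptive check of this kind of hypothesis is the share of dead players with and without the boolean variable, restricted to advanced players. The sketch assumes player records are dicts with hypothetical keys `level`, `state` (1 or 2) and a flag such as `uses_decorations`, and reads "after level 13" as level > 13. A difference in these rates is only a hint; the model above is what separates the effect of decorations from other variables.

```python
def dead_rate_by_flag(players, min_level, flag):
    """Share of dead players (state 2), split by a boolean variable.

    players: list of dicts with hypothetical keys "level", "state" and
    the flag name. Only players strictly above min_level are counted.
    Returns {True: rate, False: rate}, with None for an empty group.
    """
    counts = {True: [0, 0], False: [0, 0]}  # flag value -> [dead, total]
    for p in players:
        if p["level"] <= min_level:
            continue
        group = counts[bool(p[flag])]
        group[1] += 1
        if p["state"] == 2:
            group[0] += 1
    return {k: (dead / total if total else None)
            for k, (dead, total) in counts.items()}
```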