IEEE Happiness an inside job ASONAM 2017

Happiness, an inside job? Turnover prediction using
employee likability, engagement and relative happiness
Jose Berengueres Guillem Duran Ballester Dani Castro
http://bit.ly/2v2sEZg → Python notebooks
https://github.com/orioli/e3 → R code
EMPLOYEE PROFILING
ASONAM 2017 Industrial Track – S1 August 1 2017 15:30 St James Room Mercure Hotel Sydney - Australia
PREDICT TURNOVER TO UNDERSTAND TURNOVER RISK FACTORS
[Diagram: $ ↓ ↑]

myhappyforce.com
Duran&Berengueres 2

Monitoring happiness with an app (user flow)

Motivation
[Diagram: employee individual, employee-company, company-wide and graph features feed a PREDICTION of CHURN (?); time axis in WEEKS]
1. 250+ papers on customer churn, few papers on
employee churn
2. Predict churn to reduce HR costs, plan, visualize…
3. What is the relation between Happiness and Churn?
4. What is the appropriate unit of analysis?
5. Identify turnover risk factors

Outline
1. The dataset
2. Exploring the data
3. Feature engineering
4. Modeling turnover
5. Conclusions

Dataset – size
Table (Rows)                   Feed-back                              UI flow
Happiness votes (221k)         How happy are you at work today?       1st screen
                               - 4: Great
                               - 3: Good
                               - 2: So-so
                               - 1: Pretty Bad
Comments (29.5k)               Comment box (optional)                 2nd screen
Likes (284k), Dislikes (52k)   Anonymous forum. Users can:            3rd screen
                               - view comments
                               - like a comment
                               - dislike a comment

Dataset
Votes
Comments
Likes/Dislikes

Recap
• Votes, comments, interactions
• 34 companies
• Span 2 years
• 3,881 employees, of which 238 (6%) churned

App usage, periodicity & growth

The bias towards “good”
[Figure: axis runs from LESS HAPPY to MORE HAPPY]

The effect of weekday on happiness
Kolmogorov-Smirnov test, p < 1e-10
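A minimal sketch of how such a p-value could be obtained, assuming the test named on the slide is the two-sample Kolmogorov-Smirnov test and that the two samples are weekday votes and weekend votes (the slide states neither). The vote samples below are synthetic, and the function names are hypothetical. Because votes sit on a 4-point scale with many ties, the asymptotic p-value is only approximate.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (large samples)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the alternating series is unreliable this close to zero
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Synthetic votes on the 1-4 scale (assumed weights, not the real data).
random.seed(0)
weekday = random.choices([1, 2, 3, 4], weights=[5, 15, 50, 30], k=5000)
weekend = random.choices([1, 2, 3, 4], weights=[3, 10, 42, 45], k=1000)

d = ks_statistic(weekday, weekend)
p = ks_pvalue(d, len(weekday), len(weekend))
print(f"D = {d:.3f}, p = {p:.2e}")
```

In a notebook, `scipy.stats.ks_2samp` would normally replace both helper functions.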

The effect of weekday on likes received on a comment
Ballester&Berengueres #pyDataBCN2017 18
Kolmogorov-Smirnov test, p < 0.00000000001

Churn timeline

Recap
• Visualize the data to filter out outliers
• Strong influence of weekday
• Weekends = happiness

List of features (N ~100)
• Individual features (N=13)
  - reported happiness: mean, standard deviation (sd)
  - length of employee comments (in chars): sum, mean, sd
  - count of comments posted per day of observation, count of chars written per day of observation
  - likes given in the forum: sum of all likes, mean (per day the app was used), sd (per day the app was used)
  - count of likes + dislikes received by the employee’s posted comments in the forum, ratio of likes to likes + dislikes (likability)
• Company-wide features (N=18 + 34 dummy), aka entity faceted features (ASONAM 2016)
  - Same at company level
  - Easier to interpret than clustering dummy variables…
  - Counterintuitive
• Employee-Company features
  - Individual features normalized by company average
• Social graph features (next page)
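The individual and employee-company features listed above can be sketched as follows. This is an illustration only: the vote records are made up, the field names are hypothetical, and "normalized by company average" is assumed here to mean division by the company mean (the slide does not say whether it is a ratio or a difference).

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical vote records: (company, employee, vote on the 1-4 scale).
votes = [
    ("A", "e1", 4), ("A", "e1", 3), ("A", "e2", 2), ("A", "e2", 2),
    ("B", "e3", 3), ("B", "e3", 1), ("B", "e4", 4), ("B", "e4", 4),
]

by_employee = defaultdict(list)
by_company = defaultdict(list)
company_of = {}
for company, employee, vote in votes:
    by_employee[employee].append(vote)
    by_company[company].append(vote)
    company_of[employee] = company

features = {}
for employee, own_votes in by_employee.items():
    company_mean = mean(by_company[company_of[employee]])
    features[employee] = {
        # Individual features: mean and sd of reported happiness.
        "happiness_mean": mean(own_votes),
        "happiness_sd": pstdev(own_votes),
        # Employee-company feature: individual mean relative to company mean
        # (assumed to be a ratio).
        "relative_happiness": mean(own_votes) / company_mean,
    }

for employee, row in sorted(features.items()):
    print(employee, row)
```

A value of `relative_happiness` above 1 means the employee reports more happiness than the average of their own company, which is the kind of peer-relative signal the employee-company group is meant to capture.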

Three main ways to connect likes on a comment
Undirected (1)
Directed (2)
Feature Likability = Likes / Interactions
Feature Interactions = Likes + Dislikes
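A small sketch of the two features defined above, plus directed and undirected edge weights built from the same records. The interaction log and its layout (giver, comment author, sign) are assumptions for illustration; the slides do not show the actual schema.

```python
from collections import Counter

# Hypothetical log: (giver, author of the comment, +1 for like / -1 for dislike).
interactions = [
    ("ann", "bob", +1), ("cat", "bob", +1), ("dan", "bob", -1),
    ("bob", "ann", +1), ("bob", "ann", +1), ("ann", "cat", -1),
]

likes, dislikes = Counter(), Counter()
for giver, author, sign in interactions:
    if sign > 0:
        likes[author] += 1
    else:
        dislikes[author] += 1


def interaction_count(author):
    """Feature Interactions = Likes + Dislikes received."""
    return likes[author] + dislikes[author]


def likability(author):
    """Feature Likability = Likes / Interactions (None if no interactions)."""
    total = interaction_count(author)
    return likes[author] / total if total else None


# Directed graph: one weighted edge per (giver, author) pair.
directed = Counter((giver, author) for giver, author, _ in interactions)

# Undirected graph: both directions between two people share one edge.
undirected = Counter()
for (giver, author), weight in directed.items():
    undirected[frozenset((giver, author))] += weight

print(likability("bob"), interaction_count("bob"))
print(undirected[frozenset(("ann", "bob"))])
```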

Intra company interactions
networkx
Blue node = hated person
Node size = mean happiness
Edge = L-ratio between two people

Happiness as a graph
Company D
Company K
churn

Modeling churn - Representing NMF 1
Visualization of the first component of the non-negative matrix
factorization (NMF) of the adjacency matrix of the graph

Modeling churn - Representing NMF 1
Effect of discretizing the first component into three
values: low (blue), neutral (black), and high (orange)
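A simplified sketch of the idea, not the authors' pipeline: it factorizes a toy adjacency matrix with a rank-1 non-negative factorization (a single component, via alternating updates) and then discretizes that component into three levels. The matrix, the tercile cut points and the function names are all assumptions; the slides do not state how many components were fitted or where the thresholds were placed. With more components, a library such as scikit-learn's `sklearn.decomposition.NMF` would be the usual choice.

```python
def rank1_nmf(A, iters=200, eps=1e-12):
    """Approximate a non-negative matrix A by the outer product w h^T."""
    n, m = len(A), len(A[0])
    w, h = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternating updates; with non-negative A they keep w and h non-negative.
        hh = sum(x * x for x in h) + eps
        w = [sum(A[i][j] * h[j] for j in range(m)) / hh for i in range(n)]
        ww = sum(x * x for x in w) + eps
        h = [sum(A[i][j] * w[i] for i in range(n)) / ww for j in range(m)]
    return w, h


def discretize(values):
    """Map values to low / neutral / high using tercile cut points (assumed)."""
    ranked = sorted(values)
    low_cut = ranked[(len(ranked) - 1) // 3]
    high_cut = ranked[2 * (len(ranked) - 1) // 3]
    return ["low" if v < low_cut else "high" if v > high_cut else "neutral"
            for v in values]


# Toy symmetric adjacency matrix: entry [i][j] = interactions between i and j.
A = [
    [0, 5, 4, 1, 0],
    [5, 0, 3, 0, 0],
    [4, 3, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

COLORS = {"low": "blue", "neutral": "black", "high": "orange"}

w, h = rank1_nmf(A)
levels = discretize(w)
for node, (score, level) in enumerate(zip(w, levels)):
    print(node, round(score, 3), level, COLORS[level])
```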

Filtered employees & churn
Employees who quit are big and red

Prediction performance GBM model (test set) P@50>75%
[Figure: histograms of the model score (x-axis: pred, -4 to 2; y-axis: count, 0 to 300), one panel per churn class (Y, N), with the @50 cutoff marked]
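P@50 is precision among the 50 employees with the highest predicted churn score. A minimal sketch of the metric follows, with a hypothetical function name and toy scores. For reference, the confusion matrix later in the deck reports 41 true positives among the 50 flagged employees, i.e. 41/50 = 82%.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true churners among the k highest-scored employees."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy example: labels are 1 for churned, 0 for stayed.
scores = [0.9, 0.8, 0.1, 0.7]
labels = [1, 0, 1, 1]
print(precision_at_k(scores, labels, k=2))
print(precision_at_k(scores, labels, k=3))
```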

Top features that predict turnover
TOP FEATURES                              TYPE        Influence
(a) Likability                            Employee*   33
(b) Posting frequency                     Company     9.6
(c) Relative Happiness                    EC          4.2
(d) Relative Variability of Happiness     EC          2.4
(e) NMF Comp. 1                           Social      2.2
(f) Mean Happiness of the employee        Employee    1.8

Scatter of two features
[Figure: scatter plot, panels Y and N (churn). x-axis: F1: Employee Likability (0.00 to 1.00); y-axis: F3: Relative Stability of Happiness (0.0 to 2.0)]

“Likable” employees churn 3 times less (all sets)

Influence of feature group in prediction

Interpreting the predictive model as a medical test
                                        Turnover
                                        Yes (116)             No (1828)
Prediction output   Positive (50)       True Positives: 41    False Positives: 9
on test set         Negative (1894)     False Negatives: 75   True Negatives: 1819
Sensitivity, the proportion of employees who turn over
and who tested positive, is 35% (TP / (TP + FN)).
Specificity, the proportion of employees who stay
and who tested negative, is 99.5% (TN / (TN + FP)).
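The two percentages can be checked directly from the four cells of the confusion matrix on this slide. The variable names are illustrative; the counts are the ones reported above.

```python
# Counts from the confusion matrix on the test set.
tp, fp, fn, tn = 41, 9, 75, 1819

sensitivity = tp / (tp + fn)  # 41 / 116: churners who were flagged
specificity = tn / (tn + fp)  # 1819 / 1828: stayers who were not flagged
precision = tp / (tp + fp)    # 41 / 50: flagged employees who churned (P@50)

print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"precision   = {precision:.1%}")
```

The gap between a precision of about 82% and a sensitivity of about 35% reflects the choice to flag only the top 50 scores: most flagged employees do churn, but most churners are not flagged.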
Data geek
Nurse

Motivation (flash back)
1. 250+ papers on customer churn, few papers on
employee churn
2. Predict churn to reduce HR costs, plan, visualize…
3. What is the relation between Happiness and Churn?
4. What is the appropriate unit of analysis?
5. Identify turnover risk factors

Conclusions
• Prediction performance
  - P@50 ~75% (in medical terms, sensitivity = 35%)
• Relation between happiness and churn?
  - Raw happiness (f) is not correlated with turnover
• What is the appropriate unit of analysis?
  - The top features are not independent of peers
  - Environment > Individual (B. F. Skinner)
• Top risk factors
  - Likability is the top feature (25% have low likability)
  - Engaged company
  - Relative happiness
• Surprises
  - Raw happiness is not correlated
  - Positivity of the employee (# likes given to others) is not correlated
  - Entity faceted features work!
TOP FEATURES                              TYPE
(a) Likability                            Employee*
(b) Posting frequency                     Company
(c) Relative Happiness                    EC
(d) Relative Variability of Happiness     EC
(e) NMF Comp. 1                           Social
(f) Mean Happiness of the employee        Employee

Backup slides

Machine Learning - The GBM
Straightforward improvements:
● Data balancing → 10% churn
● Data augmentation → 1000 employees
● Metaparameter tuning → Default parameters
● Advanced feature selection → + 200 features
● XGBoost (+10% Kaggle)

Network analysis - Extracting features
Descriptive features:
- Number of nodes/edges → Number of employees/interactions
- Degree → Different interactions an employee has
Centrality features:
- Betweenness → Influence of a given employee
- Closeness → How close an employee is to all others in the graph
Clustering features:
- Non-negative matrix factorization (NMF) → Reduce information to n components
- PCA on adjacency matrix → Reduce information to n components
- Community → Find communities inside the graph*
- Filtering: MST, MPFG → Keep most important nodes*
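A minimal sketch of two of the features above, degree and closeness centrality, on a made-up undirected interaction graph. The names and edges are hypothetical, and betweenness is left out for brevity. The slides mention networkx, where `nx.degree_centrality`, `nx.closeness_centrality` and `nx.betweenness_centrality` cover these; the pure-Python version below only shows what the numbers mean.

```python
from collections import deque

# Hypothetical undirected graph: employee -> set of employees they interacted with.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}


def degree(g, node):
    """Number of distinct colleagues the employee interacted with."""
    return len(g[node])


def closeness(g, source):
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


for name in sorted(graph):
    print(name, degree(graph, name), round(closeness(graph, name), 3))
```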
