1. Tim Bock, Q, Australia
Festival of NewMR 2012 – Training Day – Session 1
A Presentation from The Festival of NewMR – Training Day, 3 December 2012
All copyright owned by The Future Place and the presenters of the material
For more information about NewMR events visit NewMR.org
Sponsored by:
See the eXhibition for booths from media partners & supporters
An Introduction to Latent Class Analysis for Marketing Segmentation
Tim Bock, Q
An Introduction to Latent Class Analysis for Marketing Segmentation
Tim Bock, Q
www.q-researchsoftware.com
tim.bock@q-researchsoftware.com
+61 425 241 989
Overview
• Latent class analysis versus cluster analysis
  – Theoretical difference: probabilities
  – Practical differences:
      • Non-numeric data (e.g., categorical data)
      • Missing values
• Application: what do research buyers want?
  – Missing values
  – Response bias
Latent class analysis turns data into segments
• Worriers: concerned with decay prevention
• Sociables: concerned with tooth colour
• Sensory: concerned with flavour
• Independent: concerned with price
Adapted from: Haley, R. I. (1968). "Benefit Segmentation: A Decision-Oriented Research Tool." Journal of Marketing 32(July): 30-35.
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Cluster Analysis versus Latent Class Analysis for segmentation
• Latent class analysis is theoretically superior
  – Clearly stated assumptions
  – Cluster analysis is inconsistent with elementary laws of probability (in particular, Bayes’ Theorem)
• Latent class analysis software is superior
  – Any type of data (via distributional assumptions): Categorical, Conjoint, Choice, MaxDiff, Rankings, etc.
  – “Mixed” data (e.g., categorical and numeric)
  – Missing values
  – Response biases
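The probability rule behind the "theoretically superior" claim is Bayes' theorem applied to class membership. In notation assumed here rather than taken from the slides, π_c is the size of class c (one of k classes) and P(x_ij | c) is the probability that a member of class c gives respondent i's answer to variable j. Treating answers as independent within a class, the probability of respondent i being in class c is:

$$
P(c \mid \mathbf{x}_i) = \frac{\pi_c \prod_{j} P(x_{ij} \mid c)}{\sum_{d=1}^{k} \pi_d \prod_{j} P(x_{ij} \mid d)}
$$

k-means, by contrast, allocates each respondent entirely to the single most similar cluster.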
k-Means Cluster Analysis
1. Specify number of clusters (k)
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
2. Randomly allocate respondents to clusters
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
3. Compute cluster means
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
4. Allocate respondents to most similar clusters
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
5. Repeat steps 3 and 4 until changes in cluster means are small or non-existent
[Figure: scatterplot of respondents, axes scaled 0 to 35 and 0 to 25]
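The k-means steps above can be sketched in plain Python for two-dimensional points like those in the scatterplots. This is a minimal batch k-means under assumptions the slides do not spell out: "most similar" means smallest squared Euclidean distance, the loop stops when no respondent changes cluster (so the cluster means no longer change), and an empty cluster is re-seeded with a randomly chosen point. The function names `kmeans` and `sq_dist` are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Batch k-means: returns each point's cluster and the cluster means."""
    rng = random.Random(seed)
    # Step 2: randomly allocate respondents to clusters
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 3: compute cluster means
        means = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # empty cluster: re-seed with a random point (assumption)
                members = [rng.choice(points)]
            means.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
        # Step 4: allocate each respondent to the most similar (closest) cluster mean
        new_labels = [min(range(k), key=lambda c: sq_dist(p, means[c])) for p in points]
        # Step 5: stop once nobody moves, because the means can no longer change
        if new_labels == labels:
            break
        labels = new_labels
    return labels, means

points = [(1, 1), (1.5, 2), (2, 1), (30, 20), (31, 21), (32, 19)]
labels, means = kmeans(points, k=2)
print(labels, means)
```

Note that every respondent ends up wholly in one cluster; nothing in the loop expresses uncertainty about membership.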
k-Means Cluster Analysis
1. Specify number of clusters (k)
2. Randomly allocate respondents to clusters
3. Compute cluster means
4. Allocate respondents to most similar clusters
5. Repeat steps 3 and 4 until changes in cluster means are small or non-existent
Latent Class Analysis
1. Specify number of classes (k)
2. Randomly allocate respondents to classes
3. Compute class parameters*
4. Compute probability of each respondent being in each class
5. Repeat steps 3 and 4 until changes in class parameters are small or non-existent
6. Allocate respondents to classes with highest probabilities
This is a comparison of batch k-means and latent class analysis with an EM algorithm. See Celeux and Govaert (1991), “Clustering criteria for discrete data and latent class models”, Journal of Classification, 8(2), for a more mathematical comparison.
* The class parameters are computed as weighted averages of the segmentation variables, where the weights are the probabilities of each respondent being in each segment.
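The latent class analysis steps above can be sketched for Agree/Disagree data coded 1/0. This is a minimal EM sketch assuming answers are independent within a class; the function name `latent_class_binary`, the optional `init` argument (to fix the starting allocation so a run can be reproduced), and the clipping of probabilities away from exactly 0 and 1 are choices made for this illustration, not features of any particular package.

```python
import math
import random

def latent_class_binary(data, k, init=None, max_iter=500, tol=1e-8, seed=0):
    """EM latent class analysis for 0/1 data (list of lists, one per respondent)."""
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    # Step 2: randomly allocate respondents to classes (or use a supplied allocation)
    start = init if init is not None else [rng.randrange(k) for _ in range(n)]
    post = [[1.0 if c == s else 0.0 for c in range(k)] for s in start]
    prev = None
    for _ in range(max_iter):
        # Step 3: class parameters as probability-weighted averages of the data
        weights = [sum(p[c] for p in post) for c in range(k)]
        sizes = [w / n for w in weights]
        probs = [[sum(p[c] * x[j] for p, x in zip(post, data)) / max(weights[c], 1e-12)
                  for j in range(m)] for c in range(k)]
        # Keep probabilities off 0 and 1 so no class is ruled out completely
        probs = [[min(max(q, 1e-6), 1 - 1e-6) for q in row] for row in probs]
        # Step 4: probability of each respondent being in each class (Bayes' theorem)
        post = []
        for x in data:
            joint = [sizes[c] * math.prod(q if v else 1 - q for q, v in zip(probs[c], x))
                     for c in range(k)]
            total = sum(joint)
            post.append([value / total for value in joint])
        # Step 5: stop when the class parameters barely change
        params = sizes + [q for row in probs for q in row]
        if prev is not None and max(abs(a - b) for a, b in zip(params, prev)) < tol:
            break
        prev = params
    # Step 6: allocate respondents to the class with the highest probability
    labels = [max(range(k), key=lambda c: p[c]) for p in post]
    return sizes, probs, post, labels

data = [[1, 1, 1, 1, 0]] * 6 + [[0, 0, 0, 0, 1]] * 4
sizes, probs, post, labels = latent_class_binary(data, 2, init=[0, 0, 0, 0, 0, 1, 1, 1, 1, 0])
print(sizes, labels)
```

Unlike the k-means sketch, each respondent keeps a probability for every class until the final allocation step.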
[Figure: two scatterplots, axes scaled 0 to 35 and 0 to 25, titled "Cluster Analysis" and "Latent Class Analysis"]
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Missing values
How many clusters (or classes) can you see in this data?
Missing values and latent class analysis
Class means
           A  B  C  D
Cluster 1  1  2  3  4
Cluster 2  4  3  2  1
Cluster 3  1  2  2  1
Missing values and cluster analysis
Cluster means
           A        B        C        D
Cluster 1  1        2        3        3
Cluster 2  MISSING  MISSING  MISSING  MISSING
Cluster 3  3        3        2        1
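One reason latent class analysis copes with missing values is that, when computing the probability of a respondent being in each class, a missing answer can simply be left out of the product of answer probabilities. The sketch below shows this for Agree/Disagree data, assuming answers are missing at random and that `None` marks a missing answer; the function name `posterior_with_missing` is hypothetical, and the parameters are borrowed from the two-class example later in this presentation (the Disagree probability is taken as 1 minus the Agree probability).

```python
import math

def posterior_with_missing(answers, sizes, agree_probs):
    """Posterior class probabilities for one respondent.

    answers: "A" (agree), "D" (disagree) or None (missing) for each variable.
    sizes: class sizes, used as prior probabilities.
    agree_probs: for each class, the probability of "A" on each variable.
    """
    joint = []
    for size, probs in zip(sizes, agree_probs):
        # Product over observed answers only; a missing answer drops out
        density = math.prod(p if a == "A" else 1 - p
                            for a, p in zip(answers, probs) if a is not None)
        joint.append(size * density)
    total = sum(joint)
    return [value / total for value in joint]

sizes = [0.67, 0.33]
agree = [[0.40, 0.40, 0.48, 0.16, 0.53],   # Class 1
         [0.65, 0.90, 0.88, 1.00, 0.26]]   # Class 2
# Respondent 6 (D, A, A, A, D) with the Interest answer treated as missing
print(posterior_with_missing(["D", "A", "A", None, "D"], sizes, agree))
```

With Interest missing, respondent 6 comes out at roughly 35% Class 1 and 65% Class 2: still assigned, but with visibly less certainty. If every answer is missing, the posterior falls back to the class sizes.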
Distributional assumptions
Distributional assumptions
• Basic idea: instruct a latent class model about how to interpret the data
• Categorical assumption: look only at matches
  – Example: respondent 1 is most similar to 2 and 3 (i.e., they match on two variables)
• Numeric assumption: assign values and compute differences (e.g., Agree = 3, Neither = 2, Disagree = 1)
  – Example: respondent 1 is most similar to respondent 3
• Ranking assumption: look at relative order
  – Example: respondent 1 is identical to respondent 4
      Variable
ID    A        B         C
1     Agree    Agree     Neither
2     Agree    Disagree  Neither
3     Agree    Neither   Neither
4     Neither  Neither   Disagree
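To make the three assumptions concrete, the sketch below compares respondent 1 with the other three respondents in the table above: the number of matching answers (categorical), the sum of absolute differences after coding Agree = 3, Neither = 2, Disagree = 1 (numeric), and whether the relative order of A, B and C is the same (ranking). These similarity measures are illustrative choices, not the likelihoods a latent class model actually uses, and all function names are hypothetical.

```python
from itertools import combinations

SCORES = {"Agree": 3, "Neither": 2, "Disagree": 1}

# Answers to variables A, B and C, from the table above
data = {
    1: ["Agree", "Agree", "Neither"],
    2: ["Agree", "Disagree", "Neither"],
    3: ["Agree", "Neither", "Neither"],
    4: ["Neither", "Neither", "Disagree"],
}

def categorical_similarity(x, y):
    """Categorical: count the variables on which two respondents give the same answer."""
    return sum(a == b for a, b in zip(x, y))

def numeric_distance(x, y):
    """Numeric: sum of absolute differences between the coded answers."""
    return sum(abs(SCORES[a] - SCORES[b]) for a, b in zip(x, y))

def ranking_pattern(x):
    """Ranking: for each pair of variables, +1, 0 or -1 for which one is rated higher."""
    s = [SCORES[a] for a in x]
    return [(s[i] > s[j]) - (s[i] < s[j]) for i, j in combinations(range(len(s)), 2)]

for other in (2, 3, 4):
    print(other,
          categorical_similarity(data[1], data[other]),
          numeric_distance(data[1], data[other]),
          ranking_pattern(data[1]) == ranking_pattern(data[other]))
```

Respondents 2 and 3 each match respondent 1 on two variables, respondent 3 has the smallest numeric difference, and only respondent 4 has the same relative order, which agrees with the three examples in the bullets.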
Example: Categorical data
Data
Shop: Agree (A) or disagree (D) that “It is important to shop around”
Understand: Agree (A) or disagree (D) that “I understand my company's communication needs”
Key: Agree (A) or disagree (D) that “Communications technology is key to our business”
Interest: Agree (A) or disagree (D) that “I am interested in communications technology”
Value: Agree (A) or disagree (D) that “Value for money is more important to us than getting the best technology”
ID Shop Understand Key Interest Value
1 A A A A D
2 A A A D A
3 A A A A D
4 A A D A A
5 A D A D D
6 D A A A D
7 A D A D D
8 D D A A D
9 A A A A A
10 A A A A D
11 D A D D A
12 A A A A A
13 D D D D D
… … … … … …
Latent Class Analysis
1. Specify number of classes (k)
2. Randomly allocate respondents to classes
3. Compute class parameters
4. Compute probability of each respondent being in each class
5. Repeat steps 3 and 4 until changes in class parameters are small or non-existent
6. Allocate respondents to classes with highest probabilities
Data
ID Shop Understand Key Interest Value
… … … … … …
6 D A A A D
… … … … … …
Parameters
                   Size  Shop  Understand  Key  Interest  Value
Class 1  Agree      67%   40%         40%  48%       16%    53%
         Disagree         60%         60%  52%       84%    47%
Class 2  Agree      33%   65%         90%  88%      100%    26%
         Disagree         35%         10%  12%        0%    73%
Looking at the parameters, which class do you think respondent 6 belongs to?
Computing the probability of each respondent being in each class
Prior: the initial best guess of the probabilities is given by the class sizes:
Class 1: 67%
Class 2: 33%
Class conditional densities: the probability that somebody in each class would give respondent 6's answers (D, A, A, A, D):
Class 1: 60% × 40% × 48% × 16% × 47% ≈ 1%
Class 2: 35% × 90% × 88% × 100% × 73% ≈ 20%
Posterior probability (probability of being in a class):
$$\text{Class 1: } \frac{67\% \times 1\%}{67\% \times 1\% + 33\% \times 20\%} \approx 9\%$$
$$\text{Class 2: } \frac{33\% \times 20\%}{67\% \times 1\% + 33\% \times 20\%} \approx 91\%$$
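The same calculation can be written out in a few lines, using the parameter table and respondent 6's answers from the slides. One detail the code makes visible: without rounding the densities first (they are about 0.87% and 20.24%), the posterior for Class 1 is about 8% and for Class 2 about 92%; the 9% and 91% on the slide come from rounding the densities to 1% and 20% before dividing. The variable names are illustrative.

```python
import math

# Class sizes (priors) and answer probabilities, copied from the parameter table
sizes = {"Class 1": 0.67, "Class 2": 0.33}
params = {
    "Class 1": {"A": [0.40, 0.40, 0.48, 0.16, 0.53], "D": [0.60, 0.60, 0.52, 0.84, 0.47]},
    "Class 2": {"A": [0.65, 0.90, 0.88, 1.00, 0.26], "D": [0.35, 0.10, 0.12, 0.00, 0.73]},
}
# Respondent 6's answers to Shop, Understand, Key, Interest, Value
respondent_6 = ["D", "A", "A", "A", "D"]

# Class conditional density: product of the probabilities of the answers given
density = {c: math.prod(params[c][a][i] for i, a in enumerate(respondent_6))
           for c in sizes}
# Bayes' theorem: prior x density, divided by the total over both classes
joint = {c: sizes[c] * density[c] for c in sizes}
total = sum(joint.values())
posterior = {c: joint[c] / total for c in sizes}

for c in sizes:
    print(f"{c}: density {density[c]:.2%}, posterior {posterior[c]:.0%}")
```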
Application
n = 1,145 market researchers (GRIT2012/2013)
“How important do you think each of the following attributes is to clients when they select a research provider?”
5-POINT SCALE; RANDOMLY SHOW 15 OF 25 ATTRIBUTES TO EACH RESPONDENT
[Figure: Cluster Analysis]
Numeric Assumption
Numeric, 3 class
Importance to clients (research providers' viewpoint): Top 2 boxes (out of 5), reordered
Top 2 Box (%): Segment 1 (45%), Segment 2 (11%), Segment 3 (45%)
Seg 1  Seg 2  Seg 3  Attribute
   50     68     55  Lowest price
   88     73     87  Previous experience with client/supplier
   95     65     89  Rapid response to requests
   98     67     97  Listens well and understands client needs
   83     47     71  Flexibility on changing project parameters
   99     50     95  Familiarity with client needs
   97     67     91  Completes research in an agreed-upon time
   95     75     94  Good relationship with client/supplier
   90     31     81  Breadth of experience in the target segment
   93     40     82  Good reputation in the industry
   92     33     85  Familiarity with the industry or category
   86     16     51  Length of experience/time in business
   36     33      2  Has an access panel
   71     13     36  Company is financially stable
   97     58     96  Has knowledgeable staff
   96     30     91  High quality analysis
   86     28     59  Provides data analysis services
   84     27     45  Understands new consumer communication channels & technologies
   66     16     22  Also does qualitative research
   96     18     71  Consultation on best practices and methodology effectiveness
   75     27     37  Uses sophisticated research technology/strategies
   96     15     69  Provides highest data quality
   57     10      7  Uses the latest statistical/analytical packages
   75     31     39  Offers unique methodology or approach
   79     17     14  Uses the latest data collection technology
Percentages are Top 2 Box Scores. Where values are significantly higher than average, the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
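Top 2 Box Scores are the percentage of respondents choosing one of the two highest points on the 5-point scale. Because each respondent was shown only 15 of the 25 attributes, the sketch below assumes the base for each percentage is the respondents in the segment who were actually shown that attribute. That choice of base, the 1 to 5 coding with 5 as most important, and the function name `top2box` are assumptions for illustration.

```python
def top2box(ratings, segments, n_segments):
    """Top 2 Box (%) for each segment and attribute.

    ratings: one list per respondent of 1-5 ratings (5 = most important),
             with None where the attribute was not shown to that respondent.
    segments: each respondent's segment, numbered from 0.
    Returns one row per segment of whole-number percentages (None if no base).
    """
    n_attributes = len(ratings[0])
    table = []
    for s in range(n_segments):
        row = []
        for j in range(n_attributes):
            # Base: respondents in this segment who were shown attribute j
            shown = [r[j] for r, seg in zip(ratings, segments)
                     if seg == s and r[j] is not None]
            top2 = sum(1 for v in shown if v >= 4)
            row.append(round(100 * top2 / len(shown)) if shown else None)
        table.append(row)
    return table

ratings = [[5, 1, None], [4, 2, 3], [2, None, 5], [1, 5, 4]]
print(top2box(ratings, segments=[0, 0, 1, 1], n_segments=2))
```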
Categorical Assumption
All categories
Importance to clients (research providers' viewpoint): Top 2 boxes (out of 5), reordered
Top 2 Box (%): Segment 1 (50%), Segment 2 (50%)
Seg 1  Seg 2  Attribute
   41     66  Lowest price
   90     81  Previous experience with client/supplier
   96     83  Rapid response to requests
   98     90  Listens well and understands client needs
   87     61  Flexibility on changing project parameters
  100     86  Familiarity with client needs
   96     87  Completes research in an agreed-upon time
   98     86  Good relationship with client/supplier
   89     70  Breadth of experience in the target segment
   95     71  Good reputation in the industry
   92     73  Familiarity with the industry or category
   81     43  Length of experience/time in business
   24     18  Has an access panel
   67     32  Company is financially stable
   98     85  Has knowledgeable staff
   97     74  High quality analysis
   82     52  Provides data analysis services
   77     43  Understands new consumer communication channels & technologies
   55     27  Also does qualitative research
   93     58  Consultation on best practices and methodology effectiveness
   69     36  Uses sophisticated research technology/strategies
   89     60  Provides highest data quality
   48     13  Uses the latest statistical/analytical packages
   65     43  Offers unique methodology or approach
   60     27  Uses the latest data collection technology
Percentages are Top 2 Box Scores. Where values are significantly higher than average, the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
Ranking Assumption
Ranking
Importance to clients (research providers' viewpoint): Top 2 boxes (out of 5), reordered
Top 2 Box (%): Segment 1 (54%), Segment 2 (46%)
Seg 1  Seg 2  Attribute
   86     23  Lowest price
   97     74  Previous experience with client/supplier
   98     81  Rapid response to requests
   98     91  Listens well and understands client needs
   80     68  Flexibility on changing project parameters
   95     89  Familiarity with client needs
   94     88  Completes research in an agreed-upon time
   95     90  Good relationship with client/supplier
   80     78  Breadth of experience in the target segment
   83     82  Good reputation in the industry
   82     83  Familiarity with the industry or category
   61     63  Length of experience/time in business
   19     22  Has an access panel
   47     53  Company is financially stable
   90     95  Has knowledgeable staff
   81     90  High quality analysis
   62     75  Provides data analysis services
   54     71  Understands new consumer communication channels & technologies
   31     51  Also does qualitative research
   67     86  Consultation on best practices and methodology effectiveness
   40     69  Uses sophisticated research technology/strategies
   63     88  Provides highest data quality
   17     49  Uses the latest statistical/analytical packages
   35     75  Offers unique methodology or approach
   26     65  Uses the latest data collection technology
Percentages are Top 2 Box Scores. Where values are significantly higher than average, the bars are shaded orange. Darker shades of orange correspond to smaller p-values.
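Under the ranking assumption only the relative order of a respondent's ratings matters, so a respondent who rates everything high and one who rates everything low, but in the same order, look identical (as with respondents 1 and 4 in the distributional assumptions example). One simple way to see this is to convert each respondent's ratings into within-respondent ranks, averaging ties and skipping attributes that were not shown. This is a preprocessing sketch for illustration only, not a description of how Q or any other package fits ranking-based latent class models; the function name is hypothetical.

```python
def within_respondent_ranks(ratings):
    """Convert one respondent's ratings to ranks (1 = lowest), averaging tied ranks.

    None (attribute not shown) stays None, so partial rankings are supported.
    """
    shown = sorted((v, j) for j, v in enumerate(ratings) if v is not None)
    ranks = [None] * len(ratings)
    i = 0
    while i < len(shown):
        # Find the run of tied ratings starting at position i
        end = i
        while end + 1 < len(shown) and shown[end + 1][0] == shown[i][0]:
            end += 1
        average_rank = (i + end) / 2 + 1  # mean of ranks i+1 .. end+1
        for _, j in shown[i:end + 1]:
            ranks[j] = average_rank
        i = end + 1
    return ranks

# Agree = 3, Neither = 2, Disagree = 1: respondents 1 and 4 from the earlier table
print(within_respondent_ranks([3, 3, 2]), within_respondent_ranks([2, 2, 1]))
```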
Latent class analysis software
Product: data/distributional assumptions; covariates*; complex sampling*
• Sawtooth Software: Regression (discrete choice, ranks), Max-Diff. Covariates: No. Complex sampling: No.
• Q: Numeric, Binary, Categorical, Ranks, Partial Ranks, Ranks with Ties, Max-Diff, Regression (linear, discrete choice, ranks, partial ranks, ranks with ties, best-worst), Mixed data. Covariates: No. Complex sampling: No.
• Limdep: Regression (linear, discrete choice, censored, ranks, partial ranks, counts, survival, etc.). Covariates: Yes. Complex sampling: No.
• SAS (PROC LCA/LTA/Mixed): Numeric, Binary, Categorical, Growth, Regression (discrete choice, ranks, partial ranks). Covariates: Yes. Complex sampling: Yes.
• MPlus: Numeric, Binary, Categorical, Ordered Categorical, Counts, Mixed data. Covariates: Yes. Complex sampling: Yes.
• Latent Gold/Latent Gold Choice: Numeric, Binary, Categorical, Growth, Ranks, Partial Ranks, Counts, Regression (linear, discrete choice, censored, ranks, partial ranks). Covariates: Yes. Complex sampling: Yes.
* Covariates and the ability to handle complex sampling can be relevant when applying latent class analysis to non-segmentation problems (e.g., creating predictive models).
[Figure: side-by-side panels titled "Cluster Analysis" and "Latent Class Analysis"]
Thank you
Tim Bock
Q