3. IDENTIFY AND RECOMMEND THE TOP 1000 CUSTOMERS TO TARGET
FROM DATASETS
OUTLINE THE PROBLEM
Sprocket Central Pty Ltd is a long-standing KPMG client
that specializes in high-quality bikes and accessible
cycling accessories for riders.
Their marketing team is looking to boost business by
analyzing their existing customer dataset to determine
customer trends and behavior.
Sprocket Central Pty Ltd has given us a new list of 1000
potential customers with their demographic details.
These customers have no prior transaction history with
the organization.
CONTENT OF DATA ANALYSIS
‘New’ and ‘Old’ customer age distributions
Bike related purchases over the last 3 years by gender
Job Industry distributions
Wealth Segmentation by age category
Number of cars owned and not owned by State
RFM analysis and customer distribution
4. INTRODUCTION
DATA QUALITY ASSESSMENT AND ‘CLEAN UP’
Key issues for the Data Quality Assessment
Accuracy: Correct values
Completeness: Data Fields with values
Consistency: Values free from contradiction
Currency: Values up to date
Relevancy: Data items with Value Meta-data
Validity: Data containing Allowable Values
Uniqueness: Records that are not Duplicated
Issues identified in each dataset, assessed against Accuracy, Completeness, Consistency, Currency, Relevancy and Validity:
Customer Demographic: DOB inaccurate; age missing; job title blank; customer id incomplete; gender inconsistent; deceased customers filtered out; default column deleted
Customer Address: customer id incomplete; state inconsistent
Transactions: profit missing; customer id incomplete; online order blank; brand blank; orders with cancelled status filtered out; list price format; product sold date format
An in-depth analysis has been sent via mail
6. ‘New’ and ‘Old’ Customer Age Distributions
Most customers in the ‘New’ sheet are aged 40-49. In the
‘Old’ sheet the largest group of customers is also aged
40-49.
The least populated age groups are under 20 and 80+ in
both the ‘New’ and ‘Old’ customer lists.
The ‘New’ customer list suggests that the 20-29 and
40-69 age groups are the most populated.
The ‘Old’ customer list suggests that the 20-69 age
group is the most populated.
There is a steep drop in the number of customers in the
30-39 age group in ‘New’.
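As a minimal sketch of how the age groups on the chart axes could be derived, the snippet below maps an age to the axis label used on the charts (20 = under 20, 30 = 20-29) and finds the most populated group from the counts shown on the two charts. The function name `age_bucket` and the treatment of the label 90 as "80 and over" are assumptions made for illustration; they are not taken from the original workbook.

```python
def age_bucket(age: int) -> int:
    """Map an age to the axis label used on the charts (20 = under 20, 30 = 20-29).

    Assumption: the last label, 90, collects everyone aged 80 and over.
    """
    label = (age // 10 + 1) * 10   # e.g. 45 -> 50, meaning the 40-49 group
    return max(20, min(label, 90))


# Counts per axis label as shown on the two age distribution charts.
LABELS = [20, 30, 40, 50, 60, 70, 80, 90]
new_counts = dict(zip(LABELS, [12, 141, 81, 177, 143, 141, 66, 23]))
old_counts = dict(zip(LABELS, [49, 583, 641, 1147, 597, 394, 2, 2]))

# Most populated label in each list (50 stands for the 40-49 group).
new_peak = max(new_counts, key=new_counts.get)
old_peak = max(old_counts, key=old_counts.get)
print(new_peak, old_peak)
```

Under these assumptions both lists peak at label 50, i.e. the 40-49 age group.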
[Figure: Age Distribution, New Customers. X axis: age distribution (20 = under 20, 30 = 20-29); Y axis: number of people.]
Age group: 20 30 40 50 60 70 80 90
Number of people: 12 141 81 177 143 141 66 23
[Figure: Old Customer Age Distribution. X axis: age distribution (20 = under 20, 30 = 20-29); Y axis: number of people.]
Age group: 20 30 40 50 60 70 80 90
Number of people: 49 583 641 1147 597 394 2 2
DATA EXPLORATION
7. DATA EXPLORATION
Bike related purchases over last 3 years by gender
Over the last 3 years, about 50% of bike-related
purchases were made by females, compared with 48% made
by males. Approximately 2% were made by customers of
unknown gender.
Numerically, females made almost 5,000 more purchases
than males (98,359 against 93,483).
Females make up the majority of bike-related sales.
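The shares and the gap quoted above can be checked with a few lines of arithmetic. The sketch below uses the purchase counts shown on the chart of bike related purchases (98,359 for females, 93,483 for males, 3,718 for unknown gender); the variable names are illustrative only.

```python
# Purchase counts by gender as shown on the chart of bike related purchases.
purchases = {"Female": 98359, "Male": 93483, "U": 3718}

total = sum(purchases.values())
shares = {gender: 100 * count / total for gender, count in purchases.items()}
gap = purchases["Female"] - purchases["Male"]

for gender, share in shares.items():
    print(f"{gender}: {share:.1f}%")
print(f"Female minus male purchases: {gap}")
```

The gap works out to 4,876 purchases, which is where the "almost 5,000" figure comes from.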
[Figure: Bike Purchases for the Past 3 Years by Gender. X axis: gender category; Y axis: percentage of bike purchases.]
Female: 50.98%; Male: 46.83%; U: 2.20%
[Figure: Bike Related Purchases over Past 3 Years. X axis: gender category; Y axis: number of purchases.]
Female: 98359; Male: 93483; U: 3718
8. DATA EXPLORATION
Job Industry Distribution
Manufacturing and Financial Services each account for
20% of ‘New’ customers.
The smallest numbers of customers are in Agriculture
and Telecommunications, at 3% each.
The ‘Old’ customer list shows a similar pattern, at 20%
and 19% in Manufacturing and Financial Services
respectively.
[Figure: Old Job Industry. Pie chart of customer share by job industry.]
Agriculture 3%; Entertainment 3%; Financial Services 19%; Health 15%; IT 6%; Manufacturing 20%; n/a 16%; Property 7%; Retail 9%; Telecommunications 2%
[Figure: New Customer Job Industry Distribution. Pie chart of customer share by job industry.]
Agriculture 3%; Entertainment 4%; Financial Services 20%; Health 15%; IT 5%; Manufacturing 20%; n/a 16%; Property 6%; Retail 8%; Telecommunications 3%
9. Wealth Segmentation by age category
In all age categories the largest number of customers
are classified as ‘Mass Customer’.
The next largest category is ‘High Net Worth’ customers.
The ‘Affluent Customer’ segment can outperform the
‘High Net Worth’ segment in the 40-49 age group.
[Figure: Old Customer Wealth Segment by Age. Y axis: total number of people per age category.]
Age category 20 30 40 50 60 70 80 90
Mass Customer 20 290 322 570 295 197 1
High Net Worth 16 137 163 299 152 103 1
Affluent Customer 13 156 156 278 150 94 1 1
[Figure: New Customer Wealth Segment by Age. Y axis: number of people in each age category.]
Age category 10 20 30 40 50 60 70 80 90
Mass Customer 57 8 68 45 90 60 69 39 9
High Net Worth 20 2 39 17 53 37 35 12 4
Affluent Customer 18 2 34 19 34 46 37 15 10
10. Number of Cars Owned and Not Owned by State
NSW has the largest number of people who do not own a
car. NSW also appears to have the largest number of
people in the data collected.
Victoria is split quite evenly, but both of its numbers
are significantly lower than those of NSW.
QLD has a relatively higher number of customers who own
a car.
[Figure: Number of Cars Owned as per State. X axis: state names; Y axis: number of cars owned/not owned.]
State NSW QLD VIC
No 272 103 132
Yes 234 125 134
12. RFM Analysis and Customer Classification.
RFM analysis is used to determine which customers a
business should target to increase its revenue and
value.
The RFM (Recency, Frequency and Monetary) model shows
customers that have displayed high levels of engagement
with the business in the three categories mentioned.
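To make the scoring idea concrete, here is a minimal sketch of one common way to turn recency, frequency and monetary figures into a three-digit RFM value. It assumes quartile-based scores from 1 to 4 for each dimension, with fewer days since the last purchase scoring higher; the slides do not state how the scores were actually computed, and the function names and sample customers below are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_score(value, cuts, higher_is_better=True):
    """Score a value from 1 to 4 according to the quartile it falls in."""
    rank = bisect_right(cuts, value)  # number of cut points at or below value (0..3)
    return rank + 1 if higher_is_better else 4 - rank


def rfm_values(customers):
    """Map {customer_id: (recency_days, frequency, monetary)} to {customer_id: 'RFM'}."""
    columns = list(zip(*customers.values()))
    r_cuts, f_cuts, m_cuts = (quantiles(col, n=4) for col in columns)
    result = {}
    for cid, (recency, frequency, monetary) in customers.items():
        r = quartile_score(recency, r_cuts, higher_is_better=False)  # fewer days is better
        f = quartile_score(frequency, f_cuts)
        m = quartile_score(monetary, m_cuts)
        result[cid] = f"{r}{f}{m}"
    return result


# Hypothetical customers, invented for illustration only.
sample = {
    "A": (5, 12, 9000.0),
    "B": (20, 10, 7000.0),
    "C": (45, 8, 5000.0),
    "D": (80, 6, 4000.0),
    "E": (120, 5, 3000.0),
    "F": (180, 3, 2000.0),
    "G": (250, 2, 1000.0),
    "H": (330, 1, 500.0),
}
scores = rfm_values(sample)
print(scores["A"], scores["H"])
```

In this sketch the most recent, most frequent and highest-spending sample customer receives 444 and the least engaged one receives 111.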
[Figure: Customer Title and Score. X axis: RFM value assigned (0 to 4); Y axis: customer title (Almost Lost Customer, Becoming Loyal, Evasive Customer, High Risk Customer, Late Bloomer, Losing Customer, Lost Customer, Platinum, Potential Customer, Recent Customer, Very Loyal); series: Min of M_score, Min of R_score, Min of F_score.]
MODEL DEVELOPMENT
13. Scatter-Plot based off RFM Analysis
The chart shows that customers who purchased more
recently have generated more revenue than customers who
visited a while ago.
Customers from the recent past (50-100 days) generate a
moderate amount of revenue.
Those who visited more than 200 days ago generated low
revenue.
[Figure: Recency against Monetary. X axis: recency value (0 to 350); Y axis: monetary value ($, 0 to 12,000).]
14. Scatter-Plot based off RFM Analysis
Customers classified as “Platinum Customer”, “Very
Loyal”, and “Becoming Loyal” visit frequently, which
correlates with increased revenue for the business.
Naturally, there is a positive relationship between
frequency and monetary gain for the business.
[Figure: Frequency against Monetary. X axis: frequency of purchases (0 to 14); Y axis: monetary value ($, 0 to 12,000).]
15. Scatter-Plot based off RFM Analysis
A very low frequency of 0-2 purchases correlates with
high recency values, i.e. a last purchase more than 250
days ago.
Customers who have visited more recently (0-50 days)
have a higher chance of visiting more frequently (6+
times).
Frequency has a negative relationship with recency
value, so that very recent customers are also frequent
customers.
[Figure: Recency against Frequency. X axis: recency (days, 0 to 400); Y axis: frequency of purchases (0 to 14).]
16. Customer Title Definition List with RFM values Assigned
Rank | Customer Title | Description | RFM Value
1 | Platinum Customer | Most recent buy, buys often, most spent | 444
2 | Very Loyal | Most recent buy, buys often, spent large amount of money | 433
3 | Becoming Loyal | Relatively recent buy, bought more than once, spent large amount of money | 432
4 | Recent Customer | Bought recently, not very often, average money spent | 414
5 | Potential Customer | Bought recently, never bought before, spent small amount | 343
6 | Late Bloomer | No purchases recently, but RFM value is larger than average | 322
7 | Losing Customer | Purchase was a while ago, below average RFM value | 244
8 | High Risk Customer | Purchase was a long time ago, frequency is quite high, amount spent is high | 223
9 | Almost Lost Customer | Very low recency, low frequency, but high amount spent | 211
10 | Evasive Customer | Very low recency, very low frequency, but small amount spent | 123
11 | Lost Customer | Very low RFM | 111
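The definition list can be read as a lookup from RFM value to customer title. The sketch below encodes the eleven values exactly as listed and returns the rank and title for a given value. How RFM values that fall between the listed ones are classified is not stated in the slides, so this illustration returns `None` for them; the names `TITLES` and `title_for` are hypothetical.

```python
# (RFM value, customer title) in rank order, copied from the definition list.
TITLES = [
    ("444", "Platinum Customer"),
    ("433", "Very Loyal"),
    ("432", "Becoming Loyal"),
    ("414", "Recent Customer"),
    ("343", "Potential Customer"),
    ("322", "Late Bloomer"),
    ("244", "Losing Customer"),
    ("223", "High Risk Customer"),
    ("211", "Almost Lost Customer"),
    ("123", "Evasive Customer"),
    ("111", "Lost Customer"),
]

# RFM value -> (rank, title); rank 1 is the most valuable segment.
BY_VALUE = {value: (rank, title) for rank, (value, title) in enumerate(TITLES, start=1)}


def title_for(rfm_value: str):
    """Return (rank, title) for a listed RFM value, or None if it is not listed."""
    return BY_VALUE.get(rfm_value)


print(title_for("444"))
print(title_for("111"))
```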
17. Customer Title Definition List with RFM values Assigned
[Figure: RFM value assigned to each customer title. Value axis: 0 to 450. Bar labels: 211, 432, 123, 223, 322, 244, 111, 444, 343, 414, 443.]
18. Customer Distributions in Dataset
[Figure: Distribution of customers by customer title, shown as a bar chart (X axis: number of customers; Y axis: customer title) and as a pie chart of shares.]
Customer Title | Number of customers | Share
Almost Lost Customer | 326 | 9%
Becoming Loyal | 344 | 10%
Evasive Customer | 400 | 12%
High Risk Customer | 360 | 10%
Late Bloomer | 337 | 10%
Losing Customer | 355 | 10%
Lost Customer | 292 | 8%
Platinum | 174 | 5%
Potential Customer | 352 | 10%
Recent Customer | 367 | 11%
Very Loyal | 187 | 5%
19. SUMMARY TABLE OF THE TOP 1000 CUSTOMERS TO TARGET
Rank | Customer Title | Description | Number of customers | Cumulative | Customer Selection
1 | Platinum Customer | Most recent buy, buys often, most spent | 174 | 174 | 174
2 | Very Loyal | Most recent buy, buys often, spent large amount of money | 187 | 361 | 187
3 | Becoming Loyal | Relatively recent buy, bought more than once, spent large amount of money | 344 | 705 | 344
4 | Recent Customer | Bought recently, not very often, average money spent | 367 | 1072 | 295
5 | Potential Customer | Bought recently, never bought before, spent small amount | 352 | 1424 | 0
6 | Late Bloomer | No purchases recently, but RFM value is larger than average | 337 | 1761 | 0
7 | Losing Customer | Purchase was a while ago, below average RFM value | 355 | 2116 | 0
8 | High Risk Customer | Purchase was a long time ago, frequency is quite high, amount spent is high | 360 | 2476 | 0
9 | Almost Lost Customer | Very low recency, low frequency, but high amount spent | 326 | 2802 | 0
10 | Evasive Customer | Very low recency, very low frequency, but small amount spent | 400 | 3202 | 0
11 | Lost Customer | Very low RFM | 292 | 3494 | 0
21. CUSTOMERS TO TARGET AND METHODOLOGY
Rank | Customer Title | Description | Number of customers | Cumulative | Customer Selection
1 | Platinum Customer | Most recent buy, buys often, most spent | 174 | 174 | 174
2 | Very Loyal | Most recent buy, buys often, spent large amount of money | 187 | 361 | 187
3 | Becoming Loyal | Relatively recent buy, bought more than once, spent large amount of money | 344 | 705 | 344
4 | Recent Customer | Bought recently, not very often, average money spent | 367 | 1072 | 295
Total Customers | 1000
Filter the top 1000 customers by applying the conditions in the table above. A company
cannot ignore its ‘Loyal’ (‘Very Loyal’ and ‘Becoming Loyal’) and ‘Platinum’ customers,
so all of them should be selected. The remaining 295 customers must be selected from
the ‘Recent Customer’ group to reach a total of 1000 customers:
1000 - (174 + 187 + 344) = 295
The 1000 customers selected have bought recently, have bought very frequently in the
past, and tend to spend more than other customers.
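The selection rule described above amounts to walking down the ranked segments and taking customers until the target of 1000 is reached. The sketch below reproduces the "Cumulative" and "Customer Selection" columns of the summary table from the segment sizes; it is an illustration of the arithmetic, not the original spreadsheet logic, and the variable names are assumptions.

```python
TARGET = 1000

# (customer title, number of customers) in rank order, from the summary table.
segments = [
    ("Platinum Customer", 174),
    ("Very Loyal", 187),
    ("Becoming Loyal", 344),
    ("Recent Customer", 367),
    ("Potential Customer", 352),
    ("Late Bloomer", 337),
    ("Losing Customer", 355),
    ("High Risk Customer", 360),
    ("Almost Lost Customer", 326),
    ("Evasive Customer", 400),
    ("Lost Customer", 292),
]

rows = []          # (title, count, cumulative, selected)
remaining = TARGET
cumulative = 0
for title, count in segments:
    selected = min(count, remaining)   # take the whole segment while room remains
    remaining -= selected
    cumulative += count
    rows.append((title, count, cumulative, selected))

for row in rows:
    print(*row, sep=" | ")
```

Only the fourth segment is taken partially: 1000 - 705 = 295 of the 367 ‘Recent Customer’ records.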
INTERPRETATIONS