Embedded Automatic Model Training And Forc In An Enterprise Sw Applic

Embedded Automatic Model
Training and Forecasting
in an Enterprise Software Application
(… or how to embed a data mining consultant in a box)

Presented to the SF Bay ACM Data Mining SIG
March 11, 2009 by Greg Makowski
Principal Consultant, Golden Data Mining

Outline
Challenge:
How to automate not only forecasting, but model
training?

Solution:
Focus on a vertical market application
Deeply investigate the business & technical issues

Result:
An enterprise application
Up to a 30% reduction in $ lost to over and under
stock

Challenge: Business Pain Point
JDA Software (who owns the IP) has dozens of
enterprise retail supply chain applications

The Replenishment software does a very good
job keeping store shelves stocked at the right
level when sales are steady
Moves product from warehouse to DC to store
Sales are NOT STEADY during sales events!

PAIN POINT: The event planner has to estimate
the lift in sales for every store-item combination,
(6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates.

Retail
(context)

Challenge: 16 Page Newspaper Insert
Can vary by region or ZIP

Event Lift Forecasting (ELF)

Lift is a multiplier for the increase in sales over
normal
“Prod X in Store Y will sell 6.8 times more than normal”
Normal sales are measured around the event, for the same:
time period (e.g. Thu – Sun), a week before and after
(non-overlapping)
Store – product (SKU is a key for product)

[Figure: Event Lift]

Retail

Challenge: Appropriate for Business User
A retail event planner
Has revenue goals and a “budget” of discount $
Has to get through a lot of detail quickly
Does not typically create mathematical forecasts

Uses an enterprise application to lay out the
event flyer about 3 weeks in advance
Decides for the event:
departments / items / pricing / photos / language
Uses the software to specify SKUs, images and
lay out the flyer

Product Mgmt
Software Arch

Challenge: How to Productize (Agile)?
This is not a one-off consulting project, but software
Software engineering needs (get in the ballpark)
right starting position, metrics, use cases, data flow
Support good Agile development process
Goals
At least 90% software and 10% configuration,
not repeated consulting projects
Control the Total Cost of Ownership for the product
RELIABLE when used by the business user,
working at the level of detail that the user cares about

Product Mgmt

Challenge: Details we Have vs. Need to Start

Outline
Challenge: How to automate not only
forecasting, but model training?

Solution:
Deeply investigate the business & technical issues

Result:
Up to a 30% reduction in $ lost to over and under
stock

Product Mgmt
Data Mining

Path to Solution
Customer led, product driven – design general

Can’t data mine – without data
Start data request process with several clients
Jumpstart efforts with Monte Carlo
Combine Census fields with noise to create a target
The models and forecast matter less – the process MORE
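
One way to jumpstart the process before client data arrives is to simulate a target. The sketch below is only an illustration: the Census-style field names (`median_income`, `population_density`), their ranges and the coefficients are all assumptions, not values from the project.

```python
import random

def make_synthetic_rows(n_rows, noise_sd=0.3, seed=42):
    """Simulate store records: Census-style fields plus a noisy lift target."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Hypothetical Census-style inputs (names and ranges are assumptions).
        median_income = rng.uniform(25_000, 120_000)
        population_density = rng.uniform(50, 10_000)
        # Made-up signal: a weighted mix of the scaled fields.
        signal = (1.0
                  + 0.8 * (median_income / 120_000)
                  + 0.5 * (population_density / 10_000))
        # Add Gaussian noise, floored so the simulated lift stays positive.
        lift = max(0.1, signal + rng.gauss(0.0, noise_sd))
        rows.append({
            "median_income": median_income,
            "population_density": population_density,
            "lift": lift,
        })
    return rows

rows = make_synthetic_rows(1000)
```

The simulated models are throwaway; the point is to exercise the end-to-end process early.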

Ask for business interviews
Understand users, metrics, past challenges
What is the BATNA?
Best Alternative, To A New Alternative (system)?

Data Mining

Data Sources
Event Attributes (for planned in 3 weeks & past)
Pricing, placement (page #, on a page)
Products, departments, layout
Store features, demographics of population in
area
Past events
Flyers may have 1, 8, 12, 16, 20, 64 pages
Same week last year may have a different prod mix
Calculate Lift for all store-items for all past events
Normal sales (not during an event) near in time
Event sales; Lift = (event sales) / (non-event sales)
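
The lift calculation can be sketched as follows. Dividing by the number of days is an assumption added here so that an event window and a longer baseline window are comparable; the function and parameter names are illustrative.

```python
def event_lift(event_units, event_days, normal_units, normal_days):
    """Lift = average daily event sales / average daily non-event sales."""
    if event_days <= 0 or normal_days <= 0:
        raise ValueError("day counts must be positive")
    normal_rate = normal_units / normal_days
    if normal_rate == 0:
        return None  # lift is undefined without baseline sales
    return (event_units / event_days) / normal_rate

# A 4-day event compared with the same weekdays one week before and one
# week after (8 baseline days in total). The unit counts are made up.
lift = event_lift(event_units=272, event_days=4, normal_units=80, normal_days=8)
```

With these made-up counts the store-item sells 68 units per day during the event against 10 per day normally, a lift of 6.8.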

Data Mining

Iterative KDD Process
Knowledge Discovery in Databases (KDD)

1. Select Data for Analysis (from prior event app)
2. Exploratory Data Analysis (EDA)
3. Preprocessing (manipulating fields)
4. Model Building (Training DM algorithms)
5. Model Evaluation (apply to hold out data)
6. Post-process score to business value
7. Feed the next application (Lift / store-item)

Data Mining
Product Mgmt

Easiest to Automate From the Core

Go through full process, automating
model building / evaluation
EDA & Preprocessing
Select past marketing campaigns


Data Mining
Hypothesis to Select Past Campaigns:
1) Most Similar Past Events
Attention: your expertise will be quizzed!
Hypothesis: a close fit to the new event is better

Compare high level event attributes
Number of pages of the flyer
Discount (average, max)
“Primary” departments, sub-dep, catg, sub-category
… and so on

Use “fuzzy” Euclidean distance to match past
events to the planned event in 3 weeks
Select the 1-10 most similar events in the last year
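
A minimal sketch of the "close fit" selection, for illustration only. The slide does not define the "fuzzy" part, so plain Euclidean distance on standardized attributes is swapped in here; the field names and sample events are hypothetical.

```python
import math

def field_scales(events, fields):
    """Standard deviation per field, used to put fields on a common scale."""
    scales = {}
    for f in fields:
        vals = [e[f] for e in events]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        scales[f] = sd or 1.0  # avoid dividing by zero for constant fields
    return scales

def most_similar_events(planned, past_events, fields, k=10):
    """Return the k past events closest to the planned event."""
    scales = field_scales(past_events + [planned], fields)

    def distance(event):
        return math.sqrt(sum(((event[f] - planned[f]) / scales[f]) ** 2
                             for f in fields))

    return sorted(past_events, key=distance)[:k]

FIELDS = ["pages", "avg_discount", "max_discount"]
planned = {"name": "planned", "pages": 16, "avg_discount": 0.20, "max_discount": 0.50}
past = [
    {"name": "A", "pages": 16, "avg_discount": 0.22, "max_discount": 0.50},
    {"name": "B", "pages": 64, "avg_discount": 0.30, "max_discount": 0.70},
    {"name": "C", "pages": 1, "avg_discount": 0.10, "max_discount": 0.20},
    {"name": "D", "pages": 12, "avg_discount": 0.18, "max_discount": 0.45},
]
closest = most_similar_events(planned, past, FIELDS, k=2)
```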

Data Mining
Hypothesis to Select Past Campaigns:
2) Select Broadly

Hypothesis: more training records provide a
wide variety of behavior, and better generalization

Exclude past marketing events that are quite
different (but be broadly inclusive)
If the planned event is 10-18 pages, exclude 1-2 and
64 page events
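
The broad selection can be sketched as a loose filter on page count. The two factors below are assumptions chosen so that a 10 to 18 page planned event drops 1-2 page and 64 page events, as in the example above; they are not values from the product.

```python
def select_broadly(planned_pages, past_events, low_factor=0.25, high_factor=3.0):
    """Keep every past event whose page count is not wildly different."""
    lowest = planned_pages * low_factor
    highest = planned_pages * high_factor
    return [e for e in past_events if lowest <= e["pages"] <= highest]

# Hypothetical past flyers with the page counts 1, 2, 8, 12, 16, 20 and 64.
past_flyers = [{"pages": p} for p in (1, 2, 8, 12, 16, 20, 64)]
kept = select_broadly(16, past_flyers)
```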

Audience Quiz: VOTE for what you expect
1) Close fit, 2) Broad fit?

Data Mining

Select Past Campaigns: Results & Why
Answer from testing:
BROADLY selecting past marketing
events to train for the planned
event works much better

Why: Breadth gives Robust Generalization
Same sale last year was different in many ways
Broad variety of price points / item or department
Variety of items on cover
Variation over geography

Data Mining


Front cover items had a lift 5.1 times higher
than the average elsewhere!

Lift as high as 130 – after Halloween candy
sale

The top 5% of the records had 90% of the lift
(over all store-item combinations)
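
A concentration figure like this can be checked with a few lines. The sketch below uses toy lift values, not project data, and the function name is illustrative.

```python
def top_share(lifts, top_fraction=0.05):
    """Share of total lift contributed by the highest-lift fraction of records."""
    ordered = sorted(lifts, reverse=True)
    n_top = max(1, int(len(ordered) * top_fraction))
    return sum(ordered[:n_top]) / sum(ordered)

# Toy data: one extreme store-item among twenty records.
toy_lifts = [130.0] + [1.0] * 19
share = top_share(toy_lifts)
```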


Data Mining
Retail

The Cash Flow is Very Concentrated
[Figure: two charts. "Range of Lift Values" (vertical axis 0 to 140) and "Range of Lift Values (Omitting the Largest)" (vertical axis 0 to 7); "The Top 5% Provides 88% of the Lift". Vertical axis: Lift (Target), with the Lift Baseline marked. Horizontal axis: Bins of an Equal Number of Records (0 to 20).]

Test weight and target variations, lift and lift_log
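
The two variations can be sketched as below. The weight thresholds follow the `wgt_2` rule listed in the Model Notebook; using the natural log for `lift_log` is an assumption, since the log base is not stated.

```python
import math

def training_weight(lift):
    """wgt_2 rule: weight 1 by default, 2 if lift > 2, 3 if lift > 5."""
    if lift > 5:
        return 3
    if lift > 2:
        return 2
    return 1

def lift_log(lift):
    """Log-transformed target (natural log assumed); compresses the long tail."""
    return math.log(lift)

weights = [training_weight(x) for x in (0.55, 2.1, 6.8, 130.0)]
```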

Data Mining++

Preprocessing - Categorical
Average past Lift per category
Percent off bin (i.e. 0%, 5%, 10%, 15% … 80%)
Price Savings Bin (i.e. $2, $4, $6 …)
Store hierarchy
Product hierarchy (50k to 100k SKUs, 4-6 levels)
Department, Sub-department, Category, Sub-Category
Seasonality, time, month, week
Reason codes (the event is a circular, clearance)
Location on the page in the flyer (top right, top left..)
Multivariate combinations – powerful & scalable
(price bin) + (page loc bin) + (sub-cat)
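
A rough sketch of the "average past lift per category" idea, including a multivariate key. The record layout, field names and 5% bin width are assumptions for illustration.

```python
from collections import defaultdict

def percent_off_bin(pct_off, width=5):
    """Round a discount percentage down to its bin (0, 5, 10, ...)."""
    return int(pct_off // width) * width

def average_lift_by_key(records, key_fields):
    """Average past lift for each combination of the given categorical fields."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        key = tuple(r[f] for f in key_fields)
        totals[key][0] += r["lift"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical past store-item records.
records = [
    {"pct_off": 12, "page_loc": "top right", "sub_cat": "candy", "lift": 6.0},
    {"pct_off": 14, "page_loc": "top right", "sub_cat": "candy", "lift": 8.0},
    {"pct_off": 30, "page_loc": "top left", "sub_cat": "pens", "lift": 2.0},
]
for r in records:
    r["pct_bin"] = percent_off_bin(r["pct_off"])

# (price bin) + (page loc bin) + (sub-cat) as one combined key.
combo_lift = average_lift_by_key(records, ["pct_bin", "page_loc", "sub_cat"])
```

Each average then becomes a numeric input field for any record that shares the key, which is what lets high-cardinality hierarchies scale.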

Data Mining++

Preprocessing – Interactions


Data Mining

Design Of Experiments (DOE)
Model Notebook (pictured in next slide)
One row per model trained
input columns: data version, model parameters
output columns: training time, results in-sample,
out of sample, gap (bigger is worse), and gap
penalized results
Sections per data mining algorithm, e.g.
Stepwise Regression, Naïve Bayes
Cubist (tree w/ regression in leaves)
Neural Net
TreeNet (from Salford Systems)

Data Mining++
Instead of Occam’s Razor

Model Notebook Tracks DOE
Generalization Error = abs( in sample res – out of sample res )
Conservative Result = worst( in, out samp ) + Generalization Err

MODEL RESULTS: Mean Abs Err (lower is better)

N in ser | Eng  | Analysis engine settings and comment                   | (1) In Samp | (2) Out of Samp | (3) Gen: In-Out | (4) Out + Gen
1        | regr | Try target: LIFT_LOG; 58 vars selected                 | 1.184       | 1.264           | 0.08            | 1.34
2        | regr | Try target: LIFT_LOG; limit to 15 vars                 | 1.21        | 1.289           | 0.08            | 1.37
3        | regr | Try target: LIFT; 65 vars selected                     | 1.732       | 2.654           | 0.92            | 3.58
4        | regr | Try target: LIFT; limit to 15 vars                     | 1.714       | 1.837           | 0.12            | 1.96
5        | regr | Start with unv4_trn, and set larger wgt's for larger   | 1.20        | 1.42            | 0.22            | 1.63
         |      | lift values: wgt_2=1; IF(2<lift) wgt_2 = 2;            |             |                 |                 |
         |      | IF(5<lift) wgt_2 = 3; 60 vars selected                 |             |                 |                 |
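
The two notebook formulas can be sketched in code. Because the metric here is an error (lower is better), "worst" is taken to mean the larger of the two errors; the function name and the row layout are illustrative.

```python
def notebook_scores(in_sample_err, out_sample_err):
    """Return (generalization error, conservative result) for one model row."""
    gen_err = abs(in_sample_err - out_sample_err)
    conservative = max(in_sample_err, out_sample_err) + gen_err
    return gen_err, conservative

# In-sample and out-of-sample Mean Abs Err for rows 1 to 4 of the notebook.
notebook_rows = {1: (1.184, 1.264), 2: (1.21, 1.289), 3: (1.732, 2.654), 4: (1.714, 1.837)}
scored = {n: notebook_scores(i, o) for n, (i, o) in notebook_rows.items()}

# Rank by the conservative result, so a large in/out gap is penalized.
best_row = min(scored, key=lambda n: scored[n][1])
```

Ranking on the conservative result penalizes row 3 heavily: its in-sample error looks similar to row 4, but its gap is far larger.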

Data Mining++

Data Mining Algorithm Improvements
Cubist http://www.rulequest.com/cubist-info.html
Ross Quinlan uses a “greedy algorithm” to select
regression fields for each leaf
Tested and changed to “stepwise regression” for
each leaf
[Figure: tree diagram with Split 1 at the root, Split 2 and Split 3 below it, and Leaf 1 to Leaf 4 at the bottom]

Data Mining
Retail

Training Priority – a Complex Surface
[Figure: 3-D surface chart. Vertical axis ($0 to $180,000): Event Lift * Num Store-Items * Non-Event Cash Flow. Horizontal axes: Lift bins (lift to .55, 1.0, 1.4, 2.1, 4.1) and Non-Event Cash Flow bins (cash to $6, $8.81, $12, $17.08, $22.89, $32.36, $48.03, $79.38, $182, $7,647).]
Non-Event Cash Flow = Non-Event Units/day * Price

Data Mining
Retail

Model Notebook: Example of Describing Models
Top 1/6 of most expensive items, $5.30+
||||||||||||||||||||||||||||||||||

Past lift by store, sub-dept, dept, front page
|||||||||||

Average daily sales per item over prior events
||||||||||

Average price
|||

Item is located on the front page of the flyer
|

Number of Saturday & Sundays in the event
Item comes from the Health and Beauty dept
Item in the Stationery department
Avg # items sold / day


Data Mining
Retail

Calculate $ of “Business Pain”
[Figure: Business Pain plotted against forecast error, with zero error in the middle and Under Stock and Over Stock on either side]

Data Mining
Retail

Business Pain

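[Figure: Business Pain plotted against forecast error, with zero error in the middle and Under Stock and Over Stock on either side; annotated "15% business pain $" and "1% bus pain $"]
Equal mistakes
Unequal PAIN in $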

Data Mining++
Retail

Business Pain

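No way – that could get you fired!
New progress in getting feedback
[Figure: Business Pain plotted against forecast error, with zero error in the middle and Under Stock and Over Stock on either side; annotated "30% bus pain $", "15% business pain $" and "1% bus pain $", with "4 week supply of SKU" and "30% off sale" marked]
Equal mistakes
Unequal PAIN in $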

Data Mining

Best Models by Lift Correlation <> Best by $

The order of “best” models ranked by
technical metrics (correlation, MAD) vs.
business pain metric didn’t match
A HUGE mismatch!

Change error function of data mining algs
“$ over stock and under stock”


Data Mining++

Change Data Mining Algorithm Error Func

Error function depends on
knowing the threshold per SKU
“4 weeks of normal sales volume for the SKU”
Neural Net (proprietary, from missile targeting)
After epoch, i.e. forward pass of 1000 records,
calculate this error to minimize
Stepwise Regression & Cubist Leaf Regr.
Change optimization problem from an RMSE of
the target to RMSE of this error function & target
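
An asymmetric "$ over stock and under stock" error might look like the sketch below. The mapping of rates to regions is an assumed reading of the Business Pain slides (15% of the $ shortfall when under stocked, 1% of the $ excess when mildly over stocked, 30% once the excess passes the 4-week threshold); the function and parameter names are illustrative, not the product's actual error function.

```python
def business_pain(forecast_units, actual_units, price, weekly_normal_units,
                  under_rate=0.15, over_rate=0.01, high_over_rate=0.30,
                  threshold_weeks=4):
    """Dollar pain of a forecast error for one store-item (assumed rates)."""
    error_units = forecast_units - actual_units
    if error_units < 0:
        # Under stock: sales were lost on the missing units.
        return -error_units * price * under_rate
    # Over stock: mild pain up to the per-SKU threshold, heavy pain beyond it.
    threshold_units = threshold_weeks * weekly_normal_units
    mild_units = min(error_units, threshold_units)
    severe_units = max(0, error_units - threshold_units)
    return price * (mild_units * over_rate + severe_units * high_over_rate)

# Equal mistakes of 50 units, unequal pain in $ (made-up SKU: $10, 25 units/week).
pain_under = business_pain(50, 100, price=10.0, weekly_normal_units=25)
pain_over = business_pain(150, 100, price=10.0, weekly_normal_units=25)
pain_high_over = business_pain(300, 100, price=10.0, weekly_normal_units=25)
```

A function of this shape can be summed over a batch of records to give the quantity a training loop minimizes in place of a symmetric error.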

Product Mgmt
Retail

Worry About Response Time


Product Mgmt
Data Mining

User Interface: 5 Levels of Complexity
Needs to be reliable for the simplest step
Source data fields: use what is available & populated
Ensure the minimum data enables a reliable system
Use metadata to select fields (e.g. exclude low corr, empty)
Level 1:
Train 6 models each for 3 fast engines, or with fast settings
(e.g. more shallow trees)
(~30 seconds)
Later Levels:
Add more extensive search per engine of model parameters
more models in DOE, use slower engines, stay time sensitive
(~30 minutes to 2 hours)
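
Metadata-driven field selection could be sketched as follows. The metadata keys (`fill_rate`, `corr_with_lift`), the thresholds and the sample fields are hypothetical, chosen only to illustrate excluding empty and low-correlation fields.

```python
def select_fields(field_meta, min_fill=0.5, min_abs_corr=0.02):
    """Keep fields that are populated enough and not nearly uncorrelated."""
    return [name for name, meta in field_meta.items()
            if meta["fill_rate"] >= min_fill
            and abs(meta["corr_with_lift"]) >= min_abs_corr]

# Hypothetical per-field metadata gathered during EDA.
field_meta = {
    "pct_off": {"fill_rate": 1.00, "corr_with_lift": 0.41},
    "page_number": {"fill_rate": 0.97, "corr_with_lift": -0.22},
    "store_sq_ft": {"fill_rate": 0.10, "corr_with_lift": 0.30},
    "fax_number": {"fill_rate": 0.95, "corr_with_lift": 0.001},
}
usable = select_fields(field_meta)
```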


Product Mgmt
Data Mining

How is ELF Software and Not Consulting?
Software install and configuration process
Connect to Event Planning, Connect to Replenishment
Use metadata tags on custom fields
Not dependent on field names
Semantic (i.e. spending) and analytic tags (categorical, source)
Preprocessing executes if supporting data is available
Installer validates by using ELF to create test models
End users create production models

[Figure: Event Lift]

Outline
Challenge: How to automate not only
forecasting, but model training?

Solution:
Deeply investigate the business & technical issues

Result:
Up to a 30% reduction in $ lost to over and under
stock

Retail
Data Mining

Result: Reduction in Business Pain
8 to 30% Reduction in Business Pain $
ELF, Model 117
ELF
ELF ELF over $ over ELF HIGH $ High Over $ under
under
stocking stock stock Over Stock Stock stock
stock
181 87 $ 87
190 31 $ 31
183 46 $ 46
115 77 $ 233
179 105 $ 105
191 109 $ 109
252 101 $ 101
176 40 $ 40
122 37 $ 111
169 6 $ 6
183 122 $ 122
119 37 $ 112
287 130 $ 477
412 141 $ 281

Product Mgmt
Software Dev

Result: Start Agile Process After…
Product Requirements Document (PRD)

Technical Specifications:
data flow diagrams, use cases, business metric
Working Prototype, support for testing

Go through Agile & Scrum efforts w/ the
software engineering group
Review, revise, evaluate vs. business metrics

Product Mgmt
Data Mining

Result: Patent Application Process
Provisional Patent http://www.uspto.gov/
Re-write with help of patent attorney, very formal
Application will not be published for 18 months
Ordinary Skill in the Art Written by…
Jeffrey D Ullman, Stanford Computer Science
http://infolab.stanford.edu/~ullman/pub/focs00.html
The idea must be “novel,” “non obvious” & useful
Novel – does not appear in previous literature
Non obvious – would not be discovered by one of
“ordinary skill in the art” when the idea is needed
How obvious is “obvious?” To how many of 100?

Data Mining

To What other Verticals Could This Apply?
It can apply where past examples, in volume,
relate to future examples
Marketing / Advertising: (media independent)
Finding new customers, clickers, buyers, spending
Cross sell, up sell
Customer Attrition (most likely to cancel)
Mortgage Bond pricing (help US out of this mess)
rating mortgages inside,
forecasting prepayment & default rates
Many other verticals

Summary
How to automate? From the center out (i.e. onion)
Narrow vertical application, known data source & feeds
How to select training data? Broadly
Best improvement?
Optimize by what gets people promoted or fired
Change DM algorithm to optimize the business metric
How to make robust? Support, but not require, fields
Heavy Research and Prototyping (R&P) before starting Agile
How to succeed in business software?
Support end users at the level of complexity they want
Help them succeed consistently and reliably

Questions & Answers?
Greg_Makowski@Yahoo.com
(408)781-6808 cell

This PPT will be posted on SF Bay ACM and LinkedIn, below
http://sfbayacm.org/events/2009-03-11.php
http://www.LinkedIn.com/in/GregMakowski
http://fora.tv/ (Video company)

Future talks for ACM and ACM DM SIG
http://www.sfbayacm.org/dmsig.php

Other talks
http://www.meetup.com/Bay-Area-Collective-Intelligence/
http://www.sdforum.org (business intelligence & other sigs)
