http://www.sfbayacm.org/events/2009-03-11.php
Topic
How can the process of Knowledge Discovery in Databases be automated, competitive and reliable? One approach is to focus on a narrow vertical market application, with known data sources and data feeds. Then you can automate the Exploratory Data Analysis (EDA) and Preprocessing phases. But how do you automate the selection of training data? Can the enterprise application be installed and configured at a variety of clients without a Senior Knowledge Discovery Engineer? How can you minimize "worst case" results of such a system when used by a business user going through their normal business role? How can you deeply investigate and model "business values" (i.e. things that can get an end user promoted or fired) into the core of the data mining algorithms?
This talk will answer these questions and more. The patent-pending application, ELF, is an enterprise application in the retail supply chain vertical market. Before the development of this system, one enterprise application was used to lay out a weekly newspaper flier three weeks before the sales event, which in turn fed data into a replenishment application. The replenishment application kept products on the store shelves, with a minimal amount of over stock and under stock. The pain point was that the retail buyer had to manually estimate the sales lift, or the multiplier increase in sales, for every item in every store. While human expertise can be great, it isn't scalable when applied to a sales event with 1,000 - 4,000 items on sale in 6,000 stores. ELF (Event Lift Forecasting) imports data from a planned event and automatically analyzes and forecasts the lift for each store-item combination. Data elements used include pricing, placement in the flier, store geography and demographics, seasonality, and product hierarchy.
The resulting ELF system produced an 8-30% reduction in over and under stock costs, which is very significant given the low profit margins in the supply chain industry.
About the Speaker
Greg Makowski is a Principal Consultant of Golden Data Mining, in Los Altos, California. Since 1992, he has deployed over 70 data mining models for clients in targeted marketing, financial services, supply chain, e-commerce, and Internet advertising in North America, South America and Europe. He has applied a variety of data mining algorithms during these engagements and has experience using SQL, SAS, Java, and areas of Cloud Computing. Greg has eight years of experience in Product Management and over six years of experience working with start ups. See also www.LinkedIn.com/in/GregMakowski
Embedded Automatic Model Training And Forc In An Enterprise Sw Applic
1. Embedded Automatic Model
Training and Forecasting
in an Enterprise Software Application
(… or how to embed a data mining consultant in a box)
Presented to the SF Bay ACM Data Mining SIG
March 11, 2009 by Greg Makowski
Principal Consultant, Golden Data Mining
2. Outline
Challenge:
How to automate not only forecasting, but model
training?
Solution:
Focus on a vertical market application
Deeply investigate the business & technical issues
Result:
An enterprise application
Up to a 30% reduction in $ lost to over and under
stock
3. Challenge: Business Pain Point
JDA Software (who owns the IP) has dozens of
enterprise retail supply chain applications
The Replenishment software does a very good
job keeping store shelves stocked at the right
level when sales are steady
Moves product from warehouse to DC to store
Sales are NOT STEADY during sales events!
PAIN POINT: The event planner has to estimate
the lift in sales for every store-item combination,
(6k stores) * (1k to 4k items) = up to 24 million store-item lift estimates.
4. Retail
(context)
Challenge: 16 Page Newspaper Insert
Can vary by region or ZIP
5. Event Lift Forecasting (ELF)
Lift is a multiplier for the increase in sales over
normal
“Prod X in Store Y will sell 6.8 times more than normal”
Normal sales are around the event, for the same:
time period (i.e. Thr – Sun), a week before and after
(non-overlapping)
Store – product (SKU is a key for product)
[Figure: Event Lift]
6. Retail
Challenge: Appropriate for Business User
A retail event planner
Has revenue goals and a “budget” of discount $
Has to get through a lot of detail quickly
Does not typically create mathematical forecasts
Uses an enterprise application to lay out the
event flyer about 3 weeks in advance
Decides for the event:
departments / items / pricing / photos / language
Uses the software to specify SKUs, images and
lay out the flyer
7. Product Mgmt
Software Arch
Challenge: How to Productize (Agile)?
This is not a one-off consulting project, but SW
Software engineering needs (get in the ballpark)
right starting position, metrics, use cases, data flow
Support good Agile development process
Goals
At least 90% software and 10% configuration,
not repeated consulting projects
Control the Total Cost of Ownership for the product
RELIABLE when used by the business user,
working at the level of detail that the user cares about
9. Outline
Challenge: How to automate not only
forecasting, but model training?
Solution:
Focus on a vertical market application
Deeply investigate the business & technical issues
Result:
An enterprise application
Up to a 30% reduction in $ lost to over and under
stock
10. Product Mgmt
Data Mining
Path to Solution
Customer led, product driven – design general
Can’t data mine – without data
Start data request process with several clients
Jumpstart efforts with Monte Carlo
Combine Census fields with noise to create a target
The models and forecast matter less – the process MORE
Ask for business interviews
Understand users, metrics, past challenges
What is the BATNA?
Best Alternative, To A New Alternative (system)?
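The Monte Carlo jumpstart above (Census fields plus noise as a stand-in target) could be sketched roughly as follows. The field names `median_income` and `pct_urban`, the weights, and the noise level are all hypothetical placeholders, not values from the ELF project; the point is only to have a target so the end-to-end process can be exercised before client data arrives.

```python
import random

def make_synthetic_lift(census_rows, noise_sd=0.25, seed=0):
    """Attach a made-up lift target to each row: an arbitrary weighted mix
    of census fields plus Gaussian noise (weights are placeholders)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    out = []
    for row in census_rows:
        signal = 1.0 + 0.00001 * row["median_income"] + 0.5 * row["pct_urban"]
        lift = max(0.1, signal + rng.gauss(0.0, noise_sd))  # keep lift positive
        out.append({**row, "lift": lift})
    return out

# Hypothetical store-level census extract
census_rows = [
    {"store": "S1", "median_income": 42000, "pct_urban": 0.9},
    {"store": "S2", "median_income": 65000, "pct_urban": 0.4},
    {"store": "S3", "median_income": 31000, "pct_urban": 0.1},
]
synthetic = make_synthetic_lift(census_rows)
```

Because the target is fabricated, any model trained on it says nothing about real lift; it only lets the data flow, notebook, and evaluation steps be built and tested early.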
11. Data Mining
Data Sources
Event Attributes (for planned in 3 weeks & past)
Pricing, placement (page #, on a page)
Products, departments, layout
Store features, demographics of population in
area
Past events
Flyers may have 1, 8, 12, 16, 20, 64 pages
Same week last year may have a different prod mix
Calculate Lift for all store-items for all past events
Normal sales (not during an event) near in time
Event sales; Lift = (event sales) / (non-event sales)
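As a small worked example of the lift formula, here is a sketch that compares average daily event sales with average daily sales on the same weekdays the week before and the week after. The function name and the per-day normalization are assumptions made for illustration; the slides only state Lift = (event sales) / (non-event sales).

```python
def compute_lift(event_units, event_days, normal_units, normal_days):
    """Lift = average daily event sales / average daily non-event sales.
    Returns None when there is no usable baseline."""
    if normal_units <= 0 or event_days <= 0 or normal_days <= 0:
        return None
    return (event_units / event_days) / (normal_units / normal_days)

# Thu-Sun event: 136 units in 4 days.
# Baseline: 22 units (Thu-Sun the week before) + 18 units (week after) = 40 units in 8 days.
lift = compute_lift(event_units=136, event_days=4, normal_units=40, normal_days=8)
print(round(lift, 1))  # 6.8
```

Store-items with no baseline sales return `None` here; how to treat them (drop, or back off to a category average) is a design choice the slides do not specify.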
12. Data Mining
Iterative KDD Process
Knowledge Discovery in Databases (KDD)
1. Select Data for Analysis (from prior event app)
2. Exploratory Data Analysis (EDA)
3. Preprocessing (manipulating fields)
4. Model Building (Training DM algorithms)
5. Model Evaluation (apply to hold out data)
6. Post-process score to business value
7. Feed the next application (Lift / store-item)
13. Data Mining
Product Mgmt
Easiest to Automate From the Core
Go through full process, automating
model building / evaluation
EDA & Preprocessing
Select past marketing campaigns
14. Data Mining
Hypothesis to Select Past Campaigns:
1) Most Similar Past Events
Attention: your expertise will be quizzed!
Hypothesis: a close fit to the new event is better
Compare high level event attributes
Number of pages of the flyer
Discount (average, max)
“Primary” departments, sub-dep, catg, sub-category
… and so on
Use “fuzzy” Euclidean distance to match past
events to the planned event in 3 weeks
Select the 1-10 most similar events in the last year
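One way to read the "most similar past events" hypothesis is a scaled, weighted Euclidean distance over high-level event attributes. This is a sketch under assumptions: the slides do not define "fuzzy", so the per-attribute scales and weights below stand in for it, and the attribute names are hypothetical.

```python
import math

def event_distance(a, b, scales, weights):
    """Weighted Euclidean distance over attributes, each divided by a scale
    so that pages and discounts are comparable."""
    total = 0.0
    for key, scale in scales.items():
        diff = (a[key] - b[key]) / scale
        total += weights.get(key, 1.0) * diff * diff
    return math.sqrt(total)

def most_similar_events(planned, past_events, scales, weights, k=10):
    """Return the k past events closest to the planned event."""
    ranked = sorted(past_events,
                    key=lambda e: event_distance(planned, e, scales, weights))
    return ranked[:k]

planned = {"pages": 16, "avg_discount": 0.25, "max_discount": 0.50}
past_events = [
    {"id": "A", "pages": 16, "avg_discount": 0.20, "max_discount": 0.50},
    {"id": "B", "pages": 64, "avg_discount": 0.25, "max_discount": 0.50},
    {"id": "C", "pages": 12, "avg_discount": 0.30, "max_discount": 0.60},
]
scales = {"pages": 16.0, "avg_discount": 0.10, "max_discount": 0.20}
closest = most_similar_events(planned, past_events, scales, weights={}, k=2)
print([e["id"] for e in closest])  # ['A', 'C']
```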
15. Data Mining
Hypothesis to Select Past Campaigns:
2) Select Broadly
Hypothesis: more training records provide a
wide variety of behavior, and better generalization
Exclude past marketing events that are quite
different (but be broadly inclusive)
If the planned event is 10-18 pages, exclude 1-2 and
64 page events
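The broad-selection rule can be sketched as a simple exclusion filter on page count. The ratio thresholds are assumptions chosen so that a 10-18 page planned event drops 1-2 page and 64 page events, as in the example above; the actual ELF cutoffs are not given in the slides.

```python
def select_broadly(planned_pages, past_events, min_ratio=0.25, max_ratio=3.0):
    """Keep every past event whose page count is within a wide band around
    the planned event; only very different events are excluded."""
    lo = planned_pages * min_ratio
    hi = planned_pages * max_ratio
    return [e for e in past_events if lo <= e["pages"] <= hi]

past_events = [{"id": i, "pages": p} for i, p in enumerate([1, 8, 12, 16, 20, 64])]
kept = select_broadly(16, past_events)
print([e["pages"] for e in kept])  # [8, 12, 16, 20]
```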
Audience Quiz: VOTE for what you expect
1) Close fit, 2) Broad fit?
16. Data Mining
Select Past Campaigns: Results & Why
Answer from testing:
BROADLY selecting past marketing
events to train for the planned
event works much better
Why: Breadth -> Robust Generalization
Same sale last year was different in many ways
Broad variety of price points / item or department
Variety of items on cover
Variation over geography
17. Data Mining
Exploratory Data Analysis (EDA)
Front cover items had a lift 5.1 times higher
than the average elsewhere!
Lift as high as 130 – after Halloween candy
sale
The top 5% of the records had 90% of the lift
(over all store-item combinations)
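A concentration figure like "the top 5% of the records had 90% of the lift" can be checked with a few lines. This sketch assumes "share of the lift" means the share of summed lift values across store-item records; the slides do not spell out the exact definition.

```python
def top_share(lifts, top_fraction=0.05):
    """Fraction of total lift contributed by the highest-lift records."""
    ordered = sorted(lifts, reverse=True)
    n_top = max(1, int(round(len(ordered) * top_fraction)))
    total = sum(ordered)
    return sum(ordered[:n_top]) / total if total else 0.0

# Toy data: 19 ordinary store-items and one extreme record
lifts = [1.0] * 19 + [130.0]
share = top_share(lifts)  # 130 / 149, roughly 0.87
```

A skew this strong is one reason to test both `lift` and a log-transformed target, and to weight the high-lift records during training.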
18. Data Mining
Retail
Exploratory Data Analysis (EDA)
The Cash Flow is Very Concentrated
[Figure: two charts of Lift (Target) against bins of an equal number of records, each with a Lift Baseline. Panel titles: "Range of Lift Values (Omitting the Largest)", vertical axis 0 to 7; "Range of Lift Values (The Top 5% Provides 88% of the Lift)", vertical axis 0 to 140.]
Test weight and target variations, lift and lift_log
19. Data Mining++
Preprocessing - Categorical
Average past Lift per category
Percent off bin (i.e. 0%, 5%, 10%, 15% … 80%)
Price Savings Bin (i.e. $2, $4, $6 …)
Store hierarchy
Product hierarchy (50k to 100k SKUs, 4-6 levels)
Department, Sub-department, Category, Sub-Category
Seasonality, time, month, week
Reason codes (the event is a circular, clearance)
Location on the page in the flyer (top right, top left..)
Multivariate combinations – powerful & scalable
(price bin) + (page loc bin) + (sub-cat)
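Two of the categorical preprocessing ideas above, binning percent-off and averaging past lift per multivariate key, might look like the following. This is an illustrative sketch: the record layout and field names (`pct_off_bin`, `page_loc`, `sub_cat`) are hypothetical, and the 5% bin width simply mirrors the 0%, 5%, 10% example.

```python
from collections import defaultdict

def percent_off_bin(regular_price, sale_price, step=5):
    """Round the discount down to the nearest `step` percent bin."""
    pct = 100.0 * (regular_price - sale_price) / regular_price
    return int(pct // step) * step

def average_lift_by_key(records, key_fields):
    """Average past lift for each combination of the given categorical fields."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of lift, count]
    for r in records:
        key = tuple(r[f] for f in key_fields)
        sums[key][0] += r["lift"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

records = [
    {"pct_off_bin": 25, "page_loc": "top right", "sub_cat": "candy", "lift": 4.0},
    {"pct_off_bin": 25, "page_loc": "top right", "sub_cat": "candy", "lift": 6.0},
    {"pct_off_bin": 10, "page_loc": "top left", "sub_cat": "soap", "lift": 1.5},
]
avg_lift = average_lift_by_key(records, ["pct_off_bin", "page_loc", "sub_cat"])
print(avg_lift[(25, "top right", "candy")])  # 5.0
```

The averages become numeric inputs for the models, so a high-cardinality combination is reduced to one column. They should be computed from past events only, so the planned event's own outcome never leaks into its features.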
21. Data Mining
Design Of Experiments (DOE)
Model Notebook (pictured in next slide)
One row per model trained
input columns: data version, model parameters
output columns: training time, results in-sample,
out of sample, gap (bigger is worse), and gap
penalized results
Sections per data mining algorithm, i.e.
Stepwise Regression, Naïve Bayes
Cubist (tree w/ regression in leaves)
Neural Net
TreeNet (from Salford Systems)
22. Data Mining++
Instead of Occam’s Razor
Model Notebook Tracks DOE
Generalization Error = abs( in sample res – out of sample res )
Conservative Result = worst( in, out samp ) + Generalization Err
ANALYSIS ENGINE SETTINGS | MODEL RESULTS, Mean Abs Err (-good)
N in ser | Eng | parameters 1-3 / comment | 1: In Samp | 2: Out of Samp | 3: Gen: In-Out | 4: Out + Gen
1 | regr | Try target: LIFT_LOG; 58 vars selected | 1.184 | 1.264 | 0.08 | 1.34
2 | regr | Try target: LIFT_LOG; limit to 15 vars | 1.21 | 1.289 | 0.08 | 1.37
3 | regr | Try target: LIFT; 65 vars selected | 1.732 | 2.654 | 0.92 | 3.58
4 | regr | Try target: LIFT; limit to 15 vars | 1.714 | 1.837 | 0.12 | 1.96
5 | regr | Start with unv4_trn, and set larger wgt's for larger lift values: wgt_2=1; IF(2<lift) wgt_2 = 2; IF(5<lift) wgt_2 = 3; 60 vars selected | 1.20 | 1.42 | 0.22 | 1.63
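The two notebook formulas can be written out directly. This sketch assumes the metric is an error (mean absolute error, lower is better), so "worst" is the larger of the two values; for a metric where higher is better, `max` would become `min`. The function and variable names are illustrative.

```python
def conservative_result(in_sample_err, out_sample_err):
    """Worst of the two errors plus the in/out gap, so a model that
    generalizes poorly is penalized twice."""
    gen_err = abs(in_sample_err - out_sample_err)
    return max(in_sample_err, out_sample_err) + gen_err

# (in-sample MAE, out-of-sample MAE) for notebook rows 1 to 4
notebook = {1: (1.184, 1.264), 2: (1.21, 1.289), 3: (1.732, 2.654), 4: (1.714, 1.837)}
scores = {row: round(conservative_result(i, o), 2) for row, (i, o) in notebook.items()}
best_row = min(scores, key=scores.get)
print(scores)    # {1: 1.34, 2: 1.37, 3: 3.58, 4: 1.96}
print(best_row)  # 1
```

Row 3 shows the intent: its in-sample error is close to row 4's, but the large gap pushes its conservative score far higher.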
23. Data Mining++
Data Mining Algorithm Improvements
Cubist http://www.rulequest.com/cubist-info.html
Ross Quinlan uses a “greedy algorithm” to select
regression fields for each leaf
Tested and changed to “stepwise regression” for
each leaf
[Diagram: tree with Split 1, Split 2, Split 3 leading to Leaf 1, Leaf 2, Leaf 3, Leaf 4]
24. Data Mining
Retail
Training Priority – a Complex Surface
[Figure: 3-D surface chart. Vertical axis: Event Lift * Num Store-Items * Non-Event Cash Flow, $0 to $180,000. Horizontal axes: Lift (bins from "lift to .55" up to "lift to 4.1") and Non-Event Cash Flow = Non-Event Units/day * Price (bins from "cash to $6" up to "cash to $7,647").]
25. Data Mining
Retail
Model Notebook: Example of Describing Models
Top 1/6 of most expensive items, $5.30+ ||||||||||||||||||||||||||||||||||
Past lift by store, sub-dept, dept, front page |||||||||||
Average daily sales per item over prior events ||||||||||
Average price |||
Item is located on the front page of the flyer |
Number of Saturdays & Sundays in the event
Item comes from the Health and Beauty dept
Item in the Stationery department
Avg # items sold / day
26. Data Mining
Retail
Calculate $ of “Business Pain”
[Figure: Business Pain plotted against forecast error, with zero error in the middle and Over Stock and Under Stock on either side]
27. Data Mining
Retail
Calculate $ of “Business Pain”
[Figure: Business Pain plotted against forecast error around zero error. Labels: 15% business pain $, 1% bus pain $, Over Stock, Under Stock]
Equal mistakes
Unequal PAIN in $
28. Data Mining++
Retail
Calculate $ of “Business Pain”
No way – that could get you fired!
New progress in getting feedback
[Figure: Business Pain plotted against forecast error around zero error. Labels: 30% bus pain $, 15% business pain $, 1% bus pain $, Over Stock, Under Stock, 4 week supply of SKU, 30% off sale]
Equal mistakes
Unequal PAIN in $
29. Data Mining
Best Models by Lift Correlation <> Best by $
The order of “best” models ranked by
technical metrics (correlation, MAD) vs.
business pain metric didn’t match
A HUGE mismatch!
Change error function of data mining algs
“$ over stock and under stock”
30. Data Mining++
Change Data Mining Algorithm Error Func
Error function depends on
knowing the threshold per SKU
“4 weeks of normal sales volume for the SKU”
Neural Net (proprietary, from missile targeting)
After epoch, i.e. forward pass of 1000 records,
calculate this error to minimize
Stepwise Regression & Cubist Leaf Regr.
Change optimization problem from an RMSE of
the target to RMSE of this error function & target
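An asymmetric "business pain" error with a per-SKU threshold could be sketched as below. Everything numeric here is an assumption for illustration: the 15%, 1% and 30% rates are one reading of the labels on the Business Pain slides (under stock, ordinary over stock, and over stock beyond a 4 week supply), and charging the high rate only on units above the threshold is a design choice, not the documented ELF function.

```python
def business_pain(forecast_units, actual_units, price, normal_weekly_units,
                  under_rate=0.15, over_rate=0.01, high_over_rate=0.30,
                  threshold_weeks=4):
    """Dollar pain of a forecast error for one store-item.
    Under stock is priced at under_rate; over stock at over_rate up to
    threshold_weeks of normal sales, and at high_over_rate beyond that."""
    error_units = forecast_units - actual_units
    if error_units < 0:  # forecast too low: under stock
        return -error_units * price * under_rate
    threshold = threshold_weeks * normal_weekly_units
    normal_over = min(error_units, threshold)
    high_over = max(0.0, error_units - threshold)
    return price * (normal_over * over_rate + high_over * high_over_rate)

print(business_pain(100, 120, 10.0, 10))  # under by 20 units
print(business_pain(120, 100, 10.0, 10))  # over by 20 units, same size mistake
print(business_pain(200, 100, 10.0, 10))  # over by 100 units, past the 40-unit threshold
```

The first two calls make equal-sized mistakes in units but produce very different dollar pain, which is why ranking models by correlation or MAD can disagree with ranking by this metric.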
31. Product Mgmt
Retail
Worry About Response Time
32. Product Mgmt
Data Mining
User Interface: 5 Levels of Complexity
Needs to be reliable for the simplest step
Source data fields: use what is available & populated
Ensure the minimum data enables a reliable system
Use metadata to select fields (i.e. exclude low corr, empty)
Level 1:
Train 6 models each for 3 fast engines, or with fast settings
(i.e. more shallow trees)
(~30 seconds)
Later Levels:
Add more extensive search per engine of model parameters
more models in DOE, use slower engines, stay time sensitive
(~30 minutes to 2 hours)
33. Product Mgmt
Data Mining
How is ELF Software and Not Consulting?
Software install and configuration process
Connect to Event Planning, Connect to Replenishment
Use metadata tags on custom fields
Not dependent on field names
Semantic (i.e. spending) and analytic tags (categorical, source)
Preprocessing executes if supporting data is available
Installer validates by using ELF to create test models
End users create production models
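The metadata-tag idea (preprocessing keyed on tags rather than field names, and run only when the supporting data exists) can be illustrated with a small lookup. The tag vocabulary, field names and step names below are hypothetical; the slides only say that semantic and analytic tags are attached to custom fields.

```python
def fields_with_tag(field_tags, tag):
    """Return the names of all fields carrying the given tag."""
    return [name for name, tags in field_tags.items() if tag in tags]

def runnable_steps(field_tags, steps):
    """Keep only the preprocessing steps whose required tags are each
    covered by at least one field in this client's data."""
    runnable = []
    for step_name, required_tags in steps.items():
        if all(fields_with_tag(field_tags, t) for t in required_tags):
            runnable.append(step_name)
    return runnable

# One client's custom fields, tagged at install time
field_tags = {
    "CUST_FLD_17": {"price", "numeric"},
    "PG_POS": {"page_location", "categorical"},
    "NOTES": {"free_text"},
}
steps = {
    "price_savings_bin": ["price"],
    "page_location_bin": ["page_location"],
    "demographic_average_lift": ["demographics"],
}
print(runnable_steps(field_tags, steps))  # ['price_savings_bin', 'page_location_bin']
```

A client without demographic fields still gets a working configuration; the demographic step is skipped rather than failing the install.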
[Figure: Event Lift]
34. Outline
Challenge: How to automate not only
forecasting, but model training?
Solution:
Focus on a vertical market application
Deeply investigate the business & technical issues
Result:
An enterprise application
Up to a 30% reduction in $ lost to over and under
stock
35. Retail
Data Mining
Result: Reduction in Business Pain
8 to 30% Reduction in Business Pain $
ELF, Model 117
ELF stocking | ELF over stock | $ over stock | ELF HIGH Over Stock | $ High Over Stock | ELF under stock | $ under stock
181 87 $ 87
190 31 $ 31
183 46 $ 46
115 77 $ 233
179 105 $ 105
191 109 $ 109
252 101 $ 101
176 40 $ 40
122 37 $ 111
169 6 $ 6
183 122 $ 122
119 37 $ 112
287 130 $ 477
412 141 $ 281
36. Product Mgmt
Software Dev
Result: Start Agile Process After…
Product Requirements Document (PRD)
Technical Specifications:
data flow diagrams, use cases, business metric
Working Prototype, support for testing
Go through Agile & Scrum efforts w/ the
software engineering group
Review, revise, evaluate vs. business metrics
37. Product Mgmt
Data Mining
Result: Patent Application Process
Provisional Patent http://www.uspto.gov/
Re-write with help of patent attorney, very formal
Application will not be published for 18 months
Ordinary Skill in the Art Written by…
Jeffrey D Ullman, Stanford Computer Science
http://infolab.stanford.edu/~ullman/pub/focs00.html
The idea must be “novel,” “non obvious” & useful
Novel – does not appear in previous literature
Non obvious – would not be discovered by one of
“ordinary skill in the art” when the idea is needed
How obvious is “obvious?” To how many of 100?
38. Data Mining
To What other Verticals Could This Apply?
It can apply where past examples, in volume,
relate to future examples
Marketing / Advertising: (media independent)
Finding new customers, clickers, buyers, spending
Cross sell, up sell
Customer Attrition (most likely to cancel)
Mortgage Bond pricing (help US out of this mess)
rating mortgages inside,
forecasting prepayment & default rates
Many other verticals
39. Summary
How to automate? From the center out (i.e. onion)
Narrow vertical application, known data source & feeds
How to select training data? Broadly
Best improvement?
Optimize by what gets people promoted or fired
Change DM alg. to opt. bus metric
How to make robust? Support, but not require, fields
Heavy Research and Prototyping (R&P) before starting Agile
How to succeed in business software?
Support end users at the level of complexity they want
Help them succeed consistently and reliably
40. Questions & Answers?
Greg_Makowski@Yahoo.com
(408)781-6808 cell
This PPT will be posted on SF Bay ACM and LinkedIn, below
http://sfbayacm.org/events/2009-03-11.php
http://www.LinkedIn.com/in/GregMakowski
http://fora.tv/ (Video company)
Future talks for ACM and ACM DM SIG
http://www.sfbayacm.org/dmsig.php
Other talks
http://www.meetup.com/Bay-Area-Collective-Intelligence/
http://www.sdforum.org (business intelligence & other sigs)