Don't Blame the Retriever; Who Threw the Ball
1. Pay No Attention to the Man
Behind The Curtain
The Changing Requirements of Business
Analytics in Financial Services
Jon C Farrar M.A.
Don’t Blame the Retriever….
Farrar -1 Don’t blame the Retriever;
Who threw the ball?
2. Introduction
• “There is no business challenge that cannot be
solved if one considers that a Business
Challenge is simply a Tennis Ball waiting to be
thrown….”
– Jon Farrar
Don’t blame the Retriever;
who threw the ball?
3. Once Upon A Time….
There was this dream that everyone who
needed a loan would always be treated fairly
4. But there were factors at work
that made the dream almost
impossible
5. But the needs were so great….
6. Then all of a sudden
Someone invented something called Credit
Scores
They were a bit odd, at first, but they were also kind of an
elegant accessory and they fit real good.
Once folks found out about ‘em, Everybody wanted ‘em
8. They Seemed to go with EVERYTHING,
and they were a little Magical besides…
9. But, trouble was a-brewin….
The Wizard of OCC (“awk”) found out
about the Credit Scores and he was not
happy.
10. The Wizard of OCC thought Credit
Scores looked like this…
11. And The Wizard Wanted them to
look more like this…
12. So, The Wizard sent his Minions to do
some work….
13. Storm after Storm blew down on
everyone using Credit Scores
14. And Because the Wizard of OCC
Wasn’t always real clear about what he wanted
everybody to do. People were confused…
REG B
15. So the Wizard Tried Again,
OCC 97-24
More Confusion……
16. And Again….
OCC 2000-16
And Still, NOBODY seemed to know what
to do….
17. And then the wisest one of them all
had an Idea….
C’mon, you guys!
We just gotta go
talk to the old bird
19. So there was only one thing
left to do…..
They formed their little group and they went off
to see the Wizard….
They just needed to TALK to him…..
20. So, they followed the
FICO-Built Road
OCC
21. But that proved kinda scary,
everybody said The Wizard was REAL
MEAN!
22. And the Minions seemed to like that
everybody was confused
23. And all along the road
There were “Empirically Derived”s
And “Demonstrably and Statistically Sound”s
There were Models, Reporting,
and BackTests
OH MY!
24. They knew they weren’t in Kansas
Anymore….
25. But They Carried On
In spite of attempts to
deter them from their
road…
26. And When they finally got to
The Wizard
They made an appointment with his Admin
27. When They First Got
Inside, they WERE scared
28. But Then They Realized
something funny
29. The Wizard of OCC wasn’t such a bad
guy after all
30. He Just wanted Everybody to
Understand How the Ruby Slippers
Were Made
So that they held up,
and didn’t fall apart,
and were the right size,
and were available to all,
And so everybody could buy
and sell more slippers,
In a kinda sorta fair way….
31. So the Wizard Of OCC Created
32. And everybody sort of understood,
And everybody was sort of happy
33. It STILL wasn’t perfect,
But it was a gosh-darn sight better
than what came before…..
REG B
OCC 97-24
OCC 2000-16
34. There was something for everybody
36. Dorothy Understood that she
needed to spread the word
37. And with the help of a very good
Travel Agent….
38. They loaded the Ruby Slippers
and the New Instructions into
the Open Gray Box
39. And set off back to
where it all started…
40. *
* Well, for the time being anyway….
41. Pay No Attention… Part II
• What we have learned thus far
– Since the beginning, Models were Magical
– Regulators were always concerned with Fairness and
measurability
– Models offer Promise but lots of confusion
• Models are used for lots of different functions
• Models are not always clearly understood
• Regulating them lagged behind their prevalence
and use
– Multiple attempts to regulate, but never clear
– Finally catching up, but still lagging
– OCC 2011-12 is the best so far and goes a large part of the way there
• Dem’s da rules, Dat’s how we gotta play…
42. Models offered Promise but lots
of confusion too
• We started using models for all sorts of different
functions
• Consumers started asking lots of questions
• “You didn’t Score enough” didn’t cut it
• “Lemme talk to your MANAGER!”
43. You see, time was…..
Earlier Models were able to be very simply rendered:

Characteristic                    Points
Home Ownership
  Own                               35
  Rent                              25
  Lives with parents                20
  Other                             15
Years On Job
  < 2 years                         15
  2 – 5 years                       20
  5 – 8 years                       20
  8+ years                          16
Credit History
  < 2 years                          5
  2-4 years                         10
  4-7 years                         15
  7+ years                          20
Credit Report
  < 3 Inquiries                     20
  3+ Inquiries                       5
  < 3 Satisfactory                  10
  3+ Satisfactory                   25
  Worst Rating 60+ Delinq          -10
  Worst Rating Derog               -20
  Worst Rating Satisfactory         20

One just added up the points. If there were enough to pass the cutoff, the customer was approved.
But still nobody really knew how to explain them
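To make "one just added up the points" concrete, here is a minimal sketch in C of that kind of additive scorecard. The point values are the ones on this slide; the type and function names, the treatment of boundary values (exactly 2, 5 or 8 years go to the higher band), and whatever cutoff gets passed in are assumptions for illustration, since the slide does not give them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative types: the names are assumptions, the points are from the slide. */
typedef enum { HOME_OWN, HOME_RENT, HOME_PARENTS, HOME_OTHER } HomeStatus;
typedef enum { WORST_60_PLUS_DELINQ, WORST_DEROG, WORST_SATISFACTORY } WorstRating;

typedef struct {
    HomeStatus  home;
    double      years_on_job;
    double      years_credit_history;
    int         inquiries;
    int         satisfactory_trades;
    WorstRating worst;
} Applicant;

/* Add up the points for every characteristic on the scorecard. */
int scorecard_points(const Applicant *a)
{
    assert(a != NULL);
    int pts = 0;

    switch (a->home) {
        case HOME_OWN:     pts += 35; break;
        case HOME_RENT:    pts += 25; break;
        case HOME_PARENTS: pts += 20; break;
        default:           pts += 15; break;   /* Other */
    }

    /* Years on job (boundary handling is an assumption). */
    if      (a->years_on_job < 2.0) pts += 15;
    else if (a->years_on_job < 5.0) pts += 20;
    else if (a->years_on_job < 8.0) pts += 20;
    else                            pts += 16;

    /* Length of credit history. */
    if      (a->years_credit_history < 2.0) pts += 5;
    else if (a->years_credit_history < 4.0) pts += 10;
    else if (a->years_credit_history < 7.0) pts += 15;
    else                                    pts += 20;

    /* Credit report items. */
    pts += (a->inquiries < 3) ? 20 : 5;
    pts += (a->satisfactory_trades < 3) ? 10 : 25;

    switch (a->worst) {
        case WORST_60_PLUS_DELINQ: pts -= 10; break;
        case WORST_DEROG:          pts -= 20; break;
        default:                   pts += 20; break;   /* Satisfactory */
    }
    return pts;
}

/* Approve when the total reaches the cutoff (the cutoff itself is hypothetical). */
bool scorecard_approved(const Applicant *a, int cutoff)
{
    return scorecard_points(a) >= cutoff;
}
```

For example, an applicant who owns a home, has 10 years on the job, 8 years of credit history, 1 inquiry, 4 satisfactory trades and a satisfactory worst rating adds up to 35 + 16 + 20 + 20 + 25 + 20 = 136 points.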
44. And we started using models for
all kinds of things
Tool | Objective | Typical Sources of Data
New Account Scoring | Credit Extension | Application, Credit Bureau
Solicitation Scoring | Prescreen Solicit | Credit Bureau, Demographics
Behavior Scoring | Line Authorizations, Cross-selling, Collecting, Reissue | Masterfile, Purchases & Payments, Loan Details
Collection Scoring | Severe Collections, Recovery | Masterfile, Credit Bureau, Loan Details
Linear Regression Models
Logistic Regression Models
45. Models offered Promise but lots
of confusion
• Models used for lots of different functions
• Consumers started asking lots of questions
– Why did I only get that Loan Amount?
– Why was I turned down?
– Why didn’t you renew my Credit Line?
– Why did you call me for a payment?
46. Models offered Promise but lots
of confusion
• Models used for lots of different functions
• Consumers gained Savvy and asked lots of
questions
• “You didn’t Score enough” didn’t cut it
– Customers didn’t get it
– Loan Officers also didn’t get it
– The Tin Woodsman didn’t get it
(and he had an Axe!)
47. And now look where we are
(not to mention where we’re going…)
Objective: Severe Collections, Recovery, Fraud, Attrition, Cross Sell, Utilization, Propensity, Operations
Tool:
  Collection Scoring
  Traditional: MS Office Suite, Data extraction tools
  Leading edge: Statistical packages (SAS, SPSS, R), Data Mining packages, Pattern Recognition Algorithms, Classification and Regression Trees (CART®), Stochastic Gradient Boosting (TreeNet®), Programming and Application Languages
Typical Sources of Data:
  Masterfile, Credit Bureau
  DataMarts, Data Warehouses
  Web Logs, Transactional Databases, Historical time series databases
  Internal system databases (DDA, Collection, Recovery, Financial, etc.)
48. Models offered Promise but lots
of confusion
• Models used for lots of different functions
• Consumers started asking lots of questions
• “You didn’t Score enough” didn’t cut it
49. But how ya gonna keep ‘em
down on the Farm…?
• Plethora of Modeling techniques and Methodologies
are part of Statistical training
• Reality Bites
• Only a very small number of the statistical techniques learned can
actually be used in most business scenarios
• Where we can apply them in Business, even fewer of those
meet usability requirements
– Tracking, Monitoring, Maintaining, Refreshing
– Time to Develop, Validate, Test, Deploy
– Extensible, Scalable, contribute to KPI’s and Financial Measures like
ROA, RAROC, ROI, etc.
– EXPLAINABLE! (ahhhh… back to the Regulations….in a moment…)
• So in general, it makes more sense to use simpler types of
models for most business applications
50. So How ya gonna keep ‘em down on
the Farm?
• Easy.
• Tell ‘em they have to follow 2011-12
• They’ll NEVER leave!
51. OCC 2011-12
• The design, theory, and logic underlying the
model should be well documented and generally
supported by published research and sound
industry practice. The model methodologies and
processing components that implement the
theory, including the mathematical specification
and the numerical techniques and
approximations, should be explained in detail
with particular attention to merits and
limitations.
52. OCC 2011-12 (2)
• Without adequate documentation, model risk
assessment and management will be ineffective.
Documentation of model development and
validation should be sufficiently detailed so that
parties unfamiliar with a model can understand
how the model operates, its limitations, and its
key assumptions. Documentation provides for
continuity of operations, makes compliance with
policy transparent, and helps track
recommendations, responses, and exceptions.
53. Vital organs of 2011-12
• Oversight – Model Risk Management Division
– Manage Model Risk like any other type of risk
– Detailed Policies and procedures for Models , their uses and permitted Overrides
– Rigorous assessment of Data quality, relevance, appropriateness and documentation
– All model assumptions must be tracked and monitored
– Appropriateness of chosen Methodology must be defensible (design and construction)
– Audit and Compliance Signoffs
• Rigorous Testing before Implementation
– Stress testing against multiple economic and Financial Scenarios to identify model uncertainty
and potential for inaccuracy
• Independent Validation prior to Implementation (internal unit or Contracted
External resource)
• Model used only for the population it was designed on
• Reporting formalized, pre-established thresholds for performance effectiveness
and stability
• Exhaustive documentation to EXPLAIN everything
– Business Goals, Assumptions, Data, Intended Use, Methodology, How Model Works, ties in to
Policy and Procedures, Adverse Action, Testing, Validation and tracking protocols, etc
54. EXPLAINING now is a
really BIG thing…
The sum of the square
roots of any two sides of
an isosceles triangle is
equal to the square root
of the remaining side. Oh
joy! Rapture! I got a
brain! How can I ever
thank you enough?
55. Explaining Models
• Logistic and Linear Regression Models are very well
understood, have been reliably used in Business
Applications for over 60 years, and when properly built are
stable, very good predictors of outcomes
• Logistic and Linear Regression Models are relatively easy to
explain
– A linear regression line has an equation of the form Y = a +
bX, where X is the explanatory variable and Y is the dependent
variable. The slope of the line is b, and a is the intercept (the value
of Y when X = 0)*
– Logistic regression is used for predicting binary outcomes
(Bernoulli trials) rather than continuous outcomes, and models a
transformation of the expected value as a linear function of the
predictors, rather than the expected value itself**
*http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
**http://en.wikipedia.org/wiki/Logistic_regression#Definition
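As a worked illustration of the two definitions above (the coefficient values are made up for the example and do not come from any model in this deck):

```latex
\begin{align*}
\text{Linear:}\quad   & Y = a + bX, \qquad a = 2,\; b = 0.5,\; X = 4
                        \;\Rightarrow\; Y = 2 + 0.5 \cdot 4 = 4 \\
\text{Logistic:}\quad & \log\frac{p}{1-p} = \beta_0 + \beta_1 X, \qquad
                        p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \\
                      & \beta_0 = -3,\; \beta_1 = 0.05,\; X = 60
                        \;\Rightarrow\; \beta_0 + \beta_1 X = 0
                        \;\Rightarrow\; p = 0.5
\end{align*}
```

The "transformation of the expected value" in the logistic case is the log-odds on the left-hand side: the model is linear in the log-odds, not in the probability p itself.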
56. Explaining Models (2)
• Regression Models generally assume a statistically normal
distribution of variables and predicted outcomes
• Both Linear and Logistic Models are founded on the
correlative nature of multiple variables to predicted outcomes
and require some type of linear relationship between each
variable and the predicted outcome
– Sometimes (generally) first require data to be transformed in a variety
of ways to establish an optimal linear relationship
– Use a given variable only once in a given model, according to the
(derived) linear relationship
• One variable (or range), one coefficient
57. On The Other Hand…
• Business Data is becoming less and less normally
distributed
• Businesses must now pay more and more attention to
exceptions and outliers in order to maximize targeting and
profitability
• Linear and Logistic methodologies are no longer always
adequate to solve the more complex business challenges
– Some build model suites to address a single challenge
– Lead times for development, validation, testing and
documenting suites of models are therefore much more
extended
– Newer methodologies can help here, in the sense that often
one model can be built, but…..
• but 2011-12 rears its head again….
58. 2011-12 rears its head again
• If ya’ can’t explain it, ya’ can’t use it
• Neural Networks, Bayesian
Networks, Stochastic Gradient Boosting, etc.
all need to be explained
• Mathematical formulas, and underpinnings
like assumptions, must be justified, can be
difficult to objectively explain, and may be
difficult if not impossible to place into an
Adverse Action context
59. Why CART is so cool…
See, Decision Trees are “easy” because we can
explain this one no problem:
INDUS <= 6.145

INDUS > 6.145 &&
PT <= 18.65 &&
DIS <= 4.91145

INDUS > 6.145 &&
PT <= 18.65 &&
DIS > 4.91145

INDUS > 6.145 &&
PT > 18.65 &&
NOX > 0.755

INDUS > 6.145 &&
PT > 18.65 &&
NOX <= 0.755 &&
LSTAT > 5.165

INDUS > 6.145 &&
PT > 18.65 &&
NOX <= 0.755 &&
LSTAT <= 5.165
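The six rules on this slide translate directly into nested if-tests, which is a big part of why a single tree is "easy" to explain. A minimal sketch in C, using the split thresholds from the slide; the struct, the function name and the 1 to 6 numbering of the rules are illustrative assumptions, not node numbers from CART:

```c
#include <assert.h>
#include <stddef.h>

/* The five predictors that appear in the rules on the slide. */
typedef struct {
    double indus, pt, dis, nox, lstat;
} TreeCase;

/* Return which of the six rules a case satisfies:
     1: INDUS <= 6.145
     2: INDUS > 6.145 && PT <= 18.65 && DIS <= 4.91145
     3: INDUS > 6.145 && PT <= 18.65 && DIS >  4.91145
     4: INDUS > 6.145 && PT >  18.65 && NOX >  0.755
     5: INDUS > 6.145 && PT >  18.65 && NOX <= 0.755 && LSTAT >  5.165
     6: INDUS > 6.145 && PT >  18.65 && NOX <= 0.755 && LSTAT <= 5.165
   The numbering is for illustration only. */
int terminal_rule(const TreeCase *c)
{
    assert(c != NULL);
    if (c->indus <= 6.145) return 1;
    if (c->pt <= 18.65)    return (c->dis <= 4.91145) ? 2 : 3;
    if (c->nox > 0.755)    return 4;
    return (c->lstat > 5.165) ? 5 : 6;
}
```

Every case lands in exactly one rule, so the explanation for any single outcome is just the short list of conditions on the path that case followed.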
61. But how in Munchkin Land
can you explain this thing?
62. And what if ya’ had
something like THIS …
Oh MY!
[Figure: a long series of trees joined by plus signs …….]
63. Even the TREES get
confused…
[Figure: a long series of trees joined by plus signs …..]
64. BAD news
• Ya really can’t explain this one
SILENCE!
OCC 2011-12
65. Good news
• Ya CAN explain this one…..
[Figure: rows of trees joined by plus signs ……..]
But what the Kansas is this thing anyway?
66. A Woodman’s view of TreeNet®
• Borrowing from Dan Steinberg’s introductory video….
– TreeNet® is also called Stochastic Gradient Boosting
– Its speed and accuracy are unparalleled in Modeling and it has a
number of advantages over more traditional methodologies
• I will leave the Sales Pitch to Salford, but it is my favorite tool and we used it for
every kind of model you can think of
– I am no expert but here is kind of how it works (and TreeNet® does
this automatically and keeps track of it all for you):
• build an initial tree and identify the misclassifications
• using the misclassified cases as the target, pull your whole sample again, develop a
new tree based on that
• continue until you have exhausted your errors. Could be hundreds or thousands of
builds, all happening very quickly
• You then “simply” add up all of the weights of the variables in the individual trees
and Voilà!
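The build-a-tree, look-at-the-errors, build-another-tree loop above can be sketched in a few lines of C. This is a generic least-squares boosting loop with one-split trees ("stumps") on a single predictor, not TreeNet®'s actual algorithm: it fits each new tree to the residuals (what the model so far still gets wrong) rather than to misclassified cases, and the names, array limits and learning rate are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TREES 200    /* illustrative limits, not TreeNet settings */
#define MAX_N     1000

/* A one-split tree ("stump"): x < threshold gets `left`, otherwise `right`. */
typedef struct { double threshold, left, right; } Stump;

typedef struct {
    double base;               /* starting prediction: the mean of y */
    double rate;               /* learning rate applied to every tree */
    size_t n_trees;
    Stump  trees[MAX_TREES];
} Boosted;

/* Prediction = base + rate * (sum of every tree's output). */
double boosted_predict(const Boosted *m, double x)
{
    double f = m->base;
    for (size_t t = 0; t < m->n_trees; t++) {
        const Stump *s = &m->trees[t];
        f += m->rate * (x < s->threshold ? s->left : s->right);
    }
    return f;
}

/* Fit one stump to the residuals r: try each observed x as the split point
   and keep the split with the smallest sum of squared errors. */
static Stump fit_stump(const double *x, const double *r, size_t n)
{
    Stump best = { x[0], 0.0, 0.0 };
    double best_sse = -1.0;    /* negative means "nothing tried yet" */
    for (size_t s = 0; s < n; s++) {
        double sum_l = 0.0, sum_r = 0.0;
        size_t n_l = 0, n_r = 0;
        for (size_t i = 0; i < n; i++) {
            if (x[i] < x[s]) { sum_l += r[i]; n_l++; }
            else             { sum_r += r[i]; n_r++; }
        }
        double mean_l = n_l ? sum_l / (double)n_l : 0.0;
        double mean_r = n_r ? sum_r / (double)n_r : 0.0;
        double sse = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = r[i] - (x[i] < x[s] ? mean_l : mean_r);
            sse += d * d;
        }
        if (best_sse < 0.0 || sse < best_sse) {
            best_sse = sse;
            best.threshold = x[s];
            best.left = mean_l;
            best.right = mean_r;
        }
    }
    return best;
}

/* Build n_trees stumps, each one fitted to what the model so far gets wrong. */
void boosted_fit(Boosted *m, const double *x, const double *y,
                 size_t n, size_t n_trees, double rate)
{
    assert(m != NULL && x != NULL && y != NULL);
    assert(n > 0 && n <= MAX_N && n_trees <= MAX_TREES);

    double r[MAX_N];
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += y[i];

    m->base = sum / (double)n;
    m->rate = rate;
    m->n_trees = 0;

    for (size_t t = 0; t < n_trees; t++) {
        for (size_t i = 0; i < n; i++)
            r[i] = y[i] - boosted_predict(m, x[i]);   /* current errors */
        m->trees[m->n_trees] = fit_stump(x, r, n);
        m->n_trees++;
    }
}
```

Calling `boosted_fit(&m, x, y, n, 20, 0.5)` and then `boosted_predict(&m, value)` scores a new case by adding up all 20 small trees, which is the "add them all up" step in miniature. A production tool works on many predictors, deeper trees and subsamples of the data, all of which this sketch leaves out.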
67. Think about it like this….
• So you get your one tree…
• TreeNet® changes your target to the Misclasses
and creates a second tree….
• And TreeNet® does it again and again
and again while you get a treat for Toto……
• In the end, TreeNet® adds the weights of the variables in
all the trees together…..
[Figure: tree + tree + tree + ….]
• Then you simply export the code and implement the
model!
68. Here’s a bit of what a Treenet Model
looks like to a C Programmer
/**********************************************************
 * The following C source code was automatically generated
 * by the TRANSLATE feature in Salford Predictive Miner(tm).
 * Modeling version: 6.6.0.091, Translation version: 6.6.0.091
 **********************************************************/

#include <string.h> /* for strcmp() */
#include <math.h> /* for exp() */

/**********************************************************
 * **** APPLICATION-DEPENDENT MISSING VALUES ****
 * The two constants must be set **by you** to whatever
 * value(s) you use in your data management or programming
 * workflow to represent missing data.
 **********************************************************/

const double DBL_MISSING_VALUE = /* value needed here! */ ;
const int INT_MISSING_VALUE = /* value needed here! */ ;

/************
 * PREDICTORS
 ************/

double CRIM, ZN, INDUS, NOX, RM, AGE, DIS, RAD, TAX, PT, B, LSTAT, MV;

/***************************************************************
 * Here come the treenets in the grove. A shell for calling them
 * appears at the end of this source file.
 ***************************************************************/
double TreeNet_1(double * const pProb0, double * const pProb1)
{
  /* TreeNet version: 6.6.0.091 */
  /* TreeNet: TreeNet_1 */
  /* Timestamp: 2012043172135 */
  /* Grove: C:DOCUME~1OfficeLOCALS~1Temps5u137 */
  /* Target: CHAS */
  /* N trees: 197 */
  /* N target classes: 2 */

  double target, net_response = 0.0;
  int node, done;
  int response = 0;

  /***************************/
  /* Class-specific treenets */
  /***************************/

  double expsum = 0.0;
  double prob0, score0; /* CHAS = 0 */
  double prob1, score1; /* CHAS = 1 */

  /*******************************************************/
  /* The following predictors had no missing data in */
  /* the learn sample, so the TreeNet model is unable to */
  /* accommodate missing data for them during scoring. */
  /* They must be imputed. These particular values are */
  /* the learn sample medians and/or modes. These are */
  /* provided as a convenience, you may wish to replace */
  /* these expressions with your own. */
  /*******************************************************/

  if (CRIM == DBL_MISSING_VALUE) CRIM = 0.2102;
  if (ZN == DBL_MISSING_VALUE) ZN = 0;
  if (INDUS == DBL_MISSING_VALUE) INDUS = 8.14;
  if (NOX == DBL_MISSING_VALUE) NOX = 0.515;
  if (RM == DBL_MISSING_VALUE) RM = 6.251;
  if (AGE == DBL_MISSING_VALUE) AGE = 74.3;
  if (DIS == DBL_MISSING_VALUE) DIS = 3.4211;
  if (RAD == DBL_MISSING_VALUE) RAD = 5;
  if (TAX == DBL_MISSING_VALUE) TAX = 207;
  if (PT == DBL_MISSING_VALUE) PT = 18.6;
  if (B == DBL_MISSING_VALUE) B = 192.11;
  if (LSTAT == DBL_MISSING_VALUE) LSTAT = 10.3;
  if (MV == DBL_MISSING_VALUE) MV = 21.7;

  /* Tree 1 of 197 */
  /* N terminal nodes = 6, Depth = 5 */

  target = 0.0;
  node = 1; /* start at root node */
  done = 0; /* set at terminal node */

  while (!done) switch (node) {

    case 1:
      if (NOX != DBL_MISSING_VALUE && NOX < 0.755) node = 2;
      else node = -6;
      break;

    case 2:
      if (TAX != DBL_MISSING_VALUE && TAX < 278) node = 3;
      else node = 5;
      break;

    case 3:
      if (RM != DBL_MISSING_VALUE && RM < 5.93) node = -1;
      else node = 4;
      break;
69. TreeNet® code2
Code for the first 3 Trees in the Model…*

    case -1:
      target = -1.202511;
      node = 1;
      done = 1;
      break;

    case 4:
      if (LSTAT != DBL_MISSING_VALUE && LSTAT < 6.13) node = -2;
      else node = -3;
      break;

    case -2:
      target = -1.217944;
      node = 2;
      done = 1;
      break;

    case -3:
      target = -1.2337965;
      node = 3;
      done = 1;
      break;

    case 5:
      if (MV != DBL_MISSING_VALUE && MV < 27.3) node = -4;
      else node = -5;
      break;

    case -4:
      target = -1.2337965;
      node = 4;
      done = 1;
      break;

    case -5:
      target = -1.2231822;
      node = 5;
      done = 1;
      break;

    case -6:
      target = -1.2087922;
      node = 6;
      done = 1;
      break;

    default: /* error */
      target = 0.0;
      done = 1;
      node = 0;
      break;
  }

  net_response += target;

  /* Tree 2 of 197 */
  /* N terminal nodes = 6, Depth = 5 */

  target = 0.0;
  node = 1; /* start at root node */
  done = 0; /* set at terminal node */

  while (!done) switch (node) {

    case 1:
      if (NOX != DBL_MISSING_VALUE && NOX < 0.7155) node = 2;
      else node = -6;
      break;

    case 2:
      if (PT != DBL_MISSING_VALUE && PT < 17.7) node = 3;
      else node = 5;
      break;

    case 3:
      if (TAX != DBL_MISSING_VALUE && TAX < 40.5) node = -1;
      else node = 4;
      break;

    case -1:
      target = 0.024272515;
      node = 1;
      done = 1;
      break;

    case 4:
      if (CRIM != DBL_MISSING_VALUE && CRIM < 0.191425) node = -2;
      else node = -3;
      break;

    case -2:
      target = -0.005427301;
      node = 2;
      done = 1;
      break;

    case -3:
      target = 0.0093125903;
      node = 3;
      done = 1;
      break;

    case 5:
      if (RM != DBL_MISSING_VALUE && RM < 5.5815) node = -4;
      else node = -5;
      break;

    case -4:
      target = 0.00081652142;
      node = 4;
      done = 1;
      break;

    case -5:
      target = -0.0047567333;
      node = 5;
      done = 1;
      break;

    case -6:
      target = 0.01884071;
      node = 6;
      done = 1;
      break;

    default: /* error */
      target = 0.0;
      done = 1;
      node = 0;
      break;
  }

  net_response += target;

  /* Tree 3 of 197 */
  /* N terminal nodes = 6, Depth = 5 */
  (…..)

*NOTE! We multiplied the results times 10,000 to eliminate double precision problems during implementation… Ask me!
70. Imagine that for THOUSANDS
of trees….
71. But back to the Wizard of OCC….
• Forget about the code… it’s just text! IT can handle it!
• What you need to focus on is explaining it
all for the Wizard….
• And that doesn’t mean slapping down a bunch of
code lines
• The Wizard needs to understand how come the
Ruby Slippers fit so well, how the Slippers were
put together, and where the material comes from
(the variables and weights that drive the results)
• Especially if you need to communicate to
customers the effects of wearing the Slippers
– (In modeling terms, like if it is an Origination model
needing Score Factor Codes for Adverse Action
Letters…)
• So here’s one way to do that….
72. CASE STUDY from Real Life….
•Attrition Model – Customer will close all accounts
•Needed Talking Points (Score Factors) to facilitate
attempts to save customer accounts
•Built TreeNet® model to predict probability that a
customer will close all of their accounts
•Identified CART Equivalent Rules for all Accounts
•Pulled new out of sample data for recent periods
•Scored and Validated the results against known
outcomes
•Based on the Probability, generated list of high risk
accounts and pushed to Branches with Score
Factors (rules) appended
73. Attrition Model Process
• Built TreeNet® Model
• Scored Validation Set using model built
• Created new data set appending probability score and Node identifier to each sample
point
• Identified Variable Importance
• Used CART to derive a Regression tree using TreeNet® score as the target
• Compared Variable Importances
• Looked at rules governing each of the like nodes
• Manually went through tree finding Terminal Nodes with like Mean values
• Generalized like nodes based on rules and split thresholds, creating factors such as “Low
Balance,” “Short Time On Books,” “Diminishing Balance Over Last 6 Months,” etc.
• Pruned Tree where possible (without fundamentally changing Rules and split
thresholds)
• Analyzed each step to understand Utility vs. Complexity tradeoffs
• Tested outcome (same data) with the generalized variables
• Tested with repeated out-of-sample Validation sets
• Subjected process to Model Risk Management Unit which independently validated
model and documentation
• Implemented Model
74. A Schematical* Representation of
what I just explained…
Initial Regression Tree
(post- TreeNet®)
*HAH! I love new words….
82. Before…
Tnode 8: 0.00222916     Tnode 11: 0.002227983

Rules, left box:
YRS_OB > 6.145 &&
CONTACTS <= 18.65 &&
NUM_ACCTS <= 4.5 &&
C_BTL > 0.04427 &&
PROFIT_ILE > 24.95 &&
BRANCH <= 290.5 &&
TRANS_NUM > 1325.5 &&
TRANS_NUM <= 2731

Rules, right box:
YRS_OB > 6.145 &&
CONTACTS <= 18.65 &&
NUM_ACCTS <= 4.5 &&
C_BTL > 0.04427 &&
BRANCH > 290.5 &&
BRANCH <= 417.5 &&
NUM_PROD > 4.5 &&
FEE_LEVEL > 0.5055 &&
PROFIT_ILE > 24.95 &&
PROFIT_ILE <= 96.05 &&
MOS_ACTIVE <= 367.315 &&
TRANS_NUM <= 1563

TN8 and TN11: Good for Consolidation, differences can be dealt with

Rules Tnodes 8 & 11
BRANCH = “NORTHERN THRU CENTRAL”
YRS_OB > 6
CONTACTS = “LOW”
TRANS_NUM = “MODERATE”

and After…
TNode 5
YRS_OB > 6.145 &&
CONTACTS > 18.65 &&
FEE_LEVEL <= 0.755 &&
TRANS_NUM <= 540.5

TNode 8
YRS_OB > 6.145 &&
CONTACTS > 18.65 &&
FEE_LEVEL > 0.755

Rules TNodes 5 & 8
BRANCH = “NORTHERN THRU CENTRAL”
YRS_OB > 6
CONTACTS = “LOW”
TRANS_NUM = “MODERATE”

Effect after pruning (Where Art Meets Science):
•TNodes change from 8 and 11 to 5 and 8 (Smaller tree)
•“BRANCH” kept since it applied prior to pruning and aids in list generation and routing
•“YRS_OB” split threshold becomes rounded generalized threshold
•Generalization can still be used
•In this example, “FEE_LEVEL” was not included ( “<= and >” cancel each other out)
•“CONTACTS” thresholds change ( “<= becomes >” ) but threshold still can be used within “Low” designation
•“TRANS_NUM” was kept since it applied prior to pruning and aided in talking points
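Here is one way the generalized rule could be checked in code once the split thresholds have been turned into labels. A minimal sketch in C, assuming hypothetical band edges: the slide gives the labels (“LOW”, “MODERATE”) and the rounded YRS_OB > 6 threshold, but not the cutoffs behind the labels, so the `Bands` values are supplied by the caller. The BRANCH condition is left out to keep it short, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only the fields used by the generalized rule on this slide. */
typedef struct {
    double yrs_ob;      /* YRS_OB */
    double contacts;    /* CONTACTS */
    double trans_num;   /* TRANS_NUM */
} Customer;

/* Hypothetical band edges for turning a raw value into a label. */
typedef struct {
    double low_max;        /* values <= low_max are "LOW" */
    double moderate_max;   /* values <= moderate_max are "MODERATE" */
} Bands;

const char *band_label(double value, const Bands *b)
{
    assert(b != NULL);
    if (value <= b->low_max)      return "LOW";
    if (value <= b->moderate_max) return "MODERATE";
    return "HIGH";
}

/* Nonzero when the customer fits the generalized rule:
   YRS_OB > 6, CONTACTS = "LOW", TRANS_NUM = "MODERATE". */
int matches_generalized_rule(const Customer *c,
                             const Bands *contact_bands,
                             const Bands *trans_bands)
{
    assert(c != NULL);
    return c->yrs_ob > 6.0
        && strcmp(band_label(c->contacts, contact_bands), "LOW") == 0
        && strcmp(band_label(c->trans_num, trans_bands), "MODERATE") == 0;
}
```

The labels double as the talking points: the same `band_label` strings that decide whether the rule fires can be written straight onto the list that goes to the branch.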
83. RUN, Toto, RUN!!!!
• Implement the Dog-gone thing!
Customer | Branch | Risk | Point 1 | Point 2 | Point 3 | Point 4
Bill Muchkinovski | 200 | Low | Long time on books | 6 Mos. Moderate Balance | Moderate number products | Moderate Profit
Millie Smoller | 27 | Med | Short time on books | 6 Mos. Low Balance | Low Number Products | 6 Mos. Low number contacts
Beulah Diminuitive | 343 | Med | Short time on books | 6 Mos. Low Balance | Low Number Products | 6 Mos. High number contacts
Casper Lollipopovich | 721 | High | Long time on books | 6 Mos. Diminishing Balance | 6 Mos. High Number of Contacts | Moderate Profit
Martha Smallkind | 14 | High | Long time on books | 6 Mos. Diminishing Balance | 6 Mos. High Number of Contacts | Moderate Profit
Elmo Munchkinovich | 1 | High | Long time on books | 6 Mos. High Number of Contacts | 6 Mos. High Balance | High Profit
88. Jon’s 30+ years of Predictive Modeling expertise comes
from various segments of the financial industry
including Banking, Consumer Finance, Mortgage, and
Modeling Vendor. He has experience in the U.S.,
Canada, Australia and the United Kingdom. As SVP and
Manager of Predictive Modeling at Union Bank, Jon
introduced Scoring technology in 1995 and provided
Credit Risk research, analytics and Customer
Segmentation strategies, along with many of the Bank’s
Business Intelligence and Operations statistical models.
Jon’s Expertise includes Regulatory oversight and all
things AVM (Automated Valuation Modeling).
In addition to Consulting and Expert Witness
engagements, Jon holds a Master’s Degree in
Counseling Psychology and speaks at a variety of
Industry conferences.
Contact Information:
jcf4now@sbcglobal.net
[Photo caption: The now departed Zeppelin, best human
being I ever knew, proudly displaying the
four balls he so loved to retrieve…]