Optimizing Acceptance Threshold
in Credit Scoring
using Reinforcement Learning
Student: Mykola Herasymovych
Supervisors: Oliver Lukason (PhD)
Karl Märka (MSc)
Credit Scoring Problem
(Crook et al., 2007; Lessmann et al., 2015; Thomas et al., 2017)
• Predict the probability of a loan application being bad:
$\Pr\{\text{Bad} \mid \text{characteristics } \mathbf{x}\} = p(y = 1 \mid \mathbf{x}) = \hat{y}$
• Transform it into a credit score reflecting the application’s creditworthiness level:
$s^{CS}(\mathbf{x}) = s^{CS}(\hat{y}, z)$,
where $s^{CS}$ is the credit score, $\hat{y}$ is the estimated probability and $z$ denotes other factors (e.g. policy rules).
Credit Business Process
(Creditstar Group)
[Flow diagram: Loan application → Estimate credit score → 50%: credit score is high, give loan; 50%: credit score is low, reject application. For a given loan: client repays (money gain) or client doesn’t repay (money loss). Profits change.]
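The accept/reject flow above can be sketched as a small simulation. This is only an illustration under stated assumptions, not Creditstar Group's actual process: the score distribution, the default behaviour and the `GAIN` and `LOSS` amounts are all hypothetical.

```python
import random

GAIN = 20.0    # hypothetical profit on a repaid loan
LOSS = 100.0   # hypothetical loss on a defaulted loan


def weekly_profit(scores, defaults, threshold):
    """Accept applications whose score is at or above the threshold and
    return (acceptance_rate, profit) for the batch."""
    accepted = [(s, d) for s, d in zip(scores, defaults) if s >= threshold]
    profit = sum(-LOSS if d else GAIN for _, d in accepted)
    rate = len(accepted) / len(scores) if scores else 0.0
    return rate, profit


def simulate_applications(n, rng):
    """Draw hypothetical scores in [0, 1]; a higher score means lower default risk."""
    scores = [rng.random() for _ in range(n)]
    defaults = [rng.random() > s for s in scores]  # P(default) = 1 - score
    return scores, defaults


rng = random.Random(0)
scores, defaults = simulate_applications(1000, rng)
for threshold in (0.3, 0.5, 0.7, 0.9):
    rate, profit = weekly_profit(scores, defaults, threshold)
    print(f"threshold={threshold:.1f} acceptance_rate={rate:.2f} profit={profit:.0f}")
```

Raising the threshold lowers the acceptance rate and trades lost gains against avoided losses, which is the trade-off the acceptance threshold controls.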
Acceptance Threshold Optimization
Acceptance Threshold (Viaene and Dedene, 2005; Verbraken et al., 2014; Skarestad, 2017)
Selection Bias (Banasik et al., 2003; Wu and Hand, 2007; Dey, 2010)
Population Drift (Sousa et al., 2013; Bellotti and Crook, 2013; Nikolaidis, 2017)
Credit Scoring Literature 1
(Number of published articles with the “credit scoring” keyword)
[Figure: articles by year and general trend.]
Note: adapted from Louzada et al. (2016) and updated by the author based on literature review.
Credit Scoring Literature 2
(Percentage of papers published on the topic in 1992-2015)
Note: adapted from Louzada et al. (2016) and updated by the author based on literature review.
[Figure: horizontal bar chart, x-axis from 0% to 60%. Categories: New method to propose rating; Comparison of traditional techniques; Conceptual discussion; Variable selection; Literature review; Performance measures; Other issues; Acceptance threshold optimization.]
Shortcomings of the Traditional Approach
• Is static and backward-looking;
• Ignores the uncertainty of the credit scoring model’s performance (Thomas et al., 2017);
• Ignores selection bias (Hand, 2006; Dey, 2010);
• Ignores population drift (Sousa et al., 2013; Nikolaidis, 2017);
• Oversimplifies the lender’s utility function (Finlay, 2010; Skarestad, 2017).
Solution
A Reinforcement Learning (RL) agent:
• a dynamic, forward-looking system
• that adapts to live data feedback
• and adjusts the acceptance threshold
• to maximize an accurately specified lender’s utility function.
Reinforcement Learning
RL Achievements
• Forex, stocks and securities trading (Neuneier, 1996);
• Resource allocation (Tesauro et al., 2006);
• Tax and debt collection optimization (Abe et al., 2010);
• Dynamic pricing (Kim et al., 2016);
• Behavioral marketing (Sato, 2016);
• Bank portfolio optimization (Strydom, 2017).
• To the best of our knowledge, RL has not yet been applied to credit scoring.
Where We Fit
[Diagram: Portfolio Optimization, Credit Scoring and Artificial Intelligence.]
Credit Business Environment
• State (Acceptance Rate): $S(A) = \frac{1}{n_t} \sum_{i=1}^{n_t} \mathbb{1}\left[s_i^{CS} \ge t^{AT}_{a_{t-1}}\right]$
• Action (Acceptance Threshold): $A(S) = \pi(Q(w, X(S)))$
• Reward (Profit): $R(S, A) = \sum_{j=0}^{t} \sum_{a=a_j}^{a_{max}} Profits_j^{a_j}$; for a single action: $R_a(S, A) = \sum_{j=0}^{t} Profits_j^{a_j}$
RL Agent
• Q-Value Function: $Q^{\pi}(S, A) = \mathbb{E}_{\pi}\left[\sum_{i=0}^{\infty} \gamma^i R_{t+i} \mid S_t = s, A_t = a\right]$
• Policy: $\pi(A \mid S) = \mathbb{P}[A_t = a \mid S_t = s]$
Prediction
• RBF Features: $X(S) = \sqrt{\frac{2}{k}} \cos\left(w^{RBF} S(A) + b^{RBF}\right)$, $w^{RBF} \sim N(0, 2\gamma^{RBF})$, $b^{RBF} \sim U(0, 2\pi)$
• Q-values: $Q(w, X) = wX$; for a single action: $Q_a(w_a, X) = w_a X$
• Action: $A(Q) = \pi(A \mid S)$
• Exploitative policy: $\pi^{Greedy}(Q) = \arg\max_a Q(S, A)$
• Explorative policy: $\pi^{Boltzmann}(A \mid S) = \frac{e^{Q(S, A)/\tau}}{\sum_{A' \in \mathcal{A}} e^{Q(S, A')/\tau}}$
Learning
• Q-Value Function Update Rule: $w_a \leftarrow w_a + \alpha_t \left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] \frac{\partial Q(s, a)}{\partial w_a}$, with $\alpha_t = \frac{\alpha_0}{t^{power\_t}}$ and $\frac{\partial Q(s, a)}{\partial w_a} = S_t$
RL parameters:
• $\alpha$ – learning rate;
• $\gamma$ – discount rate;
• $t^{power\_t}$ – inverse scaling parameter of the learning rate.
CS variables:
• $n$ – number of loan applications;
• $s^{CS}$ – credit score;
• $t^{AT}$ – acceptance threshold.
RBF parameters:
• $w^{RBF}$ – RBF weights;
• $b^{RBF}$ – RBF offset values;
• $\gamma^{RBF}$ – variance parameter of a normal distribution;
• $k$ – number of RBF components.
Policy parameters:
• $\tau$ – the temperature parameter of the Boltzmann distribution.
Note: the process repeats at a weekly frequency; $t$ is the week number.
Note: the State object summarizes characteristics of the loan portfolio.
Note: the Action object is mapped to one of 20 discrete values of the acceptance threshold.
Note: the policy is explorative during training episodes and exploitative during test ones.
Note: with the Boltzmann policy above, a lower $\tau$ leads to a greedier policy, while a higher $\tau$ leads to a more random one.
Note: the Q-value function is approximated with Stochastic Gradient Descent (SGD) models.
Note: the RL agent is less responsive during training and more responsive during test episodes.
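Two of the simpler building blocks of this slide, the acceptance-rate state and the inverse-scaling learning rate $\alpha_t = \alpha_0 / t^{power\_t}$, can be sketched as follows. This is a minimal sketch under assumptions: the evenly spaced threshold grid on [0, 1], the example scores and the values of `alpha0` and `power_t` are hypothetical and not taken from the presentation.

```python
import numpy as np

N_ACTIONS = 20  # the slides mention 20 discrete threshold values

# Hypothetical grid: thresholds evenly spaced over a score range of [0, 1].
THRESHOLDS = np.linspace(0.0, 1.0, N_ACTIONS)


def acceptance_rate(scores, threshold):
    """State S: share of this week's applications with score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores >= threshold))


def learning_rate(alpha0, t, power_t):
    """Inverse-scaling schedule: alpha_t = alpha0 / t**power_t, for t >= 1."""
    return alpha0 / (t ** power_t)


scores = np.array([0.15, 0.40, 0.55, 0.80, 0.95])  # made-up credit scores
action = 10  # index into the threshold grid
state = acceptance_rate(scores, THRESHOLDS[action])
print(THRESHOLDS[action], state)
print([learning_rate(0.01, t, 0.25) for t in (1, 16, 81)])
```

The learning rate shrinks as the week counter grows, so later updates move the weights less than early ones.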
Reinforcement Learning (RL)
(Sutton and Barto, 2017)
[Diagram: every week the RL Agent (the dog) observes the State (acceptance rate) of the Credit Business Environment, chooses an Action (acceptance threshold) and receives a Reward (profit) together with the Next State (acceptance rate). Reward examples shown: loss, profit, higher profit.]
Learned Value Function shape
(after 6000 simulated weeks of training)
Notes: state denotes the application acceptance rate during the previous week, action denotes the acceptance
threshold for the following week, value is the prediction of the Value Function model for a particular state-action pair,
optimum shows the state-action pair that corresponds to the highest value in the state-action space.
Traditional Approach
(Baseline)
Notes: the baseline approach follows the methodology of Verbraken et al. (2014) and Skarestad (2017).
Test Simulation Results 1
(shift in score distribution)
Notes: the figures show 100 simulation runs and their average. In each scenario, the mean of the total profit differences is significantly higher than zero according to the one-tailed t-test. Profit is measured in thousands of euros.
Test Simulation Results 2
(shift in default rates)
Notes: the figures show 100 simulation runs and their average. In each scenario, the mean of the total profit differences is significantly higher than zero according to the one-tailed t-test. Profit is measured in thousands of euros.
Performance on the Real Data 1
(acceptance threshold policy)
Notes: the figure shows the difference between the action variables and the baseline action. Baseline denotes the acceptance threshold optimized using the traditional approach, RL chosen denotes the one used by the RL agent, and Value Function-optimal denotes the one that is optimal according to the Value Function model.
Performance on the Real Data 2
(profits received)
Note: the figure shows the difference between the reward variables and the baseline reward. Baseline denotes the profits received with the acceptance threshold optimized using the traditional approach; RL received weekly and total denote the profits received by the RL agent. Profit is measured in thousands of euros.
Implications
• The work improves on the traditional acceptance threshold optimization approach in credit scoring of Verbraken et al. (2014) and Skarestad (2017);
• Solves the problem of optimization in a dynamic, partially observed credit business environment outlined in Thomas et al. (2017) and Nikolaidis (2017);
• Provides more evidence of the superiority of RL-based systems over traditional methodology, in line with Strydom (2017) and Sutton and Barto (2017);
• Produces practical benefit for Creditstar Group as a decision support system.
Conclusions
• The credit scoring literature usually omits the problem of acceptance threshold optimization, despite its significant impact on the efficiency of the credit business;
• The traditional approach fails to optimize the acceptance threshold due to issues like population drift and selection bias;
• The developed RL algorithm manages to correct for the flawed knowledge and successfully adapts to the real environment, significantly outperforming the traditional approach;
• Being a proof of concept, our work leaves considerable room for further research on and improvement of acceptance threshold optimization.
Q&A
Supplementary Materials
Acceptance Threshold Optimization
Traditional Approach 1
(Viaene and Dedene, 2005; Hand, 2009; Lessmann et al., 2015)
• Construct the misclassification cost function:
$MC(t^{AT}; c_B, c_G) = c_B \pi_B^{PP} \left(1 - F_B(t^{AT})\right) + c_G \pi_G^{PP} F_G(t^{AT})$
• Minimize it using the first-order condition (FOC) with respect to the acceptance threshold:
$\frac{f_B(T^{AT})}{f_G(T^{AT})} = \frac{\pi_G^{PP} c_G}{\pi_B^{PP} c_B}$
where $t^{AT}$ is the acceptance threshold, $T^{AT}$ is the optimal acceptance threshold, $c_B$ and $c_G$ are the average costs per misclassified bad (Type I error) and good (Type II error) application respectively, $\pi_G^{PP}$ and $\pi_B^{PP}$ are the prior probabilities of being a good and a bad application respectively, and $f_G(t^{AT})$ and $f_B(t^{AT})$ are the probability densities of the scores at cut-off point $t^{AT}$ for good and bad applications respectively.
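The minimization above can be checked numerically. The following is a minimal sketch, assuming normally distributed scores for bad and good applications; the means, standard deviations, priors and costs are hypothetical values chosen for illustration, not figures from the presentation.

```python
import math


def norm_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Hypothetical inputs: score distributions of bad and good applications,
# prior probabilities and average misclassification costs.
MU_B, SIGMA_B = 0.4, 0.15
MU_G, SIGMA_G = 0.7, 0.15
PI_B, PI_G = 0.2, 0.8
C_B, C_G = 100.0, 20.0


def misclassification_cost(t):
    """MC(t) = c_B * pi_B * (1 - F_B(t)) + c_G * pi_G * F_G(t)."""
    accepted_bad = 1.0 - norm_cdf(t, MU_B, SIGMA_B)
    rejected_good = norm_cdf(t, MU_G, SIGMA_G)
    return C_B * PI_B * accepted_bad + C_G * PI_G * rejected_good


# Grid search for the cost-minimizing threshold on [0, 1].
grid = [i / 10000 for i in range(10001)]
t_opt = min(grid, key=misclassification_cost)

# First-order condition: f_B(T) / f_G(T) = (pi_G * c_G) / (pi_B * c_B).
lhs = norm_pdf(t_opt, MU_B, SIGMA_B) / norm_pdf(t_opt, MU_G, SIGMA_G)
rhs = (PI_G * C_G) / (PI_B * C_B)
print(f"T_AT={t_opt:.4f} density_ratio={lhs:.3f} target_ratio={rhs:.3f}")
```

At the grid minimum the density ratio matches the cost-weighted prior ratio up to the grid resolution, which is what the FOC states.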
Traditional Approach 2
(Viaene and Dedene, 2005; Hand, 2009; Lessmann et al., 2015)
Note: Based on Crook et al. (2007), Hand (2009) and Verbraken et al. (2014). $s^{CS}(\mathbf{x})$ – application’s credit score estimated from the application data $\mathbf{x}$; $f_G(s^{CS})$ and $f_B(s^{CS})$ – probability density functions of the credit score for actual good and bad applications respectively; $t^{AT}$ – acceptance threshold for the credit score; $F_B(t^{AT})$ – correctly classified bad applications; $1 - F_G(t^{AT})$ – correctly classified good applications; $1 - F_B(t^{AT})$ – bad applications misclassified as good ones; $F_G(t^{AT})$ – good applications misclassified as bad ones; the blue line is the estimated potential profit (in thousands of euros, for illustration purposes); grey dotted lines show alternative acceptance thresholds $t_i^{AT}$ and the corresponding levels of potential profit; the vertical red dotted line is the estimated optimal acceptance threshold $T^{AT}$, while the horizontal red dotted lines show the corresponding potential profit and the shares of correctly classified and misclassified good and bad applications.
RL Benefits
• solves optimization problems with little or no prior information about the environment (Kim et al., 2016);
• learns directly from real-time data without any simplifying assumptions (Rana and Oliveira, 2015);
• dynamically adjusts the policy over the learning period, adapting to environmental changes (Abe et al., 2010);
• avoids potentially costly poor performance by training in a simulated environment or learning off-policy (Aihe and Gonzalez, 2015);
• satisfies conflicting performance goals (Varela et al., 2016);
• was found effective in portfolio optimization problems (mainly stock and forex trading) (Strydom, 2017).
Parameters:
• $\alpha$ – learning rate;
• $\gamma$ – discount rate.
Value Update Target:
$Q(S, A) + \alpha\left[R + \gamma \max_a Q(S', a) - Q(S, A)\right]$
[Diagram: the Credit Business Environment passes the State $S(A)$ (acceptance rate) and the Reward $R(S, A)$ (profit) to the RL Agent. Prediction: the Value Function turns the State into Q-values $Q(S)$, and the Policy turns them into the Action $A(Q)$, the acceptance threshold $A(S)$. Learning: the Reward $R(S, A)$ is used to update the Q-values $Q(S, A)$.]
Value Function
The action value function (also called the Q-value function) describes the expected discounted reward of taking action $a$ in state $s$ and following policy $\pi$ thereafter:
$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \cdots \mid S_t = s, A_t = a\right]$,
where $\gamma$ is a discount rate.
Usually, the value function is approximated by a model. In our case, we use a Gaussian Radial Basis Function (RBF) approximator and a set of Stochastic Gradient Descent (SGD) models.
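The sum inside the expectation is easy to compute for one observed reward sequence. The sketch below does that for a finite list; it is a single sample, not the expectation itself, and the weekly profits and the discount rate are made-up values.

```python
def discounted_return(rewards, gamma):
    """Return R_t + gamma * R_{t+1} + gamma**2 * R_{t+2} + ... for a finite list."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


weekly_profits = [10.0, -5.0, 20.0]  # hypothetical rewards, thousands of euros
print(discounted_return(weekly_profits, gamma=0.9))
```

Averaging such returns over many simulated episodes that start from the same state and action would give a Monte Carlo estimate of the Q-value.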
Value Function
[Diagram: State → RBFs transformation → 2000 transformed features → SGD weights → 20 action values → Policy → Action.]
Forward Propagation (Prediction)
Gaussian Radial Basis Functions (RBF) transformation:
$x = \sqrt{\frac{2}{k}} \cos\left(w^{RBF} s + b^{RBF}\right), \quad w^{RBF} \sim N(0, 2\gamma^{RBF}), \quad b^{RBF} \sim U(0, 2\pi)$,
where $x$ is the resulting transformed feature vector, $s$ is the input state variable, $k$ is the number of Monte Carlo samples per original feature, $w^{RBF}$ is a $k$-element vector of randomly generated RBF weights, $b^{RBF}$ is a $k$-element vector of randomly generated RBF offset values and $\gamma^{RBF}$ is the variance parameter of a normal distribution.
Stochastic Gradient Descent (SGD) model for each action:
$Q(w_a, s) = w_a \, RBF(s) = w_a x$,
where $w_a$ is a vector of regression weights for action $a$, $s$ is the state variable, $RBF$ is the RBF transformation function, $x$ is the resulting vector of features and $Q$ is the value of action $a$ in state $s$ corresponding to the feature vector $x$.
Choose an action according to the current policy:
$a = \pi^{Greedy}(s) = \arg\max_a Q(s, a)$
$a \sim \pi^{Boltzmann}(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s] = \frac{e^{Q(s, a)/\tau}}{\sum_{a' \in \mathcal{A}} e^{Q(s, a')/\tau}}$,
where $\mathcal{A}$ is the set of all actions, $a'$ ranges over all actions in $\mathcal{A}$ and $\tau$ is the temperature parameter of the Boltzmann distribution.
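The three prediction steps (RBF transformation, one linear model per action, policy) can be put together in a short NumPy sketch. It is an illustration under assumptions, not the author's implementation: the class name `ForwardPass`, the defaults `k=100`, `gamma_rbf=1.0` and `seed=0`, and the zero weight initialisation are hypothetical. The slides themselves mention 20 actions and 2000 transformed features.

```python
import numpy as np


class ForwardPass:
    """Random cosine (RBF) features followed by one linear model per action."""

    def __init__(self, n_actions=20, k=100, gamma_rbf=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.k = k
        # w_rbf ~ N(0, 2 * gamma_rbf) (variance), b_rbf ~ U(0, 2 * pi)
        self.w_rbf = self.rng.normal(0.0, np.sqrt(2.0 * gamma_rbf), size=k)
        self.b_rbf = self.rng.uniform(0.0, 2.0 * np.pi, size=k)
        # One weight vector w_a per action, initialised at zero (an assumption).
        self.weights = np.zeros((n_actions, k))

    def features(self, s):
        """x = sqrt(2 / k) * cos(w_rbf * s + b_rbf) for a scalar state s."""
        return np.sqrt(2.0 / self.k) * np.cos(self.w_rbf * s + self.b_rbf)

    def q_values(self, s):
        """Q(w_a, s) = w_a . x for every action a."""
        return self.weights @ self.features(s)

    def greedy(self, s):
        """Exploitative policy: the action with the highest Q-value."""
        return int(np.argmax(self.q_values(s)))

    def boltzmann_probs(self, s, tau):
        """Explorative policy: softmax of Q / tau over all actions."""
        q = self.q_values(s) / tau
        q -= q.max()  # subtract the maximum for numerical stability
        expq = np.exp(q)
        return expq / expq.sum()

    def boltzmann(self, s, tau):
        probs = self.boltzmann_probs(s, tau)
        return int(self.rng.choice(len(probs), p=probs))


model = ForwardPass()
model.weights[7] = model.features(0.6)  # make action 7 look best at s = 0.6
print(model.greedy(0.6))
print(model.boltzmann_probs(0.6, tau=0.1)[7], model.boltzmann_probs(0.6, tau=100.0)[7])
```

With the formula as written, a small $\tau$ concentrates the probability mass on the highest-valued action, while a large $\tau$ pushes the probabilities toward uniform.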
Backward Propagation (Learning)
The approximation error is:
$R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)$
To adjust the SGD model weights in the direction of the steepest error descent, we use the following update rule:
$w_a \leftarrow w_a + \alpha\left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] \frac{\partial Q(s, a)}{\partial w_a}$,
which, under the assumption that $\gamma \max_a Q(S_{t+1}, a)$ does not depend on $w_a$, simplifies to the general SGD update rule:
$w_a \leftarrow w_a + \alpha\left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] S_t$,
where $Q(S_t, A_t)$ can be thought of as the current model prediction, $R_t + \gamma \max_a Q(S_{t+1}, a)$ as the target and $S_t$ as the gradient with respect to the weights.
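The update rule can be written as a single function for linear per-action models. This is a minimal sketch with one stated assumption: the state enters through a feature vector `x` (for example its RBF features), which plays the role of $S_t$ in the formulas above. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def q_learning_update(weights, x, action, reward, x_next, alpha, gamma):
    """One semi-gradient Q-learning step for linear per-action models.

    weights: array of shape (n_actions, n_features), one row w_a per action
    x, x_next: feature vectors of the current and the next state
    Returns the approximation (temporal-difference) error.
    """
    prediction = weights[action] @ x                      # Q(S_t, A_t)
    target = reward + gamma * np.max(weights @ x_next)    # R_t + gamma * max_a Q(S_{t+1}, a)
    td_error = target - prediction
    # The target is treated as a constant, so the gradient w.r.t. w_a is x.
    weights[action] += alpha * td_error * x
    return td_error


weights = np.zeros((3, 2))
x = np.array([1.0, 0.0])
x_next = np.array([0.0, 1.0])
error = q_learning_update(weights, x, action=1, reward=5.0, x_next=x_next,
                          alpha=0.1, gamma=0.9)
print(error, weights[1])
```

Only the weights of the action that was actually taken are changed; the models of the other actions are left untouched.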
Learning Episode
[Timeline in weeks with marks at -52, 0, 60 and 82, split into three phases: Warming-up Phase, Interaction Phase and Delayed Learning Phase. Labelled events: learning and state-action generation starts; simulation starts; state-action generation ends; learning and simulation ends.]
Value Function Model Convergence
[Figure panels: 1st episode; Whole run.]
Note: State denotes the application acceptance rate during the previous iteration, action denotes the acceptance threshold for the following iteration, value is the prediction of the Value Function model for a particular state-action pair, and optimum shows the state-action pair that corresponds to the highest value in the state-action space.
Results of the t-test for Various Distortion Scenarios
Scenario                                             t-statistic   p-value
Scenario 1: downwards shift in score distribution    29.56631      1.55E-51
Scenario 2: upwards shift in score distribution      42.72066      2.45E-66
Scenario 3: downwards shift in default rates         5.172688      5.95E-07
Scenario 4: upwards shift in default rates           4.600158      6.20E-06
Note: the t-test null hypothesis is that the mean difference between the episode reward received by the RL agent and the episode reward received using the traditional approach throughout 100 episodes is equal to or lower than zero.
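The test statistic for such a one-tailed test can be computed as sketched below. The 100 differences are synthetic numbers generated only for illustration; they are not the presentation's data and will not reproduce the table. The critical value 1.660 is the approximate one-tailed 5% quantile of the t-distribution with 99 degrees of freedom.

```python
import math
import random
import statistics


def one_sample_t_statistic(differences):
    """t = mean / (sample standard deviation / sqrt(n)) for H0: mean <= 0."""
    n = len(differences)
    mean = statistics.fmean(differences)
    std = statistics.stdev(differences)  # uses n - 1 in the denominator
    return mean / (std / math.sqrt(n))


# Synthetic, made-up episode reward differences (RL minus baseline).
rng = random.Random(42)
differences = [rng.gauss(2.0, 5.0) for _ in range(100)]

t_stat = one_sample_t_statistic(differences)
T_CRITICAL = 1.660  # approximate one-tailed 5% critical value, 99 degrees of freedom
print(f"t = {t_stat:.3f}, reject H0 at 5%: {t_stat > T_CRITICAL}")
```

The null hypothesis is rejected only when the statistic is large and positive, because the alternative is that the RL agent earns more than the baseline on average.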
Credit Scoring Literature
• Thomas, L. C., D. B. Edelman, and J. N. Crook. "Credit Scoring and Its Applications." SIAM, Philadelphia (2017);
• Crook, Jonathan N., David B. Edelman, and Lyn C. Thomas. "Recent developments in consumer credit risk assessment." European Journal of Operational Research 183.3 (2007): 1447-1465;
• Hand, David J. "Measuring classifier performance: a coherent alternative to the area under the ROC curve." Machine Learning 77.1 (2009): 103-123;
• Verbraken, Thomas, et al. "Development and application of consumer credit scoring models using profit-based classification measures." European Journal of Operational Research 238.2 (2014): 505-513;
• Viaene, Stijn, and Guido Dedene. "Cost-sensitive learning and decision making revisited." European Journal of Operational Research 166.1 (2005): 212-220;
• Lessmann, Stefan, et al. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research." European Journal of Operational Research 247.1 (2015): 124-136;
• Oliver, R. M., and L. C. Thomas. "Optimal score cutoffs and pricing in regulatory capital in retail credit portfolios." (2009);
• Bellotti, Tony, and Jonathan Crook. "Forecasting and stress testing credit card default using dynamic models." International Journal of Forecasting 29.4 (2013): 563-574.
RL Literature
• Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 2017;
• Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529;
• Neuneier, Ralph. "Optimal asset allocation using adaptive dynamic programming." Advances in Neural Information Processing Systems. 1996;
• Tesauro, Gerald, et al. "A hybrid reinforcement learning approach to autonomic resource allocation." Autonomic Computing, 2006. ICAC'06. IEEE International Conference on. IEEE, 2006;
• Abe, Naoki, et al. "Optimizing debt collections using constrained reinforcement learning." Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010;
• Kim, Byung-Gook, et al. "Dynamic pricing and energy consumption scheduling with reinforcement learning." IEEE Transactions on Smart Grid 7.5 (2016): 2187-2198;
• Sato, Masamichi. "Quantitative Realization of Behavioral Economic Heuristics by Cognitive Category: Consumer Behavior Marketing with Reinforcement Learning." (2016);
• Strydom, Petrus. "Funding optimization for a bank integrating credit and liquidity risk." Journal of Applied Finance and Banking 7.2 (2017): 1;
• Aihe, David O., and Avelino J. Gonzalez. "Correcting flawed expert knowledge through reinforcement learning." Expert Systems with Applications 42.17-18 (2015): 6457-6471;
• Rana, Rupal, and Fernando S. Oliveira. "Dynamic pricing policies for interdependent perishable products or services using reinforcement learning." Expert Systems with Applications 42.1 (2015): 426-436;
• Varela, Martín, Omar Viera, and Franco Robledo. "A q-learning approach for investment decisions." Trends in Mathematical Economics. Springer, Cham, 2016. 347-368.

ย 
Eesti Panga majandusprognoos 2022โ€“2025
Eesti Panga majandusprognoos 2022โ€“2025Eesti Panga majandusprognoos 2022โ€“2025
Eesti Panga majandusprognoos 2022โ€“2025
ย 
Madis Mรผller. Inflatsiooni pรตhjused, vรคljavaated ja rahapoliitika roll
Madis Mรผller. Inflatsiooni pรตhjused, vรคljavaated ja rahapoliitika rollMadis Mรผller. Inflatsiooni pรตhjused, vรคljavaated ja rahapoliitika roll
Madis Mรผller. Inflatsiooni pรตhjused, vรคljavaated ja rahapoliitika roll
ย 
Marko Allikson. Energiaturu olukorrast
Marko Allikson. Energiaturu olukorrastMarko Allikson. Energiaturu olukorrast
Marko Allikson. Energiaturu olukorrast
ย 
Fabio Canovaand Evi Pappa. Costly disasters, energy consumption, and the role...
Fabio Canovaand Evi Pappa. Costly disasters, energy consumption, and the role...Fabio Canovaand Evi Pappa. Costly disasters, energy consumption, and the role...
Fabio Canovaand Evi Pappa. Costly disasters, energy consumption, and the role...
ย 
Romain Duval. IMF Regional Economic Outlook for Europe
Romain Duval. IMF Regional Economic Outlook for EuropeRomain Duval. IMF Regional Economic Outlook for Europe
Romain Duval. IMF Regional Economic Outlook for Europe
ย 
Finantsstabiilsuse รœlevaade 2022/2
Finantsstabiilsuse รœlevaade 2022/2Finantsstabiilsuse รœlevaade 2022/2
Finantsstabiilsuse รœlevaade 2022/2
ย 

Kรผrzlich hochgeladen

Top Rated Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...Call Girls in Nagpur High Profile
ย 
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...ssifa0344
ย 
Booking open Available Pune Call Girls Talegaon Dabhade 6297143586 Call Hot ...
Booking open Available Pune Call Girls Talegaon Dabhade  6297143586 Call Hot ...Booking open Available Pune Call Girls Talegaon Dabhade  6297143586 Call Hot ...
Booking open Available Pune Call Girls Talegaon Dabhade 6297143586 Call Hot ...Call Girls in Nagpur High Profile
ย 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfGale Pooley
ย 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
ย 
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbaiVasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbaipriyasharma62062
ย 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfGale Pooley
ย 
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...dipikadinghjn ( Why You Choose Us? ) Escorts
ย 
The Economic History of the U.S. Lecture 23.pdf
The Economic History of the U.S. Lecture 23.pdfThe Economic History of the U.S. Lecture 23.pdf
The Economic History of the U.S. Lecture 23.pdfGale Pooley
ย 
Top Rated Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...Call Girls in Nagpur High Profile
ย 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptxFinTech Belgium
ย 
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...dipikadinghjn ( Why You Choose Us? ) Escorts
ย 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfGale Pooley
ย 
Top Rated Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...
Top Rated  Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...Top Rated  Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...
Top Rated Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...Call Girls in Nagpur High Profile
ย 
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
ย 
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )Pooja Nehwal
ย 
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...dipikadinghjn ( Why You Choose Us? ) Escorts
ย 
Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfSaviRakhecha1
ย 

Kรผrzlich hochgeladen (20)

Top Rated Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex...
ย 
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
ย 
Booking open Available Pune Call Girls Talegaon Dabhade 6297143586 Call Hot ...
Booking open Available Pune Call Girls Talegaon Dabhade  6297143586 Call Hot ...Booking open Available Pune Call Girls Talegaon Dabhade  6297143586 Call Hot ...
Booking open Available Pune Call Girls Talegaon Dabhade 6297143586 Call Hot ...
ย 
The Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdfThe Economic History of the U.S. Lecture 19.pdf
The Economic History of the U.S. Lecture 19.pdf
ย 
Call Girls in New Ashok Nagar, (delhi) call me [9953056974] escort service 24X7
Call Girls in New Ashok Nagar, (delhi) call me [9953056974] escort service 24X7Call Girls in New Ashok Nagar, (delhi) call me [9953056974] escort service 24X7
Call Girls in New Ashok Nagar, (delhi) call me [9953056974] escort service 24X7
ย 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
ย 
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
(INDIRA) Call Girl Mumbai Call Now 8250077686 Mumbai Escorts 24x7
ย 
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbaiVasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
Vasai-Virar Fantastic Call Girls-9833754194-Call Girls MUmbai
ย 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdf
ย 
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...
( Jasmin ) Top VIP Escorts Service Dindigul ๐Ÿ’ง 7737669865 ๐Ÿ’ง by Dindigul Call G...
ย 
The Economic History of the U.S. Lecture 23.pdf
The Economic History of the U.S. Lecture 23.pdfThe Economic History of the U.S. Lecture 23.pdf
The Economic History of the U.S. Lecture 23.pdf
ย 
Top Rated Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Dighi โŸŸ 6297143586 โŸŸ Call Me For Genuine Sex Servi...
ย 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx
ย 
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...
VIP Call Girl in Mumbai ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday Wit...
ย 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdf
ย 
Top Rated Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...
Top Rated  Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...Top Rated  Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...
Top Rated Pune Call Girls Sinhagad Road โŸŸ 6297143586 โŸŸ Call Me For Genuine S...
ย 
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
ย 
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )
Vip Call US ๐Ÿ“ž 7738631006 โœ…Call Girls In Sakinaka ( Mumbai )
ย 
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road ๐Ÿ’ง 9920725232 ( Call Me ) Get A New Crush Everyday ...
ย 
Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdf
ย 

Mykola Herasymovych: Optimizing Acceptance Threshold in Credit Scoring using Reinforcement Learning

  • 1. Optimizing Acceptance Threshold in Credit Scoring using Reinforcement Learning. Student: Mykola Herasymovych. Supervisors: Oliver Lukason (PhD), Karl Märka (MSc)
  • 2. Credit Scoring Problem (Crook et al., 2007; Lessmann et al., 2015; Thomas et al., 2017)
      • Predict the probability of a loan application being bad: $\Pr\{\text{Bad} \mid \text{characteristics } \boldsymbol{x}\} = p(y = 1 \mid \boldsymbol{x}) = \hat{y}$
      • Transform it into a credit score reflecting the application's creditworthiness level: $s^{CS}(\boldsymbol{x}) = s^{CS}(\hat{y}, z)$, where $s^{CS}$ is the credit score, $\hat{y}$ is the estimated probability and $z$ denotes other factors (e.g. policy rules)
  • 3. Credit Business Process (Creditstar Group). Flowchart: loan application; estimate credit score; 50%: credit score is high, give loan; 50%: credit score is low, reject application; client repays: money gain; client doesn't repay: money loss; profits change credit score.
  • 4. Acceptance Threshold Optimization. Diagram built around the thesis title "Optimizing Acceptance Threshold in Credit Scoring using Reinforcement Learning" with three labels:
      • Acceptance Threshold (Viaene and Dedene, 2005; Verbraken et al., 2014; Skarestad, 2017)
      • Selection Bias (Banasik et al., 2003; Wu and Hand, 2007; Dey, 2010)
      • Population Drift (Sousa et al., 2013; Bellotti and Crook, 2013; Nikolaidis, 2017)
  • 5. Credit Scoring Literature 1 (number of published articles with the "credit scoring" keyword). Figure: articles by year and general trend (vertical axis from 0 to 300). Note: adapted from Louzada et al. (2016) and updated by the author based on literature review.
  • 6. Credit Scoring Literature 2 (percentage of papers published on the topic in 1992-2015). Bar chart (horizontal axis from 0% to 60%) with the categories: new method to propose rating; comparison in traditional techniques; conceptual discussion; variable selection; literature review; performance measures; other issues; acceptance threshold optimization. Note: adapted from Louzada et al. (2016) and updated by the author based on literature review.
  • 7. Shortcomings of Traditional Approach
      • Is static and backward-looking;
      • Ignores the credit scoring model's performance uncertainty (Thomas et al., 2017);
      • Ignores selection bias (Hand, 2006; Dey, 2010);
      • Ignores population drift (Sousa et al., 2013; Nikolaidis, 2017);
      • Oversimplifies the lender's utility function (Finlay, 2010; Skarestad, 2017).
  • 8. Solution: a Reinforcement Learning (RL) agent:
      • a dynamic forward-looking system
      • that adapts to live data feedback
      • and adjusts the acceptance threshold
      • to maximize an accurately specified lender's utility function.
  • 9. RL Achievements
      • Forex, stocks and securities trading (Neuneier, 1996);
      • Resource allocation (Tesauro et al., 2006);
      • Tax and debt collection optimization (Abe et al., 2010);
      • Dynamic pricing (Kim et al., 2016);
      • Behavioral marketing (Sato, 2016);
      • Bank portfolio optimization (Strydom, 2017).
      • RL has not been applied to credit scoring yet, to the best of our knowledge.
  • 11. Credit Business Environment and RL Agent (process diagram)
      Environment:
      • Acceptance Rate State: $S(A) = \frac{1}{n_t} \sum_{i=1}^{n_t} \mathbb{1}\left[s_i^{CS} \ge t^{AT}_{a_{t-1}}\right]$
      • Profit Reward $R(S, A)$ and per-action reward $R_a(S, A)$: profits summed over weeks $j = 0, \dots, t$
      Prediction:
      • RBF Features: $X(S) = \sqrt{2/k}\, \cos\left(w^{RBF} S(A) + c^{RBF}\right)$, $w^{RBF} \sim N(0, 2\gamma^{RBF})$, $c^{RBF} \sim U(0, 2\pi)$
      • Q-values: $Q(w, X) = wX$; for each action, $Q_a(w_a, X) = w_a X$
      • Policy: $\pi(A \mid S) = \mathbb{P}[A_t = a \mid S_t = s]$
      • Acceptance Threshold Action: $A(S) = \pi(Q(w, X(S)))$
      • Exploitative policy: $\pi^{Greedy}(Q) = \arg\max_{a} Q(S, A)$
      • Explorative policy: $\pi^{Boltzmann}(A \mid S) = \dfrac{e^{Q(S, A)/\tau}}{\sum_{A' \in \mathcal{A}} e^{Q(S, A')/\tau}}$
      Learning:
      • Q-Value Function: $Q_\pi(S, A) = \mathbb{E}_\pi\left[\sum_{i=0}^{\infty} \gamma^i R_{t+i} \mid S_t = s, A_t = a\right]$
      • Q-Value Function Update Rule: $w_a \leftarrow w_a + \alpha_t \left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] \dfrac{\partial Q(s, a)}{\partial w_a}$, with $\alpha_t = \dfrac{\alpha_0}{t^{\text{power\_t}}}$ and $\dfrac{\partial Q(s, a)}{\partial w_a} = S_t$
      RL parameters: $\alpha$ – learning rate; $\gamma$ – discount rate; $t^{\text{power\_t}}$ – inverse scaling parameter of the learning rate.
      CS variables: $n$ – number of loan applications; $s^{CS}$ – credit score; $t^{AT}$ – acceptance threshold.
      RBF parameters: $w^{RBF}$ – RBF weights; $c^{RBF}$ – RBF offset values; $\gamma^{RBF}$ – variance parameter of a normal distribution; $k$ – number of RBF components.
      Policy parameters: $\tau$ – the temperature parameter of the Boltzmann distribution.
      Note: the process repeats at a weekly frequency; $t$ is the week number.
      Note: the State object summarizes characteristics of the loan portfolio.
      Note: the Action object is mapped to one of 20 discrete values of the acceptance threshold.
      Note: the policy is explorative during training episodes and exploitative during test ones.
      Note: with the temperature form above, a lower $\tau$ leads to a more greedy policy, while a higher $\tau$ leads to a more random one.
      Note: the Q-value function is approximated with Stochastic Gradient Descent (SGD) models.
      Note: the RL agent is less responsive during training and more responsive during test episodes.
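
To make the state and the learning-rate schedule on slide 11 concrete, here is a minimal Python sketch. It assumes the state is the share of the week's applications whose score is at or above the threshold, and that the schedule has the inverse-scaling form $\alpha_t = \alpha_0 / t^{\text{power\_t}}$; the function names and the sample scores are illustrative and do not come from the slides.

```python
def acceptance_rate(scores, threshold):
    """Share of applications whose credit score is at or above the threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def learning_rate(alpha0, t, power_t):
    """Inverse-scaling schedule alpha_t = alpha0 / t**power_t, for week t >= 1."""
    return alpha0 / (t ** power_t)


# Made-up credit scores for one week of applications (assumption).
weekly_scores = [0.35, 0.62, 0.48, 0.91, 0.77]
state = acceptance_rate(weekly_scores, threshold=0.5)

# The step size shrinks as the week counter grows.
alpha_week_4 = learning_rate(alpha0=0.1, t=4, power_t=0.5)
```

With these inputs three of the five applications pass the threshold, so the state is 0.6, and the step size at week 4 is half of the initial one.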
  • 12. Reinforcement Learning (RL) (Sutton and Barto, 2017). Animated diagram: each week the RL Agent (the dog) observes the Acceptance Rate State of the Credit Business Environment, chooses the Acceptance Threshold Action and receives a Profit Reward (shown as a Loss Reward or a Higher Profit Reward), after which the environment moves to the Acceptance Rate Next State.
  • 13. Learned Value Function shape (after 6000 simulated weeks of training). Notes: state denotes the application acceptance rate during the previous week, action denotes the acceptance threshold for the following week, value is the prediction of the Value Function model for a particular state-action pair, and optimum shows the state-action pair that corresponds to the highest value in the state-action space.
  • 14. Traditional Approach (Baseline). Notes: the baseline approach follows the methodology of Verbraken et al. (2014) and Skarestad (2017).
  • 15. Test Simulation Results 1 (shift in score distribution). Notes: figures show 100 simulation runs and their average. In each scenario the mean of the total profit differences is significantly higher than zero according to the one-tailed t-test. Profit is measured in thousands of euros.
  • 16. Test Simulation Results 2 (shift in default rates). Notes: figures show 100 simulation runs and their average. In each scenario the mean of the total profit differences is significantly higher than zero according to the one-tailed t-test. Profit is measured in thousands of euros.
  • 17. Performance on the Real Data 1 (acceptance threshold policy). Notes: the figure shows the difference between the action variables and the baseline action. Baseline denotes the acceptance threshold optimized using the traditional approach, RL chosen denotes the one used by the RL agent, and Value Function-optimal denotes the one that is optimal according to the Value Function model.
  • 18. Performance on the Real Data 2 (profits received). Note: the figure shows the difference between the reward variables and the baseline reward. Baseline denotes the profits received with the acceptance threshold optimized using the traditional approach; RL received weekly and total denote the profits received by the RL agent. Profit is measured in thousands of euros.
  • 19. Implications
      • The work improves on the traditional acceptance threshold optimization approach in credit scoring of Verbraken et al. (2014) and Skarestad (2017);
      • Solves the problem of optimization in a dynamic, partially observed credit business environment outlined in Thomas et al. (2017) and Nikolaidis (2017);
      • Provides more evidence of the superiority of RL-based systems over traditional methodology, in line with Strydom (2017) and Sutton and Barto (2017);
      • Produces practical benefit to Creditstar Group as a decision support system.
  • 20. Conclusions
      • The credit scoring literature usually omits the problem of acceptance threshold optimization, despite its significant impact on credit business efficiency;
      • The traditional approach fails to optimize the acceptance threshold due to issues like population drift and selection bias;
      • The developed RL algorithm manages to correct for the flawed knowledge and successfully adapts to the real environment, significantly outperforming the traditional approach;
      • Being a proof of concept, our work leaves considerable room for further research on and improvement of acceptance threshold optimization.
      Q&A
  • 22. Acceptance Threshold Optimization. Diagram built around the thesis title "Optimizing Acceptance Threshold in Credit Scoring using Reinforcement Learning", showing the Acceptance Threshold label.
  • 23. Traditional Approach 1 (Viaene and Dedene, 2005; Hand, 2009; Lessmann et al., 2015)
      • Construct the misclassification costs function: $MC(t^{AT}; c_B, c_G) = c_B\, \pi_B^{PP} \left(1 - F_B(t^{AT})\right) + c_G\, \pi_G^{PP}\, F_G(t^{AT})$
      • Minimize using the first-order condition (FOC) with respect to the acceptance threshold: $\dfrac{f_B(T^{AT})}{f_G(T^{AT})} = \dfrac{\pi_G^{PP}\, c_G}{\pi_B^{PP}\, c_B}$
      Here $t^{AT}$ is the acceptance threshold, $T^{AT}$ is the optimal acceptance threshold, $c_B$ and $c_G$ are the average costs per misclassified bad (Type I error) and good (Type II error) application respectively, $\pi_G^{PP}$ and $\pi_B^{PP}$ are the prior probabilities of being a good and a bad application respectively, and $f_G(t^{AT})$ and $f_B(t^{AT})$ are the probability densities of the scores at cut-off point $t^{AT}$ for good and bad applications respectively.
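
The two steps on slide 23 can be checked numerically. The sketch below is only an illustration under stated assumptions: scores of good and bad applications are taken to be normally distributed, applications with a score at or above the threshold are accepted, and all costs, priors and distribution parameters are made-up values, not figures from the thesis. It minimizes $MC$ on a grid and then compares the density ratio at the minimizer with the right-hand side of the first-order condition.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu, sigma):
    """Probability density function of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical inputs (assumptions, not values from the slides).
C_B, C_G = 5.0, 1.0      # average cost per misclassified bad / good application
PI_B, PI_G = 0.2, 0.8    # prior probabilities of bad / good applications
BAD = (0.4, 0.1)         # mean and std of scores of bad applications
GOOD = (0.6, 0.1)        # mean and std of scores of good applications


def misclassification_cost(t):
    """MC(t) = c_B * pi_B * (1 - F_B(t)) + c_G * pi_G * F_G(t)."""
    accepted_bad = 1.0 - normal_cdf(t, *BAD)   # bad applications scored above t
    rejected_good = normal_cdf(t, *GOOD)       # good applications scored below t
    return C_B * PI_B * accepted_bad + C_G * PI_G * rejected_good


# Grid search over candidate thresholds in [0, 1].
grid = [i / 1000 for i in range(1001)]
t_opt = min(grid, key=misclassification_cost)

# First-order condition: f_B(T) / f_G(T) should equal (pi_G * c_G) / (pi_B * c_B).
density_ratio = normal_pdf(t_opt, *BAD) / normal_pdf(t_opt, *GOOD)
cost_ratio = (PI_G * C_G) / (PI_B * C_B)
```

With these inputs the grid minimizer sits near 0.511, and the density ratio there agrees with the cost ratio of 0.8 up to the grid resolution.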
  • 24. Traditional Approach 2 (Viaene and Dedene, 2005; Hand, 2009; Lessmann et al., 2015). Note: based on Crook et al. (2007), Hand (2009) and Verbraken et al. (2014). Figure legend:
      • $s^{CS}(\boldsymbol{x})$ – the application's credit score estimated from the application data $\boldsymbol{x}$;
      • $f_G(s^{CS})$ and $f_B(s^{CS})$ – the credit score's probability density functions for actual good and bad applications respectively;
      • $t^{AT}$ – acceptance threshold for the credit score;
      • $F_B(t^{AT})$ – correctly classified bad applications;
      • $1 - F_G(t^{AT})$ – correctly classified good applications;
      • $1 - F_B(t^{AT})$ – bad applications misclassified as good ones;
      • $F_G(t^{AT})$ – good applications misclassified as bad ones;
      • the blue line is the estimated potential profit (in thousands of euros, for illustration purposes);
      • grey dotted lines show alternative acceptance thresholds $t_i^{AT}$ and the corresponding levels of potential profit;
      • the vertical red dotted line is the estimated optimal acceptance threshold $T^{AT}$, while the horizontal red dotted lines show the corresponding potential profit and the shares of correctly classified and misclassified good and bad applications.
  • 25. RL Benefits
      • solves optimization problems with little or no prior information about the environment (Kim et al., 2016);
      • learns directly from real-time data without any simplifying assumptions (Rana and Oliveira, 2015);
      • dynamically adjusts the policy over the learning period, adapting to environmental changes (Abe et al., 2010);
      • avoids potentially costly poor performance by training in a simulated environment or learning off-policy (Aihe and Gonzalez, 2015);
      • satisfies conflicting performance goals (Varela et al., 2016);
      • was found effective in portfolio optimization problems, mainly stock and forex trading (Strydom, 2017).
  • 26. Credit Business Environment and RL Agent (simplified process diagram)
      • Environment: Profit Reward $R(S, A)$; Acceptance Rate State $S(A)$
      • Agent: Value Function and Policy, producing the Acceptance Threshold Action $A(S)$
      • Prediction: State $S(A)$; Q-values $Q(S)$; Action $A(Q)$
      • Learning: Reward $R(S, A)$; Q-values $Q(S, A)$
      • Value Update Target: $Q(S, A) + \alpha\left[R + \gamma \max_a Q(S', a) - Q(S, A)\right]$
      Parameters: $\alpha$ – learning rate; $\gamma$ – discount rate.
  • 27. Value Function. The action value function (also called the Q-value function) describes the expected discounted reward of taking action $a$ in state $s$ and following policy $\pi$ thereafter: $Q_\pi(s, a) = \mathbb{E}_\pi\left[R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \dots \mid S_t = s, A_t = a\right]$, where $\gamma$ is a discount rate. Usually, the value function is approximated by a model. In our case, we use a Gaussian Radial Basis Functions approximator and a set of Stochastic Gradient Descent models.
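
As a small worked example of the discounted sum inside the expectation on slide 27, the Python sketch below computes $R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \dots$ for one finite sequence of rewards. The reward values are made up; the Q-value itself is the expectation of this quantity over trajectories, which the sketch does not estimate.

```python
def discounted_return(rewards, gamma):
    """Return R_0 + gamma * R_1 + gamma**2 * R_2 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so that each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Made-up weekly profits (assumption).
weekly_profits = [1.0, 2.0, 3.0]
g = discounted_return(weekly_profits, gamma=0.5)  # 1 + 0.5 * 2 + 0.25 * 3
```

Here the result is 2.75: later rewards count for less as $\gamma$ falls below 1.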
  • 29. Forward Propagation (Prediction)
      • Gaussian Radial Basis Functions (RBF) transformation: $x = \sqrt{2/k}\, \cos\left(w^{RBF} s + c^{RBF}\right)$, $w^{RBF} \sim N(0, 2\gamma^{RBF})$, $c^{RBF} \sim U(0, 2\pi)$, where $x$ is the resulting transformed feature vector, $s$ is the input state variable, $k$ is the number of Monte Carlo samples per original feature, $w^{RBF}$ is a $k$-element vector of randomly generated RBF weights, $c^{RBF}$ is a $k$-element vector of randomly generated RBF offset values and $\gamma^{RBF}$ is the variance parameter of a normal distribution.
      • Stochastic Gradient Descent (SGD) model for each action: $Q(w_a, s) = w_a\, RBF(s) = w_a x$, where $w_a$ is a vector of regression weights for action $a$, $s$ is the state variable, $RBF$ is the RBF transformation function, $x$ is the resulting vector of features and $Q$ is the value of action $a$ in state $s$ corresponding to the feature vector $x$.
      • Choose the action according to the current policy, either $a = \pi^{Greedy}(s) = \arg\max_a Q(s, a)$ or $a \sim \pi^{Boltzmann}(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s] = \dfrac{e^{Q(s, a)/\tau}}{\sum_{a' \in \mathcal{A}} e^{Q(s, a')/\tau}}$, where $\mathcal{A}$ is the set of all actions, $a'$ ranges over all actions in $\mathcal{A}$ and $\tau$ is the temperature parameter of the Boltzmann distribution.
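
The prediction steps on slide 29 can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: it assumes a scalar state, reads $N(0, 2\gamma^{RBF})$ as a variance (so the standard deviation is $\sqrt{2\gamma^{RBF}}$), and uses made-up values for $k$, $\gamma^{RBF}$, $\tau$ and the per-action weights. All function names are hypothetical.

```python
import math
import random


def make_rbf(k, gamma_rbf, rng):
    """Draw k random weights and offsets for the cosine feature map."""
    std = math.sqrt(2.0 * gamma_rbf)  # assumption: 2 * gamma_rbf is a variance
    weights = [rng.gauss(0.0, std) for _ in range(k)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(k)]
    return weights, offsets


def rbf_features(s, weights, offsets):
    """x = sqrt(2 / k) * cos(w * s + c) for a scalar state s."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(w * s + c) for w, c in zip(weights, offsets)]


def q_values(x, action_weights):
    """One linear model per action: Q(w_a, s) = w_a . x."""
    return [sum(wi * xi for wi, xi in zip(w_a, x)) for w_a in action_weights]


def greedy(q):
    """Exploitative policy: index of the action with the highest Q-value."""
    return max(range(len(q)), key=q.__getitem__)


def boltzmann_probs(q, tau):
    """Explorative policy: softmax of Q / tau (max subtracted for stability)."""
    m = max(q)
    exps = [math.exp((v - m) / tau) for v in q]
    z = sum(exps)
    return [e / z for e in exps]


rng = random.Random(0)
weights, offsets = make_rbf(k=4, gamma_rbf=1.0, rng=rng)
x = rbf_features(0.6, weights, offsets)           # state: last week's acceptance rate

# Made-up weights for three candidate thresholds (assumption).
action_weights = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(3)]
q = q_values(x, action_weights)

probs = boltzmann_probs(q, tau=1.0)
explore_action = rng.choices(range(len(q)), weights=probs)[0]
exploit_action = greedy(q)
```

Subtracting the maximum Q-value before exponentiating leaves the probabilities unchanged and avoids overflow for large values. A smaller `tau` concentrates the probability mass on the greedy action.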
  • 30. Backward Propagation (Learning)
      • The approximation error is: $R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)$
      • To adjust the SGD model weights in the direction of the steepest error descent we use the following update rule: $w_a \leftarrow w_a + \alpha\left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] \dfrac{\partial Q(s, a)}{\partial w_a}$,
      • which, under the assumption that $\gamma \max_a Q(S_{t+1}, a)$ does not depend on $w_a$, simplifies to the general SGD update rule: $w_a \leftarrow w_a + \alpha\left[R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)\right] S_t$, where $Q(S_t, A_t)$ can be thought of as the current model prediction, $R_t + \gamma \max_a Q(S_{t+1}, a)$ as the target and $S_t$ as the gradient with respect to the weights.
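
The update rule on slide 30 amounts to one semi-gradient Q-learning step on the weights of the action that was taken. The sketch below is an illustration under assumptions: the linear models act on a feature vector `x` (for a linear model the gradient with respect to $w_a$ is that feature vector, which plays the role the slide writes as $S_t$), and the feature values, reward and step size are made up.

```python
def q_value(w_a, x):
    """Linear Q-value for one action: w_a . x."""
    return sum(wi * xi for wi, xi in zip(w_a, x))


def q_learning_update(weights, x, action, reward, x_next, alpha, gamma):
    """Apply one update to weights[action] in place and return the error term.

    The target R + gamma * max_a Q(S', a) is treated as a constant, as on the
    slide, so only the prediction Q(S, A) is differentiated.
    """
    target = reward + gamma * max(q_value(w_a, x_next) for w_a in weights)
    error = target - q_value(weights[action], x)
    weights[action] = [wi + alpha * error * xi
                       for wi, xi in zip(weights[action], x)]
    return error


# Two actions, two features, all weights initially zero (made-up setup).
weights = [[0.0, 0.0], [0.0, 0.0]]
x, x_next = [1.0, 0.5], [0.5, 1.0]
error = q_learning_update(weights, x, action=1, reward=2.0,
                          x_next=x_next, alpha=0.1, gamma=0.9)
```

Starting from zero weights the target equals the reward, so the error is 2.0 and only the weights of action 1 move, to roughly `[0.2, 0.1]`.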
  • 31. Learning Episode. Timeline with week marks at -52, 0, 60 and 82. Phases: Warming-up Phase; Interaction Phase; Delayed Learning Phase. Events marked on the timeline: simulation starts; learning and state-action generation starts; state-action generation ends; learning and simulation ends.
  • 32. Value Function Model Convergence (panels: 1st episode; whole run). Note: state denotes the application acceptance rate during the previous iteration, action denotes the acceptance threshold for the following iteration, value is the prediction of the Value Function model for a particular state-action pair, and optimum shows the state-action pair that corresponds to the highest value in the state-action space.
  • 33. Result of the t-test for Various Distortion Scenarios
      Scenario                                              t-statistic    p-value
      Scenario 1: downwards shift in score distribution     29.56631       1.55E-51
      Scenario 2: upwards shift in score distribution       42.72066       2.45E-66
      Scenario 3: downwards shift in default rates          5.172688       5.95E-07
      Scenario 4: upwards shift in default rates            4.600158       6.20E-06
      Note: the t-test null hypothesis is that the mean difference between the episode reward received by the RL agent and the episode reward received using the traditional approach throughout 100 episodes is equal to or lower than zero.
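
For readers who want to reproduce this kind of test, the sketch below computes the one-sample t-statistic for a list of reward differences (RL agent minus baseline) against a null mean of zero. The differences are made-up numbers, not the thesis data, and the function name is hypothetical. Turning the statistic into a one-tailed p-value needs the Student t distribution with $n - 1$ degrees of freedom, which the Python standard library does not provide; a statistics package would be needed for that step.

```python
import math
import statistics


def one_sample_t_statistic(differences, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(differences)
    sd = statistics.stdev(differences)   # uses the n - 1 denominator
    return (mean - mu0) / (sd / math.sqrt(n))


# Made-up episode reward differences, RL agent minus baseline (assumption).
diffs = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat = one_sample_t_statistic(diffs)
```

A large positive statistic is evidence against the null hypothesis that the mean difference is zero or negative; for the sample above it equals $3\sqrt{2} \approx 4.24$.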
  • 34. Credit Scoring Literature
      • Thomas, L. C., D. B. Edelman, and J. N. Crook. "Credit Scoring and Its Applications." SIAM, Philadelphia (2017).
      • Crook, Jonathan N., David B. Edelman, and Lyn C. Thomas. "Recent developments in consumer credit risk assessment." European Journal of Operational Research 183.3 (2007): 1447-1465.
      • Hand, David J. "Measuring classifier performance: a coherent alternative to the area under the ROC curve." Machine Learning 77.1 (2009): 103-123.
      • Verbraken, Thomas, et al. "Development and application of consumer credit scoring models using profit-based classification measures." European Journal of Operational Research 238.2 (2014): 505-513.
      • Viaene, Stijn, and Guido Dedene. "Cost-sensitive learning and decision making revisited." European Journal of Operational Research 166.1 (2005): 212-220.
      • Lessmann, Stefan, et al. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research." European Journal of Operational Research 247.1 (2015): 124-136.
      • Oliver, R. M., and L. C. Thomas. "Optimal score cutoffs and pricing in regulatory capital in retail credit portfolios." (2009).
      • Bellotti, Tony, and Jonathan Crook. "Forecasting and stress testing credit card default using dynamic models." International Journal of Forecasting 29.4 (2013): 563-574.
  • 35. RL Literature
      • Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 2017.
      • Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
      • Neuneier, Ralph. "Optimal asset allocation using adaptive dynamic programming." Advances in Neural Information Processing Systems. 1996.
      • Tesauro, Gerald, et al. "A hybrid reinforcement learning approach to autonomic resource allocation." Autonomic Computing, 2006. ICAC'06. IEEE International Conference on. IEEE, 2006.
      • Abe, Naoki, et al. "Optimizing debt collections using constrained reinforcement learning." Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010.
      • Kim, Byung-Gook, et al. "Dynamic pricing and energy consumption scheduling with reinforcement learning." IEEE Transactions on Smart Grid 7.5 (2016): 2187-2198.
      • Sato, Masamichi. "Quantitative Realization of Behavioral Economic Heuristics by Cognitive Category: Consumer Behavior Marketing with Reinforcement Learning." (2016).
      • Strydom, Petrus. "Funding optimization for a bank integrating credit and liquidity risk." Journal of Applied Finance and Banking 7.2 (2017): 1.
      • Aihe, David O., and Avelino J. Gonzalez. "Correcting flawed expert knowledge through reinforcement learning." Expert Systems with Applications 42.17-18 (2015): 6457-6471.
      • Rana, Rupal, and Fernando S. Oliveira. "Dynamic pricing policies for interdependent perishable products or services using reinforcement learning." Expert Systems with Applications 42.1 (2015): 426-436.
      • Varela, Martín, Omar Viera, and Franco Robledo. "A q-learning approach for investment decisions." Trends in Mathematical Economics. Springer, Cham, 2016. 347-368.