SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Lili Zhang1, Jennifer Priestley2, and Xuelei Ni3
1Program in Analytics and Data Science, Kennesaw State University, Georgia, USA
2Analytics and Data Science Institute, Kennesaw State University, Georgia, USA
3Department of Statistics, Kennesaw State University, Georgia, USA
Influence of the Event Rate on Discrimination Abilities
of Bankruptcy Prediction Models
22
Outline
• Introduction
• Related Work
• Data
• Exploratory Analysis
• Methodology
• Discussions and Conclusions
• Future Work
333
Introduction
• Bankruptcy prediction has been studied
since 1960s to improve decision making.
• To improve its discrimination ability,
efforts have been done in respects of
significant predictors, algorithms, and
sampling.
3
444
Related Work
4
• Altman: financial ratios (Working
Capital/Total Assets, etc.) + multiple
discrimination analysis.
• Odom: first neural network.
• Shin: support vector machine
outperformed neural network on small
training datasets.
• Zibanezhad: decision tree + most
important predictors.
555
Related Work
5
• Sun: Bayesian network + influence of
variable selection and discretization.
• Kim: geometric mean based boosting
algorithm to address the imbalance.
• Zhou: effects of sampling methods for
five models.
• etc.
666
Related Work
6
Logistic Regression:
𝑝 =
1
1 + 𝑒−(𝛽0+𝛽1 𝑥1+⋯+𝛽 𝑛 𝑥 𝑛)
max 𝑦𝑙𝑜𝑔 𝑝 + 1 − 𝑦 log 1 − 𝑝
777
Related Work
7
Decision Tree:
𝐸𝑛𝑡𝑟𝑜𝑝𝑦 𝑥 = −
𝑖=1
𝑘
𝑝 𝑖 𝑥 𝑙𝑜𝑔2 𝑝 𝑖 𝑥
𝐺𝑖𝑛𝑖 𝑥 = 1 −
𝑖=1
𝑘
𝑝 𝑖 𝑥 2
888
Related Work
8
Random Forest: train trees individually on
each bootstrap dataset.
Gradient Boosting: utilize the information
of previously trained trees.
Support Vector Machine: maximize the
marginal distance.
999
Related Work
9
Neural Network: transform by activation
functions between layers + minimum loss.
Bayesian Network: represent probability
relationship and conditional dependencies
via a directed acyclic graph.
101010
Data
Information of 11,787,287 U.S. companies in
the 4th Quarter of 2012 and 2013:
• Market Participant Identifier
• Public Records: judgments and liens .
• Firmographics: industry, location, size,
and structure.
• Bankruptcy Indicator.
10
111111
Data
Question:
Given the public records and firmographics
indicators of an organization, can we predict its
operation status one year ahead?
11
121212
Data
• Create dependent variable.
• Drop interval variables if the percentage
of coded values or missing values is
greater than 30%.
12
BrtInd 2012 BrtInd 2013 BrtIndChg
0 1 1
0 0 0
Table 1. Creation of the Dependent Variable BrtIndChg
131313
Data
• Drop observations in an interval variable
or a categorical variable if the percentage
of the abnormal/incorrect values in that
variable is less than 5%.
• Discretize continuous variables.
• Retain the variable with the best
predictive ability among several
correlated variables
13
141414
Data
14
Variable Type Description
MPID Nominal Market Participant Identifier
BrtIndChg Binary Bankruptcy Indicator Change
curLiensJudInd Nominal Current Liens/Judgment Indicator
histLiensJudInd Nominal Historical Liens/Judgment Indicator
Industry Nominal Industry
LargeBusinessInd Nominal Large Business Indicator
Region Nominal Geographical Region
PublicCompanyFlag Nominal Public Company Flag
SubsidiaryInd Nominal Subsidiary Indicator
MonLstRptDatePlcRec Interval
Number of Months Since Last
Report Date on Public Records
Table 2. Variables for Analysis and Modelling.
151515
Data
15
BrtIndChg Frequency Percent (%)
1 1031 0.12
0 843330 99.88
Table 3. Frequency of Dependent Variable.
161616
Exploratory Analysis
16
Effect Odds Ratio 95% Confidence Interval Chi-Square p-value
curLiensJudInd 0 vs 1 0.529 [0.447, 0.627] <.0001
histLiensJudInd 0 vs 1 0.680 [0.601, 0.768] <.0001
LargeBusinessInd N vs Y 0.542 [0.474, 0.620]
<.0001
LargeBusinessInd U vs Y 0.202 [0.165, 0.249]
PublicCompanyFlag N vs Y 0.295 [0.104, 0.838]
0.065
PublicCompanyFlag U vs Y 0.370 [0.138, 0.989]
SubsidiaryInd N vs Y 1.745 [0.997, 3.053]
<.0001
SubsidiaryInd U vs Y 0.411 [0.261, 0.648]
Industry 1 vs 8 1.538 [0.947, 2.496]
<.0001
Industry 2 vs 8 3.085 [1.118, 8.514]
Industry 3 vs 8 2.079 [1.545, 2.797]
Industry 4 vs 8 1.971 [1.365, 2.847]
Industry 5 vs 8 1.648 [1.136, 2.392]
Industry 6 vs 8 2.421 [1.704, 3.439]
Industry 7 vs 8 1.386 [1.033, 1.859]
Industry 9 vs 8 1.348 [1.012, 1.795]
Industry 10 vs 8 0.885 [0.216, 3.629]
Industry U vs 8 0.473 [0.343, 0.651]
Region 1 vs 9 0.699 [0.479, 1.019]
<.0001
Region 2 vs 9 0.443 [0.358, 0.549]
Region 3 vs 9 0.627 [0.505, 0.779]
Region 4 vs 9 0.913 [0.686, 1.215]
Region 5 vs 9 0.636 [0.525, 0.772]
Region 6 vs 9 1.203 [0.928, 1.558]
Region 7 vs 9 1.084 [0.875, 1.343]
Region 8 vs 9 1.194 [0.920, 1.549]
MonLstRptDatePlcRec 0.971 [0.969, 0.973] <.0001
Table 4. Univariate Odds Ratio and Chi-Square p-value.
171717
Methodology
Sampling:
• Oversample event rate from 0.12% to
10%, 20%, 30%, 40%, and 50%,
respectively, with the proportion of non-
events undersampled from 99.88% to
90%, 80%, 70%, 60%, and 50%
correspondingly.
• Train and Validation split: 70% vs. 30%.
17
181818
Methodology
Model Development and Evaluation:
• Platform: SAS Enterprise Miner 14.1.
• Tune hyperparameters of every model to
their best performance.
• Logistic Regression: backwards selection
+ significance level 0.05.
• Decision Tree: entropy.
• Random Forest and Gradient Descent:
Gini index.
18
191919
Methodology
Model Development and Evaluation:
• Support Vector Machine: linear kernel.
• Neural Network: tanh in hidden layer +
sigmoid in output layer + 3 hidden units
• Bayesian Network: G-square +
significance level 0.2.
19
202020
Methodology
20
Event (%)
Model
0.12 10 20 30 40 50
Decision Tree
0
(.)
0.49
(0.06)
0.424
(0.12)
0.435
(0.21)
0.475
(0.41)
0.497
(0.43)
Gradient Boosting
0
(.)
0.495
(0.10)
0.486
(0.25)
0.471
(0.35)
0.473
(0.45)
0.532
(0.58)
Bayesian Network
0.43
(0.12)
0.501
(0.22)
0.496
(0.33)
0.473
(0.4)
0.471
(0.44)
0.503
(0.6)
Random Forest
0.027
(0.02)
0.536
(0.1)
0.488
(0.25)
0.46
(0.25)
0.469
(0.43)
0.516
(0.49)
Neural Network
0.048
(0.01)
0.523
(0.1)
0.503
(0.2)
0.488
(0.26)
0.477
(0.51)
0.519
(0.46)
Support Vector Machine
0
(.)
0
(.)
0.02
(0.01)
0.475
(0.43)
0.439
(0.52)
0.516
(0.54)
Logistic Regression
0.037
(0.01)
0.526
(0.08)
0.502
(0.25)
0.474
(0.25)
0.44
(0.44)
0.526
(0.56)
Table 5. K-S Statistic (K-S Probability Cut-off) under Different Event Rates.
212121
Methodology
21
Model Accuracy F1 Score Type I Error Type II Error Cut-off
Decision Tree 99.88% . 0% 100% .
Gradient Boosting 99.88% . 0% 100% .
Bayesian Network 64.43% 0.0056 35.59% 17.31% 0.11
Random Forest 86.83% 0.0022 13.08% 87.95% 0.01
Neural Network 99.23% 0.0221 0.65% 92.94% 0.01
Support Vector
Machine
99.88% . 0.00% 100% .
Logistic Regression 99.41% 0.0204 0.47% 95.01% 0.01
Table 6. Performance of Models under Event Rate 0.12%.
222222
Methodology
22
Model Accuracy F1 Score Type I Error Type II Error Cut-off
Decision Tree 72.26% 0.7507 38.97% 16.50% 0.28
Gradient Boosting 73.44% 0.7623 38.28% 14.84% 0.42
Bayesian Network 70.53% 0.7413 43.41% 15.53% 0.37
Random Forest 73.93% 0.7656 37.31% 14.84% 0.42
Neural Network 72.75% 0.7579 39.81% 14.70% 0.37
Support Vector
Machine
73.23% 0.7605 38.56% 14.98% 0.49
Logistic Regression 72.61% 0.7575 40.36% 14.42% 0.42
Table 7. Performance of Models under Oversampled Event Rate 50%.
232323
Methodology
23
Figure 2. ROC Curve on Validation
Dataset under Event Rate 0.12%
Figure 3. ROC Curve on Validation Dataset
under Oversampled Event Rate 50%
242424
Methodology
24
Variable Decision Tree
Gradient
Boosting
Random
Forest
MonLstRptDatePlcRec 1.0000 1.0000 0.0911
Region 0.2423 0.2880 0.0048
Industry 0.1663 0.3516 0.0110
curLiensJudInd 0.1550 0.0820 0.0024
histLiensJudInd 0.1192 0.1205 0.0038
LargeBusinessInd 0.0308 0.2752 0.0100
Table 8. Variable Importance.
252525
Methodology
25
Effect Odds Ratio Chi-Square p-value
curLiensJudInd 0 vs 1 0.573 0.0046
histLiensJudInd 0 vs 1 0.508 <.0001
LargeBusinessInd N vs Y 0.796
<.0001
LargeBusinessInd U vs Y 0.332
Region 1 vs 9 1.067
0.0002
Region 2 vs 9 0.411
Region 3 vs 9 0.583
Region 4 vs 9 0.839
Region 5 vs 9 0.558
Region 6 vs 9 0.858
Region 7 vs 9 0.881
Region 8 vs 9 1.261
MonLstRptDatePlcRec 0.976 <.0001
Table 9. Multivariate Odds Ratio and Chi-Square p-value.
262626
Methodology
26
Probability Cut-off Tuning
Figure 4. Performance Measures vs. Probability Cut-offs of Logistic Regression
272727
Methodology
27
Overfitting Checking
Figure 5. ROC Curve on Training Dataset under Oversampled Event Rate 50%
282828
Discussions and Conclusions
28
• The statistical association and
significance of public records and
firmographics predictors are shown.
• The event rate influences the
performance of different classification
models in different ways.
292929
Discussions and Conclusions
29
• The statistical association and
significance of public records and
firmographics predictors are shown.
• The event rate influences the
performance of different classification
models in different ways. Bayesian
Network is the most insensitive to the
event rate, while Support Vector Machine
is the most sensitive.
303030
Discussions and Conclusions
30
• Practitioners may examine the
performance measures comprehensively
with different probability cut-offs and
handle the trade-offs among them as well
as the model interpretability based on
their expectations.
313131
Future Work
31
• Try different sampling techniques like
SMOTE to adjust the proportions of
events and non-events and examine its
influence further.
• Include financial ratio predictors to
improve the model performance as well
as testing the model performance in a
wider time span.
323232
Questions?
32
Thanks!

Weitere ähnliche Inhalte

Ähnlich wie Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models

Predicting Match using Speed Dating data
Predicting Match using Speed Dating dataPredicting Match using Speed Dating data
Predicting Match using Speed Dating dataRicha Girdhar
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionAlejandro Correa Bahnsen, PhD
 
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docx
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docxDemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docx
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docxcuddietheresa
 
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202emersonpearline
 
statistic project on Hero motocorp
statistic project on Hero motocorpstatistic project on Hero motocorp
statistic project on Hero motocorpYug Bokadia
 
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...DataScienceConferenc1
 
"Can organisational restructuring of hospitals improve quality and safety"
 "Can organisational restructuring of hospitals improve quality and safety" "Can organisational restructuring of hospitals improve quality and safety"
"Can organisational restructuring of hospitals improve quality and safety"Glen Farrow
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked inSanket Butoliya
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptxBillyMoses1
 
A Six Sigma Analysis of Mobile Data Usage
A Six Sigma Analysis of Mobile Data UsageA Six Sigma Analysis of Mobile Data Usage
A Six Sigma Analysis of Mobile Data UsageBrandon Theiss, PE
 
Evaluation of Service Performance & Analysis of Scope of Market
Evaluation of Service Performance & Analysis of Scope of MarketEvaluation of Service Performance & Analysis of Scope of Market
Evaluation of Service Performance & Analysis of Scope of MarketSarvajeet Halder
 
Factors influencing the Human Development Index (HDI) using Multiple Linear R...
Factors influencing the Human Development Index (HDI) using Multiple Linear R...Factors influencing the Human Development Index (HDI) using Multiple Linear R...
Factors influencing the Human Development Index (HDI) using Multiple Linear R...apanugan
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesDATAVERSITY
 
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...IFSD14
 
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...SlideTeam
 
Echelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy WorkshopEchelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy WorkshopGarrett Teoh Hor Keong
 
Phase 2 of Predicting Payment default on Vehicle Loan EMI
Phase 2 of Predicting Payment default on Vehicle Loan EMIPhase 2 of Predicting Payment default on Vehicle Loan EMI
Phase 2 of Predicting Payment default on Vehicle Loan EMIVikas Virani
 

Ähnlich wie Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models (20)

Predicting Match using Speed Dating data
Predicting Match using Speed Dating dataPredicting Match using Speed Dating data
Predicting Match using Speed Dating data
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
 
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docx
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docxDemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docx
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202.docx
 
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202
DemandSimpleMarket Demand SchedulePriceQuantity1051201548702046202
 
MKT 798 (Zheng, Mandi, Mei)
MKT 798 (Zheng, Mandi, Mei)MKT 798 (Zheng, Mandi, Mei)
MKT 798 (Zheng, Mandi, Mei)
 
statistic project on Hero motocorp
statistic project on Hero motocorpstatistic project on Hero motocorp
statistic project on Hero motocorp
 
1645 track2 short
1645 track2 short1645 track2 short
1645 track2 short
 
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...
[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a...
 
"Can organisational restructuring of hospitals improve quality and safety"
 "Can organisational restructuring of hospitals improve quality and safety" "Can organisational restructuring of hospitals improve quality and safety"
"Can organisational restructuring of hospitals improve quality and safety"
 
Ogilvie - Beyond the statistical average
Ogilvie  - Beyond the statistical averageOgilvie  - Beyond the statistical average
Ogilvie - Beyond the statistical average
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked in
 
Presentation of Project and Critique.pptx
Presentation of Project and Critique.pptxPresentation of Project and Critique.pptx
Presentation of Project and Critique.pptx
 
A Six Sigma Analysis of Mobile Data Usage
A Six Sigma Analysis of Mobile Data UsageA Six Sigma Analysis of Mobile Data Usage
A Six Sigma Analysis of Mobile Data Usage
 
Evaluation of Service Performance & Analysis of Scope of Market
Evaluation of Service Performance & Analysis of Scope of MarketEvaluation of Service Performance & Analysis of Scope of Market
Evaluation of Service Performance & Analysis of Scope of Market
 
Factors influencing the Human Development Index (HDI) using Multiple Linear R...
Factors influencing the Human Development Index (HDI) using Multiple Linear R...Factors influencing the Human Development Index (HDI) using Multiple Linear R...
Factors influencing the Human Development Index (HDI) using Multiple Linear R...
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use Cases
 
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...
Economics, Policy and Value Chains: Barriers to Technology Adoption in the CA...
 
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...
Stock Pitch For Satellite Based Solutions PowerPoint Presentation Ppt Slide T...
 
Echelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy WorkshopEchelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy Workshop
 
Phase 2 of Predicting Payment default on Vehicle Loan EMI
Phase 2 of Predicting Payment default on Vehicle Loan EMIPhase 2 of Predicting Payment default on Vehicle Loan EMI
Phase 2 of Predicting Payment default on Vehicle Loan EMI
 

Kürzlich hochgeladen

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Kürzlich hochgeladen (20)

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models

  • 1. Lili Zhang1, Jennifer Priestley2, and Xuelei Ni3 1Program in Analytics and Data Science, Kennesaw State University, Georgia, USA 2Analytics and Data Science Institute, Kennesaw State University, Georgia, USA 3Department of Statistics, Kennesaw State University, Georgia, USA Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models
  • 2. 22 Outline • Introduction • Related Work • Data • Exploratory Analysis • Methodology • Discussions and Conclusions • Future Work
  • 3. 333 Introduction • Bankruptcy prediction has been studied since 1960s to improve decision making. • To improve its discrimination ability, efforts have been done in respects of significant predictors, algorithms, and sampling. 3
  • 4. 444 Related Work 4 • Altman: financial ratios (Working Capital/Total Assets, etc.) + multiple discrimination analysis. • Odom: first neural network. • Shin: support vector machine outperformed neural network on small training datasets. • Zibanezhad: decision tree + most important predictors.
  • 5. 555 Related Work 5 • Sun: Bayesian network + influence of variable selection and discretization. • Kim: geometric mean based boosting algorithm to address the imbalance. • Zhou: effects of sampling methods for five models. • etc.
  • 6. 666 Related Work 6 Logistic Regression: 𝑝 = 1 1 + 𝑒−(𝛽0+𝛽1 𝑥1+⋯+𝛽 𝑛 𝑥 𝑛) max 𝑦𝑙𝑜𝑔 𝑝 + 1 − 𝑦 log 1 − 𝑝
  • 7. 777 Related Work 7 Decision Tree: 𝐸𝑛𝑡𝑟𝑜𝑝𝑦 𝑥 = − 𝑖=1 𝑘 𝑝 𝑖 𝑥 𝑙𝑜𝑔2 𝑝 𝑖 𝑥 𝐺𝑖𝑛𝑖 𝑥 = 1 − 𝑖=1 𝑘 𝑝 𝑖 𝑥 2
  • 8. 888 Related Work 8 Random Forest: train trees individually on each bootstrap dataset. Gradient Boosting: utilize the information of previously trained trees. Support Vector Machine: maximize the marginal distance.
  • 9. 999 Related Work 9 Neural Network: transform by activation functions between layers + minimum loss. Bayesian Network: represent probability relationship and conditional dependencies via a directed acyclic graph.
  • 10. 101010 Data Information of 11,787,287 U.S. companies in the 4th Quarter of 2012 and 2013: • Market Participant Identifier • Public Records: judgments and liens . • Firmographics: industry, location, size, and structure. • Bankruptcy Indicator. 10
  • 11. 111111 Data Question: Given the public records and firmographics indicators of an organization, can we predict its operation status one year ahead? 11
  • 12. 121212 Data • Create dependent variable. • Drop interval variables if the percentage of coded values or missing values is greater than 30%. 12 BrtInd 2012 BrtInd 2013 BrtIndChg 0 1 1 0 0 0 Table 1. Creation of the Dependent Variable BrtIndChg
  • 13. 131313 Data • Drop observations in an interval variable or a categorical variable if the percentage of the abnormal/incorrect values in that variable is less than 5%. • Discretize continuous variables. • Retain the variable with the best predictive ability among several correlated variables 13
  • 14. 141414 Data 14 Variable Type Description MPID Nominal Market Participant Identifier BrtIndChg Binary Bankruptcy Indicator Change curLiensJudInd Nominal Current Liens/Judgment Indicator histLiensJudInd Nominal Historical Liens/Judgment Indicator Industry Nominal Industry LargeBusinessInd Nominal Large Business Indicator Region Nominal Geographical Region PublicCompanyFlag Nominal Public Company Flag SubsidiaryInd Nominal Subsidiary Indicator MonLstRptDatePlcRec Interval Number of Months Since Last Report Date on Public Records Table 2. Variables for Analysis and Modelling.
  • 15. 151515 Data 15 BrtIndChg Frequency Percent (%) 1 1031 0.12 0 843330 99.88 Table 3. Frequency of Dependent Variable.
  • 16. 161616 Exploratory Analysis 16 Effect Odds Ratio 95% Confidence Interval Chi-Square p-value curLiensJudInd 0 vs 1 0.529 [0.447, 0.627] <.0001 histLiensJudInd 0 vs 1 0.680 [0.601, 0.768] <.0001 LargeBusinessInd N vs Y 0.542 [0.474, 0.620] <.0001 LargeBusinessInd U vs Y 0.202 [0.165, 0.249] PublicCompanyFlag N vs Y 0.295 [0.104, 0.838] 0.065 PublicCompanyFlag U vs Y 0.370 [0.138, 0.989] SubsidiaryInd N vs Y 1.745 [0.997, 3.053] <.0001 SubsidiaryInd U vs Y 0.411 [0.261, 0.648] Industry 1 vs 8 1.538 [0.947, 2.496] <.0001 Industry 2 vs 8 3.085 [1.118, 8.514] Industry 3 vs 8 2.079 [1.545, 2.797] Industry 4 vs 8 1.971 [1.365, 2.847] Industry 5 vs 8 1.648 [1.136, 2.392] Industry 6 vs 8 2.421 [1.704, 3.439] Industry 7 vs 8 1.386 [1.033, 1.859] Industry 9 vs 8 1.348 [1.012, 1.795] Industry 10 vs 8 0.885 [0.216, 3.629] Industry U vs 8 0.473 [0.343, 0.651] Region 1 vs 9 0.699 [0.479, 1.019] <.0001 Region 2 vs 9 0.443 [0.358, 0.549] Region 3 vs 9 0.627 [0.505, 0.779] Region 4 vs 9 0.913 [0.686, 1.215] Region 5 vs 9 0.636 [0.525, 0.772] Region 6 vs 9 1.203 [0.928, 1.558] Region 7 vs 9 1.084 [0.875, 1.343] Region 8 vs 9 1.194 [0.920, 1.549] MonLstRptDatePlcRec 0.971 [0.969, 0.973] <.0001 Table 4. Univariate Odds Ratio and Chi-Square p-value.
  • 17. 171717 Methodology Sampling: • Oversample event rate from 0.12% to 10%, 20%, 30%, 40%, and 50%, respectively, with the proportion of non- events undersampled from 99.88% to 90%, 80%, 70%, 60%, and 50% correspondingly. • Train and Validation split: 70% vs. 30%. 17
  • 18. 181818 Methodology Model Development and Evaluation: • Platform: SAS Enterprise Miner 14.1. • Tune hyperparameters of every model to their best performance. • Logistic Regression: backwards selection + significance level 0.05. • Decision Tree: entropy. • Random Forest and Gradient Descent: Gini index. 18
  • 19. 191919 Methodology Model Development and Evaluation: • Support Vector Machine: linear kernel. • Neural Network: tanh in hidden layer + sigmoid in output layer + 3 hidden units • Bayesian Network: G-square + significance level 0.2. 19
  • 20. 202020 Methodology 20 Event (%) Model 0.12 10 20 30 40 50 Decision Tree 0 (.) 0.49 (0.06) 0.424 (0.12) 0.435 (0.21) 0.475 (0.41) 0.497 (0.43) Gradient Boosting 0 (.) 0.495 (0.10) 0.486 (0.25) 0.471 (0.35) 0.473 (0.45) 0.532 (0.58) Bayesian Network 0.43 (0.12) 0.501 (0.22) 0.496 (0.33) 0.473 (0.4) 0.471 (0.44) 0.503 (0.6) Random Forest 0.027 (0.02) 0.536 (0.1) 0.488 (0.25) 0.46 (0.25) 0.469 (0.43) 0.516 (0.49) Neural Network 0.048 (0.01) 0.523 (0.1) 0.503 (0.2) 0.488 (0.26) 0.477 (0.51) 0.519 (0.46) Support Vector Machine 0 (.) 0 (.) 0.02 (0.01) 0.475 (0.43) 0.439 (0.52) 0.516 (0.54) Logistic Regression 0.037 (0.01) 0.526 (0.08) 0.502 (0.25) 0.474 (0.25) 0.44 (0.44) 0.526 (0.56) Table 5. K-S Statistic (K-S Probability Cut-off) under Different Event Rates.
  • 21. 212121 Methodology 21 Model Accuracy F1 Score Type I Error Type II Error Cut-off Decision Tree 99.88% . 0% 100% . Gradient Boosting 99.88% . 0% 100% . Bayesian Network 64.43% 0.0056 35.59% 17.31% 0.11 Random Forest 86.83% 0.0022 13.08% 87.95% 0.01 Neural Network 99.23% 0.0221 0.65% 92.94% 0.01 Support Vector Machine 99.88% . 0.00% 100% . Logistic Regression 99.41% 0.0204 0.47% 95.01% 0.01 Table 6. Performance of Models under Event Rate 0.12%.
  • 22. 222222 Methodology 22 Model Accuracy F1 Score Type I Error Type II Error Cut-off Decision Tree 72.26% 0.7507 38.97% 16.50% 0.28 Gradient Boosting 73.44% 0.7623 38.28% 14.84% 0.42 Bayesian Network 70.53% 0.7413 43.41% 15.53% 0.37 Random Forest 73.93% 0.7656 37.31% 14.84% 0.42 Neural Network 72.75% 0.7579 39.81% 14.70% 0.37 Support Vector Machine 73.23% 0.7605 38.56% 14.98% 0.49 Logistic Regression 72.61% 0.7575 40.36% 14.42% 0.42 Table 7. Performance of Models under Oversampled Event Rate 50%.
  • 23. 232323 Methodology 23 Figure 2. ROC Curve on Validation Dataset under Event Rate 0.12% Figure 3. ROC Curve on Validation Dataset under Oversampled Event Rate 50%
  • 24. 242424 Methodology 24 Variable Decision Tree Gradient Boosting Random Forest MonLstRptDatePlcRec 1.0000 1.0000 0.0911 Region 0.2423 0.2880 0.0048 Industry 0.1663 0.3516 0.0110 curLiensJudInd 0.1550 0.0820 0.0024 histLiensJudInd 0.1192 0.1205 0.0038 LargeBusinessInd 0.0308 0.2752 0.0100 Table 8. Variable Importance.
  • 25. 252525 Methodology 25 Effect Odds Ratio Chi-Square p-value curLiensJudInd 0 vs 1 0.573 0.0046 histLiensJudInd 0 vs 1 0.508 <.0001 LargeBusinessInd N vs Y 0.796 <.0001 LargeBusinessInd U vs Y 0.332 Region 1 vs 9 1.067 0.0002 Region 2 vs 9 0.411 Region 3 vs 9 0.583 Region 4 vs 9 0.839 Region 5 vs 9 0.558 Region 6 vs 9 0.858 Region 7 vs 9 0.881 Region 8 vs 9 1.261 MonLstRptDatePlcRec 0.976 <.0001 Table 9. Multivariate Odds Ratio and Chi-Square p-value.
  • 26. 262626 Methodology 26 Probability Cut-off Tuning Figure 4. Performance Measures vs. Probability Cut-offs of Logistic Regression
  • 27. 272727 Methodology 27 Overfitting Checking Figure 5. ROC Curve on Training Dataset under Oversampled Event Rate 50%
  • 28. 282828 Discussions and Conclusions 28 • The statistical association and significance of public records and firmographics predictors are shown. • The event rate influences the performance of different classification models in different ways.
  • 29. 292929 Discussions and Conclusions 29 • The statistical association and significance of public records and firmographics predictors are shown. • The event rate influences the performance of different classification models in different ways. Bayesian Network is the most insensitive to the event rate, while Support Vector Machine is the most sensitive.
  • 30. 303030 Discussions and Conclusions 30 • Practitioners may examine the performance measures comprehensively with different probability cut-offs and handle the trade-offs among them as well as the model interpretability based on their expectations.
  • 31. 313131 Future Work 31 • Try different sampling techniques like SMOTE to adjust the proportions of events and non-events and examine its influence further. • Include financial ratio predictors to improve the model performance as well as testing the model performance in a wider time span.

Hinweis der Redaktion

  1. 1