IC Fraud Prediction
To predict whether an insurance claim is acceptable or not.
Workflow
● Data Gathering and Preparation
● Data Analysis and Visualization
● Predictive Model Building
● Explanatory Model Building
Data Preparation
Data Gathering ✔️ Data quality checks ✔️ Handling extreme values ✔️ Handling missing data ✔️ Feature selection ✔️ Encoding ✔️
Columns with outliers
● Policy annual premium
● Umbrella limit
● Capital loss
● Property claim
Solved with: Median imputation
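A minimal sketch of median imputation for extreme values. The slides do not say how outliers were detected, so the 1.5 × IQR rule used here is an assumption, and the premium values are made up for illustration.

```python
import statistics

def impute_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the column median.

    The IQR rule is an assumed detection method, not taken from the slides.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    return [median if (v < low or v > high) else v for v in values]

# Hypothetical 'Policy annual premium' values with one extreme entry
premiums = [1100.0, 1250.0, 1300.0, 1180.0, 1220.0, 9000.0]
cleaned = impute_outliers_with_median(premiums)
print(cleaned)
```

Only the extreme value is replaced; the other entries are left untouched.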
● Initial data provided
● Intuitive cross-check
● Ideation for derived columns
2 derived columns: ‘Months between incident date and policy bind date’ and ‘Incident within customership’
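A small sketch of how the first derived column could be computed. The slides do not define the month count, so counting whole calendar months is an assumption, and the two dates are hypothetical.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months elapsed from start to end (assumed definition)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months

# Hypothetical policy bind date and incident date
policy_bind_date = date(2014, 10, 17)
incident_date = date(2015, 1, 25)
months = months_between(policy_bind_date, incident_date)
print(months)
```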
Columns with missing data
● Collision type
● Property damage
● Police report available
Solved with: Mode imputation
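Mode imputation can be sketched as follows. How missing entries are marked in the raw data is not stated in the slides, so `None` is used here as an assumed marker, and the sample values are hypothetical.

```python
from collections import Counter

def impute_with_mode(values, missing=None):
    """Fill missing entries with the most frequent observed value."""
    observed = [v for v in values if v != missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]

# Hypothetical 'Police report available' column with two missing entries
police_report = ["YES", "NO", None, "NO", None]
filled = impute_with_mode(police_report)
print(filled)
```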
[Figure: 10 most important features]
[Figure: 10 least important features]
[Figure: Feature - Feature Correlation Heatmap]
Initial: 1000 rows, 40 columns
● Total claim is the sum of Property claim, Vehicle claim and Injury claim
● Values in numeric columns > 0
1 row containing umbrella limit < 0 removed
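The two checks above could look roughly like this. The snake_case column names are assumptions modelled on the display names, and the two rows are invented for illustration.

```python
def passes_quality_checks(row):
    """True if the claim parts add up to the total and the umbrella limit is not negative."""
    parts = row["property_claim"] + row["vehicle_claim"] + row["injury_claim"]
    return row["total_claim_amount"] == parts and row["umbrella_limit"] >= 0

# Hypothetical rows; the second has a negative umbrella limit
rows = [
    {"total_claim_amount": 7000, "property_claim": 1000,
     "vehicle_claim": 5000, "injury_claim": 1000, "umbrella_limit": 0},
    {"total_claim_amount": 5000, "property_claim": 500,
     "vehicle_claim": 4000, "injury_claim": 500, "umbrella_limit": -1000000},
]
clean_rows = [r for r in rows if passes_quality_checks(r)]
print(len(clean_rows))
```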
Initial: 1000 rows, 40 columns
Columns removed due to non-relevance: Policy number, _c39
Columns removed due to correlation > 95% with another column: Vehicle claim
Columns removed due to contribution transferred to a derived column: Incident date, Policy bind date
Columns removed due to feature importance score < 0.02: Collision type, Property damage, Incident within customership, Insured sex, Umbrella limit, Number of vehicles involved, Police report available, Incident type
Columns in final Analytical Dataset: Months as customer, Age, Policy state, Policy csl, Policy deductible, Policy annual premium, Insured zip, Insured education level, Insured occupation, Insured hobbies, Insured relationship, Capital gains, Capital loss, Incident severity, Authorities contacted, Incident state, Incident city, Incident hour of the day, Bodily injuries, Witnesses, Total claim amount, Injury claim, Property claim, Auto make, Auto model, Auto year, Months between incident date and bind date
Final: 999 rows, 27 columns
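A sketch of the two numeric filters named above: dropping one column of any pair with correlation above 0.95, and dropping features with an importance score below 0.02. The slides do not say which model produced the scores or how the correlated column was chosen, so the greedy keep-first rule, the column values and the importance numbers here are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_highly_correlated(columns, threshold=0.95):
    """Keep a column only if it is not too correlated with any column already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def keep_important(importances, min_score=0.02):
    """Keep features whose importance score reaches min_score."""
    return [name for name, score in importances.items() if score >= min_score]

# Hypothetical values: vehicle_claim moves almost in lockstep with the total
columns = {
    "total_claim_amount": [100, 200, 300, 400],
    "vehicle_claim": [70, 140, 215, 280],
    "witnesses": [1, 3, 0, 2],
}
kept = drop_highly_correlated(columns)

# Hypothetical importance scores, not the ones from the slides
importances = {"incident_severity": 0.30, "insured_sex": 0.01}
selected = keep_important(importances)
print(kept, selected)
```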
Handling imbalanced data ✔️
Distribution of target labels: Fraud 25%, Non-Fraud 75%
For Train Dataset: Initial imbalanced dataset → Train - Test Split → Imbalanced Training dataset → SMOTE (Synthetic Minority Oversampling Technique) → Balanced Training dataset
For Test Dataset: Initial imbalanced dataset → Train - Test Split → Imbalanced Test dataset
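The core idea of SMOTE can be sketched in plain Python: each synthetic sample is placed at a random position on the segment between a minority-class point and one of its k nearest minority-class neighbours. This is a simplified illustration under assumed toy data, not the library implementation the author presumably used. As in the flow above, it would be applied to the training split only, so the test split keeps its original imbalance.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical training split: 4 minority points versus 12 majority points
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
n_majority = 12
synthetic = smote(minority, n_new=n_majority - len(minority), k=2)
balanced_minority = minority + synthetic
print(len(balanced_minority))
```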
Data Analysis and Visualization
Distribution of Target column values along Categorical columns✔️ Distribution of Target column values along Non-Categorical columns✔️
Bar Charts - Feature Column (X) vs Target Column (Y)
Density Plots - Feature Column (X) vs Target Column (Y)
Explanatory Model Building
ML Model performances✔️ Main and Interaction effects on Model Outputs✔️
Model  Accuracy  Precision  Recall  F1 Score
LR     0.76      0          0       0
KNN    0.74      0.38       0.12    0.19
NB     0.735     0.35       0.12    0.18
DT     0.74      0.47       0.60    0.53
RF     0.77      0.53       0.44    0.48
XGB    0.775     0.53       0.58    0.55
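For reference, the four metrics in the table follow from confusion-matrix counts as sketched below. The counts are hypothetical and were chosen only to show how a model that never predicts the positive class can reach 0.76 accuracy with zero precision, recall and F1, the pattern seen in the LR row.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts: every claim is predicted as the negative class
never_positive = classification_metrics(tp=0, fp=0, fn=48, tn=152)
print(never_positive)
```

This is why accuracy alone is a weak yardstick on a 75% / 25% class split.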
[Figure: Heatmap for main and interaction effects]
[Figure: Therm plot for main effects]
Best performing models are Tree-based models
Selected model: XGBoost
Predictive Model Building
Current Model performance✔️ Improvements✔️
Accuracy Precision Recall F1 Score
0.775 0.53 0.58 0.55
🚀 Hyperparameter Tuning by GridSearchCV
Best Parameter values:
'colsample_bytree': 1,
'learning_rate': 0.01,
'max_depth': 10,
'n_estimators': 100,
'subsample': 0.7
Accuracy Precision Recall F1 Score
0.82 0.60 0.77 0.67
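The exhaustive search behind a grid search can be sketched as below: every combination in the parameter grid is scored and the best one is kept. The candidate values and the scoring function are hypothetical. `toy_score` is only a stand-in for a cross-validated model score, built to peak at a known point so the example is checkable, and it trains no model.

```python
from itertools import product

def grid_search(param_grid, score):
    """Return (best_params, best_score) over every combination in param_grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical candidate values; the grid actually searched is not given in the slides
param_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [5, 10],
    "subsample": [0.7, 1.0],
}

def toy_score(params):
    # Stand-in for a cross-validated score; NOT a real model evaluation
    return -(abs(params["learning_rate"] - 0.01)
             + abs(params["max_depth"] - 10)
             + abs(params["subsample"] - 0.7))

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```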
🚀 Tuning the decision threshold from the ROC curve by maximising (TPR - FPR)
Threshold value = 0.68
Accuracy Precision Recall F1 Score
0.83 0.60 0.90 0.72
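Choosing a threshold by maximising TPR - FPR (Youden's J statistic) can be sketched as follows. The labels and scores are invented, with 1 assumed to mark the positive class. In practice the candidate thresholds would come from the ROC curve of the fitted model.

```python
def best_threshold(y_true, scores):
    """Pick the score threshold that maximises TPR - FPR."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        # Predict positive when the score is at least t
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical labels (1 = positive class) and predicted probabilities
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.90]
threshold, j_stat = best_threshold(y_true, scores)
print(threshold, j_stat)
```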
Challenges
● Intuitive cross-check and deriving features.
● Improving the performance - Determining the set of values for parameters in hyperparameter tuning.
● Improving the performance further - Determining the right objective for choosing the threshold from the ROC curve. Finalized at: (TPR - FPR)
Insights
Highest contributing columns [i.e. columns whose values should be checked for correctness]
[Figure: Examples of their contributions]
Thanks!
|Srijit|
srijitpanja@gmail.com

Data Science use case: Fraud Insurance Claims Detection by ML algo


Editor's notes

  1. fraud_reported : {'Y': 0, 'N': 1}
     incident_severity : {'Major Damage': 0, 'Minor Damage': 1, 'Total Loss': 2, 'Trivial Damage': 3}
     insured_hobbies : {'sleeping': 0, 'reading': 1, 'board-games': 2, 'bungie-jumping': 3, 'base-jumping': 4, 'golf': 5, 'camping': 6, 'dancing': 7, 'skydiving': 8, 'movies': 9, 'hiking': 10, 'yachting': 11, 'paintball': 12, 'chess': 13, 'kayaking': 14, 'polo': 15, 'basketball': 16, 'video-games': 17, 'cross-fit': 18, 'exercise': 19}
     auto_make : {'Saab': 0, 'Mercedes': 1, 'Dodge': 2, 'Chevrolet': 3, 'Accura': 4, 'Nissan': 5, 'Audi': 6, 'Toyota': 7, 'Ford': 8, 'Suburu': 9, 'BMW': 10, 'Jeep': 11, 'Honda': 12, 'Volkswagen': 13}
     incident_state : {'SC': 0, 'VA': 1, 'NY': 2, 'OH': 3, 'WV': 4, 'NC': 5, 'PA': 6}
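Mappings like these can be produced by assigning integer codes in order of first appearance. The notes do not state the encoding method, so that ordering is an assumption, and the sample column is hypothetical.

```python
def encode_by_first_appearance(values):
    """Map each distinct value to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping, [mapping[v] for v in values]

# Hypothetical 'incident_state' column
states = ["SC", "VA", "NY", "SC", "OH", "VA"]
mapping, codes = encode_by_first_appearance(states)
print(mapping, codes)
```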