SlideShare ist ein Scribd-Unternehmen logo
1 von 27
R Data Mining for Insurance Retention Modeling
R In Insurance Conference 2014
Giorgio A. Spedicato, Ph.D C.Stat ACAS
UnipolSai Assicurazioni Research & Developmnent
Introduction
Retention modeling
• Modeling policyholders' retention consists in predicting whether an existing
policyholder will renew its contract in order to take underwriting decisions accordingly.
• It means enriching a classical actuarial loss cost modeling with marketing and
competitive business considerations in order to propose an "optimized premium"upon
renew.
• Similarly, "conversion" analysis incorporates in prospective policyholders' proposed
premium marketing based considerations.
2
Introduction
Retention modeling
• Logistic regression is the most widely used approach in actuarial practice, but little
attention has been given to other machine learning (ML) techniques until now.
• An application that showes the use of Random Forest for life insurance policy retention
can be found in [Milhaud et al., 2011].
• Good introductions on the business side can be found in [Arquette and Guven, 2013]
and [Anderson and McPhail, 2013].
3
Introduction
Retention modeling
• A statistician would consider retention modeling as a classification tasks. In fact,
policyholders can be classified in two groups (Renew/Not Renew) according to
features that lie within: demographics (age, gender, ...), insurance policy history (years
since customer, claim history, coverage dept, ...), marketing and business environment
(year to year price change and relative competitiveness, ...).
• An adequately tuned retention model can be integrated within an underwriting
algorithm to define the renewal price.
• Redefining renewal price will need to recalculate price variation and competitiveness
variables iteratively.
4
Introduction
The business problem
• A 50K policies dataset has been available containing demographics, insurance and
business variables as well as recorded Renew/Lapse decision.
• Various retention models will be t to model the probability that the policyholder will
renew upon termination date, pi .
• The aim is to get a function pi = f (Pi ), where renewal probability is a function of the
proposed renewal premium, Pi .
5
Introduction
The business problem
• The final aim consists in maximizing the existing business gross
premium
• The renewal premium can be modified by 1% with respect to
initial proposed value in our optimization scenario.
• This exercise will not consider the loss cost component in the
analysis, thus the portfolio premium volume will be optimized,
not the global underwriting margin.
6
( )∑=
=
Ni
ii
T
pPP
1
*
Introduction
The approach
• A standard logistic regression will be t and compared with various machine learning
techniques in Section 2.
• The models found in Section 2 will be used to calculate an optimized premium for each
policyholder in Section 3.
• Brief considerations will be drawn also.
7
Predictive Modeling
Data and Methods
• The dataset can be deemed representative of a common personal lines retention
problem.
• Most statistical fitting process will be based on the caret [from Jed Wing et al., 2014]
package framework to fit predictive models.
• The reference model will be a logistic regression where continuous predictors have
been handled using restricted cubic splines from rms package [Jr, 2014].
8
Predictive Modeling
Data and Methods
• Logistic regression key advantages are: relatively fast to be fit, easily interpretable,
and no need of parameters' tuning.
• Most machine learning techniques have at least one or more parameter that cannot be
estimated from data and need to be tuned (tuning parameters).
• The caret package, [from Jed Wing et al., 2014], provides a very valuable
infrastructure that standardizes the process for fitting, validating and predicting models
that have been implemented within different packages.
9
Predictive Modeling
Approach
• A performance measure should be defined. For binary classification models, the area
under the ROC curve (AUC) is a very common choice.
• In addition, fitting the model over different samples (cross - validation) is a standard
practice to better assess model performance in order to avoid obverting.
• A final choice is to divide the dataset between a train set and a test set and to perform
the final model evaluation under the test set.
10
Predictive Modeling
Model fitting and assessment
• The following models were fitted on a train set split: logistic regression, C5.0
classification trees, Flexible Discriminant Analysis (FDA), Naive Bayes (NB), Support
Vector Machines (SVM), Neural Networks (NN), Boosted Glms.
• Only FDA shows a slight increase in predictive performance in the provided dataset
when compared with logistic regression (the reference model)
• Logistic regression and FDA appear very close in terms of predictive gain (Lift Curve,
Figure 2) and precision in estimated probabilities (Calibration Chart, Figure 1) .
11
Model fitting and assessment
Calibration Plot
12
Model fitting and assessment
Lift Chart
13
Pricing optimization
Renewal premium optimization
The retention model has been used to optimize the renewal premium, Pi . The proposed Pi
was varied in different scenarios and the expected renewal premium inflow,
pi PI (pi ), was calculated for each individual policyholder.
Three premium variation scenarios were proposed: no change, +1% increase, -1%
decrease.
The highest expected renewal premium scenario for each customer has been selected.
14
Pricing optimization
Renewal premium optimization
• Each policyholder would be proposed the highest expected renewal premium
according to the three scenarios.
• The final renewal decision for each policyholder is shown in Figure 3.
• In the available data set, the overall variation in expected renewal premium is very
close using either FDA or Logistic approach
15
Pricing optimization
Renewal premium optimization
16
Pricing optimization
Considerations
• R appears to be a valid environment, not only to perform standard loss cost modeling
but also to estimate renewal probabilities.
• Logistic regression requires less estimation effort than other machine learning
techniques (that not always offer improved predictive performance).
• The caret package [from Jed Wing et al., 2014] provides a good framework to unify the
process of building and assess competing machine learning techniques.
17
Technical Notes
The infrastructure
• Various machine learning techniques have been compared in the analysis: glms (linear
logistic regression and its boosted version); tree based methods (C5.0, Random
Forest); non - linear methods (neural network, Naive Bayes Classiers, Support Vector
Machines and Flexible Discriminant Analysis).
• The caret package has been used in order to provide a common infrastructure to t the
different models, compare their performance and perform predictions.
• The data source was originally provided in SAS and imported in R thanks to the
package sas7bdat, [Shotwell, 2014].
18
Technical Notes
The caret package
• The core caret function used is the train function that provides a common wrapper to t
more than one hundred machine learning techniques. A predict method is available for
all trained objects
• Hyperparameter have been generally searched over a model pre - defined grid. 10 -
fold cross validation has been used to better assess the predictive performance.
• Among the other package's functions, lift and calibration function have been used to
project lift gain and calibration charts.
19
Technical Notes
Logistic regression and extensions
• glm R base function was wrapped by train in order to fit logistic regression.
• rms package has been used to add restricted cubic splines to the prediction formula
on the continuous predictors.
• A boosted version of the logistic regession has also be fit with the use of mboost
[Buehlmann and Hothorn, 2007] package.
20
Technical Notes
Tree based methods
Tree based approaches perform classification by finding splits within available features
that provides the most predictive performance within the subset. The splits sequence
provides and effective way to visualize variables' dependency.
The C5.0 algorithm, one of the most widely used approach, has been used in this exercise
via the C50 package, [Kuhn et al., 2014].
A boosted approach, Gradient Boosted Machines, gbm, has also be used thanks to
package [with contributions from others, 2013].
21
Technical Notes
Neural Networks
• A neural network model consists in a set of adaptive weights (neurons) that can
approximate non linear functions.
• Neural Networks represent a classical machine learning techniques, probably the one
proposed first.
• Among the many R packages available, the nnet package, [Venables and Ripley,
2002], has been used in this exercise.
22
Technical Notes
Support Vector Machines
• A support vector machine (SVM) ts an hyperplane that aims to separate categories
within the dependent variable with least error. It is rated among the top classiers.
• Non linear shapes can be used in order to define the hyperplane: polynomial,
hyperbolic, radial basis functions.
• The kernlab package, [Karatzoglou et al., 2004] has been used in this exercise.
23
Technical Notes
Flexible Discriminant Analysis
• The Flexible Discriminant Analysis (FDM) method extends the linear discriminant
analysis by including non - linear hinge functions h (x) = max (x 􀀀 h; 0) in the
calculation of the discriminant function.
• hinge function can be used within the discriminant function also in polynomial form.
• FDA models have been t with package the earth package, [from mda:mars by Trevor
Hastie and utilities with Thomas Lumley's].
24
Technical Notes
K Nearest Neighbours
• A KNN model classifies a sample with the most frequent item within its closest
neighbors in the train set.
• It is a very simple approach, that can be used both in regression and classification.
• KNN models can be t with the caret package, [from Jed Wing et al., 2014].
25
Technical Notes
Naive Bayes
• A Naive Bayes (NB) model classifies a sample using the conditional distribution of the
predictors given the sample.
• Even if independency across predictors is assumed, the overall accuracy provided by
the model is often competitive compared to other techniques.
• NB models can be estimated thanks to KlaR package, [Weihs et al., 2005].
26
Thanks
• The author is grateful to UnipolSai, his employeer, for the support.
• Nevertheless any opinion herewith stated is responsibility of the author only. UnipolSai
does not take any position on such opinions and disclaims any responsibility for any
incorrectness or legal error eventually found therein.
• A special thanks in addition is given to Dr. Andrea Dal Pozzolo for his suggestions.
27

Weitere ähnliche Inhalte

Was ist angesagt?

multi criteria decision making
multi criteria decision makingmulti criteria decision making
multi criteria decision makingShankha Goswami
 
Operations Research and Mathematical Modeling
Operations Research and Mathematical ModelingOperations Research and Mathematical Modeling
Operations Research and Mathematical ModelingVinodh Soundarajan
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision makingKartik Bansal
 
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...IRJET Journal
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision makingKhalid Mdnoh
 
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...ijcseit
 
Multi Objective Optimization
Multi Objective OptimizationMulti Objective Optimization
Multi Objective OptimizationNawroz University
 
Study_Pricing_Digital_Call_Options
Study_Pricing_Digital_Call_OptionsStudy_Pricing_Digital_Call_Options
Study_Pricing_Digital_Call_OptionsBruce Haydon, PMP
 
Multi criteria decision support system on mobile phone selection with ahp and...
Multi criteria decision support system on mobile phone selection with ahp and...Multi criteria decision support system on mobile phone selection with ahp and...
Multi criteria decision support system on mobile phone selection with ahp and...Reza Ramezani
 
Proximal Policy Optimization Algorithms, Schulman et al, 2017
Proximal Policy Optimization Algorithms, Schulman et al, 2017Proximal Policy Optimization Algorithms, Schulman et al, 2017
Proximal Policy Optimization Algorithms, Schulman et al, 2017Chris Ohk
 
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...Shankha Goswami
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svmtaikhoan262
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
Optimization of supply chain logistics cost
Optimization of supply chain logistics costOptimization of supply chain logistics cost
Optimization of supply chain logistics costIAEME Publication
 
Optimization in deep learning
Optimization in deep learningOptimization in deep learning
Optimization in deep learningRakshith Sathish
 

Was ist angesagt? (18)

multi criteria decision making
multi criteria decision makingmulti criteria decision making
multi criteria decision making
 
Operations Research and Mathematical Modeling
Operations Research and Mathematical ModelingOperations Research and Mathematical Modeling
Operations Research and Mathematical Modeling
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision making
 
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...
The Evaluation of Topsis and Fuzzy-Topsis Method for Decision Making System i...
 
DEA
DEADEA
DEA
 
Multi criteria decision making
Multi criteria decision makingMulti criteria decision making
Multi criteria decision making
 
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
MULTI-OBJECTIVE ENERGY EFFICIENT OPTIMIZATION ALGORITHM FOR COVERAGE CONTROL ...
 
Multi Objective Optimization
Multi Objective OptimizationMulti Objective Optimization
Multi Objective Optimization
 
Study_Pricing_Digital_Call_Options
Study_Pricing_Digital_Call_OptionsStudy_Pricing_Digital_Call_Options
Study_Pricing_Digital_Call_Options
 
Multi criteria decision support system on mobile phone selection with ahp and...
Multi criteria decision support system on mobile phone selection with ahp and...Multi criteria decision support system on mobile phone selection with ahp and...
Multi criteria decision support system on mobile phone selection with ahp and...
 
G0211056062
G0211056062G0211056062
G0211056062
 
Proximal Policy Optimization Algorithms, Schulman et al, 2017
Proximal Policy Optimization Algorithms, Schulman et al, 2017Proximal Policy Optimization Algorithms, Schulman et al, 2017
Proximal Policy Optimization Algorithms, Schulman et al, 2017
 
Lo4b (nabil) breakeven analysis
Lo4b (nabil)   breakeven analysisLo4b (nabil)   breakeven analysis
Lo4b (nabil) breakeven analysis
 
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...
A COMPARATIVE STUDY OF DIFFERENT INTEGRATED MULTIPLE CRITERIA DECISION MAKING...
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svm
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
Optimization of supply chain logistics cost
Optimization of supply chain logistics costOptimization of supply chain logistics cost
Optimization of supply chain logistics cost
 
Optimization in deep learning
Optimization in deep learningOptimization in deep learning
Optimization in deep learning
 

Ähnlich wie R Data Mining Insurance Retention Modeling

Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations managementsmumbahelp
 
Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations managementsmumbahelp
 
Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations managementStudy Stuff
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...IRJET Journal
 
IRJET- Novel based Stock Value Prediction Method
IRJET- Novel based Stock Value Prediction MethodIRJET- Novel based Stock Value Prediction Method
IRJET- Novel based Stock Value Prediction MethodIRJET Journal
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptxDhanuDhanu49
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgnirudra Sikdar
 
Integration of cost-risk within an intelligent maintenance system
Integration of cost-risk within an intelligent maintenance systemIntegration of cost-risk within an intelligent maintenance system
Integration of cost-risk within an intelligent maintenance systemLaurent Carlander
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code OptimizationIRJET Journal
 
Machine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer ClassificationMachine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer ClassificationIRJET Journal
 
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...IRJET Journal
 
Visualizing and Forecasting Stocks Using Machine Learning
Visualizing and Forecasting Stocks Using Machine LearningVisualizing and Forecasting Stocks Using Machine Learning
Visualizing and Forecasting Stocks Using Machine LearningIRJET Journal
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...IRJET Journal
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET Journal
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation SystemIRJET Journal
 
Pricing Optimization using Machine Learning
Pricing Optimization using Machine LearningPricing Optimization using Machine Learning
Pricing Optimization using Machine LearningIRJET Journal
 
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]IRJET Journal
 

Ähnlich wie R Data Mining Insurance Retention Modeling (20)

Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations management
 
Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations management
 
Om0010 operations management
Om0010 operations managementOm0010 operations management
Om0010 operations management
 
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
Comparative Analysis of Machine Learning Algorithms for their Effectiveness i...
 
IRJET- Novel based Stock Value Prediction Method
IRJET- Novel based Stock Value Prediction MethodIRJET- Novel based Stock Value Prediction Method
IRJET- Novel based Stock Value Prediction Method
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Agile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity managementAgile analytics : An exploratory study of technical complexity management
Agile analytics : An exploratory study of technical complexity management
 
Integration of cost-risk within an intelligent maintenance system
Integration of cost-risk within an intelligent maintenance systemIntegration of cost-risk within an intelligent maintenance system
Integration of cost-risk within an intelligent maintenance system
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 
Machine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer ClassificationMachine Learning Aided Breast Cancer Classification
Machine Learning Aided Breast Cancer Classification
 
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
IRJET- Improving Prediction of Potential Clients for Bank Term Deposits using...
 
Visualizing and Forecasting Stocks Using Machine Learning
Visualizing and Forecasting Stocks Using Machine LearningVisualizing and Forecasting Stocks Using Machine Learning
Visualizing and Forecasting Stocks Using Machine Learning
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
Machine Learning Approaches to Predict Customer Churn in Telecommunications I...
 
IRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom IndustryIRJET - Customer Churn Analysis in Telecom Industry
IRJET - Customer Churn Analysis in Telecom Industry
 
Recuriter Recommendation System
Recuriter Recommendation SystemRecuriter Recommendation System
Recuriter Recommendation System
 
Pricing Optimization using Machine Learning
Pricing Optimization using Machine LearningPricing Optimization using Machine Learning
Pricing Optimization using Machine Learning
 
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACHIMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
 
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACHIMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
IMPROVEMENT OF SUPPLY CHAIN MANAGEMENT BY MATHEMATICAL PROGRAMMING APPROACH
 
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
 

Mehr von Giorgio Alfredo Spedicato

Long term care insurance with markovchain package
Long term care insurance with markovchain packageLong term care insurance with markovchain package
Long term care insurance with markovchain packageGiorgio Alfredo Spedicato
 
Pricing and modelling under the Italian Direct Compensation Card Scheme
Pricing and modelling under the Italian Direct Compensation Card SchemePricing and modelling under the Italian Direct Compensation Card Scheme
Pricing and modelling under the Italian Direct Compensation Card SchemeGiorgio Alfredo Spedicato
 
Actuarial modeling of general practictioners' drug prescriptions costs
Actuarial modeling of general practictioners' drug prescriptions costsActuarial modeling of general practictioners' drug prescriptions costs
Actuarial modeling of general practictioners' drug prescriptions costsGiorgio Alfredo Spedicato
 

Mehr von Giorgio Alfredo Spedicato (16)

Machine Learning - Supervised Learning
Machine Learning - Supervised LearningMachine Learning - Supervised Learning
Machine Learning - Supervised Learning
 
Machine Learning - Unsupervised Learning
Machine Learning - Unsupervised LearningMachine Learning - Unsupervised Learning
Machine Learning - Unsupervised Learning
 
Machine Learning - Principles
Machine Learning - PrinciplesMachine Learning - Principles
Machine Learning - Principles
 
Machine Learning - Intro
Machine Learning - IntroMachine Learning - Intro
Machine Learning - Intro
 
Meta analysis essentials
Meta analysis essentialsMeta analysis essentials
Meta analysis essentials
 
Long term care insurance with markovchain package
Long term care insurance with markovchain packageLong term care insurance with markovchain package
Long term care insurance with markovchain package
 
The markovchain package use r2016
The markovchain package use r2016The markovchain package use r2016
The markovchain package use r2016
 
Cat Bond and Traditional Insurance
Cat Bond and Traditional InsuranceCat Bond and Traditional Insurance
Cat Bond and Traditional Insurance
 
It skills for actuaries
It skills for actuariesIt skills for actuaries
It skills for actuaries
 
Statistics in insurance business
Statistics in insurance businessStatistics in insurance business
Statistics in insurance business
 
Spedicato r ininsurance2015
Spedicato r ininsurance2015Spedicato r ininsurance2015
Spedicato r ininsurance2015
 
Introduction to lifecontingencies R package
Introduction to lifecontingencies R packageIntroduction to lifecontingencies R package
Introduction to lifecontingencies R package
 
L'assicurazione rca gs
L'assicurazione rca   gsL'assicurazione rca   gs
L'assicurazione rca gs
 
Princing insurance contracts with R
Princing insurance contracts with RPrincing insurance contracts with R
Princing insurance contracts with R
 
Pricing and modelling under the Italian Direct Compensation Card Scheme
Pricing and modelling under the Italian Direct Compensation Card SchemePricing and modelling under the Italian Direct Compensation Card Scheme
Pricing and modelling under the Italian Direct Compensation Card Scheme
 
Actuarial modeling of general practictioners' drug prescriptions costs
Actuarial modeling of general practictioners' drug prescriptions costsActuarial modeling of general practictioners' drug prescriptions costs
Actuarial modeling of general practictioners' drug prescriptions costs
 

Kürzlich hochgeladen

fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdfHenry Tapper
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一S SDS
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarHarsh Kumar
 
Managing Finances in a Small Business (yes).pdf
Managing Finances  in a Small Business (yes).pdfManaging Finances  in a Small Business (yes).pdf
Managing Finances in a Small Business (yes).pdfmar yame
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证jdkhjh
 
The AES Investment Code - the go-to counsel for the most well-informed, wise...
The AES Investment Code -  the go-to counsel for the most well-informed, wise...The AES Investment Code -  the go-to counsel for the most well-informed, wise...
The AES Investment Code - the go-to counsel for the most well-informed, wise...AES International
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technologyz xss
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHenry Tapper
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasCherylouCamus
 
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)ECTIJ
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintSuomen Pankki
 
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...Amil baba
 
Stock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfStock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfMichael Silva
 
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACT
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACTGOODSANDSERVICETAX IN INDIAN ECONOMY IMPACT
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACTharshitverma1762
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Sonam Pathan
 
Current Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxCurrent Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxuzma244191
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Sonam Pathan
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...Henry Tapper
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfHenry Tapper
 

Kürzlich hochgeladen (20)

fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
The Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh KumarThe Triple Threat | Article on Global Resession | Harsh Kumar
The Triple Threat | Article on Global Resession | Harsh Kumar
 
Managing Finances in a Small Business (yes).pdf
Managing Finances  in a Small Business (yes).pdfManaging Finances  in a Small Business (yes).pdf
Managing Finances in a Small Business (yes).pdf
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
 
The AES Investment Code - the go-to counsel for the most well-informed, wise...
The AES Investment Code -  the go-to counsel for the most well-informed, wise...The AES Investment Code -  the go-to counsel for the most well-informed, wise...
The AES Investment Code - the go-to counsel for the most well-informed, wise...
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
The Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng Pilipinas
 
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
 
Governor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraintGovernor Olli Rehn: Dialling back monetary restraint
Governor Olli Rehn: Dialling back monetary restraint
 
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
 
Stock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfStock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdf
 
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACT
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACTGOODSANDSERVICETAX IN INDIAN ECONOMY IMPACT
GOODSANDSERVICETAX IN INDIAN ECONOMY IMPACT
 
Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713Call Girls Near Me WhatsApp:+91-9833363713
Call Girls Near Me WhatsApp:+91-9833363713
 
Current Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptxCurrent Economic situation of Pakistan .pptx
Current Economic situation of Pakistan .pptx
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
letter-from-the-chair-to-the-fca-relating-to-british-steel-pensions-scheme-15...
 
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdfmagnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
magnetic-pensions-a-new-blueprint-for-the-dc-landscape.pdf
 

R Data Mining Insurance Retention Modeling

  • 1. R Data Mining for Insurance Retention Modeling R In Insurance Conference 2014 Giorgio A. Spedicato, Ph.D C.Stat ACAS UnipolSai Assicurazioni Research & Developmnent
  • 2. Introduction Retention modeling • Modeling policyholders' retention consists in predicting whether an existing policyholder will renew its contract in order to take underwriting decisions accordingly. • It means enriching a classical actuarial loss cost modeling with marketing and competitive business considerations in order to propose an "optimized premium"upon renew. • Similarly, "conversion" analysis incorporates in prospective policyholders' proposed premium marketing based considerations. 2
  • 3. Introduction Retention modeling • Logistic regression is the most widely used approach in actuarial practice, but little attention has been given to other machine learning (ML) techniques until now. • An application that showes the use of Random Forest for life insurance policy retention can be found in [Milhaud et al., 2011]. • Good introductions on the business side can be found in [Arquette and Guven, 2013] and [Anderson and McPhail, 2013]. 3
  • 4. Introduction Retention modeling • A statistician would consider retention modeling as a classification tasks. In fact, policyholders can be classified in two groups (Renew/Not Renew) according to features that lie within: demographics (age, gender, ...), insurance policy history (years since customer, claim history, coverage dept, ...), marketing and business environment (year to year price change and relative competitiveness, ...). • An adequately tuned retention model can be integrated within an underwriting algorithm to define the renewal price. • Redefining renewal price will need to recalculate price variation and competitiveness variables iteratively. 4
  • 5. Introduction The business problem • A 50K policies dataset has been available containing demographics, insurance and business variables as well as recorded Renew/Lapse decision. • Various retention models will be t to model the probability that the policyholder will renew upon termination date, pi . • The aim is to get a function pi = f (Pi ), where renewal probability is a function of the proposed renewal premium, Pi . 5
  • 6. Introduction The business problem • The final aim consists in maximizing the existing business gross premium • The renewal premium can be modified by 1% with respect to initial proposed value in our optimization scenario. • This exercise will not consider the loss cost component in the analysis, thus the portfolio premium volume will be optimized, not the global underwriting margin. 6 ( )∑= = Ni ii T pPP 1 *
  • 7. Introduction The approach • A standard logistic regression will be t and compared with various machine learning techniques in Section 2. • The models found in Section 2 will be used to calculate an optimized premium for each policyholder in Section 3. • Brief considerations will be drawn also. 7
  • 8. Predictive Modeling Data and Methods • The dataset can be deemed representative of a common personal lines retention problem. • Most statistical fitting process will be based on the caret [from Jed Wing et al., 2014] package framework to fit predictive models. • The reference model will be a logistic regression where continuous predictors have been handled using restricted cubic splines from rms package [Jr, 2014]. 8
  • 9. Predictive Modeling Data and Methods • Logistic regression key advantages are: relatively fast to be fit, easily interpretable, and no need of parameters' tuning. • Most machine learning techniques have at least one or more parameter that cannot be estimated from data and need to be tuned (tuning parameters). • The caret package, [from Jed Wing et al., 2014], provides a very valuable infrastructure that standardizes the process for fitting, validating and predicting models that have been implemented within different packages. 9
  • 10. Predictive Modeling Approach • A performance measure should be defined. For binary classification models, the area under the ROC curve (AUC) is a very common choice. • In addition, fitting the model over different samples (cross - validation) is a standard practice to better assess model performance in order to avoid obverting. • A final choice is to divide the dataset between a train set and a test set and to perform the final model evaluation under the test set. 10
  • 11. Predictive Modeling Model fitting and assessment • The following models were fitted on a train set split: logistic regression, C5.0 classification trees, Flexible Discriminant Analysis (FDA), Naive Bayes (NB), Support Vector Machines (SVM), Neural Networks (NN), Boosted Glms. • Only FDA shows a slight increase in predictive performance in the provided dataset when compared with logistic regression (the reference model) • Logistic regression and FDA appear very close in terms of predictive gain (Lift Curve, Figure 2) and precision in estimated probabilities (Calibration Chart, Figure 1) . 11
  • 12. Model fitting and assessment Calibration Plot 12
  • 13. Model fitting and assessment Lift Chart 13
  • 14. Pricing optimization Renewal premium optimization The retention model has been used to optimize the renewal premium, Pi . The proposed Pi was varied in different scenarios and the expected renewal premium inflow, pi PI (pi ), was calculated for each individual policyholder. Three premium variation scenarios were proposed: no change, +1% increase, -1% decrease. The highest expected renewal premium scenario for each customer has been selected. 14
  • 15. Pricing optimization Renewal premium optimization • Each policyholder would be proposed the highest expected renewal premium according to the three scenarios. • The final renewal decision for each policyholder is shown in Figure 3. • In the available data set, the overall variation in expected renewal premium is very close using either FDA or Logistic approach 15
  • 17. Pricing optimization Considerations • R appears to be a valid environment, not only to perform standard loss cost modeling but also to estimate renewal probabilities. • Logistic regression requires less estimation effort than other machine learning techniques (that not always offer improved predictive performance). • The caret package [from Jed Wing et al., 2014] provides a good framework to unify the process of building and assess competing machine learning techniques. 17
  • 18. Technical Notes The infrastructure • Various machine learning techniques have been compared in the analysis: glms (linear logistic regression and its boosted version); tree based methods (C5.0, Random Forest); non - linear methods (neural network, Naive Bayes Classiers, Support Vector Machines and Flexible Discriminant Analysis). • The caret package has been used in order to provide a common infrastructure to t the different models, compare their performance and perform predictions. • The data source was originally provided in SAS and imported in R thanks to the package sas7bdat, [Shotwell, 2014]. 18
  • 19. Technical Notes The caret package • The core caret function used is the train function that provides a common wrapper to t more than one hundred machine learning techniques. A predict method is available for all trained objects • Hyperparameter have been generally searched over a model pre - defined grid. 10 - fold cross validation has been used to better assess the predictive performance. • Among the other package's functions, lift and calibration function have been used to project lift gain and calibration charts. 19
  • 20. Technical Notes Logistic regression and extensions • glm R base function was wrapped by train in order to fit logistic regression. • rms package has been used to add restricted cubic splines to the prediction formula on the continuous predictors. • A boosted version of the logistic regession has also be fit with the use of mboost [Buehlmann and Hothorn, 2007] package. 20
  • 21. Technical Notes Tree based methods Tree based approaches perform classification by finding splits within available features that provides the most predictive performance within the subset. The splits sequence provides and effective way to visualize variables' dependency. The C5.0 algorithm, one of the most widely used approach, has been used in this exercise via the C50 package, [Kuhn et al., 2014]. A boosted approach, Gradient Boosted Machines, gbm, has also be used thanks to package [with contributions from others, 2013]. 21
  • 22. Technical Notes Neural Networks • A neural network model consists in a set of adaptive weights (neurons) that can approximate non linear functions. • Neural Networks represent a classical machine learning techniques, probably the one proposed first. • Among the many R packages available, the nnet package, [Venables and Ripley, 2002], has been used in this exercise. 22
  • 23. Technical Notes Support Vector Machines • A support vector machine (SVM) ts an hyperplane that aims to separate categories within the dependent variable with least error. It is rated among the top classiers. • Non linear shapes can be used in order to define the hyperplane: polynomial, hyperbolic, radial basis functions. • The kernlab package, [Karatzoglou et al., 2004] has been used in this exercise. 23
  • 24. Technical Notes Flexible Discriminant Analysis • The Flexible Discriminant Analysis (FDM) method extends the linear discriminant analysis by including non - linear hinge functions h (x) = max (x 􀀀 h; 0) in the calculation of the discriminant function. • hinge function can be used within the discriminant function also in polynomial form. • FDA models have been t with package the earth package, [from mda:mars by Trevor Hastie and utilities with Thomas Lumley's]. 24
  • 25. Technical Notes K Nearest Neighbours • A KNN model classifies a sample with the most frequent item within its closest neighbors in the train set. • It is a very simple approach, that can be used both in regression and classification. • KNN models can be t with the caret package, [from Jed Wing et al., 2014]. 25
  • 26. Technical Notes Naive Bayes • A Naive Bayes (NB) model classifies a sample using the conditional distribution of the predictors given the sample. • Even if independency across predictors is assumed, the overall accuracy provided by the model is often competitive compared to other techniques. • NB models can be estimated thanks to KlaR package, [Weihs et al., 2005]. 26
  • 27. Thanks • The author is grateful to UnipolSai, his employeer, for the support. • Nevertheless any opinion herewith stated is responsibility of the author only. UnipolSai does not take any position on such opinions and disclaims any responsibility for any incorrectness or legal error eventually found therein. • A special thanks in addition is given to Dr. Andrea Dal Pozzolo for his suggestions. 27