Morse-Smale Regression
Extensions for Risk
Modeling
Colleen M. Farrelly
Introduction
 Subgroups are ubiquitous in scientific research and actuarial science.
 Risk is not uniform.
 Risk types can vary in degree and kind, and high risk on some factors may not
carry as much overall risk as lower risk on other factors.
 Piecewise regression is one method that can accurately capture this
phenomenon.
 Morse-Smale regression is a topologically-based piecewise regression method
that has shown promise on various Tweedie-distributed outcomes, including
common distributions used in modeling risk.
 This method currently employs elastic net and generalized linear modeling to fit
the regression pieces to Morse-Smale-complex-based partitions.
 Many machine learning extensions of regression exist and can capture multivariate
trends in the data, which neither elastic net nor generalized linear modeling
captures well.
 Extending Morse-Smale regression to machine-learning-based models can
potentially improve accuracy and understanding of risk.
Tweedie Regression Overview
• Tweedie model framework (many
biological/social count variables):
• Var(y) = φµ^ξ
• Where φ is the dispersion parameter (extra
zeros in model, here=1.5)
• µ is the mean
• ξ is the Tweedie parameter (increased mass
near zero and fatness of non-zero
distributions)
• Many exponential family distributions belong to or converge to the
Tweedie family (normal, Poisson,
gamma, compound Poisson-gamma…)
• Examples
• Number of students enrolled by advisor in a
month
• Insurance claim payout
• Heroin use per month
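As a rough illustration of the framework above, the following sketch assumes the slide's relation is the Tweedie variance function Var(y) = φµ^ξ and simulates the compound Poisson-gamma case (1 < ξ < 2). The function names, the mean of 2.0, and the Tweedie parameter of 1.5 are illustrative choices, not values from the talk; only the dispersion of 1.5 echoes the slide.

```python
import math
import random


def tweedie_variance(mu, phi, xi):
    """Variance implied by the Tweedie relation Var(y) = phi * mu**xi."""
    return phi * mu ** xi


def sample_tweedie(mu, phi, xi, rng):
    """Draw one compound Poisson-gamma value (only valid for 1 < xi < 2)."""
    lam = mu ** (2 - xi) / (phi * (2 - xi))      # Poisson rate of the claim count
    shape = (2 - xi) / (xi - 1)                  # gamma shape of each claim
    scale = phi * (xi - 1) * mu ** (xi - 1)      # gamma scale of each claim
    # Poisson draw via Knuth's multiplication method (adequate for small rates).
    count, product, limit = 0, rng.random(), math.exp(-lam)
    while product > limit:
        count += 1
        product *= rng.random()
    # A count of zero yields an exact zero, which is the point mass at zero.
    return sum(rng.gammavariate(shape, scale) for _ in range(count))


rng = random.Random(0)
draws = [sample_tweedie(mu=2.0, phi=1.5, xi=1.5, rng=rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
zero_share = sum(d == 0 for d in draws) / len(draws)
print(sample_mean, sample_var, tweedie_variance(2.0, 1.5, 1.5), zero_share)
```

The sample variance should land close to φµ^ξ, and the share of exact zeros shows the mass at zero that makes this family a natural fit for outcomes such as claim payouts.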
Morse-Smale Regression Overview: I
 To build intuition:
 Imagine a soccer player kicking a ball on the ground of a hilly field to explore the
field.
 The high and low points (maxima and minima) determine where the ball will come to rest.
 These paths of the ball define which parts of the field share common hills and valleys.
 These paths are actually gradient paths defined by height on the field’s topological
space.
 The spaces they define are the Morse-Smale complex of the field, partitioning it
into different regions (clusters).
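The rolling-ball intuition can be sketched on sampled data as follows. This is a simplified approximation on a k-nearest-neighbour graph, not the exact procedure from the Morse-Smale regression literature: each point follows its steepest uphill neighbour to a local maximum and its steepest downhill neighbour to a local minimum, and points sharing the same (maximum, minimum) pair form one partition. All function names are hypothetical, and the points are assumed to be distinct.

```python
import math


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour index lists, excluding the point itself."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        neighbours.append(order[:k])
    return neighbours


def follow_gradient(start, points, values, neighbours, sign):
    """Walk to the steepest neighbour until none improves (+1 ascends, -1 descends)."""
    current = start
    while True:
        best, best_slope = current, 0.0
        for j in neighbours[current]:
            rise = sign * (values[j] - values[current])
            slope = rise / math.dist(points[current], points[j])
            if slope > best_slope:
                best, best_slope = j, slope
        if best == current:
            return current  # local extremum on the graph
        current = best


def morse_smale_labels(points, values, k):
    """Label each point by the (maximum index, minimum index) its paths reach."""
    neighbours = knn_graph(points, k)
    return [(follow_gradient(i, points, values, neighbours, +1),
             follow_gradient(i, points, values, neighbours, -1))
            for i in range(len(points))]


# A one-dimensional "hilly field": two full waves of a sine curve.
xs = [(4 * math.pi * i / 80,) for i in range(81)]
heights = [math.sin(x[0]) for x in xs]
labels = morse_smale_labels(xs, heights, k=2)
print(len(set(labels)), "partitions found")
```

Each distinct label corresponds to one monotone stretch of the field between a valley and a hill, which is the region a piecewise model would later be fitted to.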
Morse-Smale Regression Overview: II
 Morse-Smale clusters
partition data space into
sections with common
minima and maxima
based on the function flow.
 Groups can be visualized in
low-dimensional space to
see commonalities and
differences (top right).
 Groups can also be
examined based on
differences in predictor
values (bottom right).
 This provides users with a
good visualization tool to
understand the data.
Example: 2 groups,
3 predictors
Extending Morse-Smale Regression
 Multivariate algorithms to fit partitioned regression models
 Random forest
 Bagged ensemble of tree models
 Akin to combining summaries of a novel from a class in which each student was randomly assigned a few chapters
 Boosted regression
 Iteratively added model of main effects and interaction terms
 Akin to guessing a puzzle’s picture by adding key pieces until the picture is mostly there
 Homotopy LASSO
 Extends penalized regression model (LASSO) through homotopy estimation methods
 Akin to a blindfolded person navigating around obstacles between two set points by
following a rope
 Conditional inference tree
 Tree method that partitions space by assessing covariate independence
 Extreme learning machine
 Single-layer feed-forward neural networks based on random mapping between layers
 Has universal approximation properties
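The general pattern behind these extensions, fitting one learner per partition, might look like the sketch below. It is an illustration under stated assumptions rather than the talk's implementation: `PartitionedRegressor` and `LineModel` are hypothetical names, the one-dimensional least-squares line merely stands in for any learner with `fit` and `predict` methods (a random forest, boosted model, and so on), and routing new points by their nearest training point is a simplification chosen for brevity.

```python
import math


class LineModel:
    """Stand-in learner: a least-squares line on the first feature only."""

    def fit(self, X, y):
        n = len(X)
        mean_x = sum(x[0] for x in X) / n
        mean_y = sum(y) / n
        sxx = sum((x[0] - mean_x) ** 2 for x in X)
        sxy = sum((x[0] - mean_x) * (t - mean_y) for x, t in zip(X, y))
        self.slope = sxy / sxx if sxx > 0 else 0.0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, X):
        return [self.intercept + self.slope * x[0] for x in X]


class PartitionedRegressor:
    """Fits one model per partition label and routes predictions by nearest neighbour."""

    def __init__(self, make_model):
        self.make_model = make_model  # zero-argument factory returning a fresh learner

    def fit(self, X, y, labels):
        self.X, self.labels = list(X), list(labels)
        self.models = {}
        for label in set(labels):
            rows = [i for i, lab in enumerate(labels) if lab == label]
            self.models[label] = self.make_model().fit(
                [X[i] for i in rows], [y[i] for i in rows])
        return self

    def predict(self, X_new):
        predictions = []
        for x in X_new:
            # Assumption: a new point inherits the partition of its nearest training point.
            nearest = min(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            model = self.models[self.labels[nearest]]
            predictions.append(model.predict([x])[0])
        return predictions


# Toy data with two regimes: y = |x|, partitioned by the sign of x.
X = [(i / 2,) for i in range(-10, 11)]
y = [abs(x[0]) for x in X]
labels = ["neg" if x[0] < 0 else "pos" for x in X]
model = PartitionedRegressor(LineModel).fit(X, y, labels)
print(model.predict([(-2.0,), (3.2,)]))
```

Because each partition keeps its own fitted model, quantities such as per-group predictor importance can be compared across partitions, which is what the subgroup analyses later in the talk rely on.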
Simulation and Swedish Motor Insurance
 Simulation
 Simulation design parameters
 4 true predictors, 11 noise variables
 Sample size set to 10,000
 Outcome Tweedie-distributed, with Tweedie parameter varying (1, 1.5, 2) and dispersion
(1, 2, 4)
 Nature of predictor relationships varied (4 main effects, 2 interaction effects, or a
combination of 2 main effects and 1 interaction effect)
 Each trial was run 10 times with a 70/30 training/test split.
 Mean square error (MSE) was used to assess model accuracy.
 Swedish 3rd Party Motor Insurance 1977
 2182 observations with 6 predictors (kilometers traveled per year, geographic zone,
bonus, car model make, number of years insured, total claims)
 MSE assessed for all models based on 70/30 training/test split
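The evaluation protocol described above (repeated 70/30 training/test splits scored by MSE) can be sketched as follows. The helper names and the `fit_predict` callable are hypothetical, the toy data and mean-only baseline are for illustration, and the seed is arbitrary; the talk does not specify how its splits were drawn.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def split_indices(n, train_frac, rng):
    """Shuffle row indices and cut them into training and test sets."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def repeated_holdout(X, y, fit_predict, n_repeats=10, train_frac=0.7, seed=0):
    """Average test MSE over repeated random training/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, test = split_indices(len(X), train_frac, rng)
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(mse([y[i] for i in test], preds))
    return sum(scores) / len(scores), scores


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical baseline: predict the training mean for every test row."""
    mean_y = sum(y_train) / len(y_train)
    return [mean_y for _ in X_test]


X = [(float(i),) for i in range(100)]
y = [2.0 * x[0] + 1.0 for x in X]
average_mse, all_scores = repeated_holdout(X, y, mean_baseline)
print(average_mse)
```

Any of the partitioned or non-partitioned models can be passed in place of `mean_baseline`, so competing methods are scored on identically drawn splits when the same seed is reused.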
Simulation Results
 Most multivariate Morse-Smale regression algorithms perform well against the
original Morse-Smale regression algorithm, particularly for trials involving linear or
mixed predictor relationships and trials with lower dispersion.
 Some of these models outperformed their non-piecewise counterpart models, as
well.
 Even when algorithms perform similarly to non-piecewise counterparts, they
provide a comparison of predictor importance among different risk subgroups and
methods to visualize these differences (random forest model shown below).
Swedish Motor Insurance Results: I
 Most machine learning models perform well, and multivariate Morse-Smale
regression methods perform exceptionally well.
Swedish Motor Insurance Results: II
 Three distinct subgroups were found, and risk type varied significantly
between them.
 Group 1: relatively high dependence on make and number of claims
 Group 2: relatively high dependence on bonus and number of years insured
 Group 3: almost solely dependent on number of claims and geographic zone
Conclusions
 Multivariate Morse-Smale regression models typically:
 Outperform the original Morse-Smale regression algorithm
 Perform comparably to the non-partitioned models built with the same machine
learning algorithm.
 Multivariate Morse-Smale regression models provide subgroup-based analytics
capabilities and differentiated risk structure abilities that can help actuaries:
 Better understand risk
 Create models based on insurance policy risk groups (as well as risk level)
 Visualize this process to help others within the industry understand the models
(less black-box)
 However, some black-box algorithms perform better on Tweedie regression
problems, particularly KNN regression ensembles (Farrelly, 2017), but these
methods don’t allow for visualization or comparison of risk factors.
 Large sample sizes are needed for good performance, but most insurance
datasets are large enough to circumvent potential convergence issues.
References
 Talk is a summary of:
 Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to
Actuarial Science. arXiv preprint arXiv:1708.05712. Accepted Dec 2017 for
publication by Casualty Actuarial Society.
 Selected references from 2017 Farrelly paper:
 De Jong, P., & Heller, G. Z. (2008). Generalized linear models for insurance data
(Vol. 10). Cambridge: Cambridge University Press.
 Farrelly, C. M. (2017). KNN Ensembles for Tweedie Regression: The Power of
Multiscale Neighborhoods. arXiv preprint arXiv:1708.02122.
 Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013).
Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1),
193-214.
 McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the
American Statistical Association, 65(331), 1109-1124.
 Tomoda, K., Morino, K., Murata, H., Asaoka, R., & Yamanishi, K. (2016). Predicting
Glaucomatous Progression with Piecewise Regression Model from Heterogeneous
Medical Data. HEALTHINF, 2016.

 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 

Morse-Smale Regression for Risk Modeling

  • 1. Morse-Smale Regression Extensions for Risk Modeling. Colleen M. Farrelly
  • 2. Introduction
      • Subgroups are ubiquitous in scientific research and actuarial science.
      • Risk is not uniform.
          • Risk types can vary in degree and kind, and high risk on some factors may not amount to as high an overall risk as lower risk on other factors.
          • Piecewise regression is one method that can accurately capture this phenomenon.
      • Morse-Smale regression is a topologically based piecewise regression method that has shown promise on various Tweedie-distributed outcomes, including common distributions used in modeling risk.
          • This method currently employs elastic net and generalized linear modeling to fit the regression pieces to Morse-Smale-complex-based partitions.
          • Many machine learning extensions of regression exist and can capture multivariate trends in the data, something that both elastic net and generalized linear modeling are limited in doing.
      • Extending Morse-Smale regression to machine-learning-based models can potentially improve accuracy and understanding of risk.
  • 3. Tweedie Regression Overview
      • Tweedie model framework (many biological/social count variables):
          • Var(y) = φµ^ξ
          • Where φ is the dispersion parameter (extra zeros in model, here = 1.5)
          • µ is the mean
          • ξ is the Tweedie parameter (increased mass near zero and fatness of non-zero distributions)
      • Many exponential distributions converge to Tweedie distributions (normal, Poisson, gamma, compound Poisson-gamma…)
      • Examples
          • Number of students enrolled by advisor in a month
          • Insurance claim payout
          • Heroin use per month
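The roles of φ, µ, and ξ can be made concrete with a small worked example. The sketch below assumes the slide's relation is the usual Tweedie variance function, Var(y) = φµ^ξ, and uses the standard compound Poisson-gamma parameterisation for 1 < ξ < 2. The function names are hypothetical and are not taken from the talk.

```python
import math


def tweedie_variance(mu, phi, xi):
    """Variance implied by the Tweedie variance function: phi * mu**xi."""
    return phi * mu ** xi


def compound_poisson_gamma_params(mu, phi, xi):
    """Map (mu, phi, xi) with 1 < xi < 2 to a Poisson rate and gamma shape/scale.

    The outcome is a sum of N gamma(shape, scale) draws with N ~ Poisson(rate),
    so an exact zero occurs whenever N == 0.
    """
    if not 1 < xi < 2:
        raise ValueError("compound Poisson-gamma requires 1 < xi < 2")
    rate = mu ** (2 - xi) / (phi * (2 - xi))
    shape = (2 - xi) / (xi - 1)
    scale = phi * (xi - 1) * mu ** (xi - 1)
    return rate, shape, scale


# Illustrative values only (assumed, not from the slides).
mu, phi, xi = 2.0, 1.0, 1.5
rate, shape, scale = compound_poisson_gamma_params(mu, phi, xi)

# Moments of a Poisson sum of gamma variables.
implied_mean = rate * shape * scale
implied_variance = rate * shape * scale ** 2 * (1 + shape)

# Probability of an exact zero: no claims at all.
prob_zero = math.exp(-rate)
```

The implied mean and variance recover µ and φµ^ξ, which is a quick way to check that a chosen parameterisation matches the variance function. The zero probability shows how the parameters control the mass at zero.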
  • 4. Morse-Smale Regression Overview: I
      • To build intuition:
          • Imagine a soccer player kicking a ball on the ground of a hilly field to explore the field.
          • The high and low points (maxima and minima) determine where the ball will come to rest.
          • These paths of the ball define which parts of the field share common hills and valleys.
      • These paths are actually gradient paths defined by height on the field's topological space.
      • The spaces they define are the Morse-Smale complex of the field, partitioning it into different regions (clusters).
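The hilly-field picture can be sketched in one dimension. The toy below is an assumption-laden illustration, not the method used in the talk: heights are sampled on a line, each point follows discrete steepest ascent and descent through its immediate neighbours, and points that reach the same (minimum, maximum) pair share a cell. All function names are hypothetical.

```python
def follow(heights, start, uphill):
    """Step to the neighbour with the steepest rise (or drop) until none improves."""
    sign = 1 if uphill else -1
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
        best = max(neighbours, key=lambda j: sign * (heights[j] - heights[i]))
        if sign * (heights[best] - heights[i]) <= 0:
            return i  # local maximum (uphill) or local minimum (downhill)
        i = best


def morse_smale_labels(heights):
    """Label every point with the (minimum index, maximum index) its paths reach."""
    return [
        (follow(heights, i, uphill=False), follow(heights, i, uphill=True))
        for i in range(len(heights))
    ]


# A small field with two hills (indices 2 and 6) and two valleys (indices 0 and 4).
heights = [0, 4, 8, 6, 2, 5, 10]
labels = morse_smale_labels(heights)
cells = sorted(set(labels))
```

Here the seven points fall into three cells, one for each distinct pairing of a valley with a hill. In higher dimensions the same idea is usually applied over a neighbourhood graph of the data rather than a line.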
  • 5. Morse-Smale Regression Overview: II
      • Morse-Smale clusters partition data space into sections with common minima and maxima based on the function flow.
      • Groups can be visualized in low-dimensional space to see commonalities and differences (top right).
      • Groups can also be examined based on differences in predictor values (bottom right).
      • This provides users with a good visualization tool to understand the data.
      • [Figure: Example: 2 groups, 3 predictors]
  • 6. Extending Morse-Smale Regression
      • Multivariate algorithms to fit partitioned regression models
      • Random forest
          • Bagged ensemble of tree models
          • Akin to combining novel summaries of a class randomly assigned a few chapters
      • Boosted regression
          • Iteratively added model of main effects and interaction terms
          • Akin to guessing a puzzle's picture by adding key pieces until the picture is mostly there
      • Homotopy LASSO
          • Extends penalized regression model (LASSO) through homotopy estimation methods
          • Akin to a blindfolded person navigating around obstacles between two set points by following a rope
      • Conditional inference tree
          • Tree method that partitions space by assessing covariate independence
      • Extreme learning machine
          • Single-layer feed-forward neural networks based on random mapping between layers
          • Has universal approximation properties
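The common pattern behind these extensions is to fit one model per partition and route each prediction to the model for its partition. The sketch below shows only that pattern. It assumes the partition labels are already available, and it swaps in a one-predictor least-squares line as a stand-in for the learners listed above. Names and data are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def fit_piecewise(xs, ys, labels):
    """Fit a separate line within each partition label."""
    groups = defaultdict(lambda: ([], []))
    for x, y, label in zip(xs, ys, labels):
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {label: fit_line(gx, gy) for label, (gx, gy) in groups.items()}


def predict(models, x, label):
    """Predict with the model belonging to the observation's partition."""
    intercept, slope = models[label]
    return intercept + slope * x


# Toy data: partition "A" trends upward, partition "B" trends downward.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 2, 4, 10, 9, 8]
labels = ["A", "A", "A", "B", "B", "B"]
models = fit_piecewise(xs, ys, labels)
```

Replacing `fit_line` with any learner that returns a fitted model gives the corresponding partitioned variant. Comparing the fitted models across labels is what supports subgroup-level comparisons of predictor effects.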
  • 7. Simulation and Swedish Motor Insurance
      • Simulation
          • Simulation design parameters
              • 4 true predictors, 11 noise variables
              • Sample size set to 10,000
              • Outcome Tweedie-distributed, with Tweedie parameter varying (1, 1.5, 2) and dispersion varying (1, 2, 4)
              • Nature of predictor relationships varied (4 main effects, 2 interaction effects, or a combination of 2 main effects and 1 interaction effect)
          • Each trial was run 10 times with a 70/30 training/test split.
          • Mean square error (MSE) was used to assess model accuracy.
      • Swedish 3rd Party Motor Insurance 1977
          • 2,182 observations with 6 predictors (kilometers traveled per year, geographic zone, bonus, car model make, number of years insured, total claims)
          • MSE assessed for all models based on 70/30 training/test split
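The evaluation protocol described above (repeated 70/30 splits scored by MSE) can be sketched as follows. This is a minimal illustration under assumptions: the seeding scheme, the function names, and the mean-only baseline model are hypothetical and are not taken from the talk.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def repeated_mse(rows, fit, n_trials=10):
    """Average test MSE over repeated splits.

    rows are (x, y) pairs; fit(train) must return a function x -> prediction.
    """
    scores = []
    for trial in range(n_trials):
        train, test = train_test_split(rows, seed=trial)
        model = fit(train)
        actual = [y for _, y in test]
        predicted = [model(x) for x, _ in test]
        scores.append(mean_squared_error(actual, predicted))
    return sum(scores) / len(scores)


def fit_mean(train):
    """Baseline model that always predicts the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y
```

Any candidate model can be compared on equal footing by passing a different `fit` function to `repeated_mse` while keeping the same seeds.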
  • 8. Simulation Results
      • Most multivariate Morse-Smale regression algorithms perform well against the original Morse-Smale regression algorithm, particularly for trials involving linear or mixed predictor relationships and trials with lower dispersion.
      • Some of these models outperformed their non-piecewise counterpart models, as well.
      • Even when algorithms perform similarly to non-piecewise counterparts, they provide a comparison of predictor importance among different risk subgroups and methods to visualize these differences (random forest model shown below).
  • 9. Swedish Motor Insurance Results: I
      • Most machine learning models perform well, and multivariate Morse-Smale regression methods perform exceptionally well.
  • 10. Swedish Motor Insurance Results: II
      • Three distinct subgroups were found, and risk type varied significantly between them.
          • Group 1: relatively high dependence on make and number of claims
          • Group 2: relatively high dependence on bonus and number of years insured
          • Group 3: almost solely dependent on number of claims and geographic zone
  • 11. Conclusions
      • Multivariate Morse-Smale regression models typically:
          • Outperform the original Morse-Smale regression algorithm
          • Perform comparably to the non-partitioned models built with the same machine learning algorithm
      • Multivariate Morse-Smale regression models provide subgroup-based analytics and a differentiated view of risk structure that can help actuaries:
          • Better understand risk
          • Create models based on insurance policy risk groups (as well as risk level)
          • Visualize this process to help others within the industry understand the models (less black-box)
      • However, some black-box algorithms perform better on Tweedie regression problems (particularly KNN regression ensembles; Farrelly, 2017); these methods don't allow for visualization or comparison of risk factors.
      • Large sample sizes are needed for good performance, but most insurance datasets are large enough to circumvent potential convergence issues.
  • 12. References
      • Talk is a summary of:
          • Farrelly, C. M. (2017). Extensions of Morse-Smale Regression with Application to Actuarial Science. arXiv preprint arXiv:1708.05712. (Accepted Dec 2017 for publication by the Casualty Actuarial Society.)
      • Selected references from the 2017 Farrelly paper:
          • De Jong, P., & Heller, G. Z. (2008). Generalized linear models for insurance data (Vol. 10). Cambridge: Cambridge University Press.
          • Farrelly, C. M. (2017). KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods. arXiv preprint arXiv:1708.02122.
          • Gerber, S., Rübel, O., Bremer, P. T., Pascucci, V., & Whitaker, R. T. (2013). Morse-Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
          • McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the American Statistical Association, 65(331), 1109-1124.
          • Tomoda, K., Morino, K., Murata, H., Asaoka, R., & Yamanishi, K. (2016). Predicting Glaucomatous Progression with Piecewise Regression Model from Heterogeneous Medical Data. HEALTHINF, 2016.