Model Automation in R
Using MASS, randomForest, forecast,
and caret
Who is Will Johnson?
● Database Manager at Uline (Pleasant Prairie)
● MS Predictive Analytics (2015)
● Operating www.LearnByMarketing.com
○ R tutorials, thoughts on analysis.
Agenda
1. What is Model Automation
2. Pros and Cons of Model Automation
3. Decision Trees and Random Forests {randomForest}
4. Stepwise Regression {MASS}
5. Auto.Arima for time series {forecast}
6. Hyperparameter Search {caret}
What is Model Automation?
Hypothesis Space vs Hyperparameter Space
Pros and Cons of Model Automation
PROS:
● You Don’t Have to Think!
● “Faster” Iterations.
● See what’s “Important”
CONS:
● You Don’t Have to Think!
● Jellybeans
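The "Jellybeans" bullet is not expanded on the slide. Assuming it alludes to the multiple-comparisons problem (test enough candidate variables and some will look significant by chance), a minimal base-R sketch of that risk might look like the following. The data are simulated and every name is illustrative, not taken from the talk.

```r
# Simulated illustration: 20 pure-noise predictors and one noise outcome.
set.seed(42)
n <- 100
p <- 20
x <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(x) <- paste0("bean", seq_len(p))
y <- rnorm(n)  # unrelated to every predictor by construction

# p-value of the slope in a one-variable regression, for each predictor
p_values <- sapply(x, function(col) {
  summary(lm(y ~ col))$coefficients[2, 4]
})

# Count how many noise variables pass the usual 0.05 threshold
n_false_hits <- sum(p_values < 0.05)
print(n_false_hits)

# Probability of at least one false hit across 20 independent tests
p_any_false_hit <- 1 - 0.95^p
print(round(p_any_false_hit, 3))
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious "important" variable is roughly 64%, which is one reason automated selection still needs a human sanity check.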
Decision Trees
● Gini Index + Entropy
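The slide names the two split criteria without defining them. As a small sketch using the standard textbook definitions (the function names here are illustrative, not from the talk), both can be computed from class proportions in base R:

```r
# Gini impurity: 1 - sum(p_k^2) over class proportions p_k
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Shannon entropy in bits: -sum(p_k * log2(p_k)); empty classes are dropped
entropy_bits <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

pure  <- c("a", "a", "a", "a")
mixed <- c("a", "a", "b", "b")

gini_impurity(pure)   # 0
gini_impurity(mixed)  # 0.5
entropy_bits(pure)    # 0
entropy_bits(mixed)   # 1
```

Both measures are zero for a pure node and largest for an even class mix, so a tree prefers splits that reduce them the most.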
randomForest
● Mean Decrease in Gini Index
library(randomForest)
rf <- randomForest(y~., data = dat)
rf$importance #Var Name + Importance
varImpPlot(rf) #Visualization
Stepwise Regression
● AIC
library(MASS)
mt <- mtcars  # assumed: the variable list below matches the built-in mtcars data
mod <- lm(hp ~ ., data = mt)
# Step backward, removing one variable at a time
stepAIC(mod, direction = "backward", trace = TRUE)
# Get the independent variables
# (excluding hp, the dependent variable)
indep_vars <- paste(setdiff(names(mt), "hp"), collapse = " + ")
# Turn those variable names into a one-sided formula for the upper scope
upper_form <- formula(paste("~", indep_vars))
# ~mpg + cyl + disp + drat + wt + qsec + vs + am + gear + carb
# Create a model using only the intercept
mod_lower <- lm(hp ~ 1, data = mt)
# Step forward, adding one variable at a time
stepAIC(mod_lower, direction = "forward",
        scope = list(upper = upper_form, lower = ~1))
# Step forward or backward at each step, starting from the intercept-only model
stepAIC(mod_lower, direction = "both",
        scope = list(upper = upper_form, lower = ~1))
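The stepwise slides lean on AIC without showing how it is computed. The sketch below reproduces by hand the value that `extractAIC()` reports for a linear model, which is the quantity `stepAIC()` compares between candidate models. It assumes the built-in `mtcars` data (the variable list on the slide matches it); `aic_by_hand` is an illustrative helper, not part of MASS.

```r
library(MASS)  # stepAIC(); ships with standard R installations

fit_full  <- lm(hp ~ ., data = mtcars)
fit_small <- lm(hp ~ 1, data = mtcars)

# extractAIC() returns c(edf, AIC); for lm the criterion is
# n * log(RSS / n) + 2 * edf, where edf counts the fitted coefficients
aic_by_hand <- function(fit) {
  n   <- length(residuals(fit))
  rss <- sum(residuals(fit)^2)
  edf <- length(coef(fit))
  n * log(rss / n) + 2 * edf
}

aic_full  <- aic_by_hand(fit_full)
aic_small <- aic_by_hand(fit_small)

# Agrees with the value the step functions work with
all.equal(aic_full,  extractAIC(fit_full)[2])
all.equal(aic_small, extractAIC(fit_small)[2])

# Backward search only drops a term when doing so lowers the criterion
selected <- stepAIC(fit_full, direction = "backward", trace = FALSE)
extractAIC(selected)[2] <= aic_full
```

Note that this value differs from `AIC(fit)` by an additive constant that depends only on `n`, so both rank models fitted to the same data identically.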
Auto.Arima
● Time Series models.
● AutoRegressive…
● Moving Averages…
● With Differencing!
library(forecast)
library(fpp)  # provides the elecequip data set
data("elecequip")
# Hold out everything after the first 180 observations
ee <- elecequip[1:180]
# Search for the best stationary ARMA model
model <- auto.arima(ee, stationary = TRUE)
#         ar1      ma1      ma2     ma3  intercept
#      0.8428  -0.6571  -0.1753  0.6353    95.7265
# s.e. 0.0431   0.0537   0.0573  0.0561     3.2223
# Forecast 10 steps ahead and overlay the held-out actuals in red
plot(forecast(model, h = 10))
lines(x = 181:190, y = elecequip[181:190],
      type = "l", col = "red")
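To give a feel for what an automated order search does, here is a simplified sketch that fits every ARMA(p, q) combination on a small grid with base R's `arima()` and keeps the lowest AIC. This is not `auto.arima()`'s actual procedure (by default it uses a stepwise search and a corrected AIC); the `LakeHuron` series and the grid bounds are assumptions chosen only for illustration.

```r
# Built-in series used only for illustration (not the talk's elecequip data)
series <- LakeHuron

# Candidate ARMA(p, q) orders with no differencing, echoing stationary = TRUE
grid <- expand.grid(p = 0:2, q = 0:2)

# Fit each candidate; record NA when the optimiser fails
grid$aic <- mapply(function(p, q) {
  tryCatch(arima(series, order = c(p, 0, q), method = "ML")$aic,
           error = function(e) NA_real_)
}, grid$p, grid$q)

# Rank the candidates and pick the lowest AIC
best <- grid[which.min(grid$aic), ]
print(grid[order(grid$aic), ])
print(best)

# Refit the winning order and forecast 10 steps ahead
best_fit <- arima(series, order = c(best$p, 0, best$q), method = "ML")
predict(best_fit, n.ahead = 10)$pred
```

An exhaustive grid like this is easy to read but scales poorly, which is why a stepwise search is attractive once seasonal and differencing terms are added.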
train {caret}
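library(caret)
# 10-fold cross-validation, repeated 10 times
# (repeats only takes effect with method = "repeatedcv")
tctrl <- trainControl(method = "repeatedcv", number = 10,
                      repeats = 10)
# Grid of complexity-parameter (cp) values to try
rpart_opts <- expand.grid(cp = seq(0.0, 0.01, by = 0.001))
# Fit an rpart tree for each cp and keep the one with the best Kappa
# (dat: data frame name assumed, as in the randomForest example)
rpart_model <- train(y ~ ., data = dat, method = "rpart",
                     metric = "Kappa", trControl = tctrl,
                     tuneGrid = rpart_opts, subset = train_log)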
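What `train()` automates can be sketched by hand: split the data into folds, fit the model for each candidate `cp`, and average the held-out score. The version below is a minimal illustration using `rpart` directly on the built-in `iris` data with plain accuracy instead of Kappa. The data set, fold count, and helper name `cv_accuracy` are assumptions, and caret's own resampling does considerably more.

```r
library(rpart)  # recommended package shipped with R

set.seed(1)
dat   <- iris                      # illustrative data, not from the talk
k     <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))
cps   <- seq(0, 0.01, by = 0.001)  # same cp grid as on the slide

# Mean held-out accuracy for one value of cp
cv_accuracy <- function(cp) {
  acc <- sapply(seq_len(k), function(i) {
    fit  <- rpart(Species ~ ., data = dat[folds != i, ],
                  method = "class", control = rpart.control(cp = cp))
    pred <- predict(fit, newdata = dat[folds == i, ], type = "class")
    mean(pred == dat$Species[folds == i])
  })
  mean(acc)
}

# Score every cp in the grid and keep the best one
results <- data.frame(cp = cps, accuracy = sapply(cps, cv_accuracy))
best_cp <- results$cp[which.max(results$accuracy)]
print(results)
print(best_cp)
```

Ties go to the smallest `cp` here because `which.max()` returns the first maximum; a real search would usually also look at the spread across folds before choosing.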
Recap
library(randomForest) varImpPlot()
library(MASS) stepAIC()
library(forecast) auto.arima()
library(caret) train()
Questions?
