How-To

Build Your First Model
A publication of
INTRO
Building your first model with a new data mining tool can be intimidating.
Though some of us may have some intuition for model building, it’s pretty daunting to
look at the default settings, knowing you have a long way to go before you have an
accurate, explainable predictive model to hand over to your boss.
To make sure you’re set up for data mining success, follow these simple steps to build
your first models in the SPM software suite.
Want to skip ahead? Here’s what we’re going to cover.
IMPORT DATA
• Prepare
• Stay Organized
MODEL SETUP
• Select an Engine
• Analysis Type
• Variables
• Testing
• Control Parameters
PERFORMANCE
• What To Look For
• What’s Next
IMPORT DATA
We’re going to walk you through best practices
for preparing and uploading your data into the
SPM software.
PREPARE
1. Make sure your data is in a ‘flat’ file (i.e., rows × columns).
2. Make sure you understand your variable labels! If you don’t understand what your variables represent, you’re going to have a heck of a time understanding your results.
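
As a quick way to confirm a file really is flat before importing it, the check below reads delimited text and verifies that every row has the same number of fields as the header. This is a minimal sketch outside of SPM: the function name `check_flat` and the sample data are made up for illustration, and it assumes a comma-delimited file with a header row.

```python
import csv
import io


def check_flat(text: str) -> tuple[list[str], int]:
    """Return (column names, data row count); raise ValueError on a ragged row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line is assumed to hold the variable names
    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        n_rows += 1
    return header, n_rows


# Hypothetical sample data, invented for this example.
SAMPLE = "age,income,churned\n34,52000,0\n51,61000,1\n"
columns, n_rows = check_flat(SAMPLE)
print(columns, n_rows)
```

Printing the column names is also a good moment to confirm you know what each variable represents.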

Want to read the nitty gritty?

Check out the complete SPM User Guide.
STAY ORGANIZED
Save your data set, or sets, in one easy-to-find folder. If you’re pulling in data from all over creation, you’re just making the process longer and more difficult to comprehend. Do yourself a favor and dedicate a directory to each data mining project you’re working on.
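
One way to set up a dedicated project directory is sketched below. The subfolder names (`data`, `models`, `reports`) and the project name are assumptions chosen for illustration, not an SPM requirement; the demo writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def make_project_dir(root: Path, project: str) -> Path:
    """Create one directory per project with a few conventional subfolders."""
    project_dir = root / project
    # Subfolder names are illustrative assumptions; pick whatever suits you.
    for sub in ("data", "models", "reports"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir


with tempfile.TemporaryDirectory() as tmp:
    project_dir = make_project_dir(Path(tmp), "churn_project")
    created = sorted(p.name for p in project_dir.iterdir())
    print(created)
```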
Model Setup
10 parameters to pay attention to
when building a model
Once you have imported your data, you need to set
a few parameters (leaving most of them in default
settings) before you click ‘start.’
Select an Engine.
• RuleLearner/Model Compression
• Random Forests
• Regression
• CART
• Data Binning
• CART Ensembles
• MARS
• TreeNet
• Logit
• GPS/Generalized Lasso
ANALYSIS TYPE
• Classification
• Regression
• Logistic Binary
• Unsupervised
SELECT A TARGET VARIABLE AND PREDICTORS
1. You must have a target variable.
2. You should have multiple predictors.
3. You don’t need to use all of your predictors.
4. Take note of categorical vs. continuous variables.
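
The four points above can be sketched in code. The example separates a target from its predictors, leaves out a column that should not be used, and makes a first guess at which predictors are continuous. The function `split_variables`, the column names, and the rows are all hypothetical; the numeric test is a rough heuristic, not how SPM classifies variables.

```python
def split_variables(rows, target, exclude=()):
    """Return (continuous, categorical) predictor names, skipping target and excluded columns."""
    predictors = [c for c in rows[0] if c != target and c not in exclude]

    def looks_continuous(col):
        # Heuristic: a column is treated as continuous if every value parses as a number.
        try:
            for row in rows:
                float(row[col])
        except ValueError:
            return False
        return True

    continuous = [c for c in predictors if looks_continuous(c)]
    categorical = [c for c in predictors if c not in continuous]
    return continuous, categorical


# Hypothetical rows, invented for this example.
rows = [
    {"customer_id": "a1", "age": "34", "plan": "basic", "churned": "0"},
    {"customer_id": "b2", "age": "51", "plan": "premium", "churned": "1"},
]
continuous, categorical = split_variables(rows, target="churned", exclude=("customer_id",))
print(continuous, categorical)
```

Numeric codes such as ZIP codes or region IDs would pass this heuristic even though they are categorical, which is why knowing your variable labels matters.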
SELECT A TESTING METHOD
• No independent testing (exploratory tree)
• Fraction of cases selected at random for testing (%)
• Test sample contained in a separate file
• V-fold cross-validation (e.g., V = 10)
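
To make two of these options concrete, here is a minimal sketch of a random test fraction and of V-fold cross-validation, working on case indices only. The function names and the seed handling are assumptions for illustration; SPM performs these splits for you once you pick a method.

```python
import random


def holdout_split(n_cases, test_fraction, seed=0):
    """Randomly reserve a fraction of cases for testing; returns (learn, test) index lists."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_test = round(n_cases * test_fraction)
    return idx[n_test:], idx[:n_test]


def v_fold_splits(n_cases, v=10, seed=0):
    """Yield (learn, test) index lists; each case lands in exactly one test fold."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::v] for i in range(v)]  # deal shuffled cases into v folds
    for i, test in enumerate(folds):
        learn = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield learn, test


learn, test = holdout_split(100, test_fraction=0.2)
splits = list(v_fold_splits(20, v=10))
print(len(learn), len(test), len(splits))
```

With cross-validation every case is used for testing once, which helps when the data set is too small to give up a separate test sample.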
Salford Systems Recommends That You Manually Set Your:
• Learn rate
• Number of trees built
• Number of nodes in a tree
• Loss criterion
*These will vary depending on the modeling engine being used to build a model.
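
To give a feel for what these four settings do, the sketch below implements a generic, textbook-style gradient boosting loop on a single predictor. It is not TreeNet's implementation and the data are made up: each tree is a two-node stump (number of nodes), the loss is squared error (loss criterion), and `learn_rate` and `n_trees` are the remaining two controls.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(x, y, learn_rate, n_trees):
    """Return learn-sample predictions after adding n_trees shrunken stumps."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean of the target
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        pred = [p + learn_rate * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Hypothetical data, invented for this example.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
few_trees = mse(y, boost(x, y, learn_rate=0.1, n_trees=5))
many_trees = mse(y, boost(x, y, learn_rate=0.1, n_trees=50))
print(few_trees, many_trees)
```

A smaller learn rate generally needs more trees to reach the same fit on the learn sample, so these two settings are usually tuned together.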
CLICK START!
YOU ARE NOW BUILDING YOUR FIRST MODEL
EVALUATING YOUR PERFORMANCE
Don’t get overwhelmed by all of the fancy reporting features available in the SPM
software suite. Start slow. We will show you where to begin if you are new to using
SPM and just want to understand what your model means.
What To Look For
• Mean Squared Error (MSE)
• R-Squared
• Test vs. Learn Performance
• Variable Performance
• Variable Dependence Plots (TreeNet)
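
For reference, the first three items can be computed by hand as in the sketch below, which uses the standard definitions of MSE and R-squared and compares learn and test samples. The numbers are invented for illustration; SPM reports these measures for you.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the target's variance explained: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_total = sum((a - mean) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical actual and predicted values, invented for this example.
learn_actual, learn_predicted = [3, 5, 7, 9], [2, 5, 8, 9]
test_actual, test_predicted = [4, 6, 8], [2, 6, 10]

learn_mse = mse(learn_actual, learn_predicted)
test_mse = mse(test_actual, test_predicted)
learn_r2 = r_squared(learn_actual, learn_predicted)
print(learn_mse, test_mse, learn_r2)
```

A test error well above the learn error, as in this made-up case, is the usual sign that the model fits the learn sample better than it generalizes.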
… AND YOU’RE DONE!

If you have already downloaded the SPM software, build a model! Once you’ve built your first model, start tweaking some of the control parameters we discussed.
What is your best model performance so far?
WHAT’S NEXT?
• Watch our video series on how to build your first model.
