SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Predictive Analytics
Peter Bruce
THE INSTITUTE FOR STATISTICS EDUCATION
at Statistics.com
peter.bruce@statistics.com
About Statistics.com
THE INSTITUTE FOR STATISTICS EDUCATION
• 100+ courses, introductory and advanced
• Traditional statistics, data mining, machine
learning, text mining, clinical
trials, optimization, use of R
• All online
• Typically 4 weeks, scheduled dates
• Don’t need to be online particular times/days
• Private discussion forum with instructors - noted
authors & experts
A man walks into a Target® store…
Predictive Analytics
• In marketing, used for model driven targeted
sales efforts
• Also… will loan default, what diagnosis (given
symptoms), is tax return fraudulent, …
Market Research
• Traditionally surveys, analysis, information
gathering, strategy
• Moving online increases the amount of
data, speeds its flow, and makes it more
accessible
Washington Post (web)
• 35 different reports tracking traffic daily
• Midday report “are we on track for visitors?”
• # visitors from key domains - .gov, .mil, .senate
or .house
Daily Mail (UK web)
• A traditional ingredient is stories about
animals – tracked on web
• “The animals that do best are
monkeys, dogs, and cats, in that order…”
Martin Clark (editor)
Back to Target
Predictive Analytics
• Goes beyond the obvious, capturing
complexity
• Implemented for real-time behavior and
decisions
Pregnant?
• Obvious retail clues – maternity clothes, baby
food, baby clothes, crib …
• These may be too late
• Earlier clues not so obvious –
lotions, supplements, and, esp., combinations
and changes in purchase patterns
• Data mining algorithms can capture these less
obvious, more complex signals
Training the Model
• Bridal registry
• Women of similar demographic not on bridal
registry
• Together, the training set
– Known outcome
– Purchase data over time
Hypothetical Data
Cust # zinc10 zinc90 mag10 mag90 cotton10 cotton90 Registry ?
1011 1 1 1 1 0 1 0
1012 1 0 1 0 1 0 1
1013 1 1 0 1 1 0 1
1014 0 1 1 0 1 1 0
1015 1 1 0 1 1 0 0
1016 0 0 1 0 1 0 1
Classification Algorithms
• K-nearest neighbors (involves 3 notions)
– Distance measure
– Centroid
– Majority vote or average
K-NN
Cust # zinc10 zinc90 mag10 mag90 cotton10 cotton90
Registry
?
1 1 1 1 1 0 1 0
2 1 1 1 1 0 1 0
3 1 1 0 1 0 1 0
4 0 1 1 0 1 1 0
5 1 1 0 1 1 0 1
6 0 0 1 0 1 0 1
NEW 1 0 1 1 0 1 ?
Classification Algorithms, cont.
• Logistic Regression
• CART
• Discriminant Analysis
• Neural Network
• Naïve Bayes
The Overfit Problem
0
200
400
600
800
1000
1200
1400
1600
0 200 400 600 800 1000
Revenue
Expenditure
Complex function - overfit
0
200
400
600
800
1000
1200
1400
1600
0 200 400 600 800 1000
Revenue
Expenditure
Therefore: Validate the Model
• Partition the original data
– Training
– Validation
• Fit the model to the training data
• Assess performance using the validation data
Performance Metrics
• Continuous
– RMSE
• Categorical (often binary)
– % accurate (confusion matrix)
– Lift
Confusion Matrix and Cutoff Control
Training Data scoring - Summary Report
Cut off Prob.Val. for Success (Updatable) 0.5
Classification Confusion Matrix
Predicted Class
Actual Class 1 0
1 43 8
0 6 247
Lift
• In classifying “pregnant” vs. “not-pregnant”
classifying everyone as “not-pregnant” has
very high overall accuracy
• Need metric that reflects greater importance
of the “pregnant” category, which is rare
• Lift is the model’s improvement over average
random selection
Decile Lift Chart
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10
Decilemean/Globalmean
Deciles
Decile-wise lift chart (validation dataset)
Validate the Model
• Compare one model to another
• Avoid overfit
• Solution: apply model to hold-out sample
– Assess performance of different models
– Fine tune parameters of individual models
Partitioning
• Randomly split the initial data into 2 or 3
groups
– Training
– Validation
– Test
• Repeated use of validation data to compare
and fine tune models -> overfit to
validation, in addition to training
– “Test” partition used only once, at the end
Software
• SAS Enterprise Miner $$$$
• IBM SPSS Modeler (Clementine) $$$$
• XLMiner (Excel add-in) $
• Statistica Data Miner $$
• Salford Systems $$
• Rapid Miner $$ (open source free version)
• R open source free
Data Mining - More
• Clustering (segmentation)
• Profiling (explanatory models)
• Time series
• Affinity (recommender systems)
• Text analytics (NLP, sentiment analysis)
Skill Shortage
• McKinsey “Big Data” report
– Supply gap of 140,000-190,000 “deep analytical
talent”
• Emergence of “Analytics” masters programs
(Northwestern, NC State, …)

Weitere ähnliche Inhalte

Andere mochten auch

Déjeuner Conférence - La maintenance à l'ère du prédictif
Déjeuner Conférence - La maintenance à l'ère du prédictifDéjeuner Conférence - La maintenance à l'ère du prédictif
Déjeuner Conférence - La maintenance à l'ère du prédictifagileDSS
 
Prospect of non destructive testing and condition monitoring scope in bangladesh
Prospect of non destructive testing and condition monitoring scope in bangladeshProspect of non destructive testing and condition monitoring scope in bangladesh
Prospect of non destructive testing and condition monitoring scope in bangladeshFerdous Kabir
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modellinglalit Lalitm7225
 
Le price training presentation
Le price training presentationLe price training presentation
Le price training presentationzainudinyahya
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Sentient Science
 
Cwin16 tls-faurecia predictive maintenance
Cwin16 tls-faurecia predictive maintenanceCwin16 tls-faurecia predictive maintenance
Cwin16 tls-faurecia predictive maintenanceCapgemini
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichtenDaniel Westzaan
 
Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Tina Zhang
 
Predictive Maintenance by analysing acoustic data in an industrial environment
Predictive Maintenance by analysing acoustic data in an industrial environmentPredictive Maintenance by analysing acoustic data in an industrial environment
Predictive Maintenance by analysing acoustic data in an industrial environmentCapgemini
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?Danko Nikolic
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive MaintenanceSaama
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionSenturus
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with Reoda GmbH
 
GP Chapitre 5 : Le juste à temps et la méthode KANBAN
GP Chapitre 5 : Le juste à temps et la méthode KANBAN GP Chapitre 5 : Le juste à temps et la méthode KANBAN
GP Chapitre 5 : Le juste à temps et la méthode KANBAN ibtissam el hassani
 
Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenanceJames Shearer
 
DeciLogic, l'envergure d'un projet décisionnel
DeciLogic, l'envergure d'un projet décisionnelDeciLogic, l'envergure d'un projet décisionnel
DeciLogic, l'envergure d'un projet décisionnelEric Mauvais
 
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...polenumerique33
 

Andere mochten auch (20)

Déjeuner Conférence - La maintenance à l'ère du prédictif
Déjeuner Conférence - La maintenance à l'ère du prédictifDéjeuner Conférence - La maintenance à l'ère du prédictif
Déjeuner Conférence - La maintenance à l'ère du prédictif
 
Prospect of non destructive testing and condition monitoring scope in bangladesh
Prospect of non destructive testing and condition monitoring scope in bangladeshProspect of non destructive testing and condition monitoring scope in bangladesh
Prospect of non destructive testing and condition monitoring scope in bangladesh
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modelling
 
Le price training presentation
Le price training presentationLe price training presentation
Le price training presentation
 
Predictive analysis
Predictive analysisPredictive analysis
Predictive analysis
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
 
Cwin16 tls-faurecia predictive maintenance
Cwin16 tls-faurecia predictive maintenanceCwin16 tls-faurecia predictive maintenance
Cwin16 tls-faurecia predictive maintenance
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
 
Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_
 
Predictive Maintenance by analysing acoustic data in an industrial environment
Predictive Maintenance by analysing acoustic data in an industrial environmentPredictive Maintenance by analysing acoustic data in an industrial environment
Predictive Maintenance by analysing acoustic data in an industrial environment
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with R
 
Gpao 4 Juste à temps Kanban
Gpao 4 Juste à temps KanbanGpao 4 Juste à temps Kanban
Gpao 4 Juste à temps Kanban
 
GP Chapitre 5 : Le juste à temps et la méthode KANBAN
GP Chapitre 5 : Le juste à temps et la méthode KANBAN GP Chapitre 5 : Le juste à temps et la méthode KANBAN
GP Chapitre 5 : Le juste à temps et la méthode KANBAN
 
Machinery Oil Analysis
Machinery Oil AnalysisMachinery Oil Analysis
Machinery Oil Analysis
 
Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
 
DeciLogic, l'envergure d'un projet décisionnel
DeciLogic, l'envergure d'un projet décisionnelDeciLogic, l'envergure d'un projet décisionnel
DeciLogic, l'envergure d'un projet décisionnel
 
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...
Conférence Internet des objets IoT M2M - CCI Bordeaux - 02 04 2015 - Introduc...
 

Ähnlich wie Predictive Analysis

Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfAschalewAyele2
 
Digital transformation in transport and logistics
Digital transformation in transport and logisticsDigital transformation in transport and logistics
Digital transformation in transport and logisticsPostNL België
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onwordSulman Ahmed
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxPrincePatel272012
 
What is Machine Learning?
What is Machine Learning?What is Machine Learning?
What is Machine Learning?SwiftKeyComms
 
Scientific Revenue USF 2016 talk
Scientific Revenue USF 2016 talkScientific Revenue USF 2016 talk
Scientific Revenue USF 2016 talkScientificRevenue
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data miningDhilsath Fathima
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxrajalakshmi5921
 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptxPallabiSahoo5
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningSanghamitra Deb
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsVivastream
 
Machine Learning with Big Data using Apache Spark
Machine Learning with Big Data using Apache SparkMachine Learning with Big Data using Apache Spark
Machine Learning with Big Data using Apache SparkInSemble
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptxImXaib
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - BengaluruKunal Jain
 
Mir 2012 13 session #4
Mir 2012 13 session #4Mir 2012 13 session #4
Mir 2012 13 session #4RichardGroom
 

Ähnlich wie Predictive Analysis (20)

Kevin Swingler: Introduction to Data Mining
Kevin Swingler: Introduction to Data MiningKevin Swingler: Introduction to Data Mining
Kevin Swingler: Introduction to Data Mining
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
 
Digital transformation in transport and logistics
Digital transformation in transport and logisticsDigital transformation in transport and logistics
Digital transformation in transport and logistics
 
Data mining Basics and complete description onword
Data mining Basics and complete description onwordData mining Basics and complete description onword
Data mining Basics and complete description onword
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
DataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptxDataAnalyticsIntroduction and its ci.pptx
DataAnalyticsIntroduction and its ci.pptx
 
What is Machine Learning?
What is Machine Learning?What is Machine Learning?
What is Machine Learning?
 
Mini datathon
Mini datathonMini datathon
Mini datathon
 
Basic Overview of Data Mining
Basic Overview of Data MiningBasic Overview of Data Mining
Basic Overview of Data Mining
 
Scientific Revenue USF 2016 talk
Scientific Revenue USF 2016 talkScientific Revenue USF 2016 talk
Scientific Revenue USF 2016 talk
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data mining
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
Data Wrangling_1.pptx
Data Wrangling_1.pptxData Wrangling_1.pptx
Data Wrangling_1.pptx
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
Machine Learning with Big Data using Apache Spark
Machine Learning with Big Data using Apache SparkMachine Learning with Big Data using Apache Spark
Machine Learning with Big Data using Apache Spark
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
 
Mini datathon - Bengaluru
Mini datathon - BengaluruMini datathon - Bengaluru
Mini datathon - Bengaluru
 
AL slides.ppt
AL slides.pptAL slides.ppt
AL slides.ppt
 
Mir 2012 13 session #4
Mir 2012 13 session #4Mir 2012 13 session #4
Mir 2012 13 session #4
 

Mehr von Michael Bystry

Peril and Promise of Social Media
Peril and Promise of Social MediaPeril and Promise of Social Media
Peril and Promise of Social MediaMichael Bystry
 
Creating Marketing Personas
Creating Marketing PersonasCreating Marketing Personas
Creating Marketing PersonasMichael Bystry
 
Why Become PRC Certified
Why Become PRC CertifiedWhy Become PRC Certified
Why Become PRC CertifiedMichael Bystry
 
Learning About America from the 2010 Census
Learning About America from the 2010 CensusLearning About America from the 2010 Census
Learning About America from the 2010 CensusMichael Bystry
 
Brave New World: The End of Survey Research
Brave New World: The End of Survey ResearchBrave New World: The End of Survey Research
Brave New World: The End of Survey ResearchMichael Bystry
 
Exploring Evoving Trends in Viewship
Exploring Evoving Trends in ViewshipExploring Evoving Trends in Viewship
Exploring Evoving Trends in ViewshipMichael Bystry
 
Online Video - What Does it Mean for National Geographic Channel
Online Video - What Does it Mean for National Geographic ChannelOnline Video - What Does it Mean for National Geographic Channel
Online Video - What Does it Mean for National Geographic ChannelMichael Bystry
 
Broadcast Television: Trends and Implications
Broadcast Television: Trends and ImplicationsBroadcast Television: Trends and Implications
Broadcast Television: Trends and ImplicationsMichael Bystry
 
Predicting College Tuition
Predicting College TuitionPredicting College Tuition
Predicting College TuitionMichael Bystry
 
Conjoint class project
Conjoint class projectConjoint class project
Conjoint class projectMichael Bystry
 

Mehr von Michael Bystry (11)

Peril and Promise of Social Media
Peril and Promise of Social MediaPeril and Promise of Social Media
Peril and Promise of Social Media
 
Creating Marketing Personas
Creating Marketing PersonasCreating Marketing Personas
Creating Marketing Personas
 
Why Become PRC Certified
Why Become PRC CertifiedWhy Become PRC Certified
Why Become PRC Certified
 
Learning About America from the 2010 Census
Learning About America from the 2010 CensusLearning About America from the 2010 Census
Learning About America from the 2010 Census
 
Brave New World: The End of Survey Research
Brave New World: The End of Survey ResearchBrave New World: The End of Survey Research
Brave New World: The End of Survey Research
 
Exploring Evoving Trends in Viewship
Exploring Evoving Trends in ViewshipExploring Evoving Trends in Viewship
Exploring Evoving Trends in Viewship
 
Online Video - What Does it Mean for National Geographic Channel
Online Video - What Does it Mean for National Geographic ChannelOnline Video - What Does it Mean for National Geographic Channel
Online Video - What Does it Mean for National Geographic Channel
 
Broadcast Television: Trends and Implications
Broadcast Television: Trends and ImplicationsBroadcast Television: Trends and Implications
Broadcast Television: Trends and Implications
 
Predicting College Tuition
Predicting College TuitionPredicting College Tuition
Predicting College Tuition
 
On Campus DvD kiosks
On Campus DvD kiosksOn Campus DvD kiosks
On Campus DvD kiosks
 
Conjoint class project
Conjoint class projectConjoint class project
Conjoint class project
 

Kürzlich hochgeladen

Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesKeppelCorporation
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024christinemoorman
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...lizamodels9
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCRashishs7044
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxMarkAnthonyAurellano
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 

Kürzlich hochgeladen (20)

Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024The CMO Survey - Highlights and Insights Report - Spring 2024
The CMO Survey - Highlights and Insights Report - Spring 2024
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
Lowrate Call Girls In Sector 18 Noida ❤️8860477959 Escorts 100% Genuine Servi...
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 

Predictive Analysis

  • 1. Predictive Analytics Peter Bruce THE INSTITUTE FOR STATISTICS EDUCATION at Statistics.com peter.bruce@statistics.com
  • 2. About Statistics.com THE INSTITUTE FOR STATISTICS EDUCATION • 100+ courses, introductory and advanced • Traditional statistics, data mining, machine learning, text mining, clinical trials, optimization, use of R • All online • Typically 4 weeks, scheduled dates • Don’t need to be online particular times/days • Private discussion forum with instructors - noted authors & experts
  • 3. A man walks into a Target® store…
  • 4. Predictive Analytics • In marketing, used for model driven targeted sales efforts • Also… will loan default, what diagnosis (given symptoms), is tax return fraudulent, …
  • 5. Market Research • Traditionally surveys, analysis, information gathering, strategy • Moving online increases the amount of data, speeds its flow, and makes it more accessible
  • 6. Washington Post (web) • 35 different reports tracking traffic daily • Midday report “are we on track for visitors?” • # visitors from key domains - .gov, .mil, .senate or .house
  • 7. Daily Mail (UK web) • A traditional ingredient is stories about animals – tracked on web • “The animals that do best are monkeys, dogs, and cats, in that order…” Martin Clark (editor)
  • 9. Predictive Analytics • Goes beyond the obvious, capturing complexity • Implemented for real-time behavior and decisions
  • 10. Pregnant? • Obvious retail clues – maternity clothes, baby food, baby clothes, crib … • These may be too late • Earlier clues not so obvious – lotions, supplements, and, esp., combinations and changes in purchase patterns • Data mining algorithms can capture these less obvious, more complex signals
  • 11. Training the Model • Bridal registry • Women of similar demographic not on bridal registry • Together, the training set – Known outcome – Purchase data over time
  • 12. Hypothetical Data Cust # zinc10 zinc90 mag10 mag90 cotton10 cotton90 Registry ? 1011 1 1 1 1 0 1 0 1012 1 0 1 0 1 0 1 1013 1 1 0 1 1 0 1 1014 0 1 1 0 1 1 0 1015 1 1 0 1 1 0 0 1016 0 0 1 0 1 0 1
  • 13. Classification Algorithms • K-nearest neighbors (involves 3 notions) – Distance measure – Centroid – Majority vote or average
  • 14. K-NN Cust # zinc10 zinc90 mag10 mag90 cotton10 cotton90 Registry ? 1 1 1 1 1 0 1 0 2 1 1 1 1 0 1 0 3 1 1 0 1 0 1 0 4 0 1 1 0 1 1 0 5 1 1 0 1 1 0 1 6 0 0 1 0 1 0 1 NEW 1 0 1 1 0 1 ?
  • 15. Classification Algorithms, cont. • Logistic Regression • CART • Discriminant Analysis • Neural Network • Naïve Bayes
  • 16. The Overfit Problem 0 200 400 600 800 1000 1200 1400 1600 0 200 400 600 800 1000 Revenue Expenditure
  • 17. Complex function - overfit 0 200 400 600 800 1000 1200 1400 1600 0 200 400 600 800 1000 Revenue Expenditure
  • 18. Therefore: Validate the Model • Partition the original data – Training – Validation • Fit the model to the training data • Assess performance using the validation data
  • 19. Performance Metrics • Continuous – RMSE • Categorical (often binary) – % accurate (confusion matrix) – Lift
  • 20. Confusion Matrix and Cutoff Control Training Data scoring - Summary Report Cut off Prob.Val. for Success (Updatable) 0.5 Classification Confusion Matrix Predicted Class Actual Class 1 0 1 43 8 0 6 247
  • 21. Lift • In classifying “pregnant” vs. “not-pregnant” classifying everyone as “not-pregnant” has very high overall accuracy • Need metric that reflects greater importance of the “pregnant” category, which is rare • Lift is the model’s improvement over average random selection
  • 22. Decile Lift Chart 0 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 10 Decilemean/Globalmean Deciles Decile-wise lift chart (validation dataset)
  • 23. Validate the Model • Compare one model to another • Avoid overfit • Solution: apply model to hold-out sample – Assess performance of different models – Fine tune parameters of individual models
  • 24. Partitioning • Randomly split the initial data into 2 or 3 groups – Training – Validation – Test • Repeated use of validation data to compare and fine tune models -> overfit to validation, in addition to training – “Test” partition used only once, at the end
  • 25. Software • SAS Enterprise Miner $$$$ • IBM SPSS Modeler (Clementine) $$$$ • XLMiner (Excel add-in) $ • Statistica Data Miner $$ • Salford Systems $$ • Rapid Miner $$ (open source free version) • R open source free
  • 26. Data Mining - More • Clustering (segmentation) • Profiling (explanatory models) • Time series • Affinity (recommender systems) • Text analytics (NLP, sentiment analysis)
  • 27. Skill Shortage • McKinsey “Big Data” report – Supply gap of 140,000-190,000 “deep analytical talent” • Emergence of “Analytics” masters programs (Northwestern, NC State, …)