SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
Creating Your First
Predictive Model In Python
Coming November 2015!
Information Everywhere
Makes Panda sad and confused
Each New Thing You Learn
Leads to another new thing to learn, and another…
So Many Things
1. Which predictive modeling technique to use
2. How to get the data into a format for modeling
3. How to ensure the “right” data is being used
4. How to feed the data into the model
5. How to validate the model results
6. How to save the model to use in production
7. How to implement the model in production and apply it to new observations
8. How to save the new predictions
9. How to ensure, over time, that the model is correctly predicting outcomes
10.How to later update the model with new training data
Choose Your Model
http://scikit-learn.org/stable/tutorial/machine_learning_map/
Format The Data
• Pandas FTW!
• Use the map() function to convert any text to a
number
• Fill in any missing values
• Split the data into features (the data) and targets
(the outcome to predict) using .values on the
DataFrame
Get The Right Data
• This is called “Feature selection”
• Univariate feature selection
• SelectKBest removes all but the k highest scoring features
• SelectPercentile removes all but a user-specified highest scoring
percentage of features using common univariate statistical tests for
each feature: false positive rate
• SelectFpr, false discovery rate SelectFdr, or family wise error SelectFwe.
• GenericUnivariateSelect allows to perform univariate feature selection
with a configurable strategy.
http://scikit-learn.org/stable/modules/feature_selection.html
Data => Model
1. Build the model
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import linear_model
logClassifier = linear_model.LogisticRegression(C=1,
random_state=111)
2. Train the model
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(the_data,
the_targets,
cv=12,
test_size=0.20,
random_state=111)
logClassifier.fit(X_train, y_train)
Validation!
1. Accuracy Score
http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn import metrics
metrics.accuracy_score(y_test, predicted)
2. Confusion Matrix
metrics.confusion_matrix(y_test, predicted)
Save the Model
Pickle it!
https://docs.python.org/3/library/pickle.html
import pickle
model_file = "/lr_classifier_09.29.15.dat"
pickle.dump(logClassifier, open(model_file, "wb"))
Did it work?
logClassifier2 = pickle.load(open(model, "rb"))
print(logClassifier2)
Implement in Production
• Clean the data the same way you did for the model
• Feature mappings
• Column re-ordering
• Create a function that returns the prediction
• Deserialize the model from the file you created
• Feed the model the data in the same order
• Call .predict() and get your answer
Save Your Predictions
As you would any other piece of data
Ensure Accuracy Over Time
Employ your minion army, or get more creative
Update the Model
Train it again, but with validated predictions
Coming November 2015!
Robert Dempsey
robertwdempsey
rdempsey
rdempsey
robertwdempsey.com

Weitere ähnliche Inhalte

Was ist angesagt?

Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
DotNetCampus
 

Was ist angesagt? (19)

Automatic image moderation in classifieds
Automatic image moderation in classifiedsAutomatic image moderation in classifieds
Automatic image moderation in classifieds
 
machine learning
machine learningmachine learning
machine learning
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
BigML Education - Anomaly Detection
BigML Education - Anomaly DetectionBigML Education - Anomaly Detection
BigML Education - Anomaly Detection
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for Everyone
 
Santander customer satisfaction
Santander customer satisfactionSantander customer satisfaction
Santander customer satisfaction
 
BigML Education - Logistic Regression
BigML Education - Logistic RegressionBigML Education - Logistic Regression
BigML Education - Logistic Regression
 
Prediction of quality for different type of winebased on different feature se...
Prediction of quality for different type of winebased on different feature se...Prediction of quality for different type of winebased on different feature se...
Prediction of quality for different type of winebased on different feature se...
 
RapidMiner: Nested Subprocesses
RapidMiner:   Nested SubprocessesRapidMiner:   Nested Subprocesses
RapidMiner: Nested Subprocesses
 
Zoo information system presentation
Zoo information system presentationZoo information system presentation
Zoo information system presentation
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
EuroSciPy 2019: Visual diagnostics at scale
EuroSciPy 2019: Visual diagnostics at scaleEuroSciPy 2019: Visual diagnostics at scale
EuroSciPy 2019: Visual diagnostics at scale
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysis
 
20 Simple CART
20 Simple CART20 Simple CART
20 Simple CART
 
Incheon National University - EATED SRA
Incheon National University - EATED SRAIncheon National University - EATED SRA
Incheon National University - EATED SRA
 
Tutorial 4 how to edit the unsafe control actions of stpa project in xstampp
Tutorial 4 how to edit the unsafe control actions of stpa project in xstamppTutorial 4 how to edit the unsafe control actions of stpa project in xstampp
Tutorial 4 how to edit the unsafe control actions of stpa project in xstampp
 
RapidMiner: Advanced Processes And Operators
RapidMiner:  Advanced Processes And OperatorsRapidMiner:  Advanced Processes And Operators
RapidMiner: Advanced Processes And Operators
 

Ähnlich wie Creating Your First Predictive Model In Python

Ähnlich wie Creating Your First Predictive Model In Python (20)

Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
11 ta dts2021-11-v2
11 ta dts2021-11-v211 ta dts2021-11-v2
11 ta dts2021-11-v2
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis PapaemmanouilGDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
GDG DEvFest Hellas 2020 - Automated ML - Panagiotis Papaemmanouil
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning project
 
Introduction Machine Learning by MyLittleAdventure
Introduction Machine Learning by MyLittleAdventureIntroduction Machine Learning by MyLittleAdventure
Introduction Machine Learning by MyLittleAdventure
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with KubernetesHow To Build Auto-Adaptive Machine Learning Models with Kubernetes
How To Build Auto-Adaptive Machine Learning Models with Kubernetes
 
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 
Final
FinalFinal
Final
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
 
Techniques for building robust machine learning systems
Techniques for building robust machine learning systemsTechniques for building robust machine learning systems
Techniques for building robust machine learning systems
 
Iris Multi-Class Classifier with Azure ML
Iris Multi-Class Classifier with Azure MLIris Multi-Class Classifier with Azure ML
Iris Multi-Class Classifier with Azure ML
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 

Mehr von Robert Dempsey

Mehr von Robert Dempsey (20)

Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning Pipeline
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Analyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The CloudAnalyzing Semi-Structured Data At Volume In The Cloud
Analyzing Semi-Structured Data At Volume In The Cloud
 
Growth Hacking 101
Growth Hacking 101Growth Hacking 101
Growth Hacking 101
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With Python
 
DC Python Intro Slides - Rob's Version
DC Python Intro Slides - Rob's VersionDC Python Intro Slides - Rob's Version
DC Python Intro Slides - Rob's Version
 
Content Marketing Strategy for 2013
Content Marketing Strategy for 2013Content Marketing Strategy for 2013
Content Marketing Strategy for 2013
 
Creating Lead-Generating Social Media Campaigns
Creating Lead-Generating Social Media CampaignsCreating Lead-Generating Social Media Campaigns
Creating Lead-Generating Social Media Campaigns
 
Goal Writing Workshop
Goal Writing WorkshopGoal Writing Workshop
Goal Writing Workshop
 
Google AdWords Introduction
Google AdWords IntroductionGoogle AdWords Introduction
Google AdWords Introduction
 
20 Tips For Freelance Success
20 Tips For Freelance Success20 Tips For Freelance Success
20 Tips For Freelance Success
 
How To Turn Your Business Into A Media Powerhouse
How To Turn Your Business Into A Media PowerhouseHow To Turn Your Business Into A Media Powerhouse
How To Turn Your Business Into A Media Powerhouse
 
Agile Teams as Innovation Teams
Agile Teams as Innovation TeamsAgile Teams as Innovation Teams
Agile Teams as Innovation Teams
 
Introduction to kanban
Introduction to kanbanIntroduction to kanban
Introduction to kanban
 
Get The **** Up And Market
Get The **** Up And MarketGet The **** Up And Market
Get The **** Up And Market
 
Introduction To Inbound Marketing
Introduction To Inbound MarketingIntroduction To Inbound Marketing
Introduction To Inbound Marketing
 
Writing Agile Requirements
Writing  Agile  RequirementsWriting  Agile  Requirements
Writing Agile Requirements
 
Twitter For Business
Twitter For BusinessTwitter For Business
Twitter For Business
 
Introduction To Scrum For Managers
Introduction To Scrum For ManagersIntroduction To Scrum For Managers
Introduction To Scrum For Managers
 
Introduction to Agile for Managers
Introduction to Agile for ManagersIntroduction to Agile for Managers
Introduction to Agile for Managers
 

Kürzlich hochgeladen

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 

Kürzlich hochgeladen (20)

WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 

Creating Your First Predictive Model In Python

  • 4. Each New Thing You Learn Leads to another new thing to learn, and another…
  • 5. So Many Things 1. Which predictive modeling technique to use 2. How to get the data into a format for modeling 3. How to ensure the “right” data is being used 4. How to feed the data into the model 5. How to validate the model results 6. How to save the model to use in production 7. How to implement the model in production and apply it to new observations 8. How to save the new predictions 9. How to ensure, over time, that the model is correctly predicting outcomes 10.How to later update the model with new training data
  • 7. Format The Data • Pandas FTW! • Use the map() function to convert any text to a number • Fill in any missing values • Split the data into features (the data) and targets (the outcome to predict) using .values on the DataFrame
  • 8. Get The Right Data • This is called “Feature selection” • Univariate feature selection • SelectKBest removes all but the k highest scoring features • SelectPercentile removes all but a user-specified highest scoring percentage of features using common univariate statistical tests for each feature: false positive rate • SelectFpr, false discovery rate SelectFdr, or family wise error SelectFwe. • GenericUnivariateSelect allows to perform univariate feature selection with a configurable strategy. http://scikit-learn.org/stable/modules/feature_selection.html
  • 9. Data => Model 1. Build the model http://scikit-learn.org/stable/modules/cross_validation.html from sklearn import linear_model logClassifier = linear_model.LogisticRegression(C=1, random_state=111) 2. Train the model from sklearn import cross_validation X_train, X_test, y_train, y_test = cross_validation.train_test_split(the_data, the_targets, cv=12, test_size=0.20, random_state=111) logClassifier.fit(X_train, y_train)
  • 10. Validation! 1. Accuracy Score http://scikit-learn.org/stable/modules/cross_validation.html from sklearn import metrics metrics.accuracy_score(y_test, predicted) 2. Confusion Matrix metrics.confusion_matrix(y_test, predicted)
  • 11. Save the Model Pickle it! https://docs.python.org/3/library/pickle.html import pickle model_file = "/lr_classifier_09.29.15.dat" pickle.dump(logClassifier, open(model_file, "wb")) Did it work? logClassifier2 = pickle.load(open(model, "rb")) print(logClassifier2)
  • 12. Implement in Production • Clean the data the same way you did for the model • Feature mappings • Column re-ordering • Create a function that returns the prediction • Deserialize the model from the file you created • Feed the model the data in the same order • Call .predict() and get your answer
  • 13. Save Your Predictions As you would any other piece of data
  • 14. Ensure Accuracy Over Time Employ your minion army, or get more creative
  • 15. Update the Model Train it again, but with validated predictions