AzureML – Zero to Hero
Govind Kanshi
MTC Bangalore
2nd August 2014
What we will cover
• AzureML
• What it enables
• Examples
• Upload data, then understand and explore it
• Develop a model, evaluate it, deploy it
What this discussion is not about
• Data Science/Big Data definitions, use cases, etc.
• Advanced ML topics
• Feature engineering – which features are useful, cleaning, dropping
• For PCA-style work, use R today
• Individual algorithm discussion/deep dive
• Model tuning (parameter sweep) or other techniques such as boosting/bagging
• Overcoming data vagaries
What you should walk out with
• Excitement and confidence that ML with AzureML is doable by all of us, as long as we are curious and patient.
• AzureML is a democratized platform for learning from data, enabling better-informed decisions. It brings sophisticated algorithms and mechanisms, in an easy-to-use way, to the masses and to high-end researchers alike.
What are we trying to do
• Learn from existing data to make predictions on new data
• Classification – assign labels
• Regression – predict a numeric value such as a price
• Recommendation – rank choices
• Examples – classify different behaviors, predict prices, recommend, find anomalies
• Explore data to form natural groupings based on some distance measure
• Clustering
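Clustering finds natural groupings from a distance measure alone, without labels. A minimal sketch, not AzureML's own clustering module: plain k-means (Lloyd's algorithm) with Euclidean distance. The deterministic farthest-point initialization is an assumption made to keep the example reproducible; libraries usually use randomized k-means++ seeding instead.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by Euclidean distance (Lloyd's k-means)."""
    # Farthest-point initialization: start at the first point, then repeatedly
    # add the point whose nearest chosen centroid is farthest away.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, [nearest(p) for p in points]

# Toy data: two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```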
Demo
• Deployed model, built on a public dataset, that classifies whether a person has diabetes
• Deployed model that predicts noise level in decibels
• How old is this stuff? The term “regression” first appears in Francis Galton's (1822–1911) biological work.
• Y = a_1 * X_1 + ... + a_n * X_n...
• Solve for ...
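One common way to fit the coefficients of the linear model above is ordinary least squares. The sketch below illustrates the idea and is not what AzureML's regression modules do internally: it solves the normal equations (XᵀX)a = Xᵀy for a_1 … a_n with Gauss-Jordan elimination. It fits no intercept (append a constant column of 1s to X if you need one) and does not guard against a singular XᵀX.

```python
def fit_linear(X, y):
    """Least-squares fit of y ≈ a_1*x_1 + ... + a_n*x_n via the normal equations."""
    n = len(X[0])
    # Augmented matrix [XᵀX | Xᵀy], one row per coefficient.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Toy data generated exactly from Y = 2*X_1 + 3*X_2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
print(fit_linear(X, y))  # [2.0, 3.0]
```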
What did we see
• Exposed the model as a web service in raw format, returning predictions via request-response
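A request-response call is a single HTTPS POST carrying the input rows as JSON. The sketch below builds such a request without sending it, so it runs offline. The URL, API key, and column names are placeholders, and the payload shape ("Inputs" → "input1" → "ColumnNames"/"Values", plus "GlobalParameters") is an assumption based on the sample code classic AzureML Studio showed on a service's API help page; check your own service's help page for the exact format.

```python
import json
import urllib.request

# Placeholders: copy the real values from your web service's API help page.
SERVICE_URL = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
               "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<your-api-key>"

def build_scoring_request(column_names, rows):
    """Build (but do not send) a request-response scoring call."""
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        SERVICE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY},
        method="POST",
    )

# Hypothetical feature columns for the diabetes demo; values sent as strings.
req = build_scoring_request(["age", "bmi", "glucose"], [["50", "31.2", "148"]])
# With a real URL and key, send it with: urllib.request.urlopen(req).read()
```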
Demo
• Walkthrough of creating a classification model
• Possibly add another algorithm to compare and evaluate
What did we see
• AzureML Studio – experiments, datasets, web services
• Web services – request-response (RR) or batch mode
• Algorithms – classification, regression, recommendation, ranking
• Data – ingestion, cleansing, massaging
• R integration
• Datasets/experiments are immutable – new versions can be deployed
What did we do (typical AzureML path)
• Define the goal – regression, classification, or recommendation
• Create a model and train it using a dataset
• Get data
• Clean up the data or replace missing values if required
• Use the appropriate algorithm and train it
• Score the model with test data
• Look at the algorithm parameters
• Evaluate the model using metrics
• Add more algorithms to compare
• Deploy the model as a web service for request-response
• What about batch? Yes, you can.
• Data exploration – visualization of data/results
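The path above (get data, replace missing values, split, train, score, evaluate) can be sketched end to end in plain Python. This is an illustration only, not how AzureML Studio modules are implemented: all function names are hypothetical, mean imputation stands in for a missing-value module, and a nearest-centroid classifier is a deliberately simple stand-in for whichever algorithm you pick in the studio.

```python
import random

def replace_missing_with_mean(rows):
    """Replace None in each column with the mean of that column's known values."""
    columns = list(zip(*rows))
    means = [sum(v for v in col if v is not None) / sum(v is not None for v in col)
             for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def split(X, y, test_fraction=0.3, seed=42):
    """Shuffle row indices, then hold out a fraction of rows as test data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    train, test = idx[n_test:], idx[:n_test]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def train_nearest_centroid(X, y):
    """'Training' here just averages the feature vectors of each class."""
    return {label: [sum(c) / len(c) for c in zip(*[x for x, t in zip(X, y) if t == label])]
            for label in set(y)}

def score(model, X):
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    return [min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, model[lbl])))
            for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: two features, two well-separated classes, one missing value per class.
X = [[1.0, 1.2], [0.8, None], [1.1, 0.9], [1.3, 1.0], [0.9, 1.1],
     [9.0, 8.8], [8.7, 9.1], [None, 9.0], [9.2, 8.9], [8.9, 9.3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_train, y_train, X_test, y_test = split(replace_missing_with_mean(X), y)
model = train_nearest_centroid(X_train, y_train)
print(accuracy(y_test, score(model, X_test)))
```

Comparing algorithms, as in the demo, amounts to swapping in another train/score pair and evaluating both on the same test split.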
Evaluate Models – summary (classification)
• Confusion matrix
• Precision – TP / (TP + FP)
• Recall – TP / (TP + FN)
• F1-score – harmonic mean of precision and recall: 2 · Precision · Recall / (Precision + Recall)
• ROC curve + AUC (area under the ROC curve)
Confusion matrix (rows: actual class; columns: predicted class)
              Predicted: yes          Predicted: no
Actual: yes   True positive (TP)      False negative (FN)
Actual: no    False positive (FP)     True negative (TN)
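The metrics on this slide follow directly from the four cells of the confusion matrix. A minimal sketch; the "yes" positive label, the toy scores, and the 0.5 threshold are assumptions for illustration. AUC is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, ties counting half), which equals the area under the ROC curve; the pairwise loop is O(positives × negatives), fine for small data.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Return (TP, FP, FN, TN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores, positive="yes"):
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = ["yes", "yes", "yes", "no", "no", "no"]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]            # model's score for "yes"
y_pred = ["yes" if s >= 0.5 else "no" for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)   # (2, 1, 1, 2)
print(precision_recall_f1(tp, fp, fn))              # each is 2/3 here
print(roc_auc(y_true, scores))                      # 8/9 ≈ 0.889
```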
Issues to think about
• Cleaning data and choosing the right data points
• Missing data, transforming data, dropping data, relationships between features
• Evaluating the algorithm, comparing, tuning the parameters, relearning
• Which algorithm to choose (binary classification vs 10-class vs ranking); whether the data has many attributes (thousands to tens of thousands) vs very little data, or very sparse/noisy data
• Which loss function and hyperparameters to aim for
• Explaining the output – black-box models vs decision trees
• Online/Active Learning
Machine Learning Resources
• Coursera Machine Learning class
https://www.coursera.org/course/ml
• Access to AzureML – it is in preview
• http://www.youtube.com/watch?v=wjTJVhmu1JM
• Draft of Alex Smola and S.V.N. (Vishy) Vishwanathan's book on ML: http://alex.smola.org/drafts/thebook.pdf
• The Elements of Statistical Learning – Hastie, Tibshirani, Friedman: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
• Information Theory, Inference, and Learning Algorithms – David MacKay: http://www.inference.phy.cam.ac.uk/mackay/itila/
• Datasets (UCI Machine Learning Repository) – http://archive.ics.uci.edu/ml/datasets.html
• Official AzureML tutorials/video walkthroughs – https://azure.microsoft.com/en-us/documentation/services/machine-learning/
Advanced topics
• Other topics
• How to use various input data cleanup procedures (dropping/adding/correlated features)
• How to publish a web service to the Azure Marketplace ($) – https://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-web-service-to-azure-marketplace/
• How to version assets and the experiment graph (“DAG”)
• Techniques to overcome vagaries of data
• Stratification – sampling for training and testing within each class, so that every class is properly represented in the samples
• k-fold CV – the data is split randomly into k subsets; each subset in turn is used for testing and the remainder for training. This is repeated k times and the results are averaged. CV uses sampling without replacement.
• Bootstrapping – uses sampling with replacement to form the training set.
• Improving model performance
• Bagging – combining predictions by voting, or by averaging for numeric prediction.
• Boosting – also uses voting/averaging, but models are weighted according to their performance.
• Parameter sweeping
• Regularization parameter handling – a penalty against overfitting
• Understanding algorithm performance/visualizing the algorithm path when possible
• Associated statistics (confidence/distributions)
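The sampling techniques above can be sketched as index generators. A minimal illustration with hypothetical helper names: stratified k-fold deals each class's shuffled indices round-robin across the folds (sampling without replacement, so every row lands in exactly one fold), while the bootstrap draws n indices with replacement and leaves some rows "out-of-bag". In practice you would train on each train split, score its test split, and average the metric over the k folds.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k, seed=0):
    """Split row indices into k folds with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin so each fold gets its share.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

def cross_validation_splits(folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; rows never drawn are out-of-bag."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag

# Imbalanced toy labels: 4 "yes", 8 "no".
labels = ["yes"] * 4 + ["no"] * 8
folds = stratified_k_fold(labels, k=4)
for train, test in cross_validation_splits(folds):
    print(len(train), [labels[i] for i in test])   # each test fold: 1 "yes", 2 "no"
sample, oob = bootstrap_sample(len(labels))
```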

More Related Content

What's hot

Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesWill Gardella
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Institute of Contemporary Sciences
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningSetu Chokshi
 
Using H2O AutoML for Kaggle Competitions
Using H2O AutoML for Kaggle CompetitionsUsing H2O AutoML for Kaggle Competitions
Using H2O AutoML for Kaggle CompetitionsSri Ambati
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2OSri Ambati
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentationehtshamelahi
 
Machine learning 101 dkom 2017
Machine learning 101 dkom 2017Machine learning 101 dkom 2017
Machine learning 101 dkom 2017fredverheul
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Microsoft azure machine learning
Microsoft azure machine learningMicrosoft azure machine learning
Microsoft azure machine learningAmol Gholap
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Gülden Bilgütay
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101Andrew Badera
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnSarah Guido
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Machine learning 101 sit hvr
Machine learning 101 sit hvrMachine learning 101 sit hvr
Machine learning 101 sit hvrfredverheul
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato ReviewHang Li
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning FundamentalsSigOpt
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Building a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamBuilding a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamRaymond Tay
 
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshop
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshopJRs presentation-few-shot-learning-overview @ AI4Media WP5 workshop
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshopHannes Fassold
 

What's hot (20)

Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine Learning
 
Using H2O AutoML for Kaggle Competitions
Using H2O AutoML for Kaggle CompetitionsUsing H2O AutoML for Kaggle Competitions
Using H2O AutoML for Kaggle Competitions
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 
MLconf seattle 2015 presentation
MLconf seattle 2015 presentationMLconf seattle 2015 presentation
MLconf seattle 2015 presentation
 
Machine learning 101 dkom 2017
Machine learning 101 dkom 2017Machine learning 101 dkom 2017
Machine learning 101 dkom 2017
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Microsoft azure machine learning
Microsoft azure machine learningMicrosoft azure machine learning
Microsoft azure machine learning
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
Azure Machine Learning 101
Azure Machine Learning 101Azure Machine Learning 101
Azure Machine Learning 101
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
A Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-LearnA Beginner's Guide to Machine Learning with Scikit-Learn
A Beginner's Guide to Machine Learning with Scikit-Learn
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Machine learning 101 sit hvr
Machine learning 101 sit hvrMachine learning 101 sit hvr
Machine learning 101 sit hvr
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning Fundamentals
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Building a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beamBuilding a modern data platform with scala, akka, apache beam
Building a modern data platform with scala, akka, apache beam
 
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshop
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshopJRs presentation-few-shot-learning-overview @ AI4Media WP5 workshop
JRs presentation-few-shot-learning-overview @ AI4Media WP5 workshop
 

Similar to AzureML – zero to hero

The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...Lucas Jellema
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 antimo musone
 
How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?Axel de Romblay
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Lucidworks
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
AzureML TechTalk
AzureML TechTalkAzureML TechTalk
AzureML TechTalkUdaya Kumar
 
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...Lucas Jellema
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Machine Learning Using Cloud Services
Machine Learning Using Cloud ServicesMachine Learning Using Cloud Services
Machine Learning Using Cloud ServicesSC5.io
 
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MAHIRA
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
How to solve a problem with machine learning
How to solve a problem with machine learningHow to solve a problem with machine learning
How to solve a problem with machine learningAmendra Shrestha
 

Similar to AzureML – zero to hero (20)

The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
The Art of Intelligence – A Practical Introduction Machine Learning for Orac...
 
Machine learning
Machine learningMachine learning
Machine learning
 
Collab365 Empower-Your-Applications-With-Azure-Machine-Learning
Collab365 Empower-Your-Applications-With-Azure-Machine-LearningCollab365 Empower-Your-Applications-With-Azure-Machine-Learning
Collab365 Empower-Your-Applications-With-Azure-Machine-Learning
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?How to automate Machine Learning pipeline ?
How to automate Machine Learning pipeline ?
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
AzureML TechTalk
AzureML TechTalkAzureML TechTalk
AzureML TechTalk
 
Business Analytics Forum #BAF3
Business Analytics Forum #BAF3Business Analytics Forum #BAF3
Business Analytics Forum #BAF3
 
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...
 
Introduction overviewmachinelearning sig Door Lucas Jellema
Introduction overviewmachinelearning sig Door Lucas JellemaIntroduction overviewmachinelearning sig Door Lucas Jellema
Introduction overviewmachinelearning sig Door Lucas Jellema
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Machine Learning Using Cloud Services
Machine Learning Using Cloud ServicesMachine Learning Using Cloud Services
Machine Learning Using Cloud Services
 
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
How to solve a problem with machine learning
How to solve a problem with machine learningHow to solve a problem with machine learning
How to solve a problem with machine learning
 

Recently uploaded

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 

Recently uploaded (20)

English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 

AzureML – zero to hero

  • 1. AzureML – Zero to Hero Govind Kanshi MTC Bangalore 2nd August 2014
  • 2. What we will cover • AzureML- • What it enables • Examples • Upload data/understand explore it • Develop model/evaluate it/deploy it
  • 3. What this discussion is not about • Data Science/Big Data defn/use etc • ML Advanced topics • Feature Engineering – which features are useful/cleaning/dropping • For PCA kind of work – use R today • Individual algorithm discussion/deep dive. • Model tuning(Parameter sweep) or other techniques – boosting/bagging • Overcoming Data vagaries
  • 4. What you should walk out with • Excitement and confidence that ML with AzureML is doable by all of us as long as we are curious and patient. • AzureML is democratized platform for learning from data ensuring better informed decisions. It helps to bring sophisticated algorithms and mechanisms in easy to use way for masses and high end researchers today.
  • 5. What are we trying to do • Learn from existing Data to do prediction on data • Classification – Put labels • Regression - price, • Recommendation – Rank choices • Examples – classify different behavior, price,recommend, find anamoly • Explore data form natural groupings based on some distance formula • Clustering
  • 6. Demo • Deployed model for public dataset to classify if person has diabetes • Deployed model to predict Decibels of noise • How old is this stuff term “regression ” firstly appears in the Galton´s (1822- 1911) biological works. • Y = a_1 * X_1 + ... + a_n * X_n... • Solve for ...
  • 7. What did we see • Exposed Web service in Raw format to do prediction as request- response
  • 8. Demo • Walkthrough of the model creation for Classification • Possibly choose another algorithm to compare/evaluate
  • 9. What did we see AzureML studio – Experiments/Datasets/Web services Web Services – RR or Batch mode Algorithms – Classification, Regression, Recommendation, Ranking Data – Ingestion, cleansing, massaging, R Integration Dataset/Experiments are immutable – new versions can be deployed
  • 10. What did we do(typical AzureML path) • Define the goal – regression or classification or recommendation • Create a model and train it using dataset • Get data – • Cleanup the data or replace missing data if required • Use the appropriate algorithm/train it • Score the model with test data • Looked at the algorithm parameters • Evaluate Model using metrics • Add more algorithms to compare • Deploy Model as webservice for request-response mechanism • What about batch – yes you can. • Data exploration – visualization of data/results
  • 11. Evaluate Models – summary (classification)
    • Confusion matrix
    • Precision = TP / (TP + FP)
    • Recall = TP / (TP + FN)
    • F1-score = 2 · Precision · Recall / (Precision + Recall)
    • ROC curve + AUC (area under the ROC curve)

                         Predicted class
                         yes                   no
    Actual class   yes   True positive (TP)    False negative (FN)
                   no    False positive (FP)   True negative (TN)
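Precision, recall and F1 all come straight from the four cells of the confusion matrix. A minimal sketch in plain Python, assuming labels are the strings "yes"/"no" with "yes" as the positive class; the eight actual/predicted pairs are made up for illustration.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FP, FN, TN for a binary problem (rows = actual, columns = predicted)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical predictions for eight instances.
actual    = ["yes", "yes", "yes", "no",  "no", "no", "no", "yes"]
predicted = ["yes", "no",  "yes", "yes", "no", "no", "no", "yes"]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)                   # → 3 1 1 3
print(precision_recall_f1(tp, fp, fn))  # → (0.75, 0.75, 0.75)
```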
  • 12. Issues to think about
    • Cleaning/choosing the right data points
    • Missing data/transforming data/dropping data/relationships between features
    • Evaluating the algorithm, comparing, tuning the parameters, relearning
    • Which algorithm to choose (binary classification vs. 10-class vs. ranking); data with many attributes (thousands to tens of thousands) vs. very little data or very sparse/noisy data
    • Which loss function and hyperparameters to aim for
    • Explaining the output – black box vs. decision trees
    • Online/active learning
  • 13. Machine Learning Resources
    • Coursera Machine Learning class: https://www.coursera.org/course/ml
    • Access to AzureML – it is in preview
    • http://www.youtube.com/watch?v=wjTJVhmu1JM
    • Draft of the ML book by Alex Smola and Vishy: http://alex.smola.org/drafts/thebook.pdf
    • The Elements of Statistical Learning – Hastie, Tibshirani, Friedman: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
    • Information Theory, Inference, and Learning Algorithms – David MacKay: http://www.inference.phy.cam.ac.uk/mackay/itila/
    • Datasets (UCI Machine Learning Repository): http://archive.ics.uci.edu/ml/datasets.html
    • Official AzureML tutorials/video walkthroughs: https://azure.microsoft.com/en-us/documentation/services/machine-learning/
  • 14. Advanced topics
    • Other topics
      • How to use various input data cleanup procedures (dropping/adding/correlated features)
      • How to publish a web service to Azure Marketplace ($): https://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-web-service-to-azure-marketplace/
      • How do you version assets/the experiment “DAG”?
    • Techniques to overcome vagaries of data
      • Stratification – sampling for training and testing within each class, so that every class is properly represented in both samples
      • k-fold CV – the data is split randomly into k subsets; each subset is used once for testing while the remaining k-1 subsets are used for training, and the k results are averaged. CV uses sampling without replacement.
      • Bootstrapping – uses sampling with replacement to form the training set
    • Increasing performance of the model
      • Bagging – combines the predictions of several models by voting, or by averaging for numeric prediction
      • Boosting – also uses voting/averaging, but the models are weighted according to their performance
      • Parameter sweeping
      • Regularization parameter handling – penalty for overfitting
    • Understanding the algorithm performance/visualization of the algorithm path when possible
    • Associated statistics (confidence/distributions)
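The resampling ideas on this slide fit in a few lines each. A minimal sketch in plain Python, assuming a fixed random seed for reproducibility: k-fold CV as index splits (sampling without replacement), a bootstrap sample (sampling with replacement), and a majority vote of the kind bagging uses to combine classifier predictions. The function names are hypothetical helpers, not AzureML modules.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, then cut it into k folds (no index repeats).
    Each fold serves as the test set once; the other k-1 folds form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; for large n about 63.2% of rows appear at least once."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(predictions):
    """Bagging-style combination for classification: the most common label wins."""
    return Counter(predictions).most_common(1)[0][0]

for train, test in k_fold_indices(10, k=5):
    print(len(train), len(test))             # 8 training and 2 test indices per fold
print(majority_vote(["yes", "no", "yes"]))  # → yes
```

For numeric prediction, bagging would average the models' outputs instead of voting.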

Editor's Notes

  1. AzureML is where experiments are built and deployed as web services. AzureML Studio has a “toolbar” with modules for data ingestion/transformation, statistics and machine learning; some modules have properties that can be set. AzureML has datasets that can be brought in at runtime or persisted inside it, and it offers public datasets too.
  2. Classification algorithms can be measured by these metrics. Regression is typically measured with RMSE alone, which many people are questioning today; RMSE is the square root of the mean, over all instances, of (actual value - predicted value)^2. Clustering uses different mechanisms and requires tests/re-runs to ensure the grouped/clustered points have cohesion of some kind.
     Types of classification errors often incur different costs. Total error = (FP+FN)/(TP+FP+TN+FN)
     Lift charts: sort instances by their predicted probability of being positive. The X axis is sample size and the Y axis is the number of true positives (TP).
     ROC curves (ROC means receiver operating characteristic, a term from signal processing): the X axis shows the false positive rate, FP / (FP + TN); the Y axis shows the true positive rate, TP / (TP + FN).
     Recall-precision (the IR/search world uses these terms too): Precision (retrieved relevant / total retrieved) = TP / (TP+FP); Recall (retrieved relevant / total relevant) = TP / (TP + FN)
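The metrics in this note can be computed directly. A minimal sketch in plain Python, assuming binary labels encoded as 1 (positive) and 0 (negative) and made-up scores; ties between scores are not grouped here, and the AUC uses the trapezoidal rule over the ROC points.

```python
import math

def total_error(tp, fp, tn, fn):
    """Total error = (FP + FN) / (TP + FP + TN + FN)."""
    return (fp + fn) / (tp + fp + tn + fn)

def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean of (actual - predicted)^2."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def lift_and_roc(scores, labels):
    """Sort instances by predicted probability of being positive, highest first.
    Lift chart points: (sample size, cumulative TP count).
    ROC points: (FP rate, TP rate) as each instance is admitted as positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    lift, roc = [], [(0.0, 0.0)]
    for size, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            tp += 1
        else:
            fp += 1
        lift.append((size, tp))
        roc.append((fp / neg, tp / pos))
    return lift, roc

def auc(roc):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(roc, roc[1:]))

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
lift, roc = lift_and_roc(scores, labels)
print(total_error(tp=3, fp=1, tn=3, fn=1))  # → 0.25
print(lift)  # → [(1, 1), (2, 2), (3, 2), (4, 3), (5, 3), (6, 3)]
print(round(auc(roc), 4))                   # → 0.8889
print(rmse([3, 5, 2], [2, 5, 4]))           # sqrt(5/3), about 1.29
```

An AUC of 8/9 here also equals the fraction of (positive, negative) pairs the scores rank in the right order, which is one common way to read the metric.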
  3. Desirables
     • Model interpretation
     • More visualization
     • HMM
     • Native time series?
     • Text analysis – IR integration