This session was delivered at a SQL Server user group meetup. It is essentially a 101 on the AzureML offering, which makes it easy to create trained models and deploy them for prediction. It does not go into the details of individual algorithms, the data cleanup process, or tuning techniques such as parameter sweeping, bagging, boosting, and bootstrapping.
AzureML – zero to hero
1. AzureML – Zero to Hero
Govind Kanshi
MTC Bangalore
2nd August 2014
2. What we will cover
• AzureML:
• What it enables
• Examples
• Upload data/understand explore it
• Develop model/evaluate it/deploy it
3. What this discussion is not about
• Data science/big data definitions, uses, etc.
• ML Advanced topics
• Feature Engineering – which features are useful/cleaning/dropping
• For PCA kind of work – use R today
• Individual algorithm discussion/deep dive.
• Model tuning (parameter sweep) or other techniques – boosting/bagging
• Overcoming Data vagaries
4. What you should walk out with
• Excitement and confidence that ML with AzureML is doable by all of us, as long as we are curious and patient.
• AzureML is a democratized platform for learning from data, which supports better-informed decisions. It makes sophisticated algorithms and mechanisms easy to use, both for the masses and for high-end researchers.
5. What are we trying to do
• Learn from existing data to make predictions on new data
• Classification – assign labels
• Regression – predict a numeric value such as price
• Recommendation – rank choices
• Examples – classify different behaviors, predict price, recommend, find anomalies
• Explore data to form natural groupings based on some distance formula
• Clustering
6. Demo
• Deployed model for public dataset to classify if person has diabetes
• Deployed model to predict Decibels of noise
• How old is this stuff? The term “regression” first appears in the biological works of Galton (1822-1911).
• Y = a_1 * X_1 + ... + a_n * X_n...
• Solve for ...
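As a rough illustration of the linear form on this slide, the sketch below fits the coefficients a_1 ... a_n by ordinary least squares in plain Python. It assumes that "solve for" refers to the coefficients, uses the normal equations as one common way of solving for them, and runs on made-up data; it is not what AzureML does internally.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear(X, y):
    """Least-squares coefficients for Y = a_1*X_1 + ... + a_n*X_n.

    Solves the normal equations (X^T X) a = X^T y.
    """
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return solve_linear_system(XtX, Xty)


def predict(coeffs, row):
    """Apply the fitted linear form to one row of features."""
    return sum(a * x for a, x in zip(coeffs, row))


# Made-up data generated from Y = 2*X_1 + 3*X_2 (illustrative only).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2.0 * x1 + 3.0 * x2 for x1, x2 in X]
coeffs = fit_linear(X, y)
```

Because the sample data is noise-free, the fitted coefficients should come back very close to the 2 and 3 used to generate it.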
7. What did we see
• Exposed web service in raw format to do prediction as request-response
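A request-response call to a deployed model is, in general terms, an HTTPS POST carrying the feature values and returning the prediction. The sketch below only builds such a request with the Python standard library and does not send it. The URL, the API key, the bearer-token header, the feature names, and the JSON layout are all placeholders assumed for illustration; the real values and schema should be copied from the help page of the deployed service.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, feature_names, feature_rows):
    """Build (but do not send) a JSON POST request for a scoring endpoint.

    The payload layout here is hypothetical, not the documented AzureML schema.
    """
    payload = {"columns": feature_names, "rows": feature_rows}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the service's own documentation.
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Placeholder endpoint, key, and features (all hypothetical).
request = build_scoring_request(
    "https://example.invalid/score",
    "PLACEHOLDER_KEY",
    ["age", "bmi"],
    [[50, 31.2]],
)

# To actually call a real endpoint, something like the following would be used:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```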
8. Demo
• Walkthrough of the model creation for Classification
• Possibly choose another algorithm to compare/evaluate
9. What did we see
AzureML studio – Experiments/Datasets/Web services
Web services – request-response (RR) or batch mode
Algorithms – Classification, Regression, Recommendation, Ranking
Data – ingestion, cleansing, massaging
R Integration
Dataset/Experiments are immutable – new versions can be deployed
10. What did we do (typical AzureML path)
• Define the goal – regression or classification or recommendation
• Create a model and train it using dataset
• Get data
• Clean up the data or replace missing data if required
• Use the appropriate algorithm/train it
• Score the model with test data
• Look at the algorithm parameters
• Evaluate the model using metrics
• Add more algorithms to compare
• Deploy the model as a web service for the request-response mechanism
• What about batch? Yes, that is possible too.
• Data exploration – visualization of data/results
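The path above (get data, fill in missing values, train, score with test data, evaluate) can be sketched outside the studio in a few lines of plain Python. This is only an illustration under stated assumptions: the data is made up, missing values are replaced with the column mean, and a nearest-centroid rule stands in for whichever classifier module would actually be chosen in AzureML.

```python
import random


def impute_missing(rows):
    """Replace None in each column with that column's mean over present values."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        means.append(sum(present) / len(present))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the row indices and hold out a fraction of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def train_centroids(rows, labels):
    """'Train' by computing the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for c, value in enumerate(row):
            acc[c] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def score(model, row):
    """Predict the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], row)))


# Made-up two-feature data with two missing values (illustrative only).
raw = [[1.0, 1.2], [0.8, None], [1.1, 0.9], [1.3, 1.0],
       [5.0, 5.2], [4.8, 5.1], [None, 4.9], [5.2, 5.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

clean = impute_missing(raw)
(train_rows, train_labels), (test_rows, test_labels) = train_test_split(clean, labels)
model = train_centroids(train_rows, train_labels)
predicted = [score(model, row) for row in test_rows]
accuracy = sum(p == a for p, a in zip(predicted, test_labels)) / len(test_labels)
```

Swapping `train_centroids` and `score` for another pair of functions is the plain-Python analogue of adding a second algorithm to the experiment in order to compare results.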
11. Evaluate Models – summary (classification)
• Confusion Matrix
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F1-score
• ROC curve + AUC - Area under ROC curve
Actual class    Predicted yes          Predicted no
yes             True positive (TP)     False negative (FN)
no              False positive (FP)    True negative (TN)
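The metrics on this slide follow directly from the four confusion-matrix counts. The sketch below computes them for a small made-up set of yes/no labels; F1 is taken as the harmonic mean of precision and recall, and the function names are illustrative rather than part of any AzureML API.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, and F1 from confusion-matrix counts.

    tn is accepted for completeness; none of these three metrics uses it.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels (illustrative only).
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
```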
12. Issues to think about
• Cleaning/choosing right data points
• Missing data/transforming data/dropping data/relationship between features
• Evaluating the algorithm, comparing, tuning the parameters,
relearning
• Which algorithm to choose (Boolean classification vs. 10-class vs. ranking) when the data has many attributes (thousands to five-digit counts), versus very little data or very sparse/noisy data
• Which loss function and hyperparameters to aim for
• Explain the output – black box vs decision trees
• Online/Active Learning
13. Machine Learning Resources
• Coursera Machine Learning class
https://www.coursera.org/course/ml
• Access to AzureML – it is in preview
• http://www.youtube.com/watch?v=wjTJVhmu1JM
• Draft of Alex Smola and Vishy book on ML: http://alex.smola.org/drafts/thebook.pdf
• Elements of Statistical Learning – Hastie, Tibshirani et al: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
• Information Theory, Inference, and Learning Algos – David Mackay: http://www.inference.phy.cam.ac.uk/mackay/itila/
• Datasets - http://archive.ics.uci.edu/ml/datasets.html
• Official AzureML – tutorials/Video walkthroughs - https://azure.microsoft.com/en-us/documentation/services/machine-learning/
14. Advanced topics
• Other topics
• How to use various input data cleanup procedures (dropping/adding/correlated features)
• How to publish a web service to Azure Marketplace ($) - https://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-web-service-to-azure-marketplace/
• How do you version assets/“DAG”
• Techniques to overcome vagaries of data
• Stratification - sampling for training and testing within classes, to overcome issues in how well the data samples represent each class
• k-fold CV - data is split randomly into k subsets; each subset is used once for testing and the remainder for training. This is repeated for every subset and the results are averaged. CV uses sampling without replacement.
• Bootstrapping - uses sampling with replacement to form the training set.
• Increasing model performance
• Bagging - combines the predictions of models trained on bootstrap samples by voting, or by averaging for numeric prediction.
• Boosting - also uses voting/averaging, but models are weighted according to their performance.
• Parameter sweeping
• Regularization parameter handling – Penalty for overfitting
• Understanding the algorithm performance/visualization of the algorithm path when possible.
• Associated statistics(confidence/distributions)
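The sampling techniques listed on this slide differ mainly in how row indices are drawn. The sketch below shows k-fold splitting (without replacement), bootstrap sampling (with replacement), and the equal-weight majority vote used to combine bagged classifiers. It is plain Python with illustrative function names and does not reflect how AzureML implements these steps.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds (sampling without replacement).
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; some rows repeat, others are left out."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Combine class predictions from several models with equal weight (bagging)."""
    return Counter(predictions).most_common(1)[0][0]


folds = list(k_fold_indices(10, 5))
sample = bootstrap_indices(10)
combined = majority_vote(["yes", "no", "yes"])
```

In a full cross-validation run, a model would be trained on each `train` list, evaluated on the matching `test` list, and the k metric values averaged.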
Editor's Notes
AzureML - where experiments are done and deployed as web services
AzureML studio has a “toolbar” with modules for data ingestion/transformation, statistics, and machine learning. Some of them have properties which can be set.
AzureML has datasets which can be brought in at runtime or persisted inside. It has public datasets too.
AzureML
Classification algorithms can be measured by these metrics
Regression is measured mainly by RMSE (root mean squared error: the square root of the mean, over all instances, of (actual value - predicted value) squared), a metric which many people are currently questioning
Clustering has a different mechanism and requires tests/re-runs to ensure that grouped/clustered points have cohesion of some kind
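For the regression metric named in the note above, RMSE is the square root of the mean squared difference between actual and predicted values. A minimal sketch on made-up numbers:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired actual/predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values (illustrative only).
actual_values = [3.0, 5.0, 2.0, 7.0]
predicted_values = [2.0, 5.0, 4.0, 7.0]
error = rmse(actual_values, predicted_values)
```

Because the errors are squared before averaging, a single large miss (the 2.0 predicted as 4.0 here) dominates the result.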
Types of classification errors often incur different costs.
Total error = (FP+FN)/(TP+FP+TN+FN)
Lift charts
Sort instances by their predicted probability of being a true positive (TP).
X axis is sample size and Y axis is number of true positives (TP).
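The points of such a chart can be produced by sorting instances by predicted probability and keeping a running count of positives. The sketch below does this for made-up scores and 0/1 labels; the function name is illustrative.

```python
def lift_chart_points(scores, labels):
    """Return (sample size, number of true positives) pairs.

    Instances are taken in order of decreasing predicted probability.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    hits = 0
    for n, i in enumerate(order, start=1):
        hits += labels[i]  # labels are 1 for positive, 0 for negative
        points.append((n, hits))
    return points


# Made-up predicted probabilities and actual labels (illustrative only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = lift_chart_points(scores, labels)
```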
ROC curves (ROC means receiver operating characteristic, a term from signal processing)
X axis shows the false positive rate: FP / (FP + TN)
Y axis shows the true positive rate: TP / (TP + FN)
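An ROC curve can be traced by lowering the decision threshold one instance at a time and recording the false positive and true positive rates; the area under it (AUC) then follows from the trapezoid rule. The sketch below assumes 0/1 labels, at least one instance of each class, and no tied scores (ties would need to be grouped into a single step). The data is made up.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points in x order."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up predicted probabilities and actual labels (illustrative only).
roc_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
roc_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(roc_scores, roc_labels)
auc = area_under_curve(curve)
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 is what random ranking would give on average.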
Recall and precision (the IR/search world uses these terms too):
Precision (retrieved relevant / total retrieved) = TP / (TP+FP)
Recall (retrieved relevant / total relevant) = TP / (TP + FN)
Desirables
Model interpretation
More visualization
HMM
Native Time series?
Text analysis – IR integration