PASS Marathon: Predictive Analytics Use Cases - PowerBI and Azure ML
1. Presenting Sponsor
Azure Machine Learning Studio and PowerBI
Yana Berkovich, Microsoft MVP, BI & Applications Lead, Onni Group of Companies
Moderated By: Aasish Sharma
2. Technical Assistance
If you require assistance during the session, type your inquiry into the question pane on the right side.
Maximize your screen with the zoom button on the top of the presentation window.
Please fill in the short evaluation following the session. It will appear in your web browser.
3. Thank You to Our Sponsor
KingswaySoft is a leading provider of high-performance data
integration solutions for connectivity and productivity using
SSIS as the ETL platform. Organizations from more than 70
countries rely on our solutions to drive their business data
efficiency.
4. Attend PASS Summit to Grow Your Career
• The Community
PASS Summit is the largest conference for technical professionals who
leverage the Microsoft Data Platform.
November 6-9 | Seattle, WA
PASSsummit.com
Connect with a global network of 250,000+ data professionals
5. Yana Berkovich
Microsoft Data Platform MVP
VANO365 User group president
Vancouver PowerBI user group member and speaker
SQL and O365 Saturdays and community events speaker
@Yana_Berkovich
https://www.linkedin.com/in/yanaberkovich/
YanaBerkovich.com
Yana@yanaberkovich.com
7. Azure Machine Learning Lab & PowerBI
Why are we using those tools?
AzureML Lab vs PowerBI
Getting the Data
PreProcessing the Data
The prediction model
Sample, population and your data set
Example - Exponential Smoothing Method
Quick Summary – what to use when…?
10. Decision Support
This is what it looks like
This is what the data analysis field is trying to make it look like:
James Taylor, leader of Decision Management Solutions
http://decisionmanagementsolutions.com/
Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics (IBM
Press)
11. What is Azure ML Lab/Studio?
Built on top of the machine learning capabilities of several
Microsoft products and services.
Shares many of the real-time predictive analytics of the new
personal assistant - Cortana. Azure ML also uses proven
solutions from Xbox and Bing.
Components
Lab
Gallery
Cloud-based only, hosted in Azure
https://studio.azureml.net/Home
Audience: Data Analysts, Statisticians, Actuaries, Data Scientists …
Users: Data Analysts, Data Scientists
12. What is PowerBI?
A suite of business analytics tools that deliver insights.
Connects to hundreds of data sources, simplifies data prep, and drives ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.
Scalable across the enterprise, with governance and security built-in.
Components
Desktop
O365
Mobile
Embedded
Report Server
Insights apps
Cloud solution, on-premises solution, mobile solution
https://powerbi.microsoft.com
Audience: Business Users & Managers
Users: IT, Finance, Marketing, Manufacturing, Data Analysts…
13. Azure ML vs PowerBI: Who? What? Why?
Azure ML
Who? A service created for developers and data scientists
What? Predict the future: train and create custom models based on statistics that will help answer questions
Why? Predict the future??!! Is there a better way that can potentially generate more value for the business?
PowerBI
Who? Business users, end users and customers; analyst friendly
What? Visualize the existing data for business use; answer business questions
Why? Get insights that provide information for decision support
16. Getting the Data - Decision Support System and the tip of the iceberg
Predict
Model
Insights
Information
Data
Data science starts with data gathering. Getting the data from the metrics is hard!
The data collection process can be a result of meetings, telemetry, IoT…
17. How are we using ML Studio? Machine Learning Process Cycle
Adapted from Brad Llewellyn’s presentation "Azure Machine Learning Studio Four Tips from the Pros"
PASS Link
https://www.youtube.com/watch?v=lRxZkBubbd4&feature=youtu.be
Data Collection
Data Cleansing
Data Manipulations
Model Creation
Model Evaluation
18. It all starts with the right question and business goal!
19. Case Study
Airplanes are never late….
We are going to analyze the data set of
flights during the month of October
This data set was taken from the
sample data sets in ML studio
20. Getting the Data
Azure ML Lab
Data set: CSV file, txt, Excel, Hive table, SQL table, OData, SVMlight, Zip, R object
PowerBI
Source: CSV file in this case; more than 100 different sources
Source type
Data delimiter
Data connection and refresh
21. Visualizing the Data
Azure ML Lab
Data preview
Histograms, box plots
Raw data
PowerBI
This is the main goal of this tool – data visualization
Recently, similar automatic visualizations
Data view for all the visualizations click the
Aggregated data
24. Azure ML
Data Type, Change metadata module
Data Type – automatic detection, change the type in a SQL query or directly on the column
Clean missing data – minimum and maximum missing value ratio (even 100% of the data cleaned)
Clean duplications, first last top rows
Use DAX queries and R
PowerBI
Create measures calculated based on data ranges
Data Cleansing
Convert the data into categories from range
Group categorical values
Edit metadata
SMOTE - increasing the number of rows/facts
Edit metadata
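The two cleansing ideas on this slide, cleaning missing values within a missing-value ratio range and converting a numeric range into categories, can be sketched in plain Python. This is only an illustration: the column name `dep_delay`, the sample rows and the category cut-offs are assumptions (the 15-minute mark is borrowed from the question asked later in the deck), and the ratio rule reflects a simplified reading of how the Clean Missing Data module behaves, not its actual implementation.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def missing_ratio(rows: List[Row], column: str) -> float:
    """Fraction of rows whose value in `column` is missing (None)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def clean_missing(rows: List[Row], column: str,
                  min_ratio: float = 0.0, max_ratio: float = 1.0) -> List[Row]:
    """Drop rows with a missing value in `column`, but only when the column's
    missing ratio lies inside [min_ratio, max_ratio]; otherwise return the
    rows unchanged (assumed behaviour, simplified)."""
    ratio = missing_ratio(rows, column)
    if not (min_ratio <= ratio <= max_ratio):
        return list(rows)
    return [r for r in rows if r.get(column) is not None]


def delay_category(minutes: float) -> str:
    """Convert a numeric delay into a category (hypothetical cut-offs)."""
    if minutes <= 0:
        return "on time"
    if minutes < 15:
        return "slightly late"
    return "late"


# Hypothetical sample rows; None marks a missing value.
flights = [
    {"dep_delay": -3.0},
    {"dep_delay": None},
    {"dep_delay": 7.0},
    {"dep_delay": 42.0},
]

cleaned = clean_missing(flights, "dep_delay", max_ratio=0.5)
categories = [delay_category(r["dep_delay"]) for r in cleaned]
print(categories)
```

Setting `max_ratio` below the column's actual missing ratio leaves the data untouched, which is the point of the ratio range: it guards against wiping out a column that is mostly empty.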
25. Azure ML
Selecting columns, Selecting columns,
Merging, Join with other data source – SQL
manipulations, R Manipulations, Python
manipulations
Building Dimensions – Time dimension, Airport
Dimension…
Creating custom measures, quick measures and
code based measures using DAX
PowerBI
ERD- create connections between the dimensions
and the fact tables
Data Manipulations
Creating Join through SQL query, Merging,
Appending lines
Creating an ERD through a join with another dimension
table for the selected columns
Using R or Python for creating custom measures
(avg, mean…)
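As a rough illustration of joining a fact table to a dimension and then building a custom measure on top of it, here is a small plain-Python sketch. The airport dimension, the flight rows and the column names are all hypothetical; in the tools themselves this would be done with a join module, a SQL query, or a relationship plus a DAX measure.

```python
from statistics import mean

# Hypothetical airport dimension keyed by airport code.
airport_dim = {
    "ATL": {"city": "Atlanta", "state": "GA"},
    "ORD": {"city": "Chicago", "state": "IL"},
}

# Hypothetical fact rows: one per flight.
flights = [
    {"origin": "ATL", "dep_delay": 10.0},
    {"origin": "ORD", "dep_delay": 30.0},
    {"origin": "ORD", "dep_delay": 20.0},
    {"origin": "XXX", "dep_delay": 5.0},  # no match in the dimension
]


def inner_join(facts, dim, key):
    """Attach dimension attributes to each fact row; drop unmatched rows."""
    return [{**row, **dim[row[key]]} for row in facts if row[key] in dim]


def average_by(rows, group_col, value_col):
    """Custom measure: average of `value_col` for each value of `group_col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(row[value_col])
    return {name: mean(values) for name, values in groups.items()}


joined = inner_join(flights, airport_dim, "origin")
avg_delay_by_city = average_by(joined, "city", "dep_delay")
print(avg_delay_by_city)
```

An inner join is used here, so the flight with the unknown airport code is dropped; a left join would keep it with empty dimension attributes instead.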
26. Azure ML
Only if you build a model for that
Out of the box visualization for the data set with
2 graphic options as previously mentioned
Q&A functionality recently available on desktop
Looks very similar to the visualizations that exist
in ML lab
Enables the user to add the FAQ visualization to
the dashboard or report
Natural-language questions answered, for example:
Which flight from Chicago airport was the most delayed?
PowerBI
Data Manipulations
28. Main Steps in creating an Experiment / Report
AzureML Experiment
Get data
Clean the data
Prepare the data (adding columns, calculations, missing data types, joins, SQL manipulations…)
Divide the data – sample for the model to train, data for evaluation
Choose the model
Train the algorithm
Score using the data for evaluation
Evaluate
Save as a trained model for later use, or create a Web Service and predict for new data sets
PowerBI Report
Get or connect to the data
Clean the query
Create measures and dimensions
Create connections using ERD
Create data visualizations
Q&A: analyze the data and get the answers to your question
Add visualizations to Dashboard
Create Application and publish
29. Which Questions do we ask our Model?
Azure ML Lab
How do we predict if a certain flight is going to be late?
How does the weather affect the flight being late?
If we are going to fly from a certain airport, will our flight be late? Ask the Web service!
What is the chance of the flight being less than 15 min late if it’s AA? What is the precision of this prediction?
Future events
PowerBI
We generally don’t! It is mostly a data visualization tool, not a tool we use to predict.
What is the average? Max? Min?
Which airport has the most late arrivals?
What is the correlation and the trend between the weather and the delay time?
Clustering the data: which airports are in the most-late cluster? – histograms and brick charts
Events that have already happened, limited prediction
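The correlation and trend question can be made concrete with a short sketch. It computes a Pearson correlation coefficient and a least-squares trend line in plain Python; the wind-speed and delay numbers are invented purely for illustration and do not come from the flight data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trend_line(xs, ys):
    """Least-squares slope and intercept of y as a linear function of x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical observations: wind speed vs. average delay in minutes.
wind_speed = [5.0, 10.0, 15.0, 20.0, 25.0]
delay = [4.0, 9.0, 11.0, 18.0, 23.0]

r = pearson(wind_speed, delay)
slope, intercept = trend_line(wind_speed, delay)
print(f"correlation={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

A coefficient near 1 only says the two series move together in this sample; it describes events that have already happened and is not by itself a prediction model.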
30. What is a prediction model?
Which algorithm is the best fit to predict the results, depending on the data?
Is the data seasonal? Does it have repetitions? Is it categorical?
Linear Regression or Poisson Regression?
How can we know what works best? Based on the past results!
Main model types:
Anomaly Detection
Classification
Clustering
Regression
31. Statistics… and prediction models
How do we predict the average late departure?
Average
Single Exponential Smoothing
Exponential smoothing is a rule-of-thumb technique for smoothing time series data using
the exponential window function. Whereas in the simple moving average the past
observations are weighted equally, exponential functions are used to assign
exponentially decreasing weights over time.
( Wikipedia to the rescue… )
Moving Average
The last month might be a better prediction for
flights than the last 20 months
Weighted Moving Average
Some observations are more significant than others:
flights of a domestic airline perform differently
and cannot be compared to others, or
big vs small planes
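To make the difference between the plain average, the moving average and the weighted moving average concrete, here is a minimal sketch. The monthly values and the weights are made up for illustration; in practice the weights would encode whatever makes an observation more significant (recency, carrier, plane size).

```python
def moving_average(values, window):
    """Forecast the next value as the plain mean of the last `window` values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    recent = values[-window:]
    return sum(recent) / len(recent)


def weighted_moving_average(values, weights):
    """Forecast the next value as a weighted mean of the last len(weights)
    values; weights[-1] applies to the most recent observation."""
    if not weights or len(weights) > len(values):
        raise ValueError("need between 1 and len(values) weights")
    recent = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


# Hypothetical average delay per month, oldest first.
monthly_avg_delay = [12.0, 15.0, 9.0, 18.0, 21.0]

overall = sum(monthly_avg_delay) / len(monthly_avg_delay)      # all months equal
last_three = moving_average(monthly_avg_delay, 3)               # only recent months
weighted = weighted_moving_average(monthly_avg_delay, [1, 2, 3])  # newest counts most
print(overall, last_three, weighted)
```

With these numbers the three estimates differ noticeably, which is the slide's point: the more weight the recent months get, the more the forecast follows the latest behaviour.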
For single exponential smoothing, the smoothing constant α can be chosen between 0.1 and
0.9; it is picked by searching for a local minimum of the error.
We choose the value of α that results in the smallest MSE (mean squared error).
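A minimal sketch of single exponential smoothing with α picked by smallest MSE might look like the following. It assumes the common initialisation where the first forecast equals the first observation and searches a simple grid of α values from 0.1 to 0.9; the daily delay series is invented for illustration.

```python
def ses_forecasts(values, alpha):
    """One-step-ahead forecasts: forecasts[t] predicts values[t] using only
    earlier observations. The first forecast is set to the first observation."""
    forecasts = [values[0]]
    for observed in values[:-1]:
        forecasts.append(alpha * observed + (1 - alpha) * forecasts[-1])
    return forecasts


def mse(values, forecasts):
    """Mean squared error, skipping the first point (its forecast is a copy)."""
    errors = [(y - f) ** 2 for y, f in zip(values[1:], forecasts[1:])]
    return sum(errors) / len(errors)


def best_alpha(values, candidates):
    """Grid search: the candidate alpha with the smallest MSE."""
    return min(candidates, key=lambda a: mse(values, ses_forecasts(values, a)))


candidates = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1 ... 0.9

# Hypothetical average departure delay per day, in minutes.
daily_avg_delay = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 21.0]

alpha = best_alpha(daily_avg_delay, candidates)
last_forecast = ses_forecasts(daily_avg_delay, alpha)[-1]
next_day = alpha * daily_avg_delay[-1] + (1 - alpha) * last_forecast
print(alpha, round(next_day, 2))
```

A grid search finds the best α among the candidates only; a finer grid or a numerical optimiser would be needed to locate the minimum more precisely.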
32. Adding information to our data visualization
PowerBI
Min value line
Max value line
Trend line – we can see that the AVG delay time
increases?
Exponential smoothing
Seasonality – 7 points (a week in a month)
Ignore last 10 points – to check our prediction
Forecast length – to see what the next 7 days will
look like
33. Adding information to our data visualization
PowerBI – How can we explain the predicted results?
Trend line – we can see that the AVG delay time
increases?
How can we validate and score the predicted results? Azure ML Lab
• End of October - Thanksgiving?
• Weather changes at the airports for the worse
• The trend line doesn’t continue for the predicted data
• How can we control the alpha? In Power View for O365 we can; not in PowerBI yet
34. More options in PowerBI? – R
R model for more, simple prediction options in PowerBI
Add the R code in the PowerBI model for the relevant data column
The R visualization can do predictive models of your choice
It is limited but very useful for business case scenarios
Recommended Blog post -
Revenue and forecasting by Christian Berg – Plot using R
https://community.powerbi.com/t5/Community-Blog/Revenue-and-forecasting/ba-p/86299
New Series of Time Series by Leila Etaati, PhD, MVP –
http://radacad.com/new-series-of-time-series-part-1
35. Meanwhile in Azure ML Lab
Unfortunately, the ETS (exponential smoothing) module was deprecated, so let’s
choose a better one!
Edit Metadata – adding the column for the average values
Split the data into sample and population (not just ignoring the last 10 points, but
randomizing the split)
The question "what is the expected average late time?" is simply the wrong one for
this tool. We would rather use it to predict, for each flight, whether it
is going to be late, or how the weather affects flights being late.
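The difference between ignoring the last 10 points and randomizing the split can be shown with a small sketch. The 31 integers stand in for one record per day of October, and the 70/30 fraction and the seed are arbitrary assumptions, not settings taken from the demo.

```python
import random


def holdout_last(rows, n):
    """Keep the last `n` rows aside for evaluation (time-ordered hold-out)."""
    if not 0 < n < len(rows):
        raise ValueError("n must be between 1 and len(rows) - 1")
    return rows[:-n], rows[-n:]


def random_split(rows, train_fraction, seed):
    """Shuffle a copy of the rows and cut it into training and evaluation
    parts; the seed makes the split repeatable."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for 31 daily records (hypothetical).
records = list(range(31))

train_a, test_a = holdout_last(records, 10)
train_b, test_b = random_split(records, 0.7, seed=42)
print(len(train_a), len(test_a), len(train_b), len(test_b))
```

The hold-out keeps time order, which suits forecasting checks; the random split mixes days from the whole month into both parts, which suits per-flight classification where order matters less.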
36. Azure ML Lab: some of the mathematical models
Decision Forest Regression
Linear Regression (Excel as well… Solver)
Two-Class Boosted Decision Tree
Decision Tree
Two-Class Logistic Regression
Will be used in the prediction demo
to compare which one predicts best
K-Means Clustering (PBI as well)
37. The Prediction by Airport
Hartsfield in Atlanta, Georgia, and Chicago are the two leading airports where the weather has a very large impact on delay times. The delay times there are the largest, just as we hear in the news about those airports being delayed. (How many Hallmark movies use the weather at the Chicago airport during a Christmas snowstorm…)
38. The Flight Delay Prediction: comparing the scored models
The blue prediction model is slightly better than the red one at predicting whether the flight is going to be late.
Two-Class Boosted Decision Tree is slightly better than Two-Class Logistic Regression.
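One way to compare two scored models is to compute the same metrics for both on the same evaluation rows. The sketch below uses accuracy, precision and recall; the labels and predictions are invented, the names `model_blue` and `model_red` only echo the colours on the slide, and the actual comparison in the demo may rest on other outputs such as ROC curves.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'late'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def scores(actual, predicted):
    """Accuracy, precision and recall for one scored model."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical evaluation set: 1 = flight was late, 0 = on time.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
model_blue = [1, 0, 1, 0, 0, 0, 1, 0]  # made-up predictions
model_red = [1, 1, 1, 0, 0, 0, 0, 0]   # made-up predictions

blue = scores(actual, model_blue)
red = scores(actual, model_red)
better = "blue" if blue["accuracy"] > red["accuracy"] else "red"
print(blue, red, better)
```

Precision answers the earlier question about how far a "late" prediction can be trusted, while recall shows how many of the late flights the model actually caught.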
40. Azure ML vs PowerBI: Who? What? Why?
Azure ML
Data scientists, developers
Be the development platform for prediction analytics solutions
Upload the data, manipulate the data, divide into data set and training set, train the model, evaluate the model, create service, predict for other data sets
Predict given a mathematical trained model based on past results
PowerBI
Business users, end users and customers; analyst friendly
Development platform and publishing platform for data visualization
Connect to data, create report, analyze existing data and get data insights
Ask questions – business users’ and managers’ questions: evaluate, compare, classify, display
The next generation is already here… Azure IoT hub,
Azure AI and Machine learning focused on devs
Welcome to PASS Marathon: Predictive Analytics Use Cases!
We’re excited you could join us today for Yana Berkovich’s session, Azure Machine Learning Studio and PowerBI.
This PASS Marathon consists of 5 consecutive live webinars, delivered by expert speakers from the PASS community.
The sessions will be recorded and posted online after the event. You will receive an email letting you know when the recordings are available.
My name is Aasish Sharma [you can say a bit about yourself here if you’d like]
I have a few introductory slides before I hand over the reins to Yana.
[move to next slide]
If you require technical assistance please type your question into the question pane located on the right side of your screen and someone will assist you. This question pane is also where you may ask any questions throughout the presentation. Feel free to enter your questions at any time and once we get to the Q&A portion of the session, I’ll read your questions aloud to the speaker.
You are able to zoom in on the presentation content by using the zoom button located on the top of the presentation window.
Please note that there will be a short evaluation at the end of the session. Your feedback is important to us so please take a moment to complete it. This will pop-up after the webinar ends in your web browser.
[Note to moderators: You need to determine which questions are the most relevant and ask them out loud to the presenter].
I’d like to take a moment to thank our Presenting Sponsor, KingswaySoft.
The staging of PASS Marathon would not be possible without their generous support, and they are the reason this event is available free of charge. If you would like to learn more about our sponsors and sign up for information on how they can help you, please visit the sponsors page of the PASS Marathon website.
[move to next slide]
PASS Summit is the PASS community’s flagship event and it is happening November 6-9 in Seattle, Washington.
PASS Summit is the largest conference for technical professionals who leverage the Microsoft Data Platform. Immerse yourself in deep-dive technical sessions, learn best practices, and discover new tips and tricks.
For more information and to register, visit PASSsummit.com.
[Moderator Part]
This PASS Marathon session is presented by Yana Berkovich. Yana is a people-oriented professional, aspiring Product Manager, with 11 years of process analysis, building solutions, customization, developing, managing and training experience. Yana has been working with PowerBI, Oracle SQL and SQL Server for the past few years. She is a data enthusiast with international experience in many companies as a BA, Team Lead and everything SharePoint & Office 365.
**[Speaker takes over]**
And without further ado, here is Yana with Azure Machine Learning Studio and PowerBI.
{SPEAKER begins}
Emphasize the blogs
The data science and ML course to take
Kaggle for data sets
Show the Fish Boston Report, visualization page 3 as an extra example
SMOTE stands for Synthetic Minority Oversampling Technique. This is a statistical technique for increasing the number of cases in your dataset in a balanced way.
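The core idea behind SMOTE can be sketched in a few lines: a synthetic case is placed at a random point on the line between a minority-class case and one of its nearest minority-class neighbours. This is a simplified illustration only, not the Azure ML module's implementation; the function name, the parameters and the sample points are all hypothetical.

```python
import random
from math import dist


def smote(minority, n_new, k, seed):
    """Create `n_new` synthetic points by interpolating between a randomly
    chosen minority point and one of its `k` nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class cases described by two numeric features.
late_flights = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

new_cases = smote(late_flights, n_new=6, k=2, seed=0)
print(len(late_flights) + len(new_cases))
```

Because every synthetic case lies between two real minority cases, the new rows stay inside the region the minority class already occupies instead of being plain duplicates.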
Show the movie web service
Show the applications in ML
In PowerBI there are other methods for predicting events, such as K-means clustering with a plot built with an R script