Presentation given as part of the Global Azure Bootcamp 2017, April 22, 2017. Subject: one-day hands-on workshop about the Cortana Intelligence Suite.
2. Welcome @ Azure Global Bootcamp 2017
Apr-17 – DataScenarios – CC BY 4.0
#GlobalAzure
Rubicon - Gast
Welkom@Rubicon
3. Timetable
• 09:00 – Welcome
• 09:30 – Introduction
• 09:45 – Azure Stream Analytics
• 11:15 – Azure Data Lake Store
• 12:30 – Lunch
• 13:15 – Azure Data Lake Analytics
• 14:30 – Azure Machine Learning
• 15:45 – Wrap-up / AzureML UDF in ASA
• 16:00 – Drinks
4. Who am I?
• Jan Pieter Posthuma – Microsoft Data Consultant
• DataScenarios – Data Consultancy Company
• Architect roles at multiple projects
• Creator of Power BI Custom Visuals
• HierarchySlicer
• Box and Whisker chart
• Contact
• mail@datascenarios.nl
• https://twitter.com/jppp
• https://linkedin.com/in/jpposthuma
• https://github.com/liprec
• https://docs.com/liprec
5. Prerequisites
• Azure Subscription
(Free signup: https://azure.microsoft.com/pricing/free-trial/)
• Azure ML subscription
(Free signup: https://studio.azureml.net/?selectAccess=true&o=2)
• Visual Studio 2017 (or 2015) with Azure Data Lake Tools
(2017: via Tools and Extensions;
2015: https://www.microsoft.com/en-us/download/details.aspx?id=49504)
6. From data to decisions to action
8. Hands-on labs scenario
• Labs: Lab 1 – Ingest, Lab 2 – Store and Prepare, Lab 3 – Learning
• Components: Azure Stream Analytics, Azure Data Lake Store, Azure Data Lake Analytics, Azure Machine Learning, Power BI
• Close the loop
9. Lab information
• Find Azure Event Hub information here:
https://bit.ly/GAB-NL
https://bit.ly/GAB-NL-HOL
10. Azure Stream Analytics
11. Azure Stream Analytics
• Fully managed, cost-effective real-time event processing engine
• T-SQL-like query language (SAQL)
• Scalable cloud solution
• Input: Azure Event Hubs, Azure Storage Blobs
• Output: Azure Event Hubs, Power BI, Azure Storage Blobs, Azure Data Lake Store, DocumentDB, Azure SQL DB
• AzureML integration via UDF
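A typical real-time query groups events into fixed time windows and aggregates per window. The following is a minimal Python sketch of that idea, not SAQL itself: the 10-second window, the `device` and `time` field names and the sample events are assumptions made for illustration, and window boundary handling in the actual service may differ.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=10)  # hypothetical window size
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, size=WINDOW):
    """Return the start of the fixed-size window that contains ts."""
    return EPOCH + ((ts - EPOCH) // size) * size


def tumbling_count(events, size=WINDOW):
    """Count events per (window start, device) pair."""
    counts = Counter()
    for event in events:
        counts[(window_start(event["time"], size), event["device"])] += 1
    return dict(counts)


# Hypothetical sample events; field names are illustrative only.
base = datetime(2017, 4, 22, 9, 45, 0, tzinfo=timezone.utc)
events = [
    {"device": "sensor-a", "time": base + timedelta(seconds=1)},
    {"device": "sensor-a", "time": base + timedelta(seconds=4)},
    {"device": "sensor-b", "time": base + timedelta(seconds=7)},
    {"device": "sensor-a", "time": base + timedelta(seconds=12)},
]
result = tumbling_count(events)
```

Each event lands in exactly one window, so the windows do not overlap; this corresponds to what a tumbling window does in a streaming query.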
12. Azure Data Lake Store
13. Azure Data Lake Store
• Enterprise-wide hyper-scale repository for big data analytic workloads
• Apache Hadoop file system compatible (HDFS)
• Unlimited storage
• Highly available: cloud scale, redundant copies
• Secure: authentication (AAD), access control (ACL) and encryption (Azure Key Vault)
• All data: stores data of any type and size in its native format
14. Azure Data Lake Analytics
15. Azure Data Lake Analytics
• Big data analytics designed to run on top of Azure Data Lake Store
• Dynamic scaling: cloud scale, dynamically provisions resources
• U-SQL: simple and familiar, powerful, and extensible (via .NET)
• Pricing per job: the duration of a job (query, priority and maximum parallelism) defines the price
• Also works with: Azure Blob Storage, Azure SQL DB
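The pricing bullet above can be made concrete with a small calculation. This is a rough sketch that assumes the cost of a job scales with its reserved parallelism multiplied by its duration; the function name and the rate of 2.0 per unit-hour are hypothetical and are not actual Azure prices.

```python
def estimate_job_cost(parallelism, duration_minutes, rate_per_unit_hour):
    """Estimate job cost as parallelism x duration (in hours) x rate.

    The rate is a hypothetical input, not a published price.
    """
    if parallelism < 1 or duration_minutes < 0 or rate_per_unit_hour < 0:
        raise ValueError("parallelism must be >= 1; duration and rate must be >= 0")
    unit_hours = parallelism * duration_minutes / 60
    return round(unit_hours * rate_per_unit_hour, 2)


# Same 6-minute duration, different parallelism, hypothetical rate of 2.0.
cost_small = estimate_job_cost(parallelism=10, duration_minutes=6, rate_per_unit_hour=2.0)
cost_large = estimate_job_cost(parallelism=20, duration_minutes=6, rate_per_unit_hour=2.0)
```

Under this assumption, raising the parallelism only pays off when it shortens the job by at least the same factor.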
16. Azure Machine Learning
17. Azure Machine Learning
• Machine Learning:
‘the study of systems that can learn from data’
• ML Studio: ‘IDE’ for designing, training and validating models
• Data consumption via upload or Azure Storage (no ADLS yet)
• Operationalizing via web service integration
• Retraining model via API
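To illustrate the web service integration, the sketch below builds (but does not send) a scoring request. It assumes the JSON layout commonly used by ML Studio request-response services, with an `Inputs` section holding column names and values; the URL, API key, input name `input1` and the column names are placeholders, and the real values come from the API help page of the published service.

```python
import json
from urllib import request


def build_scoring_request(url, api_key, columns, rows):
    """Build a POST request for a scoring web service without sending it."""
    body = {
        "Inputs": {
            "input1": {  # assumed default input name
                "ColumnNames": list(columns),
                "Values": [[str(value) for value in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return request.Request(url, data=data, headers=headers, method="POST")


# Placeholder endpoint, key and columns; replace with the values of a real service.
req = build_scoring_request(
    "https://example.invalid/execute",
    "PLACEHOLDER-KEY",
    ["temperature", "humidity"],
    [[21.5, 40]],
)
payload = json.loads(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send the request; that step is left out here because it needs a published service.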
18. Supervised learning
• Infer a target function from a labeled dataset
• Example algorithms:
• Classification – identifying which category an observation belongs to
• Regression – estimating the relationships among variables
• Dataset split needed
• Training
• Validation
• Test
[Diagram: Data + Target → Model; Test data → Model → Target]
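The dataset split mentioned on this slide can be sketched as follows. The 60/20/20 proportions, the fixed seed and the toy labeled rows are assumptions chosen for illustration; the slide does not prescribe specific ratios.

```python
import random


def split_dataset(rows, train_pct=60, validation_pct=20, seed=42):
    """Shuffle rows and split them into training, validation and test sets.

    Whatever remains after the training and validation parts becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]
    return train_set, validation_set, test_set


# Toy labeled dataset: (feature, label) pairs.
rows = [(x, x % 2) for x in range(10)]
train_set, validation_set, test_set = split_dataset(rows)
```

The model is fitted on the training set, tuned against the validation set, and evaluated once on the test set, which it has never seen.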
19. Unsupervised learning
• Identify naturally occurring patterns in data
• Example algorithms:
• Clustering – grouping a set of objects based on similarity into clusters
• Outlier detection – identification of items which do not conform to an expected pattern
• No data split needed
[Diagram: Data (no labels) → Model → Naturally occurring hidden structure]
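As a small example of outlier detection on unlabeled data, the sketch below flags values that lie far from the mean, measured in standard deviations (a z-score). This is one simple technique picked for illustration, not necessarily the method used in the labs; the threshold of 2.0 and the sample readings are assumptions.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one value that breaks the pattern.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 35.0, 20.1]
outliers = find_outliers(readings)
```

No labels and no train/test split are involved: the expected pattern is derived from the data itself.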
Cortana Analytics is a fully managed big data and advanced analytics suite that transforms your data into intelligent action.
It is a comprehensive suite that brings together technologies throughout Microsoft and provides fast and flexible deployment with a simple monthly subscription to reduce the time and cost.
With Cortana Analytics, we are taking years of research and innovation, spanning technology and infrastructure for advanced analytics, and applying them with the goal of helping enterprise customers make better, faster decisions to accelerate their speed of business. This includes capabilities such as machine learning, big data storage and processing in the cloud, perceptual intelligence (e.g. vision, face and speech recognition) and integration with Cortana, Microsoft’s personal digital assistant.