SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Azure Data Platform Services
Mostafa Elzoghbi
http://mostafa.rocks
Twitter: @MostafaElzoghbi
Session takeaways and objectives
• Azure Data Platform Services
• HDInsight Clusters in Azure
• Data Storage: Apache Hive, Apache Hbase, Azure Data Catalog
• Data Transformations: Apache Storm, Apache Spark, Azure Data Factory
• Healthcare / Life Sciences Use Cases
Azure Data Platform
Services
HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache
Hadoop technology stack that is the go-to solution for big data analysis.
It includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie,
Ambari, and so on.
HDInsight also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL
Server Analysis Services, and SQL Server Reporting Services.
HDInsight is available on Windows and Linux
HDInsight on Linux: A Hadoop cluster on Ubuntu
HDInsight on Windows: A Hadoop cluster on Win Server 2012 R2
What is HDInsight
HDInsight provides cluster Types & custom configurations for:
• Hadoop (HDFS)
• HBase
• Storm
• Spark
• R Server
• Hive Interactive, Kafka (Preview)
HDInsight has powerful programming extensions for languages including C#, Java,
and .NET.
Business Value Proposition: Skip maintaining and purchasing hardware
HDInsight clusters on Azure
HDInsight clusters on Azure
Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after
Google BigTable.
HBase provides random access and strong consistency for large amounts of unstructured and
semistructured data in a schemaless database organized by column families
Data is stored in the rows of a table, and data within a row is grouped by column family.
The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can
rely on data redundancy, batch processing, and other features that are provided by distributed
applications in the Hadoop ecosystem.
What is HBase
Order No Customer
Name
Customer
Phone
Company Name Company
Address
12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
Customer Company
Order No Customer
Name
Customer
Phone
Company Name Company
Address
12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
HBase Commands:
create  Equivalent to Create table in T-SQL
get  Equivalent to Select statements in T-SQL
put  Equivalent to Update, Insert statement in T-SQL
scan  Equivalent to Select (no where condition) in T-SQL
HBase shell is your query tool to execute in CRUD commands to a HBase cluster.
Data can also be managed using the HBase C# API, which provides a client library on top
of the HBase REST API.
An HBase database can also be queried by using Hive using SQLHive.
What is HBase
Apache Hive is a data warehouse system for Hadoop, which enables data summarization,
querying, and analysis of data by using HiveQL (a query language similar to SQL).
Hive understands how to work with structured and semi-structured data, such as text files
where the fields are delimited by specific characters.
Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly
structured data.
Hive can also be extended through user-defined functions (UDF).
A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
Support for multiple execution engine: WebHCat, HiveServer2 (faster) or Tez.
What is Hive
Apache Storm is a distributed, fault-tolerant, open-source computation system that allows
you to process data in real-time with Hadoop.
Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in
the Azure environment by using Apache Hadoop.
Ability to write Storm components in C#, JAVA and Python.
Azure Scale up or Scale down without an impact for running Storm topologies.
Ease of provision and use in Azure portal & development templates in Visual Studio.
What is Apache Storm
Apache Storm apps are submitted as Topologies.
A topology is a graph of computation that processes streams
Stream: An unbound collection of tuples. Streams are produced by spouts and bolts, and
they are consumed by bolts.
Tuple: A named list of dynamically typed values.
Spout: Consumes data from a data source and emits one or more streams.
Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are
also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a
blob, or other data store.
Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures.
Apache Storm Components
Apache Spark™ is a fast and general engine for large-scale data processing.
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Write applications quickly in Java, Scala, Python, R.
Combine SQL, streaming, and complex analytics.
Spark's in-memory computation capabilities
make it a good choice for iterative algorithms in
ML and graph computations.
Support for R Server & Azure Data Lake.
What is Apache Spark
Part 3: Single Slide
DEMO –
Apache Spark
Azure Data Lake includes all the capabilities required to make it easy for developers, data
scientists, and analysts to store data of any size, shape, and speed, and do all types of
processing and analytics across platforms and languages.
It removes the complexities of ingesting and storing all of your data while making it faster
to get up and running with batch, streaming, and interactive analytics.
Azure Data Lake works with existing IT investments for identity, management, and security
for simplified data management and governance.
Data Lake Analytics—a no-limits analytics job service to power intelligent action
Azure Data Lake (ADL)
Azure Data Lake (ADL)
Azure Data Lake Analytics is a new service, built to make big data analytics easy.
Key capabilities:
Dynamic scaling: do analytics on terabytes or even exabytes of data
Develop faster, debug, and optimize smarter using familiar tools: U-SQL Jobs.
U-SQL: simple and familiar, powerful, and extensible: SQL language with the power of
expressive C#, process & analyze data with skills you already have.
Affordable and cost effective.
Works with all your Azure Data: Data Lake Analytics is optimized to work with Azure
Data Lake - providing the highest level of performance, throughput, and parallelization
for your big data workloads. Data Lake Analytics can also work with Azure Blob
storage and Azure SQL database.
Azure Data Lake Analytics
Part 3: Single Slide
DEMO –
Azure Data
Lake
Healthcare &
Pharmaceuticals Use Cases
Provider of enzymes for Food, Pharma Industry
Pharmaceuticals
Global provider of natural
ingredients such as cultures and
enzymes for the food,
pharmaceutical, nutritional, and
agricultural industries.
Part 1: What They Did | Gene Analysis
Challenge
Collect clinical trial data from electronic laboratory notebooks (ELNs)
Collect structured and unstructured data from sources like automated equipment (robots,
temperature sensors, and other devices)
Need to do analysis and look for patterns on this data
Solution
Chose Azure HDInsight, SQL Server on-premises
Examine patterns in chemical composition and physical properties of cultures and
enzymes e.g. examine the gene composition of our bacteria and how it actually
behaves in yogurt
Gene Analysis
BK1
Provider of enzymes for Food, Pharma Industry
Part 2: How They Did It | Gene Analysis
How They Did It
Collect data in Azure Blobs
• Extract data from all the files (GBs in size)
• Transpose into JSON using .NET
HDInsight processes data for insights
• Hive is used to run queries
• Mainly Select statements (joins/unions)
• Maximum 20 lines of code
Use SQL Server for reporting
Use Hive ODBC connector
Gene Analysis
SQL Server
On-premises
Lab Information
Management
System
Automated equipment
(robots, temperature sensors, etc)
Healthcare Application Provider
Healthcare
Leading provider of healthcare
software applications for clinical,
financial, pharmacy, etc.
Part 1: What They Did | Data Lake to deliver better analytics
Challenge
Multiple healthcare applications operating in product silos
Want to implement a data lake and provide distributed processing for better analytics and
predictions for their end customers:
• Population, risk, and Care management
• Clinical decision support using predictions
• Real time quality measures to assist providers reach their regulatory requirements
• Capacity management predictions
• Enrich clinical data with NLP on unstructured physicians notes to reduce over/under treatment,
readmissions, and faster claims processing
Solution
Chose Azure HDInsight, HBase, and custom .NET application
Begin by storing all medical records in Azure
Acquire, clean data and insert into data lake
Data Lake
Analytics
BK1
Healthcare Application Provider
Part 2: How They Did It | Data Lake to deliver better analytics
How They Did It
Collect data in Azure Blobs
• Have medical records in JSON format
HDInsight processes data for insights
• MapReduce preprocesses data
• used to insert data into data lake
• Developed .NET application to interface with HBase
Data Lake Analytics
Azure Blobs
Azure HDInsight
Medical Records
in JSON
.NET Application
Insert into Blobs
References
• HDInsight Documentation (R, Storm, Spark, Kafka, ADL,..etc)
https://azure.microsoft.com/en-us/services/hdinsight/
• Spark Programming Guide
http://spark.apache.org/docs/latest/programming-guide.html
• edx.org: Free Apache Spark courses
• Get started with Data Science VMs in Azure
https://blogs.technet.microsoft.com/machinelearning/tag/dsvm/
Big data talking stories in Healthcare

Weitere ähnliche Inhalte

Was ist angesagt?

Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Lviv Startup Club
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyNilesh Shah
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BIKellyn Pot'Vin-Gorman
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Cathrine Wilhelmsen
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data LakeCalum Miller
 
Afternoons with Azure - Power BI and Azure Analysis Services
Afternoons with Azure - Power BI and Azure Analysis ServicesAfternoons with Azure - Power BI and Azure Analysis Services
Afternoons with Azure - Power BI and Azure Analysis ServicesCCG
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksMicrosoft Tech Community
 
Azure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsAzure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsMark Kromer
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Abhimanyu Singhal
 
Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Turner Kunkel
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Sprinkle Data Inc
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azureEyal Ben Ivri
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Building predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningBuilding predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningMostafa
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsSergio Zenatti Filho
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 

Was ist angesagt? (20)

Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
Cepta The Future of Data with Power BI
Cepta The Future of Data with Power BICepta The Future of Data with Power BI
Cepta The Future of Data with Power BI
 
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
Azure Synapse Analytics Teaser (Microsoft TechX Oslo 2019)
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
Afternoons with Azure - Power BI and Azure Analysis Services
Afternoons with Azure - Power BI and Azure Analysis ServicesAfternoons with Azure - Power BI and Azure Analysis Services
Afternoons with Azure - Power BI and Azure Analysis Services
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure Databricks
 
Azure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsAzure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analytics
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
 
Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)Azure Analysis Services (Azure Bootcamp 2018)
Azure Analysis Services (Azure Bootcamp 2018)
 
Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)Definitive Guide to Select Right Data Warehouse (2020)
Definitive Guide to Select Right Data Warehouse (2020)
 
Building big data solutions on azure
Building big data solutions on azureBuilding big data solutions on azure
Building big data solutions on azure
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Building predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningBuilding predictive models in Azure Machine Learning
Building predictive models in Azure Machine Learning
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 

Ähnlich wie Big data talking stories in Healthcare

Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in AzureMostafa
 
Big data solutions in Azure
Big data solutions in AzureBig data solutions in Azure
Big data solutions in AzureMostafa
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Big data solutions in azure
Big data solutions in azureBig data solutions in azure
Big data solutions in azureMostafa
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Michael Rys
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for AnalyticsJen Stirrup
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Joan Novino
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresCCG
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE UNDER AZURE ...
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE   UNDER AZURE ...COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE   UNDER AZURE ...
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE UNDER AZURE ...Megha Shah
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Nathan Bijnens
 
Using Machine Learning with HDInsight
Using Machine Learning with HDInsightUsing Machine Learning with HDInsight
Using Machine Learning with HDInsightEng Teong Cheah
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 

Ähnlich wie Big data talking stories in Healthcare (20)

Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
 
Big data solutions in Azure
Big data solutions in AzureBig data solutions in Azure
Big data solutions in Azure
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Big data solutions in azure
Big data solutions in azureBig data solutions in azure
Big data solutions in azure
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
 
5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE UNDER AZURE ...
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE   UNDER AZURE ...COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE   UNDER AZURE ...
COMPARING THE PERFORMANCE OF ETL PIPELINE USING SPARK AND HIVE UNDER AZURE ...
 
Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018Azure Databricks & Spark @ Techorama 2018
Azure Databricks & Spark @ Techorama 2018
 
4AA6-4492ENW
4AA6-4492ENW4AA6-4492ENW
4AA6-4492ENW
 
Using Machine Learning with HDInsight
Using Machine Learning with HDInsightUsing Machine Learning with HDInsight
Using Machine Learning with HDInsight
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 

Mehr von Mostafa

The role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicThe role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicMostafa
 
Architecting big data solutions in the cloud
Architecting big data solutions in the cloudArchitecting big data solutions in the cloud
Architecting big data solutions in the cloudMostafa
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning ClassifiersMostafa
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine LearningMostafa
 
Introducing Power BI Embedded
Introducing Power BI EmbeddedIntroducing Power BI Embedded
Introducing Power BI EmbeddedMostafa
 
Extending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsExtending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsMostafa
 
Build intelligent solutions using Azure
Build intelligent solutions using AzureBuild intelligent solutions using Azure
Build intelligent solutions using AzureMostafa
 
Patterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insPatterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insMostafa
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure mlMostafa
 
Build Interactive Analytics using Power BI
Build Interactive Analytics using Power BIBuild Interactive Analytics using Power BI
Build Interactive Analytics using Power BIMostafa
 
TypeScript Jump Start
TypeScript Jump StartTypeScript Jump Start
TypeScript Jump StartMostafa
 
Build intelligent solutions using ms azure
Build intelligent solutions using ms azureBuild intelligent solutions using ms azure
Build intelligent solutions using ms azureMostafa
 
Mistakes that kill startups
Mistakes that kill startupsMistakes that kill startups
Mistakes that kill startupsMostafa
 
PnP in building office add ins - public
PnP in building office add ins - publicPnP in building office add ins - public
PnP in building office add ins - publicMostafa
 
How to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceHow to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceMostafa
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azureMostafa
 
Get your site microsoft edge ready
Get your site microsoft edge readyGet your site microsoft edge ready
Get your site microsoft edge readyMostafa
 
Developing cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaDeveloping cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaMostafa
 

Mehr von Mostafa (20)

The role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicThe role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud public
 
Architecting big data solutions in the cloud
Architecting big data solutions in the cloudArchitecting big data solutions in the cloud
Architecting big data solutions in the cloud
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning Classifiers
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Introducing Power BI Embedded
Introducing Power BI EmbeddedIntroducing Power BI Embedded
Introducing Power BI Embedded
 
Extending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsExtending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook Connectors
 
Build intelligent solutions using Azure
Build intelligent solutions using AzureBuild intelligent solutions using Azure
Build intelligent solutions using Azure
 
Patterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insPatterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-ins
 
Data science essentials in azure ml
Data science essentials in azure mlData science essentials in azure ml
Data science essentials in azure ml
 
Build Interactive Analytics using Power BI
Build Interactive Analytics using Power BIBuild Interactive Analytics using Power BI
Build Interactive Analytics using Power BI
 
TypeScript Jump Start
TypeScript Jump StartTypeScript Jump Start
TypeScript Jump Start
 
Build intelligent solutions using ms azure
Build intelligent solutions using ms azureBuild intelligent solutions using ms azure
Build intelligent solutions using ms azure
 
Mistakes that kill startups
Mistakes that kill startupsMistakes that kill startups
Mistakes that kill startups
 
PnP in building office add ins - public
PnP in building office add ins - publicPnP in building office add ins - public
PnP in building office add ins - public
 
How to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceHow to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud service
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azure
 
eRecall
eRecalleRecall
eRecall
 
Get your site microsoft edge ready
Get your site microsoft edge readyGet your site microsoft edge ready
Get your site microsoft edge ready
 
Developing cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaDeveloping cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache Cordova
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Big data talking stories in Healthcare

  • 1. Azure Data Platform Services Mostafa Elzoghbi http://mostafa.rocks Twitter: @MostafaElzoghbi
  • 2. Session takeaways and objectives • Azure Data Platform Services • HDInsight Clusters in Azure • Data Storage: Apache Hive, Apache Hbase, Azure Data Catalog • Data Transformations: Apache Storm, Apache Spark, Azure Data Factory • Healthcare / Life Sciences Use Cases
  • 4.
  • 5. HDInsight is a cloud implementation on Microsoft Azure of the rapidly expanding Apache Hadoop technology stack that is the go-to solution for big data analysis. It includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, and so on. HDInsight also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services. HDInsight is available on Windows and Linux HDInsight on Linux: A Hadoop cluster on Ubuntu HDInsight on Windows: A Hadoop cluster on Win Server 2012 R2 What is HDInsight
  • 6. HDInsight provides cluster Types & custom configurations for: • Hadoop (HDFS) • HBase • Storm • Spark • R Server • Hive Interactive, Kafka (Preview) HDInsight has powerful programming extensions for languages including C#, Java, and .NET. Business Value Proposition: Skip maintaining and purchasing hardware HDInsight clusters on Azure
  • 8. Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable. HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families Data is stored in the rows of a table, and data within a row is grouped by column family. The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem. What is HBase
  • 9. Order No Customer Name Customer Phone Company Name Company Address 12012015 Mostafa 101-232-2345 Microsoft Redmond, WA Customer Company Order No Customer Name Customer Phone Company Name Company Address 12012015 Mostafa 101-232-2345 Microsoft Redmond, WA
  • 10. HBase Commands: create  Equivalent to Create table in T-SQL get  Equivalent to Select statements in T-SQL put  Equivalent to Update, Insert statement in T-SQL scan  Equivalent to Select (no where condition) in T-SQL HBase shell is your query tool to execute in CRUD commands to a HBase cluster. Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API. An HBase database can also be queried by using Hive using SQLHive. What is HBase
  • 11. Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL). Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters. Hive also supports custom serializer/deserializers (SerDe) for complex or irregularly structured data. Hive can also be extended through user-defined functions (UDF). A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL. Support for multiple execution engine: WebHCat, HiveServer2 (faster) or Tez. What is Hive
  • 12. Apache Storm is a distributed, fault-tolerant, open-source computation system that allows you to process data in real-time with Hadoop. Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in the Azure environment by using Apache Hadoop. Ability to write Storm components in C#, JAVA and Python. Azure Scale up or Scale down without an impact for running Storm topologies. Ease of provision and use in Azure portal & development templates in Visual Studio. What is Apache Storm
  • 13. Apache Storm apps are submitted as Topologies. A topology is a graph of computation that processes streams Stream: An unbound collection of tuples. Streams are produced by spouts and bolts, and they are consumed by bolts. Tuple: A named list of dynamically typed values. Spout: Consumes data from a data source and emits one or more streams. Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a blob, or other data store. Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures. Apache Storm Components
  • 14. Apache Spark™ is a fast and general engine for large-scale data processing. Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Write applications quickly in Java, Scala, Python, R. Combine SQL, streaming, and complex analytics. Spark's in-memory computation capabilities make it a good choice for iterative algorithms in ML and graph computations. Support for R Server & Azure Data Lake. What is Apache Spark
  • 15. Part 3: Single Slide DEMO – Apache Spark
  • 16. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. Data Lake Analytics—a no-limits analytics job service to power intelligent action Azure Data Lake (ADL)
  • 18. Azure Data Lake Analytics is a new service, built to make big data analytics easy. Key capabilities: Dynamic scaling: do analytics on terabytes or even exabytes of data Develop faster, debug, and optimize smarter using familiar tools: U-SQL Jobs. U-SQL: simple and familiar, powerful, and extensible: SQL language with the power of expressive C#, process & analyze data with skills you already have. Affordable and cost effective. Works with all your Azure Data: Data Lake Analytics is optimized to work with Azure Data Lake - providing the highest level of performance, throughput, and parallelization for your big data workloads. Data Lake Analytics can also work with Azure Blob storage and Azure SQL database. Azure Data Lake Analytics
  • 19. Part 3: Single Slide DEMO – Azure Data Lake
  • 21. Provider of enzymes for Food, Pharma Industry Pharmaceuticals Global provider of natural ingredients such as cultures and enzymes for the food, pharmaceutical, nutritional, and agricultural industries. Part 1: What They Did | Gene Analysis Challenge Collect clinical trial data from electronic laboratory notebooks (ELNs) Collect structured and unstructured data from sources like automated equipment (robots, temperature sensors, and other devices) Need to do analysis and look for patterns on this data Solution Chose Azure HDInsight, SQL Server on-premises Examine patterns in chemical composition and physical properties of cultures and enzymes e.g. examine the gene composition of our bacteria and how it actually behaves in yogurt Gene Analysis
  • 22. BK1 Provider of enzymes for Food, Pharma Industry Part 2: How They Did It | Gene Analysis How They Did It Collect data in Azure Blobs • Extract data from all the files (GBs in size) • Transpose into JSON using .NET HDInsight processes data for insights • Hive is used to run queries • Mainly Select statements (joins/unions) • Maximum 20 lines of code Use SQL Server for reporting Use Hive ODBC connector Gene Analysis SQL Server On-premises Lab Information Management System Automated equipment (robots, temperature sensors, etc)
  • 23.
  • 24. Healthcare Application Provider Healthcare Leading provider of healthcare software applications for clinical, financial, pharmacy, etc. Part 1: What They Did | Data Lake to deliver better analytics Challenge Multiple healthcare applications operating in product silos Want to implement a data lake and provide distributed processing for better analytics and predictions for their end customers: • Population, risk, and Care management • Clinical decision support using predictions • Real time quality measures to assist providers reach their regulatory requirements • Capacity management predictions • Enrich clinical data with NLP on unstructured physicians notes to reduce over/under treatment, readmissions, and faster claims processing Solution Chose Azure HDInsight, HBase, and custom .NET application Begin by storing all medical records in Azure Acquire, clean data and insert into data lake Data Lake Analytics
  • 25. BK1 Healthcare Application Provider Part 2: How They Did It | Data Lake to deliver better analytics How They Did It Collect data in Azure Blobs • Have medical records in JSON format HDInsight processes data for insights • MapReduce preprocesses data • used to insert data into data lake • Developed .NET application to interface with HBase Data Lake Analytics Azure Blobs Azure HDInsight Medical Records in JSON .NET Application Insert into Blobs
  • 26.
  • 27.
  • 28.
  • 29. References • HDInsight Documentation (R, Storm, Spark, Kafka, ADL,..etc) https://azure.microsoft.com/en-us/services/hdinsight/ • Spark Programming Guide http://spark.apache.org/docs/latest/programming-guide.html • edx.org: Free Apache Spark courses • Get started with Data Science VMs in Azure https://blogs.technet.microsoft.com/machinelearning/tag/dsvm/