60 min
Data Complexity: Variety and Velocity
Terabytes (10^12)
Gigabytes (10^9)
Megabytes (10^6)
Petabytes (10^15)
Exabytes (10^18)
Volume Velocity
Variety Variability
Reduces
NoSQL:
• No cleansing!
• No ETL!
• No load!
• Analyze the data where it lands! Store now, question later
RDBMS:
1. Data arrives
2. Derive a schema
3. Cleanse the data
4. Transform the data
5. Load the data
6. SQL queries
NoSQL:
1. Data arrives
2. Application program
HOW?? IF I
DON’T
KNOW THE
STRUCTURE?
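One way to read "store now, question later" is schema-on-read: raw records are kept as they arrive, and structure is applied only when a question is asked. The sketch below is a minimal illustration under that assumption; `RAW_STORE`, `query_average`, and the sample records are hypothetical and not taken from the slides.

```python
import json

# Raw events are stored exactly as they arrive; no schema is derived up front.
# The sample records are hypothetical.
RAW_STORE = [
    '{"device": "d1", "temp": 21.5}',
    '{"device": "d2", "temp": 19.0, "humidity": 40}',
    'not json at all',
]


def query_average(raw_lines, field):
    """Apply structure at read time: parse each line, skip what does not fit."""
    values = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input stays in storage but is ignored here
        if isinstance(record, dict) and isinstance(record.get(field), (int, float)):
            values.append(record[field])
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(query_average(RAW_STORE, "temp"))
```

The trade-off is that every query pays the parsing cost and has to decide what to do with records that do not match.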
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
Data Integration (ODBC/SQOOP/REST)
Event Pipeline (EventHub/Flume)
Legend
Red = Core Hadoop
Blue = Data processing
Gray = Microsoft integration points and value adds
Orange = Data Movement
Green = Packages
YARN
Name Node
Data Node
HDFS API
DFS (1 Data Node per
Worker Role) and Compute
Cluster / VM
Azure Storage (WASB)
Benefits:
Data reuse and sharing
Data storage cost
Elastic scale-out
Geo-replication
…
Data Node
Most important Benefit:
Data are INDEPENDENT from cluster
And WASB is FAST…
SOSP Paper - Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
http://nasuni.com
Report link is here
Windows Azure Storage stamp (diagram labels):
Front End Layer: FE servers (incoming write request, Ack)
Partition Layer: Partition Master, Partition Servers, Lock Service
Stream Layer: Extent Nodes (EN), M nodes, Paxos
Account Name    Container Name    Blob Name
aaaa            aaaa              aaaaa
…               …                 …
zzzz            zzzz              zzzzz
Storage Stamp
Partition
Server
Partition
Server
Account Name    Container Name    Blob Name
richard         videos            tennis
…               …                 …
zzzz            zzzz              zzzzz
Account Name    Container Name    Blob Name
harry           pictures          sunset
…               …                 …
richard         videos            soccer
Partition
Server
Partition
Master
Front-End
Server
PS 2 PS 3
PS 1
A-H: PS1
H’-R: PS2
R’-Z: PS3
A-H: PS1
H’-R: PS2
R’-Z: PS3
Partition
Map
Blob Index
Partition
Map
Account Name    Container Name    Blob Name
aaaa            aaaa              aaaaa
…               …                 …
harry           pictures          sunrise
A-H
H’-R
R’-Z
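The partition map on these slides assigns contiguous key ranges (A-H, H’-R, R’-Z) to partition servers, with the key made of account, container, and blob name. A minimal sketch of such a range lookup follows. It is an illustration only, not the actual Azure Storage implementation; the boundary keys are the example rows from the slides, and `lookup_partition_server` is a hypothetical name.

```python
from bisect import bisect_left

# Inclusive upper bound of each range, taken from the example rows on the slides.
# Keys are (account, container, blob) tuples compared lexicographically.
UPPER_BOUNDS = [
    ("harry", "pictures", "sunrise"),  # last key served by PS1
    ("richard", "videos", "soccer"),   # last key served by PS2
]
SERVERS = ["PS1", "PS2", "PS3"]        # PS3 serves every key above the last bound


def lookup_partition_server(account, container, blob):
    """Return the server whose key range contains the given blob key."""
    key = (account, container, blob)
    # bisect_left finds the first bound that is >= key, i.e. the owning range.
    return SERVERS[bisect_left(UPPER_BOUNDS, key)]


if __name__ == "__main__":
    print(lookup_partition_server("harry", "pictures", "sunset"))
```

Because the map is only a sorted list of boundaries, moving a boundary or splitting a range changes which server owns a key without touching the stored data.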
• Programming framework (library and runtime) for analyzing datasets stored in HDFS
• Composed of user-supplied Map and Reduce functions:
• Map() - subdivide and conquer
• Reduce() - combine and reduce cardinality
………
Do work() Do work() Do work()
context.write(word, one);
context.write(key, new IntWritable(sum));
wasb:///example/data/gutenberg/davinci.txt wasb:///example/data/WordCountOutput
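The two `context.write` calls above are the heart of the word-count job: the mapper emits a (word, 1) pair per word and the reducer writes the sum per key. The in-process sketch below imitates that flow in plain Python, assuming whitespace tokenization. It does not use Hadoop, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map(): emit one (word, 1) pair per word, like context.write(word, one)."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce(): sum the counts, like context.write(key, new IntWritable(sum))."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In a real cluster the map calls run in parallel on separate data blocks and the shuffle moves data across the network; only the per-record logic is shown here.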
Start-AzureHDInsightJob
Get-AzureStorageBlob
Run in PS
https://pltkhdc01.azurehdinsight.net:443/ambari/api/v1/clusters/pltkhdc01.azurehdinsight.net/services/yarn
• It’s important to check that the results generated by queries are realistic, valid, and useful for a better ROI.
• Automate tasks in a repeatable solution, and run the solution from a remote computer rather than directly from the cluster server desktop.
• There’s a huge range of tools that you can use with Hadoop, and choosing the most appropriate can be difficult.
• If you decide to use a resource-intensive application such as HBase or Storm, you should consider running it on a separate cluster.
Data-flow platform to transform and analyze HDFS data
Scripting – No Java Needed!
Focus on semantics, not on implementation
Extensible through user defined functions and methods
Pigs Eat Anything
Pig can operate on data whether it has metadata or not.
Pigs Live Anywhere
Pig is not tied to one particular parallel framework.
Pigs Are Domestic Animals
Pig is designed to be easily controlled. Complex tasks involving interrelated data transformations can be simplified and encoded as data flow sequences. Pig programs accomplish huge tasks, but they are easy to write and maintain.
Pigs Fly
Pig processes data quickly. The system automatically optimizes execution of Pig jobs, so the user can focus on semantics.
LOGS = LOAD 'wasb:///example/data/sample.log';
LEVELS = foreach LOGS generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
RESULT = order FREQUENCIES by COUNT desc;
DUMP RESULT;
STORE RESULT INTO 'tkR1';
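For readers who do not know Pig Latin, the same data flow (extract the log level, drop lines without one, group, count, sort descending) can be sketched in plain Python. This is a rough single-machine equivalent for illustration only; the function name and sample lines are made up, and it is not how Pig executes the script.

```python
import re
from collections import Counter

# Same alternation as the REGEX_EXTRACT call in the Pig script.
LEVEL_PATTERN = re.compile(r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)")


def log_level_frequencies(lines):
    """Return (level, count) pairs ordered by count, highest first."""
    levels = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:  # corresponds to FILTER ... by LOGLEVEL is not null
            levels.append(match.group(1))
    counts = Counter(levels)  # corresponds to GROUP + COUNT
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ["2012 INFO start", "2012 ERROR failed", "no level", "2012 INFO done"]
    print(log_level_frequencies(sample))
```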
Check result in PS
Hadoop 2.0
What is Machine Learning (ML)
Solve extremely hard problems
Extract more value from Big Data
Drive a shift in business analytics
Business Knowledge
Data Preparation
Modelling
Evaluation
Data Understanding
Idea
Data
Publish
Machine Learning Process Model
Based on the CRISP-DM Model
Volume, batch processing
Events, Real Time processing
Relay
Queue
Topic
Notification Hub
Event Hub
NAT and Firewall Traversal Service
Request/Response Services
Unbuffered with TCP Throttling.
Hybrid Connection
Transactional Cloud AMQP/HTTP Broker
High-Scale, High-Reliability Messaging
Sessions, Scheduled Delivery, etc.
Transactional Message Distribution
Up to 2000 subscriptions per Topic
Up to 2K/100K filter rules per subscription
High-scale notification distribution
Most mobile push notification services
Millions of notification targets
EVENTS, MASSIVE
SCALE
Event
Producers
> 1M Producers
> 1GB/sec
Aggregate
Throughput
Partitions
Direct
PartitionKey
Hash
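The slide lists two ways for an event to reach a partition: addressing the partition directly, or supplying a PartitionKey that is hashed. The sketch below shows the general idea of key-based routing, assuming a SHA-256 hash reduced modulo the partition count. The hash function Event Hubs actually uses is not stated in the slides, so treat `partition_for` as a hypothetical stand-in.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index in a stable, repeatable way."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to an index.
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    for key in ("device-1", "device-2", "device-1"):
        print(key, "->", partition_for(key, 16))
```

The property that matters is that the same key always lands on the same partition, which keeps events from one producer in order.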
Throughput Units:
• 1 ≤ TUs ≤ Partition Count
• TU: 1 MB/s writes, 2 MB/s reads
• Billing is per TU
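The throughput-unit figures above allow a quick sizing calculation. The sketch below uses only the numbers on the slide (1 MB/s writes and 2 MB/s reads per TU, and 1 ≤ TUs ≤ partition count); the function name and the example loads are illustrative assumptions.

```python
import math

WRITE_MB_PER_TU = 1.0  # from the slide: 1 MB/s writes per TU
READ_MB_PER_TU = 2.0   # from the slide: 2 MB/s reads per TU


def required_throughput_units(ingress_mb_s, egress_mb_s, partition_count):
    """Smallest TU count that covers both write and read load."""
    needed = max(
        1,
        math.ceil(ingress_mb_s / WRITE_MB_PER_TU),
        math.ceil(egress_mb_s / READ_MB_PER_TU),
    )
    if needed > partition_count:
        # The slide's constraint: TUs cannot exceed the partition count.
        raise ValueError("load needs more TUs than there are partitions")
    return needed


if __name__ == "__main__":
    print(required_throughput_units(3.5, 4.0, 8))
```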
AMQP 1.0
Credit-based flow control
Client-side cursors
Offset by Id or Timestamp
Ingestor
(broker)
Collection Presentation
and action
Event
producers
Transformation Long-term
storage
Event hubs
Storage
adapters
Stream processing
Cloud gateways (web APIs)
Field
gateways
Applications
Legacy IOT
(custom protocols)
Devices
IP-capable devices
(Windows/Linux)
Low-power
devices (RTOS)
Search and query
Data analytics (Excel)
Web/thick client
dashboards
Service bus
Azure DBs
Azure storage
HDInsight
Stream
Analytics
Devices to take action
Storm
IEventProcessor
Daughter
jumping
in garage
Me with
compressed
(cold) air
Me with
small dryer
* The tick tuple scheme is Storm’s built-in mechanism for generating tuples and sending them to each bolt in the topology at specified intervals.
Worth checking: https://storm.apache.org/apidocs/backtype/storm/topology/TopologyBuilder.BoltGetter.html
EventHubSpout
spoutConfig.getPartitionCount
PartialCountBolt
EventHubSpout
DBGlobalCountBolt
collector.emit
collector.ack
db.insertValue(System.currentTimeMillis(), partialCount);
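The identifiers above suggest a topology in which a PartialCountBolt counts events from the EventHubSpout and, on each tick tuple, hands its partial count to a DBGlobalCountBolt that stores it with a timestamp. The sketch below simulates that pattern in-process. It does not use the Storm API; the class bodies, `TICK`, and `run_topology` are assumptions made for illustration.

```python
import time

TICK = object()  # stand-in for Storm's tick tuple


class PartialCountBolt:
    """Counts events and emits the partial count whenever a tick arrives."""

    def __init__(self):
        self.count = 0

    def execute(self, tup, emit):
        if tup is TICK:
            emit(self.count)  # plays the role of collector.emit
            self.count = 0
        else:
            self.count += 1


class DBGlobalCountBolt:
    """Stores each partial count with a timestamp, like db.insertValue(...)."""

    def __init__(self):
        self.rows = []

    def execute(self, partial_count):
        self.rows.append((int(time.time() * 1000), partial_count))


def run_topology(stream):
    partial, db = PartialCountBolt(), DBGlobalCountBolt()
    for tup in stream:
        partial.execute(tup, db.execute)
    return db


if __name__ == "__main__":
    db = run_topology(["e", "e", "e", TICK, "e", "e", TICK])
    print([count for _, count in db.rows])
```

Batching counts between ticks keeps the number of database writes proportional to the tick rate rather than to the event rate.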
Compute
Visualisation
Orchestration Storage
Service bus
Event Hub
Data Factory
Power BI
Stream Analytics
HD Insight
Machine Learning
Virtual Machines
Table Storage
Blob Storage
SQL Azure
Document DB
Feeds
IoT
Data Sources
Near real time analysis
Data Journeys
Azure
Predictive Analytics (same Azure services diagram)
Near real time analysis (same Azure services diagram)
Big Data (same Azure services diagram)
“Traditional” BI (same Azure services diagram)
tkopacz@microsoft.com
Azure
Windows
Server
Linux
Hosted Clouds
Windows
Server
Linux
Service Fabric
Private Clouds
Windows
Server
Linux
High Availability
Hyper-Scale
Hybrid Operations
High Density
Microservices
Rolling Upgrades
Stateful services
Low Latency
Fast startup & shutdown
Container Orchestration & lifecycle management
Replication & Failover
Simple programming models
Load balancing
Self-healing
Data Partitioning
Automated Rollback
Health Monitoring
Placement Constraints
Big data on Azure for Architects

It w roku 201x – dom, szkoła, potem praca. no i – jak tu (i czego!) uczyć
 
(Azure) Machine Learning 2015
(Azure) Machine Learning 2015(Azure) Machine Learning 2015
(Azure) Machine Learning 2015
 
Azure paa s v2 – microservices, microsoft (azure) service fabric, .apps and o...
Azure paa s v2 – microservices, microsoft (azure) service fabric, .apps and o...Azure paa s v2 – microservices, microsoft (azure) service fabric, .apps and o...
Azure paa s v2 – microservices, microsoft (azure) service fabric, .apps and o...
 
Mts 2013 tomasz kopacz - windows 8, office 365, workflow manager, windows a...
Mts 2013   tomasz kopacz - windows 8, office 365, workflow manager, windows a...Mts 2013   tomasz kopacz - windows 8, office 365, workflow manager, windows a...
Mts 2013 tomasz kopacz - windows 8, office 365, workflow manager, windows a...
 
Mts 2013 tomasz kopacz - wydajność aplikacji dla windows 8 - jak ją mierzyć...
Mts 2013   tomasz kopacz - wydajność aplikacji dla windows 8 - jak ją mierzyć...Mts 2013   tomasz kopacz - wydajność aplikacji dla windows 8 - jak ją mierzyć...
Mts 2013 tomasz kopacz - wydajność aplikacji dla windows 8 - jak ją mierzyć...
 
Tomasz Kopacz MTS 2012 Wind RT w Windows 8 i tzw aplikacje lob (line of busin...
Tomasz Kopacz MTS 2012 Wind RT w Windows 8 i tzw aplikacje lob (line of busin...Tomasz Kopacz MTS 2012 Wind RT w Windows 8 i tzw aplikacje lob (line of busin...
Tomasz Kopacz MTS 2012 Wind RT w Windows 8 i tzw aplikacje lob (line of busin...
 
Tomasz Kopacz MTS 2012 Azure - Co i kiedy użyć (IaaS vs paas vshybrid cloud v...
Tomasz Kopacz MTS 2012 Azure - Co i kiedy użyć (IaaS vs paas vshybrid cloud v...Tomasz Kopacz MTS 2012 Azure - Co i kiedy użyć (IaaS vs paas vshybrid cloud v...
Tomasz Kopacz MTS 2012 Azure - Co i kiedy użyć (IaaS vs paas vshybrid cloud v...
 

Big data on Azure for Architects

  • 3. Data Complexity: Variety and Velocity. Megabytes (10^6), Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
  • 6. RDBMS vs. NoSQL
      • RDBMS: (1) Data arrives, (2) Derive a schema, (3) Cleanse the data, (4) Transform the data, (5) Load the data, (6) SQL queries
      • NoSQL: (1) Data arrives, (2) Application program
      • NoSQL reduces the pipeline: No cleansing! No ETL! No load! Analyze the data where it lands! Store now, question later
      • HOW?? IF I DON'T KNOW THE STRUCTURE?
  • 16. Distributed Storage (HDFS); YARN; Distributed Processing (MapReduce); Query (Hive); Data Integration (ODBC/Sqoop/REST); Event Pipeline (Event Hub/Flume)
      • Legend: Red = core Hadoop; Blue = data processing; Gray = Microsoft integration points and value adds; Orange = data movement; Green = packages
  • 17. HDFS API over two storage options: DFS (Name Node and Data Nodes, 1 Data Node per Worker Role) with compute on the cluster/VM, and Azure Storage (WASB)
      • Benefits: data reuse and sharing; data storage cost; elastic scale-out; geo-replication; …
      • Most important benefit: data are INDEPENDENT of the cluster
      • And WASB is FAST…
  • 21. SOSP paper: "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency". Report link: http://nasuni.com
  • 22. [Diagram: path of an incoming write request through a storage stamp. Front End Layer (FE servers); Partition Layer (Partition Master, Partition Servers, Lock Service); Stream Layer (M nodes with Paxos, Extent Nodes (EN)); an Ack is returned.]
  • 23. [Diagram: Blob Index partitioning inside a Storage Stamp. The Blob Index is sorted by (Account Name, Container Name, Blob Name), from (aaaa, aaaa, aaaaa) to (zzzz, zzzz, zzzzz), with example rows (harry, pictures, sunrise), (harry, pictures, sunset), (richard, videos, soccer), (richard, videos, tennis). The Partition Master splits the index into key ranges served by Partition Servers PS 1, PS 2 and PS 3. The Partition Map, also held by the Front-End Server, reads: A-H: PS1; H'-R: PS2; R'-Z: PS3.]
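The range lookup that the Partition Map on slide 23 implies can be sketched in a few lines of Python. This is only an illustration, not the Azure Storage implementation: the function name `lookup_partition_server` and the choice of inclusive upper bounds are assumptions, and the boundary keys are simply the example rows shown on the slide.

```python
from bisect import bisect_left

# Inclusive upper bound of each key range, in sorted order.
# Assumption: boundaries are the example rows from the slide.
UPPER_BOUNDS = [
    ("harry", "pictures", "sunrise"),  # last key served by PS1 (range A-H)
    ("richard", "videos", "soccer"),   # last key served by PS2 (range H'-R)
]
SERVERS = ["PS1", "PS2", "PS3"]        # PS3 serves everything after the last bound


def lookup_partition_server(account, container, blob):
    """Return the partition server whose key range contains the blob key."""
    key = (account, container, blob)
    # bisect_left finds the first upper bound that is >= key.
    return SERVERS[bisect_left(UPPER_BOUNDS, key)]


if __name__ == "__main__":
    print(lookup_partition_server("harry", "pictures", "sunset"))
    print(lookup_partition_server("richard", "videos", "tennis"))
```

Because the index is sorted on the full (account, container, blob) key, a front end only needs the sorted list of range boundaries to route a request, which is why a binary search is enough here.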
  • 27. • Programming framework (library and runtime) for analyzing datasets stored in HDFS • Composed of user-supplied Map and Reduce functions: • Map(): subdivide and conquer • Reduce(): combine and reduce cardinality [Diagram: several "Do work()" tasks running in parallel]
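To make the Map and Reduce roles concrete, here is a minimal single-process sketch of word counting in Python. It is not Hadoop code: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and the grouping step that Hadoop performs between the two phases is simulated with a dictionary.

```python
from collections import defaultdict


def map_words(line):
    """Map: split one input line into (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["to be or not", "to be"]))
```

In a real cluster the map calls run in parallel on different blocks of the input and the reduce calls run in parallel on different keys; only the two user-supplied functions change from job to job.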
  • 30. Word count job
      • Mapper output: context.write(word, one);
      • Reducer output: context.write(key, new IntWritable(sum));
      • Input: wasb:///example/data/gutenberg/davinci.txt
      • Output: wasb:///example/data/WordCountOutput
      • Run in PowerShell: Start-AzureHDInsightJob, Get-AzureStorageBlob
  • 36.
      • Check that the results generated by queries are realistic, valid, and useful; this improves ROI.
      • Automate tasks in a repeatable solution, and run the solution from a remote computer rather than directly from the cluster server desktop.
      • There is a huge range of tools that you can use with Hadoop, and choosing the most appropriate one can be difficult.
      • If you decide to use a resource-intensive application such as HBase or Storm, consider running it on a separate cluster.
  • 37. Data-flow platform to transform and analyze HDFS data
      • Scripting: no Java needed!
      • Focus on semantics, not on implementation
      • Extensible through user-defined functions and methods
      • Pigs Eat Anything: Pig can operate on data whether it has metadata or not.
      • Pigs Live Anywhere: Pig is not tied to one particular parallel framework.
      • Pigs Are Domestic Animals: Pig is designed to be easily controlled. Complex tasks involving interrelated data transformations can be simplified and encoded as data flow sequences. Pig programs accomplish huge tasks, but they are easy to write and maintain.
      • Pigs Fly: Pig processes data quickly. The system automatically optimizes execution of Pig jobs, so the user can focus on semantics.
  • 39. LOGS = LOAD 'wasb:///example/data/sample.log';
      LEVELS = foreach LOGS generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
      FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
      GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
      FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL, COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
      RESULT = order FREQUENCIES by COUNT desc;
      DUMP RESULT;
      STORE RESULT INTO 'tkR1';
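For readers who do not know Pig Latin, the same data flow (extract the level, drop lines without one, group, count, sort descending) can be sketched in plain Python. This is a local, single-machine approximation for illustration only; the function name `level_frequencies` and the sample log lines are assumptions, not part of the original deck.

```python
import re
from collections import Counter

# Same alternation as the REGEX_EXTRACT call in the Pig script.
LEVEL_PATTERN = re.compile(r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)")


def level_frequencies(lines):
    """Return (level, count) pairs sorted by count, highest first."""
    matches = (LEVEL_PATTERN.search(line) for line in lines)
    # Lines without a log level are skipped, like FILTER ... is not null.
    levels = (m.group(1) for m in matches if m)
    # Counter does the GROUP + COUNT; most_common() does the ORDER ... desc.
    return Counter(levels).most_common()


if __name__ == "__main__":
    sample = [
        "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal",
        "2012-02-03 18:35:34 SampleClass4 [ERROR] incorrect id",
        "2012-02-03 18:35:35 SampleClass1 [INFO] still normal",
        "a line without any level",
    ]
    print(level_frequencies(sample))
```

Each Pig relation maps to one step here, which is the point of the "focus on semantics" claim: the script describes the flow, and the runtime decides how to parallelize it.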
  • 58. What is Machine Learning (ML)? • Solve extremely hard problems • Extract more value from Big Data • Drive a shift in business analytics
  • 70. Relay, Queue, Topic, Notification Hub, Event Hub
      • Relay: NAT and firewall traversal service; request/response services; unbuffered with TCP throttling; Hybrid Connection
      • Queue: transactional cloud AMQP/HTTP broker; high-scale, high-reliability messaging; sessions, scheduled delivery, etc.
      • Topic: transactional message distribution; up to 2000 subscriptions per Topic; up to 2K/100K filter rules per subscription
      • Notification Hub: high-scale notification distribution; most mobile push notification services; millions of notification targets
      • Event Hub: EVENTS, MASSIVE SCALE
  • 71. Event Producers: > 1M producers, > 1 GB/sec aggregate throughput
      • Partitions: direct, or PartitionKey hash
      • Throughput Units (TU): 1 ≤ TUs ≤ partition count; 1 TU = 1 MB/s writes, 2 MB/s reads; we pay per TU
      • AMQP 1.0: credit-based flow control; client-side cursors; offset by Id or Timestamp
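The throughput-unit rules on slide 71 translate into a small sizing calculation. The sketch below uses only the figures stated on the slide (1 MB/s writes and 2 MB/s reads per TU, and 1 ≤ TUs ≤ partition count); the function name `required_throughput_units` is hypothetical, and current service limits may differ from the slide.

```python
import math

WRITE_MB_PER_TU = 1.0  # from the slide: 1 MB/s writes per TU
READ_MB_PER_TU = 2.0   # from the slide: 2 MB/s reads per TU


def required_throughput_units(ingress_mb_s, egress_mb_s, partition_count):
    """Smallest TU count that covers both ingress and egress."""
    needed = max(
        1,  # at least one TU is always required
        math.ceil(ingress_mb_s / WRITE_MB_PER_TU),
        math.ceil(egress_mb_s / READ_MB_PER_TU),
    )
    if needed > partition_count:
        # The slide's constraint: TUs cannot exceed the partition count.
        raise ValueError(
            f"{needed} TUs needed but only {partition_count} partitions available"
        )
    return needed


if __name__ == "__main__":
    print(required_throughput_units(3.5, 6.0, partition_count=8))
```

Since billing is per TU while the partition count caps how many TUs can be used, the partition count has to be chosen with peak throughput in mind.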
  • 72. [Diagram: event-processing pipeline]
      • Stages: Event producers; Collection; Ingestor (broker); Transformation; Long-term storage; Presentation and action
      • Event producers: Applications; Legacy IoT (custom protocols); Devices: IP-capable devices (Windows/Linux), low-power devices (RTOS)
      • Collection: Field gateways; Cloud gateways (web APIs)
      • Ingestor and transformation: Event hubs; Storage adapters; Stream processing
      • Presentation and action: Search and query; Data analytics (Excel); Web/thick client dashboards; Devices to take action
      • Technologies shown: Service bus; Azure DBs; Azure storage; HDInsight; Stream Analytics; Storm; IEventProcessor
  • 86. * The tick tuple scheme is Storm's built-in mechanism for generating tuples and sending them to each bolt in the topology at specified intervals. Worth checking: https://storm.apache.org/apidocs/backtype/storm/topology/TopologyBuilder.BoltGetter.html
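A typical use of tick tuples is to let a bolt accumulate state and flush it periodically. The following Python sketch simulates that pattern without Storm: the `TICK` sentinel, the `RollingCountBolt` class and the `run` helper are all hypothetical stand-ins, and in a real topology the ticks would be generated by Storm at the configured interval rather than placed in the stream by hand.

```python
from collections import Counter

TICK = object()  # hypothetical sentinel standing in for a Storm tick tuple


class RollingCountBolt:
    """Counts incoming values and emits a snapshot on every tick."""

    def __init__(self):
        self.counts = Counter()
        self.emitted = []

    def execute(self, tup):
        if tup is TICK:
            # Periodic flush: emit the window's counts and start a new window.
            self.emitted.append(dict(self.counts))
            self.counts.clear()
        else:
            self.counts[tup] += 1


def run(bolt, stream):
    """Feed a finite stream of tuples (with ticks interleaved) to the bolt."""
    for tup in stream:
        bolt.execute(tup)
    return bolt.emitted


if __name__ == "__main__":
    print(run(RollingCountBolt(), ["a", "b", "a", TICK, "b", TICK]))
```

The bolt itself never looks at a clock; it reacts only to the tuples it receives, which keeps time-based logic inside the normal tuple-processing path.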
  • 102. Data Journeys [Diagram: Azure service map]
      • Categories: Compute; Visualisation; Orchestration; Storage
      • Services: Service Bus; Event Hub; Data Factory; Power BI; Stream Analytics; HDInsight; Machine Learning; Virtual Machines; Table Storage; Blob Storage; SQL Azure; DocumentDB
      • Data sources: Feeds; IoT
      • Label: Near real time analysis
  • 103. Predictive Analytics [same diagram labels as slide 102]
  • 104. Near real time analysis [same diagram labels as slide 102]
  • 105. Big Data [same diagram labels as slide 102]
  • 106. "Traditional" BI [same diagram labels as slide 102]
  • 110. Service Fabric
      • Runs on: Azure (Windows Server, Linux); Hosted Clouds (Windows Server, Linux); Private Clouds (Windows Server, Linux)
      • Labels: High Availability; Hyper-Scale; Hybrid Operations; High Density; Microservices; Rolling Upgrades; Stateful services; Low Latency; Fast startup & shutdown; Container Orchestration & lifecycle management; Replication & Failover; Simple programming models; Load balancing; Self-healing; Data Partitioning; Automated Rollback; Health Monitoring; Placement Constraints