BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Big Data - in the Cloud or rather On-Premises?
Guido Schmutz
Kassel, 21.9.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Cloud Primer
2. Big Data and IoT Architecture
3. Big Data in the Cloud
4. Various Models for Big Data in Cloud
5. Big Data On-Premises
6. Hybrid Big Data Solutions
Cloud Primer
Instance
• the thing running in the cloud provider’s infrastructure
• can be a VM but does not have to be
Instance Type
• the size of the instance (combination of CPU, memory and disk storage, which determines cost)
• Azure: Instance sizes
Instance Control
• lifecycle of an instance
• Instances can be stopped or terminated (deleted)
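The instance lifecycle can be pictured as a small state machine. The following is a minimal sketch under simplifying assumptions: the three states and the allowed transitions are illustrative and not taken from any provider's specification, and the names `State`, `TRANSITIONS` and `Instance` are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    TERMINATED = "terminated"


# Allowed transitions in this simplified model (an assumption, not a provider spec).
TRANSITIONS = {
    State.RUNNING: {State.STOPPED, State.TERMINATED},
    State.STOPPED: {State.RUNNING, State.TERMINATED},
    State.TERMINATED: set(),  # a terminated instance is deleted and cannot come back
}


class Instance:
    """Hypothetical instance that starts in the RUNNING state."""

    def __init__(self, instance_type: str) -> None:
        self.instance_type = instance_type
        self.state = State.RUNNING

    def transition(self, target: State) -> None:
        """Move to the target state or raise if the move is not allowed."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {target.value}")
        self.state = target
```

A stopped instance can be started again, while a terminated one is gone for good, which is the distinction the last bullet makes.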
Images
• the template used for provisioning an instance
Serverless
• run code “without” servers => only specify functions (Java, C#, Python, Node.js)
• pay only for the compute time you consume
• easy scale-out
• management and capacity planning decisions are made by the provider
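To illustrate "only specify functions", here is a minimal sketch of what such a function might look like in Python. The `handler(event, context)` shape follows the convention used by AWS Lambda's Python runtime; the event field `readings` and the response layout are assumptions made for this example only.

```python
import json


def handler(event: dict, context: object = None) -> dict:
    """Hypothetical function: count the sensor readings in one incoming event."""
    readings = event.get("readings", [])
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(readings)}),
    }
```

The function contains no server setup, scaling logic or capacity planning; in a serverless offering those concerns would be handled by the provider.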
Regions and Availability Zones
• represent the geographic distribution of the cloud provider
• Regions are the geographic areas where a service is offered
• Availability Zones (AZ) add high availability within a Region
• communication between AZs in the same Region costs less than communication across Regions
Cloud Primer – Specific Instances
On-Demand Instance
• flexible, on-demand usage
• billing increment depends on the provider
Temporary Instance
• can be reclaimed at any time (depending on the bid price)
• are charged significantly less
• well suited for Hadoop workloads (if storage and compute are separated)
• AWS: spot instances
Reserved Instance
• capacity reserved in advance
• reduced pricing (up to 75% less than on-demand)
Dedicated Instance
• pay for instances
• run on hardware dedicated to you
• Amazon decides placement
Dedicated Host
• pay for the entire physical server
• full flexibility in placing instances (VMs)
• solves issues with existing server-bound licenses
Bare Metal
• bare hardware resources, no virtualization by the cloud provider
• full flexibility / full control
• almost no automation provided
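The trade-off between on-demand and reserved instances comes down to utilization. The sketch below works through the arithmetic under two assumptions that are not from the slides: reserved capacity is billed for every hour of the year whether used or not, and the hourly rate of 1.0 is a made-up number.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """On-demand: pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate: float, discount: float) -> float:
    """Reserved (assumption): discounted rate, billed for the whole year."""
    return hourly_rate * (1.0 - discount) * HOURS_PER_YEAR


def break_even_utilization(discount: float) -> float:
    """Fraction of the year above which reserved is cheaper than on-demand."""
    return 1.0 - discount
```

With the 75% discount mentioned above, `break_even_utilization(0.75)` gives 0.25: under these assumptions an instance that runs more than a quarter of the year is cheaper when reserved.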
Cloud Primer - Storage
Block Storage
• most common type offered by a cloud provider
• disk-like storage
• comes with each instance when provisioned
• accessed as filesystem mounts => volumes, disks
• persistent volumes survive beyond the lifetime of the instance that spawned them
• ephemeral volumes are limited to the life of the instance to which they are attached
• AWS: EBS
• Azure: VHDs & Azure File Storage
• Oracle: Block Storage
Object Storage
• each chunk of data is treated as its own entity, independent of any instance
• content of each object is opaque to the provider
• an API or URL is used to access data (no mount)
• well suited for Big Data
• hot and cold storage options
• AWS: S3 & Glacier
• Azure: Azure Blob Storage
• Oracle: Object Storage & Archive Storage
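The difference in access style can be sketched with an in-memory stand-in for an object store. This is an illustration only: the class `ObjectStore` and its methods are hypothetical and do not mirror any provider's SDK. The point is that data is addressed by bucket and key through an API rather than through a mounted filesystem.

```python
from typing import Dict, List


class ObjectStore:
    """In-memory stand-in for an object storage service (illustrative only)."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        """Store an opaque blob under bucket/key, independent of any instance."""
        self._buckets.setdefault(bucket, {})[key] = data

    def get(self, bucket: str, key: str) -> bytes:
        """Fetch the blob by bucket/key; no mount point is involved."""
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> List[str]:
        """List keys in a bucket, optionally filtered by a key prefix."""
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))
```

For example, `store.put("datalake", "raw/2017/09/21.csv", b"...")` followed by `store.list_keys("datalake", "raw/")` treats the key prefix much like a directory, although the store itself has no directory structure.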
Cloud Primer – Usage Patterns
Short Lived (Transient)
👍 minimal maintenance, high efficiency
👎 spin-up time, higher resource demand
👎 data transfer to permanent storage
Long Lived (Long Running)
👍 less time waiting for clusters to start/stop
👍 lower resource demand
👎 wasted idle time (if any)
👎 maintenance burden, growing size over time
Self-Service
👍 efficiency of on-demand creation
👎 need to maintain tooling
Managed
👍 easy alignment with budget constraints
👎 waiting time for usage, admin effort
Cloud-Only
👍 data transfers stay within the cloud, minimal on-premises costs, integration with provider
👎 higher cloud expenditure
Hybrid
👍 lower cloud expenditure, local resources available
👎 complex workflows, data transfer costs
Big Data & IoT Architecture
Big Data & IoT Reference Architecture
[Diagram: reference architecture. Labels: Bulk Source, Event Source, Location, DB, Extract, File, Weather, CDC, IoT Data, Mobile Apps, Edge Node, Edge Cluster, Rule Engine, Event Hub, Storage (Raw, Refined), Results, State / Results, Reference / Models, Core Processing, Batch Analytics, Stream Analytics, Parallel Processing, Stream Processing, Serverless, Import, Export, SQL, Search, Service, Stream, BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps]
[Diagram: the reference architecture divided into deployment zones: Internet / Cloud / On-Premises, Edge, Cloud / On-Premises]
1) Bulk Source – Bulk Processing
[Diagram: the reference architecture as it applies to this scenario]
2) Bulk Source - Edge & Bulk Processing
[Diagram: the reference architecture as it applies to this scenario]
3) Event Source – Stream & Bulk Processing
[Diagram: the reference architecture as it applies to this scenario]
4) Event Source – Edge & Stream & Bulk Processing
[Diagram: the reference architecture as it applies to this scenario]
5) Stream Ingestion – Edge & Stream Processing
[Diagram: the reference architecture as it applies to this scenario]
Big Data in the Cloud
Big Data in the Cloud – two usage patterns
Short Lived Cluster (Transient)
• data is repurposed and used for a specific use case in a specific workload
• cluster is spun up only when needed
Flexibility
• spin up an arbitrary number of nodes quickly
• expand quickly from very small to very large
Simplicity
• use as is, solve the problem and move on
Long Lived Cluster (Long Running)
• data is acquired and augmented continuously
• cluster is in permanent use for mixed workloads
Performance
• raw compute performance across a wide range of workloads
• time of availability
BDaaS – Possible Cost Optimizations
Autoscaling
• scale up when a query comes in
• scale down when jobs finish
• match utilization with job demand
• benchmark: auto-scaling saves 33% in compute costs compared to a static-sized cluster
Excess capacity
• Hadoop is fault tolerant and can take advantage of unreliable instances such as temporary instances
• benchmark: if 50% is done on spot nodes, save 80% compared to normal nodes
Common workload distribution with Big Data applications
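How autoscaling saves compute cost can be shown with a small calculation. The sketch below compares a static cluster sized for peak demand with a cluster that follows demand hour by hour. The demand profile is a made-up example, so the resulting percentage is not the benchmark figure quoted above, and the function names are hypothetical.

```python
from typing import Sequence


def node_hours_static(demand: Sequence[int]) -> int:
    """A static cluster must be sized for the peak and runs at that size all the time."""
    return max(demand) * len(demand)


def node_hours_autoscaled(demand: Sequence[int], min_nodes: int = 1) -> int:
    """An autoscaled cluster follows demand, never dropping below min_nodes."""
    return sum(max(nodes, min_nodes) for nodes in demand)


def savings(demand: Sequence[int], min_nodes: int = 1) -> float:
    """Fraction of node-hours saved by autoscaling compared to the static cluster."""
    return 1.0 - node_hours_autoscaled(demand, min_nodes) / node_hours_static(demand)
```

For a hypothetical hourly demand of `[2, 2, 8, 10, 6, 2]` nodes, the static cluster uses 60 node-hours and the autoscaled one 30, so `savings` returns 0.5. The spikier the workload, the larger the gap.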
Data Locality vs. Compute/Storage Separation
[Diagram, left: Data Local Compute. A master node and workers #1 to #3, each with its own disk and processing, connected by a network]
[Diagram, right: Separate Compute and Storage. A master node and compute nodes #1 to #3 (processing only), connected over the network to storage made up of shared disks]
Separation of compute and storage – the fundamental difference
• store data in Object Storage instead of DFS
• bring up Compute nodes only for data processing
• multiple workloads on separate clusters can access the same data
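The three points above can be sketched in a few lines. This is a toy model under stated assumptions: a dictionary stands in for object storage, and `run_transient_job` is a hypothetical helper that imitates bringing up compute nodes, processing, and tearing them down again.

```python
from typing import Callable, Dict, List

# Shared object storage: it outlives every cluster (in-memory stand-in).
OBJECT_STORE: Dict[str, List[int]] = {"raw/readings": [3, 5, 8, 13]}


def run_transient_job(name: str, source: str, job: Callable[[List[int]], int]) -> int:
    """Bring up compute, read from shared storage, write the result, tear down."""
    workers = [f"{name}-compute-{i}" for i in range(3)]  # compute nodes hold no data
    result = job(OBJECT_STORE[source])
    OBJECT_STORE[f"results/{name}"] = [result]
    workers.clear()  # the cluster is gone, the data stays in the object store
    return result
```

Two independent workloads, for example `run_transient_job("sum", "raw/readings", sum)` and `run_transient_job("max", "raw/readings", max)`, read the same input and leave their results behind after their compute nodes have disappeared.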
A new way to Manage Big Data
Big Data: Traditional Assumptions
• Bare-metal
• Data locality
• HDFS on local disks
Big Data: A New Approach
• Containers and VMs
• Compute and storage separation
• Shared storage
Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
5 ½ ways to get Big Data in the Cloud
1. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on Bare Metal
2. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on VM
3. Hadoop PaaS from Cloud Provider’s Marketplace
4. Dedicated (Long-Running) Big-Data-as-a-Service
5. Elastic (Transient) Big-Data-as-a-Service (storage and compute separated)
6. “Cloud on Premises” (Cloud Stack from Vendors on Premises)
Various Models for Big Data in Cloud
1. Bare Metal Cloud (Bring Your Own Hadoop - BYOH)
2. IaaS with any Hadoop Distribution (Bring Your Own Hadoop)
3. PaaS with Hadoop (from Marketplace)
4. Dedicated (Long-Running) BDaaS
5. Elastic (Transient) BDaaS
6. BDaaS + Analytics SaaS
1) Bare Metal Cloud (BYOH)
Compute (Bare Metal)
• Amazon: n.a. (Dedicated Host is close, but runs VMs)
• Azure: n.a. (Dedicated Host is close, but runs VMs)
• Oracle: Oracle Compute
Storage (Bare Metal)
• Amazon: n.a.
• Azure: n.a.
• Oracle: Oracle Block Volume & Object Storage, Data Transfer Service
Big Data (Custom)
• Bring Your Own Hadoop (BYOH)
Analytics (Custom)
• Custom (SQL, Machine Learning, ..)
Intelligence (Custom)
• Custom (Image-, Speech-Recognition, Bots, …)
2) IaaS (Bring Your Own Hadoop)
Compute
• Amazon: Amazon EC2 & EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage
• Amazon: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
• Azure: Storage (Blob), Data Lake Store, Import/Export
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Amazon, Azure, Oracle)
• Bring Your Own Hadoop (BYOH)
Analytics (Amazon, Azure, Oracle)
• Custom (SQL, Machine Learning, ..)
Intelligence (Amazon, Azure, Oracle)
• Custom (Image-, Speech-Recognition, Bots, …)
3) PaaS (Hadoop from Marketplace)
Compute
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage
• Amazon: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data
• Amazon: Hadoop (Hortonworks, MapR)
• Azure: Hadoop (Cloudera, Hortonworks, MapR)
• Oracle: n.a.
Analytics (Amazon, Azure)
• Custom (SQL, Machine Learning, ..)
Intelligence (Amazon, Azure)
• Custom (Image-, Speech-Recognition, Bots, …)
4) Dedicated BDaaS
Compute
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS (Cloudera)
Analytics (Amazon, Azure, Oracle)
• Custom (SQL, Machine Learning, ..)
Intelligence (Amazon, Azure, Oracle)
• Image-, Speech-Recognition, Bots, …
5) Elastic BDaaS
Compute
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS Compute Edition (Hortonworks)
Analytics (Amazon, Azure, Oracle)
• Custom (SQL, Machine Learning, ..)
Intelligence (Amazon, Azure, Oracle)
• Image-, Speech-Recognition, Bots, …
6) BDaaS + Analytics SaaS
Compute
• Amazon: Amazon EC2 & EC2 Dedicated Hosts
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS Compute Edition / Big Data CS
Analytics
• Amazon: Machine Learning, Polly, …
• Azure: Machine Learning, Data Lake Analytics, …
• Oracle: Big Data Discovery CS, Analytics Cloud, Data Spatial & Graph
Intelligence
• Amazon: Alexa, Lex, Polly
• Azure: Cortana, Speech API, Computer Vision API, Video API, ...
• Oracle: n.a.
Oracle Cloud
[Diagram: the Big Data & IoT reference architecture mapped to Oracle Cloud services]
Services shown: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Big Data Preparation CS, Big Data SQL, Data Spatial & Graph, Object Storage, Archive Storage, Block Storage, Data Transfer Service, GoldenGate, Oracle Data Integrator CS, Integration CS, Messaging CS, Process CS, BI CS, Analytics CS, Data Visualization CS, Visual Builder, Mobile CS, Container CS, Application Container CS
Amazon AWS
[Diagram: the Big Data & IoT reference architecture mapped to AWS services]
Services shown: Elastic MapReduce (EMR), Athena, Redshift, Glue, Data Pipeline, QuickSight, ML, TensorFlow, Polly, Lex, Rekognition, Alexa, Kinesis Streams, Kinesis Firehose, Kinesis Analytics, AWS IoT Platform, Greengrass, Rules Engine, Lambda, API Gateway, SQS, SNS, Email, Pinpoint, Mobile Hub, Mobile SDK, S3, Glacier, EBS, EFS, DynamoDB, ElastiCache, Elasticsearch, EC2 Auto Scaling, EC2 Container Service, EC2 Container Registry, Snowball, Snowball Edge, Snowmobile, Direct Connect
Microsoft Azure
[Diagram: the Big Data & IoT reference architecture mapped to Azure services]
Services shown: HDInsight, Data Lake Store, Data Lake Analytics, Machine Learning, SQL Data Warehouse, Power BI, Event Hub, Event Grid, Stream Analytics, Time Series Insight, IoT Suite, IoT Hub, IoT Edge, Functions, Service Bus, Notification Hub, API Management, BizTalk Services, Bot Service, Cortana, Speech API, Vision API, Storage Blob, Storage Block, Table Storage, Cosmos DB, Redis Cache, Container Service, Container Registry, Container Instances, Import/Export
Big Data On-Premises
On-Premises – Oracle Cloud Machine
[Diagram: the Big Data & IoT reference architecture mapped to Oracle Cloud Machine services]
Services shown: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Big Data SQL, Data Spatial & Graph, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Integration CS, Messaging CS, Process CS, BI CS, Mobile CS, Container CS, Application Container CS
On-Premises – Open Source
[Diagram: the Big Data & IoT reference architecture for an on-premises open source stack]
Hybrid Big Data Solutions
[Diagrams: four variants of the Big Data & IoT reference architecture, each dividing the components among deployment zones]
• Variant 1: Cloud, On-Prem, On-Prem/Edge/Internet
• Variant 2: Cloud, On-Prem, On-Prem/Edge/Internet
• Variant 3: Cloud, On-Prem/Edge/Internet, On-Prem
• Variant 4: Cloud, On-Prem/Edge
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
@gschmutz guidoschmutz.wordpress.com

Weitere ähnliche Inhalte

Was ist angesagt?

Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCAmazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleAdam Doyle
 
Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Apigee | Google Cloud
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know SnowflakeKnoldus Inc.
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationCambridge Semantics
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemJames Serra
 
Azure advanced analytics for SAP customers
Azure advanced analytics for SAP customersAzure advanced analytics for SAP customers
Azure advanced analytics for SAP customersVisual_BI
 
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...Amazon Web Services
 
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to CloudHybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to CloudAleksandar Francuz
 
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례rockplace
 
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...Amazon Web Services Korea
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?Amazon Web Services
 
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud MigrationCapgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud MigrationFloyd DCosta
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Cathrine Wilhelmsen
 
Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAmazon Web Services
 

Was ist angesagt? (20)

Large-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSCLarge-Scale AWS Migrations with CSC
Large-Scale AWS Migrations with CSC
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
 
Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
 
Big Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data DemocratizationBig Data Fabric 2.0 Drives Data Democratization
Big Data Fabric 2.0 Drives Data Democratization
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Azure advanced analytics for SAP customers
Azure advanced analytics for SAP customersAzure advanced analytics for SAP customers
Azure advanced analytics for SAP customers
 
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...
How to Migrate SAP Applications to AWS While Maintaining Compliance with AWS ...
 
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to CloudHybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud
Hybrid- and Multi-Cloud by design - IBM Cloud and your journey to Cloud
 
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례
 
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...
금융권 고객을 위한 클라우드 보안 및 규정 준수 가이드 - 이대근 시큐리티 어슈어런스 매니저, AWS :: AWS Summit Seoul ...
 
What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?What is Cloud Computing with Amazon Web Services?
What is Cloud Computing with Amazon Web Services?
 
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud MigrationCapgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
 
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
Choosing Between Microsoft Fabric, Azure Synapse Analytics and Azure Data Fac...
 
Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdf
 

Big Data - in the cloud or rather on-premises?

  • 1. Big Data - in the Cloud or Rather On-Premises? Guido Schmutz. Kassel, 21.9.2017. @gschmutz guidoschmutz@wordpress.com
  • 2. Guido Schmutz
      Working at Trivadis for more than 20 years
      Oracle ACE Director for Fusion Middleware and SOA
      Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
      Head of Trivadis Architecture Board
      Technology Manager @ Trivadis
      More than 30 years of software development experience
      Contact: guido.schmutz@trivadis.com
      Blog: http://guidoschmutz.wordpress.com
      Slideshare: http://www.slideshare.net/gschmutz
      Twitter: gschmutz
  • 3. Agenda
      1. Cloud Primer
      2. Big Data and IoT Architecture
      3. Big Data in the Cloud
      4. Various Models for Big Data in Cloud
      5. Big Data On-Premises
      6. Hybrid Big Data Solutions
  • 5. Cloud Primer
      Instance
      • the thing running in the cloud provider’s infrastructure
      • can be a VM but does not have to be
      Instance Type
      • the size of the instance (combination of CPU, memory, disk storage => cost)
      • Azure: Instance sizes
      Instance Control
      • lifecycle of an instance
      • instances can be stopped or terminated (deleted)
  • 6. Cloud Primer
      Images
      • the template used for provisioning an instance
      Serverless
      • run code “without” servers => only specify functions (Java, C#, Python, Node.js)
      • pay only for the compute time you consume
      • easy scale-out
      • management and capacity planning decisions are made by the provider
      Regions and Availability Zones
      • represent the geographic distribution of the cloud provider
      • Regions are the geographic areas where a service is offered
      • Availability Zones (AZ) add high availability within a Region
      • communication within AZs in the same region costs less than across regions
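The "pay only for the compute time you consume" point can be illustrated with a small cost comparison. This is a minimal sketch, not a provider's billing model: the function names and all prices are hypothetical, and real serverless billing usually also depends on memory size and request counts.

```python
import math


def serverless_cost(invocations: int, seconds_per_invocation: float,
                    price_per_second: float) -> float:
    """Cost when only consumed compute time is billed (hypothetical price)."""
    return invocations * seconds_per_invocation * price_per_second


def always_on_cost(hours: float, price_per_hour: float) -> float:
    """Cost of an instance that is billed whether or not it does any work."""
    return hours * price_per_hour


# Hypothetical workload: 100,000 calls of 0.2 s each within one month.
function_bill = serverless_cost(100_000, 0.2, 0.00002)
instance_bill = always_on_cost(720, 0.05)
print(f"serverless: {function_bill:.2f}, always-on instance: {instance_bill:.2f}")
```

With a sparse workload the function bill stays small because idle time is not charged; with sustained load the comparison can tip the other way.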
  • 7. Cloud Primer – Specific Instances
      On-Demand Instance
      • flexible, on-demand usage
      • billing increment dependent on provider
      Temporary Instance
      • can disappear at any time (bid price)
      • are charged significantly less
      • well suited for Hadoop workloads (if storage and compute are separated)
      • AWS: spot instances
      Reserved Instance
      • capacity reserved in advance
      • reduced pricing (up to 75% compared to on-demand)
      Dedicated Instance
      • pay for instances
      • run on hardware dedicated to you
      • Amazon decides placement
      Dedicated Host
      • pay for entire physical server
      • full flexibility of placement of instances (VM)
      • solves issues with existing server-bound licenses
      Bare Metal
      • bare hardware resources, no virtualization by cloud provider
      • full flexibility / full control
      • almost no automation provided
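To make the pricing differences between these instance kinds concrete, the following sketch compares monthly cost under three models. The 75% reserved discount is the upper bound quoted on the slide; the on-demand rate and the discount for temporary instances are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingModel:
    name: str
    discount: float      # fraction off the on-demand rate
    interruptible: bool  # True if the provider may reclaim the instance


def monthly_cost(on_demand_rate: float, hours: float, model: PricingModel) -> float:
    """Cost of running one instance for `hours` under the given model."""
    return on_demand_rate * hours * (1.0 - model.discount)


ON_DEMAND = PricingModel("on-demand", 0.0, False)
RESERVED = PricingModel("reserved", 0.75, False)  # "up to 75%" per the slide
TEMPORARY = PricingModel("temporary", 0.60, True)  # hypothetical discount

for model in (ON_DEMAND, RESERVED, TEMPORARY):
    cost = monthly_cost(1.0, 720, model)  # hypothetical rate of 1.0 per hour
    note = " (can disappear at any time)" if model.interruptible else ""
    print(f"{model.name}: {cost:.2f}{note}")
```

The `interruptible` flag is the reason temporary instances fit workloads that can be restarted, such as Hadoop jobs with storage kept separate from compute.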
  • 8. Cloud Primer - Storage
      Block Storage
      • most common type offered by a cloud provider
      • disk-like storage
      • comes with each instance when provisioned
      • accessed as filesystem mounts => volumes, disks
      • persistent volumes survive beyond the lifetime of the instance that spawned them
      • ephemeral volumes are limited to the life of the instance to which they are attached
      • AWS: EBS
      • Azure: VHDs & Azure File Storage
      • Oracle: Block Storage
      Object Storage
      • each chunk of data is treated as its own entity, independent of any instance
      • content of each object is opaque to the provider
      • API or URL is used to access data (no mount)
      • well suited for Big Data
      • hot and cold storage options
      • AWS: S3 & Glacier
      • Azure: Azure Blob Storage
      • Oracle: Object Storage & Archive Storage
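The two storage styles can be sketched with a toy in-memory model. Everything here is hypothetical and only mimics the behaviour listed above: `ObjectStore` stands in for key-based API access to opaque objects, and `Instance.terminate` shows that only persistent volumes outlive the instance. It is not any provider's SDK.

```python
from dataclasses import dataclass, field


class ObjectStore:
    """Toy object store: data is addressed by (bucket, key), never mounted."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # The store keeps the bytes as-is; their content is opaque to it.
        self._objects[(bucket, key)] = bytes(data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


@dataclass
class Volume:
    name: str
    persistent: bool


@dataclass
class Instance:
    volumes: list[Volume] = field(default_factory=list)

    def terminate(self) -> list[Volume]:
        """Detach all volumes and return the ones that survive termination."""
        survivors = [v for v in self.volumes if v.persistent]
        self.volumes = []
        return survivors


store = ObjectStore()
store.put("raw", "2017/09/21/events.json", b'{"id": 1}')

node = Instance([Volume("data", persistent=True), Volume("scratch", persistent=False)])
survivors = node.terminate()
print([v.name for v in survivors])  # only the persistent volume remains
```

Because the object store exists independently of any instance, data written to it is unaffected by `terminate`, which is what makes it a natural home for Big Data sets processed by short-lived clusters.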
  • 9. Cloud Primer – Usage Patterns
      Short Lived (Transient)
      👍 minimal maintenance, high efficiency
      👎 spin-up time, higher resource demand
      👎 data transfer to permanent storage
      Self-Service
      👍 efficiency of on-demand creation
      👎 need to maintain tooling
      Cloud-Only
      👍 data transfers stay within the cloud, minimal on-premises costs, integration with provider
      👎 higher cloud expenditure
      Long Lived (Long Running)
      👍 less time waiting for clusters to start/stop
      👍 lower resource demand
      👎 wasted idle time (if any)
      👎 maintenance burden, growing size over time
      Managed
      👍 easy alignment with budget constraints
      👎 waiting time for usage, admin effort
      Hybrid
      👍 lower cloud expenditure, local resources available
      👎 complex workflows, data transfer costs
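The trade-off between short-lived and long-lived clusters (spin-up time versus wasted idle time) can be put into numbers with a rough model. This sketch assumes a single hourly cluster rate and ignores data transfer and maintenance effort; all figures are hypothetical.

```python
def transient_cost(jobs: int, job_hours: float, spinup_hours: float,
                   rate: float) -> float:
    """Each job starts its own cluster and pays for spin-up plus run time."""
    return jobs * (job_hours + spinup_hours) * rate


def long_lived_cost(total_hours: float, rate: float) -> float:
    """One cluster stays up for the whole period, idle time included."""
    return total_hours * rate


def cheaper_pattern(jobs: int, job_hours: float, spinup_hours: float,
                    total_hours: float, rate: float) -> str:
    """Name the cheaper pattern under this simplified model."""
    transient = transient_cost(jobs, job_hours, spinup_hours, rate)
    return "transient" if transient < long_lived_cost(total_hours, rate) else "long-lived"


# Hypothetical month: 10 jobs of 2 h each, 15 min spin-up, 4.0 per cluster hour.
print(cheaper_pattern(10, 2.0, 0.25, 720, 4.0))
```

As the number of jobs grows, the repeated spin-up overhead adds up and the long-lived cluster eventually becomes the cheaper option.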
  • 10. Big Data & IoT Architecture
  • 11. Big Data & IoT Reference Architecture
    [Architecture diagram. Component labels: Bulk Source, Event Source, Location, DB, File, Weather, IoT Data, Mobile Apps, Extract, Import, CDC, Edge Node, Rule Engine, Event Hub, Serverless, Serverless Processing, Storage (Raw, Refined), Cluster, Core Processing, Batch Analytics, Parallel Processing, Stream Processing, Stream Analytics, Reference / Models, State / Results, Results, SQL / Stream, SQL / Export, Service / Stream / Export, Search, Search / Explore, BI Tools, Enterprise Data Warehouse, Enterprise Apps]
  • 12. Big Data & IoT Reference Architecture
    [Same diagram, annotated with deployment zones: Cloud / On-Premises; Edge; Internet / Cloud / On-Premises]
  • 13. 1) Bulk Source – Bulk Processing
    [Same reference architecture diagram]
  • 14. 2) Bulk Source - Edge & Bulk Processing
    [Same reference architecture diagram]
  • 15. 3) Event Source – Stream & Bulk Processing
    [Same reference architecture diagram]
  • 16. 4) Event Source – Edge & Stream & Bulk Processing
    [Same reference architecture diagram]
  • 17. 5) Stream Ingestion – Edge & Stream Processing
    [Same reference architecture diagram]
  • 18. Big Data & IoT Reference Architecture
    [Same reference architecture diagram]
  • 19. Big Data in the Cloud
  • 20. Big Data in the Cloud – two usage patterns
    Short Lived Cluster (Transient)
    • data is repurposed and used for a specific use case in a specific workload
    • cluster is spun up only when needed
    • Flexibility: spin up an arbitrary number of nodes quickly; expand quickly from very small to very large
    • Simplicity: use as is, solve the problem and move on
    Long Lived Cluster (Long Running)
    • data is acquired and augmented continuously
    • cluster is in permanent use for mixed workloads
    • Performance: raw compute performance across a wide range of workloads; time of availability
  • 21. BDaaS – Possible Cost Optimizations
    Autoscaling
    • scale up when a query comes in
    • scale down when jobs finish
    • match utilization with job demand
    • benchmark: auto-scaling saves 33% in compute costs compared to a statically sized cluster
    Excess capacity
    • Hadoop is fault tolerant and can take advantage of unreliable instances such as temporary instances
    • benchmark: if 50% is done on spot nodes, save 80% compared to normal nodes
    [Chart: Common workload distribution with Big Data applications]
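The idea behind autoscaling, matching utilization to job demand, can be sketched as a comparison of node-hours. The hourly demand profile below is invented for illustration and does not reproduce the benchmark quoted on the slide; the function names are hypothetical, and real autoscalers also pay for spin-up delays that this sketch ignores.

```python
from typing import Sequence


def node_hours_static(demand: Sequence[int]) -> int:
    """A statically sized cluster must be provisioned for the peak at all times."""
    return max(demand) * len(demand)


def node_hours_autoscaled(demand: Sequence[int], min_nodes: int = 1) -> int:
    """An autoscaled cluster follows demand, but never drops below `min_nodes`."""
    return sum(max(nodes, min_nodes) for nodes in demand)


def relative_savings(demand: Sequence[int], min_nodes: int = 1) -> float:
    """Fraction of node-hours saved by autoscaling versus static sizing."""
    static = node_hours_static(demand)
    return 1.0 - node_hours_autoscaled(demand, min_nodes) / static


# Hypothetical number of nodes needed in each hour of one day:
# quiet nights, busy working hours.
DEMAND = [2, 2, 2, 2, 2, 2, 4, 8, 10, 10, 8, 6,
          6, 8, 10, 10, 8, 4, 2, 2, 2, 2, 2, 2]

static_hours = node_hours_static(DEMAND)
scaled_hours = node_hours_autoscaled(DEMAND)
savings = relative_savings(DEMAND)

print(f"static: {static_hours} node-hours, autoscaled: {scaled_hours} node-hours")
print(f"savings: {savings:.1%}")
```

The flatter the demand curve, the smaller the gap between the two numbers, which is why the savings depend heavily on the workload distribution.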
  • 22. Data Locality vs. Compute/Storage Separation
    [Diagram: Data Local Compute. A Master Node and Workers #1 to #3, each worker with its own Disk and Processing, connected by a Network]
    [Diagram: Separate Compute and Storage. A Master Node and Compute nodes #1 to #3 (Processing only) reach Storage (Disks) over the Network]
    Separation of compute and storage – the fundamental difference
    • store data in Object Storage instead of DFS
    • bring up Compute nodes only for data processing
    • multiple workloads on separate clusters can access the same data
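The consequence of separating compute from storage can be sketched with a toy model in which clusters come and go while the data stays put. Everything here is hypothetical: the dictionary stands in for object storage, and `transient_cluster` and `run_job` are illustrative names, not part of any Hadoop distribution or cloud SDK.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

# Toy stand-in for object storage: it outlives every cluster.
SHARED_STORAGE: Dict[str, List[int]] = {"raw/readings": [3, 5, 8, 13]}
ACTIVE_CLUSTERS: List[str] = []


@contextmanager
def transient_cluster(name: str) -> Iterator[str]:
    """Bring up compute only for the duration of the `with` block."""
    ACTIVE_CLUSTERS.append(name)         # "provision" compute nodes
    try:
        yield name
    finally:
        ACTIVE_CLUSTERS.remove(name)     # "terminate" them; no data is lost


def run_job(cluster: str, input_key: str, output_key: str,
            fn: Callable[[List[int]], List[int]]) -> None:
    """Read input from shared storage, process it, write the result back."""
    if cluster not in ACTIVE_CLUSTERS:
        raise RuntimeError(f"cluster {cluster!r} is not running")
    SHARED_STORAGE[output_key] = fn(SHARED_STORAGE[input_key])


# Two independent workloads on separate clusters use the same input data.
with transient_cluster("batch") as cluster:
    run_job(cluster, "raw/readings", "results/total", lambda xs: [sum(xs)])

with transient_cluster("adhoc") as cluster:
    run_job(cluster, "raw/readings", "results/max", lambda xs: [max(xs)])
```

With data-local compute the same pattern would not work, because terminating the workers would also remove the disks holding the data.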
  • 23. A new way to Manage Big Data
    Big Data: Traditional Assumptions
    • Bare-metal
    • Data Locality
    • HDFS on local disks
    Big Data: A New Approach
    • Containers and VMs
    • Compute and storage separation
    • Shared storage
    Benefits and Value
    • Big-Data-as-a-Service
    • Agility and cost savings
    • Faster time-to-insights
  • 24. 5 ½ ways to get Big Data in the Cloud
    1. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on Bare Metal
    2. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on VM
    3. Hadoop PaaS from Cloud Provider’s Marketplace
    4. Dedicated (Long-Running) Big-Data-as-a-Service
    5. Elastic (Transient) Big-Data-as-a-Service (storage and compute separated)
    6. “Cloud on Premises” (Cloud Stack from Vendors on Premises)
  • 25. Various Models for Big Data in Cloud
  • 26. Various Models for Big Data in Cloud
    1. Bare Metal Cloud (Bring Your Own Hadoop - BYOH)
    2. IaaS with any Hadoop Distribution (Bring Your Own Hadoop)
    3. PaaS with Hadoop (from Marketplace)
    4. Dedicated (Long-Running) BDaaS
    5. Elastic (Transient) BDaaS
    6. BDaaS + Analytics SaaS
  • 27. 1) Bare Metal Cloud (BYOH)
    Amazon
    • n.a. (Dedicated Host comes close, but runs VMs)
    Azure
    • n.a. (Dedicated Host comes close, but runs VMs)
    Oracle
    • Compute: Oracle Compute
    • Storage: Oracle Block Volume & Object Storage, Data Transfer Service
    • Big Data: Bring Your Own Hadoop (BYOH)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 28. 2) IaaS (Bring Your Own Hadoop)
    Amazon
    • Compute: Amazon EC2 & EC2
    • Storage: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
    • Big Data: Bring Your Own Hadoop (BYOH)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Azure
    • Compute: Azure VM
    • Storage: Storage (Blob), Data Lake Store, Import/Export
    • Big Data: Bring Your Own Hadoop (BYOH)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Oracle
    • Compute: General Purpose Compute & Dedicated Compute
    • Storage: Oracle Object & Archive Storage, Data Transfer Service
    • Big Data: Bring Your Own Hadoop (BYOH)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 29. 3) PaaS (Hadoop from Marketplace)
    Amazon
    • Compute: Amazon EC2
    • Storage: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
    • Big Data: Hadoop (Hortonworks, MapR)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Azure
    • Compute: Azure VM
    • Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
    • Big Data: Hadoop (Cloudera, Hortonworks, MapR)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
    Oracle
    • Compute: General Purpose Compute & Dedicated Compute
    • Storage: Oracle Object & Archive Storage, Data Transfer Service
    • Big Data, Analytics, Intelligence: n.a.
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 30. 4) Dedicated BDaaS
    Amazon
    • Compute: Amazon EC2
    • Storage: S3, EBS, Glacier
    • Big Data: Amazon EMR
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Azure
    • Compute: Azure VM
    • Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
    • Big Data: Azure HDInsight (Hortonworks)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Oracle
    • Compute: General Purpose Compute & Dedicated Compute
    • Storage: Oracle Object & Archive Storage, Data Transfer Service
    • Big Data: Big Data CS (Cloudera)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 31. 5) Elastic BDaaS
    Amazon
    • Compute: Amazon EC2
    • Storage: S3, EBS, Glacier
    • Big Data: Amazon EMR
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Azure
    • Compute: Azure VM
    • Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
    • Big Data: Azure HDInsight (Hortonworks)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Oracle
    • Compute: General Purpose Compute & Dedicated Compute
    • Storage: Oracle Object & Archive Storage, Data Transfer Service
    • Big Data: Big Data CS Compute Edition (Hortonworks)
    • Analytics: Custom (SQL, Machine Learning, ..)
    • Intelligence: Image-, Speech-Recognition, Bots, …
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 32. 6) BDaaS + Analytics SaaS
    Amazon
    • Compute: Amazon EC2 & EC2 Dedicated Hosts
    • Storage: S3, EBS, Glacier
    • Big Data: Amazon EMR
    • Analytics: Machine Learning, Polly, …
    • Intelligence: Alexa, Lex, Polly
    Azure
    • Compute: Azure VM
    • Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
    • Big Data: Azure HDInsight (Hortonworks)
    • Analytics: Machine Learning, Data Lake Analytics, …
    • Intelligence: Cortana, Speech API, Computer Vision API, Video API, ...
    Oracle
    • Compute: General Purpose Compute & Dedicated Compute
    • Storage: Oracle Object & Archive Storage, Data Transfer Service
    • Big Data: Big Data CS Compute Edition / Big Data CS
    • Analytics: Big Data Discovery CS, Analytics Cloud, Data Spatial & Graph
    • Intelligence: n.a.
    Custom
    • Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
  • 33. Oracle Cloud
    [Reference architecture diagram mapped to Oracle Cloud services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Data Spatial & Graph, Big Data SQL, Integration CS, Messaging CS, BI CS, Process CS, Mobile CS, Container CS, Application Container CS, GoldenGate, Visual Builder, Big Data Preparation CS, Data Visualization CS, Oracle Data Integrator CS, Analytics CS]
  • 34. Amazon AWS
    [Reference architecture diagram mapped to AWS services: Elastic MapReduce (EMR), Polly, ML, Lex, Rekognition, Alexa, Kinesis Analytics, Kinesis Streams, Kinesis Firehose, Snowmobile, Snowball, Snowball Edge, AWS IoT Platform, Greengrass, Rules Engine, Lambda, Direct Connect, S3, Glacier, DynamoDB, EC2 Auto Scaling, EBS, EFS, Athena, Redshift, EC2 Container Service, EC2 Container Registry, Mobile Hub, Mobile SDK, SQS, SNS, Email, Pinpoint, API Gateway, Elasticsearch, ElastiCache, TensorFlow, Glue, Data Pipeline, QuickSight]
  • 35. Microsoft Azure
    [Reference architecture diagram mapped to Azure services: HDInsight, Storage Blob, Storage Block, Data Lake Store, Data Lake Analytics, Machine Learning, Event Hub, Event Grid, Stream Analytics, IoT Suite, IoT Hub, IoT Edge, Cosmos DB, Table Storage, Redis Cache, SQL Data Warehouse, Import/Export, Speech API, Vision API, Cortana, Bot Service, Service Bus, Notification Hub, API Management, Power BI, BizTalk Services, Functions, Container Service, Container Registry, Container Instances, Time Series Insights]
  • 37. On-Premises – Oracle Cloud Machine
    [Reference architecture diagram mapped to Oracle Cloud Machine services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Data Spatial & Graph, Big Data SQL, Integration CS, Messaging CS, BI CS, Process CS, Mobile CS, Container CS, Application Container CS]
  • 38. On Premises – Open Source
    [Same reference architecture diagram]
  • 39. Hybrid Big Data Solutions
  • 40. Hybrid Big Data Solutions
    [Reference architecture diagram split into zones: Cloud; On-Prem; On-Prem/Edge/Internet]
  • 41. Hybrid Big Data Solutions
    [Reference architecture diagram split into zones: Cloud; On-Prem; On-Prem/Edge/Internet]
  • 42. Hybrid Big Data Solutions
    [Reference architecture diagram split into zones: Cloud; On-Prem/Edge/Internet; On-Prem]
  • 43. Hybrid Big Data Solutions
    [Reference architecture diagram split into zones: Cloud; On-Prem/Edge]