BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Introduction to Streaming Analytics
Guido Schmutz
Guido Schmutz
Working for Trivadis for more than 19 years
Oracle ACE Director for Fusion Middleware and SOA
Co-Author of different books
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 25 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Twitter: gschmutz
Our company.
© Trivadis – The Company 03.06.16
Trivadis is a market leader in IT consulting, system integration, solution engineering
and the provision of IT services focusing on and and Open
Source technologies
in Switzerland, Germany, Austria and Denmark. We offer our services in the following
strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than
600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget:
CHF 5.0 million
Financially self-supporting and
sustainably profitable
Experience from more than 1,900
projects per year at over 800
customers
Agenda
1. Introduction & Foundation
2. Designing Streaming Analytics Solutions
3. Implementing Event Hub
4. Implementing Data Ingestion
5. Implementing Streaming Analytics
6. Scalability & Reliability
7. Streaming Analytics in Architecture
8. Summary
Introduction & Foundation
Big Data Definition (4 Vs)
+ Time to action? – Big Data + Real-Time = Stream Processing
Characteristics of Big Data: its Volume, Velocity and Variety in combination
The world is changing …
The model of Generating/Consuming Data has changed ….
Old Model: few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
Who is generating Big Data?
The progress and innovation is no longer hindered by the ability to collect data
But by the ability to manage, analyze, summarize, visualize and discover knowledge
from the collected data in a timely manner and in a scalable fashion
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
Traditional Data Processing - Challenges
• Introduces too much “decision latency”
• Responses are delivered “after the fact”
• Maximum value of the identified situation is lost
• Decisions are made on old and stale data
• “Data at Rest”
The New Era: Streaming Data Analytics / Fast Data
• Events are analyzed and processed in
real-time as they arrive
• Decisions are timely, contextual and
based on fresh data
• Decision latency is eliminated
• “Data in motion”
Real Time Analytics Use Cases
• Algorithmic Trading
• Online Fraud Detection
• Geo Fencing
• Proximity/Location Tracking
• Intrusion detection systems
• Traffic Management
• Recommendations
• Churn detection
• Internet of Things (IoT) / Intelligence
Sensors
• Social Media/Data Analytics
• Gaming Data Feed
• …
What happens in an internet minute
Internet Of Things – Sensors
are/will be everywhere
There are more devices tapping into the internet
than people on earth
How do we prepare our systems/architecture for
the future?
Sources: Cisco, The Economist
Different Types of Stream/Event Processing
Simple Event Processing (SEP)
Event Stream Processing (ESP)
Different Types of Stream/Event Processing
Complex Event Processing (CEP)
Native Streaming vs. Micro-Batching
Native Streaming
• Events processed as they arrive
• + low latency
• - lower throughput
• - fault tolerance is expensive
Micro-Batching
• Splits incoming stream into small batches
• + high(er) throughput
• + easier fault tolerance
• - higher latency
Source: Distributed Real-Time Stream Processing: Why and How by Petr Zapletal
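A minimal Python sketch of the micro-batching idea, not taken from any of the frameworks discussed here. The function name `micro_batches` and the 2-second interval are illustrative assumptions. A native streaming engine would hand each event to the processing logic immediately; here events are first grouped into fixed time intervals, which is why an event may wait up to one interval before it is processed.

```python
from collections import defaultdict

def micro_batches(events, interval_s):
    """Group (timestamp_seconds, payload) events into fixed-length time batches."""
    batches = defaultdict(list)
    for ts, payload in events:
        # All events whose timestamp falls into the same interval share a batch.
        batches[int(ts // interval_s)].append(payload)
    # Return the non-empty batches in time order.
    return [batches[k] for k in sorted(batches)]

# Hypothetical events: (arrival time in seconds, payload)
events = [(0.1, "a"), (0.9, "b"), (2.5, "c"), (3.9, "d"), (4.0, "e")]
batches = micro_batches(events, interval_s=2)
print(batches)
```

A larger interval gives bigger batches (better throughput per batch) at the cost of higher latency for each event.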
How to design a Streaming Analytics Solution?
[Figure: three design variants]
• Event Stream → event → Data Ingestion → event → Persist (Queue)
• Event Stream → event → Data Ingestion → event → Analytics → event → Analytics → result
• Event Stream → event → Data Ingestion/Analytics → result
Demo Use Case – Truck Sensors
[Figure: demo pipeline with the components Truck, Data Ingestion, Movement, Movement JSON, Geo-Fencing (NEAR, ENTER), Reckless Driving Detector, Reckless Driver, Truck Driver, Dashboard]
Sample record (pipe-delimited):
2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631
Sample event (JSON):
{"timestamp": "2016-06-02 14:39:56.991", "truckId": 99, "driverId": 31, "driverName": "Rommel Garcia", "routeId": 1565885487, "routeName": "Springfield to KC Via Hanibal", "eventType": "Normal", "latitude": 37.16, "longitude": "-94.46", "correlationId": 5187297736652502631}
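The two sample formats of the demo suggest a simple conversion step from the pipe-delimited record to the JSON event. The following Python sketch shows one way this could look. The field order in `FIELDS` is an assumption inferred by comparing the two samples, and keeping `longitude` as a string merely mirrors the JSON sample; this is not the code used in the demo.

```python
import json

# Assumed field order, inferred from the two sample messages.
FIELDS = ["timestamp", "truckId", "driverId", "driverName", "routeId",
          "routeName", "eventType", "latitude", "longitude", "correlationId"]
INT_FIELDS = {"truckId", "driverId", "routeId", "correlationId"}

def to_json(line):
    """Convert one pipe-delimited truck record into a JSON string."""
    record = dict(zip(FIELDS, line.split("|")))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    record["latitude"] = float(record["latitude"])
    # longitude stays a string, as in the JSON sample
    return json.dumps(record)

raw = ("2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|"
       "Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631")
event = json.loads(to_json(raw))
print(event["truckId"], event["eventType"])
```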
Designing Streaming Analytics
Solutions
How to design a Streaming Analytics System?
It usually starts very simple … just one data pipeline
[Figure: Event Stream → event → Data Ingestion → Analytics]
New Event Stream sources are added …
[Figure: Event Stream, 2nd, 3rd … nth Event Stream, each with its own Data Ingestion (1st to nth), all feeding events into Analytics]
New Processors are interested in the events …
[Figure: Event Stream, 2nd, 3rd … nth Event Stream, each with its own Data Ingestion (1st to nth), feeding events into Analytics and a 2nd Analytics]
… and the solution becomes the problem
[Figure: Event Stream, 2nd, 3rd … nth Event Stream, each with its own Data Ingestion (1st to nth), feeding events into Analytics, 2nd Analytics, 3rd Analytics … Nth Analytics]
… and the solution becomes the problem
[Figure: point-to-point integration]
• Sources: New Customers, Operational Logs, Click Stream, Meter Readings
• Ingestion: CDC Ingestion, Log Ingestion, Click Stream Ingestion, Sensor Ingestion
• Consumers: Hadoop/Data Warehouse, Recommendation System, Log Search, Fraud Detection
Decouple event streams from consumers
“Unified Log”
Remember Enterprise Service Bus (ESB)?
[Figure: Event Stream Ingestion → Enterprise Event Bus → Event Stream Analytics]
• Sources: New Customers, Operational Logs, Click Stream, Meter Readings
• Ingestion: CDC Ingestion, Log Ingestion, Click Stream Ingestion, Sensor Ingestion
• Consumers: Hadoop/Data Warehouse, Recommendation System, Log Search, Fraud Detection
What is the idea of a Unified Log?
Unified Log – What is it?
By Unified Log, we do not mean this ….
137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114
137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 -
137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 -
137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809
… but this
• a structured log (records are numbered beginning with 0 based on order they are written)
• aka. commit log or journal
[Figure: records at offsets 0 to 11 in the order they were written; offset 0 is the 1st record, the next record is written at the end]
Central Unified Log for (real-time) subscription
Take all the organization’s data (events)
and put it into a central log for subscription
Properties of the Unified Log:
• Unified: “Enterprise”, single
deployment
• Append-Only: events are appended,
no update in place => immutable
• Ordered: each event has an offset,
which is unique within a shard
• Fast: should be able to handle
thousands of messages / sec
• Distributed: lives on a cluster of
machines
[Figure: a Collector writes to the end of the log (offsets 0 to 11); Consumer System A reads at time = 6, Consumer System B reads at time = 10]
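The properties listed above can be illustrated with a small in-memory sketch in Python. The class `UnifiedLog` and the consumer names are hypothetical, and a real unified log would be distributed and persistent; the sketch only shows append-only writes, offsets, and consumers reading independently at their own positions.

```python
class UnifiedLog:
    """Toy append-only log: records are never updated in place."""

    def __init__(self):
        self._records = []

    def append(self, event):
        """Append an event and return its offset (position in the log)."""
        self._records.append(event)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read up to max_records starting at the given offset."""
        return self._records[offset:offset + max_records]

log = UnifiedLog()
for i in range(12):
    log.append(f"event-{i}")

# Each consumer tracks its own position; reading does not remove data.
positions = {"A": 6, "B": 10}
batch_a = log.read(positions["A"], max_records=2)
positions["A"] += len(batch_a)
batch_b = log.read(positions["B"])
positions["B"] += len(batch_b)
print(batch_a, batch_b, positions)
```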
Implementing Event Bus
Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real time activity
stream data (logs, metrics collections, social
media streams, …)
Initially developed at LinkedIn, now part of
Apache
Does not use JMS API and standards
Kafka maintains feeds of messages in topics
[Figure: several Producers publish to the Kafka Cluster; several Consumers subscribe to it]
Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• “A unified platform for handling all the real-time data feeds a large company might
have.”
Must haves
• High throughput to support high volume event feeds.
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems.
• Support low-latency delivery to handle more traditional messaging use cases.
• Guarantee fault-tolerance in the presence of machine failures.
Apache Kafka - Architecture
[Figure: a Truck produces to a Kafka Broker hosting the Movement Topic and the Engine-Metrics Topic (messages 1 to 6 each); a Movement Processor and an Engine Processor consume them]
Apache Kafka - Architecture
[Figure: same setup with partitions: the Movement Topic is split into Partition 0 and Partition 1 (messages 1 to 6 each) and read by two Movement Processors; the Engine-Metrics Topic has Partition 0 and is read by the Engine Processor]
Apache Kafka
[Figure: two Kafka Brokers, each hosting partitions (P 0, P 1, messages 1 to 5) of the Movement Topic and the Engine-Metrics Topic; a Truck produces, two Movement Processors and an Engine Processor consume]
Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and
sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
Consumer	 group	C1
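A small Python sketch of per-partition offsets and consumer-side tracking via (topic, partition) → offset. This is not the Kafka client API: the class `Topic`, the function `poll`, and the CRC32-based key hashing are stand-ins chosen for illustration only.

```python
import zlib

class Topic:
    """Toy topic: a list of partitions, each an append-only list of messages."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Pick a partition from the key and return (partition, offset)."""
        # Stand-in hash; equal keys always land in the same partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

committed = {}  # (topic, partition) -> next offset to read

def poll(topic, partition):
    """Return all messages after the committed offset and advance it."""
    key = (topic.name, partition)
    start = committed.get(key, 0)
    messages = topic.partitions[partition][start:]
    committed[key] = start + len(messages)
    return messages

topic = Topic("movement", num_partitions=2)
results = [topic.send(f"truck-{i % 3}", f"pos-{i}") for i in range(6)]
first_poll = poll(topic, 0) + poll(topic, 1)
second_poll = poll(topic, 0) + poll(topic, 1)
print(results, committed)
```

Offsets are unique and sequential only within one partition, which is why the consumer position has to include the partition.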
Apache Kafka - Performance
Kafka at LinkedIn => over 1100 brokers / 60 clusters
• 800 billion messages/day
• 175 TB produced/day
• 650 TB consumed/day
• 13 million messages/second and 2.75 GB/second at busiest time of day
Kafka Performance at own setup => 6 brokers (VM) / 1 cluster
• 445’622 messages/second
• 31 MB / second
• 3.0405 ms average latency between producer / consumer
Sources:
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
Demo: Consuming Kafka Topic
Demo: Monitoring Kafka Cluster with Kafka Manager
Implementing Data Ingestion
StreamSets Data Collector
• Founded by ex-Cloudera, Informatica
employees
• Continuous open source, intent-driven,
big data ingest
• Visible, record-oriented approach fixes
combinatorial explosion
• Batch or stream processing
• Standalone, Spark cluster, MapReduce
cluster
• IDE for pipeline development by ‘civilians’
• Relatively new - first public release
September 2015
• So far, vast majority of commits are from
StreamSets staff
Apache NiFi
• Originated at NSA as Niagarafiles
• Open sourced December 2014, Apache
TLP July 2015
• Opaque, file-oriented payload
• Distributed system of processors with
centralized control
• Based on flow-based programming
concepts
• Data Provenance
• Web-based user interface
Demo: Using Apache NiFi for Collection
Implementing Streaming Analytics
Streaming Analytics
Product
Framework	/	Infrastructure
Open	Source Closed	Source
Implementing Streaming Analytics:
Oracle Stream Analytics
History of Oracle Stream Analytics
[Figure: product timeline with the years 2007, 2008, 2012, 2013, 2015, 2016 and the products BEA Weblogic Event Server, Oracle CQL, Oracle Complex Event Processing (OCEP), Oracle Event Processing (OEP), Oracle Event Processing for Java Embedded, Oracle Stream Explorer (SX), Oracle Stream Analytics (OSA), Oracle Edge Analytics (OEA), Oracle IoT Cloud Service]
OEA
• Filtering
• Correlation
• Aggregation
• Pattern
matching
Devices /
Gateways
Services
Computing Edge Enterprise
“Sea of data”
Macro-event
High-value
Actionable
In-context
EDGE
Analytics
Stream	
Analytics
FOG
• High Volume
• Continuous Streaming
• Extreme Low Latency
• Disparate Sources
• Temporal Processing
• Pattern Matching
• Machine Learning
Oracle Stream Analytics: From Noise to Value
• High Volume
• Continuous Streaming
• Sub-Millisecond Latency
• Disparate Sources
• Time-Window Processing
• Pattern Matching
• High Availability / Scalability
• Coherence Integration
• Geospatial, Geofencing
• Big Data Integration
• Business Event Visualization
• Action!
Oracle Stream Analytics Platform
What it does
• Compelling, friendly and visually stunning real time
streaming analytics user experience for Business users to
dynamically create and implement Instant Insight solutions
Key Features
• Analyze simulated or live data feeds to determine event
patterns, correlation, aggregation & filtering
• Pattern library for industry specific solutions
• Streams, References, Maps & Explorations
Benefits
• Accelerated delivery time
• Hides all challenges & complexities of underlying real-time
event-driven infrastructure
Oracle Stream Analytics - Connecting Everything &
Anything of Interest to the Business
Understanding of CQL Filtering, Correlation, Pattern: NOT NEEDED
Understanding of IT Deployment and Management: NOT NEEDED
Understanding of Development, Java, Best Practices: NOT NEEDED
Understanding of the Event Driven Platform: NOT NEEDED
Business accessibility to Geo-Streaming Analytics
Real Time Streaming Solutions face an increasing need to track "assets of interest" and
initiate actions based on encroachment of boundary proximity to fixed and moving
objects and other geographic, temporal, or event conditions.
Geo-Fence,	Fence,	Polygon
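To make the geo-fencing idea concrete, here is a minimal Python sketch that classifies a position against a circular fence. It is not how Oracle Stream Analytics implements geo-fences; the function names, the fence dictionary layout, and the radius values are hypothetical. Distances use the standard haversine formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def classify(lat, lon, fence):
    """Return INSIDE, NEAR or OUTSIDE for a position relative to a circular fence."""
    d = haversine_km(lat, lon, fence["lat"], fence["lon"])
    if d <= fence["radius_km"]:
        return "INSIDE"
    if d <= fence["radius_km"] + fence["near_km"]:
        return "NEAR"
    return "OUTSIDE"

# Hypothetical fence: 10 km radius with a 20 km "near" margin
fence = {"lat": 38.65, "lon": -90.21, "radius_km": 10.0, "near_km": 20.0}
states = [classify(38.65, -90.21, fence),
          classify(38.65, -90.00, fence),
          classify(37.16, -94.46, fence)]
print(states)
```

An ENTER event, as used in the demo, would correspond to a transition of an asset from NEAR or OUTSIDE to INSIDE between two consecutive positions.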
Geo-Streaming
“Add value to your real time streaming data discovery and analytics by applying and including mathematical, statistical analysis to the live output stream”
“These streaming “Excel spreadsheets” really do come to life”
Expression Builder enabling calculation for the Business
User
Concept of Connections & Connection Reuse in
Streams
Decision Table for Nested IF-THEN-ELSE Rules
Topology View and Navigation
Stream Analytics – Terminology for Business Users
Explorer: The Application User Interface
Catalog: The repository for browsing resources
Stream Analytics – Terminology for Business Users
Stream: incoming flow of events that you
want to analyze (CSV, Kafka, JMS, Rest,
MQTT, …)
Exploration: application that correlates events
from streams and data sources, using filters,
groupings, summaries, ranges, and more
Stream Analytics – Terminology for Business Users
Shape: A blueprint of an event in a stream or
data in a data source. How the business data
is represented in the selected stream
Map: collection of geo-fences
Reference: A connection to static data that is
joined to a stream to enrich it and/or to be used in
business logic and output
Stream Analytics – Terminology for Business Users
Pattern: A pre-built Exploration that
addresses a particular business scenario in a
focused and simplified User Interface
Connection: collection of metadata required to
connect to an external system
Targets: defines an interface with a downstream
system
Demo: Oracle Stream Analytics
Implementing Streaming Analytics:
Spark Streaming
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing
• The hot trend in Big Data!
• Originally developed in 2009 in UC Berkeley’s AMPLab
• Based on 2007 Microsoft Dryad paper
• Written in Scala, supports Java, Python, SQL and R
• Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x
faster on disk
• One of the largest OSS communities in big data with over 200 contributors in 50+
organizations
• Open sourced in 2010 – since 2014 part of the Apache Software Foundation
Apache Spark
[Figure: Spark stack]
• Libraries: Spark SQL (Batch Processing), Blink DB (Approximate Querying), Spark Streaming (Real-Time), MLlib, Spark R (Machine Learning), GraphX (Graph Processing)
• Core Runtime: Spark Core API and Execution Model
• Cluster Resource Managers: Spark Standalone, MESOS, YARN
• Data Stores: HDFS, Elastic Search, NoSQL, S3
Resilient Distributed Dataset (RDD)
Are
• Immutable
• Re-computable
• Fault tolerant
• Reusable
Have Transformations
• Produce new RDD
• Rich set of transformations available
• filter(), flatMap(), map(),
distinct(), groupBy(), union(),
join(), sortByKey(),
reduceByKey(), subtract(), ...
Have Actions
• Start cluster computing operations
• Rich set of actions available
• collect(), count(), fold(),
reduce(), …
[Figure: Data from an Input Source (File, Database, Stream, Collection) is loaded into an RDD; a transformation produces a new RDD; the action .count() returns a value, e.g. 100]
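The difference between transformations and actions can be sketched without Spark. The following toy class is not the Spark API; `ToyRDD` and `double` are hypothetical names. It records transformations as lineage and only runs them when an action such as `collect()` or `count()` is called, which is also what makes the result re-computable.

```python
class ToyRDD:
    """Toy illustration of lazy transformations; not the Spark API."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = ops  # recorded transformations (the lineage)

    def map(self, f):
        # Transformation: returns a new ToyRDD, computes nothing yet.
        return ToyRDD(self._source, self._ops + (("map", f),))

    def filter(self, f):
        return ToyRDD(self._source, self._ops + (("filter", f),))

    def _compute(self):
        for item in self._source:
            keep = True
            for kind, f in self._ops:
                if kind == "map":
                    item = f(item)
                elif not f(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        # Action: triggers the computation.
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

calls = []

def double(x):
    calls.append(x)  # lets us observe when the function really runs
    return x * 2

rdd = ToyRDD(range(5)).map(double).filter(lambda x: x > 2)
nothing_ran_yet = (calls == [])
result = rdd.collect()
print(nothing_ran_yet, result)
```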
Partitions RDD
[Figure: the Data of an RDD is split into Partition 0 to Partition 9, which are distributed over Server 1 to Server 5]
Partitions RDD
[Figure: the same RDD without Server 1: Partition 0 to Partition 9 are distributed over Server 2 to Server 5]
Spark Workflow
[Figure: Input HDFS File → sc.hadoopFile() → HadoopRDD → flatMap() → MappedRDD → map() → MappedRDD → reduceByKey() → ShuffledRDD → saveAsTextFile() → Text File Output]
• Transformations (lazy): flatMap(), map(), reduceByKey()
• Action (executes the transformations): saveAsTextFile()
• Stage 1 – flatMap() + map()
• Stage 2 – reduceByKey()
• Master with DAG Scheduler; MappedRDD partitions P0, P1, P3; ShuffledRDD partition P0
Spark Execution Model
[Figure: a Master and a Worker on a Server; the Worker runs Executors next to the Data Storage]
Stage 1 – flatMap() + map()
Spark Execution Model
[Figure: a Master and several Workers, each with an Executor and Data Storage; the RDD partitions P0, P1, P3 are processed where they are stored]
Narrow Transformation
• filter()
• map()
• sample()
• flatMap()
Stage 2 – reduceByKey()
Spark Execution Model
[Figure: a Master and several Workers, each with an Executor and Data Storage; data is shuffled between Workers into RDD partition P0]
Wide Transformation
• join()
• reduceByKey()
• union()
• groupByKey()
Shuffle!
Batch vs. Real-Time Processing
[Figure: Batch: Petabytes of Data; Real-Time: Gigabytes Per Second]
Discretized Stream (DStream)
[Figure, built up over four slides: several Trucks send events to Kafka; the stream of individual events is made discrete by time; DStream = RDD]
Discretized Stream (DStream)
[Figure: every X Seconds a transformation turns one DStream into another DStream]
Transform
• .countByValue()
• .reduceByKey()
• .join
• .map
Discretized Stream (DStream)
[Figure: time increases from time 1 to time n; at each time step the incoming messages (message 1 … message n) form one RDD of the Event DStream; map() applies f to every message, giving an RDD of the MappedDStream (f(message 1) … f(message n)); saveAsHadoopFiles() writes result 1 … result n]
• Input Stream → Event DStream → map() → MappedDStream → saveAsHadoopFiles()
• DStream Transformation Lineage
• Actions Trigger Spark Jobs
Adapted	from	Chris	Fregly: http://slidesha.re/11PP7FV
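A toy Python sketch of the DStream idea: the stream is represented as a sequence of small batches (one per interval), and the same transformation is applied to every batch. This is plain Python, not the Spark Streaming API. The helper `count_by_value` only mimics the effect of `.countByValue()`, and the event types other than "Normal" are made up for illustration.

```python
from collections import Counter

def count_by_value(batch):
    """Count how often each value occurs within one batch."""
    return dict(Counter(batch))

# One list per batch interval; an interval without events yields an empty batch.
dstream = [
    ["Normal", "Normal", "Overspeed"],
    ["Normal"],
    [],
    ["Lane Departure", "Overspeed", "Overspeed"],
]

# The transformation is applied batch by batch, producing a new "stream" of results.
results = [count_by_value(batch) for batch in dstream]
print(results)
```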
Implementing Streaming Analytics:
Apache Storm
Apache Storm
A platform for doing analysis on streams of data as they come in, so you can react to
data as it happens.
• highly distributed real-time computation system
• Provides general primitives to do
real-time computation
• To simplify working with queues & workers
• scalable and fault-tolerant
Originated at Backtype, acquired by Twitter in 2011
Open Sourced late 2011
Part of Apache since September 2013
Apache Storm – Core concepts
Tuple
• Immutable Set of Key/value pairs
Stream
• an unbounded sequence of tuples that can be processed in parallel by Storm
Topology
• Wires data and functions via a DAG (directed acyclic graph)
• Executes on many machines similar to a MR job in Hadoop
Spout
• Source of data streams (tuples)
• can be run in “reliable” and “unreliable” mode
Bolt
• Consumes 1+ streams and produces new streams
• Complex operations often require multiple
steps and thus multiple bolts
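[Figure: example topology with two Spouts (one labeled “Source of Stream B”) and four Bolts; T = tuple flowing through the streams]
• Bolt: Subscribes: A, Emits: C
• Bolt: Subscribes: A, Emits: D
• Bolt: Subscribes: A & B, Emits: -
• Bolt: Subscribes: C & D, Emits: -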
Apache Storm – How does it work ?
[Figure: Truck Movement spout → Shuffle Grouping → Geo Hashing bolt (three parallel instances)]
Tuple emitted by the Truck Movement spout:
{ "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91" }
Field added by Geo Hashing:
"geohash" : "dp206n3d"
Apache Storm – How does it work ?
[Figure: Truck Movement spout → Shuffle Grouping → Geo Hashing bolt (three parallel instances) → Fields Grouping → GeoFencer bolt (two parallel instances)]
Tuple emitted by the Truck Movement spout:
{ "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91" }
Tuple after Geo Hashing:
{ "geohash" : "dp206n3d", "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91" }
Tuple with a different geohash:
{ "geohash" : "f00hfh99", .. }
Apache Storm – How does it work ?
[Figure: Truck Movement spout → Shuffle Grouping → Geo Hashing bolt (three parallel instances) → Fields Grouping → GeoFencer bolt (two parallel instances) → Global Grouping → Alerter bolt]
Tuple emitted by the Truck Movement spout:
{ "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91" }
Tuple after Geo Hashing:
{ "geohash" : "dp206n3d", "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91" }
Tuple shown after the GeoFencer:
{ "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "latitude" : 40.86, "longitude" : "-89.91" }
Tuple with a different geohash:
{ "geohash" : "f00hfh99", .. }
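The Geo Hashing step can be sketched in Python with the standard public geohash encoding (alternating longitude and latitude bits, base-32 alphabet). This is a from-scratch illustration, not the bolt code of the demo; the function name `geohash` and the reduced sample tuple are chosen here for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, starting with longitude.
        rng, value = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits form one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

# Reduced version of the sample tuple; longitude arrives as a string.
tuple_in = {"truckId": 35, "latitude": 40.86, "longitude": "-89.91"}
tuple_out = dict(tuple_in,
                 geohash=geohash(tuple_in["latitude"], float(tuple_in["longitude"])))
print(tuple_out["geohash"])
```

Positions that share a geohash prefix lie in the same grid cell, which makes the geohash a convenient key for the Fields Grouping that follows.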
Apache Storm – Core concepts
Each Spout or Bolt runs N instances (tasks) in parallel
[Figure: Trucks Movement spout → Shuffle → GeoHashing (1st to nth instance) → Fields → GeoFencing (1st to nth instance) → Global → Report]
• Shuffle grouping: random grouping
• Fields grouping: grouped by value, such that equal value results in equal task
• All grouping: replicates to all tasks
• Global grouping: makes all tuples go to one task
• None grouping: makes the bolt run in the same thread as the bolt/spout it subscribes to
• Direct grouping: the producer (task that emits) controls which consumer will receive
• Local or Shuffle grouping: similar to the shuffle grouping, but will shuffle tuples among bolt tasks running in the same worker process, if any; otherwise it falls back to shuffle grouping behavior
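The groupings above boil down to different rules for choosing the target task of a tuple. A Python sketch of four of them follows; the function names and the CRC32 hash are assumptions for illustration and do not reflect Storm's internal implementation.

```python
import random
import zlib

def shuffle_grouping(tup, num_tasks, rng=random):
    """Random task, so load is spread evenly over time."""
    return rng.randrange(num_tasks)

def fields_grouping(tup, num_tasks, field):
    """Equal field values always map to the same task."""
    return zlib.crc32(str(tup[field]).encode()) % num_tasks

def global_grouping(tup, num_tasks):
    """All tuples go to a single task (here: task 0)."""
    return 0

def all_grouping(tup, num_tasks):
    """The tuple is replicated to every task."""
    return list(range(num_tasks))

t1 = {"truckId": 35, "geohash": "dp206n3d"}
t2 = {"truckId": 99, "geohash": "dp206n3d"}
same_task = fields_grouping(t1, 4, "geohash") == fields_grouping(t2, 4, "geohash")
print(same_task, global_grouping(t1, 4), all_grouping(t1, 4))
```

In the truck demo, Fields Grouping on the geohash ensures that all movements in the same grid cell reach the same GeoFencer task, so that task can keep per-cell state locally.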
Scalability & Reliability
How to scale a Streaming Analytics System?
[Figure: scaling with threads: one Collecting Process with Collecting Thread 1 to n takes events from the Event Stream and puts them into a Queue (Persist); Processing Thread 1 to n take events from the queue and produce results]
How to scale a Streaming Analytics System?
[Figure: scaling with processes: Collecting Process 1 and 2 take events from the Event Stream and put them into Queue 1 to Queue n (Persist); Processing A and Processing B each run as Process 1 and Process 2 and produce results]
How to scale a Streaming Analytics System?
[Figure: scaling with processes and threads: Collecting Process 1 and 2 take events from the Event Stream; Processing A and Processing B run in several processes, each with Thread 1 to n and its own queue Q1 to Qn of events (e)]
How to make Streaming Analytics System reliable?
Faults and stragglers are inevitable in large clusters running big data applications
Streaming applications must recover from them quickly
[Figure: Event Stream → Collecting Process 2 → Processing A and Processing B (Process 2), each with Thread 2 and queue Q2 of events (e)]
How to deal with “Stragglers”
Consumer goes slow. Options:
• Backpressure: other jobs grind to a halt ☹
• Queue up: run out of memory ☹, or spill to disk
• Drop data: no thanks ☹
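The options above can be illustrated with a bounded in-memory buffer from the Python standard library. This is a minimal sketch with an arbitrary buffer size of 3, not a recommendation for production systems. A blocking `put()` on a full queue would correspond to backpressure (the producer waits); `put_nowait()` plus discarding on `queue.Full`, as shown here, corresponds to dropping data.

```python
import queue

buffer = queue.Queue(maxsize=3)  # bounded buffer between producer and slow consumer
dropped = []

for event in range(5):
    try:
        # Non-blocking put: fails immediately when the buffer is full.
        buffer.put_nowait(event)
    except queue.Full:
        # "Drop data" strategy; a blocking buffer.put(event) would instead
        # make the producer wait (backpressure).
        dropped.append(event)

print(buffer.qsize(), dropped)
```

An unbounded queue (`maxsize=0`) avoids both effects but is the "queue up" option: it grows until memory runs out if the consumer never catches up.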
How to make Streaming Analytics System reliable?
Solution 1: using active/passive system (hot replication)
• Both systems process the full load
• In case of a failure, automatically switch and use the “passive” system
• Stragglers slow down both active and passive system
[Figure: two identical pipelines, Active and Passive, both fed by the Event Stream: Collecting Process 2 → Processing A and Processing B (Process 2, Thread 2, queue Q2 of events); each pipeline keeps its own State]
State = State in-memory and/or on-disk
How to make Streaming Analytics System reliable?
Solution 2: Upstream backup
• Nodes buffer sent messages and replay them to a new node in case of failure
• Stragglers are treated as failures
[Figure: Event Stream → Collecting Process 2 → Processing A and Processing B (Process 2, Thread 2, queue Q2 of events); upstream nodes keep a buffer, processing nodes keep State]
State = State in-memory and/or on-disk
buffer = Buffer for replay in-memory and/or on-disk
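A compact Python sketch of upstream backup: the upstream node keeps every sent message in a buffer until it is acknowledged, and replays the unacknowledged ones to a replacement node after a failure. The class and method names are hypothetical, and both the failure and the acknowledgement are simulated.

```python
class UpstreamNode:
    """Keeps sent messages until the downstream node acknowledges them."""

    def __init__(self):
        self.buffer = {}   # msg_id -> payload, kept for replay
        self.next_id = 0

    def send(self, payload, downstream):
        msg_id = self.next_id
        self.next_id += 1
        self.buffer[msg_id] = payload
        downstream.receive(msg_id, payload)
        return msg_id

    def ack(self, msg_id):
        # Downstream confirmed the message is safely processed: drop it.
        self.buffer.pop(msg_id, None)

    def replay(self, new_downstream):
        # Re-send everything that was never acknowledged, in order.
        for msg_id, payload in sorted(self.buffer.items()):
            new_downstream.receive(msg_id, payload)

class DownstreamNode:
    def __init__(self):
        self.state = []

    def receive(self, msg_id, payload):
        self.state.append(payload)

up, node = UpstreamNode(), DownstreamNode()
ids = [up.send(p, node) for p in ["e1", "e2", "e3"]]
up.ack(ids[0])                  # only e1 was acknowledged before the failure
replacement = DownstreamNode()  # simulated: the original node has failed
up.replay(replacement)
print(replacement.state)
```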
Message Delivery Semantics
At most once [0,1]
• Messages may be lost
• Messages never redelivered
At least once [1 .. n]
• Messages will never be lost
• but messages may be redelivered
(might be ok if consumer can handle
it)
Exactly once [1]
• Messages are never lost
• Messages are never redelivered
• Perfect message delivery
• Incurs higher latency for transactional
semantics
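A short Python sketch of why at-least-once delivery "might be ok if the consumer can handle it": redelivery is simulated for messages whose acknowledgement was lost, and an idempotent consumer ignores message ids it has already seen. All names and the message layout (`id`, `amount`) are hypothetical.

```python
def deliver_at_least_once(messages, ack_lost_for):
    """Yield every message; yield it again if its ack was lost (simulated)."""
    for msg in messages:
        yield msg
        if msg["id"] in ack_lost_for:
            yield msg  # redelivery after the missing acknowledgement

class IdempotentConsumer:
    """Processes each message id only once, even if delivered repeatedly."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def handle(self, msg):
        if msg["id"] in self.seen:
            return  # duplicate: already processed
        self.seen.add(msg["id"])
        self.total += msg["amount"]

messages = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
delivered = list(deliver_at_least_once(messages, ack_lost_for={2}))

consumer = IdempotentConsumer()
for m in delivered:
    consumer.handle(m)
print(len(delivered), consumer.total)
```

Without the duplicate check, the total would be 80 instead of 60, which is the kind of error that exactly-once semantics are meant to rule out.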
Streaming Analytics in Architecture
“Traditional Architecture” for Big Data
[Figure: traditional architecture]
• Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
• Data Collection: Channel
• (Analytical) Data Processing: Raw Data (Reservoir), Batch compute, Stage, Result Store, Computed Information, Query Engine
• Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion / Data at Rest
Streaming Analytics Architecture for Big Data
aka. (Complex) Event Processing
[Figure: streaming analytics architecture]
• Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
• Data Collection: Channel
• (Analytical) Real-Time Data Processing: Messaging, Stream/Event Processing, Result Store
• Also shown: Batch compute, Result Store
• Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion / Data at Rest
Keep raw event data
[Figure: streaming analytics architecture that also keeps the raw event data]
• Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
• Data Collection: Channel
• (Analytical) Real-Time Data Processing: Messaging, Stream/Event Processing, Result Store
• (Analytical) Batch Data Processing: Raw Data (Reservoir), Batch compute, Result Store
• Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion / Data at Rest
“Lambda Architecture” for Big Data
[Figure: Lambda architecture]
• Data Sources: Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine
• Data Collection: Channel
• (Analytical) Batch Data Processing: Raw Data (Reservoir), Batch compute, Result Store, Computed Information
• (Analytical) Real-Time Data Processing: Messaging, Stream/Event Processing, Batch compute, Result Store
• Query Engine
• Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion / Data at Rest
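The slide only shows the building blocks. As the Lambda Architecture is commonly described, a query combines a precomputed batch result with a real-time result covering the data that arrived since the last batch run. The following Python sketch illustrates that merge step under this assumption; the view contents, the key names, and the function `query` are hypothetical.

```python
# Result of the last batch run over the raw data (hypothetical event counts per truck)
batch_view = {"truck-98": 120, "truck-99": 80}

# Incremental counts from stream processing for events since that batch run
speed_view = {"truck-99": 3, "truck-35": 1}

def query(key):
    """Answer a query by combining the batch result with the real-time result."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

print(query("truck-98"), query("truck-99"), query("truck-35"))
```

The price of this design is that the same logic has to exist twice, once in the batch path and once in the stream path.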
“Kappa Architecture” for Big Data
[Figure: Kappa architecture]
• Data Sources: Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine
• Data Collection: Messaging
• “Raw Data Reservoir”: Raw Data (Reservoir), Batch compute, Computed Information
• (Analytical) Real-Time Data Processing: Messaging, Stream/Event Processing, Result Store
• Data Consumer: Reports, Service, Analytic Tools, Alerting Tools
• Legend: Data in Motion / Data at Rest
“Unified Architecture” for Big Data
[Diagram. Labels: Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection; Channel; Messaging; (Analytical) Batch Data Processing (Calculate Models of incoming data) (Batch compute, Result Store); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute, Result Store); Query Engine; Result Store; Computed Information; Raw Data (Reservoir); Prediction Models; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
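The unified approach, where batch processing calculates a model from stored data and the stream path applies that model to incoming events, might look like the sketch below. The "model" here is deliberately trivial (a mean plus three standard deviations threshold) and is an assumption chosen for illustration; `train_model`, `score` and the sample values are hypothetical.

```python
import statistics

# Hypothetical historical readings from the raw data reservoir.
HISTORY = [10, 12, 11, 13, 9]


def train_model(history):
    """Batch side: derive a simple anomaly threshold from historical data."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return {"threshold": mean + 3 * std}


def score(model, value):
    """Stream side: apply the precomputed model to a single incoming event."""
    return value > model["threshold"]


model = train_model(HISTORY)

# Incoming events are scored one by one against the batch-computed model.
incoming = [12, 20, 14]
alerts = [value for value in incoming if score(model, value)]
print(alerts)
```

The expensive part (training) runs on data at rest, while the per-event work on data in motion stays a cheap lookup and comparison.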
Summary
More and more use cases (such as IoT) make Streaming Analytics necessary.
Treat events as events! Infrastructures for handling large volumes of events are available.
Platforms such as Oracle Stream Analytics enable the business to work directly on streaming data (empowering the business analyst) => the user experience of an Excel sheet on streaming data.
Platforms such as Apache Storm and Apache Spark Streaming provide a highly scalable and fault-tolerant infrastructure for streaming analytics => Oracle Stream Analytics can use Spark Streaming as its runtime infrastructure.
Platforms such as Kafka provide a high-volume event broker infrastructure, a.k.a. an Event Hub.
Comparison
Feature                   | Oracle Stream Analytics                   | Spark Streaming                      | Apache Storm
Community                 | n.a.                                      | > 280 contributors                   | > 100 contributors
Language Options          | Java, CQL                                 | Java, Scala, Python                  | Java, Clojure, Scala, …
Processing Models         | Event-Streaming                           | Micro-Batching                       | Event-Streaming
Processing DSL            | Yes                                       | Yes                                  | No
Stateful Ops              | Yes                                       | Yes                                  | No
Pattern Detection         | Yes                                       | No                                   | No
Scalability & Reliability | Limited                                   | Yes                                  | Yes
Distributed RPC           | No                                        | No                                   | Yes
Delivery Guarantees       | At Least Once                             | Exactly Once                         | At Most Once / At Least Once
Latency                   | Sub-second                                | Seconds                              | Sub-second
"Self-service" for Biz    | Yes                                       | No                                   | No
Platform                  | OEP server, Spark Streaming (YARN, Mesos) | YARN, Mesos, Standalone, DataStax EE | Storm Cluster, YARN
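The difference between the two processing models in the comparison, event streaming and micro-batching, can be sketched in a few lines. This is a simplified illustration only and does not use the Spark or Storm APIs; `EVENTS`, `event_at_a_time` and `micro_batches` are hypothetical names, and the events are assumed to arrive ordered by timestamp.

```python
from itertools import groupby

# Hypothetical time-ordered events: (timestamp_in_seconds, payload).
EVENTS = [(0.1, "a"), (0.5, "b"), (1.2, "c"), (2.7, "d"), (2.9, "e")]


def event_at_a_time(events, handle):
    """Event streaming: every event is handled as soon as it arrives."""
    for event in events:
        handle(event)


def micro_batches(events, interval):
    """Micro-batching: collect events per time interval, then process each batch."""
    for _, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield list(group)


handled = []
event_at_a_time(EVENTS, handled.append)

batches = [[payload for _, payload in batch] for batch in micro_batches(EVENTS, 1.0)]
print(len(handled), batches)
```

With micro-batching, an event waits until its interval closes before it is processed, which is one reason the table lists latency in seconds for that model and sub-second for the event-streaming engines.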
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com

 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 

Introduction to Streaming Analytics

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH Introduction to Streaming Analytics Guido Schmutz
  • 2. Guido Schmutz Working for Trivadis for more than 19 years Oracle ACE Director for Fusion Middleware and SOA Co-author of several books Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data Member of Trivadis Architecture Board Technology Manager @ Trivadis More than 25 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Twitter: gschmutz
  • 3. Our company. © Trivadis – The Company, 03.06.16 Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and and Open Source technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: Trivadis Services takes over the interacting operation of your IT systems. OPERATION
  • 4. COPENHAGEN MUNICH LAUSANNE BERN ZURICH BRUGG GENEVA HAMBURG DÜSSELDORF FRANKFURT STUTTGART FREIBURG BASEL VIENNA With over 600 specialists and IT experts in your region. 14 Trivadis branches and more than 600 employees 200 Service Level Agreements Over 4,000 training participants Research and development budget: CHF 5.0 million Financially self-supporting and sustainably profitable Experience from more than 1,900 projects per year at over 800 customers
  • 5. Agenda 1. Introduction & Foundation 2. Designing Streaming Analytics Solutions 3. Implementing Event Hub 4. Implementing Data Ingestion 5. Implementing Streaming Analytics 6. Scalability & Reliability 7. Streaming Analytics in Architecture 8. Summary
  • 7. Big Data Definition (4 Vs) + Time to action ? – Big Data + Real-Time = Stream Processing Characteristics of Big Data: Its Volume, Velocity and Variety in combination
  • 8. The world is changing … The model of generating/consuming data has changed … Old Model: few companies are generating data, all others are consuming data New Model: all of us are generating data, and all of us are consuming data
  • 9. Who is generating Big Data? Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize and discover knowledge from the collected data in a timely manner and in a scalable fashion Social media and networks (all of us are generating data) Scientific instruments (collecting all sorts of data) Mobile devices (tracking all objects all the time) Sensor technology and networks (measuring all kinds of data)
  • 10. Traditional Data Processing - Challenges • Introduces too much “decision latency” • Responses are delivered “after the fact” • Maximum value of the identified situation is lost • Decisions are made on old and stale data • “Data at Rest”
  • 11. The New Era: Streaming Data Analytics / Fast Data • Events are analyzed and processed in real-time as they arrive • Decisions are timely, contextual and based on fresh data • Decision latency is eliminated • “Data in motion”
  • 12. Real Time Analytics Use Cases • Algorithmic Trading • Online Fraud Detection • Geo Fencing • Proximity/Location Tracking • Intrusion detection systems • Traffic Management • Recommendations • Churn detection • Internet of Things (IoT) / Intelligent Sensors • Social Media/Data Analytics • Gaming Data Feed • …
  • 13. What happens in an internet minute
  • 14. Internet Of Things – Sensors are/will be everywhere There are more devices tapping into the internet than people on earth How do we prepare our systems/architecture for the future? Source: Cisco Source: The Economist
  • 15. Different Types of Stream/Event Processing Simple Event Processing (SEP) Event Stream Processing (ESP)
  • 16. Different Types of Stream/Event Processing Complex Event Processing (CEP)
  • 17. Native Streaming vs. Micro-Batching Native Streaming • Events processed as they arrive • + low latency • - lower throughput • - fault tolerance is expensive Micro-Batching • Splits incoming stream into small batches • + high(er) throughput • + easier fault tolerance • - higher latency Source: Distributed Real-Time Stream Processing: Why and How by Petr Zapletal
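The difference between the two models on slide 17 can be illustrated with a minimal Python sketch. It is not tied to any particular engine: the function names are hypothetical, and batches are cut by event count here only to keep the example deterministic, whereas real micro-batching engines typically cut batches by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_native(events: Iterable[int], handle: Callable[[int], None]) -> None:
    """Native streaming: hand each event to the handler as soon as it arrives."""
    for event in events:
        handle(event)


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Micro-batching: buffer events and emit them in small groups.

    Assumption of this sketch: batches are cut by count, not by time.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


seen: List[int] = []
process_native(range(5), seen.append)  # one handler call per event

batches = list(micro_batches(range(5), batch_size=2))  # events grouped first
print(seen)
print(batches)
```

An event in the micro-batch variant waits until its batch is complete, which is where the extra latency comes from; in exchange, per-batch overhead is shared by several events.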
  • 18. How to design a Streaming Analytics Solution? Event Stream event Data Ingestion event Persist (Queue) Event Stream event Data Ingestion event Analytics event Analytics result result Event Stream event Data Ingestion/ Analytics result
  • 19. Demo Use Case – Truck Sensors Truck Data Ingestion Geo-Fencing 2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631 {"timestamp": "2016-06-02 14:39:56.991", "truckId": 99, "driverId": 31, "driverName": "Rommel Garcia", "routeId": 1565885487, "routeName": "Springfield to KC Via Hanibal", "eventType": "Normal", "latitude": 37.16, "longitude": "-94.46", "correlationId": 5187297736652502631} Reckless Driving Detector NEAR ENTER Truck Driver Dashboard Movement Movement JSON Reckless Driver
  • 21. How to design a Streaming Analytics System? It usually starts very simple … just one data pipeline Event Stream Analytics event Data Ingestion
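The ingestion step in the demo turns the pipe-delimited truck record into a JSON document. A minimal Python sketch of that conversion follows. The field names are taken from the JSON sample on the slide; the function name `to_json` is hypothetical, and latitude and longitude are emitted as numbers here, whereas the slide's sample carries longitude as a string.

```python
import json

# Field order follows the pipe-delimited sample and the JSON sample on the slide.
FIELDS = [
    "timestamp", "truckId", "driverId", "driverName", "routeId",
    "routeName", "eventType", "latitude", "longitude", "correlationId",
]
INT_FIELDS = {"truckId", "driverId", "routeId", "correlationId"}
FLOAT_FIELDS = {"latitude", "longitude"}


def to_json(line: str) -> str:
    """Convert one pipe-delimited truck record into a JSON document."""
    values = [value.strip() for value in line.split("|")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FIELDS, values):
        if name in INT_FIELDS:
            record[name] = int(value)
        elif name in FLOAT_FIELDS:
            record[name] = float(value)  # assumption: numeric, not string
        else:
            record[name] = value
    return json.dumps(record)


raw = ("2016-06-02 14:39:56.605|98|27|Mark Lochbihler|803014426|"
       "Wichita to Little Rock Route 2|Normal|38.65|-90.21|5187297736652502631")
movement = json.loads(to_json(raw))
print(movement["truckId"], movement["eventType"])
```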
  • 22. New Event Stream sources are added … Event Stream Analytics 2nd Event Stream 3rd Event Stream nth Event Stream event event event event Data Ingestion 2nd Data Ingestion 3rd Data Ingestion Nth Data Ingestion
  • 23. New Processors are interested in the events … Event Stream Analytics 2nd Event Stream 3rd Event Stream nth Event Stream 2nd Analytics event event event event Data Ingestion 2nd Data Ingestion 3rd Data Ingestion Nth Data Ingestion
  • 24. … and the solution becomes the problem Event Stream Analytics 2nd Event Stream 3rd Event Stream nth Event Stream 2nd Analytics 3rd Analytics Nth Analytics event event event event Data Ingestion 2nd Data Ingestion 3rd Data Ingestion Nth Data Ingestion
  • 26. … and the solution becomes the problem New Customers Operational Logs Click Stream Meter Readings event event event event CDC Ingestion Log Ingestion Click Stream Ingestion Sensor Ingestion Hadoop/Data Warehouse Recommendation System Log Search Fraud Detection
  • 27. Decouple event streams from consumers "Unified Log" Remember Enterprise Service Bus (ESB)? Enterprise Event Bus Event Stream Analytics Event Stream Ingestion CDC Ingestion Log Ingestion Click Stream Ingestion Sensor Ingestion Hadoop/Data Warehouse Recommendation System Log Search Fraud Detection What is the idea of a Unified Log? New Customers Operational Logs Click Stream Meter Readings
  • 28. Unified Log – What is it? By Unified Log, we do not mean this … 137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114 137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747 137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 - 137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 - 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 - 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809 … but this • a structured log (records are numbered beginning with 0, in the order in which they are written) • also known as a commit log or journal 0 1 2 3 4 5 6 7 8 9 10 11 1st record Next record written
  • 29. Central Unified Log for (real-time) subscription Take all the organization’s data (events) and put it into a central log for subscription Properties of the Unified Log: • Unified: “Enterprise”, single deployment • Append-Only: events are appended, no update in place => immutable • Ordered: each event has an offset, which is unique within a shard • Fast: should be able to handle thousands of messages / sec • Distributed: lives on a cluster of machines 0 1 2 3 4 5 6 7 8 9 10 11 reads writes Collector Consumer System A (time = 6) Consumer System B (time = 10) reads
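The properties listed on slides 28 and 29 (append-only, ordered by offset, consumers reading at their own position) can be sketched in a few lines of Python. This is an in-memory, single-shard illustration only; the class names `UnifiedLog` and `Consumer` are hypothetical and nothing here is distributed or persistent.

```python
from typing import Any, List


class UnifiedLog:
    """In-memory, append-only log; records are numbered from 0 in write order."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Append a record and return its offset. There is no update in place."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int) -> List[Any]:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position, independent of other consumers."""

    def __init__(self, log: UnifiedLog) -> None:
        self.log = log
        self.offset = 0  # next offset to read

    def poll(self, max_records: int) -> List[Any]:
        records = self.log.read(self.offset, max_records)
        self.offset += len(records)
        return records


log = UnifiedLog()
for i in range(12):  # the collector writes offsets 0..11, as in the diagram
    log.append(f"event-{i}")

system_a = Consumer(log)
system_b = Consumer(log)
system_a.poll(6)   # System A has consumed offsets 0..5, next is 6
system_b.poll(10)  # System B has consumed offsets 0..9, next is 10
print(system_a.offset, system_b.offset)
```

Because the log itself never changes existing records, a slow consumer such as System A does not affect System B; each simply continues from its own offset.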
  • 31. Apache Kafka - Overview Distributed publish-subscribe messaging system Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …) Initially developed at LinkedIn, now part of Apache Does not use JMS API and standards Kafka maintains feeds of messages in topics Kafka Cluster Consumer Consumer Consumer Producer Producer Producer
  • 32. Apache Kafka - Motivation LinkedIn’s motivation for Kafka was: • “A unified platform for handling all the real-time data feeds a large company might have.” Must haves • High throughput to support high volume event feeds. • Support real-time processing of these feeds to create new, derived feeds. • Support large data backlogs to handle periodic ingestion from offline systems. • Support low-latency delivery to handle more traditional messaging use cases. • Guarantee fault-tolerance in the presence of machine failures.
  • 33. Apache Kafka - Architecture Kafka Broker Movement Processor Movement Topic Engine-Metrics Topic 1 2 3 4 5 6 Engine Processor 1 2 3 4 5 6 Truck
  • 34. Apache Kafka - Architecture Kafka Broker Movement Processor Movement Topic Engine-Metrics Topic 1 2 3 4 5 6 Engine Processor Partition 0 1 2 3 4 5 6 Partition 0 1 2 3 4 5 6 Partition 1 Movement Processor Truck
  • 35. Apache Kafka Kafka Broker Movement Processor Truck Movement Topic Engine-Metrics Topic Engine Processor P 0 Movement Processor 1 2 3 4 5 P 1 1 2 3 4 5 Kafka Broker Movement Topic Engine-Metrics Topic P 0 1 2 3 4 5 P 1 1 2 3 4 5 P 0 1 2 3 4 5 P 0 1 2 3 4 5
  • 36. Apache Kafka - Partition offsets Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset • Consumers track their pointers via (offset, partition, topic) tuples Consumer group C1
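The offset bookkeeping described on slide 36 can be modelled without a broker. The Python sketch below is an in-memory illustration, not the Kafka client API: `Topic` and `ConsumerGroup` are hypothetical classes, and `zlib.crc32` is used as a stand-in hash for choosing a partition, which is an assumption of this sketch rather than Kafka's own partitioner.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class Topic:
    """A topic split into partitions; offsets are sequential per partition."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> Tuple[int, int]:
        """Append to the partition chosen by key; return (partition, offset)."""
        # Stand-in hash for this sketch only.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1


class ConsumerGroup:
    """Tracks the next offset to read per (topic, partition)."""

    def __init__(self, group_id: str) -> None:
        self.group_id = group_id
        self.positions: Dict[Tuple[str, int], int] = defaultdict(int)

    def poll(self, topic: Topic, partition: int) -> List[str]:
        position = self.positions[(topic.name, partition)]
        records = topic.partitions[partition][position:]
        self.positions[(topic.name, partition)] = position + len(records)
        return records


movement = Topic("movement", num_partitions=2)
results = [movement.send(key=f"truck-{i % 3}", value=f"m{i}") for i in range(9)]

c1 = ConsumerGroup("C1")
first = c1.poll(movement, 0)   # everything currently in partition 0
second = c1.poll(movement, 0)  # nothing new since the last poll
print(results)
print(dict(c1.positions))
```

Messages with the same key always land in the same partition, so their relative order is preserved by the per-partition offsets.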
  • 37. Apache Kafka - Performance Kafka at LinkedIn => over 1100 brokers / 60 clusters • 800 billion messages/day • 175 TB produced/day • 650 TB consumed/day • 13 million messages/second and 2.75 GB/second at busiest time of day Kafka performance at own setup => 6 brokers (VM) / 1 cluster • 445,622 messages/second • 31 MB/second • 3.0405 ms average latency between producer and consumer Sources: http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines https://engineering.linkedin.com/kafka/running-kafka-scale
  • 38. Demo Use Case – Truck Sensors (same diagram and sample records as slide 19)
  • 40. Demo: Monitoring Kafka Cluster with Kafka Manager
  • 42. StreamSets Data Collector • Founded by ex-Cloudera, Informatica employees • Continuous open source, intent-driven, big data ingest • Visible, record-oriented approach fixes combinatorial explosion • Batch or stream processing • Standalone, Spark cluster, MapReduce cluster • IDE for pipeline development by ‘civilians’ • Relatively new - first public release September 2015 • So far, vast majority of commits are from StreamSets staff
  • 43. Apache NiFi • Originated at NSA as NiagaraFiles • Open sourced December 2014, Apache TLP July 2015 • Opaque, file-oriented payload • Distributed system of processors with centralized control • Based on flow-based programming concepts • Data Provenance • Web-based user interface
  • 44. Demo Use Case – Truck Sensors (same diagram and sample records as slide 19)
  • 45. Demo: Using Apache NiFi for Collection
  • 49. History of Oracle Stream Analytics Oracle Complex Event Processing (OCEP) Oracle Event Processing (OEP) Oracle Stream Explorer (SX) Oracle Event Processing for Java Embedded Oracle Stream Analytics (OSA) Oracle Edge Analytics (OEA) BEA Weblogic Event Server Oracle CQL Oracle IoT Cloud Service 2016 2015 2007 2008 2012 2013
  • 50. OEA • Filtering • Correlation • Aggregation • Pattern matching Devices / Gateways Services Computing Edge Enterprise “Sea of data” Macro-event High-value Actionable In-context EDGE Analytics Stream Analytics FOG • High Volume • Continuous Streaming • Extreme Low Latency • Disparate Sources • Temporal Processing • Pattern Matching • Machine Learning Oracle Stream Analytics: From Noise to Value • High Volume • Continuous Streaming • Sub-Millisecond Latency • Disparate Sources • Time-Window Processing • Pattern Matching • High Availability / Scalability • Coherence Integration • Geospatial, Geofencing • Big Data Integration • Business Event Visualization • Action!
  • 51. Oracle Stream Analytics Platform What it does • Compelling, friendly and visually stunning real time streaming analytics user experience for Business users to dynamically create and implement Instant Insight solutions Key Features • Analyze simulated or live data feeds to determine event patterns, correlation, aggregation & filtering • Pattern library for industry specific solutions • Streams, References, Maps & Explorations Benefits • Accelerated delivery time • Hides all challenges & complexities of underlying real-time event-driven infrastructure
  • 52. Oracle Stream Analytics - Connecting Everything & Anything of Interest to the Business Understanding of CQL Filtering, Correlation, Pattern: NOT NEEDED Understanding of IT Deployment and Management: NOT NEEDED Understanding of Development, Java, Best Practices: NOT NEEDED Understanding of the Event Driven Platform: NOT NEEDED
  • 53. Business accessibility to Geo-Streaming Analytics Real Time Streaming Solutions face an increasing need to track "assets of interest" and initiate actions based on encroachment of boundary proximity to fixed and moving objects and other geographic, temporal, or event conditions. Geo-Fence, Fence, Polygon Geo-Streaming
  • 54. “Add value to your real time streaming data discovery and analytics by applying and including mathematical and statistical analysis to the live output stream” “These streaming ‘Excel spreadsheets’ really do come to life” Expression Builder enabling calculation for the Business User
  • 55. Concept of Connections & Connection Reuse in Streams
  • 56. Decision Table for Nested IF-THEN-ELSE Rules
  • 57. Topology View and Navigation
  • 58. Stream Analytics – Terminology for Business Users Explorer: The Application User Interface Catalog: The repository for browsing resources
  • 59. Stream Analytics – Terminology for Business Users Stream: incoming flow of events that you want to analyze (CSV, Kafka, JMS, REST, MQTT, …) Exploration: application that correlates events from streams and data sources, using filters, groupings, summaries, ranges, and more
  • 60. Stream Analytics – Terminology for Business Users Shape: A blueprint of an event in a stream or data in a data source. How the business data is represented in the selected stream Map: collection of geo-fences Reference: A connection to static data that is joined to a stream to enrich it and/or to be used in business logic and output
  • 61. Stream Analytics – Terminology for Business Users Pattern: A pre-built Exploration that addresses a particular business scenario in a focused and simplified User Interface Connection: collection of metadata required to connect to an external system Targets: defines an interface with a downstream system
  • 62. Demo Use Case – Truck Sensors (same diagram and sample records as slide 19)
  • 63. Demo: Oracle Stream Analytics
  • 68. Apache Spark Apache Spark is a fast and general engine for large-scale data processing • The hot trend in Big Data! • Originally developed in 2009 at UC Berkeley’s AMPLab • Based on 2007 Microsoft Dryad paper • Written in Scala, supports Java, Python, SQL and R • Can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk • One of the largest OSS communities in big data with over 200 contributors in 50+ organizations • Open sourced in 2010 – since 2014 part of the Apache Software Foundation
  • 70. Resilient Distributed Dataset (RDD) Are • Immutable • Re-computable • Fault tolerant • Reusable Have Transformations • Produce new RDD • Rich set of transformations available • filter(), flatMap(), map(), distinct(), groupBy(), union(), join(), sortByKey(), reduceByKey(), subtract(), ... Have Actions • Start cluster computing operations • Rich set of actions available • collect(), count(), fold(), reduce(), …
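The split between lazy transformations and eager actions on slide 70 can be illustrated with a toy class. The sketch below is plain Python and deliberately not the Spark API: `MiniRDD` is a hypothetical stand-in that borrows a few method names from the slide, runs on a single machine, and has no partitions or fault tolerance.

```python
from typing import Any, Callable, Iterable, List


class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, re-computable from its source."""

    def __init__(self, compute: Callable[[], Iterable[Any]]) -> None:
        self._compute = compute  # recipe for producing the data, not the data

    @classmethod
    def parallelize(cls, data: Iterable[Any]) -> "MiniRDD":
        snapshot = tuple(data)
        return cls(lambda: iter(snapshot))

    # Transformations: return a new MiniRDD, nothing is computed yet.
    def map(self, f: Callable[[Any], Any]) -> "MiniRDD":
        return MiniRDD(lambda: (f(x) for x in self._compute()))

    def filter(self, predicate: Callable[[Any], bool]) -> "MiniRDD":
        return MiniRDD(lambda: (x for x in self._compute() if predicate(x)))

    def flatMap(self, f: Callable[[Any], Iterable[Any]]) -> "MiniRDD":
        return MiniRDD(lambda: (y for x in self._compute() for y in f(x)))

    def reduceByKey(self, f: Callable[[Any, Any], Any]) -> "MiniRDD":
        def compute() -> Iterable[Any]:
            acc = {}
            for key, value in self._compute():
                acc[key] = f(acc[key], value) if key in acc else value
            return iter(acc.items())
        return MiniRDD(compute)

    # Actions: trigger the computation and return a result.
    def collect(self) -> List[Any]:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


lines = MiniRDD.parallelize(["a b", "b c b"])
counts = (lines.flatMap(str.split)
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))  # still nothing computed
print(sorted(counts.collect()))  # the action runs the whole chain
```

Because each object only stores how to derive its data from its parent, calling an action twice recomputes the chain from the source, which is the idea behind "re-computable".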
  • 71. RDD RDD Input Source • File • Database • Stream • Collection .count() -> 100 Data
  • 75. Spark Workflow Stage 1 – flatMap() + map() Stage 2 – reduceByKey() Input HDFS File HadoopRDD MappedRDD ShuffledRDD Text File Output sc.hadoopFile() map() reduceByKey() saveAsTextFile() Transformations (Lazy) Action (Execute Transformations) Master MappedRDD P0 P1 P3 ShuffledRDD P0 MappedRDD flatMap() DAG Scheduler
  • 77. Stage 1 – flatMap() + map() Spark Execution Model Data Storage Worker Master Executor Data Storage Worker Executor Data Storage Worker Executor RDD P0 P1 P3 Narrow Transformation Master filter() map() sample() flatMap() Data Storage Worker Executor
  • 78. Stage 2 – reduceByKey() Spark Execution Model Data Storage Worker Executor Data Storage Worker Executor RDD P0 Wide Transformation Master join() reduceByKey() union() groupByKey() Shuffle ! Data Storage Worker Executor Data Storage Worker Executor
  • 79. Batch vs. Real-Time Processing Petabytes of Data Gigabytes Per Second
  • 84. Discretized Stream (DStream) DStream DStream X Seconds Transform .countByValue() .reduceByKey() .join .map
  • 85. Discretized Stream (DStream) time 1 time 2 time 3 message time n…. f(message 1) RDD @time 1 f(message 2) f(message n) …. message 1 RDD @time 1 message 2 message n …. result 1 result 2 result n …. message message message f(message 1) RDD @time 2 f(message 2) f(message n) …. message 1 RDD @time 2 message 2 message n …. result 1 result 2 result n …. f(message 1) RDD @time 3 f(message 2) f(message n) …. message 1 RDD @time 3 message 2 message n …. result 1 result 2 result n …. f(message 1) RDD @time n f(message 2) f(message n) …. message 1 RDD @time n message 2 message n …. result 1 result 2 result n …. Input Stream Event DStream MappedDStream map() saveAsHadoopFiles() Time Increasing DStreamTransformation Lineage Actions Trigger Spark Jobs Adapted from Chris Fregly: http://slidesha.re/11PP7FV
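The idea behind slides 84 and 85, a stream cut into one small dataset per batch interval with the same operation applied to each, can be sketched in plain Python. This is not Spark Streaming code: `discretize` is a hypothetical helper, the timestamps are seconds since the start of the stream, and event types other than "Normal" are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def discretize(events: List[Tuple[float, str]],
               batch_seconds: float) -> Dict[int, List[str]]:
    """Group (timestamp, message) pairs into consecutive batch intervals."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, message in events:
        batches[int(timestamp // batch_seconds)].append(message)
    return dict(sorted(batches.items()))


# Hypothetical event types, apart from "Normal" which appears in the demo data.
events = [
    (0.2, "Normal"), (0.9, "Overspeed"),
    (1.1, "Normal"), (1.5, "Normal"),
    (2.7, "Lane Departure"),
]

batches = discretize(events, batch_seconds=1.0)

# The same operation (here a count by value) is applied to every batch.
counts = {interval: Counter(messages) for interval, messages in batches.items()}
print(counts)
```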
  • 86. Demo Use Case – Truck Sensors (same diagram and sample records as slide 19)
  • 88. Apache Storm A platform for doing analysis on streams of data as they come in, so you can react to data as it happens. • highly distributed real-time computation system • Provides general primitives to do real-time computation • To simplify working with queues & workers • scalable and fault-tolerant Originated at BackType, acquired by Twitter in 2011 Open sourced late 2011 Part of Apache since September 2013
  • 89. Apache Storm – Core concepts Tuple • Immutable Set of Key/value pairs Stream • an unbounded sequence of tuples that can be processed in parallel by Storm Topology • Wires data and functions via a DAG (directed acyclic graph) • Executes on many machines similar to a MR job in Hadoop Spout • Source of data streams (tuples) • can be run in “reliable” and “unreliable” mode Bolt • Consumes 1+ streams and produces new streams • Complex operations often require multiple steps and thus multiple bolts Spout Spout Bolt Bolt Bolt Bolt Source of Stream B Subscribes: A Emits: C Subscribes: A Emits: D Subscribes: A & B Emits: - Subscribes: C & D Emits: - T T T T T T T T
  • 90. Demo Use Case – Truck Sensors (same diagram and sample records as slide 19)
  • 91. Apache Storm – How does it work? Geo Hashing Trucks Movement Geo Hashing { "timestamp" : "2016-06-02 Shuffle Grouping Geo Hashing { "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91"} Truck Movement { "timestamp" : "2016-06-02 "geohash" : "dp206n3d",
  • 92. Apache Storm – How does it work? Geo Hashing Trucks Movement GeoFencer Geo Hashing GeoFencer Geo Hashing Shuffle Grouping Fields Grouping Truck Movement { "timestamp" : "2016-06-02 { "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91"} { "geohash" : "dp206n3d", "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91"} { "geohash" : "f00hfh99", .. { "timestamp" : "2016-06-02
  • 93. Apache Storm – How does it work? Geo Hashing Trucks Movement GeoFencer Geo Hashing GeoFencer Alerter Geo Hashing Shuffle Grouping Fields Grouping Global Grouping Truck Movement { "timestamp" : "2016-06-02 { "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91"} { "geohash" : "dp206n3d", "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "driverName" : "Michael Aube", "routeId" : 1090292248, "eventType" : "Normal", "latitude" : 40.86, "longitude" : "-89.91"} { "timestamp" : "2016-06-02 { "timestamp" : "2016-06-02 12:56:02.362", "truckId" : 35, "driverId" : 26, "latitude" : 40.86, "longitude" : "-89.91"} { "geohash" : "f00hfh99", ..
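The Geo Hashing step in slides 91 to 93 enriches each movement event with a geohash derived from latitude and longitude. The deck does not show how the demo computes it; the Python sketch below implements the standard public geohash encoding (alternating longitude and latitude bisection, five bits per base-32 character) as an illustration. The function name and the default precision of 8 characters are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(latitude: float, longitude: float, precision: int = 8) -> str:
    """Encode a position as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_longitude:
            interval, value = lon_range, longitude
        else:
            interval, value = lat_range, latitude
        middle = (interval[0] + interval[1]) / 2
        if value >= middle:
            bits = (bits << 1) | 1
            interval[0] = middle  # keep the upper half
        else:
            bits <<= 1
            interval[1] = middle  # keep the lower half
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Position of the sample movement event on the slides.
print(geohash(40.86, -89.91))
```

Nearby positions share a common prefix, which is what makes the geohash usable as the field for the subsequent fields grouping: events from the same area are routed to the same GeoFencer task.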
  • 94. Apache Storm – Core concepts Each spout or bolt runs N instances in parallel • Shuffle grouping: random grouping • Fields grouping: grouped by value, so that equal values go to the same task • All grouping: replicates to all tasks • Global grouping: all tuples go to one task • None grouping: the bolt runs in the same thread as the bolt/spout it subscribes to • Direct grouping: the producer (the task that emits) controls which consumer will receive the tuple • Local or shuffle grouping: similar to shuffle grouping, but shuffles tuples among bolt tasks running in the same worker process, if any; otherwise falls back to shuffle grouping behavior Diagram: Trucks Movement, GeoHashing … nth, GeoFencing 1st … nth, Report; groupings Shuffle, Fields, Global
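How a grouping decides which task receives a tuple can be shown with a small Python sketch. It only models the routing decision for four of the groupings listed on slide 94 and is not Storm code: the function names are hypothetical, and `zlib.crc32` is used as a stable stand-in hash for the fields grouping.

```python
import random
import zlib
from typing import List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    """Shuffle grouping: pick a random task for each tuple."""
    return rng.randrange(num_tasks)


def fields_grouping(field_value: str, num_tasks: int) -> int:
    """Fields grouping: equal field values always map to the same task."""
    # Stand-in hash, chosen for this sketch because it is stable across runs.
    return zlib.crc32(field_value.encode("utf-8")) % num_tasks


def all_grouping(num_tasks: int) -> List[int]:
    """All grouping: the tuple is replicated to every task."""
    return list(range(num_tasks))


def global_grouping() -> int:
    """Global grouping: every tuple goes to one single task (task 0 here)."""
    return 0


rng = random.Random(42)
geofencer_tasks = 4
for geohash_value in ["dp206n3d", "f00hfh99", "dp206n3d"]:
    print(geohash_value,
          "shuffle ->", shuffle_grouping(geofencer_tasks, rng),
          "fields ->", fields_grouping(geohash_value, geofencer_tasks))
```

In the truck demo, grouping the GeoFencer input by the geohash field means all events for one area reach the same task, so that task can keep per-area state locally.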
  • 96. How to scale a Streaming Analytics System? [Diagram: an event stream is read by collecting threads 1 … n, events pass through a persisted queue, and processing threads 1 … n emit results.]
  • 97. How to scale a Streaming Analytics System? [Diagram: the event stream is read by several collecting processes, events pass through multiple persisted queues (Queue 1 … Queue n), and several processing processes emit results.]
  • 98. How to scale a Streaming Analytics System? [Diagram: collecting processes 1 and 2 read the event stream; processing stages A and B each run in processes 1 and 2 with threads 1 … n, each thread with its own queue (Q1 … Qn).]
  • 99. How to make a Streaming Analytics System reliable? Faults and stragglers are inevitable in large clusters running big data applications. Streaming applications must recover from them quickly. [Diagram: two copies of the pipeline: event stream, collecting process, processing stages A and B with per-thread queues.]
  • 100. How to deal with "Stragglers" When a consumer goes slow, the options are: backpressure, queue up, drop data, or spill to disk. Drawbacks noted on the slide: other jobs grind to a halt, run out of memory, "no thanks".
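Two of these options, backpressure and dropping data, can be sketched with a bounded in-memory buffer. This is only an illustration using Python's standard `queue` module, not the mechanism of any particular streaming engine; the function name `offer` and the policy strings are hypothetical.

```python
import queue


def offer(buffer: queue.Queue, event, policy: str, timeout=None) -> bool:
    """Hand an event to a bounded buffer in front of a slow consumer.

    policy "backpressure": block the producer until space is available
                           (raises queue.Full if `timeout` seconds elapse).
    policy "drop":         discard the event when the buffer is full.
    Returns True if the event was buffered, False if it was dropped.
    """
    if policy == "backpressure":
        buffer.put(event, block=True, timeout=timeout)
        return True
    if policy == "drop":
        try:
            buffer.put_nowait(event)
            return True
        except queue.Full:
            return False
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # A Queue created without maxsize would be unbounded ("queue up"),
    # which risks running out of memory when the consumer stays slow.
    buf = queue.Queue(maxsize=2)
    print([offer(buf, i, "drop") for i in range(4)])
```

Backpressure keeps every event but slows the producer down to the consumer's pace; dropping keeps the producer fast at the cost of losing events.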
  • 101. How to make a Streaming Analytics System reliable? Solution 1: use an active/passive system (hot replication) • Both systems process the full load • In case of a failure, automatically switch over and use the "passive" system • Stragglers slow down both the active and the passive system. Legend: State = state held in memory and/or on disk. [Diagram: an active and a passive copy of the pipeline (collecting process, processing stages A and B with per-thread queues), each with its own state, both fed by the event stream.]
  • 102. How to make a Streaming Analytics System reliable? Solution 2: upstream backup • Nodes buffer the messages they have sent and replay them to a new node in case of failure • Stragglers are treated as failures. Legend: State = state held in memory and/or on disk; buffer = buffer for replay, held in memory and/or on disk. [Diagram: event stream, collecting process, processing stages A and B with per-thread queues and state.]
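The upstream-backup idea can be sketched in a few lines: the sending node keeps every message until the downstream node acknowledges it, and replays whatever is still unacknowledged to a replacement node. The class `UpstreamBuffer`, its methods, and the use of plain lists as stand-ins for downstream nodes are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class UpstreamBuffer:
    """Hypothetical sender-side buffer for upstream backup."""

    def __init__(self):
        self._unacked = OrderedDict()  # message id -> payload, in send order
        self._next_id = 0

    def send(self, payload, downstream: list) -> int:
        """Deliver a message and remember it until it is acknowledged."""
        msg_id = self._next_id
        self._next_id += 1
        self._unacked[msg_id] = payload
        downstream.append((msg_id, payload))
        return msg_id

    def ack(self, msg_id: int) -> None:
        """Downstream has safely processed the message; drop it from the buffer."""
        self._unacked.pop(msg_id, None)

    def replay(self, new_downstream: list) -> int:
        """After a failure, resend all unacknowledged messages to a new node."""
        for msg_id, payload in self._unacked.items():
            new_downstream.append((msg_id, payload))
        return len(self._unacked)


if __name__ == "__main__":
    upstream = UpstreamBuffer()
    node_a = []  # stand-in for the original downstream node
    first = upstream.send("event-1", node_a)
    upstream.send("event-2", node_a)
    upstream.ack(first)  # node A confirmed event-1, then fails
    node_b = []  # stand-in for the replacement node
    upstream.replay(node_b)
    print(node_b)
```

Because replayed messages may already have been partially processed before the failure, this scheme naturally leads to at-least-once delivery unless the receiver de-duplicates.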
  • 103. Message Delivery Semantics
      At most once [0,1] • Messages may be lost • Messages are never redelivered
      At least once [1 .. n] • Messages are never lost • Messages may be redelivered (might be OK if the consumer can handle it)
      Exactly once [1] • Messages are never lost • Messages are never redelivered • Perfect message delivery • Incurs higher latency for transactional semantics
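The remark that redelivery "might be OK if the consumer can handle it" usually means making the consumer idempotent. The sketch below simulates at-least-once delivery with a duplicate and shows a consumer that ignores message ids it has already seen. All names (`IdempotentConsumer`, `deliver_at_least_once`, `redeliver_ids`) are hypothetical, and the in-memory set of seen ids is an assumption made to keep the example small.

```python
class IdempotentConsumer:
    """Hypothetical consumer that tolerates redelivered messages."""

    def __init__(self):
        self.seen_ids = set()  # ids of messages already applied
        self.total = 0

    def handle(self, message_id: int, amount: int) -> bool:
        """Apply the message once; return False for a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += amount
        return True


def deliver_at_least_once(messages, consumer, redeliver_ids) -> int:
    """Simulate at-least-once delivery: ids in `redeliver_ids` arrive twice.

    Returns the number of delivery attempts.
    """
    attempts = 0
    for message_id, amount in messages:
        consumer.handle(message_id, amount)
        attempts += 1
        if message_id in redeliver_ids:  # e.g. the ack was lost, so the broker resends
            consumer.handle(message_id, amount)
            attempts += 1
    return attempts


if __name__ == "__main__":
    consumer = IdempotentConsumer()
    attempts = deliver_at_least_once([(1, 10), (2, 20), (3, 30)], consumer, {2})
    print(attempts, consumer.total)
```

The combination of at-least-once delivery and an idempotent consumer gives the effect of exactly-once processing without transactional delivery, at the cost of tracking the ids already seen.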
  • 104. Streaming Analytics in Architecture
  • 105. "Traditional Architecture" for Big Data [Diagram components: Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection (Channel, Stage); Raw Data (Reservoir); (Analytical) Data Processing (Batch compute); Result Store (Computed Information); Query Engine; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 106. Streaming Analytics Architecture for Big Data, a.k.a. (Complex) Event Processing [Diagram components: Data Sources (Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine); Data Collection (Channel); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute); Messaging; Result Store; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 107. Keep raw event data [Diagram components: Data Sources (Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine); Data Collection (Channel); (Analytical) Real-Time Data Processing (Stream/Event Processing); Messaging; Result Store; (Analytical) Batch Data Processing (Batch compute) with Raw Data (Reservoir); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 108. "Lambda Architecture" for Big Data [Diagram components: Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection (Channel); (Analytical) Batch Data Processing (Batch compute) with Raw Data (Reservoir) and Result Store (Computed Information); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute) with Messaging and Result Store; Query Engine; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 109. "Kappa Architecture" for Big Data [Diagram components: Data Sources (Social, Logfiles, Sensor, RDBMS, ERP, Mobile, Machine); Data Collection (Messaging); "Raw Data Reservoir" (Raw Data); (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute); Messaging; Result Store (Computed Information); Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 110. "Unified Architecture" for Big Data [Diagram components: Data Sources (Social, RDBMS, Sensor, ERP, Logfiles, Mobile, Machine); Data Collection (Channel); (Analytical) Batch Data Processing (calculates models of incoming data; Batch compute) with Raw Data (Reservoir) and Result Store (Computed Information); Prediction Models; (Analytical) Real-Time Data Processing (Stream/Event Processing, Batch compute) with Messaging and Result Store; Query Engine; Data Consumer (Reports, Service, Analytic Tools, Alerting Tools). Legend: Data in Motion, Data at Rest.]
  • 112. Summary
      • More and more use cases (such as IoT) make Streaming Analytics necessary
      • Treat events as events! Infrastructures for handling lots of events are available!
      • Platforms such as Oracle Stream Analytics enable the business to work directly on streaming data (empower the business analyst) => the user experience of an Excel sheet on streaming data
      • Platforms such as Apache Storm and Apache Spark Streaming provide a highly scalable and fault-tolerant infrastructure for streaming analytics => Oracle Stream Analytics can use Spark Streaming as the runtime infrastructure
      • Platforms such as Kafka provide a high-volume event broker infrastructure, a.k.a. Event Hub
  • 113. Comparison (columns: Oracle Stream Analytics | Spark Streaming | Storm)
      Community: n.a. | > 280 contributors | > 100 contributors
      Language Options: Java, CQL | Java, Scala, Python | Java, Clojure, Scala, …
      Processing Models: Event-Streaming | Micro-Batching | Event-Streaming
      Processing DSL: Yes | Yes | No
      Stateful Ops: Yes | Yes | No
      Pattern detection: Yes | No | No
      Scalability & Reliability: limited | yes | yes
      Distributed RPC: No | No | Yes
      Delivery Guarantees: At Least Once | Exactly Once | At most once / At least once
      Latency: sub-second | seconds | sub-second
      "Self-service" for Biz: Yes | No | No
      Platform: OEP server, Spark Streaming (YARN, Mesos) | YARN, Mesos, Standalone, DataStax EE | Storm Cluster, YARN