Stream processing is an approach to building high-performance systems and applications that analyze and act on real-time streaming data. Benefits include faster processing of, and reaction to, complex real-time event streams, and the flexibility to adapt quickly to changing business and analytic needs. Big data, cloud, mobile and the Internet of Things are the major drivers for stream processing and streaming analytics.
This session discusses the technical concepts of stream processing and how it relates to big data, mobile, cloud and the Internet of Things. Use cases such as predictive fault management and fraud detection are used to show and compare alternative frameworks and products for stream processing and streaming analytics.
The audience will understand when to use open source frameworks such as Apache Storm, Apache Spark or Esper, and when to use powerful engines from software vendors such as IBM InfoSphere Streams or TIBCO StreamBase. Live demos will give the audience a good feel for how to use these frameworks and tools.
The session will also discuss how stream processing relates to Hadoop and to statistical analysis with software such as SAS, Apache Spark’s MLlib or the R language.
Streaming Analytics - Comparison of Open Source Frameworks and Products
1. Fast Data and Streaming Analytics in the Era of Hadoop, R and Apache Spark
Kai Wähner
kwaehner@tibco.com
@KaiWaehner
www.kai-waehner.de
LinkedIn / Xing Please connect!
2. Key Messages
– Streaming Analytics processes Data while it is in Motion!
– Automation and Proactive Human Interaction are BOTH needed!
– Time to Market is the Key Requirement for most Use Cases!
3. Agenda
– Real World Use Cases
– Introduction to Stream Processing
– Market Overview
– Relation to other Big Data Components
9. Data Monitoring
• Motor temperature
• Motor vibration
• Current
• Intake pressure
• Intake temperature
[Diagram: Electric Submersible Pump (ESP) with labels Flow, Electrical power cable, Pump, Intake, Protector, ESP motor, Pump monitoring unit]
Predictive Analytics (Fault Management)
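A minimal sketch of how monitored pump readings might feed a fault-management rule: a rolling mean over motor temperature raises an alert once it crosses a limit. The window size, the limit of 150.0, and the sample readings are illustrative assumptions, not values from the slides, and a real deployment would use a stream processing engine rather than a plain Python generator.

```python
from collections import deque


def rolling_alerts(readings, window=3, limit=150.0):
    """Yield (index, rolling_mean) whenever the rolling mean exceeds limit.

    window and limit are hypothetical tuning values for illustration.
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield i, mean


# Made-up motor temperature readings that drift upward over time.
motor_temperature = [140.0, 142.0, 145.0, 151.0, 158.0, 163.0]
alerts = list(rolling_alerts(motor_temperature))

for index, mean in alerts:
    print(f"reading {index}: rolling mean {mean:.1f} exceeds limit")
```

Averaging over a window rather than alerting on single readings keeps one noisy sensor value from triggering an alarm; the same pattern would apply to vibration, current, or intake pressure.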
28. Challenges of the 21st Century Retailer
• Retailing and Retail Challenges are changing
• Consumers expect better and integrated customer experience across all channels
– Rapid adoption of mobile is a major driver
– Customers want an integrated service across physical and digital channels… Simultaneously
– Customer experience is becoming one of the main differentiators
• Real-Time, one-on-one marketing can:
– Improve a retailer’s relevance with the customer
– Increase customer wallet-share
• Key to being able to achieve this is:
– Identifying and knowing your customer in depth and in real time
– Understanding the opportunity their past behavior reveals
– Understanding your inventory (availability, velocity, pipeline)
31. New Real-Time Fraud Detection Based on Deep Historical Insight
Real-time fraud action can be taken based on historical insight – system not “whiplashed” by real-time events
Streaming Analytics for Gift Card Fraud Protection
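One way to read "real-time action based on historical insight" is that each incoming event is scored against a baseline computed offline from history, so a single unusual event cannot shift the baseline itself. The sketch below assumes a hypothetical per-store baseline (mean and standard deviation of gift card amounts) and a z-score limit of 3.0; the store id, amounts, and limit are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Historical profile computed offline (hypothetical fields)."""
    mean: float
    std: float


def is_suspicious(amount, baseline, z_limit=3.0):
    """Flag an amount that deviates strongly from the historical baseline."""
    if baseline.std == 0:
        return amount != baseline.mean
    return abs(amount - baseline.mean) / baseline.std > z_limit


# Assumed to be loaded from historical analysis; not updated by live events.
HISTORY = {"store-17": Baseline(mean=40.0, std=10.0)}

events = [("store-17", 45.0), ("store-17", 500.0), ("store-17", 65.0)]
flagged = [e for e in events if is_suspicious(e[1], HISTORY[e[0]])]
print(flagged)
```

Because the baseline is frozen and refreshed only by the historical analysis, the real-time path stays simple and is not "whiplashed" by bursts of live events.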
39. Act while data is in motion!
[Chart: Business Value ($$$$ down to $) declines over Time across the stages Business Event, Data Ready for Analysis, Analysis Completed, Decision Made, Action Taken]
Stream Processing speeds action and increases business value by seizing opportunities while they matter
40. Streaming Analytics Reference Architecture
[Diagram labels, grouped]
• Sources: SENSOR DATA, TRANSACTIONS, MESSAGE BUS, MACHINE DATA, SOCIAL DATA
• Streaming Analytics: Stream Processing (Aggregate, Rules, Analytics, Correlate), Live Datamart (Continuous query processing, Alerts, Manual action, escalation), Action
• Operational Analytics: Operations, Live UI
• HISTORICAL ANALYSIS: MS Excel, SAS, R, Spark, Data Scientists, Data Discovery, Cleansed Data, History
• BIG DATA: Data Warehouse, Hadoop
• Internal Data: Enterprise Service Bus, ERP, MDM, DB, WMS, SOA
• Integration Bus, API, Event Server
58. What Streaming Alternative do you need?
• Visual IDE (Dev, Test, Debug)
• Simulation (Feed Testing, Test Generation)
• Live UI (monitoring, proactive interaction)
• Maturity (24h support, consulting)
• Integration (out-of-the-box integration: ESB, MDM, etc.)
• Library (Java, .NET, Python)
• Query Language (often similar to SQL)
• Scalability (horizontal and vertical, fail over)
• Connectivity (technologies, markets, products)
• Operators (Filter, Sort, Aggregate)
[Chart: Time to Market axis (Slow to Fast) positioning Streaming Concepts, Streaming Frameworks, and Streaming Products]
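The basic operators named on this slide (Filter, Sort, Aggregate) can be sketched in a few lines. This is only an illustration over a small finite batch of made-up trade events with hypothetical field names; a streaming engine would apply the same operators continuously over windows of an unbounded stream.

```python
from collections import defaultdict


def filter_op(events, predicate):
    """Filter: lazily pass through only the events matching the predicate."""
    return (e for e in events if predicate(e))


def aggregate_op(events, key, value):
    """Aggregate: sum value(e) per key(e)."""
    totals = defaultdict(float)
    for e in events:
        totals[key(e)] += value(e)
    return dict(totals)


def sort_op(totals):
    """Sort: order aggregated results by total, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up trade events for illustration.
trades = [
    {"symbol": "AAA", "qty": 100, "price": 10.0},
    {"symbol": "BBB", "qty": 5, "price": 20.0},
    {"symbol": "AAA", "qty": 50, "price": 11.0},
    {"symbol": "CCC", "qty": 200, "price": 1.0},
]

large = filter_op(trades, lambda t: t["qty"] >= 50)
volume = aggregate_op(
    large,
    key=lambda t: t["symbol"],
    value=lambda t: t["qty"] * t["price"],
)
ranked = sort_op(volume)
print(ranked)
```

Frameworks and products differ mainly in how much of the surrounding list (IDE, simulation, connectivity, scalability) they provide on top of operators like these.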
61. TIBCO StreamBase
• Performance: Latency, Throughput, Scalability
– Multi-threaded and clustered server from version 1
– High throughput: Millions of messages, 100,000s of quotes, 10,000s of orders
– Low-latency: microsecond latency for algo trading, pre-trade risk, market data
• Take Advantage of High Performance Hardware
– Multicore (12, 24, 32 core) large memory (10s of gigabytes)
– 64-bit Linux, Windows, Solaris deployment
– Hardware acceleration (GPU, Solace, Tervela)
• Enterprise Deployment
– High availability and fault tolerance
– Distributed state management for large data sets
– Management and monitoring tools
– Security and entitlements Integration
– Continuous deployment and QA Process Support
StreamBase Server Innovations
• StreamSQL compiler and static optimizer
• In process, in thread adapter architecture
• Visual parallelism and scaling
• ActiveSpaces integration for distributed shared state
• Data parallelism and dispatch
“The StreamBase engine is for real. We couldn’t break it, and believe me, I tried” SVP Development, Top 5 Broker Dealer
64. StreamBase Development Studio
• Visual Debugger
• Feed Simulation
• Unit Testing
“StreamBase’s modeling tools are easy to use and will enable the exchange to quickly react to the ever changing needs of our customers.” Steve Goldman, Director of Enterprise Architecture
65. Live Datamart
[Diagram labels, grouped]
• Data sources: Social Media Data, Market Data, Sensor Data, Historical Data, Enterprise data, IoT, Mobile, Social
• TIBCO components: BusinessEvents, FTL, EMS, ActiveSpaces, ActiveSpaces Datagrid, BusinessWorks
• Live Datamart: Continuous Query Processor, Continuous Query, Alerts
• LiveView Desktop: Command & Control, ACTION
67. Live Datamart Clients and APIs
• Rich Desktop Client
– Drag&Drop, no coding
• Rich Web Client
– Drag&Drop, no coding
• HTML5 and JavaScript API
– D3, jQuery, ExtJS, Google Charts, Bing, AngularJS
• .NET API
– For custom .NET development
• Java API
– For custom Java GUI development
• Combination
– Rich Client + HTML5 Extensions
72. Example: Apache Storm + TIBCO Live Datamart
StreamBase Bolt and Spout connect Apache Storm to StreamBase to provide real-time analytics on operational data
[Diagram labels: External Data, Public Data, Customer Data, Operational Data, Message Bus, StreamBase Bolt, StreamBase Spout, TIBCO Live Datamart (Active Tables, Continuous Query Processor), Clients (Query, Snapshot Results, Continuous Updates, Continuous Alerting)]
76. Real-Time Closed Loop: Understand – Anticipate – Act
Big Data
store everything
in Hadoop, DWH, NoSQL, etc.
even without structure
even if you do not need it today
http://blogs.teradata.com/international/tag/hadoop/
77. Real-Time Closed Loop: Understand – Anticipate – Act
Data Discovery + Statistics + Machine Learning
to find insights and patterns in historical data
78. Real-Time Closed Loop: Understand – Anticipate – Act
Streaming Analytics
to operationalize insights and patterns in real time
[Diagram labels: Stream Processing, Hadoop, Open Source R, TERR, SAS, MATLAB, In-database analytics, Spark]
80. R with TIBCO Enterprise Runtime for R (TERR)
TIBCO TERR delivers production-grade R analytics to enterprises
Flexibility & analytic power of R language
Time-to-market agility
Enterprise-grade platform
• A TIBCO licensed & supported product
• Not GPL, not a repackaging of the open source R engine
• Deployment in TIBCO products and 3rd party applications (e.g. Hadoop)
http://spotfire.tibco.com/discover-spotfire/what-does-spotfire-do/predictive-analytics/tibco-enterprise-runtime-for-r-terr
85. Case Study: Streaming Analytics for Betting
• Situation: Today, 80% of Betting is Done After the Game Starts
– It’s not your father’s bookie anymore!
• Problem: How to Analyze Big Betting Data?
– Thousands of concurrent games, constantly adjusting odds, dozens of betting networks – firms must correlate millions of events a day to find the best betting opportunities in real-time
• Solution: TIBCO for Fast Data Architecture
– TXOdds uses TIBCO to correlate, aggregate, and analyze large volumes of streaming betting data in real-time and publish innovative predictive betting analytics to their customers
• Result: TXOdds First to Market with Innovative Zero Latency Betting Analytics
– Innovative real-time analytics give players who can process electronic data in real-time the edge
“With StreamBase, in two months we had our first betting analytics feed live, and we continually deploy new ideas and evolve our old ones.”
- Alex Kozlenkov, VP of technology, TXOdds
86. “WHEN 5 KEY BOOKIES RAISE THE SAME ODDS IN A 5-SECOND WINDOW, BET LESS”
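A rule like this is a sliding-window correlation. The following sketch assumes a stream of odds-raise events already limited to the "key" bookies and ordered by timestamp; the tuple layout, bookie names, and market ids are hypothetical, and a streaming product would express the same logic in its own query language rather than Python.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5.0   # window length from the rule
REQUIRED_BOOKIES = 5   # number of distinct bookies from the rule


def bet_less_signals(raises):
    """Yield (timestamp, market) when enough bookies raised within the window.

    raises: iterable of (timestamp, bookie, market), ordered by timestamp.
    """
    recent = defaultdict(deque)  # market -> recent (timestamp, bookie) raises
    for ts, bookie, market in raises:
        window = recent[market]
        window.append((ts, bookie))
        # Evict raises that fell out of the 5-second window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        if len({b for _, b in window}) >= REQUIRED_BOOKIES:
            yield ts, market


# Made-up events: five raises within 4 seconds on m1, spread out on m2.
raises = [
    (0.0, "b1", "m1"), (1.0, "b2", "m1"), (2.0, "b3", "m1"),
    (3.0, "b4", "m1"), (4.0, "b5", "m1"),
    (10.0, "b1", "m2"), (11.0, "b2", "m2"), (12.0, "b3", "m2"),
    (13.0, "b4", "m2"), (19.0, "b5", "m2"),
]
signals = list(bet_less_signals(raises))
print(signals)
```

Keeping one deque per market bounds memory to the events inside the window, which is what makes this kind of rule feasible on data in motion.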
87. “WHEN THE REAL-TIME ODDS ARE 5% GREATER THAN THE HISTORICAL SPREAD, INCREASE MY BET”
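This rule combines a live stream with historical context. The sketch below treats the "historical spread" as a single reference odds value per market, which is an assumption made for illustration; the market ids, odds values, and lookup table are invented, and in practice the reference values would come from the historical store.

```python
THRESHOLD = 0.05  # 5% from the rule


def should_increase_bet(live_odds, historical_odds, threshold=THRESHOLD):
    """True when live odds exceed the historical reference by more than threshold."""
    return live_odds > historical_odds * (1 + threshold)


# Hypothetical historical reference odds per market, computed offline.
HISTORICAL = {"m1": 2.0, "m2": 3.0}

# Made-up live odds ticks as (market, odds).
ticks = [("m1", 2.05), ("m1", 2.2), ("m2", 3.1), ("m2", 3.3)]

actions = [
    (market, odds)
    for market, odds in ticks
    if should_increase_bet(odds, HISTORICAL[market])
]
print(actions)
```

The live path only performs a lookup and a comparison per tick, while the heavier work of deriving the reference values stays in the historical analysis.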
88. Reference Architecture: Streaming Betting Analytics
[Diagram labels, grouped]
• Feeds: BETTING LINES, SCORES, NEWS, SOCIAL
• BUS (shown on both sides of the processing layer)
• Event Processing: MONITOR, AGGREGATE, CORRELATE, REAL-TIME ANALYTICS, HISTORICAL COMPARISON
• HADOOP: Context: Historical Betting Data, Odds, Outcomes
• GLOBAL, DISTRIBUTED INFRASTRUCTURE: CACHE, CACHE, CACHE
• Results: Predictive odds analytics, Historical odds deviations, Zero Latency Betting Analytics
• Real-Time Analytics: StreamBase LiveView
89. Real-Time Social Media Analytics
[Diagram labels: Twitter (#TomBradyBrokenLeg), Twitter (#Boston), Twitter (#NFL), Brady’s Stats, Actionable Insights]
Something relevant happening?
Every minute counts!
Change Odds (automated or manually triggered):
• Stop live-betting for the currently running game?
• How many interceptions will the Quarterback throw?
• Will the Patriots win the Super Bowl?
• …
92. Key Messages
– Streaming Analytics processes Data while it is in Motion!
– Automation and Proactive Human Interaction are BOTH needed!
– Time to Market is the Key Requirement for most Use Cases!