The WSO2 analytics platform provides a high-performance, lean, enterprise-ready streaming solution to the data integration and analytics challenges faced by connected businesses. It offers real-time, interactive, machine learning and batch processing technologies that help enterprises build a digital business by connecting their various data sources, so that they can better understand their data and increase internal productivity.
This session explores how to enable digital transformation by building a data analytics platform. It discusses the following topics:
• WSO2 Data Analytics Server architecture
• Understanding streaming constructs
• Architectural styles for data integration
• Debugging and troubleshooting your integration
• Deployment
• Performance tuning
• Production hardening
9. Market Recognition
Named a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.
• Highest possible score in the 'Acquisition and Pricing' criteria
• Second-highest scores in the 'Ability to execute' criteria
The Forrester report notes:
“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others.
Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”
11. Event Streams
Event Stream Schema
Name: TemperatureStream
Version: 1.0
Attribute      Type
sensorID       string
temperature    double
pressure       double
Event
StreamID       TemperatureStream:1.0
Timestamp      1487270220419
sensorID       AP234
temperature    23.5
pressure       94.2
SourceIP       168.50.24.2
+ Support for arbitrary key-value pairs (e.g. SourceIP, which is not part of the schema)
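The event structure above can be modelled roughly as follows. This is a plain-Python sketch, not the DAS event API: the class `Event`, the helper `conforms` and the mapping of stream types to Python types are hypothetical. It separates schema attributes from arbitrary key-value pairs and checks that an event matches its stream definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical mapping of TemperatureStream:1.0 attributes to Python types.
SCHEMA: Dict[str, type] = {"sensorID": str, "temperature": float, "pressure": float}


@dataclass
class Event:
    stream_id: str             # "<name>:<version>"
    timestamp: int             # epoch milliseconds
    payload: Dict[str, Any]    # attributes defined by the stream schema
    # Extra key-value pairs that travel with the event but are not in the schema.
    arbitrary: Dict[str, Any] = field(default_factory=dict)


def conforms(event: Event, schema: Dict[str, type]) -> bool:
    """True if the payload has exactly the schema's attributes with matching types."""
    if set(event.payload) != set(schema):
        return False
    return all(isinstance(event.payload[name], kind) for name, kind in schema.items())


event = Event(
    stream_id="TemperatureStream:1.0",
    timestamp=1487270220419,
    payload={"sensorID": "AP234", "temperature": 23.5, "pressure": 94.2},
    arbitrary={"SourceIP": "168.50.24.2"},
)
print(conforms(event, SCHEMA))  # True
```

Keeping the arbitrary pairs in a separate map means schema validation stays strict while extra metadata such as a source IP can still travel with the event.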
12. Realtime Processing
• Process events in streaming fashion (one event at a time)
• Processing topology (Execution Plan)
– Written in Siddhi Query Language
– Runs in isolation
– Includes
• Queries
• Input event streams
• Output event streams
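To show what "one event at a time" through an execution plan means, here is a toy Python model. It is not Siddhi: the class `ExecutionPlan`, the output stream name `HighTemperatureStream` and the two lambda "queries" are hypothetical. Each event is pushed through a chain of queries and only survivors reach the output stream.

```python
from typing import Callable, List, Optional

# A query maps one input event to an output event, or to None if it drops it.
Query = Callable[[dict], Optional[dict]]


class ExecutionPlan:
    """Toy execution plan: one input stream, one output stream, a chain of queries."""

    def __init__(self, input_stream: str, output_stream: str, queries: List[Query]):
        self.input_stream = input_stream
        self.output_stream = output_stream
        self.queries = queries
        self.emitted: List[dict] = []   # events published to the output stream

    def send(self, event: dict) -> None:
        """Push exactly one event through every query, in order."""
        current: Optional[dict] = event
        for query in self.queries:
            current = query(current)
            if current is None:          # filtered out; nothing reaches the output
                return
        self.emitted.append(current)


plan = ExecutionPlan(
    "TemperatureStream",
    "HighTemperatureStream",
    [
        lambda e: e if e["temperature"] > 30 else None,                                # filter
        lambda e: {"sensorID": e["sensorID"], "tempF": e["temperature"] * 9 / 5 + 32},  # projection
    ],
)
for e in [{"sensorID": "A", "temperature": 35.0}, {"sensorID": "B", "temperature": 20.0}]:
    plan.send(e)
print(plan.emitted)  # [{'sensorID': 'A', 'tempF': 95.0}]
```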
15. Basic Patterns
define stream SoftDrinkSales
(region string, brand string, quantity int, price double);
from SoftDrinkSales[price >= 100]#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
having avgQuantity > 1000
insert into HighHourlySales ;
Temporal aggregation, transformation, threshold and filtering
Other supported window types: timeBatch(), length(), lengthBatch(), etc.
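The SoftDrinkSales query can be approximated in plain Python to make its semantics concrete. This sketch assumes events arrive in timestamp order with times in seconds, and it emits one result per accepted event, as a sliding window does. The `Sale` class and `high_hourly_sales` function are illustrative names and not part of Siddhi.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 3600  # #window.time(1 hour)


@dataclass
class Sale:
    ts: int        # event time in seconds (assumed to arrive in order)
    region: str
    brand: str
    quantity: int
    price: float


def high_hourly_sales(sales):
    """Filter price >= 100, keep a 1-hour sliding window, group by region and brand,
    compute avg(quantity) and emit only when avgQuantity > 1000."""
    window = deque()
    out = []
    for s in sales:
        if s.price < 100:                    # [price >= 100]
            continue
        window.append(s)
        while window[0].ts <= s.ts - WINDOW_SECONDS:
            window.popleft()                 # expire events older than one hour
        group = [w.quantity for w in window if (w.region, w.brand) == (s.region, s.brand)]
        avg_quantity = sum(group) / len(group)
        if avg_quantity > 1000:              # having avgQuantity > 1000
            out.append((s.region, s.brand, avg_quantity))
    return out


sales = [
    Sale(0, "EU", "Cola", 1500, 120.0),
    Sale(10, "EU", "Cola", 700, 150.0),
    Sale(20, "EU", "Cola", 50, 80.0),      # dropped by the price filter
    Sale(4000, "EU", "Cola", 900, 130.0),  # earlier sales have expired by now
]
print(high_hourly_sales(sales))  # [('EU', 'Cola', 1500.0), ('EU', 'Cola', 1100.0)]
```

The newest event is always in the window, so the eviction loop can never empty the deque before the group average is taken.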
16. Event Correlation Pattern
define stream Purchase (price double, cardNo long, place string);
from every (a1 = Purchase[price < 10]) ->
a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
Only supported in CEP systems!
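A rough Python model of the `every a1 -> a2 within 1 day` pattern follows. It is an illustration rather than Siddhi's engine. The assumptions are that times are in seconds, that each small purchase starts its own partial match (the `every` keyword), and that a partial match completes on the first later large purchase with the same card. The `Purchase` dataclass and `potential_fraud` are hypothetical names.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass
class Purchase:
    ts: int        # seconds
    price: float
    card_no: int
    place: str


def potential_fraud(purchases):
    """every (a1 = price < 10) -> a2 = (price > 10000, same card) within 1 day."""
    pending = []   # a1 events still waiting for a matching a2
    out = []
    for p in purchases:
        # Discard partial matches whose one-day window has passed.
        pending = [a1 for a1 in pending if p.ts - a1.ts <= DAY_SECONDS]
        still_pending = []
        for a1 in pending:
            if p.price > 10000 and p.card_no == a1.card_no:
                out.append((a1.card_no, p.price, p.place))   # pattern completed
            else:
                still_pending.append(a1)
        pending = still_pending
        if p.price < 10:
            pending.append(p)   # 'every': each small purchase starts a new match
    return out


events = [
    Purchase(0, 5.0, 111, "NY"),
    Purchase(100, 20000.0, 222, "LA"),                # different card, no match
    Purchase(200, 15000.0, 111, "Paris"),             # completes the card-111 match
    Purchase(300, 3.0, 333, "Rome"),
    Purchase(300 + DAY_SECONDS + 1, 50000.0, 333, "Oslo"),  # too late
]
print(potential_fraud(events))  # [(111, 15000.0, 'Paris')]
```

Stateful sequence matching of this kind is what distinguishes CEP engines from plain filter-and-aggregate stream processing.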
17. Data Persistence
• Provides a backend-datastore-agnostic way to store and retrieve data
• Provides standard REST API
• Pluggable data connectors
– RDBMS
– Cassandra
– HBase
– custom ...
(Diagram: Data Abstraction Layer)
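The idea of a datastore-agnostic layer with pluggable connectors can be sketched as an interface plus interchangeable implementations. The names `DataConnector`, `InMemoryConnector` and `DataAbstractionLayer` are hypothetical and do not reflect the actual DAS connector SPI. A real deployment would plug in RDBMS, Cassandra or HBase connectors instead of the in-memory stand-in.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DataConnector(ABC):
    """Hypothetical connector interface; the backing store is hidden from callers."""

    @abstractmethod
    def put(self, table: str, key: str, record: dict) -> None: ...

    @abstractmethod
    def get(self, table: str, key: str) -> Optional[dict]: ...


class InMemoryConnector(DataConnector):
    """Stand-in backend that keeps tables in nested dicts."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, dict]] = {}

    def put(self, table: str, key: str, record: dict) -> None:
        self._tables.setdefault(table, {})[key] = dict(record)

    def get(self, table: str, key: str) -> Optional[dict]:
        return self._tables.get(table, {}).get(key)


class DataAbstractionLayer:
    """Callers use this layer; which connector is plugged in is a configuration choice."""

    def __init__(self, connector: DataConnector) -> None:
        self._connector = connector

    def store(self, table: str, key: str, record: dict) -> None:
        self._connector.put(table, key, record)

    def retrieve(self, table: str, key: str) -> Optional[dict]:
        return self._connector.get(table, key)


dal = DataAbstractionLayer(InMemoryConnector())
dal.store("ORDERS", "o1", {"cost": 10.0})
print(dal.retrieve("ORDERS", "o1"))  # {'cost': 10.0}
```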
18. Siddhi Event Table and Join
@from(eventtable = 'rdbms', datasource.name = 'CardDataSource',
table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long);
from Purchase as p join CardUserTable as c
on p.cardNo == c.cardNum
select p.cardNo as cardNo, c.name as name, p.price as price
insert into PurchaseUserStream ;
Supported for RDBMS, in-memory, distributed in-memory grid (Hazelcast) and WSO2 Analytics tables.
A cache is used to improve performance.
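The join against a cached event table can be pictured with the following Python sketch. It assumes the table is a simple card-number-to-name mapping standing in for the RDBMS, fronted by a small least-recently-used cache. `CachedTable` and `join_purchases` are hypothetical names, and the cache capacity is arbitrary.

```python
from collections import OrderedDict
from typing import Dict, Optional


class CachedTable:
    """Hypothetical card-to-user table with an LRU cache in front of it."""

    def __init__(self, rows: Dict[int, str], capacity: int = 2):
        self._rows = rows                   # stands in for the RDBMS table
        self._cache: "OrderedDict[int, str]" = OrderedDict()
        self.capacity = capacity
        self.backend_reads = 0              # how often the 'database' was queried

    def lookup(self, card_num: int) -> Optional[str]:
        if card_num in self._cache:
            self._cache.move_to_end(card_num)     # mark as most recently used
            return self._cache[card_num]
        self.backend_reads += 1
        name = self._rows.get(card_num)
        if name is not None:
            self._cache[card_num] = name
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)   # evict least recently used
        return name


def join_purchases(purchases, table: CachedTable):
    """Inner join on p.cardNo == c.cardNum, emitting (cardNo, name, price)."""
    out = []
    for card_no, price in purchases:
        name = table.lookup(card_no)
        if name is not None:
            out.append((card_no, name, price))
    return out


table = CachedTable({1: "Alice", 2: "Bob"})
print(join_purchases([(1, 50.0), (3, 10.0), (1, 20.0), (2, 5.0)], table))
# [(1, 'Alice', 50.0), (1, 'Alice', 20.0), (2, 'Bob', 5.0)]
print(table.backend_reads)  # 3: the second lookup of card 1 is served from the cache
```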
19. Incremental Processing Patterns
• Periodic Analysis
• Incremental Analysis
– on newly arrived data
• Lambda Architecture
• Realtime Incremental Analytics
– on newly arrived data with low latency
20. Periodic Analytics Pattern
• Runs through the full data set
• Summarize data periodically
• E.g.: identifying the median
• Supported with WSO2 DAS
– Spark Script Scheduling
– Siddhi Batch windows.
https://www.hsph.harvard.edu/population-development/2014/09/08/impact-of-schedule-control-on-quality-of-care-in-nursing-homes/
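The median example shows why some summaries must be computed periodically over the full data set. This minimal Python sketch uses a module-level list (`stored`) to stand in for persisted data and a hypothetical `scheduled_job` function to stand in for a scheduled Spark script. Unlike a sum or count, a median cannot be updated from the previous median and the new values alone.

```python
import statistics
from typing import List

stored: List[float] = []    # stands in for the persisted data set


def ingest(values: List[float]) -> None:
    stored.extend(values)


def scheduled_job() -> float:
    # The previous median plus the new values is not enough information,
    # so every scheduled run scans the full data set.
    return statistics.median(stored)


ingest([12.0, 3.0, 7.0])
first = scheduled_job()     # median of [12, 3, 7] -> 7.0
ingest([1.0, 2.0])
second = scheduled_job()    # median of all five values -> 3.0
print(first, second)
```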
21. Incremental Analytics Pattern
• For incremental Big Data processing
• Periodically process the newly arrived data
• Via extended Spark SQL
create temporary table orders using CarbonAnalytics
options (tableName "ORDERS",
schema "customerID STRING, phoneType STRING,
OrderID STRING, cost DOUBLE, _timestamp LONG -i",
incrementalParams "orders, 60");
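The core idea of incremental processing, reading only rows that arrived since the last run, can be sketched with a timestamp watermark. This is a hypothetical Python model and does not implement the `incrementalParams` option. In particular, the meaning of the `60` argument is not modelled, and rows that arrive late with an older `_timestamp` would be missed by this naive version.

```python
from collections import defaultdict
from typing import Dict, List


class IncrementalJob:
    """Hypothetical incremental aggregator: each run reads only rows newer than
    the last processed timestamp and folds them into running totals."""

    def __init__(self) -> None:
        self.last_processed_ts = -1
        self.cost_by_phone: Dict[str, float] = defaultdict(float)
        self.rows_read = 0

    def run(self, table: List[dict]) -> Dict[str, float]:
        new_rows = [r for r in table if r["_timestamp"] > self.last_processed_ts]
        for r in new_rows:
            self.cost_by_phone[r["phoneType"]] += r["cost"]
        self.rows_read += len(new_rows)
        if new_rows:
            self.last_processed_ts = max(r["_timestamp"] for r in new_rows)
        return dict(self.cost_by_phone)


orders = [
    {"_timestamp": 1, "phoneType": "A", "cost": 10.0},
    {"_timestamp": 2, "phoneType": "B", "cost": 5.0},
]
job = IncrementalJob()
print(job.run(orders))   # {'A': 10.0, 'B': 5.0}
orders.append({"_timestamp": 3, "phoneType": "A", "cost": 2.5})
print(job.run(orders))   # {'A': 12.5, 'B': 5.0}, only one new row was read
```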
23. Realtime Incremental Analytics Pattern
• Low latency and low resource utilization
• Works for both short and long term streaming data
• Enhanced version of the Lambda Architecture
Realtime incremental processing (seconds and minutes)
Batch incremental processing (hours and above)
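A simplified Python model shows how the two layers can be combined. Completed hours are rolled up by a batch step, the still-open hour is counted in real time, and a query adds both together. The class `HybridCounter` and its methods are hypothetical. The sketch counts events only and assumes timestamps are in seconds.

```python
from collections import defaultdict
from typing import Dict

HOUR = 3600


class HybridCounter:
    """Hypothetical sketch: finalised hours live in a batch summary, while open
    hours are counted on the low-latency path; queries merge the two."""

    def __init__(self) -> None:
        self.batch_hourly: Dict[int, int] = {}                 # hour start -> count
        self.realtime: Dict[int, int] = defaultdict(int)       # hour start -> count

    def on_event(self, ts: int) -> None:
        self.realtime[ts - ts % HOUR] += 1     # realtime incremental path

    def batch_rollup(self, now: int) -> None:
        # Move every hour that has fully elapsed into the batch summary.
        current_hour = now - now % HOUR
        for hour in [h for h in self.realtime if h < current_hour]:
            self.batch_hourly[hour] = self.batch_hourly.get(hour, 0) + self.realtime.pop(hour)

    def total_since(self, start: int) -> int:
        batch = sum(c for h, c in self.batch_hourly.items() if h >= start)
        live = sum(c for h, c in self.realtime.items() if h >= start)
        return batch + live


hc = HybridCounter()
for ts in [10, 20, 3700, 7300]:
    hc.on_event(ts)
hc.batch_rollup(now=7300)
print(hc.batch_hourly, dict(hc.realtime), hc.total_since(0))
# {0: 2, 3600: 1} {7200: 1} 4
```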
26. Dashboards
• Dashboard generation
• Gadget generation
• Gather data via
– Websockets
– Polling
• Custom and personalized gadget and dashboard support
27. Interactive Queries
• Full text search
• Drilldown search
• Near real-time data indexing and retrieval
• Powered by Apache Lucene
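Full-text search of the kind Lucene provides is built on an inverted index. The toy Python version below maps each term to the set of event IDs that contain it and answers multi-term queries with AND semantics. It is only an illustration of the data structure: `InvertedIndex` and `tokenize` are hypothetical, and Lucene adds analyzers, scoring and on-disk segments.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        terms = tokenize(query)
        if not terms:
            return set()
        # AND semantics: a document must contain every query term.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("e1", "Sensor AP234 reported high temperature")
idx.add("e2", "Sensor AP235 reported low pressure")
print(sorted(idx.search("sensor reported")))  # ['e1', 'e2']
print(idx.search("high temperature"))         # {'e1'}
```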
28. Intelligent Processing Patterns
• Build and Run ML Models
• Streaming ML
• Anomaly Detection
• Detect Rare Activity Sequences
• Scoring
• Realtime Risk Detection
29. Predictive Analytics
• Guided UI to build machine learning models with
– Apache Spark MLlib
– H2O.ai (for deep learning algorithms)
• Build models with R and export them as PMML
• Run built models against realtime data in DAS
30. Realtime Prediction
Using built machine learning models:
from DataStream#ml:predict("/home/user/ml.model", "double")
select *
insert into PredictionStream ;
Or use R scripts, regression, Markov chains or anomaly detection on realtime data.
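Conceptually, realtime prediction applies a model that was already trained to each event as it arrives and appends the result to the event. In the Python sketch below, the logistic-regression weights, the bias and the function names are hypothetical placeholders. A real deployment would load an exported model such as the `ml.model` file or a PMML document instead of hard-coding parameters.

```python
import math
from typing import Dict, Iterable, Iterator

# Hypothetical pre-trained logistic-regression parameters.
WEIGHTS: Dict[str, float] = {"temperature": 0.8, "pressure": -0.05}
BIAS = -20.0


def predict(event: Dict[str, float]) -> float:
    """Return the model's probability of the positive class for one event."""
    z = BIAS + sum(w * event[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def prediction_stream(events: Iterable[Dict[str, float]]) -> Iterator[dict]:
    # Similar in spirit to "select *" with a prediction attribute appended.
    for e in events:
        yield {**e, "prediction": predict(e)}


out = list(prediction_stream([{"temperature": 25.0, "pressure": 0.0}]))
print(out)  # [{'temperature': 25.0, 'pressure': 0.0, 'prediction': 0.5}]
```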
31. Analytics Extensions Store
• geo: Geographical processing
• nlp: Natural Language Processing (with Stanford NLP)
• ml: Running Machine Learning and PMML models
• timeseries: Regression and time series
• math: Mathematical operations
• str: String operations
• regex: Regular expressions
• more ...
https://store.wso2.com/
35. Avoid False Positives with Scoring Pattern
• You just bought a diamond ring
• You bought 20 diamond rings within 15 minutes at 3 a.m. and shipped them to 4 global locations?
• Use a combination of rules
• Give a weight to each rule
• Single number that reflects multiple fraud indicators
• Use a threshold to reject transactions
Score = 2*X + 4*Y + 13*Z
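The weighted score can be illustrated in Python using three hypothetical rules that map to X, Y and Z. The sketch assumes each indicator is 1 when its rule fires and 0 otherwise. The rule definitions, field names and the threshold of 15 are illustrative only and not from the source. A single odd signal stays below the threshold, while several signals together push the score over it.

```python
from typing import Callable, Dict, List, Tuple

Tx = Dict[str, int]


# Hypothetical fraud indicators; each returns 1 when it fires, else 0.
def many_items(tx: Tx) -> int:          # X: many items in a short period
    return 1 if tx["items_last_15_min"] >= 10 else 0


def odd_hour(tx: Tx) -> int:            # Y: purchase in the small hours
    return 1 if 0 <= tx["hour"] < 5 else 0


def many_destinations(tx: Tx) -> int:   # Z: shipped to several locations
    return 1 if tx["ship_locations"] > 2 else 0


RULES: List[Tuple[int, Callable[[Tx], int]]] = [
    (2, many_items), (4, odd_hour), (13, many_destinations)
]
THRESHOLD = 15   # hypothetical cut-off


def score(tx: Tx) -> int:
    # Score = 2*X + 4*Y + 13*Z
    return sum(weight * rule(tx) for weight, rule in RULES)


def reject(tx: Tx) -> bool:
    return score(tx) >= THRESHOLD


single_ring = {"items_last_15_min": 1, "hour": 14, "ship_locations": 1}
ring_spree = {"items_last_15_min": 20, "hour": 3, "ship_locations": 4}
print(score(single_ring), reject(single_ring))  # 0 False
print(score(ring_spree), reject(ring_spree))    # 19 True
```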
37. Detect Sequence of Rare Activities
• Model randomly changing systems
• Detect using Siddhi Markov Models
https://en.wikipedia.org/wiki/Markov_chain
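In outline, Markov-model detection of rare activity sequences works by learning first-order transition probabilities from historical sequences and flagging any observed transition whose probability falls below a threshold. This Python sketch is not the Siddhi Markov extension. The activity names, the function names and the 0.05 threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def train(sequences: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate first-order transition probabilities P(next | current)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


def rare_transitions(model, seq: Sequence[str], threshold: float = 0.05) -> List[Tuple[str, str, float]]:
    """Return the transitions in seq whose learned probability is below threshold."""
    flagged = []
    for cur, nxt in zip(seq, seq[1:]):
        p = model.get(cur, {}).get(nxt, 0.0)   # unseen transitions get probability 0
        if p < threshold:
            flagged.append((cur, nxt, p))
    return flagged


history = [["login", "view", "logout"]] * 39 + [["login", "transfer", "logout"]]
model = train(history)
print(rare_transitions(model, ["login", "transfer", "logout"]))  # [('login', 'transfer', 0.025)]
print(rare_transitions(model, ["login", "view", "logout"]))      # []
```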
39. Banking and Finance
Risk Management
• End-of-day risk processing is no longer adequate
• Support realtime intra-day Value at Risk computations
• Calculated using realtime
– Market prices
– Portfolio changes
40. Realtime Value at Risk
WSO2 DAS models Value at Risk using 3 standard methods
• Historical Simulation
• Variance-Covariance
• Monte Carlo Simulation
Query :
from InputStream#var:historical(251, 0.95, Symbol, Price)
select *
insert into VaRStream ;
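Historical-simulation VaR is built from the empirical distribution of past returns. In the Python sketch below, the mapping of the query's arguments is an assumption: 251 is treated as the number of recent prices, giving 250 simple returns, and 0.95 as the confidence level. VaR is taken as the loss at the lower-tail empirical quantile without interpolation. Quantile conventions differ between implementations, so this is not necessarily how the `var:historical` extension computes it.

```python
import math
from typing import Sequence


def historical_var(prices: Sequence[float], window: int = 251, confidence: float = 0.95) -> float:
    """Historical-simulation VaR as a fraction of position value (sketch)."""
    recent = list(prices)[-window:]
    returns = sorted(b / a - 1.0 for a, b in zip(recent, recent[1:]))
    # round() guards against float noise such as (1 - 0.9) * 20 = 1.9999999999999996.
    index = math.floor(round((1.0 - confidence) * len(returns), 9))
    index = min(index, len(returns) - 1)
    return -returns[index]


# 20 known daily returns, three of them losses.
daily_returns = [-0.05, -0.04, -0.03] + [0.01] * 17
prices = [100.0]
for r in daily_returns:
    prices.append(prices[-1] * (1 + r))

var_90 = historical_var(prices, window=21, confidence=0.9)
print(round(var_90, 6))  # 0.03, i.e. a 3% loss is exceeded on 10% of past days
```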
41. Banking and Finance
Stock Market Surveillance
Manipulation Methods
• Front Running
• Pump (and Dump)
• Insider Dealing
• Wash Trading
• Churning
• more ...
42. Banking and Finance
Stock Market Surveillance ...
Manipulated via :
• Artificially inflating or deflating stock prices
• Exploiting prior knowledge of company proceedings
• Abusing advanced knowledge of pending orders
Solved via :
Joining market data feeds with external data streams such as company announcements, news feeds and Twitter streams.
44. eCommerce and Digital Marketing
Recommendations
Based on :
• Customer buying history
• Item buying history
• Current trends
• Machine Learning
Customers are likely to choose recommendations because they are personalized.
47. eCommerce and Digital Marketing
Ad Optimisation
Achieve :
• More clicks
• More conversions
• Effective use of allocated budget
• Higher click-through rate (CTR)
• Greater ROI
48. Fleet Management
You can find out:
• Where your fleet is
• Driving behaviour
• Are vehicles used optimally?
– Fuel expenses
– Travel time
– Round trip time
• Current situation on the road
• more ...
https://rnpc-rekos.ru/gps-fleet-management-systems/
50. Smart Energy Analytics
• Optimize smart grids
– Analyse energy demand
– Predict required energy supply
• Understand steady-state operations
• Act on events in the energy network
• Monitor processes and equipment on the energy network
• more ...
51. Smart Building / Home Analytics
• Monitor and manage equipment
• Home Automation
• Surveillance
• Maintenance
• Edge Analytics with Siddhi
• more ...
52. QoS Enablement, Network & System Monitoring
• Real-Time Botnet Traffic Detection
• Auto scaling based on
– CPU utilisation
– Memory consumption
– Load average
– Request count
– etc ...
54. Healthcare
• HL7 Messaging support
• Monitoring medical records
– Delay in patient visits
– Alerts based on glucose levels
• Used with WSO2 Integration (ESB)
55. Key Differentiations
• Realtime analytics at its best
– Rich set of realtime functions
– Sequence and pattern detection
• No code compilation: SQL-like language
• Incremental processing for everyday analytics
• Intelligent decision making with ML and more
• Rich set of input and output connectors
• High performance and low infrastructure cost