Sun Tzu said “if you know your enemies and know yourself, you can win a hundred battles without a single loss.” Those words have never been truer than in our time. We are faced with an avalanche of data. Many believe the ability to process and gain insights from a vast array of available data will be the primary competitive advantage for organizations in the years to come.
To make sense of data, you face several challenges: how to collect it, how to store it, how to process it, and how to react quickly. Although you can build these systems from the ground up, doing so is a significant undertaking. Many technologies, both open source and proprietary, can be combined to build your analytics solution, which will likely save you effort and yield a better result.
In this session, Srinath will discuss WSO2’s middleware offerings in BigData and explain how you can put them together to build a solution that makes sense of your data. The session will cover technologies such as Thrift for collecting data, Cassandra for storing data, Hadoop for analyzing data in batch mode, and Complex Event Processing for analyzing data in real time.
View, Act, and React: Shaping Business Activity with Analytics, BigData Queries, and Complex Event Processing
1. View, Act, and React: Shaping Business Activity with Analytics, BigData Queries, and Complex Event Processing
Srinath Perera
Director, Research
WSO2
2. Start
Image credit, CC licence:
http://ansem315.deviantart.com/art/AsimovFoundation-395188263
• In 1942, Asimov wrote a book called Foundation, in which the character Hari Seldon uses mathematical models to predict the future of civilization and then to save it.
• Paul Krugman (the Nobel Laureate in Economics) said his interest in economics began with Foundation.
• We are entering an era of our history where Mr. Asimov might have a point.
3. Consider a Day in your Life
• What is the best road to take?
• Will there be any bad weather?
• What is the best way to invest the money?
• Should I take that loan?
• Is there a way to do this faster?
• What did others do in similar cases?
• Which product should I buy?
6. Why is it hard?
• Systems are built from many computers (1000 nodes to store 1PB with 1TB each)
• They handle lots of data (10Gb network => 83 days to copy 1PB)
• They run complex logic (models can be as complex as the system)
• This pushes us to the frontier of Distributed Systems and Databases
http://www.flickr.com/photos/mariachily/5250487136,
Licensed CC
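The two figures on this slide are back-of-envelope estimates. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and a link that sustains its nominal rate; the function names are illustrative and do not come from the talk:

```python
PB = 10**15  # bytes, decimal petabyte (assumption)
TB = 10**12  # bytes, decimal terabyte (assumption)

def nodes_needed(total_bytes, bytes_per_node):
    """Number of nodes required to hold total_bytes (ceiling division)."""
    return -(-total_bytes // bytes_per_node)

def transfer_days(total_bytes, link_bits_per_sec):
    """Days needed to copy total_bytes over a link running at its full rate."""
    seconds = total_bytes * 8 / link_bits_per_sec
    return seconds / 86400

def implied_rate_gbps(total_bytes, days):
    """Effective throughput (Gb/s) implied by copying total_bytes in `days`."""
    return total_bytes * 8 / (days * 86400) / 1e9

print(nodes_needed(PB, TB))          # nodes for 1 PB at 1 TB per node
print(transfer_days(PB, 10e9))       # days at a sustained 10 Gb/s
print(implied_rate_gbps(PB, 83))     # throughput implied by 83 days
```

Under these assumptions the node count matches the slide (1000), while a fully utilized 10 Gb/s link would move 1 PB in a little over 9 days. The 83 days quoted on the slide corresponds to an effective throughput of roughly 1.1 Gb/s, so it presumably assumes sustained throughput well below the nominal rate; that reading is an assumption, not something the slide states.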
8. Event Streams
• We view the world as event streams
• An event stream is a series of events over time
• Each stream has a name
• Each event has attributes, which have types
{
'name':'PlayStream',
'version':'1.0.0',
'payloadData':[
'name':'sid',
'ts':'BIGINT',
'x':'DOUBLE',
...
]
}
• We use SQL-like languages (Hive/CEP) to process event streams and create new event streams
Select from PlayStream[x>2500 and .. ]
Insert into NearGoalStream
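The idea of deriving a new stream by filtering an existing one can be sketched in plain Python. This is only an illustration of the concept, not how Siddhi or Hive work internally: events are modeled as dictionaries, the function name `near_goal` is hypothetical, and only the `x>2500` part of the condition is implemented because the rest is elided on the slide.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, float]  # one event: attribute name -> value

def near_goal(play_stream: Iterable[Event], x_min: float = 2500) -> Iterator[Event]:
    """Lazily yield the events whose x attribute exceeds x_min.

    Because it is a generator, it also works on an unbounded stream.
    """
    for event in play_stream:
        if event["x"] > x_min:
            yield event

# A tiny made-up PlayStream sample (values are illustrative only).
play_stream = [
    {"sid": 1, "ts": 100, "x": 1200.0},
    {"sid": 2, "ts": 101, "x": 2600.0},
    {"sid": 1, "ts": 102, "x": 3100.0},
]

# The filtered events form the new stream (NearGoalStream on the slide).
near_goal_stream = list(near_goal(play_stream))
print(near_goal_stream)
```

The output stream is itself just another sequence of events, so further queries can be chained on top of it in the same way.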
9. Demo Use Case (DEBS 2013)
• Football game; players and ball have sensors (DEBS Challenge 2013)
• Event attributes: sid, ts, x, y, z, v, a
• Use cases: Running analysis, Ball Possession and Shots on Goal, Heatmap of Activity
• Siddhi did 100K+ on each use case
• For this talk, we will look at user activity by region of the field.
13. BAM Hive Query
Find how much time is spent in each cell.
CREATE EXTERNAL TABLE IF NOT EXISTS PlayStream …
select sid,
  ceiling((y+33000)*7/10000 + x/10000) as cell,
  count(sid)
from PlayStream
GROUP BY sid, ceiling((y+33000)*7/10000 + x/10000);
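To see what the query computes, here is a minimal Python sketch of the same grouping, assuming events are dictionaries with `sid`, `x`, and `y` keys. The helper names `cell_of` and `samples_per_cell` are hypothetical. Note that the count is a number of sensor readings per cell; it stands in for time spent only if readings arrive at a roughly constant rate, which is an assumption here.

```python
import math
from collections import Counter

def cell_of(x, y):
    """Cell index, using the same expression as the Hive query."""
    return math.ceil((y + 33000) * 7 / 10000 + x / 10000)

def samples_per_cell(events):
    """Count readings per (sid, cell), mirroring GROUP BY sid, cell."""
    counts = Counter()
    for e in events:
        counts[(e["sid"], cell_of(e["x"], e["y"]))] += 1
    return counts

# Made-up sample readings (coordinates are illustrative only).
events = [
    {"sid": 1, "x": 0, "y": -33000},
    {"sid": 1, "x": 5000, "y": -33000},
    {"sid": 1, "x": 9000, "y": -33000},
    {"sid": 2, "x": 0, "y": 0},
]
print(samples_per_cell(events))
```

In Hive the `/` operator returns a double, so Python's true division matches the query's arithmetic before the ceiling is applied.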
15. CEP Query
Calculate the mean location of each player every second:
define partition sidPrt by PlayStream.sid, LocBySecStream.sid
from PlayStream#window.timeBatch(1sec)
select sid, avg(x) as xMean, avg(y) as yMean, avg(z) as zMean
insert into LocBySecStream partition by sidPrt
Detect a run of more than 10m:
from every e1 = LocBySecStream ->
e2 = LocBySecStream [yMean > e1.yMean + 10000 or e1.yMean > yMean + 10000]
within 2sec select e1.sid
insert into LongAdvStream partition by sidPrt ;
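The two CEP steps can be approximated offline in Python to make the logic concrete. This is a sketch under several assumptions, not a reproduction of Siddhi semantics: `ts` is taken to be in milliseconds, batches are cut by event timestamp rather than arrival time, coordinates are taken to be in millimetres (so 10000 is 10m), and the pattern is simplified to comparing consecutive one-second batches. The function names are hypothetical.

```python
from collections import defaultdict

def mean_location_per_second(events, ts_per_sec=1000):
    """Tumbling one-second batches per sid.

    Returns {sid: [(second, x_mean, y_mean), ...]} ordered by second.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # (sid, second) -> [sum_x, sum_y, n]
    for e in events:
        acc = sums[(e["sid"], e["ts"] // ts_per_sec)]
        acc[0] += e["x"]
        acc[1] += e["y"]
        acc[2] += 1
    means = defaultdict(list)
    for (sid, sec), (sx, sy, n) in sorted(sums.items()):
        means[sid].append((sec, sx / n, sy / n))
    return means

def long_runs(means_by_sid, min_dist=10000):
    """sids whose y mean moves more than min_dist between consecutive seconds."""
    hits = []
    for sid, rows in means_by_sid.items():
        for (s1, _, y1), (s2, _, y2) in zip(rows, rows[1:]):
            if s2 - s1 == 1 and abs(y2 - y1) > min_dist:
                hits.append(sid)
    return hits

# Made-up readings: sid 1 advances about 11m in one second, sid 2 barely moves.
events = [
    {"sid": 1, "ts": 0, "x": 0, "y": 0},
    {"sid": 1, "ts": 500, "x": 100, "y": 2000},
    {"sid": 1, "ts": 1200, "x": 0, "y": 12000},
    {"sid": 2, "ts": 100, "x": 0, "y": 0},
    {"sid": 2, "ts": 1100, "x": 0, "y": 500},
]
means = mean_location_per_second(events)
print(long_runs(means))
```

Grouping by `sid` before comparing batches plays the role of the partition in the query: each player's positions are only ever compared with that same player's earlier positions.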