Near realtime analytics - technology choice
✦ pavlo.baron@codecentric.de
✦ @pavlobaron
Wile E. Coyote
✦ pretty slow
✦ running on own demand
✦ very wide field of vision
✦ very long memory
✦ purely proactive
✦ thoroughly analysing and preparing
✦ always loses
Road Runner
✦ hell fast
✦ ever running
✦ very narrow field of vision
✦ very short memory
✦ purely reactive
✦ forced to immediately decide
✦ always wins
Coyote: slow
✦ too much mumbo-jumbo, too many tools, totally dependent on ACME
✦ needs a complex, partially distributed setup
✦ complex decisions, depending on Runner, weather, environment etc.
Runner: fast
✦ zero hoo-ha, zero tools, just own body
✦ road bound
✦ simple decisions like run | halt | step aside | beep beep
Coyote: offline
✦ mostly stands around, observing and planning
✦ only sprints on demand, when Runner passes by
Runner: non-stop
✦ never stops fully, just occasionally halts for food and to fool Coyote
✦ continuously runs the road in search of food
Coyote: wide vision
✦ sees the whole environment
✦ tries to use the whole environment to catch Runner, predicting his paths
Runner: narrow vision
✦ only sees what’s in front of his nose on the road
✦ thanks to speed and short-term predictions, is comfortable with the narrow, momentary vision
Coyote: long memory
✦ as far as possible, learns from previous failures
✦ continuously improves tricks to catch Runner
Runner: short memory
✦ ultimate carpe diem
✦ predicts Coyote’s actions at the last minute, avoiding harm right before the fact
Coyote: proactive

✦ plans and tries out, looks for new ways to catch Runner
Runner: reactive

✦ doesn’t plan, just reacts to Coyote’s actions
Coyote: thorough
✦ thoroughly analyses the situation
✦ thoroughly plans ahead, prepares for one single shot
Runner: spontaneous
✦ decides immediately and spontaneously, depending on what Coyote does
✦ makes the best immediate decision to achieve the highest level of Coyote fooling
Coyote: loses
✦ no matter how hard he tries, he’s never fast or savvy enough to catch Runner
✦ never gives up though
Runner: wins
✦ doesn’t even try to win, but always does thanks to speed and immediate situation analysis, followed by reaction. Also, due to Coyote’s continuous failure
✦ has fun fooling Coyote every time
Coyote is batch.
Runner is near realtime.
Batch (analytics)
✦ is when you have plenty of time for analysis
✦ is when you explore patterns and models in historic data
✦ is when you try to fit any sort of data into a hypothetical model
✦ is when you plan and forecast the future instead of (re)acting immediately
Batch (architecture)
✦ is when you (synchronously) query previously stored data
✦ is when you use main memory primarily for temporary caches
✦ is when you do ETL and the like, even on Hadoop’s rails
✦ is when you split large amounts of historic data into smaller portions for distributed / parallel analysis
Batch (technology)
✦ is when you build on (R)DBMS or (soft-schema) NoSQL data stores in a classic way
✦ is when you store in HDFS and process with Hadoop & Co.
✦ is when you generally rely on disks / storage
Near realtime (analytics)
✦ is when you don’t have time
✦ is when you analyse data as it comes
✦ is when you already have a fixed model, and data flying in fits it 100%
✦ is when you (re)act immediately, based on patterns you learned online and in the batch analysis
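A minimal sketch of "analysing data as it comes": Welford's online algorithm keeps a running mean and variance in constant memory, so no event has to be stored or queried later. The class name `RunningStats` and the sample values are illustrative assumptions, not taken from the talk.

```python
class RunningStats:
    """Welford's online algorithm: running mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        # Update the aggregates with one incoming value; nothing is stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; reported as 0.0 for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.push(value)
```

Each `push` costs a handful of arithmetic operations, which is what makes this style of statistic usable inside an event handler.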
Near realtime (architecture)
✦ is when you don’t query data, but expect / assume it
✦ is when you use main memory as primary data storage
✦ is when you process event streams
✦ is when you distribute and parallelise only independent computations (it’s hairy enough even on one machine - explicit loop tiling, skewing etc.)
Near realtime (technology)
✦ is when you build on DSMS, event processing systems and the like
✦ is when you store (almost) only for archiving reasons
✦ is when you don’t hit disks or speak of “storage”
✦ is when you do your best to avoid horizontal network gossip
✦ is when you must go for accelerators such as GPUs in case of complex math
Near realtime - non-stop, immediate analytics cannot be done as / in batch.
Near realtime is tricky
✦ you need to build event-driven, non-blocking, lock-free, reactive programs (buzzword award!)
✦ you need to work time-bound, penalising or compensating late events
✦ you need to keep everything (sliced, auto-expiring) in main memory
✦ you need to completely utilise resources of one single machine (speaking of mechanical sympathy), without waste
✦ you need to fix your model and work with fixed-size (binary) events
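Two of the points above, fixed-size binary events and time-bound handling of late events, can be sketched with Python's standard `struct` module. The field layout, the 500 ms lateness budget and the function names are assumptions made up for illustration; dropping is only one possible way of "penalising" a late event.

```python
import struct

# Hypothetical fixed layout, little-endian and unpadded:
# event type (uint16), source id (uint32), timestamp in ms (uint64), value (float64)
EVENT = struct.Struct("<HIQd")

MAX_LATENESS_MS = 500  # assumed time budget for this sketch


def encode(event_type, source_id, ts_ms, value):
    """Pack one event into its fixed-size binary form."""
    return EVENT.pack(event_type, source_id, ts_ms, value)


def accept(buf, now_ms):
    """Decode an event; return None if it arrived later than the budget allows."""
    event = EVENT.unpack(buf)
    ts_ms = event[2]
    if now_ms - ts_ms > MAX_LATENESS_MS:
        return None  # penalise the late event by dropping it
    return event
```

Because every event has the same size (`EVENT.size` bytes), buffers can be preallocated and sliced by offset instead of being parsed.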
Scaling near realtime
✦ scaling near realtime analytics is pretty hard. The challenges are similar whether you parallelise on one machine or scale out in a distributed way
✦ you scale through logical or physical stream splitting, online scatter-gather and the like
✦ you keep distributed / parallel computation independent, until you have to merge in the next processing stage. And so on.
✦ you scale through receive-and-forward, fire-and-forget, cascading, pipelining, multicast, redundant (who’s first, role-based etc.) processing
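A minimal sketch of logical stream splitting with a later merge stage, assuming events are `(key, value)` pairs. The partitions are processed sequentially here to keep the example short; in a real system each would run on its own thread, core or node. All function names are hypothetical.

```python
from collections import Counter
from zlib import crc32


def split(events, n_partitions):
    """Route each event by a stable hash of its key, so one key always
    lands in the same partition and partitions stay independent."""
    partitions = [[] for _ in range(n_partitions)]
    for key, value in events:
        partitions[crc32(key.encode("utf-8")) % n_partitions].append((key, value))
    return partitions


def sum_partition(partition):
    """Independent computation on one sub-stream: sum of values per key."""
    sums = Counter()
    for key, value in partition:
        sums[key] += value
    return sums


def merge(partials):
    """Next processing stage: gather the partial results into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partials = [sum_partition(p) for p in split(events, 3)]
totals = merge(partials)
```

Hashing by key is what keeps the per-partition work free of coordination: no two partitions ever touch the same key before the merge.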
Surviving near realtime
✦ building a restlessly event-oriented, in-memory analytics system brings some challenges
✦ disaster recovery: yet again, splitting streams (for storage), redundant (role-based) computation
✦ short-term failure recovery: up-front temporary, auto-expiring storage, auto-replay or penalising events
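The short-term recovery idea (temporary, auto-expiring storage plus replay) can be sketched as a small in-memory buffer. This assumes events arrive in timestamp order and that a time-to-live in milliseconds is an acceptable expiry rule; `ReplayBuffer` is a hypothetical name.

```python
from collections import deque


class ReplayBuffer:
    """Keeps only the most recent events so a failed stage can be re-fed."""

    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self._events = deque()  # (ts_ms, payload), oldest first

    def append(self, ts_ms, payload):
        self._events.append((ts_ms, payload))
        self._expire(ts_ms)

    def _expire(self, now_ms):
        # Drop everything older than the time-to-live.
        while self._events and now_ms - self._events[0][0] > self.ttl_ms:
            self._events.popleft()

    def replay(self, since_ms):
        """Return payloads with a timestamp at or after since_ms."""
        return [payload for ts, payload in self._events if ts >= since_ms]
```

After a short outage, a downstream consumer would call `replay` with the timestamp of the last event it processed, instead of querying a database.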
Near realtime is limited
✦ you need to run most of analytics on event windows of some size
✦ you switch from exact to probabilistic / approximate results
✦ you can only predict the near future, cluster based on relatively short time periods and recognise short-term patterns and anomalies only
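As one example of trading exact for approximate results, a Count-Min sketch counts item frequencies in fixed memory: it may overestimate a count but never underestimates it. The talk does not name this structure; it is swapped in here purely as an illustration, with arbitrary default sizes.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter with fixed memory (width x depth cells)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth  # must stay below 256 for the one-byte salt below
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent hash per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum over all rows is the least-collided, hence best, estimate.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )
```

Memory use depends only on `width` and `depth`, not on the number of distinct items seen, which is what makes it suitable for unbounded streams.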
Near realtime mining
✦ you mine live streams instead of passive data sources
✦ typical algorithms such as Apriori, 1-class-SVM, k-means, regressions etc. are easily possible, but on stream portions only
✦ NLP can be done by giving words identifiers and dealing with binary messages instead of text
✦ as long as it fits into main memory, it’s comparable to classic mining, but is much faster
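The NLP point (words become identifiers, text becomes a binary message) might look like the sketch below. The vocabulary scheme, the whitespace tokenisation and the 32-bit identifier width are assumptions; unlike the fixed-size events mentioned earlier, these messages vary in length with the number of words.

```python
import struct


class Vocabulary:
    """Assigns each distinct word a small integer identifier on first sight."""

    def __init__(self):
        self._ids = {}

    def id_of(self, word):
        # len() is evaluated before insertion, so identifiers start at 0.
        return self._ids.setdefault(word, len(self._ids))


def encode_message(vocab, text):
    """Turn text into a packed sequence of little-endian uint32 word ids."""
    ids = [vocab.id_of(word) for word in text.lower().split()]
    return struct.pack(f"<{len(ids)}I", *ids)


def decode_ids(buf):
    """Recover the list of word ids from a packed message."""
    return list(struct.unpack(f"<{len(buf) // 4}I", buf))
```

Downstream stages then compare and count integers rather than strings, which is cheaper and keeps messages compact.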
Near realtime + batch?
✦ the combination of both is what can make a winning solution. Example reference architecture: Lambda, but it’s even more
✦ exploratory, offline analytics, baseline analysis, pattern mining, algorithm training and the like you do in the batch
✦ you apply batch analytics’ results to near realtime and prove or reject hypotheses, detect anomalies, run forecasts, derive trends etc.
Near realtime, no batch?
✦ it’s possible to do some of this completely without batch, just on streams - even more than basic counters and stats
✦ you need to keep every single historic event in a data store
✦ you need to replay historic events instead of querying / mining your data store
✦ don’t query your database - let the database stream what it has to you
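Replaying instead of querying can be sketched as feeding stored events through the very same handler that processes live ones. The event shape (`{"type": ...}`) and the function names are illustrative assumptions.

```python
def count_by_type(state, event):
    """Event handler: returns a new state with the event's type counted."""
    new_state = dict(state)
    new_state[event["type"]] = new_state.get(event["type"], 0) + 1
    return new_state


def replay(event_log, handler, state):
    """Rebuild state by streaming historic events through the handler."""
    for event in event_log:
        state = handler(state, event)
    return state


historic = [{"type": "login"}, {"type": "click"}, {"type": "click"}]
state = replay(historic, count_by_type, {})

# A live event goes through the same code path as the replayed ones.
state = count_by_type(state, {"type": "click"})
```

Since history and live traffic share one handler, there is no separate query logic to keep in sync with the streaming logic.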
Near realtime example tools
✦ query/store-oriented/passively-adapting: Spark/Shark, Impala, Drill, ParStream, Splunk
✦ full-blown CEP engines / continuous querying DSMSs: Esper, TIBCO/StreamBase
✦ more pragmatic stream processors: Storm, S4, Samza
✦ event-oriented, continuous analysers: keen.io, also speaker’s current WIP
✦ etc. etc. etc...
Near realtime - DIY
✦ in the end, you’ll have to build it (or core parts of it) yourself
✦ you’ll have to work with circular / ring buffers and / or zero-overhead queuing software: Disruptor, 0MQ
✦ ideally, you keep everything in one single OS process - multi-threading is still hairy enough then
✦ managing and using machine’s overall memory is the tricky part
✦ for GPUs: OpenCL, Rootbeer
✦ embed analytics / statistics into the process
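A minimal single-threaded ring buffer, to show the data structure the slide refers to. It is not the Disruptor and has none of its lock-free, multi-threaded machinery; the power-of-two capacity and the `offer` / `poll` names are choices made for this sketch.

```python
class RingBuffer:
    """Bounded FIFO over a preallocated list; slots are reused, never reallocated."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a positive power of two")
        self._mask = capacity - 1      # lets us wrap with a bitwise AND
        self._slots = [None] * capacity
        self._head = 0                 # sequence number of the next read
        self._tail = 0                 # sequence number of the next write

    def offer(self, item):
        """Store an item; return False instead of blocking when full."""
        if self._tail - self._head > self._mask:
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1
        return True

    def poll(self):
        """Return the oldest item, or None when empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head & self._mask]
        self._head += 1
        return item
```

Rejecting writes when full, rather than growing, is what keeps memory use predictable; the producer then decides whether to drop, retry or apply back-pressure.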
Near realtime - DIY
✦ picking the base platform has less to do with personal taste than with what it offers
✦ C is a good and a valid choice, but very “manual”
✦ Erlang/OTP is great for glue, but hard for analytics and integration. In the end, it’s C, but pretty tricky here
✦ Node.js is C in the end at this point, but it’s not for single-process / multi-threading and still maturing
✦ JVM is a good compromise. Managed / GC-controlled memory with object wrappers will be sacrificed for off-heap memory with primitives though
✦ Most of the rest doesn’t apply to this sort of task
Near realtime - DIY

✦ programming paradigms and thus languages are the essential, secret sauce
✦ functional programming is ideal for analytics and event-processing
✦ (functional) reactive programming, Reactor (as pattern or framework), RX are good for building this sort of systems
✦ JavaScript is partially there, Erlang, Clojure, Scala & Co. are further, but can be uncontrollable in runtime behaviour
✦ pure Java can be (later) a healthy trade-off though - now with RX or Reactor, Netty etc.
Time in near realtime
✦ realtime still means real time, even if “near”
✦ the platform of your choice might not be ideal for hard or soft realtime, since the difference is primarily in what happens with late events and under high load
✦ Erlang will do its best to trigger a timer. Same with Node.js. But they don’t interrupt hard, schedule on their own and thus leave you with an approximation
✦ JVM comes close, but still no easy way to interrupt explicitly. Alternative: hashed wheel timer, own scheduler on dedicated core
✦ C is the winner, OS support essential (RTOS alike)
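The "wheel" alternative is read here as a hashed wheel timer, which is an interpretation of the slide. In the sketch below, timeouts are hashed into a fixed number of slots by their due tick, and the caller drives the wheel by calling `advance` once per tick (for example from a dedicated scheduler thread). Tick length, wheel size and all names are assumptions.

```python
class HashedWheelTimer:
    """Coarse timer: scheduling is O(1), each tick only inspects one slot."""

    def __init__(self, tick_ms=10, wheel_size=8):
        self.tick_ms = tick_ms
        self.wheel = [[] for _ in range(wheel_size)]
        self.current_tick = 0

    def schedule(self, delay_ms, callback):
        # Round the delay up to whole ticks, at least one tick ahead.
        ticks = max(1, -(-delay_ms // self.tick_ms))
        slot = (self.current_tick + ticks) % len(self.wheel)
        rounds = (ticks - 1) // len(self.wheel)  # full wheel turns to wait
        self.wheel[slot].append([rounds, callback])

    def advance(self):
        """Move one tick forward and fire every callback that is now due."""
        self.current_tick += 1
        slot = self.current_tick % len(self.wheel)
        # Detach the bucket first so callbacks may safely schedule new timeouts.
        bucket, self.wheel[slot] = self.wheel[slot], []
        for entry in bucket:
            if entry[0] == 0:
                entry[1]()
            else:
                entry[0] -= 1
                self.wheel[slot].append(entry)
```

Precision is bounded by the tick length, so this gives the same kind of approximation the slide describes, only under the program's own control.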
Near realtime + data store?
✦ near realtime analytics systems need to store data in different stages: short-term replay, disaster protection, history
✦ the trick is to turn around the way you work with the data store
✦ your data store knows model and queries beforehand, and only waits for events to start streaming historic data satisfying the static query / view
✦ most NoSQL stores, but also classic RDBMS have implantable workers / jobs / coprocessors as built-in feature: Oracle, Riak, HBase etc.
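The "turned around" data store can be sketched as a store with standing views: the predicate is registered up front, matching history is streamed to the subscriber, and new events are pushed as they arrive. This is a toy in-memory model, not the coprocessor API of any of the products named above; `StreamingStore` and its methods are hypothetical.

```python
class StreamingStore:
    """Append-only store that pushes matching events to registered views."""

    def __init__(self):
        self._history = []
        self._views = []  # (predicate, sink) pairs known beforehand

    def register_view(self, predicate, sink):
        # Stream the matching history first, then keep the view for live events.
        for event in self._history:
            if predicate(event):
                sink(event)
        self._views.append((predicate, sink))

    def append(self, event):
        self._history.append(event)
        for predicate, sink in self._views:
            if predicate(event):
                sink(event)


store = StreamingStore()
store.append({"type": "order", "amount": 5})
store.append({"type": "login"})

seen = []
store.register_view(lambda e: e["type"] == "order", seen.append)
store.append({"type": "order", "amount": 7})
```

The consumer never issues a query; it only receives events, whether they come from history or from live traffic.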
Near realtime business cases
✦ anomaly / novelty / outlier detection in any sort of system
✦ fraud, attack detection based on patterns
✦ situational pricing, product placement
✦ stock, inventory control and forecast
✦ online bidding, trading
✦ automated traffic optimization
✦ semi-automated operations
✦ immediate visualization and tracing
Why speed?
✦ why be slow if it’s possible, with comparable effort, to be fast in making decisions and automating them? If not you, then your competitor
✦ since everybody can mine data, speed and quality are the only technical success factors left
✦ it’s about how fast you can decide based on data. The best way is to start very early, at the source of data
✦ “new economy” is all about speed, not (only) lobbies
✦ cartoon images were found on the internet and are directly or indirectly property/copyright of or related to Time Warner

Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 

Near realtime analytics - technology choice (@pavlobaron)

  • 3. Wile E. Coyote ✦ pretty slow ✦ running on own demand ✦ very wide field of vision ✦ very long memory ✦ purely proactive ✦ thoroughly analysing and preparing ✦ always loses
  • 4. Road Runner ✦ hell fast ✦ ever running ✦ very narrow field of vision ✦ very short memory ✦ purely reactive ✦ forced to immediately decide ✦ always wins
  • 5. Coyote: slow ✦ too much mumbo-jumbo, too many tools, totally dependent on ACME ✦ needs a complex, partially distributed setup ✦ complex decisions, depending on Runner, weather, environment etc.
  • 6. Runner: fast ✦ zero hoo-ha, zero tools, just own body ✦ road bound ✦ simple decisions like run | halt | step aside | beep beep
  • 7. Coyote: offline ✦ mostly stands around, observing and planning ✦ only sprints on demand, when Runner passes by
  • 8. Runner: non-stop ✦ never stops fully, just occasionally halts for food and to fool Coyote ✦ continuously runs the road in search of food
  • 9. Coyote: wide vision ✦ sees the whole environment ✦ tries to use the whole environment to catch Runner, predicting his paths
  • 10. Runner: narrow vision ✦ only sees what’s in front of his nose on the road ✦ due to speed and short-term predictions, is comfortable with the narrow, momentary vision
  • 11. Coyote: long memory ✦ as far as possible, learns from previous failures ✦ continuously improves tricks to catch Runner
  • 12. Runner: short memory ✦ ultimate carpe diem ✦ predicts Coyote’s actions at the last minute, avoiding harm right before the fact
  • 13. Coyote: proactive ✦ plans and tries out, looks for new ways to catch Runner
  • 14. Runner: reactive ✦ doesn’t plan, just reacts to Coyote’s actions
  • 15. Coyote: thorough ✦ thoroughly analyses the situation ✦ thoroughly plans ahead, prepares for one single shot
  • 16. Runner: spontaneous ✦ decides immediately and spontaneously, depending on what Coyote does ✦ makes the best immediate decision to achieve the highest level of Coyote fooling
  • 17. Coyote: loses ✦ no matter how hard he tries, he’s never fast or savvy enough to catch Runner ✦ never gives up though
  • 18. Runner: wins ✦ doesn’t even try to win, but always does thanks to speed and immediate situation analysis, followed by reaction. Also, due to Coyote’s continuous failure ✦ every time has fun fooling Coyote
  • 19. Coyote is batch. Runner is near realtime.
  • 20. Batch (analytics) ✦ is when you have plenty of time for analysis ✦ is when you explore patterns and models in historic data ✦ is when you try to fit any sort of data into a hypothetical model ✦ is when you plan and forecast the future instead of (re)acting immediately
  • 21. Batch (architecture) ✦ is when you (synchronously) query previously stored data ✦ is when you use main memory primarily for temporary caches ✦ is when you do ETL and the like, even on Hadoop’s rails ✦ is when you split large amounts of historic data into smaller portions for distributed / parallel analysis
  • 22. Batch (technology) ✦ is when you build on (R)DBMS or (soft-schema) NoSQL data stores in a classic way ✦ is when you store in HDFS and process with Hadoop & Co. ✦ is when you generally rely on disks / storage
  • 23. Near realtime (analytics) ✦ is when you don’t have time ✦ is when you analyse data as it comes ✦ is when you already have a fixed model, and data flying in fits it 100% ✦ is when you (re)act immediately, based on patterns you learned online and in the batch analysis
  • 24. Near realtime (architecture) ✦ is when you don’t query data, but expect / assume it ✦ is when you use main memory as primary data storage ✦ is when you process event streams ✦ is when you distribute and parallelise only independent computations (it’s hairy enough even on one machine - explicit loop tiling, skewing etc.)
  • 25. Near realtime (technology) ✦ is when you build on DSMS, event processing systems and the like ✦ is when you store (almost) only for archiving reasons ✦ is when you don’t hit disks or speak of “storage” ✦ is when you do your best to avoid horizontal network gossip ✦ is when you must go for accelerators such as GPUs in case of complex math
  • 26. Near realtime - non-stop, immediate analytics cannot be done as / in batch.
  • 27. Near realtime is tricky ✦ you need to build event-driven, non-blocking, lock-free, reactive programs (buzzword award!) ✦ you need to work time-bound, penalising or compensating late events ✦ you need to keep everything (sliced, auto-expiring) in main memory ✦ you need to completely utilise resources of one single machine (speaking of mechanical sympathy), without waste ✦ you need to fix your model and work with fixed-size (binary) events
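
A minimal sketch of three points from this slide: fixed-size binary events, an auto-expiring in-memory slice, and penalising late events. The event layout (`EVENT`), the class `ExpiringWindow` and the `max_lateness_ms` threshold are illustrative assumptions, not something the slides specify.

```python
import struct
from collections import deque

# Hypothetical fixed-size layout: 8-byte timestamp (ms), 4-byte sensor id,
# 8-byte float value. "<" means little-endian without padding: 20 bytes.
EVENT = struct.Struct("<qId")


def encode(ts_ms, sensor_id, value):
    """Pack one event into its fixed-size binary form."""
    return EVENT.pack(ts_ms, sensor_id, value)


class ExpiringWindow:
    """In-memory slice that keeps roughly the last `span_ms` of events.

    Events older than `max_lateness_ms` behind the newest timestamp seen
    are rejected and counted (the "penalty") instead of being processed.
    """

    def __init__(self, span_ms, max_lateness_ms):
        self.span_ms = span_ms
        self.max_lateness_ms = max_lateness_ms
        self.events = deque()
        self.watermark = 0  # highest timestamp seen so far
        self.late = 0       # number of rejected late events

    def offer(self, raw):
        ts, sensor_id, value = EVENT.unpack(raw)
        if ts < self.watermark - self.max_lateness_ms:
            self.late += 1
            return False
        self.watermark = max(self.watermark, ts)
        self.events.append((ts, sensor_id, value))
        # Expire from the oldest end; for slightly out-of-order events
        # this expiry is approximate, which is acceptable for a sketch.
        horizon = self.watermark - self.span_ms
        while self.events and self.events[0][0] < horizon:
            self.events.popleft()
        return True
```

Compensating a late event (for example, correcting an already emitted aggregate) would replace the simple counter in `offer`; which of the two fits depends on the use case.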
  • 28. Scaling near realtime ✦ scaling near realtime analytics is pretty hard. The challenges are similar whether you parallelise on one machine or scale out in a distributed way ✦ you scale through logical or physical stream splitting, online scatter-gather and the like ✦ you keep distributed / parallel computation independent, until you have to merge in the next processing stage. And so on. ✦ you scale through receive-and-forward, fire-and-forget, cascading, pipelining, multicast, redundant (who’s first, role-based etc.) processing
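
Stream splitting followed by scatter-gather can be sketched as below. This is an illustration under assumptions: events are `(key, value)` pairs, the lanes run sequentially here (in a real system each lane would be its own thread, process or node), and the function names are made up for the example.

```python
import zlib
from collections import Counter


def partition(key, n):
    """Stable lane choice: the same key always lands in the same lane."""
    return zlib.crc32(key.encode("utf-8")) % n


def scatter(events, n):
    """Split one logical stream into n independent lanes by key."""
    lanes = [[] for _ in range(n)]
    for key, value in events:
        lanes[partition(key, n)].append((key, value))
    return lanes


def local_sum(lane):
    """Independent per-lane computation; needs no data from other lanes."""
    acc = Counter()
    for key, value in lane:
        acc[key] += value
    return acc


def gather(partials):
    """Merge stage: combine the partial results of all lanes."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
result = gather(local_sum(lane) for lane in scatter(events, 3))
```

Because a key never spans two lanes, each lane stays independent until the merge, which is the property the slide asks for.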
  • 29. Surviving near realtime ✦ building a restlessly event-oriented, in-memory analytics system brings some challenges ✦ disaster recovery: yet again, splitting streams (for storage), redundant (role-based) computation ✦ short-term failure recovery: upfront temporary, auto-expiring storage, auto-replay or penalising events
  • 30. Near realtime is limited ✦ you need to run most of the analytics on event windows of some size ✦ you switch from exact to probabilistic / approximate results ✦ you can only predict the near future, cluster based on relatively short time periods and recognise short-term patterns and anomalies only
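
As one possible illustration of trading exact for approximate results, the sketch below counts item frequencies with a Count-Min sketch. The slides do not name a specific technique; Count-Min is chosen here only as a well-known example, and the `width` and `depth` defaults are arbitrary.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using fixed memory.

    Estimates never undercount; hash collisions can make them overcount.
    """

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One differently salted hash per row.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._slots(item):
            self.rows[row][col] += count

    def estimate(self, item):
        return min(self.rows[row][col] for row, col in self._slots(item))


sketch = CountMinSketch()
for word in ["beep", "beep", "beep", "acme", "acme"]:
    sketch.add(word)
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items pass through, which is what makes this kind of structure attractive for in-memory windows.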
  • 31. Near realtime mining ✦ you mine live streams instead of passive data sources ✦ typical algorithms such as Apriori, 1-class-SVM, k-means, regressions etc. are easily possible, but on stream portions only ✦ NLP can be done by giving words identifiers and dealing with binary messages instead of text ✦ as long as it fits into main memory, it’s comparable to classic mining, but is much faster
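
The idea of giving words identifiers and shipping binary messages instead of text can be sketched as follows. `Vocabulary`, `encode_text` and `decode_text` are hypothetical names, and the 4-byte little-endian id width is an assumption.

```python
import struct


class Vocabulary:
    """Assigns each distinct word a small integer id on first sight."""

    def __init__(self):
        self.ids = {}
        self.words = []

    def id_of(self, word):
        word_id = self.ids.get(word)
        if word_id is None:
            word_id = len(self.words)
            self.ids[word] = word_id
            self.words.append(word)
        return word_id


def encode_text(vocab, text):
    """Turn whitespace-separated text into packed 4-byte word ids."""
    ids = [vocab.id_of(word) for word in text.lower().split()]
    return struct.pack(f"<{len(ids)}I", *ids)


def decode_text(vocab, raw):
    """Inverse of encode_text, given the same vocabulary."""
    count = len(raw) // 4
    return [vocab.words[i] for i in struct.unpack(f"<{count}I", raw)]


vocab = Vocabulary()
message = encode_text(vocab, "beep beep road runner")
```

Downstream stages can then compare and count integers instead of strings; to get truly fixed-size events, messages would additionally be padded or truncated to a fixed number of ids.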
  • 32. Near realtime + batch? ✦ the combination of both is what can make a winning solution. Example reference architecture: Lambda, but it’s even more ✦ exploratory, offline analytics, baseline analysis, pattern mining, algorithm training and the like you do in the batch ✦ you apply batch analytics’ results to near realtime and prove or reject hypotheses, detect anomalies, run forecasts, derive trends etc.
  • 33. Near realtime, no batch? ✦ it’s possible to do some of this completely without batch, just on streams - even more than basic counters and stats ✦ you need to keep every single historic event in a data store ✦ you need to replay historic events instead of querying / mining your data store ✦ don’t query your database - let the database stream what it has to you
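
Replaying historic events through the same handler that processes live events, rather than querying the store, might look like the sketch below. `EventLog` stands in for a real data store and `RunningStats` for a real analyser; both are assumptions made for illustration.

```python
class RunningStats:
    """Incremental mean; the same handler serves live and replayed events."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def on_event(self, value):
        self.n += 1
        self.mean += (value - self.mean) / self.n


class EventLog:
    """Append-only store that streams events to a handler on replay."""

    def __init__(self):
        self._events = []

    def append(self, ts, value):
        self._events.append((ts, value))

    def replay(self, handler, since=0):
        # Push stored events to the handler in arrival order.
        for ts, value in self._events:
            if ts >= since:
                handler(value)


log = EventLog()
for ts, value in [(1, 10.0), (2, 20.0), (3, 30.0)]:
    log.append(ts, value)

stats = RunningStats()
log.replay(stats.on_event, since=2)
```

The analyser never learns whether an event is live or replayed, so the same code path covers recovery, backfill and normal operation.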
  • 34. Near realtime example tools ✦ query/store-oriented/passively-adapting: Spark/Shark, Impala, Drill, ParStream, Splunk ✦ full-blown CEP engines / continuous querying DSMSs: Esper, TIBCO/StreamBase ✦ more pragmatic stream processors: Storm, S4, Samza ✦ event-oriented, continuous analysers: keen.io, also speaker’s current WIP ✦ etc. etc. etc...
  • 35. Near realtime - DIY ✦ in the end, you’ll have to build it (or core parts of it) yourself ✦ you’ll have to work with circular / ring buffers and / or zero-overhead queuing software: Disruptor, 0MQ ✦ ideally, you keep everything in one single OS process - multi-threading is still hairy enough ✦ then managing and using the machine’s overall memory is the tricky part ✦ for GPUs: OpenCL, Rootbeer ✦ embed analytics / statistics into the process
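
A circular / ring buffer in its simplest form is sketched below. It is single-threaded and overwrites the oldest unread item when full; it is not the Disruptor's design, only an illustration of the preallocated, index-masked structure the slide refers to. The class and method names are assumptions.

```python
class RingBuffer:
    """Fixed-capacity buffer over a preallocated list.

    Capacity must be a power of two so that a bit mask can replace the
    modulo operation when mapping sequence numbers to slots.
    """

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.head = 0  # sequence number of the next write
        self.tail = 0  # sequence number of the next read

    def publish(self, item):
        if self.head - self.tail == len(self.slots):
            self.tail += 1  # full: drop the oldest unread item
        self.slots[self.head & self.mask] = item
        self.head += 1

    def poll(self):
        if self.tail == self.head:
            return None  # empty
        item = self.slots[self.tail & self.mask]
        self.tail += 1
        return item


ring = RingBuffer(4)
for i in range(6):
    ring.publish(i)
drained = [ring.poll() for _ in range(5)]
```

Whether a full buffer should overwrite, block the producer or reject the event is a design choice; overwriting suits analytics where the newest data matters most.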
  • 36. Near realtime - DIY ✦ picking the base platform has less to do with personal flavour than with what it offers ✦ C is a good and a valid choice, but very “manual” ✦ Erlang/OTP is great for glue, but hard for analytics and integration. In the end, it’s C, but pretty tricky here ✦ Node.js is C in the end at this point, but it’s not for single-process / multi-threading and still maturing ✦ JVM is a good compromise. Managed / GC-controlled memory with object wrappers will be sacrificed for off-heap memory with primitives though ✦ Most of the rest doesn’t apply to this sort of task
  • 37. Near realtime - DIY ✦ programming paradigms and thus languages are the essential, secret sauce ✦ functional programming is ideal for analytics and event-processing ✦ (functional) reactive programming, Reactor (as pattern or framework), RX are good for building this sort of system ✦ JavaScript is partially there, Erlang, Clojure, Scala & Co. are further, but can be uncontrollable in runtime behaviour ✦ pure Java can be (later) a healthy tradeoff though - now with RX or Reactor, Netty etc.
  • 38. Time in near realtime ✦ realtime still means real time, even if “near” ✦ the platform of your choice might not be ideal for hard or soft realtime, since the difference is primarily in what happens with late events and under high load ✦ Erlang will do its best to trigger a timer. Same with Node.js. But they don’t interrupt hard and schedule on their own, thus leaving you with an approximation ✦ JVM comes close, but still no easy way to interrupt explicitly. Alternative: Hashing Wheel, own scheduler on dedicated core ✦ C is the winner, OS-support essential (RTOS alike)
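
The “Hashing Wheel” on this slide presumably refers to a hashed timing wheel; the sketch below shows the basic mechanism under that assumption. Ticks are driven by the caller here, whereas a real system would drive them from its own scheduler, possibly on a dedicated core. All names are illustrative.

```python
class HashedWheelTimer:
    """Timers are hashed into a fixed number of slots by their due tick.

    Each tick inspects only one slot, so the cost per tick does not
    depend on the total number of pending timers.
    """

    def __init__(self, slots=8):
        self.wheel = [[] for _ in range(slots)]
        self.now = 0  # current tick

    def schedule(self, delay_ticks, callback):
        if delay_ticks < 1:
            raise ValueError("delay must be at least one tick")
        due = self.now + delay_ticks
        self.wheel[due % len(self.wheel)].append((due, callback))

    def tick(self):
        """Advance time by one tick and fire every timer that is due."""
        self.now += 1
        bucket = self.wheel[self.now % len(self.wheel)]
        due_now = [cb for due, cb in bucket if due <= self.now]
        # Timers for a later wheel revolution stay in the bucket.
        bucket[:] = [(due, cb) for due, cb in bucket if due > self.now]
        for callback in due_now:
            callback()
        return len(due_now)


fired = []
timer = HashedWheelTimer(slots=4)
timer.schedule(2, lambda: fired.append("a"))
timer.schedule(6, lambda: fired.append("b"))  # same slot, next revolution
counts = [timer.tick() for _ in range(6)]
```

The resolution of such a timer is one tick, so it gives bounded, predictable timing rather than exact timing, in line with the slide's point about approximation.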
  • 39. Near realtime + data store? ✦ near realtime analytics systems need to store data in different stages: short-term replay, disaster protection, history ✦ the trick is to turn around the way you work with the data store ✦ your data store knows model and queries beforehand, and only waits for events to start streaming historic data satisfying the static query / view ✦ most NoSQL stores, but also classic RDBMS, have implantable workers / jobs / coprocessors as a built-in feature: Oracle, Riak, HBase etc.
  • 40. Near realtime business cases ✦ anomaly / novelty / outlier detection in any sort of system ✦ fraud, attack detection based on patterns ✦ situational pricing, product placement ✦ stock, inventory control and forecast ✦ online bidding, trading ✦ automated traffic optimization ✦ semi-automated operations ✦ immediate visualization and tracing
  • 41. Why speed? ✦ why be slow if it’s possible, with comparable effort, to be fast in making decisions and automating them? If not you, then your competitor ✦ since everybody can mine data, speed and quality are the only technical success factors left ✦ it’s about how fast you can decide based on data. The best way is to start very early, at the source of data ✦ “new economy” is all about speed, not (only) lobbies
  • 42. ✦ cartoon images found on the internet and are directly or indirectly property/copyright of or related to Time Warner