SlideShare ist ein Scribd-Unternehmen logo
1 von 45
Downloaden Sie, um offline zu lesen
deconstructing
LAMBDA


Philly ETE 2014 - Darach Ennis - @darachennis
A journey from speed at
any cost - to unit cost at
considerable scale


Philly ETE 2014 - Darach Ennis - @darachennis
small FAST
DATA guy
Interested in Data Patterns and War Stories (aka: Data Architectures)

Philly ETE 2014 - Darach Ennis - @darachennis
Big Data!
!
!
“The techniques and technologies for such data-
intensive science are so different that it is
worth distinguishing data-intensive science from
computational science as a new, fourth paradigm”
!
- Jim Gray!
!
The Fourth Paradigm: Data-Intensive Scientific Discovery. - Microsoft 2009
Scale vs Speed!
!
!
“Premature optimisation is the root of all evil.”
!
- Donald Knuth !
!
“Premature evil is the root of all optimisation.”
!
- Nitsan Wakart!
DATA intensive!
science @SCALE
Philly ETE 2014 - Darach Ennis - @darachennis
Mechanical Sympathy
Mechanical Sympathy
Mechanical Sympathy
A Wall Street Second
A Swiss Second
Small Data? <= 128bytes
HTTP GET/POST - A typical RESTful performance
0.1
1
10
100
1000
1
10
100
1000
Concurrent Connections
1 2 4 8 16 32 64 128 256 512 1024
Req/Sec Bw/Sec (MB) Avg Latency (ms) Max Latency (ms) Stdev (ms)
14,99815,17315,33015,44515,78715,49914,64212,616
8,705
4,2793,9073,907 4,279
8,705
12,616 14,642 15,499 15,787 15,445 15,330 15,173 14,998
Small Data? <= 1K
HTTP GET/POST - A typical RESTful performance
0.1
1
10
100
1000
1
100
10000
Concurrent Connections
1 2 4 8 16 32 64 128 256 512 1024
Req/Sec Bw/Sec (MB) Avg Latency (ms) Max Latency (ms) Stdev (ms)
2,8422,7882,8302,9162,8582,7902,8492,722
1,951
1,288
690690
1,288
1,951
2,722 2,849 2,790 2,858 2,916 2,830 2,788 2,842
Big Events - 1Billion Sources
Ballpark number of boxes if each box can handle 2500 events/second
Scale
1
1000
1000000
Event Universe
1 million 10 million 100 million 1 billion
1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc
400,000
40,000
4,000
35
16,667
1,667
167
17
112
12
21
5
111
1/dy 1/hr 1/mn 1/sc
Data!
Sympathy?
Philly ETE 2014 - Darach Ennis - @darachennis
5 V's
5 V’s via [V-PEC-T]
• Business Factors
• ‘Veracity’ - The What
• ‘Value’ - The Why
• Technical Domain (Policies, Events, Content)
• Volume, Velocity, Variety
Incremental!
!
The needs of the individual event or query outweigh the
needs of the aggregate events or queries in flight in the
system
Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
Incremental!
!
The needs of the individual event or query outweigh the
needs of the aggregate events or queries in flight in the
system
Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
Incremental!
!
The needs of the individual event or query outweigh the
needs of the aggregate events or queries in flight in the
system
Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
Batch!
!
The needs of the system outweigh the needs of
individual events and queries running in flight or active
within the system
Incremental!
!
The needs of the individual event or query outweigh the
needs of the aggregate events or queries in flight in the
system
- Nathan März
“Computing arbitrary functions on an arbitrary
dataset in real-time is a daunting problem.”
Lambda architecture is a
twitter scale architecture.
5k msgs/sec inbound
(tweets) on average (150k
peak?) - <1k ‘small' data -
Firehose outbound
(broadcast problem, fairly
easy to scale)
Lambda: http://bit.ly/Hs53Ur
Web
Batch
Serving
Speed
Views
Views
Views
Views
Views
Views
Time
Series
Docs K/V Rel
MQ
"New Data"
Data
Apps
Apps
Lambda: A
All new data is sent to both the batch
layer and the speed layer. In the
batch layer, new data is appended to
the master dataset. In the speed
layer, the new data is consumed to
do incremental updates of the
realtime views.
Lambda: B
The master dataset is an immutable,
append-only set of data. The master
dataset only contains the rawest
information that is not derived from
any other information you have.
Lambda: http://bit.ly/Hs53Ur
Web
Batch
Serving
Speed
Views
Views
Views
Views
Views
Views
Time
Series
Docs K/V Rel
MQ
"New Data"
Data
Apps
Apps
?? ?
Enrich, Transform, Store!
Extract, Transform, Load
• From A: “rawest … not derived"
• In many environments it may be preferable to
normalise data for later ease of retrieval (eg:
Dremel, strongly typed nested records) to
support scalable ad hoc query.

• Derivation allows other forms of efficient retrieval
eg: using SAX - Symbolic Aggregate
Approximation, PAA - Piecewise Aggregate
SAX & PAA
Symbolic Aggregate
Approximation
Piecewise Aggregate
Approximation
1sc -> 1mn -> 1hr -> 1dy -> 1wk -> 1mh -> 1yr
Lambda: C
The batch layer precomputes query
functions from scratch. The results of the
batch layer are called batch views. The
batch layer runs in a while(true) loop and
continuously recomputes the batch views
from scratch. The strength of the batch
layer is its ability to compute arbitrary
functions on arbitrary data. This gives it
the power to support any application.
Lambda: D
The serving layer indexes the batch views
produced by the batch layer and makes it
possible to get particular values out of a
batch view very quickly. The serving layer is
a scalable database that swaps in new
batch views as they’re made available.
Because of the latency of the batch layer, the
results available from the serving layer are
always out of date by a few hours.
Lambda: http://bit.ly/Hs53Ur
Web
Batch
Serving
Speed
Views
Views
Views
Views
Views
Views
Time
Series
Docs K/V Rel
MQ
"New Data"
Data
Apps
Apps
?
Think ‘Statistical
Compression'
https://github.com/gornik/gorgeo - A geohash ES plugin
Lambda: E
The speed layer compensates for the high latency of updates
to the serving layer. It uses fast incremental algorithms and
read/write databases to produce realtime views that are
always up to date. The speed layer only deals with recent
data, because any data older than that has been absorbed
into the batch layer and accounted for in the serving layer.
The speed layer is significantly more complex than the
batch and serving layers, but that complexity is
compensated by the fact that the realtime views can be
continuously discarded as data makes its way through the
batch and serving layers. So, the potential negative impact
of that complexity is greatly limited.
Lambda: http://bit.ly/Hs53Ur
Web
Batch
Serving
Speed
Views
Views
Views
Views
Views
Views
Time
Series
Docs K/V Rel
MQ
"New Data"
Data
Apps
Apps
?
Use a DSP + CEP/ESP or
‘Scalable CEP'
• Storm/S4 + Esper/…
• Embed a CEP/ESP within a
Distributed Stream processing
Engine
• Use Drill for large scale ad hoc
query [leverage nested records]
Lambda: F
Queries are resolved by getting results from both the
batch and realtime views and merging them
together.
Millwheel: http://bit.ly/1gWqNIC
Web
Query
Window
CounterQueries
Model
Stats
Stats
Model
Out of
Trend?
Alerts
Window
Counter
Model
Out of
Trend?
Monitor
Google’s “Zeitgeist
pipeline"
Lambda: Batch View
• Precomputed Queries are central to Complex
Event Processing / Event Stream Processing
architectures.
• Unfortunately, though, most DBMS’s still offer only
synchronous blocking RPC access to underlying
data when asynchronous guaranteed delivery
would be preferable for view construction
leveraging CEP/ESP techniques.
Lambda: Merging …
• Possibly one of the most difficult aspects of near
real-time and historical data integration is
combining flows sensibly.
• For example, is the order of interleaving across
merge sources applied in a known deterministically
recomputable order? If not, how can results be
recomputed subsequently? Will data converge? 



[cf: http://cs.brown.edu/research/aurora/hwang.icde05.ha.pdf]
Lambda: A start …
Web
Batch
Serving
Speed
Views
Views
Views
Views
Views
Views
Time
Series
Docs K/V Rel
MQ
"New Data"
Data
Apps
Apps
Lambda Architecture - An
architectural pattern
producing war stories is
better than no patterns at all
Thanks.
!
Questions?
!
@darachennis

Weitere ähnliche Inhalte

Was ist angesagt?

Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedJamie Grier
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data PipelinesChristian Gügi
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Data Con LA
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
Lego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesLego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesDataWorks Summit/Hadoop Summit
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIOJozo Kovac
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Helena Edelson
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Altan Khendup
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaDataWorks Summit
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)Robert Metzger
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTurkish Testing Board
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
Real-time Distributed Stream Processing @ Scale
Real-time Distributed Stream Processing@ ScaleReal-time Distributed Stream Processing@ Scale
Real-time Distributed Stream Processing @ ScaleJerome Boulon
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 PivotalOpenSourceHub
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Timo Walther
 

Was ist angesagt? (20)

Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
Lego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming PipelinesLego-like building blocks of Storm and Spark Streaming Pipelines
Lego-like building blocks of Storm and Spark Streaming Pipelines
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign ...
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Flink Streaming
Flink StreamingFlink Streaming
Flink Streaming
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Real-time Distributed Stream Processing @ Scale
Real-time Distributed Stream Processing@ ScaleReal-time Distributed Stream Processing@ Scale
Real-time Distributed Stream Processing @ Scale
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
 

Andere mochten auch

FunctionalJS - May 2014 - Streams
FunctionalJS - May 2014 - StreamsFunctionalJS - May 2014 - Streams
FunctionalJS - May 2014 - Streamsdarach
 
Streams and Things
Streams and ThingsStreams and Things
Streams and Thingsdarach
 
Complex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBaseComplex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBasedarach
 
Data distribution in the cloud with Node.js
Data distribution in the cloud with Node.jsData distribution in the cloud with Node.js
Data distribution in the cloud with Node.jsdarach
 
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011darach
 
Thing. An unexpected journey. Devoxx UK 2014
Thing. An unexpected journey. Devoxx UK 2014Thing. An unexpected journey. Devoxx UK 2014
Thing. An unexpected journey. Devoxx UK 2014darach
 

Andere mochten auch (6)

FunctionalJS - May 2014 - Streams
FunctionalJS - May 2014 - StreamsFunctionalJS - May 2014 - Streams
FunctionalJS - May 2014 - Streams
 
Streams and Things
Streams and ThingsStreams and Things
Streams and Things
 
Complex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBaseComplex Er[jl]ang Processing with StreamBase
Complex Er[jl]ang Processing with StreamBase
 
Data distribution in the cloud with Node.js
Data distribution in the cloud with Node.jsData distribution in the cloud with Node.js
Data distribution in the cloud with Node.js
 
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011
StreamBase - Embedded Erjang - Erlang User Group London - 20th April 2011
 
Thing. An unexpected journey. Devoxx UK 2014
Thing. An unexpected journey. Devoxx UK 2014Thing. An unexpected journey. Devoxx UK 2014
Thing. An unexpected journey. Devoxx UK 2014
 

Ähnlich wie Deconstructing Lambda

Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)jaxLondonConference
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunk
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunk
 
JDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregorJDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregorPROIDEA
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Crime Analysis & Prediction System
Crime Analysis & Prediction SystemCrime Analysis & Prediction System
Crime Analysis & Prediction SystemBigDataCloud
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022ZainAsgar1
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Paris Carbone
 

Ähnlich wie Deconstructing Lambda (20)

Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding OverviewSplunkLive! Frankfurt 2018 - Data Onboarding Overview
SplunkLive! Frankfurt 2018 - Data Onboarding Overview
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
SplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding OverviewSplunkLive! Munich 2018: Data Onboarding Overview
SplunkLive! Munich 2018: Data Onboarding Overview
 
JDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregorJDD2014: Real Big Data - Scott MacGregor
JDD2014: Real Big Data - Scott MacGregor
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Crime Analysis & Prediction System
Crime Analysis & Prediction SystemCrime Analysis & Prediction System
Crime Analysis & Prediction System
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Data science for infrastructure dev week 2022
Data science for infrastructure   dev week 2022Data science for infrastructure   dev week 2022
Data science for infrastructure dev week 2022
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
IBM Aspera overview
IBM Aspera overview IBM Aspera overview
IBM Aspera overview
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...
 

Mehr von darach

Meta Programming with Streams and Pipes
Meta Programming with Streams and PipesMeta Programming with Streams and Pipes
Meta Programming with Streams and Pipesdarach
 
Erlang/Sapiens
Erlang/SapiensErlang/Sapiens
Erlang/Sapiensdarach
 
QCON London 2013
QCON London 2013QCON London 2013
QCON London 2013darach
 
EFL Munich - February 2013 - "Conversational Big Data with Erlang"
EFL Munich - February 2013 - "Conversational Big Data with Erlang"EFL Munich - February 2013 - "Conversational Big Data with Erlang"
EFL Munich - February 2013 - "Conversational Big Data with Erlang"darach
 
Streamy, Pipy, Analyticy
Streamy, Pipy, AnalyticyStreamy, Pipy, Analyticy
Streamy, Pipy, Analyticydarach
 
Tech mesh london 2012
Tech mesh london 2012Tech mesh london 2012
Tech mesh london 2012darach
 

Mehr von darach (6)

Meta Programming with Streams and Pipes
Meta Programming with Streams and PipesMeta Programming with Streams and Pipes
Meta Programming with Streams and Pipes
 
Erlang/Sapiens
Erlang/SapiensErlang/Sapiens
Erlang/Sapiens
 
QCON London 2013
QCON London 2013QCON London 2013
QCON London 2013
 
EFL Munich - February 2013 - "Conversational Big Data with Erlang"
EFL Munich - February 2013 - "Conversational Big Data with Erlang"EFL Munich - February 2013 - "Conversational Big Data with Erlang"
EFL Munich - February 2013 - "Conversational Big Data with Erlang"
 
Streamy, Pipy, Analyticy
Streamy, Pipy, AnalyticyStreamy, Pipy, Analyticy
Streamy, Pipy, Analyticy
 
Tech mesh london 2012
Tech mesh london 2012Tech mesh london 2012
Tech mesh london 2012
 

Kürzlich hochgeladen

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Deconstructing Lambda

  • 1. deconstructing LAMBDA 
 Philly ETE 2014 - Darach Ennis - @darachennis
  • 2. A journey from speed at any cost - to unit cost at considerable scale 
 Philly ETE 2014 - Darach Ennis - @darachennis
  • 3.
  • 4. small FAST DATA guy Interested in Data Patterns and War Stories (aka: Data Architectures)
 Philly ETE 2014 - Darach Ennis - @darachennis
  • 5. Big Data! ! ! “The techniques and technologies for such data- intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm” ! - Jim Gray! ! The Fourth Paradigm: Data-Intensive Scientific Discovery. - Microsoft 2009
  • 6. Scale vs Speed! ! ! “Premature optimisation is the root of all evil.” ! - Donald Knuth ! ! “Premature evil is the root of all optimisation.” ! - Nitsan Wakart!
  • 7. DATA intensive! science @SCALE Philly ETE 2014 - Darach Ennis - @darachennis
  • 11. A Wall Street Second
  • 13. Small Data? <= 128bytes HTTP GET/POST - A typical RESTful performance 0.1 1 10 100 1000 1 10 100 1000 Concurrent Connections 1 2 4 8 16 32 64 128 256 512 1024 Req/Sec Bw/Sec (MB) Avg Latency (ms) Max Latency (ms) Stdev (ms) 14,99815,17315,33015,44515,78715,49914,64212,616 8,705 4,2793,9073,907 4,279 8,705 12,616 14,642 15,499 15,787 15,445 15,330 15,173 14,998
  • 14. Small Data? <= 1K HTTP GET/POST - A typical RESTful performance 0.1 1 10 100 1000 1 100 10000 Concurrent Connections 1 2 4 8 16 32 64 128 256 512 1024 Req/Sec Bw/Sec (MB) Avg Latency (ms) Max Latency (ms) Stdev (ms) 2,8422,7882,8302,9162,8582,7902,8492,722 1,951 1,288 690690 1,288 1,951 2,722 2,849 2,790 2,858 2,916 2,830 2,788 2,842
  • 15. Big Events - 1Billion Sources Ballpark number of boxes if each box can handle 2500 events/second Scale 1 1000 1000000 Event Universe 1 million 10 million 100 million 1 billion 1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc 1/dy 1/hr 1/mn 1/sc 400,000 40,000 4,000 35 16,667 1,667 167 17 112 12 21 5 111 1/dy 1/hr 1/mn 1/sc
  • 16. Data! Sympathy? Philly ETE 2014 - Darach Ennis - @darachennis
  • 17. 5 V's
  • 18. 5 V’s via [V-PEC-T] • Business Factors • ‘Veracity’ - The What • ‘Value’ - The Why • Technical Domain (Policies, Events, Content) • Volume, Velocity, Variety
  • 19. Incremental! ! The needs of the individual event or query outweigh the needs of the aggregate events or queries in flight in the system Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
  • 20. Incremental! ! The needs of the individual event or query outweigh the needs of the aggregate events or queries in flight in the system Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
  • 21. Incremental! ! The needs of the individual event or query outweigh the needs of the aggregate events or queries in flight in the system Source: Ashwani Roy, Charles Cai - QCON London 2013 - http://bit.ly/1f2Pdf9
  • 22. Batch! ! The needs of the system outweigh the needs of individual events and queries running in flight or active within the system
  • 23. Incremental! ! The needs of the individual event or query outweigh the needs of the aggregate events or queries in flight in the system
  • 24. - Nathan März “Computing arbitrary functions on an arbitrary dataset in real-time is a daunting problem.”
  • 25. Lambda architecture is a twitter scale architecture. 5k msgs/sec inbound (tweets) on average (150k peak?) - <1k ‘small' data - Firehose outbound (broadcast problem, fairly easy to scale)
  • 27. Lambda: A All new data is sent to both the batch layer and the speed layer. In the batch layer, new data is appended to the master dataset. In the speed layer, the new data is consumed to do incremental updates of the realtime views.
  • 28. Lambda: B The master dataset is an immutable, append-only set of data. The master dataset only contains the rawest information that is not derived from any other information you have.
  • 30. Enrich, Transform, Store! Extract, Transform, Load • From A: “rawest … not derived" • In many environments it may be preferable to normalise data for later ease of retrieval (eg: Dremel, strongly typed nested records) to support scalable ad hoc query.
 • Derivation allows other forms of efficient retrieval eg: using SAX - Symbolic Aggregate Approximation, PAA - Piecewise Aggregate
  • 31. SAX & PAA Symbolic Aggregate Approximation Piecewise Aggregate Approximation 1sc -> 1mn -> 1hr -> 1dy -> 1wk -> 1mh -> 1yr
  • 32. Lambda: C The batch layer precomputes query functions from scratch. The results of the batch layer are called batch views. The batch layer runs in a while(true) loop and continuously recomputes the batch views from scratch. The strength of the batch layer is its ability to compute arbitrary functions on arbitrary data. This gives it the power to support any application.
  • 33. Lambda: D The serving layer indexes the batch views produced by the batch layer and makes it possible to get particular values out of a batch view very quickly. The serving layer is a scalable database that swaps in new batch views as they’re made available. Because of the latency of the batch layer, the results available from the serving layer are always out of date by a few hours.
  • 36. Lambda: E The speed layer compensates for the high latency of updates to the serving layer. It uses fast incremental algorithms and read/write databases to produce realtime views that are always up to date. The speed layer only deals with recent data, because any data older than that has been absorbed into the batch layer and accounted for in the serving layer. The speed layer is significantly more complex than the batch and serving layers, but that complexity is compensated by the fact that the realtime views can be continuously discarded as data makes its way through the batch and serving layers. So, the potential negative impact of that complexity is greatly limited.
  • 38. Use a DSP + CEP/ESP or ‘Scalable CEP' • Storm/S4 + Esper/… • Embed a CEP/ESP within a Distributed Stream processing Engine • Use Drill for large scale ad hoc query [leverage nested records]
  • 39. Lambda: F Queries are resolved by getting results from both the batch and realtime views and merging them together.
  • 41. Lambda: Batch View • Precomputed Queries are central to Complex Event Processing / Event Stream Processing architectures. • Unfortunately, though, most DBMS’s still offer only synchronous blocking RPC access to underlying data when asynchronous guaranteed delivery would be preferable for view construction leveraging CEP/ESP techniques.
  • 42. Lambda: Merging … • Possibly one of the most difficult aspects of near real-time and historical data integration is combining flows sensibly. • For example, is the order of interleaving across merge sources applied in a known deterministically recomputable order? If not, how can results be recomputed subsequently? Will data converge? 
 
 [cf: http://cs.brown.edu/research/aurora/hwang.icde05.ha.pdf]
  • 43. Lambda: A start … Web Batch Serving Speed Views Views Views Views Views Views Time Series Docs K/V Rel MQ "New Data" Data Apps Apps
  • 44. Lambda Architecture - An architectural pattern producing war stories is better than no patterns at all