
Span Conference: Why your company needs a unified log

Apache Kafka and Amazon Kinesis are more than just message queues — they can serve as a unified log which you can put at the heart of your business, effectively creating a "digital nervous system" which your company's applications and processes can be re-structured around.

In this talk, Alex will provide an introduction to unified log technology, highlight some killer use cases and also show how Kinesis is being used "in anger" at Snowplow. Alex's talk will draw on his experiences working with event streams over the last two and a half years at Snowplow; it’s also heavily influenced by Jay Kreps’ unified log monograph, and by Alex's recent work penning Unified Log Processing, a Manning book. Alex's talk will show how event streams inside a unified log are an incredibly powerful primitive for building rich event-centric applications, unbundling local transactional silos and creating a single version of truth for a company.

Alex's talk will conclude with a live demo of Amazon Kinesis in action processing Snowplow events.

  1. Why your company needs a Unified Log. Span Conference, London, 28th October 2014
  2. Introducing myself • Alex Dean • Co-founder and technical lead at Snowplow, the open-source event analytics platform based here in London [1] • Weekend writer of Unified Log Processing, available on the Manning Early Access Program [2] [1] https://github.com/snowplow/snowplow [2] http://manning.com/dean
  3. So what’s a Unified Log?
  4. A quick history lesson: the three eras of business data processing [1] 1. The classic era, 1996+ 2. The hybrid era, 2005+ 3. The unified era, 2013+ [1] http://snowplowanalytics.com/blog/2014/01/20/the-three-eras-of-business-data-processing/
  5. The classic era of business data processing, 1996+ [Diagram labels: own data center; narrow data siloes (CMS, CRM, e-comm, ERP), each with a local loop; low latency local loops; point-to-point connections; nightly batch ETL process; data warehouse; high latency; wide data coverage; full data history; management reporting]
  6. The hybrid era, 2005+ [Diagram labels: cloud vendor / own data center; narrow data siloes (search, e-comm, CRM, ERP, CMS) with low latency local loops; SaaS vendors #1, #2 and #3; email marketing; web analytics; APIs; bulk exports; stream processing; micro-batch processing; batch processing; product rec's; systems monitoring; data warehouse; management reporting; Hadoop; ad hoc analytics; low latency; high latency]
  7. The hybrid era: a surfeit of software vendors [Diagram: the hybrid-era architecture from slide 6]
  8. The hybrid era: company-wide reporting and analytics ends up like Rashomon. The bandit’s story vs. the wife’s story vs. the samurai’s story vs. the woodcutter’s story
  9. The hybrid era: the number of data integrations is unsustainable
  10. So how do we unravel the hairball?
  11. The unified era, 2013+ [Diagram labels: cloud vendor / own data center; narrow data siloes (search, e-comm, CRM, ERP, CMS); some low latency local loops; SaaS vendors #1 and #2; email marketing; streaming APIs / web hooks; unified log; low latency; wide data coverage; few days’ data history; archiving; Hadoop; wide data coverage; full data history; high latency; low latency; systems monitoring; eventstream; ad hoc analytics; product rec's; management reporting; fraud detection; churn prevention; APIs]
  12. The unified log is Amazon Kinesis, or Apache Kafka [Diagram: the unified-era architecture from slide 11] • Amazon Kinesis, a hosted AWS service • Extremely similar semantics to Kafka • Apache Kafka, an append-only, distributed, ordered commit log • Developed at LinkedIn to serve as their organization’s unified log
  13. “Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization” [1] [1] http://kafka.apache.org/
  14. So what does a unified log give us? (1) A single version of the truth (2) Our truth is now upstream from the data warehouse (3) The hairball of point-to-point connections has been unravelled (4) Local loops have been unbundled
  15. What does a unified log let us do that we couldn’t do before? Populating a unified log with your company’s event streams, to enable… • Real-time management reporting • Holistic systems monitoring • Re-running models from Day 0 • A/B testing end-to-end pipelines • Shipping offline models to RT … anything requiring low latency response / holistic view of our company’s data!
  16. But garbage in, garbage out: it’s crucial to properly model the event streams feeding into the unified log • We are working on a semantic model for events, an “event grammar”, at Snowplow [1] • The event grammar borrows concepts from human language (diagram labels: Subject, Verb, Direct Object, Indirect Object, Prep. Object, Event, Context) • A semantic model prevents business and technology assumptions leaking into the event stream, making it less brittle over time [1] http://snowplowanalytics.com/blog/2013/08/12/towards-universal-event-analytics-building-an-event-grammar/
  17. We also need to store and version the schemas used to describe our events, as these will change over time [Diagram label: unified log]
  18. How are we embracing the unified log at Snowplow?
  19. Some background: early on, we decided that Snowplow should be composed of a set of loosely coupled subsystems: (1) Trackers: generate event data from any environment (2) Collectors: log raw events from trackers (3) Enrich: validate and enrich raw events (4) Storage: store enriched events ready for analysis (5) Analytics: analyze enriched events. The subsystems are joined by standardised data protocols (marked “=” in the diagram). These turned out to be critical to allowing us to evolve the above stack
  20. Today almost all users/customers are running a batch-based Snowplow configuration [Diagram labels: Snowplow event tracking SDK; HTTP-based event collector; Amazon S3; Hadoop-based enrichment; Amazon Redshift] • Batch-based • Normally run overnight; sometimes every 4-6 hours. The Snowplow batch-based flow uses Amazon S3 as a “poor man’s” unified log
  21. Can we implement Snowplow on top of Kinesis/Kafka? [Diagram: the unified-era architecture from slide 11]
  22. We are working on Amazon Kinesis support first; Apache Kafka will come later (using Apache Samza for stream processing) [Diagram labels: Snowplow Trackers; Scala Stream Collector; raw event stream; Enrich Kinesis app; bad raw events stream; enriched event stream; S3 sink Kinesis app; S3; Redshift sink Kinesis app; Redshift; Elasticsearch sink Kinesis app; Elasticsearch; Event aggregator Kinesis app; DynamoDB; legend: “= not yet released”] • Analytics on Read (for agile exploration of event stream, ML, auditing, applying alternate models, reprocessing etc) • Analytics on Write (for dashboarding, audience segmentation, RTB, etc)
  23. Live demo!
  24. Questions? Discount code: spancNw (43% off all Manning eBooks for Span) http://snowplowanalytics.com https://github.com/snowplow/snowplow @snowplowdata To meet up or chat, @alexcrdean on Twitter or alex@snowplowanalytics.com
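
Slide 12 describes the unified log as an append-only, ordered commit log, and slide 22 shows an enrich step that reads a raw event stream and writes to an enriched stream and a bad-events stream. The toy Python sketch below illustrates those two ideas in memory. It is not Kafka, Kinesis or Snowplow code: the `UnifiedLog` class, the stream names, the event fields and the validation rule are all assumptions made up for illustration.

```python
from collections import defaultdict


class UnifiedLog:
    """Toy in-memory stand-in for a unified log (hypothetical, not a real API)."""

    def __init__(self):
        # Each named stream is an append-only, ordered list of events.
        self._streams = defaultdict(list)

    def append(self, stream, event):
        """Append an event to a stream and return its offset."""
        records = self._streams[stream]
        records.append(event)
        return len(records) - 1

    def read(self, stream, offset=0):
        """Return events from `offset` onwards; reading never removes them."""
        return list(self._streams[stream][offset:])


def enrich(log, offset=0):
    """Validate raw events and route each to the enriched or the bad stream."""
    for event in log.read("raw", offset):
        # Assumed validation rule: an event needs a subject and a verb.
        if "subject" in event and "verb" in event:
            log.append("enriched", {**event, "valid": True})
        else:
            log.append("bad", event)


def count_by_verb(log):
    """A second, independent consumer that aggregates the enriched stream."""
    counts = defaultdict(int)
    for event in log.read("enriched"):
        counts[event["verb"]] += 1
    return dict(counts)


log = UnifiedLog()
log.append("raw", {"subject": "user-1", "verb": "view", "object": "page-a"})
log.append("raw", {"subject": "user-2", "verb": "buy", "object": "sku-9"})
log.append("raw", {"object": "orphan"})  # missing subject and verb
enrich(log)
print(count_by_verb(log))
print(len(log.read("raw")), len(log.read("bad")))
```

Because reads do not consume events, the raw stream is still intact after enrichment, so any consumer can be re-run from offset 0. That property is what makes something like “re-running models from Day 0” (slide 15) possible, within whatever retention the real log provides.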
