SlideShare ist ein Scribd-Unternehmen logo
1 von 30
building a system for machine and
event-oriented data
e. sammer | @esammer
kafka summit 2016
© 2015 Rocana, Inc. All Rights Reserved.
context
© 2015 Rocana, Inc. All Rights Reserved.
what we do
3
• we build a system for the operation of modern data centers
• triage and diagnostics, exploration, trends, advanced analytics of complex
systems
• our data: logs, metrics, human activity, anything that occurs in the data center
• “enterprise software” (i.e. we build for others.)
• today: how we built what we built
© 2015 Rocana, Inc. All Rights Reserved.
our typical customer use cases
4
• >100K events / sec (8.6B events / day), sub-second end to end latency, full
fidelity retention, critical use cases
• quality of service - “are credit card transactions happening fast enough?”
• fraud detection - “detect, investigate, prosecute, and learn from fraud.”
• forensic diagnostics - “what really caused the outage last friday?”
• security - “who’s doing what, where, when, why, and how, and is that ok?”
• user behavior - ”capture and correlate user behavior with system performance,
then feed it to downstream systems in realtime.”
© 2015 Rocana, Inc. All Rights Reserved.
depth: 3 meters
© 2015 Rocana, Inc. All Rights Reserved.
high level architecture – sources
6
© 2015 Rocana, Inc. All Rights Reserved.
high level architecture – sinks
7
© 2015 Rocana, Inc. All Rights Reserved.
guarantees
8
• no single point of failure exists
• all components scale horizontally[1]
• data retention and latency is a function of cost, not tech[1]
• every event is delivered provided no more than N - 1 failures occur (where N is
the kafka replication level)
• all operations, including upgrade, are online[2]
• every event is (or appears to be) delivered exactly once[3]
[1] we’re positive there’s a limit, but thus far it has been cost.
[2] from the user’s perspective, at a system level.
[3] when queried via our UI. lots of details here.
© 2015 Rocana, Inc. All Rights Reserved.
events
© 2015 Rocana, Inc. All Rights Reserved.
modeling our world
10
• everything is an event
• each event contains a timestamp, type, location, host, service, body, and type-
specific attributes (k/v pairs)
• build specialized aggregates as necessary - just optimized views of the data
© 2015 Rocana, Inc. All Rights Reserved.
event schema
11
{
id: string,
ts: long,
event_type_id: int,
location: string,
host: string,
service: string,
body: [ null, string ],
attributes: map<string>
}
© 2015 Rocana, Inc. All Rights Reserved.
event types
12
• some event types are standard
– syslog, http, log4j, generic text record, …
• users define custom event types
• producers populate event type
• transformations can turn an event of type A into B
• event type metadata tells downstream systems how to interpret body and
attributes
© 2015 Rocana, Inc. All Rights Reserved.
ex: generic syslog event
13
event_type_id: 100, // rfc3164, rfc5424 (syslog)
body: … // raw syslog message bytes
attributes: { // extracted fields from body
syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”,
syslog_severity: “6”, // info severity
syslog_facility: “3”, // daemon facility
syslog_process: “dhclient”,
syslog_pid: “668”,
…
}
© 2015 Rocana, Inc. All Rights Reserved.
ex: generic http event
14
event_type_id: 102, // generic http event
body: … // raw http log message bytes
attributes: {
http_req_method: “GET”,
http_req_vhost: “w2a-demo-02”,
http_req_path: “/api/v1/search?q=service%3Asshd&p=1&s=200”,
http_req_query: “q=service%3Asshd&p=1&s=200”,
http_resp_code: “200”,
…
}
© 2015 Rocana, Inc. All Rights Reserved.
stream processing
© 2015 Rocana, Inc. All Rights Reserved.
a reminder…
16
© 2015 Rocana, Inc. All Rights Reserved.
data processing
17
• each processing job gets a full stream of the fire hose, decides what it wants to
consider or operate on
• output of “non-terminal” jobs always just events
• result: all processing jobs are composable
• many jobs take user rules or configuration from our ui
© 2015 Rocana, Inc. All Rights Reserved.
the jobs
18
• transformation engine: configuration-based data transformation
• metric aggregation: olap cube construction of time series data (e.g. host 17
user cpu time)
• model build/eval: train/evaluate various kinds of models (e.g. anomaly
detection)
• trigger engine: detect complex patterns in the stream, emit events on match
(e.g. complex event processing, automated workflow, alerting)
• action engine: perform some action upon receiving a specific event type (e.g.
email notification, 3rd party api invocation)
• storage: write all the things to hdfs
© 2015 Rocana, Inc. All Rights Reserved.
transformation use cases
19
© 2015 Rocana, Inc. All Rights Reserved.
event feedback loops
20
© 2015 Rocana, Inc. All Rights Reserved.
metrics and time series
© 2015 Rocana, Inc. All Rights Reserved.
aggregation
22
• mostly for time series metrics
• two halves: on write and on query
• data model: (dimensions) => (aggregates)
• on write
– reduce(a: A, b: A) => A over window
– store “base” aggregates, all associative and commutative
• on query
– perform same aggregate or derivative aggregates
– group by the same dimensions
– we use SQL (Impala)
© 2015 Rocana, Inc. All Rights Reserved.
aside: late arriving data (it’s a thing)
23
• never trust a (wall) clock
• producer determines event time, rest of the system uses this always
• data that shows up late always processed according to event time
• apache beam describes these issues perfectly
• this is real and you must deal with it
© 2015 Rocana, Inc. All Rights Reserved.
ex: service event volume by host and minute
24
• dimensions: ts, window, location, host, service, metric
• on write, aggregates: count, sum, min, max, last
• epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume =>
17, 42, 1, 10, 8
• on query:
– SELECT floor(ts / 60000) as bin, host, service, metric, sum(value_sum) FROM
events WHERE ts BETWEEN x AND y AND metric = ”event_volume” GROUP BY
bin, host, service, metric
• if late arriving data existed in events, the same dimensions would repeat with a
another set of aggregates and would be rolled up as a result of the group by
• tl;dr: normal window aggregation operations
© 2015 Rocana, Inc. All Rights Reserved.
extension, pain, and advice
© 2015 Rocana, Inc. All Rights Reserved.
extending the system
26
• custom producers
• custom consumers
• event types
• parser / transformation plugins
• custom metric definition and aggregate functions
• custom processing jobs on landed data
© 2015 Rocana, Inc. All Rights Reserved.
pain (aka: the struggle is real)
27
• lots of tradeoffs when picking a stream processing solution
– samza: right features, but low level programming model, not supported by vendors.
missing security features.
– storm: too rigid, too slow. not supported by all Hadoop vendors.
– flink: relatively new, fledgling community. growing.
– spark streaming: tons of issues initially, but lots of community energy. improving.
• stack complexity, (relative im)maturity
• beam-style retractions required for correct, timely, efficient aggregates of
complex metrics (non-assoc/commutative)
© 2015 Rocana, Inc. All Rights Reserved.
if you’re going to try this…
28
• read all the literature on stream processing[1]
• treat it like the distributed systems problem it is
• understand, make, and make good on guarantees
• find the right abstractions
• never trust the hand waving or “hello worlds”
• fully evaluate the projects/products in this space
• understand it’s not just about search
[1] wait, like all of it? yea, like all of it.
© 2015 Rocana, Inc. All Rights Reserved.
things I didn’t talk about
29
• reprocessing data when bad code / transformations are detected
• dealing with data quality issues (“the struggle is real” part 2)
• the user interface and all the fancy analytics
– data visualization and exploration
– event search
– anomalous trend and event detection
– metric, source, and event correlation
– motif finding
– noise reduction and dithering
• event delivery semantics (e.g. at least once, exactly once, etc.)
• alerting, notification, and other subsystems
© 2015 Rocana, Inc. All Rights Reserved.
questions?
thank you.
@esammer | esammer@rocana.com

Weitere ähnliche Inhalte

Was ist angesagt?

Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 

Was ist angesagt? (20)

Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know
 
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...
 
Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Asynchronous Transaction Processing With Kafka as a Single Source of Truth - ...
Asynchronous Transaction Processing With Kafka as a Single Source of Truth - ...Asynchronous Transaction Processing With Kafka as a Single Source of Truth - ...
Asynchronous Transaction Processing With Kafka as a Single Source of Truth - ...
 
Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum Pain
 
Tales from the four-comma club: Managing Kafka as a service at Salesforce | L...
Tales from the four-comma club: Managing Kafka as a service at Salesforce | L...Tales from the four-comma club: Managing Kafka as a service at Salesforce | L...
Tales from the four-comma club: Managing Kafka as a service at Salesforce | L...
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
 
How to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHow to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALAB
 
Bridging the Gap: Connecting AWS and Kafka
Bridging the Gap: Connecting AWS and KafkaBridging the Gap: Connecting AWS and Kafka
Bridging the Gap: Connecting AWS and Kafka
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
How Yelp Leapt to Microservices with More than a Message Queue
How Yelp Leapt to Microservices with More than a Message QueueHow Yelp Leapt to Microservices with More than a Message Queue
How Yelp Leapt to Microservices with More than a Message Queue
 
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
 
Apache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics PlatformApache Kafka in Adobe Ad Cloud's Analytics Platform
Apache Kafka in Adobe Ad Cloud's Analytics Platform
 
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
Look how easy it is to go from events to blazing-fast analytics! | Neha Pawar...
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 

Andere mochten auch

Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
confluent
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
confluent
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
confluent
 

Andere mochten auch (20)

Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
The Enterprise Service Bus is Dead! Long live the Enterprise Service Bus, Rim...
 
The Rise of Real Time
The Rise of Real TimeThe Rise of Real Time
The Rise of Real Time
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
 
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Ch...
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
Simplifying Event Streaming: Tools for Location Transparency and Data Evoluti...
 
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian LitaKafka, Killer of Point-to-Point Integrations, Lucian Lita
Kafka, Killer of Point-to-Point Integrations, Lucian Lita
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
 

Ähnlich wie Building an Event-oriented Data Platform with Kafka, Eric Sammer

Ähnlich wie Building an Event-oriented Data Platform with Kafka, Eric Sammer (20)

Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015Building a system for machine and event-oriented data - SF HUG Nov 2015
Building a system for machine and event-oriented data - SF HUG Nov 2015
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
IoT interoperability
IoT interoperabilityIoT interoperability
IoT interoperability
 
Streaming ETL for All
Streaming ETL for AllStreaming ETL for All
Streaming ETL for All
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and Heat
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 

Kürzlich hochgeladen (20)

1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 

Building an Event-oriented Data Platform with Kafka, Eric Sammer

  • 1. building a system for machine and event-oriented data e. sammer | @esammer kafka summit 2016
  • 2. © 2015 Rocana, Inc. All Rights Reserved. context
  • 3. © 2015 Rocana, Inc. All Rights Reserved. what we do 3 • we build a system for the operation of modern data centers • triage and diagnostics, exploration, trends, advanced analytics of complex systems • our data: logs, metrics, human activity, anything that occurs in the data center • “enterprise software” (i.e. we build for others.) • today: how we built what we built
  • 4. © 2015 Rocana, Inc. All Rights Reserved. our typical customer use cases 4 • >100K events / sec (8.6B events / day), sub-second end to end latency, full fidelity retention, critical use cases • quality of service - “are credit card transactions happening fast enough?” • fraud detection - “detect, investigate, prosecute, and learn from fraud.” • forensic diagnostics - “what really caused the outage last friday?” • security - “who’s doing what, where, when, why, and how, and is that ok?” • user behavior - ”capture and correlate user behavior with system performance, then feed it to downstream systems in realtime.”
  • 5. © 2015 Rocana, Inc. All Rights Reserved. depth: 3 meters
  • 6. © 2015 Rocana, Inc. All Rights Reserved. high level architecture – sources 6
  • 7. © 2015 Rocana, Inc. All Rights Reserved. high level architecture – sinks 7
  • 8. © 2015 Rocana, Inc. All Rights Reserved. guarantees 8 • no single point of failure exists • all components scale horizontally[1] • data retention and latency is a function of cost, not tech[1] • every event is delivered provided no more than N - 1 failures occur (where N is the kafka replication level) • all operations, including upgrade, are online[2] • every event is (or appears to be) delivered exactly once[3] [1] we’re positive there’s a limit, but thus far it has been cost. [2] from the user’s perspective, at a system level. [3] when queried via our UI. lots of details here.
  • 9. © 2015 Rocana, Inc. All Rights Reserved. events
  • 10. © 2015 Rocana, Inc. All Rights Reserved. modeling our world 10 • everything is an event • each event contains a timestamp, type, location, host, service, body, and type- specific attributes (k/v pairs) • build specialized aggregates as necessary - just optimized views of the data
  • 11. © 2015 Rocana, Inc. All Rights Reserved. event schema 11 { id: string, ts: long, event_type_id: int, location: string, host: string, service: string, body: [ null, string ], attributes: map<string> }
  • 12. © 2015 Rocana, Inc. All Rights Reserved. event types 12 • some event types are standard – syslog, http, log4j, generic text record, … • users define custom event types • producers populate event type • transformations can turn an event of type A into B • event type metadata tells downstream systems how to interpret body and attributes
  • 13. © 2015 Rocana, Inc. All Rights Reserved. ex: generic syslog event 13 event_type_id: 100, // rfc3164, rfc5424 (syslog) body: … // raw syslog message bytes attributes: { // extracted fields from body syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”, syslog_severity: “6”, // info severity syslog_facility: “3”, // daemon facility syslog_process: “dhclient”, syslog_pid: “668”, … }
  • 14. © 2015 Rocana, Inc. All Rights Reserved. ex: generic http event 14 event_type_id: 102, // generic http event body: … // raw http log message bytes attributes: { http_req_method: “GET”, http_req_vhost: “w2a-demo-02”, http_req_path: “/api/v1/search?q=service%3Asshd&p=1&s=200”, http_req_query: “q=service%3Asshd&p=1&s=200”, http_resp_code: “200”, … }
  • 15. © 2015 Rocana, Inc. All Rights Reserved. stream processing
  • 16. © 2015 Rocana, Inc. All Rights Reserved. a reminder… 16
  • 17. © 2015 Rocana, Inc. All Rights Reserved. data processing 17 • each processing job gets a full stream of the fire hose, decides what it wants to consider or operate on • output of “non-terminal” jobs always just events • result: all processing jobs are composable • many jobs take user rules or configuration from our ui
  • 18. © 2015 Rocana, Inc. All Rights Reserved. the jobs 18 • transformation engine: configuration-based data transformation • metric aggregation: olap cube construction of time series data (e.g. host 17 user cpu time) • model build/eval: train/evaluate various kinds of models (e.g. anomaly detection) • trigger engine: detect complex patterns in the stream, emit events on match (e.g. complex event processing, automated workflow, alerting) • action engine: perform some action upon receiving a specific event type (e.g. email notification, 3rd party api invocation) • storage: write all the things to hdfs
  • 19. © 2015 Rocana, Inc. All Rights Reserved. transformation use cases 19
  • 20. © 2015 Rocana, Inc. All Rights Reserved. event feedback loops 20
  • 21. © 2015 Rocana, Inc. All Rights Reserved. metrics and time series
  • 22. © 2015 Rocana, Inc. All Rights Reserved. aggregation 22 • mostly for time series metrics • two halves: on write and on query • data model: (dimensions) => (aggregates) • on write – reduce(a: A, b: A) => A over window – store “base” aggregates, all associative and commutative • on query – perform same aggregate or derivative aggregates – group by the same dimensions – we use SQL (Impala)
  • 23. © 2015 Rocana, Inc. All Rights Reserved. aside: late arriving data (it’s a thing) 23 • never trust a (wall) clock • producer determines event time, rest of the system uses this always • data that shows up late always processed according to event time • apache beam describes these issues perfectly • this is real and you must deal with it
  • 24. © 2015 Rocana, Inc. All Rights Reserved. ex: service event volume by host and minute 24 • dimensions: ts, window, location, host, service, metric • on write, aggregates: count, sum, min, max, last • epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume => 17, 42, 1, 10, 8 • on query: – SELECT floor(ts / 60000) as bin, host, service, metric, sum(value_sum) FROM events WHERE ts BETWEEN x AND y AND metric = ”event_volume” GROUP BY bin, host, service, metric • if late arriving data existed in events, the same dimensions would repeat with a another set of aggregates and would be rolled up as a result of the group by • tl;dr: normal window aggregation operations
  • 25. © 2015 Rocana, Inc. All Rights Reserved. extension, pain, and advice
  • 26. © 2015 Rocana, Inc. All Rights Reserved. extending the system 26 • custom producers • custom consumers • event types • parser / transformation plugins • custom metric definition and aggregate functions • custom processing jobs on landed data
  • 27. © 2015 Rocana, Inc. All Rights Reserved. pain (aka: the struggle is real) 27 • lots of tradeoffs when picking a stream processing solution – samza: right features, but low level programming model, not supported by vendors. missing security features. – storm: too rigid, too slow. not supported by all Hadoop vendors. – flink: relatively new, fledgling community. growing. – spark streaming: tons of issues initially, but lots of community energy. improving. • stack complexity, (relative im)maturity • beam-style retractions required for correct, timely, efficient aggregates of complex metrics (non-assoc/commutative)
  • 28. © 2015 Rocana, Inc. All Rights Reserved. if you’re going to try this… 28 • read all the literature on stream processing[1] • treat it like the distributed systems problem it is • understand, make, and make good on guarantees • find the right abstractions • never trust the hand waving or “hello worlds” • fully evaluate the projects/products in this space • understand it’s not just about search [1] wait, like all of it? yea, like all of it.
  • 29. © 2015 Rocana, Inc. All Rights Reserved. things I didn’t talk about 29 • reprocessing data when bad code / transformations are detected • dealing with data quality issues (“the struggle is real” part 2) • the user interface and all the fancy analytics – data visualization and exploration – event search – anomalous trend and event detection – metric, source, and event correlation – motif finding – noise reduction and dithering • event delivery semantics (e.g. at least once, exactly once, etc.) • alerting, notification, and other subsystems
  • 30. © 2015 Rocana, Inc. All Rights Reserved. questions? thank you. @esammer | esammer@rocana.com