© Rocana, Inc. All Rights Reserved.
JOEY ECHEVERRIA | @fwiffo | November 4th, 2015
San Francisco Hadoop Users Group
Building a System for Machine and
Event-Oriented Data
Context
Joey
• Where I work: Rocana – Director of Engineering
• Where I used to work: Cloudera (‘11 – ’15), NSA
• Distributed systems, security, data processing, “big data”
Free stuff!
• Tweet @rocanainc with #SFHUG – best three tweets get a book
What we do
• Build a system for the operation of modern data centers
• Triage and diagnostics, exploration, trends, advanced analytics of
complex systems
• Our data:
• logs, metrics, human activity, anything that occurs in the data center
• “Enterprise Software” (i.e. we build for others.)
• Today: how we built what we built
Our typical customer use cases
• >100K events / sec (8.6B events / day), sub-second end to end latency,
full fidelity retention, critical use cases
• Quality of service - “are credit card transactions happening fast enough?”
• Fraud detection - “detect, investigate, prosecute, and learn from fraud.”
• Forensic diagnostics - “what really caused the outage last Friday?”
• Security - “who’s doing what, where, when, why, and how, and is that ok?”
• User behavior - “capture and correlate user behavior with system
performance, then feed it to downstream systems in real time.”
10,000 foot view
High level architecture
Guarantees
• No single point of failure exists
• All components scale horizontally[1]
• Data retention and latency are functions of cost, not tech[1]
• Every event is delivered provided no more than N - 1 failures occur
(where N is the Kafka replication factor)
• All operations, including upgrade, are online[2]
• Every event is (or appears to be) delivered exactly once[3]
[1] we’re positive there’s a limit, but thus far it has been cost.
[2] from the user’s perspective, at a system level.
[3] when queried via our UI. lots of details here.
Events
Modeling our world
• Everything is an event
• Each event contains a timestamp, type, location, host, service, body, and
type-specific attributes (k/v pairs)
• Build specialized aggregates as necessary - just optimized views of the
data
Event schema
{
  id: string,
  ts: long,
  event_type_id: int,
  location: string,
  host: string,
  service: string,
  body: [ null, bytes ],
  attributes: map<string>
}
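As a rough illustration, the schema above could be mirrored in application code like this. This is only a sketch: the `Event` class name, the epoch-millisecond reading of `ts`, and all sample values are assumptions, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    """One event, with the same fields as the schema on the slide."""
    id: str
    ts: int                        # observation time; epoch milliseconds is an assumption
    event_type_id: int
    location: str
    host: str
    service: str
    body: Optional[bytes] = None   # the [ null, bytes ] union
    attributes: Dict[str, str] = field(default_factory=dict)  # map of string values


# Hypothetical sample values, for illustration only.
event = Event(
    id="example-id-1",
    ts=1446624000000,
    event_type_id=100,
    location="us-west-2a",
    host="w2a-demo-1",
    service="sshd",
    attributes={"syslog_process": "dhclient"},
)
```

Keeping `body` optional and `attributes` as plain string pairs matches the idea that the event type metadata, not the schema itself, tells consumers how to interpret them.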
Event types
• Some event types are standard
• syslog, http, log4j, generic text record, …
• Users define custom event types
• Producers populate event type
• Transformations can turn one event type into another
• Event type metadata tells downstream systems how to interpret body and
attributes
Ex: generic syslog event
event_type_id: 100, // rfc3164, rfc5424 (syslog)
body: … // raw syslog message bytes
attributes: { // extracted fields from body
  syslog_message: "DHCPACK from 10.10.0.1 (xid=0x45b63bdc)",
  syslog_severity: "6", // info severity
  syslog_facility: "3", // daemon facility
  syslog_process: "dhclient",
  syslog_pid: "668",
  …
}
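To show where attributes like these can come from: in RFC 3164 and RFC 5424 the leading `<PRI>` value encodes facility * 8 + severity, so facility 3 and severity 6 correspond to `<30>`. The sketch below extracts the fields shown above from a raw line. The sample line (timestamp, hostname) and the `extract_attributes` helper are made up for illustration; the actual parser used in the system is not described in the talk.

```python
import re

# <PRI>, then anything up to "process[pid]: message" (BSD-style syslog line).
SYSLOG_LINE = re.compile(
    r"<(?P<pri>\d{1,3})>.*?\s(?P<process>[\w\-]+)\[(?P<pid>\d+)\]:\s(?P<message>.*)$"
)


def extract_attributes(line: str) -> dict:
    """Return string attributes extracted from one raw syslog line."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        raise ValueError("not a recognizable syslog line")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(int(match.group("pri")), 8)
    return {
        "syslog_message": match.group("message"),
        "syslog_severity": str(severity),
        "syslog_facility": str(facility),
        "syslog_process": match.group("process"),
        "syslog_pid": match.group("pid"),
    }


# Hypothetical raw line; only the message, process, and pid come from the slide.
raw = "<30>Nov  4 10:00:00 w2a-demo-1 dhclient[668]: DHCPACK from 10.10.0.1 (xid=0x45b63bdc)"
attrs = extract_attributes(raw)
```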
Ex: generic http event
event_type_id: 102, // generic http event
body: … // raw http log message bytes
attributes: {
  http_req_method: "GET",
  http_req_vhost: "w2a-demo-02",
  http_req_path: "/api/v1/search?q=service%3Asshd&p=1&s=200",
  http_req_query: "q=service%3Asshd&p=1&s=200",
  http_resp_code: "200",
  …
}
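The relationship between `http_req_path` and `http_req_query` in this example can be reproduced with the Python standard library. This is just a sketch of one way to derive the query attribute; whether the real transformation works this way is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

http_req_path = "/api/v1/search?q=service%3Asshd&p=1&s=200"

# Split the request target into its path and query components.
parts = urlsplit(http_req_path)
http_req_query = parts.query

# Decode the individual parameters (percent-escapes are unescaped here).
params = parse_qs(http_req_query)
```

Note that the attribute keeps the query percent-encoded, while `parse_qs` decodes `%3A` to `:` for downstream use.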
Consumers
Consumers
• …do most of the work
• Parallelism
• Kafka offset management
• Message de-duplication
• Transformation (embedded library)
• Dead letter queue support
• Downstream system knowledge
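The responsibilities listed above can be sketched as a single loop. This is a toy model, not the real consumer: plain lists stand in for a Kafka partition, the downstream system, and the dead letter queue, and the bounded `SeenIds` cache is an assumed de-duplication strategy.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List, Tuple

Message = Tuple[int, dict]  # (offset, event) pairs standing in for a Kafka partition


class SeenIds:
    """Bounded memory of recently seen event ids (assumed de-dup strategy)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._ids: "OrderedDict[str, None]" = OrderedDict()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._ids:
            return False
        self._ids[event_id] = None
        if len(self._ids) > self._capacity:
            self._ids.popitem(last=False)  # evict the oldest id
        return True


def consume(messages: Iterable[Message], transform: Callable[[dict], dict],
            sink: List[dict], dead_letters: List[dict], seen: SeenIds) -> int:
    """Process messages in order and return the next offset to commit."""
    next_offset = 0
    for offset, event in messages:
        if seen.first_time(event["id"]):          # message de-duplication
            try:
                sink.append(transform(event))     # transformation, then delivery
            except Exception as exc:              # bad input goes to the dead letter queue
                dead_letters.append({"event": event, "error": repr(exc)})
        next_offset = offset + 1                  # offset management: commit after handling
    return next_offset


def parse_pid(event: dict) -> dict:
    """Hypothetical transformation: fails on a non-numeric pid."""
    return {"id": event["id"], "syslog_pid": str(int(event["raw_pid"]))}


messages = [
    (0, {"id": "a", "raw_pid": "668"}),
    (1, {"id": "a", "raw_pid": "668"}),            # duplicate delivery
    (2, {"id": "b", "raw_pid": "not-a-number"}),   # fails transformation
]
sink: List[dict] = []
dead_letters: List[dict] = []
committed = consume(messages, parse_pid, sink, dead_letters, SeenIds(capacity=1000))
```

Committing the offset only after a message has been delivered or dead-lettered gives at-least-once processing; the de-duplication step is what makes redelivery look like exactly once downstream.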
Inside a consumer
Metrics and time series
Aggregation
• Mostly for time series metrics
• Two halves: on write and on query
• Data model: (dimensions) => (aggregates)
• On write
• reduce(a: A, b: A) => A over window
• Store “base” aggregates, all associative and commutative
• On query
• Perform same aggregate or derivative aggregates
• Group by the same dimensions
• SQL (Impala)
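A minimal sketch of the `reduce(a: A, b: A) => A` idea, assuming a base aggregate of count, sum, min, and max (the names `Agg`, `of`, `merge`, and `mean` are illustrative). Because `merge` is associative and commutative, partial aggregates can be combined in any order or grouping, and a derivative aggregate such as the mean can be computed at query time.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Agg:
    """Base aggregates stored on write."""
    count: int
    total: float
    minimum: float
    maximum: float


def of(value: float) -> Agg:
    """Lift a single observation into an aggregate."""
    return Agg(1, value, value, value)


def merge(a: Agg, b: Agg) -> Agg:
    """Associative, commutative combine: reduce(a: A, b: A) => A."""
    return Agg(a.count + b.count, a.total + b.total,
               min(a.minimum, b.minimum), max(a.maximum, b.maximum))


def mean(a: Agg) -> float:
    """Derivative aggregate computed on query from the stored base aggregates."""
    return a.total / a.count


values = [1, 10, 3, 8, 20]
window = reduce(merge, map(of, values))

# Two partial aggregates, merged later, give the same answer as one pass.
left = reduce(merge, map(of, values[:2]))
right = reduce(merge, map(of, values[2:]))
combined = merge(right, left)
```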
Aside: late arriving data (it’s a thing)
• Never trust a (wall) clock
• The producer determines observation time; the rest of the system always uses it
• Data that shows up late is always processed according to its observation time
• Aggregation consequences
• The same time window can appear multiple times
• Solution: aggregate every N seconds, potentially generating multiple aggregates for
the same time bin
• This is real and you must deal with it
• Do what we did or
• Build a system that mutates/replaces aggregates already output or
• Delay aggregate output for some slop time; drop late data that shows up after that
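The "aggregate every N seconds" approach can be sketched as follows. Events are binned by the observation time set by the producer, and each flush emits whatever has accumulated, so a late event simply produces a second aggregate for a bin that was already output. The class and function names here are hypothetical, and the one-minute window is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_MS = 60_000  # assumed one-minute bins


class WindowedCounter:
    """Counts events per time bin, keyed by observation time, not arrival time."""

    def __init__(self) -> None:
        self._pending: Dict[int, int] = defaultdict(int)

    def observe(self, observed_ts_ms: int) -> None:
        bin_start = observed_ts_ms - observed_ts_ms % WINDOW_MS
        self._pending[bin_start] += 1

    def flush(self) -> List[Tuple[int, int]]:
        """Called every N seconds: emit (bin_start, count) pairs and reset."""
        out = sorted(self._pending.items())
        self._pending.clear()
        return out


def roll_up(*flushes: List[Tuple[int, int]]) -> Dict[int, int]:
    """Query-time roll-up: sum all partial counts emitted for the same bin."""
    totals: Dict[int, int] = defaultdict(int)
    for flush in flushes:
        for bin_start, count in flush:
            totals[bin_start] += count
    return dict(totals)


counter = WindowedCounter()
counter.observe(5_000)
counter.observe(42_000)
first = counter.flush()        # the first aggregate for bin 0

counter.observe(59_000)        # late event that still belongs to bin 0
counter.observe(61_000)        # on-time event for the next bin
second = counter.flush()       # bin 0 appears again

totals = roll_up(first, second)
```

Nothing already written is mutated; the cost is that queries must always sum over possibly repeated bins.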
Ex: service event volume by host and minute
• Dimensions: ts, window, location, host, service, metric
• On write, aggregates: count, sum, min, max, last
• epoch, 60000, us-west-2a, w2a-demo-1, sshd, event_volume =>
17, 42, 1, 10, 8
• On query:
• SELECT floor(ts / 60000) AS bin, loc, host, service, metric, sum(value_sum) FROM metrics
WHERE ts BETWEEN x AND y AND metric = 'event_volume' GROUP BY bin, loc, host,
service, metric
• If late-arriving data existed in events, the same dimensions would repeat with
another set of aggregates and would be rolled up as a result of the GROUP BY
• tl;dr: normal window aggregation operations
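To see the roll-up happen, here is the same shape of query run against an in-memory SQLite table. SQLite is swapped in for Impala purely so the example is self-contained; the table layout and rows are assumptions, and integer division replaces `floor()` because `ts` is stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (ts INTEGER, loc TEXT, host TEXT, "
    "service TEXT, metric TEXT, value_sum INTEGER)"
)

rows = [
    (0, "us-west-2a", "w2a-demo-1", "sshd", "event_volume", 42),
    (0, "us-west-2a", "w2a-demo-1", "sshd", "event_volume", 5),      # late partial, same bin
    (60000, "us-west-2a", "w2a-demo-1", "sshd", "event_volume", 7),
]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT ts / 60000 AS bin, loc, host, service, metric, SUM(value_sum)
FROM metrics
WHERE ts BETWEEN ? AND ? AND metric = 'event_volume'
GROUP BY bin, loc, host, service, metric
ORDER BY bin
"""
result = conn.execute(query, (0, 119999)).fetchall()
conn.close()
```

The two rows written for the first minute collapse into a single result row, which is why repeated aggregates for a bin are harmless at query time.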
Extension, pain, and advice
Extending the system
• Custom producers
• Custom consumers
• Event types
• Parser / transformation plugins
• Custom metric definition and aggregate functions
• Custom processing jobs on landed data
Pain (aka: the struggle is real)
• Lots of tradeoffs when picking a stream processing solution
• Apache Samza: right features, but a low-level programming model; not supported
by vendors; missing security features.
• Apache Storm: too rigid, too slow; not supported by all Hadoop vendors.
• Apache Spark Streaming: tons of issues initially, but lots of community energy;
improving.
• @digitallogic: “My heart says Samza, but my head says Spark Streaming.”
• Our (current) needs are meager; do work inside consumers.
• Stack complexity, (relative im)maturity
• Scaling SolrCloud to billions of events per day
If you’re going to try this…
• Read all the literature on stream processing[1]
• Treat it like the distributed systems problem it is
• Understand, make, and make good on guarantees
• Find the right abstractions
• Never trust the hand waving or “hello worlds”
• Fully evaluate the projects/products in this space
• Understand it’s not just about search
[1] wait, like all of it? yeah, like all of it.
Things I didn’t talk about
• Reprocessing data when bad code / transformations are detected
• Dealing with data quality issues (“the struggle is real” part 2)
• The user interface and all the fancy analytics
• data visualization and exploration
• event search
• anomalous trend and event detection
• metric, source, and event correlation
• motif finding
• noise reduction and dithering
• Event delivery semantics (e.g. at least once, exactly once, etc.)
• Alerting
Questions?
@fwiffo | batman@rocana.com
