Building “sexy” real-time analytics systems
AdGear is a full-stack ad platform for publishers and advertisers, with advanced
analytics, attribution measurement, ad serving, and real-time bidding technology.
Real-time bidding (RTB)
Real-time reporting... why?
• help clients make informed decisions
  • should I increase the bid price?
  • should I bid on exchange X?
• inventory control (brand safety)
• debugging (bot detection, creative audits)
“Sexy” real-time analytics systems
“Sexy”?
• elegant backend
• beautiful user interface
Architecture #1 (3 years ago)
• ssh
• node.js
• socket.io
Problems
• no SMP support
  • each process needs to be monitored
  • requires load-balancing (nginx)
• duplicated state (per process)
• duplicated work (de-serialization)
• bad error handling (event loop explodes)
• callbacks...
* promise construct
Architecture #2 (1.5 years ago)
• ssh_channel *
• gproc (pub sub)
• ETS counters
• bullet (cowboy)
* https://gist.github.com/lpgauth/6529807
Architecture #2
1. receive buffered events, split and de-serialize
2. each event is sent to a collector process (3) using gproc (pubsub) for filtering
3. collector (gen_server) aggregates messages using ETS counters and flushes every second
4. bullet handler serializes the aggregates (tab2list to JSON)
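
The four steps above can be sketched roughly as follows. This is a Python stand-in, not the Erlang code from the talk: `PubSub`, `Collector`, and `receive` are hypothetical names that only mimic the roles of gproc, the gen_server collector, and the ssh_channel handler, and newline-delimited JSON is an assumed wire format.

```python
import json
from collections import Counter


class PubSub:
    """Stand-in for gproc pub/sub: each subscriber registers a filter predicate."""

    def __init__(self):
        self.subscribers = []  # list of (predicate, collector) pairs

    def subscribe(self, predicate, collector):
        self.subscribers.append((predicate, collector))

    def publish(self, event):
        # Step 2: every event is offered to every subscriber (one "message" each).
        for predicate, collector in self.subscribers:
            if predicate(event):
                collector.handle_event(event)


class Collector:
    """Stand-in for the collector process: counts events per key (step 3)."""

    def __init__(self, key_fields):
        self.key_fields = key_fields
        self.counters = Counter()  # plays the role of the ETS counter table

    def handle_event(self, event):
        key = tuple(event.get(field) for field in self.key_fields)
        self.counters[key] += 1

    def flush(self):
        # Step 4: dump the table to JSON and reset; intended to run once per second.
        rows = [{"key": list(k), "count": n} for k, n in self.counters.items()]
        self.counters.clear()
        return json.dumps(rows)


def receive(buffer, pubsub):
    """Step 1: split a buffered chunk into lines and de-serialize each event."""
    for line in buffer.splitlines():
        if line.strip():
            pubsub.publish(json.loads(line))
```

Note that every matching event costs one hand-off per subscribed client, which is the scaling problem the next slide describes.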
Problems
• ssh_channel process and collector process are bottlenecks
• number of messages increases with the number of clients
• requires lots of bandwidth for large streams
• limited filtering (match specs)
Improvements... (6 months ago)
• optimize collector’s msg loop (gen_server to proc_lib)
• use ssh compression
  • added support for openssh zlib compression *
  • R16B02
* https://github.com/lpgauth/otp/tree/openssh_zlib
This worked for a while...
“Hey man, it would be very cool if you could show in
real-time the number of bid requests per domain for
Friday’s demo... Can you do it?” - boss
Sure.
What did I just agree to...
• I only have 3 days to build this...
• bid requests stream is too large to aggregate in a central location (1+ Gbit/s - 80K+/s)
Strategy for demo
1. move aggregation upstream
2. use ETS match select to find table ids (filtering)
3. increment counters in process (no message!)
4. periodically flush aggregates via message to collector node
5. collector node increments local counters and periodically flushes aggregates to bullet handler
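
A minimal sketch of this strategy, again in Python rather than Erlang. `LocalAggregator` and `CollectorNode` are hypothetical names, and a plain dict of predicates stands in for the ETS match/select lookup of step 2.

```python
from collections import Counter


class LocalAggregator:
    """Runs inside the process that produces events (steps 1 to 4)."""

    def __init__(self, flows):
        # flows: {flow_id: predicate}; assumed stand-in for the ETS filter lookup
        self.flows = flows
        self.counters = Counter()

    def emit(self, event, key_field):
        for flow_id, predicate in self.flows.items():
            if predicate(event):
                # Step 3: plain in-process increment, no message is sent.
                self.counters[(flow_id, event[key_field])] += 1

    def flush(self):
        # Step 4: one batch of aggregates per interval instead of one message per event.
        batch = dict(self.counters)
        self.counters.clear()
        return batch


class CollectorNode:
    """Step 5: merges batches from many upstream nodes."""

    def __init__(self):
        self.counters = Counter()

    def receive(self, batch):
        self.counters.update(batch)  # Counter.update adds counts key by key

    def flush(self):
        totals = dict(self.counters)
        self.counters.clear()
        return totals
```

The traffic sent to the collector now grows with the number of distinct counter keys per flush interval rather than with the raw event rate.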
Success!
Introducing swirl!	


“lightweight distributed stream processor”
Swirl components
• “dynamic” streams (swirl_stream)
• powerful filtering language (swirl_ql)
• simple behavior that implements a map-reduce like interface (swirl_flow)
• process registry (swirl_tracker)
Streams
Flows

* application:start(swirl).
swirl_flow behavior
Mapper Node
1. process “emits” event
2. lookup in ETS if there’s a flow that matches the stream name and filter
3. if there’s a match, call flow_mod:map/4
4. if map returns counters, increment in ETS
5. swirl_mapper periodically flushes aggregates to reducer node
Reducer Node
1. swirl_tracker receives mapper aggregates and forwards them to reducer
2. reducer increments counters in ETS
3. reducer flushes counters to flow_mod:reduce/4
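
The mapper and reducer steps can be pictured with a small Python sketch of a map-reduce style callback interface. Everything here is hypothetical: the `Flow` class, its method names, and their arguments only illustrate the shape of the idea and are not the actual swirl_flow callbacks (the slides give the arity of `map/4` and `reduce/4` but not their arguments).

```python
from abc import ABC, abstractmethod
from collections import Counter


class Flow(ABC):
    """Hypothetical analogue of a flow callback module."""

    stream_name: str

    @abstractmethod
    def matches(self, event):
        """Filter: decide whether this flow cares about the event."""

    @abstractmethod
    def map(self, event):
        """Return a list of (counter_key, increment) pairs, or None to ignore."""

    @abstractmethod
    def reduce(self, aggregates):
        """Receive the merged counters once per flush."""


class Mapper:
    def __init__(self):
        self.flows = {}     # flow_id -> Flow
        self.counters = {}  # flow_id -> Counter (stand-in for ETS)

    def register(self, flow_id, flow):
        self.flows[flow_id] = flow
        self.counters[flow_id] = Counter()

    def emit(self, stream_name, event):
        for flow_id, flow in self.flows.items():
            if flow.stream_name == stream_name and flow.matches(event):
                for key, inc in flow.map(event) or []:
                    self.counters[flow_id][key] += inc

    def flush(self, flow_id):
        batch = dict(self.counters[flow_id])
        self.counters[flow_id].clear()
        return batch


class Reducer:
    def __init__(self, flow):
        self.flow = flow
        self.counters = Counter()

    def receive(self, aggregates):
        self.counters.update(aggregates)

    def flush(self):
        totals = dict(self.counters)
        self.counters.clear()
        self.flow.reduce(totals)


class ImpressionsByBuyer(Flow):
    """Example flow: count impressions per buyer_id on a 'delivery' stream."""

    stream_name = "delivery"

    def __init__(self):
        self.results = []

    def matches(self, event):
        return event.get("event") == "impression"

    def map(self, event):
        return [(event["buyer_id"], 1)]

    def reduce(self, aggregates):
        self.results.append(aggregates)
```

In this sketch two mappers can flush into one reducer, and the flow only ever sees merged counters, never individual events.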
Swirl-ql
• SQL WHERE-clause-like syntax
• supported operators:
  • AND / OR
  • <, <=, =, >, <>
  • IN (x, y) / NOT IN (x, y, z)
  • IS NULL / IS NOT NULL (undefined)
* https://github.com/lpgauth/swirl-ql
Swirl-ql
• examples:
  • “event IN (‘impression’, ‘click’)”
  • “buyer_id IS NOT NULL AND buyer_id <> 3”
  • “event = ‘impressions’ AND (buyer_id IN (3, 5) OR buyer_id IS NULL)”
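
To make the semantics of these three filters concrete, here is a small Python evaluator over hand-written expression trees. The tuple format is made up for illustration and is not swirl_ql's parse tree; treating a comparison against a missing field as "no match" is also an assumption, since the slides only say that NULL corresponds to undefined.

```python
import operator

# Comparison operators listed on the slide.
COMPARE = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    "<>": operator.ne,
}


def evaluate(expr, event):
    """Evaluate a nested-tuple filter against an event dict (missing key = NULL)."""
    op = expr[0]
    if op == "and":
        return evaluate(expr[1], event) and evaluate(expr[2], event)
    if op == "or":
        return evaluate(expr[1], event) or evaluate(expr[2], event)
    value = event.get(expr[1])
    if op == "null":
        return value is None
    if op == "notnull":
        return value is not None
    if value is None:
        return False  # assumption: comparisons on a missing field never match
    if op == "in":
        return value in expr[2]
    if op == "notin":
        return value not in expr[2]
    return COMPARE[op](value, expr[2])


# The three example filters from the slide, written as trees by hand.
F1 = ("in", "event", ("impression", "click"))
F2 = ("and", ("notnull", "buyer_id"), ("<>", "buyer_id", 3))
F3 = ("and",
      ("=", "event", "impressions"),
      ("or", ("in", "buyer_id", (3, 5)), ("null", "buyer_id")))
```

For example, under these assumptions `evaluate(F2, {"buyer_id": 5})` is true, while an event with `buyer_id` 3 or with no `buyer_id` at all is rejected.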
Swirl-ql
• leex / yecc for parsing (use lex / yacc doc)
• pattern match ftw!
• use hipe (~200% speed gain in micro benchmarks)
  • 0.286 vs 0.097 microseconds *
• experimenting with dynamic compilation
* http://theory.stanford.edu/~sergei/papers/sigmod10-index.pdf
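
As a quick check of the "~200%" figure, assuming 0.286 microseconds is the time without HiPE and 0.097 microseconds the time with it (the slide does not say which is which):

```latex
\[
\frac{0.286\,\mu\text{s}}{0.097\,\mu\text{s}} \approx 2.95,
\qquad
\text{speed gain} \approx (2.95 - 1) \times 100\% \approx 195\%
\]
```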
Swirl limitations
• best-effort (hard problem!)
  • netsplits
  • crash
• in-memory only
Todo
• node discovery
• code distribution
• resource limitation
• better documentation!
Architecture #3 (now!)
• swirl
• bullet (cowboy)
Demo!

* https://github.com/lpgauth/swirl-demo
Thank You!

pssst: we’re hiring!

twitter: lpgauth	

github: lpgauth

Building Sexy Real-Time Analytics Systems - Erlang Factory NYC / Toronto 2013