Open analytics meetup alex poon (1)

Open Analytics

Visual Revenue's p

Storm @ Visual Revenue (an Outbrain Company)

Alex Poon
VP of Engineering

What we do?
•  14B page views per month
•  At peak, 8,000-10,000 per second
•  Deployed Storm to production ~1 month ago
•  Storm cluster of ~50 instances on AWS

[Diagram labels, in extraction order: Customer Traffic, Web Servers, Kafka, Data Transform/Aggregation, Storm, Databases, Dashboard, Algo, Automation]

Before Storm
•  Built our own distributed data processing
    •  ZMQ
    •  Batch-based processing
    •  Work partitioned by hashing on customer
•  Advantages
    •  Simple in-house system built from very basic components
    •  Well understood
•  Disadvantages
    •  Hard to scale; a constant battle to keep up with pings
    •  Machine management was clumsy
    •  Uneven distribution of traffic
    •  Multiple processes doing similar work, wasting resources
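To illustrate why hashing by customer can lead to uneven traffic, here is a minimal Python sketch. It is not the in-house system from the talk: the hash function (md5), the worker count, and the customer names and ping volumes are all hypothetical. It only shows that when every ping for a customer lands on the same worker, one large customer can dominate a single worker regardless of how evenly the customer ids themselves are spread.

```python
import hashlib
from collections import Counter
from typing import Dict


def worker_for(customer_id: str, num_workers: int) -> int:
    """Map a customer to a worker with a stable hash (md5 is an arbitrary choice)."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def load_per_worker(traffic: Dict[str, int], num_workers: int) -> Counter:
    """Sum pings per worker when all pings of a customer go to the same worker."""
    load = Counter({w: 0 for w in range(num_workers)})
    for customer_id, pings in traffic.items():
        load[worker_for(customer_id, num_workers)] += pings
    return load


# Hypothetical traffic: one large customer and several small ones.
traffic = {
    "big-publisher": 9000,
    "blog-a": 100,
    "blog-b": 120,
    "blog-c": 80,
    "blog-d": 90,
}

load = load_per_worker(traffic, num_workers=4)
busiest = max(load.values())
share = busiest / sum(load.values())
print(dict(load))
print(f"busiest worker handles {share:.0%} of all pings")
```

Whichever worker receives the large customer carries at least 9,000 of the 9,390 pings in this made-up data set, so adding workers does not help unless a single customer's traffic can also be split.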

Why Kafka/Storm?
•  Kafka
    •  Open-source, distributed publish-subscribe messaging system
•  Storm
    •  Open-source, real-time system for continuous computation
•  They are awesome
    •  Distributed, highly scalable, and fault tolerant
    •  High throughput
    •  Reliable
    •  Real-time
    •  Great at in-memory analytics and real-time decision support

Data Aggregation
[Topology diagram. Legend: Spout, Bolt. Node labels: Customer, Position, Front Page, URL, Aggregate, Arrangement, Tweet, @Handle. Window labels: 15s and 5m.]
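The slide labels its stages with 15s and 5m windows but does not say how the aggregation is implemented. As a rough sketch of one way such a two-level rollup could work, the Python below counts events per key in 15-second tumbling windows and then rolls those counts up into 5-minute windows. The tumbling-window choice, the function names, and the sample events are assumptions for illustration only; in a Storm topology each step would typically live in its own bolt.

```python
from collections import defaultdict

FINE_WINDOW = 15      # seconds, matching the "15s" labels
COARSE_WINDOW = 300   # seconds, matching the "5m" labels


def bucket_start(ts: int, width: int) -> int:
    """Return the start of the tumbling window of the given width containing ts."""
    return ts - (ts % width)


def aggregate_15s(events):
    """Count events per (15s window start, key). events: iterable of (ts, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        counts[(bucket_start(ts, FINE_WINDOW), key)] += 1
    return dict(counts)


def roll_up_5m(fine_counts):
    """Combine 15s counts into (5m window start, key) counts."""
    coarse = defaultdict(int)
    for (start, key), n in fine_counts.items():
        coarse[(bucket_start(start, COARSE_WINDOW), key)] += n
    return dict(coarse)


# Hypothetical page-view events: (timestamp in seconds, URL).
events = [(0, "/home"), (7, "/home"), (16, "/home"), (299, "/sports"), (301, "/home")]

fine = aggregate_15s(events)
coarse = roll_up_5m(fine)
print(fine)
print(coarse)
```

Rolling up from the 15s counts rather than from raw events means the 5m stage handles far fewer messages, which is one plausible reason to chain the two window sizes.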

Learning / Ideas
1. Kafka + ZooKeeper is extremely scalable and easy to set up. Check out the Brod library if you are doing Python

2. Use the Storm UI (Ganglia based) to monitor your cluster

3. Shell Bolts were inefficient and hard to debug (at least for us)

4. Upgrade to at least Storm version 0.8.2, which gives you capacity metrics on top of other goodies

5. Storm’s anchoring/replay capability is awesome but comes with a visible overhead

6. Use a good framework to manage your cluster; we use Salt Stack

7. Our unit tests are built in JUnit. Most built-in unit tests for Storm are only available in Clojure for now
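On point 5, the overhead comes from tracking: Storm follows every anchored tuple tree by XOR-ing 64-bit tuple ids into a single value per spout tuple, and the tree is complete when that value returns to zero. If it does not within a timeout, the spout tuple is replayed. The Python below is a simplified, single-process simulation of that idea, not Storm's implementation or API; the `Acker` class, its method names, and the fixed ids are hypothetical (real ids are random 64-bit numbers).

```python
class Acker:
    """Tracks each tuple tree as a running XOR of tuple ids (simplified model)."""

    def __init__(self):
        self.pending = {}   # root id -> XOR of ids emitted but not yet acked
        self.messages = 0   # tracking messages sent, to make the overhead visible

    def init(self, root_id, tuple_id):
        """Called when the spout emits a new root tuple."""
        self.messages += 1
        self.pending[root_id] = tuple_id

    def ack(self, root_id, input_id, anchored_child_ids=()):
        """Called when a bolt acks input_id, reporting any children anchored to it.

        Returns True once the whole tree rooted at root_id is fully processed.
        """
        self.messages += 1
        value = self.pending[root_id] ^ input_id
        for child_id in anchored_child_ids:
            value ^= child_id
        if value == 0:
            del self.pending[root_id]
            return True
        self.pending[root_id] = value
        return False


acker = Acker()
root, t0, c1, c2 = 0x01, 0xA1, 0xB2, 0xC3   # fixed ids for a reproducible example

acker.init(root, t0)                        # spout emits t0
step1 = acker.ack(root, t0, [c1, c2])       # bolt A acks t0, emits c1 and c2 anchored to it
step2 = acker.ack(root, c1)                 # bolt B acks c1
step3 = acker.ack(root, c2)                 # bolt B acks c2, tree is now complete
print(step1, step2, step3, acker.messages)
```

In this model, one spout tuple that fans out to two children costs four tracking messages on top of the three data tuples, which gives a feel for where the visible overhead comes from.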

Thank You

Alex Poon
@alexpoon06
@Outbrain

Yes, it is true. We are hiring!
www.visualrevenue.com/jobs
