1 Stat, 2 Stat, 3 Stat
A Trillion
Cody A. Ray
Dev-Ops @ BrightTag
Outline
1. Initial Attempt: MongoDB
2. Ideal Stats System: KairosDB?
3. Making KairosDB Work for Us
What Kind of Stats?
Counting!
sum, min, max, etc.
Any recurrence relation:
yn = f(x, y0, …, yn-1)
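For example, a running sum is the recurrence yn = yn-1 + xn and a running min is yn = min(yn-1, xn), so any aggregate of this form can be maintained incrementally as new datapoints arrive.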
The First Pass: MongoDB
● JSON Documents, Schema-less, Flexible
● Aggregation Pipeline, MapReduce
● Master-Slave Replication
● Atomic Operators!
http://fearlessdeveloper.com/race-condition-java-concurrency/
Diagram: two processes race on the same counter with read-update-write. Both read counter = 0, both increment their local value by 1, and both write the result back, leaving the incorrect value counter = 1 instead of 2.
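By contrast, MongoDB's $inc operator applies the increment server-side, so concurrent writers can't clobber each other. A minimal sketch with the modern Java driver (the database, collection, and key names here are illustrative, not the talk's actual schema):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.UpdateOptions;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

public class AtomicCounter {
  public static void main(String[] args) {
    MongoCollection<Document> counters = MongoClients.create("mongodb://localhost")
        .getDatabase("statsdb").getCollection("counters");
    // $inc is applied atomically on the server, so there is no
    // read-update-write window for another writer to race into.
    counters.updateOne(eq("_id", "clientId:2014-05-21T12:00"),
        inc("count", 1),
        new UpdateOptions().upsert(true)); // create the counter on first increment
  }
}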
Simple, Right? What’s the Problem?
Only 3500 writes/second! (m1.large)
up to 7000 wps (with m1.xlarge)
Scale Horizontally?
Redundancy → Mongo Explosion!!!
Feel the Pain
● Scale 3x. 3x != x. Big-O be damned.
● Managing 50+ Mongo replica sets globally
● Tens of thousands of dollars “wasted” each year
Ideal Stats System?
● Linearly scalable time-series database
● Store arbitrary metrics and metadata
● Support aggregations and other complex queries
● Bonus points for
o good for storing both application and system metrics
o Graphite web integration
Enter KairosDB
● “fast distributed scalable time series” db
● General metric storage and retrieval
● Based upon Cassandra
o linearly scalable
o tuned for fast writes
o eventually consistent, tunable replication
Adding Data
[
  {
    "name": "archive_file_tracked",
    "datapoints": [[1359788400000, 123]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  }
]
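As a usage sketch, a document like this is POSTed to KairosDB's /api/v1/datapoints endpoint; a minimal Java 11 example, assuming a local KairosDB on its default port 8080:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddDataPoints {
  public static void main(String[] args) throws Exception {
    String body = "[{\"name\": \"archive_file_tracked\","
        + " \"datapoints\": [[1359788400000, 123]],"
        + " \"tags\": {\"host\": \"server1\", \"data_center\": \"DC1\"}}]";
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/api/v1/datapoints"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<Void> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.discarding());
    // KairosDB replies 204 No Content when the datapoints are accepted.
    System.out.println(response.statusCode());
  }
}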
Querying Data
{
  "start_absolute": 1357023600000,
  "end_absolute": 1357455600000,
  "metrics": [{
    "name": "abc.123",
    "tags": {
      "host": ["foo", "foo2"],
      "type": ["bar"]
    },
    "aggregators": [{
      "name": "sum",
      "sampling": {
        "value": 10,
        "unit": "minutes"
      }
    }]
  }]
}
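Queries are POSTed as JSON documents to KairosDB's /api/v1/datapoints/query endpoint; this example filters on the host and type tags and sums the matching datapoints into 10-minute buckets.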
The Catch(es)
● Lack of atomic operations
o + millisecond time granularity
● Bad support for high-cardinality “tags”
● Headache managing Cassandra in AWS
Cassandra on AWS
Agathon
Cassandra Schema
http://prezi.com/ajkjic0jdws3/kairosdb-cassandra-schema/
Custom Data
[
  {
    "name": "archive_file_tracked",
    "datapoints": [[1359788400000, "value,metadata,...", "string"]],
    "tags": {
      "host": "server1",
      "data_center": "DC1"
    }
  }
]
https://github.com/proofpoint/kairosdb/tree/feature/custom_data
Pieces of the Solution
● Shard the data
o avoids concurrency race conditions
● Pre-aggregation
o solves time-granularity issue
● Stream processing, exactly-once semantics
Queue/Worker Stream Processing
https://www.youtube.com/watch?v=bdps8tE0gYo
Enter Storm/Trident
Diagram: App Server 1 and App Server 2 (each running suro) feed clusters of Kafka brokers; Storm clusters, coordinated by ZooKeeper, consume from Kafka and write out to the Mongo clusters (Stats 1.5) and the KairosDB clusters (Stats 2.0).
Diagram: the logical layers of the topology, divided by repartitioning operations.
Kafka Layer: Kafka brokers, one partition each
→ round-robin(?) →
Spout Layer: Kafka spouts
→ shuffle() →
Transform Layer: transform bolts
→ groupBy(timeRange, metric, tags) →
Persistence Layer: 30s and 30m writer bolts (several instances of each)
→ round-robin (haproxy) →
KairosDB Layer: KairosDB cluster
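A hedged sketch of how these layers might be wired up with Trident's Java API (Storm 0.9.x package names; ParseMessage, the field names, and the spout/state parameters are illustrative stand-ins, not the talk's actual code):

import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.builtin.Sum;
import storm.trident.spout.ITridentSpout;
import storm.trident.state.StateFactory;
import storm.trident.tuple.TridentTuple;

public class StatsTopologySketch {

  // Hypothetical transform: parse the raw message and emit one tuple per metric.
  public static class ParseMessage extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
      // parse the JSON, pick the pre-aggregation bucket,
      // and emit (timeRange, metric, tags, value) tuples
    }
  }

  public static StormTopology build(ITridentSpout<?> kafkaSpout, StateFactory kairosState) {
    TridentTopology topology = new TridentTopology();
    topology.newStream("stats", kafkaSpout)                     // spout layer
        .shuffle()                                              // spread load across transform workers
        .each(new Fields("bytes"), new ParseMessage(),
              new Fields("timeRange", "metric", "tags", "value"))
        .groupBy(new Fields("timeRange", "metric", "tags"))     // shard: one thread owns each group
        .persistentAggregate(kairosState, new Fields("value"),  // read-update-write against the state
              new Sum(), new Fields("sum"));
    return topology.build();
  }
}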
Trident state serialization formats:
Non-Transactional: 123
Transactional: “[9, 123]” (transaction ID + value)
Opaque Transactional: “[9, 123, 120]” (transaction ID + value + previous value)
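These formats correspond to Trident's built-in value wrappers and serializers; a minimal sketch (Storm 0.9.x package names, later renamed under org.apache.storm):

import storm.trident.state.JSONOpaqueSerializer;
import storm.trident.state.OpaqueValue;

public class OpaqueFormatDemo {
  public static void main(String[] args) {
    // OpaqueValue carries (txid, current value, previous value); Trident's
    // built-in JSON serializer renders it as the "[9, 123, 120]" form above.
    JSONOpaqueSerializer serializer = new JSONOpaqueSerializer();
    byte[] bytes = serializer.serialize(new OpaqueValue<Object>(9L, 123, 120));
    System.out.println(new String(bytes)); // [9,123,120]
  }
}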
Transactional Tags
[
  {
    "name": "archive_file_tracked",
    "datapoints": [[1359788400000, 123]],
    "tags": {
      "txid": 9,
      "prev": 120
    }
  }
]
Does It Work?
… the counts still match! (whew)
Average latency remains < 10 seconds
Stats 1.0 vs Stats 1.5 Performance
Replacing 9 mongo sets with 2
Cody A. Ray, BrightTag
cray@brighttag.com
Open Source: github.com/brighttag
Slides: bit.ly/gluecon-stats
The following slides weren’t presented at Gluecon.
You may find them interesting anyway. :)
Trident → Storm Topology Compilation
Diagram: a Trident topology (spout → each → shuffle → each, then splitting into two group by / persistent aggregate branches, each writing to Trident State) compiles into an efficient Storm topology of spouts and bolts, divided at the repartitioning operations.
Tuning Rules
1. Number of workers should be a multiple of number of machines
2. Number of partitions should be a multiple of spout parallelism
3. Parallelism should be a multiple of number of workers
4. Persistence parallelism should be equal to the number of workers
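As a worked example (the numbers come from the deployment described in editor's note 52 below): with 3 Storm hosts and 2 Kafka partitions, spout parallelism is 2 (one thread per partition, satisfying rule 2), each persistence bolt gets 3 threads (one per host, matching rule 4 with one worker per machine), and the CPU-bound transform layer gets 3 hosts * 2 cores * 4 threads per core = 24 executors, a multiple of the worker count per rule 3.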
Batch from Kafka → Persistent Aggregate
Diagram: a batch of tuples from Kafka (values keyed by (ts1, metric1), (ts2, metric2), (ts2, metric3), (ts2, metric1), (ts4, metric2), and (ts3, metric4)) is partitioned by group by (ts, metric). Each partition issues one multi get for the current values of its keys, runs the batch values and stored values through the reducer/combiner, and writes the results back in one bulk multi put.
http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/
Diagram: inside a single partition's read-update-write cycle. For each key, the reducer/combiner is fed both the values from the current batch and the prior value from the underlying persistent state (returned by the multi get); for example, the three (ts1, metric1) values are combined into one. The final value per key is then written back to the state in one bulk multi put.
http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/
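A hedged sketch of the state interface behind these diagrams: Trident map states sit on an IBackingMap with exactly the bulk multiGet/multiPut pair shown above (Storm 0.9.x package names; the in-memory store is just for illustration, whereas the talk's real implementation writes to KairosDB):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import storm.trident.state.map.IBackingMap;

// Illustrative in-memory backing map; a KairosDB-backed version would issue
// batched HTTP reads/writes in these two methods instead.
public class InMemoryBackingMap implements IBackingMap<Long> {
  private final Map<List<Object>, Long> store = new HashMap<>();

  @Override
  public List<Long> multiGet(List<List<Object>> keys) {
    // One bulk read for all grouped keys in the batch, e.g. ((ts1, metric1), ...)
    List<Long> values = new ArrayList<>(keys.size());
    for (List<Object> key : keys) {
      values.add(store.get(key)); // null means "no prior value"
    }
    return values;
  }

  @Override
  public void multiPut(List<List<Object>> keys, List<Long> values) {
    // One bulk write of the reduced/combined values
    for (int i = 0; i < keys.size(); i++) {
      store.put(keys.get(i), values.get(i));
    }
  }
}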
Editor's Notes
  1. Great for developers! As we’ll see, not great for operations.
  2. Remember the read-update-write concurrency problem. Without atomic operations, multiple processes will clobber each other. In this example, you can see that the counter starts at 0 and two processes try to increment it at the same time… and an update is lost.
  3. Even with bloated Java, atomic increments in mongo are pretty damn simple
4. We have three types of stats (using dynamic mongo keys) per (timestamp, clientId). Each of these needs to be incremented...
  5. … so we just use mongo’s $inc atomic increment operator
  6. m1.xlarge has more disks so it can parallelize writes better
  7. Sharding writes across multiple mongos works… we’ve done this for the past few years. The writer writes to a random mongo and the reader aggregates across all mongos.
  8. … but if you want redundancy for your data, you have a mongo explosion. For every 1 node of write capacity, you have to add 3 nodes. Not cost effective and a lot of administration overhead
  9. This is even more pronounced when you take multiple regions into account. We have two layers of reader/aggregation apps and a lot of mongos in each region.
  10. Spend a lot of time and money managing a stats system that we’ve clearly outgrown
  11. After a few years of operational experience at scale… we decided to step back and rethink what an ideal system would look like
  12. KairosDB sounds like it fits the bill…. and its based upon Cassandra which has a big community and we had experience running in production. Because of this, we knew how KairosDB would scale linearly and how to tune it and such.
  13. KairosDB is a RESTful web service wrapper around Cassandra with a Cassandra schema tuned for timeseries data. Data is added to KairosDB by POSTing a JSON document. Supports batch adding multiple metrics (and multiple datapoints per metric) with a single call. The metric has a name, a set of datapoints (timestamp and float or int values), and a set of tags. Tags are arbitrary metadata that you can associate with metrics.
  14. Similarly, just POST a JSON document to query KairosDB. Can batch read multiple metrics within a time range but can’t batch reads for multiple time ranges (yet?) Query for metrics by name and filter by tags. Can also aggregate and down-sample data (e.g., sum all data in this time range into 10 minute buckets)
  15. Of course, all was not rosy. If we want to parallelize writes, we have to handle the read-update-write concurrency problem discussed earlier. Plus, if you have higher volume than one metric per millisecond, you HAVE to pre-aggregate since all KairosDB operations are idempotent. KairosDB also doesn’t handle high-cardinality tags very well… but this is well-known in the community and I expect it to be addressed soon-ish. Lastly, as anyone who has managed Cassandra in AWS knows… it can be tedious.
  16. Let’s start with the low hanging fruit. I think a lot of companies probably write their own tools for this. Netflix has Priam which we used for a while, contributed some stuff back, but eventually took a different direction.
  17. An example of the problem managing Cassandra on AWS is managing the security groups. You have 30 to 100 Cassandra instances in 3-6 regions and they all talk to each other. So you have to manage a lot of security group rules manually or use a tool. Priam was backed by SimpleDB. So it just changed the problem from managing IPs in security groups to managing them in SimpleDB.
18. We already had all these IPs and such (from a libcloud-based internal tool), so we didn’t want to have to manually manage the IPs in SimpleDB. We also wanted something that would run outside AWS since we have different environments in different clouds, and we didn’t want to run a coprocess on all Cassandra instances. So we built Agathon, which is easier to extend to different backends, can run in any cloud or on raw hardware, and runs as a normal centralized web service. It also handles providing seeds for bootstrapping Cassandra nodes and such. Anyway, it’s on GitHub and it’s low-hanging fruit. Check it out.
  19. The next problem was high cardinality “tags”... so you can’t really store user ids or IP addresses or transaction ids in them.
20. This issue stems from KairosDB’s underlying Cassandra schema. There are 3 column families. The main one is the data_points family, which has a row key consisting of the metric name, base timestamp, and serialized key-value pairs from the tags. Each row stores three weeks of data, with each column name being a millisecond offset from the base timestamp in the row key and the column value being the actual datapoint itself.
  21. In NoSQL you generally have to precompute your queries and build indexes manually, so there’s a row_key_index column family. The row key is the metric name and each column name points to a row key in the data_points column family. As you can see, distinct tag combinations each result in a new column. With high-cardinality tags, this can quickly reach Cassandra’s 2 billion column limit.
22. So instead of storing high-cardinality data in a tag, we might be able to store it in the value itself. There’s a feature branch (not in master yet) that adds support for custom data. Getting this merged is the top priority of Brian Hawkins, the main guy behind KairosDB, for the next release. Even so, we’ve been running with this in production for a few months now and it’s looking pretty solid. No problems with it so far.
  23. As you can see, you specify a string data for example and tell Kairos that this datapoint is of the “string” type, which is built-in to KairosDB custom_data by default. You can also provide your own custom data types as KairosDB plugins as we’ll see in a bit.
  24. Okay, now we can get to the grandaddy of problems here. Parallelizing writes without running into race conditions.
25. So this solution has several pieces. The standard way to avoid the concurrency race condition is to have only one process responsible for reading/updating/writing a particular piece of data. I already mentioned that pre-aggregation is required to solve the time-granularity issue. The last piece is ensuring that each message/stat is processed exactly once. This is a hard but important problem since many stats systems are considered systems of record, used for billing purposes or identifying discrepancies.
  26. When you put these pieces together, you typically end up with something like this. Multiple queues and a lot of worker processes. This leads to complex dependency chains and brittle configurations. Not fun to manage.
27. The folks at Twitter built (or, rather, bought) Storm/Trident to make this easier. Storm provides a higher-level abstraction than raw message passing, queues, and workers. Storm provides two main primitives: spouts and bolts. Spouts are the sources of data. They emit streams of tuples to downstream bolts. Bolts can perform arbitrary calculations: filtering, transformations, reading data from other sources and merging, writing to persistent data stores. Anything. This chain of spouts and bolts forms a “topology.” You can even join or merge streams and perform other higher-level operations easily using Trident, an API on top of Storm. Queuing between workers happens seamlessly and only when required by a repartition operation, like shuffling or sharding. Storm provides two built-in queues to choose from: 0MQ or Netty.
  28. So how does this fit into the big picture? For us, it looks something like this. We launched a new product which needed support for more general metrics so we built this out and called it the Stats 2.0 pipeline. But for our existing high-volume apps which write directly to Mongo, we needed a nice transition from our existing infrastructure to this new stats pipeline (aka Stats 2.0). So we built a transitional Stats 1.5 pipeline which uses Storm for pre-aggregation and then writes to Mongo. These two pipelines use a lot of the same infrastructure but different Storm topologies and persistent layers.
29. Logically, this pipeline can be broken down into these layers. At the top, we have our Kafka brokers, which use partitions as the unit of parallelism. The first layer within Storm is the Kafka spouts, which read from the Kafka partitions. The tuples are emitted from the spout to the transform layer. In our stats topology, this is parsing the message JSON, splitting messages into multiple metrics, picking the pre-aggregation bucket, and deciding the final metric names. The last layer in Storm provides the actual read-update-write operations for persistence. And finally we have the KairosDB layer itself. In red, you see the repartitioning operations which divide each layer. Tuples are randomly distributed between the 2 spout worker threads and all the transform worker threads. This could actually move data between nodes too. The groupBy between the transform and persistence layers is essentially sharding based on the given fields, so only a single worker thread does the aggregation for a set of data. If your set of fields forms a decent partitioner, you can spread the load across your nodes pretty evenly. Finally, the persistence layer talks to a local haproxy load balancer which round-robins between different KairosDB nodes. And of course Cassandra has its own partitioning built in.
  30. This is one of our Trident topologies. What’s nice is that those logical layers are pretty evident in this topology. It also makes it much easier to persist to state, do partitions and aggregations, and even batches tuples which trades better throughput for increased latency.
  31. Its easy to see that this is the spout layer...
  32. … and the transform layer...
  33. … and finally the persistence layer
  34. But where is KairosDB in this topology? Both the Kafka spout and KairosDB state are declared in advance and you can easily substitute other spouts or states in the topology (in theory at least :).
  35. So let’s review our checklist.
  36. We first wanted to shard the data to avoid any concurrency race conditions.
  37. We did that by grouping by specific fields here.
  38. Next we wanted to pre-aggregate the data.
39. This is handled in the awesome persistentAggregate function that Trident provides. This does the actual read-update-write process. It takes the group of keys from the incoming batch of tuples, queries the persistent state for the current values, reduces/combines them all using whatever aggregator you choose, and finally writes them back out to the state. Here we’re just doing a Sum across all the values. (There are some extra slides at the end which illustrate this process.)
  40. Finally lets see how we can ensure exactly-once semantics in this system.
41. Thankfully, Trident makes this easy too. It supports three levels of transactionality by default. Non-Transactional provides at-least-once semantics. If a tuple fails processing, it may be replayed and thus double-counted (or more). Using transactional spouts/states can ensure that doesn’t happen. These store the transaction ID alongside the data in the database. And finally Opaque Transactional provides a stronger guarantee. It can provide exactly-once semantics even in the face of failure of your spout, such as if Kafka goes down. In addition to the transaction ID, it also stores the previous value so it can reason out when a batch has completed processing. Trident provides a few standard serializers for storing all the needed information for the desired level of transactionality. It’s either a raw value or a stringified JSON array with a transaction ID and possibly the previous value.
  42. Before I knew about the high-cardinality issue, my first approach was to store the transaction ID and previous value as tags in KairosDB. We’ve already talked about how that turned out. (Badly)
  43. So instead we have to store these in the KairosDB value itself using the Custom Data support. We just map the serializers we saw earlier to KairosDB custom data types.
  44. And then when we create the metric, we also tell KairosDB what type of data the metric holds.
  45. These Trident custom data types for KairosDB and the full KairosDB state implementation for Trident are both open source and available on GitHub. Check ‘em out.
46. So we’ve worked around all these catches and arrived at a flexible and very scalable distributed infrastructure for stats. It wasn’t that bad, was it? :)
  47. As a final note, if you’ve ever tried to track down stats discrepancies, you’ll know how bad it hurts. So we actually deployed the Stats 1.5 pipeline in parallel to the existing system to make sure that it was producing the same values. Great Success! (The Stats 2.0 pipeline was green-field so there wasn’t a previous system for comparison)
48. So it’s correct, but is it operating quickly enough? If you look up how to monitor a pipeline like this, you’re probably going to see references to this tool: stormkafkamon. It’s pretty great, but it’s a bit dated. It doesn’t work with the latest Kafka, etc. There’s a fork on the BrightTag GitHub page that’s updated for the latest versions of everything. The main question the business always asks is “how backed up is the stats system”, so I also modified the output so it tells you how much lag there is. This parses the message from Kafka so it has some assumptions. I’m working on generalizing this a bit more for others to use. Here you can see that we’re still near-real-time, enough for our needs anyway. You can decrease this latency by scaling out your persistence layer farther. This is where the engineering tradeoff of latency vs. resources/$$$ comes in.
  49. Again, looking at writing to mongo directly vs. pre-aggregating with Storm. The bottom line is that sharding and pre-aggregation can drastically reduce the number of writes to mongo. We’ve reduced the number of mongo instances by 5x here! That’s a huge win. We’ve gone from being bound by disk I/O to being bound by Mongo locking. KairosDB/Cassandra doesn’t have the same locking penalty, so this should be an even bigger win with it.
  50. Trident provides a nice fluent interface and a lot of powerful operations for building a topology, but how does that map to the underlying Storm primitives? A Trident topology compiles to an efficient storm topology (spouts and bolts) by dividing between repartitioning operations like shuffle and groupBy. The Topology we showed (with the Kafka + Transform + Persistence layers where the Transform does a Stream split into two aggregation buckets) would compile like this.
  51. If you call .name() on the various bits of the stream, you can actually see how Trident compiled into the Storm spouts and bolts in the Storm UI. Here you can see the 30s and 30m persistence bolts, the Transform bolt, and the spout. Note that the spout actually appears as a bolt because the Kafka-Spout provides a controller Spout which coordinates reads from the spout bolt threads to the kafka broker/partitions.
52. You can also see how the parallelismHints in the topology translate to executors (aka threads) and tasks in the Storm UI. We codified the tuning rules into a StormParallelism helper which is parameterized for your machine configuration and topology layout. Note that we have 2 spout0 threads since we have 2 kafka partitions, each 30s and 30m aggregator has 3 threads since there are 3 storm hosts (one of each aggregator per host), and the transform layer has 24 executors since it’s the only CPU-bound part of the topology; that’s 3 hosts * 2 cores each * 4 threads per core = 24 threads.
  53. There’s not a lot of Storm tuning info out there, but there’s enough if you’ll read a few Gist-based guides. I found they really boil down to these big-4 rules of storm tuning. https://gist.github.com/codyaray/ac2eceb3ff92fa0eaf6b
  54. This is how persistentAggregate works internally. It takes a State like KairosState which implements IBackingMap with the two methods: multiGet and multiPut. All the tuples in the current batch are grouped by the fields (in this case, the timestamp and metric name). Then these keys are parallelized to the persistence parallelism into several multiGet calls. Then all of these values are reduced/combined together and written back to the state in a bulk mutiPut request. (Usually you want persistence parallelism = number of workers, or one thread per worker node, to improve cache efficiency and reduce the bulk request overhead).
  55. But this still isn’t the full picture of the reducer/combiner; we can dive into how this read-update-write pipeline works even farther.
  56. Each multiGet call performs the underlying bulk request and then passes the key-value pairs from the batch + the key-value pairs from the multiGet into the given reducer/combiner. For example, the Sum() aggregator we’ve used here just adds up all the previous stored values and the values from the batch. The final value for each key is then written back to the underlying state in a bulk multiPut request.