©2016 Pepperdata
Sean Suchter
CEO & Co-founder, Pepperdata
A Billion Points of Data Pressure
Best Practices to Process, Store, and Visualize Cluster Activity Data at Scale
AGENDA
• The deluge of data metrics
• Managing data pressure
• Architecture
• Scaling and optimizing data writes
• Optimizing queries
• Short-lived time series — performance and math challenges
• Performance stats vs. standard OpenTSDB deployments
EVEN A FEW NODES GENERATE MANY METRICS
Example cluster:
• 4,000 nodes
• Each running 40 tasks
• Each task has 200 different metrics (memory consumed, HDFS reads, CPU time, etc.)
Hadoop itself is a great example of a huge-scale microservice architecture!
If we sample every metric every 5 seconds, we generate about 400 million data points every minute (over 500 billion per day).
[Figure: tree of node 1, node 2, … node M; each node runs tasks 1 through P, Q, or R; each task reports metric 1 through metric N.]
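The volume figures for the example cluster can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative and not from the talk.

```python
# Back-of-the-envelope data volume for the example cluster.
NODES = 4_000
TASKS_PER_NODE = 40
METRICS_PER_TASK = 200
SAMPLE_INTERVAL_SECONDS = 5

series = NODES * TASKS_PER_NODE * METRICS_PER_TASK   # concurrent time series
samples_per_minute = 60 // SAMPLE_INTERVAL_SECONDS   # samples per series per minute
points_per_minute = series * samples_per_minute
points_per_day = points_per_minute * 60 * 24

print(f"{series:,} series")                    # 32,000,000 series
print(f"{points_per_minute:,} points/minute")  # 384,000,000 points/minute
print(f"{points_per_day:,} points/day")        # 552,960,000,000 points/day
```

The exact products are 384 million points per minute and about 553 billion per day, which the slide rounds.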
DATA PRESSURE IS LIKE WATER PRESSURE
Imagine the plumbing in your house:
• System over max capacity? You will get leaks!
• Reinforcing one component adds pressure elsewhere: EVERY component must be stable.
• Increase capacity or change behavior on one component and you WILL cause ripples throughout the system.
TYPICAL OPENTSDB DEPLOYMENT
http://opentsdb.net/overview.html
PEPPERDATA DASHBOARD ARCHITECTURE
[Diagram components: Hadoop cluster (Node 1, Node 2, … Node N); Dashboard Services containing node-level aggregation, insertion TSDs, HBase, global aggregation, query TSDs, and a servlet; browsers (JavaScript).]
SEPARATE TSDS
• It’s important to separate read TSDs and write TSDs because they have different thread access patterns.
HOW MANY TSDS?
• To handle more data, you can add TSDs, but that increases pressure on HBase and HDFS.
• Solution: Scale HBase and HDFS too.
• But you will still have problems until your data gets well split, or if there are spikes on particular nodes.
• Solution: If the TSD, HBase, or HDFS gets overloaded, buffer data on the sender side until it can be sent.
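Sender-side buffering can be sketched as below. This is an illustration under stated assumptions, not Pepperdata's implementation: `BufferingSender`, the `send` callable (which returns False when the downstream is overloaded), and the drop-oldest policy when the buffer fills are all hypothetical choices.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Point = Tuple[str, int, float]  # (metric name, unix timestamp, value)


class BufferingSender:
    """Holds points locally while the downstream store is overloaded."""

    def __init__(self, send: Callable[[List[Point]], bool],
                 max_buffered: int = 100_000):
        self._send = send  # hypothetical transport; False means "overloaded"
        # Bounded buffer: when full, the oldest points are dropped first.
        self._buffer: Deque[Point] = deque(maxlen=max_buffered)

    def submit(self, points: List[Point]) -> None:
        """Queue new points, then try to drain everything queued so far."""
        self._buffer.extend(points)
        self.flush()

    def flush(self) -> None:
        """Send the whole buffer; keep it untouched if the send is refused."""
        if not self._buffer:
            return
        if self._send(list(self._buffer)):
            self._buffer.clear()

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

A real sender would also call `flush()` on a timer with backoff, so that buffered data drains even when no new points arrive.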
REDUCE SERIALIZATION OVERHEAD
• Serializing the input data into the TSD is a CPU bottleneck.
• Solution, part 1: Bulk put. Insert many points as one operation.
• Solution, part 2: OpenTSDB plugin. Move metrics processing from the collector into the TSD process.
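The bulk-put idea can be sketched as follows. The talk does not say which interface was used; this sketch assumes the OpenTSDB 2.x HTTP endpoint `/api/put`, which accepts a JSON array of data points. The metric name, tags, batch size, and helper names are hypothetical, and no request is actually sent.

```python
import json
from typing import Dict, Iterable, Iterator, List


def chunk(points: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group points into lists of at most batch_size."""
    batch: List[Dict] = []
    for point in points:
        batch.append(point)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_put_bodies(points: Iterable[Dict], batch_size: int = 500) -> Iterator[str]:
    """Yield JSON request bodies, each carrying many data points."""
    for batch in chunk(points, batch_size):
        yield json.dumps(batch)


# Hypothetical samples: one metric on one task, every 5 seconds.
points = [
    {"metric": "task.cpu_time", "timestamp": 1456790400 + 5 * i, "value": i,
     "tags": {"host": "node1", "task": "task1"}}
    for i in range(1200)
]
bodies = list(bulk_put_bodies(points, batch_size=500))
print(len(bodies))  # 3 request bodies instead of 1,200 single-point requests
```

Each body would be sent as one POST, so the per-request parsing and dispatch cost is paid once per batch, not once per point.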
SEGMENT THE QUERY
• Typically one query gets handled by a single read TSD. But then that TSD runs out of memory servicing the query.
• Solution: Segment the query, then put it back together at the servlet layer.
• But now the servlet layer has to hold the reassembled query data.
• Solution: Two-phase queries. Pick the top-N series to show in one pass, then redo the query for the actual content.
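Segmenting and two-phase querying can be sketched together. This is a toy model under stated assumptions: an in-memory dict stands in for the read TSDs, `query_segment` is a hypothetical stand-in for one TSD request, and ranking series by their summed value is an assumed choice (the talk does not say how the top N are picked).

```python
from typing import Dict, Iterable, List, Optional, Tuple

Series = Dict[str, List[Tuple[int, float]]]  # series id -> [(timestamp, value)]


def split_range(start: int, end: int, segment_seconds: int) -> List[Tuple[int, int]]:
    """Cut [start, end) into consecutive sub-ranges of at most segment_seconds."""
    return [(s, min(s + segment_seconds, end))
            for s in range(start, end, segment_seconds)]


def query_segment(store: Series, start: int, end: int,
                  only: Optional[Iterable[str]] = None) -> Series:
    """Stand-in for one request to a read TSD (hypothetical)."""
    ids = list(store) if only is None else list(only)
    return {sid: [(t, v) for t, v in store[sid] if start <= t < end]
            for sid in ids}


def top_n_two_phase(store: Series, start: int, end: int,
                    segment_seconds: int, n: int) -> Series:
    segments = split_range(start, end, segment_seconds)
    # Phase 1: keep only one running total per series, never the raw points.
    totals: Dict[str, float] = {}
    for s, e in segments:
        for sid, pts in query_segment(store, s, e).items():
            totals[sid] = totals.get(sid, 0.0) + sum(v for _, v in pts)
    winners = sorted(totals, key=lambda sid: totals[sid], reverse=True)[:n]
    # Phase 2: re-query only the winners and stitch the segments together.
    result: Series = {sid: [] for sid in winners}
    for s, e in segments:
        for sid, pts in query_segment(store, s, e, only=winners).items():
            result[sid].extend(pts)
    return result
```

The memory held at the stitching layer is bounded by N series, at the cost of reading the range twice.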
PRE-COMPUTE COMMON AGGREGATES
• The raw data is too much to read off the disk in HBase.
• Solution: Pre-compute common aggregates before inserting one node’s data into the TSD.
• Even the pre-aggregates can be too much on a large cluster.
• Solution: Aggregate globally once enough node data has been inserted.
• Note: This is tricky because senders can hold buffered data; that case needs to be handled correctly.
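Node-level pre-aggregation can be sketched as a per-timestamp sum across the tasks on one node. The sample layout, the metric name, and the choice of sum as the "common aggregate" are assumptions for illustration; the talk does not list which aggregates are pre-computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (task id, metric name, unix timestamp, value) as collected on one node
Sample = Tuple[str, str, int, float]


def node_level_sums(samples: Iterable[Sample]) -> Dict[Tuple[str, int], float]:
    """Sum each metric across all tasks on the node, per timestamp."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    for _task, metric, timestamp, value in samples:
        sums[(metric, timestamp)] += value
    return dict(sums)


samples = [
    ("task1", "memory_bytes", 100, 2.0),
    ("task2", "memory_bytes", 100, 3.0),
    ("task1", "memory_bytes", 105, 4.0),
]
print(node_level_sums(samples))
# {('memory_bytes', 100): 5.0, ('memory_bytes', 105): 4.0}
```

With 40 tasks per node, a node-level query then reads one pre-summed series per metric in place of 40 raw ones. A global aggregate could be built the same way across nodes, but only after late, buffered node data has arrived.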
CACHING
• Queries are frequently over the same data.
• Solution: Cache the computed response on disk in the TSD.
• Make sure you invalidate the cache at the right moments. This can be tricky, since different nodes and metrics may be ingested at different times.
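The caching and invalidation idea can be sketched as below. This is a simplified illustration: the cache lives in memory here (the slide describes an on-disk cache), entries are keyed by query string and time bucket, and `QueryCache`, `compute`, and `on_ingest` are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class QueryCache:
    """Caches computed responses per (query, time bucket)."""

    def __init__(self, compute: Callable[[str, int], str],
                 bucket_seconds: int = 3600):
        self._compute = compute            # hypothetical: runs the real query
        self._bucket_seconds = bucket_seconds
        self._entries: Dict[Tuple[str, int], str] = {}

    def _bucket(self, timestamp: int) -> int:
        """Round a timestamp down to the start of its bucket."""
        return timestamp - timestamp % self._bucket_seconds

    def get(self, query: str, timestamp: int) -> str:
        key = (query, self._bucket(timestamp))
        if key not in self._entries:
            self._entries[key] = self._compute(query, key[1])
        return self._entries[key]

    def on_ingest(self, timestamp: int) -> None:
        """Drop every cached response whose bucket received new data."""
        bucket = self._bucket(timestamp)
        stale = [key for key in self._entries if key[1] == bucket]
        for key in stale:
            del self._entries[key]
```

Calling `on_ingest` with the timestamp of the data, not the arrival time, matters when a node delivers buffered points late: the bucket that becomes stale is the old one.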
SHORT-LIVED TIME SERIES: STORAGE CHALLENGE
• Container time series start and stop a lot (millions of times per hour). This can be inefficient for storage.
• Solution: Pick the tag schema very carefully.
SHORT-LIVED TIME SERIES: MATH CHALLENGES
• Short-lived time series break typical OpenTSDB math:
• In the standard OpenTSDB case, node-level time series can have discontinuities around the start and stop of nodes, and that’s fine, since you only see a few of those starts and stops in any given plot.
• With containers, you’re likely to get hundreds of thousands or millions of discontinuities on every plot!
• Aggregation, downsampling, and rate metrics need to be computed very carefully.
• Solution: Completely rewrite the math layers to take this into account.
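One way discontinuities distort the math can be shown with a summed rate. This sketch is an illustration of the problem, not the rewritten math layer from the talk: it computes the rate per series first and lets a series contribute only between two of its own consecutive samples. The container names and values are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Points = List[Tuple[int, float]]  # [(timestamp, counter value)], sorted by time


def summed_rate(all_series: Dict[str, Points], step: int) -> Dict[int, float]:
    """Sum per-series rates; a series contributes only while it is alive."""
    total: Dict[int, float] = defaultdict(float)
    for points in all_series.values():
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t1 - t0 != step:
                continue  # gap inside a series: do not invent a rate across it
            total[t1] += (v1 - v0) / step
    return dict(sorted(total.items()))


containers = {
    "c1": [(0, 0.0), (5, 10.0), (10, 20.0)],  # alive for the whole window
    "c2": [(5, 100.0), (10, 105.0)],          # appears at t=5 with a large counter
}
print(summed_rate(containers, step=5))  # {5: 2.0, 10: 3.0}
```

Summing the counters first and taking the rate afterwards would report (110 - 0) / 5 = 22 per second at t=5, a spike caused only by the second container appearing. With millions of starts and stops per plot, such artifacts would dominate the result.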
SOME PERFORMANCE STATS
• One example production deployment: 4,000 nodes, 40 tasks/node, 200 metrics/task, 5-second sampling, 400 million points/minute.
• We can render plots of a full day of this data in ~1 second.
PERFORMANCE VS. STANDARD OPENTSDB
http://www.slideshare.net/HBaseCon/ecosystem-session-6
http://www.slideshare.net/HBaseCon/operations-session-3-49043534
[Bar chart: datapoints/second per machine (axis from 0 to 250,000) for Arista, OVH, Limelight, Pinterest, Box, Ticketmaster, Yahoo, and Pepperdata.]
PERFORMANCE VS. STANDARD OPENTSDB
http://opentsdb.net/misc/opentsdb-oscon.pdf
We process and store ~600 billion data points per day from a single Hadoop cluster.
PEPPERDATA’S OPENTSDB CONTRIBUTIONS
• See https://github.com/pepperdata/opentsdb
Questions?


Editor's Notes

  1. Data points are not just time stamped, they are time series -