Rich Histograms at Scale:
A New Hope
Evan Chan
@evanfchan
http://github.com/filodb/FiloDB
This is not a contribution
What do we do with
Histograms?
The Evolution of Histograms
• Pre-aggregated percentiles
• Histogram with buckets
• Prometheus histograms
• HDRHistogram
• T-Digests
[Diagram labels: Prometheus, InfluxDB, ???, Statsd, Graphite, OpenTSDB]
Overlaid Latency Quantiles
Now an incident happens…
Heatmaps: Rich Visuals
Grafana Heatmaps
• Buckets: scale to much more input data, but need TSDB support for histogram buckets
• Time series: flexible, but Grafana needs to read ALL the raw data
Useful Histograms
• Should be aggregatable
• Supports quantiles, distributions, other f(x)
• Heatmaps - histograms over time
• Should be accurate
• Should scale and be efficient
Buckets and Accuracy
• Max quantile error = bucket width / lowerBound
• Exponential buckets = consistent max quantile errors (Good!)
• Linear almost never makes sense
• Your custom Prom histogram buckets likely have >100% error
Example: (1000, 6E10) value range
Histogram Type   Max Error %   # Buckets
Linear           100%          60,000,000
Exponential      99.1%         26
Linear           10%           600,000,000
Exponential      10.0%         188
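The error rule above can be checked against any list of bucket bounds. A minimal Python sketch, assuming the error of each bucket is its width divided by its lower bound and that the first bucket (which starts at zero) is skipped; `max_quantile_error` is a name made up for this example, and the `custom` bounds are hand-picked in the style of common Prometheus client defaults:

```python
def max_quantile_error(bounds):
    """Worst-case relative error: max over buckets of (upper - lower) / lower.

    `bounds` are finite, positive, strictly increasing bucket upper bounds;
    each bound doubles as the lower bound of the next bucket. The first
    bucket (0 to bounds[0]) is skipped because its lower bound is zero.
    """
    return max((hi - lo) / lo for lo, hi in zip(bounds, bounds[1:]))

# Exponential buckets: every bound is 1.5x the previous one.
exponential = [1000 * 1.5 ** i for i in range(10)]

# Linear buckets: fixed width of 1000, so the lowest bucket dominates.
linear = [1000 + 1000 * i for i in range(10)]

# Hand-picked bounds (an assumption for illustration, not from the talk).
custom = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

print(max_quantile_error(exponential))
print(max_quantile_error(linear))
print(max_quantile_error(custom))
```

The exponential list stays at the error implied by its growth factor, while the hand-picked list is dominated by its widest relative jump (for example 0.01 to 0.025).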
Configuring your Histograms
• Start with the range of values you need: (min, max)
• Pick the desired max quantile error %
• Think about trading off publish freq for accuracy
• # buckets = log(max/min) / log(1 + max_error)
• Example: Max error=50%, (1000 to 6E10):
  numBuckets = Math.log(6E10/1000) / Math.log(1 + 0.50)
  exponentialBuckets(1000, 1 + 0.50, numBuckets)
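The recipe above, sketched in Python. Two assumptions: `exponential_buckets` is a stand-in written for this example (it takes a start, a factor and a count, like the `exponentialBuckets` call on the slide), and the fractional bucket count (about 44.2 here) is rounded to the nearest integer, since the slide does not say how to round.

```python
import math

def num_buckets(lo, hi, max_error):
    """# buckets = log(max/min) / log(1 + max_error), rounded to nearest."""
    return round(math.log(hi / lo) / math.log(1 + max_error))

def exponential_buckets(start, factor, count):
    """`count` upper bounds: start, start*factor, start*factor**2, ..."""
    return [start * factor ** i for i in range(count)]

n = num_buckets(1000, 6e10, 0.50)
bounds = exponential_buckets(1000, 1 + 0.50, n)

print(n, bounds[0], bounds[-1])
```

Values above the last bound would land in a +Inf bucket; rounding up instead of to nearest is an equally reasonable choice if the top of the range matters.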
Histograms at Scale
Histograms as First-Class Citizens
• Modeling, transporting, and storing histograms holistically offers many benefits
• Scalability: much better storage, network, query speed
• Proper aggregations
• Better accuracy and features
• Adaptable to better histogram designs in the future
• Almost nobody is doing this yet
Prometheus Histogram Schema
5 buckets, sum, count per histogram
Series    __name__        le     Sample 1   Sample 2   Sample 3   Sample 4
Series1   metric_sum             44         35         50         60
Series2   metric_count           5          6          10         11
Series3   metric_bucket   0.5    0          1          1          2
Series4   metric_bucket   2.0    2          4          5          6
Series5   metric_bucket   5.0    3          6          8          10
Series6   metric_bucket   10     5          6          9          11
Series7   metric_bucket   25     5          6          10         11
The Scale Problem with
Histograms
• My app: 100 metrics, 20 histograms
• Assume range of (1000, 6E10).
• Notice how histograms dominate the time series!
Max error %   Num buckets   Histogram Series   Other Series   Total Series
50%           44            882                80             962
10%           188           3762               80             3842
2%            905           18102              80             18182
Mama we got a problem
• Actual system: hundreds of millions of metrics, each one has a histogram with 64 buckets
• Using Prometheus would lead to tens of billions of series
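A back-of-the-envelope version of that estimate in Python, assuming one series per bucket plus one each for sum and count. The talk does not pin down "hundreds of millions", so the 300 million below is an arbitrary stand-in chosen only to show the order of magnitude:

```python
def series_per_histogram(num_buckets):
    """One series per bucket, plus one each for _sum and _count."""
    return num_buckets + 2

def total_series(num_histograms, num_buckets):
    """Series needed when every histogram is stored series-per-bucket."""
    return num_histograms * series_per_histogram(num_buckets)

# 64 buckets -> 66 series for every histogram metric.
per_histogram = series_per_histogram(64)

# 300 million is a placeholder, not a figure from the talk.
estimate = total_series(300_000_000, 64)

print(per_histogram, estimate)
```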
Prometheus: Raw Data
__name__        le     Zone      Value
metric_sum             Us-west   44
metric_count           Us-west   5
metric_bucket   0.5    Us-west   0
metric_bucket   2.0    Us-west   2
metric_bucket   5.0    Us-west   3
metric_bucket   10     Us-west   5
metric_bucket   25     Us-west   5
Atomicity Issues
• Prometheus export and scrape do not guarantee that a histogram's buckets stay grouped together
• Easy to get only part of a histogram
• FiloDB is a distributed database. 7 records might end up on 7 different nodes!
• Calculating histogram_quantile: talk to 7 nodes for every query!
Single Histogram Schema
5 buckets, sum, count per histogram
Series1: __name__ = metric
Sum   Count   Hist (0.5, 2.0, 5.0, 10, 25)
44    5       0  2  3   5   5
35    6       1  4  6   6   6
50    10      1  5  8   9   10
60    11      2  6  10  11  11
Single Histogram Raw Data
__name__   Zone      Sum   Count   Hist (0.5, 2, 5, 10, 25)
Metric     Us-west   44    5       0 2 3 5 5
• One record, not (n + 2). No distribution problem!
• Labels only appear once
• Savings proportional to # of histogram buckets
• 50x savings for 64 histogram buckets
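To make the record counts concrete, here is a small Python sketch that builds the same histogram both ways. The tuple and dict layouts are illustrative assumptions for this example only; they are not FiloDB's or Prometheus's wire or storage format.

```python
def to_prometheus_records(name, labels, bounds, buckets, total, count):
    """Flatten one histogram into n + 2 records, repeating the labels in each."""
    records = [
        ({"__name__": f"{name}_sum", **labels}, total),
        ({"__name__": f"{name}_count", **labels}, count),
    ]
    for le, value in zip(bounds, buckets):
        records.append(
            ({"__name__": f"{name}_bucket", "le": str(le), **labels}, value)
        )
    return records

def to_single_record(name, labels, bounds, buckets, total, count):
    """Keep the whole histogram in one record; the labels appear once."""
    return (
        {"__name__": name, **labels},
        {"sum": total, "count": count, "bounds": bounds, "hist": buckets},
    )

labels = {"Zone": "Us-west"}
bounds = [0.5, 2, 5, 10, 25]
buckets = [0, 2, 3, 5, 5]

flat = to_prometheus_records("metric", labels, bounds, buckets, 44, 5)
single = to_single_record("metric", labels, bounds, buckets, 44, 5)

print(len(flat), "records vs 1 record")
```

With 5 buckets the flattened form carries the `Zone` label 7 times; with 64 buckets it would carry it 66 times.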
Much smaller network and disk usage
• One time series vs 66 -> 50x network I/O reduction
• Single histogram schema in FiloDB uses < 0.2 bytes per histogram bucket
[Chart: Network I/O, bytes per histogram, Series/bucket vs Series/histo]
[Chart: Storage cost, bytes per bucket, Series/bucket vs Series/histo]
Optimizing Histograms:
Compression
• Delta encoding of increasing bucket values
  0 2 3 5 5 -> 0 2 1 2 0
  1 4 6 6 6 -> 1 3 2 0 0
• Compressed size about 4x-10x better than 1 time series per bucket (64 buckets; FiloDB)
• 0.18 bytes/histogram bucket (range: 0.16 - 0.61)
Format                   Storage cost
FiloDB SingleHistogram   0.18 bytes/bucket
Prometheus               1.5 bytes/bucket
Raw data                 8 bytes/bucket
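The delta step on its own can be sketched in a few lines of Python. This covers only the transformation shown in the two example rows; any further packing FiloDB applies on top of the deltas is not shown, and the function names are invented for this example.

```python
def delta_encode(buckets):
    """Store the first value, then each bucket's increase over the previous one."""
    if not buckets:
        return []
    return [buckets[0]] + [b - a for a, b in zip(buckets, buckets[1:])]

def delta_decode(deltas):
    """Rebuild the cumulative bucket values with a running sum."""
    out, running = [], 0
    for d in deltas:
        running += d
        out.append(running)
    return out

print(delta_encode([0, 2, 3, 5, 5]))
print(delta_encode([1, 4, 6, 6, 6]))
```

Because cumulative bucket values never decrease, the deltas are small non-negative integers, which is what makes them cheap to store.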
Optimizing Histograms:
Querying (64 Buckets)
• histogram_quantile() is more than 100x faster than series-per-bucket
• No need for group-by
• Localized computation vs needing to jump across 64 bucket time series
[Chart: histogram_quantile() QPS, Series/Bucket vs Series/Histo]
Rich Histograms
Usability and Correctness
Changing buckets…. sum()
• sum(rate(http_req_latency{…..}[5m])) by (le)
• Different buckets lead to incorrect sums
[Diagram: bucket boundaries le = 2.5, 5, 10, 25, 50, 100, +Inf]
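Why mismatched buckets break the sum can be shown with a toy model of `sum(...) by (le)`. In this Python sketch the two bucket schemes and all counts are made up for illustration; `sum_by_le` only mimics the grouping behaviour and is not Prometheus code.

```python
from collections import defaultdict

def sum_by_le(*histograms):
    """Mimic `sum(...) by (le)`: add values that share the same le label."""
    totals = defaultdict(float)
    for hist in histograms:
        for le, value in hist.items():
            totals[le] += value
    return dict(sorted(totals.items()))

inf = float("inf")

# Cumulative bucket counts from two instances of the same metric,
# one still on the old bucket scheme and one on the new scheme.
old_scheme = {2.5: 1, 5: 3, 10: 4, inf: 5}
new_scheme = {5: 2, 25: 6, inf: 8}

summed = sum_by_le(old_scheme, new_scheme)

# le=10 only receives the old instance's count, so it ends up lower than
# le=5: the supposedly cumulative result is no longer monotonic.
for le, value in summed.items():
    print(le, value)
```

Any quantile computed from such a result is unreliable, because the per-`le` grouping has no idea that the two inputs describe differently shaped histograms.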
Holistic Histograms: Correct Sums
• Adding histograms holistically allows us to track bucket changes and correctly sum them
[Diagram: bucket boundaries le = 2.5, 5, 10, 25, 50, 100, +Inf]
histogram_quantile clipping
• At 20:00, quantile is clipped at 2nd-last bucket of
10.0
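The clipping comes from how a quantile is estimated from cumulative buckets. Below is a simplified Python sketch of that estimation (linear interpolation inside the bucket that contains the target rank); it is a teaching version with made-up counts, not the Prometheus or FiloDB source.

```python
import math

def histogram_quantile(q, bounds, counts):
    """Estimate quantile q (0 < q <= 1) from cumulative bucket counts.

    bounds: increasing upper bounds, ending with math.inf
    counts: cumulative counts, one per bound
    """
    if not 0 < q <= 1 or counts[-1] <= 0:
        raise ValueError("need 0 < q <= 1 and a non-empty histogram")
    rank = q * counts[-1]
    # First bucket whose cumulative count reaches the rank.
    i = next(idx for idx, c in enumerate(counts) if c >= rank)
    if math.isinf(bounds[i]):
        # Nothing to interpolate towards: clip at the last finite bound.
        return bounds[i - 1]
    lower = bounds[i - 1] if i > 0 else 0.0
    below = counts[i - 1] if i > 0 else 0
    # Assume observations are spread evenly inside the bucket.
    return lower + (bounds[i] - lower) * (rank - below) / (counts[i] - below)

bounds = [2.5, 5, 10, math.inf]
counts = [2, 6, 9, 10]              # made-up cumulative counts

print(histogram_quantile(0.5, bounds, counts))
print(histogram_quantile(0.99, bounds, counts))
```

Once the target rank falls into the +Inf bucket, the estimate cannot go above the last finite bound (10 here), however large the real values are.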
histogram_max_quantile
• Client sends a max value at each time interval
histogram_max_quantile
• Having a known max allows us to interpolate in last bucket
• Cannot interpolate to +Inf
• https://github.com/filodb/FiloDB/pull/361
[Diagram: bucket boundaries le = 2.5, 5, 10, 25, +Inf; also labeled: 40, 0.9]
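One way to picture the idea in Python: treat the reported max as the upper edge of the top bucket and interpolate as usual. This is a guess at the concept, not the code from the linked pull request. The bucket bounds, 40 and 0.9 are borrowed from the diagram on the assumption that 40 is the reported max and 0.9 the target quantile; the counts are invented.

```python
import math

def histogram_max_quantile(q, bounds, counts, max_value):
    """Like a bucket-based quantile, but the top bucket ends at a known max.

    Assumes 0 < q <= 1, a non-empty histogram, and max_value above the
    last finite bound.
    """
    # Swap the +Inf bound for the client-reported maximum so the top
    # bucket has a finite width to interpolate across.
    finite = [max_value if math.isinf(b) else b for b in bounds]
    rank = q * counts[-1]
    i = next(idx for idx, c in enumerate(counts) if c >= rank)
    lower = finite[i - 1] if i > 0 else 0.0
    below = counts[i - 1] if i > 0 else 0
    return lower + (finite[i] - lower) * (rank - below) / (counts[i] - below)

bounds = [2.5, 5, 10, 25, math.inf]
counts = [1, 3, 5, 8, 10]           # made-up cumulative counts

p90 = histogram_max_quantile(0.9, bounds, counts, max_value=40)
print(p90)
```

The result lands between the last finite bound and the max instead of being pinned to the last finite bound.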
Ad-Hoc Histograms
• Just the quantile, min, max from gauges is not that useful
• Get heat map for CPU use across k8s containers
• histogram(2, 8, container_cpu_usage_seconds_total{….})
• Aggregate histogram across gauges using new histogram() function
• Yes Grafana can do heat maps from raw series - but you can only read so many raw time series. :)
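What such an aggregation does at a single timestamp can be sketched in Python. The slides do not explain what the arguments 2 and 8 in `histogram(2, 8, …)` mean, so the bucket bounds below are picked arbitrarily, the CPU readings are invented, and `histogram_from_gauges` is a hypothetical helper rather than FiloDB's function.

```python
import bisect
import math

def histogram_from_gauges(values, bounds):
    """Cumulative bucket counts for one timestamp's worth of gauge values.

    bounds: increasing finite upper bounds; a +Inf bucket is appended.
    """
    bounds = list(bounds)
    counts = [0] * (len(bounds) + 1)
    for v in values:
        # Index of the first bound >= v (le semantics); len(bounds) is +Inf.
        counts[bisect.bisect_left(bounds, v)] += 1
    # Turn per-bucket counts into cumulative counts.
    running, cumulative = 0, []
    for c in counts:
        running += c
        cumulative.append(running)
    return list(zip(bounds + [math.inf], cumulative))

# Hypothetical CPU readings from six containers at one instant.
cpu = [0.3, 1.2, 1.9, 2.0, 5.5, 40.0]
hist = histogram_from_gauges(cpu, [0.5, 2, 8, 32])

for le, count in hist:
    print(le, count)
```

Repeating this per timestamp gives one histogram per step, which is the shape a heatmap needs, without shipping every raw series to the dashboard.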
Summary: Rich Histograms
at Scale
• Treating histograms as a first class citizen
• Massive savings in storage and network I/O
• Solve aggregation and other correctness issues
• Move towards T-Digests and future formats
Thank you very much!
Please reach out to help make useful histograms at scale a reality!
@evanfchan
http://github.com/filodb/FiloDB
Monitorama slack: #talk-evan-chan
Example 2: Write size
Heatmap 2: Write Size
Histogram aggregation:
Prometheus
• Group by is needed for summing histogram buckets due to the data model - a leaky abstraction
• What if a dev changes the histogram scheme? (# of buckets, etc.)
• Not possible to resolve scheme differences in Prom, since aggregation knows nothing about histograms
sum(rate(histogram_bucket{app="foo"}[5m])) by (le)
Histogram aggregation:
FiloDB
• No need for _bucket, but need to select histogram column
• No need for group by. Histograms are natively understood and correct aggregations happen
sum(rate(histogram{app="foo",__col__="h"}[5m]))

Weitere ähnliche Inhalte

Was ist angesagt?

Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)James Gray
 
Getting Started with Amazon Inspector
Getting Started with Amazon InspectorGetting Started with Amazon Inspector
Getting Started with Amazon InspectorAmazon Web Services
 
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWS
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWSAWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWS
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWSAmazon Web Services Korea
 
Policy Enforcement on Kubernetes with Open Policy Agent
Policy Enforcement on Kubernetes with Open Policy AgentPolicy Enforcement on Kubernetes with Open Policy Agent
Policy Enforcement on Kubernetes with Open Policy AgentVMware Tanzu
 
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스Amazon Web Services Korea
 
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...Amazon Web Services
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyNeo4j
 
Goの時刻に関するテスト
Goの時刻に関するテストGoの時刻に関するテスト
Goの時刻に関するテストKentaro Kawano
 
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트Amazon Web Services Korea
 
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...Amazon Web Services
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack PresentationAmr Alaa Yassen
 
エラー・バジェットによるリスク管理 Managing risk with error budgets
エラー・バジェットによるリスク管理 Managing risk with error budgetsエラー・バジェットによるリスク管理 Managing risk with error budgets
エラー・バジェットによるリスク管理 Managing risk with error budgetsGoogle Cloud Platform - Japan
 
データ活用を加速するAWS分析サービスのご紹介
データ活用を加速するAWS分析サービスのご紹介データ活用を加速するAWS分析サービスのご紹介
データ活用を加速するAWS分析サービスのご紹介Amazon Web Services Japan
 
Building an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPABuilding an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPANeo4j
 
엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나종민 김
 
AWS で Presto を徹底的に使いこなすワザ
AWS で Presto を徹底的に使いこなすワザAWS で Presto を徹底的に使いこなすワザ
AWS で Presto を徹底的に使いこなすワザNoritaka Sekiyama
 

Was ist angesagt? (20)

Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)Amazon's Simple Storage Service (S3)
Amazon's Simple Storage Service (S3)
 
Getting Started with Amazon Inspector
Getting Started with Amazon InspectorGetting Started with Amazon Inspector
Getting Started with Amazon Inspector
 
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWS
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWSAWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWS
AWS IoT로 예지정비 실현하기 - 이종화 솔루션즈 아키텍트, AWS
 
AWS IAM Introduction
AWS IAM IntroductionAWS IAM Introduction
AWS IAM Introduction
 
Search@flipkart
Search@flipkartSearch@flipkart
Search@flipkart
 
Policy Enforcement on Kubernetes with Open Policy Agent
Policy Enforcement on Kubernetes with Open Policy AgentPolicy Enforcement on Kubernetes with Open Policy Agent
Policy Enforcement on Kubernetes with Open Policy Agent
 
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스
AWS Summit Seoul 2023 | 산업용 ‘이음(e-Um) 5G’ 특화망을 위한 KT의 AWS 기반 사설 5G 서비스
 
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
AWS, I Choose You: Pokemon's Battle against the Bots (SEC402-R1) - AWS re:Inv...
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
 
Goの時刻に関するテスト
Goの時刻に関するテストGoの時刻に関するテスト
Goの時刻に関するテスト
 
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
AWS Summit Seoul 2023 | "이봐, 해봤어?" 해본! 사람의 Modern Data Architecture 비밀 노트
 
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
エラー・バジェットによるリスク管理 Managing risk with error budgets
エラー・バジェットによるリスク管理 Managing risk with error budgetsエラー・バジェットによるリスク管理 Managing risk with error budgets
エラー・バジェットによるリスク管理 Managing risk with error budgets
 
データ活用を加速するAWS分析サービスのご紹介
データ活用を加速するAWS分析サービスのご紹介データ活用を加速するAWS分析サービスのご紹介
データ活用を加速するAWS分析サービスのご紹介
 
Building an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPABuilding an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPA
 
엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나엘라스틱서치, 로그스태시, 키바나
엘라스틱서치, 로그스태시, 키바나
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Data Lake ハンズオン
Data Lake ハンズオンData Lake ハンズオン
Data Lake ハンズオン
 
AWS で Presto を徹底的に使いこなすワザ
AWS で Presto を徹底的に使いこなすワザAWS で Presto を徹底的に使いこなすワザ
AWS で Presto を徹底的に使いこなすワザ
 

Ähnlich wie Histograms at scale - Monitorama 2019

FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleEvan Chan
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataDatabricks
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsSimon Belak
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwordsNitay Joffe
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users GroupNitay Joffe
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache GiraphAvery Ching
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithmsSimon Belak
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooMithun Radhakrishnan
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterAttila Szegedi
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at AlibabaMichael Stack
 

Ähnlich wie Histograms at scale - Monitorama 2019 (20)

FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
PraveenBOUT++
PraveenBOUT++PraveenBOUT++
PraveenBOUT++
 
A Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big DataA Production Quality Sketching Library for the Analysis of Big Data
A Production Quality Sketching Library for the Analysis of Big Data
 
Online statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithmsOnline statistical analysis using transducers and sketch algorithms
Online statistical analysis using transducers and sketch algorithms
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group
 
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
2014.02.13 (Strata) Graph Analysis with One Trillion Edges on Apache Giraph
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithms
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Faster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at YahooFaster Faster Faster! Datamarts with Hive at Yahoo
Faster Faster Faster! Datamarts with Hive at Yahoo
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibabahbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba
 

Mehr von Evan Chan

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesEvan Chan
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web ServiceEvan Chan
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureEvan Chan
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and SparkEvan Chan
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server TalkEvan Chan
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkEvan Chan
 

Mehr von Evan Chan (15)

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and Kubernetes
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server Talk
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and Shark
 

Histograms at scale - Monitorama 2019

  • 1. Rich Histograms at Scale: A New Hope Evan Chan @evanfchan http://github.com/filodb/FiloDB
  • 2. This is not a contribution
  • 3. What do we do with Histograms?
  • 4. The Evolution of Histograms
      • Pre-aggregated percentiles
      • Histogram with buckets
      • Prometheus histograms
      • HDRHistogram
      • T-Digests
      [Figure labels: Prometheus, InfluxDB, ???, Statsd, Graphite, OpenTSDB]
  • 5. Overlaid Latency Quantiles
  • 6. Now an incident happens…
  • 7. Heatmaps: Rich Visuals
  • 8. Grafana Heatmaps
      • Buckets: scale to much more input data, but need TSDB support for histogram buckets
      • Time series: flexible, but Grafana needs to read ALL the raw data
  • 9. Useful Histograms
      • Should be aggregatable
      • Supports quantiles, distributions, other f(x)
      • Heatmaps - histograms over time
      • Should be accurate
      • Should scale and be efficient
  • 10. Buckets and Accuracy
      • Max quantile error = bucket width / lowerBound
      • Exponential buckets = consistent max quantile errors (Good!)
      • Linear almost never makes sense
      • Your custom Prom histogram buckets likely have >100% error
      Example: (1000, 6E10) value range
      Histogram Type   Max Error %   # Buckets
      Linear           100%          60,000,000
      Exponential      99.1%         26
      Linear           10%           600,000,000
      Exponential      10.0%         188
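The figures in the Buckets and Accuracy table can be checked with a little arithmetic. The following is a minimal sketch, assuming the slide's definition of max quantile error (bucket width / lowerBound); the function names are illustrative and do not come from any library.

```python
import math

def exponential_max_error(min_value: float, max_value: float, num_buckets: int) -> float:
    """Max relative quantile error when num_buckets exponential buckets span (min, max)."""
    # Every bucket's upper bound is `factor` times its lower bound,
    # so width / lowerBound equals factor - 1 for all buckets.
    factor = (max_value / min_value) ** (1.0 / num_buckets)
    return factor - 1.0

def linear_bucket_count(min_value: float, max_value: float, max_error: float) -> int:
    """Approximate number of equal-width buckets needed to cap the error at max_error."""
    # The worst case is the lowest bucket, so its width is max_error * min_value.
    width = max_error * min_value
    return math.ceil((max_value - min_value) / width)

if __name__ == "__main__":
    for buckets in (26, 188):
        print(buckets, f"{exponential_max_error(1000, 6e10, buckets):.1%}")
    for error in (1.0, 0.10):
        print(error, linear_bucket_count(1000, 6e10, error))
```

Because the error of an exponential scheme depends only on the growth factor, it stays the same in every bucket, whereas a linear scheme has to size all of its buckets for the smallest value in the range.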
  • 11. Configuring your Histograms
      • Start with the range of values you need: (min, max)
      • Pick the desired max quantile error %
      • Think about trading off publish freq for accuracy
      • # buckets = log(max/min) / log(1 + max_error)
      • Example: Max error=50%, (1000 to 6E10):
        numBuckets = Math.log(6E10/1000) / Math.log(1 + 0.50)
        exponentialBuckets(1000, 1 + 0.50, numBuckets)
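The configuration recipe above can be sketched in Python as follows. This is an illustration under stated assumptions: `num_buckets` and `exponential_buckets` are hypothetical helpers written for this example (the latter takes the same argument order as the call on the slide), and rounding the bucket count up is a choice made here, since the slide leaves the rounding unspecified.

```python
import math
from typing import List

def num_buckets(min_value: float, max_value: float, max_error: float) -> int:
    """# buckets = log(max/min) / log(1 + max_error), rounded up (assumption)."""
    raw = math.log(max_value / min_value) / math.log(1.0 + max_error)
    return math.ceil(raw)

def exponential_buckets(start: float, factor: float, count: int) -> List[float]:
    """Upper bounds start, start*factor, start*factor^2, ... (count values)."""
    return [start * factor ** i for i in range(count)]

# Slide example: max error 50%, value range 1000 to 6E10.
n = num_buckets(1000, 6e10, 0.50)          # unrounded value is about 44.17
bounds = exponential_buckets(1000, 1 + 0.50, n)

if __name__ == "__main__":
    print(n, bounds[:3], bounds[-1])
```

Values above the last explicit bound would fall into a catch-all bucket such as Prometheus's `+Inf` bucket.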
  • 12. Histograms at Scale
  • 13. Histograms as First-Class Citizen
      • Modeling, transporting, and storing histograms holistically offers many benefits
      • Scalability: much better storage, network, query speed
      • Proper aggregations
      • Better accuracy and features
      • Adaptable to better histogram designs in the future
      • Almost nobody is doing this yet
  • 14. Prometheus Histogram Schema
      5 buckets, sum, count per histogram
      Series    __name__        le     Values (one column per sample)
      Series1   metric_sum             44   35   50   60
      Series2   metric_count           5    6    10   11
      Series3   metric_bucket   0.5    0    1    1    2
      Series4   metric_bucket   2.0    2    4    5    6
      Series5   metric_bucket   5.0    3    6    8    10
      Series6   metric_bucket   10.0   5    6    9    11
      Series7   metric_bucket   25.0   5    6    10   11
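To make the series-per-bucket layout concrete, here is a small sketch that expands one histogram into the label sets shown on the slide. `prometheus_series` is a hypothetical helper, not a client-library call. A real Prometheus histogram also exports an `le="+Inf"` bucket, which the slide leaves out and which is omitted here to match it.

```python
from typing import Dict, List, Optional

def prometheus_series(name: str,
                      bucket_bounds: List[float],
                      labels: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Return the label set of every time series one histogram turns into."""
    extra = dict(labels or {})
    series = [
        {"__name__": f"{name}_sum", **extra},
        {"__name__": f"{name}_count", **extra},
    ]
    # One additional series per bucket, distinguished only by the `le` label.
    for le in bucket_bounds:
        series.append({"__name__": f"{name}_bucket", "le": str(le), **extra})
    return series

series = prometheus_series("metric", [0.5, 2.0, 5.0, 10.0, 25.0])

if __name__ == "__main__":
    for labels in series:
        print(labels)
```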
  • 15. The Scale Problem with Histograms
      • My app: 100 metrics, 20 histograms
      • Assume range of (1000, 6E10).
      • Notice how histograms dominate the time series!
      Max error %   Num buckets   Histogram Series   Other Series   Total Series
      50%           44            882                80             962
      10%           188           3762               80             3842
      2%            905           18102              80             18182
  • 16. Mama we got a problem
      • Actual system: hundreds of millions of metrics, each with a 64-bucket histogram
      • Using Prometheus would lead to tens of billions of series
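The "tens of billions" estimate follows from counting one series per bucket plus the sum and count series. A minimal sketch, assuming a hypothetical figure of 300 million histograms to stand in for "hundreds of millions":

```python
def series_per_histogram(num_buckets: int) -> int:
    """One series per bucket, plus the _sum and _count series."""
    return num_buckets + 2

def total_series(num_histograms: int, num_buckets: int) -> int:
    """Total time series when every histogram is stored series-per-bucket."""
    return num_histograms * series_per_histogram(num_buckets)

if __name__ == "__main__":
    # 300 million is an assumed value for illustration only.
    print(f"{total_series(300_000_000, 64):,}")
```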
  • 17. Prometheus: Raw Data
      __name__        le     Zone      Value
      metric_sum             Us-west   44
      metric_count           Us-west   5
      metric_bucket   0.5    Us-west   0
      metric_bucket   2.0    Us-west   2
      metric_bucket   5.0    Us-west   3
      metric_bucket   10.0   Us-west   5
      metric_bucket   25.0   Us-west   5
  • 18. Atomicity Issues
      • Prometheus export and scrape do not guarantee that a histogram's buckets stay grouped together.
      • Easy to only get part of a histogram
      • FiloDB is a distributed database. 7 records might end up in 7 different nodes!
      • Calculating histogram_quantile: talk to 7 nodes for every query!
  • 19. Single Histogram Schema
      5 buckets, sum, count per histogram
      Series1: __name__ = metric
      Sum   Count   Hist (le = 0.5, 2.0, 5.0, 10.0, 25.0)
      44    5       0   2   3    5    5
      35    6       1   4   6    6    6
      50    10      1   5   8    9    10
      60    11      2   6   10   11   11
  • 20. Single Histogram Raw Data
      __name__ = Metric, Zone = Us-west
      Sum = 44, Count = 5, Hist (0.5, 2, 5, 10, 25) = 0 2 3 5 5
      • One record, not (n + 2). No distribution problem!
      • Labels only appear once
      • Savings proportional to # of histogram buckets
      • 50x savings for 64 histogram buckets
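As a conceptual model of the single-record layout, the sketch below keeps the labels once alongside the sum, count, and all bucket values. `HistogramRecord` is a hypothetical type for illustration and says nothing about FiloDB's actual storage format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HistogramRecord:
    """One sample of a whole histogram: labels stored once, buckets kept together."""
    labels: Dict[str, str]
    sum: float
    count: int
    bucket_bounds: List[float]   # upper bounds (le)
    bucket_counts: List[int]     # cumulative counts, one per bound

def label_copies_per_sample(num_buckets: int, single_record: bool) -> int:
    """How many times the label set is repeated for one histogram sample."""
    return 1 if single_record else num_buckets + 2

record = HistogramRecord(
    labels={"__name__": "Metric", "Zone": "Us-west"},
    sum=44,
    count=5,
    bucket_bounds=[0.5, 2, 5, 10, 25],
    bucket_counts=[0, 2, 3, 5, 5],
)

if __name__ == "__main__":
    print(record)
    print(label_copies_per_sample(64, single_record=False),
          label_copies_per_sample(64, single_record=True))
```

Keeping the buckets in one record also means a sample can never arrive with only part of its histogram.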
  • 21. Much smaller network and disk usage
      • One time series vs 66 -> 50x network I/O reduction
      • Single histogram schema in FiloDB uses < 0.2 bytes per histogram bucket
      [Chart: Network I/O, bytes per histogram, Series/bucket vs Series/histo]
      [Chart: Storage cost, bytes per bucket, Series/bucket vs Series/histo]
  • 22. Optimizing Histograms: Compression
      • Delta encoding of increasing bucket values
        0 2 3 5 5  ->  0 2 1 2 0
        1 4 6 6 6  ->  1 3 2 0 0
      • Compressed size about 4x-10x better than 1 time series per bucket (64 buckets; FiloDB)
      • 0.18 bytes/histogram bucket (range: 0.16 - 0.61)
      [Chart: bytes per bucket: FiloDB SingleHistogram 0.18, Prometheus 1.5, raw data 8]
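The delta-encoding step on the Compression slide can be sketched as below. This shows only the bucket-to-bucket delta idea from the slide's two examples; whatever further packing FiloDB applies to the deltas is not covered, and the function names are illustrative.

```python
from typing import List

def delta_encode(cumulative: List[int]) -> List[int]:
    """Replace each cumulative bucket count with its difference from the previous bucket."""
    deltas, previous = [], 0
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the cumulative bucket counts with a running sum."""
    cumulative, running = [], 0
    for delta in deltas:
        running += delta
        cumulative.append(running)
    return cumulative

if __name__ == "__main__":
    print(delta_encode([0, 2, 3, 5, 5]))
    print(delta_encode([1, 4, 6, 6, 6]))
```

Since cumulative bucket counts never decrease, the deltas are small non-negative integers, which is what makes them cheap to store.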
  • 23. Optimizing Histograms: Querying (64 Buckets)
      • histogram_quantile() is more than 100x faster than series-per-bucket
      • No need for group-by
      • Localized computation vs needing to jump across 64 bucket time series
      [Chart: histogram_quantile() QPS, Series/Bucket vs Series/Histo]
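When all buckets of a sample sit in one record, a quantile is a single pass over a short array. The sketch below is a simplified version of the linear interpolation that PromQL's histogram_quantile performs; it assumes finite upper bounds and a lower bound of 0 for the first bucket, and it is not FiloDB's code.

```python
import math
from typing import List

def histogram_quantile(q: float,
                       upper_bounds: List[float],
                       cumulative_counts: List[int]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = next(idx for idx, c in enumerate(cumulative_counts) if c >= rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    prev = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - prev
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper_bounds[i] - lower) * (rank - prev) / in_bucket

if __name__ == "__main__":
    bounds = [0.5, 2.0, 5.0, 10.0, 25.0]
    counts = [0, 2, 3, 5, 5]          # first sample row from the schema slides
    print(histogram_quantile(0.5, bounds, counts))
    print(histogram_quantile(0.9, bounds, counts))
```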
  • 24. Rich Histograms: Usability and Correctness
  • 25. Changing buckets…. sum()
      • sum(rate(http_req_latency{…..}[5m])) by (le)
      • Different buckets lead to incorrect sums
      [Figure: le bucket boundaries 2.5, 5, 10, 50, +Inf; additional boundaries 25, 100]
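The failure mode on this slide can be reproduced with a toy example: grouping by `le` adds up only the series that happen to share a boundary, so boundaries present in one scheme but not the other receive partial totals. The two histograms below use invented counts and bucket schemes purely for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List

def sum_by_le(histograms: List[Dict[float, int]]) -> Dict[float, int]:
    """Mimic sum(...) by (le): add cumulative counts that share the same boundary."""
    totals: Dict[float, int] = defaultdict(int)
    for histogram in histograms:
        for le, count in histogram.items():
            totals[le] += count
    return dict(sorted(totals.items()))

# Hypothetical cumulative counts keyed by upper bound (le).
old_scheme = {2.5: 1, 5.0: 3, 10.0: 8, math.inf: 10}
new_scheme = {2.5: 0, 5.0: 1, 10.0: 2, 25.0: 4, 50.0: 5, 100.0: 6, math.inf: 6}

summed = sum_by_le([old_scheme, new_scheme])

if __name__ == "__main__":
    for le, count in summed.items():
        print(le, count)
```

In the result the cumulative count drops from le=10 to le=25, which cannot happen in a valid histogram, so any quantile computed from it is unreliable.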
  • 26. Holistic Histograms: Correct Sums
      • Adding histograms holistically allows us to track bucket changes and correctly sum them
      [Figure: le bucket boundaries 2.5, 5, 10, 50, +Inf; additional boundaries 25, 100]
  • 27. histogram_quantile clipping
      • At 20:00, the quantile is clipped at the second-to-last bucket bound of 10.0
  • 28. histogram_max_quantile
      • Client sends a max value at each time interval
  • 29. histogram_max_quantile
      • Having a known max allows us to interpolate in last bucket
      • Cannot interpolate to +Inf
      • https://github.com/filodb/FiloDB/pull/361
      [Figure: le bucket boundaries 2.5, 5, 10, 25, +Inf; labels 40 and 0.9]
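The idea behind histogram_max_quantile can be sketched as follows: when the target rank lands in the +Inf bucket, a plain quantile has to clip at the last finite bound, whereas a client-reported max gives the bucket a usable upper edge. This is an illustration only, not the code from the pull request above; the bucket counts are invented, and reading the figure labels as max = 40 and quantile = 0.9 is an assumption.

```python
import math
from typing import List, Optional

def quantile_with_max(q: float,
                      upper_bounds: List[float],
                      cumulative_counts: List[int],
                      observed_max: Optional[float] = None) -> float:
    """Quantile estimate that interpolates in the +Inf bucket if a max is known."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    i = next(idx for idx, c in enumerate(cumulative_counts) if c >= rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    prev = cumulative_counts[i - 1] if i > 0 else 0
    upper = upper_bounds[i]
    if math.isinf(upper):
        if observed_max is None:
            return lower              # clipped: cannot interpolate to +Inf
        upper = observed_max          # treat the known max as the bucket's edge
    in_bucket = cumulative_counts[i] - prev
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - prev) / in_bucket

if __name__ == "__main__":
    bounds = [2.5, 5.0, 10.0, 25.0, math.inf]
    counts = [1, 2, 4, 6, 10]         # hypothetical cumulative counts
    print(quantile_with_max(0.9, bounds, counts))
    print(quantile_with_max(0.9, bounds, counts, observed_max=40.0))
```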
  • 30. Ad-Hoc Histograms
      • Just the quantile, min, max from gauges is not that useful
      • Get heat map for CPU use across k8s containers
      • histogram(2, 8, container_cpu_usage_seconds_total{….})
      • Aggregate histogram across gauges using new histogram() function
      • Yes, Grafana can do heat maps from raw series - but you can only read so many raw time series. :)
  • 31. Summary: Rich Histograms at Scale
      • Treating histograms as a first-class citizen
      • Massive savings in storage and network I/O
      • Solve aggregation and other correctness issues
      • Move towards T-Digests and future formats
  • 32. Thank you very much! Please reach out to help make useful histograms at scale a reality! @evanfchan http://github.com/filodb/FiloDB Monitorama slack: #talk-evan-chan
  • 33. Example 2: Write size
  • 34. Heatmap 2: Write Size
  • 35. Histogram aggregation: Prometheus
      • Group by is needed for summing histogram buckets because of the data model, a leaky abstraction
      • What if a dev changes the histogram scheme? (# of buckets, etc.)
      • Not possible to resolve scheme differences in Prom, since aggregation knows nothing about histograms
      sum(rate(histogram_bucket{app="foo"}[5m])) by (le)
  • 36. Histogram aggregation: FiloDB
      • No need for _bucket, but need to select histogram column
      • No need for group by. Histograms are natively understood and correct aggregations happen
      sum(rate(histogram{app="foo",__col__="h"}[5m]))