SlideShare ist ein Scribd-Unternehmen logo
1 von 52
© 2017 MapR Technologies 1
Update on t-digest
© 2017 MapR Technologies 2
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board member, Apache Software Foundation
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @ted_dunning
© 2017 MapR Technologies 3
Who We Are
• MapR echnologies
– We make a kick-ass platform for big data computing
– Support many workloads including Hadoop / Spark / HPC / Other
– Extended to allow streams and tables in basic platform
– Free for academic research / training
• Apache Software Foundation
– Culture hub for building open source communities
– Shared values around openness for contribution as well as use
– Many major projects are part of Apache
– Even more minor ones!
© 2017 MapR Technologies 4
Basic Outline
• Why we should measure distributions
• Basic Ideas
• How t-digest works
• Recent results
• Applications
© 2017 MapR Technologies 5
Why Is This Practically Important
• The novice came to the master and says “something is broken”
© 2017 MapR Technologies 6
Why Is This Practically Important
• The novice came to the master and says “something is broken”
• The master replied “What has changed?”
© 2017 MapR Technologies 7
Why Is This Practically Important
• The novice came to the master and says “something is broken”
• The master replied “What has changed?”
• And the student was enlightened
© 2017 MapR Technologies 8
Finding change is key
but what kind?
© 2017 MapR Technologies 9
Last Night’s Latencies
• These are ping latencies from my hotel
• Looks pretty good, right?
• But what about longer term?
208.302
198.571
185.099
191.258
201.392
214.738
197.389
187.749
201.693
186.762
185.296
186.390
183.960
188.060
190.763
> mean(y$t[i])
[1] 198.6047
> sd(y$t[i])
[1] 71.43965
© 2017 MapR Technologies 10
Not So Fast …
© 2017 MapR Technologies 11
This is long-tailed land
© 2017 MapR Technologies 12
This is long-tailed land
You have to know the distribution
of values
© 2017 MapR Technologies 13
© 2017 MapR Technologies 14
A single number
is simply not enough
© 2017 MapR Technologies 15
What We Really Need Here
• I want to be able to compute the distribution from any time
period
• From any subset of measurements
• With lots of keys and filters
• And not a lot of space
• Basically, any OLAP kind of query
select distribution(x) from … where … group by y,z
© 2017 MapR Technologies 16
Idea 0 – Pre-defined bins
• So let’s assume we have bins
– Upper, lower bound, constant width
• Get a measurement, pick a bin, increment count
• Works great if you know the data
– And you have limited dynamic range (too many bins)
– And the distribution is fixed
• Useful, but not general enough
© 2017 MapR Technologies 17
Idea 1 – Exponential Bins
• Suppose we want relative accuracy in measurement space
• Latencies are positive and only matter within a few percent
– 1.1 ms versus 1.0 ms
– 1100 ms versus 1000 ms
• We can cheat by using floating point representations
– Compute bin using magic
– Count
© 2017 MapR Technologies 18
FloatHistogram
• Assume all measurements are in the range
• Divide this range into power of 2 sub-ranges
• Sub-divide each sub-range evenly with steps
– is typical
• Relative error is bounded in measurement space
© 2017 MapR Technologies 19
FloatHistogram
• Assume all measurements are in the range
• Divide this range into power of 2 sub-ranges
• Sub-divide each sub-range evenly with steps
– is typical
• Relative error is bounded in measurement space
• Bin index can be computed using FP representation!
© 2017 MapR Technologies 20
Fixed Size Bins
© 2017 MapR Technologies 21
Approximate Exponential Bins
© 2017 MapR Technologies 22
Non-linear bins are better
(sometimes)
Still not general enough
© 2017 MapR Technologies 23
Idea 2 – Fully Adaptive Bins
• First intuition – in general, we want accuracy in terms of
percentile
• Second intuition – we want better accuracy at extreme
quantiles
– 50%-ile versus 50.1%-ile?
– What does 0.1% error even mean for 99.99th percentile
• We need bins with small counts near the edges
© 2017 MapR Technologies 24
First 1% of data shown.
Left graph has 100 x 100 sample bins.
Right graph has ~130bins, variable size
© 2017 MapR Technologies 25
The Basic t-digest
• Take a bunch of data
• Sort it
• Group into bins
– But make the bins be smaller at the beginning and end
• Remember the centroid and count of each bin
• That’s a t-digest
© 2017 MapR Technologies 26
But Wait, You Need a Bit More
• Take a bunch of new data, old t-digest
• Sort the data and the old bins together
• Group into bins
– Note that existing bins have bigger weights
– So they might survive … or might clump
• Remember the centroid and count of each new bin
• That’s an updated t-digest
© 2017 MapR Technologies 27
Oh … and Merging
• Take a bunch of old t-digests
• Sort the bins
• Group into mega-bins
– Respect the size constraint
• Remember the centroid and count of each new bin
• That’s a merged t-digest
© 2017 MapR Technologies 28
Adaptive non-linear bins are good
and general
And can be grouped
and regrouped
© 2017 MapR Technologies 29
Results
© 2017 MapR Technologies 30
© 2017 MapR Technologies 31
Status
• Current release
– Small accuracy bugs in corner cases
– Best overall is still AVLTreeDigest
© 2017 MapR Technologies 32
Status
• Current release (3.x)
– Small accuracy bugs in corner cases
– Best overall is still AVLTreeDigest
• Upcoming release (4.0)
– Better accuracy in pathological cases
– Strictly bounded size
– No dynamic allocation (with MergingDigest)
– Good speed (100ns for MergingDigest, 5ns for FloatHistogram)
– Real Soon Now
© 2017 MapR Technologies 33
Example Application
• The data:
– ~ 1 million machines
– Even more services
– Each producing thousands of measurements per second
• Store t-digest for each 5 minute period for each measurement
• Want to query any combination of keys, produce t-digest result
“what was the distribution of launch times yesterday?”
“what about last month?”
“in Europe versus in North America versus in Asia?”
© 2017 MapR Technologies 34
Collect Data
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center
© 2017 MapR Technologies 35
And Transport to Global Analytics
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
© 2017 MapR Technologies 36
With Many Sources
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
© 2017 MapR Technologies 37
With Many Sources
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
© 2017 MapR Technologies 38
With Many Sources
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
© 2017 MapR Technologies 39
What about visualization?
© 2017 MapR Technologies 40
Can’t see small count bars
© 2017 MapR Technologies 41
Good Results
© 2017 MapR Technologies 42
Bad Results – 1% of measurements are 3x bigger
© 2017 MapR Technologies 43
Bad Results – 1% of measurements are 3x bigger
© 2017 MapR Technologies 44
With Better Vertical Scaling
© 2017 MapR Technologies 45
Uniform Bins
© 2017 MapR Technologies 46
FloatHistogram Bins
© 2017 MapR Technologies 47
With FloatHistogram
© 2017 MapR Technologies 48
Original Ping Latency Data
© 2017 MapR Technologies 49
Summary
• Single measurements insufficient, need distributions
• Uniform binned histograms not good
• FloatHistogram for some cases
• T-digest for general cases
• Upcoming release has super-
fast and accurate versions
• Good visualization also key
0.0 0.2 0.4 0.6 0.8 1.0
q
0246810
k
© 2017 MapR Technologies 50
Q & A
© 2017 MapR Technologies 51
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Board member, Apache Software Foundation
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @ted_dunning
© 2017 MapR Technologies 52
T-digest
• Or we can talk about small errors in q
• Accumulate samples, sort, merge
• Merge if k-size < 1
• Interpolate using centroids in x
• Very good near extremes, no dynamic allocation
0.0 0.2 0.4 0.6 0.8 1.0
q
0246810
k

Weitere ähnliche Inhalte

Was ist angesagt?

Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveTed Dunning
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningTed Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the MoviesDataWorks Summit
 

Was ist angesagt? (20)

Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
Dunning ml-conf-2014
Dunning ml-conf-2014Dunning ml-conf-2014
Dunning ml-conf-2014
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
Buzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learningBuzz words-dunning-real-time-learning
Buzz words-dunning-real-time-learning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the Movies
 

Ähnlich wie T digest-update

ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleIan Downard
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Carol McDonald
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupAlan Iovine
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksJustin Brandenburg
 
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouBig Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouMatt Stubbs
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricMatt Stubbs
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterDataWorks Summit
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matterDataWorks Summit
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Carol McDonald
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 

Ähnlich wie T digest-update (20)

ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Spark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating ExampleSpark and MapR Streams: A Motivating Example
Spark and MapR Streams: A Motivating Example
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
The Hive Think Tank: Rendezvous Architecture Makes Machine Learning Logistics...
 
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
Streaming Machine learning Distributed Pipeline for Real-Time Uber Data Using...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Map r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetupMap r chicago_advanalytics_oct_meetup
Map r chicago_advanalytics_oct_meetup
 
Predictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural NetworksPredictive Maintenance Using Recurrent Neural Networks
Predictive Maintenance Using Recurrent Neural Networks
 
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell YouBig Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
Big Data LDN 2017: Machine Learning: What Works And What They Won’t Tell You
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
Big Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data FabricBig Data LDN 2017: Real World Impact of a Global Data Fabric
Big Data LDN 2017: Real World Impact of a Global Data Fabric
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really Matter
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matter
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1Fast Cars, Big Data How Streaming can help Formula 1
Fast Cars, Big Data How Streaming can help Formula 1
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 

Mehr von Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 

Mehr von Ted Dunning (7)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Kürzlich hochgeladen

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 

Kürzlich hochgeladen (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

T digest-update

  • 1. © 2017 MapR Technologies 1 Update on t-digest
  • 2. © 2017 MapR Technologies 2 Contact Information Ted Dunning, PhD Chief Application Architect, MapR Technologies Board member, Apache Software Foundation O’Reilly author Email tdunning@mapr.com tdunning@apache.org Twitter @ted_dunning
  • 3. © 2017 MapR Technologies 3 Who We Are • MapR echnologies – We make a kick-ass platform for big data computing – Support many workloads including Hadoop / Spark / HPC / Other – Extended to allow streams and tables in basic platform – Free for academic research / training • Apache Software Foundation – Culture hub for building open source communities – Shared values around openness for contribution as well as use – Many major projects are part of Apache – Even more minor ones!
  • 4. © 2017 MapR Technologies 4 Basic Outline • Why we should measure distributions • Basic Ideas • How t-digest works • Recent results • Applications
  • 5. © 2017 MapR Technologies 5 Why Is This Practically Important • The novice came to the master and says “something is broken”
  • 6. © 2017 MapR Technologies 6 Why Is This Practically Important • The novice came to the master and says “something is broken” • The master replied “What has changed?”
  • 7. © 2017 MapR Technologies 7 Why Is This Practically Important • The novice came to the master and says “something is broken” • The master replied “What has changed?” • And the student was enlightened
  • 8. © 2017 MapR Technologies 8 Finding change is key but what kind?
  • 9. © 2017 MapR Technologies 9 Last Night’s Latencies • These are ping latencies from my hotel • Looks pretty good, right? • But what about longer term? 208.302 198.571 185.099 191.258 201.392 214.738 197.389 187.749 201.693 186.762 185.296 186.390 183.960 188.060 190.763 > mean(y$t[i]) [1] 198.6047 > sd(y$t[i]) [1] 71.43965
  • 10. © 2017 MapR Technologies 10 Not So Fast …
  • 11. © 2017 MapR Technologies 11 This is long-tailed land
  • 12. © 2017 MapR Technologies 12 This is long-tailed land You have to know the distribution of values
  • 13. © 2017 MapR Technologies 13
  • 14. © 2017 MapR Technologies 14 A single number is simply not enough
  • 15. © 2017 MapR Technologies 15 What We Really Need Here • I want to be able to compute the distribution from any time period • From any subset of measurements • With lots of keys and filters • And not a lot of space • Basically, any OLAP kind of query select distribution(x) from … where … group by y,z
  • 16. © 2017 MapR Technologies 16 Idea 0 – Pre-defined bins • So let’s assume we have bins – Upper, lower bound, constant width • Get a measurement, pick a bin, increment count • Works great if you know the data – And you have limited dynamic range (too many bins) – And the distribution is fixed • Useful, but not general enough
  • 17. © 2017 MapR Technologies 17 Idea 1 – Exponential Bins • Suppose we want relative accuracy in measurement space • Latencies are positive and only matter within a few percent – 1.1 ms versus 1.0 ms – 1100 ms versus 1000 ms • We can cheat by using floating point representations – Compute bin using magic – Count
  • 18. © 2017 MapR Technologies 18 FloatHistogram • Assume all measurements are in the range • Divide this range into power of 2 sub-ranges • Sub-divide each sub-range evenly with steps – is typical • Relative error is bounded in measurement space
  • 19. © 2017 MapR Technologies 19 FloatHistogram • Assume all measurements are in the range • Divide this range into power of 2 sub-ranges • Sub-divide each sub-range evenly with steps – is typical • Relative error is bounded in measurement space • Bin index can be computed using FP representation!
  • 20. © 2017 MapR Technologies 20 Fixed Size Bins
  • 21. © 2017 MapR Technologies 21 Approximate Exponential Bins
  • 22. © 2017 MapR Technologies 22 Non-linear bins are better (sometimes) Still not general enough
  • 23. © 2017 MapR Technologies 23 Idea 2 – Fully Adaptive Bins • First intuition – in general, we want accuracy in terms of percentile • Second intuition – we want better accuracy at extreme quantiles – 50%-ile versus 50.1%-ile? – What does 0.1% error even mean for 99.99th percentile • We need bins with small counts near the edges
  • 24. © 2017 MapR Technologies 24 First 1% of data shown. Left graph has 100 x 100 sample bins. Right graph has ~130bins, variable size
  • 25. © 2017 MapR Technologies 25 The Basic t-digest • Take a bunch of data • Sort it • Group into bins – But make the bins be smaller at the beginning and end • Remember the centroid and count of each bin • That’s a t-digest
  • 26. © 2017 MapR Technologies 26 But Wait, You Need a Bit More • Take a bunch of new data, old t-digest • Sort the data and the old bins together • Group into bins – Note that existing bins have bigger weights – So they might survive … or might clump • Remember the centroid and count of each new bin • That’s an updated t-digest
  • 27. © 2017 MapR Technologies 27 Oh … and Merging • Take a bunch of old t-digests • Sort the bins • Group into mega-bins – Respect the size constraint • Remember the centroid and count of each new bin • That’s a merged t-digest
  • 28. © 2017 MapR Technologies 28 Adaptive non-linear bins are good and general And can be grouped and regrouped
  • 29. © 2017 MapR Technologies 29 Results
  • 30. © 2017 MapR Technologies 30
  • 31. © 2017 MapR Technologies 31 Status • Current release – Small accuracy bugs in corner cases – Best overall is still AVLTreeDigest
  • 32. © 2017 MapR Technologies 32 Status • Current release (3.x) – Small accuracy bugs in corner cases – Best overall is still AVLTreeDigest • Upcoming release (4.0) – Better accuracy in pathological cases – Strictly bounded size – No dynamic allocation (with MergingDigest) – Good speed (100ns for MergingDigest, 5ns for FloatHistogram) – Real Soon Now
  • 33. © 2017 MapR Technologies 33 Example Application • The data: – ~ 1 million machines – Even more services – Each producing thousands of measurements per second • Store t-digest for each 5 minute period for each measurement • Want to query any combination of keys, produce t-digest result “what was the distribution of launch times yesterday?” “what about last month?” “in Europe versus in North America versus in Asia?”
  • 34. © 2017 MapR Technologies 34 Collect Data log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center
  • 35. © 2017 MapR Technologies 35 And Transport to Global Analytics log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection
  • 36. © 2017 MapR Technologies 36 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection
  • 37. © 2017 MapR Technologies 37 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center
  • 38. © 2017 MapR Technologies 38 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center
  • 39. © 2017 MapR Technologies 39 What about visualization?
  • 40. © 2017 MapR Technologies 40 Can’t see small count bars
  • 41. © 2017 MapR Technologies 41 Good Results
  • 42. © 2017 MapR Technologies 42 Bad Results – 1% of measurements are 3x bigger
  • 43. © 2017 MapR Technologies 43 Bad Results – 1% of measurements are 3x bigger
  • 44. © 2017 MapR Technologies 44 With Better Vertical Scaling
  • 45. © 2017 MapR Technologies 45 Uniform Bins
  • 46. © 2017 MapR Technologies 46 FloatHistogram Bins
  • 47. © 2017 MapR Technologies 47 With FloatHistogram
  • 48. © 2017 MapR Technologies 48 Original Ping Latency Data
  • 49. © 2017 MapR Technologies 49 Summary • Single measurements insufficient, need distributions • Uniform binned histograms not good • FloatHistogram for some cases • T-digest for general cases • Upcoming release has super- fast and accurate versions • Good visualization also key 0.0 0.2 0.4 0.6 0.8 1.0 q 0246810 k
  • 50. © 2017 MapR Technologies 50 Q & A
  • 51. © 2017 MapR Technologies 51 Contact Information Ted Dunning, PhD Chief Application Architect, MapR Technologies Board member, Apache Software Foundation O’Reilly author Email tdunning@mapr.com tdunning@apache.org Twitter @ted_dunning
  • 52. © 2017 MapR Technologies 52 T-digest • Or we can talk about small errors in q • Accumulate samples, sort, merge • Merge if k-size < 1 • Interpolate using centroids in x • Very good near extremes, no dynamic allocation 0.0 0.2 0.4 0.6 0.8 1.0 q 0246810 k