Low-latency Multi-threaded
Ensemble Learning for
Dynamic Big Data Streams
Diego Marrón (dmarron@ac.upc.edu)
Eduard Ayguadé (eduard.ayguade@bsc.es)
José R. Herrero (josepr@ac.upc.edu)
Jesse Read (jesse.read@polytechnique.edu)
Albert Bifet (albert.bifet@telecom-paristech.fr)
2017 IEEE International Conference on Big Data
December 11-14, 2017, Boston, MA, USA
Introduction Hoeffding Tree Ensembles Evaluations Conclusions
Real–time mining of dynamic data streams
• Unprecedented amount of dynamic big data streams (Volume)
• Data is generated at a high rate (Velocity)
• Newly created data rapidly supersedes old data (Volatility)
• This increase in volume, velocity and volatility requires data
to be processed on–the–fly in real–time
Real–time dynamic data streams classification
• Real-time classification imposes some challenges:
• Deal with potentially infinite streams
• Single pass on each instance
• React to changes in the stream (concept drift)
• Bounded response-time:
• Low latency: Milliseconds (ms) per instance
• High latency: Few seconds (s) per instance
• Limited CPU-time to process each instance
• Preferred methods:
• Hoeffding Tree (HT)
• Ensemble of HT
Hoeffding Tree
• Decision tree suitable for large data streams
• Easy–to–deploy
• They are usually able to keep up with the arrival rate
Hoeffding Tree: Basics
• Build tree structure incrementally (on-the-fly)
• Tree structure uses attributes to route an instance to a leaf
node
• Leaf node:
• Contains the classifier (Naive Bayes)
• Statistics to decide next attribute to split (attribute counters)
• Split decision: needs per-attribute information gain
• Naive Bayes Classifier: calculates each attribute probability
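The split decision can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the layout of the attribute counters, and the confidence parameter `delta` are assumptions made for the example. The bound itself is the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), which the slides do not spell out.

```python
import math

def entropy(class_counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def info_gain(counters):
    """Information gain of one attribute at a leaf.

    counters[v][k] = instances seen at this leaf with attribute value v
    and class k (hypothetical layout of the leaf's attribute counters).
    """
    n_classes = len(counters[0])
    class_totals = [sum(row[k] for row in counters) for k in range(n_classes)]
    total = sum(class_totals)
    remainder = sum(sum(row) / total * entropy(row) for row in counters if sum(row) > 0)
    return entropy(class_totals) - remainder

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gains, n_seen, n_classes, delta=1e-7):
    """Split when the best gain beats the second best by more than epsilon."""
    ranked = sorted(gains, reverse=True)
    eps = hoeffding_bound(math.log2(n_classes), delta, n_seen)
    return ranked[0] - ranked[1] > eps
```

With 1000 instances at a leaf, two classes and `delta=1e-7`, epsilon is roughly 0.09, so the leaf splits only when the best attribute's gain leads the runner-up by more than that margin.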
Ensembles: Random Forest of Hoeffding Trees
• Random Forest uses RandomHT (a variation of HT):
• Split decision uses a random subset of attributes
• For each RandomHT in the ensemble:
• Input: sampling with repetition
• Tree is reset if change (drift) is detected
• Responses are combined to form final prediction
• Ensembles require more work for each instance
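The three ensemble ingredients listed above (sampling with repetition, a random attribute subset per split, and combining responses) can be sketched as below. This is an illustration under stated assumptions, not the LMHT implementation: drawing the repetition count from a Poisson distribution is the usual online-bagging approximation of sampling with repetition, the subset size of about sqrt(n) is a common default, and majority voting is only one way to combine responses. The names `poisson`, `random_attribute_subset`, `combine_votes`, `train_ensemble` and the `learner.train` method are hypothetical.

```python
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def random_attribute_subset(n_attributes, rng):
    """Pick about sqrt(n) attributes to evaluate for one split (assumed size)."""
    size = max(1, round(math.sqrt(n_attributes)))
    return sorted(rng.sample(range(n_attributes), size))

def combine_votes(predictions):
    """Majority vote over the labels predicted by the learners."""
    return Counter(predictions).most_common(1)[0][0]

def train_ensemble(learners, instance, label, rng, lam=1.0):
    """Each learner sees the instance k ~ Poisson(lam) times (k may be 0)."""
    for learner in learners:
        for _ in range(poisson(lam, rng)):
            learner.train(instance, label)
```

Because every learner has to process each instance, possibly several times, the per-instance cost grows with the ensemble size, which is the extra work the last bullet refers to.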
Exploiting CPU Parallelism
• Parallelism: more work in the same amount of time
• Improves throughput per time unit
• Or improves the accuracy by using more CPU-intensive
methods
• Modern CPU parallel features: multithreaded, SIMD
instructions
Contributions
• Very low-latency response time
• few micro-seconds (µs) per instance
• Compared to a state-of-the-art implementation, MOA:
• Same accuracy
• Single HT, on average:
• Response time: 2 microseconds (µs) per instance
• 6.73x faster
• Multithreaded ensemble, on average:
• Response time: 10 microseconds (µs) per instance
• 85x faster
• Up to 70% parallel efficiency on a 24-core CPU
• Highly scalable/adaptive design tested on:
• Intel platform: i7, Xeon
• ARM SoCs: from server range to low end (Raspberry Pi3)
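The per-instance response times quoted above follow from the average throughputs reported in the evaluation (525.65 instances/ms for a single tree, 105 instances/ms for the ensemble on the Intel i7). A small sketch of the conversion, with `latency_us` as a hypothetical helper name:

```python
def latency_us(throughput_per_ms):
    """Average time per instance in microseconds, given instances per millisecond."""
    return 1000.0 / throughput_per_ms

single_ht = latency_us(525.65)  # single Hoeffding Tree
ensemble = latency_us(105.0)    # multithreaded ensemble, Intel i7

print(f"single HT: {single_ht:.2f} us per instance")
print(f"ensemble:  {ensemble:.2f} us per instance")
```

Both values round to the 2 µs and 10 µs figures on the slide. Note that this is an average derived from throughput, not a measured per-instance latency distribution.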
Hoeffding Tree
• Our implementation uses a binary tree
• Split into smaller sub-trees that fit in the CPU L1 cache
• Cache line: 64 bytes (8 x 64-bit pointers)
• Max sub-tree height: 3
• SIMD instructions to accelerate calculations:
• Information Gain
• Naive Bayes Classifier
[Figure: sub-trees mapped to cache lines]
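A sub-tree of height 3 that occupies one cache line can be pictured with implicit, heap-style indexing. The sketch below is an assumption about one possible layout, not a description of how LMHT encodes its nodes: it takes "height 3" to mean three levels of internal nodes (7 slots out of 8), and the `(attribute_index, threshold)` node format is hypothetical.

```python
from array import array

SLOTS_PER_LINE = 8  # 8 x 64-bit words = 64 bytes
LEVELS = 3          # assumed: 3 levels -> 1 + 2 + 4 = 7 node slots

def new_subtree():
    """Storage for one sub-tree: 8 consecutive 64-bit words."""
    return array("Q", [0] * SLOTS_PER_LINE)

def route(subtree_tests, instance):
    """Walk the 3 levels of one sub-tree.

    subtree_tests[i] = (attribute_index, threshold) for node slot i.
    The children of slot i live at slots 2*i + 1 and 2*i + 2, so no
    child pointers are needed inside the sub-tree.
    Returns an exit number in 0..7 that selects the next sub-tree or leaf.
    """
    slot = 0
    for _ in range(LEVELS):
        attr, threshold = subtree_tests[slot]
        slot = 2 * slot + (1 if instance[attr] <= threshold else 2)
    return slot - (2 ** LEVELS - 1)
```

Routing through one sub-tree then touches a single contiguous 64-byte block, and only the jump to the next sub-tree may require loading another block.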
LMHT: Architecture
• N threads (up to the number of CPU hardware threads)
• Thread 1: data load/parser
• N-1 workers for L learners
• Common instance buffer (lockless ring buffer)
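One way to picture the thread layout: one thread is reserved for parsing and the L learners are spread over the remaining N-1 workers. The slide does not say how learners are mapped to workers, so the round-robin policy and the name `assign_learners` below are assumptions for illustration only.

```python
def assign_learners(n_threads, n_learners):
    """Map learner ids to worker ids, reserving one thread for the parser."""
    n_workers = n_threads - 1
    if n_workers < 1:
        raise ValueError("need one parser thread and at least one worker")
    assignment = {w: [] for w in range(n_workers)}
    for learner in range(n_learners):
        # Round-robin: learner i goes to worker i mod (N-1).
        assignment[learner % n_workers].append(learner)
    return assignment

# 4 threads -> 1 parser + 3 workers sharing 7 learners
print(assign_learners(4, 7))
```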
LMHT: Achieving low latency
• Lockless data structures: at least one thread always makes progress
• Key for scaling with low latency
• Lockless ring buffer:
• Single writer principle:
• Only the owner can write to it
• Everyone can read it
LMHT: Achieving low latency
• Lockless ring buffer:
• Buffer Head:
• Signals new instance on the buffer
• Owned by the parser
• Buffer Tail:
• Each worker owns its LastProcessed sequence number
• Tail position: lowest LastProcessed among all workers
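The head/tail bookkeeping described above can be sketched as follows. This is a single-threaded illustration of the sequence-number logic only: Python does not model the atomic loads and stores or the memory ordering that a real lockless buffer needs, and the class and method names are hypothetical. What it does show is the single writer principle: only the parser writes `head`, and each worker writes only its own `last_processed` entry.

```python
class RingBuffer:
    """Single-producer, multi-consumer ring buffer bookkeeping.

    Sequence numbers grow forever; the slot index is sequence % capacity.
    """

    def __init__(self, capacity, n_workers):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = -1                          # last published sequence (parser-owned)
        self.last_processed = [-1] * n_workers  # one entry per worker (worker-owned)

    def tail(self):
        """Lowest LastProcessed among all workers."""
        return min(self.last_processed)

    def try_publish(self, instance):
        """Parser side: False when the slowest worker still needs the slot."""
        next_seq = self.head + 1
        if next_seq - self.tail() > self.capacity:
            return False
        self.slots[next_seq % self.capacity] = instance
        self.head = next_seq  # advance head only after the slot is written
        return True

    def try_consume(self, worker):
        """Worker side: next unseen instance, or None when caught up."""
        next_seq = self.last_processed[worker] + 1
        if next_seq > self.head:
            return None
        instance = self.slots[next_seq % self.capacity]
        self.last_processed[worker] = next_seq
        return instance
```

Every worker reads every instance, so a slot can be reused only after the slowest worker has moved past it, which is why the tail is the minimum over all workers.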
Evaluation
• Reference implementations
• MOA (Java)
• StreamDM (C++)
• Datasets:
• Two real-world: Covertype and Electricity
• 11 Synthetic generators:
• RandomRBF Drift: r1-r6
• Hyperplane: h1, h2
• LED Drift: l1
• SEA: s1, s2
• Hardware Platforms:
• Intel platform:
• Desktop class: i7-5930K
• Server class: Xeon Platinum 8160
• ARM-based SoC:
• High-end: Nvidia Jetson TX1, Applied Micro X-Gene2
• Low-end: Raspberry Pi3
Single Hoeffding Tree Performance vs MOA
• Single thread
• Same accuracy as MOA
• Average throughput 525.65 instances per millisecond (ms)
• 6.73x faster than MOA
• 7x faster than StreamDM
• Including instance loading/parsing time
• Except on the RPI3: all instances already parsed in memory
• Data parsing is currently a bottleneck
• Using data from memory: 3x faster on the Intel i7
Single Hoeffding Tree: Speedup Over MOA
[Figure: speedup over MOA per dataset (cov, elec, h1, h2, l1, r1 to r6, s1, s2). Series: StreamDM (Intel i7), LMHT (Intel i7), LMHT (Jetson TX1), LMHT (X-Gene2), LMHT (Rpi3)]
LMHT Throughput vs MOA
• 85x faster than MOA on average (Intel i7-5930K):
• MOA average: 1.30 instances per millisecond (ms)
• LMHT average: 105 instances per millisecond (ms)
[Figure: throughput in instances/ms per dataset (covtype, elec, h1, h2, l1, r1 to r6, s1, s2) for MOA and LMHT; bar labels as extracted: 130, 123]
LMHT Throughput vs MOA
• Average throughput on different hardware platforms
[Figure: average throughput (instances/ms) for MOA (i7), LMHT (i7), LMHT (Xeon), LMHT (Jetson TX1) and LMHT (X-Gene2); bar labels as extracted: 2, 13, 29, 105, 109]
LMHT Speedup vs MOA
[Figure: speedup over MOA for LMHT on Intel i7, Xeon, Jetson TX1 and X-Gene2; bar labels as extracted: 11, 24, 85, 88]
LMHT Scalability Test: i7-5930K
• i7-5930K: 6 cores / 12 threads
• Threads start competing for resources when using 6+ workers
(7+ threads)
[Figure: relative speedup (axis 1 to 7) versus number of worker threads (1 to 11), one series per dataset: cov, elec, h1, h2, l1, r1 to r6, s1, s2]
LMHT Scalability Test: Xeon Platinum 8160
• Up to 70% parallel efficiency on Xeon 8160 (24 cores)
[Figure: relative speedup (axis 1 to 16) versus number of worker threads (1 to 23), one series per dataset: cov, elec, h1, h2, l1, r1 to r6, s1, s2]
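Parallel efficiency is conventionally the relative speedup divided by the number of parallel workers. The slide does not state whether the authors divide by worker threads or by cores, so the sketch below assumes worker threads; the function names are hypothetical.

```python
def parallel_efficiency(relative_speedup, n_workers):
    """Fraction of ideal linear speedup actually achieved."""
    return relative_speedup / n_workers

def speedup_at_efficiency(efficiency, n_workers):
    """Relative speedup implied by a given efficiency."""
    return efficiency * n_workers

# 70% efficiency with 23 workers (24 cores minus the parser thread)
print(round(speedup_at_efficiency(0.70, 23), 1))
```

Under this assumption, 70% efficiency with 23 workers corresponds to a relative speedup of about 16, which is consistent with the 1 to 16 range of the RelativeSpeedup axis.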
Conclusions
• We presented a high-performance, scalable design for real-time
data stream classification
• Very low latency: a few microseconds (µs) per instance
• Same accuracy as MOA
• Highly adaptive to a variety of hardware platforms
• From server to edge computing (ARM and Intel)
• Up to 70% parallel efficiency on a 24-core CPU
• On Intel Platforms, on average:
• Single HT: 6.73x faster than MOA
• Multithreaded Ensemble: 85x faster than MOA
• On ARM-based SoCs, on average:
• Single HT: 2x faster than MOA (i7)
• Raspberry Pi3 (ARM): performance similar to MOA (i7)
• Multithreaded ensemble: 24x faster than MOA (i7)
Future Work
• Parser thread can easily limit throughput
• Find the appropriate ratio of learners per parser
• Implement counters for all kinds of attributes
• Scaling to multi-socket nodes (NUMA architectures)
• Distribute ensemble across several nodes
Thank you
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 

Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams

  • 1. Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams
      Diego Marrón (dmarron@ac.upc.edu)
      Eduard Ayguadé (eduard.ayguade@bsc.es)
      José R. Herrero (josepr@ac.upc.edu)
      Jesse Read (jesse.read@polytechnique.edu)
      Albert Bifet (albert.bifet@telecom-paristech.fr)
      2017 IEEE International Conference on Big Data, December 11-14, 2017, Boston, MA, USA
  • 2. Real-time mining of dynamic data streams
      • Unprecedented amount of dynamic big data streams (Volume)
      • Data is generated at a high rate (Velocity)
      • Newly created data rapidly supersedes old data (Volatility)
      • This increase in volume, velocity and volatility requires data to be processed on the fly, in real time
  • 3. Real-time classification of dynamic data streams
      • Real-time classification imposes some challenges:
          • Deal with potentially infinite streams
          • Single pass over each instance
          • React to changes in the stream (concept drift)
          • Bounded response time:
              • Low latency: milliseconds (ms) per instance
              • High latency: a few seconds (s) per instance
          • Limited CPU time to process each instance
      • Preferred methods:
          • Hoeffding Tree (HT)
          • Ensemble of HTs
  • 4. Hoeffding Tree
      • Decision tree suitable for large data streams
      • Easy to deploy
      • Usually able to keep up with the arrival rate
  • 5. Hoeffding Tree: Basics
      • Build the tree structure incrementally (on the fly)
      • The tree structure uses attributes to route an instance to a leaf node
      • Leaf node:
          • Contains the classifier (Naive Bayes)
          • Holds the statistics used to decide the next attribute to split on (attribute counters)
      • Split decision: needs the per-attribute information gain
      • Naive Bayes classifier: calculates each attribute probability
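The split decision on this slide can be illustrated with the textbook Hoeffding tree rule: compute the information gain of each attribute from the leaf counters and split once the gap between the two best attributes exceeds the Hoeffding bound. The sketch below assumes that textbook rule; the counter layout, the function names and the default `delta` are illustrative assumptions, not details taken from the slides.

```python
import math

def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(counters):
    """counters[v][k] = instances seen at the leaf with attribute value v and class k."""
    class_totals = [sum(column) for column in zip(*counters)]
    n = sum(class_totals)
    if n == 0:
        return 0.0
    # Weighted entropy of the class distribution after splitting on this attribute.
    remainder = sum(sum(row) / n * entropy(row) for row in counters)
    return entropy(class_totals) - remainder

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(leaf_counters, num_classes, delta=1e-7):
    """leaf_counters maps a (hypothetical) attribute name to its counters table.
    delta is an assumed, illustrative confidence parameter."""
    gains = sorted((information_gain(t) for t in leaf_counters.values()), reverse=True)
    if len(gains) < 2:
        return False
    any_table = next(iter(leaf_counters.values()))
    n = sum(sum(row) for row in any_table)  # instances seen at this leaf
    if n == 0:
        return False
    # Information gain is bounded by log2(number of classes).
    epsilon = hoeffding_bound(math.log2(num_classes), delta, n)
    return gains[0] - gains[1] > epsilon
```

With 1000 instances, an attribute that separates two classes perfectly (gain 1.0) against an uninformative one (gain 0.0) clears the bound easily, while the same counts scaled down to two instances do not.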
  • 6. Ensembles: Random Forest of Hoeffding Trees
      • Random Forest uses RandomHT (a variation of HT):
          • Split decision uses a random subset of attributes
      • For each RandomHT in the ensemble:
          • Input: sampling with repetition
          • Tree is reset if change (drift) is detected
      • Responses are combined to form the final prediction
      • Ensembles require more work for each instance
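The three ensemble ingredients on this slide can be sketched in a few lines. In a stream there is no fixed dataset to resample, so "sampling with repetition" is commonly approximated by giving each tree a Poisson-distributed weight for every incoming instance; that technique (online bagging), the square-root subset size and the majority vote are assumptions made for this sketch, not details confirmed by the slides. Drift detection and tree reset are not modeled.

```python
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def random_attribute_subset(num_attributes, rng):
    """Pick the attributes one tree may consider at a split.
    The sqrt(num_attributes) size is an assumed, common default."""
    size = max(1, int(math.sqrt(num_attributes)))
    return sorted(rng.sample(range(num_attributes), size))

def combine_votes(predictions):
    """Majority vote over the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
# How many times each of 5 hypothetical trees trains on the current instance.
weights = [poisson(1.0, rng) for _ in range(5)]
```

A weight of 0 means the tree skips the instance and a weight of 2 means it trains on it twice, which mimics drawing with replacement from a stream of unknown length.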
  • 7. Exploiting CPU Parallelism
      • Parallelism: more work in the same amount of time
          • Improves throughput per time unit
          • Or improves accuracy by allowing more CPU-intensive methods
      • Parallel features of modern CPUs: multithreading, SIMD instructions
  • 8. Contributions
      • Very low-latency response time
          • A few microseconds (µs) per instance
      • Compared to a state-of-the-art implementation, MOA:
          • Same accuracy
          • Single HT, on average:
              • Response time: 2 microseconds (µs) per instance
              • 6.73x faster
          • Multithreaded ensemble, on average:
              • Response time: 10 microseconds (µs) per instance
              • 85x faster
              • Up to 70% parallel efficiency on a 24-core CPU
      • Highly scalable/adaptive design tested on:
          • Intel platforms: i7, Xeon
          • ARM SoCs: from server range to low end (Raspberry Pi3)
  • 9. Hoeffding Tree
      • Our implementation uses a binary tree
      • Split into smaller sub-trees that fit in a CPU L1 cache line
          • L1 cache line: 64 bytes (8 x 64-bit pointers)
          • Max sub-tree height: 3
      • SIMD instructions to accelerate calculations:
          • Information gain
          • Naive Bayes classifier
      [Figure: binary tree partitioned into sub-trees, each mapped to one cache line]
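The sub-tree height limit follows from simple arithmetic: eight 64-bit pointers occupy 64 bytes, a common cache-line size, and a complete binary sub-tree with three levels has seven nodes. The sketch below assumes that height counts levels and that each node needs one pointer-sized slot; both are assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size
POINTER_BYTES = 8      # one 64-bit pointer

def nodes_in_complete_binary_tree(height):
    """Number of nodes in a complete binary tree with `height` levels."""
    return 2 ** height - 1

def max_subtree_height(slots):
    """Tallest complete binary sub-tree whose nodes fit in `slots` slots."""
    height = 0
    while nodes_in_complete_binary_tree(height + 1) <= slots:
        height += 1
    return height

slots = CACHE_LINE_BYTES // POINTER_BYTES
print(slots, max_subtree_height(slots))
```

Under these assumptions a fourth level would need 15 slots, which no longer fits in one line, so traversing a sub-tree touches a single cache line.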
  • 10. LMHT: Architecture
      • N threads (up to the number of CPU hardware threads)
          • Thread 1: data loader/parser
          • N-1 workers for L learners
      • Common instance buffer (lockless ring buffer)
  • 11. LMHT: Achieving low latency
      • Lockless data structures: at least one thread always makes progress
          • Key for scaling with low latency
      • Lockless ring buffer:
          • Single-writer principle:
              • Only the owner can write to it
              • Everyone can read it
  • 12. LMHT: Achieving low latency
      • Lockless ring buffer:
          • Buffer Head:
              • Signals a new instance in the buffer
              • Owned by the parser
          • Buffer Tail:
              • Each worker owns its LastProcessed sequence number
              • Buffer Tail: lowest LastProcessed among all workers
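The head/tail bookkeeping described on these two slides can be modeled in a few lines. The sketch below is a single-threaded Python model of the sequence-number logic only; a real lockless buffer would additionally need atomic loads and stores with suitable memory ordering. The class and method names are hypothetical, and `last_processed` is interpreted here as the count of instances a worker has consumed.

```python
class RingBuffer:
    """Single-producer, multi-reader ring buffer where every worker
    sees every instance (model of the bookkeeping, not thread-safe)."""

    def __init__(self, capacity, num_workers):
        self.capacity = capacity
        self.slots = [None] * capacity
        # Head: written only by the parser, read by everyone.
        self.head = 0
        # One counter per worker: written only by its owner, read by everyone.
        self.last_processed = [0] * num_workers

    def tail(self):
        """The slowest worker defines the tail; slots behind it can be reused."""
        return min(self.last_processed)

    def try_publish(self, instance):
        """Parser side: store an instance, then advance the head to signal it."""
        if self.head - self.tail() >= self.capacity:
            return False  # full: the slowest worker has not caught up yet
        self.slots[self.head % self.capacity] = instance
        self.head += 1
        return True

    def try_consume(self, worker_id):
        """Worker side: read the next unseen instance, or None if caught up."""
        seq = self.last_processed[worker_id]
        if seq >= self.head:
            return None
        instance = self.slots[seq % self.capacity]
        self.last_processed[worker_id] = seq + 1
        return instance


buf = RingBuffer(capacity=4, num_workers=2)
buf.try_publish({"x": [1.0, 2.0], "y": 0})
print(buf.try_consume(0), buf.try_consume(1), buf.tail())
```

Because each variable has exactly one writer, no thread ever waits on a lock held by another: the parser only checks the tail, and each worker only checks the head.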
  • 13. Evaluation
      • Reference implementations:
          • MOA (Java)
          • StreamDM (C++)
      • Datasets:
          • Two real-world: Covertype and Electricity
          • 11 synthetic generators:
              • RandomRBF Drift: r1-r6
              • Hyperplane: h1, h2
              • LED Drift: l1
              • SEA: s1, s2
      • Hardware platforms:
          • Intel platform:
              • Desktop class: i7-5930K
              • Server class: Xeon Platinum 8160
          • ARM-based SoC:
              • High-end: Nvidia Jetson TX1, Applied Micro X-Gene2
              • Low-end: Raspberry Pi3
  • 14. Single Hoeffding Tree Performance vs MOA
      • Single thread
      • Same accuracy as MOA
      • Average throughput: 525.65 instances per millisecond (ms)
          • 6.73x faster than MOA
          • 7x faster than StreamDM
      • Including instance loading/parsing time
          • Except on the RPi3: all instances already parsed in memory
      • Data parsing is currently a bottleneck
          • Using data from memory: 3x faster on the Intel i7
  • 15. Single Hoeffding Tree: Speedup Over MOA
      [Figure: speedup over MOA per dataset (cov, elec, h1, h2, l1, r1-r6, s1, s2). Series: StreamDM (Intel i7), LMHT (Intel i7), LMHT (Jetson TX1), LMHT (X-Gene2), LMHT (RPi3). Y axis: Speedup]
  • 16. LMHT Throughput vs MOA
      • 85x faster than MOA on average (Intel i7-5930K):
          • MOA average: 1.30 instances per millisecond (ms)
          • LMHT average: 105 instances per millisecond (ms)
      [Figure: throughput per dataset (covtype, elec, h1, h2, l1, r1-r6, s1, s2). Series: MOA, LMHT. Y axis: Instances/ms; two off-scale bars are labeled 130 and 123]
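The throughput figures in the evaluation slides and the per-instance response times quoted in the contributions are two views of the same measurement. A minimal conversion, assuming the response time is simply the reciprocal of the average throughput (the slides do not state how the averages were taken):

```python
def latency_us(instances_per_ms):
    """Average time per instance in microseconds, given instances per millisecond."""
    return 1000.0 / instances_per_ms

single_tree = latency_us(525.65)  # single Hoeffding tree average throughput
ensemble = latency_us(105)        # multithreaded ensemble average on the i7-5930K
print(f"{single_tree:.2f} us, {ensemble:.2f} us")
```

The results, about 1.9 µs and 9.5 µs, line up with the roughly 2 µs and 10 µs per instance reported for the single tree and the ensemble.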
  • 17. LMHT Throughput vs MOA
      • Average throughput on different hardware platforms
      [Figure: average throughput (instances/ms) for MOA (i7), LMHT (i7), LMHT (Xeon), LMHT (Jetson TX1), LMHT (X-Gene2). Bar labels: 2, 13, 29, 105, 109]
  • 18. LMHT Speedup vs MOA
      [Figure: speedup over MOA for LMHT on Intel i7, Xeon, Jetson TX1 and X-Gene2. Bar labels: 11, 24, 85, 88]
  • 19. LMHT Scalability Test: i7-5930K
      • i7-5930K: 6 cores / 12 threads
      • Threads start competing for resources when using 6+ workers (7+ threads)
      [Figure: relative speedup vs number of worker threads (1-11), one line per dataset (cov, elec, h1, h2, l1, r1-r6, s1, s2)]
  • 20. LMHT Scalability Test: Xeon Platinum 8160
      • Up to 70% parallel efficiency on Xeon 8160 (24 cores)
      [Figure: relative speedup vs number of worker threads (1-23), one line per dataset (cov, elec, h1, h2, l1, r1-r6, s1, s2)]
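Parallel efficiency is conventionally the achieved speedup divided by the number of processing units used. The sketch below assumes that conventional definition; the slides do not say whether they divide by cores or by worker threads, so the divisor of 24 used here is an assumption.

```python
def parallel_efficiency(speedup, units):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / units

def speedup_for_efficiency(efficiency, units):
    """Speedup implied by a given efficiency on `units` processing units."""
    return efficiency * units

# 70% efficiency on 24 cores corresponds to a relative speedup of about 16.8.
print(round(speedup_for_efficiency(0.70, 24), 1))
```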
  • 21. Conclusions
      • We presented a high-performance, scalable design for real-time classification of data streams
          • Very low latency: a few microseconds (µs) per instance
          • Same accuracy as MOA
      • Highly adaptive to a variety of hardware platforms
          • From server to edge computing (ARM and Intel)
          • Up to 70% parallel efficiency on a 24-core CPU
      • On Intel platforms, on average:
          • Single HT: 6.73x faster than MOA
          • Multithreaded ensemble: 85x faster than MOA
      • On ARM-based SoCs, on average:
          • Single HT: 2x faster than MOA (i7)
              • A Raspberry Pi3 (ARM) performs similarly to MOA (i7)
          • Multithreaded ensemble: 24x faster than MOA (i7)
  • 22. Future Work
      • The parser thread can easily limit throughput
          • Find the appropriate ratio of learners per parser
      • Implement counters for all kinds of attributes
      • Scaling to multi-socket nodes (NUMA architectures)
      • Distribute the ensemble across several nodes
  • 24. Low-latency Multi-threaded Ensemble Learning for Dynamic Big Data Streams
      Diego Marrón (dmarron@ac.upc.edu)
      Eduard Ayguadé (eduard.ayguade@bsc.es)
      José R. Herrero (josepr@ac.upc.edu)
      Jesse Read (jesse.read@polytechnique.edu)
      Albert Bifet (albert.bifet@telecom-paristech.fr)
      2017 IEEE International Conference on Big Data, December 11-14, 2017, Boston, MA, USA