SlideShare a Scribd company logo
1 of 25
Download to read offline
MOA: Massive Online Analysis, a Framework for Stream
Classification and Clustering
Albert Bifet, Geoff Holmes, Bernhard Pfahringer,
Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl
University of Waikato
Hamilton, New Zealand
Data Management and Data Exploration Group
RWTH Aachen University, Germany
Cumberland Lodge, 2 September 2010
Workshop on Applications of Pattern Analysis 2010
Mining Massive Data
2007
Digital Universe: 281 exabytes (billion gigabytes)
The amount of information created exceeded available
storage for the first time
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
Twitter
106 million registered users
3 billion requests a day via its API.
2 / 21
Efficient Algorithms
Evolving Data Streams
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Stream Mining Algorithms
Fast methods without storing all dataset in memory
Traditional methods don’t deal with restrictions
3 / 21
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
Easy to extend
Easy to design and run experiments
4 / 21
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms
and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
5 / 21
WEKA: the bird
6 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
Data stream learning cycle
1 Process an example at a time,
and inspect it only once (at
most)
2 Use a limited amount of
memory
3 Work in a limited amount of
time
4 Be ready to predict at any
point
8 / 21
Classification Experimental setting
9 / 21
Classification Experimental setting
Evaluation procedures for Data
Streams
Holdout
Interleaved Test-Then-Train or
Prequential
Environments
Sensor Network: 100Kb
Handheld Computer: 32 Mb
Server: 400 Mb
10 / 21
Classification Experimental setting
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Hyperplane
SEA Generator
STAGGER Generator
10 / 21
Classification Experimental setting
Classifiers
Naive Bayes
Decision stumps
Hoeffding Tree
Hoeffding Option Tree
Bagging and Boosting
ADWIN Bagging and
Leveraging Bagging
Prediction strategies
Majority class
Naive Bayes Leaves
Adaptive Hybrid
10 / 21
Clustering Experimental setting
11 / 21
Clustering Experimental setting
Internal measures External measures
Gamma Rand statistic
C Index Jaccard coefficient
Point-Biserial Folkes and Mallow Index
Log Likelihood Hubert Γ statistics
Dunn’s Index Minkowski score
Tau Purity
Tau A van Dongen criterion
Tau C V-measure
Somer’s Gamma Completeness
Ratio of Repetition Homogeneity
Modified Ratio of Repetition Variation of information
Adjusted Ratio of Clustering Mutual information
Fagan’s Index Class-based entropy
Deviation Index Cluster-based entropy
Z-Score Index Precision
D Index Recall
Silhouette coefficient F-measure
Table: Internal and external clustering evaluation measures.
12 / 21
Clustering Experimental setting
Clusterers
StreamKM++
CluStream
ClusTree
Den-Stream
D-Stream
CobWeb
13 / 21
Web
http://www.moa.cs.waikato.ac.nz
14 / 21
GUI
java -cp .:moa.jar:weka.jar
-javaagent:sizeofag.jar moa.gui.GUI
15 / 21
Command Line
EvaluatePeriodicHeldOutTest
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar
moa.DoTask "EvaluatePeriodicHeldOutTest
-l DecisionStump -s generators.WaveformGenerator
-n 100000 -i 100000000 -f 1000000" > dsresult.csv
This command creates a comma separated values file:
training the DecisionStump classifier on the WaveformGenerator
data,
using the first 100 thousand examples for testing,
training on a total of 100 million examples, and
testing every one million examples:
16 / 21
Easy Design of a MOA classifier
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
double[] getVotesForInstance (Instance i)
17 / 21
Easy Design of a MOA clusterer
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
Clustering getClusteringResult()
18 / 21
Extensions of MOA
Multi-label Classification
Itemset Pattern Mining
Sequence Pattern Mining
19 / 21
Summary
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
http://www.moa.cs.waikato.ac.nz
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
MOA deals with evolving data streams
MOA is easy to use and extend
20 / 21
21 / 21

More Related Content

What's hot

Propositional logic
Propositional logicPropositional logic
Propositional logicMamta Pandey
 
Vendor Selection Case Study
Vendor Selection Case Study Vendor Selection Case Study
Vendor Selection Case Study Laura Arber, PMP
 
Satisfiability
SatisfiabilitySatisfiability
SatisfiabilityJim Kukula
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESsuthi
 
zkStudyClub - cqlin: Efficient linear operations on KZG commitments
zkStudyClub - cqlin: Efficient linear operations on KZG commitments zkStudyClub - cqlin: Efficient linear operations on KZG commitments
zkStudyClub - cqlin: Efficient linear operations on KZG commitments Alex Pruden
 
Max flow min cut
Max flow min cutMax flow min cut
Max flow min cutMayank Garg
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithmKamal Nayan
 
Formal Languages and Automata Theory unit 3
Formal Languages and Automata Theory unit 3Formal Languages and Automata Theory unit 3
Formal Languages and Automata Theory unit 3Srimatre K
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real Worldahmad bassiouny
 
Kruskal & Prim's Algorithm
Kruskal & Prim's AlgorithmKruskal & Prim's Algorithm
Kruskal & Prim's AlgorithmIfad Rahman
 

What's hot (17)

Propositional logic
Propositional logicPropositional logic
Propositional logic
 
Vendor Selection Case Study
Vendor Selection Case Study Vendor Selection Case Study
Vendor Selection Case Study
 
Backtracking
BacktrackingBacktracking
Backtracking
 
Satisfiability
SatisfiabilitySatisfiability
Satisfiability
 
String matching algorithms
String matching algorithmsString matching algorithms
String matching algorithms
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTES
 
zkStudyClub - cqlin: Efficient linear operations on KZG commitments
zkStudyClub - cqlin: Efficient linear operations on KZG commitments zkStudyClub - cqlin: Efficient linear operations on KZG commitments
zkStudyClub - cqlin: Efficient linear operations on KZG commitments
 
26 Computational Geometry
26 Computational Geometry26 Computational Geometry
26 Computational Geometry
 
Max flow min cut
Max flow min cutMax flow min cut
Max flow min cut
 
Alpha beta
Alpha betaAlpha beta
Alpha beta
 
Alpha beta pruning
Alpha beta pruningAlpha beta pruning
Alpha beta pruning
 
Network flows
Network flowsNetwork flows
Network flows
 
KMP Pattern Matching algorithm
KMP Pattern Matching algorithmKMP Pattern Matching algorithm
KMP Pattern Matching algorithm
 
Formal Languages and Automata Theory unit 3
Formal Languages and Automata Theory unit 3Formal Languages and Automata Theory unit 3
Formal Languages and Automata Theory unit 3
 
Kmp
KmpKmp
Kmp
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
 
Kruskal & Prim's Algorithm
Kruskal & Prim's AlgorithmKruskal & Prim's Algorithm
Kruskal & Prim's Algorithm
 

Viewers also liked

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersAlbert Bifet
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream MiningAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2OSri Ambati
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalseSAT Journals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Managementk_tauhid
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityJuho Kim
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesLighton Phiri
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013Doc Norton
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Daniele Dell'Aglio
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methodsweka Content
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimationjachno
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Beniamino Murgante
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An EpidemicDoc Norton
 

Viewers also liked (20)

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2O
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner Activity
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methods
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An Epidemic
 

Similar to Moa: Real Time Analytics for Data Streams

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Gilles Fedak
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDataWorks Summit
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor NetworksOscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...WSO2
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsSrinath Perera
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...Raffaele Montella
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance IssuesCloudera, Inc.
 

Similar to Moa: Real Time Analytics for Data Streams (20)

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance Issues
 

More from Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsAlbert Bifet
 

More from Albert Bifet (20)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
 

Recently uploaded

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Recently uploaded (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

Moa: Real Time Analytics for Data Streams

  • 1. MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl University of Waikato Hamilton, New Zealand Data Management and Data Exploration Group RWTH Aachen University, Germany Cumberland Lodge, 2 September 2010 Workshop on Applications of Pattern Analysis 2010
  • 2. Mining Massive Data 2007 Digital Universe: 281 exabytes (billion gigabytes) The amount of information created exceeded available storage for the first time Eric Schmidt, August 2010 Every two days now we create as much information as we did from the dawn of civilization up until 2003. 5 exabytes of data Twitter 106 million registered users 3 billion requests a day via its API. 2 / 21
  • 3. Efficient Algorithms Evolving Data Streams Extract information from potentially infinite sequence of data possibly varying over time using few resources Stream Mining Algorithms Fast methods without storing all dataset in memory Traditional methods don’t deal with restrictions 3 / 21
  • 4. What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering Easy to extend Easy to design and run experiments 4 / 21
  • 5. WEKA Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java Released under the GPL Support for the whole process of experimental data mining Preparation of input data Statistical evaluation of learning schemes Visualization of input data and the result of learning Used for education, research and applications Complements “Data Mining” by Witten & Frank 5 / 21
  • 7. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 8. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 9. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 10. Data stream learning cycle 1 Process an example at a time, and inspect it only once (at most) 2 Use a limited amount of memory 3 Work in a limited amount of time 4 Be ready to predict at any point 8 / 21
  • 12. Classification Experimental setting Evaluation procedures for Data Streams Holdout Interleaved Test-Then-Train or Prequential Environments Sensor Network: 100Kb Handheld Computer: 32 Mb Server: 400 Mb 10 / 21
  • 13. Classification Experimental setting Data Sources Random Tree Generator Random RBF Generator LED Generator Waveform Generator Hyperplane SEA Generator STAGGER Generator 10 / 21
  • 14. Classification Experimental setting Classifiers Naive Bayes Decision stumps Hoeffding Tree Hoeffding Option Tree Bagging and Boosting ADWIN Bagging and Leveraging Bagging Prediction strategies Majority class Naive Bayes Leaves Adaptive Hybrid 10 / 21
  • 16. Clustering Experimental setting Internal measures External measures Gamma Rand statistic C Index Jaccard coefficient Point-Biserial Folkes and Mallow Index Log Likelihood Hubert Γ statistics Dunn’s Index Minkowski score Tau Purity Tau A van Dongen criterion Tau C V-measure Somer’s Gamma Completeness Ratio of Repetition Homogeneity Modified Ratio of Repetition Variation of information Adjusted Ratio of Clustering Mutual information Fagan’s Index Class-based entropy Deviation Index Cluster-based entropy Z-Score Index Precision D Index Recall Silhouette coefficient F-measure Table: Internal and external clustering evaluation measures. 12 / 21
  • 20. Command Line EvaluatePeriodicHeldOutTest java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv This command creates a comma separated values file: training the DecisionStump classifier on the WaveformGenerator data, using the first 100 thousand examples for testing, training on a total of 100 million examples, and testing every one million examples: 16 / 21
  • 21. Easy Design of a MOA classifier void resetLearningImpl () void trainOnInstanceImpl (Instance inst) double[] getVotesForInstance (Instance i) 17 / 21
  • 22. Easy Design of a MOA clusterer void resetLearningImpl () void trainOnInstanceImpl (Instance inst) Clustering getClusteringResult() 18 / 21
  • 23. Extensions of MOA Multi-label Classification Itemset Pattern Mining Sequence Pattern Mining 19 / 21
  • 24. Summary {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. http://www.moa.cs.waikato.ac.nz It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering MOA deals with evolving data streams MOA is easy to use and extend 20 / 21