SlideShare a Scribd company logo
1 of 25
Download to read offline
MOA: Massive Online Analysis, a Framework for Stream
Classification and Clustering
Albert Bifet, Geoff Holmes, Bernhard Pfahringer,
Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl
University of Waikato
Hamilton, New Zealand
Data Management and Data Exploration Group
RWTH Aachen University, Germany
Cumberland Lodge, 2 September 2010
Workshop on Applications of Pattern Analysis 2010
Mining Massive Data
2007
Digital Universe: 281 exabytes (billion gigabytes)
The amount of information created exceeded available
storage for the first time
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
Twitter
106 million registered users
3 billion requests a day via its API.
2 / 21
Efficient Algorithms
Evolving Data Streams
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Stream Mining Algorithms
Fast methods without storing all dataset in memory
Traditional methods don’t deal with restrictions
3 / 21
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
Easy to extend
Easy to design and run experiments
4 / 21
WEKA
Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms
and data processing tools implemented in Java
Released under the GPL
Support for the whole process of experimental data mining
Preparation of input data
Statistical evaluation of learning schemes
Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank
5 / 21
WEKA: the bird
6 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
7 / 21
Data stream learning cycle
1 Process an example at a time,
and inspect it only once (at
most)
2 Use a limited amount of
memory
3 Work in a limited amount of
time
4 Be ready to predict at any
point
8 / 21
Classification Experimental setting
9 / 21
Classification Experimental setting
Evaluation procedures for Data
Streams
Holdout
Interleaved Test-Then-Train or
Prequential
Environments
Sensor Network: 100Kb
Handheld Computer: 32 Mb
Server: 400 Mb
10 / 21
Classification Experimental setting
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Hyperplane
SEA Generator
STAGGER Generator
10 / 21
Classification Experimental setting
Classifiers
Naive Bayes
Decision stumps
Hoeffding Tree
Hoeffding Option Tree
Bagging and Boosting
ADWIN Bagging and
Leveraging Bagging
Prediction strategies
Majority class
Naive Bayes Leaves
Adaptive Hybrid
10 / 21
Clustering Experimental setting
11 / 21
Clustering Experimental setting
Internal measures External measures
Gamma Rand statistic
C Index Jaccard coefficient
Point-Biserial Folkes and Mallow Index
Log Likelihood Hubert Γ statistics
Dunn’s Index Minkowski score
Tau Purity
Tau A van Dongen criterion
Tau C V-measure
Somer’s Gamma Completeness
Ratio of Repetition Homogeneity
Modified Ratio of Repetition Variation of information
Adjusted Ratio of Clustering Mutual information
Fagan’s Index Class-based entropy
Deviation Index Cluster-based entropy
Z-Score Index Precision
D Index Recall
Silhouette coefficient F-measure
Table: Internal and external clustering evaluation measures.
12 / 21
Clustering Experimental setting
Clusterers
StreamKM++
CluStream
ClusTree
Den-Stream
D-Stream
CobWeb
13 / 21
Web
http://www.moa.cs.waikato.ac.nz
14 / 21
GUI
java -cp .:moa.jar:weka.jar
-javaagent:sizeofag.jar moa.gui.GUI
15 / 21
Command Line
EvaluatePeriodicHeldOutTest
java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar
moa.DoTask "EvaluatePeriodicHeldOutTest
-l DecisionStump -s generators.WaveformGenerator
-n 100000 -i 100000000 -f 1000000" > dsresult.csv
This command creates a comma separated values file:
training the DecisionStump classifier on the WaveformGenerator
data,
using the first 100 thousand examples for testing,
training on a total of 100 million examples, and
testing every one million examples:
16 / 21
Easy Design of a MOA classifier
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
double[] getVotesForInstance (Instance i)
17 / 21
Easy Design of a MOA clusterer
void resetLearningImpl ()
void trainOnInstanceImpl (Instance inst)
Clustering getClusteringResult()
18 / 21
Extensions of MOA
Multi-label Classification
Itemset Pattern Mining
Sequence Pattern Mining
19 / 21
Summary
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
http://www.moa.cs.waikato.ac.nz
It is closely related to WEKA
It includes a collection of offline and online as well as tools
for evaluation:
classification
clustering
MOA deals with evolving data streams
MOA is easy to use and extend
20 / 21
21 / 21

More Related Content

What's hot

Computer Fundamentals & Intro to C Programming module i
Computer Fundamentals & Intro to C Programming module iComputer Fundamentals & Intro to C Programming module i
Computer Fundamentals & Intro to C Programming module iAjit Nayak
 
Operating System - Monitors (Presentation)
Operating System - Monitors (Presentation)Operating System - Monitors (Presentation)
Operating System - Monitors (Presentation)Experts Desk
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating SystemsRitu Ranjan Shrivastwa
 
Sensibiliser à la prévention des cyber-attaques
Sensibiliser à la prévention des cyber-attaquesSensibiliser à la prévention des cyber-attaques
Sensibiliser à la prévention des cyber-attaquesCap'Com
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computingRuben Tan
 
Critical Section Problem - Ramakrishna Reddy Bijjam
Critical Section Problem - Ramakrishna Reddy BijjamCritical Section Problem - Ramakrishna Reddy Bijjam
Critical Section Problem - Ramakrishna Reddy BijjamRamakrishna Reddy Bijjam
 
virtual memory management in multi processor mach os
virtual memory management in multi processor mach osvirtual memory management in multi processor mach os
virtual memory management in multi processor mach osAJAY KHARAT
 
OS UNIT 1 PPT.pptx
OS UNIT 1 PPT.pptxOS UNIT 1 PPT.pptx
OS UNIT 1 PPT.pptxPRABAVATHIH
 
Systems Analysis And Design 2
Systems Analysis And Design 2Systems Analysis And Design 2
Systems Analysis And Design 2MISY
 
Synchronization
SynchronizationSynchronization
SynchronizationSara shall
 
System Analysis and Design
System Analysis and DesignSystem Analysis and Design
System Analysis and DesignZakaria Hossain
 
Linux operating system ppt
Linux operating system pptLinux operating system ppt
Linux operating system pptAchyut Sinha
 
Introduction to System Calls
Introduction to System CallsIntroduction to System Calls
Introduction to System CallsVandana Salve
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit INANDINI SHARMA
 

What's hot (20)

Computer Fundamentals & Intro to C Programming module i
Computer Fundamentals & Intro to C Programming module iComputer Fundamentals & Intro to C Programming module i
Computer Fundamentals & Intro to C Programming module i
 
Operating System - Monitors (Presentation)
Operating System - Monitors (Presentation)Operating System - Monitors (Presentation)
Operating System - Monitors (Presentation)
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
 
SYNCHRONIZATION
SYNCHRONIZATIONSYNCHRONIZATION
SYNCHRONIZATION
 
Sensibiliser à la prévention des cyber-attaques
Sensibiliser à la prévention des cyber-attaquesSensibiliser à la prévention des cyber-attaques
Sensibiliser à la prévention des cyber-attaques
 
Consensus in distributed computing
Consensus in distributed computingConsensus in distributed computing
Consensus in distributed computing
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Critical Section Problem - Ramakrishna Reddy Bijjam
Critical Section Problem - Ramakrishna Reddy BijjamCritical Section Problem - Ramakrishna Reddy Bijjam
Critical Section Problem - Ramakrishna Reddy Bijjam
 
virtual memory management in multi processor mach os
virtual memory management in multi processor mach osvirtual memory management in multi processor mach os
virtual memory management in multi processor mach os
 
3 analysis and design overview
3 analysis and design overview3 analysis and design overview
3 analysis and design overview
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
 
OS UNIT 1 PPT.pptx
OS UNIT 1 PPT.pptxOS UNIT 1 PPT.pptx
OS UNIT 1 PPT.pptx
 
Critical section operating system
Critical section  operating systemCritical section  operating system
Critical section operating system
 
Systems Analysis And Design 2
Systems Analysis And Design 2Systems Analysis And Design 2
Systems Analysis And Design 2
 
Synchronization
SynchronizationSynchronization
Synchronization
 
System Analysis and Design
System Analysis and DesignSystem Analysis and Design
System Analysis and Design
 
Linux operating system ppt
Linux operating system pptLinux operating system ppt
Linux operating system ppt
 
Introduction to System Calls
Introduction to System CallsIntroduction to System Calls
Introduction to System Calls
 
Course outline of parallel and distributed computing
Course outline of parallel and distributed computingCourse outline of parallel and distributed computing
Course outline of parallel and distributed computing
 
Distributed system notes unit I
Distributed system notes unit IDistributed system notes unit I
Distributed system notes unit I
 

Viewers also liked

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersAlbert Bifet
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream MiningAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2OSri Ambati
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalseSAT Journals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Managementk_tauhid
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityJuho Kim
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesLighton Phiri
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013Doc Norton
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Daniele Dell'Aglio
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methodsweka Content
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimationjachno
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Beniamino Murgante
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An EpidemicDoc Norton
 

Viewers also liked (20)

Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Machine Learning at Progressive with H2O
Machine Learning at Progressive with H2OMachine Learning at Progressive with H2O
Machine Learning at Progressive with H2O
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
Learnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner ActivityLearnersourcing: Improving Learning with Collective Learner Activity
Learnersourcing: Improving Learning with Collective Learner Activity
 
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned ActivitiesPh.D. Research Update: Year#4 Annual Progress and Planned Activities
Ph.D. Research Update: Year#4 Annual Progress and Planned Activities
 
The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013The Technical Debt Trap - AgileIndy 2013
The Technical Debt Trap - AgileIndy 2013
 
Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016Summary of the Stream Reasoning workshop at ISWC 2016
Summary of the Stream Reasoning workshop at ISWC 2016
 
WEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic MethodsWEKA:Algorithms The Basic Methods
WEKA:Algorithms The Basic Methods
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Nonparametric Density Estimation
Nonparametric Density EstimationNonparametric Density Estimation
Nonparametric Density Estimation
 
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
Network Based Kernel Density Estimation for Cycling Facilities Optimal Locati...
 
Let's Start An Epidemic
Let's Start An EpidemicLet's Start An Epidemic
Let's Start An Epidemic
 

Similar to Moa: Real Time Analytics for Data Streams

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Gilles Fedak
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDataWorks Summit
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor NetworksOscar Corcho
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...WSO2
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsSrinath Perera
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...Raffaele Montella
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance IssuesCloudera, Inc.
 

Similar to Moa: Real Time Analytics for Data Streams (20)

Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
Active Data: Managing Data-Life Cycle on Heterogeneous Systems and Infrastruc...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Semantics in Sensor Networks
Semantics in Sensor NetworksSemantics in Sensor Networks
Semantics in Sensor Networks
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Hw09 Fingerpointing Sourcing Performance Issues
Hw09   Fingerpointing  Sourcing Performance IssuesHw09   Fingerpointing  Sourcing Performance Issues
Hw09 Fingerpointing Sourcing Performance Issues
 

More from Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data ScienceAlbert Bifet
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsAlbert Bifet
 

More from Albert Bifet (20)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data StreamsMining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Moa: Real Time Analytics for Data Streams

  • 1. MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen and Thomas Seidl University of Waikato Hamilton, New Zealand Data Management and Data Exploration Group RWTH Aachen University, Germany Cumberland Lodge, 2 September 2010 Workshop on Applications of Pattern Analysis 2010
  • 2. Mining Massive Data 2007 Digital Universe: 281 exabytes (billion gigabytes) The amount of information created exceeded available storage for the first time Eric Schmidt, August 2010 Every two days now we create as much information as we did from the dawn of civilization up until 2003. 5 exabytes of data Twitter 106 million registered users 3 billion requests a day via its API. 2 / 21
  • 3. Efficient Algorithms Evolving Data Streams Extract information from potentially infinite sequence of data possibly varying over time using few resources Stream Mining Algorithms Fast methods without storing all dataset in memory Traditional methods don’t deal with restrictions 3 / 21
  • 4. What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering Easy to extend Easy to design and run experiments 4 / 21
  • 5. WEKA Waikato Environment for Knowledge Analysis Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java Released under the GPL Support for the whole process of experimental data mining Preparation of input data Statistical evaluation of learning schemes Visualization of input data and the result of learning Used for education, research and applications Complements “Data Mining” by Witten & Frank 5 / 21
  • 7. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 8. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 9. MOA: the bird The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct. 7 / 21
  • 10. Data stream learning cycle 1 Process an example at a time, and inspect it only once (at most) 2 Use a limited amount of memory 3 Work in a limited amount of time 4 Be ready to predict at any point 8 / 21
  • 12. Classification Experimental setting Evaluation procedures for Data Streams Holdout Interleaved Test-Then-Train or Prequential Environments Sensor Network: 100Kb Handheld Computer: 32 Mb Server: 400 Mb 10 / 21
  • 13. Classification Experimental setting Data Sources Random Tree Generator Random RBF Generator LED Generator Waveform Generator Hyperplane SEA Generator STAGGER Generator 10 / 21
  • 14. Classification Experimental setting Classifiers Naive Bayes Decision stumps Hoeffding Tree Hoeffding Option Tree Bagging and Boosting ADWIN Bagging and Leveraging Bagging Prediction strategies Majority class Naive Bayes Leaves Adaptive Hybrid 10 / 21
  • 16. Clustering Experimental setting Internal measures External measures Gamma Rand statistic C Index Jaccard coefficient Point-Biserial Folkes and Mallow Index Log Likelihood Hubert Γ statistics Dunn’s Index Minkowski score Tau Purity Tau A van Dongen criterion Tau C V-measure Somer’s Gamma Completeness Ratio of Repetition Homogeneity Modified Ratio of Repetition Variation of information Adjusted Ratio of Clustering Mutual information Fagan’s Index Class-based entropy Deviation Index Cluster-based entropy Z-Score Index Precision D Index Recall Silhouette coefficient F-measure Table: Internal and external clustering evaluation measures. 12 / 21
  • 20. Command Line EvaluatePeriodicHeldOutTest java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv This command creates a comma separated values file: training the DecisionStump classifier on the WaveformGenerator data, using the first 100 thousand examples for testing, training on a total of 100 million examples, and testing every one million examples: 16 / 21
  • 21. Easy Design of a MOA classifier void resetLearningImpl () void trainOnInstanceImpl (Instance inst) double[] getVotesForInstance (Instance i) 17 / 21
  • 22. Easy Design of a MOA clusterer void resetLearningImpl () void trainOnInstanceImpl (Instance inst) Clustering getClusteringResult() 18 / 21
  • 23. Extensions of MOA Multi-label Classification Itemset Pattern Mining Sequence Pattern Mining 19 / 21
  • 24. Summary {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. http://www.moa.cs.waikato.ac.nz It is closely related to WEKA It includes a collection of offline and online as well as tools for evaluation: classification clustering MOA deals with evolving data streams MOA is easy to use and extend 20 / 21