                    Machine Learning with WEKA

                       Eibe Frank
                       Department of Computer Science,
                       University of Waikato, New Zealand

                    WEKA: A Machine Learning Toolkit
                       The Explorer
                              •   Classification and Regression
                              •   Clustering
                              •   Association Rules
                              •   Attribute Selection
                              •   Data Visualization
                       The Experimenter
                       The Knowledge Flow GUI
                       Conclusions




                    WEKA: the bird




                                        Copyright: Martin Kramer (mkramer@wxs.nl)
                   2/4/2004                             University of Waikato                             2




Machine Learning for Data Mining                                                                              1
                    WEKA: the software
                       Machine learning/data mining software written in
                       Java (distributed under the GNU General Public License)
                       Used for research, education, and applications
                       Complements “Data Mining” by Witten & Frank
                       Main features:
                              Comprehensive set of data pre-processing tools,
                              learning algorithms and evaluation methods
                              Graphical user interfaces (incl. data visualization)
                              Environment for comparing learning algorithms




                    WEKA: versions
                       There are several versions of WEKA:
                              WEKA 3.0: “book version” compatible with
                              description in data mining book
                              WEKA 3.2: “GUI version” adds graphical user
                              interfaces (book version is command-line only)
                              WEKA 3.3: “development version” with lots of
                              improvements
                       This talk is based on the latest snapshot of WEKA
                       3.3 (soon to be WEKA 3.4)

                    WEKA only deals with “flat” files
                    @relation heart-disease-simplified

                    @attribute age numeric
                    @attribute sex { female, male}
                    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
                    @attribute cholesterol numeric
                    @attribute exercise_induced_angina { no, yes}
                    @attribute class { present, not_present}

                    @data
                    63,male,typ_angina,233,no,not_present
                    67,male,asympt,286,yes,present
                    67,male,asympt,229,yes,present
                    38,female,non_anginal,?,no,not_present
                    ...
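To make the "flat file" idea concrete, here is a minimal Python sketch that reads the ARFF sample above into plain rows. The function name `parse_arff` is hypothetical and this is not WEKA's own parser: it assumes unquoted attribute names, comma-separated data rows, and only numeric or nominal attributes.

```python
# Minimal ARFF reader sketch (hypothetical helper, not WEKA's parser).
NUMERIC_TYPES = ("numeric", "real", "integer")

def parse_arff(text):
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):       # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":                   # "?" marks a missing value
                    row[name] = None
                elif kind in NUMERIC_TYPES:
                    row[name] = float(value)
                else:
                    row[name] = value              # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):               # nominal: list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

SAMPLE = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Each instance becomes one dictionary with one entry per attribute, which is all a "flat" file can express: a single table, with no nesting and no links between tables.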




                    Explorer: pre-processing the data
                       Data can be imported from a file in various
                       formats: ARFF, CSV, C4.5, binary
                       Data can also be read from a URL or from an SQL
                       database (using JDBC)
                       Pre-processing tools in WEKA are called “filters”
                       WEKA contains filters for:
                              Discretization, normalization, resampling, attribute
                              selection, transforming and combining attributes, …
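As an illustration of what two of these filters do, the following Python sketch normalizes and discretizes a numeric attribute. The function names are hypothetical and the code only mirrors the general idea (min-max scaling and equal-width binning); WEKA's filters offer more options and may behave differently in detail.

```python
def normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins):
    """Equal-width binning: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum falls exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Cholesterol values taken from the ARFF sample earlier in the talk.
cholesterol = [233, 286, 229]
scaled = normalize(cholesterol)
binned = discretize(cholesterol, bins=3)
```

Here the smallest value (229) maps to 0.0 and bin 0, and the largest (286) maps to 1.0 and the last bin.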






                    Explorer: building “classifiers”
                       Classifiers in WEKA are models for predicting
                       nominal or numeric quantities
                       Implemented learning schemes include:
                              Decision trees and lists, instance-based classifiers,
                              support vector machines, multi-layer perceptrons,
                              logistic regression, Bayes’ nets, …
                       “Meta”-classifiers include:
                              Bagging, boosting, stacking, error-correcting output
                              codes, locally weighted learning, …
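One of the simplest schemes in this list, an instance-based classifier, can be sketched in a few lines of Python. This is a plain 1-nearest-neighbour rule written for illustration, not WEKA's implementation; the training instances reuse age and cholesterol from the ARFF sample, and the query points are made up.

```python
import math

def nearest_neighbour(train, query):
    """Return the class label of the training instance closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)    # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# (age, cholesterol) -> class, taken from the complete rows of the sample.
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]
prediction = nearest_neighbour(train, (66, 280))
```

The attributes are used unscaled here, so cholesterol dominates the distance; in practice a normalization filter would usually be applied first.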





                    Explorer: clustering data
                       WEKA contains “clusterers” for finding groups of
                       similar instances in a dataset
                       Implemented schemes are:
                              k-Means, EM, Cobweb, X-means, FarthestFirst
                       Clusters can be visualized and compared to “true”
                       clusters (if given)
                       Evaluation based on log-likelihood if clustering
                       scheme produces a probability distribution
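To show what a clusterer such as k-Means does, here is a minimal Python sketch of the standard assign-and-update loop (Lloyd's algorithm). The data points and the starting centroids are made up for illustration, and the code is not taken from WEKA.

```python
def k_means(points, centroids, iterations=10):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)     # keep the centroid of an empty cluster
        if new_centroids == centroids:        # converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

With these two well-separated groups the loop converges after the second pass, with three points in each cluster.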


                    Explorer: finding associations
                       WEKA contains an implementation of the Apriori
                       algorithm for learning association rules
                              Works only with discrete data
                       Can identify statistical dependencies between
                       groups of attributes:
                              milk, butter ⇒ bread, eggs (with confidence 0.9 and
                              support 2000)
                       Apriori can compute all rules that have a given
                       minimum support and exceed a given confidence
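The two quantities attached to a rule can be computed directly from a list of transactions. The Python sketch below uses a tiny made-up basket dataset and treats support as an absolute count, as in the example above. It only evaluates one given rule; it does not implement Apriori's level-wise search for frequent item sets.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Made-up transactions for illustration only.
transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
rule_support = support(transactions, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
```

In this toy data the rule milk, butter ⇒ bread, eggs holds in two of the three transactions that contain milk and butter, so its support is 2 and its confidence is 2/3. The `confidence` function assumes the antecedent occurs at least once.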

                    Explorer: attribute selection
                       Panel that can be used to investigate which
                       (subsets of) attributes are the most predictive ones
                       Attribute selection methods contain two parts:
                              A search method: best-first, forward selection,
                              random, exhaustive, genetic algorithm, ranking
                              An evaluation method: correlation-based, wrapper,
                              information gain, chi-squared, …
                       Very flexible: WEKA allows (almost) arbitrary
                       combinations of these two
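The split into an evaluation method and a search method can be sketched as two small Python functions: one scores a single attribute by information gain, the other ranks attributes by that score. The function names are hypothetical, and the data are the four instances from the ARFF sample restricted to two nominal attributes; this is an illustration of the idea, not WEKA's code.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Evaluation method: class entropy minus expected entropy after the split."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def rank_attributes(columns, labels):
    """Search method (ranking): sort attributes by decreasing score."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

columns = {
    "sex": ["male", "male", "male", "female"],
    "exercise_induced_angina": ["no", "yes", "yes", "no"],
}
labels = ["not_present", "present", "present", "not_present"]
ranking = rank_attributes(columns, labels)
```

Because the two parts only communicate through a score, `information_gain` could be swapped for another evaluator without touching the ranking step, which is the flexibility the slide refers to.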





                    Explorer: data visualization
                       Visualization very useful in practice: e.g. helps to
                       determine difficulty of the learning problem
                       WEKA can visualize single attributes (1-d) and
                       pairs of attributes (2-d)
                              To do: rotating 3-d visualizations (XGobi-style)
                       Color-coded class values
                       “Jitter” option to deal with nominal attributes (and
                       to detect “hidden” data points)
                       “Zoom-in” function
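The effect of the "jitter" option can be imitated with a few lines of Python: a small random offset is added to each plotted position so that instances with identical values no longer sit exactly on top of each other. The function name, the noise range, and the seed are assumptions made for this sketch; WEKA's own jitter may be computed differently.

```python
import random

def jitter(points, amount, seed=0):
    """Add uniform noise in [-amount, amount] to both coordinates of each point."""
    rng = random.Random(seed)     # fixed seed so the plot is reproducible
    return [(x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
            for x, y in points]

# Three instances with the same pair of nominal values (encoded as indices).
points = [(1, 0), (1, 0), (1, 0)]
spread = jitter(points, amount=0.2)
```

Without jitter the three instances would be drawn as a single dot; after jittering they appear as three nearby dots.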




                    Performing experiments
                       Experimenter makes it easy to compare the
                       performance of different learning schemes
                       For classification and regression problems
                       Results can be written into file or database
                       Evaluation options: cross-validation, learning
                       curve, hold-out
                       Can also iterate over different parameter settings
                       Significance-testing built in!
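Two ingredients of such an experiment, cross-validation splits and a significance test on the per-fold results, can be sketched in Python as follows. The accuracies are made-up numbers, the function names are hypothetical, and the test shown is a plain paired t statistic; the test the Experimenter actually applies may differ.

```python
import math
from statistics import mean, stdev

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs with near-equal test folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences between two learning schemes."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

splits = list(k_fold_indices(10, 5))

# Made-up per-fold accuracies of two schemes evaluated on the same folds.
scheme_a = [0.80, 0.82, 0.78, 0.85, 0.81]
scheme_b = [0.75, 0.80, 0.77, 0.80, 0.79]
t = paired_t_statistic(scheme_a, scheme_b)
```

The resulting `t` would be compared against a t distribution with `len(scheme_a) - 1` degrees of freedom. The statistic is undefined when all per-fold differences are identical, since the standard deviation is then zero.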





                    The Knowledge Flow GUI
                       New graphical user interface for WEKA
                       Java-Beans-based interface for setting up and
                       running machine learning experiments
                       Data sources, classifiers, etc. are beans and can
                       be connected graphically
                       Data “flows” through components: e.g.,
                       “data source” -> “filter” -> “classifier” -> “evaluator”
                       Layouts can be saved and loaded again later
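The flow "data source" -> "filter" -> "classifier" -> "evaluator" can be mimicked in Python by chaining functions, each consuming the output of the previous one. All component names here are hypothetical stand-ins, the classifier is a trivial majority-class predictor, and it is evaluated on its own training data only to keep the sketch short.

```python
from collections import Counter

def data_source(rows):
    """Source component: emits instances one at a time."""
    yield from rows

def remove_missing(instances):
    """Filter component: drops instances that contain a missing value (None)."""
    return (inst for inst in instances if None not in inst)

def majority_classifier(instances):
    """Classifier component: predicts the most frequent class (last field)."""
    instances = list(instances)
    majority = Counter(inst[-1] for inst in instances).most_common(1)[0][0]
    return ((inst[-1], majority) for inst in instances)

def evaluator(predictions):
    """Evaluator component: fraction of (actual, predicted) pairs that agree."""
    pairs = list(predictions)
    return sum(actual == predicted for actual, predicted in pairs) / len(pairs)

# Instances (age, sex, cholesterol, class) from the ARFF sample; None = missing.
rows = [
    (63, "male", 233, "not_present"),
    (67, "male", 286, "present"),
    (67, "male", 229, "present"),
    (38, "female", None, "not_present"),
]
accuracy = evaluator(majority_classifier(remove_missing(data_source(rows))))
```

Replacing one stage, for example a different filter, leaves the other stages unchanged, which is the point of connecting components graphically.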

                    Conclusion: try it yourself!
                       WEKA is available at
                          http://www.cs.waikato.ac.nz/ml/weka
                       The site also has a list of projects based on WEKA
                       WEKA contributors:
                       Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard
                       Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger,
                       Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg,
                       Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert,
                       Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,
                       Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang

                   2/4/2004                         University of Waikato                  173




Machine Learning for Data Mining                                                                 47

More Related Content

Similar to Microsoft PowerPoint - weka [Read-Only]

Weka a tool_for_exploratory_data_mining
Weka a tool_for_exploratory_data_miningWeka a tool_for_exploratory_data_mining
Weka a tool_for_exploratory_data_mining
Tony Frame
 
Advancing the fourth paradigm of research: Assimilating repositories into act...
Advancing the fourth paradigm of research: Assimilating repositories into act...Advancing the fourth paradigm of research: Assimilating repositories into act...
Advancing the fourth paradigm of research: Assimilating repositories into act...
Tyler Walters
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
butest
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
butest
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
butest
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
butest
 
Weka presentation
Weka presentationWeka presentation
Weka presentation
Saeed Iqbal
 

Similar to Microsoft PowerPoint - weka [Read-Only] (20)

Weka a tool_for_exploratory_data_mining
Weka a tool_for_exploratory_data_miningWeka a tool_for_exploratory_data_mining
Weka a tool_for_exploratory_data_mining
 
Wekatutorial
WekatutorialWekatutorial
Wekatutorial
 
Wek1
Wek1Wek1
Wek1
 
weka-tutorial-all.ppt
weka-tutorial-all.pptweka-tutorial-all.ppt
weka-tutorial-all.ppt
 
Weka
WekaWeka
Weka
 
Advancing the fourth paradigm of research: Assimilating repositories into act...
Advancing the fourth paradigm of research: Assimilating repositories into act...Advancing the fourth paradigm of research: Assimilating repositories into act...
Advancing the fourth paradigm of research: Assimilating repositories into act...
 
A simple introduction to weka
A simple introduction to wekaA simple introduction to weka
A simple introduction to weka
 
Weka
WekaWeka
Weka
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Shraddha weka
Shraddha wekaShraddha weka
Shraddha weka
 
Shraddha weka
Shraddha wekaShraddha weka
Shraddha weka
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 
Weka-Presentation.ppt
Weka-Presentation.ppt Weka-Presentation.ppt
Weka-Presentation.ppt
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
 
LAK 2013: Open academic analytics initiative - Initial research findings
LAK 2013: Open academic analytics initiative - Initial research findingsLAK 2013: Open academic analytics initiative - Initial research findings
LAK 2013: Open academic analytics initiative - Initial research findings
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
Weka presentation
Weka presentationWeka presentation
Weka presentation
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Microsoft PowerPoint - weka [Read-Only]

  • 1. WEKA: A Machine Machine Learning with Learning Toolkit WEKA The Explorer • Classification and Regression • Clustering Eibe Frank • Association Rules • Attribute Selection Department of Computer Science, University of Waikato, New Zealand • Data Visualization The Experimenter The Knowledge Flow GUI Conclusions WEKA: the bird Copyright: Martin Kramer (mkramer@wxs.nl) 2/4/2004 University of Waikato 2 Machine Learning for Data Mining 1
  • 2. WEKA: the software Machine learning/data mining software written in Java (distributed under the GNU Public License) Used for research, education, and applications Complements “Data Mining” by Witten & Frank Main features: Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods Graphical user interfaces (incl. data visualization) Environment for comparing learning algorithms 2/4/2004 University of Waikato 3 WEKA: versions There are several versions of WEKA: WEKA 3.0: “book version” compatible with description in data mining book WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only) WEKA 3.3: “development version” with lots of improvements This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4) 2/4/2004 University of Waikato 4 Machine Learning for Data Mining 2
  • 3. WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 5 WEKA only deals with “flat” files @relation heart-disease-simplified @attribute age numeric @attribute sex { female, male} @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina} @attribute cholesterol numeric @attribute exercise_induced_angina { no, yes} @attribute class { present, not_present} @data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present ... 2/4/2004 University of Waikato 6 Machine Learning for Data Mining 3
  • 4. 2/4/2004 University of Waikato 7 2/4/2004 University of Waikato 8 Machine Learning for Data Mining 4
  • 5. 2/4/2004 University of Waikato 9 Explorer: pre-processing the data Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, … 2/4/2004 University of Waikato 10 Machine Learning for Data Mining 5
  • 6. 2/4/2004 University of Waikato 11 2/4/2004 University of Waikato 12 Machine Learning for Data Mining 6
  • 7. 2/4/2004 University of Waikato 13 2/4/2004 University of Waikato 14 Machine Learning for Data Mining 7
  • 8. 2/4/2004 University of Waikato 15 2/4/2004 University of Waikato 16 Machine Learning for Data Mining 8
  • 9. 2/4/2004 University of Waikato 17 2/4/2004 University of Waikato 18 Machine Learning for Data Mining 9
  • 10. 2/4/2004 University of Waikato 19 2/4/2004 University of Waikato 20 Machine Learning for Data Mining 10
  • 11. 2/4/2004 University of Waikato 21 2/4/2004 University of Waikato 22 Machine Learning for Data Mining 11
  • 12. 2/4/2004 University of Waikato 23 2/4/2004 University of Waikato 24 Machine Learning for Data Mining 12
  • 13. 2/4/2004 University of Waikato 25 2/4/2004 University of Waikato 26 Machine Learning for Data Mining 13
  • 14. 2/4/2004 University of Waikato 27 2/4/2004 University of Waikato 28 Machine Learning for Data Mining 14
  • 15. 2/4/2004 University of Waikato 29 2/4/2004 University of Waikato 30 Machine Learning for Data Mining 15
  • 16. 2/4/2004 University of Waikato 31 Explorer: building “classifiers” Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, … “Meta”-classifiers include: Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, … 2/4/2004 University of Waikato 32 Machine Learning for Data Mining 16
  • 20. Explorer: clustering data
      • WEKA contains “clusterers” for finding groups of similar instances in a dataset
      • Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
      • Clusters can be visualized and compared to “true” clusters (if given)
      • Evaluation based on log-likelihood if clustering scheme produces a probability distribution
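The k-Means scheme mentioned above can be sketched in a few lines of plain Python. This is a simplified version under stated assumptions: the initial centroids are passed in by the caller instead of being chosen at random, the loop runs a fixed number of iterations, and the names `k_means` and `squared_distance` are illustrative, not WEKA's.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points: List[Point], centroids: List[Point],
            iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: an empty cluster keeps its previous centroid.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, assignment = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignment)
```

If the true groups are known, the returned `assignment` list can be compared against them, which is the kind of comparison to “true” clusters the slide refers to.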
  • 21. Explorer: finding associations
      • WEKA contains an implementation of the Apriori algorithm for learning association rules
          • Works only with discrete data
      • Can identify statistical dependencies between groups of attributes:
          • milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
      • Apriori can compute all rules that have a given minimum support and exceed a given confidence
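The two measures attached to the example rule can be computed directly from a list of transactions. The sketch below is plain Python and covers only the measures, not the Apriori search itself; it assumes support is an absolute count of transactions (as the figure 2000 on the slide suggests) and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The function names and the toy transactions are hypothetical.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions: List[Set[str]],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    covered = support(transactions, antecedent)
    if covered == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / covered


# Hypothetical market-basket data.
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter"},
    {"bread"},
]
rule_support = support(baskets, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(baskets, {"milk", "butter"}, {"bread", "eggs"})
print(rule_support, rule_confidence)
```

A rule is reported only if its support reaches the minimum support and its confidence exceeds the chosen threshold, so both numbers act as filters on the candidate rules.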
  • 25. Explorer: attribute selection
      • Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
      • Attribute selection methods contain two parts:
          • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
          • An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
      • Very flexible: WEKA allows (almost) arbitrary combinations of these two
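One of the combinations described above, ranking as the search method with information gain as the evaluation method, can be sketched as follows. This is plain Python for nominal attributes only, not WEKA's implementation; the function names and the tiny dataset are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values: List[str], labels: List[str]) -> float:
    """Reduction in class entropy after splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns: Dict[str, List[str]],
                    labels: List[str]) -> List[str]:
    """Evaluate each attribute on its own, then sort best-first."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Hypothetical dataset: "signal" determines the class, "noise" does not.
labels = ["yes", "yes", "no", "no"]
columns = {
    "noise": ["a", "b", "a", "b"],
    "signal": ["x", "x", "y", "y"],
}
print(rank_attributes(columns, labels))
```

Keeping the evaluator (`information_gain`) separate from the search (`rank_attributes`) reflects the two-part design on the slide: either part could be swapped without touching the other.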
  • 29. Explorer: data visualization
      • Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
      • WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
          • To do: rotating 3-d visualizations (Xgobi-style)
      • Color-coded class values
      • “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
      • “Zoom-in” function
  • 36. Performing experiments
      • Experimenter makes it easy to compare the performance of different learning schemes
      • For classification and regression problems
      • Results can be written into file or database
      • Evaluation options: cross-validation, learning curve, hold-out
      • Can also iterate over different parameter settings
      • Significance-testing built in!
    The Knowledge Flow GUI
      • New graphical user interface for WEKA
      • Java-Beans-based interface for setting up and running machine learning experiments
      • Data sources, classifiers, etc. are beans and can be connected graphically
      • Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
      • Layouts can be saved and loaded again later
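Of the evaluation options listed under "Performing experiments", cross-validation is the easiest to show in code. The sketch below only produces the train/test index splits for k folds; it does no shuffling or stratification, and the name `cross_validation_folds` is an assumption rather than anything from WEKA.

```python
from typing import Iterator, List, Tuple


def cross_validation_folds(n_instances: int,
                           k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Instance i is placed in the test set of fold i % k, so every
    instance is tested exactly once across the k folds.
    """
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test


for train, test in cross_validation_folds(6, 3):
    print(train, test)
```

A learning scheme would be trained on each `train` split and scored on the matching `test` split, and the k scores averaged. Comparing two schemes on the same folds is what makes a significance test on the score differences meaningful.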
  • 47. Conclusion: try it yourself!
      • WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
          • Also has a list of projects based on WEKA
      • WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang