Data Mining Using WEKA



         Submitted to
    Prof. Prithwis Mukerjee


        Submitted By
       Shikha Jayaswal




        17th April, 2012
Table of Contents

Objective

WEKA

   Running WEKA

Loading Datasets

Linear Regression

   Model

   Interpreting the Output

Clustering

   Model

   Interpreting the Output

List of Figures:

Figure 1: Weka GUI Chooser

Figure 2: Weka Explorer

Figure 3: Load Dataset

Figure 4: Linear Regression

Figure 5: Clustering

Objective

Demonstrate the use of WEKA for the following data mining tasks:

    •   Linear Regression
    •   Clustering



WEKA

Weka is a data mining tool developed at the University of Waikato. It is released under the GNU
General Public License and is freely available. It is implemented in the Java programming language
and has a GUI for loading data, running analyses and producing visualizations.

The software can be downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/
The version used in this analysis is 3.6.6.


Running WEKA


Running Weka opens the Weka GUI Chooser:




Figure 1: Weka GUI Chooser




The Explorer button opens the Weka Explorer window, through which data can be loaded and then
analyzed.
Figure 2: Weka Explorer




Loading Datasets:

The file types supported are:

    •   ARFF data files
    •   C4.5 data files
    •   CSV data files
    •   LibSVM data files
    •   SVM Light data files
    •   Binary serialized data files
    •   XRFF data files


The data file being used for the study is:
Click “Open file...”, then select the file to be loaded and open it.




Figure 3: Load Dataset
Linear Regression
Model
Steps for creating the regression model:

   1. Click the Classify tab.
   2. Click the Choose button; in the window that opens, expand classifiers, then functions,
      and select LinearRegression.
   3. Click the LinearRegression text area to open the GenericObjectEditor pop-up. In the
      attributeSelectionMethod dropdown, select No Attribute Selection, then click OK.
   4. Select Use Training Set to evaluate the model on the loaded dataset.
   5. In the dropdown, select Price/Unit as the dependent variable and click the Start button.




   Figure 4: Linear Regression




Interpreting the Output


Price/Unit = -0.0012 * BTU/Hr + 0.5806 * Weight lbs + 3.7411 * EER + 0 * Unit volume
             -1.2524 * Region -2.1025 * Type + 24.8058
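
The fitted equation can be applied to a new record by substituting attribute values into it. The following Python sketch only evaluates the equation above; the function name predict_price and the sample record are illustrative assumptions and do not come from the Weka output or the dataset. The Unit volume coefficient is printed as 0 in the output, so it contributes nothing in this sketch.

```python
# Coefficients copied from the Weka LinearRegression output above.
COEFFICIENTS = {
    "BTU/Hr": -0.0012,
    "Weight lbs": 0.5806,
    "EER": 3.7411,
    "Unit volume": 0.0,  # printed as 0 in the output
    "Region": -1.2524,
    "Type": -2.1025,
}
INTERCEPT = 24.8058


def predict_price(record):
    """Return the predicted Price/Unit for a dict of attribute values."""
    return INTERCEPT + sum(
        coef * record[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical record, invented for illustration only.
sample = {
    "BTU/Hr": 500,
    "Weight lbs": 5.6,
    "EER": 4.0,
    "Unit volume": 100000,
    "Region": 3,
    "Type": 2,
}
print(round(predict_price(sample), 4))
```

Read this way, each coefficient is the change in predicted Price/Unit for a one-unit change in that attribute, with the others held fixed.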
Clustering
Model
Steps for creating the clustering model:

    1. Click the Cluster tab.
    2. Click the Choose button; in the window that opens, expand clusterers and select EM.
    3. Click the EM text area to open the GenericObjectEditor pop-up, fill in the cluster
       options listed below, and click OK.
            a. -V Verbose.
            b. -N The number of clusters to generate. If omitted, EM uses cross-validation to
                select the number of clusters automatically.
            c. -I Terminate after this many iterations if EM has not converged.
            d. -S Specify the random number seed.
            e. -M Set the minimum allowable standard deviation for the normal density calculation.
    4. Select Use Training Set to use the loaded dataset and click the Start button.




Figure 5: Clustering
Interpreting the Output


The clustered instances:

   Cluster      Instances
      0           7 (16%)
      1          14 (31%)
      2          10 (22%)
      3           3 (7%)
      4          11 (24%)
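
The percentage beside each count is that cluster's share of all clustered instances (45 in total). A minimal Python sketch recomputes the shares from the counts; the function name cluster_shares is an assumption made for illustration.

```python
# Instances per cluster (clusters 0 to 4), copied from the output above.
COUNTS = [7, 14, 10, 3, 11]


def cluster_shares(counts):
    """Return each cluster's share of all instances as a whole-number percentage."""
    total = sum(counts)
    return [round(100 * count / total) for count in counts]


for cluster, (count, share) in enumerate(zip(COUNTS, cluster_shares(COUNTS))):
    print(f"{cluster}: {count} ({share}%)")
```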


The attributes of the clusters are given below. The first row is the prior probability of each cluster.

Attribute      Statistic     Cluster 0   Cluster 1   Cluster 2   Cluster 3   Cluster 4
Cluster prior                     0.16         0.3         0.2        0.07        0.27
Price/Unit     mean            34.1022     32.5883     39.1963     38.0867     30.9768
               std. dev.        4.1176      1.2413      2.2264      1.0193      2.8369
BTU/Hr         mean           912.8122    499.9553    496.4343    856.6667    347.0964
               std. dev.      105.4301    159.6201    178.5667     57.9272    140.3392
Weight lbs.    mean            10.4966      5.6066      5.6444      9.5967      3.9301
               std. dev.        1.3785       1.848      2.0181      0.7312       1.559
EER            mean             3.3643      3.9673      4.9873      4.8533      4.4754
               std. dev.        0.2773      0.3885      0.3347      0.1586      0.3313
Unit Volume    mean           180985.9    129223.9    71417.94       74000    92473.04
               std. dev.      239037.4    135545.2    45108.85     44639.3    85150.53
Region         mean                  3      3.1226           4           5      4.8882
               std. dev.        0.8848      0.4794           0      0.8848       0.365
Type           mean             1.1427           2           2      1.3333           2
               std. dev.        0.3497      0.3866      0.3866      0.4714      0.3866
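
EM describes each numeric attribute within a cluster by a normal distribution with the listed mean and standard deviation, weighted by the cluster's prior probability (0.16, 0.3, 0.2, 0.07, 0.27 for clusters 0 to 4). As a rough illustration of how these numbers relate to cluster membership, the Python sketch below computes membership probabilities from the Price/Unit attribute alone. This is a simplifying assumption: Weka combines all attributes, so the result is not the membership Weka would report. The function names and the example price of 39.0 are hypothetical.

```python
import math

# Values copied from the table above (clusters 0 to 4).
PRIORS = [0.16, 0.30, 0.20, 0.07, 0.27]
PRICE_MEAN = [34.1022, 32.5883, 39.1963, 38.0867, 30.9768]
PRICE_STD = [4.1176, 1.2413, 2.2264, 1.0193, 2.8369]


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std. dev."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def price_posteriors(price):
    """Membership probabilities per cluster, using Price/Unit only."""
    weighted = [
        prior * normal_pdf(price, mean, std)
        for prior, mean, std in zip(PRIORS, PRICE_MEAN, PRICE_STD)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


posteriors = price_posteriors(39.0)  # hypothetical Price/Unit value
best = max(range(len(posteriors)), key=posteriors.__getitem__)
print(best, [round(p, 3) for p in posteriors])
```

On this single attribute, a price near 39 falls closest to clusters 2 and 3, whose Price/Unit means are the highest in the table.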
