VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR




 Data Mining using Weka
A Paper on Data Mining techniques using Weka
                  software



                        MBA 2010-2012


           IT FOR BUSINESS INTELLIGENCE – TERM PAPER

             INSTRUCTOR – PROF. PRITHWIS MUKERJEE




                                                         SUBMITTED BY
                                                       SATHISHWARAN.R
                                                            10BM60079
                                                         MBA 2010-2012



Table of Contents
  1. INTRODUCTION
  2. CLASSIFICATION
       2.1 DATA
       2.2 SCREENS
       2.3 OUTPUT
       2.4 INTERPRETATION
  3. ASSOCIATION RULES
       3.1 DATA
       3.2 SCREENS
       3.3 OUTPUT
       3.4 INTERPRETATION
  4. REFERENCES


1. INTRODUCTION

The widespread use of computers has made life easier for business executives. However, it has
also led to a proliferation of data, which makes it difficult to extract meaning from it, and the
sheer volume of data generated today complicates decision making. Data mining is one approach
that identifies patterns in large volumes of data and thereby supports decision making. Weka
(Waikato Environment for Knowledge Analysis) is free software developed at the University of
Waikato in New Zealand and is available under the GNU General Public License. The software can
be used for research, education and applications. It has a graphical user interface and a
comprehensive set of tools for analysing data. In this paper I apply two data mining techniques,
classification and association rule mining, using the Weka software.


2. CLASSIFICATION

2.1 Data

The raw data used for this analysis was obtained from the website http://tunedit.org/ and was
originally gathered from census data. The 14 original attributes (features) include age, work
class, education, education number, marital status, occupation, native country, etc. The data
contains continuous, binary and categorical features. I have used it for a two-class
classification problem: the task is to discover high-revenue people from the census data and to
check, by cross-validation, how accurately the instances are classified.

Link: http://tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff

2.2 Screens

Step 1: Launch Weka


Step 2: Click Explorer




Step 3: Click Open file


Step 4: The data is loaded into Weka




Step 5: On the Classify tab, choose Decision Table, select Cross-validation and click Start
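
To illustrate what the cross-validation option does, the following is a minimal sketch of a stratified 10-fold split in plain Python. It is not Weka's own procedure, which may shuffle and partition differently. The function name stratified_folds and the seed are hypothetical; the class counts (3118 instances of class -1 and 1029 of class 1) are the row totals of the confusion matrix in section 2.3.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign instance indices to k folds so each fold keeps similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomise the order within each class
        for index in indices:
            # deal the instances out round-robin across the folds
            folds[position % k].append(index)
            position += 1
    return folds


# Class counts taken from the confusion matrix rows (assumption: no other classes).
labels = [-1] * 3118 + [1] * 1029
folds = stratified_folds(labels)

for number, test_fold in enumerate(folds, start=1):
    positives = sum(1 for index in test_fold if labels[index] == 1)
    # Each fold serves once as the test set; the other nine form the training set.
    print(f"fold {number}: {len(test_fold)} test instances, {positives} of class 1")
```

Each fold is held out once for testing while the model is built on the remaining nine, and the results of the ten runs are pooled into a single summary.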


2.3 Output

Cross-validation

       === Run information ===

       Scheme: weka.classifiers.rules.DecisionTable -X 1 -S "weka.attributeSelection.BestFirst -D 1 -N 5"
       Relation: ADA_Prior
       Instances: 4147
       Attributes: 15
              age
              workclass
              fnlwgt
              education
              educationNum
              maritalStatus
              occupation
              relationship
              race
              sex
              capitalGain
              capitalLoss
              hoursPerWeek
              nativeCountry
              label
       Test mode:10-fold cross-validation

       === Classifier model (full training set) ===

       Decision Table:

       Number of training instances: 4147
       Number of Rules: 130
       Non matches covered by Majority class.
              Best first.
              Start set: no attributes
              Search direction: forward
              Stale search after 5 node expansions
              Total number of subsets evaluated: 96
              Merit of best subset found: 83.82
       Evaluation (for feature selection): CV (leave one out)
       Feature set: 5, 8,11,12,15

       Time taken to build model: 0.98 seconds

       === Stratified cross-validation ===
       === Summary ===

       Correctly Classified Instances     3461      83.4579 %
       Incorrectly Classified Instances    686      16.5421 %
       Kappa statistic              0.5073
       Mean absolute error              0.2353
       Root mean squared error             0.339
       Relative absolute error          63.0518 %
       Root relative squared error        78.4907 %
       Total Number of Instances         4147

       === Detailed Accuracy By Class ===

                      TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                      0.939     0.483     0.855       0.939    0.895       0.873      -1
                      0.517     0.061     0.738       0.517    0.608       0.873      1
       Weighted Avg.  0.835     0.378     0.826       0.835    0.824       0.873

       === Confusion Matrix ===

               a      b   <-- classified as
            2929    189 |  a = -1
             497    532 |  b = 1
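
The per-class figures in the output can be reproduced from the confusion matrix alone. The sketch below assumes the standard definitions of TP rate (recall), FP rate, precision and F-measure; the function and variable names are hypothetical and are not part of Weka.

```python
# Confusion matrix copied from the output: rows are actual classes, columns are predictions.
confusion = {
    "-1": {"-1": 2929, "1": 189},
    "1": {"-1": 497, "1": 532},
}


def per_class_metrics(matrix, cls):
    """Compute TP rate, FP rate, precision and F-measure for one class."""
    total = sum(sum(row.values()) for row in matrix.values())
    tp = matrix[cls][cls]
    fn = sum(matrix[cls].values()) - tp
    fp = sum(matrix[other][cls] for other in matrix if other != cls)
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[cls][cls] for cls in confusion) / total
print(f"accuracy: {accuracy:.4%}")

metrics = {cls: per_class_metrics(confusion, cls) for cls in confusion}
for cls, values in metrics.items():
    print(cls, {name: round(value, 3) for name, value in values.items()})
```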

2.4 Interpretation

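      • 3461 of the 4147 instances (83.46 %) have been classified correctly and 686 instances
        (16.54 %) have been classified incorrectly.
      • The kappa statistic is 0.5073. Kappa measures agreement beyond chance (1 is perfect
        agreement, 0 is chance level), so it is not an accuracy percentage; a value of about 0.51
        indicates moderate agreement.
      • The mean absolute error is 0.2353 and the root mean squared error is 0.339.
      • The classifier is much weaker on class 1 (TP rate 0.517) than on class -1 (TP rate
        0.939).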
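
The kappa statistic reported in the output can be checked by hand. The sketch below assumes the usual definition of Cohen's kappa, (observed agreement - expected agreement) / (1 - expected agreement), and applies it to the confusion matrix from section 2.3; the function name is hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Expected agreement: chance overlap of the row totals and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


kappa = cohen_kappa([[2929, 189], [497, 532]])
print(f"kappa = {kappa:.4f}")
```

The result agrees with the 0.5073 in the summary: although 83.46 % of the instances are classified correctly, a large share of that agreement would be expected by chance because class -1 dominates the data.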

3. ASSOCIATION RULES


3.1 Data

The data set includes the votes of each U.S. House of Representatives Congressman on the 16 key
votes identified by the Congressional Quarterly Almanac (CQA). The CQA lists nine different types
of votes: voted for, paired for, and announced for (these three simplified to yea); voted against,
paired against, and announced against (these three simplified to nay); and voted present, voted
present to avoid conflict of interest, and did not vote or otherwise make a position known (these
three simplified to an unknown disposition).
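
The simplification of the nine vote types can be written as a lookup table. This is only an illustrative sketch: the names are hypothetical, the values y and n follow the attribute list of this data set, and "?" is assumed here as the marker for an unknown disposition because it is the ARFF symbol for a missing value.

```python
# Nine CQA vote types mapped to the three simplified values used in the data set.
SIMPLIFIED_VOTE = {
    "voted for": "y",
    "paired for": "y",
    "announced for": "y",
    "voted against": "n",
    "paired against": "n",
    "announced against": "n",
    "voted present": "?",
    "voted present to avoid conflict of interest": "?",
    "did not vote or otherwise make a position known": "?",
}


def simplify(vote_type):
    """Return the simplified value (y, n or ?) for one CQA vote type."""
    return SIMPLIFIED_VOTE[vote_type]


print(simplify("paired for"))      # y
print(simplify("voted against"))   # n
print(simplify("voted present"))   # ?
```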

       Number of Instances: 435 (267 democrats, 168 republicans)
       Number of Attributes: 16 + class name = 17 (all Boolean valued)


Attribute Information:

      Class Name: 2 (democrat, republican)
      handicapped-infants: 2 (y,n)
      water-project-cost-sharing: 2 (y,n)
      adoption-of-the-budget-resolution: 2 (y,n)
      physician-fee-freeze: 2 (y,n)
      el-salvador-aid: 2 (y,n)
      religious-groups-in-schools: 2 (y,n)
      anti-satellite-test-ban: 2 (y,n)
      aid-to-nicaraguan-contras: 2 (y,n)
      mx-missile: 2 (y,n)
      immigration: 2 (y,n)
      synfuels-corporation-cutback: 2 (y,n)
      education-spending: 2 (y,n)
      superfund-right-to-sue: 2 (y,n)
      crime: 2 (y,n)
      duty-free-exports: 2 (y,n)
      export-administration-act-south-africa: 2 (y,n)

Link: http://tunedit.org/repo/UCI/vote.arff

3.2 Screens

Step 1: Launch Weka


Step 2: Click Explorer




Step 3: Click Open file… and choose the respective file


Step 4: Click Associate and choose Apriori




Step 5: Click Start




3.3 Output

=== Run information ===
Scheme:     weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: vote
Instances: 435
Attributes: 17
       handicapped-infants
      water-project-cost-sharing
      adoption-of-the-budget-resolution
      physician-fee-freeze
      el-salvador-aid
      religious-groups-in-schools
      anti-satellite-test-ban
      aid-to-nicaraguan-contras
      mx-missile
      immigration
      synfuels-corporation-cutback
      education-spending
      superfund-right-to-sue
      crime
      duty-free-exports
      export-administration-act-south-africa
      Class
=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.45 (196 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 11

Generated sets of large itemsets:

Size of set of large itemsets L(1): 20
Size of set of large itemsets L(2): 17
Size of set of large itemsets L(3): 6
Size of set of large itemsets L(4): 1

Best rules found:

1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219 conf:(1)
2. adoption-of-the-budget-resolution=y physician-fee-freeze=n aid-to-nicaraguan-contras=y 198 ==> Class=democrat 198 conf:(1)
3. physician-fee-freeze=n aid-to-nicaraguan-contras=y 211 ==> Class=democrat 210 conf:(1)
4. physician-fee-freeze=n education-spending=n 202 ==> Class=democrat 201 conf:(1)
5. physician-fee-freeze=n 247 ==> Class=democrat 245 conf:(0.99)
6. el-salvador-aid=n Class=democrat 200 ==> aid-to-nicaraguan-contras=y 197 conf:(0.99)
7. el-salvador-aid=n 208 ==> aid-to-nicaraguan-contras=y 204 conf:(0.98)
8. adoption-of-the-budget-resolution=y aid-to-nicaraguan-contras=y Class=democrat 203 ==> physician-fee-freeze=n 198 conf:(0.98)
9. el-salvador-aid=n aid-to-nicaraguan-contras=y 204 ==> Class=democrat 197 conf:(0.97)
10. aid-to-nicaraguan-contras=y Class=democrat 218 ==> physician-fee-freeze=n 210 conf:(0.96)

3.4 Interpretation

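Apriori generated ten association rules at a minimum support of 0.45 (196 instances) and a
minimum confidence of 0.9, as can be seen from the output. Six of the ten rules have
Class=democrat as the consequent. For example, rule 1 shows that all 219 members who voted yes on
adoption-of-the-budget-resolution and no on physician-fee-freeze are Democrats (confidence 1),
and rule 5 shows that 245 of the 247 members who voted no on physician-fee-freeze are Democrats
(confidence 0.99).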
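
The support and confidence figures in the output can be recomputed from the counts printed with each rule. The sketch below assumes the usual definitions: the confidence of a rule is the number of instances matching the whole rule divided by the number matching its left-hand side, and the minimum support count is the minimum support times the number of instances, rounded up. The variable and function names are hypothetical; only a few rules are copied for brevity.

```python
import math

INSTANCES = 435       # number of instances in the vote data set
MIN_SUPPORT = 0.45    # minimum support reported by Apriori

# rule number: (instances matching the left-hand side, instances matching the whole rule)
rule_counts = {
    3: (211, 210),
    5: (247, 245),
    7: (208, 204),
    10: (218, 210),
}


def confidence(antecedent_count, rule_count):
    """Fraction of instances matching the left-hand side that also match the right-hand side."""
    return rule_count / antecedent_count


min_count = math.ceil(MIN_SUPPORT * INSTANCES)
print(f"minimum support count: {min_count}")

for number, (antecedent_count, rule_count) in rule_counts.items():
    conf = confidence(antecedent_count, rule_count)
    support = rule_count / INSTANCES
    print(f"rule {number}: support {support:.2f}, confidence {conf:.2f}")
```

This also explains why rule 3 is printed with conf:(1) although only 210 of its 211 matching instances are Democrats: 210/211 is about 0.995, which rounds to 1 at two decimal places.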

4. REFERENCES

      Book: Data Mining – Practical Machine Learning Tools and Techniques, Ian H. Witten,
       Eibe Frank, Mark A. Hall

      http://www.cs.waikato.ac.nz/ml/weka/

      http://www.tunedit.org/repo/Data/Agnostic-vs-Prior/Training/ada_prior_train.arff

      http://tunedit.org/repo/UCI/vote.arff
