2. Introduction to WEKA
• Waikato Environment for Knowledge Analysis (WEKA)
• Open-source data mining software
  GNU General Public License
• Open Java library, weka.jar
  Path: Program Files\weka\weka.jar
• One of the six best open-source data mining tools
  http://thenewstack.io/six-of-the-best-open-source-data-mining-tools/
3. Classification
• Data consist of attributes and a class.
• Two data sets: training data and testing data.
• A function (the learning model) is inferred by observing the
  attributes and class of the training data, i.e., the pattern.
• The learning model is then used to predict the class of
  new, unseen data.
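// train a C4.5 decision tree (J48 is WEKA's implementation of C4.5)
// instances: the training data, a weka.core.Instances object
J48 j48 = new J48();   // import weka.classifiers.trees.J48;
j48.buildClassifier(instances);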
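To make the train-then-predict idea concrete without any library, here is a minimal sketch of about the simplest learner possible: it remembers the most frequent class in the training data and predicts that class for every new record. This is a majority-class baseline that ignores the attributes entirely, not the C4.5 algorithm behind J48; the class name `MajorityClassifier` and its methods are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of WEKA.
public class MajorityClassifier {
    private String majorityClass;

    // "Training": count the class labels and remember the most frequent one.
    public void train(List<String> trainingLabels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : trainingLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        int best = -1;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > best) {
                best = entry.getValue();
                majorityClass = entry.getKey();
            }
        }
    }

    // "Prediction": every unseen record receives the majority class.
    public String predict() {
        return majorityClass;
    }

    public static void main(String[] args) {
        MajorityClassifier model = new MajorityClassifier();
        // Assumed toy training labels: seven "No" and three "Yes".
        model.train(List.of("No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"));
        System.out.println(model.predict());
    }
}
```

A real learner such as a decision tree follows the same two-step shape (build from training data, then predict), but uses the attribute values to decide.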
4. Decision Tree
Model: Decision Tree

Refund?
+-- Yes: NO
+-- No: MarSt?
    +-- Married: NO
    +-- Single, Divorced: TaxInc?
        +-- < 80K: NO
        +-- > 80K: YES

Training Data set (Train)

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Testing Data set (Test)

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         80K             ?
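The decision tree on this slide can be read as nested conditions. The sketch below hand-codes that tree and applies it to the three unlabeled test records; it is an illustration of how a finished tree is used, not output generated by WEKA. The class name `CheatTree` and the method `predict` are hypothetical, and the handling of an income of exactly 80K is an assumption, since the slide only shows "< 80K" and "> 80K".

```java
// Hypothetical illustration class; hand-written from the slide, not produced by WEKA.
public class CheatTree {

    // Walks the tree: Refund -> Marital Status -> Taxable Income.
    // taxableIncomeK is the income in thousands (e.g. 75 for 75K).
    public static String predict(boolean refund, String maritalStatus, int taxableIncomeK) {
        if (refund) {
            return "No";    // Refund = Yes leads to the leaf NO
        }
        if (maritalStatus.equals("Married")) {
            return "No";    // Married leads to the leaf NO
        }
        // Single or Divorced: split on taxable income.
        // Assumption: exactly 80K is sent to the YES branch.
        return taxableIncomeK < 80 ? "No" : "Yes";
    }

    public static void main(String[] args) {
        // The three records of the testing data set.
        System.out.println(predict(false, "Single", 75));
        System.out.println(predict(true, "Married", 50));
        System.out.println(predict(false, "Married", 80));
    }
}
```

Under this tree all three test records are predicted as "No": the first fails the income split, the second has a refund, and the third is married.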
5. ARFF Format
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
// load ARFF file to Instances object
File file = new File("/weka/data/weather.arff");
ArffLoader arffLoader = new ArffLoader();
arffLoader.setSource(file);
Instances instances = arffLoader.getDataSet();
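To show how the two parts of an ARFF file fit together (the `@attribute` header and the comma-separated rows after `@data`), here is a rough library-free sketch of a reader. It is a deliberately tiny illustration and assumes unquoted names and values with no embedded commas; WEKA's own `ArffLoader` handles much more of the format. The class name `TinyArffReader` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not a replacement for WEKA's ArffLoader.
public class TinyArffReader {
    public final List<String> attributeNames = new ArrayList<>();
    public final List<String[]> rows = new ArrayList<>();

    public TinyArffReader(String arffText) {
        boolean inData = false;
        for (String raw : arffText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comment lines
            }
            String lower = line.toLowerCase();
            if (inData) {
                rows.add(line.split(","));   // one instance per line
            } else if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": the name is the second token
                attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true;               // everything after this line is data
            }
            // "@relation" only names the data set; this sketch ignores it
        }
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@relation weather",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature numeric",
            "@attribute humidity real",
            "@attribute windy {TRUE, FALSE}",
            "@attribute play {yes, no}",
            "@data",
            "sunny,85,85,FALSE,no",
            "sunny,80,90,TRUE,no",
            "overcast,83,86,FALSE,yes",
            "rainy,70,96,FALSE,yes");
        TinyArffReader reader = new TinyArffReader(arff);
        System.out.println(reader.attributeNames);
        System.out.println(reader.rows.size());
    }
}
```

Each row has one value per declared attribute, in the order of the `@attribute` lines, which is why the last value of every row here belongs to `play`.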
6. Definitions: Object
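outlook   temperature  humidity  windy  play
sunny     85.5         85        false  no
sunny     80.0         90        true   no
overcast  83.2         83        false  yes
rainy     70.6         96        false  yes
rainy     63.9         80        false  yes

Instance: one row of the table
Attribute: one column of the table
Class: the attribute to be predicted (here, play)
Instances: the whole data set (all rows)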
// set the last attribute (column) as a class attribute
instances.setClassIndex(instances.numAttributes() - 1);
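7. Complete Example
import java.io.File;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

// the wrapper class name is arbitrary
public class WekaExample {
    public static void main(String[] args) throws Exception {
        // load ARFF file into an Instances object
        File file = new File("/weka/data/weather.arff");
        ArffLoader arffLoader = new ArffLoader();
        arffLoader.setSource(file);
        Instances instances = arffLoader.getDataSet();
        // set the last attribute (column) as the class attribute
        instances.setClassIndex(instances.numAttributes() - 1);
        // train a C4.5 decision tree
        J48 j48 = new J48();
        j48.buildClassifier(instances);
        // test the model with 10-fold cross-validation
        Evaluation evaluation = new Evaluation(instances);
        evaluation.crossValidateModel(j48, instances, 10, new Random(1));
        System.out.println(evaluation.toSummaryString());
    }
}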
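What "10-fold cross-validation" means can be sketched without WEKA: the records are shuffled and dealt into ten folds, and in each round one fold is held out for testing while the other nine are used for training. The code below is a simplified picture of that splitting step only, not WEKA's implementation, which differs in details (for example, it can stratify the folds by class). The class name `KFoldSketch`, the method `folds`, and the record count of 25 are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class; a simplified picture, not WEKA's procedure.
public class KFoldSketch {

    // Shuffles the indices 0..n-1 and deals them into k folds of near-equal size.
    public static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i)); // deal round-robin into the folds
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 25; // assumed number of records
        List<List<Integer>> folds = folds(n, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is held out for testing; all other folds form the training set.
            int testSize = folds.get(f).size();
            System.out.println("round " + f + ": test=" + testSize + ", train=" + (n - testSize));
        }
    }
}
```

Every record is used for testing exactly once, so the reported summary reflects predictions on data the model did not see during that round's training.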