Manual orange

Tutorial on “Orange: An Open Source Data Mining Package”

Prepared By:

Mr. KISHOJ BAJRACHARYA (ID No: 111224)
kishoj@gmail.com

Department of Computer Science and Information Management
School of Engineering and Technology
Asian Institute of Technology

October 12, 2011

Contents

1 Orange
1.1 Introduction
1.2 Features of Orange
1.3 Installing Orange-Canvas
1.3.1 Installing on Windows
1.3.2 Installing on Ubuntu
1.4 Python Scripting

2 Python Scripting Code Examples
2.1 Using Python Code
2.2 Support, Confidence and Lift for Association Rule
2.3 Naive Bayes Classifier
2.4 Regression
2.5 K-Means Clustering Algorithm

3 References

1 Orange

1.1 Introduction

Orange is a collection of Python-based modules that sit on top of a core library of C++ objects and routines
that handle machine learning and data mining algorithms. It is an open source data mining package built
on Python, wrapped C and C++, and Qt.

Orange widgets provide a graphical user interface to Orange's data mining and machine learning methods.
They include widgets for data entry and pre-processing, data visualization, classification, regression,
association rules and clustering, a set of widgets for model evaluation and visualization of evaluation
results, and widgets for exporting the models into decision support systems. Orange widgets and Orange
Canvas are all written in Python using the Qt graphical user interface library. This allows Orange to run on
various platforms, including MS Windows and Linux.

1.2 Features of Orange

1. Open and free software: Orange is an open source and free data mining software tool.

2. Platform independent software: Orange is supported on various versions of Linux, Microsoft
Windows, and Apple's Mac.

3. Programming support: Orange supports visual programming tools for data mining: users can
design a data analysis process via visual programming. Orange provides different visualizations such as
bar diagrams, scatterplots, trees, networks, etc.

4. Scripting interface: Orange provides Python scripting. Programmers can test various new algorithms
and data analyses using Python scripting.

5. Support for other components: Orange provides support for machine learning, bioinformatics, text
mining, etc.

1.3 Installing Orange-Canvas

Orange-Canvas can be installed on any platform. Browse the URI http://orange.biolab.si/nightly_builds.html
for more information on installing Orange-Canvas on different platforms. Here we focus
on two platforms: Windows 7 and Ubuntu.

1.3.1 Installing on Windows

1. Browse the URI http://orange.biolab.si/nightly_builds.html.

2. Download “orange-win-w-python-snapshot-2011-09-11-py2.7.exe”.

3. Install the software by double-clicking the file “orange-win-w-python-snapshot-2011-09-11-py2.7.exe”.

The installation steps are shown in the figures below:

Fig 1: License Agreement

Fig 2: Completion of Installation

Fig 3: Locating Orange-Canvas

Fig 4: Orange-Canvas GUI

1.3.2 Installing on Ubuntu

1. Browse the URI http://orange.biolab.si/download/archive/.

2. Download the compressed file “orange-2.0-20101215svn.zip”.

3. Extract all the files from the file “orange-2.0-20101215svn.zip”.
unzip orange-2.0-20101215svn.zip

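4. Type the following commands in the Linux terminal, from the extracted directory. Build first, then
install either system-wide (with sudo) or for the current user only (with --user).
python setup.py build
sudo python setup.py install
python setup.py install --user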

1.4 Python Scripting

The module orange can be imported into a Python script with the following code.
# Import module orange for python scripting
import orange

Fig 5(a): Using Python Scripting on Windows

Fig 5(b): Using Python Scripting on Ubuntu
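
Before running the examples in the next section, it can help to confirm that the module is visible to the interpreter. The following is a minimal sketch in plain Python; the helper name module_available is hypothetical and is not part of Orange.

```python
def module_available(name):
    """Return True if a module called `name` can be imported."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if module_available("orange"):
    print("orange is importable")
else:
    print("orange was not found; check the installation steps")
```

If the second message appears, the interpreter being used is probably not the one Orange was installed for.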

2 Python Scripting Code Examples

Orange provides a scripting interface in the Python programming language. Programmers can test new
algorithms and data analyses using Python scripts.

2.1 Using Python Code

The following code in the file test.py tests Python scripting for Orange. It is a simple Python
program that imports data from an external file and exercises the data access mechanism.

# test.py
# Import the Orange library for Python
import orange

# Load data from the file named "test.tab"
data = orange.ExampleTable("test")

# Print the attributes of the table
print "Attributes:"
print data.domain.attributes

# Collect the attribute names, printing each one
attributeList = []
for i in data.domain.attributes:
    attributeList.append(i.name)
    print i.name
# The class variable is added once, after the loop
attributeList.append(data.domain.classVar.name)

# Class name
print "Class:", data.domain.classVar.name

# Display all attribute names, including the class
print attributeList
print

# Display the data items from the table
print "Data items:"
for i in range(len(data)):
    print data[i]

The data table used by the above code is shown in Fig 6.

Fig 6: Data Table 1

The output of the above program is shown in Fig 7.

Fig 7: Output of test.py
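
The file that test.py loads is tab-separated text. In Orange's tab-delimited format the first three lines hold the attribute names, their types, and optional flags such as "class". The sketch below reads such text in plain Python without Orange, to show what domain.attributes and domain.classVar correspond to. The sample contents and the function name read_tab are made up for illustration and are not the table in Fig 6.

```python
# Made-up sample in the three-header-line layout (names, types, flags)
SAMPLE_TAB = (
    "outlook\ttemperature\tplay\n"
    "discrete\tcontinuous\tdiscrete\n"
    "\t\tclass\n"
    "sunny\t85\tno\n"
    "overcast\t83\tyes\n"
    "rainy\t70\tyes\n"
)


def read_tab(text):
    """Split tab-delimited text into attribute names, class name and rows."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    names, types, flags = rows[0], rows[1], rows[2]
    # The column flagged "class" is the class variable, if there is one
    class_name = names[flags.index("class")] if "class" in flags else None
    # Columns with an empty flag are treated as ordinary attributes
    attributes = [n for n, f in zip(names, flags) if f == ""]
    examples = [dict(zip(names, row)) for row in rows[3:]]
    return attributes, class_name, examples


attributes, class_name, examples = read_tab(SAMPLE_TAB)
print("Attributes: %s" % attributes)
print("Class: %s" % class_name)
for example in examples:
    print(example)
```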

2.2 Support, Confidence and Lift for Association Rule

The following “example2.py” shows how to use Python scripting to get the support, confidence and
lift of all the association rules induced from the data in the imported file “association.tab”.

# example2.py
# Import the modules orange and orngAssoc
import orange, orngAssoc

# Load data from a file named association.tab
data = orange.ExampleTable("association")

# Data preprocessing: discretize continuous attributes into 4 intervals
data = orange.Preprocessor_discretize(data, method=orange.EquiNDiscretization(numberOfIntervals=4))

# Data selection: keep only the first two attributes
data = data.select(range(2))

# List of minimum supports to try
iList = [0.1, 0.2, 0.3, 0.4]

for x in iList:
    # Induce association rules with minimum support x
    rules = orange.AssociationRulesInducer(data, support=x)

    if len(rules) == 0:
        # No association rule was found
        print "No association rules for support = %5.3f" % x
    else:
        # At least one association rule exists
        print "%i rules with support = %5.3f found.\n" % (len(rules), x)
        orngAssoc.sort(rules, ["support", "confidence", "lift"])
        orngAssoc.printRules(rules, ["support", "confidence", "lift"])
    print


Fig 8: Output of example2.py
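
To make the three measures concrete, the following sketch computes them by hand in plain Python for a single rule. It uses the usual definitions: the support of a rule is the fraction of transactions containing both sides, the confidence is that support divided by the support of the left-hand side, and the lift is the confidence divided by the support of the right-hand side. The transactions and function names are made up for illustration; this is not how orngAssoc is implemented.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / float(len(transactions))


def rule_measures(lhs, rhs, transactions):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    s_rule = support(lhs | rhs, transactions)
    s_lhs = support(lhs, transactions)
    s_rhs = support(rhs, transactions)
    if s_lhs == 0 or s_rhs == 0:
        raise ValueError("both sides of the rule must occur in the data")
    confidence = s_rule / s_lhs
    lift = confidence / s_rhs
    return s_rule, confidence, lift


# Made-up shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s, c, l = rule_measures({"bread"}, {"milk"}, transactions)
print("bread -> milk: support=%.3f confidence=%.3f lift=%.3f" % (s, c, l))
```

For this data the rule bread -> milk has support 0.5, confidence 2/3 and lift 8/9. A lift below 1 means that milk is slightly less frequent among baskets with bread than among all baskets.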

2.3 Naive Bayes Classifier

Using Python, we observe how a naive Bayesian classifier works on the voting data set, i.e. “voting.tab”, and
use it to classify the first five instances of this data set.

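# classifier.py
import orange

# Load the voting data set and build a naive Bayes classifier from it
data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)

# Classify the first five instances and compare with the original class
for i in range(5):
    c = classifier(data[i])
    print "%d: %s (originally %s)" % (i+1, c, data[i].getclass())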

The script loads the data, uses it to construct a classifier with the naive Bayesian method, and then classifies
the first five instances of the data set. Naive Bayes made a mistake on the third instance, but otherwise predicted
correctly, as shown in the figure below.

Fig 9: Output of classifier.py
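
For readers who want to see the idea behind the classifier, the following is a minimal sketch of naive Bayes for discrete attributes in plain Python. It multiplies the class prior by the per-attribute conditional probabilities (in log form) and picks the most probable class. Add-one smoothing is an assumption of this sketch; Orange's BayesLearner may estimate probabilities differently. The data and function names are made up.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels):
    """Count how often each attribute value occurs within each class."""
    class_counts = Counter(labels)
    value_counts = {}  # (attribute index, class label) -> Counter of values
    value_sets = {}    # attribute index -> set of all values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts.setdefault((j, label), Counter())[value] += 1
            value_sets.setdefault(j, set()).add(value)
    return class_counts, value_counts, value_sets


def classify(model, row):
    """Return the class with the highest log prior plus log likelihoods."""
    class_counts, value_counts, value_sets = model
    total = float(sum(class_counts.values()))
    best_label, best_score = None, None
    for label, n in class_counts.items():
        score = math.log(n / total)
        for j, value in enumerate(row):
            seen = value_counts.get((j, label), Counter())[value]
            # Add-one smoothing keeps unseen values from giving log(0)
            score += math.log((seen + 1.0) / (n + len(value_sets[j])))
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up votes on two issues ("y" or "n") with the party as the class
rows = [("y", "n"), ("y", "y"), ("n", "n"), ("n", "y"), ("y", "n")]
labels = ["democrat", "democrat", "republican", "republican", "democrat"]

model = train_naive_bayes(rows, labels)
for i, row in enumerate(rows):
    print("%d: %s (originally %s)" % (i + 1, classify(model, row), labels[i]))
```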

2.4 Regression

The following example uses both regression trees and k-nearest neighbors, and also uses a majority learner,
which for regression simply returns the average value of the learning data set.

# regression2.py
import orange, orngTree

# Split the housing data in half: one part for training, one for testing
data = orange.ExampleTable("housing.tab")
selection = orange.MakeRandomIndices2(data, 0.5)
train_data = data.select(selection, 0)
test_data = data.select(selection, 1)

# Baseline: always predicts the average of the training data
maj = orange.MajorityLearner(train_data)
maj.name = "default"

rt = orngTree.TreeLearner(train_data, measure="retis", mForPruning=2, minExamples=20)
rt.name = "reg. tree"

k = 5
knn = orange.kNNLearner(train_data, k=k)
knn.name = "k-NN (k=%i)" % k

regressors = [maj, rt, knn]

# Header row: original value followed by one column per regressor
print "\n%10s " % "original",
for r in regressors:
    print "%10s " % r.name,
print

# The loop header is missing from the source; showing 10 test examples is an assumption
for i in range(10):
    print "%10.1f " % test_data[i].getclass(),
    for r in regressors:
        print "%10.1f " % r(test_data[i]),
    print


Fig 10: Output of regression2.py
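
The two simplest regressors in this comparison can be sketched in a few lines of plain Python: a baseline that always predicts the training average, and a k-nearest-neighbors regressor that averages the targets of the k closest training examples. This is only an illustration under the assumption of unweighted Euclidean distance; Orange's kNNLearner may weight neighbors differently. The data and function names are made up.

```python
def mean_regressor(train):
    """Return a predictor that always answers the training average."""
    average = sum(y for _, y in train) / float(len(train))
    return lambda x: average


def knn_regressor(train, k):
    """Return a predictor that averages the k nearest training targets."""
    def predict(x):
        by_distance = sorted(
            train,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )
        nearest = by_distance[:k]
        return sum(y for _, y in nearest) / float(len(nearest))
    return predict


# Made-up training set: (attribute vector, target value)
train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 14.0), ((10.0,), 40.0)]

baseline = mean_regressor(train)
knn = knn_regressor(train, k=2)

query = (2.4,)
print("%10s %10s" % ("default", "k-NN"))
print("%10.1f %10.1f" % (baseline(query), knn(query)))
```

For the query 2.4 the baseline answers 19.0, the average of all four targets, while k-NN answers 13.0, the average of the targets at 2.0 and 3.0.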

2.5 K-Means Clustering Algorithm

Let us use Python to implement the K-means clustering algorithm for the problem solved in class, i.e. K = 2
and array = [1,2,3,4,8,9,10,11].

# test3.py
import math

# Given array of elements that needs to be clustered
iArray = [1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0, 11.0]

# Returns the mean of the elements of an array
def meanArray(aArray):
    icount = len(aArray)
    iSum = 0
    for x in aArray:
        iSum = iSum + x
    return (iSum/icount)

Count = len(iArray)

# Use the last 2 elements as the initial centroids
c1 = iArray[Count-2]
c2 = iArray[Count-1]

# Class1 starts with a dummy value so that the loop runs at least once
Class1 = [0.0]
Class2 = []
oldClass1 = []
i = 1

# Repeat until the membership of Class1 stops changing
while (oldClass1 != Class1):
    print "Iteration: " + str(i)
    oldClass1 = Class1
    Class1 = []
    Class2 = []
    # Assign each element to the nearer centroid
    for x in iArray:
        if math.fabs(c1 - x) < math.fabs(c2 - x):
            Class1.append(x)
        else:
            Class2.append(x)
    # Recompute each centroid as the mean of its class
    print "Class1: " + str(Class1)
    c1 = round(meanArray(Class1),1)
    print "c1 = " + str(c1)

    print "Class2: " + str(Class2)
    c2 = round(meanArray(Class2),1)
    print "c2 = " + str(c2)
    print
    i = i + 1

Fig 11: Output of K-Means Clustering Algorithm

Using Orange, we can easily run the K-Means clustering algorithm and plot the result using the following
code.

import orange
import orngClustering
import pylab

# Plot the 2D points and the centroids
def plot_scatter(data, km, attx, atty, filename="kmeans-scatter", title=None):
    """plot a data scatter plot with the position of centroids"""
    pylab.rcParams.update({'font.size': 8, 'figure.figsize': [4,3]})

    # For the points
    x = [float(d[attx]) for d in data]
    y = [float(d[atty]) for d in data]
    colors = ["c", "b"]
    cs = "".join([colors[c] for c in km.clusters])
    pylab.scatter(x, y, c=cs, s=10)

    # For the centroid points
    xc = [float(d[attx]) for d in km.centroids]
    yc = [float(d[atty]) for d in km.centroids]
    pylab.scatter(xc, yc, marker="x", c="k", s=200)

    pylab.xlabel(attx)
    pylab.ylabel(atty)
    if title:
        pylab.title(title)
    # One image per iteration, e.g. kmeans-scatter-000.png
    pylab.savefig("%s-%03d.png" % (filename, km.iteration))
    pylab.close()

# Called after every iteration: report progress and save a plot
def in_callback(km):
    print "Iteration: %d, changes: %d" % (km.iteration, km.nchanges)
    plot_scatter(data, km, "X", "Y", title="Iteration %d" % km.iteration)

# Read the data from table
data = orange.ExampleTable("data")
km = orngClustering.KMeans(data, 2, minscorechange=-1, maxiters=10, inner_callback=in_callback)

The output of this program is shown below:

Fig 12(a): During iteration 0

Fig 12(b): During iteration 1
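
To show what happens between two such plots, the following sketch runs K-means on 2D points in plain Python and reports, per iteration, how many points changed cluster. It illustrates the assign-then-update loop only; how Orange's KMeans picks the initial centroids and counts changes internally may differ. The points, the initial centroids and the function names are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iters=10):
    """Return (cluster index per point, final centroids)."""
    centroids = list(initial_centroids)
    assignment = [None] * len(points)
    for iteration in range(max_iters):
        # Assignment step: move each point to its nearest centroid
        changes = 0
        for i, p in enumerate(points):
            nearest = min(range(len(centroids)),
                          key=lambda c: squared_distance(p, centroids[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changes += 1
        print("Iteration: %d, changes: %d" % (iteration, changes))
        if changes == 0:
            break
        # Update step: each centroid becomes the mean of its members
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / float(len(members))
                                     for axis in zip(*members))
    return assignment, centroids


# Made-up (X, Y) points forming two groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

assignment, centroids = kmeans(points, [(1.0, 1.0), (2.0, 1.0)])
print("Clusters: %s" % assignment)
print("Centroids: %s" % centroids)
```

With these deliberately poor initial centroids, one point changes cluster in the second pass, and the loop stops as soon as a pass produces no changes.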

3 References

The following references were used in preparing this manual:

1. http://en.wikipedia.org/wiki/Orange_(software)

2. http://orange.biolab.si/

3. http://orange.biolab.si/nightly_builds.html

4. http://orange.biolab.si/doc/ofb-rst/genindex.html
