SlideShare ist ein Scribd-Unternehmen logo
1 von 8
WEKA
A DATA MINING TOOL
   Amu Prabhjot Singh
     10BM60011
WEKA
Data mining isn't solely the domain of big companies and expensive software. In fact, there's a
piece of software that does almost all the same things as these expensive pieces of software —
the software is called WEKA. WEKA is the product of the University of Waikato (New Zealand)
and was first implemented in its modern form in 1997. It uses the GNU General Public License
(GPL). The software is written in the Java™ language and contains a GUI for interacting with
data files and producing visual results (think tables and curves). It's Java-based, so if we don't
have a JRE installed on your computer, download the WEKA version that contains the JRE, as
well. To load data into WEKA, we have to put it into a format that will be understood. WEKA's
preferred method for loading data is in the Attribute-Relation File Format (ARFF), where we can
define the type of data being loaded, then supply the data itself.

When we start WEKA, the GUI chooser
pops up as shown in figure

It lets us choose four ways to work with
WEKA and our data. The four ways are

        Explorer
        Experimenter
        Knowledge Flow
        Simple CLI



REGRESSION
Regression is the easiest technique to use, but is also probably the least powerful. In effect,
regression models all fit the same general pattern. There are a number of independent
variables, which, when taken together, produce a result — a dependent variable. The
regression model is then used to predict the result of an unknown dependent variable, given
the values of the independent variables.

We will perform Regression on the pricing of the house. The price of the house (the dependent
variable) is the result of many independent variables — the square footage of the house, the
size of the lot, whether granite is in the kitchen, bathrooms are upgraded, etc.

To create our regression model, start WEKA, then choose the Explorer. In the Explorer screen,
select the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the
file the explorer window looks as below
In the left section of the Explorer window, it outlines all of the columns in the data (Attributes)
and the number of rows of data supplied (Instances). By selecting each column, the right
section of the Explorer window will also give the information about the data in that column of
the data set. For example, by selecting the houseSize column in the left section the right-
section should change to show the additional statistical information about the column. Finally,
there's a visual way of examining the data, which can be viewed by clicking the Visualize
All button.
To create the model, click on the Classify tab. The first step is to select the model we want to
build, so WEKA knows how to work with the data, and how to create the appropriate model:
    1. Click the Choose button, then expand the functions branch.
    2. Select the LinearRegression leaf.
This tells WEKA that we want to build a regression model. When we have selected the right
model, your WEKA Explorer should look as below
Though it may be obvious to us that we want to use the data we supplied in the ARFF file, there
are actually different options than what we'll be using. The other three choices are Supplied
test set, where we can supply a different set of data to build the model; Cross-validation, which
lets WEKA build a model based on subsets of the supplied data and then average them out to
create a final model; and Percentage split, where WEKA takes a percentile subset of the
supplied data to build a final model. With regression, we can simply choose Use training set.
Finally, the last step to creating our model is to choose the dependent variable (the column we
are looking to predict). We know this should be the selling price, since that's what we're trying
to determine for my house. Right below the test options, there's a combo box that lets you
choose the dependent variable. The column SellingPrice should be selected by default. If it's
not, please select it. To create our model, click Start. Figure below shows the output window




INTERPRETATION OF THE RESULT:

SellingPrice = (-26.6882 * houseSize) + (7.0551 * lotSize) + (43166.0767 * bedrooms) +

               (42292.0901 * bathroom) - 21661.1208

Interpreting the pattern and conclusion that our model generated we see that besides just a
strict house value:
        Granite doesn't matter — WEKA will only use columns that statistically contribute to
        the accuracy of the model. It will throw out and ignore columns that don't help in
        creating a good model. So this regression model is telling us that granite in your kitchen
        doesn't affect the house's value.
Bathrooms do matter — Since we use a simple 0 or 1 value for an upgraded bathroom,
       we can use the coefficient from the regression model to determine the value of an
       upgraded bathroom on the house value.
       Bigger houses reduce the value — WEKA is telling us that the bigger our house is, the
       lower the selling price. This can be seen by the negative coefficient in front of
       the houseSize variable. The model is telling us that every additional square foot of the
       house reduces its price by $26.



CLUSTERING
Clustering is the task of assigning a set of objects into groups (called clusters) so that the
objects in the same cluster are more similar (in some sense or another) to each other than to
those in other clusters. WEKA offers clustering capabilities not only as standalone schemes, but
also as filters and classifiers.

To begin with clustering we will use the data set of bank. The data set contains – id, age, sex,
region, income, married, children, car, save_acct, current_acct, mortgage and pep.

To create clustering, start WEKA, then choose the Explorer. In the Explorer screen, select
the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the file
the explorer window looks as below




To perform clustering, select the "Cluster" tab in the Explorer and click on the "Choose" button.
This results in a drop down list of available clustering algorithms. In this case select
"SimpleKMeans". Next, click on the text box to the right of the "Choose" button to get the pop-
up window shown in Figure below, for editing the clustering parameter.




In the pop-up window we enter 6 as the number of clusters (instead of the default values of 2)
and we leave the value of "seed" as is. The seed value is used in generating a random number
which is, in turn, used for making the initial assignment of instances to clusters.




Once the options have been specified, we can run the clustering algorithm. Here we make sure
that in the "Cluster Mode" panel, the "Use training set" option is selected, and we click "Start".
We can right click the result set in the "Result list" panel and view the results of clustering in a
separate window.
We can choose the cluster number and any of the other attributes for each of the three
different dimensions available (x-axis, y-axis, and color). Different combinations of choices will
result in a visual rendering of different relationships within each cluster. Here we have chosen
the cluster number as the x-axis, the instance number as the y-axis, and the "sex" attribute as
the color dimension. This will result in a visualization of the distribution of males and females in
each cluster. For instance, here clusters 2 and 3 are dominated by males, while clusters 4 and 5
are dominated by females.
INTERPRETING THE RESULT:

Each cluster shows us a type of behavior in our customers, from which we can begin to draw
some conclusions:

Cluster 0 – It contains a cluster of Females with an average age of 37 who live in inner city and possess
saving account number and current account number. They are unmarried and donot have any mortgage
or pep. The average monthly income is 23,300.

Cluster 1 - It contains a cluster of Females with an average age of 44 who live in rural area and possess
saving account number and current account number. They are married and donot have any mortgage or
pep. The average monthly income is 27,772.

Cluster 2 - It contains a cluster of Females with an average age of 48 who live in inner city and possess
current account number but no saving account number. They are unmarried and donot have mortgage
but do have pep. The average monthly income is 27,668.

Cluster 3 - It contains a cluster of Females with an average age of 39 who live in town and possess saving
account number and current account number. They are married and donot have any mortgage or pep.
The average monthly income is 24,047.

Cluster 4 - It contains a cluster of Males with an average age of 39 who live in inner city and possess
current account number but no saving account number. They are married and have mortgage and pep.
The average monthly income is 26,359.

Cluster 5 - It contains a cluster of Males with an average age of 47 who live in inner city and possess
saving account number and current account number. They are unmarried and donot have mortgage but
do have pep. The average monthly income is 35,419.

Weitere ähnliche Inhalte

Andere mochten auch

Nutch as a Web data mining platform
Nutch as a Web data mining platformNutch as a Web data mining platform
Nutch as a Web data mining platformabial
 
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)Ankit Pandey
 
Data Mining Tools / Orange
Data Mining Tools / OrangeData Mining Tools / Orange
Data Mining Tools / OrangeYasemin Karaman
 
Web Crawling with Apache Nutch
Web Crawling with Apache NutchWeb Crawling with Apache Nutch
Web Crawling with Apache Nutchsebastian_nagel
 
Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonLivePerson
 
Orange Canvas - PyData 2013
Orange Canvas - PyData 2013Orange Canvas - PyData 2013
Orange Canvas - PyData 2013justin_sun
 
WEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedWEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedShareek Ahamed
 
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Krishna Petrochemicals
 
WEKA Tutorial
WEKA TutorialWEKA Tutorial
WEKA Tutorialbutest
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

Andere mochten auch (11)

Nutch as a Web data mining platform
Nutch as a Web data mining platformNutch as a Web data mining platform
Nutch as a Web data mining platform
 
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)
Data Mining Techniques using WEKA (Ankit Pandey-10BM60012)
 
Data Mining Tools / Orange
Data Mining Tools / OrangeData Mining Tools / Orange
Data Mining Tools / Orange
 
Web Crawling with Apache Nutch
Web Crawling with Apache NutchWeb Crawling with Apache Nutch
Web Crawling with Apache Nutch
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
 
Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePerson
 
Orange Canvas - PyData 2013
Orange Canvas - PyData 2013Orange Canvas - PyData 2013
Orange Canvas - PyData 2013
 
WEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek AhamedWEKA - A Data Mining Tool - by Shareek Ahamed
WEKA - A Data Mining Tool - by Shareek Ahamed
 
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)Data mining tools (R , WEKA, RAPID MINER, ORANGE)
Data mining tools (R , WEKA, RAPID MINER, ORANGE)
 
WEKA Tutorial
WEKA TutorialWEKA Tutorial
WEKA Tutorial
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Ähnlich wie Weka Term Paper_VGSoM_10BM60011

Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Siddharth Verma
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advancedexcel content
 
DATA MINING on WEKA
DATA MINING on WEKADATA MINING on WEKA
DATA MINING on WEKAsatyamkhatri
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekarathorenitin87
 
Excel Datamining Addin Beginner
Excel Datamining Addin BeginnerExcel Datamining Addin Beginner
Excel Datamining Addin Beginnerexcel content
 
ITB tutorial WEKA Prabhat Agarwal
ITB tutorial WEKA Prabhat AgarwalITB tutorial WEKA Prabhat Agarwal
ITB tutorial WEKA Prabhat AgarwalPrabhat Agarwal
 
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Saurabh Singh
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_SagarSagar Kumar
 
Spss by vijay ambast
Spss by vijay ambastSpss by vijay ambast
Spss by vijay ambastVijay Ambast
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKAbutest
 
Week 2 Project - STAT 3001Student Name Type your name here.docx
Week 2 Project - STAT 3001Student Name Type your name here.docxWeek 2 Project - STAT 3001Student Name Type your name here.docx
Week 2 Project - STAT 3001Student Name Type your name here.docxcockekeshia
 
Obiee interview questions and answers faq
Obiee interview questions and answers faqObiee interview questions and answers faq
Obiee interview questions and answers faqmaheshboggula
 
De vry math 399 ilabs & discussions latest 2016
De vry math 399 ilabs & discussions latest 2016De vry math 399 ilabs & discussions latest 2016
De vry math 399 ilabs & discussions latest 2016lenasour
 
De vry math 399 ilabs & discussions latest 2016 november
De vry math 399 ilabs & discussions latest 2016 novemberDe vry math 399 ilabs & discussions latest 2016 november
De vry math 399 ilabs & discussions latest 2016 novemberlenasour
 

Ähnlich wie Weka Term Paper_VGSoM_10BM60011 (20)

Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
 
Excel Datamining Addin Advanced
Excel Datamining Addin AdvancedExcel Datamining Addin Advanced
Excel Datamining Addin Advanced
 
DATA MINING on WEKA
DATA MINING on WEKADATA MINING on WEKA
DATA MINING on WEKA
 
Itb weka
Itb wekaItb weka
Itb weka
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Itb weka nikhil
Itb weka nikhilItb weka nikhil
Itb weka nikhil
 
Spss basics tutorial
Spss basics tutorialSpss basics tutorial
Spss basics tutorial
 
Excel Datamining Addin Beginner
Excel Datamining Addin BeginnerExcel Datamining Addin Beginner
Excel Datamining Addin Beginner
 
Data Mining using Weka
Data Mining using WekaData Mining using Weka
Data Mining using Weka
 
ITB tutorial WEKA Prabhat Agarwal
ITB tutorial WEKA Prabhat AgarwalITB tutorial WEKA Prabhat Agarwal
ITB tutorial WEKA Prabhat Agarwal
 
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
Data Mining Techniques using WEKA_Saurabh Singh_10BM60082
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_Sagar
 
Spss by vijay ambast
Spss by vijay ambastSpss by vijay ambast
Spss by vijay ambast
 
Excel help 01
Excel help 01Excel help 01
Excel help 01
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
 
Week 2 Project - STAT 3001Student Name Type your name here.docx
Week 2 Project - STAT 3001Student Name Type your name here.docxWeek 2 Project - STAT 3001Student Name Type your name here.docx
Week 2 Project - STAT 3001Student Name Type your name here.docx
 
Obiee interview questions and answers faq
Obiee interview questions and answers faqObiee interview questions and answers faq
Obiee interview questions and answers faq
 
De vry math 399 ilabs & discussions latest 2016
De vry math 399 ilabs & discussions latest 2016De vry math 399 ilabs & discussions latest 2016
De vry math 399 ilabs & discussions latest 2016
 
De vry math 399 ilabs & discussions latest 2016 november
De vry math 399 ilabs & discussions latest 2016 novemberDe vry math 399 ilabs & discussions latest 2016 november
De vry math 399 ilabs & discussions latest 2016 november
 

Kürzlich hochgeladen

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 

Kürzlich hochgeladen (20)

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Weka Term Paper_VGSoM_10BM60011

  • 1. WEKA A DATA MINING TOOL Amu Prabhjot Singh 10BM60011
  • 2. WEKA Data mining isn't solely the domain of big companies and expensive software. In fact, there's a piece of software that does almost all the same things as these expensive pieces of software — the software is called WEKA. WEKA is the product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997. It uses the GNU General Public License (GPL). The software is written in the Java™ language and contains a GUI for interacting with data files and producing visual results (think tables and curves). It's Java-based, so if we don't have a JRE installed on your computer, download the WEKA version that contains the JRE, as well. To load data into WEKA, we have to put it into a format that will be understood. WEKA's preferred method for loading data is in the Attribute-Relation File Format (ARFF), where we can define the type of data being loaded, then supply the data itself. When we start WEKA, the GUI chooser pops up as shown in figure It lets us choose four ways to work with WEKA and our data. The four ways are Explorer Experimenter Knowledge Flow Simple CLI REGRESSION Regression is the easiest technique to use, but is also probably the least powerful. In effect, regression models all fit the same general pattern. There are a number of independent variables, which, when taken together, produce a result — a dependent variable. The regression model is then used to predict the result of an unknown dependent variable, given the values of the independent variables. We will perform Regression on the pricing of the house. The price of the house (the dependent variable) is the result of many independent variables — the square footage of the house, the size of the lot, whether granite is in the kitchen, bathrooms are upgraded, etc. To create our regression model, start WEKA, then choose the Explorer. In the Explorer screen, select the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the file the explorer window looks as below
  • 3. In the left section of the Explorer window, it outlines all of the columns in the data (Attributes) and the number of rows of data supplied (Instances). By selecting each column, the right section of the Explorer window will also give the information about the data in that column of the data set. For example, by selecting the houseSize column in the left section the right- section should change to show the additional statistical information about the column. Finally, there's a visual way of examining the data, which can be viewed by clicking the Visualize All button. To create the model, click on the Classify tab. The first step is to select the model we want to build, so WEKA knows how to work with the data, and how to create the appropriate model: 1. Click the Choose button, then expand the functions branch. 2. Select the LinearRegression leaf. This tells WEKA that we want to build a regression model. When we have selected the right model, your WEKA Explorer should look as below
  • 4. Though it may be obvious to us that we want to use the data we supplied in the ARFF file, there are actually different options than what we'll be using. The other three choices are Supplied test set, where we can supply a different set of data to build the model; Cross-validation, which lets WEKA build a model based on subsets of the supplied data and then average them out to create a final model; and Percentage split, where WEKA takes a percentile subset of the supplied data to build a final model. With regression, we can simply choose Use training set. Finally, the last step to creating our model is to choose the dependent variable (the column we are looking to predict). We know this should be the selling price, since that's what we're trying to determine for my house. Right below the test options, there's a combo box that lets you choose the dependent variable. The column SellingPrice should be selected by default. If it's not, please select it. To create our model, click Start. Figure below shows the output window INTERPRETATION OF THE RESULT: SellingPrice = (-26.6882 * houseSize) + (7.0551 * lotSize) + (43166.0767 * bedrooms) + (42292.0901 * bathroom) - 21661.1208 Interpreting the pattern and conclusion that our model generated we see that besides just a strict house value: Granite doesn't matter — WEKA will only use columns that statistically contribute to the accuracy of the model. It will throw out and ignore columns that don't help in creating a good model. So this regression model is telling us that granite in your kitchen doesn't affect the house's value.
  • 5. Bathrooms do matter — Since we use a simple 0 or 1 value for an upgraded bathroom, we can use the coefficient from the regression model to determine the value of an upgraded bathroom on the house value. Bigger houses reduce the value — WEKA is telling us that the bigger our house is, the lower the selling price. This can be seen by the negative coefficient in front of the houseSize variable. The model is telling us that every additional square foot of the house reduces its price by $26. CLUSTERING Clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. WEKA offers clustering capabilities not only as standalone schemes, but also as filters and classifiers. To begin with clustering we will use the data set of bank. The data set contains – id, age, sex, region, income, married, children, car, save_acct, current_acct, mortgage and pep. To create clustering, start WEKA, then choose the Explorer. In the Explorer screen, select the Preprocess tab. Select the Open File button and select the ARFF file. After selecting the file the explorer window looks as below To perform clustering, select the "Cluster" tab in the Explorer and click on the "Choose" button. This results in a drop down list of available clustering algorithms. In this case select
  • 6. "SimpleKMeans". Next, click on the text box to the right of the "Choose" button to get the pop- up window shown in Figure below, for editing the clustering parameter. In the pop-up window we enter 6 as the number of clusters (instead of the default values of 2) and we leave the value of "seed" as is. The seed value is used in generating a random number which is, in turn, used for making the initial assignment of instances to clusters. Once the options have been specified, we can run the clustering algorithm. Here we make sure that in the "Cluster Mode" panel, the "Use training set" option is selected, and we click "Start". We can right click the result set in the "Result list" panel and view the results of clustering in a separate window.
  • 7. We can choose the cluster number and any of the other attributes for each of the three different dimensions available (x-axis, y-axis, and color). Different combinations of choices will result in a visual rendering of different relationships within each cluster. Here we have chosen the cluster number as the x-axis, the instance number as the y-axis, and the "sex" attribute as the color dimension. This will result in a visualization of the distribution of males and females in each cluster. For instance, here clusters 2 and 3 are dominated by males, while clusters 4 and 5 are dominated by females.
  • 8. INTERPRETING THE RESULT: Each cluster shows us a type of behavior in our customers, from which we can begin to draw some conclusions: Cluster 0 – It contains a cluster of Females with an average age of 37 who live in inner city and possess saving account number and current account number. They are unmarried and donot have any mortgage or pep. The average monthly income is 23,300. Cluster 1 - It contains a cluster of Females with an average age of 44 who live in rural area and possess saving account number and current account number. They are married and donot have any mortgage or pep. The average monthly income is 27,772. Cluster 2 - It contains a cluster of Females with an average age of 48 who live in inner city and possess current account number but no saving account number. They are unmarried and donot have mortgage but do have pep. The average monthly income is 27,668. Cluster 3 - It contains a cluster of Females with an average age of 39 who live in town and possess saving account number and current account number. They are married and donot have any mortgage or pep. The average monthly income is 24,047. Cluster 4 - It contains a cluster of Males with an average age of 39 who live in inner city and possess current account number but no saving account number. They are married and have mortgage and pep. The average monthly income is 26,359. Cluster 5 - It contains a cluster of Males with an average age of 47 who live in inner city and possess saving account number and current account number. They are unmarried and donot have mortgage but do have pep. The average monthly income is 35,419.