IT FOR BUSINESS INTELLIGENCE




Data Analysis Techniques Using WEKA: Classification and Regression
Nikhil Yagnic (07AG3801)

Introduction
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software
written in Java, developed at the University of Waikato, New Zealand. Weka is free software
available under the GNU General Public License.

The Weka workbench[1] contains a collection of visualization tools and algorithms for data analysis
and predictive modelling, together with graphical user interfaces for easy access to this functionality.
The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modelling
algorithms implemented in other programming languages, plus data pre-processing utilities in C, and
a Makefile-based system for running machine learning experiments. This original version was
primarily designed as a tool for analyzing data from agricultural domains,[2][3] but the more recent
fully Java-based version (Weka 3), for which development started in 1997, is now used in many
different application areas, in particular for educational purposes and research. Advantages of Weka
include:

- free availability under the GNU General Public License
- portability, since it is fully implemented in the Java programming language and thus runs on
  almost any modern computing platform
- a comprehensive collection of data pre-processing and modelling techniques
- ease of use due to its graphical user interfaces

Weka supports several standard data mining tasks, more specifically, data pre-processing, clustering,
classification, regression, visualization, and feature selection. All of Weka's techniques are
predicated on the assumption that the data is available as a single flat file or relation, where each
data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but
some other attribute types are also supported). Weka provides access to SQL databases using Java
Database Connectivity and can process the result returned by a database query. It is not capable of
multi-relational data mining, but there is separate software for converting a collection of linked
database tables into a single table that is suitable for processing using Weka.[4] Another important
area that is currently not covered by the algorithms included in the Weka distribution is sequence
modelling.


Classification via decision trees using WEKA

Problem:
A bank is introducing a new financial product and wants to classify new customers according to
whether or not they are likely to buy it. The bank already holds information on existing clients,
including whether they are interested in buying the new product.

Classification is a statistical technique that assigns a new client to one of the existing
groups. It builds a model from the available training data (here, the records of existing clients)
and then classifies new data using that model.

Steps to do classification in WEKA
Step 1: Create a data file in ARFF or CSV format. Weka understands both formats. Here we use a
CSV file, bank.csv.
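
As an illustration of how the two formats relate, the following is a minimal sketch that turns CSV text into ARFF text. The column names (age, income, buys) and the helper names csv_to_arff and is_number are hypothetical, not taken from the actual bank.csv, and the sketch assumes no missing values or values that need quoting.

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: distinct values in order of first appearance.
            distinct = list(dict.fromkeys(values))
            lines.append("@attribute " + name + " {" + ",".join(distinct) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample rows, for illustration only.
sample = "age,income,buys\n25,30000,no\n40,52000,yes\n"
print(csv_to_arff(sample, "bank"))
```

The ARFF header declares each attribute and its type explicitly, whereas for a CSV file Weka has to infer the types from the data.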
Step 2: Open the Weka application. The following screen appears.




Step 3: Load the data into WEKA

Click the "Open file" button and browse for the bank.csv file. Weka then lists all the
attributes, as shown in the figure below.

Step 4: View the data

The "Selected attribute" panel shows the name, type, and values of the currently selected
attribute.

You can also view the frequency distribution of all the attributes at once by clicking the
"Visualize All" button. It shows the following screen.

This view shows the range of data for each attribute together with its frequency distribution,
alongside summary statistics such as the mean. For example, the value of age in our case ranges
from 18 to 67, with an average of 42.5.
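
The kind of summary described above can be sketched in a few lines. The ages and labels below are hypothetical values chosen only so that the range and mean match the figures quoted in the text; they are not the real bank data.

```python
from collections import Counter
from statistics import mean


def summarize(values):
    """Range and mean of a numeric attribute."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def frequencies(values):
    """Count how often each value of a nominal attribute occurs."""
    return dict(Counter(values))


# Hypothetical attribute values, for illustration only.
ages = [18, 35, 50, 67]
buys = ["yes", "no", "yes"]

print(summarize(ages))
print(frequencies(buys))
```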

Step 5: Classify the existing data

To do this, select the Classify tab, which shows the following screen.




Then click the Choose button and select the J48 algorithm under the "trees" node.
This shows the following screen.

Step 6: Run the classification algorithm

Select the dependent variable (the class attribute to be predicted) and click Start.

The classifier output panel then shows the result, including an ASCII version of the tree.

The ASCII tree is difficult to read. To view the output as a graphical tree, right-click the
trees.J48 entry in the result list and select the "Visualize tree" option. Right-clicking the
tree window again and selecting the full-screen option shows the following screen.
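
J48 is Weka's implementation of the C4.5 decision tree learner, which chooses the attribute to split on using an entropy-based measure. The sketch below illustrates only the plain information gain of a nominal attribute (C4.5 itself uses the gain ratio, a normalized variant). The attribute names and rows are hypothetical, and this is a simplified illustration rather than Weka's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting on `attribute`."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical customer records, for illustration only.
rows = [
    {"married": "yes", "region": "town", "buys": "yes"},
    {"married": "yes", "region": "town", "buys": "yes"},
    {"married": "no", "region": "town", "buys": "no"},
    {"married": "no", "region": "town", "buys": "no"},
]
print(information_gain(rows, "married", "buys"))
print(information_gain(rows, "region", "buys"))
```

In this toy data, "married" separates the two classes perfectly while "region" carries no information, so a tree learner of this kind would split on "married" first.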

Step 7: Analyze the model created from the existing data

From the classifier output, the classification accuracy of the model is 89%.

This means the model correctly classifies 89% of the instances it was evaluated on. If new
customers resemble the existing ones, we can expect the model's prediction of a new customer's
buying decision to be correct with a probability of about 0.89.
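
Classification accuracy is simply the share of instances whose predicted class matches the actual class. A minimal sketch, using hypothetical labels rather than the bank results:

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels, for illustration only.
actual = ["yes", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no"]
print(accuracy(actual, predicted))
```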

Step 8: Test the new customer data

Create the new customer data in ARFF or CSV format with the same attributes as the training data.

Then select the "Supplied test set" radio button and click "Set..." to browse for the new data
set.
Then click the Start button, which generates the tree again and evaluates it on the new data.

Save the classification result as an ARFF file. This file contains a copy of the new instances
with an additional column for the predicted value. The result looks like the following.

Regression Using WEKA
Problem:
The aim is to find out how CPU performance is related to attributes such as machine cycle
time, minimum main memory, and cache memory.

Regression is a statistical tool that helps determine how the dependent variable (CPU
performance) is related to the independent attributes.

Steps to do Regression in WEKA
Step 1: Create the data file and open WEKA in the same way as for classification.

Step 2: Load the regression data file CPU.arff into WEKA.

Click "Open file" and browse for the file. The following screen appears.




Step 3: Run the regression

Click the Classify tab, click Choose, and select "LinearRegression" under the "functions" node.
This shows the following screen.
Click Start. The classifier output panel then shows the result, including the regression equation.
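
To make the idea of a regression equation concrete, here is a minimal sketch of an ordinary least-squares fit with a single predictor. It is a deliberate simplification: the model in this example uses several attributes, and the data points below are hypothetical, not taken from CPU.arff.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted regression equation to a new value."""
    return slope * x + intercept


# Hypothetical data points, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(predict(slope, intercept, 5))
```

With several attributes the equation gains one coefficient per attribute, but it is applied in the same way: multiply each attribute value by its coefficient and add the intercept.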
Interpretation of the output:

CPU performance depends most strongly on CHMAX, followed by CACHE.

The correlation coefficient of 0.912 is very high, which suggests that the dependent
variable is strongly associated with the independent variables.

Given the attribute values of a new machine, we can also use the regression equation to
predict its CPU performance.
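
The correlation coefficient reported for a regression model measures how closely the predicted values follow the actual ones. The following is a minimal sketch of the Pearson correlation, assuming that is the statistic being reported; the numbers are hypothetical and not the CPU results.

```python
from math import sqrt


def correlation(actual, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


# Hypothetical actual and predicted values, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(correlation(actual, predicted))
```

A value close to 1 means the predictions rise and fall with the actual values; a value near 0 means they are essentially unrelated.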
