USING WEKA FOR CLUSTERING AND
     REGRESSION ANALYSIS
                 ( ITB PAPER )




          ANURADHA CHAKRABORTY
              ROLL NO: 10BM60014




  VINOD GUPTA SCHOOL OF MANAGEMENT, IIT KHARAGPUR
WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine
learning software written in Java, developed at the University of Waikato, New Zealand. WEKA
is free software available under the GNU General Public License. Compared with MS Excel,
WEKA makes it easy to run multivariate regression. Its output shows the fitted equation for
the dependent variable along with other summary statistics.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code. Weka contains tools for
data pre-processing, classification, regression, clustering, association rules, and visualization. It
is also well-suited for developing new machine learning schemes.

The initial versions of WEKA used only Attribute-Relation File Format (ARFF) files, saved
as *.arff. Newer versions also support other file formats, such as XRFF, binary serialized
files, LIBSVM, SVM Light, CSV and C4.5, among others.

USING WEKA:

The WEKA GUI Chooser offers the following four options:
   1. Weka Explorer
   2. Weka Experimenter
   3. Weka Knowledge Flow
   4. Simple CLI




Weka Explorer has the following tabs:
  1. Preprocess
  2. Classify
  3. Cluster
  4. Associate
  5. Select Attributes
  6. Visualize
Apart from these statistical operations, the data can be visualized graphically and
filtered as required.




Weka Experimenter:
There are several algorithms for each task, so much of the skill in using the software lies in
identifying the most suitable one. For regression and classification, the Experimenter
compares algorithms through statistical analysis of their results. Unfortunately, no such
option is available for clustering algorithms.

Import of data:
Data is imported as a CSV file, which WEKA converts to ARFF format automatically on
import. The data is imported through the Preprocess tab of WEKA, as shown in the picture above.
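
To make the CSV-to-ARFF conversion concrete, here is a minimal Python sketch of what such a conversion might look like. It is not WEKA's own converter. The function name csv_to_arff, the relation name "survey" and the sample rows are hypothetical, and the rule used to decide between numeric and nominal attributes is a simplifying assumption (missing values and quoting are not handled).

```python
import csv
import io


def csv_to_arff(csv_text, relation="survey"):
    """Convert CSV text with a header row into ARFF text.

    Assumption: a column is declared numeric if every value parses as a
    float; otherwise it is declared nominal with its distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(value) for value in column):
            lines.append("@attribute " + name + " numeric")
        else:
            labels = ",".join(sorted(set(column)))
            lines.append("@attribute " + name + " {" + labels + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


# Hypothetical sample rows, not taken from the survey data.
sample = "Age,Profession,Quality\n1,student,5\n2,service,4\n"
arff = csv_to_arff(sample)
print(arff)
```

The printed text has the three parts of an ARFF file: the @relation line, one @attribute line per column, and the @data section holding the rows.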



                                      CLUSTERING
Definition: Cluster analysis is a class of statistical techniques that can be applied to data that
exhibit “natural” groupings. Cluster analysis sorts through the raw data and groups them into
clusters. A cluster is a group of relatively homogeneous cases or observations. Objects in a
cluster are similar to each other. They are also dissimilar to objects outside the cluster,
particularly objects in other clusters.
DATA SET USED FOR CLUSTERING

The example used is a survey on consumer attitudes towards traditional sweets. It had:
Instances: 76
Attributes: 33

The questions or attributes were as follows:
Age
Profession
Diabetesstop
Obesitystop
Otherstop
Cadburynchocl
Homemadesweets
Sweetfrmshop
Cakepastry
Sugarcube
Celebration
Gifts
Beginningauspicious
Yummyfood
Healthconcern
Lunchdinnerafter
Tastytraditn
Abroad
Frequencyeating
Inflnearby
Inflfrndrelative
Inflblogonline
Advert
Quality
Packaging
Ambience
Price
Imptraditonsweet
Newexperimentswt
Newvariety
Homedeliveryimp
Impchitchatplace
Packagdsweetslngtime
PROCEDURE AND RESULT:

The data set is taken from my AMRP project survey on the interest and motivation of
consumers towards traditional sweets.

The SimpleKMeans algorithm was used to cluster the data set.

The output is as follows:

Attribute             Full Data   0           1
                      (76)        (44)        (32)
=======================================================
Age                   1.6711      1.6364      1.7188
Profession            1.7632      1.6818      1.875
Diabetesstop          2.3553      2.3636      2.3438
Obesitystop           1.9605      1.9545      1.9688
otherstop             1.9474      1.8636      2.0625
Cadburynchocl         4.2895      4.25        4.3438
homemadesweets        4.3421      4.3636      4.3125
sweetfrmshop          4.0395      4.1136      3.9375
cakepastry            3.9342      4.0455      3.7813
sugarcube             2.4605      2.5         2.4063
celebration           4.1447      4.3409      3.875
gifts                 3.7632      3.7955      3.7188
beginningauspicious   3.7763      3.8636      3.6563
yummyfood             3.8158      3.9318      3.6563
healthconcern         2.9868      3           2.9688
lunchdinnerafter      3.9737      4.0909      3.8125
tastytraditn          3.7632      4.0227      3.4063
abroad                1.8684      1.8864      1.8438
frequencyeating       2.5658      2.4318      2.75
inflnearby            3.0         4.0         3.0
inflfrndrelative      4.0         4.0         3.0
inflblogonline        3.0         3.0         2.0
advert                3.0         3.0         2.0
quality               5.0         5.0         5.0
packaging             3.0         3.0         4.0
ambience              3.0         3.0         4.0
price                 3.0         4.0         3.0
imptraditonsweet      5.0         5.0         3.0
newexperimentswt      3.0         3.0         3.0
newvariety            3.0         3.0         4.0
homedeliveryimp       2.8158      2.8409      2.7813
impchitchatplace      3.3421      3.3182      3.375
packagdsweetslngtime  3.1579      3           3.375

Note: The significant values in the above table, on which the cluster characteristics are formed,
are marked with red.

Clustered Instances

0    44 ( 58%)
1    32 ( 42%)
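
For readers who want to see the idea behind the clustering step, the following is a minimal Python sketch of the basic k-means procedure (assign each point to its nearest centroid, recompute the centroids, repeat until nothing changes). It is an illustration under stated assumptions and not WEKA's SimpleKMeans implementation: it handles numeric attributes only, uses squared Euclidean distance, and the function names, the seed value and the six sample points are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=10):
    """Basic k-means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical two-attribute answers, not taken from the survey.
respondents = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(respondents, k=2)
for cluster in range(2):
    size = labels.count(cluster)
    print(cluster, size, "(", round(100 * size / len(labels)), "%)")
```

The final loop prints the cluster sizes and percentages in the same spirit as the "Clustered Instances" summary above. Which group receives label 0 or 1 depends on the random starting points.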


INTERPRETATION:

ASPECTS                         CLUSTER ‘0’                        CLUSTER ‘1’
Traditionality                  Loves traditional sweets.          Loves experiments and newer
                                Considers sweets a symbol of       varieties of sweets.
                                tradition. Wants a sweet after
                                lunch or dinner.
Frequency of consumption        High                               Medium
Price                           More price sensitive               Less price sensitive
Influence of friends,           High                               Medium. Generally tries a new
relatives or advertisements                                        shop on own instinct.
on trying a new shop
Ambience of shop and            Matters less                       Matters significantly
packaging
Food court for chatting         Preferred                          Preferred
(like Haldiram)
Packaged/tinned sweets          Medium                             Good demand


INFERENCE AND SUGGESTION DERIVED FROM THE CLUSTERING:

There are two distinct clusters of consumers in the sweet industry.

Cluster ‘0’ (58%) considers sweets a “symbol of tradition”, typically savored
after lunch and dinner. They enjoy the most traditional sweets and prefer not to try new
variants. They stick to old shops unless prompted by external influences (friends, relatives,
blogs, advertisements, etc.) to try others. Quality is an important factor, but ambience and
packaging do not play a major role. So shops like Nokur or Girish Dey will be their typical
favorites.

Cluster ‘1’ (42%) are the true connoisseurs of sweets. They appreciate both traditional and
experimental sweets (the new variants). They often prefer trying out new shops and
brands. Packaged sweets, which can be savored later, are also preferred. Apart from quality,
ambience and packaging play a vital role, whereas price is of medium importance. This
cluster seems to consist of more impulsive consumers, who would probably not mind paying a
premium for new and creative sweets. So brands like K.C. Das will be their preferred choice.


                               REGRESSION
The next procedure is regression analysis.

We obtain data from stores on the monthly sales of a celebration chocolate pack, together with
its price and the amount spent on its promotion (posters used around the block or any other
effort).

We then select all attributes, go to the Classify tab and run the linear regression function.




OUTPUT

The output obtained is given below
=== Run information ===

Scheme:    weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation: Problem_2-weka.filters.unsupervised.attribute.Remove-R1
Instances: 46
Attributes: 3
         Sales
         Price
         Promotion
Test mode: split 80.0% train, remainder test

=== Classifier model (full training set) ===


Linear Regression Model

Sales = -53.2173 * Price +       3.6131 * Promotion + 5837.5208

Time taken to build model: 0 seconds

=== Evaluation on test split ===
=== Summary ===

Correlation coefficient              0.8066
Mean absolute error                  543.6332
Root mean squared error              711.4575
Relative absolute error              48.288 %
Root relative squared error          59.6886 %
Total Number of Instances            5
Ignored Class Unknown Instances      4
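
The fitted equation and the first three summary figures can be reproduced outside WEKA. The Python sketch below applies the equation reported above and computes a correlation coefficient, mean absolute error and root mean squared error from lists of actual and predicted values. The function names and the three test rows are hypothetical (the original store data is not reproduced in this paper), so the printed figures will not match the WEKA summary.

```python
import math


def predict_sales(price, promotion):
    """Apply the fitted equation reported in the WEKA output above."""
    return -53.2173 * price + 3.6131 * promotion + 5837.5208


def summary(actual, predicted):
    """Correlation, mean absolute error and root mean squared error."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_a * var_p)
    return {"correlation": corr, "mae": mae, "rmse": rmse}


# Hypothetical (price, promotion, sales) rows, for illustration only.
test_rows = [(60, 200, 3400), (70, 400, 3500), (80, 300, 2700)]
actual = [sales for _, _, sales in test_rows]
predicted = [predict_sales(price, promo) for price, promo, _ in test_rows]
print(summary(actual, predicted))
```

The two relative error figures in the WEKA summary compare these errors with those of a baseline predictor and are left out of this sketch.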


INTERPRETATION


The test split gives a correlation coefficient of 0.8066 between predicted and actual sales. Its
square is about 0.65, so the model accounts for roughly 65% of the variation in sales on the
test split. This is a measure of fit rather than an accuracy figure. As expected, sales decrease
as price increases and increase with the promotion budget.
This shows how WEKA can be used for multivariate regression.
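
The arithmetic behind the 65% figure, and the direction of each effect, can be checked in a few lines of Python. The variable names are illustrative. The numbers are the ones reported in the WEKA output above.

```python
# Correlation coefficient reported in the WEKA summary.
r = 0.8066
r_squared = r ** 2
print(round(r_squared, 4))  # 0.6506

# Coefficients of the fitted equation: change in Sales per unit change.
price_coefficient = -53.2173
promotion_coefficient = 3.6131
print(price_coefficient < 0)      # True: higher price, lower sales
print(promotion_coefficient > 0)  # True: more promotion, higher sales
```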



REFERENCE

http://en.wikipedia.org/wiki/Weka_(machine_learning)

http://www.cs.waikato.ac.nz/ml/weka/

http://en.wikipedia.org/wiki/Cluster_analysis_(in_marketing)
