SlideShare ist ein Scribd-Unternehmen logo
1 von 19
WEKA Training Software
Content
 What is WEKA?
 Data set in WEKA
 The Explorer :
 Preprocess data
 Classification
 Clustering
 Association Rules
 Attribute Selection
 Data Visualization
 References and Resources
2
What is WEKA?
 Waikato Environment for Knowledge Analysis
 It’s a data mining/machine learning tool developed by Department of
 Computer Science, University of Waikato, New Zealand.
 Weka is a collection of machine learning algorithms for data mining tasks.
 Weka is open source software issued under the GNU General Public
License.
3
Main Features
 49 data preprocessing tools
 76 classification/regression algorithms
 8 clustering algorithms
 3 algorithms for finding association rules
 15 attribute/subset evaluators + 10 search algorithms
for feature selection
4
Main GUI
 Three graphical user interfaces
 “The Explorer” (exploratory data analysis)
 “The Experimenter” (experimental environment)
 “The KnowledgeFlow” (new process model inspired interface)
 Simple CLI (Command prompt)
Offers some functionality not available via the GUI1
5
1.Graphic User Interface
Datasets in Weka
 Each entry in a dataset is an instance of the java class:
 weka.core.Instance
 Each instance consists of a number of attributes
 Nominal: one of a predefined list of values
e.g. red, green, blue
 Numeric: A real or integer number
 String: Enclosed in “double quotes”
 Date
 Relational
ARFF File
Weka wants its input data in ARFF format.
@relation weather
@attribute outlook { sunny, overcast, rainy }
@attribute temperature numeric
@attribute humidity numeric
@attribute windy { TRUE, FALSE }
@attribute play { yes, no }
@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 86, FALSE, yes
rainy, 70, 96, FALSE, yes
rainy, 68, 80, FALSE, yes
rainy, 65, 70, TRUE, no
overcast, 64, 65, TRUE, yes
sunny, 72, 95, FALSE, no
sunny, 69, 70, FALSE, yes
rainy, 75, 80, FALSE, yes
sunny, 75, 70, TRUE, yes
overcast, 72, 90, TRUE, yes
overcast, 81, 75, FALSE, yes
rainy, 71, 91, TRUE, no
WEKA : Explorer
 Preprocess : Choose and modify the data being acted on.
 Classify: Train and test learning schemes that classify or perform regression.
 Cluster : Learn clusters for the data.
 Associate : Learn association rules for the data.
 Select attributes : Select the most relevant attributes in the data.
 Visualize : View an interactive 2D plot of the data.
07/11/17 10
analyze
load
filter
classifiers
• Algorithm for Decision Tree Induction
OutLookOutLook
overcastovercast
HumidityHumidity
= Sunny= Sunny = Rainy= Rainy
nono yesyes
yesyes
= Normal= Normal= High= High
example
WindyWindy
yesyesnono
=False=False=True=True
Explorer : clustering data
14
Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6
-S 100
Relation: weather.symbolic
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode : evaluate on training data
Cluster
Attribute 0
(1)
outlook
sunny 6
overcast 5
rainy 6
[total] 17
temperature
hot 5
mild 7
cool 5
[total] 17
Cluster
humidity
high 8
normal 8
[total] 16
windy
TRUE 7
FALSE 9
[total] 16
play
yes 10
no 6
[total] 16
Associate
Generated sets of large item sets
Best rules found
Panel that can be used to investigate which (subsets of) attributes are
the most predictive ones
Attribute selection methods contain two parts:
A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two
Explorer : attribute selection
Explorer : data visualization
 Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
 WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
 Color-coded class values
 “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
 “Zoom-in” function
 References:
 WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
 WEKA Tutorial:
 Machine Learning with WEKA: A presentation demonstrating all graphical user interfaces
(GUI) in Weka.
 A presentation which explains how to use Weka for exploratory data mining.
 WEKA Data Mining Book:
 Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and
Techniques (Second Edition)
 WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page
 Others:
 Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.
You Have Question ?You Have Question ?

Weitere ähnliche Inhalte

Was ist angesagt?

Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descentSuraj Parmar
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Evolutionary Algorithms
Evolutionary AlgorithmsEvolutionary Algorithms
Evolutionary AlgorithmsReem Alattas
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Wekaweka Content
 
Naive bayesian classification
Naive bayesian classificationNaive bayesian classification
Naive bayesian classificationDr-Dipali Meher
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networksguestfee8698
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesMohammed Bennamoun
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentMuhammad Rasel
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Akash Goel
 
Particle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer VisionParticle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer Visionzukun
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule miningDeepa Jeya
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lectureShreyas S K
 
DATA VISUALIZATION.pptx
DATA VISUALIZATION.pptxDATA VISUALIZATION.pptx
DATA VISUALIZATION.pptxPraneethBhai1
 

Was ist angesagt? (20)

Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Evolutionary Algorithms
Evolutionary AlgorithmsEvolutionary Algorithms
Evolutionary Algorithms
 
Random number generator
Random number generatorRandom number generator
Random number generator
 
An Introduction To Weka
An Introduction To WekaAn Introduction To Weka
An Introduction To Weka
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Naive bayesian classification
Naive bayesian classificationNaive bayesian classification
Naive bayesian classification
 
Inference in Bayesian Networks
Inference in Bayesian NetworksInference in Bayesian Networks
Inference in Bayesian Networks
 
Back propagation
Back propagationBack propagation
Back propagation
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rules
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Anfis (1)
Anfis (1)Anfis (1)
Anfis (1)
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
 
Particle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer VisionParticle Filters and Applications in Computer Vision
Particle Filters and Applications in Computer Vision
 
Eclat algorithm in association rule mining
Eclat algorithm in association rule miningEclat algorithm in association rule mining
Eclat algorithm in association rule mining
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lecture
 
DATA VISUALIZATION.pptx
DATA VISUALIZATION.pptxDATA VISUALIZATION.pptx
DATA VISUALIZATION.pptx
 

Ähnlich wie data mining with weka application

wekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfwekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfDr. Rajesh P Barnwal
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introductionbutest
 
Introduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptIntroduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptradhikadsu
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKAbutest
 
Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningKeshab Kumar Gaurav
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionSiddharth Shrivastava
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKAbutest
 
Barga Data Science lecture 8
Barga Data Science lecture 8Barga Data Science lecture 8
Barga Data Science lecture 8Roger Barga
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekarathorenitin87
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_SagarSagar Kumar
 

Ähnlich wie data mining with weka application (20)

wekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdfwekapresentation-130107115704-phpapp02.pdf
wekapresentation-130107115704-phpapp02.pdf
 
Weka
Weka Weka
Weka
 
Weka
WekaWeka
Weka
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Weka toolkit introduction
Weka toolkit introductionWeka toolkit introduction
Weka toolkit introduction
 
Introduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.pptIntroduction to Weka and Preprocessing.ppt
Introduction to Weka and Preprocessing.ppt
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 
Weka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data miningWeka : A machine learning algorithms for data mining
Weka : A machine learning algorithms for data mining
 
Data Mining using Weka
Data Mining using WekaData Mining using Weka
Data Mining using Weka
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Wek1
Wek1Wek1
Wek1
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear Regression
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
Machine Learning with WEKA
Machine Learning with WEKAMachine Learning with WEKA
Machine Learning with WEKA
 
Barga Data Science lecture 8
Barga Data Science lecture 8Barga Data Science lecture 8
Barga Data Science lecture 8
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Shraddha weka
Shraddha wekaShraddha weka
Shraddha weka
 
Shraddha weka
Shraddha wekaShraddha weka
Shraddha weka
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_Sagar
 

Kürzlich hochgeladen

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Kürzlich hochgeladen (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

data mining with weka application

  • 2. Content  What is WEKA?  Data set in WEKA  The Explorer :  Preprocess data  Classification  Clustering  Association Rules  Attribute Selection  Data Visualization  References and Resources 2
  • 3. What is WEKA?  Waikato Environment for Knowledge Analysis  It’s a data mining/machine learning tool developed by Department of  Computer Science, University of Waikato, New Zealand.  Weka is a collection of machine learning algorithms for data mining tasks.  Weka is open source software issued under the GNU General Public License. 3
  • 4. Main Features  49 data preprocessing tools  76 classification/regression algorithms  8 clustering algorithms  3 algorithms for finding association rules  15 attribute/subset evaluators + 10 search algorithms for feature selection 4
  • 5. Main GUI  Three graphical user interfaces  “The Explorer” (exploratory data analysis)  “The Experimenter” (experimental environment)  “The KnowledgeFlow” (new process model inspired interface)  Simple CLI (Command prompt) Offers some functionality not available via the GUI1 5 1.Graphic User Interface
  • 6.
  • 7. Datasets in Weka  Each entry in a dataset is an instance of the java class:  weka.core.Instance  Each instance consists of a number of attributes  Nominal: one of a predefined list of values e.g. red, green, blue  Numeric: A real or integer number  String: Enclosed in “double quotes”  Date  Relational
  • 8. ARFF File Weka wants its input data in ARFF format. @relation weather @attribute outlook { sunny, overcast, rainy } @attribute temperature numeric @attribute humidity numeric @attribute windy { TRUE, FALSE } @attribute play { yes, no } @data sunny, 85, 85, FALSE, no sunny, 80, 90, TRUE, no overcast, 83, 86, FALSE, yes rainy, 70, 96, FALSE, yes rainy, 68, 80, FALSE, yes rainy, 65, 70, TRUE, no overcast, 64, 65, TRUE, yes sunny, 72, 95, FALSE, no sunny, 69, 70, FALSE, yes rainy, 75, 80, FALSE, yes sunny, 75, 70, TRUE, yes overcast, 72, 90, TRUE, yes overcast, 81, 75, FALSE, yes rainy, 71, 91, TRUE, no
  • 9. WEKA : Explorer  Preprocess : Choose and modify the data being acted on.  Classify: Train and test learning schemes that classify or perform regression.  Cluster : Learn clusters for the data.  Associate : Learn association rules for the data.  Select attributes : Select the most relevant attributes in the data.  Visualize : View an interactive 2D plot of the data.
  • 11.
  • 12. classifiers • Algorithm for Decision Tree Induction OutLookOutLook overcastovercast HumidityHumidity = Sunny= Sunny = Rainy= Rainy nono yesyes yesyes = Normal= Normal= High= High example WindyWindy yesyesnono =False=False=True=True
  • 13.
  • 14. Explorer : clustering data 14 Scheme:weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100 Relation: weather.symbolic Instances: 14 Attributes: 5 outlook temperature humidity windy play Test mode : evaluate on training data Cluster Attribute 0 (1) outlook sunny 6 overcast 5 rainy 6 [total] 17 temperature hot 5 mild 7 cool 5 [total] 17 Cluster humidity high 8 normal 8 [total] 16 windy TRUE 7 FALSE 9 [total] 16 play yes 10 no 6 [total] 16
  • 15. Associate Generated sets of large item sets Best rules found
  • 16. Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared, … Very flexible: WEKA allows (almost) arbitrary combinations of these two Explorer : attribute selection Explorer : data visualization  Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem  WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style)  Color-coded class values  “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)  “Zoom-in” function
  • 17.
  • 18.  References:  WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html  WEKA Tutorial:  Machine Learning with WEKA: A presentation demonstrating all graphical user interfaces (GUI) in Weka.  A presentation which explains how to use Weka for exploratory data mining.  WEKA Data Mining Book:  Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)  WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page  Others:  Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.
  • 19. You Have Question ?You Have Question ?