SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Incremental Learning using WEKA
CS267: Data Mining Presentation
Guided By: Dr. Tran
- Rohit Vobbilisetty
 WEKA - Definition
 Incremental Learning – Definition
 Incremental Learning in WEKA
 Steps to train an UpdateableClassifier
 Stochastic Gradient Descent
 Sample Code, Result and Demo
Overview
 Weka (Waikato Environment for Knowledge Analysis)
is a collection of machine learning algorithms for data
mining tasks.
 Weka 3.7 (Developer version)
What is WEKA ?
 Train the Model for each Instance within the dataset
 Suitable when dealing with large datasets, which do not fit
into the computer’s memory.
Incremental Learning
Definition and Need
 Applicable to Models implementing the interface:
weka.classifiers.UpdateableClassifier
(http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClas
sifier.html)
 Models implementing this interface:
HoeffdingTree, Ibk, KStar , LWL,
MultiClassClassifierUpdateable, NaiveBayesMultinomialText,
NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable,
SGD, SGDText
Incremental Learning - Weka
 Initialize an object of ArffLoader.
 Retrieve this object’s structure and set it’s class index
(The feature that needs to be predicted –
setClassIndex() ).
 Iteratively retrieve an instance from the training set
and update the classifier ( updateClassifier() ).
 Evaluate the trained model against the test dataset.
Step to train an
UpdateableClassifier()
 Stochastic gradient descent is a gradient descent
optimization method for minimizing an objective
function that is written as a sum of differentiable
functions.
 Applicable to large datasets, since each iteration
involves processing only a single instance of the
training dataset.
Stochastic Gradient Descent
w: Parameter to be estimated.
Qi(w): A single instance of data
 Name: vote.arff ( 17 features )
 Features:
 Class Name: 2 (democrat, republican)
 handicapped-infants: 2 (y,n)
 water-project-cost-sharing: 2 (y,n)
 adoption-of-the-budget-resolution: 2 (y,n)
 physician-fee-freeze: 2 (y,n)
 el-salvador-aid: 2 (y,n)
 religious-groups-in-schools: 2 (y,n)
 anti-satellite-test-ban: 2 (y,n)
 aid-to-nicaraguan-contras: 2 (y,n)
 mx-missile: 2 (y,n)
 immigration: 2 (y,n)
 synfuels-corporation-cutback: 2 (y,n)
 education-spending: 2 (y,n)
 superfund-right-to-sue: 2 (y,n)
 crime: 2 (y,n)
 duty-free-exports: 2 (y,n)
 export-administration-act-south-africa: 2 (y,n)
Sample DataSet Description
ArffLoader loader = new ArffLoader();
loader.setFile(new File(“Training File Path”));
Instances structure = loader.getStructure();
SGD classifier = new SGD(); // Configure the classifier
classifier.setEpochs(500);
classifier.setEpsilon(0.001);
// Required if dealing with binary class
classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));
structure.setClassIndex(16); // Set the feature to be predicted
classifier.buildClassifier(structure);
Instance current;
// Incrementally update the Classifier
while ((current = loader.getNextInstance(structure)) != null) {
((UpdateableClassifier)classifier).updateClassifier(current);
}
Sample Code - SGD
Class =
-0.26 handicapped-infants
+ -0.09 water-project-cost-sharing
+ -0.51 adoption-of-the-budget-resolution
+ 0.73 physician-fee-freeze
+ 0.33 el-salvador-aid
+ 0.04 religious-groups-in-schools
+ -0.14 anti-satellite-test-ban
+ -0.33 aid-to-nicaraguan-contras
+ -0.28 mx-missile
+ 0.1 immigration
+ -0.37 synfuels-corporation-cutback
+ 0.33 education-spending
+ 0.15 superfund-right-to-sue
+ 0.18 crime
+ -0.25 duty-free-exports
+ 0.02 export-administration-act-south-africa
- 0.11
Sample Output
Correctly Classified Instances 401 92.1839 %
Incorrectly Classified Instances 34 7.8161 %
Kappa statistic 0.838
Mean absolute error 0.0782
Root mean squared error 0.2796
Relative absolute error 16.482 %
Root relative squared error 57.4214 %
Coverage of cases (0.95 level) 92.1839 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 435
Confusion Matrix:
242.0 25.0
9.0 159.0
 SGD class does not support Numeric data types,
unless it is configured to use Huber Loss or Square
Loss.
 The learning rate should not be too small (Slow
process) or large (Overshoot the minimum).
 Some errors had to be resolved by consulting the
WEKA Java code.
Challenges Faced
 Wikipedia:
http://en.wikipedia.org/wiki/Stochastic_gradient_desc
ent
 Weka Wiki
http://weka.wikispaces.com/Use+Weka+in+your+Java
+code
References
Thank You

Weitere ähnliche Inhalte

Ähnlich wie Incremental Learning using WEKA

Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一scalaconfjp
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generationrsathishwaran
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbLucidworks
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit
 
Training course lect3
Training course lect3Training course lect3
Training course lect3Noor Dhiya
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5SAP Concur
 
Javascript & SQL within database management system
Javascript & SQL within database management systemJavascript & SQL within database management system
Javascript & SQL within database management systemClusterpoint
 
Productionalizing spark streaming applications
Productionalizing spark streaming applicationsProductionalizing spark streaming applications
Productionalizing spark streaming applicationsRobert Sanders
 
Towards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoTowards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoDatabricks
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013BertrandDrouvot
 
Clustering
ClusteringClustering
ClusteringMeme Hei
 
Fast Distributed Online Classification
Fast Distributed Online ClassificationFast Distributed Online Classification
Fast Distributed Online ClassificationPrasad Chalasani
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsMonal Daxini
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_SagarSagar Kumar
 

Ähnlich wie Incremental Learning using WEKA (20)

Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
 
Weka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule GenerationWeka project - Classification & Association Rule Generation
Weka project - Classification & Association Rule Generation
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
 
Weka library, JAVA
Weka library, JAVAWeka library, JAVA
Weka library, JAVA
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
 
Training course lect3
Training course lect3Training course lect3
Training course lect3
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
 
Javascript & SQL within database management system
Javascript & SQL within database management systemJavascript & SQL within database management system
Javascript & SQL within database management system
 
First fare 2010 java-beta-2011
First fare 2010 java-beta-2011First fare 2010 java-beta-2011
First fare 2010 java-beta-2011
 
Productionalizing spark streaming applications
Productionalizing spark streaming applicationsProductionalizing spark streaming applications
Productionalizing spark streaming applications
 
Towards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei DiaoTowards a Unified Data Analytics Optimizer with Yanlei Diao
Towards a Unified Data Analytics Optimizer with Yanlei Diao
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013
 
Clustering
ClusteringClustering
Clustering
 
Fast Distributed Online Classification
Fast Distributed Online ClassificationFast Distributed Online Classification
Fast Distributed Online Classification
 
Declarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data modelsDeclarative benchmarking of cassandra and it's data models
Declarative benchmarking of cassandra and it's data models
 
Prashant Kumar
Prashant KumarPrashant Kumar
Prashant Kumar
 
Weka_Manual_Sagar
Weka_Manual_SagarWeka_Manual_Sagar
Weka_Manual_Sagar
 

Kürzlich hochgeladen

Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projectssmsksolar
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Servicemeghakumariji156
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 

Kürzlich hochgeladen (20)

Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 

Incremental Learning using WEKA

  • 1. Incremental Learning using WEKA CS267: Data Mining Presentation Guided By: Dr. Tran - Rohit Vobbilisetty
  • 2.  WEKA - Definition  Incremental Learning – Definition  Incremental Learning in WEKA  Steps to train an UpdateableClassifier  Stochastic Gradient Descent  Sample Code, Result and Demo Overview
  • 3.  Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks.  Weka 3.7 (Developer version) What is WEKA ?
  • 4.  Train the Model for each Instance within the dataset  Suitable when dealing with large datasets, which do not fit into the computer’s memory. Incremental Learning Definition and Need
  • 5.  Applicable to Models implementing the interface: weka.classifiers.UpdateableClassifier (http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClas sifier.html)  Models implementing this interface: HoeffdingTree, Ibk, KStar , LWL, MultiClassClassifierUpdateable, NaiveBayesMultinomialText, NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable, SGD, SGDText Incremental Learning - Weka
  • 6.  Initialize an object of ArffLoader.  Retrieve this object’s structure and set it’s class index (The feature that needs to be predicted – setClassIndex() ).  Iteratively retrieve an instance from the training set and update the classifier ( updateClassifier() ).  Evaluate the trained model against the test dataset. Step to train an UpdateableClassifier()
  • 7.  Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.  Applicable to large datasets, since each iteration involves processing only a single instance of the training dataset. Stochastic Gradient Descent w: Parameter to be estimated. Qi(w): A single instance of data
  • 8.  Name: vote.arff ( 17 features )  Features:  Class Name: 2 (democrat, republican)  handicapped-infants: 2 (y,n)  water-project-cost-sharing: 2 (y,n)  adoption-of-the-budget-resolution: 2 (y,n)  physician-fee-freeze: 2 (y,n)  el-salvador-aid: 2 (y,n)  religious-groups-in-schools: 2 (y,n)  anti-satellite-test-ban: 2 (y,n)  aid-to-nicaraguan-contras: 2 (y,n)  mx-missile: 2 (y,n)  immigration: 2 (y,n)  synfuels-corporation-cutback: 2 (y,n)  education-spending: 2 (y,n)  superfund-right-to-sue: 2 (y,n)  crime: 2 (y,n)  duty-free-exports: 2 (y,n)  export-administration-act-south-africa: 2 (y,n) Sample DataSet Description
  • 9. ArffLoader loader = new ArffLoader(); loader.setFile(new File(“Training File Path”)); Instances structure = loader.getStructure(); SGD classifier = new SGD(); // Configure the classifier classifier.setEpochs(500); classifier.setEpsilon(0.001); // Required if dealing with binary class classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION)); structure.setClassIndex(16); // Set the feature to be predicted classifier.buildClassifier(structure); Instance current; // Incrementally update the Classifier while ((current = loader.getNextInstance(structure)) != null) { ((UpdateableClassifier)classifier).updateClassifier(current); } Sample Code - SGD
  • 10. Class = -0.26 handicapped-infants + -0.09 water-project-cost-sharing + -0.51 adoption-of-the-budget-resolution + 0.73 physician-fee-freeze + 0.33 el-salvador-aid + 0.04 religious-groups-in-schools + -0.14 anti-satellite-test-ban + -0.33 aid-to-nicaraguan-contras + -0.28 mx-missile + 0.1 immigration + -0.37 synfuels-corporation-cutback + 0.33 education-spending + 0.15 superfund-right-to-sue + 0.18 crime + -0.25 duty-free-exports + 0.02 export-administration-act-south-africa - 0.11 Sample Output Correctly Classified Instances 401 92.1839 % Incorrectly Classified Instances 34 7.8161 % Kappa statistic 0.838 Mean absolute error 0.0782 Root mean squared error 0.2796 Relative absolute error 16.482 % Root relative squared error 57.4214 % Coverage of cases (0.95 level) 92.1839 % Mean rel. region size (0.95 level) 50 % Total Number of Instances 435 Confusion Matrix: 242.0 25.0 9.0 159.0
  • 11.  SGD class does not support Numeric data types, unless it is configured to use Huber Loss or Square Loss.  The learning rate should not be too small (Slow process) or large (Overshoot the minimum).  Some errors had to be resolved by consulting the WEKA Java code. Challenges Faced
  • 12.  Wikipedia: http://en.wikipedia.org/wiki/Stochastic_gradient_desc ent  Weka Wiki http://weka.wikispaces.com/Use+Weka+in+your+Java +code References