HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior

•

1 gefällt mir•437 views

Albert Orriols-Puig

Bildung

Genetic-based Synthetic Data
Sets for the A l i f
S t f th Analysis of
Classifiers Behavior
8th I t
International Conference on Hybrid Intelligent Systems
ti lC f H b id I t lli tS t

Núria Macià
Albert Orriols-Puig
Alb t O i l P i
Ester Bernadó-Mansilla
{nmacia,aorriols,esterb}@salle.url.edu

Grup de Recerca en Sistemes Intel·ligents
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Motivation

Knowledge
Data Set Model
Extraction
Real-world
Learner
problem
+
Prediction
Necessity of synthetic data sets
To evaluate real learners performance under
controlled scenarios
How to generate synthetic data sets?
Data complexity (Ho & Basu, 2002)
Length of the class boundary (Macià et al., 2008)

Objective: Set of benchmark problems to analyze
learners behavior
Overview and Future Research Slide 2

Outline
1.
1 Data complexity
2. Synthetic data sets
3. Design of GA
4.
4 Experiments and results
5. Conclusions and further work

Overview and Future Research Slide 3

1. Data complexity
Length of the class boundary
Build minimum spanning tree (MST) connecting all
the points regardless of class
Count the number of edges joining
opposite classes
it l

Two cases of many points in boundary:
Very interleaved or random data
Linearly separable problem with narrow margins

Overview and Future Research Slide 4

2. Synthetic data sets
Generation procedure
Set the number of instances n, the number of
attributes m and the length of the class boundary
m,
b.
Generate n points di t ib t d randomly and b ild
G t i t distributed dl d build
the MST.

Label the class of each
instances

Overview and Future Research Slide 5

2. Synthetic data sets
Exhaustive search
Labelings grow exponentially with the number of
instances
Heuristic search
Demanded length of the class boundary is not
always achieved
No diverse solutions

Genetic algorithm
G ti l ith

Overview and Future Research Slide 6

3. Design of GA
Knowledge representation
k-ary string where the bit i stores the class label of
the ith instance

Data set i Individual i
Att. 1 Att. 2 … Att. N Class
0.4
04 0.5
05 0.4
04 0
0.2 1.0 0.2 1
011011
0.5 0.3 0.4 1
0.6 0.5 0.4 0
0.7 0.1 1.0 1
0.5 0.3 0.9 1

Overview and Future Research Slide 7

3. Design of GA
Genetic operators
s-wise tournament selection
Two-point crossover
T it
Bit-wise mutation
Fitness function
fitnessi = bobj − bi

Overview and Future Research Slide 8

4. Experiment and results (I)
Synthetic data set generation
Different solutions < Solutions
Population converge
Pop lation con erge to the same sol tion
solution
{0100,1011} are equivalent individuals
Intermediate complexity are obtained i early
It di t l it bt i d in l
generations

Overview and Future Research Slide 9

4. Experiment and results (II)
Analysis of classifiers behavior
Three different paradigms: C4.5, Naïve Bayes, and
SMO
Similar accuracy rates with noticeable variability

Overview and Future Research Slide 10

5. Conclusions
The GA allows us to generate data sets with
the demanded length of the class boundary

Overview and Future Research Slide 11

6. Further work
Efficiency and scalability
Move from simple GA to competent GA
Capacity of satisfying multiple criteria
C f f
Multi-objective strategy
j gy
Achieve structure of real-world problems
Provide a set of benchmark problems

Overview and Future Research Slide 12

Weitere ähnliche Inhalte

Was ist angesagt?

Lecture24Albert Orriols-Puig

Lecture3 - Machine LearningAlbert Orriols-Puig

Lecture1 - Machine LearningAlbert Orriols-Puig

CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...Albert Orriols-Puig

Lecture7 - IBkAlbert Orriols-Puig

Lecture11 - neural networksAlbert Orriols-Puig

Lecture17Albert Orriols-Puig

Lecture2 - Machine LearningAlbert Orriols-Puig

Lecture15 - Advances topics on association rules PART IIAlbert Orriols-Puig

Lecture19Albert Orriols-Puig

Lecture18Albert Orriols-Puig

Lecture20Albert Orriols-Puig

HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig

Lecture23Albert Orriols-Puig

A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...Anderson Pinho

Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya

An Unorthodox View on Memetic AlgorithmsNatalio Krasnogor

Deep Learning Representations for All (a.ka. the AI hype)Universitat Politècnica de Catalunya

Multimodal Deep LearningUniversitat Politècnica de Catalunya

Neural Architectures for Video EncodingUniversitat Politècnica de Catalunya

Was ist angesagt? (20)

Lecture24

Lecture3 - Machine Learning

Lecture1 - Machine Learning

CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...

Lecture7 - IBk

Lecture11 - neural networks

Lecture17

Lecture2 - Machine Learning

Lecture15 - Advances topics on association rules PART II

Lecture19

Lecture18

Lecture20

HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS

Lecture23

A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...

Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...

An Unorthodox View on Memetic Algorithms

Deep Learning Representations for All (a.ka. the AI hype)

Multimodal Deep Learning

Neural Architectures for Video Encoding

Ähnlich wie HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior

Data-centric AI and the convergence of data and model engineering:opportunit...Paolo Missier

Workshop nwav 47 - LVS - Tool for Quantitative Data AnalysisOlga Scrivner

Lec1-Intobutest

Presentation on Machine Learning and Data Miningbutest

Clusterix at VDS 2016Eamonn Maguire

Machine Learning: Foundations Course Number 0368403401butest

Machine learning and_neural_network_lecture_slide_ece_dkuSeokhyun Yoon

Machine Learning: Foundations Course Number 0368403401butest

A SURVEY ON DATA MINING IN STEEL INDUSTRIESIJCSES Journal

SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER

Computational model for artificial learning using formal concept analysisAboul Ella Hassanien

Selecting the correct Data Mining Method: Classification & InDaMiTe-RIOSR Journals

Hypothesis on Different Data Mining AlgorithmsIJERA Editor

Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh

Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi

Classifier Model using Artificial Neural NetworkAI Publications

Chapter1_C.docbutest

SYNOPSIS on Parse representation and Linear SVM.bhavinecindus

Kenett On Information NYU-Poly 2013The Hebrew University of Jerusalem

Ähnlich wie HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior (20)

Data-centric AI and the convergence of data and model engineering:opportunit...

Workshop nwav 47 - LVS - Tool for Quantitative Data Analysis

Lec1-Into

Presentation on Machine Learning and Data Mining

Clusterix at VDS 2016

Machine Learning: Foundations Course Number 0368403401

Machine learning and_neural_network_lecture_slide_ece_dku

Machine Learning: Foundations Course Number 0368403401

A SURVEY ON DATA MINING IN STEEL INDUSTRIES

SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET

Computational model for artificial learning using formal concept analysis

Selecting the correct Data Mining Method: Classification & InDaMiTe-R

Hypothesis on Different Data Mining Algorithms

Neural Network Classification and its Applications in Insurance Industry

Machine learning ppt unit one syllabuspptx

Classifier Model using Artificial Neural Network

Chapter1_C.doc

SYNOPSIS on Parse representation and Linear SVM.

Kenett On Information NYU-Poly 2013

Mehr von Albert Orriols-Puig

Lecture1 AI1 Introduction to artificial intelligenceAlbert Orriols-Puig

HAIS09-BeyondHomemadeArtificialDatasetsAlbert Orriols-Puig

Lecture22Albert Orriols-Puig

Lecture21Albert Orriols-Puig

Lecture16 - Advances topics on association rules PART IIIAlbert Orriols-Puig

Lecture14 - Advanced topics in association rulesAlbert Orriols-Puig

Lecture13 - Association RulesAlbert Orriols-Puig

Lecture12 - SVMAlbert Orriols-Puig

Lecture10 - Naïve BayesAlbert Orriols-Puig

Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig

HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...Albert Orriols-Puig

Mehr von Albert Orriols-Puig (11)

Lecture1 AI1 Introduction to artificial intelligence

HAIS09-BeyondHomemadeArtificialDatasets

Lecture22

Lecture21

Lecture16 - Advances topics on association rules PART III

Lecture14 - Advanced topics in association rules

Lecture13 - Association Rules

Lecture12 - SVM

Lecture10 - Naïve Bayes

Lecture9 - Bayesian-Decision-Theory

HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...

Kürzlich hochgeladen

Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxConquiztadors- the Quiz Society of Sri Venkateswara College

YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxConquiztadors- the Quiz Society of Sri Venkateswara College

Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99

Earth Day Presentation wow hello nice greatYousafMalik24

Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George

Field Attribute Index Feature in Odoo 17Celine George

AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood

4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239

Difference Between Search & Browse Methods in Odoo 17Celine George

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2

How to Add Barcode on PDF Report in Odoo 17Celine George

Concurrency Control in Database Management systemChristalin Nelson

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma

Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood

Transaction Management in Database Management SystemChristalin Nelson

What is Model Inheritance in Odoo 17 ERPCeline George

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543

Kürzlich hochgeladen (20)

Integumentary System SMP B. Pharm Sem I.ppt

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx

YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx

Choosing the Right CBSE School A Comprehensive Guide for Parents

Earth Day Presentation wow hello nice great

Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17

Field Attribute Index Feature in Odoo 17

AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx

4.18.24 Movement Legacies, Reflection, and Review.pptx

Difference Between Search & Browse Methods in Odoo 17

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf

How to Add Barcode on PDF Report in Odoo 17

Concurrency Control in Database Management system

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx

Student Profile Sample - We help schools to connect the data they have, with ...

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx

Transaction Management in Database Management System

What is Model Inheritance in Odoo 17 ERP

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)

HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior

1. Genetic-based Synthetic Data Sets for the A l i f S t f th Analysis of Classifiers Behavior 8th I t International Conference on Hybrid Intelligent Systems ti lC f H b id I t lli tS t Núria Macià Albert Orriols-Puig Alb t O i l P i Ester Bernadó-Mansilla {nmacia,aorriols,esterb}@salle.url.edu Grup de Recerca en Sistemes Intel·ligents Enginyeria i Arquitectura La Salle Universitat Ramon Llull

2. Motivation Knowledge Data Set Model Extraction Real-world Learner problem + Prediction Necessity of synthetic data sets To evaluate real learners performance under controlled scenarios How to generate synthetic data sets? Data complexity (Ho & Basu, 2002) Length of the class boundary (Macià et al., 2008) Objective: Set of benchmark problems to analyze learners behavior Overview and Future Research Slide 2

3. Outline 1. 1 Data complexity 2. Synthetic data sets 3. Design of GA 4. 4 Experiments and results 5. Conclusions and further work Overview and Future Research Slide 3

4. 1. Data complexity Length of the class boundary Build minimum spanning tree (MST) connecting all the points regardless of class Count the number of edges joining opposite classes it l Two cases of many points in boundary: Very interleaved or random data Linearly separable problem with narrow margins Overview and Future Research Slide 4

5. 2. Synthetic data sets Generation procedure Set the number of instances n, the number of attributes m and the length of the class boundary m, b. Generate n points di t ib t d randomly and b ild G t i t distributed dl d build the MST. Label the class of each instances Overview and Future Research Slide 5

6. 2. Synthetic data sets Exhaustive search Labelings grow exponentially with the number of instances Heuristic search Demanded length of the class boundary is not always achieved No diverse solutions Genetic algorithm G ti l ith Overview and Future Research Slide 6

7. 3. Design of GA Knowledge representation k-ary string where the bit i stores the class label of the ith instance Data set i Individual i Att. 1 Att. 2 … Att. N Class 0.4 04 0.5 05 0.4 04 0 0.2 1.0 0.2 1 011011 0.5 0.3 0.4 1 0.6 0.5 0.4 0 0.7 0.1 1.0 1 0.5 0.3 0.9 1 Overview and Future Research Slide 7

8. 3. Design of GA Genetic operators s-wise tournament selection Two-point crossover T it Bit-wise mutation Fitness function fitnessi = bobj − bi Overview and Future Research Slide 8

9. 4. Experiment and results (I) Synthetic data set generation Different solutions < Solutions Population converge Pop lation con erge to the same sol tion solution {0100,1011} are equivalent individuals Intermediate complexity are obtained i early It di t l it bt i d in l generations Overview and Future Research Slide 9

10. 4. Experiment and results (II) Analysis of classifiers behavior Three different paradigms: C4.5, Naïve Bayes, and SMO Similar accuracy rates with noticeable variability Overview and Future Research Slide 10

11. 5. Conclusions The GA allows us to generate data sets with the demanded length of the class boundary Overview and Future Research Slide 11

12. 6. Further work Efficiency and scalability Move from simple GA to competent GA Capacity of satisfying multiple criteria C f f Multi-objective strategy j gy Achieve structure of real-world problems Provide a set of benchmark problems Overview and Future Research Slide 12

13. Genetic-based Synthetic Data Sets for the A l i f S t f th Analysis of Classifiers Behavior 8th I t International Conference on Hybrid Intelligent Systems ti lC f H b id I t lli tS t Núria Macià Albert Orriols-Puig Alb t O i l P i Ester Bernadó-Mansilla {nmacia,aorriols,esterb}@salle.url.edu Grup de Recerca en Sistemes Intel·ligents Enginyeria i Arquitectura La Salle Universitat Ramon Llull

HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior

Ähnlich wie HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior (20)

Mehr von Albert Orriols-Puig

Mehr von Albert Orriols-Puig (11)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior