Knowledge Discovery from Support Vector Machines with Application to Credit Screening
Yan-Cheng Chen and Chao-Ton Su
Dept. of Industrial Engineering and Engineering Management,
National Tsing Hua University, Hsinchu, Taiwan
Outline
 Introduction
 Related Works
 Proposed Method
 Performance Metric
 Experiment and Results
 Conclusions
Introduction
[Figure: the data mining process: original data sets → data preprocessing → target data → mining → rules. Typical classifiers include the decision tree, the neural network (NN), the nearest-neighbor classifier, and the support vector machine (SVM); SVM offers strong theoretical foundations and excellent results.]
Motivation & Objective
4
Main Challenge:
SVM is regarded as a black-box analysis tool
Decision Boundary of SVM

$$f(\mathbf{x}) = \mathbf{w} \cdot \phi(\mathbf{x}) + b$$

[Figure: an n × m matrix of raw attribute values, $\begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{bmatrix}_{n \times m}$, together with the fitted coefficient vector $\begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix}_{m \times 1}$, showing that the trained boundary is only a table of numbers.]

Lacks an explicit declarative knowledge representation
Presents a complicated mathematical pattern
Objective
Develop a rule extraction algorithm from SVM
Related Works
Rule extraction from SVM
 Rule extraction from SVM is a relatively new research issue
(Núñez (2002), Martens (2008), and Barakat (2010))
 The expressive power of the extracted rules depends on the language
used to express them (a toy sketch follows this list):
 Propositional rules (simple if-then expressions)
 M-of-N rules (if at least M of N conditions (C1, C2, …, Cn) hold, then …)
 Fuzzy rules
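A minimal sketch of how the first two rule forms differ when evaluated on a feature vector. The attribute names and thresholds below are invented for illustration, not taken from the paper:

```python
# Hypothetical illustration of rule languages; attributes and
# thresholds are invented for the example, not from the paper.

def propositional_rule(x):
    # Simple if-then expression: every condition must hold.
    return x["age"] > 25 and x["income"] >= 30000

def m_of_n_rule(x, m=2):
    # M-of-N rule: fires when at least m of the listed conditions hold.
    conditions = [
        x["age"] > 25,
        x["income"] >= 30000,
        x["years_employed"] >= 2,
    ]
    return sum(conditions) >= m

applicant = {"age": 31, "income": 28000, "years_employed": 4}
print(propositional_rule(applicant))  # False: income condition fails
print(m_of_n_rule(applicant))         # True: 2 of 3 conditions hold
```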
 Rule extraction algorithms from SVM can be divided into four
types:
 Region-based rule extraction (Núñez (2006) and G. Fung (2005))
 Decision tree-based rule extraction (Martens (2007, 2008) and Barakat (2006))
 Sequential covering rule extraction (Barakat, 2007)
 Fuzzy rule extraction (Chaves, 2005)
Decision tree-based rule extraction
 The main idea is to generate artificial examples from a decision
boundary and then use them with tree induction algorithms.
(Barakat, 2004)
 The learning-based method is sensitive to the number of SVs: with
few data points, it is difficult to generate good artificial labeled
examples.
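A minimal sketch of this idea, assuming scikit-learn and synthetic data (this is an illustration of the general approach, not the authors' exact procedure):

```python
# Sketch of decision tree-based rule extraction: label artificial
# examples with a trained SVM, then induce a tree on those labels.
# Assumes scikit-learn; data and parameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Generate artificial examples over the training range and label them
# with the SVM, so the tree learns to imitate the SVM's boundary.
rng = np.random.default_rng(0)
X_art = rng.uniform(X.min(axis=0), X.max(axis=0), size=(2000, X.shape[1]))
y_art = svm.predict(X_art)

tree = DecisionTreeClassifier(max_depth=3).fit(X_art, y_art)
print(export_text(tree))  # human-readable if-then rules
```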
Direct rule learners
 Decision Tree (Quinlan, 1993)
 A hierarchical tree structure is used to classify classes based on a series
of rules. Attributes can be any type of variable (binary, nominal, ordinal,
or quantitative); the classes must be qualitative.
 Each node represents a variable, and each leaf represents an outcome.
 RIPPER (Cohen, 1995)
 A general-to-specific strategy generates each rule.
 FOIL's information gain measure chooses the best conjunct to add to
the rule antecedent (a sketch of FOIL gain follows).
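For reference, a sketch of the standard FOIL information gain used to score a candidate conjunct (the formula as given in Tan, Steinbach, and Kumar, cited in the references; the variable names are ours):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL information gain for extending a rule.
    p0, n0: positives/negatives covered before adding the conjunct.
    p1, n1: positives/negatives covered after adding it.
    """
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# A candidate conjunct that keeps 30 positives while dropping the
# covered negatives from 40 to 5 scores a clearly positive gain.
print(foil_gain(p0=50, n0=40, p1=30, n1=5))  # ~18.8
```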
Related Methods
 Weighted Kernel k-means (Dhillon, 2005)
 Uses the kernel trick to transform all data points into the high-dimensional space.
 Locally optimizes a number of graph partitionings.
 Discovers the suitable prototype vector corresponding to each cluster
(see the sketch after this list).
 Genetic Algorithm (Goldberg, 1989)
 Generates symbolic rules directly from data; chromosome structures are
easy to construct with any type of variable.
 Uses the chromosomes to represent the if-then rule conditions.
 Identifies the suitable value for each attribute in the high-dimensional space.
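A minimal sketch of the weighted kernel k-means assignment step, in which every distance is computed through the kernel matrix alone. This is our simplification of Dhillon's formulation; the RBF kernel, the uniform weights, and the cluster count are assumptions:

```python
# Weighted kernel k-means assignment step (sketch). Distances to each
# cluster mean are expanded entirely in kernel entries, so no explicit
# high-dimensional coordinates are ever needed.
import numpy as np

def rbf_kernel(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def assign_clusters(K, w, labels, k):
    n = K.shape[0]
    dist = np.empty((n, k))
    for c in range(k):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:           # guard against an empty cluster
            dist[:, c] = np.inf
            continue
        wc = w[idx]
        sc = wc.sum()
        # ||phi(x_i) - m_c||^2 expanded with kernel entries only.
        term2 = (K[:, idx] * wc).sum(axis=1) / sc
        term3 = (wc[:, None] * wc[None, :] * K[np.ix_(idx, idx)]).sum() / sc**2
        dist[:, c] = np.diag(K) - 2 * term2 + term3
    return dist.argmin(axis=1)

X = np.random.default_rng(1).normal(size=(60, 2))
w = np.ones(60)
labels = np.random.default_rng(1).integers(0, 2, size=60)
for _ in range(10):
    labels = assign_clusters(rbf_kernel(X), w, labels, k=2)
```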
Proposed Method
Proposed Method: KCGex-SVM
 Rule extraction from SVMs using the weighted kernel k-means
algorithm and GAs (KCGex-SVM)
 Integrates SVs, prototype centers Pi, and GAs.
 Constructs the rule set in hypercube form:
 Fig. (a) illustrates the scatter plot of support vectors and data points
classified into three classes.
 Fig. (b) shows the application of the weighted kernel k-means algorithm
in determining the center of each cluster.
 Fig. (c) illustrates the application of GAs in identifying the hypercube
that constructs the interval for each cluster.
 Fig. (d) shows that each hypercube can generate a rule set.
Proposed Method: Procedure of KCGex-SVM
 Step 1 involves a preprocessing step.
 Step 2 generates support vectors from SVM.
 Step 3 uses the weighted kernel k-means algorithm to find the
prototype center corresponding to each cluster.
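A minimal sketch of Step 2, extracting the support vectors from a trained SVM (scikit-learn assumed; the data and parameters are synthetic, for illustration only):

```python
# Step 2 sketch: the support vectors of a trained SVM are exposed
# directly by scikit-learn; data and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
SV = svm.support_vectors_  # input to the weighted kernel k-means step
print(SV.shape)
```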
Proposed Method: Procedure of KCGex-SVM
 Step 4 describes the chromosome design, which accommodates any type of
variable, either discrete or continuous. The mapping from a binary string
to each variable and each threshold of the rule-extraction problem is
completed accordingly, where t is the jth generation of a chromosome.
 Step 5 evaluates each chromosome using the defined fitness function:

$$C_{d1} \sum_{x_i \in D_1,\, S_{vi} \in D_1} w_i \, (x_i - S_{vi})^2 \;-\; C_{d2} \sum_{x_i \in D_2,\, S_{vi} \in D_2} w_i \, (x_i - S_{vi})^2 \;+\; C_y \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

where Cd1, Cd2, and Cy are penalty parameters greater than zero; the first
two terms are weighted squared distances between data points xi and support
vectors Svi over two index sets D1 and D2, and the last term penalizes the
prediction errors ŷi − yi.
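A minimal sketch of how such a fitness evaluation could look. The decoding of the chromosome into hypercube bounds, the definition of the index sets (points inside versus outside the hypercube), and the penalty weights are all our assumptions, not the paper's exact encoding:

```python
# Sketch of a GA fitness evaluation in the spirit of Step 5.
# Chromosome = lower/upper bounds of a hypercube per feature;
# set definitions and penalty weights are illustrative assumptions.
import numpy as np

def decode(chromosome, n_features):
    # First half: lower bounds, second half: upper bounds.
    bounds = chromosome.reshape(2, n_features)
    return bounds[0], np.maximum(bounds[0], bounds[1])

def fitness(chromosome, X, y, SV, w, Cd1=1.0, Cd2=1.0, Cy=1.0):
    lo, hi = decode(chromosome, X.shape[1])
    inside = np.all((X >= lo) & (X <= hi), axis=1)   # assumed set D1
    # Squared distance from each point to its nearest support vector.
    d2 = ((X[:, None, :] - SV[None, :, :]) ** 2).sum(-1).min(axis=1)
    y_hat = inside.astype(int)                       # rule prediction
    return (Cd1 * (w[inside] * d2[inside]).sum()
            - Cd2 * (w[~inside] * d2[~inside]).sum()
            + Cy * ((y_hat - y) ** 2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)); y = (X[:, 0] > 0).astype(int)
SV = X[:5]; w = np.ones(50)
chrom = rng.uniform(-1, 1, size=6)
print(fitness(chrom, X, y, SV, w))
```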
Proposed Method: Procedure of KCGex-SVM
 Step 6 breeds new organisms through crossover and mutation, and uses
roulette wheel selection (a generic selection routine is sketched below).
 Step 7 repeats the iterations until the final termination condition
is reached.
 Step 8 prunes redundant rules from the candidate rules.
 Step 9 uses the best chromosome corresponding to the best
fitness value to construct the rule set.
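A standard roulette wheel selection routine, as assumed in Step 6 (this is the generic fitness-proportional scheme, not the paper's exact operator settings):

```python
# Generic roulette wheel selection: chromosomes are drawn with
# probability proportional to (shifted) fitness.
import numpy as np

def roulette_wheel_select(population, fitnesses, rng):
    f = np.asarray(fitnesses, dtype=float)
    f = f - f.min() + 1e-9          # shift keeps probabilities positive
    probs = f / f.sum()
    return population[rng.choice(len(population), p=probs)]

rng = np.random.default_rng(0)
population = ["chrom_a", "chrom_b", "chrom_c"]
parent = roulette_wheel_select(population, [0.2, 0.9, 0.5], rng)
print(parent)  # fitter chromosomes are chosen more often
```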
Performance Evaluation
Performance Evaluation
 In the two-class case, with classes such as yes and no or presence and
absence, a single prediction has four possible outcomes.
 Accuracy: (TP + TN) / ( TP + FN + FP + TN)
 Comprehensibility indicates the number of rules and the number of
antecedent conditions.
                     Predicted Class
                     Yes                   No
Actual Class   Yes   TP (True positive)    FN (False negative)
               No    FP (False positive)   TN (True negative)
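For reference, a minimal computation of both metrics from a confusion matrix; the counts and the toy rule set are invented:

```python
# Accuracy from confusion-matrix counts; numbers are invented.
TP, FN, FP, TN = 50, 10, 8, 32
accuracy = (TP + TN) / (TP + FN + FP + TN)
print(accuracy)  # 0.82

# Comprehensibility: the number of rules and antecedent conditions.
rules = [("X1 = 1 and X2 > 17", "Class II"), ("otherwise", "Class I")]
n_rules = len(rules)
n_antecedents = sum(r[0].count("and") + 1 for r in rules
                    if r[0] != "otherwise")
print(n_rules, n_antecedents)  # 2 2
```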
Experiment
Experiment
 Three credit screening data sets were selected from the
University of California, Irvine (UCI) repository
Data set   Examples   Class ratio   Continuous   Discrete
Japanese   124        1:0.48        5            5
Austrian   690        1:1.24        8            5
German     1000       1:0.4285      13           6
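As an aside, the German credit data can be fetched programmatically through OpenML; a minimal sketch follows. The alias "credit-g" refers to the UCI German credit data, and it is an assumption that this mirrors the exact copy used in the study:

```python
# Sketch: fetching the UCI German credit data via OpenML.
from sklearn.datasets import fetch_openml

german = fetch_openml("credit-g", version=1, as_frame=True)
X, y = german.data, german.target
print(X.shape, y.value_counts())  # 1000 examples, good/bad classes
```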
Experiment
 Numerical Experiment
 The experiment compared KCGex-SVM with direct rule learners
such as C4.5 and RIPPER.
 Parameter settings
Parameter               Settings
Kernel function         Radial basis kernel
C and σ                 Grid search method
Population size         200
Crossover rate          Ranged from 0.2 to 0.6
Mutation rate           Ranged from 0.01 to 0.1
Termination condition   1000 iterations reached, or the same results for 100 consecutive iterations
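A minimal sketch of the grid search for C and σ (scikit-learn's gamma plays the role of the RBF width parameter); the grid values and data are illustrative assumptions:

```python
# Grid search over (C, gamma) for an RBF SVM; values are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_)  # best (C, gamma) by cross-validated accuracy
```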
Results
Results
 The following table compares KCGex-SVM with the other rule learners
on the three credit screening data sets.
 Our proposed method, KCGex-SVM, obtains the best accuracy among
the rule learners.

Data set   Index        SVMs    ALBA (C4.5)   ALBA (RIPPER)   KCGex-SVM   C4.5    RIPPER
German     Acc.         78.98   73.53         73.05           75.11       72.67   71.77
           # of rules   -       64            11.6            5           9       3
           # of ante.   -       -             -               10          11      5
Austrian   Acc.         85.04   -             -               84.78       84.78   84.78
           # of rules   -       -             -               5           2       2
           # of ante.   -       -             -               9           2       1
Japanese   Acc.         68.29   -             -               68.29       68.29   65.85
           # of rules   -       -             -               2           2       2
           # of ante.   -       -             -               2           2       3
Ave. of acc.            77.44   73.53         73.05           76.06       75.25   74.13
Total # of rules        -       64            11.6            12          13      7
Total # of ante.        -       -             -               21          15      9
Results
 The rule sets of the three credit screening data sets

German:
  # 1  X1 = 1 & X2 > 17 & X3 > 47, Then Class II
  # 2  X1 = 1 & X2 ≧ 16, Then Class II
  # 3  X1 = 2 & X2 ≧ 24 & X5 ≦ 2, Then Class II
  # 4  X16 ≧ 1 & X11 ≦ 2, Then Class II
  # 5  Otherwise Class I

Austrian:
  # 1  X8 = 1 & X9 = 1, Then Class I
  # 2  X8 = 1 & X14 ≧ 259, Then Class I
  # 3  X8 = 1 & X13 ≦ 110 & X14 ≦ 1, Then Class I
  # 4  X8 = 1 & X6 = 8, Then Class I
  # 5  Otherwise Class II

Japanese:
  # 1  X10 ≦ 2, Then Class II
  # 2  Otherwise Class I
Conclusions
Conclusion
 KCGex-SVM combines GAs, prototype centers, and information
provided by SVMs to enhance the explanation capability of SVMs.
 KCGex-SVM not only generates the rule set but also selects the
important variables from the credit screening data sets.
 On the three credit screening data sets, the proposed method achieves
better average accuracy than the most popular direct rule learners
in the field of data mining.
References
 The issue of rule extraction from SVM
 D. Martens, J. Huysmans, R. Setiono et al., “Rule Extraction from Support Vector Machines: An Overview of Issues and
Application in Credit Scoring,” Studies in Computational Intelligence, vol. 80, pp. 33-63, 2008.
 N. Barakat, and A. P. Bradley, “Rule extraction from support vector machines: A review,” Neurocomputing, vol. 74, no. 1-3,
pp. 178-190, 2010.
 H. Núñez, C. Angulo, and A. Catala, “Rule-Based Learning Systems for Support Vector Machines,” Neural Processing Letters,
vol. 24, no. 1, pp. 1-18, 2006.
 H. Núñez, C. Angulo, and A. Catala, “Rule Extraction from Support Vector Machines,” Proc. European Symp. Artificial Neural
Networks, pp. 107-112, 2002.
 G. Fung, S. Sandilya, and R. B. Rao, “Rule extraction from linear support vector machines,” in Proceedings of the eleventh
ACM SIGKDD international conference on Knowledge discovery in data mining, Chicago, Illinois, USA, 2005.
 D. Martens, B. Baesens, and T.V. Gestel, “Decompositional rule extraction from support vector machines by active learning,”
IEEE Transactions on Knowledge and Data Engineering vol. 21, pp. 177-190, 2009.
 D. Martens, B. Baesens, T. Van Gestel et al., “Comprehensible credit scoring models using rule extraction from support vector
machines,” European Journal of Operational Research, vol. 183, no. 3, pp. 1466-1476, 2007.
 N. Barakat, and J. Diederich, “Eclectic Rule-Extraction from Support Vector Machines,” International Journal of
Computational Intelligence, vol. 2, no. 1, pp. 59-62, 2006.
 A.C. Chaves, M. Vellasco, and R. Tanscheit, “Fuzzy rule extraction from support vector machines,” in Proceedings of the Fifth
International Conference on Hybrid Intelligent Systems, 2005.
 N. Barakat, and A. P. Bradley, “Rule Extraction from Support Vector Machines: A Sequential Covering Approach,” IEEE
Transactions on Knowledge and Data Engineering, vol. 19, no. 6, pp. 729-741, 2007.
References
 Related Methods
 V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
 J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
 L. Breiman, J. Friedman, R. Olshen et al., Classification and Regression trees,
Monterey, CA: Wadsworth and Brooks, 1994.
 W. W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int'l Conf. Machine
Learning, pp. 115-123, 1995.
 P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
 I. Dhillon, Y. Guan, and B. Kulis, A unified view of kernel k-means, spectral
clustering and graph cuts, Univ. of Texas at Austin, 2005.
 L. D. Davis, and M. Mitchell, Handbook of genetic algorithms: Van Nostrand
Reinhold, 1991.
 D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning:
Addison-Wesley Longman Publishing Co., Inc., 1989.
Thanks for your attention
Speaker Notes

1. The title of this presentation is "Knowledge Discovery from SVM with Application to Credit Screening." My name is Yan-Cheng Chen, and my advisor is Chao-Ton Su. We come from the Dept. of IEEM, National Tsing Hua University, Taiwan. My presentation will take only 15 minutes.
2. This slide shows my outline. The first part is the introduction, covering the objective and motivation of this study. The second is related works, reviewing the references on rule extraction from SVM. The third is our proposed method. The fourth is the performance metric. The fifth is the experiment and results. The final part is the conclusion.
3. This slide briefly shows the procedure of data mining. We obtain the data set from a real application and then identify the target data. The data set needs preprocessing before it enters the classifiers. There are many classifiers in the data mining field; among them, SVM is a very powerful technique, with strong theoretical foundations and excellent classification results. In the data mining field, discovering knowledge and extracting rules are of great interest.
4. The main challenge is that SVM is regarded as a black-box analysis tool: the decision boundary of SVM lacks an explicit declarative knowledge representation and presents a complicated mathematical pattern. Constructing a rule extraction algorithm from SVM is the objective of this study.
5. This section introduces related works on the issue of rule extraction from SVM and related studies.
6. Two authors mentioned that rule extraction from SVM is a new research issue. The rule set can be represented in several different forms, and rule extraction algorithms can be divided into four types. This study considers only decision tree-based rule extraction.
7. The decision tree is a well-known rule learner: each node represents a variable, and each leaf represents an outcome. RIPPER uses a general-to-specific strategy and FOIL information gain to generate the rule set.