1. Yan-Cheng Chen and Chao-Ton Su
Dept. of Industrial Engineering and Engineering Management,
National Tsing Hua University,
Hsinchu, Taiwan
Knowledge Discovery from Support
Vector Machines with Application to
Credit Screening
3. Introduction
[Figure: the data mining procedure, from the original data sets through data preprocessing and target data selection to classifiers, mining, and rules. Classifiers shown: Decision Tree, Neural Network (NN), Nearest Neighbor Classifier, and Support Vector Machines (SVM); SVM is noted for its theoretical foundations and excellent results.]
4. Motivation & Objective
Main challenge: SVM is regarded as a black-box analysis tool.
Decision Boundary of SVM
[Figure: the trained SVM is summarized only by an n×m matrix of numeric coefficients and an m×1 weight vector, a pattern with no interpretable structure.]
Lack of explicit declarative knowledge representation
Presents a complicated mathematical pattern:
f(x) = w ⋅ ϕ(x) + b
Objective
Develop
a rule extraction
algorithm from SVM
6. Rule extraction from SVM
Rule extraction from SVM is a new research issue (Núñez, 2002; Martens, 2008; Barakat, 2010).
The expressive power of the extracted rules depends on the language used to express them (a short sketch follows this list):
Propositional rules (simple if-then expressions)
M-of-N rules (if at least M of the N conditions C1, C2, …, CN hold, then …)
Fuzzy rules
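For concreteness, a minimal Python sketch of how the first two rule forms read; the attribute names and thresholds below are hypothetical, not taken from the data sets:

```python
# Illustrative only: a hypothetical credit-applicant record and two rule styles.
applicant = {"age": 34, "income": 52000, "years_employed": 6, "owns_home": True}

# Propositional rule: a simple if-then expression.
def propositional_rule(x):
    return "good" if x["income"] > 40000 and x["years_employed"] >= 2 else "bad"

# M-of-N rule: fires if at least M of the N listed conditions hold.
def m_of_n_rule(x, m=2):
    conditions = [x["age"] > 25, x["income"] > 40000, x["owns_home"]]
    return "good" if sum(conditions) >= m else "bad"

print(propositional_rule(applicant), m_of_n_rule(applicant))  # good good
```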
Rule extraction algorithms from SVM can be divided into four
types:
Region-based rule extraction (Núñez (2006) and G. Fung (2005))
Decision tree-based rule extraction (Martens (2007, 2008) and Barakat (2006))
Sequential covering rule extraction (Barakat, 2007)
Fuzzy rule extraction (Chaves, 2005)
7. Decision tree-based rule extraction
The main idea is to generate artificial examples from the decision boundary and then use them with tree induction algorithms (Barakat, 2004).
The number of SVs strongly affects this learning-based approach, because with few data points it is difficult to generate good artificial labeled examples.
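A minimal sketch of this idea, assuming scikit-learn and synthetic data; the sampling scheme below (uniform over the data range) is a simplification rather than the exact scheme of the cited papers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Train an SVM on the original data.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Generate artificial examples over the data range and label them with the SVM,
# so the tree learns the SVM's decision boundary rather than the raw labels.
rng = np.random.default_rng(0)
X_art = rng.uniform(X.min(axis=0), X.max(axis=0), size=(2000, X.shape[1]))
y_art = svm.predict(X_art)

# Fit a shallow tree on the artificial examples and print its if-then structure.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_art, y_art)
print(export_text(tree, feature_names=[f"x{i}" for i in range(X.shape[1])]))
```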
7
8. Direct rule learners
Decision Tree (Quinlan, 1993)
A hierarchical tree structure is used to classify instances based on a series of rules. The attributes can be of any type (binary, nominal, ordinal, or quantitative), but the classes must be qualitative.
Each node represents a variable, and each leaf represents an outcome.
RIPPER (Cohen, 1995)
A general-to-specific strategy generates each rule.
FOIL's information gain measure chooses the best conjunct to add to the rule antecedent (a small sketch of this measure follows).
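As a small, hedged illustration of FOIL's information gain (standard textbook formulation, not transcribed from these slides):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule.

    p0, n0: positive/negative examples covered before adding the conjunct.
    p1, n1: positive/negative examples covered after adding it.
    """
    if p1 == 0:
        return float("-inf")  # the extended rule covers no positives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: adding a conjunct narrows coverage from 100+/90- to 60+/10-.
print(foil_gain(100, 90, 60, 10))  # positive gain -> a good conjunct to add
```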
9. Related Methods
Weighted Kernel k-means (Dhillon, 2005)
Uses the kernel trick to map all data points into a high-dimensional feature space.
Locally optimizes a number of graph partitionings.
Discovers the prototype vector corresponding to each cluster (a compact sketch follows below).
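A compact sketch of the weighted kernel k-means assignment step (my reading of the Dhillon et al. formulation; the RBF kernel and the uniform default weights are assumptions):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def weighted_kernel_kmeans(X, k, w=None, gamma=1.0, n_iter=20, seed=0):
    """Assign points to k clusters using only kernel values."""
    n = X.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    K = rbf_kernel(X, gamma=gamma)                      # kernel matrix
    labels = np.random.default_rng(seed).integers(0, k, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            wc = w[mask]
            sw = wc.sum()
            if sw == 0:
                dist[:, c] = np.inf                     # empty cluster
                continue
            # ||phi(x_i) - m_c||^2 computed purely from kernel entries
            second = K[:, mask] @ wc / sw
            third = wc @ K[np.ix_(mask, mask)] @ wc / sw**2
            dist[:, c] = np.diag(K) - 2.0 * second + third
        labels = dist.argmin(axis=1)
    return labels
```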
Genetic Algorithm (Goldberg, 1989)
Generating symbolic rules directly from data is eased by chromosome structures that can encode any type of variable.
The chromosomes represent the if-then rule conditions.
They identify a suitable value for each attribute in the high-dimensional space.
11. Proposed Method: KCGex-SVM
Rule extraction from SVMs using the weighted kernel k-means algorithm and GAs (KCGex-SVM).
Integrates SVs, the prototype centers Pi, and GAs.
The procedure constructs the rule set from the hypercube corresponding to each cluster.
Fig. (a) illustrates the scatter plot of support vectors and data points classified into three classes.
Fig. (b) shows the application of the weighted kernel k-means algorithm to determine the cluster center of each cluster.
Fig. (c) illustrates the application of GAs to identify the hypercube that defines the interval for each cluster.
Fig. (d) shows that each hypercube can generate a rule set.
12. Proposed Method: Procedure of KCGex-SVM
Step 1 involves a preprocessing step.
Step 2 generates support vectors from the SVM.
Step 3 uses the weighted kernel k-means algorithm to find the prototype center corresponding to each cluster (a sketch of Steps 1-3 follows).
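A hedged sketch of Steps 1-3; the standardization, the choice of k = 3, and the use of the absolute dual coefficients as SV weights are assumptions rather than the paper's exact settings, and `weighted_kernel_kmeans` is the sketch from the related-methods section:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: preprocessing (standardization assumed; synthetic data stands in
# for the credit screening sets).
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_std = StandardScaler().fit_transform(X)

# Step 2: train the SVM and collect its support vectors.
svm = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X_std, y)
SV = svm.support_vectors_
sv_weights = np.abs(svm.dual_coef_).ravel()   # assumed weighting: |alpha_i * y_i|

# Step 3: weighted kernel k-means over the support vectors yields one
# prototype center per cluster.
labels = weighted_kernel_kmeans(SV, k=3, w=sv_weights, gamma=0.1)
prototypes = [SV[labels == c].mean(axis=0) for c in range(3) if np.any(labels == c)]
```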
13. Proposed Method: Procedure of KCGex-SVM
Step 4 describes the chromosome design, which can include any type of variable, either discrete or continuous. The mapping from a binary string to each variable and each threshold of the rule extraction problem is completed as follows:
where t is the jth generation of a chromosome.
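As a generic illustration (not the paper's exact mapping), a binary chromosome can be decoded into one interval per attribute; the bit width and the linear scaling below are assumptions:

```python
def decode_chromosome(bits, bounds, n_bits=8):
    """Decode a binary chromosome into one (lower, upper) interval per attribute.

    bits   : flat list of 0/1 genes, 2 * n_bits genes per attribute
    bounds : list of (min, max) value ranges, one per attribute
    """
    intervals = []
    for a, (lo, hi) in enumerate(bounds):
        chunk = bits[2 * n_bits * a: 2 * n_bits * (a + 1)]
        raw = [int("".join(map(str, chunk[i * n_bits:(i + 1) * n_bits])), 2)
               for i in (0, 1)]
        vals = sorted(lo + (hi - lo) * r / (2 ** n_bits - 1) for r in raw)
        intervals.append(tuple(vals))
    return intervals  # each interval is one side of the rule's hypercube

# Example with two hypothetical attributes:
bits = [1, 0, 1, 1, 0, 0, 1, 0,  1, 1, 1, 0, 0, 0, 0, 1,
        0, 0, 0, 1, 0, 0, 1, 0,  0, 1, 1, 1, 1, 1, 0, 0]
print(decode_chromosome(bits, bounds=[(0.0, 100.0), (0.0, 10.0)]))
```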
Step 5 evaluates each chromosome using the defined fitness function.
where Cd1, Cd2, and Cy are the penalty parameters greater than zero.
fitness = C_d1 Σ_{x_i ∈ C_k, SV_j ∈ C_k} w_j (x_i − SV_j)² − C_d2 Σ_{x_i ∈ C_k, SV_j ∉ C_k} w_j (x_i − SV_j)² + C_y Σ_{i=1}^{n} (ŷ_i − y_i)²
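A sketch of one plausible reading of this fitness (the grouping of the two distance sums and the minimization convention are assumptions):

```python
import numpy as np

def fitness(points, point_cluster, sv, sv_cluster, sv_weights,
            y_rule, y_svm, c_d1=1.0, c_d2=1.0, c_y=1.0):
    """Hedged reading of the KCGex-SVM fitness (lower is better here): keep
    points close to same-cluster SVs, far from other clusters' SVs, and make
    the rule's predictions agree with the SVM's outputs."""
    same = other = 0.0
    for xi, ci in zip(points, point_cluster):
        for svj, cj, wj in zip(sv, sv_cluster, sv_weights):
            d = wj * float(np.sum((xi - svj) ** 2))
            if ci == cj:
                same += d
            else:
                other += d
    mismatch = float(np.sum((np.asarray(y_rule) - np.asarray(y_svm)) ** 2))
    return c_d1 * same - c_d2 * other + c_y * mismatch
```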
14. Proposed Method: Procedure of KCGex-SVM
Step 6 involves breeding new organisms through crossover and mutation, using roulette wheel selection (a generic skeleton for Steps 6-7 is sketched after this list).
Step 7 repeats the iterations until the termination condition is reached.
Step 8 prunes redundant rules from the candidate rules.
Step 9 uses the chromosome with the best fitness value to construct the rule set.
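A generic GA skeleton for Steps 6 and 7 (roulette-wheel selection, one-point crossover, bit-flip mutation); the rates, the higher-is-better score convention, and `score_fn` are placeholders rather than the paper's settings:

```python
import random

def roulette_select(pop, scores):
    """Roulette wheel selection: probability proportional to a nonnegative score."""
    total = sum(scores)
    pick, acc = random.uniform(0, total), 0.0
    for ind, s in zip(pop, scores):
        acc += s
        if acc >= pick:
            return ind
    return pop[-1]

def evolve(pop, score_fn, n_gen=1000, cx_rate=0.4, mut_rate=0.05):
    """Steps 6-7: breed via crossover/mutation with roulette selection, iterate."""
    for _ in range(n_gen):
        scores = [score_fn(ind) for ind in pop]
        nxt = []
        while len(nxt) < len(pop):
            p1, p2 = roulette_select(pop, scores), roulette_select(pop, scores)
            c1, c2 = p1[:], p2[:]
            if random.random() < cx_rate:                 # one-point crossover
                cut = random.randrange(1, len(p1))
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                        # bit-flip mutation
                for i in range(len(child)):
                    if random.random() < mut_rate:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:len(pop)]
    return pop
```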
16. Performance Evaluation
In the two-class case (classes yes and no, presence or absence, and so on), a single prediction has four possible outcomes.
Accuracy: (TP + TN) / ( TP + FN + FP + TN)
Comprehensibility indicates the number of rules and the number of
antecedent conditions.
                        Predicted Class
                        Yes                    No
Actual Class    Yes     TP (True positive)     FN (False negative)
                No      FP (False positive)    TN (True negative)
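For example, with hypothetical counts TP = 50, FN = 10, FP = 5, TN = 35:

```python
tp, fn, fp, tn = 50, 10, 5, 35             # hypothetical counts
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(accuracy)                            # 0.85
```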
18. Experiment
Three credit screening data sets were selected from the
University of California, Irvine (UCI) repository
Data set    Examples    Class ratio    Continuous    Discrete
Japanese    124         1:0.48         5             5
Austrian    690         1:1.24         8             5
German      1000        1:0.4285       13            6
19. Experiment
Numerical Experiment
The experiment compared KCGex-SVM with direct rule learners such as C4.5 and RIPPER.
Parameter settings
Parameter               Setting
Kernel function         Radial basis kernel
C and σ                 Grid search method
Population size         200
Crossover rate          0.2 to 0.6
Mutation rate           0.01 to 0.1
Termination condition   1000 iterations reached, or the same results after 100 iterations
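A sketch of the grid search over C and the RBF width, assuming scikit-learn (which parameterizes the width σ through gamma); the grids and data below are placeholders, not the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)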
21. Results
The following table compares KCGex-SVM with the other rule learners on the three credit screening data sets.
Our proposed method, KCGex-SVM, obtains better performance in terms of accuracy.

Data set   Index         SVMs    ALBA (C4.5)   ALBA (RIPPER)   KCGex-SVM   C4.5    RIPPER
German     Acc.          78.98   73.53         73.05           75.11       72.67   71.77
           # of rules    -       64            11.6            5           9       3
           # of ante.    -       -             -               10          11      5
Austrian   Acc.          85.04   -             -               84.78       84.78   84.78
           # of rules    -       -             -               5           2       2
           # of ante.    -       -             -               9           2       1
Japanese   Acc.          68.29   -             -               68.29       68.29   65.85
           # of rules    -       -             -               2           2       2
           # of ante.    -       -             -               2           2       3
Ave. of acc.             77.44   73.53         73.05           76.06       75.25   74.13
Total # of rules         -       64            11.6            12          13      7
Total # of ante.         -       -             -               21          15      9
22. Results
The rule sets of the three credit screening data sets
No.   German                                       Austrian                                      Japanese
#1    X1 = 1 & X2 > 17 & X3 > 47, Then Class II    X8 = 1 & X9 = 1, Then Class I                 X10 ≦ 2, Then Class II
#2    X1 = 1 & X2 ≧ 16, Then Class II              X8 = 1 & X14 ≧ 259, Then Class I              Otherwise Class I
#3    X1 = 2 & X2 ≧ 24 & X5 ≦ 2, Then Class II     X8 = 1 & X13 ≦ 110 & X14 ≦ 1, Then Class I
#4    X16 ≧ 1 and X11 ≦ 2, Then Class II           X8 = 1 & X6 = 8, Then Class I
#5    Otherwise Class I                            Otherwise Class II
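Read operationally, the Austrian rule set above is just a small if-then classifier; the sketch below transcribes it literally (the attribute keys 'X1'-'X14' follow the table's indexing):

```python
def austrian_rules(x):
    """Classify one Austrian-credit record using the extracted rule set.
    x is a dict mapping attribute names 'X1'..'X14' to their values."""
    if x["X8"] == 1 and x["X9"] == 1:
        return "Class I"                        # rule #1
    if x["X8"] == 1 and x["X14"] >= 259:
        return "Class I"                        # rule #2
    if x["X8"] == 1 and x["X13"] <= 110 and x["X14"] <= 1:
        return "Class I"                        # rule #3
    if x["X8"] == 1 and x["X6"] == 8:
        return "Class I"                        # rule #4
    return "Class II"                           # rule #5: otherwise
```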
24. Conclusion
KCGex-SVM combines GAs, prototype centers, and the information provided by SVMs to enhance the explanation capability of SVMs.
KCGex-SVM can not only generate the rule set but can also select the important variables from the credit screening data sets.
Based on the three credit screening data sets, the proposed method achieves better average accuracy than the most popular direct rule learners in the field of data mining.
25. References
The issue of rule extraction from SVM
D. Martens, J. Huysmans, R. Setiono et al., “Rule Extraction from Support Vector Machines: An Overview of Issues and
Application in Credit Scoring,” Studies in Computational Intelligence, vol. 80, pp. 33-63, 2008.
N. Barakat, and A. P. Bradley, “Rule extraction from support vector machines: A review,” Neurocomputing, vol. 74, no. 1-3,
pp. 178-190, 2010.
H. Núñez, C. Angulo, and A. Catala, “Rule-Based Learning Systems for Support Vector Machines,” Neural Processing Letters, vol. 24, no. 1, pp. 1-18, 2006.
H. Núñez, C. Angulo, and A. Catala, “Rule Extraction from Support Vector Machines,” Proc. European Symp. Artificial Neural
Networks, pp. 107-112, 2002.
G. Fung, S. Sandilya, and R. B. Rao, “Rule extraction from linear support vector machines,” in Proceedings of the eleventh
ACM SIGKDD international conference on Knowledge discovery in data mining, Chicago, Illinois, USA, 2005.
D. Martens, B. Baesens, and T. Van Gestel, “Decompositional rule extraction from support vector machines by active learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 177-190, 2009.
D. Martens, B. Baesens, T. Van Gestel et al., “Comprehensible credit scoring models using rule extraction from support vector
machines,” European Journal of Operational Research, vol. 183, no. 3, pp. 1466-1476, 2007.
N. Barakat, and J. Diederich, “Eclectic Rule-Extraction from Support Vector Machines,” International Journal of
Computational Intelligence, vol. 2, no. 1, pp. 59-62, 2006.
A.C. Chaves, M. Vellasco, and R. Tanscheit, “Fuzzy rule extraction from support vector machines,” in Proceedings of the Fifth
International Conference on Hybrid Intelligent Systems, 2005.
N. Barakat, and A. P. Bradley, “Rule Extraction from Support Vector Machines: A Sequential Covering Approach,” IEEE
Transactions on Knowledge and Data Engineering, vol. 19, no. 6, pp. 729-741, 2007.
26. References
Related Methods
V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
L. Breiman, J. Friedman, R. Olshen et al., Classification and Regression Trees, Monterey, CA: Wadsworth and Brooks, 1984.
W. W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int'l Conf. Machine Learning, pp. 115-123, 1995.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2005.
I. Dhillon, Y. Guan, and B. Kulis, A unified view of kernel k-means, spectral
clustering and graph cuts, Univ. of Texas at Austin, 2005.
L. D. Davis, and M. Mitchell, Handbook of genetic algorithms: Van Nostrand
Reinhold, 1991.
D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning:
Addison-Wesley Longman Publishing Co., Inc., 1989.
The title of this presentation is Knowledge Discovery from Support Vector Machines with Application to Credit Screening.
My name is Yan-Cheng Chen, and my advisor is Chao-Ton Su.
We are from the Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan.
My presentation will take only 15 minutes.
This slide shows my outline.
The first part is the introduction, in which I will present the motivation and objective of this study.
The second part covers related works; I will introduce the references related to rule extraction from SVM.
The third part is our proposed method.
The fourth part is the performance metrics.
The fifth part presents the experiment and results.
The final part is the conclusion.
This slide briefly shows the procedure of data mining. We obtain a real data set from a real application, identify the target data, and preprocess the data before feeding it into the classifiers.
There are many classifiers in the data mining field. Among these data mining techniques, SVM is a very powerful one.
SVM has strong theoretical foundations and excellent classification results.
In the data mining field, discovering knowledge and extracting rules are of great interest.
The main challenge is that SVM is regarded as a black-box analysis tool.
This is because the decision boundary of SVM lacks an explicit declarative knowledge representation.
It also presents a complicated mathematical pattern.
Constructing a rule extraction algorithm from SVM is the objective of this study.
This section introduces related works on rule extraction from SVM and related studies.
Several authors have pointed out that rule extraction from SVM is a new research issue.
The extracted rule sets can be represented in several different forms.
Rule extraction algorithms can be divided into four types.
This study only considers decision tree-based rule extraction techniques.
The decision tree is a very well-known rule learner. Each node represents a variable, and each leaf represents an outcome.
RIPPER uses a general-to-specific strategy and FOIL's information gain to generate the rule set.