1. Yan-Cheng Chen and Chao-Ton Su
Dept. of Industrial Engineering and Engineering Management,
National Tsing Hua University,
Hsinchu, Taiwan
Knowledge Discovery from Support
Vector Machines with Application to
Credit Screening
3. Introduction
[Figure: the data mining procedure, from the original data sets through data preprocessing and target data selection to classifiers, mining, and rules. Classifiers shown: Decision Tree, Neural Network (NN), Nearest Neighbor Classifier, and Support Vector Machines (SVM); SVM is noted for its theoretical foundations and excellent results.]
4. Motivation & Objective
Main challenge: SVM is regarded as a black-box analysis tool.
Decision Boundary of SVM
[Figure: the trained SVM is summarized only by an n×m matrix of numeric coefficients and an m×1 weight vector, a pattern with no interpretable structure.]
Lack of explicit declarative knowledge representation
Presents a complicated mathematical pattern:
f(x) = w ⋅ ϕ(x) + b
Objective
Develop
a rule extraction
algorithm from SVM
6. Rule extraction from SVM
Rule extraction from SVM is a new research issue (Núñez, 2002; Martens, 2008; Barakat, 2010).
The expressive power of the extracted rules depends on the language used to express them (a short sketch follows this list):
Propositional rules (simple if-then expressions)
M-of-N rules (if at least M of the N conditions C1, C2, …, CN hold, then …)
Fuzzy rules
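For concreteness, a minimal Python sketch of how the first two rule forms read; the attribute names and thresholds below are hypothetical, not taken from the data sets:

```python
# Illustrative only: a hypothetical credit-applicant record and two rule styles.
applicant = {"age": 34, "income": 52000, "years_employed": 6, "owns_home": True}

# Propositional rule: a simple if-then expression.
def propositional_rule(x):
    return "good" if x["income"] > 40000 and x["years_employed"] >= 2 else "bad"

# M-of-N rule: fires if at least M of the N listed conditions hold.
def m_of_n_rule(x, m=2):
    conditions = [x["age"] > 25, x["income"] > 40000, x["owns_home"]]
    return "good" if sum(conditions) >= m else "bad"

print(propositional_rule(applicant), m_of_n_rule(applicant))  # good good
```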
Rule extraction algorithms from SVM can be divided into four
types:
Region-based rule extraction (Núñez (2006) and G. Fung (2005))
Decision tree-based rule extraction (Martens (2007, 2008) and Barakat (2006))
Sequential covering rule extraction (Barakat, 2007)
Fuzzy rule extraction (Chaves, 2005)
7. Decision tree-based rule extraction
The main idea is to generate artificial examples from the decision boundary and then use them with tree induction algorithms (Barakat, 2004).
The number of SVs strongly affects this learning-based approach, because with few data points it is difficult to generate good artificial labeled examples.
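A minimal sketch of this idea, assuming scikit-learn and synthetic data; the sampling scheme below (uniform over the data range) is a simplification rather than the exact scheme of the cited papers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Train an SVM on the original data.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Generate artificial examples over the data range and label them with the SVM,
# so the tree learns the SVM's decision boundary rather than the raw labels.
rng = np.random.default_rng(0)
X_art = rng.uniform(X.min(axis=0), X.max(axis=0), size=(2000, X.shape[1]))
y_art = svm.predict(X_art)

# Fit a shallow tree on the artificial examples and print its if-then structure.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_art, y_art)
print(export_text(tree, feature_names=[f"x{i}" for i in range(X.shape[1])]))
```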
7
8. Direct rule learners
Decision Tree (Quinlan, 1993)
A hierarchical tree structure is used to classify instances based on a series of rules. The attributes can be of any type (binary, nominal, ordinal, or quantitative), but the classes must be qualitative.
Each node represents a variable, and each leaf represents an outcome.
RIPPER (Cohen, 1995)
A general-to-specific strategy generates each rule.
FOIL's information gain measure chooses the best conjunct to add to the rule antecedent (a small sketch of this measure follows).
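As a small, hedged illustration of FOIL's information gain (standard textbook formulation, not transcribed from these slides):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule.

    p0, n0: positive/negative examples covered before adding the conjunct.
    p1, n1: positive/negative examples covered after adding it.
    """
    if p1 == 0:
        return float("-inf")  # the extended rule covers no positives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: adding a conjunct narrows coverage from 100+/90- to 60+/10-.
print(foil_gain(100, 90, 60, 10))  # positive gain -> a good conjunct to add
```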
9. Related Methods
Weighted Kernel k-means (Dhillon, 2005)
Uses the kernel trick to map all data points into a high-dimensional feature space.
Locally optimizes a number of graph partitionings.
Discovers the prototype vector corresponding to each cluster (a compact sketch follows below).
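A compact sketch of the weighted kernel k-means assignment step (my reading of the Dhillon et al. formulation; the RBF kernel and the uniform default weights are assumptions):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def weighted_kernel_kmeans(X, k, w=None, gamma=1.0, n_iter=20, seed=0):
    """Assign points to k clusters using only kernel values."""
    n = X.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    K = rbf_kernel(X, gamma=gamma)                      # kernel matrix
    labels = np.random.default_rng(seed).integers(0, k, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            wc = w[mask]
            sw = wc.sum()
            if sw == 0:
                dist[:, c] = np.inf                     # empty cluster
                continue
            # ||phi(x_i) - m_c||^2 computed purely from kernel entries
            second = K[:, mask] @ wc / sw
            third = wc @ K[np.ix_(mask, mask)] @ wc / sw**2
            dist[:, c] = np.diag(K) - 2.0 * second + third
        labels = dist.argmin(axis=1)
    return labels
```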
Genetic Algorithm (Goldberg, 1989)
Generating symbolic rules directly from data is eased by chromosome structures that can encode any type of variable.
The chromosomes represent the if-then rule conditions.
They identify a suitable value for each attribute in the high-dimensional space.
11. Proposed Method: KCGex-SVM
Rule extraction from SVMs using the weighted kernel k-means algorithm and GAs (KCGex-SVM).
Integrates SVs, the prototype centers Pi, and GAs.
The procedure constructs the rule set from the hypercube corresponding to each cluster.
Fig. (a) illustrates the scatter plot of support vectors and data points classified into three classes.
Fig. (b) shows the application of the weighted kernel k-means algorithm to determine the cluster center of each cluster.
Fig. (c) illustrates the application of GAs to identify the hypercube that defines the interval for each cluster.
Fig. (d) shows that each hypercube can generate a rule set.
12. Proposed Method: Procedure of KCGex-SVM
Step 1 involves a preprocessing step.
Step 2 generates support vectors from the SVM.
Step 3 uses the weighted kernel k-means algorithm to find the prototype center corresponding to each cluster (a sketch of Steps 1-3 follows).
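A hedged sketch of Steps 1-3; the standardization, the choice of k = 3, and the use of the absolute dual coefficients as SV weights are assumptions rather than the paper's exact settings, and `weighted_kernel_kmeans` is the sketch from the related-methods section:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: preprocessing (standardization assumed; synthetic data stands in
# for the credit screening sets).
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_std = StandardScaler().fit_transform(X)

# Step 2: train the SVM and collect its support vectors.
svm = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X_std, y)
SV = svm.support_vectors_
sv_weights = np.abs(svm.dual_coef_).ravel()   # assumed weighting: |alpha_i * y_i|

# Step 3: weighted kernel k-means over the support vectors yields one
# prototype center per cluster.
labels = weighted_kernel_kmeans(SV, k=3, w=sv_weights, gamma=0.1)
prototypes = [SV[labels == c].mean(axis=0) for c in range(3) if np.any(labels == c)]
```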
13. Proposed Method: Procedure of KCGex-SVM
Step 4 describes the chromosome design, which can include any type of variable, either discrete or continuous. The mapping from a binary string to each variable and each threshold of the rule extraction problem is completed as follows:
where t is the jth generation of a chromosome.
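As a generic illustration (not the paper's exact mapping), a binary chromosome can be decoded into one interval per attribute; the bit width and the linear scaling below are assumptions:

```python
def decode_chromosome(bits, bounds, n_bits=8):
    """Decode a binary chromosome into one (lower, upper) interval per attribute.

    bits   : flat list of 0/1 genes, 2 * n_bits genes per attribute
    bounds : list of (min, max) value ranges, one per attribute
    """
    intervals = []
    for a, (lo, hi) in enumerate(bounds):
        chunk = bits[2 * n_bits * a: 2 * n_bits * (a + 1)]
        raw = [int("".join(map(str, chunk[i * n_bits:(i + 1) * n_bits])), 2)
               for i in (0, 1)]
        vals = sorted(lo + (hi - lo) * r / (2 ** n_bits - 1) for r in raw)
        intervals.append(tuple(vals))
    return intervals  # each interval is one side of the rule's hypercube

# Example with two hypothetical attributes:
bits = [1, 0, 1, 1, 0, 0, 1, 0,  1, 1, 1, 0, 0, 0, 0, 1,
        0, 0, 0, 1, 0, 0, 1, 0,  0, 1, 1, 1, 1, 1, 0, 0]
print(decode_chromosome(bits, bounds=[(0.0, 100.0), (0.0, 10.0)]))
```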
Step 5 evaluates each chromosome using the defined fitness function.
where Cd1, Cd2, and Cy are the penalty parameters greater than zero.
fitness = C_d1 Σ_{x_i ∈ C_k, SV_j ∈ C_k} w_j (x_i − SV_j)² − C_d2 Σ_{x_i ∈ C_k, SV_j ∉ C_k} w_j (x_i − SV_j)² + C_y Σ_{i=1}^{n} (ŷ_i − y_i)²
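A sketch of one plausible reading of this fitness (the grouping of the two distance sums and the minimization convention are assumptions):

```python
import numpy as np

def fitness(points, point_cluster, sv, sv_cluster, sv_weights,
            y_rule, y_svm, c_d1=1.0, c_d2=1.0, c_y=1.0):
    """Hedged reading of the KCGex-SVM fitness (lower is better here): keep
    points close to same-cluster SVs, far from other clusters' SVs, and make
    the rule's predictions agree with the SVM's outputs."""
    same = other = 0.0
    for xi, ci in zip(points, point_cluster):
        for svj, cj, wj in zip(sv, sv_cluster, sv_weights):
            d = wj * float(np.sum((xi - svj) ** 2))
            if ci == cj:
                same += d
            else:
                other += d
    mismatch = float(np.sum((np.asarray(y_rule) - np.asarray(y_svm)) ** 2))
    return c_d1 * same - c_d2 * other + c_y * mismatch
```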
14. Proposed Method: Procedure of KCGex-SVM
Step 6 involves breeding new organisms through crossover and mutation, using roulette wheel selection (a generic skeleton for Steps 6-7 is sketched after this list).
Step 7 repeats the iterations until the termination condition is reached.
Step 8 prunes redundant rules from the candidate rules.
Step 9 uses the chromosome with the best fitness value to construct the rule set.
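A generic GA skeleton for Steps 6 and 7 (roulette-wheel selection, one-point crossover, bit-flip mutation); the rates, the higher-is-better score convention, and `score_fn` are placeholders rather than the paper's settings:

```python
import random

def roulette_select(pop, scores):
    """Roulette wheel selection: probability proportional to a nonnegative score."""
    total = sum(scores)
    pick, acc = random.uniform(0, total), 0.0
    for ind, s in zip(pop, scores):
        acc += s
        if acc >= pick:
            return ind
    return pop[-1]

def evolve(pop, score_fn, n_gen=1000, cx_rate=0.4, mut_rate=0.05):
    """Steps 6-7: breed via crossover/mutation with roulette selection, iterate."""
    for _ in range(n_gen):
        scores = [score_fn(ind) for ind in pop]
        nxt = []
        while len(nxt) < len(pop):
            p1, p2 = roulette_select(pop, scores), roulette_select(pop, scores)
            c1, c2 = p1[:], p2[:]
            if random.random() < cx_rate:                 # one-point crossover
                cut = random.randrange(1, len(p1))
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                        # bit-flip mutation
                for i in range(len(child)):
                    if random.random() < mut_rate:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:len(pop)]
    return pop
```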
16. Performance Evaluation
In the two-class case (classes yes and no, presence or absence, and so on), a single prediction has four possible outcomes.
Accuracy: (TP + TN) / ( TP + FN + FP + TN)
Comprehensibility indicates the number of rules and the number of
antecedent conditions.
                        Predicted Class
                        Yes                    No
Actual Class    Yes     TP (True positive)     FN (False negative)
                No      FP (False positive)    TN (True negative)
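For example, with hypothetical counts TP = 50, FN = 10, FP = 5, TN = 35:

```python
tp, fn, fp, tn = 50, 10, 5, 35             # hypothetical counts
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(accuracy)                            # 0.85
```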
18. Experiment
Three credit screening data sets were selected from the
University of California, Irvine (UCI) repository
Data set    Examples    Class ratio    Continuous    Discrete
Japanese    124         1:0.48         5             5
Austrian    690         1:1.24         8             5
German      1000        1:0.4285       13            6
19. Experiment
Numerical Experiment
The experiment compared KCGex-SVM with direct rule learners such as C4.5 and RIPPER.
Parameter settings
Parameter               Setting
Kernel function         Radial basis kernel
C and σ                 Grid search method
Population size         200
Crossover rate          0.2 to 0.6
Mutation rate           0.01 to 0.1
Termination condition   1000 iterations reached, or the same results after 100 iterations
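A sketch of the grid search over C and the RBF width, assuming scikit-learn (which parameterizes the width σ through gamma); the grids and data below are placeholders, not the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)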
21. Results
The following table compares KCGex-SVM with the other rule learners on the three credit screening data sets.
Our proposed method, KCGex-SVM, obtains better performance in terms of accuracy.

Data set   Index         SVMs    ALBA (C4.5)   ALBA (RIPPER)   KCGex-SVM   C4.5    RIPPER
German     Acc.          78.98   73.53         73.05           75.11       72.67   71.77
           # of rules    -       64            11.6            5           9       3
           # of ante.    -       -             -               10          11      5
Austrian   Acc.          85.04   -             -               84.78       84.78   84.78
           # of rules    -       -             -               5           2       2
           # of ante.    -       -             -               9           2       1
Japanese   Acc.          68.29   -             -               68.29       68.29   65.85
           # of rules    -       -             -               2           2       2
           # of ante.    -       -             -               2           2       3
Ave. of acc.             77.44   73.53         73.05           76.06       75.25   74.13
Total # of rules         -       64            11.6            12          13      7
Total # of ante.         -       -             -               21          15      9
22. Results
The rule sets of the three credit screening data sets
No.   German                                       Austrian                                      Japanese
#1    X1 = 1 & X2 > 17 & X3 > 47, Then Class II    X8 = 1 & X9 = 1, Then Class I                 X10 ≦ 2, Then Class II
#2    X1 = 1 & X2 ≧ 16, Then Class II              X8 = 1 & X14 ≧ 259, Then Class I              Otherwise Class I
#3    X1 = 2 & X2 ≧ 24 & X5 ≦ 2, Then Class II     X8 = 1 & X13 ≦ 110 & X14 ≦ 1, Then Class I
#4    X16 ≧ 1 and X11 ≦ 2, Then Class II           X8 = 1 & X6 = 8, Then Class I
#5    Otherwise Class I                            Otherwise Class II
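Read operationally, the Austrian rule set above is just a small if-then classifier; the sketch below transcribes it literally (the attribute keys 'X1'-'X14' follow the table's indexing):

```python
def austrian_rules(x):
    """Classify one Austrian-credit record using the extracted rule set.
    x is a dict mapping attribute names 'X1'..'X14' to their values."""
    if x["X8"] == 1 and x["X9"] == 1:
        return "Class I"                        # rule #1
    if x["X8"] == 1 and x["X14"] >= 259:
        return "Class I"                        # rule #2
    if x["X8"] == 1 and x["X13"] <= 110 and x["X14"] <= 1:
        return "Class I"                        # rule #3
    if x["X8"] == 1 and x["X6"] == 8:
        return "Class I"                        # rule #4
    return "Class II"                           # rule #5: otherwise
```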
24. Conclusion
KCGex-SVM combines GAs, prototype centers, and the information provided by SVMs to enhance the explanation capability of SVMs.
KCGex-SVM can not only generate the rule set but can also select the important variables from the credit screening data sets.
Based on the three credit screening data sets, the proposed method achieves better average accuracy than the most popular direct rule learners in the field of data mining.
25. References
The issue of rule extraction from SVM
D. Martens, J. Huysmans, R. Setiono et al., “Rule Extraction from Support Vector Machines: An Overview of Issues and
Application in Credit Scoring,” Studies in Computational Intelligence, vol. 80, pp. 33-63, 2008.
N. Barakat, and A. P. Bradley, “Rule extraction from support vector machines: A review,” Neurocomputing, vol. 74, no. 1-3,
pp. 178-190, 2010.
H. Núñez, C. Angulo, and A. Catala, “Rule-Based Learning Systems for Support Vector Machines,” Neural Processing Letters, vol. 24, no. 1, pp. 1-18, 2006.
H. Núñez, C. Angulo, and A. Catala, “Rule Extraction from Support Vector Machines,” Proc. European Symp. Artificial Neural
Networks, pp. 107-112, 2002.
G. Fung, S. Sandilya, and R. B. Rao, “Rule extraction from linear support vector machines,” in Proceedings of the eleventh
ACM SIGKDD international conference on Knowledge discovery in data mining, Chicago, Illinois, USA, 2005.
D. Martens, B. Baesens, and T. Van Gestel, “Decompositional rule extraction from support vector machines by active learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 177-190, 2009.
D. Martens, B. Baesens, T. Van Gestel et al., “Comprehensible credit scoring models using rule extraction from support vector
machines,” European Journal of Operational Research, vol. 183, no. 3, pp. 1466-1476, 2007.
N. Barakat, and J. Diederich, “Eclectic Rule-Extraction from Support Vector Machines,” International Journal of
Computational Intelligence, vol. 2, no. 1, pp. 59-62, 2006.
A.C. Chaves, M. Vellasco, and R. Tanscheit, “Fuzzy rule extraction from support vector machines,” in Proceedings of the Fifth
International Conference on Hybrid Intelligent Systems, 2005.
N. Barakat, and A. P. Bradley, “Rule Extraction from Support Vector Machines: A Sequential Covering Approach,” IEEE
Transactions on Knowledge and Data Engineering, vol. 19, no. 6, pp. 729-741, 2007.
26. References
Related Methods
V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
L. Breiman, J. Friedman, R. Olshen et al., Classification and Regression Trees, Monterey, CA: Wadsworth and Brooks, 1984.
W. W. Cohen, “Fast Effective Rule Induction,” Proc. 12th Int'l Conf. Machine Learning, pp. 115-123, 1995.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2005.
I. Dhillon, Y. Guan, and B. Kulis, A unified view of kernel k-means, spectral
clustering and graph cuts, Univ. of Texas at Austin, 2005.
L. D. Davis, and M. Mitchell, Handbook of genetic algorithms: Van Nostrand
Reinhold, 1991.
D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning:
Addison-Wesley Longman Publishing Co., Inc., 1989.
The title of this presentation is Knowledge Discovery from Support Vector Machines with Application to Credit Screening.
My name is Yan-Cheng Chen, and my advisor is Chao-Ton Su.
We are from the Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Taiwan.
My presentation will take only 15 minutes.
This slide shows my outline.
The first part is the introduction, in which I will present the motivation and objective of this study.
The second part covers related works; I will introduce the references related to rule extraction from SVM.
The third part is our proposed method.
The fourth part is the performance metrics.
The fifth part presents the experiment and results.
The final part is the conclusion.
This slide briefly shows the procedure of data mining. We obtain a real data set from a real application, identify the target data, and preprocess the data before feeding it into the classifiers.
There are many classifiers in the data mining field. Among these data mining techniques, SVM is a very powerful one.
SVM has strong theoretical foundations and excellent classification results.
In the data mining field, discovering knowledge and extracting rules are of great interest.
The main challenge is that SVM is regarded as a black-box analysis tool.
This is because the decision boundary of SVM lacks an explicit declarative knowledge representation.
It also presents a complicated mathematical pattern.
Constructing a rule extraction algorithm from SVM is the objective of this study.
This section introduces related works on rule extraction from SVM and related studies.
Several authors have pointed out that rule extraction from SVM is a new research issue.
The extracted rule sets can be represented in several different forms.
Rule extraction algorithms can be divided into four types.
This study only considers decision tree-based rule extraction techniques.
The decision tree is a very well-known rule learner. Each node represents a variable, and each leaf represents an outcome.
RIPPER uses a general-to-specific strategy and FOIL's information gain to generate the rule set.