05 classification 1 decision tree and rule based classification

Instructor: Dr. หทัยรัตน์ เกตุมณีชัยรัตน์
Department of Information and Production Technology Management
Chapter 5: Data Classification 1
(Classification 1)

Classification 1
 Basic Concept
 Decision Tree
 Rule-Based Classification

Data Classification (Classification)
 Classification is one type of data mining task. Each example in the training set has one attribute that states the class of that example. This attribute is called the class label, and its values are categorical
 Classification is commonly applied to
 Credit approval
 Target marketing
 Medical diagnosis
 Treatment effectiveness analysis

Classification: two main steps
 Model construction:
 The set of examples used to build the model is called the training set
 Each example has one attribute that gives its predefined class
 The resulting model expresses what was learned as classification rules, decision trees, or mathematical formulas
 Model usage:
 The model is used to classify future examples, but its accuracy must be estimated before it is put to use:
 Compare the known class labels of the examples in the test set with the classes predicted by the model
 The accuracy rate is the percentage of test-set examples that the model classifies correctly
 The test set must be independent of the training set; otherwise the estimate is too optimistic and overfitting goes unnoticed

Step 1: Model Construction
Training data:
NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no
A classification algorithm learns from the training data and produces the classifier (model):
IF rank = 'Professor'
OR (rank = 'Assistant Prof' AND years > 6)
THEN tenured = 'yes'

Step 2: Use the Model in Prediction
The classifier is first applied to the testing data:
NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes
It is then applied to unseen data:
(Jeff, Professor, 4) → Tenured?
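The two steps can be tried out in a few lines of Python. This is only a sketch: the function name `predict_tenured` is made up for illustration, and the rule and the records are copied from the two slides above.

```python
def predict_tenured(rank: str, years: int) -> str:
    """Classifier from Step 1:
    IF rank = 'Professor' OR (rank = 'Assistant Prof' AND years > 6) THEN 'yes'."""
    if rank == "Professor" or (rank == "Assistant Prof" and years > 6):
        return "yes"
    return "no"

# (name, rank, years, known label) copied from the testing-data table.
test_set = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Accuracy rate: share of test examples whose prediction matches the known label.
hits = sum(predict_tenured(rank, years) == label for _, rank, years, label in test_set)
accuracy = hits / len(test_set)

# Unseen record (Jeff, Professor, 4).
jeff = predict_tenured("Professor", 4)
print(accuracy, jeff)
```

On this small test set every prediction matches the known label, and Jeff is predicted as tenured.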

Overfitting
 Overfitting is the phenomenon in which the model or classifier fits the training set too closely: it classifies the examples in the training set correctly but does not work well on examples in general
 It typically occurs because
 The data set is too small
 The training data contain anomalies
 For example, if every person named "John" in the training set is a profitable customer, the classifier may wrongly conclude that any customer named John is profitable

Supervised vs. Unsupervised Learning
 Supervised learning (classification)
 The class of each example in the training set is known in advance. New examples are classified by the model built from the training set. Classification is an example of supervised learning
 Unsupervised learning (clustering)
 The class of each example in the training set is not known in advance. The aim is to discover the classes or groups hidden in the data. Clustering is an example of unsupervised learning

Preparing Data for Classification
 Data cleaning
 Handles missing values and reduces noise or outliers
 Relevance analysis
 Selects the features relevant to the data mining task (feature selection) and removes redundant or irrelevant attributes
 Data transformation
 Includes normalization, which rescales values into a specified range, and generalization, which replaces values with more general ones

Classification Methods
 Decision tree
 Bayesian classification
 Neural networks

Decision Tree
 Decision tree learning classifies data into classes by testing the attributes of the data. The learned tree shows which attributes determine the classification and how much each attribute matters
 The resulting classification therefore helps users analyze the data and make more accurate decisions

Decision Trees
 Data set (each column is an attribute; each cell is an attribute value)
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

Components of a learned decision tree:
age?                          (internal node at the root; age is the splitting attribute)
  <=30   → student?           (branch leading to an internal node)
             no  → no         (leaf node)
             yes → yes        (leaf node)
  31..40 → yes                (leaf node)
  >40    → credit rating?
             fair      → yes
             excellent → no

Components of a learned decision tree:
 Internal node: an attribute of the data. When a record reaches the node, the value of this attribute decides which direction the record takes. The internal node at which the tree starts is called the root node
 Branch (link): a value of the attribute at the internal node from which the branch grows. With a multi-way split, an internal node has as many branches as its attribute has values
 Leaf node: a class, which is the result of classifying the data

Why Use a Decision Tree?
 It produces results quickly compared with other techniques
 The result is easy to use and can be converted into rules
 It can be applied to SQL queries over the data
 Its accuracy is often high

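Examples of Test Conditions by Attribute Type
 Nominal attribute
 accept: Yes | No
 color (binary split): {Red, Blue} | {Green}
 color (multi-way split): Red | Blue | Green
 Ordinal attribute
 level (binary split): {Low, Medium} | {High}
 level (multi-way split): Low | Medium | High
 Continuous attribute
 income (binary split): < 50 | >= 50
 income (multi-way split): < 10 | [10, 20) | [20, 30) | >= 30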

Factors to Consider
 Attribute type
 Nominal: values fall into groups
 Ordinal: values fall into groups that have an order
 Continuous: values are continuous
 Number of ways to split
 2-way split (binary split)
 Multi-way split

Decision Tree Algorithm
 The decision tree is built top-down and recursively
 Start with all training examples at the root
 Attributes should be categorical. Continuous (numeric) attributes should be discretized into groups first
 Tree construction is driven by the attribute selection method
 When to stop growing the tree
 When all records in the node belong to the same class
 When all records in the node have the same attribute values

Which Attribute Is the Best Classifier?
Attribute Selection
- age?
- income?
- student?
- credit_rating?

Choosing the Best Split
Input data: 9 records of class 0, 11 records of class 1
 Candidate split 1 (non-homogeneous: each child node mixes both classes)
    Node 1: C0 = 4, C1 = 6      Node 2: C0 = 5, C1 = 5
 Candidate split 2 (homogeneous: each child node is dominated by one class)
    Node 1: C0 = 9, C1 = 1      Node 2: C0 = 0, C1 = 10
A split whose nodes are more homogeneous separates the data better.

Measures Used to Select an Attribute:
 Gini index (indicates how suitable an attribute is as the splitting attribute)
 Entropy (measures the disorder of a group of data); used by the ID3 algorithm
 Misclassification error (the classification error at node t)
For a node t with N classes, where p_i(t) is the fraction of records at t that belong to class i:
$\text{Classification error}(t) = 1 - \max_i p_i(t)$
$\text{Gini}(t) = 1 - \sum_{i=1}^{N} [p_i(t)]^2$
$\text{Entropy}(t) = -\sum_{i=1}^{N} p_i(t) \log_2 p_i(t)$
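The three measures can be sketched as small Python functions that take the class counts of a node, for example `[9, 1]` for C0 = 9 and C1 = 1. The function names are illustrative and not taken from any library; the sketch assumes the node is not empty and treats a class with zero records as contributing nothing to the entropy.

```python
import math

def class_probabilities(counts):
    """Relative frequency of each class that actually occurs in the node."""
    total = sum(counts)  # assumes total > 0 (non-empty node)
    return [c / total for c in counts if c > 0]

def gini(counts):
    """Gini(t) = 1 - sum of squared class frequencies."""
    return 1.0 - sum(p * p for p in class_probabilities(counts))

def entropy(counts):
    """Entropy(t) = -sum of p * log2(p) over the classes present."""
    return -sum(p * math.log2(p) for p in class_probabilities(counts))

def misclassification_error(counts):
    """Error(t) = 1 - frequency of the majority class."""
    return 1.0 - max(class_probabilities(counts))

# Nodes from the "Choosing the Best Split" slide.
for node in ([4, 6], [5, 5], [9, 1], [0, 10]):
    print(node, gini(node), entropy(node), misclassification_error(node))
```

All three measures are 0 for a pure node such as `[0, 10]` and reach their maximum for an evenly mixed node such as `[5, 5]`.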

Attribute Selection
 Step 1: If the input data T contains records from n classes, the Gini index Gini(T) is
$\text{Gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$
where p_j is the relative frequency of class j in T
 Step 2: If T is split into two subsets T1 and T2 with sizes N1 and N2 (N = N1 + N2), Gini_split(T) is defined as
$\text{Gini}_{split}(T) = \frac{N_1}{N}\,\text{Gini}(T_1) + \frac{N_2}{N}\,\text{Gini}(T_2)$
 The attribute that gives the smallest Gini_split(T) is the best one to split on

Gini Index: Example Calculation
Splitting on the attribute Owner:
        yes (t1)   no (t2)
C0      8          0
C1      2          4
Total   N1 = 10    N2 = 4
Gini(t1) = 1 - (8/10)^2 - (2/10)^2 = 0.32
Gini(t2) = 1 - (0/4)^2 - (4/4)^2 = 0
Gini_split(T) = (10/14)(0.32) + (4/14)(0) = 0.2286

Example: Selecting an Attribute (using the buys_computer data set)

Example: Selecting an Attribute
Using the buys_computer table, first try the attribute age and compute the Gini index (C0 = yes, C1 = no):
        <=30 (t1)   31..40 (t2)   >40 (t3)
C0      2           4             3
C1      3           0             2
Gini(t1) = 1 - (2/5)^2 - (3/5)^2 = 0.48
Gini(t2) = 1 - (4/4)^2 - (0/4)^2 = 0
Gini(t3) = 1 - (3/5)^2 - (2/5)^2 = 0.48
Gini_split(T) = (5/14)(0.48) + (4/14)(0) + (5/14)(0.48) = 0.343

Using the buys_computer table, try the attribute income and compute the Gini index (C0 = yes, C1 = no):
        low (t1)   medium (t2)   high (t3)
C0      3          4             2
C1      1          2             2
Gini(t1) = 1 - (3/4)^2 - (1/4)^2 = 0.375
Gini(t2) = 1 - (4/6)^2 - (2/6)^2 = 0.444
Gini(t3) = 1 - (2/4)^2 - (2/4)^2 = 0.5
Gini_split(T) = (4/14)(0.375) + (6/14)(0.444) + (4/14)(0.5) = 0.440

Using the buys_computer table, try the attribute student (C0 = yes, C1 = no):
        yes (t1)   no (t2)
C0      6          3
C1      1          4
Gini(t1) = 1 - (6/7)^2 - (1/7)^2 = 0.2449
Gini(t2) = 1 - (3/7)^2 - (4/7)^2 = 0.4898
Gini_split(T) = (7/14)(0.2449) + (7/14)(0.4898) = 0.367

Using the buys_computer table, try the attribute credit (C0 = yes, C1 = no):
        fair (t1)   excellent (t2)
C0      6           3
C1      2           3
Gini(t1) = 1 - (6/8)^2 - (2/8)^2 = 0.375
Gini(t2) = 1 - (3/6)^2 - (3/6)^2 = 0.5
Gini_split(T) = (8/14)(0.375) + (6/14)(0.5) = 0.429

From these calculations, select the attribute with the smallest Gini_split(T):
Gini_split(age) = (5/14)(0.48) + (4/14)(0) + (5/14)(0.48) = 0.343
Gini_split(income) = (4/14)(0.375) + (6/14)(0.444) + (4/14)(0.5) = 0.440
Gini_split(student) = (7/14)(0.2449) + (7/14)(0.4898) = 0.367
Gini_split(credit) = (8/14)(0.375) + (6/14)(0.5) = 0.429
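The four Gini_split values can be recomputed directly from the 14-record buys_computer table. The sketch below uses only the standard library; names such as `gini_split` and `ROWS` are illustrative, and the age range is written `31..40`.

```python
from collections import Counter, defaultdict

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(rows, attr_index):
    """Weighted Gini index of the partitions produced by one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])  # last column is the class label
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

scores = {name: gini_split(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best = min(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print("best:", best)
```

Because age yields the smallest value, it becomes the root of the tree.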

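Based on Gini_split(T), age is selected as the root attribute (C0 = yes, C1 = no):
age
  <=30   → t1: C0 = 2, C1 = 3
  31..40 → t2: C0 = 4, C1 = 0
  >40    → t3: C0 = 3, C1 = 2
The 31..40 branch contains only class yes, so it becomes a leaf. The other two branches still need a splitting attribute:
age
  <=30   → t1: C0 = 2, C1 = 3 (select attribute?)
  31..40 → yes
  >40    → t3: C0 = 3, C1 = 2 (select attribute?)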

Continue by selecting an attribute for the age <=30 branch first, choosing among the remaining attributes: income, student, and credit.

Try income as the splitting attribute for the age <=30 branch (C0 = 2, C1 = 3, where C0 = yes and C1 = no) and compute the Gini index:
        low (t1)   medium (t2)   high (t3)
C0      1          1             0
C1      0          1             2
Gini(t1) = 1 - (1/1)^2 - (0/1)^2 = 0
Gini(t2) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(t3) = 1 - (0/2)^2 - (2/2)^2 = 0
Gini_split(T) = (1/5)(0) + (2/5)(0.5) + (2/5)(0) = 0.2

Try student as the splitting attribute for the age <=30 branch (C0 = 2, C1 = 3, where C0 = yes and C1 = no):
        yes (t1)   no (t2)
C0      2          0
C1      0          3
Gini(t1) = 1 - (2/2)^2 - (0/2)^2 = 0
Gini(t2) = 1 - (0/3)^2 - (3/3)^2 = 0
Gini_split(T) = (2/5)(0) + (3/5)(0) = 0

Try credit as the splitting attribute for the age <=30 branch (C0 = 2, C1 = 3, where C0 = yes and C1 = no):
        fair (t1)   excellent (t2)
C0      1           1
C1      2           1
Gini(t1) = 1 - (1/3)^2 - (2/3)^2 = 0.444
Gini(t2) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini_split(T) = (3/5)(0.444) + (2/5)(0.5) = 0.467

Comparing Gini_split for the age <=30 branch (C0 = 2, C1 = 3), which attribute should be used next? Each pair below is (C0, C1).
  income:  low (1, 0), medium (1, 1), high (0, 2)   Gini_split(T) = 0.2
  student: yes (2, 0), no (0, 3)                    Gini_split(T) = 0
  credit:  fair (1, 2), excellent (1, 1)            Gini_split(T) = 0.467

Of the three candidates, student has the smallest Gini_split (0, against 0.2 for income and 0.467 for credit) and also has fewer branches than income, so student is selected for the age <=30 branch:
age
  <=30   → student
             yes → yes   (C0 = 2, C1 = 0)
             no  → no    (C0 = 0, C1 = 3)
  31..40 → yes
  >40    → t3: C0 = 3, C1 = 2
Next, select an attribute for the age > 40 branch.

Finally, select an attribute for the age > 40 branch, the last remaining branch, comparing income and credit. (student is also unused on this branch, but its Gini_split there is 0.467, so it would not be chosen.)

Try income as the splitting attribute for the age > 40 branch (C0 = 3, C1 = 2, where C0 = yes and C1 = no):
        low (t1)   medium (t2)   high (t3)
C0      1          2             0
C1      1          1             0
Gini(t1) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(t2) = 1 - (2/3)^2 - (1/3)^2 = 0.444
t3 is empty (no record with age > 40 has high income), so its Gini index is undefined and its weight of 0/5 removes it from the sum
Gini_split(T) = (2/5)(0.5) + (3/5)(0.444) = 0.467

Try credit as the splitting attribute for the age > 40 branch (C0 = 3, C1 = 2, where C0 = yes and C1 = no):
        fair (t1)   excellent (t2)
C0      3           0
C1      0           2
Gini(t1) = 1 - (3/3)^2 - (0/3)^2 = 0
Gini(t2) = 1 - (0/2)^2 - (2/2)^2 = 0
Gini_split(T) = (3/5)(0) + (2/5)(0) = 0
credit has the smaller Gini_split, so it is selected for the age > 40 branch.

Result of decision tree learning:
age
  <=30   → student
             yes → yes
             no  → no
  31..40 → yes
  >40    → credit
             fair      → yes
             excellent → no
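The whole walk-through can be condensed into a short recursive procedure. The following is a minimal sketch, not a full implementation of ID3 or CART: it assumes categorical attributes, multi-way splits, Gini_split as the selection measure, and no pruning. Unlike the slides, it considers every attribute not yet used on the current path. All names are illustrative.

```python
from collections import Counter, defaultdict

ATTRS = ["age", "income", "student", "credit_rating"]
LABEL = "buys_computer"
RAW = [
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]
DATA = [dict(zip(ATTRS + [LABEL], row)) for row in RAW]

def gini(rows):
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in Counter(r[LABEL] for r in rows).values())

def partition(rows, attr):
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups

def gini_split(rows, attr):
    return sum(len(g) / len(rows) * gini(g) for g in partition(rows, attr).values())

def build_tree(rows, attrs):
    """Return a class label (leaf) or {attribute: {value: subtree}} (internal node)."""
    labels = Counter(r[LABEL] for r in rows)
    if len(labels) == 1 or not attrs:          # stop: pure node or nothing left to test
        return labels.most_common(1)[0][0]
    best = min(attrs, key=lambda a: gini_split(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree(g, rest) for v, g in partition(rows, best).items()}}

tree = build_tree(DATA, ATTRS)
print(tree)
```

On this data set the sketch arrives at the same tree as the manual calculation: age at the root, student under age <=30, and credit_rating under age > 40.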

Using the decision tree
age
  <=30   → student
             yes → yes
             no  → no
  31..40 → yes
  >40    → credit
             fair      → yes
             excellent → no
Classify the new record:
Age    Income   Student   Credit_rate   Buy computer
<=30   low      yes       fair          ?
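Classifying a record means following one path from the root to a leaf. A minimal sketch, assuming the learned tree is stored as nested dictionaries (a representation chosen here for illustration, with `credit_rating` standing for the credit attribute):

```python
# Learned tree: an internal node is {attribute: {value: subtree}}, a leaf is a class label.
TREE = {
    "age": {
        "<=30": {"student": {"yes": "yes", "no": "no"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"fair": "yes", "excellent": "no"}},
    }
}

def classify(tree, record):
    """Follow the branch matching the record's attribute value until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[record[attribute]]
    return tree

record = {"age": "<=30", "income": "low", "student": "yes", "credit_rating": "fair"}
prediction = classify(TREE, record)
print(prediction)
```

The record follows the age <=30 branch and then the student = yes branch; income and credit are never tested on that path.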

Using the decision tree (converted to rules)
Rule 1: If (age <= 30) and (student = yes)
Then buy computer = yes
Rule 2: If (age <= 30) and (student = no)
Then buy computer = no
Rule 3: If (age >= 31 and age <= 40)
Then buy computer = yes
Rule 4: If (age > 40) and (credit = fair)
Then buy computer = yes
Rule 5: If (age > 40) and (credit = excellent)
Then buy computer = no

Overfitting in Decision Tree
Suppose training example #15 is added:
Age    Income   Student   Credit_rate   Buy computer
<=30   high     yes       fair          no
What effect does it have on this tree?
age
  <=30   → student
             yes → yes
             no  → no
  31..40 → yes
  >40    → credit
             fair      → yes
             excellent → no

Avoiding Overfitting
 Overfitting
 A tree with too many branches can distort the classification, because some branches reflect noise or outliers in the training data
 Accuracy is poor on examples never seen before (unseen samples)
 There are 2 approaches
 Prepruning
 Postpruning

Prepruning
 Pruning during learning takes place while the decision tree is being learned and built. If the child nodes about to be created would have a larger classification error than the node has before it is split, there is no need to create them, and the subtree rooted at that node is cut off

Postpruning
 Pruning after learning costs more computation time than pruning during learning, but it usually yields a more reliable decision tree. It uses the error rate (error-based pruning): the child nodes considered for pruning are first merged tentatively to check that doing so does not increase the error, and only then are they cut off

Quality of a Decision Tree
 Accuracy
The ability of the learned decision tree to predict the class of new examples correctly
 Complexity
Measured by the size of the tree and the number of leaf nodes
 Speed
Measured by the computational cost of building the tree and of using it to predict the class of new data
 Scalability
The ability to classify large data sets (millions of records) with hundreds of attributes at an acceptable speed

Estimating Accuracy
Holdout Method
 Suited to large data sets. The examples are randomly divided into two parts: 2/3 of the data form the training set and 1/3 form the test set. The training set is used to build the predictive model, and the test set is used to check how correctly the model classifies new, previously unseen data. Accuracy is the ratio of the number of test-set examples whose class is predicted correctly to the total number of examples in the test set:
$\text{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \varepsilon_i$
where $\varepsilon_i$ = 1 if test example i is a hit and 0 if it is a miss, and N is the number of examples in the test set. The error rate is 1 - Accuracy.

Data Set
  Training set: 2/3 of the data
  Test set: 1/3 of the data
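A holdout split and the accuracy computation might look as follows. This is a sketch under stated assumptions: the split is a plain random shuffle with a fixed seed, each labelled example is a `(features, label)` pair, and `classifier` is any function that maps features to a label. The names are illustrative.

```python
import random

def holdout_split(examples, train_fraction=2 / 3, seed=0):
    """Randomly split examples into (training set, test set)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(classifier, test_set):
    """Fraction of (features, label) pairs in test_set that the classifier gets right."""
    hits = sum(1 for features, label in test_set if classifier(features) == label)
    return hits / len(test_set)

train, test = holdout_split(range(30))
print(len(train), len(test))
```

With 30 examples the split produces 20 training and 10 test examples, and no example appears in both sets.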

K-fold Cross Validation
 Suited to small data sets. Suppose the data set has N examples. It is divided into k parts (folds), each of size N/k. The method trains on the training folds and checks classification accuracy on the test fold, for k rounds in total:
 In round i, fold i is the test set and the remaining folds form the training set
 Accuracy is then the number of correctly classified test examples, summed over all k rounds, divided by the total number of examples tested:
$\text{Accuracy} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N/k} \varepsilon_{ij}$
where $\varepsilon_{ij}$ = 1 if example j of fold i is a hit and 0 if it is a miss, and N is the total number of examples tested. The error rate is 1 - Accuracy.

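 K-Fold Cross Validation
[Figure: the data set is divided into folds 1 to 5. In each round #1, #2, ..., #5 a different fold is held out as the test set and the other four folds form the training set.]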
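The rounds of k-fold cross validation can be sketched with index lists. The sketch assumes the examples are already in random order and that `train_fn` is a user-supplied function that takes training examples and returns a classifier; both names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_val_accuracy(train_fn, examples, k):
    """Total hits over all k test folds divided by the number of examples."""
    hits = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = train_fn([examples[i] for i in train_idx])
        hits += sum(model(examples[i][0]) == examples[i][1] for i in test_idx)
    return hits / len(examples)

rounds = list(k_fold_indices(10, 5))
for number, (train_idx, test_idx) in enumerate(rounds, start=1):
    print(f"#{number} test={test_idx} train={train_idx}")
```

Every example is used for testing exactly once and for training in the other k - 1 rounds.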

 Confusion Matrix
Compares the predicted results (the program's output) with the actual results
 True Positive (TP): the program predicts positive and the actual class is positive
 True Negative (TN): the program predicts negative and the actual class is negative
 False Positive (FP): the program predicts positive but the actual class is negative
 False Negative (FN): the program predicts negative but the actual class is positive

 Sensitivity or Recall: of all actual positives, the fraction that the program predicts as positive, TP / (TP + FN)
 Specificity: of all actual negatives, the fraction that the program predicts as negative, TN / (TN + FP)
 Precision: of everything the program predicts as positive, the fraction that is actually positive, TP / (TP + FP)

 Accuracy: how accurately the program predicts overall
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example:
 Recall : 6,954/7,000 = 0.993
 Specificity: 2,588/ 3,000 = 0.863
 Precision: 6,954/7,366 = 0.944
 Accuracy: (6,954+2,588)/10,000 = 0.954
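The four figures in the example can be reproduced from the confusion-matrix counts. In the sketch below TP = 6,954 and TN = 2,588 come from the example; FN = 46 and FP = 412 are derived from its denominators (7,000 actual positives and 3,000 actual negatives). The function name is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Recall, specificity, precision and accuracy from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

metrics = confusion_metrics(tp=6954, tn=2588, fp=412, fn=46)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```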

Rule-Based Classifier
 Classify records by using a collection of "if…then…" rules
 Rule: (Condition) → y
 where
 Condition is a conjunction of attribute tests
 y is the class label
 LHS: rule antecedent or condition
 RHS: rule consequent
 Examples of classification rules:
 (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
 (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No

Rule-based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds

Application of Rule-Based Classifier
 A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal

Rule Coverage and Accuracy
 Coverage of a rule:
 Fraction of all records that satisfy the antecedent of the rule
 Accuracy of a rule:
 Fraction of the records satisfying the antecedent that also satisfy the consequent of the rule
Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
(Status=Single) → No
Coverage = 40%, Accuracy = 50%
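Coverage and accuracy of the rule (Status=Single) → No can be checked against the ten records of the table. A minimal sketch with illustrative names; it assumes the rule covers at least one record.

```python
# (refund, marital_status, taxable_income_in_K, class) for Tid 1..10.
RECORDS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

def coverage_and_accuracy(records, antecedent, consequent):
    """antecedent: function record -> bool; consequent: predicted class label."""
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records)
    # Assumes the rule covers at least one record; otherwise accuracy is undefined.
    accuracy = sum(r[-1] == consequent for r in covered) / len(covered)
    return coverage, accuracy

coverage, accuracy = coverage_and_accuracy(RECORDS, lambda r: r[1] == "Single", "No")
print(coverage, accuracy)
```

Four of the ten records are Single (coverage 40%), and two of those four have class No (accuracy 50%).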

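How does Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules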
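Which rules a record triggers can be checked mechanically. The sketch below encodes R1 to R5 as attribute-value dictionaries; this encoding and the field names are choices made for illustration.

```python
# (rule name, condition as attribute -> required value, class)
RULES = [
    ("R1", {"give_birth": "no", "can_fly": "yes"}, "Birds"),
    ("R2", {"give_birth": "no", "live_in_water": "yes"}, "Fishes"),
    ("R3", {"give_birth": "yes", "blood_type": "warm"}, "Mammals"),
    ("R4", {"give_birth": "no", "can_fly": "no"}, "Reptiles"),
    ("R5", {"live_in_water": "sometimes"}, "Amphibians"),
]

def triggered(record):
    """Names of all rules whose every condition is satisfied by the record."""
    return [name for name, cond, _ in RULES
            if all(record[attr] == value for attr, value in cond.items())]

lemur = {"blood_type": "warm", "give_birth": "yes", "can_fly": "no", "live_in_water": "no"}
turtle = {"blood_type": "cold", "give_birth": "no", "can_fly": "no", "live_in_water": "sometimes"}
dogfish = {"blood_type": "cold", "give_birth": "yes", "can_fly": "no", "live_in_water": "yes"}

for name, animal in (("lemur", lemur), ("turtle", turtle), ("dogfish shark", dogfish)):
    print(name, triggered(animal))
```

The turtle and the dogfish shark show the two problem cases: more than one rule fires, or none does.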

Characteristics of Rule-Based Classifier
 Mutually exclusive rules
 Classifier contains mutually exclusive rules if the rules are
independent of each other
 Every record is covered by at most one rule
 Exhaustive rules
 Classifier has exhaustive coverage if it accounts for every possible
combination of attribute values
 Each record is covered by at least one rule

From Decision Trees To Rules
Refund
  Yes → NO
  No  → Marital Status
          {Married}          → NO
          {Single, Divorced} → Taxable Income
                                 < 80K → NO
                                 > 80K → YES
Classification Rules
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Rules are mutually exclusive and exhaustive
Rule set contains as much information as the tree

Rules Can Be Simplified
Refund
  Yes → NO
  No  → Marital Status
          {Married}          → NO
          {Single, Divorced} → Taxable Income
                                 < 80K → NO
                                 > 80K → YES
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Initial Rule: (Refund=No) ∧ (Status=Married) → No
Simplified Rule: (Status=Married) → No

Effect of Rule Simplification
 Rules are no longer mutually exclusive
 A record may trigger more than one rule
 Solution?
 Ordered rule set
 Unordered rule set – use voting schemes
 Rules are no longer exhaustive
 A record may not trigger any rules
 Solution?
 Use a default class

Ordered Rule Set
 Rules are rank ordered according to their priority
 An ordered rule set is known as a decision list
 When a test record is presented to the classifier
 It is assigned to the class label of the highest ranked rule it has triggered
 If none of the rules fired, it is assigned to the default class
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name    Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle  cold        no          no       sometimes      ?
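A decision list can be sketched as a loop that returns the class of the first rule that fires. The ranking R1 to R5 below follows the order in which the rules are listed, and the default class `"Unknown"` is a placeholder chosen for this sketch, since the slides do not name one.

```python
# Rules in priority order: (condition as attribute -> required value, class).
DECISION_LIST = [
    ({"give_birth": "no", "can_fly": "yes"}, "Birds"),            # R1
    ({"give_birth": "no", "live_in_water": "yes"}, "Fishes"),     # R2
    ({"give_birth": "yes", "blood_type": "warm"}, "Mammals"),     # R3
    ({"give_birth": "no", "can_fly": "no"}, "Reptiles"),          # R4
    ({"live_in_water": "sometimes"}, "Amphibians"),               # R5
]

def classify(record, rules, default="Unknown"):
    """Return the class of the highest-ranked rule the record triggers, else the default."""
    for condition, label in rules:
        if all(record.get(attr) == value for attr, value in condition.items()):
            return label
    return default  # hypothetical default class; no rule fired

turtle = {"blood_type": "cold", "give_birth": "no", "can_fly": "no", "live_in_water": "sometimes"}
dogfish = {"blood_type": "cold", "give_birth": "yes", "can_fly": "no", "live_in_water": "yes"}
print(classify(turtle, DECISION_LIST), classify(dogfish, DECISION_LIST))
```

The turtle triggers both R4 and R5, but R4 is ranked higher, so it is classified as a reptile. The dogfish shark triggers no rule and falls through to the default class.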

Rule Ordering Schemes
 Rule-based ordering
 Individual rules are ranked based on their quality
 Class-based ordering
 Rules that belong to the same class appear together
Rule-based Ordering
(Refund=Yes) ==> No
Class-based Ordering
(Refund=Yes) ==> No

Building Classification Rules
 Direct Method:
 Extract rules directly from data
 e.g.: RIPPER, CN2, Holte’s 1R
 Indirect Method:
 Extract rules from other classification models (e.g.
decision trees, neural networks, etc).
 e.g.: C4.5rules

Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Step (2) and (3) until stopping criterion is met
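The four steps can be sketched in Python on the buys_computer data from the decision tree part. This is not CN2 or RIPPER: the Learn-One-Rule function here simply adds, one at a time, the attribute-value test with the highest rule accuracy until the rule covers no negative record, which is an assumption made for this sketch. A rule is a dictionary from column index to required value, and all names are illustrative.

```python
ATTRS = ("age", "income", "student", "credit_rating")
DATA = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]

def covers(rule, row):
    return all(row[i] == value for i, value in rule.items())

def learn_one_rule(rows, target):
    """Grow one rule general-to-specific, starting from the empty rule {}."""
    rule, covered = {}, list(rows)
    while any(r[-1] != target for r in covered):          # still covers negatives
        candidates = {(i, r[i]) for r in covered if r[-1] == target
                      for i in range(len(ATTRS)) if i not in rule}
        if not candidates:
            break                                         # cannot specialise further
        def score(cand):
            subset = [r for r in covered if r[cand[0]] == cand[1]]
            return (sum(r[-1] == target for r in subset) / len(subset), len(subset))
        i, value = max(sorted(candidates), key=score)     # best accuracy, then coverage
        rule[i] = value
        covered = [r for r in covered if r[i] == value]
    return rule

def sequential_covering(rows, target):
    rules, remaining = [], list(rows)
    while any(r[-1] == target for r in remaining):        # stopping criterion
        rule = learn_one_rule(remaining, target)
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]  # remove covered records
    return rules

rules = sequential_covering(DATA, "yes")
for rule in rules:
    print(" AND ".join(f"{ATTRS[i]}={v}" for i, v in rule.items()), "=> yes")
```

Each learned rule covers at least one remaining positive record, so removing covered records guarantees that the loop terminates.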

Example of Sequential Covering
[Figure, four panels: (i) Original Data; (ii) Step 1; (iii) Step 2, with rule R1 drawn; (iv) Step 3, with rules R1 and R2 drawn]

Aspects of Sequential Covering
 Rule Growing
 Instance Elimination
 Rule Evaluation
 Stopping Criterion
 Rule Pruning

Rule Growing
 Two common strategies
(a) General-to-specific: start from the empty rule { } (Yes: 3, No: 4) and add one conjunct at a time, choosing among candidates such as Refund=No, Status=Single, Status=Divorced, Status=Married, Income > 80K, ...
(b) Specific-to-general: start from specific rules such as (Refund=No, Status=Single, Income=85K) → Class=Yes and (Refund=No, Status=Single, Income=90K) → Class=Yes, and generalize them to (Refund=No, Status=Single) → Class=Yes

Rule Growing (Examples)
 CN2 Algorithm:
 Start from an empty conjunct: {}
 Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
 Determine the rule consequent by taking the majority class of instances covered by the rule
 RIPPER Algorithm:
 Start from an empty rule: {} => class
 Add conjuncts that maximize FOIL's information gain measure:
 R0: {} => class (initial rule)
 R1: {A} => class (rule after adding conjunct)
 Gain(R0, R1) = t [ log (p1/(p1+n1)) - log (p0/(p0 + n0)) ]
 where t: number of positive instances covered by both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1
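FOIL's information gain is easy to compute once the four counts are known. The sketch below assumes base-2 logarithms, which the formula above leaves unspecified, and assumes R1 still covers at least one positive instance. The example counts come from the tax table used earlier: R0 is {} → Yes (3 positive, 7 negative) and R1 is (Refund=No) → Yes (3 positive, 4 negative).

```python
import math

def foil_gain(p0, n0, p1, n1, t=None):
    """Gain(R0, R1) = t * [log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))]."""
    if t is None:
        # R1 specialises R0, so the positives covered by both are those covered by R1.
        t = p1
    # Assumes p0 > 0 and p1 > 0; log2(0) is undefined.
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

gain = foil_gain(p0=3, n0=7, p1=3, n1=4)
print(round(gain, 3))
```

A positive gain means the added conjunct raised the share of positive instances among the covered records.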

Instance Elimination
 Why do we need to eliminate instances?
 Otherwise, the next rule is identical to the previous rule
 Why do we remove positive instances?
 Ensure that the next rule is different
 Why do we remove negative instances?
 Prevent underestimating accuracy of the rule
 Compare rules R2 and R3 in the diagram
[Figure: scatter of instances with class = + and class = -, with regions marked for rules R1, R2 and R3]

Rule Evaluation
 Metrics:
 Accuracy: $\frac{n_c}{n}$
 Laplace: $\frac{n_c + 1}{n + k}$
 M-estimate: $\frac{n_c + kp}{n + k}$
n : Number of instances covered by the rule
n_c : Number of instances covered by the rule that belong to the class predicted by the rule
k : Number of classes
p : Prior probability
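The three metrics differ only in how they smooth the raw ratio. A small sketch with illustrative function names, evaluated for the rule (Status=Single) → No from the tax table, where n = 4, n_c = 2, k = 2, and the prior of class No is taken as 7/10:

```python
def rule_accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)

n, nc, k, prior_no = 4, 2, 2, 0.7
print(rule_accuracy(nc, n), laplace(nc, n, k), m_estimate(nc, n, k, prior_no))
```

With p = 1/k the M-estimate reduces to the Laplace estimate, and both pull rules that cover few instances toward the prior.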

Stopping Criterion and Rule Pruning
 Stopping criterion
 Compute the gain
 If gain is not significant, discard the new rule
 Rule Pruning
 Similar to post-pruning of decision trees
 Reduced Error Pruning:
 Remove one of the conjuncts in the rule
 Compare error rate on validation set before and after pruning
 If error improves, prune the conjunct

Summary of Direct Method
 Grow a single rule
 Remove instances covered by the rule
 Prune the rule (if necessary)
 Add rule to current rule set
 Repeat

Direct Method: RIPPER
 For 2-class problem, choose one of the classes as positive class, and the other as
negative class
 Learn rules for positive class
 Negative class will be default class
 For multi-class problem
 Order the classes according to increasing class prevalence (fraction of
instances that belong to a particular class)
 Learn the rule set for smallest class first, treat the rest as negative class
 Repeat with next smallest class as positive class

 Growing a rule:
 Start from empty rule
 Add conjuncts as long as they improve FOIL's information gain
 Stop when rule no longer covers negative examples
 Prune the rule immediately using incremental reduced error pruning
 Measure for pruning: v = (p-n)/(p+n)
 p: number of positive examples covered by the rule in the validation set
 n: number of negative examples covered by the rule in the validation set
 Pruning method: delete any final sequence of conditions that maximizes v

 Building a Rule Set:
 Use sequential covering algorithm
 Finds the best rule that covers the current set of positive examples
 Eliminate both positive and negative examples covered by the rule
 Each time a rule is added to the rule set, compute the new description length
 Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far

 Optimize the rule set:
 For each rule r in the rule set R
 Consider 2 alternative rules:
 Replacement rule (r*): grow new rule from scratch
 Revised rule (r'): add conjuncts to extend the rule r
 Compare the rule set for r against the rule sets for r* and r'
 Choose the rule set with the smallest description length (MDL principle)
 Repeat rule generation and rule optimization for the remaining positive examples

Indirect Methods
Decision tree:
P
  No  → Q
          No  → -
          Yes → +
  Yes → R
          No  → +
          Yes → Q
                  No  → -
                  Yes → +
Rule Set
r1: (P=No,Q=No) ==> -
r2: (P=No,Q=Yes) ==> +
r3: (P=Yes,R=No) ==> +
r4: (P=Yes,R=Yes,Q=No) ==> -
r5: (P=Yes,R=Yes,Q=Yes) ==> +

Indirect Method: C4.5rules
 Extract rules from an unpruned decision tree
 For each rule, r: A → y,
 consider an alternative rule r': A' → y where A' is obtained by removing one of the conjuncts in A
 Compare the pessimistic error rate for r against all r'
 Prune if one of the r' has a lower pessimistic error rate
 Repeat until we can no longer improve generalization error

Indirect Method: C4.5rules
 Instead of ordering the rules, order subsets of rules (class ordering)
 Each subset is a collection of rules with the same rule consequent
(class)
 Compute description length of each subset
 Description length = L(error) + g L(model)
 g is a parameter that takes into account the presence of redundant
attributes in a rule set
(default value = 0.5)

Example
Name           Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human          yes         no        no       no             yes        mammals
python         no          yes       no       no             no         reptiles
salmon         no          yes       no       yes            no         fishes
whale          yes         no        no       yes            no         mammals
frog           no          yes       no       sometimes      yes        amphibians
komodo         no          yes       no       no             yes        reptiles
bat            yes         no        yes      no             yes        mammals
pigeon         no          yes       yes      no             yes        birds
cat            yes         no        no       no             yes        mammals
leopard shark  yes         no        no       yes            no         fishes
turtle         no          yes       no       sometimes      yes        reptiles
penguin        no          yes       no       sometimes      yes        birds
porcupine      yes         no        no       no             yes        mammals
eel            no          yes       no       yes            no         fishes
salamander     no          yes       no       sometimes      yes        amphibians
gila monster   no          yes       no       no             yes        reptiles
platypus       no          yes       no       no             yes        mammals
owl            no          yes       yes      no             yes        birds
dolphin        yes         no        no       yes            no         mammals
eagle          no          yes       yes      no             yes        birds

C4.5 versus C4.5rules versus RIPPER
C4.5 decision tree:
Give Birth?
  Yes → Mammals
  No  → Live In Water?
          Yes       → Fishes
          Sometimes → Amphibians
          No        → Can Fly?
                        Yes → Birds
                        No  → Reptiles
C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians
RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
( ) → Mammals

C4.5 versus C4.5rules versus RIPPER
C4.5 and C4.5rules:
                PREDICTED CLASS
ACTUAL CLASS    Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians      2           0       0         0      0
Fishes          0           2       0         0      1
Reptiles        1           0       3         0      0
Birds           1           0       0         3      0
Mammals         0           0       1         0      6
RIPPER:
                PREDICTED CLASS
ACTUAL CLASS    Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians      0           0       0         0      2
Fishes          0           3       0         0      0
Reptiles        0           0       3         0      1
Birds           0           0       1         2      1
Mammals         0           2       1         0      4

Advantages of Rule-Based Classifiers
 As highly expressive as decision trees
 Easy to interpret
 Easy to generate
 Can classify new instances rapidly
 Performance comparable to decision trees

HW#5
 What is classification?
 What is a decision tree?
 What is rule-based classification?
 How many steps are there in classification?
 Explain supervised vs. unsupervised learning.
 Explain how to avoid overfitting.
 How many types of rule-based classification are there?

HW#5
 The following table consists of training data from a buy-computer database. The data have been generalized. Let status be the class label attribute.
 Use your algorithm to construct a decision tree from the given data.
 Use your algorithm to construct a rule-based classifier.

LAB5
 Use the Weka program to construct a decision tree and a rule-based classifier from the given files.
 buycomputer.csv
 buyhouse_c45.csv
 buyhouse_id3.csv
