Feature Selection
with RapidMiner Studio 6
(data)³ base|warehouse|mining
http://www.dataminingtrend.com
http://facebook.com/datacube.th
Eakasit Pacharawongsakda, Ph.D.
Data Cube: http://facebook.com/datacube.th
Attribute (Feature) Selection
• The performance of a classification model depends on the attributes (features) it uses
• Attribute selection is the process of choosing the attributes (features) that matter most for building a model
  • choose attributes that are strongly correlated with the label attribute
  • choose attributes that are weakly correlated with one another
• Attribute selection is especially useful for
  • data with a very large number of attributes, e.g. text mining
  • models that take a long time to build
Attribute (Feature) Selection
• There are two approaches
  • Filter approach: compute a weight (or correlation score) for each attribute and keep only the important attributes
  • Wrapper approach: use a classification model to measure how well the attributes perform
[Figure: Attribute Selection: Filter Approach. All attributes in the training data (ID, Free, Won, Cash, Call, Service, Type) → compute weight → attributes after selection (ID, Free, Won, Type)]
[Figure: Attribute Selection: Wrapper Approach. All attributes in the training data (ID, Free, Won, Cash, Call, Service, Type) → classification model → attributes after selection (ID, Free, Won, Type)]
Attribute (Feature) Selection
• There are two approaches
  • Filter approach: compute a weight (or correlation score) for each attribute and keep only the important attributes
    • Information Theory: weight each attribute by its Information Gain
    • Chi-Square: weight each attribute by its Chi-Square value
  • Wrapper approach: use a classification model to measure how well the attributes perform
    • Forward Selection
    • Backward Elimination
    • Evolutionary Selection
Information Theory-based filtering
• Measures how strongly each attribute is related to the label attribute using Information Gain
• Works with nominal attributes only
• Based on two quantities: Entropy and Information Gain (IG)
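$$\text{Entropy}(S) = -\sum_{i} p_i \log_2 p_i$$
where $p_i$ is the fraction of records in $S$ that belong to class $i$.
$$IG(\text{parent}, \text{child}) = \text{Entropy}(\text{parent}) - \left[ p(c_1)\,\text{Entropy}(c_1) + p(c_2)\,\text{Entropy}(c_2) + \cdots \right]$$
where $c_1, c_2, \ldots$ are the child subsets obtained by splitting the parent on the attribute's values, and $p(c_j)$ is the fraction of the parent's records that fall into $c_j$.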
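To make the two formulas concrete, here is a minimal Python sketch that evaluates them from class counts. The function names `entropy` and `information_gain` are our own; the counts are Play (9 yes, 5 no) and its split by Outlook (sunny 2 yes/3 no, overcast 4/0, rainy 3/2) from the weather data used on the following slides.

```python
import math


def entropy(counts):
    """Base-2 entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # by convention 0 * log2(0) = 0
            p = c / total
            result -= p * math.log2(p)
    return result


def information_gain(parent_counts, children_counts):
    """Entropy(parent) minus the size-weighted entropy of the child subsets."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted


# Play = (yes, no) counts overall and for each value of Outlook
parent = [9, 5]
outlook = [[2, 3],  # sunny
           [4, 0],  # overcast
           [3, 2]]  # rainy
print(round(entropy(parent), 3))                    # 0.94
print(round(information_gain(parent, outlook), 3))  # 0.247
```

A pure subset such as overcast (4 yes, 0 no) has entropy 0, so it adds nothing to the weighted sum and pulls the gain up.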
Information Theory-based filtering
• Compute the Information Gain (IG) between each attribute and the label
ID Outlook Temperature Humidity Windy Play
1 sunny hot high FALSE no
2 sunny hot high TRUE no
3 overcast hot high FALSE yes
4 rainy mild high FALSE yes
5 rainy cool normal FALSE yes
6 rainy cool normal TRUE no
7 overcast cool normal TRUE yes
8 sunny mild high FALSE no
9 sunny cool normal FALSE yes
10 rainy mild normal FALSE yes
11 sunny mild normal TRUE yes
12 overcast mild high TRUE yes
13 overcast hot normal FALSE yes
14 rainy mild high TRUE no
attribute IG
Outlook 0.247
Temperature 0.029
Humidity 0.152
Windy 0.048
Information Gain table
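The whole IG table can be reproduced with a short Python sketch that applies the IG formula to every attribute. This is an illustration, not RapidMiner's implementation. The rows follow the standard weather.nominal data set, in which rows 7 and 9 have Temperature = cool; that is the version the Temperature value of 0.029 corresponds to.

```python
import math
from collections import Counter, defaultdict

# weather_nominal rows: (Outlook, Temperature, Humidity, Windy, Play)
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, col):
    """Label entropy minus the weighted entropy after splitting on column `col`."""
    groups = defaultdict(list)  # attribute value -> labels of the rows with that value
    for row in rows:
        groups[row[col]].append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted


weights = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, ig in weights.items():
    print(f"{name:<12} {ig:.3f}")
```

Running it prints Outlook 0.247, Temperature 0.029, Humidity 0.152 and Windy 0.048.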
Information Theory-based filtering
• Select the attributes whose IG is greater than 0.1
attribute IG
Outlook 0.247
Humidity 0.152
Windy 0.048
Temperature 0.029
ID Outlook Humidity Play
1 sunny high no
2 sunny high no
3 overcast high yes
4 rainy high yes
5 rainy normal yes
6 rainy normal no
7 overcast normal yes
8 sunny high no
9 sunny normal yes
10 rainy normal yes
11 sunny normal yes
12 overcast high yes
13 overcast normal yes
14 rainy high no
Information Gain table
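The selection step is a threshold on the weights followed by a column projection. A minimal sketch, assuming the IG values from the table above are already available as a dictionary; `THRESHOLD`, `keep` and `reduced` are illustrative names, not RapidMiner parameters.

```python
# IG weights from the Information Gain table
weights = {"Outlook": 0.247, "Temperature": 0.029, "Humidity": 0.152, "Windy": 0.048}
THRESHOLD = 0.1

HEADER = ["Outlook", "Temperature", "Humidity", "Windy", "Play"]
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]

# 1) keep attributes whose weight exceeds the threshold, highest weight first
selected = [a for a in sorted(weights, key=weights.get, reverse=True) if weights[a] > THRESHOLD]

# 2) project each row onto the selected attributes plus the label column
keep = [HEADER.index(a) for a in selected] + [HEADER.index("Play")]
reduced = [tuple(row[i] for i in keep) for row in ROWS]

print(selected)  # ['Outlook', 'Humidity']
for row_id, row in enumerate(reduced, start=1):
    print(row_id, *row)
```

The printed rows match the reduced ID/Outlook/Humidity/Play table.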
Example 7-11: Weight by IG
• Related operators
Operator Description
Read CSV Reads a CSV file
Weight by Information Gain Computes the weight of each attribute using Information Gain
Select by weight Selects attributes according to their weight
Example 7-11: Weight by IG
• Use the weather_nominal data with the Weight by Information Gain operator
Example 7-11: Weight by IG
• Information Gain computed for each attribute
Information Gain (IG) values
Example 7-11: Weight by IG
• Use the Select by weight operator to keep the attributes whose weight is greater than 0.1
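Besides a fixed threshold, a weight-based selector can also keep the k highest-weighted attributes. The helper below is hypothetical: its `relation` values ("greater", "greater_equals", "top_k") are our own labels and are not the parameter names of the Select by weight operator.

```python
def select_by_weight(weights, relation="greater", value=0.1):
    """Return attribute names chosen by a weight criterion (hypothetical helper).

    relation="greater"        keeps attributes with weight > value
    relation="greater_equals" keeps attributes with weight >= value
    relation="top_k"          keeps the int(value) highest-weighted attributes
    Results are ordered from highest to lowest weight.
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    if relation == "greater":
        return [a for a in ranked if weights[a] > value]
    if relation == "greater_equals":
        return [a for a in ranked if weights[a] >= value]
    if relation == "top_k":
        return ranked[: int(value)]
    raise ValueError(f"unknown relation: {relation}")


ig = {"Outlook": 0.247, "Temperature": 0.029, "Humidity": 0.152, "Windy": 0.048}
print(select_by_weight(ig))              # ['Outlook', 'Humidity']
print(select_by_weight(ig, "top_k", 3))  # ['Outlook', 'Humidity', 'Windy']
```

A fixed threshold depends on the scale of the weights (0.1 suits IG, while the Chi-Square example later uses 2.0); top-k keeps a fixed number of attributes regardless of scale.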
Example 7-11: Weight by IG
• Result of keeping only the attributes with IG greater than 0.1
Information Gain (IG) values
Chi-Square-based filtering
• Measures the association between each feature and the label using the Chi-Square statistic
• Works with nominal attributes only
• Compares how often each attribute value occurs together with each label value (observed) with how often it would occur if the attribute and the label were independent (expected)
• The Chi-Square value is computed as
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$
  where the sum runs over every combination of attribute value and label value
  • f0 = observed frequency
  • fe = expected frequency
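A minimal Python sketch of the statistic for a contingency table of observed counts (rows are label values, columns are attribute values). The expected count of each cell is computed as row total × column total / N, which equals P(attribute value) × P(label value) × N. The function name `chi_square` is our own.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    fe for cell (i, j) = row_total[i] * col_total[j] / N.
    Assumes every row and column total is positive, so no fe is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f0 in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            stat += (f0 - fe) ** 2 / fe
    return stat


# Outlook (sunny, overcast, rainy) against Play
outlook = [[3, 0, 2],  # Play = no
           [2, 4, 3]]  # Play = yes
print(round(chi_square(outlook), 3))  # 3.547
```

The same function gives 2.8 for Humidity with the table [[4, 1], [3, 6]] (high/normal against Play = no/yes).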
Chi-Square-based filtering
• Compute the Chi-Square value between the attribute Outlook and the label
• Expected frequency of Outlook = sunny and Play = no
  = P(Outlook = sunny) × P(Play = no) × total number of records
  = (5/14) × (5/14) × 14 = 1.785714
ID Outlook Play
6 rainy no
14 rainy no
1 sunny no
2 sunny no
8 sunny no
3 overcast yes
7 overcast yes
12 overcast yes
13 overcast yes
4 rainy yes
5 rainy yes
10 rainy yes
9 sunny yes
11 sunny yes
Outlook = sunny overcast rainy Total
Play = no 3 0 2 5
Play = yes 2 4 3 9
Total 5 4 5 14
observed frequency
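All expected frequencies follow from the totals of this observed table. A small sketch using exact fractions, so that (5/14) × (5/14) × 14 comes out as exactly 25/14 ≈ 1.786; the nested-dictionary layout is our own choice.

```python
from fractions import Fraction

# Observed frequencies: Play value -> Outlook value -> count
observed = {"no": {"sunny": 3, "overcast": 0, "rainy": 2},
            "yes": {"sunny": 2, "overcast": 4, "rainy": 3}}

n = sum(sum(r.values()) for r in observed.values())            # 14 records
play_totals = {play: sum(r.values()) for play, r in observed.items()}
outlook_totals = {}
for r in observed.values():
    for value, count in r.items():
        outlook_totals[value] = outlook_totals.get(value, 0) + count

# fe = P(Outlook = value) * P(Play = play) * N
expected = {play: {value: Fraction(outlook_totals[value], n) * Fraction(play_totals[play], n) * n
                   for value in outlook_totals}
            for play in observed}

for play, r in expected.items():
    print(play, {value: round(float(fe), 3) for value, fe in r.items()})
```

It prints 1.786, 1.429, 1.786 for Play = no and 3.214, 2.571, 3.214 for Play = yes; each row of expected counts still sums to the observed row total.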
Chi-Square-based filtering
• Compute the Chi-Square value between the attribute Outlook and the label
Outlook = sunny overcast rainy Total
Play = no 3 0 2 5
Play = yes 2 4 3 9
Total 5 4 5 14
Outlook = sunny overcast rainy Total
Play = no 1.786 1.429 1.786 5
Play = yes 3.214 2.571 3.214 9
Total 5 4 5 14
observed frequency
expected frequency
Chi-Square-based filtering
• Compute the Chi-Square value between the attribute Outlook and the label
Outlook = sunny overcast rainy
Play = no 3 0 2
Play = yes 2 4 3
Outlook = sunny overcast rainy
Play = no 1.786 1.429 1.786
Play = yes 3.214 2.571 3.214
observed frequency
expected frequency
• Chi-Square
$$\chi^2 = \frac{(3-1.786)^2}{1.786} + \frac{(0-1.429)^2}{1.429} + \frac{(2-1.786)^2}{1.786} + \frac{(2-3.214)^2}{3.214} + \frac{(4-2.571)^2}{2.571} + \frac{(3-3.214)^2}{3.214} = 3.547$$
Chi-Square-based filtering
• Compute the Chi-Square value between each attribute and the label
ID Outlook Temperature Humidity Windy Play
1 sunny hot high FALSE no
2 sunny hot high TRUE no
3 overcast hot high FALSE yes
4 rainy mild high FALSE yes
5 rainy cool normal FALSE yes
6 rainy cool normal TRUE no
7 overcast cool normal TRUE yes
8 sunny mild high FALSE no
9 sunny cool normal FALSE yes
10 rainy mild normal FALSE yes
11 sunny mild normal TRUE yes
12 overcast mild high TRUE yes
13 overcast hot normal FALSE yes
14 rainy mild high TRUE no
attribute Chi-Square
Outlook 3.547
Temperature 0.570
Humidity 2.800
Windy 0.933
Chi-Square table
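As with Information Gain, the full Chi-Square table can be reproduced by applying the formula to every attribute. This is a sketch, not RapidMiner's implementation; the rows follow the standard weather.nominal data (rows 7 and 9 with Temperature = cool), which is the version that gives Temperature = 0.570.

```python
from collections import Counter

# weather_nominal rows: (Outlook, Temperature, Humidity, Windy, Play)
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]


def chi_square(rows, col):
    """Pearson chi-square between column `col` and the label (last column)."""
    n = len(rows)
    value_totals = Counter(row[col] for row in rows)
    label_totals = Counter(row[-1] for row in rows)
    observed = Counter((row[col], row[-1]) for row in rows)  # f0 for each cell
    stat = 0.0
    for value, v_total in value_totals.items():
        for label, l_total in label_totals.items():
            fe = v_total * l_total / n  # expected frequency
            # Counter returns 0 for cells that never occur (e.g. overcast / no)
            stat += (observed[(value, label)] - fe) ** 2 / fe
    return stat


scores = {name: chi_square(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, cs in scores.items():
    print(f"{name:<12} {cs:.3f}")
```

It prints Outlook 3.547, Temperature 0.570, Humidity 2.800 and Windy 0.933, the same ranking as Information Gain.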
Chi-Square-based filtering
• Select the attributes whose Chi-Square value is greater than 2.0
attribute Chi-Square
Outlook 3.547
Humidity 2.800
Windy 0.933
Temperature 0.570
ID Outlook Humidity Play
1 sunny high no
2 sunny high no
3 overcast high yes
4 rainy high yes
5 rainy normal yes
6 rainy normal no
7 overcast normal yes
8 sunny high no
9 sunny normal yes
10 rainy normal yes
11 sunny normal yes
12 overcast high yes
13 overcast normal yes
14 rainy high no
Chi-Square table
Example 7-12: Weight by CS
• Related operators
Operator Description
Read CSV Reads a CSV file
Weight by Chi-Square Computes the weight of each attribute using Chi-Square
Select by weight Selects attributes according to their weight
Example 7-12: Weight by CS
• Use the weather_nominal data with the Weight by Chi-Square operator
Example 7-12: Weight by CS
• Chi-Square computed for each attribute
Chi-Square (CS) values
Example 7-12: Weight by CS
• Use the Select by weight operator to keep the attributes whose weight is greater than 2.0
Example 7-12: Weight by CS
• Result of keeping only the attributes with Chi-Square greater than 2.0
Chi-Square (CS) values
Wrapper Approach
• Builds models by adding attributes to, or removing them from, the attribute set, and keeps the best-performing set of attributes
• Use only the attribute Free
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
ID Free Type
1 Y spam
2 N spam
3 N normal
4 N normal
5 Y spam
6 Y spam
7 N normal
8 N spam
9 N normal
10 N normal
Wrapper Approach
• Use only the attribute Won
ID Won Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 Y spam
9 N normal
10 N normal
Wrapper Approach
• Use only the attribute Cash
ID Cash Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 N spam
9 N normal
10 N normal
Wrapper Approach
• Use the attributes Free and Won
ID Free Won Type
1 Y Y spam
2 N Y spam
3 N N normal
4 N N normal
5 Y N spam
6 Y N spam
7 N N normal
8 N Y spam
9 N N normal
10 N N normal
Wrapper Approach
• Use the attributes Free and Cash
ID Free Cash Type
1 Y Y spam
2 N Y spam
3 N N normal
4 N N normal
5 Y N spam
6 Y N spam
7 N N normal
8 N N spam
9 N N normal
10 N N normal
Wrapper Approach
• Use the attributes Won and Cash
ID Won Cash Type
1 Y Y spam
2 Y Y spam
3 N N normal
4 N N normal
5 N N spam
6 N N spam
7 N N normal
8 Y N spam
9 N N normal
10 N N normal
Wrapper Approach
• Use all three attributes: Free, Won and Cash
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
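The seven slides above try every non-empty subset of {Free, Won, Cash}, i.e. an exhaustive wrapper search. A minimal sketch of that search: `score` stands for any model-based measure, such as the cross-validated accuracy of a classifier trained on the subset, and the function names are our own.

```python
from itertools import combinations


def candidate_subsets(attributes):
    """All non-empty subsets, smallest first, in the order the slides try them."""
    subsets = []
    for size in range(1, len(attributes) + 1):
        subsets.extend(combinations(attributes, size))
    return subsets


def exhaustive_wrapper(attributes, score):
    """Score every candidate subset and return (best_subset, best_score).

    `score` maps a tuple of attribute names to a number where higher is better.
    Ties keep the subset that was tried first (the smaller one).
    """
    best, best_score = None, float("-inf")
    for subset in candidate_subsets(attributes):
        s = score(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score


for subset in candidate_subsets(["Free", "Won", "Cash"]):
    print(", ".join(subset))
```

The loop prints the subsets in the same order as the slides. With n attributes there are 2^n − 1 subsets to evaluate, so exhaustive search quickly becomes impractical; greedy searches such as forward selection avoid this.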
Forward Selection
• Adds attributes one at a time and keeps only the attributes that matter
• If adding an attribute improves performance, the attribute is kept
• If adding an attribute makes performance worse, the attribute is removed again
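The greedy procedure can be sketched in Python as follows. `score` is again a placeholder for a model-based measure; to trace the walkthrough on the next slides, the example plugs in the illustrative cross-validation accuracies reported there as a lookup table. This sketch uses the common variant that adds the best remaining attribute in each round and stops when no addition improves the score.

```python
def forward_selection(attributes, score):
    """Greedy forward selection.

    Start with no attributes. In each round, tentatively add every remaining
    attribute, keep the one with the highest score, and stop as soon as the
    best addition does not improve on the current score.
    """
    selected, best_score = [], float("-inf")
    remaining = list(attributes)
    while remaining:
        trials = [(score(tuple(selected + [a])), a) for a in remaining]
        trial_score, trial_attr = max(trials, key=lambda t: t[0])  # first wins ties
        if trial_score <= best_score:
            break  # no remaining attribute improves performance
        selected.append(trial_attr)
        remaining.remove(trial_attr)
        best_score = trial_score
    return selected, best_score


# Illustrative cross-validation accuracies from the slides, used as a lookup table
slide_accuracy = {
    ("Free",): 0.8, ("Won",): 0.8, ("Cash",): 0.5,
    ("Free", "Won"): 0.6, ("Free", "Cash"): 0.8,
}
print(forward_selection(["Free", "Won", "Cash"], slide_accuracy.get))  # (['Free'], 0.8)
```

Free and Won tie at 80% in the first round; the sketch keeps the first one tried (Free), as the slides do, and then stops because neither Free + Won (60%) nor Free + Cash (80%) beats 80%.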
Forward Selection
• Use only the attribute Free
accuracy = 80%
ID Free Type
1 Y spam
2 N spam
3 N normal
4 N normal
5 Y spam
6 Y spam
7 N normal
8 N spam
9 N normal
10 N normal
Evaluate performance with cross-validation
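How such an accuracy can be estimated is sketched below with leave-one-out cross-validation (each of the 10 rows is held out once and predicted from the other 9). The classifier is a deliberately simple stand-in, not the Neural Net used later in RapidMiner: it predicts the majority label among training rows with the same attribute values and falls back to the overall majority label. `predict` and `loo_accuracy` are hypothetical helper names.

```python
from collections import Counter

# Spam example from the slides: (Free, Won, Cash, Type)
DATA = [
    ("Y", "Y", "Y", "spam"), ("N", "Y", "Y", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"), ("Y", "N", "N", "spam"), ("Y", "N", "N", "spam"),
    ("N", "N", "N", "normal"), ("N", "Y", "N", "spam"), ("N", "N", "N", "normal"),
    ("N", "N", "N", "normal"),
]
COLUMNS = {"Free": 0, "Won": 1, "Cash": 2}


def predict(train, cols, row):
    """Majority label among training rows matching `row` on `cols`;
    falls back to the overall majority label when no training row matches."""
    key = tuple(row[c] for c in cols)
    matches = [r[-1] for r in train if tuple(r[c] for c in cols) == key]
    pool = matches or [r[-1] for r in train]
    return Counter(pool).most_common(1)[0][0]


def loo_accuracy(data, subset):
    """Leave-one-out cross-validation accuracy for a tuple of attribute names."""
    cols = [COLUMNS[name] for name in subset]
    correct = 0
    for i, row in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += predict(train, cols, row) == row[-1]
    return correct / len(data)


for subset in [("Free",), ("Won",), ("Cash",)]:
    print(subset, loo_accuracy(DATA, subset))
```

This stand-in gives 80% for Free and for Won, like the slides, but 70% for Cash rather than 50%: cross-validated accuracy depends on both the model and how the data is split.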
Forward Selection
• Use only the attribute Won
accuracy = 80%
ID Won Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 Y spam
9 N normal
10 N normal
Evaluate performance with cross-validation
Forward Selection
• Use only the attribute Cash
accuracy = 50%
ID Cash Type
1 Y spam
2 Y spam
3 N normal
4 N normal
5 N spam
6 N spam
7 N normal
8 N spam
9 N normal
10 N normal
Evaluate performance with cross-validation
Forward Selection
• Use the attributes Free and Won
accuracy = 60%
Evaluate performance with cross-validation
ID Free Won Type
1 Y Y spam
2 N Y spam
3 N N normal
4 N N normal
5 Y N spam
6 Y N spam
7 N N normal
8 N Y spam
9 N N normal
10 N N normal
Won is dropped because adding it lowers the accuracy (from 80% with Free alone to 60%)
Forward Selection
• Use the attributes Free and Cash
accuracy = 80%
Evaluate performance with cross-validation
ID Free Cash Type
1 Y Y spam
2 N Y spam
3 N N normal
4 N N normal
5 Y N spam
6 Y N spam
7 N N normal
8 N N spam
9 N N normal
10 N N normal
Cash is dropped because adding it does not increase the accuracy
Example 7-13: Forward Selection
• Related operators
Operator Description
Read CSV Reads a CSV file
Forward Selection Selects attributes using forward selection
X-Validation Splits the data into parts for training and testing the model (cross-validation)
Example 7-13: Forward Selection
• Related operators
Operator Description
Neural Net Builds a Neural Network model
Apply Model Applies the model to predict new data
Performance (Binominal Classification) Reports the performance measures of a classification model
Example 7-13: Forward Selection
• Load the gold_training.csv data with the Read CSV operator
Example 7-13: Forward Selection
• Click the ‘Import Configuration Wizard…’ button
• Set the role of the Date attribute to ID
• Set the role of the GC Trend attribute to label
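In code, these two roles amount to pulling the ID and label columns out of each row, so that only the remaining columns reach the learner. A sketch in Python using only the standard `csv` module; `load_with_roles` and the two-row sample (its `Feature A` column and its `up`/`down` values) are hypothetical, since the actual columns and values of gold_training.csv are not listed here.

```python
import csv
import io


def load_with_roles(text, id_col="Date", label_col="GC Trend"):
    """Parse CSV text into (id, regular_attributes, label) triples.

    The ID column only identifies rows and the label column is the prediction
    target, so neither is kept among the regular attributes."""
    examples = []
    for row in csv.DictReader(io.StringIO(text)):
        ident = row.pop(id_col)
        label = row.pop(label_col)
        examples.append((ident, dict(row), label))
    return examples


# Hypothetical sample; not taken from gold_training.csv.
SAMPLE = "Date,Feature A,GC Trend\n2014-01-02,1.5,up\n2014-01-03,1.2,down\n"
for ident, attrs, label in load_with_roles(SAMPLE):
    print(ident, attrs, label)
```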
Example 7-13: Forward Selection
• Double-click the Forward Selection operator and add the X-Validation operator from New Building Block to evaluate the model's performance
Example 7-13: Forward Selection
• Double-click the X-Validation operator to build a Neural Network model
Example 7-13: Forward Selection
• Weight of each attribute
Example 7-13: Forward Selection
• The data after attribute selection
Only 4 attributes remain.
Example 7-13: Forward Selection
• Performance results from cross-validation
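Performance (Binominal Classification) can report criteria such as accuracy, precision and recall. A minimal sketch of how these three are computed from actual and predicted labels, in Python; `binominal_metrics` and the five-row sample are hypothetical and are not the results shown on the slide.

```python
def binominal_metrics(actual, predicted, positive):
    """Accuracy, precision and recall for a two-class (binominal) label.

    `positive` names the class treated as positive (e.g. "spam")."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical predictions for five test rows.
actual = ["spam", "spam", "normal", "normal", "spam"]
predicted = ["spam", "normal", "normal", "spam", "spam"]
print(binominal_metrics(actual, predicted, positive="spam"))
```

For this sample, 3 of the 5 predictions are correct (accuracy 0.6); 2 of the 3 rows predicted as spam really are spam, and 2 of the 3 actual spam rows are found, so precision and recall are both 2/3.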
Attribute (Feature) Selection
• There are two approaches:
  • Filter approach: computes a weight (or correlation score) for each attribute and keeps only the important attributes
    • Information Theory: weights each attribute by its Information Gain
    • Chi-Square: weights each attribute by its Chi-Square statistic
  • Wrapper approach: computes the weights by using a classification model to measure how well the attributes perform
    • Forward Selection
    • Backward Elimination
    • Evolutionary Selection
Backward Elimination
• Start with all attributes and remove them one at a time, keeping only the important ones
• If removing an attribute improves performance, that attribute is dropped
• If removing an attribute makes performance worse, that attribute is kept
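The elimination rule above can be sketched in Python. This is a minimal sketch, assuming the same kind of stand-in scorer as a real wrapper would use: leave-one-out accuracy of a hypothetical lookup classifier (majority label per combination of attribute values) instead of RapidMiner's Neural Net with X-Validation. The data is the 10-row Free/Won/Cash spam table from these slides; the function names are hypothetical, and ties count as "no improvement".

```python
from collections import Counter

ATTRS = ["Free", "Won", "Cash"]
TABLE = """Y Y Y spam
N Y Y spam
N N N normal
N N N normal
Y N N spam
Y N N spam
N N N normal
N Y N spam
N N N normal
N N N normal"""
ROWS = [(dict(zip(ATTRS, line.split()[:3])), line.split()[3]) for line in TABLE.splitlines()]


def predict(train, attrs, x):
    """Majority label among training rows matching x on `attrs` (else overall majority)."""
    key = tuple(x[a] for a in attrs)
    matches = [y for r, y in train if tuple(r[a] for a in attrs) == key]
    counts = Counter(matches or [y for _, y in train])
    return max(sorted(counts), key=counts.get)  # ties -> alphabetically first label


def loo_accuracy(rows, attrs):
    """Leave-one-out accuracy of `predict` restricted to `attrs`."""
    hits = sum(predict(rows[:i] + rows[i + 1:], attrs, x) == y
               for i, (x, y) in enumerate(rows))
    return hits / len(rows)


def backward_elimination(rows, attrs):
    """Start from all attributes; repeatedly drop the one whose removal raises accuracy
    the most, and stop once no single removal improves it."""
    selected = list(attrs)
    best = loo_accuracy(rows, selected)
    while len(selected) > 1:
        score, drop = max(((loo_accuracy(rows, [a for a in selected if a != d]), d)
                           for d in selected), key=lambda pair: pair[0])
        if score <= best:  # no removal helps: keep the remaining attributes
            break
        selected.remove(drop)
        best = score
    return selected, best


print(backward_elimination(ROWS, ATTRS))  # -> (['Free', 'Won'], 0.9)
```

Here Cash is removed first (0.7 to 0.9) and the search then stops, because dropping Free or Won would lower accuracy to 0.8. The slides instead drop Free (60% to 80%): which attribute gets removed depends on the model and validation scheme, not only on the data.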
Backward Elimination
• Using the attributes Free, Won, and Cash
Cross-validation result: accuracy = 60%
ID Free Won Cash Type
1 Y Y Y spam
2 N Y Y spam
3 N N N normal
4 N N N normal
5 Y N N spam
6 Y N N spam
7 N N N normal
8 N Y N spam
9 N N N normal
10 N N N normal
Backward Elimination
• Using the attributes Won and Cash (Free removed)
Cross-validation result: accuracy = 80%
ID Won Cash Type
1 Y Y spam
2 Y Y spam
3 N N normal
4 N N normal
5 N N spam
6 N N spam
7 N N normal
8 Y N spam
9 N N normal
10 N N normal
Free is dropped because removing it increases accuracy from 60% to 80%.
Example 7-14: Backward Elimination
• Related operators
Operator | Description
Read CSV | Reads a CSV file
Backward Elimination | Selects attributes using the Backward Elimination method
X-Validation | Splits the data for building (training) and testing the model
Neural Net | Builds a Neural Network model
Apply Model | Applies the model to predict new data
Performance (Binominal Classification) | Displays the performance metrics of the classification model
Example 7-14: Backward Elimination
• Load gold_training.csv with the Read CSV operator
Example 7-14: Backward Elimination
• Click the ‘Import Configuration Wizard…’ button
• Set the role of the Date attribute to ID
• Set the role of the GC Trend attribute to label
Example 7-14: Backward Elimination
• Double-click the Backward Elimination operator and add the X-Validation operator from New Building Block to evaluate the model's performance
Example 7-14: Backward Elimination
• Double-click the X-Validation operator to build a Neural Network model
Example 7-14: Backward Elimination
• Weight of each attribute
Example 7-14: Backward Elimination
• The data after attribute selection
Only 5 attributes remain.
Example 7-14: Backward Elimination
• Performance results from cross-validation
Evolutionary Selection
• Forward Selection and Backward Elimination are greedy: each step commits to the single change (adding or removing one attribute) that most improves accuracy, and the search stops as soon as no such change helps, so better combinations of attributes can be missed
• Evolutionary Selection
  • Randomly select sets of attributes and measure their performance
  • Keep the attributes that perform well and randomly add others, then measure again
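The random search described above is a genetic algorithm over attribute subsets. A toy sketch in Python, assuming a hypothetical stand-in scorer (leave-one-out accuracy of a majority-per-value lookup classifier) and the Free/Won/Cash spam table from these slides. `evolutionary_selection` and its parameters (`pop_size`, `generations`, `p_mutation`, `seed`) are hypothetical; the selection, crossover and mutation settings of RapidMiner's Optimize Selection (Evolutionary) are not reproduced here.

```python
import random
from collections import Counter

ATTRS = ["Free", "Won", "Cash"]
TABLE = """Y Y Y spam
N Y Y spam
N N N normal
N N N normal
Y N N spam
Y N N spam
N N N normal
N Y N spam
N N N normal
N N N normal"""
ROWS = [(dict(zip(ATTRS, line.split()[:3])), line.split()[3]) for line in TABLE.splitlines()]


def loo_accuracy(rows, attrs):
    """Leave-one-out accuracy of a majority-per-value lookup classifier on `attrs`."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        key = tuple(x[a] for a in attrs)
        matches = [t for r, t in train if tuple(r[a] for a in attrs) == key]
        counts = Counter(matches or [t for _, t in train])
        hits += max(sorted(counts), key=counts.get) == y
    return hits / len(rows)


def evolutionary_selection(rows, attrs, pop_size=6, generations=10, p_mutation=0.2, seed=0):
    """Toy genetic algorithm; each individual is a 0/1 mask over `attrs`."""
    rng = random.Random(seed)

    def fitness(mask):
        return loo_accuracy(rows, [a for a, bit in zip(attrs, mask) if bit])

    # 1. Random initial population: each attribute is switched on with probability 0.5.
    population = [[rng.randint(0, 1) for _ in attrs] for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Keep the better half unchanged (elitism), so the best subset is never lost.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        # 3. Refill the population with one-point-crossover children, then mutate bits.
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, len(attrs))
            child = [1 - bit if rng.random() < p_mutation else bit
                     for bit in mom[:cut] + dad[cut:]]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [a for a, bit in zip(attrs, best) if bit], fitness(best)


print(evolutionary_selection(ROWS, ATTRS))
```

Unlike the greedy wrappers, the result depends on the random seed; keeping the better half unchanged in every generation guarantees that the best score found never decreases from one generation to the next.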
Example 7-15: Evolutionary Selection
• Related operators
Operator | Description
Read CSV | Reads a CSV file
Optimize Selection (Evolutionary) | Selects attributes using the Optimize Selection (Evolutionary) method
X-Validation | Splits the data for building (training) and testing the model
Neural Net | Builds a Neural Network model
Apply Model | Applies the model to predict new data
Performance (Binominal Classification) | Displays the performance metrics of the classification model
Example 7-15: Evolutionary Selection
• Load gold_training.csv with the Read CSV operator
Example 7-15: Evolutionary Selection
• Click the ‘Import Configuration Wizard…’ button
• Set the role of the Date attribute to ID
• Set the role of the GC Trend attribute to label
Example 7-15: Evolutionary Selection
• Double-click the Optimize Selection (Evolutionary) operator and add the X-Validation operator from New Building Block to evaluate the model's performance
Example 7-15: Evolutionary Selection
• Double-click the X-Validation operator to build a Neural Network model
Example 7-15: Evolutionary Selection
• Weight of each attribute
Example 7-15: Evolutionary Selection
• The data after attribute selection
Only 5 attributes remain.
Example 7-15: Evolutionary Selection
• Performance results from cross-validation

Introduction to Feature (Attribute) Selection with RapidMiner Studio 6

  • 1. Feature Selection â€Ļ with RapidMiner Studio 6 (data)3â€Ļ base|warehouse|mining http://www.dataminingtrend.comâ€Ļ http://facebook.com/datacube.th Eakasit Pacharawongsakda, Ph.D. Data Cube: http://facebook.com/datacube.th
  • 2. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Attribute (Feature) Selection â€Ē āļ›āļĢāļ°āļŠāļīāļ—āļ˜āļīāļ āļēāļžāļ‚āļ­āļ‡ Classication āļ‚āļķāđ‰āļ™āļ­āļĒāļđāđˆāļāļąāļš āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ āļŦāļĢāļ·āļ­ featureâ€Ļ āļ—āļĩāđˆāļ™āļģāļĄāļēāđƒāļŠāđ‰ â€Ē attribute selection āđ€āļ›āđ‡āļ™āļ§āļīāļ˜āļĩāļāļēāļĢāļ„āļąāļ”āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ (āļŦāļĢāļ·āļ­ feature) â€Ļ āļ—āļĩāđˆāļŠāļģāļ„āļąāļāđƒāļ™āļāļēāļĢāļŠāļĢāđ‰āļēāļ‡āđ‚āļĄāđ€āļ”āļĨ â€Ē āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļĄāļĩāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒ (correlation) āļāļąāļšāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļĨāļēāđ€āļšāļĨ (label) āļĄāļēāļ â€Ē āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļĄāļĩāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒāļāļąāļ™āļĢāļ°āļŦāļ§āđˆāļēāļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ™āđ‰āļ­āļĒ â€Ē āļāļēāļĢāļ—āļģ attribute selection āđ€āļŦāļĄāļēāļ°āļāļąāļš â€Ē āļŠāđ‰āļ­āļĄāļđāļĨāļ—āļĩāđˆāļĄāļĩāļˆāļģāļ™āļ§āļ™āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāđ€āļ›āđ‡āļ™āļˆāļģāļ™āļ§āļ™āđ€āļĒāļ­āļ° āđ€āļŠāđˆāļ™ text mining â€Ē āđƒāļŠāđ‰āđ€āļ§āļĨāļēāđƒāļ™āļāļēāļĢāļŠāļĢāđ‰āļēāļ‡āđ‚āļĄāđ€āļ”āļĨāļ™āļēāļ™ 2
  • 3. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Attribute (Feature) Selection â€Ē āđāļšāđˆāļ‡āđ„āļ”āđ‰āđ€āļ›āđ‡āļ™ 2 āđāļšāļš â€Ē Filter approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļ (āļŦāļĢāļ·āļ­āļ„āđˆāļēāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒ) āļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ° āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāđāļĨāļ°āđ€āļĨāļ·āļ­āļāđ€āļ‰āļžāļēāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļŠāļģāļ„āļąāļāđ€āļāđ‡āļšāđ„āļ§āđ‰ â€Ē Wrapper approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāđ‚āļ”āļĒāđƒāļŠāđ‰āđ‚āļĄāđ€āļ”āļĨ classication āđ€āļ›āđ‡āļ™āļ•āļąāļ§ āļ§āļąāļ”āļ›āļĢāļ°āļŠāļīāļ—āļ˜āļīāļ āļēāļžāļ‚āļ­āļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ 3 ID Free Won Cash Call Service Type 1 Y Y Y Y Y spam 2 N Y Y Y N spam compute weight ID Free Won Type 1 Y Y spam 2 N Y spam āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļąāđ‰āļ‡āļŦāļĄāļ”āđƒāļ™ training data āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļŦāļĨāļąāļ‡āļˆāļēāļāļāļēāļĢāđ€āļĨāļ·āļ­āļâ€Ļ (selection) āđāļĨāđ‰āļ§ ID Free Won Cash Call Service Type 1 Y Y Y Y Y spam 2 N Y Y Y N spam ID Free Won Type 1 Y Y spam 2 N Y spam āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļąāđ‰āļ‡āļŦāļĄāļ”āđƒāļ™ training data āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļŦāļĨāļąāļ‡āļˆāļēāļāļāļēāļĢāđ€āļĨāļ·āļ­āļâ€Ļ (selection) āđāļĨāđ‰āļ§ classication model Attribute Selection: Filter Approach Attribute Selection: Wrapper Approach
  • 4. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Attribute (Feature) Selection â€Ē āđāļšāđˆāļ‡āđ„āļ”āđ‰āđ€āļ›āđ‡āļ™ 2 āđāļšāļš â€Ē Filter approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļ (āļŦāļĢāļ·āļ­āļ„āđˆāļēāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒ) āļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ° āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāđāļĨāļ°āđ€āļĨāļ·āļ­āļāđ€āļ‰āļžāļēāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļŠāļģāļ„āļąāļāđ€āļāđ‡āļšāđ„āļ§āđ‰ â€Ē Information Theory āļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ”āđ‰āļ§āļĒāļ„āđˆāļē Information Gain â€Ē Chi-Square āļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ”āđ‰āļ§āļĒāļ„āđˆāļē Chi-Square â€Ē Wrapper approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāđ‚āļ”āļĒāđƒāļŠāđ‰āđ‚āļĄāđ€āļ”āļĨ classication āđ€āļ›āđ‡āļ™āļ•āļąāļ§ āļ§āļąāļ”āļ›āļĢāļ°āļŠāļīāļ—āļ˜āļīāļ āļēāļžāļ‚āļ­āļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ â€Ē Forward Selection â€Ē Backward Elimination â€Ē Evolutionary Selection 4
  • 5. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒāļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļāļąāļšāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒâ€Ļ āļĨāļēāđ€āļšāļĨāļ”āđ‰āļ§āļĒāļ§āļīāļ˜āļĩ Information Gain â€Ē āđƒāļŠāđ‰āđ„āļ”āđ‰āļāļąāļšāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāđ€āļ›āđ‡āļ™āļ™āļ­āļĄāļīāļ™āļ­āļĨ (nominal) āđ€āļ—āđˆāļēāļ™āļąāđ‰āļ™ â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Entropy āđāļĨāļ° Information Gain (IG) 5 Entropy(c1) = -p(c1) log p(c1) IG (parent, child) =  Entropy(parent) – [p(c1) × Entropy(c1) + p(c2) × Entropy(c2) + ...]
  • 6. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Information Gain (IG) āļĢāļ°āļŦāļ§āđˆāļēāļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļāļąāļšāļĨāļēāđ€āļšāļĨ 6 ID Outlook Temperature Humidity Windy Play 1 sunny hot high FALSE no 2 sunny hot high TRUE no 3 overcast hot high FALSE yes 4 rainy mild high FALSE yes 5 rainy cool normal FALSE yes 6 rainy cool normal TRUE no 7 overcast mild normal TRUE yes 8 sunny mild high FALSE no 9 sunny mild normal FALSE yes 10 rainy mild normal FALSE yes 11 sunny mild normal TRUE yes 12 overcast mild high TRUE yes 13 overcast hot normal FALSE yes 14 rainy mild high TRUE no attribute IG Outlook 0.247 Temperature Humidity Windy āļ•āļēāļĢāļēāļ‡āļ„āđˆāļē Information Gain
  • 7. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Information Gain (IG) āļĢāļ°āļŦāļ§āđˆāļēāļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļāļąāļšāļĨāļēāđ€āļšāļĨ 7 ID Outlook Temperature Humidity Windy Play 1 sunny hot high FALSE no 2 sunny hot high TRUE no 3 overcast hot high FALSE yes 4 rainy mild high FALSE yes 5 rainy cool normal FALSE yes 6 rainy cool normal TRUE no 7 overcast mild normal TRUE yes 8 sunny mild high FALSE no 9 sunny mild normal FALSE yes 10 rainy mild normal FALSE yes 11 sunny mild normal TRUE yes 12 overcast mild high TRUE yes 13 overcast hot normal FALSE yes 14 rainy mild high TRUE no attribute IG Outlook 0.247 Temperature 0.029 Humidity Windy āļ•āļēāļĢāļēāļ‡āļ„āđˆāļē Information Gain
  • 8. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Information Gain (IG) āļĢāļ°āļŦāļ§āđˆāļēāļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļāļąāļšāļĨāļēāđ€āļšāļĨ 8 ID Outlook Temperature Humidity Windy Play 1 sunny hot high FALSE no 2 sunny hot high TRUE no 3 overcast hot high FALSE yes 4 rainy mild high FALSE yes 5 rainy cool normal FALSE yes 6 rainy cool normal TRUE no 7 overcast mild normal TRUE yes 8 sunny mild high FALSE no 9 sunny mild normal FALSE yes 10 rainy mild normal FALSE yes 11 sunny mild normal TRUE yes 12 overcast mild high TRUE yes 13 overcast hot normal FALSE yes 14 rainy mild high TRUE no attribute IG Outlook 0.247 Temperature 0.029 Humidity 0.152 Windy āļ•āļēāļĢāļēāļ‡āļ„āđˆāļē Information Gain
  • 9. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Information Gain (IG) āļĢāļ°āļŦāļ§āđˆāļēāļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļāļąāļšāļĨāļēāđ€āļšāļĨ 9 ID Outlook Temperature Humidity Windy Play 1 sunny hot high FALSE no 2 sunny hot high TRUE no 3 overcast hot high FALSE yes 4 rainy mild high FALSE yes 5 rainy cool normal FALSE yes 6 rainy cool normal TRUE no 7 overcast mild normal TRUE yes 8 sunny mild high FALSE no 9 sunny mild normal FALSE yes 10 rainy mild normal FALSE yes 11 sunny mild normal TRUE yes 12 overcast mild high TRUE yes 13 overcast hot normal FALSE yes 14 rainy mild high TRUE no attribute IG Outlook 0.247 Temperature 0.029 Humidity 0.152 Windy 0.048 āļ•āļēāļĢāļēāļ‡āļ„āđˆāļē Information Gain
  • 10. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Information Theory-based ltering â€Ē āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļĄāļĩāļ„āđˆāļē IG āļĄāļēāļāļāļ§āđˆāļē 0.1 10 attribute IG Outlook 0.247 Humidity 0.152 Windy 0.048 Temperature 0.029 ID Outlook Humidity Play 1 sunny high no 2 sunny high no 3 overcast high yes 4 rainy high yes 5 rainy normal yes 6 rainy normal no 7 overcast normal yes 8 sunny high no 9 sunny normal yes 10 rainy normal yes 11 sunny normal yes 12 overcast high yes 13 overcast normal yes 14 rainy high no āļ•āļēāļĢāļēāļ‡āļ„āđˆāļē Information Gain
  • 11. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Example 7-11: Weight by IG â€Ē āđ‚āļ­āđ€āļ›āļ­āđ€āļĢāđ€āļ•āļ­āļĢāđŒāļ—āļĩāđˆāđ€āļāļĩāđˆāļĒāļ§āļ‚āđ‰āļ­āļ‡ 11 āđ‚āļ­āđ€āļ›āļ­āđ€āļĢāđ€āļ•āļ­āļĢāđŒ āļ„āļģāļ­āļ˜āļīāļšāļēāļĒ Read CSV āđƒāļŠāđ‰āļŠāļģāļŦāļĢāļąāļšāļ­āđˆāļēāļ™āđ„āļŸāļĨāđŒāļ›āļĢāļ°āđ€āļ āļ— CSV Weight by Information Gain āđƒāļŠāđ‰āļŠāļģāļŦāļĢāļąāļšāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāļ‚āļ­āļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ”āđ‰āļ§āļĒāđ€āļ—āļ„āļ™āļīāļ„ Information Gain Select by weight āđƒāļŠāđ‰āļŠāļģāļŦāļĢāļąāļšāđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ•āļēāļĄāļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļ (weight)
  • 12. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Example 7-11: Weight by IG â€Ē āđƒāļŠāđ‰āļ‚āđ‰āļ­āļĄāļđāļĨ weather_nominal āđāļĨāļ°āđ‚āļ­āđ€āļ›āļ­āđ€āļĢāđ€āļ•āļ­āļĢāđŒ Weight by Information Gain 12 1 2
  • 13. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Example 7-11: Weight by IG â€Ē āļœāļĨāļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļē Information Gain āļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ 13 āļ„āđˆāļē Information Gain (IG)
  • 14. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Example 7-11: Weight by IG â€Ē āđƒāļŠāđ‰āđ‚āļ­āđ€āļ›āļ­āđ€āļĢāđ€āļ•āļ­āļĢāđŒ Select by weight āđ€āļžāļ·āđˆāļ­āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļĄāļĩāļ„āđˆāļē weight āļĄāļēāļāļāļ§āđˆāļē 0.1 14 1 2 5 3 6 4
  • 15. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Example 7-11: Weight by IG â€Ē āļœāļĨāļāļēāļĢāļ„āļąāļ”āđ€āļĨāļ·āļ­āļāđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļĄāļĩāļ„āđˆāļē IG āļĄāļēāļāļāļ§āđˆāļē 0.1 15 āļ„āđˆāļē Information Gain (IG)
  • 16. (data)3â€Ļ base|warehouse|mining http://dataminingtrend.com http://facebook.com/datacube.th Attribute (Feature) Selection â€Ē āđāļšāđˆāļ‡āđ„āļ”āđ‰āđ€āļ›āđ‡āļ™ 2 āđāļšāļš â€Ē Filter approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļ (āļŦāļĢāļ·āļ­āļ„āđˆāļēāļ„āļ§āļēāļĄāļŠāļąāļĄāļžāļąāļ™āļ˜āđŒ) āļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ° āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāđāļĨāļ°āđ€āļĨāļ·āļ­āļāđ€āļ‰āļžāļēāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ—āļĩāđˆāļŠāļģāļ„āļąāļāđ€āļāđ‡āļšāđ„āļ§āđ‰ â€Ē Information Theory āļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ”āđ‰āļ§āļĒāļ„āđˆāļē Information Gain â€Ē Chi-Square āļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāļ‚āļ­āļ‡āđāļ•āđˆāļĨāļ°āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒāļ”āđ‰āļ§āļĒāļ„āđˆāļē Chi-Square â€Ē Wrapper approach āđ€āļ›āđ‡āļ™āļāļēāļĢāļ„āļģāļ™āļ§āļ“āļ„āđˆāļēāļ™āđ‰āļģāļŦāļ™āļąāļāđ‚āļ”āļĒāđƒāļŠāđ‰āđ‚āļĄāđ€āļ”āļĨ classication āđ€āļ›āđ‡āļ™āļ•āļąāļ§ āļ§āļąāļ”āļ›āļĢāļ°āļŠāļīāļ—āļ˜āļīāļ āļēāļžāļ‚āļ­āļ‡āđāļ­āļ•āļ—āļĢāļīāļšāļīāļ§āļ•āđŒ â€Ē Forward Selection â€Ē Backward Elimination â€Ē Evolutionary Selection 16
  • 17. Chi-Square-based filtering
      • Computes the association between each feature and the label with the Chi-Square statistic
      • Works only with nominal attributes
      • Compares how often each attribute value co-occurs with each value of the label attribute
      • The Chi-Square value is computed over every cell of the attribute-by-label frequency table as $\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$, where
          • $f_0$ = observed frequency
          • $f_e$ = expected frequency
  • 18. Chi-Square-based filtering
      • Compute the Chi-Square value between attribute Outlook and the label
      • Expected frequency of Outlook = sunny and Play = no
        = P(Outlook = sunny) * P(Play = no) * total number of examples
        = (5/14) * (5/14) * 14 = 1.785714
      Data (sorted by Play):
        ID  Outlook   Play
        6   rainy     no
        14  rainy     no
        1   sunny     no
        2   sunny     no
        8   sunny     no
        3   overcast  yes
        7   overcast  yes
        12  overcast  yes
        13  overcast  yes
        4   rainy     yes
        5   rainy     yes
        10  rainy     yes
        9   sunny     yes
        11  sunny     yes
      Observed frequency:
                    sunny  overcast  rainy  Total
        Play = no     3       0        2      5
        Play = yes    2       4        3      9
        Total         5       4        5     14
  • 19. Chi-Square-based filtering
      • Compute the Chi-Square value between attribute Outlook and the label (same data as the previous slide)
      Observed frequency:
                    sunny  overcast  rainy  Total
        Play = no     3       0        2      5
        Play = yes    2       4        3      9
        Total         5       4        5     14
      Expected frequency:
                    sunny  overcast  rainy  Total
        Play = no    1.786   1.429    1.786    5
        Play = yes   3.214   2.571    3.214    9
        Total          5       4        5     14
  • 20. Chi-Square-based filtering
      • Compute the Chi-Square value between attribute Outlook and the label from the observed and expected frequencies of the previous slide:
        $\chi^2 = \frac{(3-1.786)^2}{1.786} + \frac{(0-1.429)^2}{1.429} + \frac{(2-1.786)^2}{1.786} + \frac{(2-3.214)^2}{3.214} + \frac{(4-2.571)^2}{2.571} + \frac{(3-3.214)^2}{3.214} = 3.547$
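The arithmetic of slides 18 to 20 can be checked with a short sketch that works directly on the observed contingency table. It is an illustration only; `expected_counts` and `chi_square` are hypothetical helper names. Each expected cell is computed as row total * column total / grand total, which is the same quantity as P(Outlook = v) * P(Play = c) * N used on slide 18. It assumes every row and column total is non-zero, otherwise the division fails.

```python
def expected_counts(observed):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (f0 - fe)^2 / fe over all cells of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Rows: Play = no, Play = yes; columns: sunny, overcast, rainy (observed counts from slide 18)
outlook_vs_play = [[3, 0, 2], [2, 4, 3]]
print([[round(v, 3) for v in row] for row in expected_counts(outlook_vs_play)])
# -> [[1.786, 1.429, 1.786], [3.214, 2.571, 3.214]]
print(round(chi_square(outlook_vs_play), 3))  # -> 3.547
```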
  • 21. Chi-Square-based filtering
      • Compute the Chi-Square value between each attribute and the label
        ID  Outlook   Temperature  Humidity  Windy  Play
        1   sunny     hot          high      FALSE  no
        2   sunny     hot          high      TRUE   no
        3   overcast  hot          high      FALSE  yes
        4   rainy     mild         high      FALSE  yes
        5   rainy     cool         normal    FALSE  yes
        6   rainy     cool         normal    TRUE   no
        7   overcast  cool         normal    TRUE   yes
        8   sunny     mild         high      FALSE  no
        9   sunny     cool         normal    FALSE  yes
        10  rainy     mild         normal    FALSE  yes
        11  sunny     mild         normal    TRUE   yes
        12  overcast  mild         high      TRUE   yes
        13  overcast  hot          normal    FALSE  yes
        14  rainy     mild         high      TRUE   no
      Chi-Square table:
        attribute    Chi-Square
        Outlook      3.547
        Temperature
        Humidity
        Windy
  • 22. Chi-Square-based filtering
      • Compute the Chi-Square value between each attribute and the label (same data as the previous slide)
      Chi-Square table:
        attribute    Chi-Square
        Outlook      3.547
        Temperature  0.570
        Humidity
        Windy
  • 23. Chi-Square-based filtering
      • Compute the Chi-Square value between each attribute and the label (same data as the previous slide)
      Chi-Square table:
        attribute    Chi-Square
        Outlook      3.547
        Temperature  0.570
        Humidity     2.800
        Windy
  • 24. Chi-Square-based filtering
      • Compute the Chi-Square value between each attribute and the label (same data as the previous slide)
      Chi-Square table:
        attribute    Chi-Square
        Outlook      3.547
        Temperature  0.570
        Humidity     2.800
        Windy        0.933
  • 25. Chi-Square-based filtering
      • Select the attributes whose Chi-Square value is greater than 2.0
      Chi-Square table (sorted):
        attribute    Chi-Square
        Outlook      3.547
        Humidity     2.800
        Windy        0.933
        Temperature  0.570
      Resulting data:
        ID  Outlook   Humidity  Play
        1   sunny     high      no
        2   sunny     high      no
        3   overcast  high      yes
        4   rainy     high      yes
        5   rainy     normal    yes
        6   rainy     normal    no
        7   overcast  normal    yes
        8   sunny     high      no
        9   sunny     normal    yes
        10  rainy     normal    yes
        11  sunny     normal    yes
        12  overcast  high      yes
        13  overcast  normal    yes
        14  rainy     high      no
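Slides 21 to 25 can be reproduced end to end from the raw rows: build the attribute-by-label counts, compute Chi-Square for every attribute, rank them and keep those above 2.0. This is a sketch assuming the standard weather_nominal rows (with Temperature = cool for IDs 7 and 9, which is what the 0.570 value for Temperature requires); `chi_square` here is a hypothetical helper, not RapidMiner's Weight by Chi-Square operator.

```python
from collections import Counter

# Assumed data: standard weather_nominal rows (Outlook, Temperature, Humidity, Windy, Play)
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]


def chi_square(rows, col):
    """Chi-Square statistic between column col and the label (last column)."""
    n = len(rows)
    value_counts = Counter(r[col] for r in rows)
    label_counts = Counter(r[-1] for r in rows)
    pair_counts = Counter((r[col], r[-1]) for r in rows)
    stat = 0.0
    for v, nv in value_counts.items():
        for c, nc in label_counts.items():
            fe = nv * nc / n  # expected = P(value) * P(label) * n
            fo = pair_counts[(v, c)]  # observed (0 if the pair never occurs)
            stat += (fo - fe) ** 2 / fe
    return stat


weights = {a: chi_square(ROWS, i) for i, a in enumerate(ATTRIBUTES)}
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
print([(a, round(w, 3)) for a, w in ranked])
# -> [('Outlook', 3.547), ('Humidity', 2.8), ('Windy', 0.933), ('Temperature', 0.57)]
print([a for a, w in ranked if w > 2.0])  # -> ['Outlook', 'Humidity']
```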
  • 26. Example 7-12: Weight by CS
      • Related operators
        Operator              Description
        Read CSV              Reads a CSV file
        Weight by Chi-Square  Computes attribute weights with the Chi-Square technique
        Select by weight      Selects attributes according to their weights
  • 27. Example 7-12: Weight by CS
      • Use the weather_nominal data and the Weight by Chi-Square operator
  • 28. Example 7-12: Weight by CS
      • Result of computing the Chi-Square value of each attribute
      [Figure: Chi-Square (CS) values]
  • 29. Example 7-12: Weight by CS
      • Use the Select by weight operator to select the attributes whose weight is greater than 2.0
  • 30. Example 7-12: Weight by CS
      • Result of selecting the attributes whose Chi-Square value is greater than 2.0
      [Figure: Chi-Square (CS) values]
  • 32. Wrapper Approach
      • Adds attributes to, or removes them from, the set used to build the model, and keeps the set of attributes that performs best
      • Use only attribute Free (columns ID, Free and Type of the data below)
        ID  Free  Won  Cash  Type
        1   Y     Y    Y     spam
        2   N     Y    Y     spam
        3   N     N    N     normal
        4   N     N    N     normal
        5   Y     N    N     spam
        6   Y     N    N     spam
        7   N     N    N     normal
        8   N     Y    N     spam
        9   N     N    N     normal
        10  N     N    N     normal
  • 33. Wrapper Approach
      • Use only attribute Won (columns ID, Won and Type of the spam data on the previous slide)
  • 34. Wrapper Approach
      • Use only attribute Cash (columns ID, Cash and Type of the spam data)
  • 35. Wrapper Approach
      • Use attributes Free and Won (columns ID, Free, Won and Type of the spam data)
  • 36. Wrapper Approach
      • Use attributes Free and Cash (columns ID, Free, Cash and Type of the spam data)
  • 37. Wrapper Approach
      • Use attributes Won and Cash (columns ID, Won, Cash and Type of the spam data)
  • 38. Wrapper Approach
      • Use attributes Free, Won and Cash (the full spam data)
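Slides 32 to 38 walk through every non-empty subset of {Free, Won, Cash}, which for n attributes means 2^n - 1 candidates. A minimal exhaustive wrapper can be sketched as follows. The scoring function is an assumption for illustration: instead of training a model, it looks up the cross-validated accuracies reported on the Forward Selection and Backward Elimination slides that follow. `all_subsets`, `exhaustive_wrapper` and `ACCURACY` are hypothetical names; ties are broken in favour of the earlier, smaller subset.

```python
from itertools import combinations

FEATURES = ("Free", "Won", "Cash")
# Cross-validated accuracies as reported on the Forward Selection / Backward Elimination slides
ACCURACY = {
    ("Free",): 0.8, ("Won",): 0.8, ("Cash",): 0.5,
    ("Free", "Won"): 0.6, ("Free", "Cash"): 0.8, ("Won", "Cash"): 0.8,
    ("Free", "Won", "Cash"): 0.6,
}


def all_subsets(features):
    """Yield every non-empty subset, smallest first (2**n - 1 candidates)."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


def exhaustive_wrapper(features, score):
    """Score every subset and keep the best; ties go to the earlier (smaller) subset."""
    best, best_score = None, float("-inf")
    for subset in all_subsets(features):
        s = score(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score


print(len(list(all_subsets(FEATURES))))  # -> 7
print(exhaustive_wrapper(FEATURES, ACCURACY.__getitem__))  # -> (('Free',), 0.8)
```

Exhaustive search is only practical for a handful of attributes, since the number of candidates doubles with every attribute added; the greedy and evolutionary methods below exist to avoid it.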
  • 39. Forward Selection
      • Add attributes one at a time and keep only the important ones
      • If adding an attribute improves performance, keep it
      • If adding an attribute makes performance worse, remove it again
  • 40. Forward Selection
      • Use only attribute Free (columns ID, Free and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 80%
  • 41. Forward Selection
      • Use only attribute Won (columns ID, Won and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 80%
  • 42. Forward Selection
      • Use only attribute Cash (columns ID, Cash and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 50%
  • 43. Forward Selection
      • Use attributes Free and Won (columns ID, Free, Won and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 60%
  • 44. Forward Selection
      • Use attributes Free and Won (columns ID, Free, Won and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 60%
      • Drop attribute Won, because adding it lowers accuracy
  • 45. Forward Selection
      • Use attributes Free and Cash (columns ID, Free, Cash and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 80%
  • 46. Forward Selection
      • Use attributes Free and Cash (columns ID, Free, Cash and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 80%
      • Drop attribute Cash, because adding it does not increase accuracy
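The greedy loop of slides 39 to 46 can be sketched as follows: in each round, try adding every unused attribute, keep the single best addition only if it strictly beats the current accuracy, and stop otherwise. This is an illustrative sketch, not RapidMiner's Forward Selection operator; the accuracies are assumed to be the cross-validated values from the slides, and `forward_selection` and `ACCURACY` are hypothetical names. Free and Won tie at 80% in the first round, and the sketch keeps the first one (Free), which is the branch the slides follow.

```python
FEATURES = ("Free", "Won", "Cash")
# Cross-validated accuracies as reported on the slides (keyed by attribute set)
ACCURACY = {
    frozenset({"Free"}): 0.8, frozenset({"Won"}): 0.8, frozenset({"Cash"}): 0.5,
    frozenset({"Free", "Won"}): 0.6, frozenset({"Free", "Cash"}): 0.8,
    frozenset({"Won", "Cash"}): 0.8, frozenset({"Free", "Won", "Cash"}): 0.6,
}


def score(subset):
    """Look up the accuracy of a list of attributes."""
    return ACCURACY[frozenset(subset)]


def forward_selection(features, score):
    """Greedily add the attribute that most improves the score; stop when none does."""
    selected, best_score = [], float("-inf")
    while True:
        best_candidate = None
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best_score:  # must strictly beat the current best
                best_candidate, best_score = f, s
        if best_candidate is None:
            return selected, best_score
        selected.append(best_candidate)


print(forward_selection(FEATURES, score))  # -> (['Free'], 0.8)
```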
  • 47. Example 7-13: Forward Selection
      • Related operators
        Operator           Description
        Read CSV           Reads a CSV file
        Forward Selection  Selects attributes with the Forward Selection method
        X-Validation       Splits the data for building and testing the model
  • 48. Example 7-13: Forward Selection
      • Related operators
        Operator                                Description
        Neural Net                              Builds a Neural Network model
        Apply Model                             Applies the model to predict new data
        Performance (Binominal Classification)  Shows the performance measures of a classification model
  • 49. Example 7-13: Forward Selection
      • Load the gold_training.csv data with the Read CSV operator
  • 50. Example 7-13: Forward Selection
      • Click the ‘Import Configuration Wizard…’ button
      • Set attribute Date as the id attribute
      • Set attribute GC Trend as the label attribute
  • 51. Example 7-13: Forward Selection
      • Double-click the Forward Selection operator and add the X-Validation operator from New Building Block to evaluate the model's performance
  • 52. Example 7-13: Forward Selection
      • Double-click the X-Validation operator to build a Neural Network model
  • 53. Example 7-13: Forward Selection
      • Weight of each attribute
      [Figure: weight of each attribute]
  • 54. Example 7-13: Forward Selection
      • Data after attribute selection: only 4 attributes remain
  • 55. Example 7-13: Forward Selection
      • Result of the performance evaluation with cross-validation
  • 57. Backward Elimination
      • Start with all attributes and remove them one at a time, keeping only the important ones
      • If removing an attribute improves performance, drop it
      • If removing an attribute makes performance worse, keep it
  • 58. Backward Elimination
      • Use attributes Free, Won and Cash (the full spam data)
      • Performance evaluated with cross-validation: accuracy = 60%
  • 59. Backward Elimination
      • Use attributes Won and Cash, with Free removed (columns ID, Won, Cash and Type of the spam data)
      • Performance evaluated with cross-validation: accuracy = 80%
      • Drop attribute Free, because removing it increases accuracy
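Backward Elimination is the mirror image of the forward loop: start from all attributes, try removing each one, and apply the single best removal only if it strictly improves the score. The sketch below is illustrative, not RapidMiner's Backward Elimination operator; the accuracies are assumed to be the cross-validated values from the slides, and `backward_elimination` and `ACCURACY` are hypothetical names. It reproduces slide 59 (Free is removed, 60% to 80%) and then stops, because removing Won or Cash no longer beats 80%.

```python
FEATURES = ("Free", "Won", "Cash")
# Cross-validated accuracies as reported on the slides (keyed by attribute set)
ACCURACY = {
    frozenset({"Free"}): 0.8, frozenset({"Won"}): 0.8, frozenset({"Cash"}): 0.5,
    frozenset({"Free", "Won"}): 0.6, frozenset({"Free", "Cash"}): 0.8,
    frozenset({"Won", "Cash"}): 0.8, frozenset({"Free", "Won", "Cash"}): 0.6,
}


def score(subset):
    """Look up the accuracy of a list of attributes."""
    return ACCURACY[frozenset(subset)]


def backward_elimination(features, score):
    """Greedily drop the attribute whose removal most improves the score; stop when none does."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        best_drop = None
        for f in selected:
            s = score([g for g in selected if g != f])
            if s > best_score:  # removal must strictly improve the score
                best_drop, best_score = f, s
        if best_drop is None:
            break
        selected.remove(best_drop)
    return selected, best_score


print(backward_elimination(FEATURES, score))  # -> (['Won', 'Cash'], 0.8)
```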
  • 60. Example 7-14: Backward Elimination
      • Related operators
        Operator              Description
        Read CSV              Reads a CSV file
        Backward Elimination  Selects attributes with the Backward Elimination method
        X-Validation          Splits the data for building and testing the model
  • 61. Example 7-14: Backward Elimination
      • Related operators
        Operator                                Description
        Neural Net                              Builds a Neural Network model
        Apply Model                             Applies the model to predict new data
        Performance (Binominal Classification)  Shows the performance measures of a classification model
  • 62. Example 7-14: Backward Elimination
      • Load the gold_training.csv data with the Read CSV operator
  • 63. Example 7-14: Backward Elimination
      • Click the ‘Import Configuration Wizard…’ button
      • Set attribute Date as the id attribute
      • Set attribute GC Trend as the label attribute
  • 64. Example 7-14: Backward Elimination
      • Double-click the Backward Elimination operator and add the X-Validation operator from New Building Block to evaluate the model's performance
  • 65. Example 7-14: Backward Elimination
      • Double-click the X-Validation operator to build a Neural Network model
  • 66. Example 7-14: Backward Elimination
      • Weight of each attribute
      [Figure: weight of each attribute]
  • 67. Example 7-14: Backward Elimination
      • Data after attribute selection: only 5 attributes remain
  • 68. Example 7-14: Backward Elimination
      • Result of the performance evaluation with cross-validation
  • 70. Evolutionary Selection
      • Forward Selection and Backward Elimination are greedy: they stop searching as soon as no single addition or removal of an attribute increases accuracy
      • Evolutionary Selection
          • Randomly selects sets of attributes and measures their performance
          • Keeps the attribute sets that perform well and randomly adds other attributes to them
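The evolutionary idea can be sketched as a small genetic algorithm. This is a generic sketch, not RapidMiner's Optimize Selection (Evolutionary) implementation: each candidate is a 0/1 mask over the attributes, parents are picked by two-way tournament, combined by uniform crossover, and each bit is flipped with probability 1/n; the best mask is always carried into the next generation (elitism), and empty masks are repaired by switching one attribute on. The fitness values are assumed to be the cross-validated accuracies from the spam slides, and population size, generation count and seed are arbitrary choices; all names are hypothetical.

```python
import random

FEATURES = ("Free", "Won", "Cash")
# Cross-validated accuracies as reported on the slides (keyed by attribute set)
ACCURACY = {
    frozenset({"Free"}): 0.8, frozenset({"Won"}): 0.8, frozenset({"Cash"}): 0.5,
    frozenset({"Free", "Won"}): 0.6, frozenset({"Free", "Cash"}): 0.8,
    frozenset({"Won", "Cash"}): 0.8, frozenset({"Free", "Won", "Cash"}): 0.6,
}


def decode(mask):
    """Turn a 0/1 mask into the list of selected attribute names."""
    return [f for f, bit in zip(FEATURES, mask) if bit]


def fitness(mask):
    return ACCURACY[frozenset(decode(mask))]


def repair(mask, rng):
    """Make sure at least one attribute is switched on."""
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask


def tournament(population, rng):
    """Pick two random masks and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolutionary_selection(pop_size=6, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(FEATURES)
    population = [repair([rng.randint(0, 1) for _ in range(n)], rng) for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population, rng), tournament(population, rng)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [1 - b if rng.random() < 1 / n else b for b in child]  # bit-flip mutation
            children.append(repair(child, rng))
        population = children
        leader = max(population, key=fitness)
        if fitness(leader) > fitness(best):
            best = leader[:]
    return decode(best), fitness(best)


print(evolutionary_selection())
```

Unlike the greedy loops, the random restarts, crossover and mutation let the search jump between distant subsets, at the cost of many more model evaluations and a result that depends on the random seed.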
  • 71. Example 7-15: Evolutionary Selection
      • Related operators
        Operator                           Description
        Read CSV                           Reads a CSV file
        Optimize Selection (Evolutionary)  Selects attributes with the Optimize Selection (Evolutionary) method
        X-Validation                       Splits the data for building and testing the model
  • 72. Example 7-15: Evolutionary Selection
      • Related operators
        Operator                                Description
        Neural Net                              Builds a Neural Network model
        Apply Model                             Applies the model to predict new data
        Performance (Binominal Classification)  Shows the performance measures of a classification model
  • 73. Example 7-15: Evolutionary Selection
    • Load the data file gold_training.csv with the Read CSV operator.
  • 74. Example 7-15: Evolutionary Selection
    • Click the 'Import Configuration Wizard…' button.
    • Set the role of the Date attribute to id.
    • Set the role of the GC Trend attribute to label.
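Outside RapidMiner, the same role assignment can be sketched with Python's `csv` module: the id column identifies each row, the label column is the prediction target, and every other column is a regular attribute. The helper name and its defaults are hypothetical; only `Date` and `GC Trend` come from the example, and the file's other columns are not listed here.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_with_roles(
    text: str, id_col: str = "Date", label_col: str = "GC Trend"
) -> Tuple[List[str], List[str], List[str], List[Dict[str, str]]]:
    """Split CSV text into ids, labels, regular attribute names and regular attribute rows."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    for col in (id_col, label_col):
        if col not in fieldnames:
            raise KeyError(f"column {col!r} not found in CSV header")
    regular = [c for c in fieldnames if c not in (id_col, label_col)]
    ids, labels, rows = [], [], []
    for record in reader:
        ids.append(record[id_col])        # id role: identifies the example, not used for learning
        labels.append(record[label_col])  # label role: the target the model predicts
        rows.append({c: record[c] for c in regular})  # regular role: candidate features
    return ids, labels, regular, rows
```

For the real file, the text would come from `open("gold_training.csv", newline="").read()`; values stay strings, so numeric attributes still need converting with `float`.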
  • 75. Example 7-15: Evolutionary Selection
    • Double-click the Optimize Selection (Evolutionary) operator, and inside it use the X-Validation operator from New Building Block to test the model's performance.
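Placing X-Validation inside Optimize Selection (Evolutionary) means every candidate attribute subset is scored by cross-validation. A minimal sketch of that inner score, under loud assumptions: a nearest-centroid classifier stands in for the Neural Net learner to keep it short, folds are assigned by simple interleaving (a simplification of X-Validation's sampling options), and all names are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Sequence[float]


def centroid_fit(X: List[Row], y: List[str]) -> Dict[str, List[float]]:
    """Stand-in learner (NOT a neural network): the mean vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def centroid_predict(model: Dict[str, List[float]], row: Row) -> str:
    # Predict the class whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda label: sum((c - v) ** 2 for c, v in zip(model[label], row)))


def cv_accuracy(X: List[Row], y: List[str], mask: Sequence[int], k: int = 3) -> float:
    """Score an attribute mask by k-fold cross-validated accuracy on the kept columns."""
    cols = [i for i, bit in enumerate(mask) if bit]
    Xm = [[row[i] for i in cols] for row in X]
    correct = 0
    for f in range(k):
        test_idx = list(range(f, len(Xm), k))  # fold f: examples f, f+k, f+2k, ...
        test_set = set(test_idx)
        train_idx = [i for i in range(len(Xm)) if i not in test_set]
        model = centroid_fit([Xm[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(centroid_predict(model, Xm[i]) == y[i] for i in test_idx)
    return correct / len(Xm)  # every example is tested exactly once
```

Passing `lambda m: cv_accuracy(X, y, m)` as the fitness of an evolutionary search mirrors the nesting of the two operators: the outer loop proposes masks, the inner loop trains and tests on each.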
  • 76. Example 7-15: Evolutionary Selection
    • Double-click the X-Validation operator to build the Neural Network model.
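The Neural Net operator trains a feed-forward network with backpropagation. The smallest case is a single sigmoid neuron trained by batch gradient descent on log-loss, sketched below purely as an illustration: it has no hidden layer, and the learning rate and epoch count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(
    X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Fit the weights and bias of one sigmoid unit by batch gradient descent on log-loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for row, target in zip(X, y):
            # For sigmoid + log-loss, dLoss/dz is simply (output - target).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for i, xi in enumerate(row):
                grad_w[i] += err * xi
            grad_b += err
        # Average the gradient over the batch and step against it.
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w: Sequence[float], b: float, row: Sequence[float]) -> int:
    # Class 1 when the neuron's output is at least 0.5.
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5)
```

A real multilayer network adds hidden layers and propagates the same error term backwards through them.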
  • 77. Example 7-15: Evolutionary Selection
    • The weight of each attribute.
  • 78. Example 7-15: Evolutionary Selection
    • The data after attribute selection: only 5 attributes remain.
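The attribute weights record which attributes the search kept; for a selection (as opposed to weighting) search they are typically 1 for kept and 0 for dropped attributes. Reducing the data to the kept attributes can be sketched as follows; the helper, the threshold of 0 and the attribute names in any call are hypothetical.

```python
from typing import Dict, List


def select_by_weights(
    rows: List[Dict[str, float]], weights: Dict[str, float], threshold: float = 0.0
) -> List[Dict[str, float]]:
    """Keep only the attributes whose weight is above the threshold (hypothetical helper)."""
    kept = [name for name, w in weights.items() if w > threshold]
    return [{name: row[name] for name in kept} for row in rows]
```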
  • 79. Example 7-15: Evolutionary Selection
    • Performance results from cross-validation.
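Cross-validation produces one estimate per fold, and the result is usually summarized as their mean and spread, which is how RapidMiner's performance view presents accuracy ("mean +/- deviation"). A minimal sketch with the standard library; using the sample standard deviation here is an assumption, not a statement of how RapidMiner computes it.

```python
import statistics
from typing import Sequence


def summarize_folds(fold_accuracies: Sequence[float]) -> str:
    """Format per-fold accuracies as 'mean% +/- std%' using the sample standard deviation."""
    if len(fold_accuracies) < 2:
        raise ValueError("need at least two folds to estimate a deviation")
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies)
    return f"accuracy: {mean:.2%} +/- {std:.2%}"
```

A large deviation relative to the mean suggests the selected attribute subset performs unevenly across folds.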