Unbalanced data: Same algorithms different techniques by Eric Martín at Big Data Spain 2017
9. RANDOM FOREST
      F1    F2   F3      …   FN      Y
 1    1.2   25   True    …   0.185   1
 2    3.4   55   False   …   0.211   1
 3    2.2   58   True    …   0.171   0
 4    4.0   34   True    …   0.132   1
 5    1.1   63   True    …   0.652   0
 6    0.7   61   False   …   0.153   0
 7    3.3   12   False   …   0.477   1
 8    3.1   23   True    …   0.311   1
 9    1.2   29   False   …   0.171   1
10    3.4   45   True    …   0.132   0
11    2.1   55   True    …   0.652   1
12    1.7   19   False   …   0.189   0
13    3.3   12   False   …   0.477   1
14    3.1   23   True    …   0.311   1
15    1.2   29   False   …   0.171   1
16    2.2   58   True    …   0.171   0
11. EM FOREST
      F1    F2   F3      …   FN      Y
 1    1.2   25   True    …   0.185   1
 2    3.4   55   False   …   0.211   1
 3    2.2   58   True    …   0.171   0
 4    4.0   34   True    …   0.132   1
 5    1.1   63   True    …   0.652   0
 6    0.7   61   False   …   0.153   0
 7    3.3   12   False   …   0.477   1
 8    3.1   23   True    …   0.311   1
 9    1.2   29   False   …   0.171   1
10    3.4   45   True    …   0.132   0
11    2.1   55   True    …   0.652   1
12    1.7   19   False   …   0.189   0
13    3.3   12   False   …   0.477   1
14    3.1   23   True    …   0.311   1
15    1.2   29   False   …   0.171   1
16    2.2   58   True    …   0.171   0
12. EM FOREST: Transforming the problem
      F1    F2   F3      …   FN      Y
 1    1.2   25   True    …   0.185   1
 2    3.4   55   False   …   0.211   1
 3    2.2   58   True    …   0.171   0
 4    4.0   34   True    …   0.132   1
 5    1.1   63   True    …   0.652   0
 6    0.7   61   False   …   0.153   0
 7    3.3   12   False   …   0.477   1
 8    3.1   23   True    …   0.311   1
 9    1.2   29   False   …   0.171   1
10    3.4   45   True    …   0.132   0
11    2.1   55   True    …   0.652   1
12    1.7   19   False   …   0.189   0
13    3.3   12   False   …   0.477   1
14    3.1   23   True    …   0.311   1
15    1.2   29   False   …   0.171   1
16    2.2   58   True    …   0.171   0

        Tree1   Tree2   Tree3   Y
 1      1       1       0       1
 2-18   (rows still empty on this slide)
0 1 0 1
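The transformation on this slide can be sketched as follows: every row of the original table is passed through each tree, and the trees' 0/1 votes become the columns of a new table next to the unchanged label Y. This is a minimal sketch, not the speaker's implementation. The three `tree*` functions are hypothetical stand-ins for fitted trees (simple threshold rules invented for illustration), so the votes they produce will not match the values on the slides. Only F1, F2 and F3 are used because the slide elides the remaining features.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, int, bool]   # (F1, F2, F3); the other features are elided on the slide
Tree = Callable[[Row], int]     # anything that maps a row to a 0/1 vote

# Hypothetical stand-ins for three fitted trees (NOT the trees from the talk).
def tree1(row: Row) -> int:
    return int(row[0] > 2.0)    # votes 1 when F1 > 2.0

def tree2(row: Row) -> int:
    return int(row[1] < 40)     # votes 1 when F2 < 40

def tree3(row: Row) -> int:
    return int(row[2])          # votes 1 when F3 is True

def to_vote_table(rows: Sequence[Row], labels: Sequence[int],
                  trees: Sequence[Tree]) -> List[Tuple[int, ...]]:
    """Build the new table: one column per tree vote, followed by the label Y."""
    return [tuple(t(r) for t in trees) + (y,) for r, y in zip(rows, labels)]

# First three rows of the slide's table.
rows = [(1.2, 25, True), (3.4, 55, False), (2.2, 58, True)]
labels = [1, 1, 0]

votes = to_vote_table(rows, labels, [tree1, tree2, tree3])
for index, vote_row in enumerate(votes, start=1):
    print(index, *vote_row)
```

With a real forest, the stand-in functions would be replaced by the individual fitted trees; the table-building step stays the same.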
13. EM FOREST: The new problem
      Tree1   Tree2   Tree3   Y
 1    1       1       0       1
 2    1       0       1       1
 3    1       1       1       0
 4    0       1       0       1
 5    0       0       0       0
 6    1       0       1       0
 7    0       1       0       1
 8    0       1       0       1
 9    1       0       1       1
10    1       1       0       0
11    0       1       0       1
12    0       0       1       0
13    1       0       1       1
14    1       1       0       1
15    1       1       0       1
16    0       0       1       0
17    0       1       0       1
18    1       0       0       0
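To make the new problem concrete, the sketch below treats the table above as a training set in which the tree votes are the features and Y is the target. The slides do not say which model is trained on this table, so the second-stage learner here is an assumption: a lookup that predicts the most common Y for each vote pattern. It is compared with the plain majority vote of the three trees. Both scores are computed in-sample on the 18 toy rows, so they illustrate the idea and are not evidence of how either rule generalizes.

```python
from collections import Counter, defaultdict

# (Tree1, Tree2, Tree3, Y) for the 18 rows of the table above.
VOTES = [
    (1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0), (0, 1, 0, 1), (0, 0, 0, 0), (1, 0, 1, 0),
    (0, 1, 0, 1), (0, 1, 0, 1), (1, 0, 1, 1), (1, 1, 0, 0), (0, 1, 0, 1), (0, 0, 1, 0),
    (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 0, 1), (0, 0, 1, 0), (0, 1, 0, 1), (1, 0, 0, 0),
]

def majority_vote(votes):
    """Forest-style rule: predict 1 when more than half of the trees vote 1."""
    return int(2 * sum(votes) > len(votes))

def fit_pattern_table(table):
    """Assumed second-stage learner: most common Y seen for each vote pattern."""
    counts = defaultdict(Counter)
    for *votes, y in table:
        counts[tuple(votes)][y] += 1
    return {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}

def accuracy(predict, table):
    """Share of rows whose predicted label equals Y."""
    hits = sum(predict(tuple(votes)) == y for *votes, y in table)
    return hits / len(table)

pattern_table = fit_pattern_table(VOTES)
acc_majority = accuracy(majority_vote, VOTES)
acc_pattern = accuracy(lambda v: pattern_table[v], VOTES)

print(f"majority vote: {acc_majority:.2f}, pattern table: {acc_pattern:.2f}")
# prints "majority vote: 0.56, pattern table: 0.89" (in-sample, toy data)
```

On this table the pattern (0, 1, 0) always comes with Y = 1, which a majority vote predicts as 0; a learner that sees the individual votes can pick that up.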
14. EM FOREST: The new possibilities
▪ Vector vs. Aggregated

Vector:
      Tree1   Tree2   Tree3   Y
 1    1       1       0       1
 2    1       0       1       1
 3    1       1       1       0
 4    0       1       0       1
 5    0       0       0       0
 6    1       0       1       0
 7    0       1       0       1
 8    0       1       0       1

Aggregated:
      Agg   Y
 1    2     1
 2    2     1
 3    3     0
 4    0     1
 5    1     0
 6    2     0
 7    1     1
 8    1     1
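The two representations can be sketched side by side. The slides do not define how Agg is computed; the sketch assumes it is the number of trees that vote 1, which is a guess. Under that assumption rows 4 and 5 come out as 1 and 0, the reverse of the Agg column on the slide, while the other six rows agree.

```python
# (Tree1, Tree2, Tree3) votes for the eight rows shown on this slide.
VOTE_ROWS = [
    (1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0),
    (0, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 0),
]

def as_vector(votes):
    """Vector representation: keep every tree's vote as its own feature."""
    return list(votes)

def as_aggregate(votes):
    """Aggregated representation (assumed): count of trees voting 1."""
    return sum(votes)

vector_features = [as_vector(v) for v in VOTE_ROWS]
agg_features = [as_aggregate(v) for v in VOTE_ROWS]

for index, (vec, agg) in enumerate(zip(vector_features, agg_features), start=1):
    print(index, vec, agg)
```

The aggregate is a single column but cannot tell which trees voted 1: rows 1 and 2 differ as vectors and share the same count of 2. The vector keeps that distinction at the cost of one feature per tree.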
15. EM FOREST: The new results
▪ Result improvement: a better score than Random Forest, or at least the same
▪ Result flexibility: better on both balanced and unbalanced data (trading and illness detection)
17. EM FOREST: Use cases
▪ Real projects:
    Credit card usage trends
▪ Demo projects:
    Bank fraud
    Alcohol in students dataset