Bagus Sartono, Lecturer at the Department of Statistics, Institut Pertanian Bogor (IPB) University,
New Trends in Research Methodology & Analytics Technology Update, Nov 28, 2012, Jakarta, Indonesia
2. KDD Cup 2010: Overview
• The Challenge
– How generally or narrowly do students learn? How quickly or
slowly? Will the rate of improvement vary between students?
What does it mean for one problem to be similar to another?
– Is it possible to infer the knowledge requirements of problems
directly from student performance data, without human analysis
of the tasks?
– This year's challenge asks you to predict student performance
on mathematical problems from logs of student interaction with
Intelligent Tutoring Systems.
3. KDD Cup 2010: Results
• Winners of KDD Cup 2010: All Teams
– First Place: National Taiwan University
Feature engineering and classifier ensembling for KDD CUP
2010
– First Runner Up: Zhang and Su
Gradient Boosting Machines with Singular Value Decomposition
– Second Runner Up: BigChaos @ KDD
Collaborative Filtering Applied to Educational Data Mining
4. Outline
• What is Ensemble Learning?
• Why Ensemble?
• How good is Ensemble?
• What next?
10. What is Ensemble?
Data Set
→ Training Set #1, Training Set #2, ……, Training Set #k
→ Learning (one per training set)
→ Model #1, Model #2, ……, Model #k
→ Combiner
→ Ensemble Prediction
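The flow above (data set, k training sets, k models, combiner, ensemble prediction) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy nearest-class-mean learner, the way the training sets are formed (each one leaves out a different slice of the data), and the majority-vote combiner are all assumptions, not something the slide specifies.

```python
from collections import Counter

def make_training_sets(data, k):
    """Build k training sets; set i leaves out every k-th row starting at i.
    (Assumed choice for illustration; resampling is another common option.)"""
    return [[row for j, row in enumerate(data) if j % k != i] for i in range(k)]

def learn(training_set):
    """Toy learner (hypothetical): predict the class whose mean is closest to x."""
    groups = {}
    for x, label in training_set:
        groups.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def combine(predictions):
    """Combiner: majority vote over the individual model predictions."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(models, x):
    """Ask every model for a prediction, then combine them into one answer."""
    return combine([model(x) for model in models])

# Made-up data: (feature value, class label)
data = [(0.5, "low"), (1.0, "low"), (1.5, "low"), (2.0, "low"),
        (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.5, "high")]

models = [learn(ts) for ts in make_training_sets(data, k=3)]
print(ensemble_predict(models, 1.2), ensemble_predict(models, 6.8))
```

An odd k is used here so that a two-class majority vote cannot tie.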
11. Types of Ensemble
• Hybrid Ensemble
– Combining several different learning algorithms into
one prediction
– e.g., combining the results of regression, trees, neural
nets, and support vector machines
• Non-Hybrid Ensemble
– Combining several learning models from the same
algorithm into one prediction
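The difference between the two types can be illustrated with a small sketch. The three toy learners below (nearest class mean, 1-nearest-neighbour, and a one-cut decision stump) are hypothetical stand-ins for the algorithms named on the slide, the data are made up, and the code assumes exactly two classes and a single numeric feature.

```python
from collections import Counter

def fit_nearest_mean(train):
    """Predict the class whose training mean is closest to x."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def fit_one_nn(train):
    """Predict the label of the single closest training point."""
    return lambda x: min(train, key=lambda row: abs(x - row[0]))[1]

def fit_stump(train):
    """Pick the cut point and orientation with the fewest training errors."""
    labels = sorted({label for _, label in train})  # assumes two classes
    best = None
    for candidate, _ in train:
        for below, above in (labels, labels[::-1]):
            errors = sum((below if x <= candidate else above) != label
                         for x, label in train)
            if best is None or errors < best[0]:
                best = (errors, candidate, below, above)
    _, cut, low_label, high_label = best
    return lambda x: low_label if x <= cut else high_label

def majority_vote(models, x):
    return Counter(model(x) for model in models).most_common(1)[0][0]

train = [(1.0, "fail"), (2.0, "fail"), (3.0, "fail"),
         (6.0, "pass"), (7.0, "pass"), (8.0, "pass")]

# Hybrid: different algorithms, same training data.
hybrid = [fit_nearest_mean(train), fit_one_nn(train), fit_stump(train)]

# Non-hybrid: one algorithm, different training subsets.
non_hybrid = [fit_one_nn(subset) for subset in (train[:-1], train[1:], train[::2])]

print(majority_vote(hybrid, 2.5), majority_vote(non_hybrid, 6.5))
```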
12. Well-Known Ensembles
• Bagging
– Fit a learning model to each bootstrap sample
– Aggregate the predictions via averaging or majority vote
• Boosting (AdaBoost)
– Fit learning models sequentially, giving higher weight to
‘difficult’ cases
– Combine the predictions as a weighted vote
• Random Forest
– Similar to bagging, except that each learning model is
built with random feature selection
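As an illustration of the boosting idea above, here is a minimal AdaBoost sketch using one-cut decision stumps on a single feature. The base learner, the function names, and the toy data are assumptions made for the example; labels are coded as +1 and -1, and the update is the standard AdaBoost reweighting rather than anything specific to this talk.

```python
import math

def fit_weighted_stump(xs, ys, weights):
    """Return (weighted_error, cut, sign) for the best stump.
    The stump predicts `sign` when x <= cut and `-sign` otherwise."""
    best = None
    for cut in xs:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x <= cut else -sign) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best

def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps in sequence; return a list of (alpha, cut, sign)."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with equal case weights
    ensemble = []
    for _ in range(rounds):
        err, cut, sign = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # model weight
        ensemble.append((alpha, cut, sign))
        # Misclassified ("difficult") cases gain weight, the rest lose weight.
        weights = [w * math.exp(-alpha * y * (sign if x <= cut else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (sign if x <= cut else -sign)
                for alpha, cut, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single stump always misclassifies some cases, while the weighted vote of three stumps reproduces every training label.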
18. Bagus Sartono
Educational Background
• Bachelor of Science in Stats – IPB (2000)
• Master of Science in Stats – IPB (2004)
• PhD in Applied Economics – University of Antwerp (2012)
Professional Experience
• Lecturer – Dept of Stats IPB
• Experienced Trainer in Analytics (Bank Indonesia, Bank Mandiri, Ganesha Cipta Informatika, CIFOR, LIPI, LPEM-UI, etc)