15. Ensembling: Random Forests
• Bagging = average of many simple
algorithms
• Simple algorithm = one decision tree
• Bagging + decision trees = Random Forests
Breiman, 2001
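The idea of averaging many simple trees can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Breiman's full algorithm: each "tree" is a one-split stump, the toy data and the names fit_stump, fit_bagged_stumps and predict are hypothetical, and the random choice of features at each split that a real Random Forest adds is left out.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split decision tree (a stump) that minimises training errors."""
    best = None
    for j in range(len(X[0])):                      # try every feature
        for t in sorted({row[j] for row in X}):     # try every observed threshold
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda row: left_label if row[j] <= t else right_label

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict(trees, row):
    """Combine the trees by majority vote (the classification form of averaging)."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the first feature separates the two classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 5.0], [7.0, 6.0], [8.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_bagged_stumps(X, y)
low_prediction = predict(trees, [0.5, 5.0])
high_prediction = predict(trees, [9.0, 5.0])
```

A single stump trained on one unlucky resample can be wrong everywhere, but the vote over many resamples is much more stable, which is the point of the ensemble.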
17. • Organized by Panjia (www.panjiaco.com)
• Problem: predict the strength of social ties
• Prize pool: $75,000
• Training set size: 50 000
• Test set size: 40 000
Description of problem
18. • Number of features:
more than 500!
• Feature examples:
1) Number of friends (node feature)
2) Number of common friends (edge feature)
3) Number of common albums / Number of all albums (combined feature)
Feature engineering
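The three feature types listed above can be sketched on a toy social graph. Everything here is hypothetical: the users, the friends and albums dictionaries, and the function names are made up for illustration, and "number of all albums" is assumed to mean the albums belonging to either of the two users.

```python
# Hypothetical toy data: who is friends with whom, and who appears in which albums.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"ann", "cat"},
}
albums = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a2", "a3", "a4"},
    "cat": {"a5"},
    "dan": set(),
}

def n_friends(user):
    """Node feature: depends on a single user."""
    return len(friends[user])

def n_common_friends(u, v):
    """Edge feature: depends on the pair of users."""
    return len(friends[u] & friends[v])

def common_album_ratio(u, v):
    """Combined feature: common albums divided by all albums of the pair.

    Assumption: 'all albums' is the union of both users' albums.
    """
    all_albums = albums[u] | albums[v]
    return len(albums[u] & albums[v]) / len(all_albums) if all_albums else 0.0

# Feature vector for the candidate tie ann-bob.
features_ann_bob = [
    n_friends("ann"),
    n_common_friends("ann", "bob"),
    common_album_ratio("ann", "bob"),
]
```

Dividing by the total keeps the combined feature comparable between users with few albums and users with many.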
28. • The algorithm works perfectly on the training set
• But the algorithm does not work on the test set!
Overfitting
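A small sketch can make this gap concrete. It assumes synthetic one-dimensional data with 20% of the labels flipped at random, and a deliberately naive model that memorises the training set (a nearest-neighbour lookup). The names make_data, fit_memorizer and accuracy are hypothetical helpers, not part of the competition.

```python
import random

def make_data(n, offset, seed):
    """Points on [0, 1); the true label is x > 0.5, but 20% of labels are flipped."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        x = (i + offset) / n
        label = int(x > 0.5)
        if rng.random() < 0.2:      # label noise
            label = 1 - label
        rows.append((x, label))
    return rows

def fit_memorizer(train):
    """A model that memorises the training set: answer with the nearest point's label."""
    def predict(x):
        nearest_x, nearest_label = min(train, key=lambda row: abs(row[0] - x))
        return nearest_label
    return predict

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

train = make_data(200, 0.0, seed=1)
test = make_data(200, 0.5, seed=2)    # new points that lie between the training points

model = fit_memorizer(train)
train_accuracy = accuracy(model, train)   # every training point is its own nearest neighbour
test_accuracy = accuracy(model, test)     # the memorised noise does not carry over
```

The memoriser reproduces every training label, including the flipped ones, so its training accuracy is perfect by construction. On new points it will typically score noticeably lower, because the noise it memorised says nothing about unseen data.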
29. • Target is unknown for the Test set
• Split the training set into two parts:
• 1st part: new training set
• 2nd part: new test set (with known target)
Cross-validation
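The split described above (often called a holdout split) can be sketched as follows. The function name holdout_split, the 25% fraction and the toy rows are assumptions chosen for illustration only.

```python
import random

def holdout_split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle the labelled rows and set a fraction aside as a new test set."""
    shuffled = list(rows)                       # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    new_test = shuffled[:n_holdout]             # target is known, so we can score on it
    new_train = shuffled[n_holdout:]            # the model is fitted only on this part
    return new_train, new_test

# Hypothetical labelled rows: (feature, target).
labelled = [(i, i % 2) for i in range(20)]
new_train, new_test = holdout_split(labelled)
```

Shuffling before cutting matters when the original file is ordered, for example by date or by user. Repeating the split with different parts held out and averaging the scores gives a steadier estimate than a single split.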
30. If you are interested in this topic…
• Read papers and books about Machine
Learning
• Communicate with people (Kaggle, LinkedIn)
• Participate in competitions
• Study Mathematics
What’s next?