# Machine Learning in 5 Minutes— Classification

3,276 views

Slides from a lightning talk on classification methods, originally given at Open Source Open Mic Chicago, 01/2016. Yes, I know I left things out. You try covering this in 5 minutes.

Published in: Technology

### Machine Learning in 5 Minutes— Classification

1. Machine Learning in 5 Minutes, classification edition (Brian Lange)
2. hi, i'm a data scientist
3. classification algorithms
4. popular examples: spam filters, the Sorting Hat
5. things to know:
   - you need data labeled with the correct answers to "train" these algorithms before they work
   - feature = dimension = attribute of the data
   - class = category = Harry Potter house
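That vocabulary can be sketched in plain Python. The feature names ("bravery", "ambition") and the scores are made up purely for illustration:

```python
# Labeled training data, as described above. Each example is a feature
# vector (feature = dimension = attribute) plus the correct answer
# (class = category = Harry Potter house). The two features here,
# "bravery" and "ambition", are hypothetical scores for illustration.
X = [
    [9, 2],  # bravery, ambition
    [8, 3],
    [2, 9],
    [3, 8],
]
y = ["Gryffindor", "Gryffindor", "Slytherin", "Slytherin"]

# A classifier is "trained" on these (features, class) pairs, then asked
# to predict the class of a new, unlabeled feature vector:
new_student = [7, 4]
```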
6. linear discriminants: "draw a line through it"
7. linear discriminants: "draw a line through it"
8. linear discriminants: "draw a line through it"
9. linear discriminants: "draw a line through it" 🎉
10. define what "shitty" means: 6 wrong
11. define what "shitty" means: 4 wrong
12. a map of shittiness to find the least shitty line (axes: slope, intercept; height: shittiness)
13. linear discriminants: probably don't use these
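The "map of shittiness" idea can be sketched in plain Python: pick a family of lines y = m*x + b, define shittiness as the number of misclassified points, and brute-force a small grid of (slope, intercept) values. The data and the grid below are made up for illustration:

```python
# Toy 2-D points labeled "A" (should fall below the line) or "B" (above).
points = [(1, 1, "A"), (2, 1, "A"), (1, 2, "A"),
          (4, 5, "B"), (5, 4, "B"), (5, 5, "B")]

def wrong_count(m, b):
    """Shittiness of the line y = m*x + b: how many points are misclassified."""
    wrong = 0
    for x, y, label in points:
        above = y > m * x + b
        if above != (label == "B"):
            wrong += 1
    return wrong

# Brute-force the "map": evaluate shittiness over a grid of (slope, intercept)
# and keep the least shitty line.
best = min((wrong_count(m, b), m, b)
           for m in [-1, 0, 1]
           for b in [0, 1, 2, 3, 4])
```

Real linear classifiers replace the brute-force grid with an optimizer, but the picture is the same: a surface of shittiness over the line's parameters, and you go looking for the bottom.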
14. logistic regression: "divide it with a log function"
15. logistic regression: "divide it with a log function"
    - 🎉 pros: gives you probabilities; the model is a formula; can "threshold" to make the model more or less conservative
    - 💩 cons: only works with linear decision boundaries
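A sketch of those pros with scikit-learn (my choice of library; the talk doesn't name one), on toy two-class data:

```python
# Logistic regression: gives probabilities, the fitted model is a formula
# (coefficients + intercept), and you can move the decision threshold to
# make the classifier more or less conservative. Toy data for illustration.
from sklearn.linear_model import LogisticRegression

X = [[1, 1], [2, 1], [1, 2], [4, 5], [5, 4], [5, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression().fit(X, y)

p = clf.predict_proba([[4.5, 4.5]])[0][1]  # P(class 1), not just a label

# "thresholding": the default cut is 0.5; demanding 0.9 makes the model
# more conservative about calling something class 1
conservative = int(p > 0.9)
```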
16. SVMs (support vector machines): "*advanced* draw a line through it"
    - better definition of "shitty"
    - lines can turn into non-linear shapes if you transform your data
17. 💩
18. 💩
19. "the kernel trick"
20. 🎉 woooooooooooo 🎉🎉
21. SVMs (support vector machines): "*advanced* draw a line through it"
22. SVMs (support vector machines): "*advanced* draw a line through it"
    - 🎉 pros: works well on a lot of different shapes of data thanks to the kernel trick
    - 💩 cons: not super easy to explain to people; can only kinda do probabilities
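To make the kernel-trick point concrete, here is a sketch with scikit-learn (library choice is mine): two concentric rings of points that no straight line can separate, handled fine by an RBF-kernel SVM:

```python
# Kernel trick demo: a linear SVM is stuck drawing a straight line, while
# the RBF kernel implicitly transforms the data so the rings separate.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)  # the kernel trick

linear_acc = linear_svm.score(X, y)  # poor: a line can't cut a ring out
rbf_acc = rbf_svm.score(X, y)        # near-perfect on this shape
```

On the "can only kinda do probabilities" point: `SVC(probability=True)` bolts probability estimates on via an extra calibration step (Platt scaling) rather than producing them natively the way logistic regression does.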
23. KNN (k-nearest neighbors): "what do similar cases look like?"
24. KNN (k-nearest neighbors): "what do similar cases look like?" (k=1)
25. KNN (k-nearest neighbors): "what do similar cases look like?" (k=2)
26. KNN (k-nearest neighbors): "what do similar cases look like?" (k=1)
27. KNN (k-nearest neighbors): "what do similar cases look like?" (k=2)
28. KNN (k-nearest neighbors): "what do similar cases look like?" (k=3)
29. KNN (k-nearest neighbors): "what do similar cases look like?"
30. KNN (k-nearest neighbors): "what do similar cases look like?"
    - 🎉 pros: no training, so adding new data is easy; you get to define "distance"
    - 💩 cons: can be outlier-sensitive; you have to define "distance"
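The whole algorithm fits in a few lines of plain Python, which also makes that pro/con joke visible: there is no training step, and "distance" is a function you get to (and have to) choose. Data is made up for illustration:

```python
import math
from collections import Counter

# labeled examples: (feature vector, class)
train = [([1, 1], "A"), ([2, 1], "A"), ([1, 2], "A"),
         ([4, 5], "B"), ([5, 4], "B"), ([5, 5], "B")]

def euclidean(p, q):
    return math.dist(p, q)  # "distance" is yours to define (and to get wrong)

def knn_predict(point, k=3, distance=euclidean):
    # no training: just sort the stored examples by distance and
    # let the k nearest neighbors vote
    nearest = sorted(train, key=lambda ex: distance(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Adding new data really is just appending to `train`; the flip side is that every prediction scans the data, and one mislabeled outlier sitting next to your query point can swing the vote at small k.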
31. decision tree learners: "make a flow chart of it"
32. decision tree learners: "make a flow chart of it" (x < 3?)
33. decision tree learners: "make a flow chart of it" (x < 3? → y < 4?)
34. decision tree learners: "make a flow chart of it" (x < 3? → y < 4? → x < 5?)
35. decision tree learners: "make a flow chart of it"
    - 🎉 pros: fit all kinds of arbitrary shapes; output is a clear set of conditionals
    - 💩 cons: extremely prone to overfitting; have to rebuild when you get new data; no probability estimates
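A sketch with scikit-learn (again my library choice): the fitted tree really is the flow chart of threshold questions from the slides, and `export_text` prints that set of conditionals. Toy data:

```python
# A decision tree is a learned flow chart of "feature <= threshold?" splits.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1], [2, 1], [1, 2], [4, 5], [5, 4], [5, 5]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# the learned model is a human-readable set of conditionals
rules = export_text(tree, feature_names=["x", "y"])
print(rules)
```

The same readability is what makes overfitting so easy: left unconstrained, the tree will keep adding conditionals until it has memorized the training set, which is why depth limits and pruning parameters exist.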
36. ensemble models: make a bunch of models and combine them
37. ensemble models: make a bunch of models and combine them
    - 🎉 pros: don't overfit as much as their component parts; generally don't require much parameter tweaking; if data doesn't change very often, you can make them semi-online by just adding new trees; can provide probabilities
    - 💩 cons: slower than their component parts (though if those are fast, it doesn't matter)
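A random forest is the canonical example of this: a bunch of decision trees whose votes are combined, with class probabilities read off as vote fractions. Sketch with scikit-learn, toy data:

```python
# Ensemble of decision trees: each tree sees a resampled view of the data,
# and the forest's prediction is the majority vote across trees.
from sklearn.ensemble import RandomForestClassifier

X = [[1, 1], [2, 1], [1, 2], [4, 5], [5, 4], [5, 5]]
y = [0, 0, 0, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

pred = forest.predict([[5, 5]])        # combined vote
proba = forest.predict_proba([[5, 5]]) # fraction of trees voting each class
```

This recovers two of the pros above: the individual trees still overfit, but their errors partially cancel in the vote, and the vote fractions give you probability-ish estimates that a single tree lacks.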