
Introduction to Machine Learning @ Mooncascade ML Camp

A practical introduction to Machine Learning with Python and scikit-learn.


  1. Machine Learning: ESSENTIAL CONCEPTS. By Ilya Kuzovkin (ilya.kuzovkin@gmail.com), Mooncascade ML Camp 2016.
  2. ONE MACHINE LEARNING USE CASE
  3–5. Can we ask a computer to create those patterns automatically? Yes. How?
  6–13. Instance = raw data + class (label). A data sample: the digit “7”. How to represent it in a machine-readable form? Feature extraction: the 28 px × 28 px image has 784 pixels in total, which are flattened into a feature vector (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0). A dataset is a collection of such labelled feature vectors:
     (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0) → “7”
     (0, 0, 0, …, 13, 48, 102, 0, 46, 255, … 0, 0, 0) → “2”
     (0, 0, 0, …, 17, 34, 12, 43, 122, 70, … 0, 7, 0) → “8”
     (0, 0, 0, …, 98, 21, 255, 255, 231, 140, … 0, 0, 0) → “2”
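A minimal sketch of this step in Python with scikit-learn. The slides use 28×28 MNIST images (784 pixels); as an assumption for a runnable example, scikit-learn's bundled 8×8 digits dataset is used here instead, so each feature vector has 64 pixel values.

```python
# Turn raw digit images into a dataset of labelled feature vectors.
from sklearn.datasets import load_digits

digits = load_digits()

# Flatten each 8x8 image into a feature vector of 64 pixel intensities
# (with 28x28 MNIST this would be a 784-dimensional vector instead).
X = digits.images.reshape(len(digits.images), -1)  # shape: (1797, 64)
y = digits.target                                  # class labels 0..9

print(X[0])  # the feature vector of the first sample
print(y[0])  # its class (label)
```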
  14. The data is in the right format — what’s next?
  15–16. Pick an algorithm: • C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector Quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning Automata • AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning Automata • Learning Vector Quantization • Logistic Model Tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo Method • SARSA • Instance-based learning • Nearest Neighbor Algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random Forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting
  17–25. DECISION TREE. Take two groups of feature vectors, e.g. (0, …, 28, 65, …, 207, 101, 0, 0), (0, …, 19, 34, …, 254, 54, 0, 0), … vs. (0, …, 28, 64, …, 102, 101, 0, 0), (0, …, 19, 23, …, 105, 54, 0, 0), … [diagram: a tree is grown by picking pixels and thresholds that separate the groups; the root node splits on PIXEL #417 (>200 vs. <200), and a child node splits on PIXEL #123 (>100 vs. <100)]
  26–27. DECISION TREE [diagram: the resulting tree]
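A minimal sketch of fitting such a tree with scikit-learn's DecisionTreeClassifier (the slides show the idea rather than code; the 8×8 digits dataset stands in for MNIST again):

```python
# Train a decision tree on the pixel features. Internally it learns
# rules of exactly the kind sketched above: "if pixel #k > threshold,
# go down one branch, otherwise the other".
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

tree = DecisionTreeClassifier()
tree.fit(X, y)

print(tree.predict(X[:5]))  # predicted classes for the first five samples
```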
  28–33. ACCURACY. Confusion matrix: true class along the rows, predicted class along the columns. acc = correctly classified / total number of samples. Beware of an imbalanced dataset! Consider the following model: “Always predict 2”. On a dataset where 90% of the samples are 2s it reaches an accuracy of 0.9 without learning anything.
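A small illustration of the imbalance trap, using a synthetic label set where 90% of the samples are 2s:

```python
# Accuracy looks great for a model that has learned nothing.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([2] * 90 + [7] * 10)  # imbalanced: 90% of labels are "2"
y_pred = np.full_like(y_true, 2)        # the trivial "always predict 2" model

print(accuracy_score(y_true, y_pred))    # 0.9
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
```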
  34. DECISION TREE
  35–36. DECISION TREE. “You said 100% accurate?! Every 10th digit your system detects is wrong!” (angry client). We trained our system on the data the client gave us, but it has never seen the new data the client applied it to. And in real life it never will…
  37–40. OVERFITTING. Simulate the real-life situation — split the dataset.
  41–42. OVERFITTING. Underfitting (“too stupid”) vs. OK vs. overfitting (“too smart”). Our current decision tree has too much capacity: it has simply memorized all of the data. Let’s make it less complex.
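A minimal sketch of both ideas: hold out a test set the model never sees during training, and reduce the tree's capacity. Limiting min_samples_leaf is one way to do the latter; the slides do not name the parameter at this point, so treat that choice as an assumption.

```python
# Simulate real life with a held-out test set, then compare a
# full-capacity tree against a deliberately constrained one.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

full = DecisionTreeClassifier().fit(X_train, y_train)
small = DecisionTreeClassifier(min_samples_leaf=15).fit(X_train, y_train)

# The unconstrained tree is perfect on data it memorized, worse on new data.
print(full.score(X_train, y_train), full.score(X_test, y_test))
print(small.score(X_train, y_train), small.score(X_test, y_test))
```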
  43. You probably did not notice, but we are overfitting again :( (this time by tuning our parameters against the very set we evaluate on).
  44–54. THE WHOLE DATASET is split into TRAINING SET 60%, VALIDATION SET 20%, TEST SET 20%. Training set: fit various models and parameter combinations on this subset. Validation set: evaluate the models created with different parameters and estimate overfitting. Test set: use only once, to get the final performance estimate. [diagram: repeated TRAINING/VALIDATION pairings across the slides]
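A minimal sketch of the 60/20/20 workflow (the candidate min_samples_leaf values are illustrative, not from the slides):

```python
# Tune parameters on the validation set; touch the test set only once.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

# 60% training, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Try parameter values using TRAINING and VALIDATION only.
best_msl, best_score = None, 0.0
for msl in (1, 5, 15, 50):
    model = DecisionTreeClassifier(min_samples_leaf=msl).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_msl, best_score = msl, score

# Use the TEST set exactly once, for the final performance estimate.
final = DecisionTreeClassifier(min_samples_leaf=best_msl).fit(X_train, y_train)
print(best_msl, final.score(X_test, y_test))
```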
  55–60. CROSS-VALIDATION. What if we happened to get a too-optimistic validation set? Merge the training and validation parts back into a TRAINING SET of 80%. Fix the parameter value you need to evaluate, say msl=15. Split this set into TRAINING and VALIDATION parts, repeat 10 times with a different validation part each time, and take the average validation score over the 10 runs — it is a more stable estimate.
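A minimal sketch with scikit-learn's cross_val_score, assuming msl on the slide stands for the tree's min_samples_leaf parameter:

```python
# 10-fold cross-validation for one fixed parameter value.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target

scores = cross_val_score(
    DecisionTreeClassifier(min_samples_leaf=15), X, y, cv=10)
print(scores)         # 10 validation scores, one per fold
print(scores.mean())  # the averaged, more stable estimate
```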
  61–63. MACHINE LEARNING PIPELINE: take raw data → extract features → split into TRAINING and TEST → pick an algorithm and parameters → train on the TRAINING data → evaluate on the TRAINING data with CV → try out different algorithms and parameters → fix the best parameters → train on the whole TRAINING set → evaluate on TEST → report the final performance to the client. “So it is ~87%… erm… Could you do better?” Yes.
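The loop in the middle of that pipeline (pick parameters, evaluate with CV, fix the best, retrain on the whole TRAINING set) can be automated with GridSearchCV; a sketch under that assumption, with an illustrative parameter grid:

```python
# The pipeline end to end: split, cross-validated parameter search
# inside the training data, one final evaluation on the test set.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(),
                      {"min_samples_leaf": [1, 5, 15, 50]}, cv=10)
search.fit(X_train, y_train)  # CV happens inside the TRAINING data only

print(search.best_params_)
print(search.score(X_test, y_test))  # the number to report to the client
```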
  64–65. Pick another algorithm (the same overwhelming list as on slides 15–16).
  66–71. RANDOM FOREST. A decision tree picks the best split out of all features. A random forest grows many trees: one tree picks the best split out of one random subset of the features, another tree out of another random subset, yet another tree out of yet another random subset, and so on.
  72–77. [diagram: to classify an instance, it is run through every tree in the forest; each tree outputs a class, and the trees’ votes are aggregated into the final prediction]
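A minimal sketch: swap the single decision tree for the RandomForestClassifier the deck links to below (n_estimators=100 is an illustrative choice):

```python
# A random forest of many trees, each splitting on random feature
# subsets; predictions aggregate the votes of all trees.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(forest.score(X_test, y_test))  # typically well above a single tree
```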
  78. Happy client.
  79. ALL OTHER USE CASES. The same recipe: raw data → extracted features → class.
  80. Sound → frequency components → genre. Text → bag of words → topic. Image → pixel values → cat or dog. Video → frame pixels → walking or running. Database records → biometric data, census data, average salary, … → dead or alive.
  81. HANDS-ON SESSION
  82. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
