SPWK '20 - explaining data science to humans.pptx

  1. Explaining advanced data science techniques to humans. SuperWeek ’20
  2. “Hi, it’s still me! Just ‘Mightier’.” (Doug Hall, Director of analytics)
  3. “Everything should be made as simple as possible, but not simpler.” (Einstein)
  4. Data Science is for everyone. Data Science needs to be non-exclusive - high level, not mathy. Everyone gets value!
  5. “Any sufficiently advanced technology is indistinguishable from magic.” (Arthur C. Clarke)
  6. Techniques appear to be like magical illusions. At first appearance data science seems to be nothing short of actual wizardry. As soon as you know the trick, the mystery evaporates - utility grows!
  7. “When a distinguished but elderly scientist states that something is possible, she/he is almost certainly right. When she/he states that something is impossible, she/he is very probably wrong.” (Arthur C. Clarke’s first law)
  8. Opposite is true for HiPPOs. We can fix this!
  9. The power of analogy, simile and metaphor. Beware your inner honey-badger.
  10. “As hard as woodpecker lips.” “Busier than a one armed bricklayer.” (Nick Cummins, The Honey Badger - Aussie Rugby-ist)
  11. Vocabulary matters. Don’t speak nerd to non-nerds.
  12. Do you... use multiple independent variables in an ANOVA test to reject the null hypothesis?
  13. Or do you... analyse which brand of cereal has the most calories?
  14. Explain well. Use your human words.
  15. What you’re about to hear: things that will be explained. Models: how to choose a model, prediction, classification, linear regression, logistic regression. Classification: k-means clustering. Data engineering: data preparation, dimensionality reduction (Principal Component Analysis, embedding), structured/unstructured data, integer and one-hot encoding, correlation != causation. Machine Learning: supervised/unsupervised/reinforcement machine learning, Type I and Type II errors, precision, recall, accuracy, underfitting, overfitting. Attribution: Shapley values and Markov chains. Testing: the null hypothesis, p-values.
  16. Protection against data science detail. BUSINESS QUESTION: How can we split our customers into different groups to market to? DATA SCIENCE QUESTION: How can we run a clustering algorithm to segment customer data? DATA SCIENCE ANSWER: A k-means clustering found 3 distinct groups. BUSINESS ANSWER: Here are 3 types of customers: new, high spending, and commercial. @BecomingDataSci
  17. Model = decision support tool. Contemporary decisioning - making choices in a noisy room, stressed, on fire, doing 1000mph, with 50 other demands on your time while being kicked. It’s hard. Go to your mind palace.
  18. Use carefully. Don’t abdicate decision making responsibility.
  19. @FryRSquared
  20. @FryRSquared
  21. @FryRSquared
  22. @FryRSquared
  23. Let’s get started. First question from the business is:
  24. How do I choose a model? It depends...
  25. How do I choose a model? It’s not that simple at first glance.
  26. What do you want to do? What’s the model for?
  27. Prediction Models
  28. Classification Models
  29. Linear Regression. Start simple.
  30. Simplest of crystal balls. Goal - prediction: If I have a fifth cup of coffee, how much will my productivity increase by? Goal - explanation: How much did the sixth coffee reduce hours slept? (Models)
  31. Simplest of crystal balls. Goal - prediction: If page speed increases, what happens to bounce rate? Goal - explanation: Can a change in page speed explain my change in bounce rate? (Models)
  32. Simplest of crystal balls. A most basic model of correlation. (Models)
  33. SIMPLE Linear Regression isn’t extrapolation. (Models)
  34. Logistic Regression. Think probability.
  35. Is it cat? How confident are you? (Models)
  36. Is it cat? How confident are you now? (Models)
  37. Is it cat? More confident? (Models)
  38. Is it cat? Got it? (Models)
  39. It’s a cat! Kittie! (Models)
  40. Is it a puppy? How confident are you? (Models)
  41. Is it a puppy? How confident are you? (Models)
  42. Is it a puppy? How confident are you? (Models)
  43. It’s a puppy! Puppee!!!!!!!! (Models)
  44. Is it a fish? How confident are you? (Models)
  45. Is it a fish? How confident are you? (Models)
  46. Yup, fish... FishEEEEEE!!!!! (Models)
  47. Logistic Regression. As bounce rate and device category change, what’s the conversion probability? How can budget and time of day changes to bidding help impression volume for my display campaign? (Models)
  48. Logistic Regression. How does the probability of an answer (yes/no) change when 1 or more other things change? (Models)
  49. Solving for high dimensionality. This sounds hard...
  50. Data looks like this. (Data Engineering)
  51. We need data like this. (Data Engineering)
  52. DATA SCIENCE WORDS: perform an n-dimensional linear transformation; eigendecomposition of covariance matrices; derive eigenvectors and eigenvalues; extract principal components. We just employ Principal Component Analysis. (Data Engineering)
  53. Clients be like...
  54. What does this taste of? (NOT a trick question!) (Data Engineering)
  55. Chicken Tikka Terrine: Sweet Pickled Carrot, Smoked Garlic Yogurt, Beans Kachumber. (Data Engineering)
  56. Multiple ingredients. Complex, sophisticated palate. Just like our data. (Data Engineering)
  57. Which ingredients combine to influence the overall flavour of the dish? Sweetness? Savouriness? Heat? Sourness? Umami? (Data Engineering)
  58. Classify based on the most important ingredients (variables). (Data Engineering)
  59. Clients be like...
  60. Visualising 34 dimensions on a graph. (Data Engineering)
  61. Visualisation: now you can see the data. Less computation: you get your output faster and cheaper. Now use it. Dimension reduction = simpler visualisation. (Data Engineering)
  62. K-means clustering. Unsupervised learning - grouping like data points.
  63. Remember your first day at school? Kids cluster when they are alike. (Classification)
  64. Yep, I’m the kid with no mates. BUT outliers can be interesting data points. (Classification)
  65. Cluster on principal components. Audience on intent. (Classification)
  66. Can we “just” model this? Where do you keep your PhD?
  67. Preparation: mise en place for your data. >80% of the effort. (Data Engineering)
  68. Prepping data takes effort. No hiding this fact.
  69. Why prep data? “Data is oil...blah blah, refine it” (Clive Humby). “Data is meat...prepare it before it spoils” (@strasm). (Data Engineering)
  70. Structured or Unstructured? How’s YOUR data today?
  71. Structured/Unstructured (Data Engineering)
  72. Structured/Unstructured (Data Engineering)
  73. Structured/Unstructured (Data Engineering)
  74. Models are a bit strict. Models, like bureaucrats, expect input in a specific format... (Data Engineering)
  75. Categorical data (unstructured). (Data Engineering)
  76. Integer encoding: 1 2 3. (Data Engineering)
  77. Integer encoding: 1 2 3. Is a dog 2x a cat? (Data Engineering)
  78. One hot encoding: 1 0 0. I haz a cat! (Data Engineering)
  79. One hot encoding. Dimensionality problem: 1110000110010001010110100101 (Data Engineering)
  80. Embedding: 2 dimensions rather than 12. (Data Engineering)
  81. what3words (Data Engineering)
  82. Better measurement -> better data. Better data -> better decision making.
  83. Does your site measurement look like this? (Data Engineering)
  84. If it moves, fire an event. Event confetti. Vanity metrics. Signal to noise. Lots to go wrong. Hard to understand. Expensive to model. There is another way... Measure what matters. (Data Engineering)
  85. Correlation != causation. This old chestnut...
  86. Spurious correlation. (Data Engineering)
  87. Zero correlation? No tactical activation? Question it and potentially bin it. (Data Engineering)
  88. Think long tail removal? Do these events contribute to signal? (Data Engineering)
  89. Supervised or unsupervised? Learning is fun, kids!
  90. AI is ML with a marketing dept.
  91. Let’s talk ML. (Machine Learning)
  92. Supervised learning: I need that report STAT! (Machine Learning)
  93. UNsupervised learning: My data feels weird...can you take a look? (Machine Learning)
  94. Reinforcement learning: GOOD BOY! (Machine Learning)
  95. I need a PERFECT model. Uh huh...no such thing.
  96. We have credit. (Machine Learning)
  97. Did you spend this?! (Machine Learning)
  98. Did I buy stuff? (Machine Learning)
  99. How wrong can you be? Being wrong on so many levels.
  100. Type I & Type II errors. (Machine Learning)
  101. Credit card history inquisition. (Machine Learning)
  102. Recall/Precision. How many card transactions can I recall with precision for the last 6 months? (Machine Learning)
  103. Recall/Precision. Precision = 33 true positive / (33 true positive + 1 false positive) = 0.971. Recall = 33 true positive / (33 true positive + 14 false negative) = 0.702. Accuracy = (33 true positive + 2 true negative) / 50 total = 0.7. (Machine Learning)
  104. Precision (Machine Learning)
  105. Recall (Machine Learning)
  106. Accuracy (Machine Learning)
  107. Underfitting and overfitting. Finding a balance.
  108. Underfitting: getting stuck on “Mount Stupid”. Just. Doesn’t. Learn! (Machine Learning)
  109. Overfitting: “That’s the way I’ve always done it!” Unable to generalise. A student learning by rote can’t handle the exam. (Machine Learning)
  110. (Machine Learning)
  111. Underfitting doesn’t learn. Overfitting doesn’t generalise. Sweet spot in between. (Machine Learning)
  112. Attribution. Peter O'Neill’s favourite.
  113. Yay Spurs! (Attribution)
  114. How did the team perform? Danny Rose, Harry Winks, Harry Kane. (Attribution)
  115. Did Rose and Winks not turn up? 0 goals, 0 goals, 2 goals. (Attribution)
  116. Consider a whole season. What’s the performance when they’re all involved in passages of play? 100 goals, 5 goals, 45 goals. (Attribution)
  117. Shapley values: 4.2, 5.7, 91.2. (Attribution)
  118. Shapley values for channels as a measure of contribution to total conversions. Click campaign (Last, Mid): Display Campaign A (10, 200); Social Campaign B (50, 300); Organic (300, 250); Direct (500, 300); Referral (400, 250). (Attribution)
  119. I need to get to Holborn from Rickmansworth via the tube. (Attribution)
  120. Today is a bad day on the tube... (Attribution)
  121. What route to take? (Attribution)
  122. Once I’m in Central London...I have options. (Attribution)
  123. Change at Kings Cross? (Attribution)
  124. Walk from Euston Square? (Attribution)
  125. I’ve done this before...when was I on time, and when was I late? (Attribution)
  126. Markov Chain to see likelihood of being on time. (Attribution)
  127. On-time conversion attribution. Walking from Euston Square gets me on time most often. (Attribution)
  128. Conversion attribution for channels. (Attribution)
  129. Conversion attribution for channels. (Attribution)
  130. How’s your testing going? Is the test done yet?
  131. What’s a null hypothesis? Your new default.
  132. Start with “I’m wrong”... (Testing)
  133. (Testing)
  134. But wait! (Testing)
  135. Being confident you’re not wrong. p<0.05 (Testing)
  136. p is a measure of surprise. p<0.05: p says “I saw a change that I wasn’t expecting (according to the null hypothesis)”. (Testing)
  137. p is a measure of surprise. p=0.5: p says “Meh...”. (Testing)
  138. What you’ve just heard: things that you can now explain. Models: how to choose a model, prediction, classification, linear regression, logistic regression. Classification: k-means clustering. Data engineering: data preparation, dimensionality reduction (Principal Component Analysis, embedding), structured/unstructured data, integer and one-hot encoding, correlation != causation. Machine Learning: supervised/unsupervised/reinforcement machine learning, Type I and Type II errors, precision, recall, accuracy, underfitting, overfitting. Attribution: Shapley values and Markov chains. Testing: the null hypothesis, p-values.
  139. Now you can speak Data Science. Good luck at dinner parties.
  140. THANK YOU. Doug Hall, Director of analytics. MIGHTYHIVE.COM
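
Slides 29 to 33 describe linear regression as the "simplest of crystal balls": if page speed changes, what happens to bounce rate? A minimal sketch of that idea, assuming made-up load times and bounce rates (none of these numbers come from the talk), fits a straight line by ordinary least squares:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: page load time (seconds) and bounce rate (percent).
load_seconds = [1.0, 2.0, 3.0, 4.0, 5.0]
bounce_rate = [30.0, 35.0, 40.0, 45.0, 50.0]

slope, intercept = fit_line(load_seconds, bounce_rate)

# Predict inside the observed range only (slide 33: regression isn't extrapolation).
predicted = slope * 3.5 + intercept
print(f"bounce rate rises by {slope:.1f} points per extra second")
print(f"predicted bounce rate at 3.5 s: {predicted:.1f}%")
```

The slope answers the "explanation" question (how much bounce rate moves per second of load time) and the prediction answers the "crystal ball" question, as long as the input stays within the range the line was fitted on.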
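
Slides 47 and 48 frame logistic regression as "how does the probability of a yes/no answer change when one or more other things change?" The sketch below shows only the prediction step of such a model. The coefficients are hypothetical placeholders chosen for illustration, not values fitted to any data from the talk:

```python
import math


def sigmoid(score):
    """Squash any real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients (assumptions, not fitted values).
BIAS = 0.0
WEIGHT_BOUNCE_RATE = -4.0   # higher bounce rate lowers the score
WEIGHT_IS_MOBILE = -0.5     # mobile visits score slightly lower


def conversion_probability(bounce_rate, is_mobile):
    """Probability of a 'yes' (conversion) for a bounce rate in [0, 1] and a device flag."""
    score = (BIAS
             + WEIGHT_BOUNCE_RATE * bounce_rate
             + WEIGHT_IS_MOBILE * (1 if is_mobile else 0))
    return sigmoid(score)


for rate in (0.0, 0.25, 0.5):
    print(rate, round(conversion_probability(rate, is_mobile=False), 3))
```

In a real project the coefficients would be learned from historical data; the point here is only that the output is a confidence between 0 and 1, like the "is it cat?" slides.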
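
Slides 76 to 79 contrast integer encoding ("is a dog 2x a cat?") with one-hot encoding. A small sketch of both, assuming a hypothetical list of pets, makes the difference concrete:

```python
def integer_encode(values):
    """Map each category to 1, 2, 3, ... in alphabetical order."""
    categories = sorted(set(values))
    index = {category: i + 1 for i, category in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """Give each category its own 0/1 column, so no category outranks another."""
    categories = sorted(set(values))
    return [[1 if v == category else 0 for category in categories] for v in values]


pets = ["cat", "dog", "fish", "cat"]  # hypothetical categorical data

print(integer_encode(pets))   # one number per pet
print(one_hot_encode(pets))   # one column per distinct pet
```

Integer codes imply an order and a distance that the categories do not have. One-hot encoding avoids that, at the cost of one column per category, which is the dimensionality problem slide 79 points at.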
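
Slides 62 to 65 describe k-means clustering as kids grouping with those who are alike. A minimal sketch of the standard assign-then-average loop (Lloyd's algorithm) follows. The points and the starting centroids are invented for illustration, and real implementations pick starting centroids more carefully:

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move centroids to the mean."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical customers as (visits, spend) pairs on an arbitrary scale.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(customers, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Nobody tells the algorithm what the groups mean, which is why slide 62 calls it unsupervised learning; naming the groups ("new", "high spending") is the business answer from slide 16.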
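
Slide 103 gives counts for the credit card example: 33 true positives, 1 false positive, 14 false negatives and 2 true negatives out of 50 transactions. The three metrics can be recomputed from those counts like this (the function name is just an illustrative choice):

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)          # of everything flagged, how much was right
    recall = tp / (tp + fn)             # of everything real, how much was found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Counts taken from slide 103.
precision, recall, accuracy = classification_metrics(tp=33, fp=1, fn=14, tn=2)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```

A false positive is a Type I error and a false negative is a Type II error (slide 100), so precision drops with Type I errors and recall drops with Type II errors.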
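
Slides 114 to 118 use Shapley values to share credit between players, and then between marketing channels. A sketch of the exact calculation for three channels is below: it averages each channel's marginal contribution over every possible joining order. The coalition values are invented for illustration and are not the figures on the slides:

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all orderings.

    `value` maps a frozenset of players to what that group achieves together.
    """
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        seen = frozenset()
        for player in order:
            with_player = seen | {player}
            totals[player] += value[with_player] - value[seen]
            seen = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


# Hypothetical conversions achieved by each combination of channels.
conversions = {
    frozenset(): 0,
    frozenset({"display"}): 10,
    frozenset({"social"}): 20,
    frozenset({"direct"}): 30,
    frozenset({"display", "social"}): 40,
    frozenset({"display", "direct"}): 50,
    frozenset({"social", "direct"}): 60,
    frozenset({"display", "social", "direct"}): 90,
}

credit = shapley_values(["display", "social", "direct"], conversions)
print(credit)
```

The credits always add up to what the full group achieved, which is what makes the method attractive for attribution. The number of orderings grows factorially, so production tools approximate rather than enumerate.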
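
Slides 119 to 127 use a tube journey to explain Markov chain attribution: which step most often gets me there on time? The slides do not say how the credit was calculated, so the sketch below assumes one common approach, the removal effect: remove a step and see how much the chance of success drops. The transition probabilities are made up, and the simple recursion only works because this toy chain has no loops:

```python
# Hypothetical transition probabilities between journey states.
TRANSITIONS = {
    "start": {"kings_cross": 0.5, "euston_square": 0.5},
    "kings_cross": {"on_time": 0.6, "late": 0.4},
    "euston_square": {"on_time": 0.8, "late": 0.2},
}


def prob_on_time(state, removed=None):
    """Chance of ending in 'on_time' from `state`; a removed step always fails."""
    if state == "on_time":
        return 1.0
    if state == "late" or state == removed:
        return 0.0
    return sum(p * prob_on_time(nxt, removed)
               for nxt, p in TRANSITIONS[state].items())


def removal_effect(step):
    """Share of on-time arrivals lost if `step` were taken out of the network."""
    baseline = prob_on_time("start")
    return 1.0 - prob_on_time("start", removed=step) / baseline


for step in ("kings_cross", "euston_square"):
    print(step, round(removal_effect(step), 3))
```

With these invented numbers Euston Square has the larger removal effect, in the spirit of slide 127. For marketing, the states become channels and "on time" becomes a conversion.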
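
Slides 131 to 137 call p "a measure of surprise" under the null hypothesis. A minimal sketch, assuming a coin-flip style test where the null hypothesis says each trial succeeds half the time, computes an exact one-sided p-value from the binomial distribution. The trial counts are invented for illustration:

```python
from math import comb


def one_sided_p_value(successes, trials, null_rate=0.5):
    """Probability of seeing at least `successes` if the null hypothesis is true."""
    return sum(
        comb(trials, k) * null_rate ** k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


surprising = one_sided_p_value(9, 10)   # 9 successes out of 10
meh = one_sided_p_value(5, 10)          # 5 successes out of 10

print(round(surprising, 4))  # small: hard to explain by chance alone
print(round(meh, 4))         # large: exactly what the null hypothesis expects
```

A small p says the result would be surprising if nothing had really changed; a large p is the "Meh" of slide 137. It does not say how big or how valuable the change is.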
