
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making


  1. Data-Driven Off a Cliff Anti-patterns in evidence-based decision making Ketan Gangatirkar & Tom Wilbur
  2. Data-Driven Off a Cliff Anti-patterns in evidence-based decision making Ketan Gangatirkar & Tom Wilbur
  3. I help people get jobs.
  4. Indeed is the #1 job site worldwide
  5. Headquartered in Austin, Texas
  6. We have tons of ideas
  7. We have tons of bad ideas
  8. Occasionally, we have good ideas
  9. It’s hard to tell the difference
  10. What helps people get jobs?
  11. The only reliable way is to see what works
  12. XKCD http://bit.ly/1JWz6Qh
  13. We set up experiments
  14. We collect results
  15. We use the data to decide what to do
  16. We’ve used data to make good decisions
  17. But having data is not a silver bullet
  18. We’ve also used data to make bad decisions
  19. Science is hard
  20. Problem: Running an experiment can ruin the experiment
  21. Wikipedia http://bit.ly/1LkLPiP
  22. Change / Effect on productivity: Brighter light → UP; Dimmer light → UP; Warmer → UP; Cooler → UP; Shorter breaks → UP; Longer breaks → UP
  23. Change / Effect on productivity: Brighter light → UP (temporarily); Dimmer light → UP (temporarily); Warmer → UP (temporarily); Cooler → UP (temporarily); Shorter breaks → UP (temporarily); Longer breaks → UP (temporarily)
  24. Change / Effect on productivity: Brighter light → UP (temporarily); Dimmer light → UP (temporarily); Warmer → UP (temporarily); Cooler → UP (temporarily); Shorter breaks → UP (temporarily); Longer breaks → UP (temporarily)
  25. Problem: Statistics are hard
  26. Anscombe’s Quartet Wikipedia http://bit.ly/2dlTUci
  27. Simpson’s Paradox
  28. Simpson’s Paradox Wikipedia http://bit.ly/1OHFSOk
  29. Using data is more than just statistics
  30. + + + + = Good math. Bad idea.
  31. Bad practices can undermine good math
  32. You don’t need me to teach you to be bad at math
  33. I’ll teach you to be bad at everything else
  34. Anti-Lesson 01: Be impatient
  35. p-value is the standard measure of statistical significance
  36. p-value is by measurement, not experiment
  37. If you check results on Monday, that’s one measurement
  38. If you check results on Tuesday, that’s another measurement
  39. Got the result you want?
  40. Declare victory!
  41. Move quickly! Because results and p-values can shift fast
  42. 80% of “winning” A/B tests stopped early are false-positives http://bit.ly/1LtaLkV
  43. Anti-Lesson 02: Sampling is easy
  44. Beware the IEdes of March Story
  45. Building Used Cars Search
  46. Shoppers specifying price, mileage or year do better
  47. Nudge shoppers to specify price, mileage or year
  48. +3% conversion
  49. After rollout, conversion > +3%
  50. Why?
  51. We’d taken a shortcut in our test assignment code
  52. Users on oldest browsers got ignored
  53. Distorted sample → distorted results
  54. Anti-Lesson 03: Look only at one metric
  55. If a little bit is good, a lot is great
  56. Indeed has a heart Story
  57. ❤ > ★ ?
  58. +16% Saves on search results page
  59. Everyone ❤s ❤s!
  60. ❤s everywhere!
  61. Hearted
  62. Not so fast
  63. Did ❤ help people get jobs?
  64. ❤ jobs: +16%; Clicks: no change; Applies: no change; Hires: no change
  65. I help people ❤ jobs.
  66. Upsell team Story
  67. We formed an “upsell team” and measured their results
  68. + = Success measure
  69. It’s working! Upsells
  70. So why isn’t revenue moving? Overall Revenue
  71. + 0 -
  72. = ⅓ + ⅓ - ⅓
  73. What you measure is what you motivate
  74. Redefine success to include all outcomes
  75. Upsell Team revenue +200%
  76. Anti-Lesson 03 (Reloaded): Look at all the metrics
  77. It's better for them. Is it better for us?
  78. Job applications: Up; Job clicks: Down; Recommended Jobs traffic: Up; Job views: Sideways; New resumes: Up; Return visits: Down; Logins: Up; Revenue: Down (and it goes on…)
  79. We didn’t really know what we wanted
  80. Too much noise from too many metrics
  81. I help people get jobs.
  82. Anti-Lesson 04: Be sloppy with your analysis
  83. We engineer features rigorously
  84. Specification, Source control, Code review, Automated tests, Manual QA, Metrics, Monitors, ...
  85. But analysis…
  86. Bad analysis won’t take down Indeed.com
  87. 200 million job seekers don’t care about our sales projections
  88. So we don’t try as hard with analysis code
  89. Specification, Source control, Code review, Automated tests, Manual QA, Metrics, Monitors, ...
  90. Dubliners Story
  91. Indeed reports on economic trends
  92. South Carolinians wanted to move to Dublin
  93. Dublin?
  94. No, the other one
  95. Incorrect IP location mapping
  96. IP blocks for South Carolina got reallocated to London, England
  97. Worse things can happen
  98. Growth and Debt Story
  99. “Growth in a Time of Debt” Carmen Reinhart and Kenneth Rogoff 2010
  100. Public debt > 90% GDP leads to slower economic growth
  101. Governments made policy based on this
  102. Fixing the error eliminated the effect Source: https://goo.gl/zAcd1e
  103. Genetic Mutation Story
  104. 20% of genetics papers have Excel errors Source: http://wapo.st/2cWyrpJ
  105. SEPT2 to a geneticist is Septin 2
  106. SEPT2 to Excel is 42615
  107. Does your company use spreadsheets?
  108. How do you know they’re correct?
  109. Under-spending Advertisers Story
  110. Employer budgets ran out before the end of the day
  111. So no evening job seekers saw the jobs
  112. How big was this missed opportunity?
  113. Missed Clicks Report. Clicks received: 1260; Out-of-budget time: 20:00; % of day w/o budget: 0.1667; Potential clicks: 1260 / (1 - 0.1667) = 1512; Missed clicks: 1512 * 0.1667 = 252. “Dear Customer, You got 1,260 clicks yesterday. Your daily budget ran out at 8:00pm. If you funded your budget through the whole day, you’d get another 252 clicks - a +20% improvement! Get More Clicks”
  114. Assumption: clicks arrive at a constant rate all day. Missed = 252 clicks (+20%) [chart of clicks per hour, 0:00 to 24:00]
  115. Reality: click volume tails off in the evening. Missed = 100 clicks (+8%) [chart of clicks per hour, 0:00 to 24:00] (a small sketch of both calculations follows below)
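
    A small sketch of the two calculations above, using an invented hourly click curve shaped like the charts on these slides: the constant-rate assumption predicts about 252 missed clicks (+20%), while extrapolating from the actual evening tail-off predicts about 100 (+8%).

        clicks_received = 1260
        hours_in_budget = 20   # the budget ran out at 20:00

        # Naive estimate: assume clicks arrive at a constant rate all day.
        naive_potential = clicks_received / (hours_in_budget / 24)
        print(f"naive missed clicks: {naive_potential - clicks_received:.0f}")   # ~252 (+20%)

        # Curve-based estimate: extrapolate from the hourly pattern, which tails off in the evening.
        # hourly[] is invented to roughly match the shape of the charts on these slides.
        hourly = [20, 15, 10, 10, 15, 30, 60, 85, 95, 100, 100, 98,
                  95, 92, 90, 88, 85, 80, 52, 40, 35, 28, 22, 15]   # clicks per hour, 0:00-23:00
        observed = sum(hourly[:hours_in_budget])                    # 1260, matching the report
        missed = sum(hourly[hours_in_budget:])
        print(f"curve-based missed clicks: {missed} (+{missed / observed:.0%})")  # 100 (+8%)
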
  116. Naive analysis → bad recommendation
  117. Anti-Lesson 05: Only look for expected outcomes
  118. Zero results pages from misspelled locations
  119. Goals: fewer ZRPs, more job clicks
  120. Zero-results pages -2.7%
  121. Job clicks +8%
  122. +1,410% Ad revenue
  123. +1,410% Ad revenue
  124. ads
  125. Ad revenue after fix
  126. Treatment on homepage → effect on search page
  127. Anti-Lesson 06: Metrics, not stories
  128. I help people get jobs.
  129. How do I know if people got jobs?
  130. I need employers to tell me
  131. One employer hired 4500 people in 45 minutes!
  132. Nope
  133. Accurate recording of outcomes helps us
  134. It doesn’t help employers
  135. They don't care about using the product “right”
  136. Go away!
  137. There is no “user story”
  138. Right metrics + wrong story = wrong conclusion
  139. Anti-Lesson 06 (Parte Deux): Story over metrics
  140. Stories are seductive
  141. Even incorrect stories are seductive
  142. Taste Buds Story
  143. Taste map
  144. Totally wrong
  145. Every bite you eat proves it’s wrong
  146. People still believe it
  147. Job Alerts Story
  148. Success for emails is well understood
  149. New subscriptions: Good; Email opens: Good; Clicking on stuff: Good; Unsubscribing: Bad
  150. I help people get emails.
  151. I help people get jobs.
  152. What does job seeker success look like?
  153. 01 Search for jobs
  154. 02 Sign up for alerts
  155. 03 Click on some jobs
  156. 04 Apply to some jobs
  157. 05 Get a job!
  158. 06 Unsubscribe from emails
  159. People with new jobs don't need job alerts
  160. The standard story for email fails here
  161. Light and Dark Redux Story
  162. It’s a persuasive story
  163. But the original study was flawed
  164. Hawthorne Revisited “… the variance in productivity could be fully accounted for by the fact that the lighting changes were made on Sundays and therefore followed by Mondays when workers’ productivity was refreshed by a day off.” https://en.wikipedia.org/wiki/Hawthorne_effect
  165. We con people with stories
  166. We con ourselves with stories
  167. Anti-Lesson 07: Believe in yourself
  168. Believing in yourself can be good
  169. “My startup will succeed.”
  170. Often it’s bad
  171. “I’d never fall for a scam like that.”
  172. “I knew it all along.”
  173. “I’m too smart to make that mistake.”
  174. Every story of mistakes is deceptive
  175. We tell stories with 20/20 hindsight
  176. When we live the story, we live in the fog
  177. You won’t think you’re making a mistake
  178. Search your past for mistakes
  179. Painful, embarrassing mistakes
  180. If you didn’t find any, you’re exceptional
  181. Either you’re making mistakes you find
  182. Or you’re making mistakes you don’t find
  183. How do you defend against mistakes?
  184. The first step is admitting you have a problem
  185. There are 174 cognitive biases [citation needed]
  186. Data can help you make better decisions
  187. Or more confidently make bad decisions
  188. Data can’t make you a better decision-maker
  189. Good data + bad decision-maker = bad decision
  190. Our anti-lessons teach you how to use data badly
  191. Do the opposite to do better
  192. Lesson 01: Be patient. Lesson 02: Sampling is hard. Lesson 03: Focus on a few, carefully chosen metrics. Lesson 04: Be rigorous with your analysis. Lesson 05: Watch out for side effects. Lesson 06: Use metrics and stories. Lesson 07: Plan for fallibility.
  193. Learn from our mistakes
  194. Be prepared for your own
  195. Learn More: Engineering blog & talks http://indeed.tech | Open Source http://opensource.indeedeng.io | Careers http://indeed.jobs | Twitter @IndeedEng
  196. Questions? Contact us: ketan@indeed.com | twilbur@indeed.com
  197. Seriously, that was the end. Contact us: ketan@indeed.com | twilbur@indeed.com
  198. There are no more slides. Contact us: ketan@indeed.com | twilbur@indeed.com
  199. Stop here. Contact us: ketan@indeed.com | twilbur@indeed.com

Editor's notes

  • Good evening, thanks for coming to our @IndeedEng Tech Talk tonight.
  • This is “Data-driven off a cliff, anti-patterns in evidence-based decision making”. I’m Tom Wilbur, and I’m a product manager at Indeed, and...
  • I help people get jobs.
  • Indeed is the #1 job site worldwide. We serve over 200M monthly unique users, across more than 60 countries and in 29 languages.
  • The primary place that jobseekers start on Indeed is here - the search experience. It’s simple -- you type in some keywords and a location and you get a ranked list of jobs that are relevant to you.
  • Indeed is headquartered here in Austin, Texas, the capital of the Lone Star State. Austin is also the location of our largest engineering office, and we have engineering offices around the world in Tokyo, Seattle, San Francisco and Hyderabad. So we have tons of smart engineers and product teams working around the clock to make a better Indeed.

    https://en.wikipedia.org/wiki/Flag_of_Texas#/media/File:Flag_of_Texas.svg
  • We have tons of ideas.... BUT
  • We have tons of bad ideas, too.
  • Now occasionally we do have good ideas, but
  • It’s hard to tell the difference. What we really want to know, is --
  • What helps people get jobs? We believe...
  • The only reliable way to know is just try stuff and see what works. (NEXT TO JOKE)
  • (pause) So at Indeed,
  • We set up experiments. We run A/B tests on our site where users are randomly assigned to different experiences.
  • We collect results. We observe the users’ behavior. Our LogRepo system adds about 6TB of new data every day.
  • And we use that data to decide what to do. To see which features and capabilities do help people get jobs, and which don’t.
  • We’ve used data to make good decisions,
  • But having a ton of data is not a silver bullet.
  • We’ve also used data to make bad decisions. Because the truth is,
  • Science is hard. (NEXT TO JOKE)
  • (pause) For example, one serious problem is that the very act of just
  • Running an experiment, can ruin the experiment itself. Let me tell you a quick story.
  • There was a famous experiment conducted in the late 1920s at an electrical factory outside of Chicago, Illinois, called the Hawthorne Works. The factory managers wanted to improve worker productivity, so they decided to try some changes to the worker environment.
  • They changed the lighting conditions, sometimes brighter, sometimes dimmer. They changed the temperature in the factory, and length of breaks. Initially they were excited, as their early experiments resulted in improvements in worker productivity.
  • Brighter lights? Productivity goes up! Dimmer lights? Productivity goes up! Warmer? Up! Cooler? Up. Shorter breaks, longer breaks, it seemed that everything they tried improved worker productivity. And on top of that, none of these improvements stuck.
  • It all quickly faded. Ultimately the conclusion of the researchers was that the very fact of changing the conditions, of running the test, of observing the results, affected the workers’ behavior. This effect is now known as -- the Hawthorne Effect. Those of us that run experiments to optimize websites all over the world know this well. When we see a change in user behavior, we often ask the question, “but will it last? Is that change real, or is it just the Hawthorne Effect?” So science is hard. And if that wasn’t enough,
  • Statistics are hard. There are plenty of ways where an analysis can produce surprising if not contradictory results.
  • For example, consider “Anscombe’s quartet”. In 1973, statistician Francis Anscombe described four very different sets of 11 points that all have the same basic statistical properties -- mean, variance, correlation, and as the blue line shows, regression. This demonstrates that looking at a statistical calculation isn’t at all sufficient to understand your data, especially when there are outliers.

    https://en.wikipedia.org/wiki/Anscombe%27s_quartet
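
    As a quick check, the quartet’s headline statistics can be recomputed in a few lines; the sketch below (values transcribed from the Wikipedia article above) prints essentially identical means, variances, correlations and regression lines for all four sets.

        import numpy as np

        # Anscombe's quartet, as tabulated in the Wikipedia article above.
        x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
        x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
        quartet = {
            "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
            "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
            "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
            "IV":  (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
        }

        for name, (x, y) in quartet.items():
            x, y = np.asarray(x, float), np.asarray(y, float)
            slope, intercept = np.polyfit(x, y, 1)   # least-squares regression line
            r = np.corrcoef(x, y)[0, 1]              # Pearson correlation
            print(f"{name}: mean_y={y.mean():.2f} var_y={y.var(ddof=1):.2f} "
                  f"r={r:.3f} fit: y={intercept:.2f}+{slope:.3f}x")
        # All four sets print (nearly) the same numbers -- only a plot reveals how different they are.
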


  • Another example is Simpson’s Paradox. This is where a statistician goes back in time with his toaster and starts accidentally changing the future and the more he tries to fix it, the worse it gets. There are no donuts, and people have lizard-tongues, and that’s just no way to make data-driven decisions. Wait no, that’s Homer Simpson’s Paradox from Treehouse of Horror V. Sorry.
  • Edward Simpson’s Paradox is something else. This result describes the situation where individual groups of data tell a different story than when the data are combined. On this chart for example, the four blue dots and four red dots each show a positive trend, but when combined, you get the black dotted line that shows a negative trend overall. Imagine if you saw that revenue for mobile was increasing, and revenue for desktop was increasing, but overall revenue appears to be decreasing. Now what do you do? Usually this situation means you don’t understand underlying causal relationships in your data. Because statistics are hard.
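
    Here is that mobile/desktop scenario as a tiny worked example with invented numbers: conversion improves in both segments, yet the blended rate falls because the traffic mix shifts toward the lower-converting segment.

        # Invented numbers (not Indeed data): conversion by segment, month over month.
        last_month = {"desktop": (1000, 100), "mobile": (1000, 20)}   # (visits, conversions)
        this_month = {"desktop": (500, 55), "mobile": (1500, 38)}

        def rate(visits, conversions):
            return conversions / visits

        for segment in ("desktop", "mobile"):
            print(f"{segment}: {rate(*last_month[segment]):.1%} -> {rate(*this_month[segment]):.1%}")  # both improve

        def blended(month):
            return sum(c for _, c in month.values()) / sum(v for v, _ in month.values())

        print(f"overall: {blended(last_month):.1%} -> {blended(this_month):.1%}")  # yet this falls
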
  • But using data correctly is more than just statistics. If you apply good math to a bad idea...
  • Just because it’s mathematically correct, doesn’t mean you won’t seriously regret the outcome of that test.

    http://www.glamour.com/images/health-fitness/2011/06/0606-tequila_at.jpg
  • So bad practices can undermine good math.
  • You don’t need me to teach you how to be bad at math.
  • But tonight, I’ll teach you to be bad at everything else. On top of the inherent challenges of science, statistics and bad ideas, we’ll share with you our powerful techniques of how to make data-driven decisions… the wrong way.
  • So, we’ll start with Anti-Lesson number 1. Be Impatient. One of the best ways to be bad at evidence-based decision making is to be impatient.
  • A p-value is the standard measure of statistical significance. It represents the probability that the observed result would happen if the null hypothesis were true, or, informally, the chance that what you’ve measured is just random chance. For a successful A/B test, we want to see positive results with a p-value below some threshold, typically 5% or .05.
  • But a p-value is calculated per measurement, not once for the whole experiment. It only tells you how confident to be in your results given the data collected so far.
  • If you check results on Tuesday, that’s another measurement. Now your boss is asking if it’s significant yet. So you keep checking and checking,
  • And your data scientist is muttering, saying you should just wait to get to the necessary sample size she estimated. It’s really frustrating. (pause) There’s a better way.
  • Got the result you want? On that test that you knew was a good idea. Are the results already positive after only two days? And when you checked the p-value on your phone while in line at Starbucks, was it less than 0.05?
  • Declare victory! Turn off the test and roll it 100%. Don’t waste your valuable time with that statistical wah wah wah about regression to the mean and probability of null hypothesis something.
  • http://www.qubit.com/sites/default/files/pdf/mostwinningabtestresultsareillusory_0.pdf (Martin Goodson, Research Lead at Qubit, a UK web consultancy)

    In fact, Martin Goodson shows that if you were to do a check for significance every day, and stop positive tests as soon as they show significance, 80% of those “winning” A/B tests are likely false-positives. Are bogus results. And that’s why being impatient is a great way to make bad decisions.
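
    The mechanism is easy to reproduce. The sketch below is our own simulation (not Goodson’s analysis): an A/A test with no real difference, checked for significance after each day of data and stopped at the first p < 0.05. The false-positive rate comes out far above the nominal 5%.

        import numpy as np

        # A/A test: both arms have the same 5% conversion rate, so any "win" is a false positive.
        rng = np.random.default_rng(0)
        base_rate, users_per_day, days, runs = 0.05, 1000, 20, 2000
        false_positives = 0

        for _ in range(runs):
            conv_a = conv_b = n_a = n_b = 0
            for _ in range(days):
                conv_a += rng.binomial(users_per_day, base_rate); n_a += users_per_day
                conv_b += rng.binomial(users_per_day, base_rate); n_b += users_per_day
                pooled = (conv_a + conv_b) / (n_a + n_b)
                se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
                z = (conv_b / n_b - conv_a / n_a) / se
                if abs(z) > 1.96:        # two-sided p < 0.05 at this daily peek
                    false_positives += 1
                    break

        print(f"false-positive rate with daily peeking: {false_positives / runs:.0%}")
        # Expect well above the nominal 5% -- typically 20-30% with 20 peeks.
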
  • Another great way to do data-driven product development wrong is to believe that sampling is easy. I mean, it’s hard and time-consuming to make sure that you’ve got representative users in your A/B tests.
  • Let me illustrate this with a story I call, “Beware the IEdes of March.” And you’ll see how well this anti-pattern worked for me.
  • At a previous company where I worked, we were building Used Car search experiences for major media brands, and we were doing A/B tests to try to increase the probability that we successfully connect a car shopper to a dealer with matching inventory.
  • One of the things we had observed when we analyzed successful user behavior, was that shoppers specifying price, mileage or year in their search do better. They’re more successful at finding cars they are interested in. So we had a hypothesis --
  • Could we encourage shoppers to specify price, mileage or year, and improve conversion?
  • We tried a couple ideas, including moving the price, mileage and year facets up in the search UI to make it easier to find, and we also tried a tooltip nudge, directly encouraging users to add these terms to their search.
  • Of all the variants, the tooltip nudge wins, we saw a 3% lift in unique conversion (with a p-value of .04). So we decided to roll it out.
  • It turns out, we’d taken a shortcut in our test assignment code. This was the summer of 2009, when IE had 60%+ of the US browser market, and my company, like many others, was sick and tired of supporting IE6 (the browser that PC World called “the least secure software on the planet”). So to work around a problem in our code that assigned users to test variants, we just didn’t handle IE6.
  • So the users on the oldest browsers got ignored. This turned out to be 20%+ of users. And even worse, we learned those 20% didn’t behave the same as the remaining 80%. From later analysis and user research, we came to believe that users on the oldest browsers also shopped differently, for different cars. They were on average more price sensitive and benefitted more from that nudge.
  • We’d depended on a distorted sample of the population. We went through all the effort to run a test, and a technical shortcut we took meant that we didn’t measure the results accurately. And we made an ill-informed decision. Because we thought sampling was easy.
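
    A back-of-the-envelope illustration of why the rollout beat the test, with invented numbers: if the excluded 20% of users would have responded roughly twice as strongly to the nudge, the true rollout lift is noticeably higher than the +3% measured on the remaining 80%.

        # Invented numbers, not the real 2009 data: the test only measured 80% of users.
        measured_share, excluded_share = 0.80, 0.20
        lift_measured_segment = 0.03   # what the A/B test reported (+3%)
        lift_excluded_segment = 0.06   # assume the excluded IE6 users benefit twice as much

        rollout_lift = (measured_share * lift_measured_segment
                        + excluded_share * lift_excluded_segment)
        print(f"lift seen in the test: {lift_measured_segment:.1%}")   # 3.0%
        print(f"lift seen after rollout: {rollout_lift:.1%}")          # 3.6% -- better than the test
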
  • Which brings us to the third way I’ll teach you how to do data-driven decision making wrong. Look only at one metric. If there’s one thing we know in life, it’s that
  • If a little bit is good, a lot is great. Anything worth doing is worth overdoing. I mean, there’s never a downside to that, is there?
  • http://www.magpictures.com/resources/presskits/bsf/10.jpg
  • Our first story I want to share about looking only at one metric, is called “Indeed has a heart,” and it’s about a test we ran in our mobile app.
  • As jobseekers explore available jobs, they have the option to Save a job so they can easily come back to it later. We decided to test changing the icon associated with a Save from a star to a heart. We did this on job details page,
  • And on the search results page.
  • So, were hearts better than stars?
  • They were! We observed a 16% increase in Saves on the search results page.
  • Now, everyone loves hearts! We rolled our test out 100%. But why stop there? The obvious thing to do is
  • To have hearts everywhere!
  • Stars on your Amazon reviews?
  • Nope! Hearts now.
  • We sent our test results to Google, and in the next version of Gmail the Starred folder will be replaced with Hearted!
  • And we’ve got a bill in front of the new state legislature. We’re all gonna live and work in the Lone Heart State!
  • [sigh] Not so fast. Changing the stars to hearts improved the one metric we were looking at - usage of the “Save this job” feature, but
  • Did Hearts help people get jobs?
  • Sadly, no. There was no discernible impact on job seeker success. When we analyzed longer-term behavior of jobseekers, there was no evidence of an improvement in the primary metrics -- clicks, applies, hires. Which is unfortunate, because that’s our goal, not
  • To help people heart jobs. What we had done was to focus only on one metric.
  • If you really want to do evidence-based decision making wrong, you should make sure you look only at one metric in situations beyond your A/B tests. This anti-lesson can do damage all across your company.

    For example, at Indeed, we have a talented client services team that works with our customers to keep them engaged and highlight the value they’re receiving. Growing revenue from existing customers is clearly important, and we had a hypothesis that if we had a team focused only on that, we could be more successful.
  • So, we formed a dedicated “upsell team” and measured their results on a dashboard.
  • What we looked for was an upsell contact with a customer after which the customer’s spend went up; when that happened, we credited the rep for the increase on the dashboard. This was also tied to a bonus program. So we started off, and
  • the dashboard told us it’s working! Reported upsells on the dashboard showed lots of wins, 10s of thousands of $$.

  • But when we stepped back, revenue for the total pool of accounts wasn’t increasing.
  • As it turned out, not every contact between a rep and a customer results in an increase in spend. Our naive dashboard looked only at one metric - the positive outcomes.
  • But in reality, some are neutral and some are negative. And so it didn’t measure the right result. In fact, when you’re showing people a metric about their performance,
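
    A toy version of that dashboard, with invented spend changes: summing only the positive outcomes reports thousands of dollars of “wins” even when the net change across the account pool is negative.

        # Invented spend changes (in $) after rep contacts: roughly a third up, a third flat, a third down.
        spend_changes = [+5000, 0, -5000, +8000, 0, -7000, +4000, 0, -6000]

        dashboard_upsells = sum(x for x in spend_changes if x > 0)  # what the old dashboard credited
        net_revenue_change = sum(spend_changes)                     # what actually happened

        print(f"credited upsells: ${dashboard_upsells:,}")     # $17,000 of "wins"
        print(f"net revenue change: ${net_revenue_change:,}")  # $-1,000 -- a net loss
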
  • What you measure is what you motivate. In talking to the reps, because our dashboard only looked at the positive outcomes, they were less interested in contacting customers who were planning to lower their spend. The incentives were only about getting to an increase, nothing else mattered. So we made a change.
  • We redefined success to include all the outcomes, updated the dashboard and continued the experiment of the upsell team. After that one change, we saw more diverse interactions, and better results!
  • The Upsell Team’s revenue increased by 200%, and we decided to continue the experiment and grow the team.

    So we saw two examples there about how looking only at one metric, especially when it’s an easily-computed feature metric or maybe the first metric you thought of, is a great way to do evidence-based decision making wrong. Now, that anti-lesson has a flip-side, too --


    Caveats: not an A/B test, lots of confounding factors, small sample size, team got better at their job, grain of salt, etc. But we also can directly observe the actors in this story, so we focus on how the metric affected behavior.
  • Because another secret to making bad data-driven decisions is to look at all the metrics. For this anti-lesson, we’ll return to Indeed’s mobile app.
  • We were comparing our mobile app to other companies’ apps and noticed a growing adoption of a particular way to indicate a menu. They were using what’s now popularly known as the “hamburger menu”. One of our product managers stole the idea...
  • (pause) And we decided to test a hamburger menu to improve Indeed’s mobile app.
  • It’s better for them. Is it better for us? Let’s look at the results.
  • [read through list, growing more confused]

    <click> at Logins

    (pause) What we realized was...
  • We didn’t really know what we wanted. We didn’t start our test with a goal in mind for what the hamburger menu was supposed to do. So when the metrics came back with conflicting answers, we couldn’t know if the change was any good.
  • There was too much noise from too many metrics. We ended up leaving this test running for a looong time hoping the right decision would become clear. It didn’t. We had lots of discussions and email threads and meetings where “seriously we need to make a decision about the hamburger test.” In the end, we turned it off, so there’s no hamburger in the Indeed mobile app.

    In this case, by not starting with a clear goal, and by looking at all the metrics, we spent a lot of time and energy and failed at making a good evidence-based decision.
  • Tom: Now I’d like to introduce my colleague Ketan who will teach us about even more exciting ways to make bad decisions. Ketan?
  • Who’s got time for rigorous analysis? Just give me an Excel spreadsheet.
  • We often don’t even see it as code
  • https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-alarming-number-of-scientific-papers-contain-excel-errors/
    http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
  • https://www.buzzfeed.com/scott/disaster-girl
  • Do you have a spec?
  • Do you have a spec?
  • Do you have a spec?
  • http://go.indeed.com/RZtb4csgenvm
  • http://go.indeed.com/RZ3rtm7ddtot
    http://go.indeed.com/RZbcaq1a72dd <<<
  • http://go.indeed.com/RZ2sqmo2u6kk
    http://go.indeed.com/RZg3llr4991l << TODO
  • http://go.indeed.com/RZ2sqmo2u6kk
    http://go.indeed.com/RZg3llr4991l
  • There are no keyword ads on this page
  • http://go.indeed.com/RZ6afqh4pci2

    changed to:

    http://go.indeed.com/RZmmebqb3uvj
  • http://winetimeshk.com/admin/wp-content/uploads/2015/08/tongue-map.gif
  • We didn't.
  • Some of the mistakes I'm telling you about were painful and embarrassing for us.
  • Or you think you are
  • … But there’s one big lesson remaining
