
Gaps in the algorithm



  1. 1. Gaps in the algorithm What machine learning can teach us about the limits of our knowledge SAScon 2017 Will Critchlow - @willcritchlow
  2. 2. The rise of ML has taken an already-complex system and made it incomprehensible
  3. 3. We might believe we know what works. But experiments show that’s not really true
  4. 4. Computers might already be better than us. By exploring their limits, we learn more about our own, and about the underlying algorithm
  5. 5. This is the sequel to a talk I’ve given a couple of times in the US... and once in Leeds. If you didn’t see those, you can catch up here:
  6. 6. See the full video of my San Diego talk in DistilledU
  7. 7. If you did see one of them, have a nap for a few minutes. Or check your email.
  8. 8. The “classical” algorithm is full of tweaks: information retrieval + PageRank + original research + TWEAKS
  9. 9. When Amit left, this thread was fascinating. Particularly this comment from a user called Kevin Lacker (@lacker):
  10. 10. The algorithm became far too complex to approximate in your head: high-dimensional, non-linear, discontinuous.
  11. 11. It’s not even easy in two dimensions (authority vs. relevance):
  12. 12. Imagine choosing between a more-relevant page with less authority…
  13. 13. ...and a less-relevant page with more authority.
  14. 14. It’s only getting worse under Sundar Pichai
  15. 15. Aided by the new head of search John Giannandrea and ML experts like Jeff Dean
  16. 16. If you haven’t already seen it, you should read the story of how Jeff Dean & three engineers took just a month to beat a decade’s worth of work by hundreds of engineers by attacking Translate with ML.
  17. 17. Audiences generally still think they’re pretty good at this. You’re probably thinking something similar to yourself right now.
  18. 18. I’ve now run an in-person experiment a few times.
  19. 19. I show two pages that rank for a particular search along with various metrics for each page.
  20. 20. Then I ask the audience to stand up and predict which page ranks better for a given query.
  21. 21. I get people to sit down as they get them wrong. By the time we’ve done 2 or 3, almost everyone is sitting.
  22. 22. Wake up
  23. 23. Behind this chart is a lot of story...
  24. 24. It starts with a train.
  25. 25. This is the Thameslink. I commute into London on it. It’s also where I allow myself to write code.
  26. 26. It all started because I wanted to learn ML
  27. 27. I quickly found working in Keras was easier: keras.io
  28. 28. In order to work on a problem area I knew well, I decided to build a system to predict rankings:
  29. 29. The question we really want to answer is: “How good is this page for this query?”
  30. 30. We want to train our model on Google data
  31. 31. But we don’t actually know how close together these different results are.
  32. 32. And we certainly don’t know if position #3 is the same relevance to this query as #3 is to a totally different query.
  33. 33. So I decided to train on the problem “does page A outrank page B for query X”? I.e. is it A then B, or B then A?
  34. 34. We have tons more data to train this model on - every pair of URLs for every query we look at.
  35. 35. And it’s ultimately equivalent to “how do we improve page A?”
  36. 36. In mathematical terms, we express each page as a set of features: {‘DA’: ‘67’, ‘lrd’: ‘254’, ‘tld’: ‘1’, ‘h1_tgtg’: ‘0.478’, ‘links_on_page’: ‘200’ ....} Combine the two sets of features into one big vector. Label it as (1,0) if A outranks B and (0,1) if B outranks A.
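The feature-vector construction described on this slide can be sketched in a few lines of Python. The metric names and values are the illustrative ones from the slide; a real pipeline would carry many more features per page:

```python
# Sketch of the pairwise training setup: concatenate both pages' metrics
# into one vector, with a one-hot label for which page outranks the other.
FEATURES = ["DA", "lrd", "tld", "h1_tgtg", "links_on_page"]

def make_example(page_a, page_b, a_outranks_b):
    """Return (feature vector, label): A's metrics then B's metrics,
    labelled (1, 0) if A outranks B and (0, 1) if B outranks A."""
    x = [float(page_a[f]) for f in FEATURES] + [float(page_b[f]) for f in FEATURES]
    y = (1, 0) if a_outranks_b else (0, 1)
    return x, y

page_a = {"DA": "67", "lrd": "254", "tld": "1", "h1_tgtg": "0.478", "links_on_page": "200"}
page_b = {"DA": "40", "lrd": "80", "tld": "1", "h1_tgtg": "0.2", "links_on_page": "95"}  # invented
x, y = make_example(page_a, page_b, a_outranks_b=True)
```

Because every ordered pair of URLs for every query yields one example, this framing multiplies the available training data compared with predicting absolute positions.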
  37. 37. Note: we’re doing no spam detection. We’re working only with Google’s top 10.
  38. 38. To run the model, we input a pair of pages with their associated metrics.
  39. 39. The new input goes into the model.
  40. 40. We get back a probability of page A outranking page B.
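A minimal sketch of the shape of that prediction step. The talk trains a Keras network; here a hand-weighted logistic scorer stands in for the trained model purely to show the input/output contract (all weights and metric values are invented):

```python
import math

# Stand-in for the trained ranker: a logistic score over page A's metrics
# followed by page B's. B's weights are the negation of A's, so the score
# measures A relative to B.
WEIGHTS_A = [0.03, 0.002, 0.1, 0.5, -0.001]   # invented weights, one per metric
WEIGHTS_B = [-w for w in WEIGHTS_A]            # B's metrics count against A

def prob_a_outranks_b(features_a, features_b):
    """Return P(page A outranks page B) as a probability in (0, 1)."""
    score = sum(w * v for w, v in zip(WEIGHTS_A + WEIGHTS_B, features_a + features_b))
    return 1 / (1 + math.exp(-score))

# Metrics in the order DA, lrd, tld, h1_tgtg, links_on_page (values invented).
p = prob_a_outranks_b([67, 254, 1, 0.478, 200], [40, 80, 1, 0.2, 95])
```

With a model like this in hand, you can also re-score a tweaked copy of page A against the original to estimate whether a change would help - which is exactly the "holy grail" use described a few slides on.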
  41. 41. Why? What are we doing here?
  42. 42. If we could do this perfectly, then we could tweak the values of our page (call that A′) and compare A to A′. We’d get to simulate changes to see impacts without making them. This is the holy grail.
  43. 43. And when we get close the gaps will tell us where the unknowns in the algorithm lie
  44. 44. There’s a lot of dead-ends before we get anywhere near that though Let’s go stumbling through the trees
  45. 45. The first thing to realise is that data pipelines are hard. Really hard. There’s a reason that most of Google’s Rules of ML are about data. Here’s what we did:
  46. 46. Raw rankings data
  47. 47. Raw rankings data Pull in API data
  49. 49. Raw rankings data Pull in API data Crawl the page
  50. 50. Raw rankings data Pull in API data Crawl the page Process on-page data
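The four pipeline stages above could be wired together like this. Every function body is a hypothetical stand-in: a real pipeline would call a rank tracker, link-index APIs, and a crawler at the corresponding steps:

```python
# Illustrative pipeline: raw rankings -> API data -> crawl -> on-page metrics.
def pull_api_data(row):
    row["DA"] = 50  # a link-index API lookup would go here (value invented)
    return row

def crawl_page(row):
    row["html"] = "<html>...</html>"  # fetching the live page would go here
    return row

def process_on_page(row):
    # Derive on-page metrics from the crawled HTML.
    row["links_on_page"] = row["html"].count("<a ")
    return row

def run_pipeline(raw_rankings):
    for row in raw_rankings:
        yield process_on_page(crawl_page(pull_api_data(row)))

rows = list(run_pipeline([{"query": "houses for sale", "url": "https://example.com", "rank": 3}]))
```

Each stage can fail or silently emit bad values, which is why inspecting the resulting dataset (as the next slides do) matters so much.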
  51. 51. Google just released a useful tool for exploring and checking your data
  52. 52. This is what it looks like on our data (Running on their web version)
  53. 53. So I took this big dataset, restricted it to property keywords, and gave it a shot I have an ongoing argument with @tomanthonySEO about how much the keyword grouping matters...
  54. 54. OVER 90% accuracy Now hold on a second. That sounds implausible.
  55. 55. I was accidentally telling it the answer. I had included the rank in the features. Remember how I said that data pipelines are hard?
  56. 56. So I fixed that problem and re-ran it
  57. 57. OVER 80% accuracy Now hold on a second. That still sounds implausible.
  58. 58. One of the problems with deep learning is that the models are far from human understanding. There is not really any concept of “explain how you got this answer”.
  59. 59. So I tried a much simpler model on the same data A “decision tree classifier” from scikit-learn
  60. 60. You read these decision trees like flowcharts The first # refers to the two URLs in the comparison
  61. 61. The name refers to the feature in question
  62. 62. ...and the inequality should be self-explanatory
  63. 63. Then at the “leaf” node, you select the category that got more of the samples (the 2nd in this case - which means that B outranks A)
  64. 64. So you might end up taking a path like this:
  65. 65. ALSO OVER 80% accuracy This is getting silly.
  66. 66. I eventually figured out what was going on. There are a small number of domains that rank well for essentially every property-related search in the UK. My model was just learning: domain A > domain B > domain C
  67. 67. The model was essentially just identifying URLs Zoopla vs. findaproperty Rightmove vs. primelocation etc
  68. 68. So we started splitting the data better so that it never saw the same domains that it was trained on
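That domain-aware split might look something like this: hold out whole domains, so any pair touching a held-out domain goes to the test set and is never seen in training. The example pairs and domains are illustrative; the real keyword sets are far larger:

```python
import random

def split_by_domain(examples, test_fraction=0.25, seed=0):
    """Split pairwise examples so held-out domains never appear in training."""
    domains = sorted({d for ex in examples for d in ex["domains"]})
    rng = random.Random(seed)
    rng.shuffle(domains)
    test_domains = set(domains[: max(1, int(len(domains) * test_fraction))])
    train, test = [], []
    for ex in examples:
        # Any overlap with a held-out domain sends the pair to the test set.
        (test if set(ex["domains"]) & test_domains else train).append(ex)
    return train, test

examples = [  # invented pairs from the property vertical mentioned above
    {"domains": ["zoopla.co.uk", "rightmove.co.uk"], "label": 1},
    {"domains": ["primelocation.com", "zoopla.co.uk"], "label": 0},
    {"domains": ["findaproperty.com", "onthemarket.com"], "label": 1},
]
train, test = split_by_domain(examples)
```

With a split like this the model can no longer score well just by learning which domains it has seen ranking highly, which is what inflated the earlier accuracy numbers.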
  69. 69. Our current state-of-the-art is 65-66% accuracy on large diverse keyword sets. Decision trees are nowhere near as good on this data. We are still only using fairly naive on-page metrics.
  70. 70. Known factors Unknown factors The better our model gets, the more we can constrain how much of an impact other things must be having - advanced on-page ML, usage data etc
  71. 71. Known factors Unknown factors The better our model gets, the more we can constrain how much of an impact other things must be having - advanced on-page ML, usage data etc We expect to see progress from more advanced on-page analysis - we have a theory that link signals get you into the consideration set, but increasingly don’t reorder it:
  72. 72. See Tom Capper’s SearchLove San Diego talk in DistilledU
  73. 73. That was all very complicated. In practice, we are running real-world split-tests. This is a difficult thing to do, so we’ve built a platform to help:
  74. 74. In keeping with the theme of this presentation, I want to share some scary results It turns out that you are probably recommending a ton of changes that are making no difference, or even making things worse...
  75. 75. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  76. 76. Established wisdom and correlation studies would suggest ALT attributes on images might be good for SEO
  77. 77. Result: null test. No measurable change in performance.
  78. 78. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  79. 79. Surprisingly often, also a null test result
  80. 80. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  81. 81. What happens when you match title tags to the greatest search volume? Title tag before: “Which TV should I buy? - Argos” Title tag after: “Which TV to buy? - Argos”
  82. 82. Organic sessions decreased by an average of 8%
  83. 83. 1. Adding ALT attributes 2. Adding structured data 3. Setting exact match title tags 4. Writing more emotive meta copy
  84. 84. What happens when you try to write more engaging titles & meta?
  85. 85. What happens when you try to write more engaging titles & meta? Maybe not quite this engaging
  86. 86. Still nope.
  87. 87. Don’t worry. We’ve also had some great results.
  88. 88. Some that we have talked about before
  89. 89. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  90. 90. Category pages have lots of images and not much text
  91. 91. Adding structured data to category pages
  92. 92. Organic sessions increased by 11%
  93. 93. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  94. 94. We can render Javascript!
  95. 95. What happens if your content is only visible with JavaScript? (Comparing the page with JavaScript disabled vs. JavaScript enabled)
  96. 96. Making it visible increased organic sessions by ~ 6.2%
  97. 97. Read more on our blog: early results from split-testing JS for SEO
  98. 98. 1. Adding structured data 2. Using JS to show content 3. Removing SEO category text
  99. 99. How does SEO text on category pages perform?
  100. 100. E-commerce site number 1 ~ 3.1% increase in organic sessions
  101. 101. E-commerce site number 2 - No effect/negative effect
  102. 102. And a bunch that we haven’t written up yet: Including: ● Replacing en-gb words & spellings with en-us on British company’s US site ○ Status: statistically significant positive uplift ● Fresh content: more recent update dates across large long-tail set of pages ○ Status: statistically significant positive uplift ● Change on-page targeting to higher volume query structure ○ Status: statistically significant positive uplift
  103. 103. All of this is why we have been investing so much in split-testing Check out www.distilledodn.com if you haven’t already. We will be happy to demo for you. We’re now serving well over a billion requests / month, and recently published information covering everything from response times to our +£100k / month split test.
  104. 104. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm
  105. 105. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm 2. Machine learning will make this worse, and is accelerating under Sundar
  106. 106. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm 2. Machine learning will make this worse, and is accelerating under Sundar 3. By applying our own machine learning, we can model the algorithm and find the gaps in our understanding
  107. 107. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm 2. Machine learning will make this worse, and is accelerating under Sundar 3. By applying our own machine learning, we can model the algorithm and find the gaps in our understanding 4. We can apply what we learn by split-testing on our own sites:
  108. 108. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm 2. Machine learning will make this worse, and is accelerating under Sundar 3. By applying our own machine learning, we can model the algorithm and find the gaps in our understanding 4. We can apply what we learn by split-testing on our own sites: a. It is very likely that if you are not split-testing, you are recommending changes that have no effect
  109. 109. Let’s recap 1. Even in a world of 200+ “classical” ranking factors, humans were bad at understanding the algorithm 2. Machine learning will make this worse, and is accelerating under Sundar 3. By applying our own machine learning, we can model the algorithm and find the gaps in our understanding 4. We can apply what we learn by split-testing on our own sites: a. It is very likely that if you are not split-testing, you are recommending changes that have no effect b. And (obviously worse) you are very likely recommending changes that damage your visibility
  110. 110. Questions: @willcritchlow
  111. 111. ● Sundar Pichai ● Go ● Jeff Dean ● Train ● Wake up ● Statue of Liberty ● Sleeping cat ● Complexity ● Holy Grail ● Wilderness ● Pipeline ● Houses Image credits ● Head in hands ● Rope bridge ● Spider ● Cheating ● Celebration ● Split rock ● Science ● Jolly Roger ● Thumbs up ● Spam
