
How to Spot a Bear - An Intro to Machine Learning for SEO

54,008 views

Published on

Machine Learning is becoming an increasingly important part of everything Google does, but it can seem quite inaccessible to learn about.

This presentation doesn't try to teach you how to do ML, but focuses instead on showing you the types of problems that ML can address, how Google have used it previously, and how they might use it in the future.

Published in: Internet


  1. HOW TO SPOT A BEAR: A Machine Learning Introduction for SEOs (@TomAnthonySEO, BrightonSEO, April 2015)
  2. Can you define a list of rules for spotting bears?
  3. Let’s start with: 1) Four legs.
  4. Bear!
  5. List of rules (first half), when I asked in the office: 1. Four legs. 2. Breathes. 3. Furry. 4. Long snout.
  6. Bear!
  7. List of rules: 1. Four legs. 2. Breathes. 3. Furry. 4. Long snout. 5. Brown. 6. Not always brown. 7. Mammal. 8. No tail. (How do you spot a mammal?!)
  8. Let’s check our rules…
  9. Rules say: Bear
  10. Rules say: Harmless Furry Thing (less than 4 legs)
  11. Rules say: Odd Grey Creature (no long snout)
  12. Remove ‘long snout’, and rules say: Bear (Extra-terrestrial bear?!)
  13. Our rules suck.
  14. A different bear: Google’s Panda
  15. Can you define a list of rules for spotting spammy pages? Same problem as bears!
  16. NBED GOOD PAGE Good page
  17. NBED GOOD PAGE Commercial page, still good.
  18. Hrm…
  19. Seems legit…
  20. WTF!
  21. Google can’t write rules.
  22. What we can do is identify spammy or non-spammy attributes.
  23. Some Possible Spam Signals: Are there adverts on the page? Are there lots of spelling mistakes? Is there little text content? Are there Calls To Action in ALL CAPS?
  24. Smooth segue to: Machine Learning
  25. List of pages we’ve manually classified. List of attributes that we believe are important to classifying pages.
  26. Example Data
      Site    Adverts on page?  More than 5 spelling mistakes?  Less than 200 words of content?  CTA in ALL CAPS?  Classification
      site A  Y                 Y                               Y                                Y                 Spam Site
      site B  N                 N                               Y                                Y                 Good Site
      site C  Y                 N                               N                                N                 Spam Site
      site D  N                 Y                               N                                Y                 Spam Site
      site E  N                 Y                               N                                N                 Good Site
  27. Neural Networks: A Perceptron. Inputs, Neuron, Output.
  28. Neural Networks: A Perceptron. Inputs: 1, 0, 1, 0. Weights: 0.5, 0.5, 0.5, 0.5. Output: if inputs >= 1, output TRUE.
  29. Inputs: 1, 0, 1, 0. Weights: 0.5, 0.5, 0.5, 0.5. 1 x 0.5 = 0.5; 0 x 0.5 = 0; 1 x 0.5 = 0.5; 0 x 0.5 = 0. Total: 1. If inputs >= 1, output TRUE. Output: TRUE
  30. Inputs: 1, 0, 0, 0. Weights: 0.5, 0.5, 0.5, 0.5. 1 x 0.5 = 0.5; 0 x 0.5 = 0; 0 x 0.5 = 0; 0 x 0.5 = 0. Total: 0.5. If inputs >= 1, output TRUE. Output: FALSE
  31. Inputs: 1, 0, 1, 0. Weights: 0.5, 0.5, 0.4, 0.5. 1 x 0.5 = 0.5; 0 x 0.5 = 0; 1 x 0.4 = 0.4; 0 x 0.5 = 0. Total: 0.9. If inputs >= 1, output TRUE. Output: FALSE
  32. Example Data (the same table as slide 26)
      Site    Adverts on page?  More than 5 spelling mistakes?  Less than 200 words of content?  CTA in ALL CAPS?  Classification
      site A  Y                 Y                               Y                                Y                 Spam Site
      site B  N                 N                               Y                                Y                 Good Site
      site C  Y                 N                               N                                N                 Spam Site
      site D  N                 Y                               N                                Y                 Spam Site
      site E  N                 Y                               N                                N                 Good Site
  33. Untrained Neuron: Is site spam? Inputs: adverts; >5 spelling mistakes; < 200 words content; CTA in ALL CAPS. Weights: 0.5, 0.5, 0.5, 0.5. If inputs >= 1, output TRUE.
  34. Training. Inputs (adverts; >5 spelling mistakes; < 200 words content; CTA in ALL CAPS): 0, 0, 1, 1. Weights: 0.5, 0.5, 0.5, 0.5. If inputs >= 1, output TRUE. Output: SPAM!
  35. Training. Inputs: adverts; >5 spelling mistakes; < 200 words content; CTA in ALL CAPS. Weights: 0.5, 0.5, 0.6, 0.6. If inputs >= 1, output TRUE.
  36. After training: 4/5 sites correct. Is site spam? Inputs: adverts; >5 spelling mistakes; < 200 words content; CTA in ALL CAPS. Weights: 0.2, 0.7, 0.4, 0.5. If inputs >= 1, output TRUE.
  37. ANNs typically have many neurons (source: http://www.teco.edu/~albrecht/neuro/html/node18.html)
  38. Deep Learning
  39. Humans are good at pattern matching
  40. We’re better than machines… (source: Pawan Sinha, http://web.mit.edu/bcs/sinha/papers/sinha_recog_review_NN.pdf)
  41. ML can learn to recognise cats from examples
  42. Deep Learning learns more like us
  43. Ok, so what does this have to do with Google?
  44. Panda: ML based algorithm updates
  45. Old index; Caffeine. Caffeine - Infrastructure Update (we believe this made Panda+Penguin possible)
  46. Hummingbird is to ??? as Caffeine is to Panda+Penguin
  47. Hummingbird: Is it similar to Caffeine? Is it the basis for new natural language algorithms?
  48. Where is Google going next with ML?
  49. Idea: Image Search 2.0
  50. Image Labelling
  51. Image Labelling
  52. Video Labelling
  53. ML Generated Image Descriptions: “Two pizzas sitting on top of a stove top oven”
  54. Idea: Natural Language Faceted Search
  55. ‘show me olympic athletes’, ‘show me the women’
  56. How about: “Find well rated vegetarian cooking books written after 1990”
  57. Idea: Factual Accuracy as a Ranking Factor
  58. Fact Checking: Knowledge Vault
  59. Idea: Bad Facts. Estimating ‘Trustworthiness’. NBED- shot of Google talking about this shit
  60. Idea: Entirely ML Generated Algorithm?
  61. http://dis.tl/ml-algo
  62. Thanks! :) @TomAnthonySEO
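
The perceptron arithmetic on slides 28 to 31 and the "4/5 sites correct" result on slide 36 can be reproduced with a few lines of Python. This is only a minimal sketch, not the code behind the slides: the function and variable names are made up for illustration, Y/N answers are encoded as 1/0, and the weights on slide 36 are assumed to map to the four attributes in the order they are listed.

```python
from typing import Dict, List, Sequence, Tuple

THRESHOLD = 1.0  # slides: "if inputs >= 1, output TRUE"


def perceptron(inputs: Sequence[int], weights: Sequence[float],
               threshold: float = THRESHOLD) -> bool:
    """Multiply each input by its weight, add them up, compare to the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold


# Slides 29-31: the same neuron with different inputs and weights.
slide_29 = perceptron([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])  # total 1.0
slide_30 = perceptron([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])  # total 0.5
slide_31 = perceptron([1, 0, 1, 0], [0.5, 0.5, 0.4, 0.5])  # total 0.9

# Example data from slide 26, with Y encoded as 1 and N as 0 (an assumption).
# Attribute order: adverts, >5 spelling mistakes, <200 words, CTA in ALL CAPS.
# The second element is the manual label: True means "Spam Site".
EXAMPLE_DATA: Dict[str, Tuple[List[int], bool]] = {
    "site A": ([1, 1, 1, 1], True),
    "site B": ([0, 0, 1, 1], False),
    "site C": ([1, 0, 0, 0], True),
    "site D": ([0, 1, 0, 1], True),
    "site E": ([0, 1, 0, 0], False),
}

# Weights shown on slide 36, assumed to follow the attribute order above.
TRAINED_WEIGHTS = [0.2, 0.7, 0.4, 0.5]


def count_correct(weights: Sequence[float]) -> int:
    """Count how many sites the neuron labels the same way as the manual label."""
    return sum(
        perceptron(inputs, weights) == is_spam
        for inputs, is_spam in EXAMPLE_DATA.values()
    )


correct = count_correct(TRAINED_WEIGHTS)
print(slide_29, slide_30, slide_31)
print(f"{correct}/{len(EXAMPLE_DATA)} sites correct")
```

Under these assumptions the only misclassified site is site C: its single "Y" (adverts) carries a weight of 0.2, which is well below the threshold of 1.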
