
BrightonSEO, September 2016: 5 Critical Questions Your Log Files Can Answer

Combining Web Crawler Data with Server Logs to highlight Crawl Budget opportunities. Get Google crawling and indexing more of your pages in Organic Search Results!



  1. LOG FILE ANALYSIS: 5 CRITICAL TECH SEO QUESTIONS YOUR LOGS CAN ANSWER #BrightonSEO | @SearchMATH
  2. About Botify: as used by… #BrightonSEO
  3. Here’s the problem… > Google doesn’t crawl every page of your website >> If a page isn’t crawled, it won’t be indexed >>> If a page isn’t indexed, it won’t make you money
  4. New Initiative Planning Process: Identify Desired Outcomes and Objectives > Information Gathering > Action Planning > Implementation and Review. This presentation will focus on the “Information Gathering” stage of the process.
  5. New Initiative Planning Process: Identify Desired Outcomes and Objectives > Information Gathering > Action Planning > Implementation and Review. Dawn Anderson’s slide deck “BRINGING IN THE FAMILY DURING CRAWLING” is an insightful guide to help you identify crawl budget opportunities. Dawn also suggests powerful actions you should explore.
  6. Log File 101
  7. Hypertext Transfer Protocol (HTTP). Fig 1: HTTP Client/Server Communication. This is a standard HTTP/1.1 exchange between a client (e.g. a browser or Googlebot) and your server. HTTP Request: GET /index.html HTTP/1.1 | Host: www.exampleshop.com | User-Agent: Mozilla 5.0. HTTP Response: HTTP/1.1 200 OK | Date: Mon, 11 Jul 2016 08:06:45 GMT | Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) | Last-Modified: Wed, 04 Feb 2016 23:11:55 GMT | Etag: "3f84f-1b9-3elcd16b" | Accept-Ranges: bytes | Content-Length: 458 | Connection: close | Content-Type: text/html; charset=UTF-8
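If you want to see this exchange for yourself, the standard library is enough. A minimal sketch in Python, using the slide’s placeholder host www.exampleshop.com (swap in a site you own before running):

```python
# A minimal sketch of the HTTP/1.1 exchange in Fig 1, using only the
# standard library. www.exampleshop.com is the slide's placeholder host.
import http.client

conn = http.client.HTTPConnection("www.exampleshop.com", 80, timeout=10)
conn.request("GET", "/index.html", headers={"User-Agent": "Mozilla/5.0"})
resp = conn.getresponse()

print(resp.status, resp.reason)          # e.g. "200 OK"
for name, value in resp.getheaders():    # Date, Server, Content-Type, ...
    print(f"{name}: {value}")
conn.close()
```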
  8. Server Log Files. Fig 2: Example Server Output. 188.65.114.122 - - [19/Jul/2016:08:07:05 -0400] "GET /women/shoes/converse14579/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". The fields: client (remote) IP | timestamp (date & time) | method (GET/POST) | request URI | HTTP status code | user-agent. In other words: who’s requesting, when, how, what file, and what the server responded. Server logs are the SINGLE SOURCE OF TRUTH when it comes to seeing how search engines, such as Googlebot, assess your website. Your web server keeps a record of every hit it receives during exchanges like the one on the previous slide. Your very own data treasure chest.
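Those fields are regular enough to pull apart with a regex. A minimal sketch, assuming log lines shaped exactly like Fig 2; note that a robust Googlebot check verifies the requesting IP via reverse DNS, not just the user-agent string:

```python
# A minimal sketch: parse one log line shaped like Fig 2 with a regex,
# then keep only hits whose user-agent claims to be Googlebot.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('188.65.114.122 - - [19/Jul/2016:08:07:05 -0400] '
        '"GET /women/shoes/converse14579/ HTTP/1.1" 200 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

match = LOG_PATTERN.match(line)
if match and "Googlebot" in match.group("user_agent"):
    print(match.group("uri"), match.group("status"))  # /women/shoes/converse14579/ 200
```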
  9. “[Clean up your architecture because] we get lost crawling unnecessary URLs and we might not be able to crawl and index your new and updated content as quickly as we would otherwise… There are a number of crawlers you can use to crawl your website on your own, to run across your website.” Google Webmaster Central office-hours hangout, 16 Oct 2015, @JohnMu. Two takeaways: crawl your website with a THIRD-PARTY CRAWLER, and conduct LOG FILE ANALYSIS.
  10. How does Log File Analysis differ from Web Crawl Analysis?
  11. Web Crawl: systematically fetch, retrieve, and validate the HTML on every page of your website (Home > Category > Subcategory > Detail) to simulate Googlebot’s/Bingbot’s analysis of your pages. Let’s consider how the information is collected… This is great for optimising your HTML code and helps you try to produce a best-in-class website.
  12. But that’s not how search engines operate, and crawling alone lacks the evidence to back up your strategy. For example, Googlebot might enter through a popular category and crawl the same pages time after time. Search Console won’t tell you this, and neither does simulating a crawl from your homepage. So, you need to crawl your architecture and compare the data to Google’s activity (via your log files) to gain an insight into how you’ll get more of your money-making pages crawled and indexed.
  13. What barriers do people face when trying to study this vital information? • Access to server logs • File sizes • Misplacing trust in Search Console • Time required to process the data. But I don’t think you should be deterred, and here’s why…
  14. Accessing your logs is simpler than you think; your organisation is probably already using them. Common log-analysis use cases for eCommerce organisations include: >> Application Management >> Access Management >> Network Forensics >> Compliance. Popular products used by application and security teams at major enterprise companies include LogRhythm, Loggly, and Splunk.
  15. Splunk (a log file storage and processing company): Market Cap $8.6bn, 11,000 Customers
  16. http://www.slideshare.net/Splunk/splunklive-london-john-lewis This is a picture from a presentation I watched at SplunkLive London in 2016, where John Lewis showed how they visualise their operational intelligence from log files. You can get your logs!
  17. It’s true that the volume of data involved can make working with the files prohibitive. For example, if a site receives 50,000 visitors a day browsing an average of 5 pages per session, that’s 250,000 log entries per day for the HTML alone, or 7.5M entries per month. Now add 10 assets requested from the server for each page view: another 75,000,000 lines in your log files per month.
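The back-of-envelope arithmetic, spelled out:

```python
# Reproducing the slide's back-of-envelope maths on monthly log volume.
visitors_per_day = 50_000
pages_per_session = 5
assets_per_page = 10
days_per_month = 30

html_hits_per_day = visitors_per_day * pages_per_session      # 250,000
html_hits_per_month = html_hits_per_day * days_per_month      # 7,500,000
asset_hits_per_month = html_hits_per_month * assets_per_page  # 75,000,000

print(f"{html_hits_per_month + asset_hits_per_month:,} total lines/month")  # 82,500,000
```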
  18. SEOs regularly monitor and trend site architecture data (HTTP codes, etc.) in third-party apps, but Search Console’s crawling and indexing charts can’t be scrutinised in the same way, even though they really deserve that scrutiny.
  19. So, how is engineering helping us overcome these barriers and expand our knowledge? >>> Secure File Transfer Protocol (SFTP) >>> Storing and trending log data thanks to cloud services >>> Processing automation (saving TIME) >>> Diffing log data with simulated crawl data for greater insights (a sketch of this follows).
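To make the diffing idea concrete, here is a minimal sketch. It assumes you have already exported two one-URL-per-line text files, one from your crawler and one from your Googlebot-filtered logs; both file names are placeholders:

```python
# A minimal sketch of diffing a simulated crawl against Googlebot log hits.
# crawl_urls.txt and googlebot_urls.txt are hypothetical exports: one URL per line.

def load_urls(path: str) -> set[str]:
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

crawled = load_urls("crawl_urls.txt")            # every URL in your site structure
hit_by_google = load_urls("googlebot_urls.txt")  # URLs Googlebot requested (from logs)

never_crawled = crawled - hit_by_google   # structure pages Google ignores
orphans = hit_by_google - crawled         # URLs Google hits that your crawl never found

print(f"{len(never_crawled):,} structure URLs not crawled by Google")
print(f"{len(orphans):,} orphan URLs crawled by Google but outside your structure")
```

The first set is your indexation opportunity; the second is often where crawl budget is being wasted.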
  20. Let’s move on to the questions I think you should be looking to answer.
  21. What are the typical questions SEOs try to answer with log analysis? • Where do I have accessibility errors? • Which pages are being spidered most frequently? • Is spammer activity proving detrimental to performance? • Which pages haven’t been crawled by search engines? These are all very valid and helpful, but I suggest looking at the next list too…
  22. The 5 Critical Questions / KPIs to score: 1. What is my ‘Crawl Ratio’? 2. What percentage of my compliant pages (2xx & unique) will Google crawl each month? 3. How deep will Google crawl into my site architecture? 4. What does Google consider to be my Top, Middle and Long Tail pages? 5. What is my ‘Crawl Window’ score? INDEX: how many more pages now have the potential to make us money, thanks to my efforts over the past 30 days?
  23. I’ve mentioned a few terms you might not be familiar with, so here’s a glossary of old friends with a couple of new additions. Crawl Rate: requests per second Googlebot makes to your site when it is crawling it. Crawl Budget: the maximum number of pages that Google crawls on a website. Crawl Frequency: the program determining which sites to crawl, how often, and how many pages to fetch from each site. Crawl Rank: the frequency a page is crawled compared with the ranking position of that page. Crawl Space: the totality of possible URLs for a website. Crawl Ratio: the percentage of my website structure Google is crawling every 30 days. Crawl Window: the percentage of the compliant (unique & 200) pages on my website Google usually crawls in a 14-day period.
  24. Critical Question 1 – What is my ‘Crawl Ratio’?
  25. Crawl Ratio: the percentage of my website structure Google is crawling every 30 days. Crawl Ratio = (Total pages in the website structure crawled by Google in 30 days ÷ Total pages in the website structure) × 100.
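As a sketch, with stand-in URL sets in place of a real 30-day Googlebot extract and structure crawl:

```python
# Crawl Ratio, per the slide's formula. The two sets are stand-ins for a
# full structure crawl and a 30-day Googlebot URL extract from the logs.
structure_urls = {"/", "/women/", "/women/shoes/", "/women/shoes/converse14579/"}
google_hits = {"/", "/women/", "/women/shoes/converse14579/"}

crawl_ratio = 100 * len(structure_urls & google_hits) / len(structure_urls)
print(f"Crawl Ratio: {crawl_ratio:.1f}%")  # 75.0%
```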
  26. Organic Growth Opportunities: Lifestyle Publisher | Business Equipment Retail | Real Estate Classified. The Venn diagram clearly illustrates the mismatch between the URLs you hope Google is looking at and the accurate picture from your server logs.
  27. Critical Question 2 – What percentage of my compliant pages (200 & unique) will Google crawl each month?
  28. % of compliant pages crawled = (Total compliant pages crawled by Google in 30 days ÷ Total compliant pages in the website structure) × 100. This is your % potential.
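A sketch of the same calculation, assuming a hypothetical crawl export crawl.csv with url, status and canonical_url columns (your crawler’s actual column names will differ):

```python
# A sketch of Critical Question 2: filter the crawl to compliant pages
# (HTTP 200 and unique, i.e. not canonicalised elsewhere), then measure
# what share of them Googlebot hit. crawl.csv is a hypothetical export.
import csv

def load_urls(path: str) -> set[str]:
    # Same helper as in the diffing sketch: one URL per line.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

compliant = set()
with open("crawl.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        is_200 = row["status"] == "200"
        is_unique = row["canonical_url"] in ("", row["url"])  # self- or un-canonicalised
        if is_200 and is_unique:
            compliant.add(row["url"])

google_hits = load_urls("googlebot_urls.txt")  # Googlebot URLs from the logs
pct = 100 * len(compliant & google_hits) / len(compliant)
print(f"{pct:.1f}% of compliant pages crawled by Google in 30 days")
```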
  29. Organic Growth Measure: Lifestyle Publisher 76.4% | Business Equipment Retail 92.2% | Real Estate Classified 42%. These examples reflect just how varied Google’s crawling of compliant pages can be.
  30. Critical Question 3 – How deep will Google crawl into my website architecture?
  31. What depths will Google plunge to? Lifestyle Publisher | Business Equipment Retail | Real Estate Classified. This chart indicates the correlation between the depth of your content and Google’s crawling activity.
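One way to build that chart yourself: group URLs by click depth from your crawl data and measure the share Googlebot hit at each level. A minimal sketch with stand-in data:

```python
# Crawl activity by depth: the share of URLs at each click depth that
# Googlebot hit. `pages` pairs each URL with its depth, as a hypothetical
# crawl export would; `google_hits` is the log-derived URL set.
from collections import defaultdict

pages = [("/", 0), ("/women/", 1), ("/women/shoes/", 2),
         ("/women/shoes/converse14579/", 3), ("/women/boots/", 2)]
google_hits = {"/", "/women/", "/women/shoes/converse14579/"}

totals, hit = defaultdict(int), defaultdict(int)
for url, depth in pages:
    totals[depth] += 1
    if url in google_hits:
        hit[depth] += 1

for depth in sorted(totals):
    print(f"depth {depth}: {100 * hit[depth] / totals[depth]:.0f}% crawled")
```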
  32. How can I more effectively use Pagerank to increase visibility? Lifestyle Publisher | Business Equipment Retail | Real Estate Classified. This chart indicates Google’s crawl rate (URL crawled or not by any bot) by Internal Pagerank.
  33. Critical Question 4 – What does Google consider to be my Top, Middle and Long Tail pages?
  34. This graph details visit frequency from Google search result pages for all URLs analysed by the crawler, i.e. how often URLs get organic visits from Google. Lifestyle Publisher | Business Equipment Retail | Real Estate Classified.
  35. Then compare organic traffic with a measure of how often URLs are crawled by any Google bot, and look to increase your Middle Tail. Lifestyle Publisher | Business Equipment Retail | Real Estate Classified.
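A sketch of one way to bucket pages into Top, Middle and Long Tail from organic visit counts; the thresholds here are arbitrary illustrations, not Botify’s definitions:

```python
# Top / Middle / Long Tail bucketing by organic visit counts. The counts
# would come from log entries whose referrer is a Google SERP; the
# thresholds are illustrative only.
visits = {"/women/": 940, "/women/shoes/": 210, "/women/shoes/converse14579/": 3}

def tail_bucket(n_visits: int) -> str:
    if n_visits >= 500:
        return "Top"
    if n_visits >= 10:
        return "Middle"
    return "Long Tail"

for url, n in sorted(visits.items(), key=lambda kv: -kv[1]):
    print(f"{tail_bucket(n):9} {n:5} visits  {url}")
```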
  36. Critical Question 5 – What is my ‘Crawl Window’?
  37. Crawl Window: the percentage of my compliant URLs Google usually crawls in a 14-day period.* When a change appears on the website, whether voluntary or involuntary, understanding your Crawl Window value will help you know precisely how long it will take to identify a positive/negative impact. Real Estate Classified: 25.5% | Business Equipment Retail: 80.8% | Lifestyle Publisher: 66.3%. *This is a simplified calculation of Botify’s Crawl Window metric.
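Following the slide’s own caveat that this is a simplified version of Botify’s metric, here is a sketch: the share of compliant URLs Googlebot hit within one 14-day window, given per-URL hit dates parsed from your logs (the data below is stand-in):

```python
# A simplified Crawl Window sketch, per the slide's footnote. `hits` maps
# each URL to its Googlebot request dates, parsed from log timestamps.
from datetime import date, timedelta

compliant = {"/a", "/b", "/c", "/d"}
hits = {"/a": [date(2016, 7, 2), date(2016, 7, 9)], "/b": [date(2016, 7, 12)]}

window_start = date(2016, 7, 1)
window_end = window_start + timedelta(days=14)

crawled_in_window = {u for u in compliant
                     if any(window_start <= d < window_end for d in hits.get(u, []))}
print(f"Crawl Window: {100 * len(crawled_in_window) / len(compliant):.1f}%")  # 50.0%
```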
  38. Recap of the 5 Critical Questions / KPIs to score: 1. What is my ‘Crawl Ratio’? 2. What percentage of my compliant pages (2xx & unique) will Google crawl each month? 3. How deep will Google crawl into my site architecture? 4. What does Google consider to be my Top, Middle and Long Tail pages? 5. What is my ‘Crawl Window’ score? INDEX: how many more pages now have the potential to make us money, thanks to my efforts over the past 30 days? You might find this checklist helpful.
  39. THANK YOU! Take a free trial via www.botify.com #BrightonSEO | @SearchMATH
