My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Why are orphan pages bad?
• There may be a lot of them, and they may be competing with your ‘live’ content
• They waste GoogleBot’s crawl budget for your domain
How do you find them? Upload a crawl of your website (from Screaming Frog, DeepCrawl etc.). Your orphan pages are the URLs that return a 200 ✅ status code in the logs… but don’t appear in the crawl of your site.
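If you’d rather script that comparison yourself, here’s a minimal sketch in Python. It assumes you’ve exported the crawl to a plain text file of URL paths and have a combined-format access log; both file names are placeholders, and in practice you’d need to normalise full URLs against log paths.

```python
import re

# Hypothetical file names -- substitute your own exports.
CRAWL_EXPORT = "crawl_urls.txt"  # one URL path per line, exported from your crawler
ACCESS_LOG = "access.log"        # combined-format server log

# Collect every URL that returned a 200 in the logs.
request = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')
log_200s = set()
with open(ACCESS_LOG) as f:
    for line in f:
        m = request.search(line)
        if m and m.group(2) == "200":
            log_200s.add(m.group(1))

# Collect every URL the crawler found by following internal links.
with open(CRAWL_EXPORT) as f:
    crawled = {line.strip() for line in f if line.strip()}

# Candidate orphans: live in the logs, invisible to the crawl.
for url in sorted(log_200s - crawled):
    print(url)
```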
What do you do with an orphan page once you’ve identified it? It depends on the page:
• Redundant content of little value → serve a 404/410 status code
• Relevant and valuable, but out-of-date → 301 redirect to the relevant live page
• Useful content that was orphaned accidentally → re-attach the page to the website
If a page isn’t being crawled, ask:
• Is this URL in the XML sitemap? (see the sketch after this list)
• Is the page too deep within the architecture?
• Is internal linking to this page optimal?
• Are links to this page travelling through multiple redirects?
• Can GoogleBot actually parse the links pointing to this page?
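For the first question, you don’t have to eyeball the sitemap by hand. A rough sketch, assuming a single standard XML sitemap (the URL and page are placeholders; sitemap index files aren’t handled):

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder

def sitemap_urls(sitemap_url):
    """Return the set of <loc> URLs listed in a standard XML sitemap."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    return {loc.text.strip() for loc in tree.iterfind(".//sm:loc", ns)}

urls = sitemap_urls(SITEMAP_URL)
print("https://example.com/some-page/" in urls)  # is the page listed?
```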
I’m going to talk you through three scenarios where log files can help you.
This is a raw server log file. Boring, isn’t it? So what do you do with this?
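Before reaching for a tool, it’s worth knowing what each of those lines actually holds. Here’s a minimal sketch that pulls the useful fields out of one entry, assuming the common Apache/Nginx combined log format (the sample line is made up):

```python
import re

# A made-up line in Apache/Nginx "combined" log format.
line = ('66.249.66.1 - - [14/Nov/2016:09:15:02 +0000] '
        '"GET /blog/some-post/ HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

m = pattern.match(line)
if m:
    # Who asked for what, and what they got back.
    print(m.group("url"), m.group("status"), m.group("agent"))
```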
There are a few options, including tools like Botify and OnCrawl, but one of the most usable, affordable (and idiot-friendly) ones to have come onto the market in the past few years is the Log File Analyser from Screaming Frog.
It’s really easy to use: you can drag and drop your raw log files (or a zip file) directly into the program, and it sorts them into manageable sets of data.
By default the Log File Analyser only analyses search engine bot events, so the ‘Store Bot Events Only (Improves Performance)’ box is ticked. We recommend keeping this setting ticked, as it massively reduces the time and storage required: only search bot events are compiled, rather than all the event data from users and other user agents.
And you end up with a pretty dashboard like this. But doing that alone isn’t going to solve anything, so here are three actionable scenarios where log files can help you do your job…
Let’s start with the first scenario, orphan pages. What is an orphan page?
Some websites stop linking to old, expired content but don’t serve the right status code (like a 404, or a redirect to a newer version). The expired page is thus still available, just no longer linked to.
What do you do with orphan pages when you identify them? That’s the triage covered above: 404/410 the redundant ones, 301 the out-of-date ones, and re-attach the accidental orphans.
The second scenario is crawl budget waste: look for large quantities of parameter-driven pages, and combinations of parameters. These will often be areas where GoogleBot is losing time and wasting resources.
One common example of this is on WordPress blogs. You’ll often find things like this in your log files.
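To surface these, you can aggregate the query-parameter combinations GoogleBot is requesting. A rough sketch, again assuming a combined-format log (the file name is a placeholder; WordPress’s ?replytocom= links are a classic culprit):

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

ACCESS_LOG = "access.log"  # placeholder
request = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/')

combos = Counter()
with open(ACCESS_LOG) as f:
    for line in f:
        if "Googlebot" not in line:  # crude user-agent filter
            continue
        m = request.search(line)
        if not m:
            continue
        query = urlsplit(m.group(1)).query
        if query:
            # Count each distinct combination of parameter names,
            # e.g. ('replytocom',) or ('colour', 'size', 'sort').
            combos[tuple(sorted(parse_qs(query)))] += 1

# The biggest offenders, i.e. where crawl budget is leaking.
for params, hits in combos.most_common(20):
    print(hits, "&".join(params))
```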
The third scenario: pages that aren’t being crawled. If you see category pages or main service pages at the top of your least-crawled URLs list, further investigation is needed.
Investigate why these pages haven’t been visited by search engines; review each bot event for these URLs.
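Here’s a quick sketch for that review, pulling every GoogleBot event for a single URL out of the raw logs. The file name and target URL are placeholders, and since user-agent strings can be spoofed, you’d verify genuine GoogleBot hits via reverse DNS before drawing conclusions:

```python
import re

ACCESS_LOG = "access.log"      # placeholder
TARGET = "/category/widgets/"  # the page under investigation

entry = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

with open(ACCESS_LOG) as f:
    for line in f:
        m = entry.search(line)
        if m and m.group("url") == TARGET and "Googlebot" in m.group("agent"):
            # When GoogleBot came by, and what it was served.
            print(m.group("time"), m.group("status"))
```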
Oliver Mason put this eloquently in his recent talk at the BrightonSEO conference.
That’s just an overview of a few things you can do with log files. Once you start playing around and analysing the data, it’s really rather interesting.