My presentation from Optimise Oxford in November 2016.
In it I discuss why you should be making use of server logs, and how to go about utilising them.
Why are orphan pages bad?
• There may be a lot of them, and they may be competing with your ‘live’ content
• They waste GoogleBot’s crawl budget for your domain
How do you find them? Upload a crawl of your website (from Screaming Frog, DeepCrawl etc.). Your orphan pages are the URLs that return a 200 ✅ status code in the logs… but don’t appear in the crawl of your site.
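If you’d rather script that comparison yourself, here’s a minimal sketch in Python. It assumes you’ve exported the crawl to a plain text file of URL paths and have a combined-format access log; both file names are placeholders, and in practice you’d need to normalise full URLs against log paths.

```python
import re

# Hypothetical file names -- substitute your own exports.
CRAWL_EXPORT = "crawl_urls.txt"  # one URL path per line, exported from your crawler
ACCESS_LOG = "access.log"        # combined-format server log

# Collect every URL that returned a 200 in the logs.
request = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')
log_200s = set()
with open(ACCESS_LOG) as f:
    for line in f:
        m = request.search(line)
        if m and m.group(2) == "200":
            log_200s.add(m.group(1))

# Collect every URL the crawler found by following internal links.
with open(CRAWL_EXPORT) as f:
    crawled = {line.strip() for line in f if line.strip()}

# Candidate orphans: live in the logs, invisible to the crawl.
for url in sorted(log_200s - crawled):
    print(url)
```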
What do you do with an orphan page once you’ve identified it? It depends on the page:
• Redundant content of little value → serve a 404/410 status code
• Relevant and valuable, but out-of-date → 301 redirect to the relevant live page
• Useful content that was orphaned accidentally → re-attach the page to the website
If a page isn’t being crawled, ask:
• Is this URL in the XML sitemap? (see the sketch after this list)
• Is the page too deep within the architecture?
• Is internal linking to this page optimal?
• Are links to this page travelling through multiple redirects?
• Can GoogleBot actually parse the links pointing to this page?
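For the first question, you don’t have to eyeball the sitemap by hand. A rough sketch, assuming a single standard XML sitemap (the URL and page are placeholders; sitemap index files aren’t handled):

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder

def sitemap_urls(sitemap_url):
    """Return the set of <loc> URLs listed in a standard XML sitemap."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    return {loc.text.strip() for loc in tree.iterfind(".//sm:loc", ns)}

urls = sitemap_urls(SITEMAP_URL)
print("https://example.com/some-page/" in urls)  # is the page listed?
```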
I’m going to talk you through three scenarios where log files can help you.
This is a raw server log file. Boring, isn’t it? So what do you do with this?
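Before reaching for a tool, it’s worth knowing what each of those lines actually holds. Here’s a minimal sketch that pulls the useful fields out of one entry, assuming the common Apache/Nginx combined log format (the sample line is made up):

```python
import re

# A made-up line in Apache/Nginx "combined" log format.
line = ('66.249.66.1 - - [14/Nov/2016:09:15:02 +0000] '
        '"GET /blog/some-post/ HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

m = pattern.match(line)
if m:
    # Who asked for what, and what they got back.
    print(m.group("url"), m.group("status"), m.group("agent"))
```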
There are a few options, including tools like Botify and OnCrawl, but one of the most usable, affordable (and idiot-friendly) ones to have come onto the market in the past few years is the Log File Analyser from Screaming Frog.
It’s really easy to use: you can drag and drop your raw log files (or a zip file) directly into the program, and it sorts them into manageable sets of data.
By default the Log File Analyser only analyses search engine bot events, so the ‘Store Bot Events Only (Improves Performance)’ box is ticked. We recommend keeping this setting ticked, as it massively reduces the time and storage required: only search bot events are compiled, rather than all the event data from users and other user agents.
And you end up with a pretty dashboard like this. But doing that alone isn’t going to solve anything, so here are three actionable scenarios where log files can help you do your job…
Let’s start with the first scenario, orphan pages. What is an orphan page?
Some websites stop linking to old, expired content but don’t serve the right status code (like a 404, or a redirect to a newer version). The expired page is thus still available, just no longer linked to.
What do you do with orphan pages when you identify them? That’s the triage covered above: 404/410 the redundant ones, 301 the out-of-date ones, and re-attach the accidental orphans.
The second scenario is crawl budget waste: look for large quantities of parameter-driven pages, and combinations of parameters. These will often be areas where GoogleBot is losing time and wasting resources.
One common example of this is on WordPress blogs. You’ll often find things like this in your log files.
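To surface these, you can aggregate the query-parameter combinations GoogleBot is requesting. A rough sketch, again assuming a combined-format log (the file name is a placeholder; WordPress’s ?replytocom= links are a classic culprit):

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

ACCESS_LOG = "access.log"  # placeholder
request = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/')

combos = Counter()
with open(ACCESS_LOG) as f:
    for line in f:
        if "Googlebot" not in line:  # crude user-agent filter
            continue
        m = request.search(line)
        if not m:
            continue
        query = urlsplit(m.group(1)).query
        if query:
            # Count each distinct combination of parameter names,
            # e.g. ('replytocom',) or ('colour', 'size', 'sort').
            combos[tuple(sorted(parse_qs(query)))] += 1

# The biggest offenders, i.e. where crawl budget is leaking.
for params, hits in combos.most_common(20):
    print(hits, "&".join(params))
```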
The third scenario: pages that aren’t being crawled. If you see category pages or main service pages at the top of your least-crawled URLs list, further investigation is needed.
Investigate why these pages haven’t been visited by search engines; review each bot event for these URLs.
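Here’s a quick sketch for that review, pulling every GoogleBot event for a single URL out of the raw logs. The file name and target URL are placeholders, and since user-agent strings can be spoofed, you’d verify genuine GoogleBot hits via reverse DNS before drawing conclusions:

```python
import re

ACCESS_LOG = "access.log"      # placeholder
TARGET = "/category/widgets/"  # the page under investigation

entry = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

with open(ACCESS_LOG) as f:
    for line in f:
        m = entry.search(line)
        if m and m.group("url") == TARGET and "Googlebot" in m.group("agent"):
            # When GoogleBot came by, and what it was served.
            print(m.group("time"), m.group("status"))
```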
Oliver Mason put this eloquently in his recent talk at the BrightonSEO conference.
That’s just an overview of a few things you can do with log files. Once you start playing around and analysing the data, it’s really rather interesting.