2009
God it’s bad.
-$1.5 Billion
Why hasn’t Google seen the changes on my page?
Has Google noticed my pages are broken?
Why isn’t my page appearing in Google?
Does Google think this page is important?
PART 1: THE WHY
What can you do with logs?
PART 2: THE HOW
Getting logs
Analysing Logs
Processing Logs
What does a log look like?
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
IP Address: 123.65.150.10
Timestamp: [23/Aug/2010:03:50:59 +0000]
Request type: GET
Page requested (here the homepage): /my_homepage
Protocol: HTTP/1.1
Status Code: 200
Size of the page (in bytes): 2262
User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
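The fields labelled above can be pulled out of a log line with a regular expression. This is a minimal sketch, assuming the logs use the layout shown on the slide; servers can be configured to log different fields, and the names `LOG_PATTERN` and `parse_line` are made up for illustration.

```python
import re
from datetime import datetime

# Pattern for the log layout shown on the slide (an assumption: your
# server may log more, fewer, or differently ordered fields).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

example = (
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" '
    '200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
record = parse_line(example)
print(record["ip"], record["path"], record["status"], record["size"])
```

Lines that do not fit the pattern return `None`, so a malformed line can be counted or skipped rather than crashing the whole run.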
PART 1: THE WHY
What can you do with logs?
5 things
5 ½ things
1 Diagnose crawling & indexation issues
[Chart: Five folders Googlebot crawled the most (axis: number of requests)]
[Chart: % of organic sessions vs % of crawl budget (series: Sessions, Crawl budget)]
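A comparison of crawl budget against organic sessions comes down to turning two sets of counts into percentages per folder. A rough sketch follows; every folder name and number in it is invented for illustration and does not come from the talk.

```python
from collections import Counter

def share_by_folder(counts):
    """Convert raw counts per folder into percentages of the total."""
    total = sum(counts.values())
    return {folder: 100 * n / total for folder, n in counts.items()}

# Hypothetical figures: Googlebot requests from the logs, organic
# sessions from an analytics export.
googlebot_requests = Counter({"/jobs/": 7000, "/blog/": 2000, "/about/": 1000})
organic_sessions = Counter({"/jobs/": 3000, "/blog/": 6000, "/about/": 1000})

crawl_share = share_by_folder(googlebot_requests)
session_share = share_by_folder(organic_sessions)

for folder in sorted(crawl_share):
    sessions = session_share.get(folder, 0.0)
    gap = crawl_share[folder] - sessions
    print(f"{folder:10} crawl {crawl_share[folder]:5.1f}%  sessions {sessions:5.1f}%  gap {gap:+6.1f}")
```

A large positive gap marks a folder that takes far more of the crawl than it returns in sessions, which is usually the first place to look.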
2 Prioritisation
Prioritizing 1: example.com/article (with "Full" and "Print" links)
Prioritizing 2: example.com/article/full, example.com/article/print
Prioritizing 3: example.com/article/pdf
3 Spot bugs & view site health
Delayed errors with a limit of 1000
4 How important does Google see parts of your site?
My SEO was as bad as my design
But at least my hair was better
teflsearch.com
teflsearch.com/job-results
teflsearch.com/job-results/country/china
teflsearch.com/jobadvert3455
Average number of times Googlebot crawled a template
1. teflsearch.com
2. teflsearch.com/job-results
3. teflsearch.com/job-results/country/china
4. teflsearch.com/job-advert3455
teflsearch.com/job-results
Average number of times Googlebot crawled a template
35%
5 How fresh does it think your content is?
bit.ly/moz-fresh
Average number of times a page template is crawled by Googlebot
● Improve our internal linking
● Build trust with last modified date in sitemap
5 ½ Spotting trends
[Chart: Google desktop vs mobile user agent crawlers (series: Desktop, Mobile)]
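Splitting Googlebot traffic into desktop and mobile can be done from the user agent string alone. The sketch below rests on a heuristic assumption: that the smartphone crawler's user agent contains the word "Mobile" while the desktop one does not. The mobile string is an illustrative example rather than one taken from the talk, and other crawlers whose names contain "Googlebot" (such as image crawlers) would be counted as desktop here.

```python
from collections import Counter

def googlebot_device(user_agent):
    """Classify a user agent as 'mobile', 'desktop', or None if not Googlebot.

    Heuristic assumption: a Googlebot user agent containing "Mobile" is
    the smartphone crawler; any other Googlebot user agent is desktop.
    """
    if "Googlebot" not in user_agent:
        return None
    return "mobile" if "Mobile" in user_agent else "desktop"

desktop_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Illustrative smartphone-style user agent (assumed format).
mobile_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

seen = [desktop_ua, mobile_ua, mobile_ua, browser_ua]
device_counts = Counter(d for d in map(googlebot_device, seen) if d is not None)
print(device_counts)
```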
PART 2: THE HOW
Getting logs
Are all the logs in one place?
Talk to a developer and ask for information
Hi x
I’m {x} from {y} and we’ve been asked to do some log analysis to understand better how Google is behaving on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).
What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website and discover where they’re spending their time, the status code errors they’re finding, etc.
There are also some things that are really helpful for us to know when getting logs.
Do the logs have any personal information in?
We’re only concerned with the various search crawler bots like Google and Bing. We don’t need any logs from users, so any logs with emails, telephone numbers, etc. can be removed.
Do you have any sort of caching which would create separate sets of logs?
Is there anything like Varnish running on the server, or a CDN which might create logs in a different location to the rest of your server? If so, we will need those logs as well as those from the server. (We’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images then we don’t need it.)
Are there any sub parts of your site which log to a different place?
Have you got anything like an embedded Wordpress blog which logs to a different location? If so, we’ll need those logs as well.
Do you log hostname?
It’s really useful for us to be able to see the hostname in the logs. By default a lot of common server logging set-ups don’t log hostname, so if it’s not turned on, it would be very useful to have it turned on now for any future analysis.
Is there anything else we should know?
Best,
{x}
Email for a developer
So we might have something that looks like this
Analysing Logs
BigQuery
Google’s online database for data analysis.
1. Ask powerful questions
2. Repeatable
3. Scalable
4. Combine with crawl data
5. Easy to set up
6. Easy to learn
What do we want from analysing our logs?
9,000,000 rows of data for 2 months.
400 - 800 queries
Processing Logs
Format the logs so we can import them into BigQuery
Separate the Googlebot logs from all the other logs
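Both steps, separating the Googlebot lines and reformatting them for import, can be sketched in a few lines of code. This is one possible approach and not the code behind the talk's link: it assumes the log layout shown earlier, keeps any line whose user agent mentions "Googlebot" (user agents can be spoofed, and verifying the requester is not shown), and writes CSV as the import format. `googlebot_rows`, `write_csv`, and the second sample line are hypothetical.

```python
import csv
import io
import re

# Same log layout as the example line in the talk (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)
FIELDS = ["ip", "timestamp", "method", "path", "protocol",
          "status", "size", "referrer", "user_agent"]

def googlebot_rows(lines):
    """Yield one dict per parseable line whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and "Googlebot" in match.group("user_agent"):
            yield match.groupdict()

def write_csv(rows, out):
    """Write the rows as CSV with a header, ready for a bulk import."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

sample_lines = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    # Hypothetical browser request that should be filtered out.
    '10.0.0.7 - - [23/Aug/2010:03:51:02 +0000] "GET /about HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"',
]

buffer = io.StringIO()
write_csv(googlebot_rows(sample_lines), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice `sample_lines` would be an open log file and `buffer` an output file; because the rows are produced by a generator, large logs never have to fit in memory.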
Screaming Frog Log Analyser
Code something
bit.ly/logs-code
Our data in BQ
We make sure we got what we wanted
THE QUESTION:
What is the total number of requests Googlebot makes each day to our site?
Our first SQL query
SELECT
timestamp
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp)
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*)
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*)
FROM
[mydata.log_analysis]
GROUP BY
date
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*) as number_of_requests
FROM
[mydata.log_analysis]
GROUP BY
date
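The finished query uses BigQuery's legacy bracketed table name, so it will not run as written elsewhere. To try the same grouping locally, here is a small sketch using Python's built-in `sqlite3` module. The three rows are hypothetical, the table is a stand-in for `mydata.log_analysis`, and SQLite's `DATE()` is assumed to be close enough to BigQuery's for timestamps stored as `YYYY-MM-DD HH:MM:SS` text.

```python
import sqlite3

# In-memory stand-in for the BigQuery table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?)",
    [
        ("2010-08-23 03:50:59", "/my_homepage"),
        ("2010-08-23 09:12:01", "/about"),
        ("2010-08-24 00:00:07", "/my_homepage"),
    ],
)

# Same shape as the query on the slide: one row per day with a request count.
query = """
SELECT
  DATE(timestamp) AS date,
  COUNT(*) AS number_of_requests
FROM
  log_analysis
GROUP BY
  date
ORDER BY
  date
"""
daily = conn.execute(query).fetchall()
for date, number_of_requests in daily:
    print(date, number_of_requests)
```

The `ORDER BY` is an addition to the slide's query so that the days come back in a predictable order.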
[Chart: Comparing logs to GSC crawl volume (axis: number of requests)]
Run queries
Find something weird
Go look at crawl & website
Our data in BQ
1 Diagnose crawling & indexation issues
2 Prioritisation
3 Spot bugs & view site health
4 How important does Google see parts of your site?
5 How fresh does it think your content is?
1 Diagnose crawling & indexation issues
4 How important does Google see parts of your site?
What are the top 20 URLs crawled by Google over our logs?
Login is my top crawled page and then search?
What are the top 20 page_path_1 folders crawled by Google over our logs?
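Answering the folder question needs a first-folder value for every requested URL. A minimal sketch of deriving it and counting follows, assuming `page_path_1` means the first path segment; the paths are hypothetical and a top-level page such as `/login` is treated as its own folder.

```python
from collections import Counter
from urllib.parse import urlsplit

def page_path_1(url_path):
    """Return the first folder of a path, ignoring any query string."""
    segments = [s for s in urlsplit(url_path).path.split("/") if s]
    return f"/{segments[0]}/" if segments else "/"

# Hypothetical paths requested by Googlebot.
requested_paths = [
    "/job-results/country/china",
    "/job-results/country/japan?page=2",
    "/job-results",
    "/login",
    "/",
]

folder_counts = Counter(page_path_1(path) for path in requested_paths)
top_folders = folder_counts.most_common(20)
for folder, requests in top_folders:
    print(folder, requests)
```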
Location folders are taking more than 70% of my budget
Getting data by the day
Page Number of Googlebot Requests
page1 200,000
page2 120,000
Number of Googlebot requests day by day
3 Spot bugs & view site health
How many of each status code does Google find per day over our logs?
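The status code question is a count grouped by two keys, day and status. A sketch in plain Python follows, using hypothetical requests and assuming timestamps in the format of the example log line.

```python
from collections import Counter
from datetime import datetime

def day_of(raw_timestamp):
    """Turn a log timestamp such as '23/Aug/2010:03:50:59 +0000' into '2010-08-23'."""
    return datetime.strptime(raw_timestamp, "%d/%b/%Y:%H:%M:%S %z").date().isoformat()

# Hypothetical (timestamp, status code) pairs for Googlebot requests.
requests = [
    ("23/Aug/2010:03:50:59 +0000", 200),
    ("23/Aug/2010:04:10:00 +0000", 404),
    ("23/Aug/2010:05:00:00 +0000", 200),
    ("24/Aug/2010:01:00:00 +0000", 301),
]

status_per_day = Counter((day_of(ts), status) for ts, status in requests)
for (day, status), count in sorted(status_per_day.items()):
    print(day, status, count)
```

Plotted over time, a sudden jump in one status code on one day is the kind of "something weird" worth checking against the site.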
Number of Googlebot requests day by day
What are the most requested 404 URLs by Googlebot over the past 30 days?
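The 404 question combines a status filter, a date window, and a count per URL. A rough sketch follows; the records, the reference date, and the function name `top_404s` are all made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def top_404s(records, now, days=30, limit=20):
    """Return the most requested 404 paths within the last `days` days.

    records: iterable of (timestamp, path, status) tuples.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter(
        path for ts, path, status in records
        if status == 404 and ts >= cutoff
    )
    return counts.most_common(limit)

utc = timezone.utc
# Hypothetical Googlebot requests.
records = [
    (datetime(2010, 8, 23, 3, 50, tzinfo=utc), "/ad-snippet.js", 404),
    (datetime(2010, 8, 24, 9, 0, tzinfo=utc), "/ad-snippet.js", 404),
    (datetime(2010, 8, 25, 7, 30, tzinfo=utc), "/old-page", 404),
    (datetime(2010, 7, 1, 12, 0, tzinfo=utc), "/ancient-page", 404),  # outside the window
    (datetime(2010, 8, 23, 3, 51, tzinfo=utc), "/my_homepage", 200),  # not a 404
]

worst = top_404s(records, now=datetime(2010, 9, 1, tzinfo=utc))
print(worst)
```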
Boy does it want that ad-tech snippet
5 How fresh does it think your content is?
How many times on average is each page in a page template crawled a day?
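One way to read this question is: for each template, divide its total requests by the number of distinct pages in that template and by the number of days in the logs. The sketch below uses that definition, which is an assumption, along with hypothetical template rules loosely based on the example URLs earlier in the talk.

```python
from collections import defaultdict

def template_of(path):
    """Hypothetical rules mapping a path to a page template."""
    if path == "/":
        return "home"
    if path.startswith("/job-results"):
        return "job results"
    if path.startswith("/job-advert"):
        return "job advert"
    return "other"

def average_daily_crawls(requests):
    """requests: iterable of (day, path). Returns template -> mean crawls per page per day."""
    days = set()
    pages = defaultdict(set)
    hits = defaultdict(int)
    for day, path in requests:
        template = template_of(path)
        days.add(day)
        pages[template].add(path)
        hits[template] += 1
    return {t: hits[t] / (len(pages[t]) * len(days)) for t in hits}

# Hypothetical Googlebot requests over two days.
requests = [
    ("2010-08-23", "/"), ("2010-08-23", "/"),
    ("2010-08-24", "/"), ("2010-08-24", "/"),
    ("2010-08-23", "/job-advert1"),
    ("2010-08-24", "/job-advert2"),
]

averages = average_daily_crawls(requests)
print(averages)
```

Only pages that appear in the logs are counted here; pages Googlebot never requested would need to come from a crawl or a sitemap to be included in the denominator.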
Average number of times a page template is crawled by Googlebot
How long does it take for a page to be discovered after being published?
What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
Which pages have requests from Googlebot, which don’t appear in our crawl?
What are the top non-canonical pages being crawled?
Which are the most crawled parameters on the website?
How often are the most visited parameters crawled each day?
Which directories have the most 301 & 404 error codes?
Which pages are crawled with parameters and without parameters?
Which pages are only partly downloaded?
How many hits does each section get, when the sections are classified in an external dataset?
What percentage of a directory was crawled over the past 30 days?
What is the total number of requests across two different time periods?
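Several of these questions combine the logs with a second dataset. As one illustration, the question about pages Googlebot requests that do not appear in our crawl is a set difference once both sources are reduced to lists of paths. The sketch below uses hypothetical paths and assumes the two sources format paths identically.

```python
# Hypothetical paths requested by Googlebot, taken from the logs.
paths_in_logs = {
    "/",
    "/job-results",
    "/old-campaign",
    "/search?q=tefl",
}

# Hypothetical paths found by our own crawl of the site.
paths_in_crawl = {
    "/",
    "/job-results",
    "/job-results/country/china",
}

# Requested by Googlebot but never reached by our crawler:
# often orphaned or legacy URLs.
only_in_logs = sorted(paths_in_logs - paths_in_crawl)

# Reachable by our crawler but never requested by Googlebot in this period.
only_in_crawl = sorted(paths_in_crawl - paths_in_logs)

print("Only in logs:", only_in_logs)
print("Only in crawl:", only_in_crawl)
```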
That’s a lot of questions
bit.ly/logs-resource
In Summary
This is the thing you’re probably not doing
bit.ly/logs-resource
@dom_woodman

SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs

Alpha Media March 2024 Buyers Guide.pptx
 

SearchLove Boston 2017 | Dom Woodman | How to Get Insight From Your Logs

  • 18. Why hasn’t Google seen the changes on my page?
  • 19. Has Google noticed my pages are broken?
  • 20. Why isn’t my page appearing in Google?
  • 21. Does Google think this page is important?
  • 28. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 31–38. What does a log look like? 123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" The slides highlight each field in turn: IP address (123.65.150.10); timestamp ([23/Aug/2010:03:50:59 +0000]); request type (GET); requested page, here the homepage (/my_homepage); protocol (HTTP/1.1); status code (200); size of the page in bytes (2262); user agent (Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)).
  • 39. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 40. 5 things
  • 41. 5 ½ things
  • 42. 1 Diagnose crawling & indexation issues
  • 45–46. [Chart] Five folders Googlebot crawled the most, by number of requests
  • 47. [Chart] % of organic sessions vs % of crawl budget (series: Sessions, Crawl budget)
  • 58. 3 Spot bugs & view site health
  • 61. Delayed errors with a limit of 1000
  • 63. 4 How important does Google see parts of your site?
  • 64. My SEO was as bad as my design
  • 65. But at least my hair was better
  • 70. Average number of times Googlebot crawled a template
  • 71–72. 1. teflsearch.com 2. teflsearch.com/job-results 3. teflsearch.com/job-results/country/china 4. teflsearch.com/job-advert3455
  • 74. Average number of times Googlebot crawled a template 35%
  • 75. 5 How fresh does it think your content is?
  • 77. Average number of times a page template is crawled by Googlebot
  • 78. Improve our internal linking. Build trust with last modified date in sitemap.
  • 79. 5 ½ Spotting trends
  • 80. [Chart] Google desktop vs mobile user agent crawlers (series: Desktop, Mobile)
  • 82. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 83. Are all the logs in one place?
  • 84. Talk to a developer and ask for information
  • 85. Email for a developer:

      Hi x,

      I’m {x} from {y}, and we’ve been asked to do some log analysis to better understand how Google is behaving on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).

      What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they’re spending their time, the status code errors they’re finding, etc.

      There are also some things that are really helpful for us to know when getting logs.

      Do the logs have any personal information in? We’re just concerned about the various search crawler bots like Google and Bing. We don’t need any logs from users, so any logs with emails, telephone numbers, etc. can be removed.

      Do you have any sort of caching which would create separate sets of logs? Is there anything like Varnish running on the server, or a CDN, which might create logs in a different location from the rest of your server? If so, then we will need those logs as well as those from the server. (Although we’re only concerned about a CDN if it’s caching pages, or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images, then we don’t need it.)

      Are there any sub parts of your site which log to a different place? Have you got anything like an embedded WordPress blog which logs to a different location? If so, then we’ll need those logs as well.

      Do you log hostname? It’s really useful for us to be able to see hostname in the logs. By default, a lot of common server logging set-ups don’t log hostname, so if it’s not turned on, it would be very useful to have it turned on now for any future analysis.

      Is there anything else we should know?

      Best,
      {x}
  • 86. So we might have something that looks like this
  • 87. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 94. Google’s online database for data analysis.
  • 95. What do we want from analysing our logs? 1. Ask powerful questions 2. Repeatable 3. Scalable 4. Combine with crawl data 5. Easy to set up 6. Easy to learn
  • 101. 9,000,000 rows of data for 2 months. 400 - 800 queries
  • 103. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 104. Format the logs so we can import them into BigQuery. Separate the Googlebot logs from all the other logs.
  • 106. Screaming Frog Log Analyser
  • 110. What can you do with logs? PART 1: THE WHY Getting logs Analysing Logs Processing Logs PART 2: THE HOW
  • 111. Our data in BQ
  • 112. We make sure we got what we wanted
  • 113. THE QUESTION: What is the total number of requests Googlebot makes each day to our site?
  • 114–115. Our first SQL query: SELECT timestamp FROM [mydata.log_analysis]
  • 116–117. Our first SQL query: SELECT DATE(timestamp) FROM [mydata.log_analysis]
  • 118–119. Our first SQL query: SELECT DATE(timestamp) as date FROM [mydata.log_analysis]
  • 120. Our first SQL query: SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis]
  • 121. Our first SQL query: SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis] GROUP BY date
  • 122–123. Our first SQL query: SELECT DATE(timestamp) as date, count(*) as number_of_requests FROM [mydata.log_analysis] GROUP BY date
  • 124. [Chart] Comparing logs to GSC crawl volume (number of requests)
  • 125. Run queries. Find something weird. Go look at crawl & website.
  • 126. Our data in BQ
  • 127. 1 Diagnose crawling & indexation issues
  • 129. 3 Spot bugs & view site health
  • 130. 4 How important does Google see parts of your site?
  • 131. 5 How fresh does it think your content is?
  • 132. 1 Diagnose crawling & indexation issues 4 How important does Google see parts of your site?
  • 133. What are the top 20 URLs crawled by Google over our logs?
  • 134. Login is my top crawled page and then search?
  • 135. What are the top 20 page_path_1 folders crawled by Google over our logs?
  • 136. Location folders are taking more than 70% of my budget
  • 137. Getting data by the day
        Page  | Number of Googlebot requests
        page1 | 200,000
        page2 | 120,000
  • 138. Number of Googlebot requests day by day
  • 139. 3 Spot bugs & view site health
  • 140. How many of each status code does Google find per day over our logs?
  • 141. Number of Googlebot requests day by day
  • 142. What are the most requested 404 URLs by Googlebot over the past 30 days?
  • 143. Boy does it want that ad-tech snippet
  • 144. 5 How fresh does it think your content is?
  • 145. How many times on average is each page in a page template crawled a day?
  • 146. Average number of times a page template is crawled by Googlebot
  • 147–157. More questions, added one or two per slide:
      • How long does it take for a page to be discovered after being published?
      • What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
      • Which pages have requests from Googlebot but don’t appear in our crawl?
      • What are the top non-canonical pages being crawled?
      • Which are the most crawled parameters on the website?
      • How often are the most visited parameters crawled each day?
      • Which directories have the most 301 & 404 error codes?
      • Which pages are crawled with parameters and without parameters?
      • Which pages are only partly downloaded?
      • How many hits does each section get, when the sections are classified in an external dataset?
      • What percentage of a directory was crawled over the past 30 days?
      • What is the total number of requests across two different time periods?
  • 158. That’s a lot of questions
  • 164. This is the thing you’re probably not doing
  • 165.
  • 166.
  • 167.
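
The log line broken down field by field on slides 31–38 can be parsed programmatically. The following is a minimal Python sketch, assuming the logs use the common Apache/Nginx "combined" format shown on those slides. The names `LOG_PATTERN`, `parse_log_line` and `EXAMPLE` are illustrative and do not come from the talk, and real log set-ups may need a different pattern.

```python
import re
from datetime import datetime

# Assumed layout: the "combined" log format from the slides.
# ip ident user [timestamp] "method path protocol" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that are easier to analyse as typed values.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The example line from the slides.
EXAMPLE = (
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] '
    '"GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

if __name__ == "__main__":
    for name, value in parse_log_line(EXAMPLE).items():
        print(f"{name}: {value}")
```

Lines that do not match the pattern return `None` rather than raising, so malformed entries can be counted and inspected separately instead of stopping a long import.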
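
The queries on slides 114–123, 140 and 142 are written for BigQuery. To experiment with the same kind of aggregation without a BigQuery project, one option is a local SQLite database driven from Python. This is only a sketch under stated assumptions: the table name `log_analysis`, its three columns and the sample rows are all made up for illustration, SQLite's `date()` stands in for BigQuery's `DATE()`, and the alias `day` is used instead of `date` to avoid clashing with the function name.

```python
import sqlite3

# Hypothetical sample of already-parsed Googlebot requests (not real data).
SAMPLE_ROWS = [
    ("2010-08-23 03:50:59", "/my_homepage", 200),
    ("2010-08-23 04:10:12", "/job-results", 200),
    ("2010-08-23 09:30:00", "/old-page", 404),
    ("2010-08-24 01:02:03", "/my_homepage", 200),
    ("2010-08-24 05:06:07", "/old-page", 404),
]

# Total number of Googlebot requests per day (slides 113-123).
DAILY_REQUESTS = """
    SELECT date(timestamp) AS day, COUNT(*) AS number_of_requests
    FROM log_analysis
    GROUP BY day
    ORDER BY day
"""

# Number of each status code per day (slide 140).
STATUS_PER_DAY = """
    SELECT date(timestamp) AS day, status, COUNT(*) AS number_of_requests
    FROM log_analysis
    GROUP BY day, status
    ORDER BY day, status
"""

# Most requested 404 URLs (slide 142), without the 30-day filter.
TOP_404_URLS = """
    SELECT path, COUNT(*) AS number_of_requests
    FROM log_analysis
    WHERE status = 404
    GROUP BY path
    ORDER BY number_of_requests DESC
    LIMIT 20
"""


def build_demo_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO log_analysis VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def run_query(conn, sql):
    """Run one query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for sql in (DAILY_REQUESTS, STATUS_PER_DAY, TOP_404_URLS):
        print(run_query(conn, sql))
```

The shape of each query carries over to BigQuery, but function names and table references differ between dialects, so treat these as templates rather than copy-paste queries.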
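
Slides 135–136 ask which top-level folders take the largest share of Googlebot's requests. A rough Python sketch of that calculation follows, assuming you already have a list of requested paths. The helper names `first_folder` and `folder_share` are hypothetical, and `first_folder` is only a guess at how a column like page_path_1 might be derived.

```python
from collections import Counter
from urllib.parse import urlsplit


def first_folder(url_path):
    """Return the first path segment, ignoring any query string.

    '/job-results/country/china?page=2' -> 'job-results'
    '/' -> '(root)'
    """
    segments = [s for s in urlsplit(url_path).path.split("/") if s]
    return segments[0] if segments else "(root)"


def folder_share(paths):
    """Return (folder, requests, percent of all requests), most requested first."""
    counts = Counter(first_folder(p) for p in paths)
    total = sum(counts.values())
    return [
        (folder, n, round(100 * n / total, 1))
        for folder, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up request paths for illustration only.
    demo_paths = [
        "/location/london",
        "/location/paris",
        "/location/rome",
        "/job-results?page=2",
        "/",
    ]
    for folder, n, pct in folder_share(demo_paths):
        print(f"{folder}: {n} requests ({pct}%)")
```

Comparing these percentages with each folder's share of organic sessions gives the sessions-versus-crawl-budget view from slide 47.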

Editor's notes

  1. Walmart listened, but it didn’t go and look at what its customers were doing
  2. https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-notes-september-9th-2016/
  3. Jono always talks about this
  4. The good: you can customize for more complicated logging formats; you can use reverse DNS lookup and ASN lookup; you can work with log datasets that are too large to download to your computer.
  5. Start as an actual story. "Can I have the house salad, please?" "Greek or lentils?" "Olives or no olives?" "Green or black?" "Stones or no stones?" "Vinaigrette? Balsamic or Caesar?" "Balsamic." "Do you want rocket?" "I would like a salad."
  6. This is the summation of years’ worth of work. I can’t fit it into a 40-minute presentation, so I put resources here. Don’t worry if you get lost; it’s all here.
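
Note 4 mentions reverse DNS lookup as a way of checking that a request really came from Googlebot, since the user agent string alone can be spoofed. A minimal Python sketch of that check follows, assuming the usual two-step approach: reverse-resolve the IP, check the hostname's domain, then forward-resolve the hostname and confirm it maps back to the same IP. The function name, the injectable lookup parameters and the suffix list are illustrative assumptions, so check Google's current documentation for the domains it uses.

```python
import socket

# Assumed hostname suffixes for genuine Googlebot hosts; verify against
# Google's current documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IP with a reverse DNS lookup followed by a forward lookup.

    The lookup functions can be replaced (for example in tests or when
    caching results); by default they use the standard socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the original IP address.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers failed lookups (socket.herror / socket.gaierror).
        return False


if __name__ == "__main__":
    # Performs real DNS lookups, so the result depends on your network.
    print(is_verified_googlebot("123.65.150.10"))
```

DNS lookups are slow compared with reading log lines, so on large log sets it is usually worth resolving each distinct IP once and reusing the result.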