In this session, Chris and Danny share their experience of recovering from an unsuccessful SEO relaunch using data tactics. The session covers three SEO issues: structural change, page bloat, and JavaScript issues, and shows how data tactics can help with each.
This GitHub repo contains all the data pipeline code shown at SMX Advanced EU 2023, with implementation details for the four data products presented in the session:
- a 404 live alert on GA4 data, executed via a cloud function
- a sitemap monitoring script, based on advertools
- a custom SEO crawler, based on advertools
- a web vitals monitoring script
https://github.com/ChrisGutknecht/smx_advanced_seo_data/
6. This Was Our Two-Year Journey… Step by Step
Stopped work on old platform → Relaunch → Fixing our structure with data
7. The Start: What Drove the Relaunch Decision?
● Blessed with COVID growth
● Need for an integrated commerce platform
● Less flexibility than initially expected
● Delivery deadline pushed SEO requirements down
8. Result: We Had to Change 97% of Our URL Structure
Danny: I quit!
9. Pre-Relaunch: So Much Crawling & Redirecting
● ~20M crawl requests in 6 weeks
● ~70k URLs in 4 languages
● added +50k redirect rules
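With ~50k redirect rules, redirect chains and loops creep in easily, and every extra hop wastes crawl budget. A minimal sketch (not from the repo; the rule format is an assumed old-URL → new-URL map) that flags multi-hop redirects:

```python
def find_redirect_chains(rules: dict) -> list:
    """Flag redirect paths longer than one hop in an old->new URL map.

    Chains show up as paths with 3+ entries; a loop shows up as a path
    that revisits an earlier URL. Illustrative helper, not the repo's code.
    """
    chains = []
    for src in rules:
        path = [src]
        dest = rules[src]
        # Follow the redirect target while it is itself redirected,
        # stopping if we revisit a URL (loop protection).
        while dest in rules and dest not in path:
            path.append(dest)
            dest = rules[dest]
        path.append(dest)
        if len(path) > 2:
            chains.append(path)
    return chains
```

Running this against the rule set before deployment catches chains that would otherwise only surface in logfiles later.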
10. Post Relaunch: 404 Levels Per Day Grew 10-Fold (!)
… We kept adding more redirects
11. First Hope: We Won Back Danny as Tech SEO
Org changes made:
● SEO moved to product (Ecom)
● SEO team to be built
● SEO QA as a required product workflow step
13. Our New Site Had Three Core SEO Issues
1. Large-scale structural change
2. Page bloat
3. JS-heavy application (SPA)
14. Our Custom URL Structure Was NOT Compatible
System capabilities vs. our URL requirements
17. Redirects Aside: How Can We Catch 404s ASAP?
Dashboards are beautiful, but they're soft like butterflies…
18. To Fix 404s ASAP, You Need Alerts & a Dashboard
19. This Is What Our 404 Live Alert Looks Like
● Reaches multiple people at once
● Contains a link to the dashboard
● Fires above a committed threshold (e.g. 85 > 70)
20. Our GCP Data Pipeline for 404 Live Alerts
Cloud Function (+ GA4 streaming) → Dashboard → Teams alert
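The pipeline on this slide (Cloud Function querying GA4 streaming data, posting to Teams above a threshold) can be sketched roughly as below. This is a minimal, hypothetical sketch, not the repo's implementation: the GA4 SQL, table name, and function names are assumptions.

```python
import json
import urllib.request

# Hypothetical query against the GA4 BigQuery streaming export; the real
# project, dataset, and 404-detection condition will differ per setup.
GA4_404_QUERY = """
SELECT COUNT(*) AS hits_404
FROM `project.analytics_XXXXX.events_intraday_*`
WHERE event_name = 'page_view'
  AND EXISTS (
    SELECT 1 FROM UNNEST(event_params) p
    WHERE p.key = 'page_title' AND p.value.string_value LIKE '%404%'
  )
"""

def exceeds_threshold(hits_404: int, threshold: int = 70) -> bool:
    """True when today's 404 count is above the committed threshold (e.g. 85 > 70)."""
    return hits_404 > threshold

def build_teams_payload(hits_404: int, dashboard_url: str) -> dict:
    """Simple-text payload for a Teams incoming webhook, including the dashboard link."""
    return {
        "text": f"404 alert: {hits_404} not-found hits today. Dashboard: {dashboard_url}"
    }

def send_alert(payload: dict, webhook_url: str) -> None:
    """POST the alert to a Teams incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

A Cloud Function entry point would run the query, pass the count through `exceeds_threshold`, and only call `send_alert` when it returns True, which keeps the channel quiet on normal days.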
21. You Should Monitor Your Googlebot Logfiles
● Save crawl budget
● Reduce server requests
22. GSC Crawl Stats are OK, but Not Enough…
● Not log-level
● Not searchable
● Not groupable
24. Logfile Case #1: /ga/screeninformation/
Blocked by robots.txt (WTF?)
25. Logfile Case #2: JS Functions that look like Links
Blocked by robots.txt (WTF?)
26. Our GCP Data Pipeline for Logfile Exports
Hourly CSV export (Stack Management) → Cloud Storage → Daily table update & tests → Dashboard → Teams alert
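Once the hourly CSV exports land in Cloud Storage, the daily job filters for Googlebot and aggregates per path. A self-contained sketch of that filtering step, assuming illustrative column names (`path`, `user_agent`, `status`) that may differ from the real export schema:

```python
import csv
import io
from collections import Counter

def googlebot_hits_by_path(csv_text: str) -> Counter:
    """Count Googlebot requests per path in one hourly CSV logfile export.

    Column names are assumptions for illustration; a production job would
    also verify the bot via reverse DNS rather than trusting the user agent.
    """
    hits = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if "Googlebot" in row["user_agent"]:
            hits[row["path"]] += 1
    return hits
```

Grouping like this is exactly what GSC crawl stats cannot do: the raw log lines stay searchable and groupable by any dimension you keep in the table.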
27. Let’s Move on to Issue #2: Page Bloat
1. Large-scale structural change
2. Page bloat
3. JS-heavy application (SPA)
28. How Our Daily Category URL Count Fluctuated
Relaunch: important URLs not in the sitemap, but also duplicates
Regular jumps in sitemap URL count
29. The Root Cause for Page Bloat? Team Structure
(Org chart: Marketing, E-Commerce, and Product/IT teams, with Product Owner, Tech SEO, Content SEO, Content, and Acquisition roles spread across them)
SEO was spread across teams, but missing critical product influence
30. In Our New Structure, SEO is Part of Product
(Org chart: SEO now sits in E-Commerce/Product alongside the Product Owner; Content remains in Marketing)
Reference: https://www.kevin-indig.com/forging-a-fine-tuned-seo-machine/
31. Our Decision Rules to Reduce Our Categories
1. Exclude: duplicates, low/no-traffic categories
2. Keep or add: high-traffic, high-demand categories (internal/external searches)
32. We Combined Three Data Sources For This
Inventory Data + GSC Data + GA4 Data → Combined Data Model → Dashboard
Example inventory data:
SKU | … | Categories_All
100-01 | … | Cat1, Cat2, Cat3
101-01 | … | Cat1, Cat5, Cat6
… | … | …
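The combined data model can be sketched as a pandas join of the three sources, then applying the decision rules (exclude duplicates and low/no traffic). Column names (`category`, `product_set`, `clicks`, `sessions`) and the traffic floor are illustrative assumptions, not the repo's schema:

```python
import pandas as pd

def classify_categories(inventory: pd.DataFrame,
                        gsc: pd.DataFrame,
                        ga4: pd.DataFrame,
                        traffic_floor: int = 10) -> pd.DataFrame:
    """Join inventory, GSC, and GA4 data per category and flag removal candidates.

    Assumed columns: inventory has category + product_set (the set of SKUs
    it lists), gsc has category + clicks, ga4 has category + sessions.
    """
    df = (
        inventory.merge(gsc, on="category", how="left")
                 .merge(ga4, on="category", how="left")
                 .fillna({"clicks": 0, "sessions": 0})
    )
    # Rule 1a: categories listing the same product set are duplicates.
    df["is_duplicate"] = df.duplicated(subset="product_set", keep="first")
    # Rule 1b: low/no traffic across both search and analytics data.
    df["low_traffic"] = (df["clicks"] + df["sessions"]) < traffic_floor
    df["remove"] = df["is_duplicate"] | df["low_traffic"]
    return df
```

Left joins keep every inventory category in the frame, so categories with zero GSC or GA4 rows are not silently dropped but surface as low-traffic candidates.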
33. We Combined All Data Sources for Decisions
Overall removed:
● ~ 500 duplicates
● ~ 1,500 low/no-traffic categories
34. How Can We Monitor Category URL Changes?
What happened here & here?
36. Our GCP Data Pipeline for a Sitemap Monitor
Cloud Function → Table update & tests → Dashboard → Teams alert
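The repo's sitemap monitor is based on advertools (its `sitemap_to_df` function returns sitemap URLs as a DataFrame). To keep this sketch dependency-free, the same snapshot-and-diff idea is shown with the standard library; function names are illustrative:

```python
import xml.etree.ElementTree as ET

# Sitemaps use a fixed XML namespace for their elements.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> set:
    """Extract the set of <loc> URLs from a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(NS + "loc")}

def sitemap_diff(today: set, yesterday: set) -> dict:
    """URLs added and removed between two daily snapshots,
    i.e. the unexplained jumps in the daily category URL count."""
    return {"added": sorted(today - yesterday),
            "removed": sorted(yesterday - today)}
```

Storing each day's URL set (e.g. in a BigQuery table) and diffing against the previous snapshot is what turns "what happened here?" on a chart into a concrete list of changed URLs.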
51. Thanks for Listening!
Looking Forward To Questions.
Danny Zidaric | Lead SEO
Christopher Gutknecht | Teamlead Analytics
52. Annex: Why Use a Custom Crawler?
Screaming Frog / SaaS tool 🕷️: ad-hoc crawls, low cost, direct benefit; downsides: cost for larger sites, diminishing returns, not for the cloud
Custom crawler ☁️: cloud-native, low cost, customization
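The repo's custom crawler is based on advertools (which wraps Scrapy and is launched via `adv.crawl(url_list, output_file, follow_links=True)`). The core step any such crawler repeats, extracting and absolutizing links from a fetched page, can be sketched with the standard library; class and function names here are illustrative:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from <a> tags in an HTML document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html: str, base_url: str) -> list:
    """Return all absolute link targets found in the page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A cloud-native crawler is this loop plus a frontier queue and politeness controls, which is why it stays low-cost: it runs where the data pipeline already lives instead of on a desktop license.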
53. Annex: How Did Our Blog Visibility Change?
(Chart annotation: relaunch marker; WordPress blog with very few changes)