1. Analyse your SEO Data with R and Kibana
June 10th, 2016
Vincent Terrasi
2. Vincent Terrasi
--
SEO Director - Groupe M6Web
CuisineAZ, PasseportSanté, MeteoCity, …
--
Join the OVH adventure in July 2016
Blog : data-seo.com
3. Agenda
Mission : Do a Real-Time Log Analysis Tool
1. Using Screaming Frog to crawl a website
2. Using R for SEO Analysis
3. Using PaasLogs to centralize logs
4. Using Kibana to build fancy dashboards
5. Test !
“The world is full of obvious things which nobody by any chance ever observes.”
Sherlock Holmes
9. Why R?
Scriptable
Big Community
Mac / PC / Unix
Open Source
7500 packages
Documentation
10. WheRe ? How ?
https://www.cran.r-project.org/
Rgui RStudio
11. Using R : Step 1
Export All URLs
"request";"section";"active";"speed";"compliant";"depth";"inlinks"
Packages:
stringr
ggplot2
dplyr
readxl
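The export above can be loaded with base R's read.csv2(), which expects exactly this ";"-separated, comma-decimal format. A minimal sketch: the two demo rows and section names below are made up, so point the path at your real Screaming Frog export instead.

```r
# Minimal sketch: load a ";"-separated crawl export with the columns above.
# The demo file and its rows are fabricated; use your real export path.
demo <- tempfile(fileext = ".csv")
writeLines(c(
  '"request";"section";"active";"speed";"compliant";"depth";"inlinks"',
  '"/recettes/tarte";"recettes";"1";"0,42";"1";"2";"15"',
  '"/blog/old-post";"blog";"0";"1,30";"0";"4";"1"'
), demo)

# read.csv2() assumes ";" separators and "," decimals, matching the export
crawl <- read.csv2(demo, stringsAsFactors = FALSE)
str(crawl)
```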
12. R Examples
Crawl via Screaming Frog
Classify URLs by :
Section
Load Time
Number of Inlinks
Detect Active Pages
Min 1 visit per month
Detect Compliant Pages
Canonical Not Equal
Meta No-index
Bad HTTP Status Code
Detect Duplicate Meta
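The classification steps above can be sketched with dplyr. The data frame, the column names (visits, status, noindex, canonical_ok) and the thresholds are illustrative assumptions, not the talk's actual code.

```r
library(dplyr)

# Hypothetical crawl/log join; column names are assumptions for illustration
crawl <- data.frame(
  request      = c("/a", "/b", "/c"),
  visits       = c(12, 0, 3),          # SEO visits over the last month
  status       = c(200, 200, 404),     # HTTP status code
  noindex      = c(FALSE, FALSE, FALSE),
  canonical_ok = c(TRUE, FALSE, TRUE)  # canonical equals the URL itself
)

crawl <- crawl %>%
  mutate(
    active    = visits >= 1,           # active: min 1 visit per month
    compliant = status == 200 & !noindex & canonical_ok
  )

filter(crawl, !compliant)  # pages to fix: canonical, noindex or status issues
```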
21. R : ggplot2 command
DATA
Create the ggplot object and populate it with data (always a data frame)
ggplot(mydata, aes(x = section, y = count, fill = active))
LAYERS
Add layer(s)
+ geom_point()
FACET
Used for conditioning on variable(s)
+ facet_grid(~rescode)
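Put together, the three pieces (data, layer, facet) form one plot. The demo data frame is invented, and geom_col() is swapped in for geom_point() so the fill aesthetic maps onto bars.

```r
library(ggplot2)

# Invented per-section counts of active/inactive URLs by HTTP response code
mydata <- data.frame(
  section = rep(c("recettes", "blog"), each = 4),
  count   = c(120, 30, 80, 10, 40, 25, 5, 2),
  active  = rep(c("yes", "no"), 4),
  rescode = rep(c(200, 200, 301, 301), 2)
)

p <- ggplot(mydata, aes(x = section, y = count, fill = active)) +  # DATA
  geom_col() +                                                     # LAYERS
  facet_grid(~ rescode)                                            # FACET

# ggsave("active_by_section.png", p)  # optionally write the chart to disk
```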
26. R : SEO Cheat Sheet
Package dplyr
select() allows you to rapidly zoom in on a useful subset using operations that usually only work on numeric variable positions
mutate() adds new columns to a data frame or replaces existing ones
filter() allows you to select a subset of rows in a data frame.
Package ggplot2
aes - geom
ggsave()
Package readxl
read_excel()
Base R (utils)
read.csv2()
write.csv2()
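Note that read_excel() comes from the readxl package while read.csv2()/write.csv2() are base R. A quick round-trip through the ";"-separated format; the small data frame here is a made-up example.

```r
# read_excel() reads .xlsx crawl exports (needs the readxl package):
# library(readxl); sheet <- read_excel("crawl.xlsx")  # hypothetical file

# read.csv2()/write.csv2() are base R; round-trip a small demo data frame
df <- data.frame(request = c("/a", "/b"), inlinks = c(10, 1))
tmp <- tempfile(fileext = ".csv")
write.csv2(df, tmp, row.names = FALSE)
back <- read.csv2(tmp)
```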
32. PaasLogs
164 nodes in the Elasticsearch cluster
180 connected machines
Between 100,000 and 300,000 logs processed per second
12 billion logs in transit every day
211 billion documents stored
8 clicks and 3 copy/pastes to use it!
36. PaasLogs : Streams
The Streams are the recipients of your logs. When you send a log with the
right stream token, it automatically arrives in your stream in an awesome
piece of software named Graylog.
37. PaasLogs : Dashboards
The Dashboard is the global view of your logs. A Dashboard is an efficient
way to exploit your logs and to view global information, like metrics and
trends about your data, without being overwhelmed by the details of
individual log lines.
38. PaasLogs : Aliases
The Aliases will allow you to access your data directly from your Kibana or
through an Elasticsearch query.
DON'T FORGET TO ENABLE KIBANA INDICES AND SET YOUR USER PASSWORD
39. PaasLogs : Inputs
The Inputs will allow you to ask OVH to host your own dedicated log
collector, such as Logstash or Flowgger.
51. Kibana : Install
Download Kibana 4.1
• Download and extract the Kibana archive
• Open config/kibana.yml in an editor
• Set elasticsearch.url to point at your Elasticsearch instance
• Run ./bin/kibana (or bin\kibana.bat on Windows)
• Point your browser at http://yourhost.com:5601
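The kibana.yml edit from the list above amounts to a one-line change. The host and port are example values, and in some Kibana 4.x versions the key is spelled elasticsearch_url instead.

```yaml
# config/kibana.yml -- point Kibana at your Elasticsearch instance
# (host/port are example values)
elasticsearch.url: "http://localhost:9200"
```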
59. Test yourself
Use the Screaming Frog Spider Tool
www.screamingfrog.co.uk
Learn R
www.datacamp.com
www.data-seo.com
www.moise-le-geek.fr/push-your-hands-in-the-r-introduction/
Test PaasLogs
www.runabove.com
Install Kibana
www.elastic.co/downloads/kibana
60. TODO List
- Create a GitHub Repository with all source code
- Add Plugin Logstash to do a reverse DNS lookup
- Schedule A Crawl By Command Line
- Upload Screaming Frog File to web server
61. Thank you
Keep in touch
June 10th, 2016
@vincentterrasi Vincent Terrasi