Data Scraping with Excel - Campixx 2013 - Maik Schmidt

Data Scraping with Excel – by Maik Schmidt
17.03.2013 – Berlin - SEO Campixx

Who I am

• Maik Schmidt
• SEO Consultant at Catbird Seat (2010)
• Winner of the SEO contest "KubaSEOTräume"

@chillboyy

facebook.com/chillboy.de

xing.com/profile/Maik_Schmidt11

What are we scraping today?

• Standard KPIs
• Malware Checker
• Index Checker
• Google SERPs
• Google Suggest

Why Excel?

• Because I can't program

Disadvantages:
• Slow
• Limited data volumes

What do I need?

• Excel
• Niels Bosma SEO Tools for Excel

http://nielsbosma.se/projects/seotools/

Niels Bosma SEO Tools (1/4)

Onpage
• LinkCount
• HtmlTitle
• HtmlMetaDescription
• HtmlMetaKeywords
• HtmlMeta
• HtmlFirst
• HtmlH1
• HtmlH2
• HtmlH3
• HtmlCanonical
• W3CValidate
• PageCodeToTextRatio
• PageSize
• PageTextSize
• PageCodeSize
• HttpStatus
• HttpHeader
• ResponseTime
• PageEncoding
• IsFoundOnPage

Content
• FindDuplicatedContent
• CountWords
• LCS
• SpinText

Backlinks
• CheckBacklink
• GooglePageRank
• GoogleResultCount
• GoogleIndexCount
• GoogleLinkCount
• AlexaReach
• AlexaPopularity
• AlexaLinkCount
• DmozEntries
• WikipediaLinks

Social
• FacebookLikes
• GooglePlusCount
• TwitterCount

and many more


SEOlytics
• Backlinks
• SVR (visibility)
• Keyword Rankings
• Domain Metriken
• LinkCount/URL
• Link History


MajesticSEO
• Largest backlink database
• Fresh Index
• Historic Index
• Trust/Citation Flow


Google Analytics
• Similar to:
http://ga-dev-tools.appspot.com/explorer/
• =GoogleAnalytics(
string id,
string metrics,
string startDate,
string endDate,
[string dimensions,
string segment,
string filter,
string sort,
integer startIndex,
integer maxResults,
bool excludeHeaderInResult,
bool excludeDimensionsInResult]) :
{string}

XPath Basics

XPath lets you address specific parts of an XML document.

Document root node:
/
Direct child element:
XML_element_name
Direct child of the root node:
/XML_element_name
Child of a child:
XML_element_name/XML_element_name
Descendant of the root:
//XML_element_name
Descendant of a node:
XML_element_name//XML_element_name
Parent of a node:
../
A far cousin of a node:
../../XML_element_name/XML_element_name

Examples:

To scrape Sichtbarkeitsindex.de:
/html/body/div/div/div/h3[position()=1]
Returns the content of the first H3 tag on this path.

To scrape Google SERPs:
//h3[@class='x']/a (attribute: "href")
Returns all links inside H3 tags with the class "x".
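The path patterns above can be tried out without Excel. The following is a minimal sketch using Python's standard library; the sample page is hypothetical, and ElementTree only supports a subset of XPath (paths are relative to the starting element, and `[1]` is used instead of `[position()=1]`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from a real site).
SAMPLE = """
<html>
  <body>
    <div>
      <h3>0,1234</h3>
      <h3 class="x"><a href="http://example.com/a">First</a></h3>
      <h3 class="x"><a href="http://example.com/b">Second</a></h3>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Child of a child with a position predicate: the first <h3> in body/div.
# Starting from <html>, "/html/body/div/h3[1]" becomes "./body/div/h3[1]".
first_h3 = root.find("./body/div/h3[1]").text

# Descendant with an attribute predicate, then the "href" attribute,
# analogous to //h3[@class='x']/a with attribute "href".
links = [a.get("href") for a in root.findall(".//h3[@class='x']/a")]

# Parent of a node: ".." steps up one level from each matched <a>.
parents = [el.tag for el in root.findall(".//a/..")]

print(first_h3)
print(links)
print(parents)
```

Real web pages are often not well-formed XML, so a tolerant HTML parser would likely be needed in practice.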

Finding the XPath the easy way

With the Firefox plugin Firebug (and FirePath), the XPath can be found quickly and easily.

Standard KPIs

Sources:
Free SI: Sichtbarkeitsindex.de/deinedomain.de
SI API: http://api.sistrix.net/domain.sichtbarkeitsindex?api_key=xy&domain=deinewebseite.de
Alexa Rank: http://www.alexa.com/siteinfo/deinedomain.de

Formulas:
Free SI: =XPathOnUrl([URL];"/html/body/div/div/div/h3[position()=1]")
Alexa Rank: =XPathOnUrl([Alexa URL];"//table[@id='siteStats']/tbody/tr[1]/td[2]/div")
SI API: =XPathOnUrl([SI API URL];"response/answer/sichtbarkeitsindex";"value")
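To illustrate what the SI API formula does (follow the path response/answer/sichtbarkeitsindex and read the attribute "value"), here is a small Python sketch. The sample response is hypothetical and only assumed from the path used in the formula; the real API output may contain more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, modelled on the path used in the formula.
SAMPLE_RESPONSE = """
<response>
  <answer>
    <sichtbarkeitsindex domain="deinewebseite.de" date="2013-03-11" value="1.2345"/>
  </answer>
</response>
"""

def visibility_index(xml_text: str) -> float:
    """Return the 'value' attribute at response/answer/sichtbarkeitsindex."""
    root = ET.fromstring(xml_text)  # root is the <response> element
    node = root.find("./answer/sichtbarkeitsindex")
    if node is None:
        raise ValueError("no sichtbarkeitsindex element in response")
    return float(node.get("value"))

print(visibility_index(SAMPLE_RESPONSE))
```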

Google Safe Browsing API

Source:
http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site=domain.de

=UrlProperty([URL];"domain")

=XPathOnUrl([Google SafeBrowsing URL];
"/html/body/center/div/div/blockquote/p[position()=1]")

Index Checker

Source:
http://www.google.de/search?gcx=c&sourceid=chrome&ie=UTF-8&pws=0&q=info:deinewebseite.de

=HttpStatus([USER URL])

=WENN(HtmlCanonical(A2)=A2;"self canonical";HtmlCanonical(A2))

=WENN(ISTFEHLER(IDENTISCH(TEIL(XPathOnUrl("http://www.google.de/search?gcx=c&sourceid=chrome&ie=UTF-8&q="&("info:"&(A2))&"&pws=0";"//li[@class='g']//h3[@class='r']//a";"href");8;LÄNGE(A2));A2));"not indexed";"indexed")

(The formulas use German-locale Excel function names: WENN = IF, ISTFEHLER = ISERROR, IDENTISCH = EXACT, TEIL = MID, LÄNGE = LEN.)
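The string handling inside the index check can be sketched in Python. It works on an href that has already been scraped (no request is made), and the function name is hypothetical. Note one deliberate difference: the Excel formula reports "indexed" whenever the scrape does not raise an error, while this sketch also requires that the extracted URL matches.

```python
PREFIX = "/url?q="  # 7 characters, hence the start position 8 in TEIL(...;8;...)

def index_status(first_result_href, url):
    """Classify a URL based on the first result href of an "info:<url>" query.

    first_result_href is None when the scrape returned nothing.
    """
    if first_result_href is None or not first_result_href.startswith(PREFIX):
        return "not indexed"
    # Same slice as TEIL(href; 8; LÄNGE(url)): skip the prefix, keep len(url) chars.
    start = len(PREFIX)
    candidate = first_result_href[start:start + len(url)]
    return "indexed" if candidate == url else "not indexed"

print(index_status("/url?q=http://deinewebseite.de/&sa=U", "http://deinewebseite.de/"))
print(index_status(None, "http://deinewebseite.de/"))
```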

Scraping Google Suggest

• Source:
http://google.de/complete/search?output=toolbar&hl=de&q=
• Scrapes the keyword followed by a single letter, with and without a space in between
• Array formula to scrape the results in blocks of 10
• Second iteration over the top 10

More than 600 suggested keywords!
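The query pattern (keyword plus one letter, with and without a space) can be sketched in Python as below. The function names are hypothetical, and the sample response only assumes the toolbar XML layout; it is not a captured response. No request is sent.

```python
import string
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

BASE = "http://google.de/complete/search?output=toolbar&hl=de&q="

def suggest_urls(keyword):
    """Build one URL per variant: keyword+letter and keyword+space+letter."""
    variants = []
    for letter in string.ascii_lowercase:
        variants.append(keyword + letter)        # e.g. "seoa"
        variants.append(keyword + " " + letter)  # e.g. "seo a"
    return [BASE + quote_plus(v) for v in variants]

# Hypothetical, shortened response in the assumed toolbar XML layout.
SAMPLE = """
<toplevel>
  <CompleteSuggestion><suggestion data="seo tools"/></CompleteSuggestion>
  <CompleteSuggestion><suggestion data="seo tipps"/></CompleteSuggestion>
</toplevel>
"""

def parse_suggestions(xml_text):
    """Return the 'data' attribute of every suggestion element."""
    root = ET.fromstring(xml_text)
    return [s.get("data") for s in root.findall("./CompleteSuggestion/suggestion")]

print(len(suggest_urls("seo")))
print(parse_suggestions(SAMPLE))
```

Feeding each returned suggestion back into `suggest_urls` would correspond to the second iteration mentioned on the slide.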

Scraping Google SERPs

Source:
http://www.google.de/search?q=deinkeyword&num=100&start=0&pws=0

Formula:
=XPathOnUrl([URL];"(//h3[@class='r']/a)["&A1&"]";"href")

Result:
/url?q=http://de.wikipedia.org/wiki/Suchmaschinenoptimierung&sa=U&ei=bTU2UP6sPMfNsgbAnoHYBQ&ved=0CB0QFjAA&usg=AFQjCNHwx6lcRxVC0-eBeDJ6GgHBiHGtFQ

=RECHTS(B1;LÄNGE(B1)-7)
=RECHTS(C1;LÄNGE(C1)-SUCHEN("&amp";C1))
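As an alternative to string slicing, the target URL can be read from the "q" parameter of the scraped redirect link. This is a minimal Python sketch, not the method used in the talk; the function name is hypothetical, and the input is the result string shown above.

```python
from urllib.parse import urlsplit, parse_qs

def target_url(href):
    """Return the value of the q parameter of a '/url?q=...' redirect link."""
    # Raw HTML may carry "&amp;" instead of "&" between parameters.
    query = urlsplit(href.replace("&amp;", "&")).query
    values = parse_qs(query).get("q")
    if not values:
        raise ValueError(f"no q parameter in {href!r}")
    return values[0]

href = ("/url?q=http://de.wikipedia.org/wiki/Suchmaschinenoptimierung&sa"
        "=U&ei=bTU2UP6sPMfNsgbAnoHYBQ&ved=0CB0QFjAA&us"
        "g=AFQjCNHwx6lcRxVC0-eBeDJ6GgHBiHGtFQ")
print(target_url(href))
```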

What else?

Analytics for Twitter
by Microsoft

&
Power Pivot

The End

With the examples and tools shown, you can in theory scrape any website and process the data in Excel.

Be Creative!
The Excel files shown live will be available for download on the blog at www.catbirdseat.de
