3. WHAT ARE WEB ROBOTS?
Web Robots (also known as Web Wanderers,
Crawlers, or Spiders) are programs that
traverse the Web automatically. Search engines
such as Google use them to index web
content, spammers use them to scan for email
addresses, and they have many other uses.
4. WHAT IS ROBOTS.TXT?
Robots.txt is a plain text file that you upload to
the root directory of your site. When the web
spiders (also called bots, crawlers, or indexers) that
index your pages visit your site, they fetch and
process that text file first. Put differently, robots.txt
tells the spider which pages it may and may not crawl.
5. THE SIMPLEST VERSION OF ROBOTS.TXT
User-agent: *
Disallow:
The first line, “User-agent: *”, indicates
that the following lines apply to all agents.
Leaving “Disallow:” empty means that nothing is
restricted. This robots.txt file therefore does
nothing: it allows all types of robots to see
everything on the site.
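The effect of this permissive file can be checked with Python's standard-library robots.txt parser. This is a small sketch: the rules are fed in directly via parse(), so no website is contacted, and the bot name and URL are made-up placeholders.

```python
# Check what an allow-all robots.txt permits, using only the standard library.
from urllib.robotparser import RobotFileParser

# The permissive robots.txt from the slide above: every agent, nothing disallowed.
rules = """
User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# With an empty Disallow, any agent may fetch any path.
print(parser.can_fetch("MyBot", "http://www.example.com/any/page.html"))  # True
```

In practice a crawler would call parser.set_url(".../robots.txt") and parser.read() to fetch the live file instead of parsing a string.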
6. SOME MORE EXAMPLES OF ROBOTS.TXT
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use
one at all)
7. SOME MORE EXAMPLES OF ROBOTS.TXT
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
To exclude a single robot
User-agent: BadBot
Disallow: /
8. SOME MORE EXAMPLES OF ROBOTS.TXT
To allow a single robot
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
You can disallow single pages:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
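The per-agent rules above can be verified the same way with urllib.robotparser. A minimal sketch, again using inline rules and placeholder URLs, showing that Googlebot is allowed while every other agent falls through to the blocking default:

```python
# Check the "allow a single robot" rules: Googlebot may crawl,
# every other agent is blocked by the catch-all entry.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "http://www.example.com/page.html"))  # True
print(parser.can_fetch("BadBot", "http://www.example.com/page.html"))     # False
```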
9. SOME MORE EXAMPLES OF ROBOTS.TXT
You can specify the Sitemap location in your
robots.txt file
User-agent: *
Disallow: /
Sitemap: http://www.example.com/sitemap.xml
10. ABOUT THE ROBOTS <META> TAG
You can use a special HTML <META> tag to tell
robots not to index the content of a page, and/or
not to follow its links.
<html>
<head>
<title>...</title>
<META NAME="ROBOTS"
CONTENT="NOINDEX, NOFOLLOW">
</head>
<body>...</body>
</html>
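A sketch of how a crawler might read this tag, using Python's standard html.parser. The class name and the sample page below are made up for illustration:

```python
# Extract the directives from a robots <META> tag.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT values of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "NOINDEX, NOFOLLOW".
            content = attrs.get("content", "")
            self.directives += [d.strip().upper() for d in content.split(",")]

page = """<html><head><title>Example</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body></body></html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # ['NOINDEX', 'NOFOLLOW']
```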
12. WHAT ARE SITEMAPS?
A Sitemap is an XML file that lists the URLs of a
site along with additional metadata about each
URL:
when it was last updated
how often it usually changes
how important it is, relative to other URLs in the site
It tells search engines which pages are available
for crawling.
13. SITEMAPS XML FORMAT
The Sitemap must:
Begin with an opening <urlset> tag and end with a
closing </urlset> tag.
Specify the namespace (protocol standard) within the
<urlset> tag.
Include a <url> entry for each URL, as a parent XML
tag.
Include a <loc> child entry for each <url> parent tag.
All URLs in a Sitemap must be from a single host, such
as www.example.com or store.example.com.
Sitemap file must be UTF-8 encoded
No more than 50,000 URLs
File must not be larger than 10MB
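Putting the rules above together, a minimal Sitemap looks like the sketch below. The URL and metadata values are placeholders; only <loc> is required inside each <url> entry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2012-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```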
15. USING SITEMAP INDEX FILES (TO GROUP
MULTIPLE SITEMAP FILES)
The Sitemap index file must:
Begin with an opening <sitemapindex> tag and end with a
closing </sitemapindex> tag.
Include a <sitemap> entry for each Sitemap as a parent
XML tag.
Include a <loc> child entry for each <sitemap> parent tag.
The optional <lastmod> tag is also available for Sitemap
index files.
Note: A Sitemap index file can only specify Sitemaps
that are found on the same site as the Sitemap index
file. For example,
http://www.yoursite.com/sitemap_index.xml can include
Sitemaps on http://www.yoursite.com but not on
http://www.example.com or
http://yourhost.yoursite.com.
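Following the rules above, a Sitemap index file looks like this sketch (the file names are placeholders, and <lastmod> is optional):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.yoursite.com/sitemap1.xml</loc>
    <lastmod>2012-01-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.yoursite.com/sitemap2.xml</loc>
  </sitemap>
</sitemapindex>
```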
17. SITEMAP FILE LOCATION
The location of a Sitemap file determines the
set of URLs that can be included in that
Sitemap. A Sitemap file located at
http://example.com/catalog/sitemap.xml can
include any URLs starting with
http://example.com/catalog/ but cannot
include URLs starting with
http://example.com/images/.
18. THANK YOU
ADITYA TODAWAL
PROJECT COORDINATOR (SEO)
SEARCH RESULTS MEDIA – INTERNET MARKETING TORONTO