1. Double trouble
DC issues - Diagnosis & causes
Kristjan Mar Hauksson
Nordic eMarketing
Director Internet Marketing
@optimizeyourweb
London | 18–21 February 2013 | #SESLON
2. “They ALL have some degree of duplicate content problems – every single site I have ever analyzed does!”
Mikkel DeMib
3. “Duplicate content is in most cases due to the way CMSs are set up… or we might have a team of lazy content writers on our hands.”
4. “Understand your content management system: Make sure you're familiar with how content is displayed on your website. Blogs, forums, and related systems often show the same content in multiple formats.”
6. A couple of easy-to-use tools
• virante.org/seo-tools/duplicate-content
• Xenu
• Zoom Search Engine
• Google (Search, Webmaster Tools, etc..)
• Manual testing
• Screaming Frog
More on: support.google.com/webmasters/bin/answer.py?hl=en&answer=66359
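What desktop crawlers like Xenu or Screaming Frog do for duplicate detection can be sketched in a few lines: normalize each page's body text and hash it, so byte-identical content at different URLs ends up in the same bucket. The URLs and page snippets below are made-up stand-ins for a real crawl, and the tag-stripping regex is deliberately crude:

```python
import hashlib
import re
from collections import defaultdict

def normalize(html: str) -> str:
    """Strip tags and collapse whitespace so trivially different markup
    still hashes to the same fingerprint."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()

def find_duplicates(pages: dict) -> list:
    """Group URLs whose normalized body text is identical."""
    buckets = defaultdict(list)
    for url, html in pages.items():
        digest = hashlib.sha256(normalize(html).encode("utf-8")).hexdigest()
        buckets[digest].append(url)
    return [sorted(urls) for urls in buckets.values() if len(urls) > 1]

# Hypothetical pages standing in for a real crawl
pages = {
    "http://example.com/widget": "<h1>Widget</h1><p>Our best widget.</p>",
    "http://example.com/widget?sessionid=42": "<h1>Widget</h1> <p>Our best   widget.</p>",
    "http://example.com/about": "<h1>About us</h1>",
}
print(find_duplicates(pages))  # the two /widget URLs come back as one duplicate group
```

Real tools also compare titles, headings, and near-duplicate text, but exact-hash grouping already catches the session-ID and default-page cases that dominate on most sites.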
7. Using the site: command
• site:yoursite.com
• This should show you how Google crawls your site
and what it finds
• Does this site have 46,800 products and categories?
8. Another simple way to identify DC is to search
• Look at the content you have on your site, take
something like a news headline and Google it
• This will in most cases show you how Google is
crawling your site and what it finds
12. Using Xenu
• If the site allows being crawled, you can use Xenu to crawl it and then look at the information that comes out of it
• Sort the output and behold…
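“Sort the output and behold” boils down to grouping a crawl export by page title: any title attached to more than one URL is a classic duplicate-content smell. A minimal sketch, assuming (url, title) rows as you would get from a Xenu or Screaming Frog export (the example rows are invented):

```python
from collections import defaultdict

def duplicate_titles(rows):
    """Group crawled URLs by page title; titles shared across URLs
    usually point at duplicate or near-duplicate pages."""
    by_title = defaultdict(list)
    for url, title in rows:
        by_title[title.strip().lower()].append(url)
    return {t: urls for t, urls in by_title.items() if len(urls) > 1}

# Hypothetical crawl export rows: (url, <title>)
rows = [
    ("http://example.com/", "Acme Widgets"),
    ("http://example.com/index.php", "Acme Widgets"),
    ("http://example.com/contact", "Contact us"),
]
print(duplicate_titles(rows))  # flags the two "Acme Widgets" URLs
```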
13. Using Copyscape
• Copyscape was originally created to find “stolen” copy
but works great when it comes to DC
14. Content ownership
• Websites are often developed on a DEV URL, which is in many cases open but meant only for collaboration between developers and site owners. Then somebody shares it over Gmail, or a subdomain finder sniffs it out, and content ownership can become an issue… for a long time.
15. Image Plagiarism
• A search for ‘Ritzy Bryan’ gives 895,000 results
• When you click Images… 5 of the top 9 are the photographer's
• But the top two are not on the photographer's website
• Click on an image, then click ‘Image details’, and you get lots of similar images
• Scroll down and you get lots of plagiarizing websites
17. Frequent causes when starting a new site
• First, make sure that your dev server is under lock and key – close it when you are done
• If you are using something like a news or product module across multiple sites, make sure that the ownership is clear
• Not all duplicate content originates on your own site – scrapers can give you hell!
• Report plagiarism to Google as soon as you find it – take ownership.
18. 301, 404 – Default or not default, and…
• 404s that are not 404s – things can go a bit crazy if error pages are not implemented properly, on large commerce sites for example
• WWW, Non-WWW & Default pages
• Query strings and session IDs
• Template content
• Boilerplate repetition, publishing stubs & similar content
• User generated duplicate (replica) content
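The WWW/non-WWW, query-string, session-ID, and default-page causes above all come down to many URLs mapping to one page. A rough sketch of collapsing those variants to a single canonical form; the tracking-parameter list and default-page names are assumptions to tune per site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that only track visitors and should not create "new" pages;
# this exact list is an assumption - adjust it for your own site.
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url: str) -> str:
    """Collapse common duplicate-URL variants: force www and a lowercase host,
    drop session/tracking parameters, sort the rest, and trim default pages."""
    scheme, netloc, path, query, _ = urlsplit(url)
    netloc = netloc.lower()
    if not netloc.startswith("www."):
        netloc = "www." + netloc
    if path.endswith(("/index.html", "/default.aspx")):
        path = path.rsplit("/", 1)[0] + "/"
    kept = sorted((k, v) for k, v in parse_qsl(query)
                  if k.lower() not in TRACKING_PARAMS)
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

print(canonicalize("http://Example.com/index.html?sessionid=9f3&page=2"))
# -> http://www.example.com/?page=2
```

The same normalization, applied before the comparison step, is what lets a crawl report group `?sessionid=` variants under one page instead of thousands.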
19. The mother of all checklists ;-)
• Take everything that is a likely cause, create a checklist, go through the items one by one, and make sure they are in order
• This is all common-sense stuff and there is so much information online – you should not have to make the same mistakes as those before you…
• Know your CMS before you start implementing it!
For example, a blog entry may appear on the home page of a blog, in an archive page, and in a page of other entries with the same label. Not forgetting print-friendly pages, tracking and sorting URL parameters, www and non-www, etc… https://www.google.com/webmasters/tools/spamreport?hl=en&pli=1

Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.

"Boilerplate" originally referred to the maker's label used to identify the builder of steam boilers. In the field of printing, the term dates back to the early 1900s. From the 1890s onwards, printing plates of text for widespread reproduction, such as advertisements or syndicated columns, were cast or stamped in steel (instead of the much softer and less durable lead alloys used otherwise), ready for the printing press, and distributed to newspapers around the United States. They came to be known as 'boilerplates'. Until the 1950s, thousands of newspapers received and used this kind of boilerplate from the nation's largest supplier, the Western Newspaper Union. Some companies also sent out press releases as boilerplate so that they had to be printed as written. The modern equivalent is the press release boilerplate, or "boiler," a paragraph or two that describes the company and its products.
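The rel="canonical" advice above is easy to spot-check: fetch a page and see whether its head declares a canonical URL. A minimal sketch using Python's standard html.parser; the page markup and URL here are invented for illustration:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Pull the href of <link rel="canonical"> out of a page's HTML."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

# Hypothetical page source, as returned by a fetch of some duplicate URL
html = """<html><head>
<link rel="canonical" href="http://www.example.com/widget"/>
</head><body>...</body></html>"""

finder = CanonicalFinder()
finder.feed(html)
print(finder.canonical)  # -> http://www.example.com/widget
```

Run against a handful of known duplicate URLs, a check like this quickly shows whether they all point search engines at the same preferred page, or at nothing at all.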