SEO Tips – How to Avoid Duplicate Pages?

One of the penalties that Google can apply to a site is based on the presence of duplicate pages.

What should you do if a website has duplicate pages, that is, pages that differ only in their URL but show the same content?

If you search for your website or other websites on Google, you can find different pages with the same content. The only difference is the URL, for example:

yoursite.com

www.yoursite.com

yoursite.com/index.htm

or wiki pages...

wikiname.com/product1
wikiname.com/article/product1
wikiname.com/category/product1

We must avoid this situation because Google spends its time checking these duplicate pages instead of the new pages or new content on our website.
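As a rough illustration of how the home page variants listed above collapse into one page, here is a minimal sketch for Node.js. The function names `normalizeUrl` and `findDuplicateGroups` are hypothetical, and the sketch assumes that a leading "www." and a trailing index file are the only differences between the URLs. It cannot catch cases like the wiki example, where the paths are different and only the content is the same.

```javascript
// Hypothetical helper: reduce the URL variants of one page to a single key.
// Assumes "www." and a trailing index file are the only differences.
function normalizeUrl(rawUrl) {
  // Add a scheme when missing so the URL parser accepts "yoursite.com".
  const withScheme = /^https?:\/\//i.test(rawUrl) ? rawUrl : 'http://' + rawUrl;
  const url = new URL(withScheme);
  const host = url.hostname.replace(/^www\./, '');
  // Drop a trailing index.htm, index.html, index.php or index.asp.
  const path = url.pathname.replace(/\/index\.(html?|php|asp)$/i, '/');
  return host + path;
}

// Group URLs that collapse to the same key; groups with 2+ entries are suspects.
function findDuplicateGroups(urls) {
  const groups = new Map();
  for (const u of urls) {
    const key = normalizeUrl(u);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(u);
  }
  return [...groups.values()].filter((g) => g.length > 1);
}
```

Feeding it the URLs reported by a crawl or a server log gives a quick list of candidates to check by hand.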

Here is a list of tips on how to avoid this problem and thus improve how Google indexes the pages of a site.

Use Google WebMaster Tools

If you do not already know them, these are tools provided by Google to help webmasters get their websites indexed.

URL > http://www.google.com/webmasters/tools/

Google Webmaster Tools - Dashboard

After logging in, you can select your website from the list and manage it.

In the side menu there is the Diagnostic submenu, which gives you suggestions and warnings about your website, for example Duplicate Meta Descriptions and Duplicate Title Tags. The Diagnostic tool lists the URLs that share the duplicate tags, so you can check them and solve the problem.

Google Webmaster Tools - Duplicates Tags

Delete the Duplicate Pages from the XML Sitemap

Check your sitemap: if it contains the URL of a duplicate page, delete that entry and make sure that only the original version of the page remains in the XML sitemap!

Do not do this


<url>
  <loc>http://www.yourname.it/</loc>
  <priority>1.00</priority>
  <lastmod>2009-12-27T14:22:55+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>http://www.yourname.it/index.html</loc>
  <priority>0.80</priority>
  <lastmod>2009-03-11T14:22:55+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>
...

and do this


<url>
  <loc>http://www.yourname.it/</loc>
  <priority>1.00</priority>
  <lastmod>2009-12-27T14:22:55+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>http://www.yourname.it/category1.html</loc>
  <priority>0.80</priority>
  <lastmod>2009-03-11T14:22:55+00:00</lastmod>
  <changefreq>daily</changefreq>
</url>
...
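The check described in this section can be roughly automated. The sketch below, for Node.js, lists the `<loc>` entries that point at an index file when the matching directory URL is also in the sitemap. The function name `findIndexDuplicates` is hypothetical, and the regular expression assumes a simple sitemap like the ones above (no CDATA sections, no namespace prefixes).

```javascript
// Hypothetical check: list sitemap <loc> entries that point at an index file
// whose directory URL is also listed (for example "/" and "/index.html").
function findIndexDuplicates(sitemapXml) {
  // Collect every <loc> value; a regex is enough for this flat format.
  const locs = [...sitemapXml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
  const listed = new Set(locs);
  return locs.filter((loc) => {
    // Strip a trailing index file to get the directory URL.
    const dir = loc.replace(/\/index\.(html?|php|asp)$/i, '/');
    return dir !== loc && listed.has(dir);
  });
}
```

Every URL it returns is a candidate for removal from the sitemap.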

The Error of the Index Page

On a web page, the main menu, the logo and the H1 heading usually link to the home page. Remember that the home page can be reached through different URLs:

yoursite.com
www.yoursite.com
yoursite.com/index.htm

The tip is: do not do this:

<a href="index.html" title="your title" name="your-title">

but do this:

<a href="/" title="your title" name="your-title">

...and in this case too, remove index.html (or index.php, index.asp, etc.) from your XML sitemap and list only www.yoursite.com
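If a template contains many such links, the rewrite can be scripted. This is only a sketch for Node.js: the function name `stripIndexFromLinks` is hypothetical, it only handles double-quoted href attributes, and it assumes that a bare "index.html" link sits on a page in the site root, so that "/" is the right replacement.

```javascript
// Hypothetical clean-up: rewrite href values that end in an index file
// so that they point at the directory itself ("index.html" becomes "/").
function stripIndexFromLinks(html) {
  return html.replace(/href="([^"]*)"/gi, (match, target) => {
    // Only a whole file name "index.*" at the end of the path is removed.
    const cleaned = target.replace(/(^|\/)index\.(html?|php|asp)$/i, '/');
    return 'href="' + cleaned + '"';
  });
}
```

Links to other files, such as about.html, are left untouched.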

HTTP 301 Status Code

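If you have one or more pages with the same content, you can redirect them to the original page using the 301 status code, a simple and useful redirect.

Googlebot receives the message "Moved Permanently", so it will index only the original page.

You can redirect in different ways. Only a server-side redirect, such as the PHP one shown last, actually sends the 301 status code. META Refresh and JavaScript redirects run in the browser and send no status code, so use them only when you cannot change the server response.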

Redirection with META Refresh (the easiest way!)

<META HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://www.website-name.com/original-page.html">

Redirection with Javascript

<html>
<head>
<script type="text/javascript">
window.location.href='http://www.website-name.com/';
</script>
</head>
<body>
This page has moved to <a href="http://www.website-name.com/">http://www.website-name.com/</a>
</body>
</html>

HTTP 301 Redirect in PHP

<?php
// Permanent redirection
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.website-name.com/");
exit();
?>
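For sites served by Node.js rather than PHP, the same permanent redirect could look roughly like this. The `redirects` map and the `redirectHandler` name are hypothetical; the handler has the shape expected by Node's `http.createServer`.

```javascript
// Hypothetical map of duplicate paths to the original page.
const redirects = new Map([
  ['/index.html', '/'],
  ['/index.php', '/'],
]);

// Request handler in the shape expected by Node's http.createServer.
function redirectHandler(req, res) {
  const target = redirects.get(req.url);
  if (target !== undefined) {
    // 301 tells crawlers that the move is permanent.
    res.writeHead(301, { Location: target });
    res.end();
    return;
  }
  // Any other path is served normally (placeholder body).
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('original page');
}

// To try it locally:
// require('http').createServer(redirectHandler).listen(8080);
```

Keeping the duplicate paths in one map makes it easy to see which URLs have been retired.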

For the complete list of permanent redirect methods (Perl, ColdFusion, ASP, etc.), see: Permanent Redirect with HTTP 301

...but the 301 redirect is not the only option!

The Canonical Page

If you have duplicate pages, you can add this code to the head section of each duplicate:

<link rel="canonical" href="http://www.website-name.com/original-page.htm"/>

The href can be a relative or an absolute URL, although an absolute URL is the safer choice. The canonical link is a suggestion for Googlebot, not a directive.
Remember: use it only if you cannot delete the duplicate pages or their content. The content of the pages must be identical!
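When the duplicates are generated by a script, the tag can be generated too. The sketch below, for Node.js, builds the canonical tag from a page URL. The names `escapeAttribute` and `canonicalTag` are hypothetical, and dropping the query string and fragment is an assumption that only holds when they do not change the page content (session ids, tracking parameters and the like).

```javascript
// Escape characters that would break out of an HTML attribute value.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the canonical tag for a duplicate page; query string and fragment
// are dropped on the assumption that they do not change the content.
function canonicalTag(pageUrl) {
  const url = new URL(pageUrl);
  url.search = '';
  url.hash = '';
  return '<link rel="canonical" href="' + escapeAttribute(url.href) + '"/>';
}
```

The result goes in the head section of every variant of the page, including the original.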

Duplicate Pages and Robots.txt

Another simple tip to solve the problem of duplicate pages! In robots.txt you can list the directories and files that Googlebot should not crawl.

How do you do this? It's simple!

User-Agent: *
Disallow: /directory/subdirectory/
Disallow: /directory/file.html
Allow: /

In this way Googlebot does NOT crawl /directory/subdirectory/ and /directory/file.html, but it crawls everything else. With Google Webmaster Tools you can automatically generate your robots.txt in a few clicks.
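To see why the Allow: / line does not cancel the two Disallow lines, here is a simplified sketch for Node.js of how the rules are evaluated, following the longest-match rule that Google documents for Googlebot. The `rules` array and the `isAllowed` function are hypothetical names, and wildcards ("*" and "$") are not handled.

```javascript
// Rules from the example above, as [directive, pathPrefix] pairs.
const rules = [
  ['Disallow', '/directory/subdirectory/'],
  ['Disallow', '/directory/file.html'],
  ['Allow', '/'],
];

// Simplified check: the longest matching prefix wins, and Allow wins a tie.
// Wildcards ("*" and "$") are not handled in this sketch.
function isAllowed(path, ruleList) {
  let best = null;
  for (const [directive, prefix] of ruleList) {
    if (!path.startsWith(prefix)) continue;
    const longer = best === null || prefix.length > best.prefix.length;
    const tieAllow =
      best !== null && prefix.length === best.prefix.length && directive === 'Allow';
    if (longer || tieAllow) best = { directive, prefix };
  }
  // No matching rule means the path may be crawled.
  return best === null || best.directive === 'Allow';
}
```

A path under /directory/subdirectory/ matches both that Disallow rule and Allow: /, and the longer Disallow prefix wins.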

robots.txt generated with Google Webmaster Tools

For more information about robots.txt, visit the official website: http://www.robotstxt.org/

Comments/Suggestions are welcome!
