On February 12, 2009, each of the major search engines announced support for a new “canonical URL tag” that allows website owners to eliminate various types of duplicate content from the search indexes. Duplicate content has long been a problem for both search engines (which have trouble identifying the main document) and webmasters (who suffer when search engines have trouble identifying the main document).
For a well-thought-out explanation of all this, I encourage you to watch a short video by Matt Cutts of Google’s web spam team about canonical URLs and duplicate content.
How does the canonical URL tag work?
The canonical URL tag is an HTML element that lives in the <head> area of your HTML document. By inserting it, you nominate your preferred version of a particular URL to show in the search engine index. It also lets you consolidate your link popularity on the selected URL rather than having it spread across any number of duplicate versions of the same page. It is kind of like an invisible redirect for the search engines.
The correct formatting when applying the canonical URL is as follows:
<link rel="canonical" href="http://www.shoppingcartstrategies.com" />
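If your pages come out of a template, you will probably want to generate this tag rather than type it by hand. Here is a minimal sketch in Python of how that might look. The function name canonical_link_tag is my own invention, not part of any shopping cart, and the sketch assumes you already know which URL you prefer.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Return a canonical <link> element for the preferred URL."""
    # Escape &, <, > and quotes so the URL is safe inside an attribute value.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.shoppingcartstrategies.com"))
```

Note the straight double quotes around the attribute values. Curly "smart" quotes pasted in from a word processor are not valid attribute delimiters in HTML.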
Now that we are all familiar with how the canonical URL tag looks and where to put it, let's look a bit deeper at how it all works.
Basic Example of Software-Produced Duplicate Content
Sadly, most forms of duplicate content are accidentally created through added functionality that is built into your shopping cart software. For example, a lot of shopping carts will have a function that will allow you to sort the products within a category by price — viewing the lowest priced item through to the highest priced item (or vice versa). Generally this type of functionality will add a series of parameters to the end of the URL of the category that you are viewing (see diagram below).

As you can see, what is a very useful feature implemented by your software provider for your customers can inadvertently cause you problems with the way that search engines treat and index your site.
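To get a feel for how many duplicates a sort feature creates, you could group a list of crawled URLs by everything except the query string. The sketch below is only an illustration: the example.com URLs and the "sort" and "dir" parameter names are made up, and it assumes the query string never changes what the page is about. That assumption will not hold for carts that use parameters for pagination or product IDs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path(urls):
    """Group URLs that differ only in their query string or fragment."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Scheme, host and path identify the page; the query string is ignored.
        groups[(parts.scheme, parts.netloc, parts.path)].append(url)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical category URLs; "sort" and "dir" are made-up parameter names.
    crawled = [
        "http://www.example.com/widgets",
        "http://www.example.com/widgets?sort=price&dir=asc",
        "http://www.example.com/widgets?sort=price&dir=desc",
        "http://www.example.com/gadgets",
    ]
    for key, variants in group_by_path(crawled).items():
        if len(variants) > 1:
            print(f"{len(variants)} variants of {key[2]}")
```

Any group with more than one member is a candidate for a canonical URL tag pointing at the plain category URL.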
Web Analytics and Duplicate Content
Before I identify how potential duplicate content can occur through web analytics, I must stress: if you are not using any type of web analytics to track your website performance, START NOW!
Typically, if you are purchasing any type of advertising or are serious about social media, you will be tagging your links to make sure you are getting maximum value for your time and money. For example:
http://www.shoppingcartstrategies.com/?utm_source=yahoo&utm_medium=ppc&utm_campaign=twitter_coupons
Fortunately, if you have implemented the canonical URL tag described above, you should not have to worry about duplicate content from tagged links, as the search engine spiders would disregard the tracking parameters in favor of your main URL.
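If you need to work out the main URL on the server before writing it into the tag, one approach is to drop the campaign parameters from the requested URL. This is a rough sketch rather than a finished solution: the strip_tracking name is hypothetical, and it assumes every tracking parameter starts with "utm_", as in the example link above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIX = "utm_"  # assumption: all campaign tags share this prefix


def strip_tracking(url: str) -> str:
    """Return the URL without utm_* query parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(TRACKING_PREFIX)
    ]
    # An empty query string is omitted entirely, so no stray "?" is left behind.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    tagged = (
        "http://www.shoppingcartstrategies.com/"
        "?utm_source=yahoo&utm_medium=ppc&utm_campaign=twitter_coupons"
    )
    print(strip_tracking(tagged))
```

Parameters that do not match the prefix are kept, so a URL that needs a product ID in the query string would keep it.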
Shopping Cart Developers Adopting Canonical URL Tag
Interspire Shopping Cart
Magento Shopping Cart
osCommerce Canonical URL
If you know of any other shopping carts that have implemented the canonical URL, or that will have third-party modules developed to support it, please let us know in the comments below and we’ll add them to our list.
Additional Reading
Also, if you have implemented the canonical URL link element on your site, please let us know how the search engines are handling it.
5 comments
This is something that every shopping cart, CMS and website should have. It's simple to set up and makes a website so much more SEO friendly and human friendly. I cannot think of many tools that benefit both sides.
Thanks for the comment Nigel.
I have had some success using the canonical URL element and agree that it really is a no-brainer to implement.
I do not even know why website software doesn't have it set up by default. Maybe all the programmers had no idea that search engines like keywords in the address and that humans can read the link a lot more easily than a bunch of random numbers and letters.
Nigel, it would be nice if all software developers were able to “correctly” implement these types of solutions, but at the end of the day duplicate content can be a really tricky business, particularly with e-commerce applications. We all want feature-rich applications that our customers are able to interact with, but on the flip side the database queries needed to create that rich functionality make pages hard for search engines to index. Good programmers who have their heads around SEO issues are worth their weight in gold.
Eric Enge recently posted a fantastic interview with Matt Cutts that highlights these difficulties — http://www.stonetemple.com/articles/interview-m…
That was a long post but a great interview with Matt Cutts. It answered a number of things that I have been wondering about. I have created a post for the general website owner with some useful questions and answers for those types of people – GoogleBot Questions: From Eric Enge’s interview with Matt Cutts from Google