Duplicate Content Case Study
In this article, I’ll explain how to make a profit from thin air by eliminating duplicate content from your website. No, this is not some shady ‘get-rich-quick’ scheme full of hollow promises. This is a real example of how the identification, diagnosis and elimination of duplicate content became the single biggest contributory factor in a 145% increase in the number of website visitors from free search (aka organic search or natural search) for one of our clients. Just to be clear, an increase of 145% means that the number of visitors was 2.45 times the number in the same period of the previous year. We followed Google’s guidelines on this topic throughout.
The Benefits of Eliminating Duplicate Content
The graph below shows the number of organic visits in the period January 2010 to December 2011 inclusive.
AdJuice was engaged by Cavendish French towards the end of March 2011. Cavendish French specialise in a wide range of handcrafted silver jewellery, including stone set silver jewellery with contemporary, classic and vintage inspired collections. We started addressing the issue of duplicate content in May 2011 and the effects started to show almost immediately. Since the benefit really only started to be felt in the second half of the year, we have calculated the 145% increase by comparing the number of visits in the period 1st July to 31st December 2011 with the number of visits in the same 6 months of 2010. If you compare the whole of 2011 with the whole of 2010, the increase is still 70%. What might that growth have been but for the dire economic circumstances we face?
Content May Be King But More is Not Always Better
Sounds like a contradiction, I know. In the world of SEO, content is king. As a rough rule of thumb, more content brings more visitors to your website. Where this does not apply is where the content does not meet the requirements of being useful, unique and relevant. So more content of a duplicate nature not only brings no incremental benefit but may adversely impact the performance of your site as a whole. In the past, it was just the duplicates that got ignored by Google. Since Google’s ‘Panda Update’ in 2011, duplicate content and other content with low utility value may also impair the performance of the good content (in search marketing terms).
It has never been more important therefore to attend to these kinds of issues. Fortunately Google provides lots of guidance on this problem to help website owners to get the best performance from their sites.
In the summer of 2011, our review led to the elimination of 90% of the pages that google.co.uk had in its index for www.cavendishfrench.com. The number of pages of their website indexed by Google was reduced from over 22,000 to just over 2,000. You can see the number of pages in Google’s index for any website by entering “site:www.anywebsite.com” into Google’s search box. Thanks go to Cavendish French for trusting our recommendations and also for permission to publish this.
So, What is Duplicate Content?
Duplicate content comes in many different forms. To the layman, it may suggest that a web page has been copied, i.e. the content on the page has been reproduced on another page of another website or on another page of the same website. Such copying may be legal, illegal, plagiarised and so on.
Ironically, there are vast numbers of cases where website performance is being adversely impacted by duplicate content issues but nobody has knowingly copied anything. Duplicate content exists for a whole variety of technical reasons and I can’t cover them all here. The scope of this article extends only to the circumstances that directly impacted this project. For those that are interested to read up on all the forms of duplicate content that can exist and how to address them, I have included a couple of links at the bottom of this article to excellent resources.
The most common instances of duplicate content in this project related to URL issues. These problems arise where exactly the same or substantially the same content can be accessed in different ways and found at different URL addresses. This problem is very common with online shopping sites.
To get specific, when browsing the Cavendish French site, the shopper has the option to display 12, 20 or 40 products on a page. So, when you arrive on the ‘silver bangles’ page, this is the URL (page address) you first see in your browser’s address bar.
The default is set to display 12 products on the above page. If you then choose to display 20 or 40 items per page, the URL addresses change to these, respectively.
And if you change back to display 12 products, the page URL is not back where you started but is this one.
So there are 4 page URLs (so far) for ‘silver bangles’. These are all perfectly valid options for the shopper, so there is no fault with the design from a usability or UX (user experience) point of view. The problem is that, as far as search engines are concerned, these are all technically different pages with the same content and may therefore be considered copies or duplicates of each other. The task the search engines face is deciding which one of these pages to show for a search on ‘silver bangles’.
Read on because it gets more interesting.
The shopper also has the option of sorting the products into descending order of price or ascending order of price (this is the default). Sorting the default page into descending order of price creates this additional URL.
And for 20, 40 and (back again to the default of) 12 items per page, these URLs.
You can start to see how the number of technically different page addresses, for essentially the same products, is beginning to escalate. In addition to being able to select the number of items on each page and the price order in which they are sorted, the shopper can also select which page they want to view, i.e. 1st, 2nd, 3rd etc. These options create yet more URLs.
These options can all be combined with each other so the number of URLs compounds. The 100 bangles that currently exist could therefore be displayed as follows.
100 products displayed 12 per page = 9 pages or
100 products displayed 20 per page = 5 pages or
100 products displayed 40 per page = 3 pages.
That makes 9 + 5 + 3 = 17 page addresses so far. All of these could be sorted into descending or ascending price order so the number of pages increases to 17 x 2 = 34. I’ve ignored other combinations involving the default page addresses because I think that’s enough to illustrate the point! One page existing in 34 different forms, each with a different page address, but all displaying what could be considered to be essentially the same content.
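The arithmetic above can be sketched in a few lines of code. This is a minimal illustration only: the query parameter names (`limit`, `dir`, `p`) are hypothetical and not necessarily those used on the real site.

```python
import math
from itertools import product

TOTAL_PRODUCTS = 100          # the bangles that currently exist
PAGE_SIZES = (12, 20, 40)     # items-per-page options offered to the shopper
SORT_ORDERS = ("asc", "desc") # ascending (default) or descending price

urls = []
for size, order in product(PAGE_SIZES, SORT_ORDERS):
    pages = math.ceil(TOTAL_PRODUCTS / size)  # 9, 5 or 3 pages respectively
    for page in range(1, pages + 1):
        # 'limit', 'dir' and 'p' are illustrative parameter names
        urls.append(f"/silver-bangles?limit={size}&dir={order}&p={page}")

print(len(urls))  # (9 + 5 + 3) * 2 = 34 distinct page addresses
```

Every one of these 34 addresses displays some slice of the same 100 products, which is exactly why a search engine may treat them as duplicates of each other.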
There are a number of different approaches to dealing with duplicate content, depending on the exact circumstances. We opted to make use of the rel=“canonical” link element. This means, firstly, making a decision about which page address, out of all the possible alternatives, is the ‘preferred’ or ‘canonical’ version of the page address.
Once that decision had been made, we had to include a small code snippet in the head section of all the relevant pages (i.e. all 34 pages above and any other duplicates that may exist for ‘silver bangles’).
This is not interpreted by Google as a command but rather a request or a signal that we wish this page address to be treated as the primary or canonical version and that all other similar pages are effectively copies and should therefore be subordinated to this one.
The page of this blog post you are reading has a rel=canonical link in the code.
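As a minimal sketch of the idea, the snippet is just a link tag in the page’s head pointing at the preferred address. The helper below derives a canonical URL by stripping the presentation-only parameters; the parameter names are assumptions for illustration, not the ones actually used in the project.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Parameters that change presentation, not content (names illustrative)
NON_CANONICAL_PARAMS = {"limit", "dir", "p"}

def canonical_link_tag(url: str) -> str:
    """Strip presentation-only query parameters and emit the <link>
    tag that would go in the <head> of every variant of the page."""
    scheme, netloc, path, query, _ = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in NON_CANONICAL_PARAMS]
    canonical = urlunsplit((scheme, netloc, path, urlencode(kept), ""))
    return f'<link rel="canonical" href="{canonical}" />'

print(canonical_link_tag("http://www.cavendishfrench.com/silver-bangles?limit=40&dir=desc&p=2"))
# -> <link rel="canonical" href="http://www.cavendishfrench.com/silver-bangles" />
```

All 34 variant addresses would carry the same tag, each pointing at the single preferred version of the page.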
Now Google and the other search engines know which page we intend to be shown, and that is what happened. This has several effects. It increases the conversion rate of the product category by increasing the chances that the visitor is shown the best page, i.e. the one that we have chosen. It also concentrates the link equity of the site by focusing it on fewer pages. That means better rankings.
Better rankings coupled with better conversion rates means more profit. Out of thin air.
There is sound logic behind why this outcome is reasonable and plausible. Google’s primary aim is to provide the best user experience by returning the most relevant and useful results as fast as possible to surfers. Those website owners that help Google to achieve this by setting up their websites in accordance with Google’s guidelines are bound to fare better.
De-indexation of Other Low Value Pages
Whilst we were at it, we also endeavoured to de-index some other categories of pages which added no value by being in Google’s index. Some of these pages were de-indexed using the straightforward ‘meta noindex’ directive. For others, we used the URL parameter settings in Google Webmaster Tools for a ‘belt and braces’ approach, and this angle dealt with de-indexing another 6,000-odd pages.
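A ‘meta noindex’ is simply another tag in the page’s head section. The sketch below shows the idea; the rule for deciding which pages count as low value (here, the presence of hypothetical presentation parameters) is an assumption for illustration, not the rule used in the project.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical presentation-only parameter names, for illustration
PRESENTATION_PARAMS = {"limit", "dir", "p"}

def robots_meta(url: str) -> str:
    """Emit the robots meta tag for a page's <head>. Low-value URL
    variants get 'noindex, follow': search engines drop them from the
    index but still follow their links."""
    params = {k for k, _ in parse_qsl(urlsplit(url).query)}
    if params & PRESENTATION_PARAMS:
        return '<meta name="robots" content="noindex, follow" />'
    return '<meta name="robots" content="index, follow" />'

print(robots_meta("/silver-bangles?limit=40&p=2"))
# -> <meta name="robots" content="noindex, follow" />
```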
Where is The Proof?
In the strictest sense, there is no absolute proof that it was the elimination of the duplicates from the index that led directly to the improved website performance. However, the circumstantial evidence is overwhelming in terms of timing. Although there was some of the other usual on-site work going on (meta tags etc.), this was not extensive, so I’m not in any doubt about cause and effect.
Did you find this post useful, or have you had any similar experience? If so, please feel free to add your comment below, help others find this post by using the social sharing buttons below, or subscribe for future updates using the RSS Posts feed at the top of the page.