The most common reasons for content duplication – how to deal with them?

27 March 2013

Content duplication is a growing problem for website owners, and not only for e-commerce websites, which are most exposed to it because of their size. The problem also affects small websites and even company websites. This article is the first in a series on duplicate content. In the next text, we will describe the problem that may arise when our content is copied by other websites. Today we will deal with content duplication occurring within one website.

Main problems resulting from duplication of content

  • When it encounters duplicate content within one website, the Google robot decides which of the pages contains the original content. It sometimes happens that the robot selects incorrectly, thus the wrong URL appears in the search results.
  • If there is a large amount of repeated content, search engine robots may be less effective in searching for unique content. This phenomenon will result in fewer visits by search engine robots to our website. There is no need to write about the consequences of such an action. :)
  • If external links lead to different subpages with the same content, we must take into account the loss of "link juice". The most common example of this phenomenon is the presence of a home page under several URL addresses.
  • Another problem in the case of websites with a lot of duplicate content may be the amount of bandwidth used by search engine robots. Every day, many robots visit your website, and by downloading unnecessary duplicate content, they can slow down the operation of your website and significantly increase bandwidth consumption.

The graphics below show two sample charts from Google Webmaster Tools (GWT) with the number of kilobytes downloaded by Googlebot per day. If we take into account the robots of other search engines, we can quickly conclude that limiting unnecessary content on the website will save us a lot of bandwidth and money.

The most common reasons for duplicate content

There are many reasons why duplicate content occurs within one website. These are mostly technical reasons, because developers very often do not put much emphasis on website optimization.

(1) Same product available at several URLs

This is one of the most common examples of duplicate content on e-commerce websites. It results from the fact that while moving around the store, we can get to the product page from many categories. In other words, there are many different paths that will take us to a specific URL.

Let's assume that we want to buy a blue men's sweatshirt.

Examples of addresses through which we can get to the desired website can be:

  • http://www.domena.pl/bluzy/produkt-45
  • http://www.domena.pl/bluzy/meskie/produkt-45
  • http://www.domena.pl/bluzy/meskie/blue/produkt-45
  • http://www.domena.pl/bluzy/blue/produkt-45
  • http://www.domena.pl/bluzy/blue/meskie/produkt-45

Of course, this is just an example and in many cases one product may have many more URLs.

Solution: In this case, we suggest, as far as possible, rebuilding the structure of URL addresses so that all products are located in one directory, e.g. domena.pl/produkty/. Then, regardless of the path we choose, we will always end up at an address in the form http://www.domena.pl/produkty/produkt-45. Unfortunately, this solution is rarely implemented in existing websites.

Another way to avoid duplication of product page content is to introduce canonical pages, i.e. preferred versions of a certain set of pages with similar or the same content. To do this, place the following code in the head section of every page that is a duplicate:

<link rel="canonical" href="http://www.domena.pl/wlasciwy-adres" />

We thereby inform the Google robot that the address http://www.domena.pl/wlasciwy-adres is the original product page.
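To make the idea concrete, here is a minimal Python sketch of how a store could derive one canonical address from any of the category paths listed earlier and render the matching tag. It assumes that the product slug is always the last path segment and that canonical pages live under /produkty/; the function names are hypothetical and not part of any CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_product_url(url: str, products_dir: str = "produkty") -> str:
    """Map any category path that ends in a product slug to one address."""
    parts = urlsplit(url)
    # Assumption: the product slug is the last non-empty path segment.
    slug = [segment for segment in parts.path.split("/") if segment][-1]
    return urlunsplit((parts.scheme, parts.netloc, f"/{products_dir}/{slug}", "", ""))


def canonical_link_tag(url: str) -> str:
    """Render the canonical link element for the head section of a page."""
    href = escape(canonical_product_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.domena.pl/bluzy/meskie/blue/produkt-45"))
```

Every variant of the product address then points to the same preferred page, whichever category path the visitor used.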

(2) URL with WWW and without WWW

This is another very common example of repeated content. Please remember that Google treats the addresses http://www.domena.pl and http://domena.pl as two completely separate pages.

Solution: In this case, you should choose one address, e.g. http://www.domena.pl, and create a 301 redirect to it from http://domena.pl. Note that this redirection should be done 1:1, i.e. each page under one address should be redirected to the corresponding page under the other. Below you will find examples of redirections in the .htaccess file.

Redirection from WWW to non-WWW

RewriteEngine On
RewriteBase /
# Redirect every host other than yourdomain.pl (e.g. www.yourdomain.pl) to the non-WWW version
RewriteCond %{HTTP_HOST} !^yourdomain\.pl$ [NC]
RewriteRule ^(.*)$ http://yourdomain.pl%{REQUEST_URI} [R=301,L]

Redirection from non-WWW to WWW

RewriteEngine On
RewriteBase /
# Redirect every host other than www.yourdomain.pl to the WWW version
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.pl$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.pl%{REQUEST_URI} [R=301,L]
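The 1:1 rule can also be illustrated outside of Apache. The Python sketch below only models the mapping that such a redirect should perform: the host changes, while the path and query string stay untouched. It assumes the WWW version was chosen as the preferred one; PREFERRED_HOST and redirect_target are hypothetical names used for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: the WWW version of the domain is the preferred address.
PREFERRED_HOST = "www.domena.pl"


def redirect_target(url: str) -> Optional[str]:
    """Return the 301 target for a URL, or None when no redirect is needed."""
    parts = urlsplit(url)
    if parts.netloc.lower() == PREFERRED_HOST:
        return None
    # 1:1 mapping: only the host is replaced; path and query are preserved.
    return urlunsplit((parts.scheme, PREFERRED_HOST, parts.path, parts.query, ""))


if __name__ == "__main__":
    print(redirect_target("http://domena.pl/bluzy/produkt-45?kolor=blue"))
```

A helper like this can be used to spot-check a list of addresses: each non-preferred URL should redirect to exactly the address the function returns, not to the home page.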

(3) Pagination

It is hard to recall a website that did not have a problem with pagination. It all comes down to which of the subpages (1, 2, 3 or maybe 10) is the most important and which of them should appear highest in the search results.

Additionally, in most cases, all these pages have the same metadata: title and description.

Solution: In this case, we are not limited to one specific solution. We can choose between rel="canonical" and the rel="next" and rel="prev" link tags. Since canonical pages were described in the previous example, this time we will use the rel="next" and rel="prev" tags, which we also place in the head section of the page.

The implementation of this solution is not very complicated. Just follow this scheme:

  • On the first page of the pagination, we place only the rel="next" tag.
  • On the last page of the pagination, we place only the rel="prev" tag.
  • On the remaining pages, we add both the rel="next" and rel="prev" tags, which should point to the next and previous pages respectively.
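The scheme above can be sketched in a few lines of Python. This is only an illustration: the ?page=N address pattern and the function name pagination_link_tags are assumptions, and a real website would plug in its own URL structure.

```python
from typing import List


def pagination_link_tags(base_url: str, page: int, last_page: int) -> List[str]:
    """Return the rel="prev" / rel="next" tags for one page of a paginated list."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")
    tags = []
    # Every page except the first one points back to its predecessor.
    if page > 1:
        tags.append(f'<link rel="prev" href="{base_url}?page={page - 1}" />')
    # Every page except the last one points forward to its successor.
    if page < last_page:
        tags.append(f'<link rel="next" href="{base_url}?page={page + 1}" />')
    return tags


if __name__ == "__main__":
    for tag in pagination_link_tags("http://www.domena.pl/bluzy", page=2, last_page=3):
        print(tag)
```

The first page gets only the "next" tag, the last page only the "prev" tag, and every page in between gets both.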

(4) Pages to print

We encounter this problem just as often as the previously mentioned examples. It happens that the CMS provides users with the option to print the content.

Most often, by clicking the Print link, we are redirected to a new subpage with the same content, but with a different URL address, e.g.:

  • http://www.domena.pl/artykul1
  • http://www.domena.pl/artykul1-drukuj

Solution: In this case we suggest using rel="canonical", because this is one of the places where a 301 redirect does not work. Imagine that someone wants to print the content of a page and, after clicking Print, is redirected back to the previous page. If the printing option seems irrelevant to the site, you can simply remove it.

(5) Affiliate links/tracking parameters in URLs

Most of you have probably dealt with affiliate programs and are aware that tracking is done in this case by modifying URL addresses or, more precisely, by adding a parameter to the address, e.g. www.domena.pl/?partnerid-7653, where the ?partnerid-7653 element is kept for as long as the user navigates the website. This may, of course, lead to duplicate content.

Solution: One solution is to place the partner ID parameter after the # sign instead of ?. As we know, everything after the # sign is not indexed by the Google robot.

Another, slightly more complicated solution to the problem may be to create a 301 redirect (when clicking on the link to our website) to a website without a parameter specifying the partner's ID. This parameter can be stored in cookies.
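The redirect variant could look roughly like the Python sketch below. It only shows the URL handling: the function returns the clean address to redirect to and the partner ID that the application would then store in a cookie. It assumes the parameter is passed as a regular key=value pair such as ?partnerid=7653, which may differ from the format used by a given affiliate program; strip_partner_id is a hypothetical name.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_partner_id(url: str, param: str = "partnerid") -> Tuple[str, Optional[str]]:
    """Return (clean_url, partner_id); partner_id is None when the parameter is absent."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Remember the partner ID so that it can be stored in a cookie.
    partner_id = next((value for key, value in pairs if key == param), None)
    # Rebuild the query string without the tracking parameter.
    kept = [(key, value) for key, value in pairs if key != param]
    clean_url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return clean_url, partner_id


if __name__ == "__main__":
    print(strip_partner_id("http://www.domena.pl/bluzy?kolor=blue&partnerid=7653"))
```

When partner_id is not None, the server would respond with a 301 redirect to clean_url and set the cookie; otherwise the page is served as usual.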

Looking for duplications

First of all – Google

The easiest way to detect duplicate content on our website is to search for a few random sentences in Google. By combining a sentence from our website with the site: operator, we can check whether the text has been duplicated on other subpages. Let's use an example from our company website.

It is best to repeat this action with several subpages. You can find more about advanced queries in Google here.

Secondly – Google Webmaster Tools

More details about this tool and how to look for duplicate content can be found in our previous articles: "Google Webmaster Tools – why is it worth using them?" and "Google Webmaster Tools as a source of information about the technical condition of the website".

Thirdly – Xenu

This is a small program used to find broken links on our website. However, this is not its only function: by exporting the report to Excel, we can filter out all the pages with duplicate titles or descriptions. The software can be downloaded here.

Fourthly – Google Analytics

To look for duplicates, we must go to the Content category -> Site content -> All pages.

Then, select an additional dimension: Page Title. All we need to do next is export the data to Excel and filter out duplicates.
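The filtering step does not have to be done by hand in Excel. Below is a minimal Python sketch that groups exported rows by title and keeps only the titles shared by more than one address. The column names "Page" and "Page Title" are assumptions about the export and may need to be adjusted to the actual file; duplicate_titles is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def duplicate_titles(csv_text: str) -> Dict[str, List[str]]:
    """Map each title used by more than one page to the sorted list of those pages."""
    pages_by_title = defaultdict(set)
    # Assumption: the export has "Page" and "Page Title" columns.
    for row in csv.DictReader(io.StringIO(csv_text)):
        pages_by_title[row["Page Title"].strip()].add(row["Page"].strip())
    return {
        title: sorted(pages)
        for title, pages in pages_by_title.items()
        if len(pages) > 1
    }


if __name__ == "__main__":
    sample = (
        "Page,Page Title\n"
        "/artykul1,Artykul 1\n"
        "/artykul1-drukuj,Artykul 1\n"
        "/kontakt,Kontakt\n"
    )
    print(duplicate_titles(sample))
```

In the sample data, the article and its print version share one title, so they are reported together as a likely duplicate.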

In addition, in the Audience category -> Technology -> Network, by selecting the additional dimension Hostname, we can check whether we have a problem with duplication of the entire domain.

As all of you have probably noticed, Google has been placing great emphasis on content quality for some time now. It treats us with constant "animal attacks", forcing us to improve the quality of our websites. Therefore, 2013 will certainly be the year in which good content will gain in importance. With all this, we cannot forget about the problem of duplicate content. We hope that after reading this article, you will not have any problems identifying where this problem occurs and that you will be able to solve it yourself.
