Are you copying content from another website to your own? Or copying content from one page to another within your website? Whatever the reason, if content is getting duplicated on your website, you have to take a look at the effects of duplicate content on SEO, and action is very much needed.
I understand why content gets duplicated, and many times one cannot avoid duplicating it: in highly demanding situations, when the time and money invested matter, reproducing content of the same quality is painful.
So what happens when content is duplicated and you let the search engines crawl all of the copies?
I would say there is a very high chance of your main website losing organic traffic, depending on the quantity of duplication, the type of duplication, how long the duplicate content remained on the website, and the type of action taken towards it.
Let’s understand technically what’s going wrong:
Depending on the type of duplication, one of the following problems, or all three of them, may occur on your website.
Firstly, crawl resource wastage:
Whenever crawl effort is invested in the wrong/invalid web pages, we don't see progress on the right pages: for the valid pages to perform, bots need to visit them often. Whenever bots invest their time on the wrong pages, we run into the problem of losing rankings for many keywords pertaining to the valid web pages, and if this continues for a prolonged period there is also a risk of those pages eventually being removed from the Google index.
Secondly, devaluing or deprioritizing the website:
When Google's servers already have one copy of a piece of content, they prefer not to invest effort again to fetch the exact same copy. They actually ignore the issue or warn us when there are very few such errors, but when such issues are repeated too many times, their filters start investing less time on our website (devaluing the website), visiting it very rarely or not at all.
Thirdly, the cannibalization issue: pages competing with each other and traffic split:
The website can also run into conflicts such as web page cannibalization, where we have to give up our rankings/hold to competitors on the search engines, or eventually all the web pages caught in this duplicate conflict get removed from the Google index, and therefore the traffic coming from those web pages is lost as well.
Different cases/types of content duplication:
Let me take you through the different cases of content duplication and also brief you on how each should be handled whenever you think content is getting duplicated on your website.
Case 1: Exact replication of a complete website's content:
There are cases where companies have to replicate their complete website content from the parent domain to a new domain for some reason. The parent domain is the one on which SEO is implemented; on the new domain they do not implement SEO, as it is used for marketing activities other than search engine optimization, or for some other reason.
In this case, the new website may impact the performance of your main domain once search engines figure out it is all exactly the same as the main domain. The problem is that the impact is not limited to the new domain: the old/main website will also see a negative effect, as discussed in the three problems of content duplication above.
Solution:
Adding noindex, nofollow:
Implement this meta tag in the head section of your web pages across the entire website:
<meta name="robots" content="noindex,nofollow" />
This tag tells bots not to index any piece of the content in the search engines and not to follow any of the links on those pages.
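For illustration, here is how the tag might sit in the head section of any page on the replicated website (example.com is a placeholder domain):
<head>
  <title>Any page on the replicated domain - example.com</title>
  <!-- Keep this page out of search engine indexes and stop link following -->
  <meta name="robots" content="noindex,nofollow" />
</head>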
Uploading a robots.txt file:
Create a robots.txt file with the following directives and upload it to the root directory of the website. This tells bots not to crawl the website, but there is no guarantee that the web pages will not be indexed, hence we have to implement the noindex, nofollow tag as well.
User-agent: *
Disallow: /
Case 2: Content syndication:
When you are syndicating content partially or completely from other sources, both your website and the source website run the risk of a traffic split, or one may completely lose out to the other.
This kind of issue usually happens with news websites, where the same information or news is served across several other websites.
Solution:
In this case, a syndication meta tag helps bots understand which is the original content and which is a copy of it. The page cannot simply be blocked, as we may want bots to visit it, crawl the internal links and the unique elements of the content, and possibly index the page and help it rank for as many keywords as possible.
<meta name="syndication-source" content="http://www.yourwebsite.com/story.html">
(Or)
<meta name="original-source" content="http://www.yourwebsite.com/lateststory1.html">
The above syndication meta tag has to be added to the page carrying content syndicated from the other website, i.e. the page that is not the original content.
Alternatively, instead of the syndication meta tag, a cross-domain canonical tag pointing to the original article can also be added.
Case 3: Similar web pages within the website, with minor changes:
In this case, you may have a web page with different listings that needs to be crawled by search engines, but a major chunk of the text, including the meta tags, is the same as on the original page, and you don't want to block it for bots.
Solution:
Implement a canonical tag:
<link rel="canonical" href="http://example.com/url" />
This tag has to be implemented in the head section of the HTML on the duplicate page, pointing to the original page: here example.com/url is the original URL, and the canonical tag goes in the head section of the duplicate page, let's say example.com/url1.
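For illustration, assuming the duplicate page lives at example.com/url1 and the original at example.com/url (both placeholder URLs), the head section of the duplicate page might look like this:
<head>
  <title>Duplicate listing page - example.com/url1</title>
  <!-- Point search engines to the original version of this content -->
  <link rel="canonical" href="http://example.com/url" />
</head>
The original page example.com/url can carry a self-referencing canonical tag of its own, which is optional but good to have.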
Case 4: Pagination Issues:
Paginated pages are not exactly duplicate content, but they can be considered mostly redundant pages with largely the same titles and descriptions, even though the content placed on each page can be unique.
Solution:
In this case, you could implement a canonical tag on each page pointing to the first page. But since this is a paginated series with many URLs that differ only slightly in their common content block while carrying completely different listings, the plain canonical is not the best implementation; pagination tags combined with canonical tags are the right option.
Tags to be put on page 1:
The current page points to the next page's URL, along with a canonical pointing to itself:
<link rel="next" href="http://www.example.com/topic/page/2" />
<link rel="canonical" href="http://www.example.com/topic/" />
The canonical tag above is not compulsory, but it's good to have as per Google's suggestions. It is the same as a self-referencing canonical and hence helps your web pages still perform better in typical cases where unknown duplicates/versions of the pages get created.
Tags to be put on page 2:
The current page points back to the previous page and forward to the next page, along with its own self-referencing canonical:
<link rel="prev" href="http://www.example.com/topic/" />
<link rel="next" href="http://www.example.com/topic/page/3/" />
<link rel="canonical" href="http://www.example.com/topic/page/2/"/>
Tags to be put on page 3:
Page 3 only has a reference to the previous page since it’s the last one in the sequence:
<link rel="prev" href="http://www.example.com/topic/page/2/" />
<link rel="canonical" href="http://www.example.com/topic/page/3/" />
Note: canonical and pagination meta tags do not make the duplicate pages themselves perform on search engines.
Case 5: Major overlap, yet the page needs to perform on search engines:
Here the web page has a major overlap of content with its parent page, but the child page needs to be allowed for search engines to crawl and index, and the page also has to perform and contribute traffic.
None of the above implementations will help you make such pages perform to their full potential, or they may not perform at all. In this case, however, we are going to handle the duplicate content so that both the original page and the duplicate page perform to their full potential on the search engines.
Note: in this case there is major content duplication, let's say 90%, but these pages need to perform regardless of the overlap, and enabling them to perform is equally important.
Solution:
Identify the common blocks of content and the unique blocks, and make only the unique blocks accessible to bots. Since bots won't be able to see the common blocks, search engines give weightage to these pages as well and allow them to perform regardless of the content overlap with the parent page.
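As an illustration only, and assuming the shared block is served from a hypothetical /shared/common-block.html URL that is itself kept out of the index, one possible way to do this is to keep only the unique block in the page's HTML source and pull the common block in client-side after the page loads:
<div id="unique-content">
  <!-- Unique copy and listings for this child page stay in the HTML source -->
</div>
<div id="common-content"></div>
<script>
  // Fetch the shared block after load, so the raw HTML that bots download
  // contains only the unique content. The URL and element IDs are illustrative.
  fetch('/shared/common-block.html')
    .then(function (response) { return response.text(); })
    .then(function (html) {
      document.getElementById('common-content').innerHTML = html;
    });
</script>
Keep in mind that modern search engines can render JavaScript, so how much of the common block actually stays hidden depends on the implementation; treat this as a sketch of the idea rather than a guaranteed recipe.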
Case 6: Complete Content Translation:
If the website has translated content, then it can be considered unique content distinct from the main site, and those pages can be allowed for search engines to crawl and index.
A manual translation, rather than an auto-translation, will be of great value/use to the end user; either way, from the search engines' point of view this is not duplicate content.
Solution:
The translated content can be allowed for search engines to crawl and index, with a small implementation of a language (hreflang) tag:
<link rel="alternate" hreflang="lang_code" href="url_of_page" />
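For illustration, assuming an English page at www.example.com/page/ and its manually translated Spanish version at www.example.com/es/page/ (placeholder URLs), both pages would carry tags like these in their head section:
<!-- Tell search engines these pages are language alternatives, not duplicates -->
<link rel="alternate" hreflang="en" href="http://www.example.com/page/" />
<link rel="alternate" hreflang="es" href="http://www.example.com/es/page/" />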
Conclusion:
Content duplication, when not handled properly or when poorly understood, may put your business in trouble. Once you understand the purpose of duplicating the existing content, you also have to work towards achieving that objective without affecting the original content or the system.