Publishing duplicate content can land you in serious trouble. If your site depends on traffic from search engines, its content has to be unique and plagiarism-free.


What is duplicate content

There are a lot of myths around what duplicate content actually is. According to Google, here’s what duplicate content means in the context of a penalty:

There are some penalties that are related to the idea of having the same content as another site—for example, if you’re scraping content from other sites and republishing it, or if you republish content without adding any additional value.

From this, we can safely interpret that duplicate content is especially dangerous when it is copied or plagiarised.

Here’s what Google means when they say penalty:

In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.

It may not be your intention to deceive anyone or to manipulate your site’s rankings on Google, but consider this case: your writers may have handed you copied content just to bill you for it.

Whatever the case, it is best to stay on the safe side by making sure your content is unique and provides value to your users.


How to check duplicate content

Whether it is the text on your product pages or the posts on your blog, it is critical to check that the content you’re posting is not a duplicate of content on other domains.

The catch is that, in the eyes of Google, your content counts as copied only when it is taken from a source that Google has already crawled and indexed.

So our tool has to tell us whether the content we’re planning to publish is already on Google. And who can tell better whether content has been indexed on Google than Google itself?

It is reasonable to think of using Google Search as a duplicate content checker. Try pasting the content of a blog post into the search field, and you will run into this limit:



Your search query is limited to 32 words. Each word is also limited to 128 characters.

So Google is not an efficient way to run a plagiarism check, especially on large swathes of content.

Of course, if the content you want to look up is 32 words or fewer (and no word exceeds 128 characters), there is no better tool than Google for the task.
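To see what checking a longer piece through Google would involve, here is a minimal Python sketch that splits a text into search-sized pieces. It assumes the 32-word and 128-character limits quoted above; the function name `split_into_queries` is made up for illustration, and the code only builds the query strings without contacting Google.

```python
MAX_WORDS = 32       # per-query word limit quoted in the article
MAX_WORD_LEN = 128   # per-word character limit quoted in the article


def split_into_queries(text, max_words=MAX_WORDS, max_word_len=MAX_WORD_LEN):
    """Split text into consecutive chunks that each fit the search limits."""
    # Drop any token longer than the per-word limit (for example a very long URL).
    words = [w for w in text.split() if len(w) <= max_word_len]
    # Group the remaining words into runs of at most max_words.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(70))
    for number, query in enumerate(split_into_queries(sample), start=1):
        print(number, len(query.split()), "words")
```

By this count, a 1000-word post would need 32 separate searches, each pasted in by hand, which is why this approach only suits short snippets.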

For anything longer, there are online tools that get this job done.

Top 6 tools to check copied content


1. Copyscape



Copyscape is a premium tool that charges you as you use it.

The first thing anyone should ask is: where does the tool get its results? I expect the result provider to be transparent. Here’s what Copyscape has to say about its results:

Copyscape uses Google and Bing as search providers, under agreed terms. Search providers send standard search results to Copyscape, without any post-processing. Copyscape uses complex proprietary algorithms to modify these search results in order to provide a plagiarism checking service. Any charges are for Copyscape’s value-added services, not for the provision of search results by the search providers.

I asked them what “Search providers send standard search results to Copyscape” actually meant, because I was sceptical that there were any types of search results other than “standard search results”. Here’s what they told me:

As the whole sentence states ‘Search providers send standard search results to Copyscape, without any post-processing’, meaning that we receive non-processed results from our search providers, i.e. they are not at all manipulated by them in any way prior to them sending us the results to our submitted queries.

If you ask me, this is a pretty good deal. Copyscape actually checks whether your content is already present in Google’s or Bing’s index. Their approach cleared the anxiety I had about validating the originality of my content.

With Copyscape’s Premium Search feature, you can search by either text or URL. Paste your content, and the tool lets you know if it is already published elsewhere. Paste a URL, and the tool returns results if there is similar content on other domains.

With the text search option, Copyscape tells you where the copied content is from (URL), tells you exactly how many words are copied (along with the percentage of plagiarism), and highlights the matching text. A pretty cool set of features, if you ask me.

Other useful Copyscape features include Batch Search and Private Index. With Batch Search, you can check up to 10,000 pages in a single operation, and the Private Index feature allows you to “check for duplication and plagiarism within your own content, even if it is not online or has not been indexed by search engines.”
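To get a feel for what a "percentage copied" figure can mean, here is a rough Python sketch that compares a draft against one known source on your own machine. This is not Copyscape’s method, which is proprietary. It is a simple assumed approach: count the draft words that sit inside a run of five consecutive words also found in the source. The names `tokenize` and `copied_percentage` and the run length of five are illustrative choices.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def copied_percentage(draft, source, n=5):
    """Percentage of draft words inside an n-word run that also occurs in source."""
    draft_words = tokenize(draft)
    source_words = tokenize(source)
    if len(draft_words) < n:
        return 0.0  # too short to contain a single n-word run
    # Every run of n consecutive words in the source.
    source_runs = {
        tuple(source_words[i:i + n]) for i in range(len(source_words) - n + 1)
    }
    covered = [False] * len(draft_words)
    for i in range(len(draft_words) - n + 1):
        if tuple(draft_words[i:i + n]) in source_runs:
            # Mark every word of the matching run as copied.
            for j in range(i, i + n):
                covered[j] = True
    return 100.0 * sum(covered) / len(draft_words)


if __name__ == "__main__":
    draft = "the quick brown fox jumps over the lazy dog today"
    source = "the quick brown fox jumps high"
    print(copied_percentage(draft, source))
```

Changing the run length or the way words are split changes the percentage, which is one plausible reason two tools can report different figures for the same text.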


2. Duplichecker



Duplichecker is a free plagiarism tool and is quite popular. With a limit of 1000 words per search, you can find out whether your content is copied from elsewhere.

For testing purposes, I took around 400 words of published content that is indexed on Google. Copyscape’s results showed that the content was 100% copied, whereas Duplichecker’s results showed it was only 41% copied.

If it were me, I wouldn’t place my bets on Duplichecker for reliability.

Duplichecker doesn’t tell you where it gets its results, except for the statement that it finds a “duplicate copy anywhere on the internet”.


3. Grammarly



Primarily known for its grammar-checking feature, Grammarly also provides a paid plagiarism checker service.

Here’s what they say about their search results:

Our online plagiarism checker compares your text to over 16 billion web pages and academic papers stored in ProQuest’s databases. When part of your text matches something written online or in a database, you’ll get a plagiarism alert.

They don’t tell you whether those “16 billion web pages” are the ones indexed on Google. In my tests with Google-indexed content, their tool did not raise a plagiarism alert every time.


4. Duplicate Content Checker by SEO Review Tools



Duplicate Content Checker by SEO Review Tools is a free and popular plagiarism checking tool that allows you to search by text and URL. In my tests, it worked very well at finding plagiarism in Google-indexed content. Since they say that the source is just “indexed content” and the tool works well, it seems reasonable to assume the results come from the Google index.

As with all plagiarism tools, there is an obvious limitation, and these guys are fairly clear about it:

New content needs to be indexed before it can be returned by this tool. If the page/content is less than 2 days old, chances are slim you will get any results.

This means you cannot detect plagiarism until the original content is indexed.


5. Plagiarism Checker by SmallSEOTools.com



Plagiarism Checker by SmallSEOTools.com is a free tool that allows you to check text, URLs, and documents (.docx, Dropbox, and Google Drive) with a limit of 1000 words per search.


6. Quetext



Quetext is another fairly popular tool used by a lot of SEO professionals. It is a freemium tool that allows you to search the first 500 words for free. Quetext relies on its own index and DeepSearch™, which promises to go beyond simple word matching to find “contextual” plagiarism. The premium version allows you to search up to 25,000 words, with an option to search by uploading your documents.

No plagiarism checker is perfect, so take the results with a grain of salt. But to be on the safe side, make sure a plagiarism check is at the top of your content publication checklist.


Manjunath Chowdary

Digital Marketing Expert, Consultant, Mentor, and Director of Kandra Digital Marketing Solutions Pvt Ltd.

-Kandra Digital

An agency built with the core purpose of delivering quality digital marketing, in an era where digital marketing services are treated as just a business rather than as value for the client’s business, its owners, and their resources and time.
