Many web site owners put a lot of work into their sites by continually writing new content. It is sad when you find that someone has stolen the content that took you so long to research and write. Here is how to deal with content thieves.
Content theft is becoming a much larger problem than it once was. A lot of lazy web site owners are looking for quick ways to build a site using content that they steal from other web sites. This is made easier because so many web sites, especially blogs, use RSS feeds that make it simple for anyone to steal their content almost as fast as they publish it. Autoblogs, or blog scrapers, are blogs that use plugins designed to pull content from other web sites. They make it easy for almost anyone, even someone with little or no technical skill, to steal content and republish it on a spam blog.
How do I determine if someone is stealing my content?
The easiest way is to take a sentence of unique content from one of your web pages, surround the sentence with double quotes, and search for it in Google. This performs an exact-match search. Google’s results should return your web page (if it has been indexed), along with any other pages that use that exact sentence in their content.
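If you check a lot of pages, building the quoted search by hand gets tedious. Here is a minimal sketch of how the search link could be generated. The function name `exact_match_search_url` is made up for illustration, and the sketch assumes the usual `https://www.google.com/search?q=` address format.

```python
from urllib.parse import urlencode


def exact_match_search_url(sentence: str) -> str:
    """Build a Google search link that looks for the sentence as an exact phrase."""
    # Remove surrounding whitespace and any double quotes inside the sentence,
    # because an inner quote would end the exact-match phrase early.
    phrase = sentence.strip().replace('"', "")
    # Wrapping the phrase in double quotes is what requests the exact match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


if __name__ == "__main__":
    print(exact_match_search_url("Here is how to deal with content thieves."))
```

Paste the printed link into a browser and review the results by hand. Pick a sentence that is distinctive, since a common phrase will match many innocent pages.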
Another method that works well is to use Copyscape, which is a free service that monitors the web for duplicate content. If you have a large web site, you can contract with Copyscape to watch for plagiarism of your web content.
Most of the content thieves that I find reside in Indonesia, Malaysia, India, Russia, Romania and countries that do not have strict copyright laws. Many of these site owners believe that there is nothing wrong with “borrowing content” from other web sites. Some will provide a link back to your site and think that this justifies the use of your content.
What do I do when I find a web site that is stealing my content?
The first thing to do is to see if you can determine who the domain name is registered to. This can sometimes be accomplished by checking the WHOIS database, which is a directory of domain name registrations. The most common TLDs (top level domains), such as .com, .net and .org, can be checked using the WHOIS lookup offered by most domain registrars. You will find a link to the WHOIS database in the footer on the GoDaddy home page. If the domain name has a private registration, the domain owner’s contact information will be hidden, but sometimes there is still information that will identify the hosting company.
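A WHOIS lookup can also be scripted. The sketch below is only an illustration under some assumptions: it queries `whois.verisign-grs.com`, which answers for .com and .net names (other TLDs use other servers), and the helper names `whois_query` and `pick_fields` are hypothetical. Field labels vary between registries, so the keyword list may need adjusting.

```python
import socket


def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                timeout: float = 10.0) -> str:
    """Send a plain WHOIS query over TCP port 43 and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        # The WHOIS protocol is one line of text ended by CRLF.
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def pick_fields(whois_text: str,
                wanted=("registrar", "abuse", "name server")) -> dict:
    """Collect 'Key: value' lines whose key contains one of the wanted words."""
    found = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and value and any(word in key.lower() for word in wanted):
            found.setdefault(key, []).append(value)
    return found
```

Calling `pick_fields(whois_query("example.com"))` needs network access. Even with a private registration, the name server lines often hint at the hosting company, and many replies include an abuse contact for the registrar.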
If you find contact information, the first step is to contact the site owner via e-mail, identify the URLs in their site that contain content stolen from your site, and ask them to remove it. Many site owners will remove the content if you request that they do so.
If the thief does not respond or refuses to remove the content, threaten to report them to the major search engines for violation of the DMCA (Digital Millennium Copyright Act). Major search engines may de-index the violator’s site, which is a serious threat to anyone who wants to make money with their web site. Each of the major search engines provides a method for reporting violators.
It would be a good idea to wait a few days after notifying the violator before you pursue reporting them to the search engines, but if you get no response or a negative response, the best path is to report the violator. Thus far, I have never had to go further than threatening to report content thieves to the search engines. They do understand what that means.
Another method is to report the content theft to the site owner’s hosting company. If the hosting company is based in the USA (most are) or Europe, they should warn the site owner to remove the stolen content. If the site owner does not remove it, the hosting company typically will close the hosting account, which shuts down the web site.
How can I prevent content theft?
The short answer is, “You can’t.” No matter what you do, a determined thief will steal your content, either through an automated tool or simply by copying and pasting the text from your web pages. All that you can do is make it difficult for them to steal your work.
If you want to protect your content, make sure that you post a copyright notice in the footer on every page. This will serve as a warning to content thieves. Many site owners mistakenly believe that any content on a page that does not have a copyright notice is in the public domain and can be freely used.
There are some simple JavaScript routines that disable the right mouse button so that a visitor cannot copy and paste the content on a web page, but they are easy to circumvent by viewing the HTML code with the View > Source feature in a browser. The thieves can then copy the section of content from the code.
If you have a blog, our recommendation is to either disable the RSS feed or configure it to include an excerpt of your blog posts, rather than the entire article.
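Most blog platforms offer the excerpt option as a feed setting, so check there first. If you build your feed yourself, the idea can be sketched as follows. The function name `make_excerpt` and the 55-word default are assumptions chosen for illustration, and the tag stripping is deliberately crude.

```python
import re
from html import unescape


def make_excerpt(post_html: str, max_words: int = 55) -> str:
    """Return the first max_words words of a post as plain text for the feed."""
    # Crude tag removal: good enough for a teaser, not a full HTML parser.
    text = re.sub(r"<[^>]+>", " ", post_html)
    # Decode entities such as &amp; and split on any run of whitespace.
    words = unescape(text).split()
    if len(words) <= max_words:
        return " ".join(words)
    # Mark the cut so readers know the full article is on the site.
    return " ".join(words[:max_words]) + " [...]"


if __name__ == "__main__":
    print(make_excerpt("<p>One two three four five</p>", max_words=3))
```

A scraper that republishes the feed then gets only the teaser, while real subscribers can still click through to the full article.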
One method that we started using is to include a natural link to a related or supporting article on your web site. This is easier to do as the site grows. Make sure that you include a full URL to the page and not just a relative link. That way, if a content thief steals the content from your page and does not remove the link, at least you will have a link back to the original article, which tells the search engines that the content thief’s page is the duplicate. We have found that when content is stolen from one of our blogs, the link shows up as a trackback link to the site, which we can trace back to the plagiarizer’s web site.
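If your existing posts already use relative links, they can be converted to full URLs before publishing. This is a rough sketch, not a complete HTML processor: it only handles `href` attributes written with double quotes, the helper name `absolutize_links` is hypothetical, and `www.example.com` stands in for your own domain.

```python
import re
from urllib.parse import urljoin

# Matches href="..." attributes written with double quotes only.
HREF = re.compile(r'href="([^"]*)"')


def absolutize_links(post_html: str, page_url: str) -> str:
    """Rewrite relative href values as full URLs based on the page's own URL."""
    def fix(match):
        # urljoin leaves links that are already absolute unchanged.
        return 'href="' + urljoin(page_url, match.group(1)) + '"'
    return HREF.sub(fix, post_html)


if __name__ == "__main__":
    html = '<a href="/articles/rss.html">RSS</a>'
    print(absolutize_links(html, "https://www.example.com/blog/post.html"))
```

A copied post then still points at your domain, instead of at a path that only resolves on the scraper's own site.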
Remember that it is up to you as a site owner to monitor the web for content theft. If you let it get out of control, it can seriously damage your web site’s rankings.