Protecting your content
Scrapers are the worst type of visitors to your site. Why? They “scrape” everything: your posts, your images, even the links that are likely at the bottom, and post it all as their own. Here’s a prime example of a scraper. See the title on the page? It was scraped from my post over here. Now, it’s not like my content is worth a million dollars, but see, this is why this guy is in the feed.
So how do you prevent things like this from happening?
Let’s break it down into two parts:
- Hotlinking
- RSS Scraping
Hotlinking is not only stealing content, it’s stealing bandwidth as well. It’s bad enough if your site has to deal with the Digg effect on a regular basis, but couple that with images on your server getting ripped and you’re on a one-way ticket to exceeding your monthly bandwidth limit. This could mean anything from a few bucks added to your tab to getting your account suspended indefinitely.
To prevent the horrors of hotlinking, here’s what you can do:
Prevent hotlinks – Use .htaccess
If you’re afraid of using your .htaccess file because you might mess something up, don’t be. It’s there to protect you, not limit you. Here’s one of the better tutorials on understanding how to use .htaccess to block hotlinking.
If you can’t be bothered to read the tutorial, here is basically what you need:
- An anti-hotlink image (like the one on the right, click for the full version)
- Access to your server via FTP (you need to upload files)
- The .htaccess file from your server.
All you need to do is add the following code to your existing .htaccess file. Note that the sites listed in it are the ones blocked from using your files.
Simply edit the code below, paste it in, then upload the .htaccess file along with the anti-hotlink image.
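RewriteEngine on
# Attempt to stop hotlinking from these specific sites
# Ignore requests that send no referer (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Match the blocked domains and any of their subdomains, http or https
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?multiply\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?blogspot\.com/ [NC]
# Never rewrite the anti-hotlink image itself, or the redirect would loop
RewriteCond %{REQUEST_URI} !/hotlink\.gif$
# Send the anti-hotlink image in place of the requested file
RewriteRule \.(gif|jpg|png|avi)$ http://atmaxplorer.com/hotlink.gif [NC,R,L]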
Note that you can do this automatically if you have cPanel on your server: just look for the Hotlink Protection icon and enter the appropriate options.
What will the code do?
It will replace every image hotlinked from the listed sites with the anti-hotlink image. Now if you’re thinking that it’s too harsh, think of the bandwidth you’ll be saving.
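The rules above only catch the sites you list, so a scraper on any other domain still gets through. One possible variant, sketched here under assumptions, flips the logic around: allow your own site and block every other referer. Note that example.com is a hypothetical placeholder for your own domain, and the image path is assumed to be /hotlink.gif as in the original rules.

```apache
# Hypothetical allowlist variant: example.com stands in for your own domain
RewriteEngine on
# Let requests with no referer through (direct visits, some proxies and feed readers)
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by your own site and its subdomains through
RewriteCond %{HTTP_REFERER} !^https?://([^/]+\.)?example\.com(/|$) [NC]
# Never rewrite the anti-hotlink image itself, or the redirect would loop
RewriteCond %{REQUEST_URI} !/hotlink\.gif$
# Every other referer gets the anti-hotlink image instead of the requested file
RewriteRule \.(gif|jpg|png)$ http://example.com/hotlink.gif [NC,R,L]
```

The trade-off is that this also blocks sites you may want to allow, such as image search results or feed readers, so you would likely need an extra negated RewriteCond line for each of those. If serving a replacement image feels like too much, swapping the last line for `RewriteRule \.(gif|jpg|png)$ - [NC,F]` should return a plain 403 Forbidden instead.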
Tags: internet, License, Security
Wow, thanks for the link. I just found another scraper here:
http://googleamd.blogspot.com/2008/01/protecting-your-content-sendmerss.html
Wow. That makes 3! At least this time he’s providing a backlink to me. ^^
Hmm… he’s bypassing my .htaccess filter. I’d better update. Thanks for the heads up.
Very informative post….. it’s gonna be a big help for everyone.
Yes it is Arnel