Using htaccess to stop Bad Bots from stealing bandwidth and crashing your server
A few days ago my site was hit by a bunch of really bad bots that crawled it continuously until they overloaded my web server. Here is a way to stop these so-called bad robots from ruining your website with their aggressive crawling.
Assuming you are using the Apache HTTP Server, create a .htaccess file (or open your existing one) and append the following lines to it.
CODE:
<IfModule mod_rewrite.c>
RewriteEngine On
# Guestbook-spam referrers
RewriteCond %{HTTP_REFERER} q=Guestbook [NC,OR]
# Known bad user agents; ^ anchors the match to the start of the string
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GornKer [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Irvine [OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [OR]
RewriteCond %{HTTP_USER_AGENT} ^LWP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^omniexplorer_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
# Parentheses are escaped so they match literally instead of forming a regex group
RewriteCond %{HTTP_USER_AGENT} dloader\(NaverRobot\) [OR]
#RewriteCond %{HTTP_USER_AGENT} ^puf [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SearchExpress [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebBandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
#RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
# Last condition in the chain, so it carries no [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg
# Answer any request that matched a condition above with 403 Forbidden
RewriteRule .* - [F,L]
</IfModule>
Apache will answer any request whose User-Agent matches one of these patterns with 403 Forbidden. This keeps badly written HTTP crawler bots off your website and saves you precious bandwidth and server CPU resources.
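Before deploying the rules, it can help to check which User-Agent strings they would actually catch. The following is only a rough sketch in Python, not part of the .htaccess file: the `RULES` list is a hand-copied subset of the conditions above, `is_blocked` is a hypothetical helper name, and Python's `re` module is assumed to behave like Apache's regex engine for patterns this simple. It shows the two details that matter most: `^` only matches at the start of the User-Agent, and matching is case-sensitive unless the rule carries `[NC]`.

```python
import re

# Hand-copied subset of the RewriteCond patterns above, as
# (pattern, case_insensitive) pairs. case_insensitive=True stands in for [NC].
RULES = [
    (r"^BlackWidow", False),
    (r"HTTrack", True),         # unanchored and [NC]: matches anywhere, any case
    (r"Indy\ Library", True),   # "\ " is an escaped space, as in the .htaccess
    (r"^Java", False),
    (r"^Teleport\ Pro", False),
    (r"^ZyBorg", False),
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any rule matches, mirroring the [OR] chain."""
    for pattern, case_insensitive in RULES:
        flags = re.IGNORECASE if case_insensitive else 0
        if re.search(pattern, user_agent, flags):
            return True
    return False


if __name__ == "__main__":
    samples = [
        "BlackWidow/1.0",
        "Mozilla/5.0 (compatible; BlackWidow)",
        "Mozilla/4.5 (compatible; httrack 3.0x; Windows 98)",
        "java/1.6",
        "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",
    ]
    for ua in samples:
        print("BLOCK" if is_blocked(ua) else "allow", "-", ua)
```

Note that a bot which puts its name in the middle of a Mozilla-style string slips past every `^`-anchored rule, so drop the `^` (as the HTTrack rule does) for agents you see doing that in your logs.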
Tags: apache, .htaccess, htaccess, security, bandwidth, ddos, attack, crackers
October 22nd, 2007 at 6:19 pm
I'll put this method to use if my blog ever shows the same symptoms.
October 22nd, 2007 at 11:10 pm
I think you’re missing the last line. Maybe something like:
RewriteRule .* - [F,L]
October 23rd, 2007 at 1:11 am
I forgot to include it. Thanks!
November 24th, 2007 at 3:24 am
I have just “installed” your suggested .htaccess file on the website I maintain, and have had NO problems (no 500 server errors).
I attempted to contact you via your “contact form”, as I didn’t want to clog your blog with a bunch of blather about my situation with “bad bots”, but I found it confusing and was unable to get the form to take my message.
Please advise.
Regards and thanks