Robots.txt

June 26th, 2004, 10:29 PM   #10 (permalink)
JustinSane
User-agent: MSNBOT

You may want to disallow MSNBOT, as it is a very badly behaved bot that eats up all sorts of bandwidth. It ate up 10 GB of bandwidth in two days on my site until I banned it in .htaccess and wrote them a nasty email! It ignores robots.txt, but to put it in there anywho, just place this in it:

Quote:
User-agent: MSNBOT
Disallow: /

You may also want to disallow bots from indexing the following:
User-agent: *
Disallow: /forms
Disallow: /logs
Disallow: /images/
Disallow: /admin/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/
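
If you want to see what rules like these mean for a crawler that actually obeys robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. It uses a shortened copy of the rules above, and "ExampleBot" is a made-up crawler name for illustration only. Keep in mind that robots.txt is advisory: this only shows what a well-behaved bot would do, not what a bot that ignores the file will do.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the post; the blank line separates records.
ROBOTS_TXT = """\
User-agent: MSNBOT
Disallow: /

User-agent: *
Disallow: /forms
Disallow: /logs
Disallow: /images/
Disallow: /admin/
"""


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical user agent, not a real crawler.
checks = [
    ("MSNBOT", "/index.html"),
    ("ExampleBot", "/admin/login.php"),
    ("ExampleBot", "/index.html"),
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```

Rules are matched as path prefixes, so `Disallow: /forms` also covers `/forms/contact.php`.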

You can 86 (get rid of) most bad bots and email harvesters in .htaccess with the following:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /
# User-Agents with no privileges (mostly spambots/spybots/offline downloaders that ignore robots.txt)
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$" [OR] # Cyveillance spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [OR] # Turnitin spybot
RewriteCond %{HTTP_REFERER} iaea\.org [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} anarchie [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} Atomz [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "compatible ; MSIE 6.0" [OR] # spambot (note extra space before semicolon)
RewriteCond %{HTTP_USER_AGENT} crescent [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^DA \d\.\d+" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "DTS Agent" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^Download" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} EasyDL/\d\.\d+ [OR] # OD
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector|magnet|reaper|siphon|sweeper|harvest|collect|wolf) [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} express [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} extractor [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Fetch API Request" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} FlickBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} FrontPage [OR] # stupid user trying to edit my site
RewriteCond %{HTTP_USER_AGENT} getright [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} go.?zilla [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} "efpgmx\.net" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} grabber [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} imagefetch [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "^Internet Explore" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^IE\ \d\.\d\ Compatible.*Browser$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "LINKS ARoMATIZED" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "mister pix" [NC,OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4.0$" [OR] # dumb bot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\?\?$" [OR] # formmail attacker
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [OR] # IE’s "make available offline" mode
RewriteCond %{HTTP_USER_AGENT} ^NG [OR] # unknown bot
RewriteCond %{HTTP_USER_AGENT} offline [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} net.?(ants|mechanic|spider|vampire|zip) [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} nicerspro [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ninja [NC,OR] # Download Ninja OD
RewriteCond %{HTTP_USER_AGENT} NPBot [OR] # NameProtect spybot
RewriteCond %{HTTP_USER_AGENT} PersonaPilot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} snagger [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} Sqworm [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} SurveyBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} tele(port|soft) [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} TurnitinBot [OR] # Turnitin spybot
RewriteCond %{HTTP_USER_AGENT} "crawl" [OR]
RewriteCond %{HTTP_USER_AGENT} web.?(auto|bandit|collector|copier|devil|downloader|fetch|hook|mole|miner|mirror|reaper|sauger|sucker|site|snake|stripper|weasel|zip) [NC,OR] # ODs
RewriteCond %{HTTP_USER_AGENT} vayala [OR] # dumb bot, doesn’t know how to follow links, generates lots of 404s
RewriteCond %{HTTP_USER_AGENT} zeus [NC]
RewriteRule .* - [F,L]
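
A rule list like this is easy to break when it is pasted through a forum: if a stray space ends up inside an unquoted pattern, mod_rewrite reads the leftover text as the flags argument and the server answers with a 500 error ("bad flag delimiters"). Below is a rough sketch of a checker you could run over the file before uploading it. The function names are hypothetical, the sample lines are made up for illustration, and the tokenizer only approximates how mod_rewrite splits arguments (whitespace separates, double quotes group, a backslash escapes a space), so treat it as a sanity check rather than a validator.

```python
def split_args(rest, limit=3):
    """Split directive arguments on whitespace; double quotes group words
    and a backslash keeps the following space inside the argument."""
    args, i, n = [], 0, len(rest)
    while i < n and len(args) < limit:
        while i < n and rest[i].isspace():
            i += 1
        if i >= n:
            break
        quoted = rest[i] == '"'
        if quoted:
            i += 1
        start = i
        while i < n:
            c = rest[i]
            if c == "\\" and i + 1 < n and rest[i + 1].isspace():
                i += 2  # escaped space stays inside the argument
                continue
            if (quoted and c == '"') or (not quoted and c.isspace()):
                break
            i += 1
        args.append(rest[start:i])
        i += 1  # step over the closing quote or the separating space
    return args


def lint_rewritecond(line):
    """Return a problem description for a RewriteCond line, or None."""
    parts = line.strip().split(None, 1)
    if len(parts) < 2 or parts[0] != "RewriteCond":
        return None  # not a RewriteCond directive, nothing to check
    args = split_args(parts[1])
    if len(args) < 2:
        return "missing pattern"
    if len(args) == 3 and not (args[2].startswith("[") and args[2].endswith("]")):
        return "bad flag delimiters: " + args[2]
    return None


def lint_htaccess(text):
    """Yield (line_number, problem) for every suspicious RewriteCond."""
    for number, line in enumerate(text.splitlines(), start=1):
        problem = lint_rewritecond(line)
        if problem:
            yield number, problem


# Illustrative sample only; line 4 has a space broken into the pattern.
SAMPLE = r'''RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "DTS Agent" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} ^IE\ \d\.\d\ Compatible.*Browser$ [OR]
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector|ha rvest|wolf) [NC,OR]
RewriteRule .* - [F,L]
'''

for number, problem in lint_htaccess(SAMPLE):
    print(number, problem)
```

The check does not look at whether each condition carries the `[OR]` flag; a condition without it is ANDed with the one that follows, so that is worth eyeballing separately.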
__________________
Justin Sane Lost Vegas but not LOST in Vegas...

Last edited by JustinSane; June 26th, 2004 at 10:32 PM.. Reason: addition

June 29th, 2004, 5:53 AM   #11 (permalink)
SergioP
Thanks for that great list of bad bots

June 29th, 2004, 3:14 PM   #12 (permalink)
JustinSane
No problem, but also get Sentinel if you use PHP-Nuke!

Quote:
Originally Posted by SergioP
Thanks for that great list of bad bots
No prob, we've all got to protect ourselves from the huge number of abusers on the net. I also highly recommend getting Sentinel for PHP-Nuke if you use it! I was 'hacked', and even though I knew PHP-Nuke is not very secure, I'd neglected any security measures. Someone installed themselves as a 'god account', which I had to delete by emptying out the nuke_authors table; that starts PHP-Nuke over 'fresh' so you can register a new superuser account. Then I installed Sentinel and it blocked 3 abusers the first night! All from overseas, outside the US.

The jerk that 'hacked' into PHP-Nuke, using one of the many hack programs available for breaking into it, could have done more damage, but wanted to stay surreptitious, counting on me not checking 'edit admin' and never noticing they'd broken in. They installed a few JavaScripts; I've managed to eliminate most of them, but there are perhaps one or two I still haven't found, since page loading is still goofed up. This ahole, calling himself 'slash', even left his real email address and a site where anyone can "check out" a credit card number to see if it is 'good'. It's located at http://slashroses.org/ and he's on Yahoo servers, so I'll turn him in to them and to the Russian authorities (I think he's from there, or maybe Italy) when the time comes. He's brazen about it...

Anywho, get Sentinel here: Nuke Scripts Network

Below is the list of just the email harvesters it blocks, on top of blocking attempted break-ins to admin:

alexibot
asterias
backdoorbot
black.hole
blackwidow
blowfish
bot mailto:craftbot@yahoo.com
botalot
builtbottough
bullseye
bunnyslippers
cegbfeieh
cheesebot
cherrypicker
chinaclaw
copyrightcheck
cosmos
crescent
custo
disco
dittospyder
download demon
ecatch
eirgrabber
emailcollector
emailsiphon
emailwolf
erocrawler
eseek-larbin
express webpictures
extractorpro
eyenetie
fast
flashget
foobot
frontpage
fscrawler
getright
getweb
go!zilla
go-ahead-got-it
grabnet
grafula
gsa-crawler
harvest
hloader
hmview
httplib
httrack
humanlinks
ia_archiver
image stripper
image sucker
indy library
infonavirobot
interget
internet ninja
jennybot
jetcar
joc web spider
kenjin.spider
keyword.density
larbin
leechftp
lexibot
libweb/clshttp
linkextractorpro
linkscan/8.1a.unix
linkwalker
lwp-trivial
mass downloader
mata.hari
microsoft.url
midown tool
miixpc
mister pix
moget
mozilla/3.mozilla/2.01
mozilla.*newt
navroad
nearsite
netants
netmechanic
netspider
net vampire
netzip
nicerspro
npbot
octopus
offline explorer
offline navigator
openfind
pagegrabber
papa foto
pavuk
pcbrowser
propowerbot/2.14
prowebwalker
queryn.metasearch
realdownload
reget
repomonkey
sitesnagger
slysearch
smartdownload
spankbot
spanner
spiderzilla
steeler
superbot
superhttp
surfbot
suzuran
szukacz
takeout
teleport pro
telesoft
turnitinbot
the.intraformant
thenomad
tighttwatbot
titan
tocrawl/urldispatcher
true_robot
turingos
urly.warning
vci
voideye
web image collector
web sucker
webauto
webbandit
webcopier
webemailextrac.*
webenhancer
webfetch
webgo is
web.image.collector
webleacher
webmasterworldforumbot
webreaper
websauger
website extractor
website quester
webster.pro
webstripper
webwhacker
webzip
wget
widow
wwwoffle
www-collector-e
xaldon webspider
xenu link sleuth
zeus

June 29th, 2004, 4:07 PM   #13 (permalink)
aran
I just downloaded Sentinel from that link, and it had a Trojan.Offiz virus in it, so anyone else thinking of doing so, make sure your protections are up to date...
__________________
Server: Dedicated
Domain: cwmnicymraeg.com

June 29th, 2004, 5:15 PM   #14 (permalink)
JustinSane
I just updated AVG antivirus from Grisoft and ran it last night, after having already downloaded the Sentinel tar.gz, and it reported no virus found... AVG has found viruses that Norton AntiVirus didn't find... So I'll see if I can find a Trojan.Offiz removal tool to check whether it did install something. I really doubt those folks at Nuke Scripts would do something like that but who knows....

Just scanned NSN_Sentinel_120.tar.gz, downloaded from Nuke Scripts, with AVG and nada, nothing: "No Virus or suspicious files were detected"

June 29th, 2004, 5:57 PM   #15 (permalink)
Bigjohn
justin -

I cut and pasted that stuff into my .htaccess file... but it seems to break things... badly. Can't even open my postnuke site... Gets a 500 error.


Since you seem to be a whiz with that stuff... how do I set up .htaccess to do 'search engine friendly' URLs with PostNuke? Or with a PHP-driven web page (i.e. one that calls "?page=xxxxx" stuff...)
__________________
Proud to be a Surmunity Mod!
XEON PASS60 PASS61
Make a fundamental difference!
My Sites:
Curious about Brewing Beer? Join the community!
>>>>> Some Change is GOOD! Keep your paycheck! Support the Fair Tax
Get into an Art museum
Victorian London
It's your brain -ON WEB - mybrainhost.com (under development)
What SHOULD Government do? Much Less than it Does!

June 29th, 2004, 6:35 PM   #16 (permalink)
kall
I got the 500 errors too...bad flag delimiter errors.

Managed to get one list from vbulletin.com working tho.
__________________
From NZ? Want Free Messageboards and Games? - NZB

From NZ? Want Kiwi Web Hosting and Design? - Sparkle Hosting

Sync sync sync...sync sync sync..sync your bootie!

June 29th, 2004, 11:47 PM   #17 (permalink)
JustinSane

Did you copy/paste everything from
Options +FollowSymlinks
down to
RewriteRule .* - [F,L] ????

I found it somewhere and it seemed to work, although I also tried another one that didn't, so you've got me...

A comprehensive tutorial and guide to .htaccess: Intro

One of the techs (Ryan, Ben, Luke or Darin, one of 'em, not that it matters) gave me this easy-to-understand page on .htaccess: Webpimps .htaccess - Code Tricks & Tips

Here's the webmasterworld.com forum on .htaccess, mod_rewrite, and other Apache-specific topics.

And here's something really easy you can use! A .htaccess code generator for disabling hotlinking, blocking users, custom error pages and password protection: Htaccess Disable Hotlinking Code Generator

Also easy-to-understand tutorials and a slew of other slick generators.

Also an in-depth pop-up, under, around and through generator: Popup Window Maker

That should help....

June 30th, 2004, 5:14 AM   #18 (permalink)
aran
Quote:
Originally Posted by JustinSane
I really doubt those folks at Nuke Scripts would do something like that but who knows....

Fair call - I wasn't suggesting that there was anything wrong with your advice, or with the site, just sharing my experience. Could be that my request got hijacked, maybe.

