
» Surpass Web Hosting Forums » Discussions » All Things Techy » Site Maintenance » How do I control the Bots?

August 21st, 2007, 5:26 PM   #1 (permalink)
Lord of the Rant.
Seasoned Poster
 
turtle2472's Avatar
 
Joined in Mar 2007
Lives in Tidewater, Va.
Hosted on sh121
59 posts
Gave thanks: 15
Thanked 0 times
How do I control the Bots?

There was a thread started in 2004, but I'm such a novice that I don't understand much of it. I went to robotstxt.org and was able to create a basic block-all-bots text file, but I want to be able to do more.

I saw something in the old thread about a "sandtrap" that would ban the IPs of bad bots that ignore robots.txt. The thing is, I know there is an .htaccess file and I have a general understanding of most of the concepts, but I have no idea how to implement them.

I figured it would be cool to ask here, since there are many who obviously know this stuff and likely even more who have no clue how. If we can keep the bots down, it'll keep all of our servers running better. Maybe this can serve as a very good tutorial. Please?
__________________
sh121 | Main Domain: kellyinternationalinc.com | My Photo Blog
August 21st, 2007, 7:35 PM   #2 (permalink)
Race Surpass
Super #1
 
MarkRH's Avatar
 
Joined in Jul 2006
Lives in Oklahoma City, OK
Hosted on sh102
1,222 posts
Gave thanks: 18
Thanked 86 times
Well, the robots.txt file will generally only control the well-behaved search bots like Google, Yahoo, MSN and so on. Supposedly, anyway.
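To make that concrete, here is a minimal sketch in Python, using the standard library's urllib.robotparser, of the check a well-behaved bot performs before fetching a page. The rules, the bot name ExampleBot, and the helper name `allowed` are made up for illustration; a bad bot simply never runs a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one made-up crawler ("ExampleBot") out of the
# whole site, and keep every other bot out of /forums/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /forums/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/forums/viewtopic.php"),
        ("ExampleBot", "/index.html"),
    ]:
        print(agent, path, allowed(agent, path))
```

The file is only a request: nothing on the server enforces it, which is why the other measures in this post are still needed.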

Then you have what I call comment spam bots, which go hunting for guestbooks, forums, blogs, galleries, or whatever else they might be able to add a post or comment to with their ads and/or links to their sites. They do this to get higher page rankings on the search engines.

To protect yourself from these, most forums, blogs, and galleries have anti-spam options and plugins: things like requiring some kind of image verification or answering a mathematical or other question that only a person could.

Some bots look for specific versions of scripts in order to exploit known vulnerabilities. The best thing here is just to keep all scripts up to date as their authors release security fixes.

Another thing is to turn off directory indexing so that neither people nor bots can list the contents of a directory that has no index page. This is done by adding the following to your .htaccess file in /public_html/:

Code:
Options -Indexes
Beyond that, you can deny access altogether to an IP address or range by adding it to the .htaccess file:

Code:
# ip ranges of comment spammers
deny from 80.58.205.
deny from 203.162.27.
# end ip ranges of comment spammers

# specific ip of comment spammers
deny from 58.211.230.16
deny from 61.7.156.54
deny from 61.90.248.250
deny from 61.152.145.19
deny from 62.231.243.136
deny from 62.231.243.137
deny from 62.231.243.138
# (list goes on...)
# end specific ip of comment spammers
The ones I ban are the ones that try to add comments to my guestbook and fail the image verification. I log all the failed attempts and ban the worst offenders that keep trying and trying. The IP addresses are all over the place; many are probably infected PCs.
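A rough sketch of that log-and-ban routine in Python might look like the following. It assumes you can already pull one IP address per failed attempt out of your own log; the function name `deny_lines`, the threshold of 5, and the sample addresses (taken from the reserved documentation ranges) are all assumptions for illustration, not anything taken from this thread.

```python
from collections import Counter


def deny_lines(failed_ips, threshold=5):
    """Build 'deny from' lines for IPs with at least `threshold` failures.

    failed_ips holds one IP address string per failed attempt.
    """
    counts = Counter(failed_ips)
    worst = sorted(ip for ip, hits in counts.items() if hits >= threshold)
    return ["deny from " + ip for ip in worst]


if __name__ == "__main__":
    # Hypothetical sample data: one address fails six times, another twice.
    attempts = ["192.0.2.10"] * 6 + ["198.51.100.7"] * 2
    for line in deny_lines(attempts):
        print(line)  # prints: deny from 192.0.2.10
```

The printed lines can then be pasted into the .htaccess deny list by hand, which keeps a person in the loop before anything is actually banned.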

I've since removed the ability to add to my guestbook, renamed the pages and scripts, and actually removed my deny list. I want them to not find my guestbook pages, in the hope that I'll get off their list.

So, now my error_log just fills up with page not found errors related to my guestbook.
August 21st, 2007, 8:43 PM   #3 (permalink)
turtle2472
I have noticed that most of my "attacks" came after I created a forum using phpBB. It's a private forum, so I require admin approval for registration to keep bots etc. from posting crap on it. I do get some errors related to other domains, but forums.1veryhappyfamily.com gets the most attempts by far.

So, following your guidelines, I have added the no-indexes line to the root .htaccess. Does this carry through to sub-domains? Just in case, I added the line to /forums/.htaccess as well. I have 4 domains that point to my shared host, though.

For banning IP addresses and ranges, does that have to be done per site, or only in the root /public_html/.htaccess?
August 22nd, 2007, 12:20 AM   #4 (permalink)
MarkRH
You only have to put the no-indexes and deny lines in the root /public_html/.htaccess, and all of your add-on domains, sub-domains, and directories underneath it will be protected.

In the .htaccess of an add-on or sub-domain directory, you'd put only the things you want to affect just that directory...
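The way a root .htaccess carries down into sub-directories can be sketched like this. It is a simplified illustration in Python, not Apache's actual code: it assumes overrides are allowed in every directory and starts at the document root, and the function name `htaccess_chain` is hypothetical.

```python
from pathlib import PurePosixPath


def htaccess_chain(doc_root, request_dir):
    """List the .htaccess files consulted for request_dir, outermost first.

    Raises ValueError if request_dir is not inside doc_root.
    """
    root = PurePosixPath(doc_root)
    relative = PurePosixPath(request_dir).relative_to(root)
    chain = [str(root / ".htaccess")]
    current = root
    for part in relative.parts:
        # Each directory on the way down may contribute its own .htaccess.
        current = current / part
        chain.append(str(current / ".htaccess"))
    return chain


if __name__ == "__main__":
    for path in htaccess_chain("/public_html", "/public_html/forums"):
        print(path)
```

Files are applied in the order listed, so a rule in /public_html/.htaccess covers /forums/ too, and a deeper file only needs the rules that should differ for that directory.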
This user thanks MarkRH for this great post!
turtle2472 (August 22nd, 2007)