Old February 28th, 2006, 1:07 PM   #1 (permalink)
DoA
Registered User
Fresh Surpasser
 
Joined in Jan 2006
7 posts
Gave thanks: 0
Thanked 0 times
no session ids for bots! (help needed)

hi all, I am a webslave (there are no webmasters) of an ecommerce site (www.domestic.lumoslighting.co.uk, with www.lumoslighting.co.uk being the main page)

anyway, the site uses PHP session ids to keep track of what users add to their cart.

I've heard there's a technique that lets you detect a user's IP address and serve them a different page depending on the IP (this is called cloaking)

so basically, I want to give the search spiders a plain-text optimised page without the session ids, whilst giving the real users the normal pages.

I know there's a way to do it in htaccess - but does anyone have any scripts or examples of how to get it working?


thanks in advance!

Adam
Old February 28th, 2006, 6:15 PM   #2 (permalink)
All Ur Base R Belong 2 Us
Excelling Contributor
 
mr_fern's Avatar
 
Joined in Feb 2005
Lives in Vegas & New York
824 posts
Gave thanks: 2
Thanked 6 times
Well in php, $_SERVER['REMOTE_ADDR'] is the variable for the visitor's IP

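You could basically do this at the top of a page:

// Hypothetical list of spider IPs; replace with the addresses you trust.
$spider_ips = array('192.0.2.10', '192.0.2.11');
$ip = $_SERVER['REMOTE_ADDR'];
if (in_array($ip, $spider_ips, true)) {
    // Known spider: send it to the plain-text optimized page.
    header('Location: {uri to plain text optimized page}');
    exit;
}

How you decide which IPs count as spiders is up to you; the exact-match list above is just the simplest option.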
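One way to make that IP test concrete is to compare the visitor's address against a list of ranges rather than single addresses. The following is a minimal sketch, not anything from the site itself: the function names `ip_in_cidr` and `is_spider_ip` are made up for illustration, the ranges are documentation addresses rather than real spider ranges, and it only handles IPv4 on a 64-bit PHP build.

```php
<?php
// Hypothetical ranges (RFC 5737 documentation addresses), not real spider ranges.
$spider_ranges = array('192.0.2.0/24', '198.51.100.16/28');

// Returns true if the IPv4 address $ip falls inside $cidr ("a.b.c.d/bits").
// A bare address without "/bits" is treated as a /32 (exact match).
function ip_in_cidr($ip, $cidr)
{
    $parts = explode('/', $cidr, 2);
    $bits = isset($parts[1]) ? (int) $parts[1] : 32;
    $ip_long = ip2long($ip);
    $subnet_long = ip2long($parts[0]);
    if ($ip_long === false || $subnet_long === false || $bits < 0 || $bits > 32) {
        return false; // malformed address or prefix length
    }
    // Build the netmask; a /0 range matches every address.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ip_long & $mask) === ($subnet_long & $mask);
}

// Returns true if $ip falls inside any of the listed ranges.
function is_spider_ip($ip, array $ranges)
{
    foreach ($ranges as $cidr) {
        if (ip_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}
```

In a page you would call something like is_spider_ip($_SERVER['REMOTE_ADDR'], $spider_ranges) in place of the plain list check. Ranges save you from listing every address one by one, but they still need updating whenever a search engine changes its addresses.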

You might also want to make use of the $_SERVER['HTTP_USER_AGENT'] variable, which gives you the user agent string (e.g. Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) ) and tells you what kind of browser they're using.

Regular search engine bots use their own user agents, but mischievous bots created by people generally fake user agents, so that's something to consider.

Also, anything you can accomplish in htaccess instead of scripting, I would suggest doing there first. Filtering in htaccess puts less load on the CPU than running a script for every request. So use htaccess to do the filtering and then PHP to catch the rest.
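Because the user agent string can be faked, one common cross-check is a reverse DNS lookup on the visitor's IP followed by a forward lookup on the resulting host name. The sketch below is only an illustration under assumptions: the function name `is_verified_spider` is made up, and the domain suffixes you pass in (for example 'googlebot.com') should come from each search engine's own documentation. The two lookup parameters default to PHP's built-in gethostbyaddr() and gethostbynamel() and exist only so the logic can be tried without network access.

```php
<?php
// Returns true if $ip reverse-resolves to a host under one of $domains
// AND that host resolves forward to the same $ip again.
function is_verified_spider($ip, array $domains, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the IP unchanged on failure, false on bad input.
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower($host);

    // The host must end in ".<domain>", so "evilgooglebot.com" does not pass.
    $matches = false;
    foreach ($domains as $domain) {
        $suffix = '.' . strtolower($domain);
        if (substr($host, -strlen($suffix)) === $suffix) {
            $matches = true;
            break;
        }
    }
    if (!$matches) {
        return false;
    }

    // Forward lookup: the claimed host must really point back at this IP.
    $addresses = call_user_func($forward, $host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

DNS lookups are slow compared to a list check, so if you go this way you would probably want to cache the result per IP rather than resolve on every request.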
__________________
Nobody doing nothing
Old March 1st, 2006, 8:45 AM   #3 (permalink)
DoA
thanks a load for the reply, i was going to mention that naughty people can fake their UAs to look like googlebot and so get access to your plain-text pages. But on reading your post again, i noticed you'd already said that (tired last night lol)

I think i'm gonna go for the IP cloaking method, because the IPs for googlebot and msn etc. are widely available, and IPs are much harder to fake. As recommended by your good self, I'm gonna go for the htaccess solution - would you happen to know the syntax for the htaccess file for IP cloaking?


again, many thanks!

Adam
Old March 1st, 2006, 8:47 PM   #4 (permalink)
mr_fern
I'm trying to figure out a way to use RewriteMap with RewriteCond to minimize the solution.

Right now the only way I can think of is one RewriteCond per IP address, chained with [OR]:

# 255.255.255.255 is a placeholder; use one RewriteCond line per spider IP.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$
RewriteRule URLPattern RedirectURL [R]

I'll look into a RewriteMap solution for you.

[edit] Setting the rewritemap can only be done in the config file, so I'll check to see if the map can be accessed from htaccess. If it works, you'll also need to submit a ticket to have them set up the rewritemap in the httpd.conf file [/edit]
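For reference, a RewriteMap lookup keyed on the visitor's IP could look roughly like the sketch below. This is not the setup worked out in this thread, just the general shape: the map name `spiderips`, the file path, and the addresses are all hypothetical, and URLPattern and RedirectURL are the same placeholders as above.

```apache
# --- httpd.conf (server or virtual host context; not allowed in .htaccess) ---
# Plain-text map file with one "key value" pair per line, for example:
#   192.0.2.10 spider
#   192.0.2.11 spider
RewriteMap spiderips txt:/path/to/spider-ips.txt

# --- .htaccess ---
RewriteEngine On
# Look up the visitor's IP in the map; unknown IPs fall back to "none".
RewriteCond ${spiderips:%{REMOTE_ADDR}|none} =spider
RewriteRule URLPattern RedirectURL [R]
```

The map is defined once in the server config, and adding or removing a spider IP then only means editing the text file instead of adding another RewriteCond line.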
Old March 3rd, 2006, 2:27 AM   #5 (permalink)
mr_fern
Hey, question: are you sending the bots to separate URLs?

I came up with a RewriteMap setup that applies the RewriteRule based on IP address. The first version redirected every listed IP to a single URL, and then I modified it into a working version that redirects each IP to a different URL. I don't know which would be better for you.

Additionally, if you're just delivering different pages without session ids based on a variable (e.g. an 'is_bot' variable of some sort), an additional RewriteCond could be added.

Let me know, and I can hook you up with everything you'll need.
Old March 19th, 2006, 7:28 PM   #6 (permalink)
DoA
hi fern, thanks a load for all the help - been really busy lately, but i've decided that it'll be easier to just send the bots to the same pages without the session ids

can't use the user-agent method because it's too easy to work around

if you could let me know how to get this working i'll give you an extra 5% off the 'lumos lighting' online prices :P


thanks again

DoA
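For the "same pages, no session ids" approach, one possible shape is simply not to start a session for listed spider IPs, so PHP has no session id to attach to links. This is a minimal sketch under assumptions: the function name `wants_session` and the address list are hypothetical, and it assumes the ids reach the URLs through PHP's own session handling rather than being added by hand in the site's code.

```php
<?php
// Hypothetical list: documentation addresses (RFC 5737), not real spider IPs.
$spider_ips = array('192.0.2.10', '192.0.2.11');

// Returns true when the visitor should get a session,
// i.e. when the IP is NOT on the spider list.
function wants_session($ip, array $spider_ips)
{
    return !in_array($ip, $spider_ips, true);
}

// REMOTE_ADDR is only set when PHP runs behind a web server.
if (isset($_SERVER['REMOTE_ADDR'])) {
    if (wants_session($_SERVER['REMOTE_ADDR'], $spider_ips)) {
        session_start(); // real users keep their cart as before
    }
    // Listed spiders never get a session, so no id is added to their links.
}
```

Any cart code that reads $_SESSION would then need to cope with the session not existing for spider requests, for example by treating a missing session as an empty cart.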