#1
Registered User
Fresh Surpasser
Joined in Jan 2006
7 posts
Gave thanks: 0
Thanked 0 times
No session IDs for bots! (help needed)
Hi all, I am a webslave (there are no webmasters) of an ecommerce site (www.domestic.lumoslighting.co.uk, with www.lumoslighting.co.uk being the main page).
Anyway, the site uses PHP session IDs to keep track of what users add to their cart. I've heard there's a technique that lets you detect a user's IP address and serve them a different page depending on the IP (this is called cloaking). So basically, I want to give the search spiders a plain-text optimised page without the session IDs, whilst giving real users the normal pages. I know there's a way to do it in htaccess, but does anyone have any scripts or examples of how to get it working? Thanks in advance! Adam
#2
All Ur Base R Belong 2 Us
Excelling Contributor
Joined in Feb 2005
Lives in Vegas & New York
824 posts
Gave thanks: 2
Thanked 6 times
Well, in PHP, $_SERVER['REMOTE_ADDR'] is the variable for the visitor's IP.
You could basically do this at the top of a page:

$ip = $_SERVER['REMOTE_ADDR'];
if ( !in_array($ip, $allowed_ips) ) { // $allowed_ips: your own list of allowed IPs
    header('Location: {uri to plain text optimized page}');
    exit;
}

How you want to say that the IP is not an allowed IP is up to you. You might also want to make use of the $_SERVER['HTTP_USER_AGENT'] variable, which gives you the user agent string (e.g. Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)) and tells you what kind of browser they're using. Regular search engine bots use their own user agents, but mischievous bots created by people generally fake user agents, so that's something to consider.
Also, anything you can accomplish in htaccess rather than in a script, I would suggest doing there first. It would be less load on the CPU to have it handled by htaccess than by a script. So use htaccess to do the filtering and then PHP to catch the rest.
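For the "is this IP on the list" check described above, here is a minimal sketch, assuming the crawler addresses are kept as IPv4 CIDR ranges and PHP 7.1+ on a 64-bit build. The function names `ip_in_cidr` and `is_crawler_ip` and the ranges in `$crawlerRanges` are made up for illustration (they are documentation-only addresses, not real search engine IPs).

```php
<?php
// Hypothetical ranges for illustration only; replace them with the ranges
// you actually trust for each search engine.
$crawlerRanges = ['192.0.2.0/24', '198.51.100.17/32'];

// Returns true if the IPv4 address $ip falls inside the CIDR range $cidr.
function ip_in_cidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32.
    [$subnet, $bits] = explode('/', $cidr) + [1 => '32'];
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false || !ctype_digit($bits)) {
        return false;
    }
    $bits = (int) $bits;
    if ($bits > 32) {
        return false;
    }
    // Build the network mask, e.g. /24 gives 0xFFFFFF00.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

// Returns true if $ip matches any range in $ranges.
function is_crawler_ip(string $ip, array $ranges): bool
{
    foreach ($ranges as $range) {
        if (ip_in_cidr($ip, $range)) {
            return true;
        }
    }
    return false;
}

// REMOTE_ADDR is not set when run from the command line, hence the fallback.
$isCrawler = is_crawler_ip($_SERVER['REMOTE_ADDR'] ?? '', $crawlerRanges);
```

The `$isCrawler` flag can then drive the redirect, or whatever else you want to do differently for spiders. This only handles IPv4; IPv6 addresses would simply never match.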
__________________
Nobody doing nothing
#3
Registered User
Fresh Surpasser
Joined in Jan 2006
7 posts
Gave thanks: 0
Thanked 0 times
Thanks a load for the reply. I was going to mention that naughty people can fake their UAs to look like Googlebot and so get access to your plain-text pages, but on reading your post again, I noticed you'd already said that (tired last night lol).
I think I'm gonna go for the IP cloaking method, because the IPs for Googlebot, MSN etc. are widely available, and IPs are much harder to fake. As recommended by your good self, I'm gonna go for the htaccess solution. Would you happen to know the syntax for the htaccess file for IP cloaking? Again, many thanks! Adam
#4
All Ur Base R Belong 2 Us
Excelling Contributor
Joined in Feb 2005
Lives in Vegas & New York
824 posts
Gave thanks: 2
Thanked 6 times
I'm trying to figure out a way to use RewriteMap with RewriteCond to minimize the solution.
Right now the only way I can think of is with:

RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$ [OR]
RewriteCond %{REMOTE_ADDR} ^255\.255\.255\.255$
RewriteRule URLPattern RedirectURL [R]

with one RewriteCond line for each IP address. I'll look into a RewriteMap solution for you.
[edit] Setting the RewriteMap can only be done in the config file, so I'll check to see if the map can be accessed from htaccess. If it works, you'll also need to submit a ticket to have them set up the RewriteMap in the httpd.conf file. [/edit]
#5
All Ur Base R Belong 2 Us
Excelling Contributor
Joined in Feb 2005
Lives in Vegas & New York
824 posts
Gave thanks: 2
Thanked 6 times
Hey question. Are you sending the bots to separate URLs?
I came up with a RewriteMap set up that would use the RewriteRule based on IP addresses. The initial concept I set up redirected based off IPs to a single URL, and then I came up with a modified working concept to redirect each IP to a different URL. I don't know which would be better for you. Additionally, if you're just delivering different pages without session ids based off a variable (i.e. an 'is_bot' variable of some sort), an additional Rewrite Condition could be added. Let me know, and I can hook you up with everything you'll need.
#6
Registered User
Fresh Surpasser
Joined in Jan 2006
7 posts
Gave thanks: 0
Thanked 0 times
Hi fern, thanks a load for all the help. Been really busy lately, but I've decided that it'll be easier to just send the bots to the same pages without the session IDs.
Can't use the user-agent method because it's too easy to work around. If you could let me know how to get this working, I'll give you an extra 5% off the 'lumos lighting' online prices :P Thanks again, DoA
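One possible way to serve the same pages without session IDs to bots is to skip starting the session when the visitor's IP is on a crawler list. The sketch below is only an illustration under that assumption: `is_crawler`, `begin_request`, and the addresses in `$crawlerIps` are hypothetical names and documentation-only IPs, and it assumes PHP 7+ and that the site's pages can render without a session (for example, showing an empty cart).

```php
<?php
// Hypothetical exact crawler IPs for illustration; not real search engine addresses.
$crawlerIps = ['192.0.2.10', '192.0.2.11'];

// Returns true if $ip is one of the listed crawler addresses.
function is_crawler(string $ip, array $crawlerIps): bool
{
    return in_array($ip, $crawlerIps, true);
}

// Starts a session for normal visitors only.
// Returns true if a session was started, false if it was skipped or failed.
function begin_request(string $ip, array $crawlerIps): bool
{
    if (is_crawler($ip, $crawlerIps)) {
        // No session is started for crawlers, so PHP has no session ID
        // to append to the links on the page.
        return false;
    }
    // Normal visitors keep their cart session exactly as before.
    return session_start();
}

// Usage at the top of each page, before any output (left as a comment here):
// $hasSession = begin_request($_SERVER['REMOTE_ADDR'] ?? '', $crawlerIps);
```

Any cart code would then need to check whether a session exists before touching `$_SESSION`. If the site builds session IDs into its links by hand rather than relying on PHP to do it, those spots would need the same check.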
|
|
|