Old November 29th, 2007, 4:58 PM   #1 (permalink)
Registered User
Fresh Surpasser
 
Joined in Mar 2007
12 posts
Gave thanks: 3
Thanked 0 times
canonical URI with percent-encoded characters

A bunch of my URIs contain %27 (a percent-encoded single quote) but also work with a literal single quote instead. Is this a canonical URI issue? Search engines don't differentiate between %27 and a single quote, do they?

If so, how can a RewriteRule change all single quotes to %27, or vice versa, with a 301 redirect? From what I've read, mod_rewrite sees the request after it has been decoded to single quotes, so it might just cause an infinite loop.
__________________
Server SH110 (66.7.202.21) - Home of t-rev.net
T-rev is offline  
Old November 30th, 2007, 9:30 AM   #2 (permalink)
Senior Member
Super #1
 
FredFredrickson's Avatar
 
Joined in Nov 2003
Lives in New Hampshire
1,228 posts
Gave thanks: 3
Thanked 25 times
As far as I know, the search engines ignore punctuation anyhow.
__________________

Surmunity Special-Free Photos To Season Your Site/Blog, Code: fallatsurpass
FredFredrickson is offline  
Old November 30th, 2007, 9:33 AM   #3 (permalink)
But adding to that, I'd say you should probably get rid of the need for such characters in your URLs. Most "friendly" link schemes use a numeric identifier that appears just before the friendly text, so the text itself plays no part in finding the page and is only there to make the link more search engine friendly. I'd say just do a replace("%27", "")
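To make the id-plus-friendly-text idea concrete, here is a minimal sketch in Python. The function names and the `42-its-working` URL shape are assumptions made up for illustration; they are not taken from any particular link scheme.

```python
import re

def make_slug(item_id: int, title: str) -> str:
    """Build a path segment such as '42-its-working' (hypothetical scheme)."""
    # Drop apostrophes entirely, so "It's" becomes "Its" and no %27 is needed
    text = title.replace("'", "").lower()
    # Collapse every other run of non-alphanumeric characters into one hyphen
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return f"{item_id}-{text}"

def id_from_slug(slug: str) -> int:
    """Only the leading number is used to look the page up."""
    return int(slug.split("-", 1)[0])

slug = make_slug(42, "It's working")
page_id = id_from_slug(slug)
```

Because only the number is used for the lookup, a link with different friendly text after the same number would still reach the same page.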
FredFredrickson is offline  
This user thanks FredFredrickson for this great post!
T-rev (November 30th, 2007)
Old November 30th, 2007, 5:33 PM   #4 (permalink)
search engines can't ignore punctuation in URIs or the links wouldn't work. you're talking about searches; i really meant to ask about web-crawlers.

my real question, stated better: are search engines/web crawlers going to consider http://www.xxx.yyy/it's_working and http://www.xxx.yyy/it%27s_working to be two separate pages? i would guess not, but i was wondering if anybody knew for sure.
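For what it's worth, the two spellings can be compared with Python's standard `urllib.parse`. This is only a sketch of how the strings relate; it says nothing about what any particular crawler actually does. (As far as RFC 3986 goes, the apostrophe is in the reserved "sub-delims" set, so a strict normalizer is not obliged to treat it and %27 as equivalent.)

```python
from urllib.parse import quote, unquote

a = "/it's_working"
b = "/it%27s_working"

# Compared as plain strings, the two paths are different
differ_as_strings = a != b

# After percent-decoding, both name the same path
same_when_decoded = unquote(a) == unquote(b)

# quote() percent-encodes the apostrophe by default, so re-encoding
# the decoded path gives one consistent spelling for both inputs
encoded_a = quote(unquote(a))
encoded_b = quote(unquote(b))
```

So a client that compares the raw strings would see two URLs, while one that decodes first would see one.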

if the answer to that is no, then question number 2 wouldn't matter (though it's still interesting). if the answer is yes, then i ought to do a 301 redirect so the SE/crawler knows they're really the same page.

thus question number 2: how can you do a mod_rewrite replace of ' with %27?

generally you can do a replace-all using the [N] flag. but mod_rewrite doesn't see the actual request; it sees it after apache has decoded all the %27s to 's. so if you were trying to replace %27 with ', it would never even see the %27 to know to replace it, whereas if you're replacing ' with %27, it might be an infinite loop (assuming apache decodes everything again every time the rules restart due to the [N] flag, which i have doubts about).

you can generally get around the decoding problem by matching against %{THE_REQUEST} directly, but i really doubt this approach would work in combination with the [N] flag, because %{THE_REQUEST} would be the same every time, so it would certainly loop infinitely.

of course, i could just do the redirect from the script instead of with apache, but i prefer to let apache do that kind of thing, since it's built for it.
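a rough sketch of what the script-side version could look like, in Python. the function name `redirect_target` is made up for illustration, and it assumes the script can get at the path exactly as the client sent it (the undecoded form, the same thing %{THE_REQUEST} exposes to mod_rewrite).

```python
from urllib.parse import quote, unquote

def redirect_target(raw_path: str):
    """Return the path to 301 to, or None if raw_path is already canonical.

    raw_path is assumed to be the undecoded path as sent by the client.
    """
    # Decode, then re-encode: quote() turns ' into %27 by default
    canonical = quote(unquote(raw_path), safe="/")
    # Only redirect when the request was not already in canonical form
    return None if raw_path == canonical else canonical

first = redirect_target("/it's_working")     # needs a redirect
second = redirect_target("/it%27s_working")  # already canonical
```

the loop worry goes away here because the redirected URL is compared against its own canonical form, so the second request returns None and no further redirect is issued. one caveat: decoding and re-encoding like this would also turn an encoded slash (%2F) in the path into a literal slash, so it only suits paths that never contain one.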

as far as using a numeric identifier followed by friendly text, yeah, but (depending on how smart the web crawler is) it might still have to be the same friendly text for the crawler to know it's the same page.
T-rev is offline  
Old December 4th, 2007, 10:03 AM   #5 (permalink)
Well if you have hyperlinks with %27 in them, search engines should have no trouble with that. As far as text searches go it would ignore the punctuation, but I think you'll be alright with no changes at all.
FredFredrickson is offline  
Old December 4th, 2007, 2:58 PM   #6 (permalink)
i know. what i'm really wondering is: are search engines/web crawlers going to consider http://www.xxx.yyy/it's_working and http://www.xxx.yyy/it%27s_working to be two separate pages?
T-rev is offline  
Old December 4th, 2007, 4:58 PM   #7 (permalink)
Probably not. And even if so, search engines generally only show 1 or 2 pages with good matches from one site during a search anyway.
FredFredrickson is offline  