| Web Standards Design Discuss accessibility, CSS, XHTML and more. |
#1 (permalink) |
|
Registered User
Fresh Surpasser
Joined in Mar 2007
12 posts
Gave thanks: 3
Thanked 0 times
|
canonical URI with percent-encoded characters
A bunch of my URIs contain %27 (a percent-encoded single quote), but they also work with a literal single quote instead. Is this a canonical URI issue? Search engines don't differentiate between %27 and a single quote, do they?
If it is an issue, how can a RewriteRule change all single quotes to %27, or vice versa, with a 301 redirect? From what I've read, mod_rewrite sees the request after it has been decoded to single quotes, so it might just cause an infinite loop.
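To make the question concrete, here is a minimal Python sketch of why both spellings reach the same page: once percent-decoded, the two paths are the same string. The helper names `decoded_path` and `encoded_path` are made up for illustration, and the sketch says nothing about how any particular crawler treats the two URIs; it only shows the decoding step a server performs.

```python
from urllib.parse import quote, unquote

# The two spellings from the question (paths only).
encoded = "/it%27s_working"
literal = "/it's_working"


def decoded_path(path: str) -> str:
    """Percent-decode a path, roughly what the server does before mapping it."""
    return unquote(path)


def encoded_path(path: str) -> str:
    """Return the spelling that uses %27 for a single quote.

    Decoding first avoids double-encoding input that already contains %27.
    """
    return quote(unquote(path), safe="/")


# Both spellings decode to the same string, so the server serves one resource.
same_resource = decoded_path(encoded) == decoded_path(literal)
```

Picking one spelling (here the %27 form) as the target of a 301 is the usual way to tell clients which one is preferred.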
|
|
|
|
|
#3 (permalink) |
|
Senior Member
Super #1
Joined in Nov 2003
Lives in New Hampshire
1,228 posts
Gave thanks: 3
Thanked 25 times
|
But adding to that, I'd say you should probably get rid of the need for such characters in your queries. Most "friendly" link schemes use a numeric identifier that appears just before the friendly text, so the text serves no purpose except to make the link more search engine friendly. I'd say just do a replace("%27", "").
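A rough Python sketch of the scheme described above, assuming a hypothetical `id-friendly-text` layout: the apostrophe (in either spelling) is dropped when the link is built, and the page is looked up by the numeric identifier alone. The function names and the exact slug format are illustrative, not taken from any particular site.

```python
import re


def make_slug(item_id: int, title: str) -> str:
    """Build 'id-friendly-text', dropping apostrophes in either spelling."""
    text = title.lower().replace("%27", "").replace("'", "")
    # Collapse every run of other non-alphanumeric characters into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return f"{item_id}-{text}"


def id_from_slug(slug: str) -> int:
    """Read only the leading numeric identifier; the friendly text is ignored."""
    match = re.match(r"(\d+)(?:-|$)", slug)
    if match is None:
        raise ValueError(f"no numeric identifier in {slug!r}")
    return int(match.group(1))
```

With this layout, `make_slug(42, "It's working")` gives `42-its-working`, and any text after the identifier resolves to the same item.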
|
|
|
|
| This user thanks FredFredrickson for this great post! | T-rev (November 30th, 2007) |
|
|
#4 (permalink) |
|
Registered User
Fresh Surpasser
Joined in Mar 2007
12 posts
Gave thanks: 3
Thanked 0 times
|
Search engines can't ignore punctuation in URIs or the links wouldn't work. You're talking about searches; I really meant to ask about web crawlers.

My real question, stated better: are search engines/web crawlers going to consider http://www.xxx.yyy/it's_working and http://www.xxx.yyy/it%27s_working to be two separate pages? I would guess not, but I was wondering if anybody knew for sure. If the answer is no, then question number 2 wouldn't matter (though it's still interesting). If the answer is yes, then I ought to do a 301 redirect so the SE/crawler knows they're really the same page.

Thus question number 2: how can you do a mod_rewrite replace of ' with %27? Generally you can do a replace-all using the [N] flag. But mod_rewrite doesn't see the actual request; it sees it after Apache has decoded all the %27s to 's. So if you were trying to replace %27 with ', it would never even see the %27 to know to replace it, whereas if you're replacing ' with %27, it might be an infinite loop (assuming Apache decodes everything every time rewriting restarts due to the [N] flag, which I have doubts about). You can generally get around the decoding problem by matching against %{THE_REQUEST} directly, but I really doubt that this approach would work in combination with the [N] flag, because %{THE_REQUEST} would be the same every time, so it would certainly loop infinitely.

Of course, I could just do the redirect from the script instead of with Apache, but I prefer to let Apache do that kind of thing, since it's built for it.

As far as using a numeric identifier followed by friendly text, yeah, but it might (depending on how smart the web crawler is) still have to be the same friendly text for the crawler to know it's the same page.
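The script-side redirect mentioned above can be sketched in Python as follows. This is an illustration of the idea, not an Apache configuration: it assumes the script can see the raw, undecoded request line (the same text that %{THE_REQUEST} holds), and `canonical_redirect` is a hypothetical helper name. The loop concern goes away because an external 301 makes the client send a new request whose raw line no longer contains a literal single quote, unlike an internal [N] restart, where the raw line never changes.

```python
from typing import Optional


def canonical_redirect(request_line: str) -> Optional[str]:
    """Return the Location for a 301, or None if no redirect is needed.

    request_line is the raw, undecoded line, e.g. "GET /it's_working HTTP/1.1".
    """
    parts = request_line.split(" ")
    if len(parts) != 3:
        return None  # not a well-formed request line; leave it alone
    _method, target, _version = parts
    path, sep, query = target.partition("?")
    if "'" not in path:
        return None  # already canonical, so answer normally
    # Only the path is rewritten; the query string is passed through untouched.
    return path.replace("'", "%27") + sep + query


# First request uses the literal quote and gets redirected.
first = canonical_redirect("GET /it's_working HTTP/1.1")
# The follow-up request to the redirect target needs no further redirect.
second = canonical_redirect(f"GET {first} HTTP/1.1")
```

Here `first` is the %27 spelling and `second` is `None`, so the exchange ends after a single redirect.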
|
|
|
|
|
#5 (permalink) |
|
Senior Member
Super #1
Joined in Nov 2003
Lives in New Hampshire
1,228 posts
Gave thanks: 3
Thanked 25 times
|
Well, if you have hyperlinks with %27 in them, search engines should have no trouble with that. As far as text searches go, they would ignore the punctuation, but I think you'll be all right with no changes at all.
|
|
|
|
|
|
#6 (permalink) |
|
Registered User
Fresh Surpasser
Joined in Mar 2007
12 posts
Gave thanks: 3
Thanked 0 times
|
I know. What I'm really wondering is: are search engines/web crawlers going to consider http://www.xxx.yyy/it's_working and http://www.xxx.yyy/it%27s_working to be two separate pages?
|
|
|
|