Thursday, October 16, 2008

what characters are valid in url?

The specification for URLs, RFC1738, limits the use of allowed characters to only a limited subset of the US-ASCII character set (2.2 URL Character Encoding Issues):

"The lower case letters "a"--"z", digits, and the characters plus ("+"), period("."), and hyphen ("-") are allowed.... In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from"0123456789ABCDEF") which forming (sic) the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)"

To insert, for example, the French accented à, you would use %E0 instead of the letter.

No comments: