Monday, October 20, 2008

Valid characters in a URL

Be careful with the characters you use in the OBJECT tag attribute. The set of "safe" characters in a URL is severely restricted by the various ways in which URLs are transported. According to the standard for URL syntax, RFC 1738 (Request for Comments 1738), only the following characters are allowed unescaped in URLs, aside from letters of the alphabet and digits:

+  -  =  .  _  /  *      (  )  ,  @  '  $  :  ;  &  !  ?

The special characters used in MINSE have been carefully chosen from this set so that you don't have to "escape" them in URLs (using a percent character and a hexadecimal number). After we set aside the parentheses, the comma, and the characters on the left, which are commonly used in expressions, we are left with just six choices for the macro escape character (the ampersand is inconvenient, because it needs to be represented as an entity; but much worse, far too many browsers are broken and will not parse SGML entities in attribute values). The single-quote was chosen for convenience, because it is a non-shifted key on North-American keyboards.

Anything that is not part of the "safe" character set must always be escaped in a URL. In particular, the percent character ("%") and the space must be escaped. Use the following codes:

for:  space  %
use:  %20    %25

Sorry about that, but I can't change the standard. There are good reasons for the decisions made in that document.
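
As a rough illustration of the escaping rule above, here is a minimal Python sketch. The function name escape_for_url and the SAFE set are made up for this example; the set simply copies the characters listed earlier in this post. Encoding non-ASCII text as UTF-8 before escaping is an assumption of the sketch, not something the text above specifies.

```python
import string

# Letters, digits, and the characters this post lists as allowed unescaped.
SAFE = set(string.ascii_letters + string.digits + "+-=._/*(),@'$:;&!?")

def escape_for_url(text: str) -> str:
    """Percent-escape every byte that is not in the SAFE set.

    Assumption: text is encoded as UTF-8 first, so each non-ASCII
    character becomes several %XX sequences.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)                # safe characters pass through
        else:
            out.append(f"%{byte:02X}")    # e.g. space -> %20, % -> %25
    return "".join(out)

print(escape_for_url("50% off (a+b)"))  # 50%25%20off%20(a+b)
```

Note that the single-quote, the parentheses, and the plus sign pass through unchanged, while the space and the percent character come out as %20 and %25, matching the table above.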
