Why scholars' personal home page addresses all contain a tilde (~)

Reposted from http://www.cs.tut.fi/~jkorpela/tilde.html

 

 

Why tilde (~) should not be used in Web addresses (URLs)

If used in Web addresses (URLs), the tilde character (~) should be encoded (as %7e or %7E). Although in most cases things work if you violate this, there is no reason to do so, since well-defined, universally working alternatives exist. This document, in addition to describing the issue in principle, also discusses the different practical problems that may arise when tilde is used in Web addresses.

What the specifications say

In the long-standing RFC on URL format, RFC 1738, there was an explicit requirement that any occurrence of the tilde (~) character in a Web page address (URL, a.k.a. URI) shall be encoded as %7e or, equivalently, as %7E. (For example, http://www.hut.fi/~jkorpela/ was thus incorrect, while http://www.hut.fi/%7ejkorpela/ was and is syntactically correct.)
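The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the helper name escape_tilde is made up for this example, and the replacement is done by hand because urllib.parse.quote in current Python versions treats the tilde as a character that needs no encoding.

```python
from urllib.parse import unquote


def escape_tilde(url: str, escape: str = "%7e") -> str:
    """Replace every literal tilde in a URL with a percent escape.

    Hypothetical helper for illustration; it does no other URL processing.
    """
    return url.replace("~", escape)


plain = "http://www.hut.fi/~jkorpela/"
encoded = escape_tilde(plain)
print(encoded)

# Both spellings of the escape decode back to the same character.
print(unquote("%7e") == unquote("%7E") == "~")

# Decoding the escaped URL gives back the original address.
print(unquote(encoded) == plain)
```

The lowercase form %7e is used as the default only because it is the form this document uses in its examples; passing escape="%7E" gives the equivalent uppercase spelling.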

In a newer RFC, namely RFC 2396, some requirements have been relaxed. In particular, tilde and some other characters have now been declared as "safe", thereby not requiring encoding.

However, the encoded notation is still a valid alternative and works more reliably. It's not so much a matter of old networking software; the tilde character causes problems for other software which is used to process documents - and for human readers.

For a short summary of URL format, including the encoding mechanism, see section URLs in my Learning HTML 3.2 by Examples.

The reasons

RFC 1738 explains (in clause 2.2) the reasons for the encoding requirement very briefly. It mentions tilde among those characters which are classified as "unsafe", because "gateways and other transport agents are known to sometimes modify such characters". Some people argue that such problems no longer exist in practice. And it is true that probably the great majority of programs directly related to Web browsing (such as browsers and servers) can handle tilde.

However, tilde is still problematic. When did you last see a correctly cited URL in your local newspaper? It's almost hopeless when journalists write them by hand. In my experience, they get tildes wrong more than half of the time. To describe the problems more systematically, here is a list:

  • Tilde is not widely known outside the circle of computer professionals, or it is known in a meaning completely different from most of its uses in the area of computing. In natural languages, tilde appears as a diacritic mark (e.g. in the Spanish letter ñ or the Portuguese letter ã). Therefore, when seeing tilde people may get confused, especially since presentation varies from one font to another.
  • Partly due to the preceding problem, tilde often gets printed wrong e.g. in newspaper articles. People may not recognize the character or they may misread it if they copy a URL by hand, or the editor or other software they use may process tilde the wrong way. (For example, in the TeX typesetting system, tilde is a special character with various functions, and to produce real tilde on output one must use a notation like \~{}.) Rather often one sees tilde printed as a diacritic instead of the correct presentation as a separate character; for example, a URL which should contain ~o (tilde followed by letter o) might appear as õ (letter o with tilde).
  • The appearance of tilde may vary. I have seen printed publications where tilde appears as a straight, or almost straight, line, causing confusion with the macron character (¯).
  • On keyboards, producing the tilde character often requires extra tricks, such as first using a key with tilde on it with Alt Gr key and then pressing the space bar, to take an example. This is somewhat uncomfortable for experts and potentially quite problematic for less experienced users. Even an expert might find himself confused when trying to produce tilde on a keyboard new to him.
  • The keyboard problems are partly caused by the fact that tilde is among the Ascii characters which can be, and often have been, replaced by national letters in national variants of Ascii. The code position which is occupied by tilde in international Ascii has u umlaut (ü) in several Nordic variants of Ascii, German sharp s (ß) in German Ascii, i with grave (ì) in Italian Ascii, etc. In addition to keyboard problems, this may cause incorrect presentations on screen or paper (especially when old devices and software are used).
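Two of the points in the list can be made concrete with a short Python sketch. It is illustrative only: the dictionary simply restates the national variants named above, and the rest shows that the separate character ~ followed by o is a different string from the single letter õ.

```python
import unicodedata

# Code position 0x7E in international Ascii is the tilde.
print(hex(ord("~")))  # 0x7e

# What the same code position holds in some national variants,
# as listed in the text above (not an exhaustive table).
POSITION_7E = {
    "international": "~",
    "several Nordic variants": "\u00fc",  # ü
    "German": "\u00df",                   # ß
    "Italian": "\u00ec",                  # ì
}
for variant, char in POSITION_7E.items():
    print(f"{variant}: {char}")

# "~o" is two characters; "õ" is one letter that decomposes into
# "o" plus a combining tilde (U+0303), not into "~" plus "o".
separate = "~o"
accented = "\u00f5"  # õ
print(separate == accented)
print([hex(ord(c)) for c in unicodedata.normalize("NFD", accented)])
```

A URL printed with õ therefore cannot be turned back into the intended address by software without guessing what the author meant.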

Is the %7e solution really good?

Of course, the notation %7e is mystical to most people. Since it looks cryptic, it can easily be misread, misremembered, or mistyped. In a Usenet article Warren Steel first gives some examples of how unescaped ~ is misunderstood, then explains why %7e might cause problems too:

In my site logs I have noticed an increase in errors due to the mistyping of the tilde: /-mudws /_mudws /=mudws etc. ...

... The combination /%7Emudws also proves troublesome to many--the % is often misread as a & or other symbol, and the introduction of mixed cases to the case-sensitive path segment adds another danger, and /%7EMUDWS is clearly wrong ( /%7emudws is theoretically correct). The one time I gave the "escaped" URL to a newspaper, it was garbled as badly as the tilde version.
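The case issue mentioned in the quotation can be checked with a small Python sketch. The helper same_path is a made-up name for this example. It relies on the fact that the hexadecimal digits of a percent escape are case-insensitive, while the rest of the path is compared exactly, as on a case-sensitive server.

```python
from urllib.parse import unquote


def same_path(a: str, b: str) -> bool:
    """Compare two URL paths after decoding percent escapes.

    Hypothetical helper: the comparison is case-sensitive, as it would be
    on a server whose file names are case-sensitive.
    """
    return unquote(a) == unquote(b)


intended = "/~mudws"
for candidate in ("/%7Emudws", "/%7emudws", "/%7EMUDWS"):
    print(candidate, "->", unquote(candidate), same_path(candidate, intended))
```

Only the escape itself may change case freely; upper-casing the user name along with it produces a different path.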

As regards experiences with newspapers, I once sent an article to the leading Finnish newspaper and mentioned the URL http://www.hut.fi/%7ejkorpela/tekoik.html and they printed it as http://www.hut.fi@jkorpela/tekoik.html (unbelievable, but true!).

Thus, although using %7e is to be preferred over incorrectly using plain ~ in URLs, it is by no means an optimal solution. But we have to ask what causes the whole problem in the first place.

The real problem: tildes in home page URLs

The need for using tildes in URLs is caused - almost exclusively - by a strange practice of using URLs of the form 
http://server/~username/filename 
(e.g. http://www.hut.fi/~jkorpela/tilde.html)

This is a strange Unixism in the World Wide Web, imitating the Unix practice of referring to the home directory of a user by notations like ~ (the user's own home directory) and ~username (the home directory of user username). More exactly, this is a convention applied in many (but not all) Unix shells, or command interpreters; it does not work universally even in the Unix universe.

There is hardly any explainable reason why such a convention was ever adopted. There is definitely nothing intuitive about it. How could you guess that ~ stands for 'home directory of'? Thus, people with no Unix background most probably have difficulties in realizing what the funny symbol ~ stands for.

Further confusion is caused by the fact that notation ~username does not even have the same meaning in URLs as in (some) Unix shells. Typically, it really refers to a subdirectory of the user's home directory. People have really got confused with this. For example, consider the URL of this document when written in the notation with an unencoded tilde in it: http://www.cs.tut.fi/~jkorpela/tilde.html. People who have direct access to the file system in which the file resides cannot use the file name ~jkorpela/tilde.html if they wish to refer to it locally and not via the Web; they need to write ~jkorpela/public_html/tilde.html in their Unix commands.
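The difference between the URL notation and the local file name can be sketched as a small mapping function in Python. Everything about the layout is an assumption made for illustration: the function name, the /home root for home directories, and the public_html subdirectory name as a default. Real servers make this configurable, and a real implementation would also have to reject paths containing ".." components.

```python
import posixpath


def userdir_to_file(url_path: str,
                    home_root: str = "/home",
                    web_subdir: str = "public_html") -> str:
    """Map a URL path like /~user/file to a file inside the user's home.

    Hypothetical sketch: home_root and web_subdir are assumed defaults,
    and no protection against '..' in the path is attempted.
    """
    if not url_path.startswith("/~"):
        raise ValueError("not a ~username path: " + url_path)
    user, _, rest = url_path[2:].partition("/")
    if not user:
        raise ValueError("missing user name: " + url_path)
    return posixpath.join(home_root, user, web_subdir, rest)


print(userdir_to_file("/~jkorpela/tilde.html"))
```

The extra public_html component in the result is exactly what the shell notation ~jkorpela/tilde.html lacks.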

It's really a matter of configuring Web servers properly. People who are responsible for such things should make them map URLs into file names in a manner which makes tildes in URLs unnecessary. Typically, references to people's pages should be something like 
http://server/u/username/filename 
Webmasters may wish to configure the server to recognize formats with something more explanatory than u there (say, users or home), either as the only option or as an additional option. Notice however that having several options there may cause problems, since people and programs may not realize that they are synonymous. Personally, I think u is just fine: it's short, easy to remember, and whatever you think about its mnemonicality, it's definitely better than either ~ or %7e. (On small servers, one might even consider a mapping scheme where the personal page URLs are of the form http://server/username/filename but on large servers that might cause too much maintenance trouble.)
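As a rough sketch of such a mapping, the following Python function rewrites both tilde forms of a path into the /u/username/ form, for example as the target of a redirect. The function name and the default prefix are assumptions for this example; an actual server would do this in its own configuration language.

```python
import re

# Matches /~user or /~user/anything once escaped tildes have been decoded.
_TILDE_FORM = re.compile(r"^/~([^/]+)(/.*)?$")


def to_u_form(url_path: str, prefix: str = "u") -> str:
    """Rewrite /~user/... or /%7euser/... to /<prefix>/user/...

    Hypothetical helper; paths that are not in tilde form are returned
    unchanged.
    """
    decoded = url_path.replace("%7e", "~").replace("%7E", "~")
    match = _TILDE_FORM.match(decoded)
    if match is None:
        return url_path
    user = match.group(1)
    rest = match.group(2) or "/"
    return f"/{prefix}/{user}{rest}"


for path in ("/~jkorpela/tilde.html", "/%7ejkorpela/tilde.html", "/index.html"):
    print(path, "->", to_u_form(path))
```

Keeping a single prefix as the canonical form, and treating the tilde forms only as aliases that lead to it, avoids the problem of several synonymous addresses noted above.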

Summary

To conclude, I strongly recommend

  • escaping tildes in URLs
  • asking your webmaster to support referring to personal Web pages with notations which do not require the tilde character in any form (naturally as an alternative to the tilde form if it is already in use at the host)
  • setting things up so that tilde is not needed at all in URLs, when installing a Web server.

Date of last update: 1999-08-27. Technical corrections 2004-12-12.

This document is largely based on a discussion with subject should ~ (tilde) be escaped as %7E? in 1997 in the c.i.w.a.h. newsgroup.

Jukka Korpela

 
