The End-of-Line Story

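    I have been using Ubuntu lately and often need to switch between Windows and Ubuntu. I occasionally noticed that the same document looks different when opened on each system, and guessed that line endings were the cause. Wikipedia covers the topic in great detail, and only then did I realize how much there is to the humble newline character.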

    The ASCII standard for text does not define a unique end-of-line (EOL) character. Instead, ASCII defines two independent and orthogonal movements of the print head: Carriage Return (CR) and Line Feed (LF). (IBM's EBCDIC did not make this mistake; it defined a single New Line (NL) character.) 

    Early operating system designers had to adopt some "end-of-line" convention using CR and LF; some used LF, some used CR, and some used a two-octet sequence: LF CR or CR LF. During the early ARPAnet research days (~1970-1972), this end-of-line diversity among operating systems made network communication between diverse host systems difficult. 

    After some discussion (recorded in early RFCs), the researchers adopted a single convention: ASCII text transmitted across the network *must* use the two-character sequence CR LF. This choice was designed to spread the pain equally among all operating systems of the day; each had to translate to and from the CR LF convention when text was transferred across the network. This EOL convention was the core of the initial Telnet protocol definition (negotiated options were added later). Jon Postel was one of the principal protocol policemen enforcing the CR LF requirement. He carried the EOL = CR LF convention from Telnet into FTP and SMTP on the ARPAnet, and later these protocols were taken essentially unchanged into the Internet. 

     Few people today are aware of the EOL issue, because systems generally (but not always!) make it transparent. For example, the RFC Editor stores the official RFC archive on a Unix system whose native EOL is a single LF. When you click on a link for an RFC from the RFC Editor Web page, your browser uses an FTP client to retrieve the ASCII text. The RFC Editor's FTP server translates the LF in each text line into CR LF for transmission across the Internet, and your FTP client in turn translates each CR LF into whatever the EOL convention of your system is. Many today use Windows, based on MS-DOS, which came along later and adopted CR LF as its EOL convention. This simplifies the picture; no EOL translation is actually required when MS-DOS systems move text across the Internet. 
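
    The two-way translation described above can be sketched in a few lines of Python. This is only an illustration, not how any particular FTP client or server is implemented; the function names `to_network` and `from_network` and the `native_eol` parameter are hypothetical.

```python
def to_network(text: bytes, native_eol: bytes = b"\n") -> bytes:
    """Translate the host's native EOL into CR LF before sending.

    Assumes `text` really uses `native_eol`. If it already contains
    CR LF and native_eol is LF, the result gains a spurious extra CR.
    """
    return text.replace(native_eol, b"\r\n")


def from_network(data: bytes, native_eol: bytes = b"\n") -> bytes:
    """Translate CR LF received from the network into the native EOL."""
    return data.replace(b"\r\n", native_eol)


# A Unix-style sender (LF) and a receiver whose native EOL is a lone CR.
wire = to_network(b"line one\nline two\n")
received = from_network(wire, native_eol=b"\r")
```

    With `native_eol=b"\r\n"` both functions leave the bytes unchanged, which matches the remark that MS-DOS systems need no translation.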

     RFC 2223, "Instructions for RFC Authors", describes the format of an RFC; it says that every line of an RFC is to be ended by CR LF. This means *as transmitted across the Internet*; the text is actually stored at ISI and other Unix sites with LF as the EOL delimiter. It should all work, magically. However, misconfiguration or mismatches can still cause confusion about EOL. For example, you may see an extra ^M (Control M, or CR) at the end of every line of an RFC. Or you may be missing the CR entirely, causing bad formatting on a Windows system. On a Unix system, you may have to run the dos2unix utility to remove spurious ^M characters. 
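
    As a rough sketch of the two repairs mentioned above (dropping a spurious CR, or adding a missing one), the following Python helpers operate on raw bytes. The names `crlf_to_lf` and `lf_to_crlf` are hypothetical and are not the command-line utilities themselves.

```python
import re


def crlf_to_lf(data: bytes) -> bytes:
    """Remove the CR (shown as ^M) from every CR LF pair."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Insert a CR before each LF that does not already follow a CR.

    The negative lookbehind keeps existing CR LF pairs intact, so
    running this twice does not double up the CR characters.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


unix_text = crlf_to_lf(b"Status of this Memo\r\n\r\n")
dos_text = lf_to_crlf(b"Status of this Memo\n\n")
```

    To apply these to a file, open it in binary mode (`"rb"` and `"wb"`), since Python's text mode performs its own newline translation.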

    Note that if you use binary mode FTP, the file is transferred literally byte-by-byte, so the source host's end of line is sent across the network. This normally works OK because it is assumed that binary mode FTP is used only between like systems. The RFC Editor Web page includes tar'd and zip'd collections of RFCs (www.rfc-editor.org/download.html). These compressed files are binary and therefore contain buried EOL sequences. The tar.Z files use the Unix convention (LF), while the .zip files are assumed to be destined for Windows machines and therefore use the MS-DOS convention.
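
    Because a binary transfer preserves whatever EOL bytes the source used, it can help to check which convention a file actually contains. The following is a minimal sketch; the helper name `count_eols` is hypothetical.

```python
def count_eols(data: bytes) -> dict:
    """Count CR LF pairs, lone LFs, and lone CRs in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CR LF pair contributes one LF and one CR, so subtract
        # the pairs to get the characters that stand alone.
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


sample = b"dos line\r\nunix line\nold mac line\r"
counts = count_eols(sample)
```

    A file extracted from a Unix-style archive would be expected to show only lone LFs, and one from a Windows-style archive only CR LF pairs; a mix suggests the file passed through a mismatched translation somewhere.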
