Byte Order

xbt746

Introduction to Endianness

by Christopher Brown and Michael Barr

Which is the most convenient end on your system? The choices are big endian and little endian.

Some human languages are read and written from left to right; others from right to left. A similar issue arises in the field of computers, involving the representation of numbers.

Endianness is the attribute of a system that indicates whether integers are represented from left to right or right to left. Why, in today's world of virtual machines and gigahertz processors, would a programmer care about such a silly topic? Well, unfortunately, endianness must be chosen every time a hardware or software architecture is designed, and there isn't much in the way of natural law to help decide. So implementations vary.

Endianness comes in two varieties: big and little. A big-endian representation has a multibyte integer written with its most significant byte on the left; a number represented thus is easily read by English-speaking humans. A little-endian representation, on the other hand, places the most significant byte on the right. Of course, computer architectures don't have an intrinsic "left" or "right" about them. These human terms are borrowed from our written forms of human communication. The following definitions are more precise:

* Big endian means that the most significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
* Little endian means that the least significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
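
To make the two definitions concrete, here is a minimal C sketch that copies the in-memory bytes of the 32-bit value 0x12345678 and checks which byte sits at the lowest address. The function names are illustrative only and do not come from the article. A big endian machine stores 12 34 56 78 at increasing addresses; a little endian machine stores 78 56 34 12.

```c
#include <stdint.h>
#include <string.h>

/* Copies the in-memory representation of a 32-bit value into out[0..3],
   lowest address first. memcpy avoids pointer-cast aliasing problems. */
void bytes_of_u32(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little endian host, 0 on a big endian host.
   Assumes the host is one or the other (mixed orders are not handled). */
int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_of_u32(0x12345678u, b);
    /* Little endian puts the least significant byte (0x78) at the lowest address. */
    return b[0] == 0x78;
}
```

On an 80x86 PC, host_is_little_endian() returns 1. On a SPARC it returns 0.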

All processors must be designated as either big endian or little endian. Intel's 80x86 processors and their clones are little endian. Sun's SPARC, Motorola's 68K, and the PowerPC families are all big endian. The Java Virtual Machine is big endian as well. Some processors even have a bit in a register that allows the programmer to select the desired endianness.

An endianness difference can cause problems if a computer unknowingly tries to read binary data written in the opposite format from a shared memory location or file. Take a look at an 80x86 memory dump with a 16- or 32-bit integer stored inside, such as that shown in Figure 1. An 80x86 processor stores data in memory with its least significant byte first. However, your mind tends to expect to read the data from the most significant byte to the least.

Figure 1. A little-endian memory dump
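
The figure itself is not reproduced here. As a rough stand-in, the sketch below stores values least significant byte first, as an 80x86 does, and formats the bytes in address order the way a memory dump shows them. The helpers store_le32, store_le16 and hex_dump are hypothetical. They use shifts rather than the host's own memory layout, so the output is the same on any machine.

```c
#include <stdint.h>
#include <stdio.h>

/* Stores v least significant byte first, as an 80x86 does. */
void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

void store_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

/* Formats n bytes as space-separated hex, lowest address first.
   out must hold at least 3*n bytes (1 byte if n == 0). */
void hex_dump(const unsigned char *p, size_t n, char *out)
{
    size_t i;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        sprintf(out + 3 * i, "%02X", (unsigned)p[i]); /* 2 digits + NUL */
        if (i + 1 < n)
            out[3 * i + 2] = ' ';                      /* replace NUL with a separator */
    }
}
```

Storing 0x12345678 followed by 0xABCD produces the dump "78 56 34 12 CD AB". That is the apparently backwards order the paragraph above describes.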

End to end

Network stacks and communication protocols must also define their endianness. Otherwise, two nodes of different endianness would be unable to communicate. This is a more substantial example of endianness affecting the embedded programmer. As it turns out, all of the protocol layers in the TCP/IP suite are defined to be big endian. In other words, any 16- or 32-bit value within the various layer headers (for example, an IP address, a packet length, or a checksum) must be sent and received with its most significant byte first.
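
Sending a field most significant byte first can be done without knowing the host's byte order at all, by extracting the bytes with shifts. Below is a minimal sketch, assuming a caller-supplied header buffer. The helper names (put_be16, put_be32 and their get_ counterparts) are hypothetical.

```c
#include <stdint.h>

/* Write v into p[] most significant byte first (network byte order).
   Shifts operate on values, not on memory, so this works on any host. */
void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFF);
}

void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/* Read a big endian field back into a host value. */
uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

For example, a 16-bit length of 0x1234 goes on the wire as 12 34, and the 32-bit address 0xC0000102 goes on the wire as C0 00 01 02.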

Let's say you wish to establish a TCP socket connection to a computer whose IP address is 192.0.1.2. IPv4 uses a unique 32-bit integer to identify each network host. The dotted decimal IP address must be translated into such an integer.
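
The translation is positional arithmetic. For a.b.c.d the integer is (a << 24) | (b << 16) | (c << 8) | d, so 192.0.1.2 becomes 0xC0000102. Below is a minimal parser sketch; parse_ipv4 is a hypothetical name, and real code would normally call the POSIX inet_pton() instead. It accepts only the full four-part decimal form.

```c
#include <stdint.h>

/* Parses "a.b.c.d" (each part 0-255) into the host-order value
   a<<24 | b<<16 | c<<8 | d. Returns 1 on success, 0 on malformed input. */
int parse_ipv4(const char *s, uint32_t *out)
{
    uint32_t result = 0;
    int part;

    for (part = 0; part < 4; part++) {
        unsigned value = 0;
        int digits = 0;
        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (unsigned)(*s - '0');
            if (++digits > 3 || value > 255)
                return 0;              /* too many digits or out of range */
            s++;
        }
        if (digits == 0)
            return 0;                  /* empty part */
        result = (result << 8) | value;
        if (part < 3) {
            if (*s != '.')
                return 0;              /* missing separator */
            s++;
        }
    }
    if (*s != '\0')
        return 0;                      /* trailing garbage */
    *out = result;
    return 1;
}
```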

The multibyte integer representation used by the TCP/IP protocols is sometimes called network byte order. Even if the computers at each end are little endian, multibyte integers passed between them must be converted to network byte order prior to transmission across the network, and converted back to little endian at the receiving end.
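
The POSIX functions declared in <arpa/inet.h> (on Windows, <winsock2.h>) handle this conversion. The sketch below shows roughly what each side does; encode_u32 and decode_u32 are hypothetical wrappers. memcpy moves the bytes between the integer and the buffer so that no misaligned pointer casts are needed.

```c
#include <arpa/inet.h>  /* POSIX: htonl, htons, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Sender: convert to network byte order, then copy the bytes out. */
void encode_u32(unsigned char wire[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(wire, &net, sizeof net);
}

/* Receiver: copy the bytes in, then convert back to host byte order. */
uint32_t decode_u32(const unsigned char wire[4])
{
    uint32_t net;
    memcpy(&net, wire, sizeof net);
    return ntohl(net);
}
```

Whatever the host's byte order, encode_u32(w, 0xC0000102) writes the bytes C0 00 01 02, and decode_u32 returns 0xC0000102. On a big endian host, htonl and ntohl simply return their argument.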

Suppose an 80x86-based PC is to talk to a SPARC-based server over the Internet. Without further manipulation, the 80x86 processor would convert 192.0.1.2 to the 32-bit integer 0xC0000102, store it least significant byte first, and transmit the bytes in the order 02 01 00 C0. The SPARC would receive the bytes in the order 02 01 00 C0, reconstruct them into the big endian integer 0x020100C0, and misinterpret the address as 2.1.0.192.
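
The failure can be reproduced on a single machine by performing the sender's and receiver's steps explicitly. This is a sketch with hypothetical function names: the sender writes the address least significant byte first with no conversion, and the receiver rebuilds it as big endian.

```c
#include <stdint.h>
#include <stdio.h>

/* 80x86 sender without conversion: least significant byte goes first. */
void send_unconverted(uint32_t addr, unsigned char wire[4])
{
    wire[0] = (unsigned char)(addr & 0xFF);
    wire[1] = (unsigned char)((addr >> 8) & 0xFF);
    wire[2] = (unsigned char)((addr >> 16) & 0xFF);
    wire[3] = (unsigned char)(addr >> 24);
}

/* Big endian receiver: the first byte received is the most significant. */
uint32_t receive_big_endian(const unsigned char wire[4])
{
    return ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
           ((uint32_t)wire[2] << 8) | (uint32_t)wire[3];
}

/* Formats an address in dotted decimal; out needs 16 bytes. */
void format_ipv4(uint32_t addr, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)(addr >> 24), (unsigned)((addr >> 16) & 0xFF),
             (unsigned)((addr >> 8) & 0xFF), (unsigned)(addr & 0xFF));
}
```

Sending 0xC0000102 (192.0.1.2) unconverted puts 02 01 00 C0 on the wire. The receiver rebuilds 0x020100C0, and format_ipv4 turns that into "2.1.0.192".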

Preventing this sort of confusion leads to an annoying little implementation detail for TCP/IP stack developers. If the stack will run on a little endian processor, it will have to reorder, at run time, the bytes of every multibyte data field within the various layers' headers. If the stack will run on a big endian processor, there's nothing to worry about. For the stack to be portable (that is, so it will run on processors of both types), it will have to decide whether or not to do this reordering, typically at compile time.

A common solution to the endianness problem associated with networking is to define a set of four preprocessor macros: htons(), htonl(), ntohs(), and ntohl(), as shown in Listing 1. These macros make the following conversions:
* htons(): reorder the bytes of a 16-bit unsigned value from processor order to network order. The macro name can be read "host to network short."
* htonl(): reorder the bytes of a 32-bit unsigned value from processor order to network order. The macro name can be read "host to network long."
* ntohs(): reorder the bytes of a 16-bit unsigned value from network order to processor order. The macro name can be read "network to host short."
* ntohl(): reorder the bytes of a 32-bit unsigned value from network order to processor order. The macro name can be read "network to host long."

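#include <stdint.h>

/* The listing assumes these fixed-width type names. */
typedef uint16_t uint16;
typedef uint32_t uint32;

/* Define exactly one of BIG_ENDIAN or LITTLE_ENDIAN for the target (for example,
   -DLITTLE_ENDIAN on the compiler command line). Some C libraries, such as glibc's
   <endian.h>, define both names, so this is meant for standalone builds.
   The macros evaluate A more than once: don't pass arguments with side effects. */
#if defined(BIG_ENDIAN) && !defined(LITTLE_ENDIAN)
#define htons(A) (A)
#define htonl(A) (A)
#define ntohs(A) (A)
#define ntohl(A) (A)
#elif defined(LITTLE_ENDIAN) && !defined(BIG_ENDIAN)
#define htons(A) ((((uint16)(A) & 0xff00) >> 8) | \
                  (((uint16)(A) & 0x00ff) << 8))
#define htonl(A) ((((uint32)(A) & 0xff000000) >> 24) | \
                  (((uint32)(A) & 0x00ff0000) >> 8)  | \
                  (((uint32)(A) & 0x0000ff00) << 8)  | \
                  (((uint32)(A) & 0x000000ff) << 24))
#define ntohs htons  /* a byte swap is its own inverse */
#define ntohl htonl
#else
#error "Either BIG_ENDIAN or LITTLE_ENDIAN must be #defined, but not both."
#endif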

Listing 1. Byte reordering macros
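
Listing 1 relies on the build defining BIG_ENDIAN or LITTLE_ENDIAN by hand. The alternative below is a sketch that assumes a GCC- or Clang-compatible compiler, which predefines __BYTE_ORDER__, __ORDER_BIG_ENDIAN__ and __ORDER_LITTLE_ENDIAN__. The names HOST_IS_BIG_ENDIAN, to_network16/32 and from_network16/32 are hypothetical. Because these are functions rather than macros, each argument is evaluated once and has a definite type. The byte-order decision is still made at compile time.

```c
#include <stdint.h>

/* Assumption: GCC/Clang predefine __BYTE_ORDER__ and the __ORDER_*__ macros.
   With other compilers, define HOST_IS_BIG_ENDIAN (0 or 1) when compiling. */
#ifndef HOST_IS_BIG_ENDIAN
#  if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define HOST_IS_BIG_ENDIAN 1
#  elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define HOST_IS_BIG_ENDIAN 0
#  else
#    error "Unknown byte order: compile with -DHOST_IS_BIG_ENDIAN=0 or =1."
#  endif
#endif

/* Host to network (big endian) order; big endian builds contain no swap code. */
static inline uint16_t to_network16(uint16_t v)
{
#if HOST_IS_BIG_ENDIAN
    return v;
#else
    return (uint16_t)((v >> 8) | (v << 8));  /* cast drops the shifted-out bits */
#endif
}

static inline uint32_t to_network32(uint32_t v)
{
#if HOST_IS_BIG_ENDIAN
    return v;
#else
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
#endif
}

/* A byte swap is its own inverse, so network to host is the same operation. */
static inline uint16_t from_network16(uint16_t v) { return to_network16(v); }
static inline uint32_t from_network32(uint32_t v) { return to_network32(v); }
```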

If the processor on which the TCP/IP stack is to be run is itself also big endian, each of the four macros will be defined to do nothing and there will be no run-time performance impact. If, however, the processor is little endian, the macros will reorder the bytes appropriately. These macros are routinely called when building and parsing network packets and when socket connections are created.
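
As an example of the parsing side, the sketch below pulls a few fields out of a raw IPv4 header. In that header the 16-bit total length is at byte offset 2, and the source and destination addresses are at offsets 12 and 16. The sketch uses the POSIX ntohs() and ntohl() from <arpa/inet.h>. The struct and function names are hypothetical, and header validation (version, length, checksum) is omitted.

```c
#include <arpa/inet.h>  /* POSIX: ntohs, ntohl */
#include <stdint.h>
#include <string.h>

struct ipv4_fields {
    uint16_t total_length;  /* header bytes 2-3 */
    uint32_t source;        /* header bytes 12-15 */
    uint32_t destination;   /* header bytes 16-19 */
};

/* hdr must point at least 20 bytes of a received IPv4 header.
   memcpy avoids misaligned loads; ntoh* converts to host byte order. */
void parse_ipv4_header(const unsigned char *hdr, struct ipv4_fields *f)
{
    uint16_t len16;
    uint32_t addr;

    memcpy(&len16, hdr + 2, sizeof len16);
    f->total_length = ntohs(len16);

    memcpy(&addr, hdr + 12, sizeof addr);
    f->source = ntohl(addr);

    memcpy(&addr, hdr + 16, sizeof addr);
    f->destination = ntohl(addr);
}
```

Take a header whose bytes 2 and 3 are 00 54 and whose source bytes are C0 00 01 02. For it, total_length is 84 and source is 0xC0000102 (192.0.1.2) on any host.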
Serious run-time performance penalties occur when using TCP/IP on a little endian processor. For that reason, it may be unwise to select a little endian processor for use in a device, such as a router or gateway, with an abundance of network functionality. (Translator's note: because this article was originally aimed mainly at embedded developers, this claim is acceptable in that context.)

The convenient end
The origin of the odd terms big endian and little endian can be traced to the 1726 book Gulliver's Travels, by Jonathan Swift. In one part of the story, resistance to an imperial edict to break soft-boiled eggs on the "little end" escalates to civil war. (The plot is a satire of England's King Henry VIII's break with the Catholic Church.) A few hundred years later, in 1981, Danny Cohen applied the terms and the satire to our current situation in IEEE Computer (vol. 14, no. 10).

Unfortunately, both implementations continue to be prevalent. Embedded programmers must be aware of the issue and be prepared to convert between their different representations as required.

This article was published in the January 2002 issue of Embedded Systems Programming. If you wish to cite the article in your own work, you may find the following MLA-style information helpful:
Brown, Christopher, and Michael Barr. "Introduction to Endianness." Embedded Systems Programming, January 2002, pp. 55-56.

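For communication between PCs, the byte order of the application's own data depends only on the CPUs inside the two machines (the TCP/IP stack itself still handles its headers in network byte order). If the CPUs use the same byte order, no conversion is needed; if they differ, conversion is required. I haven't done any embedded programming yet, so I haven't considered that case. For cross-platform work, though, big endian versus little endian is a small issue that cannot be ignored.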