Introduction to Endianness (Big Endian and Little Endian)

Which is the most convenient end on your system?

Some human languages are read and written from left to right; others from right to left. A similar issue arises in the field of computers, involving the representation of numbers.

Endianness is the attribute of a system that indicates whether integers are represented from left to right or right to left. Why, in today's world of virtual machines and gigahertz processors, would a programmer care about such a silly topic? Well, unfortunately, endianness must be chosen every time a hardware or software architecture is designed, and there isn't much in the way of natural law to help decide. So implementations vary.

Endianness comes in two varieties: big and little. A big endian representation has a multibyte integer written with its most significant byte on the left; a number represented thus is easily read by English-speaking humans. A little endian representation, on the other hand, places the most significant byte on the right. Of course, computer architectures don't have an intrinsic "left" or "right" about them. These human terms are borrowed from our written forms of human communication. The following definitions are more precise:

  • Big endian means that the most significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
  • Little endian means that the least significant byte of any multibyte data field is stored at the lowest memory address, which is also the address of the larger field.
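The two definitions above suggest a simple run-time check: store a known multibyte value and look at the byte that sits at the lowest address. The following is only a minimal sketch; the function name host_is_little_endian() is made up for illustration and is not part of any standard library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least
 * significant byte at the lowest address (little endian), else 0. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    /* Copy the single byte found at the lowest address of the field. */
    memcpy(&first, &probe, 1);

    /* 0x04 is the least significant byte of the probe value. */
    return first == 0x04;
}
```

On an 80x86 machine this function would be expected to return 1; on a big endian machine such as a SPARC it would return 0.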

All processors must be designated as either big endian or little endian. Intel's 80x86 processors and their clones are little endian. Sun's SPARC, Motorola's 68K, and the PowerPC families are all big endian. The Java Virtual Machine is big endian as well. Some processors even have a bit in a register that allows the programmer to select the desired endianness.

Figure 1 Little endian memory dump

An endianness difference can cause problems if a computer unknowingly tries to read binary data written in the opposite format from a shared memory location or file. Take a look at an 80x86 memory dump with a 16- or 32-bit integer stored inside, such as that shown in Figure 1. An 80x86 processor stores data in memory with its least significant byte first. However, your mind tends to expect the data to read from the most significant byte to the least.
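A memory dump of this kind is easy to reproduce. The sketch below formats the bytes of any object in increasing address order; dump_bytes() is a hypothetical helper written for this illustration, and the caller is assumed to supply a buffer of at least 3 * len characters (and at least one).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes of an object in increasing
 * address order as two hex digits per byte, separated by spaces.
 * out must hold at least 3 * len characters (and at least one). */
void dump_bytes(const void *obj, size_t len, char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    size_t i;

    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* No trailing space after the final byte. */
        sprintf(out + 3 * i, (i + 1 < len) ? "%02X " : "%02X", p[i]);
    }
}
```

Dumping a uint32_t that holds 0x12345678 on a little endian host yields the text 78 56 34 12, while a big endian host yields 12 34 56 78.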

End to end

Network stacks and communication protocols must also define their endianness. Otherwise, two nodes of different endianness would be unable to communicate. This is a more substantial example of endianness affecting the embedded programmer. As it turns out, all of the protocol layers in the TCP/IP suite are defined to be big endian. In other words, any 16- or 32-bit value within the various layer headers (for example, an IP address, a packet length, or a checksum) must be sent and received with its most significant byte first.

Let's say you wish to establish a TCP socket connection to a computer whose IP address is 192.0.1.2. IPv4 uses a unique 32-bit integer to identify each network host. The dotted decimal IP address must be translated into such an integer.
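The translation from dotted decimal to a 32-bit integer can be sketched as a few shifts. The function name ipv4_from_octets() is an assumption made for this example; real code would more likely call a library routine for the conversion.

```c
#include <stdint.h>

/* Hypothetical helper: packs the four octets of a dotted decimal
 * address a.b.c.d into one 32-bit value, with a as the most
 * significant byte. The result is a plain number; how its bytes are
 * laid out in memory still depends on the host's endianness. */
uint32_t ipv4_from_octets(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

For 192.0.1.2 the call ipv4_from_octets(192, 0, 1, 2) returns 0xC0000102.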

The multibyte integer representation used by the TCP/IP protocols is sometimes called network byte order. Even if the computers at each end are little endian, multibyte integers passed between them must be converted to network byte order prior to transmission across the network, and converted back to little endian at the receiving end.

Suppose an 80x86-based PC is to talk to a SPARC-based server over the Internet. Without further manipulation, the 80x86 processor would convert 192.0.1.2 to the little endian integer 0x020100C0 and transmit the bytes in the order 02 01 00 C0. The SPARC would receive the bytes in the order 02 01 00 C0, reconstruct the bytes into a big endian integer 0x020100C0, and misinterpret the address as 2.1.0.192.
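The mix-up can be simulated on any single machine by spelling out both byte orders explicitly. In the sketch below, store_le32() and load_be32() are hypothetical helpers: the first lays out a value the way a little endian host does in memory, and the second interprets four bytes the way a big endian host would.

```c
#include <stdint.h>

/* Hypothetical helper: writes v as a little endian host stores it,
 * least significant byte at the lowest address. */
void store_le32(uint32_t v, unsigned char buf[4])
{
    buf[0] = (unsigned char)(v & 0xff);
    buf[1] = (unsigned char)((v >> 8) & 0xff);
    buf[2] = (unsigned char)((v >> 16) & 0xff);
    buf[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: reads four bytes as a big endian host
 * interprets them, most significant byte at the lowest address. */
uint32_t load_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Storing the numeric address 0xC0000102 (192.0.1.2) with store_le32() produces the bytes 02 01 00 C0, and reading those bytes back with load_be32() gives 0x020100C0, that is, 2.1.0.192.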

Preventing this sort of confusion leads to an annoying little implementation detail for TCP/IP stack developers. If the stack will run on a little endian processor, it will have to reorder, at run time, the bytes of every multibyte data field within the various layers' headers. If the stack will run on a big endian processor, there's nothing to worry about. For the stack to be portable (that is, so it will run on processors of both types), it will have to decide whether or not to do this reordering, typically at compile time.

Listing 1 Byte reordering macros

#if defined(BIG_ENDIAN)

#define htons(A)	(A)
#define htonl(A)	(A)
#define ntohs(A)	(A)
#define ntohl(A)	(A)

#elif defined(LITTLE_ENDIAN)

#define htons(A)	((((A) & 0xff00) >> 8) | (((A) & 0x00ff) << 8))
#define htonl(A)	((((A) & 0xff000000) >> 24) | (((A) & 0x00ff0000) >> 8) | \
		(((A) & 0x0000ff00) << 8) | (((A) & 0x000000ff) << 24))
#define ntohs	htons
#define ntohl	htonl

#else

#error "One of BIG_ENDIAN or LITTLE_ENDIAN must be #define'd."

#endif

A common solution to the endianness problem associated with networking is to define a set of four preprocessor macros: htons(), htonl(), ntohs(), and ntohl(), as shown in Listing 1. These macros make the following conversions:

  • htons(): reorder the bytes of a 16-bit value from processor order to network order. The macro name can be read "host to network short."
  • htonl(): reorder the bytes of a 32-bit value from processor order to network order. The macro name can be read "host to network long."
  • ntohs(): reorder the bytes of a 16-bit value from network order to processor order. The macro name can be read "network to host short."
  • ntohl(): reorder the bytes of a 32-bit value from network order to processor order. The macro name can be read "network to host long."

If the processor on which the TCP/IP stack is to be run is itself also big endian, each of the four macros will be defined to do nothing and there will be no run-time performance impact. If, however, the processor is little endian, the macros will reorder the bytes appropriately. These macros are routinely called when building and parsing network packets and when socket connections are created.
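As a rough illustration of how these conversions are used when building and parsing packets, the sketch below relies on the htons(), htonl(), ntohs(), and ntohl() that POSIX systems provide in <arpa/inet.h> rather than on the macros of Listing 1. The 6-byte header layout and the demo_hdr_* names are invented for this example and do not correspond to any real protocol.

```c
#include <arpa/inet.h>   /* htons(), htonl(), ntohs(), ntohl() on POSIX */
#include <stdint.h>
#include <string.h>

/* Hypothetical header, for illustration only: a 32-bit address
 * followed by a 16-bit length, both in network byte order. */
#define DEMO_HDR_SIZE 6

void demo_hdr_build(unsigned char out[DEMO_HDR_SIZE],
                    uint32_t addr, uint16_t length)
{
    /* Convert from host order to network order before copying out. */
    uint32_t net_addr = htonl(addr);
    uint16_t net_len  = htons(length);

    memcpy(out, &net_addr, sizeof net_addr);
    memcpy(out + 4, &net_len, sizeof net_len);
}

void demo_hdr_parse(const unsigned char in[DEMO_HDR_SIZE],
                    uint32_t *addr, uint16_t *length)
{
    uint32_t net_addr;
    uint16_t net_len;

    memcpy(&net_addr, in, sizeof net_addr);
    memcpy(&net_len, in + 4, sizeof net_len);

    /* Convert from network order back to host order. */
    *addr   = ntohl(net_addr);
    *length = ntohs(net_len);
}
```

Whatever the endianness of the host, building a header from the address 0xC0000102 and the length 0x0054 places the bytes C0 00 01 02 00 54 in the buffer, and parsing that buffer returns the original two values.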

Serious run-time performance penalties occur when using TCP/IP on a little endian processor. For that reason, it may be unwise to select a little endian processor for use in a device, such as a router or gateway, with an abundance of network functionality.

The convenient end

The origin of the odd terms big endian and little endian can be traced to the 1726 book Gulliver's Travels, by Jonathan Swift. In one part of the story, resistance to an imperial edict to break soft-boiled eggs on the "little end" escalates to civil war. (The plot is a satire of King Henry VIII of England's break with the Catholic Church.) A few hundred years later, in 1981, Danny Cohen applied the terms and the satire to our current situation in IEEE Computer (vol. 14, no. 10).

Unfortunately, both implementations continue to be prevalent. Embedded programmers must be aware of the issue and be prepared to convert between their different representations as required.

Christopher Brown began programming in 1982 and now runs a hardware and software development company. His e-mail address is cjbrown@browncomputer.com.

Michael Barr is the editor in chief of ESP. He is also the author of Programming Embedded Systems in C and C++ and an adjunct faculty member at the University of Maryland College Park. Contact him at mbarr@cmp.com.

Return to January 2002 Table of Contents
