Byte and Bit Order Dissection

Sep 02, 2003  By Kevin Kaichuan He

Discussing the differences between big and little endianness, bit and byte order and what it all means.

Editors' Note: This article has been updated since its original posting.

Software and hardware engineers who have to deal with byte and bit order issues know the process is like walking a maze. Though we usually come out of it, we consume a handful of our brain cells each time. This article tries to summarize the various areas in which the business of byte and bit order plays a role, including CPU, buses, devices and networking protocols. We dive into the details and hope to provide a good reference on this topic. The article also tries to suggest some guidelines and rules of thumb developed from practice.

Byte Order: the Endianness

We probably are familiar with the word endianness. First introduced by Danny Cohen in 1980, it describes the method a computer system uses to represent multi-byte integers.

Two types of endianness exist, big endian and little endian. Big endian refers to the method that stores the most significant byte of an integer at the lowest byte address. Little endian is the opposite; it stores the least significant byte of an integer at the lowest byte address, so the most significant byte ends up at the highest byte address.
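
As a minimal sketch of the two definitions, the C functions below lay out a 32-bit value in each byte order using shifts, so the result does not depend on the host CPU. The names store_be32 and store_le32 are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big endian: most significant byte goes to the lowest address (out[0]). */
void store_be32(uint8_t out[4], uint32_t value)
{
    assert(out != NULL);
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Little endian: least significant byte goes to the lowest address (out[0]). */
void store_le32(uint8_t out[4], uint32_t value)
{
    assert(out != NULL);
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}
```

For the value 0x0a0b0c0d, store_be32 fills the buffer with 0a 0b 0c 0d and store_le32 fills it with 0d 0c 0b 0a.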

Bit order usually follows the same endianness as the byte order for a given computer system. That is, in a big endian system the most significant bit is stored at the lowest bit address; in a little endian system, the least significant bit is stored at the lowest bit address.

Every effort is made to avoid bit swapping in software when designing a system, because bit swapping is both expensive and tedious. Later sections describe how hardware takes care of it.

Documentation Guideline

Just as most people write a number from left to right, the layout of a multi-byte integer should flow from left to right, that is, from the most significant to the least significant byte. This is the clearest way to write integers, as we can see in the following examples.

Here is how we would write the integer 0x0a0b0c0d for both big endian and little endian systems, according to the rule above:

Write Integer for Big Endian System

byte  addr       0         1       2        3
bit  offset  01234567 01234567 01234567 01234567
     binary  00001010 00001011 00001100 00001101
        hex     0a       0b      0c        0d

Write Integer for Little Endian System

byte  addr      3         2       1        0
bit  offset  76543210 76543210 76543210 76543210
     binary  00001010 00001011 00001100 00001101
        hex     0a       0b      0c        0d

In both cases above, we can read from left to right and the number is 0x0a0b0c0d.

If we do not follow the rule, we might write the number in the following way:

byte  addr      0         1       2        3
bit  offset  01234567 01234567 01234567 01234567
     binary  10110000 00110000 11010000 01010000

As you can see, it's hard to make out what number we're trying to represent.

Simplified Computer System Used in this Article

Without loss of generality, a simplified view of the computer system discussed in this article is drawn below.

CPU, local bus and internal memory/cache all are considered to be CPU, because they usually share the same endianness. Discussion of bus endianness, however, covers only the external bus. The CPU register width, memory word width and bus width are assumed to be 32 bits for this article.

Endianness of CPU

The CPU endianness is the byte and bit order in which it interprets multi-byte integers from on-chip registers, local bus, in-line cache, memory and so on.

Little endian CPUs include Intel and DEC. Big endian CPUs include Motorola 680x0, Sun SPARC and IBM (e.g., PowerPC). MIPS and ARM can be configured either way.

The CPU endianness affects the CPU's instruction set. Different GNU C toolchains ought to be used to compile C code for CPUs of different endianness. For example, mips-linux-gcc and mipsel-linux-gcc are used to compile MIPS code for big endian and little endian, respectively.

The CPU endianness also has an impact on software programs if we need to access part of a multi-byte integer. The following program illustrates that situation. If one accesses the whole 32-bit integer, the CPU endianness is invisible to software programs.

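#include <stdint.h>
#include <stdio.h>

/* View the same four bytes as one integer or as a byte array. */
typedef union {
    uint32_t my_int;
    uint8_t  my_bytes[4];
} endian_tester;

int main(void)
{
    endian_tester et;
    et.my_int = 0x0a0b0c0d;
    /* my_bytes[0] is the byte stored at the lowest address. */
    if (et.my_bytes[0] == 0x0a)
        printf("I'm on a big-endian system\n");
    else
        printf("I'm on a little-endian system\n");
    return 0;
}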

Endianness of Bus

The bus we refer to here is the external bus we showed in the figure above. We use PCI as an example below. The bus, as we know, is an intermediary component that interconnects CPUs, devices and various other components on the system. The endianness of a bus is a standard for byte/bit order that the bus protocol defines and with which other components comply.

Take the example of the PCI bus, which is known as little endian. This implies the following: among the 32 address/data bus lines AD[31:0], it expects a 32-bit device to connect its most significant data line to AD31 and its least significant data line to AD0. A big endian bus protocol would be the opposite.

For a partial word device connected to the bus, for example an 8-bit device, a little endian bus like PCI specifies that the eight data lines of the device be connected to AD[7:0]. For a big endian bus protocol, they would be connected to AD[24:31].

In addition, for the PCI bus the protocol requires each PCI device to implement a configuration space. This is a set of configuration registers that have the same byte order as the bus.

Just as all the devices need to follow the bus's rules regarding byte/bit endianness, so does the CPU. If a CPU operates in an endianness different from the bus, the bus controller/bridge usually is the place where the conversion is performed.

An alert reader now asks this question, "so what happens if the endianness of the device is different from the endianness of the bus?" In this case, we need to do some extra work for communication to occur, which is covered in the next section.

Endianness of Devices

Kevin's Theory #1: When a multi-byte data unit travels across the boundary of two reverse endian systems, the conversion is made such that memory contiguousness to the unit is preserved.

We assume CPU and bus share the same endianness in the following discussion. If the endianness of a device is the same as that of CPU/bus, then no conversion is needed.

In the case of different endianness between the device and the CPU/bus, we offer two solutions here from a hardware wiring point of view. We assume CPU/bus is little endian and the device is big endian in the following discussion.

Word Consistent Approach

In this approach, we swap the entire 32-bit word of the device data line. We represent the data line of the device as D[0:31], where D(0) stores the most significant bit, and the bus line as AD[31:0]. This approach suggests wiring D(i) to AD(31-i), where i = 0, ..., 31. Word Consistent means the semantics of the whole word are preserved.
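
The wiring rule can be modeled in software. The sketch below is only an illustration with hypothetical names (device_line, word_consistent_wire): it numbers the device lines from D(0), the most significant bit, numbers the bus lines the little endian way so that AD(31) is the most significant bit, and connects D(i) to AD(31-i).

```c
#include <assert.h>
#include <stdint.h>

/* Read device line D(i), where D(0) carries the most significant bit. */
static unsigned device_line(uint32_t device_word, unsigned i)
{
    assert(i < 32);
    return (device_word >> (31u - i)) & 1u;
}

/* Drive bus lines AD[31:0] by wiring D(i) to AD(31 - i). */
uint32_t word_consistent_wire(uint32_t device_word)
{
    uint32_t bus_word = 0;
    for (unsigned i = 0; i < 32; ++i) {
        unsigned ad = 31u - i;  /* bus line index; AD(31) is the MSB */
        bus_word |= (uint32_t)device_line(device_word, i) << ad;
    }
    return bus_word;
}
```

Because D(0) and AD(31) both carry the most significant bit, the word seen on the bus has the same numeric value as the word in the device, even though every bit index has been reversed.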

To illustrate, the following code represents a 32-bit descriptor register in a big endian NIC card:

After applying the Word Consistent swap (wiring D[0:31] to AD[31:0]), the result in the CPU/bus is:

Notice that it automatically is little endian for CPU/bus. No software byte or bit swapping is needed.

The above example is for those simple cases where data does not cross a 32-bit memory boundary. Now, let's take a look at a case where it does. In the following code, vlan[0:23] has a value of 0xabcdef and crosses a 32-bit memory boundary.

After the Word Consistent swap, the result is:

Do you see what happened? The vlan field has been broken into two noncontiguous memory spaces: bytes[1:0] and byte(7). It violates Kevin's Theory #1, and we are not able to define a nice C structure to access the noncontiguous vlan fields.

Therefore, the Word Consistent solution works only for data within word boundaries and does not work for data that may cross a word boundary. The second approach solves this problem for us.

Byte Consistent Approach

In this approach, we do not swap bytes, but we do swap the bits within each byte lane (the bit at device bit-offset i goes to bus bit-offset (7-i), where i = 0, ..., 7) in hardware wiring. Byte Consistent means the semantics of each byte are preserved.

After applying this method, the big endian NIC device value above results in this CPU/bus value:

Now, the three bytes of the vlan field are in contiguous memory space, and the content of each byte reads correctly. But this result still looks messy in byte order. However, because we now occupy a contiguous memory space, we can let the software do a byte swap for this 5-byte data structure. We get the following result:

We see that software byte swapping needs to be performed as the second procedure in this approach. Byte swapping is affordable in software, unlike bit swapping.
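
The software step can be sketched as an in-place reversal of a byte buffer. This is a minimal illustration that assumes the descriptor has already been copied into an array of bytes; reverse_bytes is a hypothetical helper name, not a function from any NIC driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a buffer in place. Only whole bytes move;
 * the bits inside each byte are left untouched. */
void reverse_bytes(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return;
    for (size_t lo = 0, hi = len - 1; lo < hi; ++lo, --hi) {
        uint8_t tmp = buf[lo];
        buf[lo] = buf[hi];
        buf[hi] = tmp;
    }
}
```

For the 5-byte descriptor discussed here, the call would be reverse_bytes(desc, 5), which costs two byte exchanges and leaves the middle byte where it is.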

Kevin's Theory #2: In a C structure that contains bit fields, if field A is defined in front of field B, then field A always occupies a lower bit address than field B.

Now that everything is sorted out nicely, we can define the C structure as the following to access the descriptor in the NIC:

struct nic_tag_reg {
        uint64_t vlan:24 __attribute__((packed));
        uint64_t rx  :6  __attribute__((packed));
        uint64_t tag :10 __attribute__((packed));
};
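
Bit-field layout is chosen by the compiler, so code that must not depend on it can extract the fields with shifts and masks instead. The sketch below assumes the byte-swapped descriptor sits in five bytes as a 40-bit little endian value with vlan in the lowest 24 bits, followed by rx and then tag, which is what Theory #2 and the field order above suggest for a little endian host. That layout, and the names nic_tag_fields and unpack_nic_tag, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpacked copy of the descriptor fields (hypothetical helper type). */
struct nic_tag_fields {
    uint32_t vlan;  /* 24 bits */
    uint8_t  rx;    /*  6 bits */
    uint16_t tag;   /* 10 bits */
};

/* Assumed layout: desc[0] is the least significant byte of a 40-bit value,
 * vlan = bits 0..23, rx = bits 24..29, tag = bits 30..39. */
struct nic_tag_fields unpack_nic_tag(const uint8_t desc[5])
{
    struct nic_tag_fields f;
    uint64_t v = 0;

    assert(desc != NULL);
    for (int i = 4; i >= 0; --i)        /* assemble the little endian value */
        v = (v << 8) | desc[i];

    f.vlan = (uint32_t)(v & 0xffffffu);
    f.rx   = (uint8_t)((v >> 24) & 0x3fu);
    f.tag  = (uint16_t)((v >> 30) & 0x3ffu);
    return f;
}
```

With this approach the same source works regardless of how a given compiler allocates bit fields, at the cost of a few explicit shifts.
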

Endianness of Network Protocols

The endianness of network protocols defines the order in which the bits and bytes of an integer field of a network protocol header are sent and received. We also introduce a term called wire address here. A lower wire address bit or byte always is transmitted and received in front of a higher wire address bit or byte.

In fact, network endianness is a little different from what we have seen so far. Another factor is in the picture: the bit transmission/reception order on the physical wire. Lower layer protocols, such as Ethernet, have specifications for bit transmission/reception order, and sometimes it can be the reverse of the upper layer protocol endianness. We look at this situation in our examples.

The endianness of NIC devices usually follows the endianness of the network protocols they support, so it could be different from the endianness of the CPU on the system. Most network protocols are big endian; here we take Ethernet and IP as examples.

Endianness of Ethernet

Ethernet is big endian. This means the most significant byte of an integer field is placed at a lower wire byte address and transmitted/received in front of the least significant byte. For example, the protocol field with a value of 0x0806 (ARP) in the Ethernet header has a wire layout like this:

wire byte offset:     0       1
hex             :    08      06

Notice that the MAC address field of the Ethernet header is considered as a string of characters, in which case the byte order does not matter. For example, a MAC address 12:34:56:78:9a:bc has a layout on the wire like that shown below, and byte 12 is transmitted first.

wire byte offset:     0       1       2       3       4       5
hex             :    12      34      56      78      9a      bc

Bit Transmission/Reception Order

The bit transmission/reception order specifies how the bits within a byte are transmitted/received on the wire. For Ethernet, the order is from the least significant bit (lower wire address offset) to the most significant bit (higher wire address offset). This apparently is little endian. The byte order remains big endian, as described in the earlier section. Therefore, here we see the situation where the byte order and the bit transmission/reception order are the reverse of each other.

The following is an illustration of Ethernet bit transmission/reception order:

We see from this that the group (multicast) bit, the least significant bit of the first byte, appeared as the first bit on the wire. Ethernet and 802.3 hardware behave consistently with the bit transmission/reception order above.
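
The least-significant-bit-first order can be written down as a small sketch. The helper names ethernet_wire_bits and is_group_address are hypothetical; real hardware does this serialization in the PHY, not in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill bits[0..7] with the bits of one byte in Ethernet wire order:
 * bits[0] is sent first and is the least significant bit. */
void ethernet_wire_bits(uint8_t byte, uint8_t bits[8])
{
    assert(bits != NULL);
    for (unsigned i = 0; i < 8; ++i)
        bits[i] = (uint8_t)((byte >> i) & 1u);
}

/* The group (multicast) bit is the LSB of the first address byte,
 * so it is the first bit of the address to appear on the wire. */
int is_group_address(const uint8_t mac[6])
{
    uint8_t bits[8];
    assert(mac != NULL);
    ethernet_wire_bits(mac[0], bits);
    return bits[0];
}
```

For the first byte 0x12 of the MAC address used earlier, the bits go out as 0 1 0 0 1 0 0 0, so the group bit is 0 and the address is an individual (unicast) one.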

In this case, where the protocol byte order and the bit transmission/reception order are different, the NIC must convert the bit transmission/reception order from/to the host (CPU) bit order. By doing so, the upper layers do not have to worry about bit order and need only to sort out the byte order. In fact, this is another form of the Byte Consistent approach, where byte semantics are preserved when data travels across different endian domains.

The bit transmission/reception order generally is invisible to the CPU and software, but is important to hardware considerations such as the serdes (serializer/deserializer) of the PHY and the wiring of NIC device data lines to the bus.

Parsing Ethernet Header in Software

For either endianness, the Ethernet header can be parsed by software with the C structure below:

struct ethhdr
{
        unsigned char   h_dest[ETH_ALEN];       
        unsigned char   h_source[ETH_ALEN];     
        unsigned short  h_proto;                
};

The h_dest and h_source fields are byte arrays, so no conversion is needed. The h_proto field here is an integer, therefore a ntohs() is needed before the host accesses this field, and htons() is needed before the host fills up this field.
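
A minimal sketch of that rule, working on a raw frame buffer rather than on struct ethhdr: the two helper names are hypothetical, the offset 12 follows from the two 6-byte address fields that precede h_proto, and ntohs()/htons() come from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohs(), htons() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define H_PROTO_OFFSET 12  /* after h_dest (6 bytes) and h_source (6 bytes) */

/* Read h_proto from a received frame and return it in host byte order. */
uint16_t eth_proto_from_frame(const uint8_t *frame)
{
    uint16_t wire_value;
    assert(frame != NULL);
    memcpy(&wire_value, frame + H_PROTO_OFFSET, sizeof wire_value);
    return ntohs(wire_value);
}

/* Write a host-order protocol number into a frame in network byte order. */
void eth_proto_to_frame(uint8_t *frame, uint16_t proto)
{
    uint16_t wire_value = htons(proto);
    assert(frame != NULL);
    memcpy(frame + H_PROTO_OFFSET, &wire_value, sizeof wire_value);
}
```

The same source gives the right answer on hosts of either endianness, because the conversion functions hide whether a swap is needed.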

Endianness of IP

IP's byte order also is big endian. The bit endianness of IP inherits that of the CPU, and the NIC takes care of converting it from/to the bit transmission/reception order on the wire.

For big endian hosts, IP header fields can be accessed directly. For little endian hosts, which are most PCs in the world (x86), a byte swap needs to be performed in software for the integer fields in the IP header.

Below is the structure of iphdr from the Linux kernel. We use ntohs() before reading 16-bit integer fields and htons() before writing them; ntohl() and htonl() are the 32-bit counterparts. Essentially, these functions do nothing for big endian hosts and perform byte swapping for little endian hosts.

struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
        __u8    ihl:4,
                version:4;
#elif defined (__BIG_ENDIAN_BITFIELD)
        __u8    version:4,
                ihl:4;
#else
#error  "Please fix <asm/byteorder.h>"
#endif
        __u8    tos;
        __u16   tot_len;
        __u16   id;
        __u16   frag_off;
        __u8    ttl;
        __u8    protocol;
        __u16   check;
        __u32   saddr;
        __u32   daddr;
        /*The options start here. */
};
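
To make the "do nothing on big endian, swap on little endian" behavior concrete, here is a rough sketch of a 16-bit conversion written by hand. It is not how any particular C library implements ntohs(); net16_to_host and host_is_little_endian are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the lowest-addressed byte of an integer is its least
 * significant byte, that is, on a little endian host. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Read a 16-bit network-order field and return it in host order. */
uint16_t net16_to_host(const void *field)
{
    uint16_t raw;
    assert(field != NULL);
    memcpy(&raw, field, sizeof raw);
    if (host_is_little_endian())
        raw = (uint16_t)((raw << 8) | (raw >> 8));  /* swap the two bytes */
    return raw;  /* on a big endian host the value is already correct */
}
```

A tot_len field holding the wire bytes 05 dc reads back as 0x05dc (1500) on either kind of host.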

Take a look at some interesting fields in the IP header:

version and ihl fields: According to the IP standard, version is the most significant four bits of the first byte of an IP header. ihl is the least significant four bits of the first byte of the IP header.

There are two methods to access these fields. Method 1 directly extracts them from the data. If ver_ihl holds the first byte of the IP header, then (ver_ihl & 0x0f) gives the ihl field and (ver_ihl >> 4) gives the version field. This applies to hosts of either endianness.
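
Method 1 can be sketched in a few lines of C. The function names are hypothetical; the only fact added beyond the text above is that ihl counts 32-bit words, so the header length in bytes is ihl times four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* version: most significant four bits of the first header byte. */
unsigned ip_version(const uint8_t *ip_header)
{
    assert(ip_header != NULL);
    return (unsigned)(ip_header[0] >> 4);
}

/* ihl: least significant four bits of the first header byte. */
unsigned ip_ihl(const uint8_t *ip_header)
{
    assert(ip_header != NULL);
    return (unsigned)(ip_header[0] & 0x0f);
}

/* Header length in bytes; ihl is expressed in 32-bit words. */
size_t ip_header_length(const uint8_t *ip_header)
{
    return (size_t)ip_ihl(ip_header) * 4;
}
```

For the common first byte 0x45, this yields version 4, ihl 5 and a 20-byte header, with no dependence on how the compiler lays out bit fields.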

Method 2 is to define the structure as above, then access these fields from the structure itself. In the above structure, if the host is little endian, then we define ihl before version; if the host is big endian, we define version before ihl. If we apply Kevin's Theory #2 here, that an earlier defined field always occupies a lower memory address, we find that the above definition in the C structure fits the IP standard pretty well.

saddr and daddr fields: these two fields can be treated as either byte or integer arrays. If they are treated as byte arrays, there is no need to do endianness conversion. If they are treated as integers, then conversions need to be performed as needed. Below is a function with integer interpretation:

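#include <stdint.h>
#include <stdlib.h>   /* atoi() */
#include <string.h>   /* strchr() */

#define IP_ALEN 4     /* number of bytes in an IPv4 address */

/*  dot2ip - convert a dotted decimal string into an
 *           IP address, returned as an integer in
 *           host byte order
 */
uint32_t dot2ip(const char *pdot)
{
    uint32_t i, my_ip = 0;

    for (i = 0; i < IP_ALEN; ++i) {
        my_ip = my_ip * 256 + (uint32_t)atoi(pdot);
        if ((pdot = strchr(pdot, '.')) == NULL)
            break;
        ++pdot;       /* step past the '.' */
    }
    return my_ip;
}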

And here is the function with byte array interpretation:

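#include <string.h>   /* strchr(), memcpy() */

/* dot2ip2 - same conversion, but the result holds the address
 *           bytes in network order, as they appear in the header
 */
uint32_t dot2ip2(const char *pdot)
{
    int i;
    uint8_t ip[IP_ALEN] = {0};
    uint32_t my_ip;

    for (i = 0; i < IP_ALEN; ++i) {
        ip[i] = (uint8_t)atoi(pdot);
        if ((pdot = strchr(pdot, '.')) == NULL)
            break;
        ++pdot;       /* step past the '.' */
    }
    /* Copy the bytes out; casting ip to uint32_t * risks misaligned access. */
    memcpy(&my_ip, ip, sizeof(my_ip));
    return my_ip;
}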
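
The relationship between the two interpretations can be checked with a short, self-contained sketch. It does not reuse the functions above; parse_dotted_quad, octets_to_host_u32 and octets_to_net_u32 are hypothetical names, and htonl() comes from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htonl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse "a.b.c.d" into four octets; returns 0 on success, -1 on bad input. */
int parse_dotted_quad(const char *text, uint8_t octets[4])
{
    unsigned a, b, c, d;
    assert(text != NULL && octets != NULL);
    if (sscanf(text, "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    octets[0] = (uint8_t)a;
    octets[1] = (uint8_t)b;
    octets[2] = (uint8_t)c;
    octets[3] = (uint8_t)d;
    return 0;
}

/* Integer interpretation: numeric value in host byte order. */
uint32_t octets_to_host_u32(const uint8_t octets[4])
{
    return ((uint32_t)octets[0] << 24) | ((uint32_t)octets[1] << 16) |
           ((uint32_t)octets[2] << 8)  |  (uint32_t)octets[3];
}

/* Byte array interpretation: the octets copied as-is, so the memory
 * image is in network byte order. */
uint32_t octets_to_net_u32(const uint8_t octets[4])
{
    uint32_t value;
    memcpy(&value, octets, sizeof value);
    return value;
}
```

For any valid address, htonl(octets_to_host_u32(o)) equals octets_to_net_u32(o): the integer form needs a conversion before it goes into saddr or daddr, while the byte array form can be stored directly.
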

Summary

The topic of byte and bit endianness can go even further than what we discussed here. Hopefully this article has covered the main aspects of it. See you next time in the maze.

Kevin Kaichuan He is a senior system software engineer at Solustek Corp. He currently is working on board bring-up, embedded Linux and networking stack projects. His previous work experience includes being a software engineer at Cisco Systems and a research assistant in Computer Science at Purdue University. In his spare time, he enjoys digital photography, PS2 games and movies.


