The Big/Little Endian Problem in Java (note the English text further down)

Latest recommended article published 2023-12-10 14:53:26

chief1985

Category: 原理 (Principles). Tags: java byte motorola ibm apple parallel

Article link: https://blog.csdn.net/chief1985/article/details/2223875


Introduction:
　　The Big/Little Endian Problem in Java
　　1. Solving the Endian Problem: A Summary
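　　Everything in a Java binary file is stored in big-endian form, most significant byte first. This is sometimes called network order. That is good news: if you use only Java, all files are handled the same way on every platform (Mac, PC, Solaris, etc.). You can freely exchange binary data, electronically over the Internet or on floppy disk, without thinking about endianness. The trouble starts when you exchange data files with programs that are not written in Java. Such programs often use little-endian order, typically C programs on a PC. Some platforms use big-endian byte order internally (Mac, IBM 390); others use little-endian (Intel). Java hides the endian issue from you.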
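The big-endian claim above is easy to check with the standard library. The following is a minimal sketch; the class name `BigEndianDemo` and the helper `intAsWrittenByJava` are made up for illustration and are not part of the original article.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows the byte order DataOutputStream uses.
class BigEndianDemo {
    /** Returns the bytes that DataOutputStream.writeInt emits for value, as hex. */
    static String intAsWrittenByJava(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        // The most significant byte comes out first, on every platform.
        System.out.println(intAsWrittenByJava(0x01020304)); // 01 02 03 04
    }
}
```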
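　　In a binary file there are no separators between fields. The file is in binary form, not human-readable ASCII. If the data you want to read is not in the standard format, usually because it was prepared by a non-Java program, you have four options:
　　1). Rewrite the output program that produces the input file. It could directly output big-endian binary DataOutputStream format or character DataOutputStream format.
　　2). Write a standalone translation program that reads the bytes and rearranges them. It can be written in any language.
　　3). Read the data as bytes and rearrange them on the fly.
　　4). The easiest way is to use the LEDataInputStream, LEDataOutputStream and LERandomAccessFile classes I wrote, which mimic DataInputStream, DataOutputStream and RandomAccessFile but work on little-endian byte streams. You can read about LEDataStream. You can download the code and source free. You can get help from the File I/O Amanuensis to show you how to use the classes. Just tell it you have little-endian binary data.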
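As one way to carry out option 3 with only the standard library, `java.nio.ByteBuffer` (available since Java 1.4) can reinterpret bytes in little-endian order. This is a sketch of an alternative, not the LEDataInputStream approach described above; the class name `LittleEndianBufferDemo` and the method `readIntLE` are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: decodes little-endian data with java.nio.
class LittleEndianBufferDemo {
    /** Decodes the first four bytes of raw as a little-endian int. */
    static int readIntLE(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        // 0x12345678 as a little-endian C program on a PC would store it.
        byte[] raw = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readIntLE(raw))); // 12345678
    }
}
```

A ByteBuffer defaults to big-endian, so the call to `order(ByteOrder.LITTLE_ENDIAN)` is what does the rearranging here.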
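　　2. You May Not Even Have a Problem
　　Many Java newcomers who come from C assume they need to worry about whether the platform they run on uses big-endian or little-endian order internally. In Java this is not an issue. Furthermore, without resorting to native classes (or, since Java 1.4, java.nio.ByteOrder.nativeOrder()), you cannot find out how values are stored. Java has no struct I/O and no unions or any of the other endian-sensitive language constructs.
　　You need to think about endianness only when communicating with legacy C/C++ applications. The following code produces the same result on big-endian and little-endian machines: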
    // Take a 16-bit short apart into two 8-bit bytes.
    // The cast is required: 0xabcd is an int literal outside the range of short.
    short x = (short) 0xabcd;
    byte high = (byte) (x >>> 8);
    byte low = (byte) x; /* cast implies & 0xff */
    System.out.println("x=" + x + " high=" + high + " low=" + low);
　　3. Reading Little-Endian Binary Files
　　The most common problem is dealing with files stored in little-endian format.
　　I had to implement routines parallel to those in java.io.DataInputStream which reads raw binary, in my LEDataInputStream and LEDataOutputStream classes. Don't confuse this with the io.DataInput human-readable character-based file-interchange format.
　　If you wanted to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique:
　　Presuming your integers are in 2's complement little-endian format, shorts are pretty easy to handle:
　　--------------------------------------------------------------------------------
    short readShortLittleEndian()
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte() & 0xff;
        return (short) (high << 8 | low);
    }
　　Or if you want to get clever and puzzle your readers, you can avoid one mask since the high bits will later be shaved off by conversion back to short.
    short readShortLittleEndian()
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte();
        // avoid masking here
        return (short) (high << 8 | low);
    }
　　--------------------------------------------------------------------------------
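The fragments above call a bare readByte(). As a self-contained sketch, the same technique can be run against an in-memory stream; passing a `DataInputStream` as a parameter, and the names `ShortReaderDemo` and `decode`, are assumptions made here for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: runs the short-reading technique on a byte array.
class ShortReaderDemo {
    /** Reads two bytes, low byte first, and combines them into a short. */
    static short readShortLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;
        int high = in.readByte() & 0xff;
        return (short) (high << 8 | low);
    }

    /** Convenience wrapper that decodes the first two bytes of raw. */
    static short decode(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return readShortLittleEndian(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xabcd stored little-endian: low byte 0xcd first, then 0xab.
        short value = decode(new byte[] {(byte) 0xcd, (byte) 0xab});
        System.out.println(value);                              // -21555
        System.out.println(Integer.toHexString(value & 0xffff)); // abcd
    }
}
```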
　　Longs are a little more complicated:
　　--------------------------------------------------------------------------------
    long readLongLittleEndian()
    {
        // 8 bytes
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
        {
            // must cast to long or shift done modulo 32
            accum |= (long) (readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
　　--------------------------------------------------------------------------------
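The comment about casting to long deserves a quick demonstration. Java masks the shift distance of an int shift to its low five bits, so shifting an int by 40 actually shifts it by 8. The sketch below uses made-up names (`ShiftWidthDemo`, `shiftAsInt`, `shiftAsLong`) to contrast the two cases.

```java
// Hypothetical demo class: int shifts use the distance modulo 32, long shifts modulo 64.
class ShiftWidthDemo {
    /** Shifts as an int; the result is widened to long only afterwards. */
    static long shiftAsInt(int value, int shiftBy) {
        return value << shiftBy;
    }

    /** Casts to long first, so the full 64-bit shift takes place. */
    static long shiftAsLong(int value, int shiftBy) {
        return (long) value << shiftBy;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(shiftAsInt(0xff, 40)));  // ff00
        System.out.println(Long.toHexString(shiftAsLong(0xff, 40))); // ff0000000000
    }
}
```

Without the cast, bytes five through eight of a long would be OR-ed into the low 32 bits and corrupt the result.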
　　In a similar way we handle char and int.
　　--------------------------------------------------------------------------------
    char readCharLittleEndian()
    {
        // 2 bytes
        int low = readByte() & 0xff;
        int high = readByte();
        return (char) (high << 8 | low);
    }
　　--------------------------------------------------------------------------------
    int readIntLittleEndian()
    {
        // 4 bytes
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
        {
            accum |= (readByte() & 0xff) << shiftBy;
        }
        return accum;
    }
　　--------------------------------------------------------------------------------
　　Floating point is a little trickier. Presuming your data is in IEEE little-endian format, you need something like this:
　　--------------------------------------------------------------------------------
    double readDoubleLittleEndian()
    {
        // 8 bytes
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
        {
            // must cast to long or shift done modulo 32
            accum |= (long) (readByte() & 0xff) << shiftBy;
        }
        return Double.longBitsToDouble(accum);
    }
　　--------------------------------------------------------------------------------
    float readFloatLittleEndian()
    {
        // 4 bytes
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
        {
            accum |= (readByte() & 0xff) << shiftBy;
        }
        return Float.intBitsToFloat(accum);
    }
　　--------------------------------------------------------------------------------
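To check the floating-point technique end to end, the sketch below decodes a little-endian IEEE double from a byte array and compares it with what `java.nio.ByteBuffer` produces. The names `DoubleReaderDemo`, `decodeDoubleLittleEndian` and `encodeDoubleLittleEndian` are illustrative assumptions, and working on a byte array rather than a stream is a simplification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: byte-wise decoding of a little-endian IEEE double.
class DoubleReaderDemo {
    /** Assembles eight little-endian bytes into a long, then reinterprets the bits. */
    static double decodeDoubleLittleEndian(byte[] raw) {
        long accum = 0;
        for (int i = 0; i < 8; i++) {
            // Cast to long before shifting, or the shift is done modulo 32.
            accum |= (long) (raw[i] & 0xff) << (8 * i);
        }
        return Double.longBitsToDouble(accum);
    }

    /** Produces little-endian bytes with the standard library, for comparison. */
    static byte[] encodeDoubleLittleEndian(double value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(value).array();
    }

    public static void main(String[] args) {
        // 1.0 is 0x3FF0000000000000, so little-endian storage ends in f0 3f.
        byte[] one = {0, 0, 0, 0, 0, 0, (byte) 0xf0, 0x3f};
        System.out.println(decodeDoubleLittleEndian(one)); // 1.0
        System.out.println(decodeDoubleLittleEndian(encodeDoubleLittleEndian(3.14159)));
    }
}
```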
　　You don't need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency:
　　--------------------------------------------------------------------------------
    byte readByteLittleEndian()
    {
        // 1 byte
        return readByte();
    }
　　--------------------------------------------------------------------------------
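The article shows only the reading side. Writing works the same way in reverse: emit the least significant byte first. The following is a sketch under that assumption, not the source of the LEDataOutputStream class mentioned earlier; `IntWriterDemo`, `writeIntLittleEndian` and `encode` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: writes an int least significant byte first.
class IntWriterDemo {
    /** OutputStream.write(int) writes only the low 8 bits of its argument. */
    static void writeIntLittleEndian(OutputStream out, int value) throws IOException {
        out.write(value);
        out.write(value >>> 8);
        out.write(value >>> 16);
        out.write(value >>> 24);
    }

    /** Convenience wrapper that returns the four bytes written for value. */
    static byte[] encode(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(4);
        try {
            writeIntLittleEndian(buffer, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode(0x12345678)) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim()); // 78 56 34 12
    }
}
```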
　　4. History
　　In Gulliver's Travels the Lilliputians liked to break their eggs on the small end and the Blefuscudians on the big end. They fought wars over this. There is a computer analogy. Should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.
　　Those in the big-endian camp (most significant byte stored first) include the Java VM virtual computer, the Java binary file format, the IBM 360 and follow-on mainframes such as the 390, the Motorola 68K, and most mainframes. The Power PC is endian-agnostic.
　　Blefuscudians (big-endians) assert this is the way God intended integers to be stored, most important part first. At an assembler level, fields of mixed positive integers and text can be sorted as if they were one big text field key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.
　　In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS Technology 6502 popularised by the Apple ][.
　　Lilliputians (little-endians) assert that putting the low order part first is more natural because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.
　　If a machine is word addressable, with no finer addressing supported, the concept of endianness means nothing since words are fetched from RAM in parallel, both ends first.
　　5. What Sex Is Your CPU?
　　Byte Sex Endianness of CPUs
　　CPU | Endianness | Notes
　　AMD Duron, Athlon, Thunderbird | little | Used in Windows 95/98/ME/NT/2000/XP machines.
　　Apple ][ 6502 | little | The 6502 was used in the Apple ][.
　　Apple Mac 68000 | big | Uses Motorola 68000.
　　Apple Power PC | big | CPU is bisexual but stays big in the Mac OS.
　　Burroughs 1700, 1800, 1900 | - | Bit addressable. Used different interpreter firmware instruction sets for each language.
　　Burroughs 7800 | - | Algol machine.
　　CDC LGP-30 | word-addressable only, hence no endianness | 31½ bit words. Low order bit must be 0 on the drum, but can be 1 in the accumulator.
　　CDC 3300, 6600 | word-addressable | -
　　DEC PDP, Vax | little | -
　　IBM 360, 370, 380, 390 | big | -
　　IBM 7044, 7090 | word-addressable | 36 bits.
　　IBM AS-400 | big | -
　　Power PC | either | The endian-agnostic Power PCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other, e.g. Mac PowerPCs are big-endian.
　　Intel 8080, 8086, 80286, 80386, 80486, Pentium I, II, III, IV | little | Chips used in PCs.
　　Intel 8051 | big | -
　　MIPS R4000, R5000, R10000 | big | Used in Silicon Graphics IRIX.
　　Motorola 6800, 6809, 680x0, 68HC11 | big | Early Macs used the 68000. Amiga.
　　NCR 8500 | big | -
　　NCR Century | big | -
　　Sun Sparc and UltraSparc | big | Sun's Solaris. Normally used as big-endian, but also supports operating in little-endian mode, including being able to switch endianness under program control for particular loads and stores.
　　Univac 1100 | word-addressable | 36-bit words.
　　Univac 90/30 | big | IBM 370 clone.
　　Zilog Z80 | little | Used in CP/M machines.
　　If you know the endianness of other CPUs/OSes/platforms please email me at roedy@mindprod.com.
　　In theory data can have two different byte sexes but CPUs can have four. Let us give thanks, in this world of mixed left and right hand drive, that there are no real CPUs with all four sexes to contend with.
　　The Four Possible Byte Sexes for CPUs
　　Which Byte Is Stored in the Lower-Numbered Address? | Which Byte Is Addressed? | Used In
　　LSB | LSB | Intel, AMD, Power PC, DEC.
　　LSB | MSB | None that I know of.
　　MSB | LSB | Perhaps one of the old word mark architecture machines.
　　MSB | MSB | Mac, IBM 390, Power PC.

Reposted from
http://www.xxju.net/article/200512/16_0101523214.htm
