The Big/Little-Endian Problem in Java

Overview:
  The Big/Little-Endian Problem in Java
  1. Solving the Endian Problem: A Summary
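  Everything in a Java binary file is stored big-endian, high byte first; this is sometimes called network order. This is good news: if you use only Java, all files are handled the same way on all platforms (Mac, PC, Solaris, etc.), and you can freely exchange binary data electronically over the Internet or on floppy disk without worrying about endianness. The trouble comes when you exchange data files with programs not written in Java that use little-endian order, typically C programs on the PC. Some platforms use big-endian byte order internally (Mac, IBM 390); others use little-endian order (Intel). Java hides the endian issue from the user.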
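To make "high byte first" concrete, here is a small sketch using only the standard java.io classes. The class and method names (BigEndianDemo, intToBytes) are made up for illustration; the point is that DataOutputStream.writeInt emits the most significant byte first no matter which CPU runs the program.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: DataOutputStream writes multi-byte values
// high byte first (big-endian) on every platform.
class BigEndianDemo {
    static byte[] intToBytes(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            // an in-memory stream does not fail, but the API declares IOException
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : intToBytes(0x01020304)) {
            System.out.printf("%02x ", b);
        }
        System.out.println(); // 01 02 03 04 on Intel and SPARC alike
    }
}
```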
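  A binary file has no separators between fields, and its contents are binary, not readable ASCII. If the data you want to read is not in Java's standard big-endian format, typically because it was prepared by a non-Java program, you have four choices:
  1). Modify the program that writes the input file so that it outputs big-endian binary directly in DataOutputStream format, or character (text) format instead.
  2). Write a separate translator program that reads the data and rearranges the bytes. It can be written in any language.
  3). Read the data as bytes and rearrange them on the fly.
  4). The simplest way is to use my LEDataInputStream, LEDataOutputStream and LERandomAccessFile classes, which mimic DataInputStream, DataOutputStream and RandomAccessFile but use little-endian byte order. You can read about LEDataStream, and you can download the code and source for free. You can get help from the File I/O Amanuensis to show you how to use the classes; just tell it you have little-endian binary data.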
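Option 3 can also be done with the standard java.nio.ByteBuffer class (Java 1.4 and later), which is a different technique from the LEDataInputStream classes mentioned above. The sketch below assumes a hypothetical record layout of a short, an int and a double written by a little-endian C program; the layout and class name are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Decodes little-endian fields with java.nio.ByteBuffer instead of
// hand-written byte shuffling. The record layout (short, int, double)
// is a hypothetical example, not a real file format.
class LittleEndianRecord {
    final short id;
    final int count;
    final double value;

    LittleEndianRecord(short id, int count, double value) {
        this.id = id;
        this.count = count;
        this.value = value;
    }

    static LittleEndianRecord parse(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        short id = buf.getShort();      // bytes 0-1
        int count = buf.getInt();       // bytes 2-5
        double value = buf.getDouble(); // bytes 6-13
        return new LittleEndianRecord(id, count, value);
    }

    public static void main(String[] args) {
        // Build sample bytes the way a little-endian C program would lay them out.
        byte[] raw = ByteBuffer.allocate(14).order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) 7).putInt(1000).putDouble(2.5).array();
        LittleEndianRecord r = parse(raw);
        System.out.println(r.id + " " + r.count + " " + r.value); // 7 1000 2.5
    }
}
```

Setting the order once on the buffer keeps every subsequent get call little-endian, so the field-reading code looks the same as it would for big-endian data.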
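  2. You May Not Even Have a Problem
  Many Java newcomers coming from C may think they need to worry about whether the platform they run on uses big- or little-endian order internally. In Java this is not an issue. Furthermore, apart from native code or java.nio.ByteOrder.nativeOrder() (added in Java 1.4), you have no way of finding out how values are stored. Java has no struct I/O and no unions or any of the other endian-sensitive language constructs.
  You need to consider endianness only when exchanging data with legacy C/C++ applications. The following code produces the same result on big- and little-endian machines: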
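  // take 16-bit short apart into two 8-bit bytes.
  short x = (short) 0xabcd; // cast needed: 0xabcd (43981) does not fit in a short without it
  byte high = (byte) (x >>> 8);
  byte low = (byte) x; /* cast keeps only the low 8 bits, i.e. implies & 0xff */
  System.out.println("x=" + x + " high=" + high + " low=" + low);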
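The snippet above can be extended into a runnable sketch that splits a short and joins it back, showing that shifts and masks give the same answer on any host. It also prints ByteOrder.nativeOrder() for comparison. The class and method names are hypothetical.

```java
import java.nio.ByteOrder;

// Splitting and rejoining a short with shifts is endian-neutral;
// only ByteOrder.nativeOrder() reveals the host's native order.
class ShiftsAreEndianNeutral {
    static byte high(short x) { return (byte) (x >>> 8); }
    static byte low(short x)  { return (byte) x; }

    static short join(byte high, byte low) {
        // mask low so its sign bit does not smear into the high byte
        return (short) ((high << 8) | (low & 0xff));
    }

    public static void main(String[] args) {
        short x = (short) 0xabcd;
        System.out.println(join(high(x), low(x)) == x); // true on every platform
        System.out.println("native order: " + ByteOrder.nativeOrder());
    }
}
```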
  3. Reading Little-Endian Binary Files
  The most common problem is dealing with files stored in little-endian format.
  In my LEDataInputStream and LEDataOutputStream classes, I had to implement routines parallel to those in java.io.DataInputStream, which reads raw binary. Don't confuse this raw binary data with human-readable, character-based file-interchange formats.
  If you wanted to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique:
  Presuming your integers are in 2's complement little-endian format, shorts are pretty easy to handle:
  --------------------------------------------------------------------------------
  short readShortLittleEndian() throws IOException
  {
      // 2 bytes; readByte() is assumed to read the next byte of the underlying stream
      int low = readByte() & 0xff;
      int high = readByte() & 0xff;
      return (short) (high << 8 | low);
  }
  Or if you want to get clever and puzzle your readers, you can avoid one mask since the high bits will later be shaved off by conversion back to short.
  short readShortLittleEndian() throws IOException
  {
      // 2 bytes
      int low = readByte() & 0xff;
      // avoid masking here: the cast to short discards the sign-extended high bits
      int high = readByte();
      return (short) (high << 8 | low);
  }
  --------------------------------------------------------------------------------
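The fragments above assume an enclosing class that supplies readByte(). As a sketch, this hypothetical wrapper provides that method from a DataInputStream over a byte array, so the masked and the "clever" unmasked versions can be compared on the same input.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical wrapper supplying the readByte() the fragments assume.
class LittleEndianShortReader {
    private final DataInputStream in;

    LittleEndianShortReader(byte[] data) {
        in = new DataInputStream(new ByteArrayInputStream(data));
    }

    private byte readByte() throws IOException {
        return in.readByte();
    }

    short readShortMasked() throws IOException {
        int low = readByte() & 0xff;
        int high = readByte() & 0xff;
        return (short) (high << 8 | low);
    }

    short readShortUnmaskedHigh() throws IOException {
        int low = readByte() & 0xff;
        int high = readByte(); // sign bits above bit 15 are dropped by the cast
        return (short) (high << 8 | low);
    }

    public static void main(String[] args) throws IOException {
        // 0xabcd stored twice, low byte first
        byte[] data = {(byte) 0xcd, (byte) 0xab, (byte) 0xcd, (byte) 0xab};
        LittleEndianShortReader r = new LittleEndianShortReader(data);
        System.out.println(r.readShortMasked() + " " + r.readShortUnmaskedHigh()); // -21555 -21555
    }
}
```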
  Longs are a little more complicated:
  --------------------------------------------------------------------------------
  long readLongLittleEndian() throws IOException
  {
      // 8 bytes
      long accum = 0;
      for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
      {
          // must cast to long or shift done modulo 32
          accum |= (long) (readByte() & 0xff) << shiftBy;
      }
      return accum;
  }
  --------------------------------------------------------------------------------
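The "must cast to long" comment is easy to gloss over. This sketch replaces the stream with a byte array (an assumption made only so it runs standalone) and includes a deliberately broken copy without the cast, so the modulo-32 shift effect is visible.

```java
// Array-based variant of readLongLittleEndian, plus a deliberately
// broken copy that omits the (long) cast. Names are hypothetical.
class LittleEndianLong {
    static long fromBytes(byte[] b) {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            // widen to long first so shifts of 32..56 bits are real shifts
            accum |= (long) (b[shiftBy / 8] & 0xff) << shiftBy;
        }
        return accum;
    }

    static long withoutCast(byte[] b) {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            // BUG (on purpose): an int shift uses only the low 5 bits of
            // shiftBy, so bytes 4..7 land on top of bytes 0..3
            accum |= (b[shiftBy / 8] & 0xff) << shiftBy;
        }
        return accum;
    }

    public static void main(String[] args) {
        byte[] b = {0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01};
        System.out.println(Long.toHexString(fromBytes(b)));   // 102030405060708
        System.out.println(Long.toHexString(withoutCast(b))); // 506070c
    }
}
```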
  In a similar way we handle char and int.
  --------------------------------------------------------------------------------
  char readCharLittleEndian() throws IOException
  {
      // 2 bytes
      int low = readByte() & 0xff;
      int high = readByte();
      return (char) (high << 8 | low);
  }
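A convenient way to check the char routine is against UTF-16LE text, which stores each char low byte first. This is a sketch with a hypothetical two-byte helper in place of the stream; it relies on StandardCharsets (Java 7 and later).

```java
import java.nio.charset.StandardCharsets;

// Rebuilds chars from little-endian byte pairs and checks them against
// Java's own UTF-16LE encoding. Class and method names are hypothetical.
class LittleEndianChar {
    static char fromBytes(byte lowByte, byte highByte) {
        int low = lowByte & 0xff;
        int high = highByte; // bits above 15 are discarded by the char cast
        return (char) (high << 8 | low);
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) and the euro sign (U+20AC) as UTF-16LE bytes
        byte[] utf16le = "\u00e9\u20ac".getBytes(StandardCharsets.UTF_16LE);
        char a = fromBytes(utf16le[0], utf16le[1]);
        char b = fromBytes(utf16le[2], utf16le[3]);
        System.out.println(a == '\u00e9' && b == '\u20ac'); // true
    }
}
```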
  --------------------------------------------------------------------------------
  int readIntLittleEndian() throws IOException
  {
      // 4 bytes
      int accum = 0;
      for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
      {
          accum |= (readByte() & 0xff) << shiftBy;
      }
      return accum;
  }
  --------------------------------------------------------------------------------
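The int loop can be cross-checked against two standard alternatives: ByteBuffer with ByteOrder.LITTLE_ENDIAN, and Integer.reverseBytes (JDK 5 and later). The sketch below reads from a byte array at an offset, which is an assumption standing in for the stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Array-based version of readIntLittleEndian with two library cross-checks.
class LittleEndianInt {
    static int fromBytes(byte[] b, int offset) {
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8) {
            accum |= (b[offset + shiftBy / 8] & 0xff) << shiftBy;
        }
        return accum;
    }

    public static void main(String[] args) {
        byte[] b = {(byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff,
                    0x78, 0x56, 0x34, 0x12};
        System.out.println(fromBytes(b, 0));                      // -1
        System.out.println(Integer.toHexString(fromBytes(b, 4))); // 12345678
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println(bb.getInt(4) == fromBytes(b, 4));                  // true
        System.out.println(Integer.reverseBytes(0x78563412) == fromBytes(b, 4)); // true
    }
}
```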
  Floating point is a little trickier. Presuming your data is in IEEE little-endian format, you need something like this:
  --------------------------------------------------------------------------------
  double readDoubleLittleEndian() throws IOException
  {
      long accum = 0;
      for (int shiftBy = 0; shiftBy < 64; shiftBy += 8)
      {
          // must cast to long or shift done modulo 32
          accum |= (long) (readByte() & 0xff) << shiftBy;
      }
      return Double.longBitsToDouble(accum);
  }
  --------------------------------------------------------------------------------
  float readFloatLittleEndian() throws IOException
  {
      int accum = 0;
      for (int shiftBy = 0; shiftBy < 32; shiftBy += 8)
      {
          accum |= (readByte() & 0xff) << shiftBy;
      }
      return Float.intBitsToFloat(accum);
  }
  --------------------------------------------------------------------------------
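Here is a quick sanity check of the floating-point readers using known IEEE 754 bit patterns: 1.0f is 0x3f800000 and 1.0 is 0x3ff0000000000000, so their little-endian byte sequences end in 0x3f. As before, this sketch reads from a byte array instead of the stream the fragments assume.

```java
// Array-based versions of readFloatLittleEndian and readDoubleLittleEndian.
class LittleEndianFloat {
    static float floatFromBytes(byte[] b) {
        int accum = 0;
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8) {
            accum |= (b[shiftBy / 8] & 0xff) << shiftBy;
        }
        return Float.intBitsToFloat(accum);
    }

    static double doubleFromBytes(byte[] b) {
        long accum = 0;
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            accum |= (long) (b[shiftBy / 8] & 0xff) << shiftBy;
        }
        return Double.longBitsToDouble(accum);
    }

    public static void main(String[] args) {
        // 1.0f is 0x3f800000; stored little-endian that is 00 00 80 3f
        byte[] f = {0x00, 0x00, (byte) 0x80, 0x3f};
        // 1.0 is 0x3ff0000000000000; little-endian: 00 00 00 00 00 00 f0 3f
        byte[] d = {0, 0, 0, 0, 0, 0, (byte) 0xf0, 0x3f};
        System.out.println(floatFromBytes(f) + " " + doubleFromBytes(d)); // 1.0 1.0
    }
}
```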
  You don't need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency:
  --------------------------------------------------------------------------------
  byte readByteLittleEndian() throws IOException
  {
      // 1 byte
      return readByte();
  }
  --------------------------------------------------------------------------------
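The routines above only cover reading. For the write direction, which LEDataOutputStream handles, the same idea runs in reverse: emit the low byte first. The sketch below uses hypothetical static helpers over a plain OutputStream. These are not the API of LEDataOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Write-side counterparts of the readers: low byte first.
// Illustrative helpers only, not LEDataOutputStream's actual methods.
class LittleEndianWriter {
    static void writeShortLittleEndian(OutputStream out, short v) throws IOException {
        out.write(v);       // OutputStream.write(int) keeps only the low 8 bits
        out.write(v >>> 8);
    }

    static void writeIntLittleEndian(OutputStream out, int v) throws IOException {
        for (int shiftBy = 0; shiftBy < 32; shiftBy += 8) {
            out.write(v >>> shiftBy);
        }
    }

    static void writeLongLittleEndian(OutputStream out, long v) throws IOException {
        for (int shiftBy = 0; shiftBy < 64; shiftBy += 8) {
            out.write((int) (v >>> shiftBy));
        }
    }

    static void writeDoubleLittleEndian(OutputStream out, double v) throws IOException {
        writeLongLittleEndian(out, Double.doubleToLongBits(v));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeIntLittleEndian(bytes, 0x12345678);
        for (byte b : bytes.toByteArray()) {
            System.out.printf("%02x ", b);
        }
        System.out.println(); // 78 56 34 12
    }
}
```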
  4. History
  In Gulliver's Travels the Lilliputians liked to break their eggs on the small end and the Blefuscudians on the big end. They fought wars over this. There is a computer analogy: should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.
  Those in the big-endian camp (most significant byte stored first) include the Java virtual machine, the Java binary file format, the IBM 360 and follow-on mainframes such as the 390, the Motorola 68K, and most other mainframes. The PowerPC is endian-agnostic.
  Blefuscudians (big-endians) assert this is the way God intended integers to be stored, most important part first. At the assembler level, fields of mixed positive integers and text can be sorted as if they were one big text key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.
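The sorting claim can be illustrated in Java: for non-negative ints, comparing the big-endian bytes as unsigned values from left to right gives the same order as comparing the numbers, while little-endian bytes do not. This is a sketch; the unsigned byte comparison is hand-written (a hypothetical helper) so it runs on older JDKs too.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Big-endian bytes of non-negative ints sort like text; little-endian do not.
class KeyOrder {
    static byte[] bytes(int v, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(v).array();
    }

    // lexicographic comparison treating each byte as unsigned (0..255)
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return 0;
    }

    public static void main(String[] args) {
        int small = 255, large = 256;
        System.out.println(compareBytes(bytes(small, ByteOrder.BIG_ENDIAN),
                bytes(large, ByteOrder.BIG_ENDIAN)) < 0);    // true: 255 sorts first
        System.out.println(compareBytes(bytes(small, ByteOrder.LITTLE_ENDIAN),
                bytes(large, ByteOrder.LITTLE_ENDIAN)) < 0); // false: order is scrambled
    }
}
```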
  In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS Technology 6502 popularised by the Apple ][.
  Lilliputians (little-endians) assert that putting the low-order part first is more natural because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work, provided the value fits in 16 bits. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.
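Java has no pointers, so the address trick can only be imitated. In this sketch, reading a short from offset 0 of a ByteBuffer that holds an int stands in for passing the int's address to a 16-bit routine. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Imitates the assembler trick: in little-endian storage, the first two
// bytes of a 32-bit int hold its low 16 bits.
class PrefixTrick {
    static short firstShort(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(0, value);   // store the 32-bit int at "address" 0
        return buf.getShort(0); // read a 16-bit value from the same address
    }

    public static void main(String[] args) {
        System.out.println(firstShort(1234, ByteOrder.LITTLE_ENDIAN)); // 1234
        System.out.println(firstShort(1234, ByteOrder.BIG_ENDIAN));    // 0
    }
}
```

With big-endian storage the same read picks up the two high-order (zero) bytes instead, which is why the trick belongs to the little-endian camp.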
  If a machine is word addressable, with no finer addressing supported, the concept of endianness means nothing since words are fetched from RAM in parallel, both ends first.
  5. What Sex Is Your CPU?
  Byte Sex Endianness of CPUs
  CPU | Endianness | Notes
  MOS 6502; AMD Duron, Athlon, Thunderbird | little | The 6502 was used in the Apple ][; the Duron, Athlon and Thunderbird in PCs running Windows 95/98/ME/NT/2000/XP.
  Apple ][ 6502 | little |
  Apple Mac 68000 | big | Uses the Motorola 68000.
  Apple Power PC | big | The CPU is bisexual but stays big-endian in Mac OS.
  Burroughs 1700, 1800, 1900 | bit-addressable | Used different interpreter firmware instruction sets for each language.
  Burroughs 7800 | | Algol machine.
  CDC LGP-30 | word-addressable only, hence no endianness | 31½-bit words. The low-order bit must be 0 on the drum, but can be 1 in the accumulator.
  CDC 3300, 6600 | word-addressable |
  DEC PDP, Vax | little |
  IBM 360, 370, 380, 390 | big |
  IBM 7044, 7090 | word-addressable | 36-bit words.
  IBM AS-400 | big |
  Power PC | either | The endian-agnostic PowerPCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other; e.g. Mac PowerPCs are big-endian.
  Intel 8080, 8086, 80286, 80386, 80486, Pentium I, II, III, IV | little | Chips used in PCs.
  Intel 8051 | big |
  MIPS R4000, R5000, R10000 | big | Used in Silicon Graphics IRIX machines.
  Motorola 6800, 6809, 680x0, 68HC11 | big | Early Macs used the 68000, as did the Amiga.
  NCR 8500 | big |
  NCR Century | big |
  Sun Sparc and UltraSparc | big | Sun's Solaris. Normally used as big-endian, but also supports a little-endian mode, including switching endianness under program control for particular loads and stores.
  Univac 1100 | word-addressable | 36-bit words.
  Univac 90/30 | big | IBM 370 clone.
  Zilog Z80 | little | Used in CP/M machines.
  If you know the endianness of other CPUs/OSes/platforms please email me at roedy@mindprod.com.
  In theory data can have two different byte sexes, but CPUs can have four. Let us give thanks, in this world of mixed left- and right-hand drive, that there are no real CPUs with all four sexes to contend with.
  The Four Possible Byte Sexes for CPUs
  Which Byte Is Stored in the Lower-Numbered Address? | Which Byte Is Addressed? | Used In
  LSB | LSB | Intel, AMD, Power PC, DEC.
  LSB | MSB | None that I know of.
  MSB | LSB | Perhaps one of the old word-mark architecture machines.
  MSB | MSB | Mac, IBM 390, Power PC.
  --------------------------------------------------------------------------------

Reposted from
http://www.xxju.net/article/200512/16_0101523214.htm