Byte Order Mark (BOM) FAQ

Q: What is a BOM?
A: A byte order mark (BOM) consists of the character code U+FEFF at the beginning of a data stream, where it can be used as a signature defining the byte order and encoding form, primarily of unmarked plaintext files. Under some higher-level protocols, use of a BOM may be mandatory (or prohibited) in the Unicode data stream defined in that protocol. [AF]

Q: Where is a BOM useful?
A: A BOM is useful at the beginning of files that are typed as text, but for which it is not known whether they are in big-endian or little-endian format. It can also serve as a hint indicating that the file is in Unicode, as opposed to a legacy encoding, and furthermore it acts as a signature for the specific encoding form used. [MD] & [AF]

Q: What does ‘endian’ mean?
A: Data types longer than a byte can be stored in computer memory with the most significant byte (MSB) first or last. The former is called big-endian, the latter little-endian. When data are exchanged in the same byte order as they were in the memory of the originating system, they may appear to be in the wrong byte order on the receiving system. In that situation, a BOM would look like 0xFFFE, which is a noncharacter, allowing the receiving system to apply byte reversal before processing the data. UTF-8 is byte-oriented and therefore does not have that issue. Nevertheless, an initial BOM might be useful to identify the data stream as UTF-8. [AF]
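The signature role described above can be illustrated with a small sketch. It assumes Python and its standard `codecs` module; the helper name `sniff_bom` is hypothetical and not part of the FAQ. It checks the first bytes of a stream against the known BOM byte sequences and reports the encoding form they suggest.

```python
import codecs

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins with
# the UTF-16LE BOM (FF FE), so checking UTF-16 first would misreport it.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present.

    Hypothetical helper: a BOM is only a hint, so callers with better
    information (e.g. a protocol-level charset label) should prefer that.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# U+FEFF written little-endian and read as big-endian shows up as the
# noncharacter U+FFFE, which signals that the bytes must be swapped.
swapped = codecs.BOM_UTF16_LE.decode("utf-16-be")  # '\ufffe'
```

For example, `sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))` returns `("utf-16-le", 2)`, and the caller can then decode `data[2:]` with that encoding.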
Important: The UTF-8 BOM must be removed from the definition file before compilation. "db2extth" cannot process the BOM and reports it as an error in the input.
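One way to remove the BOM before compiling is sketched below. It assumes Python; the function name `strip_utf8_bom` and any file name passed to it are hypothetical. It rewrites the file in place only when the file starts with the three UTF-8 BOM bytes EF BB BF.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed.

    Hypothetical helper: the whole file is read into memory, which is
    fine for small definition files but not for very large inputs.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do, file is left untouched
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling it a second time on the same file returns `False`, so it is safe to run as a routine pre-compilation step.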
Q: Why wouldn’t I always use a protocol that requires a BOM?
A: Where the data is typed, such as a field in a database, a BOM is unnecessary. In particular, if a text data stream is marked as UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE, a BOM is neither necessary nor permitted. Any FEFF would be interpreted as a ZWNBSP. Do not tag every string in a database or set of fields with a BOM, since it wastes space and complicates string concatenation. Moreover, it also means two data fields may have precisely the same content, but not be binary-equal (where one is prefaced by a BOM). [MD]
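The points about binary equality, concatenation, and explicitly marked streams can be seen in a short sketch. It assumes Python's built-in codecs; the variable names are illustrative only.

```python
import codecs

plain = "abc".encode("utf-8")
tagged = codecs.BOM_UTF8 + plain  # same text, prefaced by a BOM

# Same content, but the two fields are not binary-equal.
binary_equal = plain == tagged  # False

# Concatenating two tagged strings leaves a stray U+FEFF in the middle;
# "utf-8-sig" strips only the leading BOM.
joined = (tagged + tagged).decode("utf-8-sig")  # 'abc\ufeffabc'

# With an explicit byte order (UTF-16BE), FE FF is not treated as a
# signature: it is decoded as the character U+FEFF (ZWNBSP).
marked = (codecs.BOM_UTF16_BE + "a".encode("utf-16-be")).decode("utf-16-be")
# marked == '\ufeffa'

# With the unmarked "utf-16" label, the same bytes use FE FF as a BOM.
unmarked = (codecs.BOM_UTF16_BE + "a".encode("utf-16-be")).decode("utf-16")
# unmarked == 'a'
```

The stray U+FEFF in `joined` is invisible when printed, which is part of why per-string BOMs are hard to debug.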
The Unicode website's introduction to the BOM is available at: http://www.unicode.org/faq/utf_bom.html#BOM