How are Chinese characters represented on the AS400 minicomputer? (Work with DBCS data)

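This post describes how to handle DBCS data in applications that use DBCS-capable device files, including application design and file identification. It also covers DBCS strings in mixed data streams and the character-set code ranges that OS/400 supports, and it describes the two DBCS code schemes that IBM supports: one for host servers and one for personal computers.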

Work with DBCS data

This section describes how to handle DBCS data in applications that use DBCS-capable device files.

A DBCS file is a file that contains double-byte data or is used to process double-byte data. Other files are called alphanumeric files. You can view DBCS files on display, printer, tape, diskette, and ICF devices.

You use data description specifications (DDS) to describe DBCS-capable device files. For information about using DDS, see the DDS Reference: Concepts topic.

You should indicate that a file is DBCS in one or more of the following situations:

The file receives input, or displays or prints output, which has double-byte characters.
The file contains double-byte literals.
The file has double-byte literals in the DDS that are used in the file at processing time (such as constant fields and error messages).
The DDS of the file includes DBCS keywords.
The file stores double-byte data (database files).

DBCS strings in a mixed data stream

Usually, both single-byte characters and double-byte characters are used in a DBCS environment. For example, an accounting firm in Japan might use both English and Japanese in a spreadsheet. When English and Japanese text is encoded as mixed SBCS and DBCS data, the product must be able to interpret a mixed character set that contains both single-byte coded characters and double-byte coded characters.

In IBM systems that use EBCDIC, a DBCS string is bracketed in a mixed data stream by a shift-out (SO) control character and a shift-in (SI) control character.

The following example shows the coding for a mixed string:

  sss  (SO)  D1D2D  (SI)  ssss

The following example shows the coding for a mixed hexadecimal string:

  818283  0E     41424143  0F  818283
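The bracketing rule can be illustrated with a minimal Python sketch that splits a mixed stream into single-byte and double-byte segments. The function names `split_mixed` and `wrap_dbcs` are hypothetical helpers written for this illustration, not part of any IBM API, and the sketch assumes well-formed input in which every SO has a matching SI.

```python
SO = 0x0E  # shift-out: the bytes that follow are double-byte codes
SI = 0x0F  # shift-in: the bytes that follow are single-byte codes again


def wrap_dbcs(codes):
    """Bracket a list of 2-byte DBCS codes with SO and SI."""
    return bytes([SO]) + b"".join(codes) + bytes([SI])


def split_mixed(data: bytes):
    """Split a mixed stream into ("SBCS", bytes) and ("DBCS", [2-byte codes]) segments."""
    segments = []
    pos = 0
    while pos < len(data):
        if data[pos] == SO:
            end = data.find(bytes([SI]), pos + 1)
            if end == -1:
                raise ValueError("SO without a matching SI")
            body = data[pos + 1:end]
            if len(body) % 2:
                raise ValueError("odd number of bytes between SO and SI")
            codes = [body[i:i + 2] for i in range(0, len(body), 2)]
            segments.append(("DBCS", codes))
            pos = end + 1
        else:
            end = data.find(bytes([SO]), pos)
            if end == -1:
                end = len(data)
            segments.append(("SBCS", data[pos:end]))
            pos = end
    return segments


# The hexadecimal string from the example above.
stream = bytes.fromhex("8182830E414241430F818283")
segments = split_mixed(stream)
for kind, payload in segments:
    print(kind, payload)
```

Here the stream splits into three single-byte characters, two double-byte codes (hex 4142 and hex 4143), and three more single-byte characters. Decoding the single-byte parts requires knowing the SBCS code page in use; under Python's `cp037` codec, chosen here only as an assumption, hex 818283 decodes to `abc`.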

Supported code ranges

OS/400 supports Japanese, Korean, Simplified Chinese, and Traditional Chinese character-set code ranges.

Through the iSeries Access family of products, the servers also support these non-IBM personal computer DBCS code pages:

Republic of Korea National Standard graphic character set (KS)
Taiwan Industry Standard graphic character set (Big5)
The People's Republic of China National Standard graphic character set (GB)

from:http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/nls/rbagsenadbcs.htm

DBCS code scheme

IBM supports two DBCS code schemes: one for the host servers, the other for personal computers. The IBM-host code scheme has the following code-range characteristics:

First byte

hex 41 to hex FE

Second byte

hex 41 to hex FE

Double-byte blank

hex 4040

In the following figure, using the first byte as the vertical axis and the second byte as the horizontal axis, 256 x 256 intersections or code points are expressed. The lower-right code area is designated as the valid double-byte code area and x is assigned to the double-byte blank.

Figure 30. IBM-Host Code Scheme

Graphic depicting the IBM host code scheme
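As a rough illustration of the code ranges above, the following Python sketch tests whether a 16-bit value falls in the valid double-byte area and counts the valid code points in the 256 x 256 grid. The function name `is_valid_host_dbcs` is a hypothetical helper, and the check covers only the range rule stated here, not whether a character is actually assigned to the code point.

```python
DBCS_BLANK = 0x4040          # the double-byte blank
BYTE_LO, BYTE_HI = 0x41, 0xFE  # valid range for both bytes


def is_valid_host_dbcs(code: int) -> bool:
    """Return True if code is the double-byte blank or lies in the valid area."""
    if code == DBCS_BLANK:
        return True
    first, second = code >> 8, code & 0xFF
    return BYTE_LO <= first <= BYTE_HI and BYTE_LO <= second <= BYTE_HI


# Walk all 256 x 256 intersections and count the valid ones.
valid_count = sum(is_valid_host_dbcs(code) for code in range(0x10000))
print(valid_count)  # 36101: 190 x 190 code points plus the double-byte blank
```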

By assigning the values hex 41 to hex FE in the first and second bytes as the DBCS codes, the codes can be grouped in wards, one ward for each first-byte value, with 190 code points in each ward (one for each second-byte value from hex 41 to hex FE). For example, the code group whose first byte is hex 42 is called ward 42. Ward 42 has the same alphanumeric characters as those in a corresponding single-byte EBCDIC code page, but with double-byte codes. For example, the character A is represented in single-byte EBCDIC code as hex C1 and in IBM-host code as hex 42C1.
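The ward 42 relationship can be sketched in a few lines of Python. This is an illustration only: `ward_of` and `sbcs_to_ward_42` are hypothetical helpers, and Python's `cp037` codec is assumed as a stand-in for "a corresponding single-byte EBCDIC code page"; the actual code page depends on the language environment.

```python
def ward_of(code: bytes) -> int:
    """Return the ward (the first byte) of a 2-byte DBCS code."""
    if len(code) != 2:
        raise ValueError("a DBCS code is exactly two bytes")
    return code[0]


def sbcs_to_ward_42(ch: str, sbcs_codec: str = "cp037") -> bytes:
    """Map an ASCII letter or digit to its ward 42 double-byte code.

    Assumes sbcs_codec names the matching single-byte EBCDIC code page.
    """
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError("only a single ASCII letter or digit is handled")
    # Ward 42 code = hex 42 followed by the single-byte EBCDIC code.
    return bytes([0x42]) + ch.encode(sbcs_codec)


code = sbcs_to_ward_42("A")
print(code.hex().upper())   # 42C1
print(hex(ward_of(code)))   # 0x42
```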

The iSeries server supports the following double-byte character sets:

IBM Japanese Character Set
IBM Korean Character Set
IBM Simplified Chinese Character Set
IBM Traditional Chinese Character Set

The following tables show the code ranges for each character set and the number of characters supported in each character set.

Table 28. IBM Japanese Character Set

Wards	Content	Number of Characters
40	Space in 4040	1
41 to 44	Non-Kanji characters: Greek, Russian, Roman numeric (ward 41); alphanumeric and related symbols (ward 42); Katakana, Hiragana, and special symbols (wards 43 to 44)	549
45 to 55	Basic Kanji characters	3226
56 to 68	Extended Kanji characters	3487
69 to 7F	User-defined characters	Up to 4370
80 to FE	Reserved

Total number of IBM-defined characters: 7263
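A ward table like Table 28 translates naturally into a range lookup. The Python sketch below classifies a 2-byte code by its ward using the Japanese ranges; the names `JAPANESE_WARDS` and `classify_japanese` are hypothetical, and the lookup reports only the ward's category, not whether a character is assigned at that code point.

```python
# (first ward, last ward, content), taken from Table 28.
JAPANESE_WARDS = [
    (0x40, 0x40, "Space in 4040"),
    (0x41, 0x44, "Non-Kanji characters"),
    (0x45, 0x55, "Basic Kanji characters"),
    (0x56, 0x68, "Extended Kanji characters"),
    (0x69, 0x7F, "User-defined characters"),
    (0x80, 0xFE, "Reserved"),
]


def classify_japanese(code: bytes) -> str:
    """Return the Table 28 category for a 2-byte IBM-host DBCS code."""
    if len(code) != 2:
        raise ValueError("a DBCS code is exactly two bytes")
    ward = code[0]
    for first, last, content in JAPANESE_WARDS:
        if first <= ward <= last:
            return content
    raise ValueError(f"ward {ward:02X} is outside the host code scheme")


print(classify_japanese(bytes.fromhex("42C1")))  # Non-Kanji characters
print(classify_japanese(bytes.fromhex("6941")))  # User-defined characters
```

The same structure would work for the Korean, Simplified Chinese, and Traditional Chinese tables by swapping in their ward ranges.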

Table 29. IBM Korean Character Set

Wards	Content	Number of Characters
40	Space in 4040	1
41 to 46	Non-Hangeul/Hanja characters (Latin alphabet, Greek, Roman, Japanese Kana, numeric, special symbols)	939
47 to 4F	Reserved
50 to 6C	Hanja characters	5265
6D to 83	Reserved
84 to D3	Hangeul characters (Jamo included)	2672
D4 to DD	User-defined characters	Up to 1880
DE to FE	Reserved

Total number of IBM-defined characters: 8877

Table 30. IBM Simplified Chinese Character Set

Wards	Content	Number of Characters
40	Space in 4040	1
41 to 47	Non-Chinese characters (Latin alphabet, Greek, Russian, Japanese Kana, numeric, special symbols)	712
48 to 6F	Chinese characters: Level 1 and Level 2	3755 and 3008
70 to 75	Reserved
76 to 7F	User-defined characters	Up to 1880
80 to FE	Reserved

Total number of IBM-defined characters: 7476

Table 31. IBM Traditional Chinese Character Set

Wards	Content	Number of Characters
40	Space in 4040	1
41 to 49	Non-Chinese characters (Latin alphabet, Greek, Roman, Japanese Kana, numeric, special symbols)	1003
4A to 4B	Reserved
4C to 68	Primary Chinese characters	5402
69 to 91	Secondary Chinese characters	7654
92 to C1	Reserved
C2 to E2	User-defined characters	Up to 6204
E3 to FE	Reserved

Total number of IBM-defined characters: 14060
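The stated totals can be cross-checked against the per-range counts in Tables 28 to 31. The short Python sketch below sums the IBM-defined rows (the double-byte blank plus each defined range, excluding user-defined and reserved wards) and compares the result with each table's stated total; the dictionary layout is just one convenient way to hold the figures.

```python
# Per-set IBM-defined counts from Tables 28 to 31, with the stated total.
CHARACTER_SETS = {
    "Japanese": ([1, 549, 3226, 3487], 7263),
    "Korean": ([1, 939, 5265, 2672], 8877),
    "Simplified Chinese": ([1, 712, 3755, 3008], 7476),
    "Traditional Chinese": ([1, 1003, 5402, 7654], 14060),
}

totals_match = {}
for name, (counts, stated_total) in CHARACTER_SETS.items():
    totals_match[name] = sum(counts) == stated_total
    print(f"{name}: computed {sum(counts)}, stated {stated_total}")
```

For all four character sets the computed sum equals the stated total.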

This code scheme applies to the iSeries server, System/36, System/38, as well as the System/370 server. A different DBCS code scheme, called the IBM Personal Computer DBCS code scheme, is used on the Personal System/55. For details of the IBM Personal Computer DBCS code scheme, refer to IBM PS/55 publications.

from:http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/dm/rbal3mstdbcscs.htm