Work with DBCS data
The following topics describe how you handle DBCS data in applications that use DBCS-capable device files:
- Checklist: DBCS application design
- Develop applications that process DBCS data
- DBCS code schemes
- DBCS font tables
- DBCS font files
- DBCS sort tables
- DBCS field definition
A DBCS file is a file that contains double-byte data or is used to process double-byte data. Other files are called alphanumeric files. You can view DBCS files on display, printer, tape, diskette, and ICF devices.
You use data description specifications (DDS) to describe DBCS-capable device files. For information about using DDS, see the DDS Reference: Concepts topic.
You should indicate that a file is DBCS in one or more of the following situations:
- The file receives input, or displays or prints output, that contains double-byte characters.
- The file contains double-byte literals.
- The file has double-byte literals in the DDS that are used in the file at processing time (such as constant fields and error messages).
- The DDS of the file includes DBCS keywords.
- The file stores double-byte data (database files).
DBCS strings in a mixed data stream
A DBCS environment usually mixes single-byte and double-byte characters. For example, an accounting firm in Japan might use both English and Japanese in a spreadsheet. When the English and Japanese text is encoded as a mixture of SBCS and DBCS data, the product must be able to interpret a mixed character set that contains both single-byte and double-byte coded characters.
In IBM systems that use EBCDIC, a DBCS string in a mixed data stream is bracketed by a shift-out (SO) control character, hex 0E, and a shift-in (SI) control character, hex 0F.
The following example shows the coding for a mixed string:
sss (SO) D1D2D (SI) ssss
The following example shows the coding for a mixed hexadecimal string:
818283 0E 41424143 0F 818283
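As an illustration of the bracketing rule, the following minimal Python sketch splits a mixed byte string into single-byte and double-byte runs. The function name `split_mixed` and the error handling are assumptions made for this example and are not part of any IBM API.

```python
SO = 0x0E  # shift-out: starts a double-byte run
SI = 0x0F  # shift-in: returns to single-byte mode


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split a mixed EBCDIC byte string into ("SBCS", run) and ("DBCS", run) pairs."""
    segments = []
    mode = "SBCS"
    run = bytearray()
    for byte in data:
        if mode == "SBCS" and byte == SO:
            if run:
                segments.append((mode, bytes(run)))
            mode, run = "DBCS", bytearray()
        elif mode == "DBCS" and byte == SI:
            if len(run) % 2:
                raise ValueError("double-byte run has an odd number of bytes")
            segments.append((mode, bytes(run)))
            mode, run = "SBCS", bytearray()
        else:
            run.append(byte)
    if mode == "DBCS":
        raise ValueError("shift-out without a matching shift-in")
    if run:
        segments.append((mode, bytes(run)))
    return segments


# The mixed hexadecimal string from the example above.
mixed = bytes.fromhex("818283 0E 41424143 0F 818283")
for kind, run in split_mixed(mixed):
    print(kind, run.hex().upper())
# Prints:
# SBCS 818283
# DBCS 41424143
# SBCS 818283
```

The SO and SI bytes themselves are dropped from the output, so each DBCS run can be read two bytes at a time.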
Supported code ranges
OS/400 supports Japanese, Korean, Simplified Chinese, and Traditional Chinese character-set code ranges.
Using the iSeries Access family of products, the servers also provide support for these non-IBM personal computer DBCS code pages:
- Republic of Korea National Standard graphic character set (KS)
- Taiwan Industry Standard graphic character set (Big5)
- The People's Republic of China National Standard graphic character set (GB)
from:http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/nls/rbagsenadbcs.htm
DBCS code scheme
IBM supports two DBCS code schemes: one for the host servers, the other for personal computers. The IBM-host code scheme has the following code-range characteristics:
- First byte: hex 41 to hex FE
- Second byte: hex 41 to hex FE
- Double-byte blank: hex 4040
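The code-range characteristics above can be expressed as a simple range check. This is a minimal sketch; the function name `is_valid_host_code` is hypothetical, and the check covers only the code range, not whether a character is actually assigned to the code point.

```python
DBCS_BLANK = 0x4040  # the double-byte blank


def is_valid_host_code(code: int) -> bool:
    """Return True if a 16-bit value lies in the IBM-host DBCS code range."""
    if not 0 <= code <= 0xFFFF:
        return False
    if code == DBCS_BLANK:
        return True
    first, second = code >> 8, code & 0xFF
    return 0x41 <= first <= 0xFE and 0x41 <= second <= 0xFE


for code in (0x4040, 0x42C1, 0x4041, 0x41FF):
    print(f"{code:04X}", is_valid_host_code(code))
# Prints:
# 4040 True
# 42C1 True
# 4041 False
# 41FF False
```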
In the following figure, using the first byte as the vertical axis and the second byte as the horizontal axis, 256 x 256 intersections or code points are expressed. The lower-right code area is designated as the valid double-byte code area and x is assigned to the double-byte blank.
Figure 30. IBM-Host Code Scheme
By assigning the values hex 41 to hex FE in the first and second bytes as the DBCS codes, the codes can be grouped in wards with 190 code points in each ward (hex 41 through hex FE in the second byte). For example, the code group with the first byte starting with hex 42 is called ward 42. Ward 42 has the same alphanumeric characters as those in a corresponding single-byte EBCDIC code page, but with double-byte codes. For example, the character A is represented in single-byte EBCDIC code as hex C1 and in IBM-host code as hex 42C1.
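The ward 42 relationship can be sketched in a few lines of Python. The helper names `ward_of` and `sbcs_to_ward_42` are hypothetical, and Python's `cp037` codec is used here only as a stand-in for "a corresponding single-byte EBCDIC code page"; that choice is an assumption of this example.

```python
def ward_of(code: int) -> int:
    """Return the ward (first byte) of a double-byte code."""
    return code >> 8


def sbcs_to_ward_42(sbcs_byte: int) -> int:
    """Map a single-byte EBCDIC code to its double-byte form in ward 42."""
    if not 0x41 <= sbcs_byte <= 0xFE:
        raise ValueError("second byte must be in the range hex 41 to hex FE")
    return (0x42 << 8) | sbcs_byte


ebcdic_a = "A".encode("cp037")[0]  # assumption: cp037 as the single-byte code page
wide_a = sbcs_to_ward_42(ebcdic_a)
print(f"{ebcdic_a:02X} -> {wide_a:04X}, ward {ward_of(wide_a):02X}")
# Prints: C1 -> 42C1, ward 42
```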
The iSeries server supports the following double-byte character sets:
- IBM Japanese Character Set
- IBM Korean Character Set
- IBM Simplified Chinese Character Set
- IBM Traditional Chinese Character Set
The following tables show the code ranges for each character set and the number of characters supported in each character set.
Table 28. IBM Japanese Character Set
Wards | Content | Number of Characters |
---|---|---|
40 | Space in 4040 | 1 |
41 to 44 | Non-Kanji characters | 549 |
45 to 55 | Basic Kanji characters | 3226 |
56 to 68 | Extended Kanji characters | 3487 |
69 to 7F | User-defined characters | Up to 4370 |
80 to FE | Reserved | |
Table 29. IBM Korean Character Set
Wards | Content | Number of Characters |
---|---|---|
40 | Space in 4040 | 1 |
41 to 46 | Non-Hangeul/Hanja characters (Latin alphabet, Greek, Roman, Japanese Kana, numeric, special symbols) | 939 |
47 to 4F | Reserved | |
50 to 6C | Hanja characters | 5265 |
6D to 83 | Reserved | |
84 to D3 | Hangeul characters (Jamo included) | 2672 |
D4 to DD | User-defined characters | Up to 1880 |
DE to FE | Reserved | |
Table 30. IBM Simplified Chinese Character Set
Wards | Content | Number of Characters |
---|---|---|
40 | Space in 4040 | 1 |
41 to 47 | Non-Chinese characters (Latin alphabet, Greek, Russian, Japanese Kana, numeric, special symbols) | 712 |
48 to 6F | Chinese characters: Level 1 and Level 2 | 3755 and 3008 |
70 to 75 | Reserved | |
76 to 7F | User-defined characters | Up to 1880 |
80 to FE | Reserved | |
Table 31. IBM Traditional Chinese Character Set
Wards | Content | Number of Characters |
---|---|---|
40 | Space in 4040 | 1 |
41 to 49 | Non-Chinese characters (Latin alphabet, Greek, Roman, Japanese Kana, numeric, special symbols) | 1003 |
4A to 4B | Reserved | |
4C to 68 | Primary Chinese characters | 5402 |
69 to 91 | Secondary Chinese characters | 7654 |
92 to C1 | Reserved | |
C2 to E2 | User-defined characters | Up to 6204 |
E3 to FE | Reserved | |
This code scheme applies to the iSeries server, System/36, System/38, and the System/370 server. A different DBCS code scheme, called the IBM Personal Computer DBCS code scheme, is used on the Personal System/55. For details of the IBM Personal Computer DBCS code scheme, refer to IBM PS/55 publications.
from:http://publib.boulder.ibm.com/iseries/v5r2/ic2924/info/dm/rbal3mstdbcscs.htm