All currently supported encoding schemes are listed below. Specifications
are added to the NLS Libraries on a continuing basis, so refer to the
relevant porting notes for details of which specific version supports
which encoding schemes.
Many different character encoding schemes (commonly called character
sets or code pages) are used by terminals and printers. These fall into
two groups: those based on the 7-bit ASCII standard and those based on
IBM's EBCDIC. Single-byte encoding schemes are used for European
languages, and multi-byte encoding schemes for Asian languages.
In V6, linguistic sort sequences are defined for each character encoding
scheme. This presents a problem in the case of multi-national encoding
schemes that support several languages, since these may require
different sort sequences. For example, the 'a umlaut' is sorted before 'b'
in German but after 'z' in Swedish. Hence, it is not possible in general
to define a sort sequence for an 8-bit multi-national encoding scheme
such as ISO 8859-1 that follows the expected conventions for all
languages simultaneously. To overcome this problem, a 'standard'
multi-national sort is used for such encoding schemes. Additional
specifications are provided where different sort sequences are required
for specific languages. For example, ISO 8859-1 with the 'standard'
multi-national sort is defined using the acronym WE8ISO8859P1. Other
variants of ISO 8859-1 are also defined, using language-specific sorting
sequences, for example:
N8ISO8859P1 - for Norwegian sorting
DK8ISO8859P1 - for Danish sorting
S8ISO8859P1 - for Swedish sorting
SF8ISO8859P1 - for Finnish sorting
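The German and Swedish treatment of 'a umlaut' described above can be
illustrated with a small sketch. The two key functions are simplified
assumptions made for this example only; they are not the sort tables
supplied with the NLS Libraries, and they handle only the one character
discussed.

```python
# Illustrative sketch: simplified, assumed ordering rules for one character.
A_UMLAUT = "\u00e4"  # 'a umlaut'

def german_key(word):
    # Assumption: 'a umlaut' is ordered with 'a' (just after plain 'a'),
    # so it sorts before 'b'.
    return [("a", 1) if ch == A_UMLAUT else (ch, 0) for ch in word.lower()]

def swedish_key(word):
    # Assumption: 'a umlaut' is a separate letter placed after 'z'.
    return [("z", 1) if ch == A_UMLAUT else (ch, 0) for ch in word.lower()]

words = ["zebra", "b\u00e4r", "\u00e4pple", "apa"]

german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)

print(german_order)   # 'a umlaut' words fall among the 'a' and 'b' words
print(swedish_order)  # 'a umlaut' words fall after 'zebra'
```

The same byte values yield two different orderings, which is why a
single sort sequence attached to the encoding scheme cannot satisfy both
languages.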
In V7, linguistic sort sequences are defined independently of individual
character encoding schemes. Hence, the additional sort variants used in
V6 are obsolete in V7. Linguistic sort sequences for V7 (also listed
below) are referred to by name, and in general each supported language
has a defined linguistic sort of the same name. For some languages,
'extended' linguistic sorts are defined, for example 'XSPANISH'. These
extended sort sequences cater for language-specific special cases such
as the Spanish double characters 'll' and 'ch', which are sorted as if
they were single characters.
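The effect of an extended sort on the Spanish double characters can be
sketched as follows. This is a minimal illustration, not the actual
'XSPANISH' definition: the function name xspanish_key is hypothetical,
and the placement of 'ch' directly after 'c' and 'll' directly after 'l'
is an assumption based on the traditional Spanish alphabet.

```python
import string

DIGRAPHS = ("ch", "ll")  # double characters treated as single letters

def build_alphabet():
    # Assumed alphabet: a, b, c, ch, d, ..., l, ll, m, ..., z
    letters = []
    for ch in string.ascii_lowercase:
        letters.append(ch)
        if ch == "c":
            letters.append("ch")
        elif ch == "l":
            letters.append("ll")
    return letters

RANK = {letter: i for i, letter in enumerate(build_alphabet())}

def xspanish_key(word):
    # Hypothetical helper: turn a word into a list of letter ranks,
    # consuming 'ch' and 'll' as one letter each.
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the assumed alphabet sort last.
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key

words = ["cuna", "chico", "luz", "llama", "dama", "cola"]

plain_order = sorted(words)                       # character-by-character
extended_order = sorted(words, key=xspanish_key)  # double characters as letters

print(plain_order)
print(extended_order)
```

In the plain ordering 'chico' precedes 'cola'; in the extended ordering
it follows 'cuna', because 'ch' is ranked as a letter of its own.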
Single-Byte ANSI and ISO Standards:
ID Acronym Description
--- ------- -----------
001 US7ASCII ASCII 7-bit American
015 SF7ASCII ASCII 7-bit Finnish
031 WE8ISO8859P1 ISO 8859-1 West European
301 N8ISO8859P1 Norwegian sort ISO 8859-1
311 DK8ISO8859P1 Danish sort ISO 8859-1
321 S8ISO8859P1 Swedish sort ISO 8859-1
331 SF8ISO8859P1 Finnish sort ISO 8859-1
341 IS8ISO8859P1 Icelandic sort ISO 8859-1
032 EE8ISO8859P2 ISO 8859-2 East European
154 HU8ISO8859P2 Hungarian sort ISO 8859-2
360 SK8ISO8859P2 Slovak sort ISO 8859-2
364 CS8ISO8859P2 Czech sort ISO 8859-2
366 PL8ISO8859P2 Polish sort ISO 8859-2
033 SE8ISO8859P3 ISO 8859-3 South European
034 NEE8ISO8859P4 ISO 8859-4 North and North-East European
035 CL8ISO8859P5 ISO 8859-5 Latin/Cyrillic
036 AR8ISO8859P6 ISO 8859-6 Latin / Arabic
037 EL8ISO8859P7 ISO 8859-7 Latin / Greek
038 IW8ISO8859P8 ISO 8859-8 Latin / Hebrew
039 WE8ISO8859P9 ISO 8859-9 West European & Turkish
370 TR8ISO8859P9 Turkish version ISO 8859-9
041 TH8TISASCII Thai Industrial Standard 620-2533 (ASCII)
042 TH8TISEBCDIC Thai Industrial Standard 620-2533 (EBCDIC)
Single-Byte DEC Specific:
ID Acronym Description
--- ------- -----------
011 D7DEC VT100 7-bit German
012 F7DEC VT100 7-bit French
013 S7DEC VT100 7-bit Swedish
014 E7DEC VT100 7-bit Spanish
016 NDK7DEC VT100 7-bit Norwegian/Danish (Norwegian sort)
020 DKN7DEC Danish sort DEC VT100 7-bit Norwegian/Danish
017 I7DEC VT100 7-bit Italian
018 NL7DEC VT100 7-bit Dutch
019 CH7DEC VT100 7-bit Swiss (German & French)
021 SF7DEC VT100 7-bit Finnish
022 TR7DEC VT100 7-bit Turkish
002 WE8DEC West European
302 N8DEC Norwegian sort West European
312 DK8DEC Danish sort West European
322 S8DEC Swedish sort West European
332 SF8DEC Finnish sort West European
081 EL8DEC Latin / Greek
082 TR8DEC Turkish
Single-Byte DG Specific:
ID Acronym Description
--- ------- -----------
241 WE8DG DG West European
Single-Byte HP Specific:
ID Acronym Description
--- ------- -----------
003 WE8HP HP LaserJet West European
303 N8HP Norwegian sort HP LaserJet
313 DK8HP Danish sort HP LaserJet
323 S8HP Swedish sort HP LaserJet
333 SF8HP Finnish sort HP LaserJet
261 WE8ROMAN8 HP Roman8 West European
307 N8ROMAN8 Norwegian sort HP Roman8
317 DK8ROMAN8 Danish sort HP Roman8
327 S8ROMAN8 Swedish sort HP Roman8
337 SF8ROMAN8 Finnish sort HP Roman8
Single-Byte IBM PC Specific:
ID Acronym Description
--- ------- -----------
004 US8PC437 Code Page 437 American
380 EL8PC437S Greek modified Code Page 437
010 WE8PC850 Code Page 850 West European
306 N8PC850 Norwegian sort Code Page 850
316 DK8PC850 Danish sort Code Page 850
326 S8PC850 Swedish sort Code Page 850
336 SF8PC850 Finnish sort Code Page 850
150 EE8PC852 Code Page 852 East European
363 HU8PC852 Hungarian sort Code Page 852
361 SK8PC852 Slovak sort Code Page 852
365 CS8PC852 Czech sort Code Page 852
367 PL8PC852 Polish sort Code Page 852
152 RU8PC866 Code Page 866 Latin/Cyrillic
155 RU8PC855 Code Page 855 Latin/Cyrillic
156 TR8PC857 Code Page 857 Turkish
160 WE8PC860 Code Page 860 West European
190 N8PC865 Code Page 865 Norwegian
390 CDN8PC863 Code Page 863 Canadian French
Single-Byte EBCDIC IBM Mainframe Specific:
ID Acronym Description
--- ------- -----------
005 WE8EBCDIC37 Code Page 37 West European
090 WE8EBCDIC37C Code Page 37 Oracle/c
304 N8EBCDIC37 Norwegian sort Code Page 37
314 DK8EBCDIC37 Danish sort Code Page 37
324 S8EBCDIC37 Swedish sort Code Page 37
334 SF8EBCDIC37 Finnish sort Code Page 37
006 WE8EBCDIC500 Code Page 500 West European
091 WE8EBCDIC500C Code Page 500 Oracle/c
305 N8EBCDIC500 Norwegian sort Code Page 500
315 DK8EBCDIC500 Danish sort Code Page 500
325 S8EBCDIC500 Swedish sort Code Page 500
335 SF8EBCDIC500 Finnish sort Code Page 500
180 D8EBCDIC273 Code Page 273/1 Austrian/German
182 DK8EBCDIC277 Code Page 277/1 Danish
183 S8EBCDIC278 Code Page 278/1 Swedish
181 I8EBCDIC280 Code Page 280/1 Italian
381 EL8EBCDIC875 Code Page 875 Greek
Single-Byte MAC Specific:
ID Acronym Description
--- ------- -----------
351 WE8MACROMAN8 Mac Extended ROMAN8 West European
158 CL8MACCYRILLIC Mac Latin/Cyrillic
Single-Byte Microsoft Specific:
ID Acronym Description
--- ------- -----------
157 CL8MSWINDOW31 Windows 3.1 Latin/Cyrillic
Single-Byte NCR Specific:
ID Acronym Description
--- ------- -----------
251 WE8NCR4970 NCR 4970 West European
Single-Byte Siemens Specific:
ID Acronym Description
--- ------- -----------
201 F7SIEMENS9780X 97801/97808 7-bit French
202 E7SIEMENS9780X 97801/97808 7-bit Spanish
203 S7SIEMENS9780X 97801/97808 7-bit Swedish
204 DK7SIEMENS9780X 97801/97808 7-bit Danish
205 N7SIEMENS9780X 97801/97808 7-bit Norwegian
206 I7SIEMENS9780X 97801/97808 7-bit Italian
207 D7SIEMENS9780X 97801/97808 7-bit German
221 US8BS2000 9750-62 EBCDIC American
222 D8BS2000 9750-62 EBCDIC German
223 F8BS2000 9750-62 EBCDIC French
224 E8BS2000 9750-62 EBCDIC Spanish
225 DK8BS2000 9750-62 EBCDIC Danish
231 WE8BS2000 EBCDIC.DF.04 West European
239 WE8BS2000L5 Siemens EBCDIC.DF.04.L5 WE & Turkish
Single-Byte BESTA Specific:
ID Acronym Description
--- ------- -----------
153 RU8BESTA BESTA Latin/Cyrillic
Single-Byte European Community (EEC) Specific:
ID Acronym Description
--- ------- -----------
100 EEC8EUROPC EURO-PC West European/Greek
101 EEC8EECEUROUNIX EURO-UNIX West European/Greek
102 WE8EEC8859P1 EURO-UNIX copy of ISO 8859-1
103 EL8EEC8859P7 EURO-UNIX copy of ISO 8859-7
110 EEC8EUROASCI Targon 35 ASCI West European/Greek
111 EEC8ISO8859P1 Targon 35 copy of ISO 8859-1
112 EEC8ISO8859P7 Targon 35 copy of ISO 8859-7
113 EEC8EUROPA3 EUROPA3 West European/Greek
Other Single-Byte Customer Specific:
ID Acronym Description
--- ------- -----------
401 HU8ABMOD Hungarian AB Mod
368 HU8CWI2 Hungarian CWI-2
590 LA8TELETEX Teletex Latin
Multi-Byte:
ID Acronym Description
--- ------- -----------
829 JA16VMS Japanese VMS Kanji
830 JA16EUC Japanese Extended UNIX Code
832 JA16SJIS Japanese Shift-JIS
833 JA16DBCS Japanese IBM
843 JA16HP Japanese HP
840 KO16KSC5601 Korean KSC5601
842 KO16DBCS Korean IBM
850 ZHS16CGB231280 Chinese GB2312-80
860 ZHT32CNS1164386 Taiwan Traditional Chinese
865 ZHT16BIG5 Big5 Traditional Chinese
Note:
ID 839 is assigned to a Toshiba version of JEUC (JA16THOSHIBAEUC).
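The acronyms in the tables above appear to share a common shape: a
letter prefix for the language or territory, a number giving the bits
per character (7, 8, 16 or 32), and a name for the encoding scheme. This
reading is inferred from the listed acronyms and is not stated as a rule
in this note. Under that assumption, an acronym can be split into its
parts as sketched below; the function name split_acronym is
hypothetical.

```python
import re

# Assumed pattern: letters, then digits, then the scheme name
# (which starts with a letter and may itself contain digits).
ACRONYM = re.compile(r"^([A-Z]+)(\d+)([A-Z].*)$")

def split_acronym(acronym):
    # Hypothetical helper: returns (prefix, bits, scheme).
    match = ACRONYM.match(acronym)
    if match is None:
        raise ValueError("unrecognised acronym: " + acronym)
    prefix, bits, scheme = match.groups()
    return prefix, int(bits), scheme

for name in ("US7ASCII", "WE8ISO8859P1", "JA16SJIS", "ZHT32CNS1164386"):
    print(name, split_acronym(name))
```

Under this reading, the V6 sort variants differ from the 'standard'
entry only in the prefix: WE8ISO8859P1 and N8ISO8859P1 both end in
ISO8859P1.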
Linguistic Sorts V7:
ID Sort Name(s)
-- ------------
ar Arabic
cs Czech and XCzech
da Danish and XDanish
nl Dutch and XDutch
fi Finnish
fr French
de German and XGerman
de2 German_DIN and XGerman_DIN
el Greek
iw Hebrew
hu Hungarian and XHungarian
is Icelandic
it Italian
lag Latin
lt Lithuanian
no Norwegian
pl Polish
ru Russian
sk Slovak and XSlovak
es Spanish and XSpanish
sv Swedish
ch Swiss and XSwiss
tr Turkish and XTurkish
weg West_European and XWest_European