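Note: The European-language versions of JDK 5.0 and JRE 5.0 support only the encodings in the basic encoding set; the international versions support all encodings in both the basic and the extended encoding sets.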
Basic Encoding Set (contained in jre/lib/rt.jar)
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
---|---|---|
ISO-8859-1 | ISO8859_1 | ISO 8859-1, Latin Alphabet No. 1 |
ISO-8859-2 | ISO8859_2 | Latin Alphabet No. 2 |
ISO-8859-4 | ISO8859_4 | Latin Alphabet No. 4 |
ISO-8859-5 | ISO8859_5 | Latin/Cyrillic Alphabet |
ISO-8859-7 | ISO8859_7 | Latin/Greek Alphabet |
ISO-8859-9 | ISO8859_9 | Latin Alphabet No. 5 |
ISO-8859-13 | ISO8859_13 | Latin Alphabet No. 7 |
ISO-8859-15 | ISO8859_15 | Latin Alphabet No. 9 |
KOI8-R | KOI8_R | KOI8-R, Russian |
US-ASCII | ASCII | American Standard Code for Information Interchange |
UTF-8 | UTF8 | Eight-bit UCS Transformation Format |
UTF-16 | UTF-16 | Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark |
UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode Transformation Format, big-endian byte order |
UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode Transformation Format, little-endian byte order |
windows-1250 | Cp1250 | Windows Eastern European |
windows-1251 | Cp1251 | Windows Cyrillic |
windows-1252 | Cp1252 | Windows Latin-1 |
windows-1253 | Cp1253 | Windows Greek |
windows-1254 | Cp1254 | Windows Turkish |
windows-1257 | Cp1257 | Windows Baltic |
Not available | UnicodeBig | Sixteen-bit Unicode Transformation Format, big-endian byte order, with byte-order mark |
Not available | UnicodeLittle | Sixteen-bit Unicode Transformation Format, little-endian byte order, with byte-order mark |
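
The two name columns matter in practice because the java.nio API reports the name from the first column, while java.io classes such as `InputStreamReader` report the historical name from the second. The following is a minimal sketch of that difference, not a definitive implementation; the class name `CharsetNames` and its two helper methods are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class CharsetNames {

    // java.nio view: resolves a canonical name or alias to the canonical java.nio name.
    public static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // java.io view: the name reported by InputStreamReader.getEncoding(),
    // which is the historical java.io name when the encoding has one.
    public static String ioName(String anyName) {
        try {
            InputStreamReader reader =
                    new InputStreamReader(new ByteArrayInputStream(new byte[0]), anyName);
            return reader.getEncoding();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Encoding not available: " + anyName, e);
        }
    }

    public static void main(String[] args) {
        // All three encodings are part of the basic encoding set.
        String[] names = { "UTF8", "ISO-8859-1", "ASCII" };
        for (String name : names) {
            System.out.println(name + " -> java.nio: " + nioName(name)
                    + ", java.io: " + ioName(name));
        }
    }
}
```

Either spelling is accepted as input by both APIs; only the name reported back differs.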
Extended Encoding Set (contained in jre/lib/charsets.jar)
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
---|---|---|
Big5 | Big5 | Big5, Traditional Chinese |
Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision) |
EUC-JP | EUC_JP | JISX 0201, 0208 and 0212, EUC encoding Japanese |
EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean |
GB18030 | GB18030 | Simplified Chinese, PRC standard |
GB2312 | EUC_CN | GB2312, EUC encoding, Simplified Chinese |
GBK | GBK | GBK, Simplified Chinese |
IBM-Thai | Cp838 | IBM Thailand extended SBCS |
IBM00858 | Cp858 | Variant of Cp850 with Euro character |
IBM01140 | Cp1140 | Variant of Cp037 with Euro character |
IBM01141 | Cp1141 | Variant of Cp273 with Euro character |
IBM01142 | Cp1142 | Variant of Cp277 with Euro character |
IBM01143 | Cp1143 | Variant of Cp278 with Euro character |
IBM01144 | Cp1144 | Variant of Cp280 with Euro character |
IBM01145 | Cp1145 | Variant of Cp284 with Euro character |
IBM01146 | Cp1146 | Variant of Cp285 with Euro character |
IBM01147 | Cp1147 | Variant of Cp297 with Euro character |
IBM01148 | Cp1148 | Variant of Cp500 with Euro character |
IBM01149 | Cp1149 | Variant of Cp871 with Euro character |
IBM037 | Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia |
IBM1026 | Cp1026 | IBM Latin-5, Turkey |
IBM1047 | Cp1047 | Latin-1 character set for EBCDIC hosts |
IBM273 | Cp273 | IBM Austria, Germany |
IBM277 | Cp277 | IBM Denmark, Norway |
IBM278 | Cp278 | IBM Finland, Sweden |
IBM280 | Cp280 | IBM Italy |
IBM284 | Cp284 | IBM Catalan/Spain, Spanish Latin America |
IBM285 | Cp285 | IBM United Kingdom, Ireland |
IBM297 | Cp297 | IBM France |
IBM420 | Cp420 | IBM Arabic |
IBM424 | Cp424 | IBM Hebrew |
IBM437 | Cp437 | MS-DOS United States, Australia, New Zealand, South Africa |
IBM500 | Cp500 | EBCDIC 500V1 |
IBM775 | Cp775 | PC Baltic |
IBM850 | Cp850 | MS-DOS Latin-1 |
IBM852 | Cp852 | MS-DOS Latin-2 |
IBM855 | Cp855 | IBM Cyrillic |
IBM857 | Cp857 | IBM Turkish |
IBM860 | Cp860 | MS-DOS Portuguese |
IBM861 | Cp861 | MS-DOS Icelandic |
IBM862 | Cp862 | PC Hebrew |
IBM863 | Cp863 | MS-DOS Canadian French |
IBM864 | Cp864 | PC Arabic |
IBM865 | Cp865 | MS-DOS Nordic |
IBM866 | Cp866 | MS-DOS Russian |
IBM868 | Cp868 | MS-DOS Pakistan |
IBM869 | Cp869 | IBM Modern Greek |
IBM870 | Cp870 | IBM Multilingual Latin-2 |
IBM871 | Cp871 | IBM Iceland |
IBM918 | Cp918 | IBM Pakistan (Urdu) |
ISO-2022-CN | ISO2022CN | GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only) |
ISO-2022-JP | ISO2022JP | JIS X 0201, 0208, in ISO 2022 form, Japanese |
ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean |
ISO-8859-3 | ISO8859_3 | Latin Alphabet No. 3 |
ISO-8859-6 | ISO8859_6 | Latin/Arabic Alphabet |
ISO-8859-8 | ISO8859_8 | Latin/Hebrew Alphabet |
Shift_JIS | SJIS | Shift-JIS, Japanese |
TIS-620 | TIS620 | TIS620, Thai |
windows-1255 | Cp1255 | Windows Hebrew |
windows-1256 | Cp1256 | Windows Arabic |
windows-1258 | Cp1258 | Windows Vietnamese |
windows-31j | MS932 | Windows Japanese |
x-Big5_Solaris | Big5_Solaris | Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale |
x-euc-jp-linux | EUC_JP_LINUX | JISX 0201, 0208, EUC encoding Japanese |
x-EUC-TW | EUC_TW | CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese |
x-eucJP-Open | EUC_JP_Solaris | JISX 0201, 0208, 0212, EUC encoding Japanese |
x-IBM1006 | Cp1006 | IBM AIX Pakistan (Urdu) |
x-IBM1025 | Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR) |
x-IBM1046 | Cp1046 | IBM Arabic - Windows |
x-IBM1097 | Cp1097 | IBM Iran (Farsi)/Persian |
x-IBM1098 | Cp1098 | IBM Iran (Farsi)/Persian (PC) |
x-IBM1112 | Cp1112 | IBM Latvia, Lithuania |
x-IBM1122 | Cp1122 | IBM Estonia |
x-IBM1123 | Cp1123 | IBM Ukraine |
x-IBM1124 | Cp1124 | IBM AIX Ukraine |
x-IBM1381 | Cp1381 | IBM OS/2, DOS People's Republic of China (PRC) |
x-IBM1383 | Cp1383 | IBM AIX People's Republic of China (PRC) |
x-IBM33722 | Cp33722 | IBM-eucJP - Japanese (superset of 5050) |
x-IBM737 | Cp737 | PC Greek |
x-IBM856 | Cp856 | IBM Hebrew |
x-IBM874 | Cp874 | IBM Thai |
x-IBM875 | Cp875 | IBM Greek |
x-IBM921 | Cp921 | IBM Latvia, Lithuania (AIX, DOS) |
x-IBM922 | Cp922 | IBM Estonia (AIX, DOS) |
x-IBM930 | Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026 |
x-IBM933 | Cp933 | Korean Mixed with 1880 UDC, superset of 5029 |
x-IBM935 | Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031 |
x-IBM937 | Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033 |
x-IBM939 | Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035 |
x-IBM942 | Cp942 | IBM OS/2 Japanese, superset of Cp932 |
x-IBM942C | Cp942C | Variant of Cp942 |
x-IBM943 | Cp943 | IBM OS/2 Japanese, superset of Cp932 and Shift-JIS |
x-IBM943C | Cp943C | Variant of Cp943 |
x-IBM948 | Cp948 | OS/2 Chinese (Taiwan) superset of 938 |
x-IBM949 | Cp949 | PC Korean |
x-IBM949C | Cp949C | Variant of Cp949 |
x-IBM950 | Cp950 | PC Chinese (Hong Kong, Taiwan) |
x-IBM964 | Cp964 | AIX Chinese (Taiwan) |
x-IBM970 | Cp970 | AIX Korean |
x-ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts |
x-ISO2022-CN-CNS | ISO2022_CN_CNS | CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only) |
x-ISO2022-CN-GB | ISO2022_CN_GB | GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only) |
x-iso-8859-11 | x-iso-8859-11 | Latin/Thai Alphabet |
x-JISAutoDetect | JISAutoDetect | Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only) |
x-Johab | x-Johab | Korean, Johab character set |
x-MacArabic | MacArabic | Macintosh Arabic |
x-MacCentralEurope | MacCentralEurope | Macintosh Latin-2 |
x-MacCroatian | MacCroatian | Macintosh Croatian |
x-MacCyrillic | MacCyrillic | Macintosh Cyrillic |
x-MacDingbat | MacDingbat | Macintosh Dingbat |
x-MacGreek | MacGreek | Macintosh Greek |
x-MacHebrew | MacHebrew | Macintosh Hebrew |
x-MacIceland | MacIceland | Macintosh Iceland |
x-MacRoman | MacRoman | Macintosh Roman |
x-MacRomania | MacRomania | Macintosh Romania |
x-MacSymbol | MacSymbol | Macintosh Symbol |
x-MacThai | MacThai | Macintosh Thai |
x-MacTurkish | MacTurkish | Macintosh Turkish |
x-MacUkraine | MacUkraine | Macintosh Ukraine |
x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions |
x-mswin-936 | MS936 | Windows Simplified Chinese |
x-PCK | PCK | Solaris version of Shift_JIS |
x-windows-874 | MS874 | Windows Thai |
x-windows-949 | MS949 | Windows Korean |
x-windows-950 | MS950 | Windows Traditional Chinese |
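
Because the extended set may be absent from a given installation, code that depends on one of these encodings can check for it at run time. The sketch below assumes a hypothetical helper class `CharsetAvailability` with a `pick` method; it relies only on `Charset.isSupported`, `Charset.forName`, and `Charset.availableCharsets` from java.nio.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class CharsetAvailability {

    // Returns the preferred charset if this runtime supports it, otherwise the fallback.
    // Both arguments may be canonical names or aliases.
    public static Charset pick(String preferred, String fallback) {
        if (Charset.isSupported(preferred)) {
            return Charset.forName(preferred);
        }
        return Charset.forName(fallback);
    }

    public static void main(String[] args) {
        // GBK belongs to the extended set, so it may be missing on a reduced runtime;
        // UTF-8 belongs to the basic set.
        Charset cs = pick("GBK", "UTF-8");
        System.out.println("Using " + cs.name());
        System.out.println(Charset.availableCharsets().size() + " charsets installed");
    }
}
```

Falling back to UTF-8 is only one possible policy; an application that must read GBK data would more likely report the missing encoding as an error.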
The encodings above, sorted by encoding name, are listed below:
-------------------------------------------------------------
Converter Class    Description
-------------------------------------------------------------
8859_1 ISO 8859-1
8859_2 ISO 8859-2
8859_3 ISO 8859-3
8859_4 ISO 8859-4
8859_5 ISO 8859-5
8859_6 ISO 8859-6
8859_7 ISO 8859-7
8859_8 ISO 8859-8
8859_9 ISO 8859-9
Big5 Big5, Traditional Chinese
CNS11643 CNS 11643, Traditional Chinese
Cp037 USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
Cp1006 IBM AIX Pakistan (Urdu)
Cp1025 IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
Cp1026 IBM Latin-5, Turkey
Cp1046 IBM Arabic - Windows
Cp1097 IBM Iran(Farsi)/Persian
Cp1098 IBM Iran(Farsi)/Persian (PC)
Cp1112 IBM Latvia, Lithuania
Cp1122 IBM Estonia
Cp1123 IBM Ukraine
Cp1124 IBM AIX Ukraine
Cp1125 IBM Ukraine (PC)
Cp1250 Windows Eastern European
Cp1251 Windows Cyrillic
Cp1252 Windows Latin-1
Cp1253 Windows Greek
Cp1254 Windows Turkish
Cp1255 Windows Hebrew
Cp1256 Windows Arabic
Cp1257 Windows Baltic
Cp1258 Windows Vietnamese
Cp1381 IBM OS/2, DOS People's Republic of China (PRC)
Cp1383 IBM AIX People's Republic of China (PRC)
Cp273 IBM Austria, Germany
Cp277 IBM Denmark, Norway
Cp278 IBM Finland, Sweden
Cp280 IBM Italy
Cp284 IBM Catalan/Spain, Spanish Latin America
Cp285 IBM United Kingdom, Ireland
Cp297 IBM France
Cp33722 IBM-eucJP - Japanese (superset of 5050)
Cp420 IBM Arabic
Cp424 IBM Hebrew
Cp437 MS-DOS United States, Australia, New Zealand, South Africa
Cp500 EBCDIC 500V1
Cp737 PC Greek
Cp775 PC Baltic
Cp838 IBM Thailand extended SBCS
Cp850 MS-DOS Latin-1
Cp852 MS-DOS Latin-2
Cp855 IBM Cyrillic
Cp857 IBM Turkish
Cp860 MS-DOS Portuguese
Cp861 MS-DOS Icelandic
Cp862 PC Hebrew
Cp863 MS-DOS Canadian French
Cp864 PC Arabic
Cp865 MS-DOS Nordic
Cp866 MS-DOS Russian
Cp868 MS-DOS Pakistan
Cp869 IBM Modern Greek
Cp870 IBM Multilingual Latin-2
Cp871 IBM Iceland
Cp874 IBM Thai
Cp875 IBM Greek
Cp918 IBM Pakistan(Urdu)
Cp921 IBM Latvia, Lithuania (AIX, DOS)
Cp922 IBM Estonia (AIX, DOS)
Cp930 Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
Cp933 Korean Mixed with 1880 UDC, superset of 5029
Cp935 Simplified Chinese Host mixed with 1880 UDC, superset of 5031
Cp937 Traditional Chinese Host mixed with 6204 UDC, superset of 5033
Cp939 Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
Cp942 Japanese (OS/2) superset of 932
Cp948 OS/2 Chinese (Taiwan) superset of 938
Cp949 PC Korean
Cp950 PC Chinese (Hong Kong, Taiwan)
Cp964 AIX Chinese (Taiwan)
Cp970 AIX Korean
EUCJIS JIS, EUC Encoding, Japanese
GB2312 GB2312, EUC encoding, Simplified Chinese
GBK GBK, Simplified Chinese
ISO2022CN ISO 2022 CN, Chinese
ISO2022CN_CNS CNS 11643 in ISO-2022-CN form, T. Chinese
ISO2022CN_GB GB 2312 in ISO-2022-CN form, S. Chinese
ISO2022KR ISO 2022 KR, Korean
JIS JIS, Japanese
JIS0208 JIS 0208, Japanese
KOI8_R KOI8-R, Russian
KSC5601 KS C 5601, Korean
MS874 Windows Thai
MacArabic Macintosh Arabic
MacCentralEurope Macintosh Latin-2
MacCroatian Macintosh Croatian
MacCyrillic Macintosh Cyrillic
MacDingbat Macintosh Dingbat
MacGreek Macintosh Greek
MacHebrew Macintosh Hebrew
MacIceland Macintosh Iceland
MacRoman Macintosh Roman
MacRomania Macintosh Romania
MacSymbol Macintosh Symbol
MacThai Macintosh Thai
MacTurkish Macintosh Turkish
MacUkraine Macintosh Ukraine
SJIS Shift-JIS, Japanese
UTF8 UTF-8
-------------------------------------------------------------
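
The converter names in this list are the ones passed as strings to java.lang and java.io methods such as `String.getBytes` and the `String(byte[], String)` constructor. As a rough sketch, assuming a hypothetical class `EncodeWithLegacyNames`, a round trip through two of the listed names could look like this:

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, for illustration only.
public class EncodeWithLegacyNames {

    // Encodes text using a java.io/java.lang encoding name such as "UTF8" or "8859_1".
    public static byte[] encode(String text, String encodingName) {
        try {
            return text.getBytes(encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Encoding not available: " + encodingName, e);
        }
    }

    // Decodes bytes using the same kind of encoding name.
    public static String decode(byte[] bytes, String encodingName) {
        try {
            return new String(bytes, encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Encoding not available: " + encodingName, e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"; the escape keeps the source file ASCII-only

        byte[] latin1 = encode(text, "8859_1");
        byte[] utf8 = encode(text, "UTF8");

        System.out.println(latin1.length); // 4: one byte per character
        System.out.println(utf8.length);   // 5: U+00E9 takes two bytes in UTF-8
        System.out.println(decode(utf8, "UTF8").equals(text)); // true
    }
}
```

Wrapping `UnsupportedEncodingException` in an unchecked exception is a convenience for the example only; real code may prefer to let the checked exception propagate.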
References:
1. http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
2. http://www.chinaitpower.com/A200507/2005-07-24/165716.html