Author: Highgo PG Lab (瀚高PG实验室)
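Purpose
This document explains how to view a database's character set (encoding) and the client character set in HighGo Database, and describes how to migrate data between different character sets and the risks involved.
Details
I. Background: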
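Because HighGo Database/PostgreSQL uses a multi-database architecture (one cluster holds several databases), the following points need to be stated clearly:
1. HighGo Database/PostgreSQL has no character set at the cluster (i.e. instance) level.
2. The character set of HighGo Database/PostgreSQL always refers to the character set of one particular database within a cluster.
3. In this article, "character set" means the encoding (Encoding).
The character sets supported by HighGo Database/PostgreSQL are listed below: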
Name | Description | Language | Server Encoding? | ICU? | Bytes/Char | Aliases |
---|---|---|---|---|---|---|
BIG5 | Big Five | Traditional Chinese | No | No | 1-2 | WIN950, Windows950 |
EUC_CN | Extended UNIX Code-CN | Simplified Chinese | Yes | Yes | 1-3 | |
EUC_JP | Extended UNIX Code-JP | Japanese | Yes | Yes | 1-3 | |
EUC_JIS_2004 | Extended UNIX Code-JP, JIS X 0213 | Japanese | Yes | No | 1-3 | |
EUC_KR | Extended UNIX Code-KR | Korean | Yes | Yes | 1-3 | |
EUC_TW | Extended UNIX Code-TW | Traditional Chinese, Taiwanese | Yes | Yes | 1-3 | |
GB18030 | National Standard | Chinese | No | No | 1-4 | |
GBK | Extended National Standard | Simplified Chinese | No | No | 1-2 | WIN936, Windows936 |
ISO_8859_5 | ISO 8859-5, ECMA 113 | Latin/Cyrillic | Yes | Yes | 1 | |
ISO_8859_6 | ISO 8859-6, ECMA 114 | Latin/Arabic | Yes | Yes | 1 | |
ISO_8859_7 | ISO 8859-7, ECMA 118 | Latin/Greek | Yes | Yes | 1 | |
ISO_8859_8 | ISO 8859-8, ECMA 121 | Latin/Hebrew | Yes | Yes | 1 | |
JOHAB | JOHAB | Korean (Hangul) | No | No | 1-3 | |
KOI8R | KOI8-R | Cyrillic (Russian) | Yes | Yes | 1 | KOI8 |
KOI8U | KOI8-U | Cyrillic (Ukrainian) | Yes | Yes | 1 | |
LATIN1 | ISO 8859-1, ECMA 94 | Western European | Yes | Yes | 1 | ISO88591 |
LATIN2 | ISO 8859-2, ECMA 94 | Central European | Yes | Yes | 1 | ISO88592 |
LATIN3 | ISO 8859-3, ECMA 94 | South European | Yes | Yes | 1 | ISO88593 |
LATIN4 | ISO 8859-4, ECMA 94 | North European | Yes | Yes | 1 | ISO88594 |
LATIN5 | ISO 8859-9, ECMA 128 | Turkish | Yes | Yes | 1 | ISO88599 |
LATIN6 | ISO 8859-10, ECMA 144 | Nordic | Yes | Yes | 1 | ISO885910 |
LATIN7 | ISO 8859-13 | Baltic | Yes | Yes | 1 | ISO885913 |
LATIN8 | ISO 8859-14 | Celtic | Yes | Yes | 1 | ISO885914 |
LATIN9 | ISO 8859-15 | LATIN1 with Euro and accents | Yes | Yes | 1 | ISO885915 |
LATIN10 | ISO 8859-16, ASRO SR 14111 | Romanian | Yes | No | 1 | ISO885916 |
MULE_INTERNAL | Mule internal code | Multilingual Emacs | Yes | No | 1-4 | |
SJIS | Shift JIS | Japanese | No | No | 1-2 | Mskanji, ShiftJIS, WIN932, Windows932 |
SHIFT_JIS_2004 | Shift JIS, JIS X 0213 | Japanese | No | No | 1-2 | |
SQL_ASCII | unspecified (see text) | any | Yes | No | 1 | |
UHC | Unified Hangul Code | Korean | No | No | 1-2 | WIN949, Windows949 |
UTF8 | Unicode, 8-bit | all | Yes | Yes | 1-4 | Unicode |
WIN866 | Windows CP866 | Cyrillic | Yes | Yes | 1 | ALT |
WIN874 | Windows CP874 | Thai | Yes | No | 1 | |
WIN1250 | Windows CP1250 | Central European | Yes | Yes | 1 | |
WIN1251 | Windows CP1251 | Cyrillic | Yes | Yes | 1 | WIN |
WIN1252 | Windows CP1252 | Western European | Yes | Yes | 1 | |
WIN1253 | Windows CP1253 | Greek | Yes | Yes | 1 | |
WIN1254 | Windows CP1254 | Turkish | Yes | Yes | 1 | |
WIN1255 | Windows CP1255 | Hebrew | Yes | Yes | 1 | |
WIN1256 | Windows CP1256 | Arabic | Yes | Yes | 1 | |
WIN1257 | Windows CP1257 | Baltic | Yes | Yes | 1 | |
WIN1258 | Windows CP1258 | Vietnamese | Yes | Yes | 1 | ABC, TCVN, TCVN5712, VSCII |
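The Bytes/Char column means that in multibyte encodings a single character can occupy a different number of bytes. As a rough illustration outside the database, the Python sketch below encodes a few sample characters with Python's built-in codecs. `PG_TO_PYTHON` is a hypothetical mapping we wrote for this example (for instance, it assumes Python's `gb2312` codec is a fair stand-in for EUC_CN); the database's own conversion tables remain authoritative.

```python
"""Show how many bytes one character takes in several encodings.

PG_TO_PYTHON is a hypothetical, hand-written mapping from PostgreSQL
encoding names to Python codec names; it is for illustration only.
"""

PG_TO_PYTHON = {
    "UTF8": "utf_8",
    "EUC_CN": "gb2312",   # assumption: Python's gb2312 codec behaves like EUC-CN
    "GBK": "gbk",
    "GB18030": "gb18030",
    "EUC_KR": "euc_kr",
    "LATIN1": "latin_1",
    "KOI8R": "koi8_r",
}


def bytes_per_char(ch: str, pg_encoding: str) -> int:
    """Return the byte length of one character in the given encoding.

    Raises UnicodeEncodeError if the encoding cannot represent the character.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return len(ch.encode(PG_TO_PYTHON[pg_encoding]))


if __name__ == "__main__":
    samples = [("A", "UTF8"), ("中", "UTF8"), ("中", "GBK"),
               ("한", "EUC_KR"), ("é", "LATIN1"), ("Ж", "KOI8R")]
    for ch, enc in samples:
        print(f"{ch!r} in {enc}: {bytes_per_char(ch, enc)} byte(s)")
```

The same Chinese character takes 3 bytes in UTF8 but 2 in GBK, while single-byte encodings such as LATIN1 simply cannot represent it at all.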
The database encoding is specified when the database is created, for example with either of the following two commands:
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
or
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
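Note:
The first command above should not be shortened to a bare createdb -E that omits --lc-collate and --lc-ctype.
The reason is an important restriction in HighGo Database: a database's encoding must match its LC_COLLATE and LC_CTYPE settings (only the C and POSIX locales work with any encoding).
If LC_COLLATE and LC_CTYPE are not specified, createdb -E does not derive them from the encoding; the new database inherits them from the template database (template0 here, whose locale was fixed at initdb time, typically from the environment's locale settings). If those values do not match the requested encoding, the database cannot be created.
The same applies to the CREATE DATABASE statement.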
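To make the matching rule concrete, here is a rough Python pre-check that compares an encoding with the codeset suffix of a locale name such as `ko_KR.euckr`. This is only an approximation: the server asks the operating system for the locale's codeset, whereas this sketch guesses it from the name. `CODESET_TO_ENCODING`, `locale_encoding` and `locale_matches` are hypothetical names invented for this example.

```python
"""Rough pre-check that a database encoding fits its LC_COLLATE/LC_CTYPE.

Approximates the server's rule by looking only at the codeset suffix of
the locale name. CODESET_TO_ENCODING is a hypothetical, incomplete alias
table written for this sketch; it is not PostgreSQL's own table.
"""
from typing import Optional

CODESET_TO_ENCODING = {
    "utf8": "UTF8",
    "euckr": "EUC_KR",
    "euccn": "EUC_CN",
    "eucjp": "EUC_JP",
    "iso88591": "LATIN1",
    "koi8r": "KOI8R",
}


def locale_encoding(locale_name: str) -> Optional[str]:
    """Guess the encoding implied by a locale name, or None if unknown."""
    name = locale_name.split("@", 1)[0]        # drop modifiers such as "@euro"
    if "." not in name:
        return None
    codeset = name.split(".", 1)[1].lower().replace("-", "").replace("_", "")
    return CODESET_TO_ENCODING.get(codeset)


def locale_matches(encoding: str, lc_collate: str, lc_ctype: str) -> bool:
    """Return True if both locales look compatible with the encoding.

    Locales whose codeset cannot be recognized are treated as a mismatch.
    """
    for loc in (lc_collate, lc_ctype):
        if loc in ("C", "POSIX"):
            continue                            # C/POSIX accept any encoding
        if locale_encoding(loc) != encoding.upper():
            return False
    return True


if __name__ == "__main__":
    # The korean example from the text: explicit EUC_KR locales, so it matches.
    print(locale_matches("EUC_KR", "ko_KR.euckr", "ko_KR.euckr"))
    # Mismatch: UTF-8 locales combined with an EUC_KR encoding.
    print(locale_matches("EUC_KR", "en_US.UTF-8", "en_US.UTF-8"))
```

The second call reports a mismatch, which corresponds to the situation where the server refuses to create the database.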
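How to view the encoding of a database:
1. Run psql -l, query the pg_database system catalog, or run \l inside psql.
2. Run show server_encoding; inside psql. Because psql always connects to a specific database, a successful login means you are connected to that database, and the value shown is that database's encoding.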
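Method 1 can also be scripted. The sketch below assumes `psql` is on the PATH and can connect using the usual libpq environment (PGHOST, PGUSER, and so on); it runs a query against pg_database, using `pg_encoding_to_char()` to turn the numeric encoding ID into a name, and parses the unaligned output. The function names are our own, chosen for illustration.

```python
"""List each database's encoding and locale by running psql non-interactively.

Assumes psql is installed and can connect with the default libpq settings.
"""
import subprocess
from typing import Dict, List

QUERY = ("SELECT datname, pg_encoding_to_char(encoding), datcollate, datctype "
         "FROM pg_database ORDER BY datname")


def build_psql_command(dbname: str = "postgres") -> List[str]:
    """-X skips psqlrc, -A unaligned, -t tuples only, -F sets the separator."""
    return ["psql", "-X", "-A", "-t", "-F", "|", "-d", dbname, "-c", QUERY]


def parse_rows(output: str) -> Dict[str, Dict[str, str]]:
    """Parse 'name|encoding|collate|ctype' lines into a dict keyed by name."""
    result: Dict[str, Dict[str, str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, encoding, collate, ctype = line.split("|")
        result[name] = {"encoding": encoding, "collate": collate, "ctype": ctype}
    return result


def database_encodings(dbname: str = "postgres") -> Dict[str, Dict[str, str]]:
    """Run the query through psql and return the parsed rows."""
    completed = subprocess.run(build_psql_command(dbname), capture_output=True,
                               text=True, check=True)
    return parse_rows(completed.stdout)
```

Calling `database_encodings()` returns one entry per database, for example `{"korean": {"encoding": "EUC_KR", ...}}`, which also lets you verify that each database's encoding and locale agree.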
How to view the client encoding:
Run show client_encoding; inside psql.
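How to set the client encoding:
1. In psql, run \encoding SJIS
2. Set it through libpq (for example with PQsetClientEncoding).
3. In psql, run set client_encoding to 'LATIN1';
4. In psql, run set names 'LATIN3';
5. Set the PGCLIENTENCODING environment variable on the client. Its value is then selected automatically when the connection to the server is established, and can afterwards be overridden with any of the methods above.
6. Use the client_encoding configuration parameter. If client_encoding is set, its value is selected automatically once the client has connected to the server, and can afterwards be overridden with any of the methods above.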
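The order in which these settings take effect can be summarized with a small model. This is a simplified, hypothetical sketch (the function `effective_client_encoding` is invented for illustration): it assumes the server-side client_encoding setting defaults to the database encoding, that PGCLIENTENCODING is sent at connect time and overrides it, and that any later SET / SET NAMES / `\encoding` wins. It deliberately ignores psql's own behavior of picking an encoding from the terminal locale ("auto") when PGCLIENTENCODING is unset.

```python
"""Simplified model of which client encoding a session ends up with.

Hypothetical precedence, lowest to highest: database encoding,
server-side client_encoding parameter, PGCLIENTENCODING sent at connect
time, then SET-style commands issued during the session (last one wins).
"""
import os
from typing import Mapping, Optional, Sequence


def effective_client_encoding(server_encoding: str,
                              configured_client_encoding: Optional[str] = None,
                              env: Mapping[str, str] = os.environ,
                              session_sets: Sequence[str] = ()) -> str:
    encoding = server_encoding                      # default: database encoding
    if configured_client_encoding:                  # client_encoding parameter
        encoding = configured_client_encoding
    if env.get("PGCLIENTENCODING"):                 # sent when connecting
        encoding = env["PGCLIENTENCODING"]
    for value in session_sets:                      # SET, SET NAMES, \encoding
        encoding = value                            # the last one wins
    return encoding.upper()
```

For example, `effective_client_encoding("UTF8", "GBK", env={"PGCLIENTENCODING": "SJIS"}, session_sets=["LATIN3"])` models a session that connects with SJIS and then runs `set names 'LATIN3';`, ending in LATIN3.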
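II. Methods for migrating data between different character sets
Conversion during migration is only possible for server/client encoding pairs that HighGo Database supports converting between. These pairs are listed in the table below: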
Server Encoding | Available Client Encodings |
---|---|
BIG5 | not supported as a server encoding |
EUC_CN | EUC_CN, MULE_INTERNAL, UTF8 |
EUC_JP | EUC_JP, MULE_INTERNAL, SJIS, UTF8 |
EUC_JIS_2004 | EUC_JIS_2004, SHIFT_JIS_2004, UTF8 |
EUC_KR | EUC_KR, MULE_INTERNAL, UTF8 |
EUC_TW | EUC_TW, BIG5, MULE_INTERNAL, UTF8 |
GB18030 | not supported as a server encoding |
GBK | not supported as a server encoding |
ISO_8859_5 | ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
ISO_8859_6 | ISO_8859_6, UTF8 |
ISO_8859_7 | ISO_8859_7, UTF8 |
ISO_8859_8 | ISO_8859_8, UTF8 |
JOHAB | not supported as a server encoding |
KOI8R | KOI8R, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251 |
KOI8U | KOI8U, UTF8 |
LATIN1 | LATIN1, MULE_INTERNAL, UTF8 |
LATIN2 | LATIN2, MULE_INTERNAL, UTF8, WIN1250 |
LATIN3 | LATIN3, MULE_INTERNAL, UTF8 |
LATIN4 | LATIN4, MULE_INTERNAL, UTF8 |
LATIN5 | LATIN5, UTF8 |
LATIN6 | LATIN6, UTF8 |
LATIN7 | LATIN7, UTF8 |
LATIN8 | LATIN8, UTF8 |
LATIN9 | LATIN9, UTF8 |
LATIN10 | LATIN10, UTF8 |
MULE_INTERNAL | MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8R, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 |
SJIS | not supported as a server encoding |
SHIFT_JIS_2004 | not supported as a server encoding |
SQL_ASCII | any (no conversion will be performed) |
UHC | not supported as a server encoding |
UTF8 | all supported encodings |
WIN866 | WIN866, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN1251 |
WIN874 | WIN874, UTF8 |
WIN1250 | WIN1250, LATIN2, MULE_INTERNAL, UTF8 |
WIN1251 | WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866 |
WIN1252 | WIN1252, UTF8 |
WIN1253 | WIN1253, UTF8 |
WIN1254 | WIN1254, UTF8 |
WIN1255 | WIN1255, UTF8 |
WIN1256 | WIN1256, UTF8 |
WIN1257 | WIN1257, UTF8 |
WIN1258 | WIN1258, UTF8 |
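The main migration risk is data that has no equivalent in the target encoding: the server reports an error for such a character rather than silently substituting it. The Python sketch below looks up a few rows of the table and scans text for characters the target encoding cannot represent. `SUPPORTED_CLIENTS` copies only some rows for illustration, and `PG_TO_PYTHON` is a hypothetical mapping to Python codec names; both are assumptions of this example, not part of HighGo Database.

```python
"""Check a server/client encoding pair against a subset of the table, and
list characters that would be lost when converting text to a target encoding.

SUPPORTED_CLIENTS copies only a few rows of the table for illustration;
PG_TO_PYTHON is a hypothetical mapping to Python codec names.
"""
from typing import List

NOT_SERVER = {"BIG5", "GB18030", "GBK", "JOHAB", "SJIS", "SHIFT_JIS_2004", "UHC"}

SUPPORTED_CLIENTS = {
    "EUC_CN": {"EUC_CN", "MULE_INTERNAL", "UTF8"},
    "EUC_KR": {"EUC_KR", "MULE_INTERNAL", "UTF8"},
    "EUC_JP": {"EUC_JP", "MULE_INTERNAL", "SJIS", "UTF8"},
    "LATIN1": {"LATIN1", "MULE_INTERNAL", "UTF8"},
    "KOI8R": {"KOI8R", "ISO_8859_5", "MULE_INTERNAL", "UTF8", "WIN866", "WIN1251"},
}

PG_TO_PYTHON = {"UTF8": "utf_8", "EUC_CN": "gb2312", "EUC_KR": "euc_kr",
                "EUC_JP": "euc_jp", "LATIN1": "latin_1", "KOI8R": "koi8_r"}


def can_convert(server: str, client: str) -> bool:
    """Return True if the table allows this client encoding for the server."""
    server, client = server.upper(), client.upper()
    if server in NOT_SERVER:
        raise ValueError(f"{server} is not supported as a server encoding")
    if server in ("UTF8", "SQL_ASCII"):
        return True      # UTF8: all encodings; SQL_ASCII: no conversion at all
    if server not in SUPPORTED_CLIENTS:
        raise LookupError(f"row for {server} is not included in this sketch")
    return client in SUPPORTED_CLIENTS[server]


def unconvertible_chars(text: str, target: str) -> List[str]:
    """Return the distinct characters of text that target cannot represent."""
    codec = PG_TO_PYTHON[target.upper()]
    bad: List[str] = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad
```

For instance, before moving text from a UTF8 database into a LATIN1 one, `unconvertible_chars("café 中文", "LATIN1")` returns `["中", "文"]`, flagging the characters that would make the load fail.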
For more details, log in to the HighGo Technical Support Platform (瀚高技术支持平台): https://support.highgo.com/#/index/docContentHighgo/72aef7e803f95152