Postgresql的字符集问题

从SQLServer把数据导出为CSV文件后,为了适应新表的结构,手动添加了两列,再导入到Postgresql中,然后使用select查看数据时会出现

ERROR:  character with byte sequence 0xef 0xbb 0xbf in encoding "UTF8" has no equivalent in encoding "GBK"  错误

然后认真看了看关于字符集的知识

一,中文网站

http://blog.csdn.net/BeiiGang/article/details/39582583

Query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xe99d2c


原因是客户端字符集和插入内容的字符集不匹配。PostgreSQL默认不做字符集转换,如果数据库是UTF8的字符集,一般终端的中文字符集会设置为GBK(可以看LANG环境变量确认),所以这个编码不经转换的存入数据库中,而数据库是UTF8的,PostgreSQL发现不是UTF8编码,就报上面的错。
要想打开自动字符集转换功能,必须告诉 pg 客户端使用的字符集。这时可以设置pg客户端编码为GBK,pg就会自动做字符集转换。

下面是实验:



1
[root@hostalonetest ~]# psql -h 192.168.18.210 -Upostgres beiigang
psql.bin (9.3.5, server 9.1.14)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.


beiigang=# 


2
beiigang=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 beiigang   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 


3
beiigang=# show server_encoding;
 server_encoding 
-----------------
 UTF8
(1 row)


4
beiigang=# show client_encoding;
 client_encoding 
-----------------
 UTF8
(1 row)


5
beiigang=# create table tb_tt (id int, ctnr varchar(60));
CREATE TABLE


6
beiigang=# insert into tb_tt(id,ctnr) values(1,'新華網');
ERROR:  invalid byte sequence for encoding "UTF8": 0xd0c2


7
beiigang=# \encoding GBK


8
beiigang=# show client_encoding;
 client_encoding 
-----------------
 GBK
(1 row)


9
beiigang=# insert into tb_tt(id,ctnr) values(1,'新華網');
INSERT 0 1


10
beiigang=# select * from tb_tt;
 id |  ctnr  
----+--------
  1 | 新華網
(1 row)


11
beiigang=# show client_encoding;
 client_encoding 
-----------------
 GBK
(1 row)


beiigang=# 
beiigang=# reset client_encoding;
RESET
beiigang=# show client_encoding;
 client_encoding 
-----------------
 UTF8
(1 row)


写代码时可以根据情况在建立数据库链接时指定客户端字符集编码

参考:

http://www.postgresql.org/docs/9.3/interactive/multibyte.html




二,英文网站

I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?


回答


67 down vote accepted

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from Windows systems (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".

三,书pg45

当客户端的字符编码与服务器不一样时可能会显示乱码,可以使用\encoding 命令指定客户端的字符编码,如使用\encoding gbk; 命令设置客户端的字符编码为gbk,使用\encoding utf8; 命令设置客户端的字符编码为utf8。


四,手册

Notes for Windows Users
psql is built as a “console application”. Since the Windows console windows use a different encoding
than the rest of the system, you must take special care when using 8-bit characters within psql. If psql
detects a problematic console code page, it will warn you at startup. To change the console code page,
two things are necessary:
• Set the code page by entering cmd.exe /c chcp 1252. (1252 is a code page that is appropriate
for German; replace it with your value.) If you are using Cygwin, you can put this command in
/etc/profile.
• Set the console font to Lucida Console, because the raster font does not work with the ANSI
code page.

67 down vote accepted

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from Windows systems (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值