MySQL 字符集编码及乱码问题

最常见的问题就是数据库插入查看或导入导出出现了乱码,这应该是所有用过mysql的人都遇到过了吧。所以解决乱码问题,就应该理解其原因:就是客户端连接设置的编码与表编码不一致!


数据库字符集有多种,linux 系统默认是 utf-8 编码,而 windows 默认为 gbk 编码。一般我们常用的系统都是中英文命名和存储数据d系统,所以服务器和mysql数据库设置编码为 utf8。mysql 还有字符集 utf8mb4,兼容 utf8 ,可保持 emoji 表情,但比 utf8 多占用1个字节。


查看linux系统编码

shell> cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

shell> echo $LANG
en_US.UTF-8

shell> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

查看数据库可设置的字符集及校对规则

mysql> SHOW CHARACTER SET;
mysql> SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS ORDER BY CHARACTER_SET_NAME;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| binary             | binary               | Binary pseudo charset           |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| utf8mb4            | utf8mb4_general_ci   | UTF-8 Unicode                   |      4 |
+--------------------+----------------------+---------------------------------+--------+

查看 mysql 数据库系统默认字符集

mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name            | Value                                                       |
+--------------------------+-------------------------------------------------------------+
| character_set_client     | utf8                                                        |
| character_set_connection | utf8                                                        |
| character_set_database   | latin1                                                      |
| character_set_filesystem | binary                                                      |
| character_set_results    | utf8                                                        |
| character_set_server     | latin1                                                      |
| character_set_system     | utf8                                                        |
| character_sets_dir       | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+

说明:

character_set_client :客户端使用的字符集
character_set_results :服务器查询结果到客户端字符集
character_set_connection :客户端到服务器默认转换的编码
character_set_database :数据库创建时默认的字符集(MySQL 5.7.6 后不推荐使用)
character_set_server :数据库创建时默认的字符集
character_set_system :系统默认元数据字符集,总是为 “utf8”
character_set_filesystem :导入导出时文件名字符集,默认binary则保持与client设置一样。

更改系统字符集,改为 “gb2312” 

shell> vi /etc/sysconfig/i18n
LANG="zh_CN.GB2312"
#LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

设置生效
shell> . /etc/sysconfig/i18n

改了系统编码后,再访问查看mysql字符集.字符集没有什么变化.

mysql> SHOW GLOBAL VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name            | Value                                                       |
+--------------------------+-------------------------------------------------------------+
| character_set_client     | latin1                                                      |
| character_set_connection | latin1                                                      |
| character_set_database   | latin1                                                      |
| character_set_filesystem | binary                                                      |
| character_set_results    | latin1                                                      |
| character_set_server     | latin1                                                      |
| character_set_system     | utf8                                                        |
| character_sets_dir       | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+

在linux服务器端的终端访问mysql,再查看当前会话的字符集,发现客户端这几个字符集编码变为 “gb2312”!

mysql> SHOW VARIABLES LIKE '%character_set%';
+--------------------------+-------------------------------------------------------------+
| Variable_name            | Value                                                       |
+--------------------------+-------------------------------------------------------------+
| character_set_client     | gb2312                                                      |
| character_set_connection | gb2312                                                      |
| character_set_database   | latin1                                                      |
| character_set_filesystem | binary                                                      |
| character_set_results    | gb2312                                                      |
| character_set_server     | latin1                                                      |
| character_set_system     | utf8                                                        |
| character_sets_dir       | /usr/local/mysql-5.7.16-linux-glibc2.5-i686/share/charsets/ |
+--------------------------+-------------------------------------------------------------+

数据库默认字符集为 "character_set_database = latin1",所以当我们创建数据库没有指定字符集时,则使用默认设置。

mysql> CREATE DATABASE test;
mysql> SHOW CREATE DATABASE test;
+----------+-----------------------------------------------------------------+
| Database | Create Database                                                 |
+----------+-----------------------------------------------------------------+
| test     | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+

此外,我们创建表的默认字符集为当前数据库的字符集。如下,新创建的表字符集也为 “latin1”

mysql> USE test;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> SHOW CREATE  TABLE tab\G;
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

现在创建数据库,并且指定默认字符集为“utf8”,再创建一张未指定字符集的表和一张指定其他字符集的表。发现,未指定字符集的表默认字符集与数据库一致的。

mysql> CREATE DATABASE test2 CHARACTER SET utf8;
mysql> USE test2;
mysql> CREATE TABLE tab(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD));
mysql> CREATE TABLE tab1(id INT NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,PRIMARY KEY(iD)) DEFAULT CHARSET=utf8mb4;

mysql> SHOW CREATE TABLE tab\G;
*************************** 1. row ***************************
       Table: tab
Create Table: CREATE TABLE `tab` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

mysql> SHOW CREATE TABLE tab1\G;
*************************** 1. row ***************************
       Table: tab1
Create Table: CREATE TABLE `tab1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(30) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

查看所有数据库字符集及排序规则

mysql> SELECT SCHEMA_NAME,DEFAULT_CHARACTER_SET_NAME,DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
+--------------------+----------------------------+------------------------+
| SCHEMA_NAME        | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+--------------------+----------------------------+------------------------+
| information_schema | utf8                       | utf8_general_ci        |
| mysql              | latin1                     | latin1_swedish_ci      |
| performance_schema | utf8                       | utf8_general_ci        |
| sys                | utf8                       | utf8_general_ci        |
| test               | latin1                     | latin1_swedish_ci      |
| test2              | utf8                       | utf8_general_ci        |
+--------------------+----------------------------+------------------------+

查看所有表的排序规则(字符集)

mysql> SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA IN('test','test2');
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_COLLATION    |
+--------------+------------+--------------------+
| test         | tab        | latin1_swedish_ci  |
| test2        | tab        | utf8_general_ci    |
| test2        | tab1       | utf8mb4_general_ci |
+--------------+------------+--------------------+

[navicat 测试]
现在开始插入表数据,我们插入数据到数据库表 test.tab,该数据库及表字符集都是 “latin1”。
当前我用 navicat 连接,该客户端默认设置的相关连接编码为:
character_set_client=utf8
character_set_connection=utf8
character_set_results=utf8

--插入英文字符,正常
mysql> use test;
mysql> insert into tab values(1,'123'),(2,'abc');

--插入中外字符,失败
mysql> insert into tab values(3,'文档');
1366 - Incorrect string value: '\xE6\x96\x87\xE6\xA1\xA3' for column 'name' at row 1

--怎么插入中文呢??现在将客户端会话编码设置为latin1,保持与数据库一致,再执行插入中文。
mysql> SET names latin1;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> insert into tab values(3,'文档');
mysql> select * from tab;
+----+------+
| id | name |
+----+------+
|  1 | 123  |
|  2 | abc  |
|  3 | 文档 |
+----+------+

这会儿就没有报错了!查询也正常!现在将连接会话改为 utf8 ,再次查询。
mysql> SET names utf8;
mysql> SHOW VARIABLES LIKE '%character_set%';
mysql> select * from tab;
+----+--------+
| id | name   |
+----+--------+
|  1 | 123    |
|  2 | abc    |
|  3 | ?–???£ |
+----+--------+

因为表字符编码(latin1)与当前客户端会话的字符编码(utf8)不一样,所以查询出来了乱码!!
为了防止乱码出现,插入数据时客户端要与表字符集一样,查询数据时,客户端也要与表编码一样。
以下更详细一步进行测试,再次确认,系统当前默认配置的字符编码如下:

+--------------------------+-------------------------------------------------------------+
| Variable_name            | Value                                                       |
+--------------------------+-------------------------------------------------------------+
| character_set_client     | latin1                                                      |
| character_set_connection | latin1                                                      |
| character_set_database   | latin1                                                      |
| character_set_filesystem | binary                                                      |
| character_set_results    | latin1                                                      |
| character_set_server     | latin1                                                      |
| character_set_system     | utf8                                                        |
+--------------------------+-------------------------------------------------------------+


--	现将客户端字符集 utf8 改为 latin1 ,插入时中文不乱码(下一步有测试),查询时乱码!
mysql> set character_set_client = latin1; 
mysql> insert into tab values(4,'测试');
mysql> select * from tab;
+----+--------+
| id | name   |
+----+--------+
|  1 | 123    |
|  2 | abc    |
|  3 | ?–???£ |
|  4 | ?μ?èˉ? |
+----+--------+

--	单独将服务器查询结果字符集 utf8 改为 latin1,则查询结果正常;不改则乱码。
mysql> set character_set_results = latin1; 
mysql> select * from tab;
+----+--------+
| id | name   |
+----+--------+
|  1 | 123    |
|  2 | abc    |
|  3 | 文档   |
|  4 | 测试   |
+----+--------+

客户端插入和查询都正常了,那么 character_set_connection 是干嘛用的呢?干嘛用的呢?

character_set_connection 只作为中间转换,如果字符串未设置字符集且设置了排序规则,默认字符集为character_set_connection设置的字符集,且会判断字符串显式设置的排序规则是否准确,不正确则出错。

+--------------------------------------------------------------------------------------+
|                                                                                      |
|character_set_client <------> character_set_connection <------> character_set_results |
|                                                                                      |
+--------------------------------------------------------------------------------------+


--设置 connection 字符集为 utf8,
mysql> set character_set_connection = utf8; 

--只设置排序规则,但非当前 connection 对应的排序规则。错误!
mysql> select 'Müller' COLLATE latin1_german1_ci;
1253 - COLLATION 'latin1_german1_ci' is not valid for CHARACTER SET 'utf8'

--只设置排序规则,为当前connection对应的排序规则。正确!
mysql> select 'Müller' COLLATE utf8_general_ci;

--如果字符前面强制字符集 _latin1或_utf8 ,则不会进行connection转换。

【导出导入数据乱码问题】

数据库表字符集为 latin1 ,客户端字符集默认与linux操作系统设置一致为 utf8.导出数据时乱码!!

shell> mysqldump -uroot -p --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'?–???£'),(4,'?μ?èˉ?');
UNLOCK TABLES;
 
现在设置导出时字符集为 latin1 ,与表一样。结果正常!!

shell> mysqldump -uroot -p --default-character-set=latin1 --opt -t test > test.sql
shell> egrep -v "#|\*|--|^$" test.sql
LOCK TABLES `tab` WRITE;
INSERT INTO `tab` VALUES (1,'123'),(2,'abc'),(3,'文档'),(4,'测试');
UNLOCK TABLES;
导入同理,若导入导出出现了乱码,那只要设置相同的字符集即可。

【更改字符集】

客户端的都设置,为会话级别:
mysql> SET names latin1;

客户端单个设置,为会话级别:
mysql> set character_set_connection = utf8; 
mysql> set @@session.character_set_connection = utf8; 

客户端单个设置,为全局设置:
mysql> set global character_set_client = gb2312; 
mysql> set @@global.character_set_client = latin1; 
 

永久设置,在配置文件中更改:
shell> vi /etc/my.cnf

[client]
default_character_set = utf8

[mysql] #客户端设置
default_character_set = utf8

[mysqld] #服务器设置
init-connect='SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci
#default-character-set = utf8 
skip-character-set-client-handshake#忽略客户端字符集,使用服务器设置


其他参考:MySQL字符集编码解析

  • 2
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值