mysql-5.6.37的 character_set 和 collation 设置
一、概念
官方解释:
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.
意思是,character set是符号和编码的集合,即一组符号及该符号的编码,collation 是一组符号和编码之间的比较规则,是规则集合。
character set 通常称之为 “字符集” 。
collation 通常称之为 “校对规则集”。
举例解释:
假设我们有一个字母表,有四字母: A, B, a, b。我们给每个字母一个数字: A = 0, B = 1, a = 2,b = 3。字母 A 是一个符号(symbol), 数字0是 A 的编码(encoding), 而所有四字母及其编码的组合是字符集(character set)。
假设我们要比较两个字符串 A 和 B的值。最简单的方法是查看编码: 0 为 A, 1 为 B。因为0是小于 1, 我们说 A 是小于 B。我们刚才所做的(对字符A、B的比较)就是对我们的字符集进行比较。collation是一组规则 (在本例中只有一个规则): "比较字符编码的大小“。我们称这最简单的所有可能的collation为二进制collation。
但是, 如果我们想说, 小写字母和大写字母是等价的呢?那么我们至少有两个规则: (1) 将小写字母 a 和 b 等同于 大写字母A 和 B;(2) 然后比较编码。我们称之为 case-insensitive collation。它比二进制collation稍微复杂一些。
在现实生活中, 大多数字符集有许多字符: 不只是 a 和 b, 而是整个字母表, 有时是多个字母或东方文字系统, 有数以千计的字符, 还有许多特殊的符号和标点符号。在现实生活中, 大多数collation都有许多规则, 不仅是为了区分 letter case, 而且还为是否区分重音 ("重音" 是附加到字符的标记, 如在德语ö中), 以及多字符映射 (如ö = oe 在两个德国归类中的一个的规则)。
mysql 可以为您做如下这些事情:
- 可以使用多种字符集存储字符串。
- 可以使用多种校对规则比较字符串。
- 在同一台服务器、同一数据库或甚至同一个表中混合使用不同的字符集或比较规则。
- 允许定义任何级别的字符集和校对规则。
要有效地使用这些功能, 您必须知道哪些字符集和校对规则可用, 如何更改默认值, 以及它们如何影响字符串运算符和函数的行为。
mysql 服务器支持多个字符集
可用如下命令查看mysql支持的字符集及其默认的collation
也可以用如下命令查看特定的字符集:
mysql>SHOW CHARACTER SET LIKE 'latin%';
一个字符集至少有一个校对规则(collation),也有可能有多个校对规则(collation)
可用如下命令查看一个字符集的校对规则
mysql>SHOW COLLATION
如果查看指定的字符集(如latin1)的校对规则,使用如下命令:
mysql>SHOW COLLATION WHERE Charset = 'latin1';
校对规则一般有这些特征:
- 两个不同的字符集不能有相同的校对规则。
- 每个字符集有一个默认校对规则。例如, latin1默认校对规则是latin1_swedish_ci。
- 存在校对规则命名约定:它们以其相关的字符集名开始,通常包括一个语言名,并且以_ci(大小写不敏感)、 _cs(大小写敏感)或_bin(二进制)结束。
创建一个数据库,查看数据库的character set 和collation,如下图:
character set每项参数解释如下:
Variable_name | Value | Desc |
---|---|---|
character_set_client | utf8 | 客户端发送的sql语句使用的字符集 |
character_set_connection | utf8 | 客户端和服务端建立的连接使用的字符集 |
character_set_database | latin1 | mysql中创建的数据库的字符集,不同的数据库编码可以不一致 |
character_set_filesystem | binary | |
character_set_results | utf8 | mysql server返回查询结果到客户端使用的字符集 |
character_set_server | latin1 | mysql server(服务器)字符集 |
character_set_system | utf8 | 数据库元数据编码,如数据库名,表名,字段名等,跟存储的数据无关。 |
character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ | 字符集的文件路径 |
collation每项参数解释如下:
Variable_name | Value | Desc |
---|---|---|
collation_connection | utf8_general_ci | |
collation_database | latin1_swedish_ci | |
collation_server | latin1_swedish_ci | 服务器校对规则 |
字符集和校对规则有4个级别的默认设置:服务器级(server)、数据库级(database)、表级(table)和列级(column)。
服务器级字符集(character_set_server)和服务器校对规则(collation_server)
设置方式1:
启动mysql 服务时,指定character
$ mysqld_safe --character_set_server=utf8
设置方式2:
在/etc/my.cnf文件中配置参数
在[mysqld]组下增加如下参数
character_set_server=utf8
collation_server参数默认使用character_set_server=utf8的对应的默认值:utf8_general_ci
设置方式3:
在运行中的mysql中设置
mysql> set character_set_server=utf8;
Query OK, 0 rows affected (0.27 sec)
mysql>
这种设置方式为临时设置,即mysql服务重启后失效。设置方式1和设置方式2为持久化设置。如果需要设置正在运行中的mysql的character,需要 (设置方式1或设置方式2)+设置方式3
验证:
设置前:
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.10 sec)
mysql> show variables like '%colla%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.04 sec)
mysql>
设置后:
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.16 sec)
mysql> show variables like '%colla%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.46 sec)
mysql>
数据库字符集和数据库校对规则
数据库字符集:character_set_database,数据库校对规则:collation_database
每个数据库都有一个数据库字符集和数据库校对规则,create database 和 atler database语句可以指定字符集和校对规则(可选参数)
mysql>
mysql> create database test1 character set gbk collate gbk_chinese_ci;
Query OK, 1 row affected (0.00 sec)
mysql> use test1;
Database changed
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | gbk |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.00 sec)
mysql> show variables like '%col%';
+---------------------------+-----------------+
| Variable_name | Value |
+---------------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | gbk_chinese_ci |
| collation_server | utf8_general_ci |
| protocol_version | 10 |
| slave_compressed_protocol | OFF |
+---------------------------+-----------------+
5 rows in set (0.00 sec)
mysql>
客户端字符集和数据库连接校对规则
character_set_connection、character_set_client、character_set_client三个字符集是客户端的字符集,跟mysql server无关,当客户端连接mysql server(服务器)的时候,客户端会将自己想要的字符集发给mysql server(服务器),然后mysql server(服务器)会使用这个字符集去设置character_set_connection、character_set_client、character_set_results这三个参数值。
character_set_connection 对应的客户端连接的校对规则:collation_connection。
character_set_client 和 character_set_client无对应的校对规则
设置方式1:
连接时指定字符集,客户端连接数据库时增加一个参数:--default_character_set=gbk,这个参数会同时设置character_set_client,character_set_connection,character_set_results这三个参数的值。
$ mysql -u dap -p -h 150.221.75.35 -P 3306 --default_character_set=gbk
验证:
设置前:
[cheng@localhost ~]$ mysql -u dap -p -h 150.221.75.35 -P 3306
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.6.37 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.58 sec)
mysql>
[cheng@localhost ~]$ mysql -u dap -p -h 150.221.75.35 -P 3306
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.37 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like '%coll%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.15 sec)
mysql>
设置后:
[cheng@localhost ~]$ mysql -u dap -p -h 150.221.75.35 -P 3306 --default_character_set=gbk
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.6.37 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.25 sec)
mysql>
[cheng@localhost ~]$ mysql -u dap -p -h 150.221.75.35 -P 3306 --default_character_set=gbk
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.37 MySQL Community Server (GPL)
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like '%coll%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | gbk_chinese_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (2.82 sec)
mysql>
设置方式2:
客户端连接后,在mysql命令行修改客户端字符集,character_set_client,character_set_connection,character_set_results这三个参数的值每次只能修改一个。
mysql>
mysql> set character_set_client=utf8;
Query OK, 0 rows affected (0.28 sec)
mysql> set character_set_connection=utf8;
Query OK, 0 rows affected (0.30 sec)
mysql> set character_set_results=utf8;
Query OK, 0 rows affected (0.17 sec)
mysql>
collation_connection会默认使用 character_set_connection=utf-8的对应的值(utf-8)
mysql> set character_set_connection=gbk;
Query OK, 0 rows affected (0.16 sec)
mysql> show variables like 'col%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | gbk_chinese_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.02 sec)
mysql> show variables like '%char%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.37/share/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.20 sec)
mysql>