mysql utf8 czech ci_MySQL字符集小结

备注:测试数据库版本为MySQL 8.0

这个blog我们来聊聊MySQL的字符集

前言:

1.字符集和排序规则

说实话我对这两个概念比较模糊,其实可以简单的理解:

字符集(character set):定义了字符以及字符的编码。

排序规则(collation):定义了字符的比较规则。

一个字符集对应至少一种排序规则(一般是1对多)

两个不同的字符集不能有相同的排序规则

每个字符集都有默认的排序规则

2.MySQL字符集相关参数:

character_set_client 客户端来源数据使用的字符集,默认值:utf8mb4

character_set_connection 连接层字符集,默认值:utf8mb4

character_set_database 当前选中数据库的默认字符集,默认值:utf8mb4

character_set_results 查询结果字符集,默认值:utf8mb4

character_set_server 默认的内部操作字符集,默认值:utf8mb4

character_set_system 系统元数据(字段名等)字符集,默认值:utf8

这些参数真的多,不过还好MySQL 8.0开始,这些都是默认utf8mb4了(character_set_system除外)。

3.MySQL支持的字符集、排序规则:

MySQl支持的字符集

mysql> SHOW CHARACTER SET;

+----------+---------------------------------+---------------------+--------+

| Charset | Description | Default collation | Maxlen |

+----------+---------------------------------+---------------------+--------+

| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |

| ascii | US ASCII | ascii_general_ci | 1 |

| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |

| binary | Binary pseudo charset | binary | 1 |

| cp1250 | Windows Central European | cp1250_general_ci | 1 |

| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |

| cp1256 | Windows Arabic | cp1256_general_ci | 1 |

| cp1257 | Windows Baltic | cp1257_general_ci | 1 |

| cp850 | DOS West European | cp850_general_ci | 1 |

| cp852 | DOS Central European | cp852_general_ci | 1 |

| cp866 | DOS Russian | cp866_general_ci | 1 |

| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |

| dec8 | DEC West European | dec8_swedish_ci | 1 |

| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |

| euckr | EUC-KR Korean | euckr_korean_ci | 2 |

| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |

| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |

| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |

| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |

| greek | ISO 8859-7 Greek | greek_general_ci | 1 |

| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |

| hp8 | HP West European | hp8_english_ci | 1 |

| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |

| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |

| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |

| latin1 | cp1252 West European | latin1_swedish_ci | 1 |

| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |

| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |

| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |

| macce | Mac Central European | macce_general_ci | 1 |

| macroman | Mac West European | macroman_general_ci | 1 |

| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |

| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |

| tis620 | TIS620 Thai | tis620_thai_ci | 1 |

| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |

| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |

| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |

| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |

| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |

| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |

| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |

+----------+---------------------------------+---------------------+--------+

41 rows in set (0.00 sec)

MySQL utf8mb4支持的排序规则:

mysql> SHOW COLLATION WHERE Charset = 'utf8mb4';

+----------------------------+---------+-----+---------+----------+---------+--------------

| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute

+----------------------------+---------+-----+---------+----------+---------+--------------

| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD

| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD

| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD

| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD

| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE

| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE

| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD

| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD

| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | PAD SPACE

| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 | PAD SPACE

| utf8mb4_da_0900_ai_ci | utf8mb4 | 267 | | Yes | 0 | NO PAD

| utf8mb4_da_0900_as_cs | utf8mb4 | 290 | | Yes | 0 | NO PAD

| utf8mb4_de_pb_0900_ai_ci | utf8mb4 | 256 | | Yes | 0 | NO PAD

| utf8mb4_de_pb_0900_as_cs | utf8mb4 | 279 | | Yes | 0 | NO PAD

| utf8mb4_eo_0900_ai_ci | utf8mb4 | 273 | | Yes | 0 | NO PAD

| utf8mb4_eo_0900_as_cs | utf8mb4 | 296 | | Yes | 0 | NO PAD

| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 | PAD SPACE

| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 | PAD SPACE

| utf8mb4_es_0900_ai_ci | utf8mb4 | 263 | | Yes | 0 | NO PAD

| utf8mb4_es_0900_as_cs | utf8mb4 | 286 | | Yes | 0 | NO PAD

| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 | | Yes | 0 | NO PAD

| utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 | | Yes | 0 | NO PAD

| utf8mb4_et_0900_ai_ci | utf8mb4 | 262 | | Yes | 0 | NO PAD

| utf8mb4_et_0900_as_cs | utf8mb4 | 285 | | Yes | 0 | NO PAD

| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE

| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 | PAD SPACE

| utf8mb4_hr_0900_ai_ci | utf8mb4 | 275 | | Yes | 0 | NO PAD

| utf8mb4_hr_0900_as_cs | utf8mb4 | 298 | | Yes | 0 | NO PAD

| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 | PAD SPACE

| utf8mb4_hu_0900_ai_ci | utf8mb4 | 274 | | Yes | 0 | NO PAD

| utf8mb4_hu_0900_as_cs | utf8mb4 | 297 | | Yes | 0 | NO PAD

| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 | PAD SPACE

| utf8mb4_is_0900_ai_ci | utf8mb4 | 257 | | Yes | 0 | NO PAD

| utf8mb4_is_0900_as_cs | utf8mb4 | 280 | | Yes | 0 | NO PAD

| utf8mb4_ja_0900_as_cs | utf8mb4 | 303 | | Yes | 0 | NO PAD

| utf8mb4_ja_0900_as_cs_ks | utf8mb4 | 304 | | Yes | 24 | NO PAD

| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 | PAD SPACE

| utf8mb4_la_0900_ai_ci | utf8mb4 | 271 | | Yes | 0 | NO PAD

| utf8mb4_la_0900_as_cs | utf8mb4 | 294 | | Yes | 0 | NO PAD

| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 | PAD SPACE

| utf8mb4_lt_0900_ai_ci | utf8mb4 | 268 | | Yes | 0 | NO PAD

| utf8mb4_lt_0900_as_cs | utf8mb4 | 291 | | Yes | 0 | NO PAD

| utf8mb4_lv_0900_ai_ci | utf8mb4 | 258 | | Yes | 0 | NO PAD

| utf8mb4_lv_0900_as_cs | utf8mb4 | 281 | | Yes | 0 | NO PAD

| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 | PAD SPACE

| utf8mb4_pl_0900_ai_ci | utf8mb4 | 261 | | Yes | 0 | NO PAD

| utf8mb4_pl_0900_as_cs | utf8mb4 | 284 | | Yes | 0 | NO PAD

| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 | PAD SPACE

| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 | PAD SPACE

| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 | PAD SPACE

| utf8mb4_ro_0900_ai_ci | utf8mb4 | 259 | | Yes | 0 | NO PAD

| utf8mb4_ro_0900_as_cs | utf8mb4 | 282 | | Yes | 0 | NO PAD

| utf8mb4_ru_0900_ai_ci | utf8mb4 | 306 | | Yes | 0 | NO PAD

| utf8mb4_ru_0900_as_cs | utf8mb4 | 307 | | Yes | 0 | NO PAD

| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 | PAD SPACE

| utf8mb4_sk_0900_ai_ci | utf8mb4 | 269 | | Yes | 0 | NO PAD

| utf8mb4_sk_0900_as_cs | utf8mb4 | 292 | | Yes | 0 | NO PAD

| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 | PAD SPACE

| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 | PAD SPACE

| utf8mb4_sl_0900_ai_ci | utf8mb4 | 260 | | Yes | 0 | NO PAD

| utf8mb4_sl_0900_as_cs | utf8mb4 | 283 | | Yes | 0 | NO PAD

| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 | PAD SPACE

| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 | PAD SPACE

| utf8mb4_sv_0900_ai_ci | utf8mb4 | 264 | | Yes | 0 | NO PAD

| utf8mb4_sv_0900_as_cs | utf8mb4 | 287 | | Yes | 0 | NO PAD

| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 | PAD SPACE

| utf8mb4_tr_0900_ai_ci | utf8mb4 | 265 | | Yes | 0 | NO PAD

| utf8mb4_tr_0900_as_cs | utf8mb4 | 288 | | Yes | 0 | NO PAD

| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 | PAD SPACE

| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | PAD SPACE

| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE

| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 | PAD SPACE

| utf8mb4_vi_0900_ai_ci | utf8mb4 | 277 | | Yes | 0 | NO PAD

| utf8mb4_vi_0900_as_cs | utf8mb4 | 300 | | Yes | 0 | NO PAD

| utf8mb4_zh_0900_as_cs | utf8mb4 | 308 | | Yes | 0 | NO PAD

+----------------------------+---------+-----+---------+----------+---------+--------------

75 rows in set (0.00 sec)

一.常用的字符集

1.utf8和utf8mb4

MySQL8.0开始,默认字符集修改为了utf8mb4。

MySQL在5.5.3之后增加了utf8mb4的编码,mb4即4-Byte UTF-8 Unicode Encoding,专门用来兼容四字节的unicode。utf8mb4为utf8的超集并兼容utf8,比utf8能表示更多的字符。

如果需要存储4 字节的字符,utf8字符集就会出现问题,例如emoji表情,所以建议选择utf8mb4而非utf8。

当然utf8替换为utf8mb4占的空间更多,如果utf8绝对够用,也可以依旧使用utf8

2.gbk/gb2312/gb18030

GB2312 :中国国家标准简体中文字符集,共收录 6763 个汉字,其中一级汉字 3755 个,二级汉字 3008 个;同时收录了包括拉丁字母、希腊字母、日文平假名及片假名字母、俄语西里尔字母在内的 682 个字符。

GBK :共收入21886个汉字和图形符号,GBK 向下与 GB 2312 完全兼容,向上支持 ISO 10646 国际标准,在前者向后者过渡过程中起到的承上启下的作用。

GB18030 :共收录汉字70244个,GB18030 与 GB 312 和 GBK 兼容

大多数情况下,GBK就够用了,如果生僻字比较多,可以改为GB18030

关于utf8和gbk字符集如何选择:

UTF-8:一个汉字 = 3个字节,英文是1个字节

GBK: 一个汉字 = 2个字节,英文是1个字节

这样看,如果是中文的的系统,选择GBK可以比UTF8更节省磁盘空间,性能也就更好。

测试记录:

-- 测试语句

create table t_utf8(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

insert into t_utf8 values ('a','张');

insert into t_utf8 values ('ab','张三');

create table t_gbk(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci;

insert into t_gbk values ('a','张');

insert into t_gbk values ('ab','张三');

select en,cn,char_length(en),char_length(cn),length(en),length(cn) from t_utf8;

select en,cn,char_length(en),char_length(cn),length(en),length(cn) from t_gbk;

测试结果:

mysql> create table t_utf8(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Query OK, 0 rows affected (0.03 sec)

mysql> insert into t_utf8 values ('a','张');

Query OK, 1 row affected (0.00 sec)

mysql> insert into t_utf8 values ('ab','张三');

Query OK, 1 row affected (0.00 sec)

mysql>

mysql> create table t_gbk(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET gbk COLLATE gbk_chinese_ci;

Query OK, 0 rows affected (0.02 sec)

mysql> insert into t_gbk values ('a','张');

Query OK, 1 row affected (0.01 sec)

mysql> insert into t_gbk values ('ab','张三');

Query OK, 1 row affected (0.00 sec)

mysql>

mysql> select en,cn,char_length(en),char_length(cn),length(en),length(cn) from t_utf8;

+------+--------+-----------------+-----------------+------------+------------+

| en | cn | char_length(en) | char_length(cn) | length(en) | length(cn) |

+------+--------+-----------------+-----------------+------------+------------+

| a | 张 | 1 | 1 | 1 | 3 |

| ab | 张三 | 2 | 2 | 2 | 6 |

+------+--------+-----------------+-----------------+------------+------------+

2 rows in set (0.00 sec)

mysql> select en,cn,char_length(en),char_length(cn),length(en),length(cn) from t_gbk;

+------+--------+-----------------+-----------------+------------+------------+

| en | cn | char_length(en) | char_length(cn) | length(en) | length(cn) |

+------+--------+-----------------+-----------------+------------+------------+

| a | 张 | 1 | 1 | 1 | 2 |

| ab | 张三 | 2 | 2 | 2 | 4 |

+------+--------+-----------------+-----------------+------------+------------+

2 rows in set (0.00 sec)

mysql>

3.latin

光latin字符集就有latin1、latin2、latin5、latin7四种,当然这4种字符集在中国应用很少,我在这里也就不一一讲解了。

MySQL一直到8.0版本才将字符集由latin1改为utf8mb4。

之前版本如果创建数据库和表的时候,不指定字符集,默认都是latin1字符集。

latin1存中文很容易出现乱码问题,给系统带来灾难性的问题,如果是MySQL 8.0之前版本,创建数据库和表,一定要指定字符集。

测试记录:

-- 创建一个latin字符集的表

create table t_latin(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;

-- insert的时候直接报错

insert into t_latin values ('a','程');

insert into t_latin values ('ab','张三');

-- set names latin1 此时可以将中文数据录入表

set names latin1;

insert into t_latin values ('a','张');

insert into t_latin values ('ab','张三');

-- 查询也没有问题,正常显示

select * from t_latin;

-- 退出重登陆后,发现中文乱码了

exit;

mysql -uroot -p

use test;

select * from t_latin;

测试结果:

mysql>

mysql> -- 创建一个latin字符集的表

mysql> create table t_latin(en varchar(100),cn varchar(100)) DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;

Query OK, 0 rows affected (0.02 sec)

mysql> -- insert的时候直接报错

mysql> insert into t_latin values ('a','程');

ERROR 1366 (HY000): Incorrect string value: '\xE7\xA8\x8B' for column 'cn' at row 1

mysql> insert into t_latin values ('ab','张三');

ERROR 1366 (HY000): Incorrect string value: '\xE5\xBC\xA0\xE4\xB8\x89' for column 'cn' at row 1

mysql>

mysql>

mysql> -- set names latin1 此时可以将中文数据录入表

mysql> set names latin1;

Query OK, 0 rows affected (0.00 sec)

mysql> insert into t_latin values ('a','张');

Query OK, 1 row affected (0.00 sec)

mysql> insert into t_latin values ('ab','张三');

Query OK, 1 row affected (0.00 sec)

mysql>

mysql> -- 查询也没有问题,正常显示

mysql> select * from t_latin;

+------+--------+

| en | cn |

+------+--------+

| a | 张 |

| ab | 张三 |

+------+--------+

2 rows in set (0.00 sec)

mysql>

mysql>

mysql> -- 退出重登陆后,发现中文乱码了

mysql> exit;

Bye

C:\>

C:\>mysql -uroot -p

Enter password: ******

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 11

Server version: 8.0.20 MySQL Community Server - GPL

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test

Database changed

mysql> select * from t_latin;

+------+---------------+

| en | cn |

+------+---------------+

| a | å¼ |

| ab | å¼ ä¸‰ |

+------+---------------+

2 rows in set (0.00 sec)

mysql>

解决latin1字符集显示显示乱码的两种方法

mysql>

mysql> select en,cn from t_latin;

+------+---------------+

| en | cn |

+------+---------------+

| a | å¼ |

| ab | å¼ ä¸‰ |

+------+---------------+

2 rows in set (0.00 sec)

mysql> select en,convert(unhex(hex(convert(cn using latin1))) using utf8) as cn from t_latin;

+------+--------+

| en | cn |

+------+--------+

| a | 张 |

| ab | 张三 |

+------+--------+

2 rows in set, 1 warning (0.00 sec)

mysql> set names latin;

ERROR 1115 (42000): Unknown character set: 'latin'

mysql>

mysql> set names latin1;

Query OK, 0 rows affected (0.00 sec)

mysql> select en,cn from t_latin;

+------+--------+

| en | cn |

+------+--------+

| a | 张 |

| ab | 张三 |

+------+--------+

2 rows in set (0.00 sec)

二.常用MySQL的排序规则介绍

utf8_general_ci 不区分大小写,这个你在注册用户名和邮箱的时候就要使用。

utf8_general_cs 区分大小写,如果用户名和邮箱用这个 就会造成不良后果

utf8_bin:字符串每个字符串用二进制数据编译存储。 区分大小写,而且可以存二进制的内容

utf8_general_ci校对速度快,但准确度稍差。

utf8_unicode_ci准确度高,但校对速度稍慢。

mysql> show collation where charset = 'utf8';

+--------------------------+---------+-----+---------+----------+---------+---------------+

| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |

+--------------------------+---------+-----+---------+----------+---------+---------------+

| utf8_bin | utf8 | 83 | | Yes | 1 | PAD SPACE |

| utf8_croatian_ci | utf8 | 213 | | Yes | 8 | PAD SPACE |

| utf8_czech_ci | utf8 | 202 | | Yes | 8 | PAD SPACE |

| utf8_danish_ci | utf8 | 203 | | Yes | 8 | PAD SPACE |

| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 | PAD SPACE |

| utf8_estonian_ci | utf8 | 198 | | Yes | 8 | PAD SPACE |

| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | PAD SPACE |

| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | PAD SPACE |

| utf8_german2_ci | utf8 | 212 | | Yes | 8 | PAD SPACE |

| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 | PAD SPACE |

| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 | PAD SPACE |

| utf8_latvian_ci | utf8 | 194 | | Yes | 8 | PAD SPACE |

| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 | PAD SPACE |

| utf8_persian_ci | utf8 | 208 | | Yes | 8 | PAD SPACE |

| utf8_polish_ci | utf8 | 197 | | Yes | 8 | PAD SPACE |

| utf8_romanian_ci | utf8 | 195 | | Yes | 8 | PAD SPACE |

| utf8_roman_ci | utf8 | 207 | | Yes | 8 | PAD SPACE |

| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 | PAD SPACE |

| utf8_slovak_ci | utf8 | 205 | | Yes | 8 | PAD SPACE |

| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 | PAD SPACE |

| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 | PAD SPACE |

| utf8_spanish_ci | utf8 | 199 | | Yes | 8 | PAD SPACE |

| utf8_swedish_ci | utf8 | 200 | | Yes | 8 | PAD SPACE |

| utf8_tolower_ci | utf8 | 76 | | Yes | 1 | PAD SPACE |

| utf8_turkish_ci | utf8 | 201 | | Yes | 8 | PAD SPACE |

| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 | PAD SPACE |

| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE |

| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 | PAD SPACE |

+--------------------------+---------+-----+---------+----------+---------+---------------+

28 rows in set (0.01 sec)

mysql> show collation where charset = 'utf8mb4';

+----------------------------+---------+-----+---------+----------+---------+---------------+

| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |

+----------------------------+---------+-----+---------+----------+---------+---------------+

| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |

| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD |

| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD |

| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD |

| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE |

| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE |

| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD |

| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD |

| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | PAD SPACE |

| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 | PAD SPACE |

| utf8mb4_da_0900_ai_ci | utf8mb4 | 267 | | Yes | 0 | NO PAD |

| utf8mb4_da_0900_as_cs | utf8mb4 | 290 | | Yes | 0 | NO PAD |

| utf8mb4_de_pb_0900_ai_ci | utf8mb4 | 256 | | Yes | 0 | NO PAD |

| utf8mb4_de_pb_0900_as_cs | utf8mb4 | 279 | | Yes | 0 | NO PAD |

| utf8mb4_eo_0900_ai_ci | utf8mb4 | 273 | | Yes | 0 | NO PAD |

| utf8mb4_eo_0900_as_cs | utf8mb4 | 296 | | Yes | 0 | NO PAD |

| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 | PAD SPACE |

| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 | PAD SPACE |

| utf8mb4_es_0900_ai_ci | utf8mb4 | 263 | | Yes | 0 | NO PAD |

| utf8mb4_es_0900_as_cs | utf8mb4 | 286 | | Yes | 0 | NO PAD |

| utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 | | Yes | 0 | NO PAD |

| utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 | | Yes | 0 | NO PAD |

| utf8mb4_et_0900_ai_ci | utf8mb4 | 262 | | Yes | 0 | NO PAD |

| utf8mb4_et_0900_as_cs | utf8mb4 | 285 | | Yes | 0 | NO PAD |

| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE |

| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 | PAD SPACE |

| utf8mb4_hr_0900_ai_ci | utf8mb4 | 275 | | Yes | 0 | NO PAD |

| utf8mb4_hr_0900_as_cs | utf8mb4 | 298 | | Yes | 0 | NO PAD |

| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 | PAD SPACE |

| utf8mb4_hu_0900_ai_ci | utf8mb4 | 274 | | Yes | 0 | NO PAD |

| utf8mb4_hu_0900_as_cs | utf8mb4 | 297 | | Yes | 0 | NO PAD |

| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 | PAD SPACE |

| utf8mb4_is_0900_ai_ci | utf8mb4 | 257 | | Yes | 0 | NO PAD |

| utf8mb4_is_0900_as_cs | utf8mb4 | 280 | | Yes | 0 | NO PAD |

| utf8mb4_ja_0900_as_cs | utf8mb4 | 303 | | Yes | 0 | NO PAD |

| utf8mb4_ja_0900_as_cs_ks | utf8mb4 | 304 | | Yes | 24 | NO PAD |

| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 | PAD SPACE |

| utf8mb4_la_0900_ai_ci | utf8mb4 | 271 | | Yes | 0 | NO PAD |

| utf8mb4_la_0900_as_cs | utf8mb4 | 294 | | Yes | 0 | NO PAD |

| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 | PAD SPACE |

| utf8mb4_lt_0900_ai_ci | utf8mb4 | 268 | | Yes | 0 | NO PAD |

| utf8mb4_lt_0900_as_cs | utf8mb4 | 291 | | Yes | 0 | NO PAD |

| utf8mb4_lv_0900_ai_ci | utf8mb4 | 258 | | Yes | 0 | NO PAD |

| utf8mb4_lv_0900_as_cs | utf8mb4 | 281 | | Yes | 0 | NO PAD |

| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 | PAD SPACE |

| utf8mb4_pl_0900_ai_ci | utf8mb4 | 261 | | Yes | 0 | NO PAD |

| utf8mb4_pl_0900_as_cs | utf8mb4 | 284 | | Yes | 0 | NO PAD |

| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 | PAD SPACE |

| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 | PAD SPACE |

| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 | PAD SPACE |

| utf8mb4_ro_0900_ai_ci | utf8mb4 | 259 | | Yes | 0 | NO PAD |

| utf8mb4_ro_0900_as_cs | utf8mb4 | 282 | | Yes | 0 | NO PAD |

| utf8mb4_ru_0900_ai_ci | utf8mb4 | 306 | | Yes | 0 | NO PAD |

| utf8mb4_ru_0900_as_cs | utf8mb4 | 307 | | Yes | 0 | NO PAD |

| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 | PAD SPACE |

| utf8mb4_sk_0900_ai_ci | utf8mb4 | 269 | | Yes | 0 | NO PAD |

| utf8mb4_sk_0900_as_cs | utf8mb4 | 292 | | Yes | 0 | NO PAD |

| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 | PAD SPACE |

| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 | PAD SPACE |

| utf8mb4_sl_0900_ai_ci | utf8mb4 | 260 | | Yes | 0 | NO PAD |

| utf8mb4_sl_0900_as_cs | utf8mb4 | 283 | | Yes | 0 | NO PAD |

| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 | PAD SPACE |

| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 | PAD SPACE |

| utf8mb4_sv_0900_ai_ci | utf8mb4 | 264 | | Yes | 0 | NO PAD |

| utf8mb4_sv_0900_as_cs | utf8mb4 | 287 | | Yes | 0 | NO PAD |

| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 | PAD SPACE |

| utf8mb4_tr_0900_ai_ci | utf8mb4 | 265 | | Yes | 0 | NO PAD |

| utf8mb4_tr_0900_as_cs | utf8mb4 | 288 | | Yes | 0 | NO PAD |

| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 | PAD SPACE |

| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | PAD SPACE |

| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE |

| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 | PAD SPACE |

| utf8mb4_vi_0900_ai_ci | utf8mb4 | 277 | | Yes | 0 | NO PAD |

| utf8mb4_vi_0900_as_cs | utf8mb4 | 300 | | Yes | 0 | NO PAD |

| utf8mb4_zh_0900_as_cs | utf8mb4 | 308 | | Yes | 0 | NO PAD |

+----------------------------+---------+-----+---------+----------+---------+---------------+

75 rows in set (0.01 sec)

mysql> show collation where charset = 'gbk';

+----------------+---------+----+---------+----------+---------+---------------+

| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |

+----------------+---------+----+---------+----------+---------+---------------+

| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE |

| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE |

+----------------+---------+----+---------+----------+---------+---------------+

2 rows in set (0.01 sec)

如果字符集选择utf8,则排序规则选择默认的utf8_general_ci

如果字符集选择utf8mb4,则排序规则选择默认的utf8mb4_0900_ai_ci

如果字符集选择gbk,则排序规则选择默认的gbk_chinese_ci

排序规则大小写敏感测试:

create table t_utf8mb4(en1 varchar(100),en2 varchar(100)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

alter table t_utf8mb4 modify column en2 varchar(100) charset utf8mb4 collate utf8mb4_0900_as_cs ;

insert into t_utf8mb4 values ('a','a');

insert into t_utf8mb4 values ('A','A');

insert into t_utf8mb4 values ('a','A');

insert into t_utf8mb4 values ('A','a');

select * from t_utf8mb4 where en1='a';

select * from t_utf8mb4 where en1='A';

select * from t_utf8mb4 where en2='a';

select * from t_utf8mb4 where en2='A';

测试结果:

mysql>

mysql> create table t_utf8mb4(en1 varchar(100),en2 varchar(100)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

Query OK, 0 rows affected (0.03 sec)

mysql> alter table t_utf8mb4 modify column en2 varchar(100) charset utf8mb4 collate utf8mb4_0900_as_cs ;

Query OK, 0 rows affected (0.07 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql>

mysql>

mysql> insert into t_utf8mb4 values ('a','a');

Query OK, 1 row affected (0.00 sec)

mysql> insert into t_utf8mb4 values ('A','A');

Query OK, 1 row affected (0.01 sec)

mysql> insert into t_utf8mb4 values ('a','A');

Query OK, 1 row affected (0.00 sec)

mysql> insert into t_utf8mb4 values ('A','a');

Query OK, 1 row affected (0.01 sec)

mysql>

mysql>

mysql> select * from t_utf8mb4;

+------+------+

| en1 | en2 |

+------+------+

| a | a |

| A | A |

| a | A |

| A | a |

+------+------+

4 rows in set (0.00 sec)

mysql>

-- 查询不区分大小写的排序规则列,果然全部输出

mysql> select * from t_utf8mb4 where en1='a';

+------+------+

| en1 | en2 |

+------+------+

| a | a |

| A | A |

| a | A |

| A | a |

+------+------+

4 rows in set (0.00 sec)

mysql> select * from t_utf8mb4 where en1='A';

+------+------+

| en1 | en2 |

+------+------+

| a | a |

| A | A |

| a | A |

| A | a |

+------+------+

4 rows in set (0.00 sec)

-- 查询区分大小写的排序规则列,值输出全部是小写或全部是大写的

mysql> select * from t_utf8mb4 where en2='a';

+------+------+

| en1 | en2 |

+------+------+

| a | a |

| A | a |

+------+------+

2 rows in set (0.00 sec)

mysql> select * from t_utf8mb4 where en2='A';

+------+------+

| en1 | en2 |

+------+------+

| A | A |

| a | A |

+------+------+

2 rows in set (0.00 sec)

三.MySQL字符集相关参数

3.1 字符集参数介绍

character_set_client 客户端来源数据使用的字符集,默认值:utf8mb4

character_set_connection 连接层字符集,默认值:utf8mb4

character_set_database 当前选中数据库的默认字符集,默认值:utf8mb4

character_set_results 查询结果字符集,默认值:utf8mb4

character_set_server 默认的内部操作字符集,默认值:utf8mb4

character_set_system 系统元数据(字段名等)字符集,默认值:utf8

也许大家会有疑问,弄一个字符集不好吗,为什么有这6个字符集

我这边来讲讲这6个字符集的使用场景

因为服务器端和客户端,字符集会存在一定的差异。

服务器端总是假设客户端是按照 character_set_client设置的字符来传输数据和SQL语句的。

当服务器收到客户端的SQL语句时,它先将其转换为字符集character_set_connection。它还是用这个设置来决定如何将数据转换成字符串。

当服务器端返回数据或错误信息给客户端时,它会将其转换成character_set_result。

这样就保证了,服务器端和客户端在中转数据的时候,不会出现乱码。

至于character_set_server和character_set_database这两个,一个是系统默认的字符集,一个是创建数据库默认的字符集。

3.2字符集参数修改

备注:character_set_system参数是只读参数,不能修改,其余参数可以通过set命令进行修改

上述5个参数需要保持一致,这样才能避免出现乱码,如果需要修改,可以参考如下:

mysql> set character_set_client = utf8;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> set character_set_connection = utf8;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>

mysql> set character_set_database = utf8;

Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql>

mysql> set character_set_results = utf8;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> set character_set_server = utf8;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>

mysql>

mysql> show variables like 'character_set%';

+--------------------------+----------------------------------------------+

| Variable_name | Value |

+--------------------------+----------------------------------------------+

| character_set_client | utf8 |

| character_set_connection | utf8 |

| character_set_database | utf8 |

| character_set_filesystem | binary |

| character_set_results | utf8 |

| character_set_server | utf8 |

| character_set_system | utf8 |

| character_sets_dir | E:\mysql\mysql-8.0.19-winx64\share\charsets\ |

+--------------------------+----------------------------------------------+

8 rows in set, 1 warning (0.00 sec)

mysql>

-- 修改

mysql> set character_set_system = utf8;

ERROR 1238 (HY000): Variable 'character_set_system' is a read only variable

四.数据库、表、列的字符集

创建数据库时,如不指定字符集,将根据服务器上的character_set_server设置来设定该数据库的默认字符集。

创建表时,如不指定字符集,将根据数据库的字符集设置指定这个表的字符集。

创建列时,如不指定字符集,将根据表的设置指定列的字符集。

我们来演练一个 数据库、表、列的字符集不相同的例子。

-- 数据库字符集是 latin1

mysql> create database test1 charset latin1;

Query OK, 1 row affected (0.03 sec)

mysql> use test1;

Database changed

-- 表的字符集是 utf8mb4 而列的字符集有 utf8也有gbk

mysql> create table t1(id int,name1 varchar(100) charset gbk,name2 varchar(100) charset utf8) default charset = utf8mb4;

Query OK, 0 rows affected, 1 warning (0.06 sec)

mysql> insert into t1 values(1,'张三','张三');

Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;

+------+--------+--------+

| id | name1 | name2 |

+------+--------+--------+

| 1 | 张三 | 张三 |

+------+--------+--------+

1 row in set (0.00 sec)

这样看,数据库、表、列可以有不同的字符集和排序规则。

那么我们如何选择数据库、表、列的字符集和排序规则呢?

最好是先为服务器(数据库)选择一个合理的字符集,然后表和列都遵循这个字符集。

当有特殊需求的时候,可以显式的指定表或列的字符集,让表和列选择自己合适的字符集。

五.字符集和排序规则如何影响查询

1.order by collate时 可能导致无法使用索引进行排序

测试数据:

create table t1(name1 varchar(100) charset utf8mb4 ,name2 varchar(100) charset utf8 collate utf8_bin,name3 varchar(100) charset gbk);

insert into t1 values('张三','张三','张三');

insert into t1 values('李四','李四','李四');

-- 创建索引

create index i_t1_02 on t1(name2);

explain select name2 from t1 order by name2;

explain select name2 from t1 order by name2 collate utf8_general_ci;

执行记录:

mysql> create table t1(name1 varchar(100) charset utf8mb4 ,name2 varchar(100) charset utf8 collate utf8_bin,name3 varchar(100) charset gbk);

Query OK, 0 rows affected, 2 warnings (0.03 sec)

mysql> insert into t1 values('张三','张三','张三');

Query OK, 1 row affected (0.01 sec)

mysql> insert into t1 values('李四','李四','李四');

Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;

+--------+--------+--------+

| name1 | name2 | name3 |

+--------+--------+--------+

| 张三 | 张三 | 张三 |

| 李四 | 李四 | 李四 |

+--------+--------+--------+

2 rows in set (0.00 sec)

mysql> create index i_t1_02 on t1(name2);

Query OK, 0 rows affected (0.03 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> explain select name2 from t1 order by name2;

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

| 1 | SIMPLE | t1 | NULL | index | NULL | i_t1_02 | 303 | NULL | 2 | 100.00 | Using index |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

1 row in set, 1 warning (0.00 sec)

mysql>

mysql> explain select name2 from t1 order by name2 collate utf8_general_ci;

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------------+

| 1 | SIMPLE | t1 | NULL | index | NULL | i_t1_02 | 303 | NULL | 2 | 100.00 | Using index; Using filesort |

+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-----------------------------+

1 row in set, 1 warning (0.00 sec)

mysql>

可以看到 order by name2 collate utf8_general_ci 由于排序规则并不是创建时指定的utf8_bin,最后使用了文件排序 Using filesort

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值