MySQL utf8mb4 Collations

漩凝

Article link: https://blog.csdn.net/weixin_30833209/article/details/113271788

Contents:

utf8mb4 and utf8

utf8mb4 collations

1. Background: utf8mb4 and utf8

From the MySQL documentation:

utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.

utf8: An alias for utf8mb3.

Note

The utf8mb3 character set is deprecated and will be removed in a future MySQL release. Please use utf8mb4 instead. Although utf8 is currently an alias for utf8mb3, at some point utf8 will become a reference to utf8mb4. To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.

UTF-8 is a variable-length character encoding that uses 1 to 4 bytes per character.

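mb4 stands for "most bytes 4": up to 4 bytes per character, which is enough to represent all of UTF-8. MySQL's utf8, by contrast, is utf8mb3, which uses at most three bytes per character. That saves space, but it cannot represent all of UTF-8 (emoji, for example) and supports only the Basic Multilingual Plane (BMP).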
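To see why a 3-byte limit rules out emoji, a short Python sketch can count how many UTF-8 bytes each character needs. The helper names utf8_len and fits_utf8mb3 are hypothetical, and fits_utf8mb3 simply models the "at most 3 bytes per character" rule described above; it does not call MySQL.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(s: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes,
    i.e. lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return all(utf8_len(c) <= 3 for c in s)


samples = ["A", "é", "中", "😀"]
for ch in samples:
    print(f"{ch}  U+{ord(ch):04X}  {utf8_len(ch)} byte(s)  fits utf8mb3: {fits_utf8mb3(ch)}")
```

"A" takes 1 byte, "é" 2, "中" 3, and the emoji "😀" (U+1F600, outside the BMP) takes 4, so only the emoji fails the utf8mb3 check.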

Therefore, utf8mb4 is the recommended choice.

2. utf8mb4 collations: utf8mb4_unicode_ci, utf8mb4_general_ci, utf8mb4_bin

Comparing utf8mb4_unicode_ci and utf8mb4_general_ci:

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect of this in comparisons or searches, see Section 10.8.6, “Examples of the Effect of Collation”):

Ä = A

Ö = O

Ü = U

A difference between the collations is that this is true for utf8_general_ci:

ß = s

Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):

ß = ss

MySQL implements utf8 language-specific collations if the ordering with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works fine for German dictionary order and French, so there is no need to create special utf8 collations.

utf8_general_ci also is satisfactory for both German and French, except that ß is equal to s, and not to ss. If this is acceptable for your application, you should use utf8_general_ci because it is faster. If this is not acceptable (for example, if you require German dictionary order), use utf8_unicode_ci because it is more accurate.
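The ß example can be mimicked with a toy Python sketch. The two key functions are hypothetical simplifications, not MySQL's real algorithms (utf8mb4_unicode_ci uses weights from the Unicode Collation Algorithm). general_ci_key maps each character to exactly one character, reflecting that utf8mb4_general_ci only compares characters one-to-one, while unicode_ci_key uses Python's full case folding, which expands ß to ss.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Decompose (NFD) and drop combining marks, e.g. 'Ä' -> 'A'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def general_ci_key(s: str) -> str:
    """Toy stand-in for utf8mb4_general_ci: each character maps to
    exactly one character, so 'ß' can only become 's'."""
    out = []
    for ch in s:
        if ch == "ß":
            out.append("s")
        else:
            out.append(strip_accents(ch).lower())
    return "".join(out)


def unicode_ci_key(s: str) -> str:
    """Toy stand-in for utf8mb4_unicode_ci: full case folding may expand
    one character into several, so 'ß' becomes 'ss'."""
    return strip_accents(s.casefold())


def equal(a: str, b: str, key) -> bool:
    return key(a) == key(b)


for a, b in [("Ä", "A"), ("Ö", "O"), ("Ü", "U"), ("ß", "s"), ("ß", "ss")]:
    print(f"{a} = {b}?  general_ci: {equal(a, b, general_ci_key)}, "
          f"unicode_ci: {equal(a, b, unicode_ci_key)}")
```

Both keys treat Ä/Ö/Ü as equal to A/O/U, but ß = s holds only under general_ci_key and ß = ss only under unicode_ci_key, mirroring the quoted documentation.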

utf8mb4_general_ci and utf8mb4_unicode_ci: ci stands for case insensitive, so comparisons ignore letter case.

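Accuracy:

utf8mb4_unicode_ci sorts and compares according to the standard Unicode Collation Algorithm, so it orders text accurately across a wide range of languages.

utf8mb4_general_ci does not implement the Unicode collation rules, so for some languages or characters its ordering can differ from the linguistically correct one. In the vast majority of cases, however, such characters do not need to be ordered that precisely.

Performance:

utf8mb4_general_ci is faster at comparing and sorting.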

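utf8mb4_unicode_ci: to handle special characters correctly, the Unicode collation rules use a somewhat more complex sorting algorithm. In the vast majority of cases, though, such complex comparisons do not arise. More important than which collation you pick is keeping the character set and collation consistent across the database.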
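As a sketch of such a consistency check, the Python snippet below scans column metadata and flags columns whose collation differs from the rest. It assumes the rows were already fetched with the information_schema.COLUMNS query shown in the comment; the schema name mydb, the sample rows, and the function find_inconsistent are hypothetical, and no database driver is used.

```python
from collections import Counter

# Hypothetical sample data shaped like the result of this query
# (the schema name 'mydb' is a placeholder):
#   SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
#   FROM information_schema.COLUMNS
#   WHERE TABLE_SCHEMA = 'mydb' AND COLLATION_NAME IS NOT NULL;
rows = [
    ("users", "name", "utf8mb4_unicode_ci"),
    ("users", "email", "utf8mb4_unicode_ci"),
    ("orders", "note", "utf8mb4_general_ci"),
    ("tags", "label", "utf8mb3_general_ci"),
]


def find_inconsistent(rows, expected=None):
    """Return (expected_collation, mismatching_columns).

    If `expected` is not given, the most common collation in `rows`
    is treated as the intended one.
    """
    if not rows:
        return expected, []
    if expected is None:
        expected = Counter(coll for _, _, coll in rows).most_common(1)[0][0]
    mismatches = [(t, c, coll) for t, c, coll in rows if coll != expected]
    return expected, mismatches


expected, mismatches = find_inconsistent(rows)
print("expected collation:", expected)
for table, column, coll in mismatches:
    print(f"  {table}.{column} uses {coll}")
```

With the sample rows, the majority collation is utf8mb4_unicode_ci and orders.note and tags.label are flagged. Passing expected="utf8mb4_unicode_ci" checks against a chosen target instead of the majority.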

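utf8mb4_bin: compares strings by the binary values of their characters (for utf8mb4 this gives the same order as comparing Unicode code points), so it is case-sensitive and also distinguishes accented letters such as ä from a. It only changes how values are compared and sorted; the stored data is still utf8mb4 text, so arbitrary binary content belongs in a binary type such as VARBINARY or BLOB.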
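A toy Python comparison of a binary key and a case-insensitive key shows the practical difference between _bin and _ci collations. Both functions are hypothetical stand-ins: bin_key compares UTF-8 bytes (whose order matches code-point order), and ci_key only folds case, without the accent folding a real _ci collation also performs.

```python
def bin_key(s: str) -> bytes:
    """Toy stand-in for utf8mb4_bin: compare the raw UTF-8 bytes.
    In UTF-8, byte order is the same as Unicode code-point order."""
    return s.encode("utf-8")


def ci_key(s: str) -> str:
    """Simplified case-insensitive key (accents are NOT folded here)."""
    return s.casefold()


words = ["b", "A", "a", "B"]
print(sorted(words, key=bin_key))    # uppercase first: ['A', 'B', 'a', 'b']
print(sorted(words, key=ci_key))     # case ignored, ties keep input order: ['A', 'a', 'b', 'B']
print(bin_key("a") == bin_key("A"))  # False: _bin is case-sensitive
print(ci_key("a") == ci_key("A"))    # True: _ci ignores case
```

Under the binary key every uppercase ASCII letter sorts before every lowercase one, because 'A'..'Z' have smaller code points than 'a'..'z'.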

References:

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

https://dev.mysql.com/doc/refman/8.0/en/charset-collation-effect.html