MySQL character set collation: notes on MySQL character set and collation settings

Latest recommended article published 2023-01-19 20:26:37

weixin_39610722


Article tags: MySQL character set collation

Article link: https://blog.csdn.net/weixin_39610722/article/details/113148880


How Can I Change the Default Character Set and Collation for a Database in MySQL? (Doc ID 1023320.1)

To determine the default character set and collation for a database, enter this SQL statement:

Query:
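The statement itself did not survive in this copy of the note. As a minimal sketch, and only as an assumption about what was intended, the defaults of a database can be read from the character_set_database and collation_database system variables once that database is selected. The name database1 is taken from the INFORMATION_SCHEMA query in this article.

```sql
-- Select the database whose defaults you want to inspect.
USE database1;

-- These variables reflect the default character set and collation
-- of the currently selected database.
SELECT @@character_set_database, @@collation_database;
```

SHOW CREATE DATABASE database1; is another way to see the same defaults.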

Beginning with MySQL 5.0, you can also obtain this information from the INFORMATION_SCHEMA using a query such as this:

SELECT SCHEMA_NAME,
       DEFAULT_CHARACTER_SET_NAME,
       DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'database1';

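To list the character set and collation system variables currently in effect:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

To change the default character set and collation of a database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;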
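ALTER DATABASE changes only the defaults applied to tables created afterwards. Existing tables and their columns keep the character set they already have. A hedged sketch of converting one existing table follows. The table name table1 is hypothetical, and because the conversion rewrites the table, it is worth trying on a copy first.

```sql
-- table1 is a placeholder name, not taken from the original note.
-- Converts the table default and all character columns to utf8mb4.
ALTER TABLE table1
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```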

utf8mb4 configuration:

[client]
default-character-set = utf8mb4

[mysqld]
# Server character set
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# Sets the client, connection, and results character sets to utf8mb4
init_connect='SET NAMES utf8mb4'
skip-character-set-client-handshake = true

[mysql]
default-character-set = utf8mb4
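After restarting the server and opening a new connection, the effective values can be checked from the session. This is a small sketch using standard system variables, not part of the original note.

```sql
-- With the configuration above, every column is expected to
-- report utf8mb4 or a utf8mb4_* collation.
SELECT @@character_set_server,
       @@collation_server,
       @@character_set_client,
       @@character_set_connection,
       @@character_set_results;
```

The server does not execute init_connect for accounts with the SUPER or CONNECTION_ADMIN privilege, so it is safer to run this check from an ordinary account.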

On collations, see:

What Is The Difference Between utf8mb4_unicode_ci And utf8mb4_general_ci collations? Choose One For Asian Languages With Particular Characters (Doc ID 2230559.1)

* What is the difference between utf8mb4_unicode_ci and utf8mb4_general_ci collations?

These two collations are both for the utf8mb4 character encoding. The differences are in how text is sorted and compared:

1) Accuracy

utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.

utf8mb4_general_ci does not implement all of the Unicode sorting rules, which results in undesirable sorting in some situations, such as when using particular languages or characters.

2) Performance

utf8mb4_general_ci is faster at comparisons and sorting, because it takes a number of performance-related shortcuts.

utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.

On modern servers, this performance cost is almost negligible.

* Which one should we use if we store many characters from Asian languages?

You should use utf8mb4_unicode_ci.

The reason is:

utf8mb4_unicode_ci provides much more accurate sorting and comparison for Asian languages (e.g. Chinese) than utf8mb4_general_ci does. Using utf8mb4_general_ci for Asian languages with particular characters could be inadequate.

Be advised that when migrating from general_ci to unicode_ci, some differences in results are to be expected.

general_ci is a simplified set of sorting rules that aims to do as well as it can while taking many shortcuts designed to improve speed. It does not follow the Unicode rules and will result in undesirable sorting or comparison in some situations, such as when using particular languages or characters.

One example is that some characters from Asian languages, and modifiers such as accents, can be treated differently in general_ci and unicode_ci, so queries that filter on such text can return different results.

For this specific example, if you want to maintain behaviour similar to general_ci, you should consider using utf8mb4_0900_as_ci in MySQL 8.0.
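To see how the two collations can disagree in a comparison, here is a minimal sketch based on a case described in the MySQL manual rather than in this note: general_ci treats ß as equal to s, while unicode_ci treats it as equal to ss. It assumes the connection character set is utf8mb4, and the column aliases are illustrative.

```sql
-- Make sure string literals are interpreted as utf8mb4.
SET NAMES utf8mb4;

-- The explicit COLLATE clause decides which rules the comparison uses,
-- so the two columns are expected to differ.
SELECT 'ß' = 'ss' COLLATE utf8mb4_general_ci AS equal_in_general_ci,
       'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS equal_in_unicode_ci;
```

Running the same kind of check against representative rows before a migration gives a rough idea of which filters or unique keys might behave differently.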
