MySQL character sets and collations: configuring the character set and collation in MySQL

How Can I Change the Default Character Set and Collation for a Database in MySQL? (Doc ID 1023320.1)

To determine the default character set and collation for a database, enter this SQL statement:

Query:
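A minimal sketch of such a check, assuming a database named database1 (the name used in the INFORMATION_SCHEMA query in this article). The @@character_set_database and @@collation_database system variables report the defaults of the currently selected database, so the database has to be selected first.

```sql
-- database1 is an assumed name; substitute your own database.
USE database1;

-- These variables reflect the default (currently selected) database.
SELECT @@character_set_database, @@collation_database;

-- The same defaults also appear in the CREATE DATABASE statement.
SHOW CREATE DATABASE database1;
```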

Beginning with MySQL 5.0, you can also obtain this information from the INFORMATION_SCHEMA using a query such as this:

SELECT SCHEMA_NAME,
       DEFAULT_CHARACTER_SET_NAME,
       DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = 'database1';

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
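Note that ALTER DATABASE only changes the defaults used for tables created afterwards; existing tables and columns keep their current character set and collation. A minimal sketch of converting one existing table, assuming a hypothetical table named t1 in database_name:

```sql
-- t1 is a hypothetical table name used only for illustration.
-- CONVERT TO changes the table default and converts its character columns.
ALTER TABLE database_name.t1
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Inspect the resulting table and column definitions.
SHOW CREATE TABLE database_name.t1;
```

Converting a large table rebuilds it, so it is worth taking a backup and testing on a copy first.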

utf8mb4 settings:

[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4    # server character set
collation-server = utf8mb4_unicode_ci
# Sets the client, connection, and results character sets to utf8mb4 for each new connection
init_connect = 'SET NAMES utf8mb4'
skip-character-set-client-handshake = true

[mysql]
default-character-set = utf8mb4

On collations, see:

What Is The Difference Between utf8mb4_unicode_ci And utf8mb4_general_ci collations? Choose One For Asian Languages With Particular Characters (Doc ID 2230559.1)

* What is the difference between utf8mb4_unicode_ci and utf8mb4_general_ci collations?

These two collations are both for the utf8mb4 character encoding. The differences are in how text is sorted and compared:

1) Accuracy

utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.

utf8mb4_general_ci fails to implement all of the Unicode sorting rules, which will result in undesirable sorting in some situations, such as when using particular languages or characters.

2) Performance

utf8mb4_general_ci is faster at comparisons and sorting, because it takes a bunch of performance-related shortcuts.

utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.

On modern servers, this performance cost is almost negligible.
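As a small illustration of the accuracy difference, the MySQL manual documents that the German ß compares equal to 'ss' under the unicode_ci collations but equal to 's' under general_ci. A minimal sketch for checking this on your own server, assuming the connection character set is utf8mb4:

```sql
-- Make sure the string literals are interpreted as utf8mb4.
SET NAMES utf8mb4;

-- COLLATE binds more tightly than =, so each comparison runs
-- under the collation named on its right-hand side.
SELECT 'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_eq_ss,
       'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_eq_ss,
       'ß' = 's'  COLLATE utf8mb4_general_ci AS general_eq_s;
```

According to the manual's description, the first and third columns should be true and the second false.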

* Which one should we use if we store a lot of Asian-language characters?

You should use utf8mb4_unicode_ci.

The reason is:

utf8mb4_unicode_ci provides much more accurate sorting and comparison for Asian languages (e.g. Chinese) than utf8mb4_general_ci does. Using utf8mb4_general_ci for Asian languages with particular characters could be inadequate.

Be advised that when migrating from general_ci to unicode_ci, you should expect some differences in query results.

General_ci is a simplified set of sorting rules which aims to do as well as it can while taking many short-cuts designed to improve speed. It does not follow the Unicode rules and will result in undesirable sorting or comparison in some situations, such as when using particular languages or characters.

One example is that modifiers such as accents on some Asian-language characters can be treated differently by general_ci and unicode_ci, so you should expect different results when filtering.

For this specific example, if you want to keep behaviour similar to general_ci, you should consider using utf8mb4_0900_as_ci in MySQL 8.0.
