What is the difference between utf8mb4 and utf8 charsets in MySQL?

Translated from: What is the difference between utf8mb4 and utf8 charsets in MySQL?

What is the difference between the utf8mb4 and utf8 charsets in MySQL?

I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know how the utf8mb4 group of encodings differs from the other encoding types defined in MySQL Server.

Are there any special benefits/purposes of using utf8mb4 rather than utf8?


Reply #1

Reference: https://stackoom.com/question/22bKs/MySQL中utf-mb-和utf-字符集有什么区别


Reply #2

UTF-8 is a variable-length encoding: storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (an alias of "utf8mb3") stores a maximum of three bytes per code point.

So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.
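To make the byte counts concrete, here is a small Python sketch (plain Python, not MySQL code) that encodes a few sample characters as UTF-8 and reports whether each one lies inside the BMP. The helper names `utf8_length` and `fits_in_utf8mb3` are made up for illustration; the second one only mirrors the three-byte limit described above and does not talk to a MySQL server.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every code point is in the BMP (U+0000..U+FFFF).

    Hypothetical helper: it models the 3-byte limit of utf8/utf8mb3,
    it does not query MySQL.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# One sample per UTF-8 length: ASCII, Latin-1 range, rest of BMP, supplementary.
samples = ["A", "é", "€", "😀"]
for ch in samples:
    print(f"U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  BMP: {fits_in_utf8mb3(ch)}")
```

Only the last sample (U+1F600) needs four bytes, so it is the only one that a utf8/utf8mb3 column could not hold.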

This is what (a previous version of the same page of) the MySQL documentation has to say about it:

The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:

  • For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.

So if you want your column to support storing characters lying outside the BMP (and you usually do), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?
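One way to find out whether existing text actually contains characters outside the BMP is to scan it for code points above U+FFFF. The following is a minimal Python sketch under that assumption; `supplementary_chars` and the sample string are hypothetical and only illustrate the check.

```python
from typing import List


def supplementary_chars(text: str) -> List[str]:
    """Return the characters in `text` that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical user input: the CJK characters are in the BMP, the emoji are not.
comment = "Great post 👍 see you in 東京 🎉"
outside_bmp = supplementary_chars(comment)

print(outside_bmp)
print("needs utf8mb4:", bool(outside_bmp))
```

If the list is non-empty, the text needs a utf8mb4 column to be stored intact.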


Reply #3

The utf8mb4 character set is useful because nowadays we need to store not only language characters but also symbols, newly introduced emoji, and so on.

The article How to support full Unicode in MySQL databases by Mathias Bynens also sheds some light on this.


Reply #4

Taken from the MySQL 8.0 Reference Manual:

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.

In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and will be removed in a future MySQL release. At that point utf8 will become a reference to utf8mb4.

So regardless of this alias, you can explicitly choose the utf8mb4 encoding yourself.
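As a rough illustration of naming utf8mb4 explicitly instead of relying on the alias, the Python sketch below builds an `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4` statement as a string. The function name, the table name `comments`, and the default collation `utf8mb4_unicode_ci` are assumptions chosen for the example; the code only produces text and does not connect to a database.

```python
def convert_table_sql(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """Build a statement that converts a table to utf8mb4.

    Hypothetical helper: it only formats SQL text, it does not execute it.
    """
    if not collation.startswith("utf8mb4_"):
        raise ValueError("collation must belong to the utf8mb4 character set")
    # Quote the identifier with backticks, doubling any embedded backtick.
    quoted = "`" + table.replace("`", "``") + "`"
    return (
        f"ALTER TABLE {quoted} "
        f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
    )


# "comments" is an example table name, not one taken from the question.
print(convert_table_sql("comments"))
```

It is probably wise to try such a statement on a copy of the data first, since a conversion rewrites the table.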
