What are the differences between UTF-8, UTF-16, and UTF-32?

最新推荐文章于 2024-07-09 13:35:52 发布

小白笑苍

最新推荐文章于 2024-07-09 13:35:52 发布

阅读量220

点赞数

分类专栏： C C++

本文链接：https://blog.csdn.net/toyijiu/article/details/105945920

版权

C C++ 专栏收录该内容

16 篇文章 2 订阅

订阅专栏

Answer1

UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file.

UTF-16 is better where ASCII is not predominant, since it uses 2 bytes per character, primarily. UTF-8 will start to use 3 or more bytes for the higher order characters where UTF-16 remains at just 2 bytes for most characters.

UTF-32 will cover all possible characters in 4 bytes. This makes it pretty bloated. I can’t think of any advantage to using it.

Answer2

UTF-8: Variable-width encoding, backwards compatible with ASCII. ASCII characters (U+0000 to U+007F) take 1 byte, code points U+0080 to U+07FF take 2 bytes, code points U+0800 to U+FFFF take 3 bytes, code points U+10000 to U+10FFFF take 4 bytes. Good for English text, not so good for Asian text.
UTF-16: Variable-width encoding. Code points U+0000 to U+FFFF take 2 bytes, code points U+10000 to U+10FFFF take 4 bytes. Bad for English text, good for Asian text.
UTF-32: Fixed-width encoding. All code points take four bytes. An enormous memory hog, but fast to operate on. Rarely used.