mysql+表情字符,MySQL,UTF-8和表情符号字符

I'm working on an iOS app with a PHP+MySQL backend. The app has a chat section, which needs to support emoji.

My tables are utf8_unicode_ci. If I don't call 'set names utf8' in my scripts, emoji it actually works - whatever is entered in the database, is returned to the clients as it should.

The problem is that this (if I understand it correctly) stores special characters incorrectly in the database, and this breaks string comparing (ie ï is no longer the same as i when comparing strings).

However, if I do call set names utf8, suddenly the emoji characters are inserted as a bunch of questionmarks.

Any suggestions on the proper way of handling this? Thanks!

解决方案

The issue is wether the db has a diacritical insensitive compare. The other issue is composed characters, ï can be expressed as either one unicode character or two forming a surrogate pair. There are methods to convert a string to a pre-composed or decomposed form: precomposedStringWith* and decomposedStringWith*.

It seems that MySQL supports two forms of unicode ucs2 (that is an older form that was supersede by utf16) which is 16-bits per character and utf8 up to 3 bytes per character. The bad news is that neither form is going to support plane 1 characters which require at 17 bits. (mainly emoji). It looks like MySQL 5.5.3 and up also support utf8mb4, utf16, and utf32 support BMP and supplementary characters (read emoji). See MySQL Unicode Character Sets.

Here is some code and results to demonstrate the different unicode byte representations.

Unicode is a 21 bit encoding system.

UTF32 directly represents the code points and clearly demonstrates decomposed surrogate pairs.

UTF8 and UTF16 require one or more bytes to represent a unicode character.

NSLog(@"character: %@", @"Å");

NSLog(@"decomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);

NSLog(@"decomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);

NSLog(@"decomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" decomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

NSLog(@"precomposedStringWithCanonicalMapping UTF8: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF8StringEncoding]);

NSLog(@"precomposedStringWithCanonicalMapping UTF16: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF16BigEndianStringEncoding]);

NSLog(@"precomposedStringWithCanonicalMapping UTF32: %@", [[@"Å" precomposedStringWithCanonicalMapping] dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

NSLog(@"character: %@", @"😱");

NSLog(@"dataUsingEncoding UTF8: %@", [@"😱" dataUsingEncoding:NSUTF8StringEncoding]);

NSLog(@"dataUsingEncoding UTF16: %@", [@"😱" dataUsingEncoding:NSUTF16BigEndianStringEncoding]);

NSLog(@"dataUsingEncoding UTF32: %@", [@"😱" dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

// For some surrogate pairs there is no other form

NSString *aReverse = [[NSString alloc] initWithBytes:"\xD8\x3C\xDD\x70\x00" length:4 encoding:NSUTF16BigEndianStringEncoding];

NSLog(@"character: %@", aReverse);

NSLog(@"dataUsingEncoding UTF8: %@", [aReverse dataUsingEncoding:NSUTF8StringEncoding]);

NSLog(@"dataUsingEncoding UTF16: %@", [aReverse dataUsingEncoding:NSUTF16BigEndianStringEncoding]);

NSLog(@"dataUsingEncoding UTF32: %@", [aReverse dataUsingEncoding:NSUTF32BigEndianStringEncoding]);

NSLog output:

character: Å

decomposedStringWithCanonicalMapping UTF8: <41cc8a>

decomposedStringWithCanonicalMapping UTF16: <0041030a>

decomposedStringWithCanonicalMapping UTF32: <00000041 0000030a>

precomposedStringWithCanonicalMapping UTF8:

precomposedStringWithCanonicalMapping UTF16: <00c5>

precomposedStringWithCanonicalMapping UTF32: <000000c5>

character: 😱

dataUsingEncoding UTF8:

dataUsingEncoding UTF16:

dataUsingEncoding UTF32: <0001f631>

character: 🅰

dataUsingEncoding UTF8:

dataUsingEncoding UTF16:

dataUsingEncoding UTF32: <0001f170>

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值