数据库存储emoji遇到的问题,过滤emoji表情,将在下面一一论述
- emoji过滤
- emoji存储常见问题
- emoji存储问题解决
- 常见emoji编码
emoji过滤
private static boolean isNotEmojiCharacter(char codePoint)
{
return (codePoint == 0x0) ||
(codePoint == 0x9) ||
(codePoint == 0xA) ||
(codePoint == 0xD) ||
((codePoint >= 0x20) && (codePoint <= 0xD7FF)) ||
((codePoint >= 0xE000) && (codePoint <= 0xFFFD)) ||
((codePoint >= 0x10000) && (codePoint <= 0x10FFFF));
}
/**
* 过滤emoji 或者 其他非文字类型的字符
* @param source
* @return
*/
public static String filterEmoji(String source)
{
int len = source.length();
StringBuilder buf = new StringBuilder(len);
for (int i = 0; i < len; i++)
{
char codePoint = source.charAt(i);
if (isNotEmojiCharacter(codePoint))
{
buf.append(codePoint);
}
}
return buf.toString();
emoji存储常见问题
前台应用抓取微博信息,每天总有几条数据插入不成功。应用日志显示:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\xAA",...' for column 'raw_json' at row 1, 异常:org.springframework.jdbc.UncategorizedSQLException:
其中raw_json字段显示如下:
由上图可知,插入的字段里面包含emoji表情符。基本可以判定是字符编码的问题。我们的数据库里面使用的是utf8编码,普通的字符串或者表情都是占位3个字节,所以utf8足够用了,但是移动端的表情符号占位是4个字节,普通的utf8就不够用了,为了应对无线互联网的机遇和挑战、避免 emoji 表情符号带来的问题、涉及无线相关的 MySQL 数据库建议都提前采用utf8mb4 字符集,这必须要作为移动互联网行业的一个技术选型的要点。
utf8可能是2或3或4个字节,而MySQL的utf8是3个字节,存放一个emoji是需要4个字节的,自然不够
解决方案
尝试修改此列的字符集:
ALTER TABLE xx_pnl_weibo_usershow <br>MODIFY ` raw_json` varchar(3000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
alter table xx_pnl_weibo_usershow CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;(修改表和字段的字符集)
改完后测试,插入失败。
修改库级别的字符集
1. 修改my.ini [mysqld] character-set-server=utf8mb4
2. 在Connector/J的连接参数中,不要加characterEncoding参数。 不加这个参数时,默认值就时autodetect。
3. 将已经建好的表也转换成utf8mb4
命令:ALTER TABLE `TABLE_NAME` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; (将TABLE_NAME替换成你的表名)
4. 将需要使用emoji的字段设置类型为:
命令:ALTER TABLE `TABLE_NAME`MODIFY COLUMN `COLUMN_NAME` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
或者用下述方式:
vi /etc/my.cnf
[client]
default-character-set=utf8mb4
[mysqld]
character-set-server = utf8mb4
service mysqld restart
改完后测试,插入成功。查看插入的数据:
root@localhost zx_prod >select id,raw_json,hex(raw_json),length(raw_json) from xx_pnl_weibo_usershow where id=2;
+----+------------------+----------------------------------+---------------------+
| id | raw_json | hex(raw_json) | length(raw_json) |
+----+------------------+----------------------------------+---------------------+
| 2 | | F09F9894F09F9882F09F9898F09F9894 | 16 |
+----+------------------+----------------------------------+---------------------+
1 row in set (0.00 sec)
常见emoji编码
NSArray *array = @[@"\uE415",@"\uE056",@"\uE057",@"\uE414",@"\uE405",@"\uE106",@"\uE418",@"\uE417",
@"\uE40d",@"\uE40a",@"\uE404",@"\uE105",@"\uE409",@"\uE40e",@"\uE402",@"\uE108",
@"\uE403",@"\uE058",@"\uE407",@"\uE401",@"\uE40f",@"\uE40b",@"\uE406",@"\uE413",
@"\uE411",@"\uE412",@"\uE410",@"\uE107",@"\uE059",@"\uE416",@"\uE408",@"\uE40c",
@"\uE11a",@"\uE10c",@"\uE32c",@"\uE32a",@"\uE32d",@"\uE328",@"\uE32b",@"\uE022",
@"\uE023",@"\uE327",@"\uE329",@"\uE32e",@"\uE32f",@"\uE335",@"\uE334",@"\uE021",
@"\uE336",@"\uE13c",@"\uE337",@"\uE020",@"\uE330",@"\uE331",@"\uE326",@"\uE03e",
@"\uE11d",@"\uE05a",@"\uE00e",@"\uE421",@"\uE420",@"\uE00d",@"\uE010",@"\uE011",
@"\uE41e",@"\uE012",@"\uE422",@"\uE22e",@"\uE22f",@"\uE231",@"\uE230",@"\uE427",
@"\uE41d",@"\uE00f",@"\uE41f",@"\uE14c",@"\uE201",@"\uE115",@"\uE428",@"\uE51f",
@"\uE429",@"\uE424",@"\uE423",@"\uE253",@"\uE426",@"\uE111",@"\uE425",@"\uE31e",
@"\uE31f",@"\uE31d",@"\uE001",@"\uE002",@"\uE005",@"\uE004",@"\uE51a",@"\uE519",
@"\uE518",@"\uE515",@"\uE516",@"\uE517",@"\uE51b",@"\uE152",@"\uE04e",@"\uE51c",
@"\uE51e",@"\uE11c",@"\uE536",@"\uE003",@"\uE41c",@"\uE41b",@"\uE419",@"\uE41a"];
NSString *labelStr = @"";
for (NSString *str in array) {
labelStr = [NSString stringWithFormat:@"%@ %@",labelStr,str];
}
self.myLabel.text = labelStr;