Android - How to filter emoji (emoticons) from a string?

Latest emoji data can be found here:

There is one folder per emoji version.

As an app developer, it is a good idea to use the latest version available.

Inside a version folder you'll find several text files.

Check emoji-data.txt. It lists all standard emoji code points.

Emoji are spread across many small code point ranges.

For the best coverage, check all of these ranges in your app.
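To make the range check concrete, here is a minimal sketch of loading ranges from emoji-data.txt. It assumes each data line has the layout `code-or-range ; Property # comment`, with ranges written as `start..end` in hex. The class name `EmojiDataParser` and its methods are hypothetical names chosen for illustration. The property is passed in as a parameter because the plain `Emoji` property also covers the ASCII digits, `#`, and `*`, which you probably do not want to strip.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Android or Unicode API.
final class EmojiDataParser {

    /** Returns inclusive {start, end} code point ranges for the given property name. */
    static List<int[]> parse(List<String> lines, String property) {
        List<int[]> ranges = new ArrayList<>();
        for (String line : lines) {
            // Drop the trailing comment, then skip blank and comment-only lines.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) {
                continue;
            }
            // The first field is either a single code point or "start..end", in hex.
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0], 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    /** Linear scan; a sorted array with binary search would suit a large table better. */
    static boolean contains(List<int[]> ranges, int codePoint) {
        for (int[] r : ranges) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }
}
```

On Android the lines could come from a copy of the file bundled with the app, read once at startup and cached.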

Some people ask why there are 5-digit codes when only 4 hex digits can follow \u.

These code points are written as surrogate pairs. In UTF-16, any code point above U+FFFF takes two 16-bit code units (two Java chars), so most emoji are encoded with 2 chars.
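As a quick illustration of the point above, this sketch splits a code point into its UTF-16 code units with the standard `Character.toChars` method and prints them as 4-digit escapes. The class and method names (`SurrogateDemo`, `toEscapes`) are made up for the example.

```java
// Hypothetical demo class showing how one code point maps to UTF-16 code units.
final class SurrogateDemo {

    /** Formats each UTF-16 code unit of the code point as a backslash-u escape. */
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars returns one char for BMP code points, two otherwise.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }
}
```

Calling `SurrogateDemo.toEscapes(0x1F600)` yields the text `\uD83D\uDE00`, which is how the 5-digit code U+1F600 is written in a Java string literal, while `SurrogateDemo.toEscapes(0x263A)` yields the single escape `\u263A`.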

For example, take a string:

String s = ...;

Get its UTF-16 (big-endian) representation:

byte[] utf16 = s.getBytes("UTF-16BE");

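Iterate over the UTF-16 bytes two at a time. The variables high and low hold the two halves of the current surrogate pair:

char high = 0, low = 0;

for(int i = 0; i < utf16.length; i += 2) {

Rebuild one char from each pair of bytes: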

char c = (char)((char)(utf16[i] & 0xff) << 8 | (char)(utf16[i + 1] & 0xff));

Now check for surrogate pairs. The emoji that need a pair are located in plane 1 (the first supplementary plane, U+10000..U+1FFFF), so check whether the first (high) half of the pair is in the range 0xd800..0xd83f.

if(c >= 0xd800 && c <= 0xd83f) {

high = c;

continue;

}

The second (low) half of a surrogate pair is in the range 0xdc00..0xdfff. With both halves available, we can convert the pair to a single 5-digit code point.

else if(c >= 0xdc00 && c <= 0xdfff) {

low = c;

long unicode = (((long)high - 0xd800) * 0x400) + ((long)low - 0xdc00) + 0x10000;

}

All other chars are not part of a pair, so process them as is.

else {

long unicode = c;

}
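The pair arithmetic above can be checked in isolation. The sketch below wraps the same formula in a hypothetical `decode` method; the standard library's `Character.toCodePoint(high, low)` performs the same conversion and can be used instead of hand-written arithmetic.

```java
// Hypothetical helper isolating the surrogate pair arithmetic from the text.
final class SurrogateDecode {

    /** Combines a high and a low surrogate into one code point. */
    static int decode(char high, char low) {
        // Each high surrogate selects a block of 0x400 code points;
        // the low surrogate is the offset inside that block.
        return ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
    }
}
```

For the pair 0xD83D, 0xDE00 this gives 0x3D * 0x400 + 0x200 + 0x10000 = 0x1F600.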

Now use the data from emoji-data.txt to check whether the code point is an emoji.

If it is, skip it. If not, copy its bytes to the output byte array.

Finally, convert the byte array to a String:

String out = new String(outarray, Charset.forName("UTF-16BE"));
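For comparison, here is a compact sketch of the whole filter. It swaps the byte-array loop for Java's built-in code point methods (`String.codePointAt`, `Character.charCount`), which handle surrogate pairs internally. The `EmojiFilter` class and its `RANGES` table are assumptions for illustration: the table holds only a few well-known Unicode blocks, not the full list from emoji-data.txt.

```java
// Hypothetical filter class; RANGES is an illustrative subset, not the real emoji list.
final class EmojiFilter {

    private static final int[][] RANGES = {
        {0x2600, 0x27BF},   // Miscellaneous Symbols and Dingbats blocks
        {0x1F300, 0x1F5FF}, // Miscellaneous Symbols and Pictographs block
        {0x1F600, 0x1F64F}, // Emoticons block
        {0x1F680, 0x1F6FF}, // Transport and Map Symbols block
    };

    static boolean isEmoji(int codePoint) {
        for (int[] r : RANGES) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of s without the code points that fall in RANGES. */
    static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);   // reads a full surrogate pair when present
            if (!isEmoji(cp)) {
                out.appendCodePoint(cp); // writes one or two chars as needed
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this table, `EmojiFilter.strip("Hi \uD83D\uDE00!")` returns `"Hi !"`. Characters used in emoji sequences that lie outside the table, such as variation selectors and zero-width joiners, are left in place by this sketch.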
